Move memory allocation out of elkan_L2_sse #280
Welcome @bjzhjing! It looks like this is your first PR to zilliztech/knowhere 🎉
@bjzhjing 🔍 Important: PR Classification Needed! For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:
For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”. Thanks for your efforts and contribution to the community!
Thanks for this PR, we will take a look ASAP.
/assign @alexanderguzhva
/kind enhancement
@@ -422,6 +422,7 @@ void compute_PQ_dis_tables_dsub2(
 * @param y database vectors, size ny * d
 * @param ids result array ids
 * @param val result array value
 * @param data the pointer of memory for symmetric matrix data
Please call this variable tmp_buffer in order to stress its meaning. It is not data, it is temporary storage.
Thanks for helping review, Alex! The change is done.
namespace faiss {

- IndexFlatElkan::IndexFlatElkan(idx_t d, MetricType metric, bool is_cosine, bool use_elkan)
+ IndexFlatElkan::IndexFlatElkan(idx_t d, MetricType metric, bool is_cosine, bool use_elkan, float* data)
?! data, which is passed as a parameter, is not used anywhere.
Yeah, it's not required to be set by callers. Remove it from the interface.
@@ -29,9 +29,17 @@ namespace faiss {
// Elkan algo was introduced into Knowhere in #2178, #2180 and #2258.
struct IndexFlatElkan : IndexFlat {
    bool use_elkan = true;
    float* data = nullptr;
std::vector<float> tmp_buffer_for_elkan;
Changed it.
if (nx == 0 || ny == 0) {
    return;
}

const size_t bs_y = 1024;
float* data = (float*)malloc((bs_y * (bs_y - 1) / 2) * sizeof(float));
bool allocate = false;
Please use std::unique_ptr<float[]> + new for allocating a local temporary buffer.
I now call std::make_unique<float[]>() in the IndexFlatElkan constructor to allocate memory for tmp_buffer. I also removed the check from elkan_L2_sse(), because use_elkan ensures the memory is already allocated by the time execution gets there.
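For readers wondering why the buffer holds bs_y * (bs_y - 1) / 2 floats: that is the number of unordered pairs among bs_y vectors, which is what a symmetric distance matrix needs once the diagonal and the mirrored half are dropped. The sketch below shows one possible packed layout. The helper names (packed_size, packed_index, fill_pairwise_sq_dist, pair_sq_dist) and the row-by-row ordering are assumptions made for illustration, not necessarily what elkan_L2_sse() does.

```cpp
#include <cstddef>
#include <memory>

// Number of unordered pairs (i, j) with j < i < n.
constexpr std::size_t packed_size(std::size_t n) {
    return n * (n - 1) / 2;
}

// Offset of pair (i, j) in the packed buffer; requires j < i.
// Hypothetical layout: row i starts right after the i*(i-1)/2 entries of rows 1..i-1.
constexpr std::size_t packed_index(std::size_t i, std::size_t j) {
    return i * (i - 1) / 2 + j;
}

// Fill tmp_buffer with squared L2 distances between all pairs of the n
// d-dimensional vectors stored contiguously in y.
// tmp_buffer must hold at least packed_size(n) floats.
inline void fill_pairwise_sq_dist(const float* y, std::size_t n, std::size_t d,
                                  float* tmp_buffer) {
    for (std::size_t i = 1; i < n; ++i) {
        for (std::size_t j = 0; j < i; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < d; ++k) {
                const float diff = y[i * d + k] - y[j * d + k];
                acc += diff * diff;
            }
            tmp_buffer[packed_index(i, j)] = acc;
        }
    }
}

// Symmetric lookup: (i, j) and (j, i) share one slot; the diagonal is zero.
inline float pair_sq_dist(const float* tmp_buffer, std::size_t i, std::size_t j) {
    if (i == j) {
        return 0.0f;
    }
    return i > j ? tmp_buffer[packed_index(i, j)] : tmp_buffer[packed_index(j, i)];
}
```

With bs_y = 1024 this comes to 523776 floats, roughly 2 MiB, which is the allocation that was being repeated on every call.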
1eba4fd to 814a0fd
814a0fd to 294a96c
@alexanderguzhva Hi Alex, I ended up using a smart pointer instead of std::vector to implement tmp_buffer, because IndexFlatElkan::search is declared 'const', which means it is not allowed to modify member variables. Passing tmp_buffer to elkan_L2_sse would therefore require a const_cast<std::vector<float>&> conversion, and there is an address change, which makes things a bit complex. A smart pointer avoids the issue and makes the code cleaner and safer. Please help review. Thanks very much!
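The const issue described in this comment can be reproduced in a few lines. The sketch below uses a hypothetical ScratchOwner struct (not the real IndexFlatElkan) to show that a const member function can write through a std::unique_ptr<float[]> member, because const applies to the pointer rather than to the floats it owns, whereas a std::vector<float> member exposes only const elements and needs a cast.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an index whose search-like methods are const.
struct ScratchOwner {
    std::unique_ptr<float[]> tmp_buffer;  // const method sees a const pointer, not const floats
    std::vector<float> tmp_vector;        // const method sees const elements

    explicit ScratchOwner(std::size_t n)
        : tmp_buffer(std::make_unique<float[]>(n)), tmp_vector(n) {}

    float search_with_pointer(float v) const {
        float* scratch = tmp_buffer.get();  // get() is a const member returning float*
        scratch[0] = v;                     // allowed without any cast
        return scratch[0];
    }

    float search_with_vector(float v) const {
        // tmp_vector[0] = v;  // would not compile: const operator[] returns const float&
        // The cast below is only well defined when the ScratchOwner object
        // itself was not created as a const object.
        auto& scratch = const_cast<std::vector<float>&>(tmp_vector);
        scratch[0] = v;
        return scratch[0];
    }
};
```

Declaring the vector member mutable would be another way to avoid the cast; the thread does not discuss that option, so it is mentioned here only as a general C++ alternative.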
@bjzhjing Two more notes
elkan_L2_sse() calls malloc/free every time it is invoked from the loop in train_encoded() during training, which happens for most IVF index builds. On the kernel side, memory allocation triggers VMA allocation, physical page allocation, page mapping setup, and memory cgroup statistics updates, so more frequent allocation means more overhead. Move the allocation out of the loop and do it only once, when IndexFlatElkan is initialized, to save CPU cycles.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Because the memory allocation has moved to IndexFlatElkan construction, replace bs_y with the parameter defined and passed by the index.
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
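The pattern these commit messages describe, allocating scratch space once at construction and passing it into the hot function, can be sketched as follows. ScratchIndex, sum_of_squares, and train are hypothetical names used only for illustration; the real change touches IndexFlatElkan and elkan_L2_sse().

```cpp
#include <cstddef>
#include <memory>

// Hypothetical kernel: uses caller-provided scratch space instead of
// calling malloc/free itself. tmp_buffer must hold at least n floats.
inline float sum_of_squares(const float* x, std::size_t n, float* tmp_buffer) {
    for (std::size_t i = 0; i < n; ++i) {
        tmp_buffer[i] = x[i] * x[i];
    }
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        total += tmp_buffer[i];
    }
    return total;
}

// Hypothetical owner: the buffer is allocated once in the constructor and
// reused by every iteration of the training loop.
struct ScratchIndex {
    std::size_t capacity;
    std::unique_ptr<float[]> tmp_buffer;

    explicit ScratchIndex(std::size_t cap)
        : capacity(cap), tmp_buffer(std::make_unique<float[]>(cap)) {}

    // Runs the kernel `iterations` times without any further allocation.
    // Requires n <= capacity.
    float train(const float* x, std::size_t n, int iterations) const {
        float last = 0.0f;
        for (int it = 0; it < iterations; ++it) {
            last = sum_of_squares(x, n, tmp_buffer.get());
        }
        return last;
    }
};
```

The trade-off is that the buffer now lives as long as the index does, so the memory stays reserved between training calls in exchange for avoiding the per-call allocation cost.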
294a96c to 3d2246b
Greetings @alexanderguzhva! I'm just back from a bad cold; sorry for the late response. I've addressed your comments above. Please help review, thanks!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: bjzhjing, PwzXxm. The full list of commands accepted by this bot can be found here. The pull request process is described here.
issue #279