[BUG]: inst_dealloc() Function Execution Is Slow #507
Comments
It is expected that removing an entry from a hash table takes a small amount of time, potentially more so in a hash table with Robin Hood hashing. The point you make about the load factor does not look correct to me, since the hash table will automatically expand once the load factor becomes too high. What I cannot tell from this description is whether you simply expected this function to be faster by a constant amount, or whether you were able to create some kind of worst-case behavior that triggers a particularly slow path. Nanobind uses a pointer hash "finisher" from the MurmurHash project that should normally exclude this possibility. If you have found a legitimate issue, then what this report needs is a minimal reproducer; there is no way I can help without one. I will close this issue for now but will reopen it if you are able to produce such a reproducer.
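For reference, the MurmurHash "finisher" referred to here is the well-known `fmix64` bit mixer from MurmurHash3. A sketch (the exact hash functor nanobind uses may differ in detail; this only illustrates why hashed pointer values should not cluster into neighboring buckets):

```cpp
#include <cstdint>

// MurmurHash3's 64-bit finalizer ("fmix64"). Passing a pointer value through
// this mixer spreads its bits across the whole word, so consecutive
// allocation addresses do not land in adjacent hash-table buckets.
std::uint64_t fmix64(std::uint64_t k) {
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}
```

The mixer is a bijection (every step is invertible), so distinct pointers always map to distinct hash values before bucket reduction.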
@wjakob, I've provided a reproducible example for the issue below. Could you please review it and consider reopening?

Reproducible example code

C++ file:

```c++
#include <nanobind/nanobind.h>
#include <nanobind/stl/shared_ptr.h>

#include <chrono>
#include <cstdio>
#include <ctime>
#include <memory>

namespace nb = nanobind;
using namespace nb::literals;

struct test_struct {
    int a = 0;
};

typedef std::shared_ptr<test_struct> test_struct_ptr;

nb::handle fun_expend_hash_table, fun_clear_hash_table, fun_test1, fun_test2;

void set_test_fun(nb::handle h1, nb::handle h2, nb::handle h3, nb::handle h4) noexcept {
    fun_expend_hash_table = h1;
    fun_clear_hash_table = h2;
    fun_test1 = h3;
    fun_test2 = h4;
}

void test_hash_op() {
    const int LOOP_CNT_MAX = 10000;
    for (int i = 0; i < LOOP_CNT_MAX; ++i) {
        test_struct_ptr p(new test_struct());
        fun_test1(p);
    }
}

void test_expend_hash_table() {
    const int LOOP_CNT_MAX = 0x8000 / 2 + 1; // expand the table to 32k buckets
    for (int i = 0; i < LOOP_CNT_MAX; ++i) {
        test_struct_ptr p(new test_struct());
        fun_expend_hash_table(p);
    }
}

void test_run(int run_flame_test_s) {
    using Duration = std::chrono::duration<double, std::nano>;
    using time_point_double = std::chrono::time_point<std::chrono::high_resolution_clock, Duration>;

    // Normal operation
    time_point_double time1 = std::chrono::high_resolution_clock::now();
    test_hash_op();
    time_point_double time2 = std::chrono::high_resolution_clock::now();

    // Expand the hash table so it becomes large, then clear it so it is
    // large but sparsely populated
    test_expend_hash_table();
    fun_clear_hash_table();

    // The same operation is now slower
    time_point_double time3 = std::chrono::high_resolution_clock::now();
    test_hash_op();
    time_point_double time4 = std::chrono::high_resolution_clock::now();

    double time_cost1 = (time2 - time1).count();
    double time_cost2 = (time4 - time3).count();
    printf("time_cost1=%f time_cost2=%f mult=%f\n", time_cost1, time_cost2, time_cost2 / time_cost1);

    // Loop for the given number of seconds so a profiling tool can collect
    // samples and generate a flame graph.
    if (run_flame_test_s > 0) {
        printf("begin for run_flame_test\n");
        time_t loop_time_len_s = (time_t)run_flame_test_s;
        time_t time_begin = time(nullptr);
        do {
            for (int i = 0; i < 1000; ++i) {
                test_struct_ptr p(new test_struct());
                fun_test1(p);
                fun_test2(1);
            }
        } while ((time(nullptr) - time_begin) < loop_time_len_s);
    }
    printf("finish\n");
}

NB_MODULE(test_hash, m) {
    nb::class_<test_struct>(m, "test_struct")
        .def_ro("a", &test_struct::a);
    m.def("set_test_fun", set_test_fun, "set_test_fun");
    m.def("test_run", test_run, "test_run");
}
```

Python file: `test_instdealloc.py`
Run result (from the command `python3 test_instdealloc.py 120`):
Thank you for finding this issue -- it should now be fixed.
See Tessil/robin-map#76 and #507 for context. (This is still a pre-release version of robin-map; I will update the referenced commit once a proper release has been made.)
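The interface distinction at the heart of the fix can be illustrated with the standard map API (a sketch using `std::unordered_map` purely to show the two erase shapes; the cost asymmetry matters specifically for open-addressing tables such as robin-map, where locating the successor may require a bucket scan):

```cpp
#include <cstddef>
#include <unordered_map>

// The iterator-returning overload, m.erase(it), must locate the element that
// follows the erased one. The key-based overload only reports how many
// elements were removed, so no successor ever needs to be found.
std::size_t erase_without_successor(std::unordered_map<int, int>& m, int key) {
    return m.erase(key); // returns 0 or 1; no successor lookup required
}
```

An erase variant that does not compute the next iterator sidesteps the scan described in this issue entirely.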
Problem description
Hello, I have feedback about a performance issue.

The function `static void inst_dealloc(PyObject *self)` in `nb_type.cpp` is experiencing performance problems. In some application scenarios, `inst_dealloc()` accounts for over 70% of the samples in a performance flame graph. Within `inst_dealloc()`, the operations `internals.keep_alive.erase(it)` and `inst_c2p.erase(it)` take a considerable amount of time.

Both `keep_alive` and `inst_c2p` use `robin_hash`. Its `erase()` function has O(n) worst-case complexity, tied to the load factor of the hash table (load factor = number of elements / capacity): the lower the load factor and the larger the capacity, the slower the erase operation becomes.

The reason is that after deleting an element, `erase()` returns an iterator to the next element, which is found by scanning forward from the current position until a non-empty bucket is encountered or the end of the table is reached.
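The scan described above can be sketched as follows (a simplified model of an open-addressing table, not the actual robin-map code; `occupied` is a hypothetical stand-in for the bucket array's occupancy flags):

```cpp
#include <cstddef>
#include <vector>

// Simplified model: after erasing bucket i, an iterator-returning erase()
// must advance to the next occupied bucket. In a table that is large but
// almost empty, this scan can cross nearly the whole capacity, making a
// single erase O(capacity) rather than O(1).
std::size_t next_occupied(const std::vector<bool>& occupied, std::size_t i) {
    while (i < occupied.size() && !occupied[i])
        ++i; // each step inspects one empty bucket
    return i; // == occupied.size() when no further element exists
}
```

This is exactly the situation the reproducer creates: the table is grown to 32k buckets and then cleared, so subsequent erases pay for long runs of empty buckets.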
Reproducible example code
No response