num_threads specification does not work #12

Closed
anxietymonger opened this issue Mar 20, 2018 · 1 comment

Comments

@anxietymonger

anxietymonger commented Mar 20, 2018

Although I pass num_threads to Index.insert(), only one CPU seems to be working. Running "top" on the command line shows a single process at about 100% CPU usage, and this does not change when I raise the value from 16 to 40 (the machine has 40 CPUs).

I am using the Python API from a Jupyter notebook.

@masajiro
Member

Index.insert() consists of two steps: loading the data and building the index. Data loading is not parallelized, so num_threads applies only to the index-building step. This means that insert() uses only one thread at the beginning, while the data is being loaded.

If you are inserting a huge amount of data, you should use insert_object() instead to reduce the data loading time and memory usage. The article below is in Japanese, but the sample code in it may help you use insert_object().

https://techblog.yahoo.co.jp/data_solution/ngtpython/
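
For readers who cannot follow the linked article, here is a minimal sketch of the pattern described above: append objects one at a time with insert_object(), then build the index in a single parallel pass. It is not taken from the article. The helper name insert_streaming is hypothetical, and the build_index(num_threads=...) call is an assumption about the Python binding that this thread does not confirm, so check it against the version you have installed.

```python
from typing import Iterable, Sequence


def insert_streaming(index, vectors: Iterable[Sequence[float]], num_threads: int = 8) -> int:
    """Append vectors one by one, then build the index once.

    `index` is any object exposing insert_object(obj) and
    build_index(num_threads=...). The second method name is an
    assumption, not something confirmed in this thread.
    """
    count = 0
    for vector in vectors:
        # Loading phase: runs on a single thread, but only one
        # vector has to be held in Python memory at a time.
        index.insert_object(list(vector))
        count += 1
    # Building phase: the only step where num_threads has an effect.
    index.build_index(num_threads=num_threads)
    return count
```

Because the vectors argument can be a generator, the whole dataset never needs to be in a Python list at once, which is where the memory saving would come from. While the loop runs you should still expect a single busy CPU; the extra threads would only show up in "top" during the build step.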
