num_threads specification does not work #12

anxietymonger · 2018-03-20T02:58:39Z

Although I pass num_threads to Index.insert(), only one CPU seems to be working. When I run "top" on the command line, there is a single process at about 100% CPU usage. This does not change when I raise the value from 16 to 40 (the machine has 40 CPUs).

In addition, I am using the Python API from a Jupyter notebook.

masajiro · 2018-03-20T04:11:12Z

Index.insert() consists of two steps: loading the data and building the index. Since data loading is not parallelized, num_threads applies only to the index-building step. This means that insert() uses only one thread at the beginning.

If you are inserting a huge amount of data, you should use insert_object() to reduce the data loading time and memory usage. The article below is in Japanese, but the sample code in it may help you use insert_object().

https://techblog.yahoo.co.jp/data_solution/ngtpython/
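A minimal sketch of the pattern described above: objects are inserted one at a time with insert_object(), and the index is built afterwards with the requested thread count. It assumes the index object also exposes build_index(num_threads), which is not confirmed in this thread, so check it against your installed binding. FakeIndex, stream_into_index and generate_vectors are hypothetical names used only so the sketch runs without NGT installed.

```python
from typing import Iterable, Sequence


def stream_into_index(index, vectors: Iterable[Sequence[float]], num_threads: int = 8) -> int:
    """Insert vectors one at a time, then build the index.

    `index` is any object with insert_object(vector) and
    build_index(num_threads). build_index is an assumption about the
    NGT Python binding; verify it for your version.
    """
    count = 0
    for vector in vectors:
        # Loading step: runs in the calling thread, one object at a time,
        # so the full data set never has to sit in a single Python list.
        index.insert_object(list(vector))
        count += 1
    # Building step: the only place where num_threads is passed on.
    index.build_index(num_threads)
    return count


class FakeIndex:
    """Hypothetical stand-in for a real index; it only records calls."""

    def __init__(self):
        self.objects = []
        self.built_with = None

    def insert_object(self, vector):
        self.objects.append(vector)

    def build_index(self, num_threads):
        self.built_with = num_threads


def generate_vectors(n: int, dim: int):
    """Yield vectors lazily instead of materializing them all at once."""
    for i in range(n):
        yield [float(i + j) for j in range(dim)]


fake = FakeIndex()
inserted = stream_into_index(fake, generate_vectors(1000, 4), num_threads=16)
print(inserted, fake.built_with)
```

With a real index in place of FakeIndex, "top" would be expected to show a single busy core during the loop and more cores only once the build step starts, which matches the behavior reported in this issue.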

anxietymonger closed this as completed Mar 20, 2018
