In [3]:
%matplotlib inline


Ensemble LDA
============

Introduces Gensim's EnsembleLda model




In [4]:
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

This tutorial will explain how to use the EnsembleLDA model class.

EnsembleLda is a method of finding and generating stable topics from the results of multiple topic models,
it can be used to remove topics from your results that are noise and are not reproducible.




Corpus
------
We will use the gensim downloader api to get a small corpus for training our ensemble.

The preprocessing is similar to `sphx_glr_auto_examples_tutorials_run_word2vec.py`,
so it won't be explained again in detail.




In [5]:
import gensim.downloader as api
from gensim.corpora import Dictionary
from nltk.stem.wordnet import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
docs = api.load('text8')

dictionary = Dictionary()
for doc in docs:
    dictionary.add_documents([[lemmatizer.lemmatize(token) for token in doc]])
dictionary.filter_extremes(no_below=20, no_above=0.5)

corpus = [dictionary.doc2bow(doc) for doc in docs]

2021-11-22 16:46:58,744 : INFO : adding document #0 to Dictionary<0 unique tokens: []>
2021-11-22 16:46:58,750 : INFO : built Dictionary<2312 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1 documents (total 10000 corpus positions)
2021-11-22 16:46:58,778 : INFO : adding document #0 to Dictionary<2312 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:46:58,783 : INFO : built Dictionary<3906 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 2 documents (total 20000 corpus positions)
2021-11-22 16:46:58,811 : INFO : adding document #0 to Dictionary<3906 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:46:58,816 : INFO : built Dictionary<5147 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 3 documents (total 30000 corpus positions)
2021-11-22 16:46:58,844 : INFO : adding document #0 to Dictionary<5147 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']

2021-11-22 16:46:59,653 : INFO : adding document #0 to Dictionary<20766 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:46:59,658 : INFO : built Dictionary<21174 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 28 documents (total 280000 corpus positions)
2021-11-22 16:46:59,686 : INFO : adding document #0 to Dictionary<21174 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:46:59,691 : INFO : built Dictionary<21602 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 29 documents (total 290000 corpus positions)
2021-11-22 16:46:59,719 : INFO : adding document #0 to Dictionary<21602 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:46:59,725 : INFO : built Dictionary<21878 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 30 documents (total 300000 corpus positions)
2021-11-22 16:46:59,753 : INFO : adding document #0 to Dictionary<2187

2021-11-22 16:47:00,547 : INFO : built Dictionary<31611 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 54 documents (total 540000 corpus positions)
2021-11-22 16:47:00,576 : INFO : adding document #0 to Dictionary<31611 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:00,584 : INFO : built Dictionary<32277 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 55 documents (total 550000 corpus positions)
2021-11-22 16:47:00,612 : INFO : adding document #0 to Dictionary<32277 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:00,619 : INFO : built Dictionary<32761 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 56 documents (total 560000 corpus positions)
2021-11-22 16:47:00,648 : INFO : adding document #0 to Dictionary<32761 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:00,653 : INFO : built Dictionary<33053 unique tokens:

2021-11-22 16:47:01,494 : INFO : adding document #0 to Dictionary<40614 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:01,502 : INFO : built Dictionary<41071 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 81 documents (total 810000 corpus positions)
2021-11-22 16:47:01,531 : INFO : adding document #0 to Dictionary<41071 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:01,538 : INFO : built Dictionary<41378 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 82 documents (total 820000 corpus positions)
2021-11-22 16:47:01,567 : INFO : adding document #0 to Dictionary<41378 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:01,573 : INFO : built Dictionary<41734 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 83 documents (total 830000 corpus positions)
2021-11-22 16:47:01,602 : INFO : adding document #0 to Dictionary<4173

2021-11-22 16:47:02,438 : INFO : built Dictionary<49713 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 107 documents (total 1070000 corpus positions)
2021-11-22 16:47:02,467 : INFO : adding document #0 to Dictionary<49713 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:02,473 : INFO : built Dictionary<50011 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 108 documents (total 1080000 corpus positions)
2021-11-22 16:47:02,502 : INFO : adding document #0 to Dictionary<50011 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:02,509 : INFO : built Dictionary<50348 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 109 documents (total 1090000 corpus positions)
2021-11-22 16:47:02,538 : INFO : adding document #0 to Dictionary<50348 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:02,545 : INFO : built Dictionary<50686 unique t

2021-11-22 16:47:03,409 : INFO : adding document #0 to Dictionary<56128 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:03,416 : INFO : built Dictionary<56299 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 134 documents (total 1340000 corpus positions)
2021-11-22 16:47:03,445 : INFO : adding document #0 to Dictionary<56299 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:03,451 : INFO : built Dictionary<56401 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 135 documents (total 1350000 corpus positions)
2021-11-22 16:47:03,481 : INFO : adding document #0 to Dictionary<56401 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:03,489 : INFO : built Dictionary<56673 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 136 documents (total 1360000 corpus positions)
2021-11-22 16:47:03,519 : INFO : adding document #0 to Dictionar

2021-11-22 16:47:04,380 : INFO : built Dictionary<62163 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 160 documents (total 1600000 corpus positions)
2021-11-22 16:47:04,409 : INFO : adding document #0 to Dictionary<62163 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:04,416 : INFO : built Dictionary<62295 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 161 documents (total 1610000 corpus positions)
2021-11-22 16:47:04,446 : INFO : adding document #0 to Dictionary<62295 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:04,453 : INFO : built Dictionary<62495 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 162 documents (total 1620000 corpus positions)
2021-11-22 16:47:04,482 : INFO : adding document #0 to Dictionary<62495 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:04,490 : INFO : built Dictionary<62821 unique t

2021-11-22 16:47:05,342 : INFO : adding document #0 to Dictionary<68096 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:05,350 : INFO : built Dictionary<68319 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 187 documents (total 1870000 corpus positions)
2021-11-22 16:47:05,381 : INFO : adding document #0 to Dictionary<68319 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:05,390 : INFO : built Dictionary<68621 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 188 documents (total 1880000 corpus positions)
2021-11-22 16:47:05,420 : INFO : adding document #0 to Dictionary<68621 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:05,428 : INFO : built Dictionary<68875 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 189 documents (total 1890000 corpus positions)
2021-11-22 16:47:05,458 : INFO : adding document #0 to Dictionar

2021-11-22 16:47:06,279 : INFO : built Dictionary<73613 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 213 documents (total 2130000 corpus positions)
2021-11-22 16:47:06,308 : INFO : adding document #0 to Dictionary<73613 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:06,314 : INFO : built Dictionary<73755 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 214 documents (total 2140000 corpus positions)
2021-11-22 16:47:06,342 : INFO : adding document #0 to Dictionary<73755 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:06,348 : INFO : built Dictionary<73889 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 215 documents (total 2150000 corpus positions)
2021-11-22 16:47:06,378 : INFO : adding document #0 to Dictionary<73889 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:06,384 : INFO : built Dictionary<73986 unique t

2021-11-22 16:47:07,249 : INFO : adding document #0 to Dictionary<78577 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:07,255 : INFO : built Dictionary<78758 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 240 documents (total 2400000 corpus positions)
2021-11-22 16:47:07,284 : INFO : adding document #0 to Dictionary<78758 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:07,291 : INFO : built Dictionary<78896 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 241 documents (total 2410000 corpus positions)
2021-11-22 16:47:07,319 : INFO : adding document #0 to Dictionary<78896 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:07,326 : INFO : built Dictionary<79128 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 242 documents (total 2420000 corpus positions)
2021-11-22 16:47:07,355 : INFO : adding document #0 to Dictionar

2021-11-22 16:47:08,173 : INFO : built Dictionary<83269 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 266 documents (total 2660000 corpus positions)
2021-11-22 16:47:08,201 : INFO : adding document #0 to Dictionary<83269 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:08,207 : INFO : built Dictionary<83411 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 267 documents (total 2670000 corpus positions)
2021-11-22 16:47:08,234 : INFO : adding document #0 to Dictionary<83411 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:08,240 : INFO : built Dictionary<83536 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 268 documents (total 2680000 corpus positions)
2021-11-22 16:47:08,268 : INFO : adding document #0 to Dictionary<83536 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:08,274 : INFO : built Dictionary<83762 unique t

2021-11-22 16:47:09,090 : INFO : adding document #0 to Dictionary<87224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:09,096 : INFO : built Dictionary<87289 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 293 documents (total 2930000 corpus positions)
2021-11-22 16:47:09,124 : INFO : adding document #0 to Dictionary<87289 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:09,130 : INFO : built Dictionary<87368 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 294 documents (total 2940000 corpus positions)
2021-11-22 16:47:09,158 : INFO : adding document #0 to Dictionary<87368 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:09,170 : INFO : built Dictionary<87452 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 295 documents (total 2950000 corpus positions)
2021-11-22 16:47:09,200 : INFO : adding document #0 to Dictionar

2021-11-22 16:47:09,992 : INFO : built Dictionary<90831 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 319 documents (total 3190000 corpus positions)
2021-11-22 16:47:10,020 : INFO : adding document #0 to Dictionary<90831 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:10,025 : INFO : built Dictionary<90966 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 320 documents (total 3200000 corpus positions)
2021-11-22 16:47:10,053 : INFO : adding document #0 to Dictionary<90966 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:10,058 : INFO : built Dictionary<91088 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 321 documents (total 3210000 corpus positions)
2021-11-22 16:47:10,085 : INFO : adding document #0 to Dictionary<91088 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:10,090 : INFO : built Dictionary<91214 unique t

2021-11-22 16:47:10,916 : INFO : adding document #0 to Dictionary<94701 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:10,923 : INFO : built Dictionary<94809 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 346 documents (total 3460000 corpus positions)
2021-11-22 16:47:10,952 : INFO : adding document #0 to Dictionary<94809 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:10,958 : INFO : built Dictionary<94935 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 347 documents (total 3470000 corpus positions)
2021-11-22 16:47:10,986 : INFO : adding document #0 to Dictionary<94935 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:10,992 : INFO : built Dictionary<95081 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 348 documents (total 3480000 corpus positions)
2021-11-22 16:47:11,021 : INFO : adding document #0 to Dictionar

2021-11-22 16:47:11,828 : INFO : built Dictionary<98608 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 372 documents (total 3720000 corpus positions)
2021-11-22 16:47:11,857 : INFO : adding document #0 to Dictionary<98608 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:11,863 : INFO : built Dictionary<98746 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 373 documents (total 3730000 corpus positions)
2021-11-22 16:47:11,892 : INFO : adding document #0 to Dictionary<98746 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:11,898 : INFO : built Dictionary<98886 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 374 documents (total 3740000 corpus positions)
2021-11-22 16:47:11,928 : INFO : adding document #0 to Dictionary<98886 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:11,934 : INFO : built Dictionary<99025 unique t

2021-11-22 16:47:12,759 : INFO : adding document #0 to Dictionary<101874 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:12,765 : INFO : built Dictionary<101971 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 399 documents (total 3990000 corpus positions)
2021-11-22 16:47:12,794 : INFO : adding document #0 to Dictionary<101971 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:12,800 : INFO : built Dictionary<102121 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 400 documents (total 4000000 corpus positions)
2021-11-22 16:47:12,827 : INFO : adding document #0 to Dictionary<102121 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:12,833 : INFO : built Dictionary<102210 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 401 documents (total 4010000 corpus positions)
2021-11-22 16:47:12,862 : INFO : adding document #0 to Dic

2021-11-22 16:47:13,667 : INFO : built Dictionary<105377 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 425 documents (total 4250000 corpus positions)
2021-11-22 16:47:13,695 : INFO : adding document #0 to Dictionary<105377 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:13,701 : INFO : built Dictionary<105511 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 426 documents (total 4260000 corpus positions)
2021-11-22 16:47:13,729 : INFO : adding document #0 to Dictionary<105511 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:13,736 : INFO : built Dictionary<105677 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 427 documents (total 4270000 corpus positions)
2021-11-22 16:47:13,765 : INFO : adding document #0 to Dictionary<105677 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:13,772 : INFO : built Dictionary<105814 u

2021-11-22 16:47:14,622 : INFO : adding document #0 to Dictionary<109089 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:14,629 : INFO : built Dictionary<109217 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 452 documents (total 4520000 corpus positions)
2021-11-22 16:47:14,659 : INFO : adding document #0 to Dictionary<109217 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:14,666 : INFO : built Dictionary<109307 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 453 documents (total 4530000 corpus positions)
2021-11-22 16:47:14,695 : INFO : adding document #0 to Dictionary<109307 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:14,702 : INFO : built Dictionary<109441 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 454 documents (total 4540000 corpus positions)
2021-11-22 16:47:14,731 : INFO : adding document #0 to Dic

2021-11-22 16:47:15,554 : INFO : built Dictionary<112224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 478 documents (total 4780000 corpus positions)
2021-11-22 16:47:15,582 : INFO : adding document #0 to Dictionary<112224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:15,587 : INFO : built Dictionary<112314 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 479 documents (total 4790000 corpus positions)
2021-11-22 16:47:15,616 : INFO : adding document #0 to Dictionary<112314 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:15,624 : INFO : built Dictionary<112426 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 480 documents (total 4800000 corpus positions)
2021-11-22 16:47:15,652 : INFO : adding document #0 to Dictionary<112426 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:15,657 : INFO : built Dictionary<112484 u

2021-11-22 16:47:16,499 : INFO : adding document #0 to Dictionary<116297 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:16,506 : INFO : built Dictionary<116466 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 505 documents (total 5050000 corpus positions)
2021-11-22 16:47:16,534 : INFO : adding document #0 to Dictionary<116466 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:16,542 : INFO : built Dictionary<116658 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 506 documents (total 5060000 corpus positions)
2021-11-22 16:47:16,570 : INFO : adding document #0 to Dictionary<116658 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:16,577 : INFO : built Dictionary<116770 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 507 documents (total 5070000 corpus positions)
2021-11-22 16:47:16,604 : INFO : adding document #0 to Dic

2021-11-22 16:47:17,424 : INFO : built Dictionary<120113 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 531 documents (total 5310000 corpus positions)
2021-11-22 16:47:17,453 : INFO : adding document #0 to Dictionary<120113 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:17,459 : INFO : built Dictionary<120216 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 532 documents (total 5320000 corpus positions)
2021-11-22 16:47:17,488 : INFO : adding document #0 to Dictionary<120216 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:17,495 : INFO : built Dictionary<120304 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 533 documents (total 5330000 corpus positions)
2021-11-22 16:47:17,524 : INFO : adding document #0 to Dictionary<120304 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:17,531 : INFO : built Dictionary<120424 u

2021-11-22 16:47:18,376 : INFO : adding document #0 to Dictionary<123363 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:18,383 : INFO : built Dictionary<123482 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 558 documents (total 5580000 corpus positions)
2021-11-22 16:47:18,411 : INFO : adding document #0 to Dictionary<123482 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:18,417 : INFO : built Dictionary<123643 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 559 documents (total 5590000 corpus positions)
2021-11-22 16:47:18,445 : INFO : adding document #0 to Dictionary<123643 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:18,453 : INFO : built Dictionary<123711 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 560 documents (total 5600000 corpus positions)
2021-11-22 16:47:18,481 : INFO : adding document #0 to Dic

2021-11-22 16:47:19,286 : INFO : built Dictionary<126805 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 584 documents (total 5840000 corpus positions)
2021-11-22 16:47:19,316 : INFO : adding document #0 to Dictionary<126805 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:19,322 : INFO : built Dictionary<126906 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 585 documents (total 5850000 corpus positions)
2021-11-22 16:47:19,351 : INFO : adding document #0 to Dictionary<126906 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:19,357 : INFO : built Dictionary<126982 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 586 documents (total 5860000 corpus positions)
2021-11-22 16:47:19,386 : INFO : adding document #0 to Dictionary<126982 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:19,392 : INFO : built Dictionary<127080 u

2021-11-22 16:47:20,230 : INFO : adding document #0 to Dictionary<129807 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:20,237 : INFO : built Dictionary<129869 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 611 documents (total 6110000 corpus positions)
2021-11-22 16:47:20,265 : INFO : adding document #0 to Dictionary<129869 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:20,272 : INFO : built Dictionary<130017 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 612 documents (total 6120000 corpus positions)
2021-11-22 16:47:20,299 : INFO : adding document #0 to Dictionary<130017 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:20,305 : INFO : built Dictionary<130108 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 613 documents (total 6130000 corpus positions)
2021-11-22 16:47:20,335 : INFO : adding document #0 to Dic

2021-11-22 16:47:21,156 : INFO : built Dictionary<132711 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 637 documents (total 6370000 corpus positions)
2021-11-22 16:47:21,184 : INFO : adding document #0 to Dictionary<132711 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:21,190 : INFO : built Dictionary<132941 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 638 documents (total 6380000 corpus positions)
2021-11-22 16:47:21,219 : INFO : adding document #0 to Dictionary<132941 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:21,225 : INFO : built Dictionary<133060 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 639 documents (total 6390000 corpus positions)
2021-11-22 16:47:21,253 : INFO : adding document #0 to Dictionary<133060 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:21,260 : INFO : built Dictionary<133151 u

2021-11-22 16:47:22,101 : INFO : adding document #0 to Dictionary<136011 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:22,109 : INFO : built Dictionary<136172 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 664 documents (total 6640000 corpus positions)
2021-11-22 16:47:22,138 : INFO : adding document #0 to Dictionary<136172 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:22,145 : INFO : built Dictionary<136293 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 665 documents (total 6650000 corpus positions)
2021-11-22 16:47:22,174 : INFO : adding document #0 to Dictionary<136293 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:22,181 : INFO : built Dictionary<136353 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 666 documents (total 6660000 corpus positions)
2021-11-22 16:47:22,209 : INFO : adding document #0 to Dic

2021-11-22 16:47:23,018 : INFO : built Dictionary<139208 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 690 documents (total 6900000 corpus positions)
2021-11-22 16:47:23,046 : INFO : adding document #0 to Dictionary<139208 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:23,052 : INFO : built Dictionary<139310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 691 documents (total 6910000 corpus positions)
2021-11-22 16:47:23,080 : INFO : adding document #0 to Dictionary<139310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:23,087 : INFO : built Dictionary<139430 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 692 documents (total 6920000 corpus positions)
2021-11-22 16:47:23,114 : INFO : adding document #0 to Dictionary<139430 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:23,120 : INFO : built Dictionary<139521 u

2021-11-22 16:47:23,963 : INFO : adding document #0 to Dictionary<142299 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:23,970 : INFO : built Dictionary<142356 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 717 documents (total 7170000 corpus positions)
2021-11-22 16:47:23,999 : INFO : adding document #0 to Dictionary<142356 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:24,005 : INFO : built Dictionary<142453 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 718 documents (total 7180000 corpus positions)
2021-11-22 16:47:24,034 : INFO : adding document #0 to Dictionary<142453 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:24,041 : INFO : built Dictionary<142562 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 719 documents (total 7190000 corpus positions)
2021-11-22 16:47:24,069 : INFO : adding document #0 to Dic

2021-11-22 16:47:24,898 : INFO : built Dictionary<145168 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 743 documents (total 7430000 corpus positions)
2021-11-22 16:47:24,926 : INFO : adding document #0 to Dictionary<145168 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:24,936 : INFO : built Dictionary<145555 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 744 documents (total 7440000 corpus positions)
2021-11-22 16:47:24,964 : INFO : adding document #0 to Dictionary<145555 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:24,970 : INFO : built Dictionary<145596 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 745 documents (total 7450000 corpus positions)
2021-11-22 16:47:24,999 : INFO : adding document #0 to Dictionary<145596 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:25,006 : INFO : built Dictionary<145704 u

2021-11-22 16:47:25,857 : INFO : adding document #0 to Dictionary<148526 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:25,864 : INFO : built Dictionary<148588 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 770 documents (total 7700000 corpus positions)
2021-11-22 16:47:25,893 : INFO : adding document #0 to Dictionary<148588 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:25,900 : INFO : built Dictionary<148663 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 771 documents (total 7710000 corpus positions)
2021-11-22 16:47:25,929 : INFO : adding document #0 to Dictionary<148663 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:25,936 : INFO : built Dictionary<148735 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 772 documents (total 7720000 corpus positions)
2021-11-22 16:47:25,966 : INFO : adding document #0 to Dic

2021-11-22 16:47:26,794 : INFO : built Dictionary<151637 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 796 documents (total 7960000 corpus positions)
2021-11-22 16:47:26,821 : INFO : adding document #0 to Dictionary<151637 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:26,827 : INFO : built Dictionary<151731 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 797 documents (total 7970000 corpus positions)
2021-11-22 16:47:26,855 : INFO : adding document #0 to Dictionary<151731 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:26,862 : INFO : built Dictionary<151812 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 798 documents (total 7980000 corpus positions)
2021-11-22 16:47:26,889 : INFO : adding document #0 to Dictionary<151812 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:26,896 : INFO : built Dictionary<151891 u

2021-11-22 16:47:27,721 : INFO : adding document #0 to Dictionary<155322 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:27,728 : INFO : built Dictionary<155492 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 823 documents (total 8230000 corpus positions)
2021-11-22 16:47:27,756 : INFO : adding document #0 to Dictionary<155492 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:27,763 : INFO : built Dictionary<155639 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 824 documents (total 8240000 corpus positions)
2021-11-22 16:47:27,792 : INFO : adding document #0 to Dictionary<155639 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:27,799 : INFO : built Dictionary<155714 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 825 documents (total 8250000 corpus positions)
2021-11-22 16:47:27,827 : INFO : adding document #0 to Dic

2021-11-22 16:47:28,639 : INFO : built Dictionary<158816 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 849 documents (total 8490000 corpus positions)
2021-11-22 16:47:28,667 : INFO : adding document #0 to Dictionary<158816 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:28,674 : INFO : built Dictionary<158890 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 850 documents (total 8500000 corpus positions)
2021-11-22 16:47:28,702 : INFO : adding document #0 to Dictionary<158890 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:28,709 : INFO : built Dictionary<158990 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 851 documents (total 8510000 corpus positions)
2021-11-22 16:47:28,737 : INFO : adding document #0 to Dictionary<158990 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:28,743 : INFO : built Dictionary<159060 u

2021-11-22 16:47:29,574 : INFO : adding document #0 to Dictionary<161458 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:29,580 : INFO : built Dictionary<161575 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 876 documents (total 8760000 corpus positions)
2021-11-22 16:47:29,608 : INFO : adding document #0 to Dictionary<161575 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:29,615 : INFO : built Dictionary<161691 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 877 documents (total 8770000 corpus positions)
2021-11-22 16:47:29,644 : INFO : adding document #0 to Dictionary<161691 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:29,651 : INFO : built Dictionary<161832 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 878 documents (total 8780000 corpus positions)
2021-11-22 16:47:29,679 : INFO : adding document #0 to Dic

2021-11-22 16:47:30,505 : INFO : built Dictionary<166408 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 902 documents (total 9020000 corpus positions)
2021-11-22 16:47:30,534 : INFO : adding document #0 to Dictionary<166408 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:30,540 : INFO : built Dictionary<166513 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 903 documents (total 9030000 corpus positions)
2021-11-22 16:47:30,569 : INFO : adding document #0 to Dictionary<166513 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:30,575 : INFO : built Dictionary<166717 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 904 documents (total 9040000 corpus positions)
2021-11-22 16:47:30,603 : INFO : adding document #0 to Dictionary<166717 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:30,610 : INFO : built Dictionary<166828 u

2021-11-22 16:47:31,445 : INFO : adding document #0 to Dictionary<168988 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:31,451 : INFO : built Dictionary<169048 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 929 documents (total 9290000 corpus positions)
2021-11-22 16:47:31,480 : INFO : adding document #0 to Dictionary<169048 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:31,487 : INFO : built Dictionary<169212 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 930 documents (total 9300000 corpus positions)
2021-11-22 16:47:31,515 : INFO : adding document #0 to Dictionary<169212 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:31,522 : INFO : built Dictionary<169560 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 931 documents (total 9310000 corpus positions)
2021-11-22 16:47:31,550 : INFO : adding document #0 to Dic

2021-11-22 16:47:32,371 : INFO : built Dictionary<171709 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 955 documents (total 9550000 corpus positions)
2021-11-22 16:47:32,401 : INFO : adding document #0 to Dictionary<171709 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:32,408 : INFO : built Dictionary<171774 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 956 documents (total 9560000 corpus positions)
2021-11-22 16:47:32,438 : INFO : adding document #0 to Dictionary<171774 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:32,444 : INFO : built Dictionary<171840 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 957 documents (total 9570000 corpus positions)
2021-11-22 16:47:32,474 : INFO : adding document #0 to Dictionary<171840 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:32,481 : INFO : built Dictionary<171897 u

2021-11-22 16:47:33,314 : INFO : adding document #0 to Dictionary<174089 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:33,320 : INFO : built Dictionary<174151 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 982 documents (total 9820000 corpus positions)
2021-11-22 16:47:33,348 : INFO : adding document #0 to Dictionary<174151 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:33,354 : INFO : built Dictionary<174227 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 983 documents (total 9830000 corpus positions)
2021-11-22 16:47:33,381 : INFO : adding document #0 to Dictionary<174227 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:33,387 : INFO : built Dictionary<174288 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 984 documents (total 9840000 corpus positions)
2021-11-22 16:47:33,415 : INFO : adding document #0 to Dic

2021-11-22 16:47:34,243 : INFO : built Dictionary<177047 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1008 documents (total 10080000 corpus positions)
2021-11-22 16:47:34,272 : INFO : adding document #0 to Dictionary<177047 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:34,278 : INFO : built Dictionary<177105 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1009 documents (total 10090000 corpus positions)
2021-11-22 16:47:34,306 : INFO : adding document #0 to Dictionary<177105 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:34,313 : INFO : built Dictionary<177166 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1010 documents (total 10100000 corpus positions)
2021-11-22 16:47:34,342 : INFO : adding document #0 to Dictionary<177166 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:34,348 : INFO : built Dictionary<17

2021-11-22 16:47:35,170 : INFO : built Dictionary<179754 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1034 documents (total 10340000 corpus positions)
2021-11-22 16:47:35,198 : INFO : adding document #0 to Dictionary<179754 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:35,205 : INFO : built Dictionary<179865 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1035 documents (total 10350000 corpus positions)
2021-11-22 16:47:35,234 : INFO : adding document #0 to Dictionary<179865 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:35,241 : INFO : built Dictionary<179969 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1036 documents (total 10360000 corpus positions)
2021-11-22 16:47:35,269 : INFO : adding document #0 to Dictionary<179969 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:35,276 : INFO : built Dictionary<18

2021-11-22 16:47:36,084 : INFO : built Dictionary<182005 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1060 documents (total 10600000 corpus positions)
2021-11-22 16:47:36,113 : INFO : adding document #0 to Dictionary<182005 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:36,120 : INFO : built Dictionary<182107 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1061 documents (total 10610000 corpus positions)
2021-11-22 16:47:36,149 : INFO : adding document #0 to Dictionary<182107 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:36,156 : INFO : built Dictionary<182170 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1062 documents (total 10620000 corpus positions)
2021-11-22 16:47:36,185 : INFO : adding document #0 to Dictionary<182170 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:36,192 : INFO : built Dictionary<18

2021-11-22 16:47:36,985 : INFO : built Dictionary<184269 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1086 documents (total 10860000 corpus positions)
2021-11-22 16:47:37,012 : INFO : adding document #0 to Dictionary<184269 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:37,017 : INFO : built Dictionary<184310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1087 documents (total 10870000 corpus positions)
2021-11-22 16:47:37,044 : INFO : adding document #0 to Dictionary<184310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:37,051 : INFO : built Dictionary<184402 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1088 documents (total 10880000 corpus positions)
2021-11-22 16:47:37,079 : INFO : adding document #0 to Dictionary<184402 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:37,084 : INFO : built Dictionary<18

2021-11-22 16:47:37,880 : INFO : built Dictionary<186415 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1112 documents (total 11120000 corpus positions)
2021-11-22 16:47:37,907 : INFO : adding document #0 to Dictionary<186415 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:37,913 : INFO : built Dictionary<186557 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1113 documents (total 11130000 corpus positions)
2021-11-22 16:47:37,941 : INFO : adding document #0 to Dictionary<186557 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:37,947 : INFO : built Dictionary<186602 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1114 documents (total 11140000 corpus positions)
2021-11-22 16:47:37,975 : INFO : adding document #0 to Dictionary<186602 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:37,983 : INFO : built Dictionary<18

2021-11-22 16:47:38,781 : INFO : built Dictionary<188580 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1138 documents (total 11380000 corpus positions)
2021-11-22 16:47:38,810 : INFO : adding document #0 to Dictionary<188580 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:38,816 : INFO : built Dictionary<188680 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1139 documents (total 11390000 corpus positions)
2021-11-22 16:47:38,844 : INFO : adding document #0 to Dictionary<188680 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:38,850 : INFO : built Dictionary<188798 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1140 documents (total 11400000 corpus positions)
2021-11-22 16:47:38,878 : INFO : adding document #0 to Dictionary<188798 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:38,885 : INFO : built Dictionary<18

2021-11-22 16:47:39,683 : INFO : built Dictionary<191000 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1164 documents (total 11640000 corpus positions)
2021-11-22 16:47:39,712 : INFO : adding document #0 to Dictionary<191000 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:39,719 : INFO : built Dictionary<191107 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1165 documents (total 11650000 corpus positions)
2021-11-22 16:47:39,747 : INFO : adding document #0 to Dictionary<191107 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:39,755 : INFO : built Dictionary<191212 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1166 documents (total 11660000 corpus positions)
2021-11-22 16:47:39,783 : INFO : adding document #0 to Dictionary<191212 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:39,790 : INFO : built Dictionary<19

2021-11-22 16:47:40,609 : INFO : built Dictionary<193410 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1190 documents (total 11900000 corpus positions)
2021-11-22 16:47:40,637 : INFO : adding document #0 to Dictionary<193410 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:40,643 : INFO : built Dictionary<193460 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1191 documents (total 11910000 corpus positions)
2021-11-22 16:47:40,671 : INFO : adding document #0 to Dictionary<193460 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:40,677 : INFO : built Dictionary<193556 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1192 documents (total 11920000 corpus positions)
2021-11-22 16:47:40,705 : INFO : adding document #0 to Dictionary<193556 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:40,712 : INFO : built Dictionary<19

2021-11-22 16:47:41,522 : INFO : built Dictionary<195714 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1216 documents (total 12160000 corpus positions)
2021-11-22 16:47:41,550 : INFO : adding document #0 to Dictionary<195714 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:41,556 : INFO : built Dictionary<195786 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1217 documents (total 12170000 corpus positions)
2021-11-22 16:47:41,585 : INFO : adding document #0 to Dictionary<195786 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:41,591 : INFO : built Dictionary<195836 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1218 documents (total 12180000 corpus positions)
2021-11-22 16:47:41,619 : INFO : adding document #0 to Dictionary<195836 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:41,623 : INFO : built Dictionary<19

2021-11-22 16:47:42,408 : INFO : built Dictionary<198121 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1242 documents (total 12420000 corpus positions)
2021-11-22 16:47:42,437 : INFO : adding document #0 to Dictionary<198121 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:42,445 : INFO : built Dictionary<198266 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1243 documents (total 12430000 corpus positions)
2021-11-22 16:47:42,475 : INFO : adding document #0 to Dictionary<198266 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:42,481 : INFO : built Dictionary<198312 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1244 documents (total 12440000 corpus positions)
2021-11-22 16:47:42,511 : INFO : adding document #0 to Dictionary<198312 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:42,519 : INFO : built Dictionary<19

2021-11-22 16:47:43,316 : INFO : built Dictionary<199926 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1268 documents (total 12680000 corpus positions)
2021-11-22 16:47:43,344 : INFO : adding document #0 to Dictionary<199926 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:43,350 : INFO : built Dictionary<199984 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1269 documents (total 12690000 corpus positions)
2021-11-22 16:47:43,379 : INFO : adding document #0 to Dictionary<199984 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:43,386 : INFO : built Dictionary<200060 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1270 documents (total 12700000 corpus positions)
2021-11-22 16:47:43,414 : INFO : adding document #0 to Dictionary<200060 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:43,421 : INFO : built Dictionary<20

2021-11-22 16:47:44,239 : INFO : built Dictionary<202532 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1294 documents (total 12940000 corpus positions)
2021-11-22 16:47:44,267 : INFO : adding document #0 to Dictionary<202532 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:44,274 : INFO : built Dictionary<202593 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1295 documents (total 12950000 corpus positions)
2021-11-22 16:47:44,303 : INFO : adding document #0 to Dictionary<202593 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:44,311 : INFO : built Dictionary<202681 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1296 documents (total 12960000 corpus positions)
2021-11-22 16:47:44,339 : INFO : adding document #0 to Dictionary<202681 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:44,347 : INFO : built Dictionary<20

2021-11-22 16:47:45,160 : INFO : built Dictionary<204770 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1320 documents (total 13200000 corpus positions)
2021-11-22 16:47:45,188 : INFO : adding document #0 to Dictionary<204770 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:45,195 : INFO : built Dictionary<204881 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1321 documents (total 13210000 corpus positions)
2021-11-22 16:47:45,223 : INFO : adding document #0 to Dictionary<204881 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:45,229 : INFO : built Dictionary<204982 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1322 documents (total 13220000 corpus positions)
2021-11-22 16:47:45,258 : INFO : adding document #0 to Dictionary<204982 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:45,264 : INFO : built Dictionary<20

2021-11-22 16:47:46,084 : INFO : built Dictionary<206807 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1346 documents (total 13460000 corpus positions)
2021-11-22 16:47:46,112 : INFO : adding document #0 to Dictionary<206807 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:46,119 : INFO : built Dictionary<206867 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1347 documents (total 13470000 corpus positions)
2021-11-22 16:47:46,147 : INFO : adding document #0 to Dictionary<206867 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:46,155 : INFO : built Dictionary<206924 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1348 documents (total 13480000 corpus positions)
2021-11-22 16:47:46,183 : INFO : adding document #0 to Dictionary<206924 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:46,190 : INFO : built Dictionary<20

2021-11-22 16:47:47,007 : INFO : built Dictionary<209014 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1372 documents (total 13720000 corpus positions)
2021-11-22 16:47:47,036 : INFO : adding document #0 to Dictionary<209014 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:47,043 : INFO : built Dictionary<209091 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1373 documents (total 13730000 corpus positions)
2021-11-22 16:47:47,071 : INFO : adding document #0 to Dictionary<209091 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:47,078 : INFO : built Dictionary<209214 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1374 documents (total 13740000 corpus positions)
2021-11-22 16:47:47,107 : INFO : adding document #0 to Dictionary<209214 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:47,114 : INFO : built Dictionary<20

2021-11-22 16:47:47,938 : INFO : built Dictionary<212183 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1398 documents (total 13980000 corpus positions)
2021-11-22 16:47:47,968 : INFO : adding document #0 to Dictionary<212183 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:47,974 : INFO : built Dictionary<212313 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1399 documents (total 13990000 corpus positions)
2021-11-22 16:47:48,003 : INFO : adding document #0 to Dictionary<212313 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:48,010 : INFO : built Dictionary<212395 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1400 documents (total 14000000 corpus positions)
2021-11-22 16:47:48,038 : INFO : adding document #0 to Dictionary<212395 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:48,046 : INFO : built Dictionary<21

2021-11-22 16:47:48,844 : INFO : built Dictionary<214603 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1424 documents (total 14240000 corpus positions)
2021-11-22 16:47:48,873 : INFO : adding document #0 to Dictionary<214603 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:48,879 : INFO : built Dictionary<214706 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1425 documents (total 14250000 corpus positions)
2021-11-22 16:47:48,907 : INFO : adding document #0 to Dictionary<214706 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:48,913 : INFO : built Dictionary<214805 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1426 documents (total 14260000 corpus positions)
2021-11-22 16:47:48,941 : INFO : adding document #0 to Dictionary<214805 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:48,948 : INFO : built Dictionary<21

2021-11-22 16:47:49,753 : INFO : built Dictionary<216807 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1450 documents (total 14500000 corpus positions)
2021-11-22 16:47:49,781 : INFO : adding document #0 to Dictionary<216807 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:49,788 : INFO : built Dictionary<216841 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1451 documents (total 14510000 corpus positions)
2021-11-22 16:47:49,816 : INFO : adding document #0 to Dictionary<216841 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:49,822 : INFO : built Dictionary<216913 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1452 documents (total 14520000 corpus positions)
2021-11-22 16:47:49,850 : INFO : adding document #0 to Dictionary<216913 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:49,857 : INFO : built Dictionary<21

2021-11-22 16:47:50,667 : INFO : built Dictionary<219954 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1476 documents (total 14760000 corpus positions)
2021-11-22 16:47:50,694 : INFO : adding document #0 to Dictionary<219954 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:50,698 : INFO : built Dictionary<219965 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1477 documents (total 14770000 corpus positions)
2021-11-22 16:47:50,726 : INFO : adding document #0 to Dictionary<219965 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:50,735 : INFO : built Dictionary<220238 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1478 documents (total 14780000 corpus positions)
2021-11-22 16:47:50,763 : INFO : adding document #0 to Dictionary<220238 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:50,769 : INFO : built Dictionary<22

2021-11-22 16:47:51,575 : INFO : built Dictionary<222771 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1502 documents (total 15020000 corpus positions)
2021-11-22 16:47:51,603 : INFO : adding document #0 to Dictionary<222771 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:51,609 : INFO : built Dictionary<222857 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1503 documents (total 15030000 corpus positions)
2021-11-22 16:47:51,638 : INFO : adding document #0 to Dictionary<222857 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:51,646 : INFO : built Dictionary<222977 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1504 documents (total 15040000 corpus positions)
2021-11-22 16:47:51,673 : INFO : adding document #0 to Dictionary<222977 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:51,680 : INFO : built Dictionary<22

2021-11-22 16:47:52,480 : INFO : built Dictionary<224859 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1528 documents (total 15280000 corpus positions)
2021-11-22 16:47:52,509 : INFO : adding document #0 to Dictionary<224859 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:52,515 : INFO : built Dictionary<224994 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1529 documents (total 15290000 corpus positions)
2021-11-22 16:47:52,543 : INFO : adding document #0 to Dictionary<224994 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:52,550 : INFO : built Dictionary<225079 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1530 documents (total 15300000 corpus positions)
2021-11-22 16:47:52,578 : INFO : adding document #0 to Dictionary<225079 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:52,585 : INFO : built Dictionary<22

2021-11-22 16:47:53,401 : INFO : built Dictionary<227117 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1554 documents (total 15540000 corpus positions)
2021-11-22 16:47:53,430 : INFO : adding document #0 to Dictionary<227117 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:53,436 : INFO : built Dictionary<227203 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1555 documents (total 15550000 corpus positions)
2021-11-22 16:47:53,464 : INFO : adding document #0 to Dictionary<227203 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:53,470 : INFO : built Dictionary<227302 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1556 documents (total 15560000 corpus positions)
2021-11-22 16:47:53,499 : INFO : adding document #0 to Dictionary<227302 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:53,506 : INFO : built Dictionary<22

2021-11-22 16:47:54,323 : INFO : built Dictionary<229076 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1580 documents (total 15800000 corpus positions)
2021-11-22 16:47:54,351 : INFO : adding document #0 to Dictionary<229076 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:54,358 : INFO : built Dictionary<229136 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1581 documents (total 15810000 corpus positions)
2021-11-22 16:47:54,386 : INFO : adding document #0 to Dictionary<229136 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:54,393 : INFO : built Dictionary<229212 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1582 documents (total 15820000 corpus positions)
2021-11-22 16:47:54,420 : INFO : adding document #0 to Dictionary<229212 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:54,427 : INFO : built Dictionary<22

2021-11-22 16:47:55,237 : INFO : built Dictionary<231024 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1606 documents (total 16060000 corpus positions)
2021-11-22 16:47:55,265 : INFO : adding document #0 to Dictionary<231024 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:55,273 : INFO : built Dictionary<231054 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1607 documents (total 16070000 corpus positions)
2021-11-22 16:47:55,302 : INFO : adding document #0 to Dictionary<231054 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:55,308 : INFO : built Dictionary<231083 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1608 documents (total 16080000 corpus positions)
2021-11-22 16:47:55,337 : INFO : adding document #0 to Dictionary<231083 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:55,345 : INFO : built Dictionary<23

2021-11-22 16:47:56,430 : INFO : built Dictionary<232962 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1632 documents (total 16320000 corpus positions)
2021-11-22 16:47:56,482 : INFO : adding document #0 to Dictionary<232962 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:56,493 : INFO : built Dictionary<233022 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1633 documents (total 16330000 corpus positions)
2021-11-22 16:47:56,545 : INFO : adding document #0 to Dictionary<233022 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:56,555 : INFO : built Dictionary<233076 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1634 documents (total 16340000 corpus positions)
2021-11-22 16:47:56,601 : INFO : adding document #0 to Dictionary<233076 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:56,610 : INFO : built Dictionary<23

2021-11-22 16:47:57,444 : INFO : built Dictionary<235093 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1658 documents (total 16580000 corpus positions)
2021-11-22 16:47:57,472 : INFO : adding document #0 to Dictionary<235093 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:57,478 : INFO : built Dictionary<235150 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1659 documents (total 16590000 corpus positions)
2021-11-22 16:47:57,508 : INFO : adding document #0 to Dictionary<235150 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:57,515 : INFO : built Dictionary<235224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1660 documents (total 16600000 corpus positions)
2021-11-22 16:47:57,543 : INFO : adding document #0 to Dictionary<235224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:57,550 : INFO : built Dictionary<23

2021-11-22 16:47:58,376 : INFO : built Dictionary<237475 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1684 documents (total 16840000 corpus positions)
2021-11-22 16:47:58,404 : INFO : adding document #0 to Dictionary<237475 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:58,411 : INFO : built Dictionary<237547 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1685 documents (total 16850000 corpus positions)
2021-11-22 16:47:58,439 : INFO : adding document #0 to Dictionary<237547 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:58,446 : INFO : built Dictionary<237604 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...> from 1686 documents (total 16860000 corpus positions)
2021-11-22 16:47:58,474 : INFO : adding document #0 to Dictionary<237604 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...>
2021-11-22 16:47:58,482 : INFO : built Dictionary<23

Training
--------

Training the ensemble works very similar to training a single model,

You can use any model that is based on LdaModel, such as LdaMulticore, to train the Ensemble.
In experiments, LdaMulticore showed better results.




In [6]:
from gensim.models import LdaModel
topic_model_class = LdaModel

Any arbitrary number of models can be used, but it should be a multiple of your workers so that the
load can be distributed properly. In this example, 4 processes will train 8 models each.




In [7]:
ensemble_workers = 4
num_models = 8

After training all the models, some distance computations are required which can take quite some
time as well. You can speed this up by using workers for that as well.




In [8]:
distance_workers = 4

All other parameters that are unknown to EnsembleLda are forwarded to each LDA Model, such as




In [9]:
num_topics = 20
passes = 2

Now start the training

Since 20 topics were trained on each of the 8 models, we expect there to be 160 different topics.
The number of stable topics which are clustered from all those topics is smaller.




In [10]:
from gensim.models import EnsembleLda
ensemble = EnsembleLda(
    corpus=corpus,
    id2word=dictionary,
    num_topics=num_topics,
    passes=passes,
    num_models=num_models,
    topic_model_class=LdaModel,
    ensemble_workers=ensemble_workers,
    distance_workers=distance_workers
)

print(len(ensemble.ttda))
print(len(ensemble.get_topics()))

2021-11-22 16:48:27,931 : INFO : generating 8 topic models using 4 workers
2021-11-22 16:49:14,085 : INFO : generating a 160 x 160 asymmetric distance matrix...
2021-11-22 16:49:18,011 : INFO : fitting the clustering model, using 4 for min_samples
2021-11-22 16:49:18,036 : INFO : generating stable topics, using 3 for min_cores
2021-11-22 16:49:18,037 : INFO : found 1 clusters
2021-11-22 16:49:18,052 : INFO : found 1 stable topics
2021-11-22 16:49:18,056 : INFO : generating classic gensim model representation based on results from the ensemble
2021-11-22 16:49:18,310 : INFO : using symmetric alpha at 1.0
2021-11-22 16:49:18,311 : INFO : using symmetric eta at 1.0
2021-11-22 16:49:18,313 : INFO : using serial LDA version on this node
2021-11-22 16:49:18,316 : INFO : running online (multi-pass) LDA training, 1 topics, 0 passes over the supplied corpus of 1701 documents, updating model once every 1701 documents, evaluating perplexity every 1701 documents, iterating 50x with a convergence t

160
1


In [18]:
ensemble.ttda

array([[5.98256975e-06, 3.22500673e-05, 5.57418243e-05, ...,
        1.00351954e-05, 1.16328520e-05, 3.76091498e-06],
       [1.90727969e-05, 2.31121612e-05, 7.44916833e-05, ...,
        8.99155748e-06, 1.63097902e-05, 4.22692028e-06],
       [7.80791652e-06, 1.02401409e-05, 1.79055540e-04, ...,
        5.91321214e-06, 5.74727983e-06, 4.53962230e-06],
       ...,
       [7.37241089e-06, 3.80876772e-05, 1.51034081e-04, ...,
        6.90094976e-06, 1.62373908e-05, 6.00045951e-06],
       [1.96135570e-05, 2.23037132e-05, 7.49191458e-05, ...,
        5.94797757e-06, 6.22282550e-06, 5.16965974e-06],
       [7.38083554e-06, 2.31202539e-05, 1.15748757e-04, ...,
        5.79557400e-06, 7.86173132e-06, 5.08735229e-06]])

Tuning
------

Different from LdaModel, the number of resulting topics varies greatly depending on the clustering parameters.

You can provide those in the ``recluster()`` function or the ``EnsembleLda`` constructor.

Play around until you get as many topics as you desire, which however may reduce their quality.
If your ensemble doesn't have enough topics to begin with, you should make sure to make it large enough.

Having an epsilon that is smaller than the smallest distance doesn't make sense.
Make sure to chose one that is within the range of values in ``asymmetric_distance_matrix``.




In [23]:
import numpy as np
shape = ensemble.asymmetric_distance_matrix.shape
without_diagonal = ensemble.asymmetric_distance_matrix[~np.eye(shape[0], dtype=bool)].reshape(shape[0], -1)
print(without_diagonal.min(), without_diagonal.mean(), without_diagonal.max())

ensemble.recluster(eps=0.07, min_samples=2, min_cores=2)

print(len(ensemble.get_topics()))

2021-11-22 16:54:46,806 : INFO : fitting the clustering model
2021-11-22 16:54:46,830 : INFO : generating stable topics
2021-11-22 16:54:46,830 : INFO : found 6 clusters
2021-11-22 16:54:46,843 : INFO : found 1 stable topics
2021-11-22 16:54:46,847 : INFO : generating classic gensim model representation based on results from the ensemble
2021-11-22 16:54:46,849 : INFO : using symmetric alpha at 1.0
2021-11-22 16:54:46,849 : INFO : using symmetric eta at 1.0
2021-11-22 16:54:46,852 : INFO : using serial LDA version on this node
2021-11-22 16:54:46,854 : INFO : running online (multi-pass) LDA training, 1 topics, 0 passes over the supplied corpus of 1701 documents, updating model once every 1701 documents, evaluating perplexity every 1701 documents, iterating 50x with a convergence threshold of 0.001000
2021-11-22 16:54:46,855 : INFO : LdaModel lifecycle event {'msg': 'trained LdaModel<num_terms=20076, num_topics=1, decay=0.5, chunksize=2000> in 0.00s', 'datetime': '2021-11-22T16:54:46.85

0.006337482145469475 0.03868349253117548 0.1346845900576048
1


Increasing the Size
-------------------

If you have some models lying around that were trained on a corpus based on the same dictionary,
they are compatible and you can add them to the ensemble.

By setting num_models of the EnsembleLda constructor to 0 you can also create an ensemble that is
entirely made out of your existing topic models with the following method.

Afterwards the number and quality of stable topics might be different depending on your added topics and parameters.




In [None]:
from gensim.models import LdaMulticore

model1 = LdaMulticore(
    corpus=corpus,
    id2word=dictionary,
    num_topics=9,
    passes=4,
)

model2 = LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=11,
    passes=2,
)

# add_model supports various types of input, check out its docstring
ensemble.add_model(model1)
ensemble.add_model(model2)

ensemble.recluster()

print(len(ensemble.ttda))
print(len(ensemble.get_topics()))