In [1]:
%matplotlib inline


Ensemble LDA
============

Introduces Gensim's EnsembleLda model




In [2]:
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

This tutorial explains how to use the EnsembleLda model class.

EnsembleLda is a method for finding and generating stable topics from the results of multiple topic models.
It can be used to remove topics from your results that are noise and cannot be reproduced across models.
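As a rough intuition for what "stable" means, the sketch below treats each hypothetical model as a list of topics, where a topic is just the set of its top words. A topic is kept if a similar topic (Jaccard similarity of the word sets at or above a threshold) also appears in at least `min_models` of the other models. This is a simplified illustration, not the EnsembleLda algorithm: the real class works on the models' topic-term distributions and groups them with a density-based (DBSCAN-style) clustering step. The names `jaccard`, `stable_topics`, `threshold` and `min_models` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of words (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def stable_topics(models, threshold=0.5, min_models=2):
    """Return topics that reappear (approximately) in at least `min_models` other models.

    models: list of models, each a list of topics, each topic a set of top words.
    Near-duplicate stable topics found in several models are reported only once.
    """
    stable = []
    for i, topics in enumerate(models):
        for topic in topics:
            # Count the other models that contain a similar enough topic
            support = sum(
                any(jaccard(topic, other) >= threshold for other in other_topics)
                for j, other_topics in enumerate(models)
                if j != i
            )
            # Skip topics that duplicate one we already kept
            already_kept = any(jaccard(topic, kept) >= threshold for kept in stable)
            if support >= min_models and not already_kept:
                stable.append(topic)
    return stable


# Three hypothetical topic models trained on the same corpus with different seeds
models = [
    [{"game", "team", "season", "player"}, {"music", "album", "band", "song"}, {"red", "cat", "tree", "sky"}],
    [{"game", "team", "player", "league"}, {"music", "band", "song", "rock"}, {"car", "blue", "rain", "door"}],
    [{"team", "season", "player", "game"}, {"album", "song", "music", "band"}, {"egg", "lamp", "fish", "road"}],
]

for topic in stable_topics(models):
    print(sorted(topic))
# ['game', 'player', 'season', 'team']
# ['album', 'band', 'music', 'song']
```

The third topic of each model shows up in only one run, so it is treated as noise and dropped, which is the kind of filtering EnsembleLda aims for on real models.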




Corpus
------
We will use the Gensim downloader API to get a small corpus for training our ensemble.

The preprocessing is similar to the one in the Word2Vec tutorial (`sphx_glr_auto_examples_tutorials_run_word2vec.py`),
so it is not explained again in detail.
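The log of the next cell reports 10000 corpus positions per document, so the downloader appears to serve text8 as a stream of fixed-length token chunks rather than natural documents. The generator below is a hypothetical illustration of that kind of chunking (`chunk_tokens` and the sample sentence are not part of Gensim, and this is not the downloader's actual loader).

```python
from itertools import islice


def chunk_tokens(tokens, chunk_size=10000):
    """Yield consecutive lists of `chunk_size` tokens; the last chunk may be shorter."""
    iterator = iter(tokens)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical whitespace-separated sample text standing in for the raw corpus
text = "anarchism originated as a term of abuse first used against early working class radicals"
docs = list(chunk_tokens(text.split(), chunk_size=5))
print([len(doc) for doc in docs])  # [5, 5, 4]
```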




In [3]:
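import gensim.downloader as api
from gensim.corpora import Dictionary
from nltk.stem.wordnet import WordNetLemmatizer

# The lemmatizer needs the WordNet data; run nltk.download('wordnet') once if it is missing
lemmatizer = WordNetLemmatizer()
docs = api.load('text8')  # re-iterable stream of 10,000-token documents

def lemmatize_doc(doc):
    """Lemmatize every token of a document (as nouns, WordNetLemmatizer's default)."""
    return [lemmatizer.lemmatize(token) for token in doc]

# Build the vocabulary from the lemmatized text, one document at a time
dictionary = Dictionary()
for doc in docs:
    dictionary.add_documents([lemmatize_doc(doc)])
# Keep tokens found in at least 20 documents and in at most 50% of all documents
# (and, by default, only the 100,000 most frequent of those)
dictionary.filter_extremes(no_below=20, no_above=0.5)

# doc2bow silently drops tokens missing from the dictionary, so the documents
# must be lemmatized exactly as they were when the dictionary was built
corpus = [dictionary.doc2bow(lemmatize_doc(doc)) for doc in docs]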

2021-11-22 16:28:51,062 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2021-11-22 16:28:51,068 : INFO : built Dictionary(2312 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1 documents (total 10000 corpus positions)
2021-11-22 16:28:51,095 : INFO : adding document #0 to Dictionary(2312 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:51,101 : INFO : built Dictionary(3906 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 2 documents (total 20000 corpus positions)
2021-11-22 16:28:51,129 : INFO : adding document #0 to Dictionary(3906 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:51,134 : INFO : built Dictionary(5147 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 3 documents (total 30000 corpus positions)
2021-11-22 16:28:51,163 : INFO : adding document #0 to Dictionary(5147 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']

2021-11-22 16:28:51,964 : INFO : adding document #0 to Dictionary(20332 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:51,971 : INFO : built Dictionary(20766 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 27 documents (total 270000 corpus positions)
2021-11-22 16:28:52,000 : INFO : adding document #0 to Dictionary(20766 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:52,006 : INFO : built Dictionary(21174 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 28 documents (total 280000 corpus positions)
2021-11-22 16:28:52,035 : INFO : adding document #0 to Dictionary(21174 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:52,041 : INFO : built Dictionary(21602 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 29 documents (total 290000 corpus positions)
2021-11-22 16:28:52,070 : INFO : adding document #0 to Dictionary(2160

2021-11-22 16:28:52,941 : INFO : built Dictionary(31140 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 53 documents (total 530000 corpus positions)
2021-11-22 16:28:52,971 : INFO : adding document #0 to Dictionary(31140 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:52,980 : INFO : built Dictionary(31611 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 54 documents (total 540000 corpus positions)
2021-11-22 16:28:53,012 : INFO : adding document #0 to Dictionary(31611 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:53,018 : INFO : built Dictionary(32277 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 55 documents (total 550000 corpus positions)
2021-11-22 16:28:53,048 : INFO : adding document #0 to Dictionary(32277 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:53,055 : INFO : built Dictionary(32761 unique tokens:

2021-11-22 16:28:53,931 : INFO : adding document #0 to Dictionary(40362 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:53,939 : INFO : built Dictionary(40614 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 80 documents (total 800000 corpus positions)
2021-11-22 16:28:53,970 : INFO : adding document #0 to Dictionary(40614 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:53,978 : INFO : built Dictionary(41071 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 81 documents (total 810000 corpus positions)
2021-11-22 16:28:54,008 : INFO : adding document #0 to Dictionary(41071 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:54,015 : INFO : built Dictionary(41378 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 82 documents (total 820000 corpus positions)
2021-11-22 16:28:54,045 : INFO : adding document #0 to Dictionary(4137

2021-11-22 16:28:54,873 : INFO : built Dictionary(49389 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 106 documents (total 1060000 corpus positions)
2021-11-22 16:28:54,901 : INFO : adding document #0 to Dictionary(49389 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:54,906 : INFO : built Dictionary(49713 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 107 documents (total 1070000 corpus positions)
2021-11-22 16:28:54,935 : INFO : adding document #0 to Dictionary(49713 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:54,942 : INFO : built Dictionary(50011 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 108 documents (total 1080000 corpus positions)
2021-11-22 16:28:54,971 : INFO : adding document #0 to Dictionary(50011 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:54,979 : INFO : built Dictionary(50348 unique t

2021-11-22 16:28:55,860 : INFO : adding document #0 to Dictionary(55983 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:55,867 : INFO : built Dictionary(56128 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 133 documents (total 1330000 corpus positions)
2021-11-22 16:28:55,899 : INFO : adding document #0 to Dictionary(56128 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:55,906 : INFO : built Dictionary(56299 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 134 documents (total 1340000 corpus positions)
2021-11-22 16:28:55,934 : INFO : adding document #0 to Dictionary(56299 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:55,940 : INFO : built Dictionary(56401 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 135 documents (total 1350000 corpus positions)
2021-11-22 16:28:55,970 : INFO : adding document #0 to Dictionar

2021-11-22 16:28:56,824 : INFO : built Dictionary(61915 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 159 documents (total 1590000 corpus positions)
2021-11-22 16:28:56,853 : INFO : adding document #0 to Dictionary(61915 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:56,860 : INFO : built Dictionary(62163 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 160 documents (total 1600000 corpus positions)
2021-11-22 16:28:56,889 : INFO : adding document #0 to Dictionary(62163 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:56,896 : INFO : built Dictionary(62295 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 161 documents (total 1610000 corpus positions)
2021-11-22 16:28:56,926 : INFO : adding document #0 to Dictionary(62295 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:56,934 : INFO : built Dictionary(62495 unique t

2021-11-22 16:28:57,801 : INFO : adding document #0 to Dictionary(67914 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:57,808 : INFO : built Dictionary(68096 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 186 documents (total 1860000 corpus positions)
2021-11-22 16:28:57,841 : INFO : adding document #0 to Dictionary(68096 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:57,848 : INFO : built Dictionary(68319 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 187 documents (total 1870000 corpus positions)
2021-11-22 16:28:57,879 : INFO : adding document #0 to Dictionary(68319 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:57,887 : INFO : built Dictionary(68621 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 188 documents (total 1880000 corpus positions)
2021-11-22 16:28:57,917 : INFO : adding document #0 to Dictionar

2021-11-22 16:28:58,788 : INFO : built Dictionary(73442 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 212 documents (total 2120000 corpus positions)
2021-11-22 16:28:58,818 : INFO : adding document #0 to Dictionary(73442 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:58,826 : INFO : built Dictionary(73613 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 213 documents (total 2130000 corpus positions)
2021-11-22 16:28:58,855 : INFO : adding document #0 to Dictionary(73613 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:58,863 : INFO : built Dictionary(73755 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 214 documents (total 2140000 corpus positions)
2021-11-22 16:28:58,895 : INFO : adding document #0 to Dictionary(73755 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:58,901 : INFO : built Dictionary(73889 unique t

2021-11-22 16:28:59,805 : INFO : adding document #0 to Dictionary(78374 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:59,812 : INFO : built Dictionary(78577 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 239 documents (total 2390000 corpus positions)
2021-11-22 16:28:59,841 : INFO : adding document #0 to Dictionary(78577 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:59,848 : INFO : built Dictionary(78758 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 240 documents (total 2400000 corpus positions)
2021-11-22 16:28:59,877 : INFO : adding document #0 to Dictionary(78758 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:28:59,884 : INFO : built Dictionary(78896 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 241 documents (total 2410000 corpus positions)
2021-11-22 16:28:59,913 : INFO : adding document #0 to Dictionar

2021-11-22 16:29:00,812 : INFO : built Dictionary(83056 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 265 documents (total 2650000 corpus positions)
2021-11-22 16:29:00,842 : INFO : adding document #0 to Dictionary(83056 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:00,848 : INFO : built Dictionary(83269 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 266 documents (total 2660000 corpus positions)
2021-11-22 16:29:00,878 : INFO : adding document #0 to Dictionary(83269 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:00,885 : INFO : built Dictionary(83411 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 267 documents (total 2670000 corpus positions)
2021-11-22 16:29:00,913 : INFO : adding document #0 to Dictionary(83411 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:00,919 : INFO : built Dictionary(83536 unique t

2021-11-22 16:29:01,799 : INFO : adding document #0 to Dictionary(87071 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:01,805 : INFO : built Dictionary(87224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 292 documents (total 2920000 corpus positions)
2021-11-22 16:29:01,836 : INFO : adding document #0 to Dictionary(87224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:01,844 : INFO : built Dictionary(87289 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 293 documents (total 2930000 corpus positions)
2021-11-22 16:29:01,875 : INFO : adding document #0 to Dictionary(87289 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:01,882 : INFO : built Dictionary(87368 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 294 documents (total 2940000 corpus positions)
2021-11-22 16:29:01,912 : INFO : adding document #0 to Dictionar

2021-11-22 16:29:02,780 : INFO : built Dictionary(90730 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 318 documents (total 3180000 corpus positions)
2021-11-22 16:29:02,810 : INFO : adding document #0 to Dictionary(90730 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:02,817 : INFO : built Dictionary(90831 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 319 documents (total 3190000 corpus positions)
2021-11-22 16:29:02,846 : INFO : adding document #0 to Dictionary(90831 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:02,852 : INFO : built Dictionary(90966 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 320 documents (total 3200000 corpus positions)
2021-11-22 16:29:02,882 : INFO : adding document #0 to Dictionary(90966 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:02,888 : INFO : built Dictionary(91088 unique t

2021-11-22 16:29:03,798 : INFO : adding document #0 to Dictionary(94589 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:03,805 : INFO : built Dictionary(94701 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 345 documents (total 3450000 corpus positions)
2021-11-22 16:29:03,834 : INFO : adding document #0 to Dictionary(94701 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:03,842 : INFO : built Dictionary(94809 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 346 documents (total 3460000 corpus positions)
2021-11-22 16:29:03,872 : INFO : adding document #0 to Dictionary(94809 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:03,880 : INFO : built Dictionary(94935 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 347 documents (total 3470000 corpus positions)
2021-11-22 16:29:03,909 : INFO : adding document #0 to Dictionar

2021-11-22 16:29:04,777 : INFO : built Dictionary(98510 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 371 documents (total 3710000 corpus positions)
2021-11-22 16:29:04,806 : INFO : adding document #0 to Dictionary(98510 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:04,812 : INFO : built Dictionary(98608 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 372 documents (total 3720000 corpus positions)
2021-11-22 16:29:04,842 : INFO : adding document #0 to Dictionary(98608 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:04,849 : INFO : built Dictionary(98746 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 373 documents (total 3730000 corpus positions)
2021-11-22 16:29:04,878 : INFO : adding document #0 to Dictionary(98746 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:04,885 : INFO : built Dictionary(98886 unique t

2021-11-22 16:29:05,779 : INFO : adding document #0 to Dictionary(101752 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:05,786 : INFO : built Dictionary(101874 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 398 documents (total 3980000 corpus positions)
2021-11-22 16:29:05,816 : INFO : adding document #0 to Dictionary(101874 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:05,824 : INFO : built Dictionary(101971 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 399 documents (total 3990000 corpus positions)
2021-11-22 16:29:05,854 : INFO : adding document #0 to Dictionary(101971 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:05,861 : INFO : built Dictionary(102121 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 400 documents (total 4000000 corpus positions)
2021-11-22 16:29:05,891 : INFO : adding document #0 to Dic

2021-11-22 16:29:06,730 : INFO : built Dictionary(105224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 424 documents (total 4240000 corpus positions)
2021-11-22 16:29:06,759 : INFO : adding document #0 to Dictionary(105224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:06,765 : INFO : built Dictionary(105377 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 425 documents (total 4250000 corpus positions)
2021-11-22 16:29:06,796 : INFO : adding document #0 to Dictionary(105377 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:06,803 : INFO : built Dictionary(105511 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 426 documents (total 4260000 corpus positions)
2021-11-22 16:29:06,833 : INFO : adding document #0 to Dictionary(105511 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:06,840 : INFO : built Dictionary(105677 u

2021-11-22 16:29:07,708 : INFO : adding document #0 to Dictionary(108989 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:07,714 : INFO : built Dictionary(109089 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 451 documents (total 4510000 corpus positions)
2021-11-22 16:29:07,744 : INFO : adding document #0 to Dictionary(109089 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:07,751 : INFO : built Dictionary(109217 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 452 documents (total 4520000 corpus positions)
2021-11-22 16:29:07,781 : INFO : adding document #0 to Dictionary(109217 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:07,789 : INFO : built Dictionary(109307 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 453 documents (total 4530000 corpus positions)
2021-11-22 16:29:07,818 : INFO : adding document #0 to Dic

2021-11-22 16:29:08,671 : INFO : built Dictionary(112111 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 477 documents (total 4770000 corpus positions)
2021-11-22 16:29:08,700 : INFO : adding document #0 to Dictionary(112111 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:08,707 : INFO : built Dictionary(112224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 478 documents (total 4780000 corpus positions)
2021-11-22 16:29:08,736 : INFO : adding document #0 to Dictionary(112224 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:08,742 : INFO : built Dictionary(112314 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 479 documents (total 4790000 corpus positions)
2021-11-22 16:29:08,772 : INFO : adding document #0 to Dictionary(112314 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:08,779 : INFO : built Dictionary(112426 u

2021-11-22 16:29:09,650 : INFO : adding document #0 to Dictionary(116153 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:09,657 : INFO : built Dictionary(116297 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 504 documents (total 5040000 corpus positions)
2021-11-22 16:29:09,687 : INFO : adding document #0 to Dictionary(116297 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:09,695 : INFO : built Dictionary(116466 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 505 documents (total 5050000 corpus positions)
2021-11-22 16:29:09,726 : INFO : adding document #0 to Dictionary(116466 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:09,733 : INFO : built Dictionary(116658 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 506 documents (total 5060000 corpus positions)
2021-11-22 16:29:09,764 : INFO : adding document #0 to Dic

2021-11-22 16:29:10,620 : INFO : built Dictionary(120025 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 530 documents (total 5300000 corpus positions)
2021-11-22 16:29:10,649 : INFO : adding document #0 to Dictionary(120025 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:10,656 : INFO : built Dictionary(120113 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 531 documents (total 5310000 corpus positions)
2021-11-22 16:29:10,685 : INFO : adding document #0 to Dictionary(120113 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:10,692 : INFO : built Dictionary(120216 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 532 documents (total 5320000 corpus positions)
2021-11-22 16:29:10,724 : INFO : adding document #0 to Dictionary(120216 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:10,731 : INFO : built Dictionary(120304 u

2021-11-22 16:29:11,635 : INFO : adding document #0 to Dictionary(123212 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:11,641 : INFO : built Dictionary(123363 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 557 documents (total 5570000 corpus positions)
2021-11-22 16:29:11,671 : INFO : adding document #0 to Dictionary(123363 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:11,678 : INFO : built Dictionary(123482 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 558 documents (total 5580000 corpus positions)
2021-11-22 16:29:11,706 : INFO : adding document #0 to Dictionary(123482 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:11,712 : INFO : built Dictionary(123643 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 559 documents (total 5590000 corpus positions)
2021-11-22 16:29:11,742 : INFO : adding document #0 to Dic

2021-11-22 16:29:12,598 : INFO : built Dictionary(126716 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 583 documents (total 5830000 corpus positions)
2021-11-22 16:29:12,627 : INFO : adding document #0 to Dictionary(126716 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:12,634 : INFO : built Dictionary(126805 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 584 documents (total 5840000 corpus positions)
2021-11-22 16:29:12,664 : INFO : adding document #0 to Dictionary(126805 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:12,672 : INFO : built Dictionary(126906 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 585 documents (total 5850000 corpus positions)
2021-11-22 16:29:12,701 : INFO : adding document #0 to Dictionary(126906 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:12,709 : INFO : built Dictionary(126982 u

2021-11-22 16:29:13,584 : INFO : adding document #0 to Dictionary(129698 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:13,590 : INFO : built Dictionary(129807 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 610 documents (total 6100000 corpus positions)
2021-11-22 16:29:13,620 : INFO : adding document #0 to Dictionary(129807 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:13,626 : INFO : built Dictionary(129869 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 611 documents (total 6110000 corpus positions)
2021-11-22 16:29:13,656 : INFO : adding document #0 to Dictionary(129869 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:13,663 : INFO : built Dictionary(130017 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 612 documents (total 6120000 corpus positions)
2021-11-22 16:29:13,693 : INFO : adding document #0 to Dic

2021-11-22 16:29:14,560 : INFO : built Dictionary(132627 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 636 documents (total 6360000 corpus positions)
2021-11-22 16:29:14,589 : INFO : adding document #0 to Dictionary(132627 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:14,596 : INFO : built Dictionary(132711 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 637 documents (total 6370000 corpus positions)
2021-11-22 16:29:14,627 : INFO : adding document #0 to Dictionary(132711 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:14,635 : INFO : built Dictionary(132941 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 638 documents (total 6380000 corpus positions)
2021-11-22 16:29:14,664 : INFO : adding document #0 to Dictionary(132941 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:14,671 : INFO : built Dictionary(133060 u

2021-11-22 16:29:15,558 : INFO : adding document #0 to Dictionary(135902 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:15,564 : INFO : built Dictionary(136011 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 663 documents (total 6630000 corpus positions)
2021-11-22 16:29:15,594 : INFO : adding document #0 to Dictionary(136011 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:15,601 : INFO : built Dictionary(136172 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 664 documents (total 6640000 corpus positions)
2021-11-22 16:29:15,632 : INFO : adding document #0 to Dictionary(136172 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:15,640 : INFO : built Dictionary(136293 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 665 documents (total 6650000 corpus positions)
2021-11-22 16:29:15,671 : INFO : adding document #0 to Dic

2021-11-22 16:29:16,601 : INFO : built Dictionary(139148 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 689 documents (total 6890000 corpus positions)
2021-11-22 16:29:16,632 : INFO : adding document #0 to Dictionary(139148 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:16,638 : INFO : built Dictionary(139208 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 690 documents (total 6900000 corpus positions)
2021-11-22 16:29:16,667 : INFO : adding document #0 to Dictionary(139208 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:16,673 : INFO : built Dictionary(139310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 691 documents (total 6910000 corpus positions)
2021-11-22 16:29:16,704 : INFO : adding document #0 to Dictionary(139310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:16,711 : INFO : built Dictionary(139430 u

2021-11-22 16:29:17,614 : INFO : adding document #0 to Dictionary(142196 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:17,623 : INFO : built Dictionary(142299 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 716 documents (total 7160000 corpus positions)
2021-11-22 16:29:17,653 : INFO : adding document #0 to Dictionary(142299 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:17,659 : INFO : built Dictionary(142356 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 717 documents (total 7170000 corpus positions)
2021-11-22 16:29:17,688 : INFO : adding document #0 to Dictionary(142356 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:17,695 : INFO : built Dictionary(142453 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 718 documents (total 7180000 corpus positions)
2021-11-22 16:29:17,725 : INFO : adding document #0 to Dic

2021-11-22 16:29:18,571 : INFO : built Dictionary(145076 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 742 documents (total 7420000 corpus positions)
2021-11-22 16:29:18,600 : INFO : adding document #0 to Dictionary(145076 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:18,608 : INFO : built Dictionary(145168 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 743 documents (total 7430000 corpus positions)
2021-11-22 16:29:18,638 : INFO : adding document #0 to Dictionary(145168 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:18,647 : INFO : built Dictionary(145555 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 744 documents (total 7440000 corpus positions)
2021-11-22 16:29:18,675 : INFO : adding document #0 to Dictionary(145555 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:18,681 : INFO : built Dictionary(145596 u

2021-11-22 16:29:19,544 : INFO : adding document #0 to Dictionary(148431 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:19,550 : INFO : built Dictionary(148526 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 769 documents (total 7690000 corpus positions)
2021-11-22 16:29:19,579 : INFO : adding document #0 to Dictionary(148526 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:19,585 : INFO : built Dictionary(148588 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 770 documents (total 7700000 corpus positions)
2021-11-22 16:29:19,614 : INFO : adding document #0 to Dictionary(148588 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:19,620 : INFO : built Dictionary(148663 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 771 documents (total 7710000 corpus positions)
2021-11-22 16:29:19,649 : INFO : adding document #0 to Dic

2021-11-22 16:29:20,507 : INFO : built Dictionary(151541 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 795 documents (total 7950000 corpus positions)
2021-11-22 16:29:20,538 : INFO : adding document #0 to Dictionary(151541 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:20,547 : INFO : built Dictionary(151637 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 796 documents (total 7960000 corpus positions)
2021-11-22 16:29:20,578 : INFO : adding document #0 to Dictionary(151637 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:20,585 : INFO : built Dictionary(151731 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 797 documents (total 7970000 corpus positions)
2021-11-22 16:29:20,614 : INFO : adding document #0 to Dictionary(151731 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:21,511 : INFO : adding document #0 to Dictionary(155187 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:21,519 : INFO : built Dictionary(155322 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 822 documents (total 8220000 corpus positions)
2021-11-22 16:29:21,553 : INFO : adding document #0 to Dictionary(155322 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:21,560 : INFO : built Dictionary(155492 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 823 documents (total 8230000 corpus positions)
2021-11-22 16:29:21,590 : INFO : adding document #0 to Dictionary(155492 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:21,599 : INFO : built Dictionary(155639 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 824 documents (total 8240000 corpus positions)
...
2021-11-22 16:29:22,470 : INFO : built Dictionary(158684 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 848 documents (total 8480000 corpus positions)
2021-11-22 16:29:22,499 : INFO : adding document #0 to Dictionary(158684 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:22,506 : INFO : built Dictionary(158816 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 849 documents (total 8490000 corpus positions)
2021-11-22 16:29:22,539 : INFO : adding document #0 to Dictionary(158816 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:22,547 : INFO : built Dictionary(158890 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 850 documents (total 8500000 corpus positions)
2021-11-22 16:29:22,576 : INFO : adding document #0 to Dictionary(158890 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:23,456 : INFO : adding document #0 to Dictionary(161363 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:23,465 : INFO : built Dictionary(161458 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 875 documents (total 8750000 corpus positions)
2021-11-22 16:29:23,496 : INFO : adding document #0 to Dictionary(161458 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:23,504 : INFO : built Dictionary(161575 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 876 documents (total 8760000 corpus positions)
2021-11-22 16:29:23,535 : INFO : adding document #0 to Dictionary(161575 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:23,542 : INFO : built Dictionary(161691 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 877 documents (total 8770000 corpus positions)
...
2021-11-22 16:29:24,416 : INFO : built Dictionary(166285 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 901 documents (total 9010000 corpus positions)
2021-11-22 16:29:24,447 : INFO : adding document #0 to Dictionary(166285 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:24,454 : INFO : built Dictionary(166408 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 902 documents (total 9020000 corpus positions)
2021-11-22 16:29:24,483 : INFO : adding document #0 to Dictionary(166408 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:24,491 : INFO : built Dictionary(166513 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 903 documents (total 9030000 corpus positions)
2021-11-22 16:29:24,520 : INFO : adding document #0 to Dictionary(166513 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:25,758 : INFO : adding document #0 to Dictionary(168853 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:25,766 : INFO : built Dictionary(168988 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 928 documents (total 9280000 corpus positions)
2021-11-22 16:29:25,799 : INFO : adding document #0 to Dictionary(168988 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:25,806 : INFO : built Dictionary(169048 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 929 documents (total 9290000 corpus positions)
2021-11-22 16:29:25,838 : INFO : adding document #0 to Dictionary(169048 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:25,851 : INFO : built Dictionary(169212 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 930 documents (total 9300000 corpus positions)
...
2021-11-22 16:29:26,771 : INFO : built Dictionary(171655 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 954 documents (total 9540000 corpus positions)
2021-11-22 16:29:26,801 : INFO : adding document #0 to Dictionary(171655 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:26,808 : INFO : built Dictionary(171709 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 955 documents (total 9550000 corpus positions)
2021-11-22 16:29:26,839 : INFO : adding document #0 to Dictionary(171709 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:26,846 : INFO : built Dictionary(171774 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 956 documents (total 9560000 corpus positions)
2021-11-22 16:29:26,876 : INFO : adding document #0 to Dictionary(171774 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:27,798 : INFO : adding document #0 to Dictionary(173997 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:27,806 : INFO : built Dictionary(174089 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 981 documents (total 9810000 corpus positions)
2021-11-22 16:29:27,834 : INFO : adding document #0 to Dictionary(174089 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:27,842 : INFO : built Dictionary(174151 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 982 documents (total 9820000 corpus positions)
2021-11-22 16:29:27,871 : INFO : adding document #0 to Dictionary(174151 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:27,877 : INFO : built Dictionary(174227 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 983 documents (total 9830000 corpus positions)
...
2021-11-22 16:29:28,784 : INFO : built Dictionary(176990 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1007 documents (total 10070000 corpus positions)
2021-11-22 16:29:28,813 : INFO : adding document #0 to Dictionary(176990 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:28,820 : INFO : built Dictionary(177047 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1008 documents (total 10080000 corpus positions)
2021-11-22 16:29:28,850 : INFO : adding document #0 to Dictionary(177047 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:28,858 : INFO : built Dictionary(177105 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1009 documents (total 10090000 corpus positions)
2021-11-22 16:29:28,887 : INFO : adding document #0 to Dictionary(177105 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:29,751 : INFO : built Dictionary(179647 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1033 documents (total 10330000 corpus positions)
2021-11-22 16:29:29,780 : INFO : adding document #0 to Dictionary(179647 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:29,786 : INFO : built Dictionary(179754 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1034 documents (total 10340000 corpus positions)
2021-11-22 16:29:29,815 : INFO : adding document #0 to Dictionary(179754 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:29,821 : INFO : built Dictionary(179865 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1035 documents (total 10350000 corpus positions)
2021-11-22 16:29:29,850 : INFO : adding document #0 to Dictionary(179865 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:30,674 : INFO : built Dictionary(181946 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1059 documents (total 10590000 corpus positions)
2021-11-22 16:29:30,703 : INFO : adding document #0 to Dictionary(181946 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:30,710 : INFO : built Dictionary(182005 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1060 documents (total 10600000 corpus positions)
2021-11-22 16:29:30,740 : INFO : adding document #0 to Dictionary(182005 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:30,747 : INFO : built Dictionary(182107 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1061 documents (total 10610000 corpus positions)
2021-11-22 16:29:30,777 : INFO : adding document #0 to Dictionary(182107 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:31,606 : INFO : built Dictionary(184116 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1085 documents (total 10850000 corpus positions)
2021-11-22 16:29:31,634 : INFO : adding document #0 to Dictionary(184116 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:31,641 : INFO : built Dictionary(184269 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1086 documents (total 10860000 corpus positions)
2021-11-22 16:29:31,668 : INFO : adding document #0 to Dictionary(184269 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:31,674 : INFO : built Dictionary(184310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1087 documents (total 10870000 corpus positions)
2021-11-22 16:29:31,703 : INFO : adding document #0 to Dictionary(184310 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:32,552 : INFO : built Dictionary(186195 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1111 documents (total 11110000 corpus positions)
2021-11-22 16:29:32,580 : INFO : adding document #0 to Dictionary(186195 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:32,586 : INFO : built Dictionary(186415 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1112 documents (total 11120000 corpus positions)
2021-11-22 16:29:32,615 : INFO : adding document #0 to Dictionary(186415 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:32,622 : INFO : built Dictionary(186557 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1113 documents (total 11130000 corpus positions)
2021-11-22 16:29:32,651 : INFO : adding document #0 to Dictionary(186557 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:33,480 : INFO : built Dictionary(188549 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1137 documents (total 11370000 corpus positions)
2021-11-22 16:29:33,509 : INFO : adding document #0 to Dictionary(188549 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:33,516 : INFO : built Dictionary(188580 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1138 documents (total 11380000 corpus positions)
2021-11-22 16:29:33,546 : INFO : adding document #0 to Dictionary(188580 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:33,553 : INFO : built Dictionary(188680 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1139 documents (total 11390000 corpus positions)
2021-11-22 16:29:33,582 : INFO : adding document #0 to Dictionary(188680 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:34,448 : INFO : built Dictionary(190877 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1163 documents (total 11630000 corpus positions)
2021-11-22 16:29:34,485 : INFO : adding document #0 to Dictionary(190877 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:34,492 : INFO : built Dictionary(191000 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1164 documents (total 11640000 corpus positions)
2021-11-22 16:29:34,527 : INFO : adding document #0 to Dictionary(191000 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:34,534 : INFO : built Dictionary(191107 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1165 documents (total 11650000 corpus positions)
2021-11-22 16:29:34,565 : INFO : adding document #0 to Dictionary(191107 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:35,445 : INFO : built Dictionary(193349 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1189 documents (total 11890000 corpus positions)
2021-11-22 16:29:35,474 : INFO : adding document #0 to Dictionary(193349 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:35,481 : INFO : built Dictionary(193410 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1190 documents (total 11900000 corpus positions)
2021-11-22 16:29:35,510 : INFO : adding document #0 to Dictionary(193410 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:35,516 : INFO : built Dictionary(193460 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1191 documents (total 11910000 corpus positions)
2021-11-22 16:29:35,550 : INFO : adding document #0 to Dictionary(193460 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:36,405 : INFO : built Dictionary(195522 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1215 documents (total 12150000 corpus positions)
2021-11-22 16:29:36,434 : INFO : adding document #0 to Dictionary(195522 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:36,442 : INFO : built Dictionary(195714 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1216 documents (total 12160000 corpus positions)
2021-11-22 16:29:36,472 : INFO : adding document #0 to Dictionary(195714 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:36,477 : INFO : built Dictionary(195786 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1217 documents (total 12170000 corpus positions)
2021-11-22 16:29:36,511 : INFO : adding document #0 to Dictionary(195786 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:37,362 : INFO : built Dictionary(198043 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1241 documents (total 12410000 corpus positions)
2021-11-22 16:29:37,393 : INFO : adding document #0 to Dictionary(198043 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:37,402 : INFO : built Dictionary(198121 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1242 documents (total 12420000 corpus positions)
2021-11-22 16:29:37,433 : INFO : adding document #0 to Dictionary(198121 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:37,443 : INFO : built Dictionary(198266 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1243 documents (total 12430000 corpus positions)
2021-11-22 16:29:37,473 : INFO : adding document #0 to Dictionary(198266 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:38,317 : INFO : built Dictionary(199855 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1267 documents (total 12670000 corpus positions)
2021-11-22 16:29:38,346 : INFO : adding document #0 to Dictionary(199855 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:38,352 : INFO : built Dictionary(199926 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1268 documents (total 12680000 corpus positions)
2021-11-22 16:29:38,381 : INFO : adding document #0 to Dictionary(199926 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:38,387 : INFO : built Dictionary(199984 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1269 documents (total 12690000 corpus positions)
2021-11-22 16:29:38,416 : INFO : adding document #0 to Dictionary(199984 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:39,250 : INFO : built Dictionary(202476 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1293 documents (total 12930000 corpus positions)
2021-11-22 16:29:39,279 : INFO : adding document #0 to Dictionary(202476 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:39,285 : INFO : built Dictionary(202532 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1294 documents (total 12940000 corpus positions)
2021-11-22 16:29:39,313 : INFO : adding document #0 to Dictionary(202532 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:39,321 : INFO : built Dictionary(202593 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1295 documents (total 12950000 corpus positions)
2021-11-22 16:29:39,350 : INFO : adding document #0 to Dictionary(202593 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:40,182 : INFO : built Dictionary(204710 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1319 documents (total 13190000 corpus positions)
2021-11-22 16:29:40,213 : INFO : adding document #0 to Dictionary(204710 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:40,220 : INFO : built Dictionary(204770 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1320 documents (total 13200000 corpus positions)
2021-11-22 16:29:40,248 : INFO : adding document #0 to Dictionary(204770 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:40,255 : INFO : built Dictionary(204881 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1321 documents (total 13210000 corpus positions)
2021-11-22 16:29:40,283 : INFO : adding document #0 to Dictionary(204881 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:41,102 : INFO : built Dictionary(206732 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1345 documents (total 13450000 corpus positions)
2021-11-22 16:29:41,130 : INFO : adding document #0 to Dictionary(206732 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:41,136 : INFO : built Dictionary(206807 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1346 documents (total 13460000 corpus positions)
2021-11-22 16:29:41,163 : INFO : adding document #0 to Dictionary(206807 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:41,170 : INFO : built Dictionary(206867 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1347 documents (total 13470000 corpus positions)
2021-11-22 16:29:41,198 : INFO : adding document #0 to Dictionary(206867 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:42,001 : INFO : built Dictionary(208909 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1371 documents (total 13710000 corpus positions)
2021-11-22 16:29:42,030 : INFO : adding document #0 to Dictionary(208909 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:42,036 : INFO : built Dictionary(209014 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1372 documents (total 13720000 corpus positions)
2021-11-22 16:29:42,065 : INFO : adding document #0 to Dictionary(209014 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:42,071 : INFO : built Dictionary(209091 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1373 documents (total 13730000 corpus positions)
2021-11-22 16:29:42,099 : INFO : adding document #0 to Dictionary(209091 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:42,934 : INFO : built Dictionary(212078 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1397 documents (total 13970000 corpus positions)
2021-11-22 16:29:42,963 : INFO : adding document #0 to Dictionary(212078 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:42,972 : INFO : built Dictionary(212183 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1398 documents (total 13980000 corpus positions)
2021-11-22 16:29:43,002 : INFO : adding document #0 to Dictionary(212183 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:43,008 : INFO : built Dictionary(212313 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1399 documents (total 13990000 corpus positions)
2021-11-22 16:29:43,037 : INFO : adding document #0 to Dictionary(212313 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:43,855 : INFO : built Dictionary(214556 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1423 documents (total 14230000 corpus positions)
2021-11-22 16:29:43,883 : INFO : adding document #0 to Dictionary(214556 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:43,889 : INFO : built Dictionary(214603 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1424 documents (total 14240000 corpus positions)
2021-11-22 16:29:43,918 : INFO : adding document #0 to Dictionary(214603 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:43,925 : INFO : built Dictionary(214706 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1425 documents (total 14250000 corpus positions)
2021-11-22 16:29:43,955 : INFO : adding document #0 to Dictionary(214706 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:44,774 : INFO : built Dictionary(216742 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1449 documents (total 14490000 corpus positions)
2021-11-22 16:29:44,803 : INFO : adding document #0 to Dictionary(216742 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:44,809 : INFO : built Dictionary(216807 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1450 documents (total 14500000 corpus positions)
2021-11-22 16:29:44,838 : INFO : adding document #0 to Dictionary(216807 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:44,845 : INFO : built Dictionary(216841 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1451 documents (total 14510000 corpus positions)
2021-11-22 16:29:44,874 : INFO : adding document #0 to Dictionary(216841 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:46,086 : INFO : built Dictionary(219842 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1475 documents (total 14750000 corpus positions)
2021-11-22 16:29:46,116 : INFO : adding document #0 to Dictionary(219842 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:46,124 : INFO : built Dictionary(219954 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1476 documents (total 14760000 corpus positions)
2021-11-22 16:29:46,152 : INFO : adding document #0 to Dictionary(219954 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:46,157 : INFO : built Dictionary(219965 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1477 documents (total 14770000 corpus positions)
2021-11-22 16:29:46,187 : INFO : adding document #0 to Dictionary(219965 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:47,041 : INFO : built Dictionary(222684 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1501 documents (total 15010000 corpus positions)
2021-11-22 16:29:47,071 : INFO : adding document #0 to Dictionary(222684 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:47,077 : INFO : built Dictionary(222771 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1502 documents (total 15020000 corpus positions)
2021-11-22 16:29:47,106 : INFO : adding document #0 to Dictionary(222771 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:47,112 : INFO : built Dictionary(222857 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1503 documents (total 15030000 corpus positions)
2021-11-22 16:29:47,141 : INFO : adding document #0 to Dictionary(222857 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:47,991 : INFO : built Dictionary(224728 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1527 documents (total 15270000 corpus positions)
2021-11-22 16:29:48,021 : INFO : adding document #0 to Dictionary(224728 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:48,029 : INFO : built Dictionary(224859 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1528 documents (total 15280000 corpus positions)
2021-11-22 16:29:48,058 : INFO : adding document #0 to Dictionary(224859 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:48,066 : INFO : built Dictionary(224994 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1529 documents (total 15290000 corpus positions)
2021-11-22 16:29:48,096 : INFO : adding document #0 to Dictionary(224994 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:48,934 : INFO : built Dictionary(226979 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1553 documents (total 15530000 corpus positions)
2021-11-22 16:29:48,964 : INFO : adding document #0 to Dictionary(226979 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:48,971 : INFO : built Dictionary(227117 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1554 documents (total 15540000 corpus positions)
2021-11-22 16:29:49,000 : INFO : adding document #0 to Dictionary(227117 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:49,006 : INFO : built Dictionary(227203 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1555 documents (total 15550000 corpus positions)
2021-11-22 16:29:49,035 : INFO : adding document #0 to Dictionary(227203 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:49,871 : INFO : built Dictionary(229022 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1579 documents (total 15790000 corpus positions)
2021-11-22 16:29:49,899 : INFO : adding document #0 to Dictionary(229022 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:49,906 : INFO : built Dictionary(229076 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1580 documents (total 15800000 corpus positions)
2021-11-22 16:29:49,935 : INFO : adding document #0 to Dictionary(229076 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:49,941 : INFO : built Dictionary(229136 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1581 documents (total 15810000 corpus positions)
2021-11-22 16:29:49,970 : INFO : adding document #0 to Dictionary(229136 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:50,812 : INFO : built Dictionary(230953 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1605 documents (total 16050000 corpus positions)
2021-11-22 16:29:50,842 : INFO : adding document #0 to Dictionary(230953 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:50,849 : INFO : built Dictionary(231024 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1606 documents (total 16060000 corpus positions)
2021-11-22 16:29:50,879 : INFO : adding document #0 to Dictionary(231024 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:50,886 : INFO : built Dictionary(231054 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1607 documents (total 16070000 corpus positions)
2021-11-22 16:29:50,915 : INFO : adding document #0 to Dictionary(231054 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
...
2021-11-22 16:29:51,767 : INFO : built Dictionary(232908 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1631 documents (total 16310000 corpus positions)
2021-11-22 16:29:51,795 : INFO : adding document #0 to Dictionary(232908 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:51,801 : INFO : built Dictionary(232962 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1632 documents (total 16320000 corpus positions)
2021-11-22 16:29:51,831 : INFO : adding document #0 to Dictionary(232962 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:51,838 : INFO : built Dictionary(233022 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1633 documents (total 16330000 corpus positions)
2021-11-22 16:29:51,867 : INFO : adding document #0 to Dictionary(233022 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:52,707 : INFO : built Dictionary(235036 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1657 documents (total 16570000 corpus positions)
2021-11-22 16:29:52,746 : INFO : adding document #0 to Dictionary(235036 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:52,752 : INFO : built Dictionary(235093 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1658 documents (total 16580000 corpus positions)
2021-11-22 16:29:52,781 : INFO : adding document #0 to Dictionary(235093 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:52,790 : INFO : built Dictionary(235150 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1659 documents (total 16590000 corpus positions)
2021-11-22 16:29:52,819 : INFO : adding document #0 to Dictionary(235150 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:53,657 : INFO : built Dictionary(237365 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1683 documents (total 16830000 corpus positions)
2021-11-22 16:29:53,685 : INFO : adding document #0 to Dictionary(237365 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:53,692 : INFO : built Dictionary(237475 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1684 documents (total 16840000 corpus positions)
2021-11-22 16:29:53,720 : INFO : adding document #0 to Dictionary(237475 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)
2021-11-22 16:29:53,726 : INFO : built Dictionary(237547 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...) from 1685 documents (total 16850000 corpus positions)
2021-11-22 16:29:53,755 : INFO : adding document #0 to Dictionary(237547 unique tokens: ['a', 'abacus', 'ability', 'able', 'abnormal']...)

Training
--------

Training the ensemble works very similarly to training a single model.

You can use any model that is based on LdaModel, such as LdaMulticore, to train the ensemble.
In experiments, LdaMulticore showed better results.




In [None]:
from gensim.models import LdaModel
topic_model_class = LdaModel

Any number of models can be used, but it should be a multiple of the number of workers so that the
load can be distributed evenly. In this example, 4 processes will train 2 models each, 8 models in total.




In [None]:
ensemble_workers = 4
num_models = 8
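
Why a multiple of the worker count helps is simple arithmetic. The sketch below uses a hypothetical helper, ``models_per_worker``, which is not part of gensim and is not necessarily how EnsembleLda schedules its work internally. It only shows that 8 models on 4 workers split evenly, while 10 models leave some workers with more to do.

```python
def models_per_worker(num_models, workers):
    """Hypothetical helper: split num_models as evenly as possible across workers."""
    base, extra = divmod(num_models, workers)
    # The first `extra` workers each get one additional model
    return [base + 1 if i < extra else base for i in range(workers)]


print(models_per_worker(8, 4))   # [2, 2, 2, 2]
print(models_per_worker(10, 4))  # [3, 3, 2, 2]
```

With an uneven split, the total runtime is set by the busiest worker, so the other workers sit idle at the end.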

After all models are trained, the distances between their topics still have to be computed, which can also take
considerable time. You can speed this up by using workers for that step as well (``distance_workers``).




In [None]:
distance_workers = 4
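
To see why this step is costly, here is a simplified, hypothetical asymmetric distance. For every ordered pair of topics (i, j), it compares only the ``top_k`` most probable terms of topic i, using cosine distance. gensim's actual masking and distance computation differ in detail; ``asymmetric_distances`` and ``top_k`` are illustrative names only. The nested loop visits all n × n pairs, so 160 topics already mean 25,600 comparisons.

```python
import numpy as np


def asymmetric_distances(ttda, top_k=10):
    """Hypothetical, simplified asymmetric topic distance (not gensim's exact formula).

    d[i, j] compares topics i and j only on the top_k terms of topic i,
    so d[i, j] generally differs from d[j, i].
    """
    n = ttda.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        # Mask: indices of the top_k most probable terms of topic i
        mask = np.argsort(ttda[i])[-top_k:]
        a = ttda[i, mask]
        for j in range(n):
            b = ttda[j, mask]
            # Cosine distance between the two masked vectors
            dist[i, j] = 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return dist


rng = np.random.default_rng(0)
ttda = rng.dirichlet(np.ones(50), size=16)  # 16 simulated topics over 50 terms
d = asymmetric_distances(ttda)
print(d.shape)  # (16, 16)
```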

All other parameters that EnsembleLda does not recognize are forwarded to each LDA model, for example:




In [None]:
num_topics = 20
passes = 2
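
The forwarding can be pictured as splitting the keyword arguments in two. This is a hypothetical sketch: ``ENSEMBLE_PARAMS`` lists only a few of EnsembleLda's own parameters, and ``split_params`` is not a gensim function.

```python
# Hypothetical, partial set of parameter names consumed by the ensemble itself
ENSEMBLE_PARAMS = {"num_models", "topic_model_class", "ensemble_workers", "distance_workers"}


def split_params(**kwargs):
    """Separate ensemble-level arguments from those forwarded to every LDA model."""
    ensemble_params = {k: v for k, v in kwargs.items() if k in ENSEMBLE_PARAMS}
    model_params = {k: v for k, v in kwargs.items() if k not in ENSEMBLE_PARAMS}
    return ensemble_params, model_params


ens, per_model = split_params(num_models=8, ensemble_workers=4, num_topics=20, passes=2)
print(per_model)  # {'num_topics': 20, 'passes': 2}
```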

Now start the training.

Since each of the 8 models is trained with 20 topics, we expect 160 topics in total (``len(ensemble.ttda)``).
The number of stable topics clustered from all of those topics (``len(ensemble.get_topics())``) is smaller.




In [4]:
from gensim.models import EnsembleLda

ensemble = EnsembleLda(
    corpus=corpus,
    id2word=dictionary,
    num_topics=num_topics,
    passes=passes,
    num_models=num_models,
    topic_model_class=topic_model_class,
    ensemble_workers=ensemble_workers,
    distance_workers=distance_workers,
)

# Topics of all models combined (num_models * num_topics)
print(len(ensemble.ttda))
# Stable topics found by clustering
print(len(ensemble.get_topics()))

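When this notebook was run, the cell above failed with ``ImportError: cannot import name 'EnsembleLda' from 'gensim.models'``.
EnsembleLda was added in gensim 4.1.0, so this error means an older gensim version is installed; upgrade gensim to run it.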
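
The 160 topics counted by ``len(ensemble.ttda)`` can be pictured as the topic-term matrices of all models stacked row-wise. The sketch below assumes that layout and uses random Dirichlet draws as stand-ins for each model's ``get_topics()`` output; the vocabulary size of 1000 is arbitrary.

```python
import numpy as np

num_models, num_topics, num_terms = 8, 20, 1000
rng = np.random.default_rng(42)

# Simulated stand-ins for model.get_topics(): one row per topic, one column per term
per_model_topics = [rng.dirichlet(np.ones(num_terms), size=num_topics) for _ in range(num_models)]

# Stack the topics of all models into one matrix
ttda = np.concatenate(per_model_topics, axis=0)
print(ttda.shape)  # (160, 1000)
```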

Tuning
------

Unlike with LdaModel, the number of resulting topics varies greatly depending on the clustering parameters.

You can provide those in the ``recluster()`` function or the ``EnsembleLda`` constructor.

Experiment with them until you get as many topics as you want, although more topics may come at the cost of quality.
If your ensemble doesn't have enough topics to begin with, make it larger, for example by training more models.

An epsilon smaller than the smallest distance doesn't make sense, because no two topics would then be close enough to be clustered.
Make sure to choose one that is within the range of values in ``asymmetric_distance_matrix``.




In [None]:
import numpy as np

# Inspect the off-diagonal distances (the diagonal holds each topic's distance to itself)
distances = ensemble.asymmetric_distance_matrix
shape = distances.shape
without_diagonal = distances[~np.eye(shape[0], dtype=bool)].reshape(shape[0], -1)
print(without_diagonal.min(), without_diagonal.mean(), without_diagonal.max())

# Recluster with an eps inside that range
ensemble.recluster(eps=0.09, min_samples=2, min_cores=2)

print(len(ensemble.get_topics()))
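
Why eps must lie within the range of ``asymmetric_distance_matrix`` can be shown by counting, for a given eps, how many other topics lie closer than eps to each topic. The hypothetical ``count_neighbors`` helper and the 3 × 3 matrix below are made up for illustration and only approximate the neighborhood test of a DBSCAN-style clustering. With an eps below the smallest off-diagonal distance, every count is zero.

```python
import numpy as np


def count_neighbors(dist, eps):
    """Hypothetical helper: number of other topics closer than eps, per row."""
    n = dist.shape[0]
    off_diag = ~np.eye(n, dtype=bool)  # ignore each topic's distance to itself
    return ((dist < eps) & off_diag).sum(axis=1)


# Made-up asymmetric distances between 3 topics
dist = np.array([
    [0.00, 0.05, 0.30],
    [0.07, 0.00, 0.40],
    [0.35, 0.45, 0.00],
])
print(count_neighbors(dist, eps=0.02))  # [0 0 0]
print(count_neighbors(dist, eps=0.09))  # [1 1 0]
```

In a DBSCAN-style clustering, a topic typically needs a minimum number of neighbors (``min_samples``) to count as a core, so lowering eps or raising ``min_samples`` tends to reduce the number of stable topics.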

Increasing the Size
-------------------

If you have some models lying around that were trained on a corpus based on the same dictionary,
they are compatible and you can add them to the ensemble.

By setting ``num_models`` of the ``EnsembleLda`` constructor to 0, you can also create an ensemble that is
made entirely of your existing topic models by adding them with the ``add_model`` method, as in the cell below.

Afterwards, the number and quality of stable topics may differ, depending on the added topics and the clustering parameters.




In [None]:
from gensim.models import LdaMulticore

model1 = LdaMulticore(
    corpus=corpus,
    id2word=dictionary,
    num_topics=9,
    passes=4,
)

model2 = LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=11,
    passes=2,
)

# add_model supports various types of input, check out its docstring
ensemble.add_model(model1)
ensemble.add_model(model2)

# Recluster, now including the topics of the added models
ensemble.recluster()

# 8 * 20 + 9 + 11 = 180 topics in total
print(len(ensemble.ttda))
print(len(ensemble.get_topics()))
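
Why the dictionary has to match becomes clearer when the ensemble's topics are viewed as one matrix whose columns are term ids. In this hypothetical ``add_topics`` sketch (not gensim's ``add_model``), new topic rows are appended only if their width matches. Equal width alone does not guarantee that column k means the same word in both models; sharing the same dictionary is what ensures that.

```python
import numpy as np


def add_topics(ttda, new_topics):
    """Hypothetical helper: append topic rows; columns are term ids of a shared dictionary."""
    if new_topics.shape[1] != ttda.shape[1]:
        raise ValueError("topic-term matrices have different vocabulary sizes")
    return np.vstack([ttda, new_topics])


rng = np.random.default_rng(1)
ttda = rng.dirichlet(np.ones(100), size=160)          # existing ensemble topics (simulated)
model1_topics = rng.dirichlet(np.ones(100), size=9)   # simulated topics of a 9-topic model
model2_topics = rng.dirichlet(np.ones(100), size=11)  # simulated topics of an 11-topic model
ttda = add_topics(add_topics(ttda, model1_topics), model2_topics)
print(ttda.shape)  # (180, 100)
```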