<a href="https://colab.research.google.com/github/tomonari-masada/course2022-stats2/blob/main/PLSI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# An Example Implementation of PLSI

In [None]:
!pip install torchdata

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
from tqdm import tqdm
import numpy as np
from numpy.random import default_rng
from sklearn.feature_extraction.text import CountVectorizer
from torchtext.datasets import AG_NEWS

rng = default_rng()

* https://pytorch.org/text/stable/datasets.html

In [None]:
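train_iter = AG_NEWS(split='train')

# Simple whitespace tokenizer (not used below; CountVectorizer does its own tokenization).
def tokenize(line):
  return line.split()

# Collect the first 20,000 training documents, lowercased; the labels are not used.
corpus = []
for label, line in tqdm(train_iter):
  corpus.append(line.lower())
  if len(corpus) >= 20000:
    break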

19999it [00:00, 25267.06it/s]


In [None]:
corpus[0]

"wall st. bears claw back into the black (reuters) reuters - short-sellers, wall street's dwindling\\band of ultra-cynics, are seeing green again."

In [None]:
vectorizer = CountVectorizer(stop_words='english', min_df=0.001, max_df=0.1)
X = vectorizer.fit_transform(corpus).toarray()
vocab = vectorizer.get_feature_names_out()

* Confirm that no document has had all of its words removed.
 * Without this check, normalizing an empty document would cause a division by zero.

In [None]:
(X.sum(-1) == 0).sum()

0
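
The check above returns 0 for this corpus, so nothing needs to be removed. If it had returned a positive number, one option would be to drop the empty rows before fitting. The following is a minimal sketch on a toy matrix; `X_toy`, `keep`, `X_nonempty`, and `n_dropped` are hypothetical names introduced here for illustration.

```python
import numpy as np

# Toy document-term count matrix (hypothetical data); row 1 has no words left.
X_toy = np.array([[2, 0, 1],
                  [0, 0, 0],
                  [0, 3, 0]])

# Boolean mask of documents that still contain at least one word.
keep = X_toy.sum(axis=1) > 0

# Keep only the non-empty rows and count how many documents were dropped.
X_nonempty = X_toy[keep]
n_dropped = int((~keep).sum())
print(X_nonempty.shape, n_dropped)
```

In the notebook itself, the same mask would also have to be applied to `corpus` so that rows of `X` and documents stay aligned.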

In [None]:
N, W = X.shape
print(f"{N} docs, {W} different words")

20000 docs, 3959 different words


* Set the number of topics and initialize the parameters.

In [None]:
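K = 10
# theta[d, k]: topic proportions of document d (each row sums to 1).
theta = rng.random([len(corpus), K])
theta = theta / theta.sum(1, keepdims=True)
# phi[k, w]: word distribution of topic k (each row sums to 1).
phi = rng.random([K, len(vocab)])
phi = phi / phi.sum(1, keepdims=True)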
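
For reference, the standard EM updates for PLSI can be written with the parameters initialized above. This is a sketch in notation introduced here for illustration: $n_{dw}$ stands for the count `X[d, w]`, $\theta_{dk}$ for `theta[d, k]`, $\phi_{kw}$ for `phi[k, w]`, and $q_{dwk}$ for the responsibility of topic $k$ for word $w$ in document $d$.

```latex
% E-step: responsibilities, normalized over topics
q_{dwk} = \frac{\theta_{dk}\,\phi_{kw}}{\sum_{k'=1}^{K} \theta_{dk'}\,\phi_{k'w}}

% M-step: re-estimate the document-topic and topic-word distributions
\theta_{dk} \leftarrow \frac{\sum_{w} n_{dw}\, q_{dwk}}{\sum_{w} n_{dw}},
\qquad
\phi_{kw} \leftarrow \frac{\sum_{d} n_{dw}\, q_{dwk}}{\sum_{d}\sum_{w'} n_{dw'}\, q_{dw'k}}
```

Because $\sum_k q_{dwk} = 1$, the denominator of the $\theta$ update is simply the length of document $d$, which is why documents with no remaining words must be excluded.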

In [None]:
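mb_size = 1000
for it in range(200):
  new_theta = theta.copy()
  new_phi = np.zeros_like(phi)
  # Process documents in mini-batches, starting at index 0, to bound the memory used by q.
  for i in tqdm(range(0, len(corpus), mb_size)):
    # E-step: q[d, w, k] is proportional to theta[d, k] * phi[k, w] ...
    q = theta[i:i+mb_size,np.newaxis,:] * phi.transpose()[np.newaxis,:,:]
    # ... and is normalized over topics k (the floor avoids 0/0 if every term underflows).
    q = q / np.maximum(q.sum(-1, keepdims=True), 1e-300)
    # Expected counts: each word count is split across topics according to q.
    pseudo_counts = q * X[i:i+mb_size,:,np.newaxis]
    # M-step for theta: normalize the expected topic counts of each document.
    temp_theta = pseudo_counts.sum(1)
    temp_theta = temp_theta / temp_theta.sum(1, keepdims=True)
    new_theta[i:i+mb_size,:] = temp_theta
    # Accumulate the expected topic-word counts over all mini-batches.
    new_phi += pseudo_counts.sum(0).transpose()
  theta = new_theta
  # M-step for phi: normalize the expected word counts of each topic.
  phi = new_phi / new_phi.sum(1, keepdims=True)
  # Every 10 iterations, print the 20 most probable words of each topic.
  if (it + 1) % 10 == 0:
    for k in range(K):
      print(' '.join(list(vocab[(- phi).argsort()[k,:20]])))
    print('-' * 80)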

100%|██████████| 20/20 [00:07<00:00,  2.76it/s]
100%|██████████| 20/20 [00:07<00:00,  2.65it/s]
100%|██████████| 20/20 [00:05<00:00,  3.45it/s]
100%|██████████| 20/20 [00:05<00:00,  3.50it/s]
100%|██████████| 20/20 [00:05<00:00,  3.45it/s]
100%|██████████| 20/20 [00:05<00:00,  3.47it/s]
100%|██████████| 20/20 [00:05<00:00,  3.48it/s]
100%|██████████| 20/20 [00:05<00:00,  3.48it/s]
100%|██████████| 20/20 [00:05<00:00,  3.51it/s]
100%|██████████| 20/20 [00:05<00:00,  3.41it/s]


company yesterday microsoft olympic gold security athens china com software internet medal corp windows united saturday thursday today state market
week friday athens olympic afp gold says team win microsoft iraq president york years night lt al tuesday court gt
year monday thursday government athens million president world olympic market tuesday bank percent york time sales second group report minister
quot oil stocks company prices york monday united today tuesday gt friday world google high day lt corp says open
ap thursday monday wednesday time night olympic microsoft athens tuesday internet second people iraq gold world company friday officials government
lt gt tuesday world ap fullquote stocks com company year wednesday york http day ticker investor target million washington iraq
world iraq york sunday second thursday oil day najaf people prices monday group city lt afp president al iraqi record
ap time tuesday iraq million games olympic years gold monday game sunday president bu

100%|██████████| 20/20 [00:05<00:00,  3.39it/s]
100%|██████████| 20/20 [00:05<00:00,  3.44it/s]
100%|██████████| 20/20 [00:06<00:00,  3.08it/s]
100%|██████████| 20/20 [00:06<00:00,  3.09it/s]
100%|██████████| 20/20 [00:05<00:00,  3.42it/s]
100%|██████████| 20/20 [00:05<00:00,  3.37it/s]
100%|██████████| 20/20 [00:05<00:00,  3.36it/s]
100%|██████████| 20/20 [00:05<00:00,  3.35it/s]
100%|██████████| 20/20 [00:06<00:00,  3.32it/s]
100%|██████████| 20/20 [00:05<00:00,  3.36it/s]


company yesterday microsoft security software windows internet today corp com china market united apple computer music million state saturday google
athens olympic week friday gold afp team says win years medal president night games olympics men united court greece al
year monday government million percent old president thursday sales second bank research market world report profit time tuesday york end
quot company world says united president today yesterday oil athens gt friday monday olympic microsoft lt iraq time high china
ap monday thursday wednesday night tuesday time friday game iraq people officials internet bush microsoft team second olympic saturday world
gt lt tuesday fullquote stocks com http ticker href target investor www quickinfo aspx york world font washington corp company
world iraq york thursday sunday second people najaf day city group iraqi al killed government monday president record win prices
ap time tuesday bush iraq monday sunday president wednesday game year

100%|██████████| 20/20 [00:06<00:00,  3.33it/s]
100%|██████████| 20/20 [00:06<00:00,  3.28it/s]
100%|██████████| 20/20 [00:06<00:00,  3.13it/s]
100%|██████████| 20/20 [00:07<00:00,  2.83it/s]
100%|██████████| 20/20 [00:07<00:00,  2.78it/s]
100%|██████████| 20/20 [00:07<00:00,  2.77it/s]
100%|██████████| 20/20 [00:07<00:00,  2.69it/s]
100%|██████████| 20/20 [00:06<00:00,  2.89it/s]
100%|██████████| 20/20 [00:06<00:00,  3.00it/s]
100%|██████████| 20/20 [00:06<00:00,  3.05it/s]


company yesterday microsoft security software windows today corp internet com china market apple computer service music google million state xp
athens olympic week friday gold afp team says medal win games years olympics night president united men greece final american
year monday government million old percent sales president second research profit market thursday time world bank report end billion quarter
quot company says world president united yesterday athens olympic microsoft today gt time iraq friday lt monday night week minister
ap monday wednesday thursday night friday tuesday time game bush officials iraq people saturday team sunday internet victory year president
gt lt tuesday fullquote stocks com http ticker href www target investor quickinfo aspx york font corp washington world profit
world iraq thursday york sunday second people najaf day city group iraqi al killed government win president saturday police tuesday
ap time tuesday bush monday president wednesday sunday iraq

100%|██████████| 20/20 [00:06<00:00,  3.06it/s]
100%|██████████| 20/20 [00:06<00:00,  3.09it/s]
100%|██████████| 20/20 [00:06<00:00,  3.19it/s]
100%|██████████| 20/20 [00:06<00:00,  3.19it/s]
100%|██████████| 20/20 [00:06<00:00,  3.21it/s]
100%|██████████| 20/20 [00:06<00:00,  3.23it/s]
100%|██████████| 20/20 [00:06<00:00,  3.24it/s]
100%|██████████| 20/20 [00:06<00:00,  3.33it/s]
100%|██████████| 20/20 [00:06<00:00,  3.31it/s]
100%|██████████| 20/20 [00:06<00:00,  3.32it/s]


company yesterday microsoft security software windows today corp internet com market service google china apple computer music million xp state
athens olympic gold week friday afp team medal says win games olympics years night president united men greece final american
year monday million government old percent sales second president profit research market time thursday end world report billion bank quarter
quot says company president world athens yesterday olympic united microsoft today time iraq gt lt night friday group gold minister
ap monday wednesday night thursday friday tuesday time game bush officials saturday iraq team sunday people victory year president internet
gt lt tuesday fullquote stocks com http ticker href www target investor quickinfo aspx york font corp washington profit face
world iraq thursday york sunday people second najaf day city al iraqi group killed government win president tuesday police saturday
ap time bush tuesday monday wednesday president sunday game i

100%|██████████| 20/20 [00:06<00:00,  3.30it/s]
100%|██████████| 20/20 [00:06<00:00,  3.28it/s]
100%|██████████| 20/20 [00:06<00:00,  3.29it/s]
100%|██████████| 20/20 [00:06<00:00,  3.30it/s]
100%|██████████| 20/20 [00:05<00:00,  3.35it/s]
100%|██████████| 20/20 [00:05<00:00,  3.35it/s]
100%|██████████| 20/20 [00:05<00:00,  3.35it/s]
100%|██████████| 20/20 [00:05<00:00,  3.34it/s]
100%|██████████| 20/20 [00:05<00:00,  3.35it/s]
100%|██████████| 20/20 [00:05<00:00,  3.38it/s]


company yesterday microsoft software security windows today corp internet service google com market apple computer china music million xp announced
athens olympic gold week friday afp team medal says games olympics win years night united men president greece final american
year monday million government old percent sales second profit president research market time end thursday world quarter billion talks report
quot says company president world athens olympic yesterday microsoft united today time iraq gt night lt group gold friday minister
ap monday wednesday night thursday friday tuesday bush game time officials saturday iraq sunday team victory people year president york
gt lt fullquote tuesday stocks com http ticker href www target investor quickinfo aspx york font corp washington profit face
world iraq thursday york sunday people najaf second day al city iraqi group killed government win tuesday police cup president
ap time bush tuesday wednesday monday president sunday game iraq 

100%|██████████| 20/20 [00:05<00:00,  3.35it/s]
100%|██████████| 20/20 [00:05<00:00,  3.35it/s]
100%|██████████| 20/20 [00:05<00:00,  3.34it/s]
100%|██████████| 20/20 [00:05<00:00,  3.36it/s]
100%|██████████| 20/20 [00:05<00:00,  3.35it/s]
100%|██████████| 20/20 [00:05<00:00,  3.35it/s]
100%|██████████| 20/20 [00:05<00:00,  3.35it/s]
100%|██████████| 20/20 [00:05<00:00,  3.39it/s]
100%|██████████| 20/20 [00:05<00:00,  3.38it/s]
100%|██████████| 20/20 [00:05<00:00,  3.38it/s]


company yesterday microsoft software security windows today corp internet service google market com apple computer music xp million china announced
athens olympic gold week friday afp team medal says games olympics win years united night men president greece saturday american
year monday million government old percent sales profit second president research market end time quarter thursday world billion talks report
quot says president company world athens olympic yesterday microsoft united today time iraq night gt group gold lt games minister
ap monday wednesday night thursday friday tuesday bush game time officials saturday sunday iraq victory team people president year 151
gt lt fullquote tuesday stocks http com ticker href www target investor quickinfo aspx york font corp profit washington face
world iraq thursday york sunday people najaf second day al city iraqi group killed government tuesday win cup police end
ap time bush tuesday wednesday monday president sunday game saturday f

100%|██████████| 20/20 [00:06<00:00,  3.32it/s]
100%|██████████| 20/20 [00:06<00:00,  3.33it/s]
100%|██████████| 20/20 [00:06<00:00,  3.29it/s]
100%|██████████| 20/20 [00:05<00:00,  3.35it/s]
100%|██████████| 20/20 [00:06<00:00,  3.32it/s]
100%|██████████| 20/20 [00:06<00:00,  3.31it/s]
100%|██████████| 20/20 [00:06<00:00,  3.30it/s]
100%|██████████| 20/20 [00:06<00:00,  3.29it/s]
100%|██████████| 20/20 [00:06<00:00,  3.28it/s]
100%|██████████| 20/20 [00:06<00:00,  3.30it/s]


company yesterday microsoft software security windows today corp internet service google market com apple computer xp music million china business
athens olympic gold week friday afp team medal says olympics games win united years night men president greece saturday american
year monday million old government percent sales profit second president research market end time quarter billion thursday world talks report
quot says president company world athens olympic yesterday microsoft united today time iraq night group gold gt lt games minister
ap monday wednesday night thursday friday bush tuesday game time saturday officials sunday victory team iraq president people year 151
gt lt fullquote tuesday stocks http com ticker href www target investor quickinfo aspx york font corp profit washington face
world iraq thursday york sunday people najaf second day al city iraqi group killed government tuesday win cup police end
ap time bush tuesday wednesday monday president sunday game saturday fr

100%|██████████| 20/20 [00:06<00:00,  3.30it/s]
100%|██████████| 20/20 [00:06<00:00,  3.29it/s]
100%|██████████| 20/20 [00:06<00:00,  3.28it/s]
100%|██████████| 20/20 [00:06<00:00,  3.09it/s]
100%|██████████| 20/20 [00:06<00:00,  3.32it/s]
100%|██████████| 20/20 [00:05<00:00,  3.36it/s]
100%|██████████| 20/20 [00:06<00:00,  3.32it/s]
100%|██████████| 20/20 [00:06<00:00,  3.30it/s]
100%|██████████| 20/20 [00:06<00:00,  3.33it/s]
100%|██████████| 20/20 [00:05<00:00,  3.35it/s]


company yesterday microsoft software security windows today corp internet service google market com apple computer xp million music china business
athens olympic gold week friday afp team medal says olympics games win united years men night president greece saturday american
year monday million old government percent profit sales second president research market end time quarter billion world thursday talks report
quot says president company athens world olympic yesterday microsoft united time today iraq night gold group gt lt games minister
ap monday wednesday night thursday friday bush tuesday game time saturday officials sunday victory team iraq president 151 year people
gt lt fullquote tuesday stocks http com href ticker www target investor quickinfo aspx york font corp profit washington face
world iraq thursday york sunday people najaf second day al city iraqi group killed government tuesday win cup police end
ap time bush tuesday wednesday monday president sunday game saturday fr

100%|██████████| 20/20 [00:06<00:00,  3.33it/s]
100%|██████████| 20/20 [00:05<00:00,  3.35it/s]
100%|██████████| 20/20 [00:05<00:00,  3.39it/s]
100%|██████████| 20/20 [00:06<00:00,  3.31it/s]
100%|██████████| 20/20 [00:05<00:00,  3.35it/s]
100%|██████████| 20/20 [00:06<00:00,  3.30it/s]
100%|██████████| 20/20 [00:05<00:00,  3.38it/s]
100%|██████████| 20/20 [00:05<00:00,  3.41it/s]
100%|██████████| 20/20 [00:05<00:00,  3.38it/s]
100%|██████████| 20/20 [00:05<00:00,  3.37it/s]


company yesterday microsoft software security windows today corp internet service google market com apple computer million xp music business china
athens olympic gold week friday afp medal team olympics games says united win years men night president greece saturday american
year monday old million percent government profit sales second president research market end time quarter billion world thursday talks report
quot says president company athens olympic world yesterday microsoft united time today iraq gold night group gt lt games minister
ap monday wednesday night thursday friday bush tuesday game time saturday officials sunday victory team iraq 151 president year people
gt lt fullquote tuesday stocks http com href ticker www target investor quickinfo aspx york font corp profit washington face
world iraq thursday york sunday people najaf second day al city iraqi killed group government tuesday cup win police end
ap time bush tuesday wednesday monday president sunday saturday game se

100%|██████████| 20/20 [00:05<00:00,  3.39it/s]
100%|██████████| 20/20 [00:05<00:00,  3.43it/s]
100%|██████████| 20/20 [00:05<00:00,  3.37it/s]
100%|██████████| 20/20 [00:05<00:00,  3.44it/s]
100%|██████████| 20/20 [00:05<00:00,  3.37it/s]
100%|██████████| 20/20 [00:05<00:00,  3.35it/s]
100%|██████████| 20/20 [00:05<00:00,  3.38it/s]
100%|██████████| 20/20 [00:05<00:00,  3.37it/s]
100%|██████████| 20/20 [00:06<00:00,  3.30it/s]
100%|██████████| 20/20 [00:06<00:00,  3.25it/s]


company yesterday microsoft software security windows today corp internet service google market com apple million computer xp music business china
athens olympic gold week friday afp medal team olympics games says united win years men night greece president saturday american
year monday old million percent government profit sales second president research market end time quarter billion world thursday talks report
quot says president company athens olympic world yesterday microsoft united time iraq today gold night group games lt gt minister
ap monday wednesday night thursday friday bush tuesday game time saturday sunday officials victory team iraq 151 president year people
gt lt fullquote tuesday stocks http com href www ticker target investor quickinfo aspx york font corp profit washington face
world iraq thursday york sunday people najaf second day al city iraqi killed group government tuesday cup win police end
ap time bush tuesday wednesday monday president saturday sunday game se

100%|██████████| 20/20 [00:06<00:00,  3.29it/s]
100%|██████████| 20/20 [00:06<00:00,  3.28it/s]
100%|██████████| 20/20 [00:06<00:00,  3.23it/s]
100%|██████████| 20/20 [00:06<00:00,  3.23it/s]
100%|██████████| 20/20 [00:06<00:00,  3.24it/s]
100%|██████████| 20/20 [00:06<00:00,  3.28it/s]
100%|██████████| 20/20 [00:06<00:00,  3.27it/s]
100%|██████████| 20/20 [00:06<00:00,  3.19it/s]
100%|██████████| 20/20 [00:06<00:00,  3.30it/s]
100%|██████████| 20/20 [00:06<00:00,  3.25it/s]


company microsoft yesterday software security windows today corp internet service google market million apple com computer xp music business china
athens olympic gold week friday afp medal team olympics games says united win years men night greece president saturday american
year monday old million percent government profit sales second president research market end time quarter billion world thursday talks united
quot says president company athens olympic world yesterday microsoft united time iraq today gold night group games lt gt minister
ap monday wednesday night thursday friday bush game tuesday time saturday sunday officials victory team iraq 151 president year people
gt lt fullquote tuesday stocks http com www href ticker target investor quickinfo aspx york font corp profit washington face
world iraq thursday york sunday people najaf second day al iraqi city killed group government tuesday cup win police end
ap time bush tuesday wednesday president monday saturday sunday season 

100%|██████████| 20/20 [00:06<00:00,  3.29it/s]
100%|██████████| 20/20 [00:06<00:00,  3.24it/s]
100%|██████████| 20/20 [00:06<00:00,  3.24it/s]
100%|██████████| 20/20 [00:06<00:00,  3.25it/s]
100%|██████████| 20/20 [00:06<00:00,  3.23it/s]
100%|██████████| 20/20 [00:06<00:00,  3.22it/s]
100%|██████████| 20/20 [00:06<00:00,  3.21it/s]
100%|██████████| 20/20 [00:06<00:00,  3.22it/s]
100%|██████████| 20/20 [00:06<00:00,  3.18it/s]
100%|██████████| 20/20 [00:06<00:00,  3.20it/s]


company microsoft yesterday software security windows today corp internet service google market million apple com computer xp business music announced
athens olympic gold week friday afp medal team olympics games says united win years men night greece president saturday american
year monday old million percent government profit second sales president market end research time quarter billion world thursday talks united
quot says president athens company olympic world yesterday microsoft united time iraq today gold night group games lt gt minister
ap monday wednesday night thursday friday bush game tuesday time saturday sunday officials victory team 151 iraq president year people
gt lt fullquote tuesday http stocks com www href ticker target investor quickinfo aspx york font corp profit washington face
world iraq thursday york sunday people najaf second day al iraqi city killed group government tuesday cup end police win
ap time bush tuesday wednesday president monday saturday season sun

100%|██████████| 20/20 [00:06<00:00,  3.03it/s]
100%|██████████| 20/20 [00:06<00:00,  3.28it/s]
100%|██████████| 20/20 [00:06<00:00,  3.29it/s]
100%|██████████| 20/20 [00:06<00:00,  3.26it/s]
100%|██████████| 20/20 [00:06<00:00,  3.29it/s]
100%|██████████| 20/20 [00:06<00:00,  3.26it/s]
100%|██████████| 20/20 [00:06<00:00,  3.24it/s]
100%|██████████| 20/20 [00:06<00:00,  3.28it/s]
100%|██████████| 20/20 [00:06<00:00,  3.31it/s]
100%|██████████| 20/20 [00:06<00:00,  3.28it/s]


company microsoft yesterday software security windows today corp internet service google million market apple com computer xp business music update
athens olympic gold week friday afp medal team olympics games says united win men years night greece president saturday american
year monday old million percent government profit second sales president market end research quarter time billion world thursday talks united
quot says president athens olympic company world yesterday microsoft united time iraq today gold night group games lt gt minister
ap monday wednesday night friday thursday bush game tuesday time saturday sunday officials victory team 151 iraq president year people
gt lt fullquote tuesday http stocks com www href ticker target investor quickinfo aspx york font corp profit washington face
world iraq thursday york sunday people najaf second day al iraqi city killed government group tuesday cup end police win
ap time bush tuesday wednesday president monday saturday season sunday

100%|██████████| 20/20 [00:06<00:00,  3.23it/s]
100%|██████████| 20/20 [00:06<00:00,  3.22it/s]
100%|██████████| 20/20 [00:06<00:00,  3.24it/s]
100%|██████████| 20/20 [00:06<00:00,  3.23it/s]
100%|██████████| 20/20 [00:06<00:00,  3.18it/s]
100%|██████████| 20/20 [00:06<00:00,  3.17it/s]
100%|██████████| 20/20 [00:06<00:00,  3.16it/s]
100%|██████████| 20/20 [00:06<00:00,  3.14it/s]
100%|██████████| 20/20 [00:06<00:00,  3.16it/s]
100%|██████████| 20/20 [00:06<00:00,  3.16it/s]


company microsoft yesterday software security windows today corp internet service google million market apple com computer xp business music update
athens olympic gold week friday afp medal team olympics games says united win men years night greece president saturday american
year monday old million percent government profit second sales president end market research quarter time billion world thursday talks united
quot says president athens olympic company world yesterday microsoft united time iraq today gold night group games lt gt minister
ap monday wednesday night friday thursday bush game tuesday time saturday sunday officials victory 151 team iraq president year people
gt lt fullquote tuesday http stocks com www href ticker target investor quickinfo aspx york font corp profit washington face
world iraq thursday york sunday people najaf second day al iraqi city killed government group tuesday cup end police win
ap time bush tuesday wednesday president monday saturday season night 

100%|██████████| 20/20 [00:06<00:00,  3.16it/s]
100%|██████████| 20/20 [00:06<00:00,  3.18it/s]
100%|██████████| 20/20 [00:06<00:00,  3.18it/s]
100%|██████████| 20/20 [00:06<00:00,  3.18it/s]
100%|██████████| 20/20 [00:06<00:00,  3.25it/s]
100%|██████████| 20/20 [00:06<00:00,  3.20it/s]
100%|██████████| 20/20 [00:06<00:00,  3.19it/s]
100%|██████████| 20/20 [00:06<00:00,  3.15it/s]
100%|██████████| 20/20 [00:06<00:00,  3.18it/s]
100%|██████████| 20/20 [00:06<00:00,  3.13it/s]


company microsoft yesterday software security windows today corp internet service google million market apple com computer xp business music update
athens olympic gold week friday afp medal team olympics games united says win men years night greece president saturday american
year monday old million percent government profit second sales president end market research quarter time billion world thursday united talks
quot says president athens olympic company world yesterday microsoft united time iraq today gold night group games lt gt minister
ap monday wednesday night friday thursday bush game tuesday time saturday sunday victory officials 151 team iraq president year people
gt lt fullquote tuesday http stocks com www href ticker target investor quickinfo aspx york font corp profit washington face
world iraq thursday york sunday people najaf second day al iraqi city killed government group tuesday cup end police win
ap time bush tuesday wednesday president saturday monday season night 

100%|██████████| 20/20 [00:06<00:00,  3.17it/s]
100%|██████████| 20/20 [00:06<00:00,  3.16it/s]
100%|██████████| 20/20 [00:06<00:00,  3.15it/s]
100%|██████████| 20/20 [00:06<00:00,  3.19it/s]
100%|██████████| 20/20 [00:06<00:00,  3.17it/s]
100%|██████████| 20/20 [00:06<00:00,  3.16it/s]
100%|██████████| 20/20 [00:06<00:00,  3.16it/s]
100%|██████████| 20/20 [00:06<00:00,  3.17it/s]
100%|██████████| 20/20 [00:06<00:00,  3.17it/s]
100%|██████████| 20/20 [00:06<00:00,  3.22it/s]


company microsoft yesterday software security windows today corp internet service google million market apple computer com business xp music update
athens olympic gold week friday afp medal team olympics games united says win men years night greece president saturday american
year monday old million percent government profit second sales president end market research quarter time billion world thursday united talks
quot says president athens olympic company world yesterday microsoft united time iraq today gold night group games lt gt minister
ap monday wednesday night friday thursday bush game tuesday time saturday sunday victory officials 151 team iraq president year people
gt lt fullquote tuesday http stocks com www href ticker target investor quickinfo aspx york font corp profit washington face
world iraq thursday york sunday people najaf second day al iraqi city killed government group tuesday end cup police cleric
ap time bush tuesday wednesday president saturday monday season nig

100%|██████████| 20/20 [00:06<00:00,  3.24it/s]
100%|██████████| 20/20 [00:05<00:00,  3.36it/s]
100%|██████████| 20/20 [00:06<00:00,  3.28it/s]
100%|██████████| 20/20 [00:06<00:00,  3.27it/s]
100%|██████████| 20/20 [00:06<00:00,  3.29it/s]
100%|██████████| 20/20 [00:06<00:00,  3.28it/s]
100%|██████████| 20/20 [00:06<00:00,  3.33it/s]
100%|██████████| 20/20 [00:05<00:00,  3.39it/s]
100%|██████████| 20/20 [00:06<00:00,  3.16it/s]
100%|██████████| 20/20 [00:05<00:00,  3.36it/s]


company microsoft yesterday software security windows today corp internet service google million market apple computer business com xp music update
athens olympic gold week friday afp medal team olympics games united says men win years night greece saturday president american
year monday old percent million profit government second sales president end market quarter research time billion world thursday united talks
quot says president athens olympic company world yesterday microsoft united time iraq today gold night group games lt gt minister
ap monday wednesday night friday thursday bush game tuesday time saturday sunday victory officials 151 team iraq president year people
gt lt fullquote tuesday http stocks com www href ticker target investor quickinfo aspx york font corp profit washington face
world iraq thursday york sunday people najaf second day al iraqi city killed government group tuesday end cup police cleric
ap time bush tuesday saturday president wednesday night season mond

100%|██████████| 20/20 [00:05<00:00,  3.38it/s]
100%|██████████| 20/20 [00:05<00:00,  3.48it/s]
100%|██████████| 20/20 [00:05<00:00,  3.46it/s]
100%|██████████| 20/20 [00:05<00:00,  3.48it/s]
100%|██████████| 20/20 [00:05<00:00,  3.51it/s]
100%|██████████| 20/20 [00:05<00:00,  3.48it/s]
100%|██████████| 20/20 [00:05<00:00,  3.47it/s]
100%|██████████| 20/20 [00:05<00:00,  3.47it/s]
100%|██████████| 20/20 [00:05<00:00,  3.50it/s]
100%|██████████| 20/20 [00:05<00:00,  3.51it/s]


000 pop pope popular population porn port portable portfolio position positions positive possibility poor possible post posted posting postponed posts
000 pop pope popular population porn port portable portfolio position positions positive possibility poor possible post posted posting postponed posts
000 pop pope popular population porn port portable portfolio position positions positive possibility poor possible post posted posting postponed posts
000 pop pope popular population porn port portable portfolio position positions positive possibility poor possible post posted posting postponed posts
000 pop pope popular population porn port portable portfolio position positions positive possibility poor possible post posted posting postponed posts
000 pop pope popular population porn port portable portfolio position positions positive possibility poor possible post posted posting postponed posts
000 pop pope popular population porn port portable portfolio position positions positive possi

100%|██████████| 20/20 [00:05<00:00,  3.47it/s]
100%|██████████| 20/20 [00:05<00:00,  3.41it/s]
100%|██████████| 20/20 [00:05<00:00,  3.38it/s]
100%|██████████| 20/20 [00:05<00:00,  3.45it/s]
100%|██████████| 20/20 [00:05<00:00,  3.40it/s]
100%|██████████| 20/20 [00:05<00:00,  3.47it/s]
100%|██████████| 20/20 [00:05<00:00,  3.46it/s]
100%|██████████| 20/20 [00:05<00:00,  3.43it/s]
100%|██████████| 20/20 [00:05<00:00,  3.42it/s]
 75%|███████▌  | 15/20 [00:04<00:01,  3.08it/s]


KeyboardInterrupt: ignored

In [None]:
for k in range(K):
  print(' '.join(list(vocab[(- phi).argsort()[k,:30]])))
print('-' * 80)
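
Inspecting the top words is a qualitative check. A quantitative check is to track the log-likelihood, which EM should not decrease, or the corresponding per-word perplexity. The following is a minimal sketch, assuming `theta` and `phi` are row-normalized as above. The function names `plsi_log_likelihood` and `perplexity`, the `eps` guard, and the `*_demo` toy data are assumptions introduced here, not part of the original notebook.

```python
import numpy as np

def plsi_log_likelihood(X, theta, phi, eps=1e-12):
    """Return sum over (d, w) of X[d, w] * log(sum_k theta[d, k] * phi[k, w])."""
    # p[d, w] is the model's probability of word w in document d.
    p = theta @ phi
    # eps guards against log(0); it is an assumption of this sketch, not part of the model.
    return float((X * np.log(p + eps)).sum())

def perplexity(X, theta, phi):
    """Per-word perplexity: exp of the negative average log-likelihood per token."""
    return float(np.exp(-plsi_log_likelihood(X, theta, phi) / X.sum()))

# Toy data (hypothetical): 5 documents, 7 vocabulary words, 2 topics.
rng_demo = np.random.default_rng(0)
X_demo = rng_demo.integers(1, 4, size=(5, 7))
theta_demo = rng_demo.random((5, 2))
theta_demo = theta_demo / theta_demo.sum(1, keepdims=True)
phi_demo = rng_demo.random((2, 7))
phi_demo = phi_demo / phi_demo.sum(1, keepdims=True)

ll_demo = plsi_log_likelihood(X_demo, theta_demo, phi_demo)
ppl_demo = perplexity(X_demo, theta_demo, phi_demo)
print(ll_demo, ppl_demo)
```

For the full corpus, `theta @ phi` forms a dense array with one entry per document-word pair, so it may be preferable to accumulate the sum over mini-batches of documents, as the training loop does.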