# Document classification with fastText and MeCab

## References

- [DeepAge: Trying document classification](https://deepage.net/bigdata/machine_learning/2016/08/28/fast_text_facebook.html#文書分類に挑戦する)
- [YoshihitoAso/gist:9048005 How to install mecab on Ubuntu](https://gist.github.com/YoshihitoAso/9048005)

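## Preparation

Convert the livedoor news corpus (LDCC) into the one-document-per-line format that fastText expects. The code below writes each label as the category's index in `labels` rather than as its name.

    __label__<label name> , <body text with words separated by spaces>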
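
As a minimal sketch of this format, the helper below builds one line from a label and a list of words. `to_fasttext_line` and the sample tokens are hypothetical and not part of the original notebook; the words are assumed to be already split, which the notebook does with MeCab.

```python
def to_fasttext_line(label, tokens):
    """Build one training line: '__label__<label> , <space-separated tokens>'."""
    return '__label__{} , {}'.format(label, ' '.join(tokens))

# Hypothetical example: the tokens are assumed to be already split into words.
line = to_fasttext_line('movie-enter', ['新作', '映画', 'が', '公開'])
print(line)  # __label__movie-enter , 新作 映画 が 公開
```

The label can be any string without spaces, so a numeric index works the same way as a category name.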

In [1]:
# livedoor.py

# import, def
import MeCab
import os
from os import listdir, walk
from os.path import join, isfile

mecab = MeCab.Tagger('mecabrc')

labels = [
    'dokujo-tsushin',
    'it-life-hack',
    'kaden-channel',
    'livedoor-homme',
    'movie-enter',
    'peachy',
    'smax',
    'sports-watch',
    'topic-news'
]

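def tokenize(text):
    # Split the text into words with MeCab and join them with spaces.
    node = mecab.parseToNode(text.strip())
    tokens = []
    while node:
        # Skip the BOS/EOS nodes, whose surface is an empty string.
        if node.surface:
            tokens.append(node.surface)
        node = node.next
    return ' '.join(tokens)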

def is_post(directory, filename):
    if isfile(join(directory, filename)):
        if filename != 'LICENSE.txt':
            return True
    return False

def read_content(directory, filename):
    path = join(directory, filename)
    print(path)
    # Skip the first two lines (article URL and timestamp); keep the title and body.
    with open(path, 'r', encoding="utf-8") as f:
        body = [l.strip() for i, l in enumerate(f) if i > 1]
    text = ''.join(body)
    return tokenize(text)

def read_posts(directory):
    if os.path.exists(join(directory, 'LICENSE.txt')):
        files = [f for f in listdir(directory) if is_post(directory, f)]
        for f in files:
            yield read_content(directory, f)

def read_corpus(input_directory, output_file):
    for (dirpath, _, _) in walk(input_directory):
        label_name = os.path.basename(dirpath)
        # Append mode: delete output_file before re-running to avoid duplicate lines.
        with open(output_file, "a", encoding="utf-8") as file:
            for post in read_posts(dirpath):
                # The label is written as the category's index in `labels`.
                file.write('__label__{} , {}'.format(labels.index(label_name), post) + "\n")
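
The directory handling above can be tried without MeCab or the real corpus. The following is only a sketch under assumptions: `category_posts` and `body_lines` are hypothetical helpers that mirror `read_posts` and `read_content`, and the tiny corpus they run on is made up. It shows that only directories containing `LICENSE.txt` are treated as categories, and that the first two lines of each article are dropped.

```python
import os
import tempfile
from os.path import join

def body_lines(path):
    # Drop the first two lines (URL and timestamp), as read_content does.
    with open(path, 'r', encoding='utf-8') as f:
        return [l.strip() for i, l in enumerate(f) if i > 1]

def category_posts(root):
    """Map each category directory name to the sorted article files in it."""
    result = {}
    for dirpath, _, filenames in os.walk(root):
        # Like read_posts, treat a directory as a category only if it has LICENSE.txt.
        if 'LICENSE.txt' in filenames:
            name = os.path.basename(dirpath)
            result[name] = sorted(f for f in filenames if f != 'LICENSE.txt')
    return result

# Build a tiny fake corpus (hypothetical file names and contents).
root = tempfile.mkdtemp()
os.makedirs(join(root, 'smax'))
with open(join(root, 'README.txt'), 'w', encoding='utf-8') as f:
    f.write('readme\n')
with open(join(root, 'smax', 'LICENSE.txt'), 'w', encoding='utf-8') as f:
    f.write('license\n')
with open(join(root, 'smax', 'smax-0000001.txt'), 'w', encoding='utf-8') as f:
    f.write('http://example.com/article\n2012-01-01T00:00:00+0900\nタイトル\n本文\n')

posts = category_posts(root)
lines = body_lines(join(root, 'smax', 'smax-0000001.txt'))
print(posts)  # {'smax': ['smax-0000001.txt']}
print(lines)  # ['タイトル', '本文']
```

The top-level folder holds no `LICENSE.txt` in this sketch, so its `README.txt` is never read as an article.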

In [2]:
# Run (the corpus is stored in the corpora/ldcc/text folder)
!mkdir -p data && rm -f data/ldcc.txt
read_corpus('corpora/ldcc/text/', 'data/ldcc.txt')

corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4948016.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5797904.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5368649.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5220452.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5765250.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5866028.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5403411.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6624494.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4788357.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4971076.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6233611.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5143945.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6246330.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4832209.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6076462.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5659126.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-

corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4842348.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6524058.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4839912.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6166428.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5624738.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6151716.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6455366.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4796054.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5642937.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5971956.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5455376.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6846670.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5253509.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5011384.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5711816.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6143393.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-

corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5071625.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6151186.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6218046.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6758308.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6106776.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4897873.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4854648.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5543008.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5901870.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6646614.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6208121.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5344936.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5313907.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4948012.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4920182.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5365261.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-

corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6582412.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6327417.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5230609.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4966331.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5271492.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6514622.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6008173.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5005793.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6176454.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4880091.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6800344.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5489562.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5936803.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6709382.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5604583.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5140190.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-

corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4866257.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6870248.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5283348.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5110253.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5760117.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4847422.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5507187.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4814765.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5629982.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5476676.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6556566.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4961679.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5022407.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5541227.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5793016.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4842333.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-

corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5854846.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6181839.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5446569.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5939461.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5586871.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5180406.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6828472.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4880092.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5943498.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4778031.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5641160.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6763585.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-6864107.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-4854640.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5376845.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-tsushin-5188153.txt
corpora/ldcc/text/dokujo-tsushin/dokujo-

corpora/ldcc/text/movie-enter/movie-enter-6571818.txt
corpora/ldcc/text/movie-enter/movie-enter-6338863.txt
corpora/ldcc/text/movie-enter/movie-enter-6653375.txt
corpora/ldcc/text/movie-enter/movie-enter-5931721.txt
corpora/ldcc/text/movie-enter/movie-enter-6391931.txt
corpora/ldcc/text/movie-enter/movie-enter-6202560.txt
corpora/ldcc/text/movie-enter/movie-enter-5925383.txt
corpora/ldcc/text/movie-enter/movie-enter-6336656.txt
corpora/ldcc/text/movie-enter/movie-enter-6295484.txt
corpora/ldcc/text/movie-enter/movie-enter-6371653.txt
corpora/ldcc/text/movie-enter/movie-enter-6008419.txt
corpora/ldcc/text/movie-enter/movie-enter-6094704.txt
corpora/ldcc/text/movie-enter/movie-enter-6444280.txt
corpora/ldcc/text/movie-enter/movie-enter-5952955.txt
corpora/ldcc/text/movie-enter/movie-enter-5856137.txt
corpora/ldcc/text/movie-enter/movie-enter-6715002.txt
corpora/ldcc/text/movie-enter/movie-enter-6490414.txt
corpora/ldcc/text/movie-enter/movie-enter-6459613.txt
corpora/ldcc/text/movie-ente

corpora/ldcc/text/movie-enter/movie-enter-6582445.txt
corpora/ldcc/text/movie-enter/movie-enter-6537974.txt
corpora/ldcc/text/movie-enter/movie-enter-6389848.txt
corpora/ldcc/text/movie-enter/movie-enter-6632298.txt
corpora/ldcc/text/movie-enter/movie-enter-6148384.txt
corpora/ldcc/text/movie-enter/movie-enter-6062290.txt
corpora/ldcc/text/movie-enter/movie-enter-6524381.txt
corpora/ldcc/text/movie-enter/movie-enter-6152270.txt
corpora/ldcc/text/movie-enter/movie-enter-5886917.txt
corpora/ldcc/text/movie-enter/movie-enter-6472280.txt
corpora/ldcc/text/movie-enter/movie-enter-6330568.txt
corpora/ldcc/text/movie-enter/movie-enter-6146256.txt
corpora/ldcc/text/movie-enter/movie-enter-6307476.txt
corpora/ldcc/text/movie-enter/movie-enter-6269274.txt
corpora/ldcc/text/movie-enter/movie-enter-6636902.txt
corpora/ldcc/text/movie-enter/movie-enter-6508806.txt
corpora/ldcc/text/movie-enter/movie-enter-5882746.txt
corpora/ldcc/text/movie-enter/movie-enter-6563418.txt
corpora/ldcc/text/movie-ente

corpora/ldcc/text/movie-enter/movie-enter-6727351.txt
corpora/ldcc/text/movie-enter/movie-enter-6024592.txt
corpora/ldcc/text/movie-enter/movie-enter-6794030.txt
corpora/ldcc/text/movie-enter/movie-enter-5921068.txt
corpora/ldcc/text/movie-enter/movie-enter-5919267.txt
corpora/ldcc/text/movie-enter/movie-enter-6519776.txt
corpora/ldcc/text/movie-enter/movie-enter-6350050.txt
corpora/ldcc/text/movie-enter/movie-enter-6240239.txt
corpora/ldcc/text/movie-enter/movie-enter-6352654.txt
corpora/ldcc/text/movie-enter/movie-enter-6625135.txt
corpora/ldcc/text/movie-enter/movie-enter-6223416.txt
corpora/ldcc/text/movie-enter/movie-enter-6123568.txt
corpora/ldcc/text/movie-enter/movie-enter-6274984.txt
corpora/ldcc/text/movie-enter/movie-enter-6425811.txt
corpora/ldcc/text/movie-enter/movie-enter-6429087.txt
corpora/ldcc/text/movie-enter/movie-enter-6211811.txt
corpora/ldcc/text/movie-enter/movie-enter-6284716.txt
corpora/ldcc/text/movie-enter/movie-enter-6695458.txt
corpora/ldcc/text/movie-ente

corpora/ldcc/text/movie-enter/movie-enter-6379626.txt
corpora/ldcc/text/movie-enter/movie-enter-6383646.txt
corpora/ldcc/text/movie-enter/movie-enter-6477923.txt
corpora/ldcc/text/movie-enter/movie-enter-6251354.txt
corpora/ldcc/text/movie-enter/movie-enter-6590228.txt
corpora/ldcc/text/movie-enter/movie-enter-5845521.txt
corpora/ldcc/text/movie-enter/movie-enter-6077077.txt
corpora/ldcc/text/movie-enter/movie-enter-6224881.txt
corpora/ldcc/text/movie-enter/movie-enter-5944099.txt
corpora/ldcc/text/movie-enter/movie-enter-6128471.txt
corpora/ldcc/text/movie-enter/movie-enter-6609122.txt
corpora/ldcc/text/movie-enter/movie-enter-5931053.txt
corpora/ldcc/text/movie-enter/movie-enter-6398017.txt
corpora/ldcc/text/movie-enter/movie-enter-6572796.txt
corpora/ldcc/text/movie-enter/movie-enter-6852057.txt
corpora/ldcc/text/movie-enter/movie-enter-6323595.txt
corpora/ldcc/text/movie-enter/movie-enter-5921161.txt
corpora/ldcc/text/movie-enter/movie-enter-6768567.txt
corpora/ldcc/text/movie-ente

corpora/ldcc/text/movie-enter/movie-enter-6243533.txt
corpora/ldcc/text/movie-enter/movie-enter-6076279.txt
corpora/ldcc/text/movie-enter/movie-enter-6097490.txt
corpora/ldcc/text/movie-enter/movie-enter-6311321.txt
corpora/ldcc/text/movie-enter/movie-enter-6573929.txt
corpora/ldcc/text/movie-enter/movie-enter-6514686.txt
corpora/ldcc/text/movie-enter/movie-enter-6355023.txt
corpora/ldcc/text/movie-enter/movie-enter-6564570.txt
corpora/ldcc/text/movie-enter/movie-enter-6692606.txt
corpora/ldcc/text/movie-enter/movie-enter-6808742.txt
corpora/ldcc/text/movie-enter/movie-enter-6307143.txt
corpora/ldcc/text/movie-enter/movie-enter-6784454.txt
corpora/ldcc/text/movie-enter/movie-enter-6187666.txt
corpora/ldcc/text/movie-enter/movie-enter-6882196.txt
corpora/ldcc/text/movie-enter/movie-enter-6316975.txt
corpora/ldcc/text/movie-enter/movie-enter-6692786.txt
corpora/ldcc/text/movie-enter/movie-enter-6163763.txt
corpora/ldcc/text/movie-enter/movie-enter-6024707.txt
corpora/ldcc/text/movie-ente

corpora/ldcc/text/peachy/peachy-5148051.txt
corpora/ldcc/text/peachy/peachy-4953576.txt
corpora/ldcc/text/peachy/peachy-4483613.txt
corpora/ldcc/text/peachy/peachy-6662015.txt
corpora/ldcc/text/peachy/peachy-6661820.txt
corpora/ldcc/text/peachy/peachy-4793562.txt
corpora/ldcc/text/peachy/peachy-5130005.txt
corpora/ldcc/text/peachy/peachy-4999538.txt
corpora/ldcc/text/peachy/peachy-4304647.txt
corpora/ldcc/text/peachy/peachy-5082544.txt
corpora/ldcc/text/peachy/peachy-4575826.txt
corpora/ldcc/text/peachy/peachy-4301698.txt
corpora/ldcc/text/peachy/peachy-6719236.txt
corpora/ldcc/text/peachy/peachy-4473146.txt
corpora/ldcc/text/peachy/peachy-6510122.txt
corpora/ldcc/text/peachy/peachy-6707225.txt
corpora/ldcc/text/peachy/peachy-6093088.txt
corpora/ldcc/text/peachy/peachy-6776250.txt
corpora/ldcc/text/peachy/peachy-6593988.txt
corpora/ldcc/text/peachy/peachy-5080120.txt
corpora/ldcc/text/peachy/peachy-5427625.txt
corpora/ldcc/text/peachy/peachy-5126883.txt
corpora/ldcc/text/peachy/peachy-

corpora/ldcc/text/peachy/peachy-6617204.txt
corpora/ldcc/text/peachy/peachy-4926769.txt
corpora/ldcc/text/peachy/peachy-4844138.txt
corpora/ldcc/text/peachy/peachy-5785872.txt
corpora/ldcc/text/peachy/peachy-5056841.txt
corpora/ldcc/text/peachy/peachy-4567316.txt
corpora/ldcc/text/peachy/peachy-6353784.txt
corpora/ldcc/text/peachy/peachy-4289213.txt
corpora/ldcc/text/peachy/peachy-6684289.txt
corpora/ldcc/text/peachy/peachy-6907988.txt
corpora/ldcc/text/peachy/peachy-6316746.txt
corpora/ldcc/text/peachy/peachy-4349529.txt
corpora/ldcc/text/peachy/peachy-5367555.txt
corpora/ldcc/text/peachy/peachy-6862939.txt
corpora/ldcc/text/peachy/peachy-5102663.txt
corpora/ldcc/text/peachy/peachy-5861811.txt
corpora/ldcc/text/peachy/peachy-4587744.txt
corpora/ldcc/text/peachy/peachy-4304732.txt
corpora/ldcc/text/peachy/peachy-4663741.txt
corpora/ldcc/text/peachy/peachy-5830486.txt
corpora/ldcc/text/peachy/peachy-5209713.txt
corpora/ldcc/text/peachy/peachy-6835551.txt
corpora/ldcc/text/peachy/peachy-

corpora/ldcc/text/peachy/peachy-5240023.txt
corpora/ldcc/text/peachy/peachy-6278406.txt
corpora/ldcc/text/peachy/peachy-6137222.txt
corpora/ldcc/text/peachy/peachy-4449009.txt
corpora/ldcc/text/peachy/peachy-5991492.txt
corpora/ldcc/text/peachy/peachy-6639781.txt
corpora/ldcc/text/peachy/peachy-6037684.txt
corpora/ldcc/text/peachy/peachy-5693553.txt
corpora/ldcc/text/peachy/peachy-6617118.txt
corpora/ldcc/text/peachy/peachy-5023610.txt
corpora/ldcc/text/peachy/peachy-6018386.txt
corpora/ldcc/text/peachy/peachy-5183028.txt
corpora/ldcc/text/peachy/peachy-6584106.txt
corpora/ldcc/text/peachy/peachy-4506355.txt
corpora/ldcc/text/peachy/peachy-5164920.txt
corpora/ldcc/text/peachy/peachy-4804202.txt
corpora/ldcc/text/peachy/peachy-6436065.txt
corpora/ldcc/text/peachy/peachy-4997019.txt
corpora/ldcc/text/peachy/peachy-4508336.txt
corpora/ldcc/text/peachy/peachy-4549159.txt
corpora/ldcc/text/peachy/peachy-4791639.txt
corpora/ldcc/text/peachy/peachy-4576896.txt
corpora/ldcc/text/peachy/peachy-

corpora/ldcc/text/peachy/peachy-5029469.txt
corpora/ldcc/text/peachy/peachy-6817439.txt
corpora/ldcc/text/peachy/peachy-4487833.txt
corpora/ldcc/text/peachy/peachy-4574211.txt
corpora/ldcc/text/peachy/peachy-4734621.txt
corpora/ldcc/text/peachy/peachy-4739743.txt
corpora/ldcc/text/peachy/peachy-5869154.txt
corpora/ldcc/text/peachy/peachy-4708158.txt
corpora/ldcc/text/peachy/peachy-6617370.txt
corpora/ldcc/text/peachy/peachy-5210836.txt
corpora/ldcc/text/peachy/peachy-4424988.txt
corpora/ldcc/text/peachy/peachy-4475179.txt
corpora/ldcc/text/peachy/peachy-6139935.txt
corpora/ldcc/text/peachy/peachy-5179422.txt
corpora/ldcc/text/peachy/peachy-5127186.txt
corpora/ldcc/text/peachy/peachy-5112112.txt
corpora/ldcc/text/peachy/peachy-5174805.txt
corpora/ldcc/text/peachy/peachy-5580895.txt
corpora/ldcc/text/peachy/peachy-5897718.txt
corpora/ldcc/text/peachy/peachy-5152642.txt
corpora/ldcc/text/peachy/peachy-6706986.txt
corpora/ldcc/text/peachy/peachy-5394397.txt
corpora/ldcc/text/peachy/peachy-

corpora/ldcc/text/livedoor-homme/livedoor-homme-5583393.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5449440.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-4692769.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5661913.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5867768.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5579533.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-6052151.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5273364.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5551409.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5769429.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5695218.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5825813.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-4957817.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5825794.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-6037115.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-6754532.txt
corpora/ldcc/text/livedoor-homme/livedoo

corpora/ldcc/text/livedoor-homme/livedoor-homme-5154016.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5087488.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-6129688.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-4712686.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5793526.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-4724334.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5625129.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5668746.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5695247.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5625270.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-6429261.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5769293.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5579535.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5819490.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-6067593.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-6444745.txt
corpora/ldcc/text/livedoor-homme/livedoo

corpora/ldcc/text/livedoor-homme/livedoor-homme-5067754.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5625139.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-4764816.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5736739.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-4990334.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5271597.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5229491.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5500262.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5648658.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5769412.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5736780.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-6690674.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-4632362.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5825788.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5855533.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5526749.txt
corpora/ldcc/text/livedoor-homme/livedoo

corpora/ldcc/text/livedoor-homme/livedoor-homme-5608627.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-4633404.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5868630.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-6283103.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-5811211.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-4786906.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-4583815.txt
corpora/ldcc/text/livedoor-homme/livedoor-homme-4743946.txt
corpora/ldcc/text/sports-watch/sports-watch-6251539.txt
corpora/ldcc/text/sports-watch/sports-watch-5447252.txt
corpora/ldcc/text/sports-watch/sports-watch-6634416.txt
corpora/ldcc/text/sports-watch/sports-watch-5487833.txt
corpora/ldcc/text/sports-watch/sports-watch-4797747.txt
corpora/ldcc/text/sports-watch/sports-watch-5804651.txt
corpora/ldcc/text/sports-watch/sports-watch-5664135.txt
corpora/ldcc/text/sports-watch/sports-watch-6786749.txt
corpora/ldcc/text/sports-watch/sports-watch-5587958.txt
corpora/ldcc/tex

corpora/ldcc/text/sports-watch/sports-watch-5612936.txt
corpora/ldcc/text/sports-watch/sports-watch-4597641.txt
corpora/ldcc/text/sports-watch/sports-watch-6182892.txt
corpora/ldcc/text/sports-watch/sports-watch-4767756.txt
corpora/ldcc/text/sports-watch/sports-watch-6889336.txt
corpora/ldcc/text/sports-watch/sports-watch-4793139.txt
corpora/ldcc/text/sports-watch/sports-watch-6864595.txt
corpora/ldcc/text/sports-watch/sports-watch-5059771.txt
corpora/ldcc/text/sports-watch/sports-watch-5726153.txt
corpora/ldcc/text/sports-watch/sports-watch-6337766.txt
corpora/ldcc/text/sports-watch/sports-watch-4919909.txt
corpora/ldcc/text/sports-watch/sports-watch-6735064.txt
corpora/ldcc/text/sports-watch/sports-watch-4875349.txt
corpora/ldcc/text/sports-watch/sports-watch-4632686.txt
corpora/ldcc/text/sports-watch/sports-watch-6308686.txt
corpora/ldcc/text/sports-watch/sports-watch-5624306.txt
corpora/ldcc/text/sports-watch/sports-watch-6247690.txt
corpora/ldcc/text/sports-watch/sports-watch-6216

corpora/ldcc/text/sports-watch/sports-watch-6015458.txt
corpora/ldcc/text/sports-watch/sports-watch-6074004.txt
corpora/ldcc/text/sports-watch/sports-watch-5316272.txt
corpora/ldcc/text/sports-watch/sports-watch-6311569.txt
corpora/ldcc/text/sports-watch/sports-watch-6319302.txt
corpora/ldcc/text/sports-watch/sports-watch-5190953.txt
corpora/ldcc/text/sports-watch/sports-watch-6585955.txt
corpora/ldcc/text/sports-watch/sports-watch-4891140.txt
corpora/ldcc/text/sports-watch/sports-watch-5961236.txt
corpora/ldcc/text/sports-watch/sports-watch-5004255.txt
corpora/ldcc/text/sports-watch/sports-watch-5785824.txt
corpora/ldcc/text/sports-watch/sports-watch-4685847.txt
corpora/ldcc/text/sports-watch/sports-watch-6184005.txt
corpora/ldcc/text/sports-watch/sports-watch-5941949.txt
corpora/ldcc/text/sports-watch/sports-watch-6670074.txt
corpora/ldcc/text/sports-watch/sports-watch-6163537.txt
corpora/ldcc/text/sports-watch/sports-watch-4609913.txt
corpora/ldcc/text/sports-watch/sports-watch-6219

corpora/ldcc/text/sports-watch/sports-watch-6284342.txt
corpora/ldcc/text/sports-watch/sports-watch-6690384.txt
corpora/ldcc/text/sports-watch/sports-watch-5427601.txt
corpora/ldcc/text/sports-watch/sports-watch-6413259.txt
corpora/ldcc/text/sports-watch/sports-watch-6163542.txt
corpora/ldcc/text/sports-watch/sports-watch-6368774.txt
corpora/ldcc/text/sports-watch/sports-watch-5122501.txt
corpora/ldcc/text/sports-watch/sports-watch-5298168.txt
corpora/ldcc/text/sports-watch/sports-watch-6254094.txt
corpora/ldcc/text/sports-watch/sports-watch-5172424.txt
corpora/ldcc/text/sports-watch/sports-watch-5183439.txt
corpora/ldcc/text/sports-watch/sports-watch-5906273.txt
corpora/ldcc/text/sports-watch/sports-watch-4905051.txt
corpora/ldcc/text/sports-watch/sports-watch-6288103.txt
corpora/ldcc/text/sports-watch/sports-watch-6322961.txt
corpora/ldcc/text/sports-watch/sports-watch-6340974.txt
corpora/ldcc/text/sports-watch/sports-watch-5701294.txt
corpora/ldcc/text/sports-watch/sports-watch-6870

corpora/ldcc/text/sports-watch/sports-watch-6796731.txt
corpora/ldcc/text/sports-watch/sports-watch-6738246.txt
corpora/ldcc/text/sports-watch/sports-watch-5219218.txt
corpora/ldcc/text/sports-watch/sports-watch-4822710.txt
corpora/ldcc/text/sports-watch/sports-watch-5341682.txt
corpora/ldcc/text/sports-watch/sports-watch-4974670.txt
corpora/ldcc/text/sports-watch/sports-watch-5530693.txt
corpora/ldcc/text/sports-watch/sports-watch-5682842.txt
corpora/ldcc/text/sports-watch/sports-watch-6073844.txt
corpora/ldcc/text/sports-watch/sports-watch-6258689.txt
corpora/ldcc/text/sports-watch/sports-watch-5278201.txt
corpora/ldcc/text/sports-watch/sports-watch-6388264.txt
corpora/ldcc/text/sports-watch/sports-watch-5108897.txt
corpora/ldcc/text/sports-watch/sports-watch-4684770.txt
corpora/ldcc/text/sports-watch/sports-watch-4623588.txt
corpora/ldcc/text/sports-watch/sports-watch-6316239.txt
corpora/ldcc/text/sports-watch/sports-watch-6134071.txt
corpora/ldcc/text/sports-watch/sports-watch-6129

corpora/ldcc/text/topic-news/topic-news-5937661.txt
corpora/ldcc/text/topic-news/topic-news-6254513.txt
corpora/ldcc/text/topic-news/topic-news-6166579.txt
corpora/ldcc/text/topic-news/topic-news-6005118.txt
corpora/ldcc/text/topic-news/topic-news-6068405.txt
corpora/ldcc/text/topic-news/topic-news-5971498.txt
corpora/ldcc/text/topic-news/topic-news-6389146.txt
corpora/ldcc/text/topic-news/topic-news-6005181.txt
corpora/ldcc/text/topic-news/topic-news-5957294.txt
corpora/ldcc/text/topic-news/topic-news-6408565.txt
corpora/ldcc/text/topic-news/topic-news-5970757.txt
corpora/ldcc/text/topic-news/topic-news-6323205.txt
corpora/ldcc/text/topic-news/topic-news-6388035.txt
corpora/ldcc/text/topic-news/topic-news-6893593.txt
corpora/ldcc/text/topic-news/topic-news-6220910.txt
corpora/ldcc/text/topic-news/topic-news-6139787.txt
corpora/ldcc/text/topic-news/topic-news-6816142.txt
corpora/ldcc/text/topic-news/topic-news-6696248.txt
corpora/ldcc/text/topic-news/topic-news-6771364.txt
corpora/ldcc

corpora/ldcc/text/topic-news/topic-news-6226414.txt
corpora/ldcc/text/topic-news/topic-news-6723331.txt
corpora/ldcc/text/topic-news/topic-news-6627135.txt
corpora/ldcc/text/topic-news/topic-news-6258595.txt
corpora/ldcc/text/topic-news/topic-news-6111624.txt
corpora/ldcc/text/topic-news/topic-news-6247958.txt
corpora/ldcc/text/topic-news/topic-news-6131019.txt
corpora/ldcc/text/topic-news/topic-news-5927208.txt
corpora/ldcc/text/topic-news/topic-news-6675711.txt
corpora/ldcc/text/topic-news/topic-news-6502508.txt
corpora/ldcc/text/topic-news/topic-news-6383943.txt
corpora/ldcc/text/topic-news/topic-news-6361144.txt
corpora/ldcc/text/topic-news/topic-news-6852546.txt
corpora/ldcc/text/topic-news/topic-news-6851917.txt
corpora/ldcc/text/topic-news/topic-news-6765028.txt
corpora/ldcc/text/topic-news/topic-news-6407583.txt
corpora/ldcc/text/topic-news/topic-news-5972318.txt
corpora/ldcc/text/topic-news/topic-news-6361892.txt
corpora/ldcc/text/topic-news/topic-news-6387903.txt
corpora/ldcc

corpora/ldcc/text/topic-news/topic-news-6084702.txt
corpora/ldcc/text/topic-news/topic-news-5952704.txt
corpora/ldcc/text/topic-news/topic-news-5956380.txt
corpora/ldcc/text/topic-news/topic-news-6886897.txt
corpora/ldcc/text/topic-news/topic-news-6058390.txt
corpora/ldcc/text/topic-news/topic-news-6790797.txt
corpora/ldcc/text/topic-news/topic-news-6166177.txt
corpora/ldcc/text/topic-news/topic-news-6428701.txt
corpora/ldcc/text/topic-news/topic-news-5953636.txt
corpora/ldcc/text/topic-news/topic-news-6241641.txt
corpora/ldcc/text/topic-news/topic-news-5933217.txt
corpora/ldcc/text/topic-news/topic-news-5966046.txt
corpora/ldcc/text/topic-news/topic-news-6415438.txt
corpora/ldcc/text/topic-news/topic-news-6466384.txt
corpora/ldcc/text/topic-news/topic-news-6835304.txt
corpora/ldcc/text/topic-news/topic-news-6391150.txt
corpora/ldcc/text/topic-news/topic-news-5955817.txt
corpora/ldcc/text/topic-news/topic-news-5903373.txt
corpora/ldcc/text/topic-news/topic-news-6655079.txt
corpora/ldcc

corpora/ldcc/text/smax/smax-6779958.txt
corpora/ldcc/text/smax/smax-6599022.txt
corpora/ldcc/text/smax/smax-6878858.txt
corpora/ldcc/text/smax/smax-6868134.txt
corpora/ldcc/text/smax/smax-6601757.txt
corpora/ldcc/text/smax/smax-6701937.txt
corpora/ldcc/text/smax/smax-6771999.txt
corpora/ldcc/text/smax/smax-6771946.txt
corpora/ldcc/text/smax/smax-6896251.txt
corpora/ldcc/text/smax/smax-6865709.txt
corpora/ldcc/text/smax/smax-6729990.txt
corpora/ldcc/text/smax/smax-6688437.txt
corpora/ldcc/text/smax/smax-6759825.txt
corpora/ldcc/text/smax/smax-6660824.txt
corpora/ldcc/text/smax/smax-6903954.txt
corpora/ldcc/text/smax/smax-6726025.txt
corpora/ldcc/text/smax/smax-6711448.txt
corpora/ldcc/text/smax/smax-6617241.txt
corpora/ldcc/text/smax/smax-6615161.txt
corpora/ldcc/text/smax/smax-6788213.txt
corpora/ldcc/text/smax/smax-6573680.txt
corpora/ldcc/text/smax/smax-6911073.txt
corpora/ldcc/text/smax/smax-6563723.txt
corpora/ldcc/text/smax/smax-6828924.txt
corpora/ldcc/text/smax/smax-6590044.txt


corpora/ldcc/text/smax/smax-6792560.txt
corpora/ldcc/text/smax/smax-6689892.txt
corpora/ldcc/text/smax/smax-6730987.txt
corpora/ldcc/text/smax/smax-6805345.txt
corpora/ldcc/text/smax/smax-6879979.txt
corpora/ldcc/text/smax/smax-6514064.txt
corpora/ldcc/text/smax/smax-6578713.txt
corpora/ldcc/text/smax/smax-6793832.txt
corpora/ldcc/text/smax/smax-6564609.txt
corpora/ldcc/text/smax/smax-6631888.txt
corpora/ldcc/text/smax/smax-6600570.txt
corpora/ldcc/text/smax/smax-6605698.txt
corpora/ldcc/text/smax/smax-6713879.txt
corpora/ldcc/text/smax/smax-6636743.txt
corpora/ldcc/text/smax/smax-6563636.txt
corpora/ldcc/text/smax/smax-6910653.txt
corpora/ldcc/text/smax/smax-6883152.txt
corpora/ldcc/text/smax/smax-6899352.txt
corpora/ldcc/text/smax/smax-6764325.txt
corpora/ldcc/text/smax/smax-6643040.txt
corpora/ldcc/text/smax/smax-6690718.txt
corpora/ldcc/text/smax/smax-6799194.txt
corpora/ldcc/text/smax/smax-6849018.txt
corpora/ldcc/text/smax/smax-6531999.txt
corpora/ldcc/text/smax/smax-6527989.txt


corpora/ldcc/text/smax/smax-6764641.txt
corpora/ldcc/text/smax/smax-6838627.txt
corpora/ldcc/text/smax/smax-6604610.txt
corpora/ldcc/text/smax/smax-6867630.txt
corpora/ldcc/text/smax/smax-6679825.txt
corpora/ldcc/text/smax/smax-6754050.txt
corpora/ldcc/text/smax/smax-6853346.txt
corpora/ldcc/text/smax/smax-6678600.txt
corpora/ldcc/text/smax/smax-6895018.txt
corpora/ldcc/text/smax/smax-6851590.txt
corpora/ldcc/text/smax/smax-6756923.txt
corpora/ldcc/text/smax/smax-6702682.txt
corpora/ldcc/text/smax/smax-6687337.txt
corpora/ldcc/text/smax/smax-6916709.txt
corpora/ldcc/text/smax/smax-6775399.txt
corpora/ldcc/text/smax/smax-6561866.txt
corpora/ldcc/text/smax/smax-6833065.txt
corpora/ldcc/text/smax/smax-6751259.txt
corpora/ldcc/text/smax/smax-6574641.txt
corpora/ldcc/text/smax/smax-6678539.txt
corpora/ldcc/text/smax/smax-6612334.txt
corpora/ldcc/text/smax/smax-6582940.txt
corpora/ldcc/text/smax/smax-6766483.txt
corpora/ldcc/text/smax/smax-6903745.txt
corpora/ldcc/text/smax/smax-6896678.txt


corpora/ldcc/text/smax/smax-6842219.txt
corpora/ldcc/text/smax/smax-6847081.txt
corpora/ldcc/text/smax/smax-6713273.txt
corpora/ldcc/text/smax/smax-6659937.txt
corpora/ldcc/text/smax/smax-6594621.txt
corpora/ldcc/text/smax/smax-6703721.txt
corpora/ldcc/text/smax/smax-6663887.txt
corpora/ldcc/text/smax/smax-6608153.txt
corpora/ldcc/text/smax/smax-6600767.txt
corpora/ldcc/text/smax/smax-6805090.txt
corpora/ldcc/text/smax/smax-6635980.txt
corpora/ldcc/text/smax/smax-6642148.txt
corpora/ldcc/text/smax/smax-6514980.txt
corpora/ldcc/text/smax/smax-6704949.txt
corpora/ldcc/text/smax/smax-6811453.txt
corpora/ldcc/text/smax/smax-6723552.txt
corpora/ldcc/text/smax/smax-6887070.txt
corpora/ldcc/text/smax/smax-6673414.txt
corpora/ldcc/text/smax/smax-6638832.txt
corpora/ldcc/text/smax/smax-6669158.txt
corpora/ldcc/text/smax/smax-6741178.txt
corpora/ldcc/text/smax/smax-6741327.txt
corpora/ldcc/text/smax/smax-6773922.txt
corpora/ldcc/text/smax/smax-6582464.txt
corpora/ldcc/text/smax/smax-6709649.txt


corpora/ldcc/text/kaden-channel/kaden-channel-6043311.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6388560.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6070642.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6290777.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6066216.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6380155.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6229298.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6192668.txt
corpora/ldcc/text/kaden-channel/kaden-channel-5909995.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6110788.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6265107.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6881294.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6781769.txt
corpora/ldcc/text/kaden-channel/kaden-channel-5999626.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6162712.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6084434.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6764372.txt
corpora/ldcc/t

corpora/ldcc/text/kaden-channel/kaden-channel-6492040.txt
corpora/ldcc/text/kaden-channel/kaden-channel-5841998.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6236826.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6064566.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6604422.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6771753.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6551711.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6574462.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6341520.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6104664.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6127537.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6322405.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6250482.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6138160.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6738506.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6057359.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6011895.txt
corpora/ldcc/t

corpora/ldcc/text/kaden-channel/kaden-channel-6140878.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6429175.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6900638.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6277507.txt
corpora/ldcc/text/kaden-channel/kaden-channel-5888421.txt
corpora/ldcc/text/kaden-channel/kaden-channel-5987733.txt
corpora/ldcc/text/kaden-channel/kaden-channel-5988358.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6828296.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6294341.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6400675.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6378408.txt
corpora/ldcc/text/kaden-channel/kaden-channel-5828555.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6408146.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6352745.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6863206.txt
corpora/ldcc/text/kaden-channel/kaden-channel-5831377.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6100740.txt
corpora/ldcc/t

corpora/ldcc/text/kaden-channel/kaden-channel-6866528.txt
corpora/ldcc/text/kaden-channel/kaden-channel-5800735.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6323807.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6132753.txt
corpora/ldcc/text/kaden-channel/kaden-channel-5943674.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6133546.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6305029.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6831865.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6761367.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6058391.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6033369.txt
corpora/ldcc/text/kaden-channel/kaden-channel-5898406.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6514387.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6022651.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6278958.txt
corpora/ldcc/text/kaden-channel/kaden-channel-6013713.txt
corpora/ldcc/text/kaden-channel/kaden-channel-5962823.txt
corpora/ldcc/t

corpora/ldcc/text/it-life-hack/it-life-hack-6394785.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6842610.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6720225.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6649789.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6476833.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6788402.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6604951.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6738743.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6850254.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6811434.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6547890.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6638851.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6872445.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6726258.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6712884.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6775183.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6387252.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6595

corpora/ldcc/text/it-life-hack/it-life-hack-6486438.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6918825.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6360590.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6730224.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6545602.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6412671.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6901131.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6706704.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6889045.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6444665.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6744400.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6410919.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6643841.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6350168.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6747353.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6544189.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6860823.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6691

corpora/ldcc/text/it-life-hack/it-life-hack-6718931.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6366615.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6534466.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6857586.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6836023.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6363033.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6581954.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6848427.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6530273.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6337707.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6367580.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6649062.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6910399.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6296655.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6889884.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6506494.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6348536.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6877

corpora/ldcc/text/it-life-hack/it-life-hack-6471693.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6577408.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6559330.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6693797.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6863633.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6433376.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6785191.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6449181.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6627907.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6590061.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6856058.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6624672.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6306079.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6695643.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6654103.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6657851.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6739111.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6846

corpora/ldcc/text/it-life-hack/it-life-hack-6634270.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6450329.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6679843.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6703212.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6344928.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6299640.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6716717.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6697341.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6472336.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6587522.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6842412.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6465519.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6460281.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6820932.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6294574.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6429203.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6766499.txt
corpora/ldcc/text/it-life-hack/it-life-hack-6360

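## Splitting the data into training and test sets

Split the corpus into a training set and a test set at a ratio of roughly 9:1. Each line is assigned at random, so the exact ratio varies slightly from run to run.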

In [3]:
# rand_split.py
import numpy as np
import os
from os.path import join

def append_newline(filename, line):
    # Append one line to filename; the line already ends with a newline.
    with open(filename, 'a', encoding="utf-8") as file:
        file.write(line)

def split_random(filename):
    # Randomly assign each line of filename to <name>_train<ext> (p=0.9)
    # or <name>_test<ext> (p=0.1), written next to the input file.
    dirname = os.path.dirname(filename)
    stem, ext = os.path.splitext(os.path.basename(filename))
    train_name = '{}_train{}'.format(stem, ext)
    test_name = '{}_test{}'.format(stem, ext)
    with open(filename, 'r', encoding="utf-8") as source:
        for l in source:
            choice = np.random.choice([True, False], p=[0.9, 0.1])
            if choice: # write train
                append_newline(join(dirname, train_name), l)
            else: # write test
                append_newline(join(dirname, test_name), l)

In [4]:
# Run: split ldcc.txt into ldcc_train.txt and ldcc_test.txt
!rm -f data/ldcc_{train,test}.txt
split_random('data/ldcc.txt')
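
Because the split is random, it can be worth checking how many examples of each label ended up in each file before training. The following is a minimal sketch, assuming the `__label__N , text` line format produced in the preparation step; `label_counts` is a hypothetical helper name, not part of the original scripts.

```python
import os
from collections import Counter

def label_counts(path):
    """Count examples per label; the label is the first whitespace-separated token."""
    counts = Counter()
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            tokens = line.split()
            if tokens:  # skip empty lines
                counts[tokens[0]] += 1
    return counts

# Report totals and per-label counts for whichever split files exist.
for path in ('data/ldcc_train.txt', 'data/ldcc_test.txt'):
    if os.path.exists(path):
        counts = label_counts(path)
        print(path, sum(counts.values()), sorted(counts.items()))
```

If one label is nearly absent from the test file, the test score for that label says little, and re-running the split may be preferable.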

## Training fastText

With the data prepared, start training fastText.

In [5]:
!../fastText/fasttext supervised -input data/ldcc_train.txt -output data/ldcc_fasttext_supervised \
-dim 10 -lr 0.1 -wordNgrams 2 -minCount 1 -bucket 10000000 -epoch 100 -thread 4

Read 8M words
Number of words:  75596
Number of labels: 9


Progress: 5.5%  words/sec/thread: 2456208  lr: 0.094481  loss: 1.116836  eta: 0h1m
Progress: 9.6%  words/sec/thread: 2431367  lr: 0.090374  loss: 0.776644  eta: 0h1m
Progress: 14.6%  words/sec/thread: 2426517  lr: 0.085381  loss: 0.583508  eta: 0h1m
Progress: 18.1%  words/sec/thread: 2425365  lr: 0.081876  loss: 0.496366  eta: 0h1m
Progress: 22.0%  words/sec/thread: 2426419  lr: 0.077994  loss: 0.420780  eta: 0h1m
Progress: 26.2%  words/sec/thread: 2408144  lr: 0.073790  loss: 0.358819  eta: 0h1m
Progress: 30.8%  words/sec/thread: 2401801  lr: 0.069203  loss: 0.309108  eta: 0h1m
Progress: 35.3%  words/sec/thread: 2394046  lr: 0.064690  loss: 0.271658  eta: 0h1m
Progress: 40.7%  words/sec/thread: 2390904  lr: 0.059315  loss: 0.238336  eta: 0h0m
Progress: 46.4%  words/sec/thread: 2392024  lr: 0.053645  loss: 0.213351  eta: 0h0m
Progress: 52.7%  words/sec/thread: 2393094  lr: 0.047347  loss: 0.190330  eta: 0h0m
Progress: 58.5%  words/sec/thread: 2387594  lr: 0.041484  loss: 0.173867  eta: 0h0m
Progress: 65.2%  words/sec/thread: 2385331  lr: 0.034828  loss: 0.159067  eta: 0h0m
Progress: 72.4%  words/sec/thread: 2389883  lr: 0.027599  loss: 0.145942  eta: 0h0m
Progress: 79.6%  words/sec/thread: 2394935  lr: 0.020434  loss: 0.135386  eta: 0h0m
Progress: 87.1%  words/sec/thread: 2391598  lr: 0.012869  loss: 0.126048  eta: 0h0m
Progress: 94.1%  words/sec/thread: 2395268  lr: 0.005935  loss: 0.119310  eta: 0h0m
Progress: 100.0%  words/sec/thread: 2398492  lr: 0.000000  loss: 0.114192  eta: 0h0m
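
The `lr` column in the progress output shrinks as training advances: fastText decays the learning rate linearly from the initial `-lr` value down to zero over the run. The logged values can therefore be reproduced approximately from the progress percentage. A minimal sketch follows; `decayed_lr` is an illustrative name, not a fastText function.

```python
def decayed_lr(initial_lr, progress):
    """Linearly decayed learning rate; progress is a fraction in [0, 1]."""
    return initial_lr * (1.0 - progress)

# Progress fractions read off the training log, with -lr 0.1.
for progress in (0.055, 0.464, 0.941, 1.0):
    print('{:5.1f}%  lr: {:.6f}'.format(progress * 100, decayed_lr(0.1, progress)))
```

The computed values differ from the logged ones in the later decimal places because the log rounds the progress percentage to one decimal.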

## Testing fastText

Evaluate the trained model on the test set.

In [6]:
!../fastText/fasttext test data/ldcc_fasttext_supervised.bin data/ldcc_test.txt

N	1484
P@1	0.992
R@1	0.992
Number of examples: 1484
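
`P@1` and `R@1` are precision and recall when one label is predicted per example. Every article here carries exactly one label, so the two values coincide and equal plain accuracy. A minimal sketch of the calculation; `precision_recall_at_1` and the toy labels are illustrative and not part of fastText.

```python
def precision_recall_at_1(gold_labels, predicted_labels):
    """gold_labels: one set of true labels per example.
    predicted_labels: the single top-ranked label per example."""
    correct = sum(1 for gold, pred in zip(gold_labels, predicted_labels)
                  if pred in gold)
    precision = correct / len(predicted_labels)            # correct / predictions made
    recall = correct / sum(len(g) for g in gold_labels)    # correct / true labels
    return precision, recall

# Toy data: four single-label examples, three predicted correctly.
gold = [{'__label__0'}, {'__label__3'}, {'__label__7'}, {'__label__7'}]
pred = ['__label__0', '__label__3', '__label__7', '__label__2']
print(precision_recall_at_1(gold, pred))
```

With multi-label data the two numbers diverge, because recall is divided by the total number of true labels rather than by the number of predictions.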


The model classifies the test set with 99.2% precision, a surprisingly (?) strong result.