Unable to get the result similar to the paper #2

xiaodaxia · 2018-09-26T09:39:22Z

I am not sure whether I am missing something, but as I understand it, the default settings in stm.java should give a macro-F1 score close to 0.952 for politics-religion classification, as reported in Table 4 of the paper "Effective Document Labeling with Very Few Seed Words".
However, after 50 iterations I can only get 0.8438.
I used all the default settings.
Below is part of my running log. Thanks for your help!

5038 have been indexed...
5040 have been indexed...
5042 have been indexed...
5044 have been indexed...
5046 have been indexed...
5048 have been indexed...
loading documents...
calculate co-occurrence...
start to predict...
iter: 0
f1: 0.5363255440310482
cost time 834ms
iter: 1
f1: 0.7122772025090711
cost time 635ms
iter: 2
f1: 0.7657145896016797
cost time 635ms
iter: 3
f1: 0.7998512521261621
cost time 621ms
iter: 4
f1: 0.814890872076508
cost time 708ms
iter: 5
f1: 0.8326626957441149
cost time 680ms
iter: 6
f1: 0.8262305084412165
cost time 612ms
iter: 7
f1: 0.8223032270759543
cost time 618ms
iter: 8
f1: 0.8282995798893804
cost time 651ms
iter: 9
f1: 0.8337883053625725
cost time 696ms
iter: 10
f1: 0.8333308752606785
cost time 674ms
iter: 11
f1: 0.8338216449501352
cost time 626ms
iter: 12
f1: 0.8268768778734245
cost time 620ms
iter: 13
f1: 0.8288385453402936
cost time 620ms
iter: 14
f1: 0.8263970819568979
cost time 620ms
iter: 15
f1: 0.8218679885271558
cost time 627ms
iter: 16
f1: 0.8333608587943848
cost time 612ms
iter: 17
f1: 0.8329123378577755
cost time 610ms
iter: 18
f1: 0.837828990618261
cost time 621ms
iter: 19
f1: 0.8343527584444181
cost time 614ms
iter: 20
f1: 0.834333215353154
cost time 610ms
iter: 21
f1: 0.8418369014143334
cost time 643ms
iter: 22
f1: 0.8378914584256099
cost time 631ms
iter: 23
f1: 0.8324480210666981
cost time 608ms
iter: 24
f1: 0.8304651574106827
cost time 610ms
iter: 25
f1: 0.833925500221955
cost time 609ms
iter: 26
f1: 0.8408377750635577
cost time 616ms
iter: 27
f1: 0.8353951444250514
cost time 625ms
iter: 28
f1: 0.8319127904876824
cost time 641ms
iter: 29
f1: 0.8384024870376721
cost time 623ms
iter: 30
f1: 0.8403959423280034
cost time 623ms
iter: 31
f1: 0.8398977214530726
cost time 624ms
iter: 32
f1: 0.8403669274006524
cost time 650ms
iter: 33
f1: 0.8398807311481298
cost time 636ms
iter: 34
f1: 0.8458103446454425
cost time 657ms
iter: 35
f1: 0.8428717575870712
cost time 616ms
iter: 36
f1: 0.8443508617479418
cost time 627ms
iter: 37
f1: 0.8384132832264879
cost time 609ms
iter: 38
f1: 0.8364137483787288
cost time 615ms
iter: 39
f1: 0.8423834019743461
cost time 631ms
iter: 40
f1: 0.8419055753812179
cost time 616ms
iter: 41
f1: 0.8418636323103821
cost time 626ms
iter: 42
f1: 0.8373855268834323
cost time 622ms
iter: 43
f1: 0.8373708237305959
cost time 617ms
iter: 44
f1: 0.8399027549250704
cost time 623ms
iter: 45
f1: 0.8373629916183107
cost time 658ms
iter: 46
f1: 0.8394046127772457
cost time 612ms
iter: 47
f1: 0.843880895911019
cost time 613ms
iter: 48
f1: 0.8468645220990465
cost time 631ms
iter: 49
f1: 0.8438726134924301
cost time 607ms
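
For comparing the logged f1 values with the macro-F1 in Table 4, the usual definition of macro-F1 (per-class F1, then an unweighted mean over classes) can be sketched as below. This is only an illustration: the class name `MacroF1Sketch` and its methods are hypothetical, and it is an assumption, not something confirmed in this thread, that stm.java computes its f1 the same way.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the repository discussed in this issue.
class MacroF1Sketch {

    /** F1 for a single class, treating `label` as the positive class. */
    static double f1ForLabel(int[] gold, int[] predicted, int label) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < gold.length; i++) {
            if (predicted[i] == label && gold[i] == label) {
                tp++;
            } else if (predicted[i] == label) {
                fp++;
            } else if (gold[i] == label) {
                fn++;
            }
        }
        // F1 = 2*TP / (2*TP + FP + FN); defined as 0 here when the denominator is 0.
        int denominator = 2 * tp + fp + fn;
        return denominator == 0 ? 0.0 : (2.0 * tp) / denominator;
    }

    /** Unweighted mean of the per-class F1 scores over the labels seen in `gold`. */
    static double macroF1(int[] gold, int[] predicted) {
        if (gold.length != predicted.length) {
            throw new IllegalArgumentException("gold and predicted must have the same length");
        }
        Set<Integer> labels = new TreeSet<>();
        for (int g : gold) {
            labels.add(g);
        }
        if (labels.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (int label : labels) {
            sum += f1ForLabel(gold, predicted, label);
        }
        return sum / labels.size();
    }

    public static void main(String[] args) {
        int[] gold = {0, 0, 1, 1};
        int[] predicted = {0, 1, 1, 1};
        System.out.println("macro-F1: " + macroF1(gold, predicted));
    }
}
```

If the reported score were micro-F1 or plain accuracy instead, the numbers would not be directly comparable with the paper, so it is worth checking which variant the code prints.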

xiaodaxia · 2018-09-26T09:48:33Z

I am not sure whether the difference comes from the corpus preprocessing. The paper says:

"When parsing the documents, we keep the text contained in the “Subject”, “Keywords”, and “Content” fields. The information in the other fields and email addresses are filtered out."

I will check it out.
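
The field filtering described in the paper's quote could look roughly like the sketch below. It is a guess at the preprocessing, not the authors' code. It assumes each raw post has "Name: value" header lines, then a blank line, then the body, and it treats the body as the "Content" field. The class name `NewsgroupFilterSketch`, the method name, and the email regex are all hypothetical. Folded (multi-line) headers are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical preprocessing sketch, not taken from the repository or the paper.
class NewsgroupFilterSketch {

    // Simple approximation of an email address; an assumption, not a full RFC 5322 matcher.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    /** Keeps the Subject and Keywords header values plus the body, then removes email addresses. */
    static String keepSubjectKeywordsContent(String rawPost) {
        String[] lines = rawPost.split("\\r?\\n", -1);
        List<String> kept = new ArrayList<>();
        boolean inBody = false;
        for (String line : lines) {
            if (inBody) {
                kept.add(line);                       // body ("Content") is kept as-is
            } else if (line.isEmpty()) {
                inBody = true;                        // first blank line ends the header block
            } else if (line.startsWith("Subject:")) {
                kept.add(line.substring("Subject:".length()).trim());
            } else if (line.startsWith("Keywords:")) {
                kept.add(line.substring("Keywords:".length()).trim());
            }
            // every other header field (From, Lines, Organization, ...) is dropped
        }
        String text = String.join("\n", kept);
        return EMAIL.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String raw = "From: a@b.com\nSubject: Re: faith\nKeywords: church\nLines: 2\n\n"
                + "Hello bob@x.org there\nbye";
        System.out.println(keepSubjectKeywordsContent(raw));
    }
}
```

Running both the filtered and the unfiltered corpus through the same default settings would show whether this step accounts for part of the gap.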

xiaodaxia closed this as completed Sep 27, 2018
