detector

Dataset

We use Reddit, Twitter, and Wikipedia.

The KB-prior-LDA model uses the following settings. (1) There are 100 topics, 47 of which are carefully refined topics. TODO: we should also include topics such as ice hockey, boxing, babysitting, and astrology. (2) We run 200 iterations; after 20 burn-in iterations, we optimize the prior every 8 iterations. (3) We set the KB prior weights to 6.4. (4) For words covered by the refined topics, we initialize the topic weights to tau (the word vector defined by the refined topics); for all other, out-of-vocabulary (OOV) words, we initialize them to [0, .., 0, 1/(100-47), .., 1/(100-47)].
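Settings (2) and (4) can be sketched in a few lines of Python. This is only an illustration, not the code in this repository: the function names are hypothetical, the sketch assumes the 47 refined topics occupy the first 47 topic indices, and it assumes the first prior optimization happens 8 iterations after burn-in ends.

```python
import numpy as np

NUM_TOPICS = 100   # total number of topics, setting (1)
NUM_REFINED = 47   # number of refined topics, setting (1)


def oov_topic_weights(num_topics=NUM_TOPICS, num_refined=NUM_REFINED):
    """Initial topic weights for an OOV word (hypothetical helper).

    Assumption: refined topics come first, so the OOV word gets zero
    weight on them and uniform weight on the remaining topics.
    """
    weights = np.zeros(num_topics)
    weights[num_refined:] = 1.0 / (num_topics - num_refined)
    return weights


def prior_optimization_iterations(num_iters=200, burn_in=20, every=8):
    """Iterations at which the prior would be optimized (hypothetical helper).

    Assumption: iterations are numbered from 1, and the first optimization
    happens at iteration burn_in + every.
    """
    return [
        it
        for it in range(1, num_iters + 1)
        if it > burn_in and (it - burn_in) % every == 0
    ]


if __name__ == "__main__":
    w = oov_topic_weights()
    print(len(w), w[:NUM_REFINED].sum(), w[NUM_REFINED])
    print(prior_optimization_iterations()[:3])
```

Under these assumptions the OOV vector sums to 1, with all of its mass spread evenly over the 53 unrefined topics, so an OOV word can never start out assigned to a refined topic.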

TODO: check why perplexity differs between 2011-07-02.json (perplexity=2050) and 2011-07-23.json (perplexity=1750).

waleking/detector
