text-tracer

A text-searching script based on a Bayesian network and word2vec.

dependencies

gensim

install

1. get the latest version of the wiki dump file

$ curl -O https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
$ 7z e enwiki-latest-pages-articles.xml.bz2
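
If 7z is not available, the same decompression step can be sketched with Python's standard bz2 module. This is a minimal sketch and not part of the project: the function name decompress_bz2 and the chunk size are assumptions made for illustration.

```python
import bz2
import shutil


def decompress_bz2(src_path, dst_path, chunk_size=1024 * 1024):
    """Stream-decompress a .bz2 file to dst_path without loading it into memory."""
    # decompress_bz2 is a hypothetical helper name, not something the project ships.
    with bz2.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

Calling decompress_bz2("enwiki-latest-pages-articles.xml.bz2", "enwiki-latest-pages-articles.xml") should leave the same XML file that the 7z command produces. Streaming in chunks matters here because the full dump is far larger than typical memory.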

2. parse the original xml using wikiextractor

$ git clone https://github.com/attardi/wikiextractor && cd wikiextractor
$ python3 WikiExtractor.py --html -s ../enwiki-latest-pages-articles.xml
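
wikiextractor writes its output as files in which each article is wrapped in a <doc id="..." url="..." title="..."> ... </doc> block. To check the output before building the graph, the blocks can be read back with a few lines of Python. This is a rough sketch, assuming that doc-block layout; iter_docs is a hypothetical helper, not a function of wikiextractor or of this project.

```python
import re

# Matches one <doc ...>...</doc> block; DOTALL lets the body span several lines.
DOC_RE = re.compile(r'<doc\s+([^>]*)>(.*?)</doc>', re.DOTALL)
# Matches name="value" pairs inside the opening <doc ...> tag.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def iter_docs(text):
    """Yield (attributes, body) for every <doc ...>...</doc> block in text."""
    for match in DOC_RE.finditer(text):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        yield attrs, match.group(2).strip()
```

Each yielded pair holds the attributes of the opening tag as a dict (for example attrs["title"]) and the article body as a string. With --html the body still contains HTML markup.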

3. build graph

$ cd .. && git clone https://github.com/ethanmiles/Bayesian-Network-for-NLP && cd Bayesian-Network-for-NLP/src/py/
$ python3 xmlParser.py --input /path/to/wikiextractor/text/ --work /path/to/workdir/ -p

4. start searching!

$ python3 xmlParser.py --work /path/to/workdir/ -q feedback

dev schedule

1. corpus processing

xml_parser.hpp contains a set of XML processing tools that cover the common needs of students and researchers.

Method Preview:

split tags
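
The behavior of the method in xml_parser.hpp is not documented here, so the following is only an illustration of what splitting tags can mean: breaking a markup string into tag tokens and text tokens. The function name split_tags and its return format are assumptions, and the sketch is written in Python rather than C++.

```python
import re

# The capturing group makes re.split keep the tags as separate tokens.
TAG_RE = re.compile(r'(<[^>]+>)')


def split_tags(markup):
    """Split markup into a list of tag tokens and text tokens, dropping blanks."""
    return [part.strip() for part in TAG_RE.split(markup) if part.strip()]
```

For instance, split_tags("<p>Hello <b>world</b></p>") separates the four tags from the two text fragments, which is usually the first step before filtering or counting words.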

2. bayesian graph construction
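
The project's construction method is not described in this README. As a simplified illustration of the general idea, the sketch below estimates conditional probabilities P(next word | word) from adjacent word pairs and stores them as weighted edges. This is a plain word-transition graph, not a full Bayesian network, and build_graph is a hypothetical name.

```python
from collections import Counter, defaultdict


def build_graph(sentences):
    """Return {word: {next_word: P(next_word | word)}} from tokenized sentences."""
    pair_counts = defaultdict(Counter)
    for tokens in sentences:
        # Count each adjacent (current, following) word pair.
        for current, following in zip(tokens, tokens[1:]):
            pair_counts[current][following] += 1
    graph = {}
    for word, followers in pair_counts.items():
        total = sum(followers.values())
        # Normalize counts so the outgoing edge weights of each word sum to 1.
        graph[word] = {nxt: count / total for nxt, count in followers.items()}
    return graph
```

Given the tokenized sentences [["a", "b"], ["a", "c"], ["a", "b"]], the edge from "a" to "b" gets weight 2/3 and the edge from "a" to "c" gets weight 1/3.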

3. advanced application

building graph from structured documents

4. test

graph_test
xml_parser_test


yira97/text-tracer
