GitHub - xiaoyao1991/hmmpy: HMM Sequential Labeling Library

HMM Based publication segmentation

Cora specific

Maybe using the stanford-ner's tokenizer, which seems much more fancier than mine.

Directory Structure

/data: Contains some dictionaries that contains certain tokens. These files ended in *.lst
/deprecated: Some .py files that are not used
/log: Test logs will be stored here
batch_test.sh: a shell script that calls test.py on the testing samples.
hmm.py: The HMM model class, and its function implementation(viterbi, training)
feature.py: The basic feature class. Contains features in both 1st iteration and 2nd iteration
boosting_feature.py: Extended the basic feature class, and contains additional features that used in 2nd iteration.
languange_model.py: The background knowledge model used in the Combine stage.
tokens.py: Tokenize the publications and also apply some preprocessing.
training_set_generator.py: The class that crawl the Google Scholar, and generate different versions of a single piece of record to enlarge the training set.
utils.py: Some utility methods.
classifier.py: The wrapper class that will do the publication segmentation work.
retrainer: The class that will do the retraining job. This is where the 2nd iteration happens

Usage:

Run the test.py to see detail instructions.

Caution:

The training set generating step may take long. And so every time the crawler actually goes to the google scholar to crawl data, the results will be serialized locally, and thus the data can be reused.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
data		data
deprecated		deprecated
lib		lib
opennlp-model		opennlp-model
summary		summary
.gitignore		.gitignore
README.md		README.md
__init__.py		__init__.py
batch_test.sh		batch_test.sh
boosting_feature.py		boosting_feature.py
classifier.py		classifier.py
feature.py		feature.py
hmm.py		hmm.py
label_sequences.pkl.cora		label_sequences.pkl.cora
language_model.py		language_model.py
observation_sequences.pkl.cora		observation_sequences.pkl.cora
old-tokens.py		old-tokens.py
opennlp-tokenizer.jar		opennlp-tokenizer.jar
retrainer.py		retrainer.py
sparse_matrix_dict.py		sparse_matrix_dict.py
test.py		test.py
tokens.py		tokens.py
training_examples.txt		training_examples.txt
training_set_generator.py		training_set_generator.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HMM Based publication segmentation

Cora specific

Directory Structure

Usage:

Caution:

About

Releases

Packages

Languages

xiaoyao1991/hmmpy

Folders and files

Latest commit

History

Repository files navigation

HMM Based publication segmentation

Cora specific

Directory Structure

Usage:

Caution:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages