GitHub - tianhan4/paper2vec: Paper2vec method implementations

Paper2vec Implementation

Han Tian, Hankz Hankui Zhuo

In this repository we provide open source implementations of paper2vec proposed in the literature. Our implementations learns the paper node embeddings from the paper citation network. (https://arxiv.org/abs/1703.06587)

Input File Formats

The input files contains the paper citation network. For efficiency the cited and citing papers of each nod will be described directly in the input file. The format of the input file is as follows, three lines as a unit:

  <node id>\n
  <citing nodes ids>\n
  <cited nodes ids>\n

For every node, the second line contains the papers it cites and three line the papers that cites it. papers in a line are separated by spaces. For example:

  9236192
  9151905 9034141 8976200 
  20865163 9547333 9841919 9743534 10499921 17645788 18577234 12429062

Make sure the first line contains all ids used in the input file.

Running Ranking Methods

First the input file will be sampled to generate the training corpus.

  python genStairCorpus.py --input adjlistData.txt --output stairCorpus.txt --window 3 --iterate 900 --decay 0.7

Then for training:

  shuf stairCorpus.txt > shufStairCorpus.txt
  ./paper2vec -train shufStairCorpus.txt -output vectors.bin -size 300 -threads 8 -binary 1 -iter 2 -alpha 0.025

Licence

All codes are provided under a gnu/gpl licence.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
LICENSE		LICENSE
README.md		README.md
genStairCorpus.py		genStairCorpus.py
makefile		makefile
paper2vec.c		paper2vec.c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LICENSE

LICENSE

README.md

README.md

genStairCorpus.py

genStairCorpus.py

makefile

makefile

paper2vec.c

paper2vec.c

Repository files navigation

Paper2vec Implementation

Han Tian, Hankz Hankui Zhuo

Input File Formats

Running Ranking Methods

Licence

About

Releases

Packages

Languages

License

tianhan4/paper2vec

Folders and files

Latest commit

History

Repository files navigation

Paper2vec Implementation

Han Tian, Hankz Hankui Zhuo

Input File Formats

Running Ranking Methods

Licence

About

Resources

License

Stars

Watchers

Forks

Languages