Unsupervised Keyphrase Extraction

Update 2022-09-02: We have released the data preprocessing script and the Chinese keyphrase extraction code at https://github.com/xnliang98/CKE-ZH.

Requirements

We use the StanfordCoreNLP tools to preprocess the data.

Step 1: obtain embeddings of candidate phrases and the whole document.

python src/get_embedding.py --file_path [data_path] --file_name [file_name] --model_name [pretrained model name/path]
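To make Step 1 concrete, here is a minimal sketch of one common way to turn per-token vectors into a document embedding and candidate phrase embeddings, namely mean pooling. It is an illustration under stated assumptions, not the code in src/get_embedding.py: the function names, the span format, and the choice of mean pooling are all hypothetical, and the token vectors are made-up numbers standing in for the output of a pretrained model.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a non-empty sequence of equal-length token vectors."""
    if not token_vectors:
        raise ValueError("cannot pool an empty span")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def embed_document_and_candidates(
    token_vectors: Sequence[Sequence[float]],
    candidate_spans: Sequence[Span],
) -> Tuple[Vector, Dict[Span, Vector]]:
    """Return the document embedding and one embedding per candidate span.

    Assumption: both are plain averages of token vectors. The real script
    may pool differently.
    """
    doc_embedding = mean_pool(token_vectors)
    phrase_embeddings = {
        (start, end): mean_pool(token_vectors[start:end])
        for start, end in candidate_spans
    }
    return doc_embedding, phrase_embeddings


# Toy input: four tokens with 2-dimensional vectors (made-up numbers).
tokens = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [0.0, 2.0]]
doc_vec, phrase_vecs = embed_document_and_candidates(tokens, [(0, 2), (2, 4)])
```

In practice the token vectors would come from the model passed via --model_name, and the candidate spans from the preprocessing step.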

Step 2: extract keyphrases

python src/ranker.py [data_path] [model_name]
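As a rough picture of what a ranking step over these embeddings can look like, the sketch below scores each candidate phrase by cosine similarity to the document embedding and keeps the top few. This is a generic embedding-based baseline shown for illustration only; the scoring actually used in src/ranker.py may differ, and the function names and toy vectors here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(
    doc_embedding: Sequence[float],
    phrase_embeddings: Dict[str, Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the top_k phrases sorted by similarity to the document."""
    scored = [
        (phrase, cosine(vec, doc_embedding))
        for phrase, vec in phrase_embeddings.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (made-up numbers), keyed by the candidate phrase text.
doc = [1.0, 2.0]
candidates = {
    "keyphrase extraction": [1.0, 2.0],
    "data": [2.0, 1.0],
    "script": [-1.0, 0.0],
}
ranking = rank_candidates(doc, candidates, top_k=2)
```

Sorting by a single similarity score keeps the example short; a real ranker may combine it with other signals.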

Using a middle-layer representation of the BERT model may give better performance.
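One way to experiment with this is to keep the output of every layer and pick one by index. The helper below is a hypothetical sketch, not part of this repository. It assumes the per-layer outputs are laid out as in Hugging Face transformers with output_hidden_states=True, where entry 0 is the embedding output and entry i is the output of layer i; whether this repository uses that library is an assumption. Strings stand in for the tensors so the example runs without a model.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_layer(hidden_states: Sequence[T], layer: Optional[int] = None) -> T:
    """Pick one layer's output, defaulting to the middle layer.

    Assumption: hidden_states[0] is the embedding output and
    hidden_states[i] is the output of transformer layer i.
    """
    num_layers = len(hidden_states) - 1
    if num_layers < 1:
        raise ValueError("expected the embedding output plus at least one layer")
    if layer is None:
        layer = (num_layers + 1) // 2  # e.g. layer 6 of a 12-layer model
    if not 0 <= layer <= num_layers:
        raise IndexError(f"layer must be between 0 and {num_layers}")
    return hidden_states[layer]


# Stand-in for a 12-layer model: 13 entries, labelled instead of tensors.
states = ["embeddings"] + [f"layer_{i}" for i in range(1, 13)]
middle = select_layer(states)
last = select_layer(states, layer=12)
```

Which layer works best is an empirical question, so it is worth comparing a few indices on held-out data.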
