Cross-lingual Lexical Sememe Prediction

This repository contains the open-source code for the EMNLP 2018 paper Cross-lingual Lexical Sememe Prediction [pdf].

Introduction

Sememes are defined as the minimum semantic units of human languages. As important knowledge sources, sememe-based linguistic knowledge bases have been widely used in many NLP tasks. However, most languages still lack such knowledge bases. We therefore present the task of cross-lingual lexical sememe prediction (CLSP), which aims to automatically predict sememes for words in other languages. We propose a novel framework that models correlations between sememes and multilingual words in a low-dimensional semantic space for sememe prediction. Experimental results on real-world datasets show that the proposed model achieves consistent and significant improvements over baseline methods in cross-lingual sememe prediction.
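As a rough illustration of what predicting sememes in a shared low-dimensional space can look like, the sketch below scores each sememe for a target-language word by summing the cosine similarity between that word and every source-language word annotated with the sememe. This is a simplified nearest-neighbour scheme and not necessarily the model used in the paper. The helper name predict_sememes, the toy vectors, and the toy annotations are all invented for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_sememes(target_vec, source_vecs, source_sememes, top_k=2):
    """Rank sememes for a target word (hypothetical helper).

    target_vec: embedding of the target-language word in the shared space.
    source_vecs: dict mapping source word -> embedding in the same space.
    source_sememes: dict mapping source word -> set of annotated sememes.
    """
    scores = defaultdict(float)
    for word, vec in source_vecs.items():
        sim = cosine(target_vec, vec)
        if sim <= 0:
            continue  # ignore source words that are not similar at all
        for sememe in source_sememes.get(word, ()):
            scores[sememe] += sim
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [sememe for sememe, _ in ranked[:top_k]]


# Toy data (invented): two source-language words with sememe annotations.
source_vecs = {"pingguo": [0.9, 0.1], "diannao": [0.1, 0.9]}
source_sememes = {"pingguo": {"fruit"}, "diannao": {"computer"}}
apple_vec = [0.8, 0.2]  # pretend embedding of the English word "apple"
top = predict_sememes(apple_vec, source_vecs, source_sememes, top_k=1)
```

Here the pretend "apple" vector lies closest to "pingguo", so the sememe annotated on that word ranks first.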

Usage

bash run.sh

To change the training corpus, edit the -mono-train1 and -mono-train2 parameters in run.sh. Note that lang1 refers to the source language and lang2 refers to the target language.

Datasets

Process	Type	Source	Target
Training	Corpus	Sogou-T	Wikipedia
Training	Seed Lexicon	Google Translate API
Training	Sememe-based KB	HowNet_zh	-
Testing	Sememe Prediction	-	HowNet_en
Testing	Bilingual Lexicon Induction	Chinese-English Translation Lexicon Version 3.0
Testing	Word Similarity Computation	WordSim-240	WordSim-353
Testing	Word Similarity Computation	WordSim-297	SimLex-999
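Word similarity benchmarks such as WordSim-353 and SimLex-999 are conventionally scored by the Spearman rank correlation between model similarities and human ratings. A minimal sketch under that assumption follows. The helper names and the toy scores are illustrative and are not taken from this repository's evaluation code.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return pearson(average_ranks(x), average_ranks(y))


human = [9.0, 7.5, 3.0, 1.0]  # toy human similarity ratings
model = [0.8, 0.9, 0.3, 0.1]  # toy model cosine similarities
rho = spearman(human, model)  # 0.8: only the top two pairs are swapped
```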

Cite

If the code or datasets help you, please cite the following paper:

@InProceedings{qi2018cross,
  Title      = {Cross-lingual lexical sememe prediction},
  Author     = {Qi, Fanchao and Lin, Yankai and Sun, Maosong and Zhu, Hao and Xie, Ruobing and Liu, Zhiyuan},
  Booktitle  = {Proceedings of EMNLP},
  Year       = {2018},
}
