Code and data for EMNLP 2018 paper "Cross-lingual Lexical Sememe Prediction"

Cross-lingual Lexical Sememe Prediction

This is the open-source code for the EMNLP 2018 paper Cross-lingual Lexical Sememe Prediction [pdf].


Sememes are defined as the minimum semantic units of human languages. As important knowledge sources, sememe-based linguistic knowledge bases have been widely used in many NLP tasks. However, most languages still do not have sememe-based linguistic knowledge bases. Thus we present a task of cross-lingual lexical sememe prediction (CLSP), aiming to automatically predict sememes for words in other languages. We propose a novel framework to model correlations between sememes and multi-lingual words in low-dimensional semantic space for sememe prediction. Experimental results on real-world datasets show that our proposed model achieves consistent and significant improvements as compared to baseline methods in cross-lingual sememe prediction.
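To make the idea of scoring sememes against words in a shared low-dimensional space concrete, here is a minimal sketch. It is not the model from the paper: it assumes words and sememes have already been embedded in the same space, and it uses plain cosine similarity as a stand-in scoring function. The function names (`cosine`, `rank_sememes`), the sememe labels, and the 3-dimensional vectors are all made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sememes(word_vec, sememe_vecs, top_k=2):
    """Return the top_k sememe labels whose vectors are closest to word_vec."""
    scored = [(name, cosine(word_vec, vec)) for name, vec in sememe_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:top_k]]


# Toy data (assumption): hand-written vectors, not trained embeddings.
sememe_vecs = {
    "human": [1.0, 0.0, 0.0],
    "occupation": [0.8, 0.6, 0.0],
    "tool": [0.0, 0.0, 1.0],
}
target_word_vec = [0.9, 0.4, 0.1]  # hypothetical target-language word

predicted = rank_sememes(target_word_vec, sememe_vecs)
print(predicted)
```

In a real setting the vectors would come from the trained bilingual embeddings, and the scoring function would be whatever the trained model defines rather than raw cosine similarity.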



To change the training corpus, switch the -mono-train1 and -mono-train2 parameters. Note that lang1 refers to the source language and lang2 refers to the target language.


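| Process  | Type                        | Source                                          | Target      |
|----------|-----------------------------|-------------------------------------------------|-------------|
| Training | Corpus                      | Sogou-T                                         | Wikipedia   |
|          | Seed Lexicon                | Google Translate API                            |             |
|          | Sememe-based KB             | HowNet_zh                                       | -           |
| Testing  | Sememe Prediction           | -                                               | HowNet_en   |
|          | Bilingual Lexicon Induction | Chinese-English Translation Lexicon 3.0 Version |             |
|          | Word Similarity Computation | WordSim-240                                     | WordSim-353 |
|          |                             | WordSim-297                                     | SimLex-999  |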
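Word similarity benchmarks such as WordSim-353 and SimLex-999 are commonly scored by the Spearman rank correlation between human ratings and model similarities. The sketch below shows that computation in plain Python. It is an illustration under assumptions, not the evaluation script from this repository: the helper names and the four toy score pairs are invented for the example.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (assumption): human ratings and model cosine scores for four word pairs.
human_scores = [9.0, 7.5, 3.0, 1.0]
model_scores = [0.8, 0.9, 0.3, 0.1]

rho = spearman(human_scores, model_scores)
print(round(rho, 3))
```

With real data, `model_scores` would be the cosine similarities of the learned word embeddings for each pair in the benchmark, skipping pairs that contain out-of-vocabulary words.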


If the code or datasets help you, please cite the following paper:

@InProceedings{qi2018cross,
  Title      = {Cross-lingual lexical sememe prediction},
  Author     = {Qi, Fanchao and Lin, Yankai and Sun, Maosong and Zhu, Hao and Xie, Ruobing and Liu, Zhiyuan},
  Booktitle  = {Proceedings of EMNLP},
  Year       = {2018}
}