Must-read Papers on Sememe Computation
Contributed by Fanchao Qi.
In linguistics, a sememe is defined as the minimum semantic unit of human languages. Some linguists believe that the meanings of all words can be decomposed into a limited set of sememes.
Sememes can help us better understand human languages, and several studies have shown that neural NLP models benefit from incorporating sememe knowledge.
HowNet is the best-known sememe-based knowledge base. It predefines a set of about 2,000 sememes and uses them to annotate over 100,000 Chinese and English words.
This paper gives an overall introduction to HowNet, including its features, philosophy and construction method.
- HowNet - a hybrid language and knowledge resource. Zhendong Dong and Qiang Dong. NLP-KE 2003. [pdf]
This paper gives a brief introduction to HowNet.
- KDML — Knowledge Database Mark-up Language. Zhendong Dong and Qiang Dong. [pdf (Chinese)]
This paper gives a detailed introduction to the Knowledge Database Mark-up Language (KDML), the mark-up language used in HowNet.
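To make the description of HowNet above concrete, here is a minimal sketch of how a sememe knowledge base could be held in memory: a closed inventory of sememes, and a lexicon in which each sense of a word is annotated with a subset of that inventory. The classes `Sense` and `SememeKB` and all sememe names are hypothetical and chosen for illustration; this is not HowNet's actual data format (which is described in the KDML paper) nor the API of any HowNet toolkit.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sense:
    """One sense of a word, annotated with a set of sememes (toy schema)."""

    gloss: str
    sememes: frozenset[str]


@dataclass
class SememeKB:
    """A hypothetical in-memory sememe knowledge base.

    It mirrors only the idea of a closed sememe inventory used to annotate
    word senses; it is not HowNet's actual data format.
    """

    inventory: frozenset[str]
    lexicon: dict[str, list[Sense]] = field(default_factory=dict)

    def add_sense(self, word: str, gloss: str, sememes: set[str]) -> None:
        # Every annotation must be drawn from the predefined sememe inventory.
        unknown = set(sememes) - self.inventory
        if unknown:
            raise ValueError(f"sememes not in inventory: {sorted(unknown)}")
        self.lexicon.setdefault(word, []).append(Sense(gloss, frozenset(sememes)))

    def sememes_of(self, word: str) -> frozenset[str]:
        """Union of the sememes over all senses of `word` (empty if unknown)."""
        result: frozenset[str] = frozenset()
        for sense in self.lexicon.get(word, []):
            result |= sense.sememes
        return result


# Toy usage: the sememe names below are illustrative, not copied from HowNet.
kb = SememeKB(
    inventory=frozenset({"human", "occupation", "medical", "fruit", "computer", "SpecificBrand"})
)
kb.add_sense("doctor", "medical practitioner", {"human", "occupation", "medical"})
kb.add_sense("apple", "the fruit", {"fruit"})
kb.add_sense("apple", "the computer brand", {"computer", "SpecificBrand"})
print(sorted(kb.sememes_of("apple")))  # ['SpecificBrand', 'computer', 'fruit']
```

Keeping annotations at the sense level matters for polysemous words such as "apple", whose senses receive disjoint sememe sets; `sememes_of` collapses them into one set, which is convenient for lookup but discards that distinction.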
Applications of Sememes
- Multi-channel Reverse Dictionary Model. Lei Zhang, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Qun Liu and Maosong Sun. AAAI-20. [pdf] [code]
- Modeling Semantic Compositionality with Sememe Knowledge. Fanchao Qi, Junjie Huang, Chenghao Yang, Zhiyuan Liu, Xiao Chen, Qun Liu and Maosong Sun. ACL 2019. [pdf] [code]
- Chinese Relation Extraction with Multi-Grained Information and External Linguistic Knowledge. Ziran Li, Ning Ding, Zhiyuan Liu, Haitao Zheng and Ying Shen. ACL 2019. [pdf] [code]
- Unsupervised Neural Aspect Extraction with Sememes. Ling Luo, Xiang Ao, Yan Song, Jinyao Li, Xiaopeng Yang, Qing He and Dong Yu. IJCAI 2019. [pdf]
- K-BERT: Enabling Language Representation with Knowledge Graph. Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng and Ping Wang. arXiv 2019. [pdf] [code]
- Semantic Representation Learning Based on HowNet. Jingwen Zhu, Yuji Yang, Bin Xu and Juanzi Li. JCIP 2019. [pdf (Chinese)]
- A Word Representation Method Based on HowNet. Yang Chen and Zhiyong Luo. Acta Scientiarum Naturalium Universitatis Pekinensis 2019. [pdf (Chinese)]
- Evaluating Semantic Rationality of a Sentence: A Sememe-Word-Matching Neural Network based on HowNet. Shu Liu, Jingjing Xu and Xuancheng Ren. NLPCC 2019. [pdf]
- Language Modeling with Sparse Product of Sememe Experts. Yihong Gu, Jun Yan, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin and Leyu Lin. EMNLP 2018. [pdf] [code]
- Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention. Xiangkai Zeng, Cheng Yang, Cunchao Tu, Zhiyuan Liu and Maosong Sun. AAAI-18. [pdf] [code]
- Improved Word Representation Learning with Sememes. Yilin Niu, Ruobing Xie, Zhiyuan Liu and Maosong Sun. ACL 2017. [pdf] [code]
- Embedding for Words and Word Senses Based on Human Annotated Knowledge Base: A Case Study on HowNet. Maosong Sun and Xinxiong Chen. JCIP 2016. [pdf (Chinese)]
- Multi-aspect sentiment analysis for Chinese online social reviews based on topic modeling and HowNet lexicon. Xianghua Fu, Guo Li, Yanyan Guo and Zhiqiang Wang. Knowledge-Based Systems 2013. [pdf]
- Employing Morphological Structures and Sememes for Chinese Event Extraction. Peifeng Li and Guodong Zhou. COLING 2012. [pdf]
- Method of discriminant for Chinese sentence sentiment orientation based on HowNet. Lei Dang and Lei Zhang. Application Research of Computers 2010. [pdf (Chinese)]
- HowNet Based Chinese Question Automatic Classification. Jingguang Sun, Dongfeng Cai, Dexin Lv and Yanju Dong. JCIP 2007. [pdf (Chinese)]
- Word Sense Disambiguation through Sememe Labeling. Xiangyu Duan, Jun Zhao and Bo Xu. IJCAI 2007. [pdf]
- Analogy Generation with HowNet. Tony Veale. IJCAI 2005. [pdf]
- Semantic orientation computing based on HowNet. Yanlan Zhu, Jin Min, Yaqian Zhou, Xuanjing Huang and Lide Wu. JCIP 2005. [pdf (Chinese)]
- Chinese word sense disambiguation using HowNet. Yuntao Zhang, Ling Gong and Yongcheng Wang. ICNC 2005. [pdf]
- Word Similarity Computing Based on HowNet. Qun Liu and Sujian Li. International Journal of Computational Linguistics & Chinese Language Processing 2002. [pdf (Chinese)]
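Many of the applications listed above, word similarity computation in particular, compare words through their sememe annotations. The sketch below uses a deliberately simple stand-in, assumed here purely for illustration: Jaccard overlap between the sememe sets of two senses, maximized over all sense pairs. It is not the hierarchical, weighted measure of Liu and Li (2002), and the toy lexicon and the names `LEXICON`, `jaccard` and `word_similarity` are hypothetical.

```python
from __future__ import annotations

from itertools import product

# Hypothetical toy annotations: word -> list of senses, each a set of sememes.
# These are illustrative only and are NOT taken from HowNet.
LEXICON: dict[str, list[frozenset[str]]] = {
    "doctor": [frozenset({"human", "occupation", "medical"})],
    "nurse": [frozenset({"human", "occupation", "medical", "care"})],
    "teacher": [frozenset({"human", "occupation", "education"})],
    "apple": [frozenset({"fruit"}), frozenset({"computer", "SpecificBrand"})],
    "pear": [frozenset({"fruit"})],
}


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard overlap of two sememe sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def word_similarity(w1: str, w2: str, lexicon=LEXICON) -> float:
    """Similarity of two words = score of their best-matching sense pair.

    Unknown words have no senses, so their similarity to anything is 0.0.
    """
    senses1, senses2 = lexicon.get(w1, []), lexicon.get(w2, [])
    return max((jaccard(s1, s2) for s1, s2 in product(senses1, senses2)), default=0.0)


print(word_similarity("doctor", "nurse"))    # 0.75
print(word_similarity("doctor", "teacher"))  # 0.5
print(word_similarity("apple", "pear"))      # 1.0
```

Taking the maximum over sense pairs makes "apple" and "pear" look identical, because the fruit sense of "apple" matches perfectly; an application that sees the word in context would first need to pick the intended sense.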
Expansion of Sememe Knowledge Bases
- Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets. Fanchao Qi, Liang Chang, Maosong Sun, Sicong Ouyang and Zhiyuan Liu. AAAI-20. [pdf] [code]
- Lexical Sememe Prediction using Dictionary Definitions by Capturing Local Semantic Correspondence. Jiaju Du, Fanchao Qi, Maosong Sun and Zhiyuan Liu. arXiv 2020. [pdf]
- OpenHowNet: An Open Sememe-based Lexical Knowledge Base. Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Qiang Dong, Maosong Sun and Zhendong Dong. arXiv 2019. [pdf] [code]
- Cross-lingual Lexical Sememe Prediction. Fanchao Qi, Yankai Lin, Maosong Sun, Hao Zhu, Ruobing Xie and Zhiyuan Liu. EMNLP 2018. [pdf] [code]
- Sememe Prediction: Learning Semantic Knowledge from Unstructured Textual Wiki Descriptions. Wei Li, Xuancheng Ren, Damai Dai, Yunfang Wu, Houfeng Wang and Xu Sun. arXiv 2018. [pdf] [code]
- Extended HowNet 2.0: An Entity-Relation Common-Sense Representation Model. Wei-Yun Ma and Yueh-Yin Shih. LREC 2018. [pdf]
- Incorporating Chinese Characters of Words for Lexical Sememe Prediction. Huiming Jin, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin and Leyu Lin. ACL 2018. [pdf] [code]
- Lexical Sememe Prediction via Word Embeddings and Matrix Factorization. Ruobing Xie, Xingchi Yuan, Zhiyuan Liu and Maosong Sun. IJCAI 2017. [pdf] [code]
- E-HowNet and Automatic Construction of a Lexical Ontology. Wei-Te Chen, Su-Chu Lin, Shu-Ling Huang, You-Shan Chung and Keh-Jiann Chen. COLING 2010. [pdf]
- Extended-HowNet: A Representational Framework for Concepts. Keh-Jiann Chen, Shu-Ling Huang, Yueh-Yin Shih and Yi-Jun Chen. OntoLex 2005. [pdf]
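Most papers in this section treat knowledge-base expansion as predicting sememes for words that lack annotations. Below is a minimal sketch of one such idea, loosely following the collaborative-filtering view in Xie et al. (IJCAI 2017): an unannotated word inherits sememes from annotated words whose embeddings lie close to it, with each neighbour's vote weighted by cosine similarity and damped geometrically by its similarity rank. This is a simplified illustration, not the authors' implementation; the 2-d embeddings, the annotations, the decay `c` and the cutoff `top_k` are all hypothetical, whereas real systems use pretrained embeddings and tuned hyperparameters.

```python
from __future__ import annotations

import math

# Hypothetical toy data: embeddings of annotated words and their sememes.
EMBEDDINGS: dict[str, tuple[float, ...]] = {
    "doctor": (0.9, 0.1),
    "nurse": (0.8, 0.3),
    "apple": (0.1, 0.9),
}
ANNOTATIONS: dict[str, set[str]] = {
    "doctor": {"human", "occupation", "medical"},
    "nurse": {"human", "occupation", "medical", "care"},
    "apple": {"fruit"},
}


def cosine(u: tuple[float, ...], v: tuple[float, ...]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_sememes(vec, embeddings=EMBEDDINGS, annotations=ANNOTATIONS, c=0.8, top_k=3):
    """Rank candidate sememes for an unannotated word with embedding `vec`.

    score(s) = sum over annotated words w_i carrying s of cos(vec, w_i) * c**r_i,
    where r_i is w_i's rank by similarity (0 = most similar) and 0 < c < 1
    damps votes from less similar neighbours.
    """
    neighbours = sorted(embeddings, key=lambda w: cosine(vec, embeddings[w]), reverse=True)
    scores: dict[str, float] = {}
    for rank, word in enumerate(neighbours):
        weight = cosine(vec, embeddings[word]) * c**rank
        for sememe in annotations.get(word, ()):
            scores[sememe] = scores.get(sememe, 0.0) + weight
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:top_k]


# A hypothetical new word whose embedding lies near "doctor" and "nurse".
print(predict_sememes((0.85, 0.2), top_k=4))  # ['human', 'medical', 'occupation', 'care']
```

Here `human`, `medical` and `occupation` tie because both nearest neighbours carry them, while `care` only receives the damped vote of the second neighbour; a word embedded near "apple" would instead rank `fruit` first.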