Auto LIWC

The code for Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention (AAAI18).

Datasets

The datasets folder contains two datasets.

HowNet.txt is a Chinese knowledge base with annotated word-sense-sememe information.
sc_liwc.dic is the Chinese LIWC lexicon. This is a revised version of the original C-LIWC file. The original contains part-of-speech (POS) categories such as verb, adverb, and auxverb; we believe POS tagging programs give more accurate results when analyzing a given text, so we deleted the POS categories in our experiment. Furthermore, the hierarchical structure of the original differs slightly from that of the English version of LIWC, so we altered it to follow the English LIWC. For the exact meaning of each category, you can refer to here and here.
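As a rough illustration of working with a lexicon of this kind, the sketch below parses dictionary text in the layout commonly used by LIWC .dic files: a category header delimited by two `%` lines, followed by one word per line with its category ids. That sc_liwc.dic follows exactly this layout is an assumption, and `parse_liwc_dic` and the sample entries are hypothetical names and data made up for the example; they are not part of this repository.

```python
from typing import Dict, List, Tuple


def parse_liwc_dic(text: str) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Parse LIWC-style dictionary text into (categories, lexicon).

    Assumed layout (not verified against sc_liwc.dic):
        %
        <category id> <category name>
        %
        <word> <category id> <category id> ...
    """
    categories: Dict[str, str] = {}
    lexicon: Dict[str, List[str]] = {}
    percent_lines = 0  # number of '%' delimiter lines seen so far

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("%"):
            percent_lines += 1
            continue
        fields = line.split()
        if percent_lines == 1 and len(fields) >= 2:
            # Header section: map category id -> category name.
            categories[fields[0]] = fields[1]
        elif percent_lines >= 2:
            # Word section: map word -> list of category names.
            word, ids = fields[0], fields[1:]
            lexicon[word] = [categories.get(i, i) for i in ids]
    return categories, lexicon


# Made-up sample in the assumed layout, for illustration only.
sample = "%\n1\tfunct\n2\taffect\n%\n我们\t1\n快乐\t2\n"
categories, lexicon = parse_liwc_dic(sample)
print(categories)
print(lexicon)
```

To try it on the real file, read datasets/sc_liwc.dic with `encoding="utf-8"` (also an assumption) and pass the text to the function.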

Please note that the above dataset files are for academic and educational use only. They are not for commercial use. If you have any questions, please contact us before downloading the datasets.

Due to the large size of the embedding file, we can only release the code for training the word embeddings. Please see word2vec.py for details.

Run

Run the following command for training and testing:

python3 train_liwc.py

If the datasets are in a different folder, please change the path here.

The current code generates a different training and testing set on every run. To reproduce the results in the paper, you can load train.bin and test.bin, located in bin_data, using pickle.
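Loading the saved split might look like the minimal sketch below. The helper name `load_split` is hypothetical, and the structure of the unpickled objects is not documented here, so the example only prints their types. How train_liwc.py should consume them is left open.

```python
import pickle
from pathlib import Path


def load_split(path):
    """Unpickle one saved split and return whatever object it contains."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Paths follow the README; loading is skipped if the files are not present.
for name in ("train.bin", "test.bin"):
    split_path = Path("bin_data") / name
    if split_path.exists():
        split = load_split(split_path)
        print(name, type(split))
```

As with any pickle file, only load these from a source you trust, since unpickling can execute arbitrary code.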

Dependencies

TensorFlow == 1.4.0
SciPy == 0.19.0
NumPy == 1.13.1
scikit-learn == 0.18.1
gensim == 2.0.0
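Because these pins are old, it can help to check an environment against them before running anything. The sketch below is one way to do that; `PINNED`, `installed_version`, and `find_mismatches` are hypothetical names introduced for this example, and the package names are assumed to be the usual PyPI distribution names.

```python
# Versions copied from the list above.
PINNED = {
    "tensorflow": "1.4.0",
    "scipy": "0.19.0",
    "numpy": "1.13.1",
    "scikit-learn": "0.18.1",
    "gensim": "2.0.0",
}


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for older interpreters

        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (wanted, found)."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


for package, (wanted, found) in find_mismatches(PINNED).items():
    print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```

The `lookup` parameter exists only so the comparison can be exercised without touching the real environment.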

Cite

If you use the code, please cite this paper:

Xiangkai Zeng, Cheng Yang, Cunchao Tu, Zhiyuan Liu, Maosong Sun. Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention. The 32nd AAAI Conference on Artificial Intelligence (AAAI 2018).
