This repository contains the code accompanying the paper "Meaning to Form: Measuring Systematicity as Information" (Pimentel et al., ACL 2019), a study of the arbitrariness of the sign and its systematicity.
Create the conda environment with:
$ source config/conda.sh
Then install the version of PyTorch appropriate for your system.
Information on the datasets used, and where to obtain them, is available from their sources: NorthEuraLex and CELEX.
Once the data is in place, parse it with:
$ python data_pipe/parse_celex.py --data celex
$ python data_pipe/parse_celex.py --data celex --reverse
$ python data_pipe/parse_northeuralex.py --data northeuralex
Train a model for a given dataset and context with:
$ python learn_pipe/train.py --data <data> --context <context>
Or train all at once:
$ bash learn_pipe/train_multi.sh
The --context option accepts:
- none: no context
- word2vec: word2vec embeddings as context
- pos: grammatical class (part of speech) as context
- mixed: both grammatical class and word2vec as context
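To see which individual training runs the dataset and context options imply, the sketch below builds one command line per combination. It only assembles and prints the commands; it does not run them. The dataset names and flags are taken from the commands above, while the helper name build_train_commands is hypothetical, and the assumption that every context is valid for every dataset is not confirmed by this README.

```python
from itertools import product

# Dataset names and context values as they appear in this README.
DATASETS = ["celex", "northeuralex"]
CONTEXTS = ["none", "word2vec", "pos", "mixed"]


def build_train_commands(datasets=DATASETS, contexts=CONTEXTS):
    """Return one argv list per (dataset, context) pair.

    Hypothetical helper: it assumes every context can be combined with
    every dataset, which may not hold for the actual training script.
    """
    return [
        ["python", "learn_pipe/train.py", "--data", data, "--context", context]
        for data, context in product(datasets, contexts)
    ]


if __name__ == "__main__":
    # Print the commands so they can be inspected before running anything.
    for argv in build_train_commands():
        print(" ".join(argv))
```

Each printed line can be pasted into a shell, or the argv lists can be passed to subprocess.run once the combinations have been checked.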
Extract possible phonesthemes and test them by running:
$ python analysis_pipe/extract_phonesthemes.py --data celex --n-permuts 100000
$ python analysis_pipe/extract_phonesthemes.py --data celex --n-permuts 100000 --reverse
$ python analysis_pipe/analyse_phonesthemes.py --n-permuts 100000
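For readers unfamiliar with what a permutation count such as --n-permuts controls, here is a minimal sketch of a generic one-sided permutation test. It is not the implementation used by the scripts above. The function name permutation_p_value, the per-word scores, and the has_affix flags are all hypothetical: the sketch assumes each word has some numeric score and asks whether the words sharing a candidate prefix or suffix score higher on average than randomly chosen groups of the same size.

```python
import numpy as np


def permutation_p_value(scores, has_affix, n_permuts=100000, seed=0):
    """Estimate a one-sided p-value for the mean score of flagged words.

    scores:    hypothetical per-word numeric scores.
    has_affix: boolean flags marking words that share a candidate affix.
    """
    scores = np.asarray(scores, dtype=float)
    has_affix = np.asarray(has_affix, dtype=bool)
    k = int(has_affix.sum())
    observed = scores[has_affix].mean()

    # RandomState is used because it exists in old and new NumPy releases.
    rng = np.random.RandomState(seed)
    hits = 0
    for _ in range(n_permuts):
        # Drawing k words without replacement is equivalent to
        # shuffling the affix flags across all words.
        sample = rng.choice(scores, size=k, replace=False)
        if sample.mean() >= observed:
            hits += 1

    # Add-one smoothing keeps the estimate strictly above zero.
    return (hits + 1.0) / (n_permuts + 1.0)


if __name__ == "__main__":
    toy_scores = [0, 0, 0, 0, 0, 0, 0, 0, 10, 10]
    toy_flags = [False] * 8 + [True] * 2
    print(permutation_p_value(toy_scores, toy_flags, n_permuts=2000))
```

More permutations give a finer-grained estimate: with n permutations the smallest p-value this sketch can report is 1 / (n + 1), which is one reason to use a large count when many candidates are tested.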
If this code or the paper was useful to you, consider citing it:
@inproceedings{pimentel-etal-2019-meaning,
title = "Meaning to Form: Measuring Systematicity as Information",
author = "Pimentel, Tiago and
McCarthy, Arya D. and
Blasi, Dami\'{a}n E. and
Roark, Brian and
Cotterell, Ryan",
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = aug,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1906.05906",
}
This project was tested with the following library versions:
numpy==1.11.3
pandas==0.23.4
scikit-learn==0.21.2
gensim==3.7.3
matplotlib==2.0.2
seaborn==0.9.0
tqdm==4.32.1
torch==1.1.0
To ask questions or report problems, please open an issue.