Evaluating Text Representations on Lexical Composition
- Python 3
- allennlp (0.8.1)
Download the pre-trained models using
The VPC classification and LVC classification tasks need a copy of the British National Corpus (BNC). Please download the XML version from here, and update its path in the JSON files.
Once you do, you will need to extract the sentences themselves:
    python preprocessing/get_sentences_from_bnc.py \
        [/path/to/corpora]/bnc/2554/download/Texts/ \
        diagnostic_classifiers/data/vpc_classification/ \
        diagnostic_classifiers/data/vpc_classification
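To give a rough idea of what extracting sentences from the BNC XML involves, here is a minimal sketch. It is not the repository's script. It assumes that each sentence is an `<s>` element whose `<w>` (word) and `<c>` (punctuation) children carry the surface text, and the sample document is an invented toy, not real corpus content.

```python
import xml.etree.ElementTree as ET


def extract_sentences(xml_text):
    """Return the plain-text sentences found in one BNC-style XML document."""
    root = ET.fromstring(xml_text)
    sentences = []
    # Assumption: sentences are <s> elements; their descendants hold the text.
    for s in root.iter("s"):
        text = "".join(s.itertext())
        # Collapse the whitespace left over from the markup.
        text = " ".join(text.split())
        if text:
            sentences.append(text)
    return sentences


# Invented toy document for illustration only.
sample = """
<bncDoc>
  <stext>
    <s n="1"><w pos="VERB">Look </w><w pos="ADV">up </w><w pos="ART">the </w><w pos="SUBST">word</w><c>.</c></s>
    <s n="2"><w pos="PRON">She </w><w pos="VERB">gave </w><w pos="ART">a </w><w pos="SUBST">speech</w><c>.</c></s>
  </stext>
</bncDoc>
"""

for sentence in extract_sentences(sample):
    print(sentence)
```

The actual script takes the corpus directory and the task data directories as arguments, as in the command above; this sketch only covers the per-document step.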
To train all the models for a given task, e.g. NC literality, run:
To get the predictions for the test set:
Adding a new task:
You will need to create a directory under experiments
with the JSON files specifying the architecture and hyper-parameters.
Each model requires a
Model, and a
You can use the ones implemented in this repository or implement
new ones according to the specific model's needs.
See the AllenNLP tutorial for additional instructions on configuring models.
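As an illustration of what such a JSON file might contain, the sketch below builds a skeleton configuration with the top-level sections an AllenNLP 0.8-style experiment file typically has. The `type` names, data paths, and hyper-parameter values are hypothetical placeholders, not names registered in this repository.

```python
import json


def build_config(task_name):
    """Build a skeleton experiment configuration for a new task."""
    data_dir = "diagnostic_classifiers/data/" + task_name  # hypothetical layout
    return {
        # Hypothetical registered name for the component that reads the data.
        "dataset_reader": {"type": task_name + "_reader"},
        "train_data_path": data_dir + "/train.jsonl",      # hypothetical file
        "validation_data_path": data_dir + "/val.jsonl",   # hypothetical file
        # Hypothetical registered name for the model architecture.
        "model": {"type": task_name + "_classifier"},
        "iterator": {"type": "basic", "batch_size": 32},
        "trainer": {
            "optimizer": {"type": "adam"},
            "num_epochs": 10,
            "patience": 3,
            "cuda_device": -1,  # -1 runs on CPU
        },
    }


config = build_config("my_task")
print(json.dumps(config, indent=2))
```

The existing JSON files in this repository remain the authoritative reference for which sections and values each task actually uses.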
If you'd like to create new data, follow the preprocessing instructions.
Adding a new representation:
You will need to implement a new
TextFieldEmbedder. The first takes a sequence of words and returns
their IDs, and the second gets the IDs and returns the vectors.
Look at the implementations in this repository and in the
AllenNLP repository, and read the documentation.
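The two-step split described above (words to IDs, then IDs to vectors) can be sketched in plain Python. `ToyIndexer` and `ToyEmbedder` are hypothetical stand-ins written for this illustration; they do not subclass or reproduce the AllenNLP classes, and the random vectors take the place of a real representation.

```python
import random


class ToyIndexer:
    """First step: map each word to an integer ID."""

    def __init__(self):
        self.vocab = {}

    def index(self, words):
        ids = []
        for word in words:
            # Unseen words get the next free ID.
            if word not in self.vocab:
                self.vocab[word] = len(self.vocab)
            ids.append(self.vocab[word])
        return ids


class ToyEmbedder:
    """Second step: map each ID to a fixed-size vector."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}

    def embed(self, ids):
        vectors = []
        for i in ids:
            # Lazily create one random vector per ID and reuse it afterwards.
            if i not in self.table:
                self.table[i] = [self.rng.uniform(-1, 1) for _ in range(self.dim)]
            vectors.append(self.table[i])
        return vectors


indexer = ToyIndexer()
embedder = ToyEmbedder(dim=4)

ids = indexer.index(["look", "up", "the", "word", "up"])
vectors = embedder.embed(ids)
print(ids)  # [0, 1, 2, 3, 1]
print(len(vectors), len(vectors[0]))
```

Keeping the two steps separate means the same word sequence can be paired with different representations by swapping only the second component.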
You will also need to add a JSON file for the task + representation combination and add the command to the train/evaluate/predict bash files.
Vered Shwartz and Ido Dagan. arXiv 2019.