Code for our NAACL-2021 paper "Disentangling Semantics and Syntax in Sentence Embeddings with Pre-trained Language Models".
## Citation

If you find this repository useful, please consider citing our paper.

```bibtex
@inproceedings{huang2021disentangling,
    title = {Disentangling Semantics and Syntax in Sentence Embeddings with Pre-trained Language Models},
    author = {Huang, James Y. and Huang, Kuan-Hao and Chang, Kai-Wei},
    booktitle = {NAACL},
    year = {2021}
}
```
## Dependencies

- Python==3.7.6
- PyTorch==1.6.0
- Transformers==3.0.2
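The pinned library versions above can be captured in a requirements file (assuming the standard PyPI distribution names `torch` and `transformers`; the Python version itself must be managed separately, e.g. with conda):

```
torch==1.6.0
transformers==3.0.2
```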
## Pre-trained Model

Our pre-trained ParaBART model is available here.
## Training

- Download the dataset and put it under `./data/`
- Run the following command to train ParaBART:

```shell
python train_parabart.py --data_dir ./data/
```
## Evaluation

- Download the SentEval toolkit and datasets
- Name your trained model `model.pt` and put it under `./model/`
- Run the following command to evaluate ParaBART on semantic textual similarity and syntactic probing tasks:

```shell
python parabart_senteval.py --senteval_dir ../SentEval --model_dir ./model/
```
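For intuition, the STS tasks score each sentence pair by the cosine similarity of the two sentence embeddings. A minimal sketch with NumPy, where the `encode` stub is a hashed bag-of-words placeholder purely for illustration, not ParaBART's actual encoder:

```python
import numpy as np

def encode(sentence: str) -> np.ndarray:
    # Placeholder encoder: hashes tokens into a fixed-size count vector.
    # A real encoder (e.g. ParaBART's semantic encoder) would produce a
    # dense embedding instead.
    vec = np.zeros(32)
    for tok in sentence.lower().split():
        vec[hash(tok) % 32] += 1.0
    return vec

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: dot product of the vectors scaled by their norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine_similarity(encode("a man is playing a guitar"),
                        encode("a man plays the guitar"))
```

STS evaluation then reports the correlation between these similarity scores and human judgments.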
- Download the QQP-Easy and QQP-Hard datasets here
- Run the following command to evaluate ParaBART on the QQP datasets:

```shell
python parabart_qqpeval.py
```
## Author

James Yipeng Huang / @jyhuang36