Paper

Unsupervised Evaluation Metrics and Learning Criteria for Non-Parallel Textual Transfer, EMNLP Workshop on Neural Generation and Translation (WNGT), 2019, [paper]

The code will soon support auto-evaluation (December 2019), and style-transfer model code will also be added (December 2019). The codebase is based on the code by Shen et al. (2017).

Prerequisites

Suppose the dataset is named a. Rename the style-0 files to a.train.0, a.dev.0, and a.test.0, and the style-1 files to a.train.1, a.dev.1, and a.test.1.
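If you have many datasets to prepare, the renaming can be scripted; below is a minimal stdlib sketch (the function name and original file names are hypothetical, not part of this repo):

```python
import shutil
from pathlib import Path

SPLITS = ["train", "dev", "test"]

def rename_style_files(src_dir, name, style, original_files):
    """Copy the three split files of one style to the names the scripts
    expect, e.g. a.train.0, a.dev.0, a.test.0 for style 0."""
    for split, orig in zip(SPLITS, original_files):
        shutil.copy(Path(src_dir) / orig, Path(src_dir) / f"{name}.{split}.{style}")
```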

The author uses Python 3.7, TensorFlow 2.0.0, CUDA 10.1, and cuDNN 7.6.4 for CUDA 10.1. With TensorFlow v1, use `import tensorflow as tf`; with TensorFlow v2, you may need `import tensorflow.compat.v1 as tf`.

Evaluation

Step 1.1: Train classifier on training data

The model will train on a.train.0 and a.train.1, and validate on a.dev.0 and a.dev.1. Run the following.

. scripts/train_classifier.sh [data_directory] [data_name] [model_directory] [classifier_model_name]

An example is . scripts/train_classifier.sh ../data/ yelp ../output_model/ 10-30-19-model-classify. You can modify train_classifier.sh to add the --embedding flag if you have a pretrained embedding.

Step 1.2: Train language model on the concat of training data

The model will train on the concatenation of a.train.0 and a.train.1, and validate on the concatenation of a.dev.0 and a.dev.1. Run the following.

. scripts/train_lm_concat.sh [data_directory] [data_name] [model_directory] [lm_model_name]

An example is . scripts/train_lm_concat.sh ../data/ yelp ../output_model/ 10-30-19-model-lm-concat. You can modify train_lm_concat.sh to add the --embedding flag if you have a pretrained embedding.
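Step 1.2 trains on the concatenation of the two style files. If you ever need to build such a concatenated file yourself, a minimal stdlib sketch (the function name is hypothetical; the repo's scripts may handle this internally):

```python
def concat_files(paths, out_path):
    """Write the contents of each input file, in order, to out_path."""
    with open(out_path, "w") as out:
        for p in paths:
            with open(p) as f:
                out.write(f.read())
```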

Step 2.1: Run classifier on generated/output data to get Acc

Acc is defined as post-transfer style classification accuracy in the paper.

First, rename the generated output so that the style-0-to-1 output is named [data_name].out.1 (the transferred output should now be of style 1), and the style-1-to-0 output is named [data_name].out.0. Then run the following.

. scripts/run_classifier.sh [generation_directory] [data_name] [model_directory] [classifier_model_name]

An example is . scripts/run_classifier.sh ../output_sentences/ yelp ../output_model/ 10-30-19-model-classify. You can modify run_classifier.sh to add the --embedding flag if you have a pretrained embedding.
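Given the classifier's predicted style for each transferred sentence and the intended target style, Acc is simply the fraction that match; a minimal sketch (function and argument names are hypothetical, not from this repo):

```python
def transfer_accuracy(pred_styles, target_styles):
    """Post-transfer style accuracy: fraction of generated sentences
    that the classifier assigns to the intended target style."""
    assert len(pred_styles) == len(target_styles)
    correct = sum(p == t for p, t in zip(pred_styles, target_styles))
    return correct / len(pred_styles)
```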

Step 2.2: Run language model on generated/output data to get PP

PP is defined in the paper as the perplexity of the generated sentences under the language model, measuring their fluency/naturalness. Run the following.

. scripts/run_lm_concat.sh [generation_directory] [data_name] [model_directory] [lm_model_name]

An example is . scripts/run_lm_concat.sh ../output_sentences/ yelp ../output_model/ 10-30-19-model-lm-concat. You can modify run_lm_concat.sh to add the --embedding flag if you have a pretrained embedding.
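Given per-token log-probabilities from a trained language model, perplexity is the exponentiated negative mean log-probability; a minimal stdlib sketch of that standard formula (names hypothetical, not this repo's code):

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities:
    exp(-(1/N) * sum(log p_i))."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)
```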

Step 2.3: Obtain semantic similarity

To be completed in December.

Textual transfer models

To be completed before December.

Notation: rec denotes reconstruction loss; adv denotes adversarial loss; cyc denotes cycle-consistency loss; para denotes paraphrase loss; lm denotes language-model loss.
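In training setups like this, such loss terms are typically combined as a weighted sum; a hypothetical sketch (the weights and function name are illustrative, not this repo's API):

```python
def total_loss(losses, weights):
    """Weighted sum of named loss terms (e.g. rec, adv, cyc, para, lm).
    Terms with no weight specified default to 0."""
    return sum(weights.get(name, 0.0) * value for name, value in losses.items())
```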

Miscellaneous

First author's website: yzpang.me
