Selection Distribution Generator for Domain Adaptation

timerstime/SDG4DA

Reinforced Training Data Selection for Domain Adaptation

This is the implementation code for paper "Reinforced Training Data Selection for Domain Adaptation".

Readers are welcome to star or fork this repository, use it to train their own models, and modify our experiments. If you do, please cite our paper:

@inproceedings{liu2019reinforced,
  author    = {Lewis Liu and
               Yan Song and
               Hongbin Zou and
               Tong Zhang},
  title     = {Reinforced Training Data Selection for Domain Adaptation},
  year      = {2019}
}

This document briefly describes how to set up and run the selection distribution generator (SDG), which dynamically selects training data for multi-source domain adaptation in natural language processing tasks.

Set up and run

Download the raw datasets from SANCL2012.

  • Sentiment analysis

    cd sentiment-analysis/sentiment-analysis/src
    python train_AC.py

Implementation details of experiments

  • Sentiment analysis

    Requirements

    • Python 3
    • TensorFlow > 0.12
    • NumPy

    Training

    Print parameters:

    ./train_AC.py --help
    Optional arguments of the CNN used (defaults shown):
      -h, --help            show this help message and exit
      --embedding_dim EMBEDDING_DIM
                            Dimensionality of character embedding (default: 128)
      --filter_sizes FILTER_SIZES
                            Comma-separated filter sizes (default: '3,4,5')
      --num_filters NUM_FILTERS
                            Number of filters per filter size (default: 128)
      --l2_reg_lambda L2_REG_LAMBDA
                            L2 regularization lambda (default: 0.0)
      --dropout_keep_prob DROPOUT_KEEP_PROB
                            Dropout keep probability (default: 0.5)
      --batch_size BATCH_SIZE
                            Batch Size (default: 64)
      --num_epochs NUM_EPOCHS
                            Number of training epochs (default: 100)
      --evaluate_every EVALUATE_EVERY
                            Evaluate model on dev set after this many steps
                            (default: 100)
      --checkpoint_every CHECKPOINT_EVERY
                            Save model after this many steps (default: 100)
      --allow_soft_placement ALLOW_SOFT_PLACEMENT
                            Allow device soft device placement
      --noallow_soft_placement
      --log_device_placement LOG_DEVICE_PLACEMENT
                            Log placement of ops on devices
      --nolog_device_placement
    

    Train:

    python train_AC.py
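    The hyperparameters listed in the help output can be overridden on the command line. The following is a minimal sketch of how such flags could be parsed, not the repository's own code: it uses `argparse` as a stand-in (the script's actual flag handling may differ), and `build_parser` and `parse_filter_sizes` are hypothetical helper names. Flag names and defaults are copied from the help output above.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the flag definitions in train_AC.py;
    # names and defaults are copied from the --help output, nothing more.
    parser = argparse.ArgumentParser(description="CNN hyperparameters (sketch)")
    parser.add_argument("--embedding_dim", type=int, default=128)
    parser.add_argument("--filter_sizes", type=str, default="3,4,5")
    parser.add_argument("--num_filters", type=int, default=128)
    parser.add_argument("--dropout_keep_prob", type=float, default=0.5)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--num_epochs", type=int, default=100)
    return parser


def parse_filter_sizes(value):
    """Turn a comma-separated string such as '3,4,5' into a list of ints."""
    return [int(size) for size in value.split(",") if size.strip()]


if __name__ == "__main__":
    # Simulate overriding two defaults, as one would on the command line.
    args = build_parser().parse_args(["--filter_sizes", "2,3", "--num_epochs", "10"])
    print(parse_filter_sizes(args.filter_sizes), args.num_epochs, args.batch_size)
```

    The comma-separated string form of `--filter_sizes` keeps the flag a single value on the command line; it is split into integers only at the point of use.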

    Evaluating

    ./eval.py --eval_train --checkpoint_dir="./runs/18020311733/checkpoints/"

    Replace the checkpoint directory with the one produced by your training run. To evaluate on your own data, modify eval.py to load it.

    Run plot_tsne3.py to visualize the data representations produced by the feature extractor for sentiment analysis on the DVD domain.
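    Looking up the checkpoint directory by hand can be avoided with a small helper. The sketch below is an illustration under stated assumptions, not part of the repository: it assumes that training writes to `./runs/<run-id>/checkpoints/` (the layout shown in the evaluation command), that run ids are purely numeric, and that the largest id is the most recent run. `latest_checkpoint_dir` is a hypothetical name.

```python
from pathlib import Path


def latest_checkpoint_dir(runs_root="./runs"):
    """Return the checkpoints/ folder of the run with the largest numeric name.

    Returns None if the root is missing or contains no matching run.
    """
    root = Path(runs_root)
    if not root.is_dir():
        return None
    # Assumption: each run directory has a digits-only name (e.g. a timestamp)
    # and contains a checkpoints/ subdirectory.
    runs = [
        d for d in root.iterdir()
        if d.is_dir() and d.name.isdigit() and (d / "checkpoints").is_dir()
    ]
    if not runs:
        return None
    newest = max(runs, key=lambda d: int(d.name))
    return newest / "checkpoints"


if __name__ == "__main__":
    ckpt = latest_checkpoint_dir()
    if ckpt is None:
        print("No run directories found under ./runs")
    else:
        # Print an evaluation command in the form used above.
        print(f'./eval.py --eval_train --checkpoint_dir="{ckpt}/"')
```

    Run from the directory that contains `runs/`, this prints an evaluation command pointing at the newest run, or a short message if no run is found.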
