Repository for the domain adaptation project from "Examining Temporality in Document Classification" (ACL 2018)


Domain_Adaptation_ACL2018

This repository contains the domain adaptation project for the paper "Examining Temporality in Document Classification" (ACL 2018). Slides will be added soon.

Table of Contents

  • Installation
  • Data
  • Usage
  • Contact

Installation

  1. Platform:
  2. Run the following commands to create the environment:
  • conda env create -f environment.yml
  • source activate domain
  • python -m nltk.downloader punkt stopwords
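
To confirm that the NLTK packages named in the commands above are visible to the active interpreter, a check along the following lines may help. This is only a sketch and is not part of the repository: the function name `missing_nltk_resources` is hypothetical, and the paths `tokenizers/punkt` and `corpora/stopwords` are assumed to be the locations where NLTK conventionally stores these two packages.

```python
import importlib.util

# Assumed NLTK data locations for the two packages named in the install step.
REQUIRED = {"punkt": "tokenizers/punkt", "stopwords": "corpora/stopwords"}


def missing_nltk_resources(required=REQUIRED):
    """Return the names of the required NLTK packages that cannot be found."""
    if importlib.util.find_spec("nltk") is None:
        # NLTK itself is not importable, so every package counts as missing.
        return sorted(required)

    import nltk

    missing = []
    for name, path in required.items():
        try:
            nltk.data.find(path)  # raises LookupError when the resource is absent
        except LookupError:
            missing.append(name)
    return sorted(missing)


if __name__ == "__main__":
    missing = missing_nltk_resources()
    print("Missing NLTK resources:", ", ".join(missing) if missing else "none")
```

If the script reports missing packages, re-running the downloader command inside the activated environment is the likely remedy.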

Data

  1. Amazon CDs and Vinyl
  2. Yelp reviews of hotels and restaurants
  3. Political Platforms: Political Parties -> United States -> * Party Platform
  4. Economic News
  5. Vaccine Data

Usage

  1. Data extraction and sampling:
  • Extraction: run python extract_data.py inside each data folder to extract the data.
  • Sampling: go to the utils folder and run python under_sample.py.
  2. Run cross-domain classification from the project root folder: python domain_clf_analysis.py
  3. Generate the feature vectors: python build_feas.py
  4. Run grid search to find the optimal parameters: python grid_search.py
  5. Run domain adaptation (Sections 4.1 and 4.2 in the paper): python run_exps.py
  6. Combine seasonal and non-seasonal information: python build_sgd_base.py
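
The scripts listed above, from cross-domain classification through `build_sgd_base.py`, could be chained with a small driver such as the one below. This is a hedged sketch rather than a script shipped with the repository: `build_commands` and `run_pipeline` are hypothetical names, the scripts are assumed to take no command-line arguments (as in the commands shown), and the data extraction and sampling step is left out because the individual data folder names are not given here.

```python
import subprocess
import sys
from pathlib import Path

# Scripts from the usage steps, in the order listed (run from the project root).
PIPELINE = [
    "domain_clf_analysis.py",  # cross-domain classification
    "build_feas.py",           # feature vectors
    "grid_search.py",          # parameter search
    "run_exps.py",             # domain adaptation experiments
    "build_sgd_base.py",       # seasonal and non-seasonal combination
]


def build_commands(python=sys.executable, scripts=PIPELINE):
    """Return one argument list per script, in pipeline order."""
    return [[python, script] for script in scripts]


def run_pipeline(root=".", dry_run=True):
    """Run each script in order; stop at the first failure unless dry_run is set."""
    executed = []
    for cmd in build_commands():
        executed.append(cmd)
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True raises CalledProcessError if a script exits non-zero.
            subprocess.run(cmd, cwd=Path(root), check=True)
    return executed


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

The default is a dry run that only prints the commands; calling `run_pipeline(dry_run=False)` from the project root would execute them, stopping at the first script that fails so later steps do not run on incomplete outputs.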

Contact

xiaolei.huang@colorado.edu