Skip to content

Source codes for our paper "Neural Temporality Adaptation for Document Classification: Diachronic Word Embeddings and Domain Adaptation Models" at ACL 2019.


Notifications You must be signed in to change notification settings


Repository files navigation

Neural Temporality Adaptation and Diachronic (Dynamic) Word Embeddings

Source codes for our paper "Neural Temporality Adaptation for Document Classification: Diachronic Word Embeddings and Domain Adaptation Models" at ACL 2019.

Word Semantic Meanings Shift Overtime

Image of Semantic Shifts

Image is from

The meanings of words shift overtime~ This will bring tremendous effects on document models.

How Shifts Impact Document Classifiers

Impacts of the Shifts on Document Classifiers

Overall, it is clear that classifiers generally perform best when applied to the same time interval they were trained. Performance diminishes when applied to different time intervals, although different corpora exhibit differ patterns in the way in which the performance diminishes. The image is from our previous publication, Examining Temporality in Document Classification.

A Frustratingly Easy Method of Diachronic Word Embedding

We propose a diachronic word embedding using fastText. See the for how to train.


  1. Train it once, run everywhere. The embedding model just like normal word embedding model, we don't need to train additional equations after training your embeddings. All you need to do is to only train your embeddings once. No extra time needed.
  2. No Transformation Matrix and Transformation Errors. Since the trained model learn words across time jointly, it does not require to learn transformation matrices between time intervals.
  3. No extra space needed. Our method only requires the space for the embedding models.
  4. Support online learning and incremental training. Unlike other methods, our proposed method can incrementally learn new coming corpora.
  5. Extensible word vocabulary. Unlike the transformation or pivot method, our proposed method do not need to choose a fix number of vocabulary as the transformation matrix. Our method supports extensible words even from the new coming data.

Language Odyssey, What & Why the shifts happen?

Image of Semantic Shifts

We explore and analyze the shifts from three perspectives:

  1. Word usage change: the way to express opinions change;
  2. Word context change: contextual words are important parts to train word embeddings. The change of contextual words impact word embeddings and therefore neural document classifiers based on word embeddings.
  3. Semantic distance: after obtaining diachronic word embeddings, we treat each time period as a domain and then use Wasserstein distance to measure time shifts.

Generally, closer time intervals share higher overlap and have smaller semantic distance shifts, and vice versa.

Frequent vs. Polysemous Words

Our method is also useful for interpreting semantic shifts. The visualization below can help select features for the time-varying corpora: polysemous features (unigram) have shown more semantic variations than the most frequent features (unigram). Note that there are no overlaps between the most frequent and the polysemous feature sets.

Image of Semantic Shifts

The polysemous words were selected by the mutual information (feature selection process). Those words are recognized as the most important features and used for classifying documents. However variations of the features will impact stabilities of document classifiers over time.

Test Platform

Python 3.6+, Ubuntu 16.04

Experiment Preparation

  1. Install Conda and then install required Python packages via pip install -r requirements.txt;
  2. Train and obtain regular and diachronic word embeddings. Please refer to in the embeddings folder;
  3. Create domain data: python;
  4. Create general and domain tokenizers: python;
  5. Create weights for the embedding layer: python;
  6. Create train/valid/test data: You can either download our processed split data or run python;
  7. Convert data into indices: python

Data Analysis

  1. Please refer to the analysis folder.
  2. There are three main analysis perspectives: word usage, word context and semantic distance.
  3. To understand topic shifts and how the temporal factor impacts document classifers, please refer to our previous publication and its git repository.


  1. Please refer to the in the baselines folder.
  2. The datasets of baselines will be saved in the baselines_data folder.

Our approach

Intrinsic Evaluation for DWE (unpublished manuscript)

We conducted an intrinsic evaluation by a clustering task of word analogy. The evaluation will be available in my final Ph.D. thesis, while the manuscript was not published in the paper. You can refer to the experimental steps in this unpublished manuscript.

Image of Semantic Shifts


Because the experimental datasets are too large to share all of them. Please send any requests or questions to my email:


Please consider cite our work as the following:

    title = "Neural Temporality Adaptation for Document Classification: Diachronic Word Embeddings and Domain Adaptation Models",
    author = "Huang, Xiaolei and Paul, Michael J.",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "",
    pages = "136--146",
    abstract = "Language usage can change across periods of time, but document classifiers models are usually trained and tested on corpora spanning multiple years without considering temporal variations. This paper describes two complementary ways to adapt classifiers to shifts across time. First, we show that diachronic word embeddings, which were originally developed to study language change, can also improve document classification, and we show a simple method for constructing this type of embedding. Second, we propose a time-driven neural classification model inspired by methods for domain adaptation. Experiments on six corpora show how these methods can make classifiers more robust over time.",


Source codes for our paper "Neural Temporality Adaptation for Document Classification: Diachronic Word Embeddings and Domain Adaptation Models" at ACL 2019.







No releases published


No packages published