Fill-the-GAP

This is the 4th place solution to the Gendered Pronoun Resolution competition on Kaggle.

Solution Overview

1. Input Dropout

I've played with BERT on other tasks and found that there is some redundancy in the BERT vectors: even when we use only a small portion (around 50%) of a BERT vector, we can still get desirable performance.

Based on this observation, I placed a dropout layer with a large rate just after the input layer. This can be considered a kind of model boosting, similar to training several prototypes on subsets randomly sampled from the BERT vector.
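Below is a minimal Keras sketch of this idea (not the repository's exact code; the BERT dimension and dropout rate here are illustrative assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

# BERT vectors for the two candidate names (A, B) and the pronoun (P).
# 1024 is BERT-large's hidden size; the dimension here is illustrative.
bert_dim = 1024
inp_a = layers.Input(shape=(bert_dim,), name="bert_a")
inp_b = layers.Input(shape=(bert_dim,), name="bert_b")
inp_p = layers.Input(shape=(bert_dim,), name="bert_p")

# Dropout with a large rate right after the input layer: each training
# step effectively sees a random subset of the BERT dimensions, similar
# to fitting several prototypes on randomly sampled sub-vectors.
input_dropout = layers.Dropout(rate=0.6)  # hypothetical rate
a = input_dropout(inp_a)
b = input_dropout(inp_b)
p = input_dropout(inp_p)
```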

2. Word Encoder

As mentioned in section 1, it might not be suitable to use the BERT output directly because of this redundancy. Therefore, I use a word encoder to down-project the BERT vector into a lower-dimensional space where task-related features can be extracted efficiently.

The word encoder is a simple affine transformation with a SELU activation, and it is shared across A, B, and P. I tried designing independent word encoders for names and pronouns, and making the word encoder deeper with highway transformations, but all of these resulted in overfitting.

This idea is also inspired by multi-head transformations. I implemented a multi-head NLI encoder, but it only improved performance by ~0.0005 and cost much more computation time, so a single head seems good enough for this task.
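Continuing the sketch above, the shared word encoder could look like this (the projection size is a hypothetical choice, not the tuned value from the solution):

```python
# Shared affine transformation with SELU activation that down-projects
# the dropped-out BERT vectors for A, B, and P into one lower-dimensional
# space; the same weights are reused for all three words.
proj_dim = 256  # hypothetical projection size
word_encoder = layers.Dense(proj_dim, activation="selu", name="word_encoder")
enc_a = word_encoder(a)
enc_b = word_encoder(b)
enc_p = word_encoder(p)
```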

3. Answer Selection Using NLI Architectures

I consider this task a sub-task of answer selection. Given the queries A and B and an answer P, we can model the relation between each query and the answer with a heuristic interaction:

I(Q, A) = [[Q; A], Q - A, Q * A]

where [;] denotes concatenation, - the element-wise difference, and * the element-wise product. Features are then extracted from the interaction vector I(Q, A) with a siamese encoder. The overall architecture looks like this:

[Figure: overall model architecture]
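Continuing the same sketch, the interaction and siamese feature extraction might look like this (layer sizes, dropout rate, and the final three-way softmax head are assumptions, not the repository's exact configuration):

```python
def interact(q, a):
    # Heuristic interaction I(Q, A) = [[Q; A], Q - A, Q * A].
    return layers.Concatenate()(
        [q, a, layers.Subtract()([q, a]), layers.Multiply()([q, a])]
    )

# Siamese (weight-shared) encoder applied to both interaction
# vectors I(A, P) and I(B, P).
siamese_encoder = tf.keras.Sequential(
    [layers.Dense(64, activation="selu"), layers.Dropout(0.3)],
    name="siamese_encoder",
)
feat_a = siamese_encoder(interact(enc_a, enc_p))
feat_b = siamese_encoder(interact(enc_b, enc_p))

# Three-way prediction: the pronoun refers to A, to B, or to neither.
merged = layers.Concatenate()([feat_a, feat_b])
output = layers.Dense(3, activation="softmax")(merged)

model = tf.keras.Model(inputs=[inp_a, inp_b, inp_p], outputs=output)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Sharing one encoder across both query-answer pairs keeps the features for A and B comparable and halves the number of parameters relative to two independent encoders.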

Finally, here is a simple performance report of my models:

| Model | 5-fold CV on Stage 1 |
| --- | --- |
| Base BERT | 0.50 |
| Base BERT + input dropout | 0.45 |
| Base BERT + input dropout + NLI | 0.43 |
| Base BERT + all | 0.39 |
| Large BERT + input dropout | 0.39 |
| Large BERT + all | 0.32 |
| Ensemble of Base BERT and Large BERT | 0.30 |

Note

The code is still being cleaned up, and some dirty methods remain as a trade-off between efficiency and scalability. For the stage 0.1 ~ 0.6 notebooks, it's not necessary to use a for loop to dump the features from each layer; the official API supports dumping all of them at once.

Citation

If you find this repository useful for your research, please cite our paper:

@inproceedings{yang2019fill,
  title={Fill the GAP: Exploiting BERT for Pronoun Resolution},
  author={Yang, Kai-Chou and Niven, Timothy and Chou, Tzu Hsuan and Kao, Hung-Yu},
  booktitle={Proceedings of the First Workshop on Gender Bias in Natural Language Processing},
  pages={102--106},
  year={2019}
}