
# Jigsaw Unintended Bias in Toxicity Classification


A top-5% solution to the [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification) competition on Kaggle.

## Task

Detect toxic comments while minimizing unintended model bias.

## Data processing and tricks

1. Little text preprocessing is needed for the NN solutions.
2. A bucket sampler saves a lot of training time (2-3x faster); see the code for details and the sketch after this list:
   - split the data into buckets, with several batches of samples per bucket
   - sort samples by sequence length within each bucket
   - pad each batch to the max sequence length in that batch
3. A custom loss or sample weighting is required to mitigate model bias; see the loss sketch after this list.
4. Soft labels contain more information than hard labels and can be trained with BCE loss.
5. Pseudo-labeling is helpful for the LSTM-based NNs.
6. Knowledge distillation can compress an ensemble into a single model with comparable results.
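
A minimal sketch of such a bucket sampler is below. The class name, the `batches_per_bucket` default, and the `pad_to_batch_max` collate function are illustrative assumptions, not the repo's actual code:

```python
import numpy as np
from torch.utils.data import Sampler


class BucketBatchSampler(Sampler):
    """Group samples into buckets, sort by length inside each bucket, and
    yield index batches so each batch is padded only to its own max length."""

    def __init__(self, lengths, batch_size, batches_per_bucket=100):
        self.lengths = np.asarray(lengths)        # token count per sample
        self.batch_size = batch_size
        self.bucket_size = batch_size * batches_per_bucket

    def __iter__(self):
        order = np.random.permutation(len(self.lengths))  # reshuffle each epoch
        batches = []
        for start in range(0, len(order), self.bucket_size):
            bucket = order[start:start + self.bucket_size]
            # sort within the bucket so batch members have similar lengths
            bucket = bucket[np.argsort(self.lengths[bucket])]
            batches += [bucket[i:i + self.batch_size].tolist()
                        for i in range(0, len(bucket), self.batch_size)]
        np.random.shuffle(batches)                # avoid length-ordered epochs
        return iter(batches)

    def __len__(self):
        return (len(self.lengths) + self.batch_size - 1) // self.batch_size


# Usage: pass as batch_sampler together with a collate_fn that pads each
# batch to its own max length (pad_to_batch_max is hypothetical):
# loader = DataLoader(dataset,
#                     batch_sampler=BucketBatchSampler(lengths, batch_size=512),
#                     collate_fn=pad_to_batch_max)
```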

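Items 3, 4, and 6 can share one loss: BCE against soft targets with per-sample weights. The sketch below is illustrative; the function name, the weighting rule, and `identity_mask`/`alpha` are assumptions, not the competition's official weighting scheme.

```python
import torch.nn.functional as F


def weighted_soft_bce(logits, soft_targets, sample_weights):
    # BCE with logits accepts soft targets in [0, 1]; per-sample weights
    # let identity-mentioning comments count more, mitigating model bias.
    loss = F.binary_cross_entropy_with_logits(
        logits, soft_targets, reduction="none")
    return (loss * sample_weights).sum() / sample_weights.sum()


# Example weighting (hypothetical): up-weight identity-mentioning comments.
# identity_mask is 1.0 where any identity column is annotated, else 0.0:
#   weights = 1.0 + alpha * identity_mask
#
# For knowledge distillation, reuse the same loss with soft_targets set to
# the ensemble's predicted probabilities instead of annotator fractions.
```
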
## Architectures

1. Word embeddings + LSTM-based networks. The embeddings are also fine-tuned with a smaller learning rate. The whole network is trained with a one-cycle cosine-annealed learning rate schedule (ref).
2. BERT fine-tuning with a slanted triangular learning rate schedule (ref).
3. GPT-2 fine-tuning with slanted triangular learning rates.

The code for 2 and 3 is based on huggingface's code, and the notebooks are run on Kaggle kernels. A rough sketch of both learning rate schedules follows.
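
As a rough illustration (not the repo's implementation), both schedules can be written as `LambdaLR` multipliers; `total_steps`, `warmup_frac`, the learning rate, and the stand-in model are assumptions:

```python
import math

import torch


def slanted_triangular(step, total_steps, warmup_frac=0.1):
    # Linear warm-up over the first warmup_frac of steps, then linear decay.
    warmup = max(1, int(total_steps * warmup_frac))
    if step < warmup:
        return step / warmup
    return max(0.0, (total_steps - step) / (total_steps - warmup))


def one_cycle_cosine(step, total_steps, warmup_frac=0.1):
    # Linear warm-up, then cosine annealing from 1 down to 0.
    warmup = max(1, int(total_steps * warmup_frac))
    if step < warmup:
        return step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


model = torch.nn.Linear(10, 1)                  # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
total_steps = 1000                              # epochs * batches per epoch
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: slanted_triangular(step, total_steps))
# Call scheduler.step() once after each optimizer.step().
```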

## Explanatory visualization

## Dependencies

- PyTorch 1.2.0