
Mitigating Biases in Toxic Language Detection through Invariant Rationalization

This is the source code for our paper "Mitigating Biases in Toxic Language Detection through Invariant Rationalization", presented at the WOAH workshop (the 5th Workshop on Online Abuse and Harms) at ACL-IJCNLP 2021. Our code builds on the codebase of Challenges in Automated Debiasing for Toxic Language Detection.

To reproduce the experiments, first follow the instructions in the original repo to set up the environment, then download the dataset from Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior.

You should have a folder <toxic_data_dir> that contains three files: ND_founta_trn_dial_pAPI.csv, ND_founta_dev_dial_pAPI.csv, and ND_founta_tst_dial_pAPI.csv, which have the same format as data/demo.csv.
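
As a quick sanity check before training, you can confirm all three splits are in place (a minimal sketch; replace <toxic_data_dir> with your own path):

# Check that the three CSV splits exist in <toxic_data_dir>
for split in trn dev tst; do
  ls <toxic_data_dir>/ND_founta_${split}_dial_pAPI.csv
done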

Training

  • $seed is your random seed.
  • You should specify your own <output_dir>.

Vanilla

bash run_vanilla.sh <toxic_data_dir> $seed <output_dir>
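
For example, a hypothetical invocation with seed 42 might look like:

# Hypothetical paths: data in data/founta, checkpoints under outputs/vanilla_seed42
bash run_vanilla.sh data/founta 42 outputs/vanilla_seed42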

InvRat (lexical)

bash run_invrat_mention.sh <toxic_data_dir> $seed <output_dir>

InvRat (dialect)

bash run_invrat_dialect.sh <toxic_data_dir> $seed <output_dir>

Evaluation (compute Acc, F1, FPR)

  • As an example, we use the test set and step 56000. You can change the test set to the dev set, or compute the metrics at other steps (see the dev-set sketch after the Vanilla commands below).
  • You should specify your own <output_csv_filename>.

Vanilla

python to_ND.py <output_dir>/test_eval_results.txt-preds-step-56000.txt <toxic_data_dir>/ND_founta_tst_dial_pAPI.csv roberta <output_csv_filename>
python src/bias_stats.py <output_csv_filename> roberta data/word_based_bias_list.csv
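
For instance, assuming the dev-set prediction files follow the same naming pattern as the test-set ones, evaluating the dev split at the same step would look like:

# Assumed filename pattern; adjust if your run writes dev predictions under a different name
python to_ND.py <output_dir>/dev_eval_results.txt-preds-step-56000.txt <toxic_data_dir>/ND_founta_dev_dial_pAPI.csv roberta <output_csv_filename>
python src/bias_stats.py <output_csv_filename> roberta data/word_based_bias_list.csv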

InvRat (lexical or dialect)

  • You can compute results from either the invariant or the variant classifier; all results reported in our paper are from the invariant classifier (an example follows the commands below).
python inv_to_ND.py <output_dir>/test_eval_results-step-56000.txt-rationale-step-56000.txt <toxic_data_dir>/ND_founta_tst_dial_pAPI.csv <output_csv_filename>
python src/bias_stats.py <output_csv_filename> [variant/invariant] data/word_based_bias_list.csv
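
For example, scoring the invariant classifier on the test set, with hypothetical paths and filenames:

# Hypothetical example: invariant-classifier results from the dialect run at step 56000
python inv_to_ND.py outputs/invrat_dialect/test_eval_results-step-56000.txt-rationale-step-56000.txt data/founta/ND_founta_tst_dial_pAPI.csv invrat_dialect_test.csv
python src/bias_stats.py invrat_dialect_test.csv invariant data/word_based_bias_list.csv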
