A multilingual lexicon of words to hurt.

Hurtlex

HurtLex is a lexicon of offensive, aggressive, and hateful words in over 50 languages. The words are divided into 17 categories, plus a macro-category indicating whether a stereotype is involved. The 17 categories are:

Label Description
PS negative stereotypes and ethnic slurs
RCI locations and demonyms
PA professions and occupations
DDF physical disabilities and diversity
DDP cognitive disabilities and diversity
DMC moral and behavioral defects
IS words related to social and economic disadvantage
OR plants
AN animals
ASM male genitalia
ASF female genitalia
PR words related to prostitution
OM words related to homosexuality
QAS words with potential negative connotations
CDS derogatory words
RE felonies and words related to crime and immoral behavior
SVP words related to the seven deadly sins of the Christian tradition

Hurtlex has a two-level structure. Each lemma belongs to one of these levels:

  • conservative: obtained by translating offensive senses of the words in the original lexicon.
  • inclusive: obtained by translating all the potentially relevant senses of the words in the original lexicon.
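
As a rough illustration of how the categories and levels can be used together, the sketch below filters a lexicon by level and category. It assumes a tab-separated layout with columns named `category`, `stereotype`, `lemma`, and `level`; the exact file layout, the `yes`/`no` stereotype values, the placeholder lemmas, and the helper name `load_lemmas` are assumptions made for this example, not taken from the lexica themselves.

```python
import csv
import io

# Hypothetical sample in a tab-separated layout. The column names, the
# yes/no stereotype values, and the placeholder lemmas are assumptions
# for illustration, not real lexicon entries.
SAMPLE_TSV = (
    "category\tstereotype\tlemma\tlevel\n"
    "AN\tno\tlemma_a\tconservative\n"
    "AN\tno\tlemma_b\tinclusive\n"
    "PS\tyes\tlemma_c\tconservative\n"
)


def load_lemmas(tsv_text, level=None, category=None):
    """Return lemmas, optionally filtered by level and/or category."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["lemma"]
        for row in reader
        if (level is None or row["level"] == level)
        and (category is None or row["category"] == category)
    ]


# Keep only the stricter level, or only one category.
conservative = load_lemmas(SAMPLE_TSV, level="conservative")
animals = load_lemmas(SAMPLE_TSV, category="AN")
```

Filtering on `level="conservative"` would give a smaller, higher-precision word list, while including the inclusive level trades precision for coverage.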

Lexica

The Hurtlex word lists are available for the following languages and versions.

Language Available versions
AF 1.0 1.1
AR 1.0 1.1
BG 1.0 1.1
BN 1.0 1.1
CA 1.0 1.1
CS 1.0 1.1
CY 1.0 1.1
DA 1.0 1.1
DE 1.0 1.1
EL 1.0 1.1
EN 1.0 1.1
EO 1.0 1.1
ES 1.0 1.1
ET 1.0 1.1
EU 1.0 1.1
FA 1.0 1.1
FI 1.0 1.1
FR 1.0 1.1
GA 1.0 1.1
GL 1.0 1.1
HE 1.0 1.1
HI 1.0 1.1
HR 1.0 1.1
HU 1.0 1.1
ID 1.0 1.1
IS 1.0 1.1
IT 1.0 1.1
JA 1.0 1.1
KO 1.0 1.1
LT 1.0 1.1
LV 1.0 1.1
MK 1.0 1.1
MS 1.0 1.1
MT 1.0 1.1
NL 1.0 1.1
NO 1.0 1.1
PL 1.0 1.1
PT 1.0 1.1
RO 1.0 1.1
RU 1.0 1.1
SIMPLE 1.0 1.1
SK 1.0 1.1
SL 1.0 1.1
SQ 1.0 1.1
SR 1.0 1.1
SV 1.0 1.1
SW 1.0 1.1
TH 1.0 1.1
TL 1.0 1.1
TR 1.0 1.1
UK 1.0 1.1
VI 1.0 1.1
ZH 1.0 1.1

Publications

Hurtlex is described in this paper:

Elisa Bassignana, Valerio Basile, Viviana Patti. Hurtlex: A Multilingual Lexicon of Words to Hurt. In Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-It 2018)

http://ceur-ws.org/Vol-2253/paper49.pdf

Contribute

Contributions are welcome, in the form of revised lexica. Anyone who is a native speaker of a language is invited to fork the repository and file a pull request.

Please try to limit your modifications to the following operations:

  • add: add a new item to a lexicon, by creating a new line. Fill in all the column values, including category and stereotype, and set level="conservative"
  • remove: remove an item considered wrong for a lexicon, by removing the corresponding line
  • update: change the lemma or the category of an item, e.g. in case of misspelling or wrong alphabet
  • add offensiveness score: create a new column with a real value between 0 and 1 to indicate a score for the offensiveness of an item in a lexicon
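
Before filing a pull request, it may help to check each added or updated row against the constraints listed above. The following is a minimal sketch, assuming rows are represented as dictionaries keyed by column name; the column name `score` for the offensiveness score and the helper `validate_row` are hypothetical and only serve this example.

```python
# The 17 category labels from the table in this README.
CATEGORIES = {
    "PS", "RCI", "PA", "DDF", "DDP", "DMC", "IS", "OR", "AN",
    "ASM", "ASF", "PR", "OM", "QAS", "CDS", "RE", "SVP",
}
LEVELS = {"conservative", "inclusive"}


def validate_row(row):
    """Return a list of problems found in one lexicon row (empty if none)."""
    problems = []
    if not row.get("lemma", "").strip():
        problems.append("empty lemma")
    if row.get("category") not in CATEGORIES:
        problems.append("unknown category")
    if row.get("level") not in LEVELS:
        problems.append("unknown level")
    # "score" is an assumed name for the optional offensiveness column.
    score = row.get("score")
    if score is not None:
        try:
            value = float(score)
        except (TypeError, ValueError):
            problems.append("score is not a number")
        else:
            if not 0.0 <= value <= 1.0:
                problems.append("score outside [0, 1]")
    return problems
```

A row such as `{"lemma": "lemma_a", "category": "AN", "level": "conservative", "score": "0.7"}` passes with no problems reported, while an unknown category label or a score of 1.5 is flagged.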

Please create a new version directory for the lexicon you submit. If yours is the first manually corrected version of a lexicon (that is, the last version is 1.*), please create the directory for version 2.0. Otherwise, proceed incrementally (2.0 -> 2.1, 2.1 -> 2.2, ...).
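
The versioning rule can be sketched as a small function. This assumes version strings always have the form `major.minor`; the function name `next_version` is hypothetical.

```python
def next_version(last_version):
    """Return the directory version to create, given the latest existing one."""
    major, minor = (int(part) for part in last_version.split("."))
    if major < 2:
        # First manually corrected version of an automatically built lexicon.
        return "2.0"
    # Otherwise proceed incrementally on the minor number.
    return f"{major}.{minor + 1}"
```

For example, `next_version("1.1")` yields `"2.0"`, and `next_version("2.0")` yields `"2.1"`.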

Finally, do not forget to add a README.md file in your newly created directory, indicating what has changed and your contact information for due credit.
