<a href="https://colab.research.google.com/github/visumania/TFM-AMM/blob/main/cuadernos/baseline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task 6: Sexism Categorization in Memes

This subtask is a multi-label classification problem: each sexist meme is classified according to the categorization defined for Task 3:
1. Ideological and inequality
2. Stereotyping and dominance
3. Objectification
4. Sexual violence
5. Misogyny and non-sexual violence
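
As a minimal sketch of what multi-label means here, one annotator's set of categories can be encoded as a multi-hot vector over the five classes. The label strings are the ones used in the `labels_task6` field of the dataset; the helper name `encode_labels` is hypothetical and not part of any official tooling.

```python
# Category label strings as they appear in the dataset's labels_task6 field
CATEGORIES = [
    "IDEOLOGICAL-INEQUALITY",
    "STEREOTYPING-DOMINANCE",
    "OBJECTIFICATION",
    "SEXUAL-VIOLENCE",
    "MISOGYNY-NON-SEXUAL-VIOLENCE",
]


def encode_labels(labels):
    """Hypothetical helper: turn a list of category names into a multi-hot vector.

    Labels outside CATEGORIES (such as "-" or "UNKNOWN") are ignored.
    """
    return [1 if category in labels else 0 for category in CATEGORIES]


print(encode_labels(["IDEOLOGICAL-INEQUALITY", "OBJECTIFICATION"]))  # [1, 0, 1, 0, 0]
print(encode_labels(["-"]))  # [0, 0, 0, 0, 0]
```

A meme can therefore activate several positions at once, which is what distinguishes this task from single-label classification.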

## EXIST 2024 Memes Dataset

The EXIST 2024 Memes Dataset contains more than 5,000 labeled memes, in both English and Spanish. In particular, the training set contains 4,044 memes and the test set contains 1,053 memes. The distribution between the two languages is balanced.

The data are provided in **JSON format**. Each meme is represented as a JSON with the following attributes:
1. **id_EXIST**: a unique identifier for the meme.
2. **lang**: the language of the meme ("en" or "es").
3. **text**: the text automatically extracted from the meme.
4. **meme**: the name of the file that contains the meme.
5. **path_memes**: the path to the file that contains the meme.
6. **number_annotators**: the number of persons that have annotated the meme.
7. **annotators**: a unique identifier for each annotator.
8. **gender_annotators**: the gender of the different annotators. Possible values are: "F" and "M", for female and male respectively.
9. **age_annotators**: the age group of the different annotators. Possible values are: 18-22, 23-45, and 46+.
10. **ethnicities_annotators**: the self-reported ethnicity of the different annotators. Possible values are: "Black or African American", "Hispano or Latino", "White or Caucasian", "Multiracial", "Asian", "Asian Indian" and "Middle Eastern".
11. **study_levels_annotators**: the self-reported level of study achieved by the different annotators. Possible values are: "Less than high school diploma", "High school degree or equivalent", "Bachelor's degree", "Master's degree" and "Doctorate".
12. **countries_annotators**: the self-reported country where each annotator lives.
13. **labels_task4**: a set of labels (one for each annotator) indicating whether or not the meme contains sexist expressions or refers to sexist behaviours. Possible values are "YES" and "NO".
14. **labels_task5**: a set of labels (one for each annotator) recording the intention of the person who created the meme. Possible labels are: "DIRECT", "JUDGEMENTAL", "-", and "UNKNOWN".
15. **labels_task6**: a set of arrays of labels (one array for each of the annotators) indicating the type or types of sexism that are found in the meme. Possible labels are: "IDEOLOGICAL-INEQUALITY", "STEREOTYPING-DOMINANCE", "OBJECTIFICATION", "SEXUAL-VIOLENCE", "MISOGYNY-NON-SEXUAL-VIOLENCE", "-", and "UNKNOWN".
16. **split**: the subset of the dataset the meme belongs to ("TRAIN-MEME" or "TEST-MEME", followed by "_EN" or "_ES", e.g. "TRAIN-MEME_ES").
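
Because each meme carries one array of Task 6 labels per annotator, it can help to summarize them before modelling. The following is a minimal sketch, not the official aggregation procedure: it computes the fraction of annotators who selected each category. The helper name `category_vote_share` and the example votes are made up for illustration.

```python
from collections import Counter


def category_vote_share(labels_task6):
    """Hypothetical helper: fraction of annotators who chose each category.

    `labels_task6` is a list with one list of labels per annotator.
    The placeholders "-" and "UNKNOWN" are not counted as categories.
    """
    n_annotators = len(labels_task6)
    counts = Counter(
        label
        for annotator_labels in labels_task6
        for label in set(annotator_labels)
        if label not in ("-", "UNKNOWN")
    )
    return {label: count / n_annotators for label, count in counts.items()}


# Made-up votes from four annotators for a single meme
example = [
    ["-"],
    ["IDEOLOGICAL-INEQUALITY"],
    ["-"],
    ["IDEOLOGICAL-INEQUALITY", "OBJECTIFICATION"],
]
print(category_vote_share(example))
```

Thresholding these shares (for instance, keeping categories chosen by more than some fraction of annotators) is one possible way to obtain hard labels; the threshold itself would be a design choice.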

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import pandas as pd

In [17]:
dataset = pd.read_json('/content/drive/MyDrive/I2C/Adrián Moreno/EXIST2024/Datasets/training/EXIST2024_training.json', orient='index')
# Reset the default index, since it is not sequential
dataset = dataset.reset_index(drop=True)
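
The effect of `orient='index'` followed by `reset_index(drop=True)` can be seen on a toy input. This sketch assumes the file is a JSON object with one entry per meme, keyed by an identifier, which is what `orient='index'` expects; the two records below are shortened, made-up stand-ins for real entries.

```python
import io

import pandas as pd

# Toy JSON shaped like the assumed file layout: one object per meme, keyed by id
raw = io.StringIO(
    '{"110001": {"id_EXIST": 110001, "lang": "es"},'
    ' "212006": {"id_EXIST": 212006, "lang": "en"}}'
)

toy = pd.read_json(raw, orient="index")
print(toy.index.tolist())  # the JSON keys become the (non-sequential) index

toy = toy.reset_index(drop=True)
print(toy.index.tolist())  # [0, 1]
```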

In [9]:
dataset

Unnamed: 0,id_EXIST,lang,text,meme,path_memes,number_annotators,annotators,gender_annotators,age_annotators,ethnicities_annotators,study_levels_annotators,countries_annotators,labels_task4,labels_task5,labels_task6,split
0,110001,es,2+2=5 MITO Albert Einstein tenía bajo rendimie...,110001.jpeg,memes/110001.jpeg,6,"[Annotator_1, Annotator_2, Annotator_3, Annota...","[F, F, F, M, M, M]","[18-22, 23-45, 46+, 46+, 18-22, 23-45]","[Hispano or Latino, Hispano or Latino, Hispano...","[High school degree or equivalent, Master’s de...","[Mexico, Spain, Argentina, Spain, Mexico, Mexico]","[YES, YES, YES, YES, YES, YES]","[DIRECT, DIRECT, DIRECT, DIRECT, DIRECT, DIRECT]","[[IDEOLOGICAL-INEQUALITY, STEREOTYPING-DOMINAN...",TRAIN-MEME_ES
1,110002,es,CUANDO UNA MUJER VA A LUCHAR POR SUS DERECHOS,110002.jpeg,memes/110002.jpeg,6,"[Annotator_1, Annotator_2, Annotator_3, Annota...","[F, F, F, M, M, M]","[18-22, 23-45, 46+, 46+, 18-22, 23-45]","[Hispano or Latino, Hispano or Latino, Hispano...","[High school degree or equivalent, Master’s de...","[Mexico, Spain, Argentina, Spain, Mexico, Mexico]","[YES, YES, YES, YES, YES, YES]","[DIRECT, DIRECT, DIRECT, DIRECT, DIRECT, JUDGE...","[[IDEOLOGICAL-INEQUALITY, STEREOTYPING-DOMINAN...",TRAIN-MEME_ES
2,110003,es,ІЯ ЕГЕЯ Е MOA ¿El Partido Republicano busca pe...,110003.jpeg,memes/110003.jpeg,6,"[Annotator_1, Annotator_2, Annotator_3, Annota...","[F, F, F, M, M, M]","[18-22, 23-45, 46+, 46+, 18-22, 23-45]","[Hispano or Latino, Hispano or Latino, Hispano...","[High school degree or equivalent, Master’s de...","[Mexico, Spain, Argentina, Spain, Mexico, Mexico]","[YES, YES, NO, NO, NO, NO]","[DIRECT, DIRECT, -, -, -, -]","[[STEREOTYPING-DOMINANCE, OBJECTIFICATION, MIS...",TRAIN-MEME_ES
3,110004,es,"Paises que ""apoyan"" los derechos de la mujer A...",110004.jpeg,memes/110004.jpeg,6,"[Annotator_1, Annotator_2, Annotator_3, Annota...","[F, F, F, M, M, M]","[18-22, 23-45, 46+, 46+, 18-22, 23-45]","[Hispano or Latino, Hispano or Latino, Hispano...","[High school degree or equivalent, Master’s de...","[Mexico, Spain, Argentina, Spain, Mexico, Mexico]","[YES, YES, NO, NO, YES, NO]","[JUDGEMENTAL, JUDGEMENTAL, -, -, JUDGEMENTAL, -]","[[IDEOLOGICAL-INEQUALITY], [IDEOLOGICAL-INEQUA...",TRAIN-MEME_ES
4,110005,es,Ya verás como este 8 de marzo hay uno que te s...,110005.jpeg,memes/110005.jpeg,6,"[Annotator_1, Annotator_2, Annotator_3, Annota...","[F, F, F, M, M, M]","[18-22, 23-45, 46+, 46+, 18-22, 23-45]","[Hispano or Latino, Hispano or Latino, Hispano...","[High school degree or equivalent, Master’s de...","[Mexico, Spain, Argentina, Spain, Mexico, Mexico]","[NO, YES, NO, NO, YES, NO]","[-, JUDGEMENTAL, -, -, DIRECT, -]","[[-], [IDEOLOGICAL-INEQUALITY], [-], [-], [IDE...",TRAIN-MEME_ES
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4039,212006,en,u gon act like a bitch u gon die like a bitch,212006.jpeg,memes/212006.jpeg,6,"[Annotator_883, Annotator_884, Annotator_885, ...","[F, F, F, M, M, M]","[18-22, 23-45, 46+, 18-22, 23-45, 46+]","[Black or African American, White or Caucasian...","[Bachelor’s degree, Bachelor’s degree, Bachelo...","[South Africa, Poland, Canada, Poland, Italy, ...","[YES, YES, YES, YES, YES, YES]","[DIRECT, DIRECT, JUDGEMENTAL, JUDGEMENTAL, JUD...","[[STEREOTYPING-DOMINANCE, MISOGYNY-NON-SEXUAL-...",TRAIN-MEME_EN
4040,212007,en,SHE LOOKS LIKE EVERY OTHER BITCH LIKE makeamem...,212007.jpeg,memes/212007.jpeg,6,"[Annotator_883, Annotator_884, Annotator_885, ...","[F, F, F, M, M, M]","[18-22, 23-45, 46+, 18-22, 23-45, 46+]","[Black or African American, White or Caucasian...","[Bachelor’s degree, Bachelor’s degree, Bachelo...","[South Africa, Poland, Canada, Poland, Italy, ...","[YES, YES, YES, YES, YES, YES]","[JUDGEMENTAL, DIRECT, DIRECT, JUDGEMENTAL, JUD...","[[OBJECTIFICATION], [SEXUAL-VIOLENCE], [OBJECT...",TRAIN-MEME_EN
4041,212008,en,YOURE A BASIC BITCH CASE DISMISSED,212008.jpeg,memes/212008.jpeg,6,"[Annotator_883, Annotator_884, Annotator_885, ...","[F, F, F, M, M, M]","[18-22, 23-45, 46+, 18-22, 23-45, 46+]","[Black or African American, White or Caucasian...","[Bachelor’s degree, Bachelor’s degree, Bachelo...","[South Africa, Poland, Canada, Poland, Italy, ...","[NO, YES, YES, YES, YES, YES]","[-, JUDGEMENTAL, DIRECT, JUDGEMENTAL, JUDGEMEN...","[[-], [IDEOLOGICAL-INEQUALITY, STEREOTYPING-DO...",TRAIN-MEME_EN
4042,212009,en,WHEN YOU'RE AUNT HAS THIS WEIRD ASS MAN AND SH...,212009.jpeg,memes/212009.jpeg,6,"[Annotator_883, Annotator_884, Annotator_885, ...","[F, F, F, M, M, M]","[18-22, 23-45, 46+, 18-22, 23-45, 46+]","[Black or African American, White or Caucasian...","[Bachelor’s degree, Bachelor’s degree, Bachelo...","[South Africa, Poland, Canada, Poland, Italy, ...","[NO, NO, YES, NO, NO, NO]","[-, -, DIRECT, -, -, -]","[[-], [-], [STEREOTYPING-DOMINANCE], [-], [-],...",TRAIN-MEME_EN


Splitting the DataFrame by annotator

In [15]:
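columns_to_split = [
    'annotators',
    'gender_annotators',
    'age_annotators',
    'ethnicities_annotators',
    'study_levels_annotators',
    'countries_annotators',
    'labels_task4',
    'labels_task5',
    'labels_task6'
    ]

# pd.read_json already parses these columns as Python lists (one entry per
# annotator). Casting the whole frame to str and splitting on ',' would leave
# brackets and quotes in every piece, so only split values that are still strings.
for column in columns_to_split:
  dataset[column] = dataset[column].apply(
      lambda value: value.split(',') if isinstance(value, str) else value)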

In [28]:
# One empty DataFrame per annotator slot, each with the dataset's columns
df_anotador_1 = pd.DataFrame(columns=dataset.columns)
df_anotador_2 = pd.DataFrame(columns=dataset.columns)
df_anotador_3 = pd.DataFrame(columns=dataset.columns)
df_anotador_4 = pd.DataFrame(columns=dataset.columns)
df_anotador_5 = pd.DataFrame(columns=dataset.columns)
df_anotador_6 = pd.DataFrame(columns=dataset.columns)

In [22]:
# Sketch: keep only the first annotator's entry from each per-annotator column
df_anotador_1 = dataset.copy()
for column in columns_to_split:
  df_anotador_1[column] = dataset[column].str[0]
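
An alternative way to obtain one table per annotator slot, sketched here on a toy frame with made-up values, is `DataFrame.explode`, which turns each list entry into its own row. This assumes pandas 1.3 or later (needed to explode several columns at once) and that all per-annotator lists in a row have the same length. The `slot` column and the `per_slot` dictionary are hypothetical names.

```python
import pandas as pd

# Toy frame: two memes with three annotators each (values are made up)
toy = pd.DataFrame({
    "id_EXIST": [1, 2],
    "annotators": [["A1", "A2", "A3"], ["B1", "B2", "B3"]],
    "labels_task4": [["YES", "NO", "YES"], ["NO", "NO", "YES"]],
})

# One row per (meme, annotator) pair
long_df = toy.explode(["annotators", "labels_task4"], ignore_index=True)

# Position of each annotator within its meme: 0 for the first, 1 for the second, ...
long_df["slot"] = long_df.groupby("id_EXIST").cumcount()

# One DataFrame per annotator slot, keyed by position
per_slot = {
    slot: group.reset_index(drop=True)
    for slot, group in long_df.groupby("slot")
}

print(per_slot[0])
```

Keeping the frames in a dictionary avoids maintaining six separately named variables, and the same code works if the number of annotators per meme changes.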

In [24]:
import pandas as pd

# Create an example DataFrame
data = {'A': [1, 2, 3],
        'B': ['a', 'b', 'c']}
df = pd.DataFrame(data)
df

Unnamed: 0,A,B
0,1,a
1,2,b
2,3,c


In [25]:
# Iterate over the DataFrame rows using iterrows()
for indice, fila in df.iterrows():
    print("Row", indice)
    print("Values:")
    print(fila)
    print()

Row 0
Values:
A    1
B    a
Name: 0, dtype: object

Row 1
Values:
A    2
B    b
Name: 1, dtype: object

Row 2
Values:
A    3
B    c
Name: 2, dtype: object

