# This notebook loads the `preprocessed_500_annotated_tweets_with_predictions.csv` file provided by Yalda in June 2025 and compares the performance of the locally trained models with local and API-based LLMs (Llama and GPT-o4-mini)

## GPT Predictions

In [1]:
import pandas as pd
import os

In [2]:
preprocessed_annotated_tweets_df = pd.read_csv(os.path.join("data", "raw", "preprocessed_500_annotated_tweets_with_predictions.csv"))
preprocessed_annotated_tweets_df.head()

Unnamed: 0,full_text,prolife_prochoice,binary_morality,care,fairness,authority,loyalty,purity,binary_morality_predicted,care_predicted,purity_predicted,loyalty_predicted,authority_predicted,fairness_predicted,prolife_prochoice_predicted
0,The conversations I have with #prolife org rep...,choice,moral,1,0,0,0,0,moral,1,0,0,0,0,choice
1,This is why @VoteChoice Local Impact Report ma...,choice,moral,0,0,1,0,0,non-moral,0,0,1,0,1,choice
2,This fight about Roe V Wade is insane. Doesn't...,choice,moral,0,1,0,0,0,non-moral,0,0,1,0,1,choice
3,Rise &amp; shine! It's a new day and you're al...,life,moral,0,0,1,0,1,moral,1,1,0,0,0,life
4,@VP @March_for_Life I am there in spirit.\n #b...,life,moral,1,0,0,1,0,non-moral,1,0,0,1,0,life


In [3]:
from openai import OpenAI
from tqdm.auto import tqdm
import json

client = OpenAI(api_key="<OPENAI_API_KEY>")

In [4]:
binary_morality_instructions = (
    "You are an expert in moral psychology and discourse analysis. Your task is to analyze a tweet about abortion and determine "
    "whether the tweet expresses a moral concern (i.e., appeals to right/wrong, good/bad, justice, duty, loyalty, harm, purity, etc.) or not.\n"
    "Output a JSON object with exactly one key 'label' whose value is either 'moral' or 'non-moral'.\n"
    "- 'moral': The tweet appeals to values like harm, fairness, justice, loyalty, obedience, sanctity, or other moral principles.\n"
    "- 'non-moral': The tweet is informational, emotional, or opinion-based without referencing a moral concern.\n\n"
    "Here are some examples of tweets with their moral labels:\n"
    "1. `My opinion on abortion and when it can be done` - 'non-moral'\n"
    "2. `We who reject #abortion do not reject those who have had abortions.\nRather, we embrace them with mercy.\nI serve as Pastoral Director of the worlds largest ministry for healing after abortion.\nHelp others find these resources at our site, https://t.co/AuQitfVBU1` - 'moral'\n"
    "3. `Life is a gift from God, not something to be ended at will. When we destroy the unborn, we stray from the path He set for us.` - 'moral'"
)

# I have changed "one or more of the following five moral foundations" to "Output a JSON object with key foundations whose value is an array listing every moral foundation that appears in the tweet."
multilabel_morality_instructions = (
    "You are an expert in moral psychology and discourse analysis. Your task is to analyze a tweet previously classified as being a 'moral' tweet about abortion and assign "
    "one or more relevant moral foundations based on Moral Foundations Theory (MFT). A tweet can reflect support for or violation of a foundation; both count.\n"
    "Output a JSON object with key 'foundations' whose value is an array listing every moral foundation that appears in the tweet.\n"
    "- 'care': Concern with harm, suffering, compassion, empathy, or cruelty. Includes: protecting the vulnerable, preventing suffering, calling something heartless.\n"
    "- 'fairness': Focus on justice, rights, equality, or cheating. Includes: freedom, discrimination, fairness, hypocrisy, bias.\n"
    "- 'authority': Respect for tradition, leadership, law, or social order, or defiance of them. Includes: obedience, discipline, or rebellion against unjust authority.\n"
    "- 'loyalty': Group belonging, patriotism, betrayal, or solidarity. Includes: in-group vs. out-group framing, 'us vs. them', loyalty to cause or movement.\n"
    "- 'purity': Moral disgust, sanctity, sin, degradation, or contamination. Includes: bodily purity, spiritual corruption, abortion as sacred or sinful.\n"
    "If the tweet does not express a moral concern, output an empty 'foundations' array.\n\n"
    "Here are some examples of tweets with their moral foundations:\n"
    "1. `We who reject #abortion do not reject those who have had abortions.\nRather, we embrace them with mercy.\nI serve as Pastoral Director of the worlds largest ministry for healing after abortion.\nHelp others find these resources at our site, https://t.co/AuQitfVBU1` - ['care']\n"
    "2. `@elisa1121 Why does anyone think they have the right to tell a woman what to do?? Are all you republicans going to pay for this unwanted child once it’s born? No? Of course not! You only care about it unborn and once it’s born y’all like oh well. #prochoice` - ['fairness']\n"
    "3. `We should stand by the women in our communities not turn our backs on them when they need us most.` - ['loyalty']\n"
    "4. `Life is a gift from God, not something to be ended at will. When we destroy the unborn, we stray from the path He set for us.` - ['purity']\n"
    "5. `@kdmport @TexasTribune This 👆🏽 is how we do it.\n\nWhat do we want? Power \n\nHow do we get it? We win elections.\n\nHow do we win elections? We vote for #Democrats.\n\n#EndTheGOP\n#PresidentDeath\n#TrumpLiesAmericansDie\n#VoteBlue\n#VoteByMail\n#GOTV\n#prochoice https://t.co/ohxZOBEqgj` - ['authority']\n"
    "6. `Please sign this petition to protect special babies! They deserve love, not abortion: http://t.co/Z8NaIPHq85 #prolife http://t.co/kN3TUgDib6` - ['care', 'purity', 'loyalty']\n"
    "7. `Easy being #antichoice when you're not the one pregnant! #prochoice = #prolofe #feminism http://t.co/gNmKxGDhm9 #abortion #waronwomen #fem2` - ['fairness', 'care']"
)

abortion_instructions = (
    "Your task is to classify tweets about abortion into one of four categories: "
    "'life', 'choice', 'neutral', or 'throw_out'.\n"
    "Read the tweet carefully and output a JSON object with exactly one key 'label' whose value must be one of the following categories, based on the guidelines below:\n"
    "- 'life': The tweet expresses support for anti-abortion views or policies, defends the rights of the unborn, criticizes abortion, or promotes life as the moral choice.\n"
    "- 'choice': The tweet expresses support for the legal right to abortion, bodily autonomy, reproductive freedom, or criticizes anti-abortion policies.\n"
    "- 'neutral': The tweet discusses abortion-related topics (e.g., laws, elections, news coverage, statistics) without showing a clear stance in favor of or against abortion. May include sarcasm, questions, or observations that are hard to categorize.\n"
    "- 'throw_out': The tweet is not about abortion or is so vague or unclear that it cannot be reliably labeled.\n\n"
    "Here are some examples of tweets with their labels:\n"
    "1. `The Barstool Bros’ Split Over Abortion Could Determine the Future of the GOP` - 'neutral'\n"
    "2. `@tdmalone1016 @LifeNewsHQ Urgent time for all the Churches, Catholic or Protestant, which support, recognize and accept Abortion should repent now! Unless, all shall be punished and perished sooner or later. Amen` - 'life'\n"
    "3. `Abortion is not the exclusive need of any political group. Your 60 million ???, could that be 60/40, 50/50 or maybe 40/60, assuming your number is correct.\n You open a political sympathy agenda by using the word 'babies' when the correct term is uterus. There is a legal Pt.` - 'choice'\n"
    "4. `Is your complaint that 36 minutes means that they gave less than 11 seconds of air time to each of Walker’s scandals? Who do you think got changed? His abortion ladies or the baby mommas he pointed guns at?` - 'throw_out'"
)

In [5]:
abortion_schema = {
    "type": "object",
    "properties": {
        "label": {
            "type": "string",
            "enum": ["life", "choice", "neutral", "throw_out"]
        }
    },
    "required": ["label"],
    "additionalProperties": False
}

binary_schema = {
    "type": "object",
    "properties": {
        "label": {
            "type": "string",
            "enum": ["moral", "non-moral"]
        }
    },
    "required": ["label"],
    "additionalProperties": False
}

multilabel_schema = {
    "type": "object",
    "properties": {
        "foundations": {
            "type": "array",
            "items": {
                "type": "string",
                "enum": ["care", "fairness", "authority", "loyalty", "purity"]
            },
            "minItems": 0,
            "maxItems": 5
        }
    },
    "required": ["foundations"],
    "additionalProperties": False
}

In [6]:
def classify_with_schema(text, instructions, schema):
    response = client.responses.create(
        model="o4-mini-2025-04-16",
        instructions=instructions,
        input=text,
        text={
            "format": {
                "type": "json_schema",      # Use JSON Schema mode
                "name": "classification_result",
                "schema": schema,           # Your JSON Schema dict
                "strict": True              # Enforce schema compliance
            }
        }
    )
    return json.loads(response.output_text)
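
Each call to `classify_with_schema` is a network request, and the loops in this notebook issue hundreds of them, so a single transient failure can abort a long run. The following is a minimal sketch of a retry wrapper with exponential backoff. The name `call_with_retries`, its parameters, and the delay values are hypothetical and not part of the original notebook; it catches every `Exception` for brevity, which one would likely narrow to the client's transient error types.

```python
import time


def call_with_retries(fn, *args, max_retries=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying on any exception with exponential backoff.

    Hypothetical helper: `max_retries`, `base_delay`, and `sleep` are illustrative.
    """
    last_error = None
    for attempt in range(max_retries):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # assumption: narrow this to transient errors in practice
            last_error = exc
            if attempt < max_retries - 1:
                # Wait 1x, 2x, 4x ... the base delay before the next attempt.
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Under these assumptions, a call inside the loop could read `call_with_retries(classify_with_schema, text, abortion_instructions, abortion_schema)`.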

In [7]:
# 4) Load your DataFrame
df = preprocessed_annotated_tweets_df.copy()

In [8]:
# 5) Prepare new columns
# df["prolife_prochoice_gpt-o4-mini_predicted"] = None
# df["binary_morality_gpt-o4-mini_predicted"]   = None
# for f in ["care", "fairness", "authority", "loyalty", "purity"]:
#     df[f"{f}_gpt-o4-mini_predicted"] = 0

# df

Unnamed: 0,full_text,prolife_prochoice,binary_morality,care,fairness,authority,loyalty,purity,binary_morality_predicted,care_predicted,...,authority_predicted,fairness_predicted,prolife_prochoice_predicted,prolife_prochoice_gpt-o4-mini_predicted,binary_morality_gpt-o4-mini_predicted,care_gpt-o4-mini_predicted,fairness_gpt-o4-mini_predicted,authority_gpt-o4-mini_predicted,loyalty_gpt-o4-mini_predicted,purity_gpt-o4-mini_predicted
0,The conversations I have with #prolife org rep...,choice,moral,1,0,0,0,0,moral,1,...,0,0,choice,,,0,0,0,0,0
1,This is why @VoteChoice Local Impact Report ma...,choice,moral,0,0,1,0,0,non-moral,0,...,0,1,choice,,,0,0,0,0,0
2,This fight about Roe V Wade is insane. Doesn't...,choice,moral,0,1,0,0,0,non-moral,0,...,0,1,choice,,,0,0,0,0,0
3,Rise &amp; shine! It's a new day and you're al...,life,moral,0,0,1,0,1,moral,1,...,0,0,life,,,0,0,0,0,0
4,@VP @March_for_Life I am there in spirit.\n #b...,life,moral,1,0,0,1,0,non-moral,1,...,1,0,life,,,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,@janevandis My mother had an ectopic pregnancy...,choice,non-moral,0,0,0,0,0,non-moral,0,...,0,0,choice,,,0,0,0,0,0
496,"@jkfecke Interestingly, the first 3 can only b...",choice,non-moral,0,0,0,0,0,non-moral,0,...,0,0,choice,,,0,0,0,0,0
497,We're going to put Roe v Wade on the SD ballot...,neutral,non-moral,0,0,0,0,0,non-moral,0,...,0,0,choice,,,0,0,0,0,0
498,If you still fucking with this economy and pos...,choice,non-moral,0,0,0,0,0,non-moral,0,...,0,0,choice,,,0,0,0,0,0


In [16]:
for i, row in tqdm(df.iterrows(), total=len(df)):
    text = row["full_text"]
    # (a) Abortion stance
    # out1 = classify_with_schema(text, abortion_instructions, abortion_schema)
    # print(f"Index: {i}, text: {text}, predicted: {out1['label']}")
    # df.at[i, "prolife_prochoice_gpt-o4-mini_predicted"] = out1["label"]

    # out2 = classify_with_schema(text, binary_morality_instructions, binary_schema)
    # print(f"Index: {i}, text: {text}, predicted: {out2['label']}")
    # df.at[i, "binary_morality_gpt-o4-mini_predicted"]   = out2["label"]

    if row["binary_morality_gpt-o4-mini_predicted"] == "moral":
        out3 = classify_with_schema(text, multilabel_morality_instructions, multilabel_schema)
        print(f"Index: {i}, text: {text}, predicted: {out3['foundations']}")
        for foundation in out3["foundations"]:
            df.at[i, f"{foundation}_gpt-o4-mini_predicted"] = 1 
    else:
        print(f"Index: {i}, text: {text}, is not previously predicted as moral")
        df.at[i, "care_gpt-o4-mini_predicted"] = 0
        df.at[i, "fairness_gpt-o4-mini_predicted"] = 0
        df.at[i, "authority_gpt-o4-mini_predicted"] = 0
        df.at[i, "loyalty_gpt-o4-mini_predicted"] = 0
        df.at[i, "purity_gpt-o4-mini_predicted"] = 0

  0%|          | 0/500 [00:00<?, ?it/s]

Index: 0, text: The conversations I have with #prolife org reps that want to use my rape conception &amp; adoption story as PR would be even more incredibly psychologically damaging than they already are - if not for my already deep understanding of caring for myself. #abortion #adoption #prochoice, is not previously predicted as moral
Index: 1, text: This is why @VoteChoice Local Impact Report matters more than ever. Read the report at and elect #prochoice champions in every election everywhere. City Council members are proving they have an effect on #ReproductiveFreedom NOW., predicted: ['fairness']
Index: 2, text: This fight about Roe V Wade is insane. Doesn't make any sense to overturn., is not previously predicted as moral
Index: 3, text: Rise &amp; shine! It's a new day and you're alive. Thank you for another day of life Lord. Amen #Christian #prolife #PraytoEndAbortion, predicted: ['care', 'purity']
Index: 4, text: @VP @March_for_Life I am there in spirit.
 #babylivesmatter, pre
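
The moral branch of the loop above only ever writes 1s, so any foundation the model omits keeps whatever value the column already held. That works as long as the columns start at 0. A small sketch that instead builds all five flags from the returned list in one step is shown below; `foundations_to_flags` and `FOUNDATIONS` are hypothetical names, and the default suffix simply mirrors the column naming used in this notebook.

```python
FOUNDATIONS = ["care", "fairness", "authority", "loyalty", "purity"]


def foundations_to_flags(foundations, suffix="_gpt-o4-mini_predicted"):
    """Map a list of predicted foundations to a 0/1 dict with one key per column."""
    predicted = set(foundations)
    return {f"{name}{suffix}": int(name in predicted) for name in FOUNDATIONS}


flags = foundations_to_flags(["care", "purity"])
print(flags)
```

Each key/value pair can then be written with `df.at[i, column] = value`, which overwrites stale values from an earlier run as well.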

In [17]:
df

Unnamed: 0,full_text,prolife_prochoice,binary_morality,care,fairness,authority,loyalty,purity,binary_morality_predicted,care_predicted,...,authority_predicted,fairness_predicted,prolife_prochoice_predicted,prolife_prochoice_gpt-o4-mini_predicted,binary_morality_gpt-o4-mini_predicted,care_gpt-o4-mini_predicted,fairness_gpt-o4-mini_predicted,authority_gpt-o4-mini_predicted,loyalty_gpt-o4-mini_predicted,purity_gpt-o4-mini_predicted
0,The conversations I have with #prolife org rep...,choice,moral,1,0,0,0,0,moral,1,...,0,0,choice,choice,non-moral,0,0,0,0,0
1,This is why @VoteChoice Local Impact Report ma...,choice,moral,0,0,1,0,0,non-moral,0,...,0,1,choice,choice,moral,0,1,0,0,0
2,This fight about Roe V Wade is insane. Doesn't...,choice,moral,0,1,0,0,0,non-moral,0,...,0,1,choice,choice,non-moral,0,0,0,0,0
3,Rise &amp; shine! It's a new day and you're al...,life,moral,0,0,1,0,1,moral,1,...,0,0,life,life,moral,1,0,0,0,1
4,@VP @March_for_Life I am there in spirit.\n #b...,life,moral,1,0,0,1,0,non-moral,1,...,1,0,life,life,moral,1,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,@janevandis My mother had an ectopic pregnancy...,choice,non-moral,0,0,0,0,0,non-moral,0,...,0,0,choice,choice,non-moral,0,0,0,0,0
496,"@jkfecke Interestingly, the first 3 can only b...",choice,non-moral,0,0,0,0,0,non-moral,0,...,0,0,choice,neutral,non-moral,0,0,0,0,0
497,We're going to put Roe v Wade on the SD ballot...,neutral,non-moral,0,0,0,0,0,non-moral,0,...,0,0,choice,choice,non-moral,0,0,0,0,0
498,If you still fucking with this economy and pos...,choice,non-moral,0,0,0,0,0,non-moral,0,...,0,0,choice,neutral,non-moral,0,0,0,0,0


In [18]:
df.to_csv(os.path.join("data", "raw", "preprocessed_500_annotated_tweets_with_predictions_o4-mini.csv"), index=False)

## Llama Predictions

In [19]:
import os
import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Read the Hugging Face access token from the environment instead of hard-coding it
hf_token = os.environ.get("HF_TOKEN")

device = "mps" if torch.backends.mps.is_available() else "cpu"

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(
    model_id, token=hf_token
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    token=hf_token,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

llama_pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=64,
    do_sample=False,
    return_full_text=False
)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Device set to use mps
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


In [20]:
import re
import json
import logging
from jsonschema import validate, ValidationError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.WARNING)

# Fallback defaults
FALLBACK = {
    "abortion":   {"label": "neutral"},
    "binary":     {"label": "non-moral"},
    "multilabel": {"foundations": []},
}

def classify_with_llama(
    text: str,
    instructions: str,
    schema: dict,
    task: str,
    max_retries: int = 3
) -> dict:
    """
    - messages: system + user
    - regex-grab first {...}
    - json.loads + jsonschema.validate
    - retry up to `max_retries`
    - fallback if still invalid
    """
    messages = [
        {"role": "system",  "content": instructions},
        {"role": "user",    "content": text + "\n\nJSON:"}
    ]
    
    for attempt in range(1, max_retries + 1):
        # pass messages list directly into the pipeline
        out = llama_pipe(messages, max_new_tokens=64, do_sample=False)[0]["generated_text"]
        
        # find the first JSON object
        m = re.search(r"\{.*?\}", out, re.DOTALL)
        if not m:
            logger.warning(f"[{task}][attempt {attempt}] no JSON found")
            continue
        
        json_str = m.group(0)
        try:
            obj = json.loads(json_str)
        except json.JSONDecodeError as e:
            logger.warning(f"[{task}][attempt {attempt}] JSON parse error: {e}")
            continue
        
        try:
            validate(instance=obj, schema=schema)
            return obj
        except ValidationError as e:
            logger.warning(f"[{task}][attempt {attempt}] schema error: {e.message}")
            continue
    
    logger.error(f"[{task}] all {max_retries} attempts failed, using fallback")
    return FALLBACK[task]
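
The extraction step inside `classify_with_llama` can be exercised without loading the model. The sketch below isolates the same regular expression and `json.loads` call in a hypothetical helper, `extract_first_json`, and runs it on made-up model outputs (the sample strings are illustrative, not real Llama generations).

```python
import json
import re


def extract_first_json(generated: str):
    """Parse the first {...} span in `generated`; return None if absent or invalid."""
    match = re.search(r"\{.*?\}", generated, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None


# Chatty output with a flat JSON object embedded in it
print(extract_first_json('Sure! {"foundations": ["care", "purity"]} Hope that helps.'))
# No JSON at all
print(extract_first_json("I cannot classify this tweet."))
# Nested object: the non-greedy pattern stops at the first closing brace
print(extract_first_json('{"outer": {"inner": 1}}'))
```

The last call returns `None` because the non-greedy pattern cuts a nested object short. That is harmless for the flat schemas used here, but it would matter if a schema ever contained nested objects.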

In [21]:
import pandas as pd
from tqdm.auto import tqdm

df = pd.read_csv("data/raw/preprocessed_500_annotated_tweets_with_predictions_o4-mini.csv")

df["prolife_prochoice_llama3_predicted"] = None
df["binary_morality_llama3_predicted"]   = None
for f in ["care","fairness","authority","loyalty","purity"]:
    df[f"{f}_llama3_predicted"] = 0

df

Unnamed: 0,full_text,prolife_prochoice,binary_morality,care,fairness,authority,loyalty,purity,binary_morality_predicted,care_predicted,...,authority_gpt-o4-mini_predicted,loyalty_gpt-o4-mini_predicted,purity_gpt-o4-mini_predicted,prolife_prochoice_llama3_predicted,binary_morality_llama3_predicted,care_llama3_predicted,fairness_llama3_predicted,authority_llama3_predicted,loyalty_llama3_predicted,purity_llama3_predicted
0,The conversations I have with #prolife org rep...,choice,moral,1,0,0,0,0,moral,1,...,0,0,0,,,0,0,0,0,0
1,This is why @VoteChoice Local Impact Report ma...,choice,moral,0,0,1,0,0,non-moral,0,...,0,0,0,,,0,0,0,0,0
2,This fight about Roe V Wade is insane. Doesn't...,choice,moral,0,1,0,0,0,non-moral,0,...,0,0,0,,,0,0,0,0,0
3,Rise &amp; shine! It's a new day and you're al...,life,moral,0,0,1,0,1,moral,1,...,0,0,1,,,0,0,0,0,0
4,@VP @March_for_Life I am there in spirit.\n #b...,life,moral,1,0,0,1,0,non-moral,1,...,0,1,0,,,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,@janevandis My mother had an ectopic pregnancy...,choice,non-moral,0,0,0,0,0,non-moral,0,...,0,0,0,,,0,0,0,0,0
496,"@jkfecke Interestingly, the first 3 can only b...",choice,non-moral,0,0,0,0,0,non-moral,0,...,0,0,0,,,0,0,0,0,0
497,We're going to put Roe v Wade on the SD ballot...,neutral,non-moral,0,0,0,0,0,non-moral,0,...,0,0,0,,,0,0,0,0,0
498,If you still fucking with this economy and pos...,choice,non-moral,0,0,0,0,0,non-moral,0,...,0,0,0,,,0,0,0,0,0


In [28]:
for i, row in tqdm(df.iterrows(), total=len(df)):
    txt = row["full_text"]
    
    # Abortion stance
    # out1 = classify_with_llama(txt, abortion_instructions, abortion_schema, task="abortion")
    # logger.debug(f"Index: {i}, text: {txt}, predicted: {out1['label']}")
    # df.at[i, "prolife_prochoice_llama3_predicted"] = out1["label"]
    
    # Binary morality
    # out2 = classify_with_llama(txt, binary_morality_instructions, binary_schema, task="binary")
    # logger.debug(f"Index: {i}, text: {txt}, predicted: {out2['label']}")
    # df.at[i, "binary_morality_llama3_predicted"] = out2["label"]
    
    # Multilabel foundations only if moral.
    # Note: this gates on the o4-mini binary prediction, not on Llama's own binary prediction.
    if df.at[i, "binary_morality_gpt-o4-mini_predicted"] == "moral":
        out3 = classify_with_llama(txt, multilabel_morality_instructions, multilabel_schema, task="multilabel")
        logger.debug(f"Index: {i}, text: {txt}, predicted: {out3['foundations']}")
        for f in out3["foundations"]:
            df.at[i, f"{f}_llama3_predicted"] = 1
    else:
        logger.debug(f"Index: {i}, text: {txt}, is not previously predicted as moral")
        for f in ["care","fairness","authority","loyalty","purity"]:
            df.at[i, f"{f}_llama3_predicted"] = 0

  0%|          | 0/500 [00:00<?, ?it/s]

The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.

In [29]:
df

Unnamed: 0,full_text,prolife_prochoice,binary_morality,care,fairness,authority,loyalty,purity,binary_morality_predicted,care_predicted,...,authority_gpt-o4-mini_predicted,loyalty_gpt-o4-mini_predicted,purity_gpt-o4-mini_predicted,prolife_prochoice_llama3_predicted,binary_morality_llama3_predicted,care_llama3_predicted,fairness_llama3_predicted,authority_llama3_predicted,loyalty_llama3_predicted,purity_llama3_predicted
0,The conversations I have with #prolife org rep...,choice,moral,1,0,0,0,0,moral,1,...,0,0,0,choice,non-moral,0,0,0,0,0
1,This is why @VoteChoice Local Impact Report ma...,choice,moral,0,0,1,0,0,non-moral,0,...,0,0,0,choice,non-moral,0,1,1,1,0
2,This fight about Roe V Wade is insane. Doesn't...,choice,moral,0,1,0,0,0,non-moral,0,...,0,0,0,choice,non-moral,0,0,0,0,0
3,Rise &amp; shine! It's a new day and you're al...,life,moral,0,0,1,0,1,moral,1,...,0,0,1,life,moral,1,0,0,1,1
4,@VP @March_for_Life I am there in spirit.\n #b...,life,moral,1,0,0,1,0,non-moral,1,...,0,1,0,life,moral,0,0,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,@janevandis My mother had an ectopic pregnancy...,choice,non-moral,0,0,0,0,0,non-moral,0,...,0,0,0,choice,non-moral,0,0,0,0,0
496,"@jkfecke Interestingly, the first 3 can only b...",choice,non-moral,0,0,0,0,0,non-moral,0,...,0,0,0,neutral,non-moral,0,0,0,0,0
497,We're going to put Roe v Wade on the SD ballot...,neutral,non-moral,0,0,0,0,0,non-moral,0,...,0,0,0,neutral,non-moral,0,0,0,0,0
498,If you still fucking with this economy and pos...,choice,non-moral,0,0,0,0,0,non-moral,0,...,0,0,0,choice,non-moral,0,0,0,0,0


In [30]:
df.to_csv("data/raw/preprocessed_500_annotated_tweets_with_predictions_o4-mini_llama3.csv", index=False)

## Metric Calculation

In [31]:
import pandas as pd
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    classification_report, confusion_matrix,
    hamming_loss, precision_recall_fscore_support
)

df = pd.read_csv(os.path.join("data", "raw", "preprocessed_500_annotated_tweets_with_predictions_o4-mini_llama3.csv"))
df

Unnamed: 0,full_text,prolife_prochoice,binary_morality,care,fairness,authority,loyalty,purity,binary_morality_predicted,care_predicted,...,authority_gpt-o4-mini_predicted,loyalty_gpt-o4-mini_predicted,purity_gpt-o4-mini_predicted,prolife_prochoice_llama3_predicted,binary_morality_llama3_predicted,care_llama3_predicted,fairness_llama3_predicted,authority_llama3_predicted,loyalty_llama3_predicted,purity_llama3_predicted
0,The conversations I have with #prolife org rep...,choice,moral,1,0,0,0,0,moral,1,...,0,0,0,choice,non-moral,0,0,0,0,0
1,This is why @VoteChoice Local Impact Report ma...,choice,moral,0,0,1,0,0,non-moral,0,...,0,0,0,choice,non-moral,0,1,1,1,0
2,This fight about Roe V Wade is insane. Doesn't...,choice,moral,0,1,0,0,0,non-moral,0,...,0,0,0,choice,non-moral,0,0,0,0,0
3,Rise &amp; shine! It's a new day and you're al...,life,moral,0,0,1,0,1,moral,1,...,0,0,1,life,moral,1,0,0,1,1
4,@VP @March_for_Life I am there in spirit.\n #b...,life,moral,1,0,0,1,0,non-moral,1,...,0,1,0,life,moral,0,0,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,@janevandis My mother had an ectopic pregnancy...,choice,non-moral,0,0,0,0,0,non-moral,0,...,0,0,0,choice,non-moral,0,0,0,0,0
496,"@jkfecke Interestingly, the first 3 can only b...",choice,non-moral,0,0,0,0,0,non-moral,0,...,0,0,0,neutral,non-moral,0,0,0,0,0
497,We're going to put Roe v Wade on the SD ballot...,neutral,non-moral,0,0,0,0,0,non-moral,0,...,0,0,0,neutral,non-moral,0,0,0,0,0
498,If you still fucking with this economy and pos...,choice,non-moral,0,0,0,0,0,non-moral,0,...,0,0,0,choice,non-moral,0,0,0,0,0


In [32]:
# 2) Define model suffixes (extend with your actual columns)
model_suffixes = [
    "_predicted",              # fine-tuned RoBERTa
    "_gpt-o4-mini_predicted",  # GPT-o4-mini
    "_llama3_predicted",       # Llama 3
]

In [33]:
# 3) Initialize a results dict
results = {}

In [34]:
# 4) True labels
y_true_bin = df["binary_morality"]
y_true_multi = df["prolife_prochoice"]
y_true_ml = df[["care","fairness","authority","loyalty","purity"]].to_numpy()

In [35]:
for suffix in model_suffixes:
    m = suffix.replace("_predicted","")
    results[m] = {}
    
    # --- Binary Metrics ---
    y_pred_bin = df[f"binary_morality{suffix}"]
    acc_b = accuracy_score(y_true_bin, y_pred_bin)
    prec_b = precision_score(y_true_bin, y_pred_bin,
                             pos_label="moral", average="binary")
    rec_b = recall_score(y_true_bin, y_pred_bin,
                         pos_label="moral", average="binary")
    f1_b = f1_score(y_true_bin, y_pred_bin,
                    pos_label="moral", average="binary")
    report_b = classification_report(y_true_bin, y_pred_bin, zero_division=0)
    results[m]["binary"] = {
        "accuracy": acc_b,
        "precision": prec_b,
        "recall": rec_b,
        "f1": f1_b,
        "report": report_b
    }
    
    # --- Multiclass Metrics ---
    y_pred_multi = df[f"prolife_prochoice{suffix}"]
    acc_m = accuracy_score(y_true_multi, y_pred_multi)
    report_m = classification_report(y_true_multi, y_pred_multi,
                                     labels=["life","choice","neutral","throw_out"],
                                     zero_division=0)
    cm = confusion_matrix(y_true_multi, y_pred_multi,
                          labels=["life","choice","neutral","throw_out"])
    results[m]["multiclass"] = {
        "accuracy": acc_m,
        "report": report_m,
        "confusion_matrix": cm.tolist()
    }
    
    # --- Multilabel Metrics ---
    y_pred_ml = df[
        [f"{lbl}{suffix}" for lbl in ["care","fairness","authority","loyalty","purity"]]
    ].to_numpy()
    subset_acc = (y_true_ml == y_pred_ml).all(axis=1).mean()
    ham = hamming_loss(y_true_ml, y_pred_ml)
    prec_micro, rec_micro, f1_micro, _ = precision_recall_fscore_support(
        y_true_ml, y_pred_ml, average="micro", zero_division=0
    )
    prec_macro, rec_macro, f1_macro, _ = precision_recall_fscore_support(
        y_true_ml, y_pred_ml, average="macro", zero_division=0
    )
    results[m]["multilabel"] = {
        "subset_accuracy": subset_acc,
        "hamming_loss": ham,
        "precision_micro": prec_micro,
        "recall_micro": rec_micro,
        "f1_micro": f1_micro,
        "precision_macro": prec_macro,
        "recall_macro": rec_macro,
        "f1_macro": f1_macro
    }

# 5) Convert results to DataFrame (optional) and display
perf_df = pd.json_normalize(results).T
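
The two multilabel summary numbers computed above are easy to confuse. Subset accuracy counts a row as correct only if all five foundation flags match, while Hamming loss is the fraction of individual flags that are wrong. The pure-Python sketch below recomputes both on a toy example; the helper names and the three-row data are made up for illustration and are not taken from the dataset.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of rows whose full label vector matches exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def hamming(y_true, y_pred):
    """Fraction of individual label positions that disagree."""
    wrong = sum(ti != pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    total = sum(len(t) for t in y_true)
    return wrong / total


# Columns: care, fairness, authority, loyalty, purity (toy data)
toy_true = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 1], [0, 0, 0, 0, 0]]
toy_pred = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0]]

print(subset_accuracy(toy_true, toy_pred))  # 1 of 3 rows matches exactly
print(hamming(toy_true, toy_pred))          # 2 of 15 flags are wrong
```

A model can therefore have a low Hamming loss and still a modest subset accuracy, because one wrong flag is enough to fail the whole row.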

In [36]:
display(perf_df)

Unnamed: 0,0
binary.accuracy,0.712
binary.precision,0.943478
binary.recall,0.623563
binary.f1,0.750865
binary.report,precision recall f1-score ...
multiclass.accuracy,0.802
multiclass.report,precision recall f1-score ...
multiclass.confusion_matrix,"[[192, 27, 11, 0], [14, 185, 5, 0], [9, 13, 24..."
multilabel.subset_accuracy,0.518
multilabel.hamming_loss,0.1676


In [37]:
# Completely print each perf_df row
for i, row in perf_df.iterrows():
    # print row's name
    print(f"Row {i}:")
    # print each column's name and value
    for col in row.index:
        print(f"{col}: {row[col]}")
    print("\n")

Row binary.accuracy:
0: 0.712


Row binary.precision:
0: 0.9434782608695652


Row binary.recall:
0: 0.6235632183908046


Row binary.f1:
0: 0.7508650519031141


Row binary.report:
0:               precision    recall  f1-score   support

       moral       0.94      0.62      0.75       348
   non-moral       0.51      0.91      0.66       152

    accuracy                           0.71       500
   macro avg       0.73      0.77      0.70       500
weighted avg       0.81      0.71      0.72       500



Row multiclass.accuracy:
0: 0.802


Row multiclass.report:
0:               precision    recall  f1-score   support

        life       0.85      0.83      0.84       230
      choice       0.79      0.91      0.85       204
     neutral       0.57      0.52      0.55        46
   throw_out       0.00      0.00      0.00        20

    accuracy                           0.80       500
   macro avg       0.55      0.57      0.56       500
weighted avg       0.77      0.80      0.78    
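
As a sanity check, the binary figures printed above are mutually consistent. The sketch below recovers the confusion counts for the `moral` class from the reported recall, precision, and supports, then recomputes accuracy and F1. The counts themselves are not printed anywhere in the notebook; they are derived here from the reported values.

```python
# Values copied from the binary report above
support_moral, support_non_moral = 348, 152
recall_moral = 0.6235632183908046
precision_moral = 0.9434782608695652

# Derive confusion counts, treating "moral" as the positive class
tp = round(recall_moral * support_moral)   # true positives
fn = support_moral - tp                    # moral tweets predicted non-moral
fp = round(tp / precision_moral) - tp      # non-moral tweets predicted moral
tn = support_non_moral - fp                # true negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)

print(tp, fp, fn, tn)
print(accuracy, f1)
```

The recomputed accuracy and F1 agree with the `binary.accuracy` and `binary.f1` rows, which indicates that the high precision and low recall for `moral` come from many moral tweets being labeled non-moral rather than the reverse.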

## Put in LaTeX tables

In [38]:
import os
import pandas as pd
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    classification_report, hamming_loss, precision_recall_fscore_support
)

# 1) Load the data
df = pd.read_csv(
    os.path.join("data","raw","preprocessed_500_annotated_tweets_with_predictions_o4-mini_llama3.csv")
)

# 2) Ground truths
y_true_bin   = df["binary_morality"]
y_true_multi = df["prolife_prochoice"]
y_true_ml    = df[["care","fairness","authority","loyalty","purity"]].to_numpy()

# 3) Models & Their Column Suffixes
models = {
    "RoBERTa":              "_predicted",
    "GPT-o4-mini":          "_gpt-o4-mini_predicted",
    "Llama-3.1-8B-Instruct":"_llama3_predicted"
}

# -- 4.1 Binary Classification Table --
bin_rows = []
for model_name, suf in models.items():
    y_pred = df[f"binary_morality{suf}"]
    bin_rows.append({
        "Model":     model_name,
        "Accuracy":  accuracy_score(y_true_bin, y_pred),
        "Precision": precision_score(y_true_bin, y_pred, pos_label="moral", zero_division=0),
        "Recall":    recall_score(y_true_bin, y_pred, pos_label="moral", zero_division=0),
        "F1":        f1_score(y_true_bin, y_pred, pos_label="moral", zero_division=0),
    })

binary_df = pd.DataFrame(bin_rows).set_index("Model")

print(binary_df.to_latex(
    float_format="%.3f",
    caption="Binary morality classification performance.",
    label="tab:binary-metrics"
))


# -- 4.2 Multiclass Abortion-Stance Table --
multi_rows = []
for model_name, suf in models.items():
    y_pred = df[f"prolife_prochoice{suf}"]
    # overall accuracy
    acc = accuracy_score(y_true_multi, y_pred)
    # get macro & weighted F1 from classification_report
    crep = classification_report(
        y_true_multi, y_pred,
        labels=["life","choice","neutral","throw_out"],
        output_dict=True, zero_division=0
    )
    multi_rows.append({
        "Model":      model_name,
        "Accuracy":   acc,
        "Macro F1":   crep["macro avg"]["f1-score"],
        "Weighted F1":crep["weighted avg"]["f1-score"]
    })

multiclass_df = pd.DataFrame(multi_rows).set_index("Model")

print(multiclass_df.to_latex(
    float_format="%.3f",
    caption="Multiclass abortion-stance classification performance.",
    label="tab:multiclass-metrics"
))


# -- 4.3 Multilabel Moral-Foundations Table --
ml_rows = []
for model_name, suf in models.items():
    y_pred_ml = df[[f"{lbl}{suf}" for lbl in ["care","fairness","authority","loyalty","purity"]]].to_numpy()
    subset_acc = (y_true_ml == y_pred_ml).all(axis=1).mean()
    ham        = hamming_loss(y_true_ml, y_pred_ml)
    pmic, rmic, fmic, _ = precision_recall_fscore_support(y_true_ml, y_pred_ml, average="micro", zero_division=0)
    pmac, rmac, fmac, _ = precision_recall_fscore_support(y_true_ml, y_pred_ml, average="macro", zero_division=0)

    ml_rows.append({
        "Model":            model_name,
        "Subset Acc.":      subset_acc,
        "Hamming Loss":     ham,
        "P_{micro}":           pmic,
        "R_{micro}":           rmic,
        "F_{micro}":           fmic,
        "P_{macro}":           pmac,
        "R_{macro}":           rmac,
        "F_{macro}":           fmac,
    })

multilabel_df = pd.DataFrame(ml_rows).set_index("Model")

print(multilabel_df.to_latex(
    float_format="%.3f",
    caption="Multilabel moral-foundations classification performance.",
    label="tab:multilabel-metrics"
))

\begin{table}
\caption{Binary morality classification performance.}
\label{tab:binary-metrics}
\begin{tabular}{lrrrr}
\toprule
 & Accuracy & Precision & Recall & F1 \\
Model &  &  &  &  \\
\midrule
RoBERTa & 0.712 & 0.943 & 0.624 & 0.751 \\
GPT-o4-mini & 0.724 & 0.895 & 0.684 & 0.775 \\
Llama-3.1-8B-Instruct & 0.566 & 0.912 & 0.417 & 0.572 \\
\bottomrule
\end{tabular}
\end{table}

\begin{table}
\caption{Multiclass abortion-stance classification performance.}
\label{tab:multiclass-metrics}
\begin{tabular}{lrrr}
\toprule
 & Accuracy & Macro F1 & Weighted F1 \\
Model &  &  &  \\
\midrule
RoBERTa & 0.802 & 0.559 & 0.784 \\
GPT-o4-mini & 0.864 & 0.732 & 0.873 \\
Llama-3.1-8B-Instruct & 0.826 & 0.683 & 0.835 \\
\bottomrule
\end{tabular}
\end{table}

\begin{table}
\caption{Multilabel moral-foundations classification performance.}
\label{tab:multilabel-metrics}
\begin{tabular}{lrrrrrrrr}
\toprule
 & Subset Acc. & Hamming Loss & P_{micro} & R_{micro} & F_{micro} & P_{macro} & R_{macro} & F_{mac