## **Importing Resources**

In [1]:
!pip -q install openai tqdm scikit-learn scipy pandas matplotlib

In [2]:
import openai
import pandas as pd
import time
from tqdm import tqdm
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, accuracy_score, f1_score, cohen_kappa_score
from scipy.stats import chi2_contingency

## **Loading Datasets**

In [3]:
unannotated_2000 = 'datasets/cleaned/combined_HL_test_2000.csv'
unannotated_3000 = 'datasets/cleaned/remaining_HL_test_3000.csv'
gold_2000 = 'datasets/annotated/human_annotation_2000.csv'
chatgpt_2000 = 'datasets/annotated/chatgpt_annotation_2000.csv'

In [4]:
df = pd.read_csv(unannotated_3000)
df

Unnamed: 0,text,label
0,"Kagawad, isa pa dedbol, tserman kritikal sa ts...",
1,Suwerte para sa isang magdyowa mula sa North Y...,
2,Isang limang-taong gulang na batang lalaki ang...,
3,"10 pumuga sa Pasay, balik-selda na lahat",
4,Senegalese tiklo sa pekeng passport,
...,...,...
2979,"Typhoon Betty, patuloy na humihina sa karagata...",
2980,'Sarap maging tatay!' Post ng netizen tungkol ...,
2981,"TESDA, maglulunsad ng training programs para s...",
2982,"Anne Curtis, nagdiwang ng kaarawan sa ‘It’s Sh...",


In [5]:
gold_df = pd.read_csv(gold_2000)
gold_df

Unnamed: 0,text,code_frames,label
0,Isang umano’y tinaguriang ‘shabu queen’ at lid...,7,6
1,Anthrax infection kumalat sa Cagayan,9,8
2,TESDA: Mga tech-voc graduate swak sa trabaho,10,9
3,Nagkamit ng unang pwesto ang isang Filipina st...,15,14
4,NEDA inaprub tapyas taripa sa e-vehicle,6,5
...,...,...,...
1995,"Willie Revillame, humingi ng dispensa kay Cris...",15,14
1996,"Rob Mananquil, nagulat sa nadatnang mga kalat,...",7,6
1997,Sino raw itong bida-bidang bagitong miyembro n...,13,12
1998,MAGPAPAKALAT ang PNP Special Action Force ng d...,2,1


In [6]:
chatgpt_df = pd.read_csv(chatgpt_2000)
chatgpt_df

Unnamed: 0,text,code_frames,label
0,Isang umano’y tinaguriang ‘shabu queen’ at lid...,7,6
1,Anthrax infection kumalat sa Cagayan,9,8
2,TESDA: Mga tech-voc graduate swak sa trabaho,2,1
3,Nagkamit ng unang pwesto ang isang Filipina st...,15,14
4,NEDA inaprub tapyas taripa sa e-vehicle,6,5
...,...,...,...
1995,"Willie Revillame, humingi ng dispensa kay Cris...",15,14
1996,"Rob Mananquil, nagulat sa nadatnang mga kalat,...",7,6
1997,Sino raw itong bida-bidang bagitong miyembro n...,13,12
1998,MAGPAPAKALAT ang PNP Special Action Force ng d...,8,7


## **Annotating Using ChatGPT**

In [None]:
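# Create the OpenAI client; with no api_key argument it reads the key from
# the OPENAI_API_KEY environment variable, which keeps the key out of the notebook
client = openai.OpenAI()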

In [8]:
# Define policy issue frames (ordered for numeric labeling)
policy_frames = [
    'Economic', 'Capacity and Resources', 'Morality', 'Fairness and Equality',
    'Constitutionality and Jurisprudence', 'Policy Prescription and Evaluation',
    'Law and Order, Crime and Justice', 'Security and Defense', 'Health and Safety',
    'Quality of Life', 'Cultural Identity', 'Public Opinion', 'Political',
    'External Regulation and Reputation', 'Other'
]

policy_descriptions = [
    'The costs, benefits, or monetary/financial implications of the issue (to an individual, family, community, or the economy as a whole). '
    'Contexts: Mentions of the state of the economy, monetary or financial implications of a policy or issue, discussions on percentage, money, or price fluctuations, or references to economic conditions or market forces.',

    'The lack of or availability of physical, geographical, spatial, human, and financial resources, or the capacity of existing systems and resources to implement or carry out policy goals. '
    'Contexts: Mentions of the availability or lack of resources to implement a policy or achieve goals, resource allocation challenges, scarcity of resources, budget limitations, issues with implementation capacity, or when capacity and resources are allocated more to the general public instead of specific people, groups, or places.',

    'Any perspective—or policy objective or action (including proposed action)— that is compelled by religious doctrine or interpretation, duty, honor, righteousness, or any other sense of ethics or social responsibility. '
    'Contexts: References to religious beliefs, ethical dilemmas, moral responsibility, social good, or the moral duty of individuals or governments, perspectives or actions influenced by religion, principles, values, duty, honor, or ethics, and calls to action or suggestions based on moral duty or universal values.',

    'Equality or inequality with which laws, punishment, rewards, and resources are applied or distributed among individuals or groups. Also, the balance between the rights or interests of one individual or group compared to another individual or group. '
    'Contexts: Mentions of fairness, discrimination, or unequal treatment; comparisons between individuals or groups; discussions of systemic bias, marginalization, or advantage/disadvantage; references to minority rights, underrepresented groups, or unequal access to opportunities or services.',

    'The constraints imposed on or freedoms granted to individuals, government, and corporations via the Constitution, Bill of Rights, and other amendments, or judicial interpretation. This deals specifically with the authority of government to regulate, and the authority of individuals/corporations to act independently of government. '
    'Contexts: Mentions of Supreme Court or other judicial bodies (e.g., Sandiganbayan), decisions based on the Constitution or laws, references to constitutional amendments, and judicial rulings or interpretations.',

    'Particular policies proposed for addressing an identified problem, and figuring out if certain policies will work, or if existing policies are effective. '
    'Contexts: Suggestions to address a problem through actions or policies (regardless of who made the suggestion), discussions about the formulation, assessment, evaluation, approval, or rejection of policies.',

    'Specific policies in practice and their enforcement, incentives, and implications. Includes stories about enforcement and interpretation of laws by individuals and law enforcement, breaking laws, loopholes, fines, sentencing, and punishment. Increases or reductions in crime. '
    'Contexts: Mentions of how laws are practiced or implemented, discussions of breaking the law (e.g., violations), mentions of crimes, crime rates, punishment, interpretations of the law, and the involvement of law enforcers in maintaining order or solving crimes. Also includes reports of criminal investigations and deaths caused by crimes, such as homicides, violent attacks, or incidents under legal scrutiny.',

    'Security, threats to security, and protection of one’s person, family, in-group, nation, etc. Generally, it is an action or a call to action that can be taken to protect the welfare of a person, group, or nation sometimes from a not yet manifested threat. '
    'Contexts: Mentions of national security, military action, or defense strategy; discussions of threats to national sovereignty, terrorism, cybersecurity; references to protective actions taken by the military or other security forces; involvement of rebel groups or insurgents in conflicts; emphasis on safeguarding national integrity.',

    'Healthcare access and effectiveness, illness, disease, sanitation, obesity, mental health effects, prevention of or perpetuation of gun violence, infrastructure, and building safety. '
    'Contexts: Involvement of diseases, viruses, illnesses, or healthcare access, discussions of accidents, hazards, or disasters, references to mortality or death caused by accidents, hazards, or disasters (excluding crime-related), concerns about the safety of people from illnesses or dangerous buildings that may require construction or repairs, disaster and risk preparedness, mitigation, and reduction, rescue operations (excluding crime-related).',

    'The effects of a policy on individuals’ wealth, mobility, access to resources, happiness, social structures, ease of day-to-day routines, quality of community life, etc. '
    'Contexts: Mentions of living conditions, community well-being, or disruptions and improvements to everyday life; references to comfort, safety, or convenience in daily settings; policies or actions aiming to improve well-being, ease burdens, or raise living standards. Includes announcements that affect people’s daily routines, access to services, or life opportunities, as well as explicit mentions of affected individuals, groups, or places.',

    'The social norms, trends, values, and customs constituting culture(s), as they relate to a specific policy issue. '
    'Contexts: Policies related to Filipino culture, or actions and situations that embody, celebrate, or reflect Filipino cultural norms, values, or traditions.',

    'References to general social attitudes, polling, and demographic information, as well as implied or actual consequences of diverging from or getting ahead of public opinion or polls. '
    'Contexts: Involvement of surveys or polls, perceptions or opinions of specific groups regarding situations or policies (whether in support or criticism), and actions by groups to express their stance.',

    'Any political considerations surrounding an issue. Issue actions or efforts or stances that are political, such as partisan filibusters, lobbyist involvement, bipartisan efforts, deal-making and vote trading, appealing to one’s base, and mentions of political maneuvering. Explicit statements that a policy issue is good or bad for a particular political party. '
    'Contexts: Involvement of politicians or political institutions through actions such as giving public statements, debating or arguing with fellow politicians, expressing support or criticism of a policy.',

    'The country’s external relations with another nation; the external relations of one state with another; or relations between groups. This includes trade agreements and outcomes, comparisons of policy outcomes, or desired policy outcomes. '
    'Contexts: Involvement of government representatives from different countries, discussions on country-to-country relationships or engagements through agreements or policies, or mentions of assistance from one country to another.',

    'Any frames that do not fit into the above categories. '
    'Contexts: Content that falls outside all the previous policy frames. This includes topics such as entertainment, celebrity news, showbiz updates, sports events, competitions, lottery results, and similar stories not tied to governance, public policy, or societal issues.'
]



In [9]:
# Create a mapping from frame name to its index
frame_index_map = {name: i for i, name in enumerate(policy_frames)}

# Format policy frames as a string for the prompt
policy_prompt = '\n'.join(
    f'{i}. {name}: {desc}'
    for i, (name, desc) in enumerate(zip(policy_frames, policy_descriptions))
)
print(policy_prompt)

0. Economic: The costs, benefits, or monetary/financial implications of the issue (to an individual, family, community, or the economy as a whole). Contexts: Mentions of the state of the economy, monetary or financial implications of a policy or issue, discussions on percentage, money, or price fluctuations, or references to economic conditions or market forces.
1. Capacity and Resources: The lack of or availability of physical, geographical, spatial, human, and financial resources, or the capacity of existing systems and resources to implement or carry out policy goals. Contexts: Mentions of the availability or lack of resources to implement a policy or achieve goals, resource allocation challenges, scarcity of resources, budget limitations, issues with implementation capacity, or when capacity and resources are allocated more to the general public instead of specific people, groups, or places.
2. Morality: Any perspective—or policy objective or action (including proposed action)— tha

In [None]:
def classify_text(text):

    prompt = f'''
    Classify the following news text into exactly **one** of the 15 policy issue frames listed below, based on its **main message**. 

    - Use the provided frame descriptions and contexts to determine the most relevant classification. 
    - The contexts highlight common situations, scenarios, or perspectives typically associated with each frame. Refer to these contexts to guide your classification, especially when the text could fit into multiple frames.
    - Focus on the primary argument or perspective conveyed in the text.
    - If the text does not match any frame, classify it as 'Other'.

    Policy Frames:
    {policy_prompt}

    News Text: '{text}'

    Choose the **most relevant** frame and return **only** its name. Do not provide explanations or multiple answers.
    '''


    for _ in range(3):  # Retry up to 3 times in case of failure
        try:
            response = client.chat.completions.create(
                model='gpt-4o',  
                messages=[{'role': 'system', 'content': 'You are an expert in policy issue classification.'},
                          {'role': 'user', 'content': prompt}],
                temperature=0
            )
            frame_name = response.choices[0].message.content.strip()
            return frame_index_map.get(frame_name, 14)  # Default to 'Other' if unrecognized
        except Exception as e:
            print(f'Error: {e}. Retrying...')
            time.sleep(5)  # Wait before retrying

    return 14  # Default to 'Other' if all attempts fail
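
`classify_text` looks up the model's reply with an exact string match, so a reply such as `**Economic**` or `6. Law and Order, Crime and Justice` would silently fall back to label 14. The sketch below shows one possible way to normalise a reply before the lookup. The helper name `parse_frame_name` and the choice to return `None` for unrecognised replies are assumptions for illustration, not part of the original notebook.

```python
import re

# Frame names in the same order as `policy_frames` in the notebook
POLICY_FRAMES = [
    'Economic', 'Capacity and Resources', 'Morality', 'Fairness and Equality',
    'Constitutionality and Jurisprudence', 'Policy Prescription and Evaluation',
    'Law and Order, Crime and Justice', 'Security and Defense', 'Health and Safety',
    'Quality of Life', 'Cultural Identity', 'Public Opinion', 'Political',
    'External Regulation and Reputation', 'Other',
]

# Case-insensitive lookup from frame name to numeric label
FRAME_INDEX = {name.casefold(): i for i, name in enumerate(POLICY_FRAMES)}


def parse_frame_name(reply):
    """Hypothetical helper: map a raw model reply to a label, or None if unrecognised."""
    if not reply:
        return None
    # Drop a leading list number such as "6." or "6)"
    cleaned = re.sub(r'^\s*\d+\s*[.):]\s*', '', reply.strip())
    # Drop surrounding quotes, markdown asterisks, backticks, and periods
    cleaned = cleaned.strip(' \'"*.`‘’“”').casefold()
    return FRAME_INDEX.get(cleaned)


print(parse_frame_name('**Economic**'))
print(parse_frame_name('6. Law and Order, Crime and Justice'))
print(parse_frame_name('something unexpected'))
```

Returning `None` instead of 14 would let the caller count unparseable replies and API failures separately from texts the model genuinely assigned to 'Other'.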

In [11]:
# Get labels from classification
df['label'] = [classify_text(text) for text in tqdm(df['text'], total=len(df))]

# Insert code_frames as second column
df.insert(1, 'code_frames', df['label'] + 1)

df

100%|██████████| 2984/2984 [2:42:23<00:00,  3.27s/it]  


Unnamed: 0,text,code_frames,label
0,"Kagawad, isa pa dedbol, tserman kritikal sa ts...",9,8
1,Suwerte para sa isang magdyowa mula sa North Y...,15,14
2,Isang limang-taong gulang na batang lalaki ang...,9,8
3,"10 pumuga sa Pasay, balik-selda na lahat",7,6
4,Senegalese tiklo sa pekeng passport,7,6
...,...,...,...
2979,"Typhoon Betty, patuloy na humihina sa karagata...",9,8
2980,'Sarap maging tatay!' Post ng netizen tungkol ...,11,10
2981,"TESDA, maglulunsad ng training programs para s...",2,1
2982,"Anne Curtis, nagdiwang ng kaarawan sa ‘It’s Sh...",15,14
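
The `label` column is the zero-based index into `policy_frames`, and `code_frames` is the same value shifted by one. As a reading aid, the sketch below maps a numeric label back to its frame name; the helper `describe_label` is a hypothetical name introduced here.

```python
# Frame names in the same order as `policy_frames` in the notebook
policy_frames = [
    'Economic', 'Capacity and Resources', 'Morality', 'Fairness and Equality',
    'Constitutionality and Jurisprudence', 'Policy Prescription and Evaluation',
    'Law and Order, Crime and Justice', 'Security and Defense', 'Health and Safety',
    'Quality of Life', 'Cultural Identity', 'Public Opinion', 'Political',
    'External Regulation and Reputation', 'Other',
]


def describe_label(label):
    """Hypothetical helper: return the label, its one-based code, and the frame name."""
    return {
        'label': label,
        'code_frames': label + 1,  # one-based code used in the CSV files
        'frame': policy_frames[label],
    }


# Labels taken from the first rows of the output above
for label in (8, 14, 6):
    print(describe_label(label))
```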


In [12]:
output_path = 'datasets/annotated/chatgpt_annotation_3000.csv'
df.to_csv(output_path, index=False)

print(f'ChatGPT annotations saved to: {output_path}')

ChatGPT annotations saved to: datasets/annotated/chatgpt_annotation_3000.csv


## **Evaluating ChatGPT Annotation Performance**

### Merging Annotations

In [13]:
gold = gold_df.copy()
chatgpt = chatgpt_df.copy()

In [14]:
gold = gold.rename(columns={'label': 'gold_label'})
chatgpt = chatgpt.rename(columns={'label': 'chatgpt_label'})

In [15]:
annotations = pd.concat([gold, chatgpt], axis=1)
annotations

Unnamed: 0,text,code_frames,gold_label,text.1,code_frames.1,chatgpt_label
0,Isang umano’y tinaguriang ‘shabu queen’ at lid...,7,6,Isang umano’y tinaguriang ‘shabu queen’ at lid...,7,6
1,Anthrax infection kumalat sa Cagayan,9,8,Anthrax infection kumalat sa Cagayan,9,8
2,TESDA: Mga tech-voc graduate swak sa trabaho,10,9,TESDA: Mga tech-voc graduate swak sa trabaho,2,1
3,Nagkamit ng unang pwesto ang isang Filipina st...,15,14,Nagkamit ng unang pwesto ang isang Filipina st...,15,14
4,NEDA inaprub tapyas taripa sa e-vehicle,6,5,NEDA inaprub tapyas taripa sa e-vehicle,6,5
...,...,...,...,...,...,...
1995,"Willie Revillame, humingi ng dispensa kay Cris...",15,14,"Willie Revillame, humingi ng dispensa kay Cris...",15,14
1996,"Rob Mananquil, nagulat sa nadatnang mga kalat,...",7,6,"Rob Mananquil, nagulat sa nadatnang mga kalat,...",7,6
1997,Sino raw itong bida-bidang bagitong miyembro n...,13,12,Sino raw itong bida-bidang bagitong miyembro n...,13,12
1998,MAGPAPAKALAT ang PNP Special Action Force ng d...,2,1,MAGPAPAKALAT ang PNP Special Action Force ng d...,8,7
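
`pd.concat(..., axis=1)` pairs rows by index, so the comparison is only meaningful if both files list the same texts in the same order. A minimal sketch of a sanity check follows, assuming the two `text` columns are passed in as plain sequences; the function name `find_misaligned_rows` is hypothetical.

```python
def find_misaligned_rows(gold_texts, chatgpt_texts):
    """Return the positions where the two annotation files disagree on the text."""
    gold_texts = list(gold_texts)
    chatgpt_texts = list(chatgpt_texts)
    if len(gold_texts) != len(chatgpt_texts):
        raise ValueError(
            f'Row counts differ: {len(gold_texts)} vs {len(chatgpt_texts)}'
        )
    return [
        i for i, (g, c) in enumerate(zip(gold_texts, chatgpt_texts)) if g != c
    ]


# Toy input; in the notebook this could be called as
# find_misaligned_rows(gold_df['text'], chatgpt_df['text'])
gold_sample = [
    'Anthrax infection kumalat sa Cagayan',
    'TESDA: Mga tech-voc graduate swak sa trabaho',
]
chatgpt_sample = [
    'Anthrax infection kumalat sa Cagayan',
    'TESDA: Mga tech-voc graduate swak sa trabaho',
]
print(find_misaligned_rows(gold_sample, chatgpt_sample))
```

An empty list means every row lines up; any returned position points at a row whose labels should not be compared.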


In [16]:
for i, col in enumerate(annotations.columns):
    print(f"{i}: {col}")

0: text
1: code_frames
2: gold_label
3: text
4: code_frames
5: chatgpt_label


In [17]:
annotations = annotations.iloc[:, [0, 2, 5]]
annotations

Unnamed: 0,text,gold_label,chatgpt_label
0,Isang umano’y tinaguriang ‘shabu queen’ at lid...,6,6
1,Anthrax infection kumalat sa Cagayan,8,8
2,TESDA: Mga tech-voc graduate swak sa trabaho,9,1
3,Nagkamit ng unang pwesto ang isang Filipina st...,14,14
4,NEDA inaprub tapyas taripa sa e-vehicle,5,5
...,...,...,...
1995,"Willie Revillame, humingi ng dispensa kay Cris...",14,14
1996,"Rob Mananquil, nagulat sa nadatnang mga kalat,...",6,6
1997,Sino raw itong bida-bidang bagitong miyembro n...,12,12
1998,MAGPAPAKALAT ang PNP Special Action Force ng d...,1,7


In [18]:
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
# that arises when writing into the slice created by .iloc above
annotations = annotations.assign(match=annotations['gold_label'] == annotations['chatgpt_label'])
annotations

Unnamed: 0,text,gold_label,chatgpt_label,match
0,Isang umano’y tinaguriang ‘shabu queen’ at lid...,6,6,True
1,Anthrax infection kumalat sa Cagayan,8,8,True
2,TESDA: Mga tech-voc graduate swak sa trabaho,9,1,False
3,Nagkamit ng unang pwesto ang isang Filipina st...,14,14,True
4,NEDA inaprub tapyas taripa sa e-vehicle,5,5,True
...,...,...,...,...
1995,"Willie Revillame, humingi ng dispensa kay Cris...",14,14,True
1996,"Rob Mananquil, nagulat sa nadatnang mga kalat,...",6,6,True
1997,Sino raw itong bida-bidang bagitong miyembro n...,12,12,True
1998,MAGPAPAKALAT ang PNP Special Action Force ng d...,1,7,False


In [19]:
print("Matches:", annotations['match'].sum())
print("Total rows:", len(annotations))

Matches: 1437
Total rows: 2000


In [20]:
output_path = 'datasets/annotated/annotation_comparison_2000.csv'
annotations.to_csv(output_path, index=False)

print(f'Human vs. ChatGPT annotations saved to: {output_path}')

Human vs. ChatGPT annotations saved to: datasets/annotated/annotation_comparison_2000.csv


### Label Distribution

#### Gold vs Predicted Label Count

In [21]:
# Compute value counts (aligned by label index)
gold_counts = annotations['gold_label'].value_counts().sort_index()
chatgpt_counts = annotations['chatgpt_label'].value_counts().sort_index()

# Combine into one DataFrame
comparison = pd.DataFrame({
    'gold_label': gold_counts,
    'chatgpt_label': chatgpt_counts
})

# Add a difference column
comparison['diff'] = comparison['chatgpt_label'] - comparison['gold_label']

comparison


Unnamed: 0,gold_label,chatgpt_label,diff
0,45,70,25
1,72,176,104
2,25,26,1
3,9,20,11
4,24,30,6
5,121,97,-24
6,457,433,-24
7,31,92,61
8,455,395,-60
9,128,66,-62


In [22]:
comparison_sorted = comparison.reindex(comparison['diff'].abs().sort_values(ascending=False).index)
comparison_sorted

Unnamed: 0,gold_label,chatgpt_label,diff
1,72,176,104
10,15,83,68
9,128,66,-62
12,157,96,-61
7,31,92,61
8,455,395,-60
14,385,346,-39
0,45,70,25
6,457,433,-24
5,121,97,-24


#### Per-Label Match vs Mismatch Count

In [23]:
label_match_stats = annotations.groupby(['gold_label', 'match']).size().reset_index(name='count')

label_match_pivot = (
    label_match_stats.pivot(index='gold_label', columns='match', values='count')
    .reindex(columns=[False, True])  # keep both columns even if one outcome never occurs
    .fillna(0)
)
label_match_pivot.columns = ['Mismatch', 'Match']  # False = Mismatch, True = Match
label_match_pivot['Total'] = label_match_pivot['Match'] + label_match_pivot['Mismatch']
label_match_pivot['Accuracy_Per_Label'] = label_match_pivot['Match'] / label_match_pivot['Total']

label_match_pivot.reset_index()

Unnamed: 0,gold_label,Mismatch,Match,Total,Accuracy_Per_Label
0,0,5,40,45,0.888889
1,1,17,55,72,0.763889
2,2,11,14,25,0.56
3,3,5,4,9,0.444444
4,4,9,15,24,0.625
5,5,63,58,121,0.479339
6,6,65,392,457,0.857768
7,7,3,28,31,0.903226
8,8,97,358,455,0.786813
9,9,93,35,128,0.273438
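
Because the rows are grouped by gold label, `Accuracy_Per_Label` is the per-class recall. Dividing the same match count by the number of times ChatGPT predicted the label gives the precision instead. The sketch below works this through for three labels, using counts copied from the two tables above; the function name `per_label_scores` is introduced here for illustration.

```python
def per_label_scores(matches, gold_total, predicted_total):
    """Compute recall, precision, and F1 for one label from raw counts."""
    recall = matches / gold_total          # share of gold items that were recovered
    precision = matches / predicted_total  # share of predictions that were correct
    f1 = 2 * matches / (gold_total + predicted_total)
    return recall, precision, f1


# label: (Match, gold_label count, chatgpt_label count), copied from the tables above
counts = {
    0: (40, 45, 70),
    1: (55, 72, 176),
    7: (28, 31, 92),
}

for label, (matches, gold_total, predicted_total) in counts.items():
    recall, precision, f1 = per_label_scores(matches, gold_total, predicted_total)
    print(f'label {label}: recall={recall:.4f} precision={precision:.4f} f1={f1:.4f}')
```

Label 1 illustrates the gap: most gold items are recovered, but ChatGPT assigns the label far more often than the annotators did, so fewer than a third of its predictions for that label are correct.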


In [None]:
label_match_pivot[['Match', 'Mismatch']].plot(kind='bar', stacked=True, figsize=(8,4))
plt.title("Label-wise Match vs Mismatch Distribution")
plt.xlabel("Gold Label")
plt.ylabel("Count")
plt.legend(title="Prediction")
plt.tight_layout()
plt.show()

### Classification Report

In [28]:
# Compare predicted vs ground truth labels
report = classification_report(annotations['gold_label'], annotations['chatgpt_label'], digits=4)
print('Classification Report:\n', report)

Classification Report:
               precision    recall  f1-score   support

           0     0.5714    0.8889    0.6957        45
           1     0.3125    0.7639    0.4435        72
           2     0.5385    0.5600    0.5490        25
           3     0.2000    0.4444    0.2759         9
           4     0.5000    0.6250    0.5556        24
           5     0.5979    0.4793    0.5321       121
           6     0.9053    0.8578    0.8809       457
           7     0.3043    0.9032    0.4553        31
           8     0.9063    0.7868    0.8424       455
           9     0.5303    0.2734    0.3608       128
          10     0.1446    0.8000    0.2449        15
          11     0.4500    0.5143    0.4800        35
          12     0.8125    0.4968    0.6166       157
          13     0.8333    0.6098    0.7042        41
          14     0.8815    0.7922    0.8345       385

    accuracy                         0.7185      2000
   macro avg     0.5659    0.6531    0.5648      2000
we

### Accuracy and Weighted F1 Score

In [25]:
y_true = annotations['gold_label']
y_pred = annotations['chatgpt_label']

accuracy = accuracy_score(y_true, y_pred)
weighted_f1 = f1_score(y_true, y_pred, average='weighted')

print(f"Accuracy: {accuracy:.4f}")
print(f"Weighted F1 Score: {weighted_f1:.4f}")

Accuracy: 0.7185
Weighted F1 Score: 0.7354
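
The weighted F1 is the support-weighted mean of the per-label F1 scores, while the macro F1 is their plain mean. As a cross-check, the sketch below recomputes both from the rounded per-label rows of the classification report, so small rounding differences from scikit-learn's own figures are expected.

```python
# (support, f1-score) for labels 0..14, copied from the classification report
report_rows = [
    (45, 0.6957), (72, 0.4435), (25, 0.5490), (9, 0.2759), (24, 0.5556),
    (121, 0.5321), (457, 0.8809), (31, 0.4553), (455, 0.8424), (128, 0.3608),
    (15, 0.2449), (35, 0.4800), (157, 0.6166), (41, 0.7042), (385, 0.8345),
]

total_support = sum(support for support, _ in report_rows)

# Macro average: every label counts equally
macro_f1 = sum(f1 for _, f1 in report_rows) / len(report_rows)

# Weighted average: each label counts in proportion to its support
weighted_f1 = sum(support * f1 for support, f1 in report_rows) / total_support

print(f'Total support: {total_support}')
print(f'Macro F1 (from rounded rows): {macro_f1:.4f}')
print(f'Weighted F1 (from rounded rows): {weighted_f1:.4f}')
```

The weighted figure sits well above the macro figure because the three largest classes (labels 6, 8, and 14) are also among the best-scoring ones.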


### Cohen's Kappa

In [26]:
kappa = cohen_kappa_score(y_true, y_pred)
print(f"Cohen's Kappa: {kappa:.4f}")

Cohen's Kappa: 0.6717
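
Cohen's kappa corrects the observed agreement $p_o$ for the agreement $p_e$ expected by chance from the two label distributions: $\kappa = (p_o - p_e) / (1 - p_e)$. The sketch below implements that formula in plain Python on a toy example; the function name `cohen_kappa` and the toy labels are illustrative and not taken from the dataset.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    n = len(y_true)
    # Observed agreement: share of items where both annotators agree
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: sum over labels of the product of marginal proportions
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Toy labels (not from the dataset): 4 of 6 items agree
toy_gold = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
print(f"Toy kappa: {cohen_kappa(toy_gold, toy_pred):.4f}")
```

In the toy case $p_o = 4/6$ and $p_e = 12/36$, so the agreement beyond chance is half of the maximum possible.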


### Chi-Square Test

In [None]:
# Step 1: Create the contingency table
contingency = pd.crosstab(annotations['gold_label'], annotations['chatgpt_label'])

# Step 2: Perform the Chi-square test
chi2, p, dof, expected = chi2_contingency(contingency)

# Step 3: Print results
print(f"Chi-square statistic: {chi2:.4f}")
print(f"Degrees of freedom: {dof}")
print(f"P-value: {p:.4e}")  # scientific notation; a very small p-value would otherwise print as 0.0000

# Optional: inspect expected values
expected_df = pd.DataFrame(expected, index=contingency.index, columns=contingency.columns)
expected_df
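
The expected count for each cell is `row_total * column_total / N`, and the Pearson statistic sums `(observed - expected)**2 / expected` over all cells. The sketch below reproduces that calculation in plain Python on a small toy table; the function name `pearson_chi2` and the table values are illustrative only.

```python
def pearson_chi2(table):
    """Return (statistic, degrees of freedom, expected counts) for a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence of row and column variables
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Toy 2x3 table (not from the dataset)
toy_table = [
    [10, 20, 30],
    [20, 20, 20],
]
statistic, dof, expected = pearson_chi2(toy_table)
print(f'Chi-square statistic: {statistic:.4f}')
print(f'Degrees of freedom: {dof}')
print(expected)
```

With 15 labels on each axis, several cells of the real contingency table have small expected counts (label 3 has only 9 gold items), so inspecting `expected_df` before interpreting the p-value is a reasonable precaution.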