# Step 2. Classifying the article data by objectivity using the ChatGPT API
Sentences are classified according to the following criteria:

1. A sentence is subjective if it expresses personal opinions, sarcasm, exhortations, discriminatory remarks, or rhetorical figures.
2. All other sentences are considered objective, including third-party opinions (when referenced as such) and non-conclusive comments.
---



**Section 1:** Importing the right resources

In [153]:
import pandas as pd
import numpy as np
import nltk
import re


In [154]:
!pip install openai



In [155]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [35]:
from nltk.tokenize import sent_tokenize

In [36]:
#import openai library
import openai
from openai import OpenAI

#read the api key for this project from the OPENAI_API_KEY environment variable
#(never hardcode a secret key in a notebook that may be shared)
import os
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

#testing response
completion = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {"role": "user", "content": "Please confirm successful connection"}
  ]
)
response = completion.choices[0].message.content

print(response)

Connection successful! How can I assist you today?


In [37]:
#connect to google drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [134]:
df = pd.read_csv('/content/drive/My Drive/266 Natural language processing/NLP Project/initial_data.csv')

**Section 2:** Loading the unlabeled dataset created during step 1

In [135]:
# print size of dataframe
df.shape

(200, 11)

In [136]:
# print first 5 examples in dataframe
df.head()

Unnamed: 0.1,Unnamed: 0,date,year,month,day,author,title,article,url,section,publication
0,0,2018-07-09 00:00:00,2018,7.0,9,Associated Press,Rescuers look through mud for Japan flood vict...,\n next\n Image 1 of 2 \n ...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News
1,1,2018-07-10 00:00:00,2018,7.0,10,Ryan Gaydos,Australian rangers trap gigantic saltwater cro...,"\n A 15-foot, 1,300-pound saltwater c...",https://www.foxnews.com/world/australian-range...,Reptiles,Fox News
2,2,2018-08-01 00:00:00,2018,8.0,1,John Stossel,John Stossel: Are looney liberals and smug cel...,\n While restaurant leaders reportedl...,https://www.foxnews.com/opinion/john-stossel-a...,OPINION,Fox News
3,3,2018-05-15 00:00:00,2018,5.0,15,Associated Press,"UN peacekeeping force to stay, but shrink, in ...",UNITED NATIONS – U.N. peacekeepers will remai...,https://www.foxnews.com/world/un-peacekeeping-...,World,Fox News
4,4,2018-05-17 00:00:00,2018,5.0,17,Associated Press,Trump tells NK's Kim to denuclearize or risk o...,close Video Trump contradicts John Bolton on N...,https://www.foxnews.com/us/trump-tells-nks-kim...,MILITARY,Fox News


In [137]:
# create new df with article author, title, and article contents
df_new = df[['author', 'title','article']]
df_new.head()

Unnamed: 0,author,title,article
0,Associated Press,Rescuers look through mud for Japan flood vict...,\n next\n Image 1 of 2 \n ...
1,Ryan Gaydos,Australian rangers trap gigantic saltwater cro...,"\n A 15-foot, 1,300-pound saltwater c..."
2,John Stossel,John Stossel: Are looney liberals and smug cel...,\n While restaurant leaders reportedl...
3,Associated Press,"UN peacekeeping force to stay, but shrink, in ...",UNITED NATIONS – U.N. peacekeepers will remai...
4,Associated Press,Trump tells NK's Kim to denuclearize or risk o...,close Video Trump contradicts John Bolton on N...


**Section 3:** Tokenize the articles into sentences so that each sentence can be labeled

In [138]:
#function to split text into sentences
def split_into_sentences(text):
  return sent_tokenize(text)

In [139]:

# Function to clean sentences
def clean_sentence(sentence):
    # Remove unnecessary whitespace and line breaks
    sentence = re.sub(r'\s+', ' ', sentence).strip()

    # Remove non-alphanumeric characters
    sentence = re.sub(r'[^a-zA-Z0-9\s]', '', sentence)

    # Convert to lowercase
    sentence = sentence.lower()

    return sentence
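
To make the effect of `clean_sentence` concrete, here is a small sketch that runs the same three steps on a made-up string (the sample text is an assumption, not taken from the dataset). Note that stripping punctuation also removes quotation marks and exclamation points, which may be useful cues for the objectivity criteria, so it may be worth checking how the classifier behaves on uncleaned sentences.

```python
import re

def clean_sentence(sentence):
    # Same steps as the notebook's function: collapse whitespace,
    # drop non-alphanumeric characters, then lowercase.
    sentence = re.sub(r'\s+', ' ', sentence).strip()
    sentence = re.sub(r'[^a-zA-Z0-9\s]', '', sentence)
    return sentence.lower()

# Hypothetical input with extra whitespace, a line break, and punctuation
sample = '  "This is GREAT!"\n said the  mayor.  '
cleaned = clean_sentence(sample)
print(cleaned)
```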

In [140]:
# Split each article into sentences and expand them into one row per sentence
df_expanded = df['article'].apply(split_into_sentences).explode().reset_index()
df_expanded.columns = ['original_index', 'sentence']

# Clean each sentence
df_expanded['cleaned_sentence'] = df_expanded['sentence'].apply(clean_sentence)

# Keep only sentences with more than five words (drops empty, very short, or non-informative fragments)
df_expanded = df_expanded[df_expanded['cleaned_sentence'].apply(lambda x: len(x.split()) > 5)]

# Show the first few rows of the cleaned DataFrame
df_expanded.head()

Unnamed: 0,original_index,sentence,cleaned_sentence
0,0,\n next\n Image 1 of 2 \n ...,next image 1 of 2 prev image 2 of 2 hiroshima ...
1,0,Officials and reports say more than 80 people ...,officials and reports say more than 80 people ...
2,0,The Fire and Disaster Management Agency said 1...,the fire and disaster management agency said 1...
3,0,Several days of heavy rainfall that weather of...,several days of heavy rainfall that weather of...
4,0,Many people started to return and check on the...,many people started to return and check on the...
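
The `apply(...).explode().reset_index()` chain can be easier to follow on a toy example. The sketch below uses a hypothetical `naive_split` helper in place of `sent_tokenize` so that it runs without the NLTK `punkt` data; the two sample articles are made up.

```python
import pandas as pd

def naive_split(text):
    # Hypothetical stand-in for nltk's sent_tokenize: split on periods only
    return [part.strip() for part in text.split(".") if part.strip()]

toy = pd.DataFrame({"article": ["First sentence. Second sentence.",
                                "Only one sentence."]})

# apply() yields one list per article, explode() gives each list item its
# own row (repeating the article's index), and reset_index() turns that
# repeated index into a regular column.
expanded = toy["article"].apply(naive_split).explode().reset_index()
expanded.columns = ["original_index", "sentence"]
print(expanded)
```

The `original_index` column is what later allows each sentence to be joined back to the metadata of the article it came from.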


In [141]:
df_expanded['sentence'] = df_expanded['cleaned_sentence']
df_expanded.head()

Unnamed: 0,original_index,sentence,cleaned_sentence
0,0,next image 1 of 2 prev image 2 of 2 hiroshima ...,next image 1 of 2 prev image 2 of 2 hiroshima ...
1,0,officials and reports say more than 80 people ...,officials and reports say more than 80 people ...
2,0,the fire and disaster management agency said 1...,the fire and disaster management agency said 1...
3,0,several days of heavy rainfall that weather of...,several days of heavy rainfall that weather of...
4,0,many people started to return and check on the...,many people started to return and check on the...


In [142]:
#merge back with original dataframe:
df_final = df_expanded.merge(df.drop('article', axis=1), left_on='original_index', right_index=True, how='left')
df_final.shape
df_final.pop('cleaned_sentence')
df_final = df_final.reset_index(drop=True)
df_final.head()

Unnamed: 0.1,original_index,sentence,Unnamed: 0,date,year,month,day,author,title,url,section,publication
0,0,next image 1 of 2 prev image 2 of 2 hiroshima ...,0,2018-07-09 00:00:00,2018,7.0,9,Associated Press,Rescuers look through mud for Japan flood vict...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News
1,0,officials and reports say more than 80 people ...,0,2018-07-09 00:00:00,2018,7.0,9,Associated Press,Rescuers look through mud for Japan flood vict...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News
2,0,the fire and disaster management agency said 1...,0,2018-07-09 00:00:00,2018,7.0,9,Associated Press,Rescuers look through mud for Japan flood vict...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News
3,0,several days of heavy rainfall that weather of...,0,2018-07-09 00:00:00,2018,7.0,9,Associated Press,Rescuers look through mud for Japan flood vict...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News
4,0,many people started to return and check on the...,0,2018-07-09 00:00:00,2018,7.0,9,Associated Press,Rescuers look through mud for Japan flood vict...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News
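
As a minimal illustration of the merge above, the following sketch joins sentence rows back to article metadata by matching `original_index` against the article DataFrame's index. The column values are invented for the example.

```python
import pandas as pd

# Toy article table; its default index (0, 1) plays the role of the article id
articles = pd.DataFrame({"title": ["A", "B"],
                         "article": ["x. y.", "z."]})

# Toy sentence table, as produced by the explode step
sentences = pd.DataFrame({"original_index": [0, 0, 1],
                          "sentence": ["x", "y", "z"]})

# left_on + right_index=True matches each sentence's original_index to the
# article's row label; how='left' keeps every sentence row.
merged = sentences.merge(articles.drop("article", axis=1),
                         left_on="original_index",
                         right_index=True,
                         how="left")
print(merged)
```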


**Section 4:** Create Classification function

In [143]:
df_final['objectivity_classification'] = None
df_final['Emotional_appeal_classification'] = None
df_final['classification_raw'] = None

In [144]:
# define a function that processes batches of sentences and updates the dataframe

def classify_batch(df, start_index, batch_size):
    # Generate batches and classify
    end_index = start_index + batch_size
    batch = df.iloc[start_index:end_index]

    print("Start Index:", start_index)
    print("End Index:", end_index)
    print("Batch Size:", len(batch))  # Check the size of the batch

    responses = []
    for sentence in batch['sentence']:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": """Classify the following sentence based on the criteria below:

1. Objectivity: Classify as 'Objective' or 'Subjective'. A sentence is subjective if it expresses personal opinions, sarcasm, exhortations, discriminatory remarks, or rhetorical figures. Categorize as 'Unclassifiable' if there is not enough information to classify.
   Examples:
   - "The sky is blue and clear today." (Objective)
   - "I believe this new policy will be a disaster." (Subjective)
   - "According to experts, the economy is improving." (Objective)

2. Emotional Appeal: Classify as 'Emotional Appeal' if the sentence attempts to evoke a strong emotional response from the reader, such as fear, sympathy, or anger, often without presenting logical arguments or evidence.
   Examples:
   - "Imagine the heartbreak of those who lost everything in the fire." (Emotional Appeal)
   - "The heartwarming story of a dog who traveled miles to find its owner." (Emotional Appeal)
   - "The terrifying threat of an impending economic collapse has everyone on edge." (Emotional Appeal)

Respond with a space-separated string of all applicable classifications (e.g., 'Objective', 'Objective Emotional Appeal', 'Unclassifiable', 'Subjective Emotional Appeal').
"""},
                {"role": "user", "content": sentence}
            ]
        )
        responses.append(response.choices[0].message.content)
    print("Number of Responses:", len(responses))  # Check the number of responses

    # Store results back into DataFrame
    print(responses)
    print(f"length response: {len(responses)}, length indexes = {len(df.loc[start_index:end_index-1, 'classification_raw'])}")
    df.loc[start_index:end_index-1, 'classification_raw'] = responses
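
`classify_batch` reads the batch with `iloc` (positional, end-exclusive) but writes the results with `loc` (label-based, end-inclusive), which is why the write uses `end_index-1`. The two only select the same rows when the index is a default 0..n-1 range, which appears to be the reason the earlier `reset_index(drop=True)` matters. The toy DataFrame below is an assumption used only to show the difference.

```python
import pandas as pd

df = pd.DataFrame({"sentence": list("abcdefg")})  # 7 rows, index 0..6

start, size = 5, 5
end = start + size

# Both selections are clipped to the last two rows on a default index
by_position = df.iloc[start:end]
by_label = df.loc[start:end - 1, "sentence"]
print(len(by_position), len(by_label))

# After filtering without resetting the index, labels and positions diverge
filtered = df[df["sentence"] != "b"]      # index is now 0, 2, 3, 4, 5, 6
print(len(filtered.iloc[0:3]))            # first three rows by position
print(len(filtered.loc[0:2]))             # only labels 0 and 2 exist in 0..2

# Resetting the index makes the two styles agree again
realigned = filtered.reset_index(drop=True)
print(len(realigned.loc[0:2]))
```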

In [145]:
#test with one run
df_test = df_final[:5].copy()  # copy so classify_batch does not write into a slice of df_final
df_test

Unnamed: 0.1,original_index,sentence,Unnamed: 0,date,year,month,day,author,title,url,section,publication,objectivity_classification,Emotional_appeal_classification,classification_raw
0,0,next image 1 of 2 prev image 2 of 2 hiroshima ...,0,2018-07-09 00:00:00,2018,7.0,9,Associated Press,Rescuers look through mud for Japan flood vict...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News,,,
1,0,officials and reports say more than 80 people ...,0,2018-07-09 00:00:00,2018,7.0,9,Associated Press,Rescuers look through mud for Japan flood vict...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News,,,
2,0,the fire and disaster management agency said 1...,0,2018-07-09 00:00:00,2018,7.0,9,Associated Press,Rescuers look through mud for Japan flood vict...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News,,,
3,0,several days of heavy rainfall that weather of...,0,2018-07-09 00:00:00,2018,7.0,9,Associated Press,Rescuers look through mud for Japan flood vict...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News,,,
4,0,many people started to return and check on the...,0,2018-07-09 00:00:00,2018,7.0,9,Associated Press,Rescuers look through mud for Japan flood vict...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News,,,


In [146]:
batch_size = 5  # Set batch size according to API limits and your preference
num_sentences = len(df_test)
counter = 1
for start_index in range(0, num_sentences, batch_size):
    print(f"{counter}. Batch starting at index {start_index} being classified.")  # Optional: for tracking progress
    classify_batch(df_test, start_index, batch_size)
    counter += 1



Start Index: 0
End Index: 5
Batch Size: 5
Number of Responses: 5
['Objective Emotional Appeal', 'Objective Emotional Appeal', 'Objective', 'Objective', 'Objective']
length response: 5, length indexes = 5
Batch starting at index 0 classified.
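
The batching loop steps through the rows with `range(0, num_sentences, batch_size)`. The sketch below lists the resulting (start, end) pairs for a made-up row count; `batch_bounds` is a hypothetical helper, not part of the notebook. It clips the final end index explicitly, whereas the notebook relies on slicing to do that.

```python
def batch_bounds(num_rows, batch_size):
    # One (start, end) pair per batch, with end exclusive and clipped
    # to the number of rows so the last batch may be shorter.
    return [(start, min(start + batch_size, num_rows))
            for start in range(0, num_rows, batch_size)]

bounds = batch_bounds(12, 5)
print(bounds)
```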


In [147]:
batch_size = 50  # Set batch size according to API limits and your preference
num_sentences = len(df_final)
counter = 1
for start_index in range(0, num_sentences, batch_size):
    print(f"{counter}. Batch starting at index {start_index} being classified.")  # Optional: for tracking progress
    classify_batch(df_final, start_index, batch_size)
    counter += 1


1. Batch starting at index 0 being classified.
Start Index: 0
End Index: 50
Batch Size: 50
Number of Responses: 50
['Objective Emotional Appeal', 'Objective Emotional Appeal', 'Objective', 'Objective', 'Objective', 'Objective', 'Objective Emotional Appeal', 'Subjective Emotional Appeal', 'Subjective Emotional Appeal', 'Objective Emotional Appeal', 'Objective', 'Objective', 'Objective', 'Objective Emotional Appeal', 'Objective Emotional Appeal', 'Objective', 'Objective Emotional Appeal', 'Objective Emotional Appeal', 'Objective Emotional Appeal', 'Objective', 'Objective Emotional Appeal', 'Subjective', 'Objective', 'Objective Emotional Appeal', 'Subjective Emotional Appeal', 'Unclassifiable', 'Objective', 'Objective', 'Objective', 'Objective', 'Objective', 'Objective', 'Objective', 'Objective', 'Unclassifiable', 'Objective', 'Objective', 'Objective', 'Unclassifiable', 'Subjective Emotional Appeal', 'Objective', 'Objective', 'Objective Emotional Appeal', 'Subjective Emotional Appeal', 'S

In [149]:
len(df_final.loc[400:450-1, 'classification_raw'])   # Number of rows with labels 400 to 449 (inclusive)

50

In [150]:
len(df_final.loc[450:500-1, 'classification_raw'])   # Number of rows with labels 450 to 499 (inclusive)

50

In [159]:
df_final['objectivity_classification'] = False
df_final['Emotional_appeal_classification'] = False

In [160]:
# Derive boolean flags from 'classification_raw':
# - 'objectivity_classification' is True when the raw label contains 'Objective',
#   and False otherwise (this covers both 'Subjective' and 'Unclassifiable').
# - 'Emotional_appeal_classification' is True when the raw label contains
#   'Emotional Appeal', and False otherwise.
df_final['objectivity_classification'] = df_final['classification_raw'].apply(lambda x: 'Objective' in x)
df_final['Emotional_appeal_classification'] = df_final['classification_raw'].apply(lambda x: 'Emotional Appeal' in x)
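
Because the substring test maps both 'Subjective' and 'Unclassifiable' to `False`, the two cases cannot be told apart afterwards, and a missing response (`None`) would raise a `TypeError`. One possible alternative, shown here as a sketch rather than the notebook's method, is a small parser that returns `None` for objectivity when the label is unclassifiable or missing. The function name `parse_label` and the sample labels are assumptions.

```python
import pandas as pd

def parse_label(raw):
    """Return (objective, emotional_appeal) parsed from a raw label string."""
    if not isinstance(raw, str):
        return (None, False)          # no response was stored for this row
    emotional = 'Emotional Appeal' in raw
    if 'Subjective' in raw:
        objective = False
    elif 'Objective' in raw:
        objective = True
    else:
        objective = None              # e.g. 'Unclassifiable'
    return (objective, emotional)

raw_labels = pd.Series(['Objective Emotional Appeal', 'Subjective',
                        'Unclassifiable', None])
parsed = raw_labels.apply(parse_label)
print(parsed.tolist())
```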

In [161]:
df_final.head()

Unnamed: 0.2,Unnamed: 0.1,original_index,sentence,Unnamed: 0,date,year,month,day,author,title,url,section,publication,objectivity_classification,Emotional_appeal_classification,classification_raw,WordCount
0,0,0,next image 1 of 2 prev image 2 of 2 hiroshima ...,0,7/9/2018 0:00,2018,7,9,Associated Press,Rescuers look through mud for Japan flood vict...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News,True,True,Objective Emotional Appeal,50
1,1,0,officials and reports say more than 80 people ...,0,7/9/2018 0:00,2018,7,9,Associated Press,Rescuers look through mud for Japan flood vict...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News,True,True,Objective Emotional Appeal,20
2,2,0,the fire and disaster management agency said 1...,0,7/9/2018 0:00,2018,7,9,Associated Press,Rescuers look through mud for Japan flood vict...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News,True,False,Objective,17
3,3,0,several days of heavy rainfall that weather of...,0,7/9/2018 0:00,2018,7,9,Associated Press,Rescuers look through mud for Japan flood vict...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News,True,False,Objective,28
4,4,0,many people started to return and check on the...,0,7/9/2018 0:00,2018,7,9,Associated Press,Rescuers look through mud for Japan flood vict...,https://www.foxnews.com/world/rescuers-look-th...,World,Fox News,True,False,Objective,18


In [162]:
df_final.to_csv('/content/drive/My Drive/266 Natural language processing/NLP Project/training_data_modified_final.csv')