# Step 2. Classifying the article data by objectivity using the ChatGPT API
Sentences are labeled according to the following criteria:

1. A sentence is subjective if it expresses personal opinions, sarcasm, exhortations, discriminatory remarks, or rhetorical figures.
2. All other sentences should be considered objective, including third-party opinions when referenced as such, and non-conclusive comments.
---



**Section 1:** Importing the right resources

In [23]:
import pandas as pd
import numpy as np
import nltk
import re


In [24]:
!pip install openai



In [25]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [26]:
from nltk.tokenize import sent_tokenize

In [27]:
# import the OpenAI library
import os
import openai
from openai import OpenAI

# read the API key for this project from the environment instead of hard-coding it
# (OPENAI_API_KEY must be set before this cell runs)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

#testing response
completion = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {"role": "user", "content": "Please confirm successful connection"}
  ]
)
response = completion.choices[0].message.content

print(response)

Connection confirmed! How can I assist you today?


In [28]:
#connect to google drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [29]:
df = pd.read_csv('/content/drive/My Drive/Project 266 files/sample_news.csv')

**Section 2:** Loading the unlabeled dataset created during step 1

In [30]:
# print size of dataframe
df.shape

(900, 11)

In [31]:
# print first 5 examples in dataframe
df.head()

Unnamed: 0.1,Unnamed: 0,date,year,month,day,author,title,article,url,section,publication
0,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,\n next\n Image 1 of 2 \n ...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News
1,1,2017-11-05 00:00:00,2017,11.0,5,Associated Press,"Militants storm security compound in Yemen, ki...","SANAA, Yemen – Militants set off a large car ...",https://www.foxnews.com/world/militants-storm-...,World,Fox News
2,2,2018-05-23 00:00:00,2018,5.0,23,Lukas Mikelionis,Professor found guilty of spraying fake blood ...,\n Patricia Hill was found guilty on ...,https://www.foxnews.com/us/professor-found-gui...,Second Amendment,Fox News
3,3,2017-08-08 00:00:00,2017,8.0,8,Christopher Wallace,Trump's generals: President turns to military ...,close Video Should Kelly rein in Trump on Twit...,https://www.foxnews.com/politics/trumps-genera...,Fox News Investigates,Fox News
4,4,2018-05-30 00:00:00,2018,5.0,30,Associated Press,Setback for outgoing Paraguay president's Sena...,"ASUNCION, Paraguay – Paraguay&aposs President...",https://www.foxnews.com/world/setback-for-outg...,World,Fox News


In [32]:
# create new df with article author, title, and article contents
df_new = df[['author', 'title','article']]
df_new.head()

Unnamed: 0,author,title,article
0,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,\n next\n Image 1 of 2 \n ...
1,Associated Press,"Militants storm security compound in Yemen, ki...","SANAA, Yemen – Militants set off a large car ..."
2,Lukas Mikelionis,Professor found guilty of spraying fake blood ...,\n Patricia Hill was found guilty on ...
3,Christopher Wallace,Trump's generals: President turns to military ...,close Video Should Kelly rein in Trump on Twit...
4,Associated Press,Setback for outgoing Paraguay president's Sena...,"ASUNCION, Paraguay – Paraguay&aposs President..."


**Section 3:** Tokenize and transform original dataframe into sentences to properly label

In [33]:
#function to split text into sentences
def split_into_sentences(text):
  return sent_tokenize(text)

In [34]:

# Function to clean sentences
def clean_sentence(sentence):
    # Remove unnecessary whitespace and line breaks
    sentence = re.sub(r'\s+', ' ', sentence).strip()

    # Remove non-alphanumeric characters
    sentence = re.sub(r'[^a-zA-Z0-9\s]', '', sentence)

    # Convert to lowercase
    sentence = sentence.lower()

    return sentence
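
Because `clean_sentence` collapses whitespace before it deletes punctuation, a standalone symbol (such as a dash surrounded by spaces) can leave a double space behind. The sketch below compares the original order with a reordered variant. Both function names and the sample text are made up for illustration; this is one possible adjustment, not the notebook's method.

```python
import re

def clean_sentence_original_order(sentence):
    # Same steps as clean_sentence: collapse whitespace, then drop punctuation
    sentence = re.sub(r'\s+', ' ', sentence).strip()
    sentence = re.sub(r'[^a-zA-Z0-9\s]', '', sentence)
    return sentence.lower()

def clean_sentence_reordered(sentence):
    # Hypothetical variant: drop punctuation first, then collapse whitespace
    sentence = re.sub(r'[^a-zA-Z0-9\s]', '', sentence)
    sentence = re.sub(r'\s+', ' ', sentence).strip()
    return sentence.lower()

# Made-up sample containing a dash surrounded by spaces
sample = "SANAA, Yemen \u2013 Militants stormed a compound."

before = clean_sentence_original_order(sample)
after = clean_sentence_reordered(sample)
print(repr(before))
print(repr(after))
```

Neither version keeps hyphenated words apart ("great-grandfather" still becomes "greatgrandfather"); replacing punctuation with a space rather than an empty string would be a further option if that matters downstream.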

In [35]:
# Split each article into sentences and expand them into one row per sentence
df_expanded = df['article'].apply(split_into_sentences).explode().reset_index()
df_expanded.columns = ['original_index', 'sentence']

# Clean each sentence
df_expanded['cleaned_sentence'] = df_expanded['sentence'].apply(clean_sentence)

# Keep only sentences with more than five words (drops empty, very short, or non-informative rows)
df_expanded = df_expanded[df_expanded['cleaned_sentence'].apply(lambda x: len(x.split()) > 5)]

# Show the first few rows of the cleaned DataFrame
df_expanded.head()

Unnamed: 0,original_index,sentence,cleaned_sentence
0,0,\n next\n Image 1 of 2 \n ...,next image 1 of 2 prev image 2 of 2 moscow th...
1,0,"A century later, their descendants say these h...",a century later their descendants say these hi...
2,0,As Russia approaches the centennial of the upr...,as russia approaches the centennial of the upr...
3,0,The Kremlin is avoiding any official commemora...,the kremlin is avoiding any official commemora...
4,0,"Alexis Rodzianko, whose great-grandfather was ...",alexis rodzianko whose greatgrandfather was sp...
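
The `apply(...).explode().reset_index()` chain is what turns one row per article into one row per sentence while remembering which article each sentence came from. A minimal sketch on a toy DataFrame, assuming a naive regex splitter (`naive_split`, a hypothetical stand-in for `sent_tokenize` so that no NLTK data is needed):

```python
import re
import pandas as pd

def naive_split(text):
    # Hypothetical stand-in for nltk's sent_tokenize: split after ., ! or ?
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

toy = pd.DataFrame({"article": ["First fact. Second fact.", "Only one sentence."]})

# Each article becomes a list of sentences; explode() gives every list item its own row
# and repeats the article's index label, which reset_index() turns into a column.
expanded = toy["article"].apply(naive_split).explode().reset_index()
expanded.columns = ["original_index", "sentence"]

print(expanded)
```

The repeated values in `original_index` are what later allow the sentences to be merged back with the article-level metadata.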


In [36]:
df_expanded['sentence'] = df_expanded['cleaned_sentence']
df_expanded.head()

Unnamed: 0,original_index,sentence,cleaned_sentence
0,0,next image 1 of 2 prev image 2 of 2 moscow th...,next image 1 of 2 prev image 2 of 2 moscow th...
1,0,a century later their descendants say these hi...,a century later their descendants say these hi...
2,0,as russia approaches the centennial of the upr...,as russia approaches the centennial of the upr...
3,0,the kremlin is avoiding any official commemora...,the kremlin is avoiding any official commemora...
4,0,alexis rodzianko whose greatgrandfather was sp...,alexis rodzianko whose greatgrandfather was sp...


In [37]:
#merge back with original dataframe:
df_final = df_expanded.merge(df.drop('article', axis=1), left_on='original_index', right_index=True, how='left')
df_final.shape
df_final.pop('cleaned_sentence')
df_final = df_final.reset_index(drop=True)
df_final.head()

Unnamed: 0.1,original_index,sentence,Unnamed: 0,date,year,month,day,author,title,url,section,publication
0,0,next image 1 of 2 prev image 2 of 2 moscow th...,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News
1,0,a century later their descendants say these hi...,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News
2,0,as russia approaches the centennial of the upr...,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News
3,0,the kremlin is avoiding any official commemora...,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News
4,0,alexis rodzianko whose greatgrandfather was sp...,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News


**Section 4:** Create Classification function

In [38]:
df_final['objectivity_classification'] = None
df_final['Emotional_appeal_classification'] = None
df_final['classification_raw_1'] = None
df_final['classification_raw_2'] = None
df_final['classification_raw_3'] = None

In [39]:
# define a function that processes batches of sentences and updates the dataframe

def classify_batch(df, start_index, batch_size, classification_column):
    # Generate batches and classify
    end_index = start_index + batch_size
    batch = df.iloc[start_index:end_index]

    print("Start Index:", start_index)
    print("End Index:", end_index)
    print("Batch Size:", len(batch))  # Check the size of the batch

    responses = []
    for sentence in batch['sentence']:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": """Classify the following sentence based on the criteria below:

1. Objectivity: Classify as 'Objective' or 'Subjective'.

- Objective: Sentences that present facts, verifiable data, or direct quotes without expressing personal opinions, bias, or rhetorical devices. These sentences should be straightforward, factual, and neutral.
   - Examples:
     - "The sky is blue and clear today." (Objective)
     - "According to experts, the economy is improving." (Objective)
     - "The company reported a 5% increase in profits last quarter." (Objective)
- Subjective: Sentences that express personal opinions, biases, sarcasm, exhortations, discriminatory remarks, or use rhetorical figures. These sentences often reflect the author's personal viewpoint or attempt to persuade the reader.
   - Examples:
     - "I believe this new policy will be a disaster." (Subjective)
     - "This is the best movie of the year." (Subjective)
     - "She handled the situation poorly." (Subjective)
- Unclassifiable: If there is not enough information to determine whether a sentence is objective or subjective.
   - Example:
     - "It remains to be seen what will happen next." (Unclassifiable)

2. Emotional Appeal: Classify as 'Emotional Appeal' if the sentence attempts to evoke a strong emotional response from the reader, such as fear, sympathy, or anger, often without presenting logical arguments or evidence.

- Emotional Appeal: Sentences that aim to provoke emotions like fear, sympathy, anger, joy, or sadness, often by using vivid imagery, personal stories, or dramatic language.
   - Examples:
     - "Imagine the heartbreak of those who lost everything in the fire." (Emotional Appeal)
     - "The heartwarming story of a dog who traveled miles to find its owner." (Emotional Appeal)
     - "The terrifying threat of an impending economic collapse has everyone on edge." (Emotional Appeal)
- Not Emotional Appeal: Sentences that present information, facts, or logical arguments without attempting to evoke a strong emotional response.
   - Examples:
     - "The government announced a new policy today." (Not Emotional Appeal)
     - "The study shows a significant decrease in crime rates over the past decade." (Not Emotional Appeal)
     - "The weather forecast predicts rain for the next three days." (Not Emotional Appeal)

Respond with a space-separated string of all applicable classifications (e.g., 'Objective', 'Objective Emotional Appeal', 'Unclassifiable', 'Subjective Emotional Appeal').
"""},
                {"role": "user", "content": sentence}
            ]
        )
        responses.append(response.choices[0].message.content)
    print("Number of Responses:", len(responses))  # Check the number of responses

    # Store results back into DataFrame, using the batch's own index labels
    # so the assignment matches exactly the rows that were sent to the API
    print(responses)
    print(f"length response: {len(responses)}, length indexes = {len(batch.index)}")
    df.loc[batch.index, classification_column] = responses
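
The prompt asks for a single space-separated string, while the DataFrame also has separate `objectivity_classification` and `Emotional_appeal_classification` columns. One way the raw reply could be split into those two fields is sketched below; `parse_labels` is a hypothetical helper, and the handling of unexpected replies is an assumption rather than something the notebook specifies.

```python
def parse_labels(raw):
    """Split a raw model reply into (objectivity, emotional_appeal)."""
    text = raw.strip().strip("'\"").lower()

    if text.startswith("objective"):
        objectivity = "Objective"
    elif text.startswith("subjective"):
        objectivity = "Subjective"
    elif text.startswith("unclassifiable"):
        objectivity = "Unclassifiable"
    else:
        objectivity = None  # reply did not follow the requested format

    # "not emotional appeal" also contains "emotional appeal", so check the negation first
    if "not emotional appeal" in text:
        emotional = False
    else:
        emotional = "emotional appeal" in text

    return objectivity, emotional

print(parse_labels("Subjective Emotional Appeal"))
print(parse_labels("Objective"))
```

Returning `None` for malformed replies keeps them visible so they can be counted or re-queried instead of being silently coerced into a label.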

In [40]:
#test with one run
df_test = df_final[:5]
df_test

Unnamed: 0.1,original_index,sentence,Unnamed: 0,date,year,month,day,author,title,url,section,publication,objectivity_classification,Emotional_appeal_classification,classification_raw_1,classification_raw_2,classification_raw_3
0,0,next image 1 of 2 prev image 2 of 2 moscow th...,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News,,,,,
1,0,a century later their descendants say these hi...,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News,,,,,
2,0,as russia approaches the centennial of the upr...,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News,,,,,
3,0,the kremlin is avoiding any official commemora...,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News,,,,,
4,0,alexis rodzianko whose greatgrandfather was sp...,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News,,,,,


In [41]:
batch_size = 50  # Set batch size according to API limits and your preference
num_sentences = len(df_final)
counter = 1

for i in range(1, 4):
    classification_column = f'classification_raw_{i}'
    for start_index in range(0, num_sentences, batch_size):
        print(f"{counter}. Batch starting at index {start_index} being classified (Run {i}).")  # Optional: for tracking progress
        classify_batch(df_final, start_index, batch_size, classification_column)
        counter += 1

Streaming output truncated to the last 5000 lines.
['Subjective', 'Subjective Emotional Appeal', 'Objective', 'Objective', 'Objective', 'Objective', 'Subjective', 'Objective', 'Objective', 'Objective', 'Objective', 'Objective', 'Subjective', 'Subjective', 'Subjective', 'Subjective', 'Subjective Emotional Appeal', 'Subjective Emotional Appeal', 'Subjective', 'Subjective', 'Unclassifiable', 'Subjective Emotional Appeal', 'Subjective Emotional Appeal', 'Subjective', 'Subjective Emotional Appeal', 'Subjective', 'Subjective', 'Subjective', 'Objective', 'Subjective', 'Subjective Emotional Appeal', 'Objective', 'Objective', 'Objective', 'Subjective Emotional Appeal', 'Subjective Emotional Appeal', 'Objective', 'Objective', 'Objective', 'Objective', 'Objective', 'Unclassifiable', 'Objective Emotional Appeal', 'Objective', 'Unclassifiable', 'Objective', 'Objective', 'Subjective', 'Objective', 'Objective']
length response: 50, length indexes = 50
868. Batch starting at index 17000 
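
A run of this length makes one request per sentence, so a single transient API error would abort the whole loop. A minimal sketch of a retry wrapper with exponential backoff follows; `call_with_retry` and its parameters are hypothetical, and a real version would likely catch only the client's rate-limit or connection errors rather than every exception.

```python
import time

def call_with_retry(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay, 2*base_delay, ... and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** (attempt - 1))

# Demonstration with a fake request that fails twice before succeeding
attempts = {"count": 0}

def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "Objective"

waits = []
result = call_with_retry(flaky_request, sleep=waits.append)  # record delays instead of sleeping
print(result, waits)
```

Inside `classify_batch`, the API call could be passed in as a zero-argument lambda so the rest of the function stays unchanged.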

In [42]:
df_final.head()

Unnamed: 0.1,original_index,sentence,Unnamed: 0,date,year,month,day,author,title,url,section,publication,objectivity_classification,Emotional_appeal_classification,classification_raw_1,classification_raw_2,classification_raw_3
0,0,next image 1 of 2 prev image 2 of 2 moscow th...,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News,,,Objective,Objective,Objective
1,0,a century later their descendants say these hi...,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News,,,Subjective Emotional Appeal,Subjective Emotional Appeal,Subjective Emotional Appeal
2,0,as russia approaches the centennial of the upr...,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News,,,Objective,Objective,Objective
3,0,the kremlin is avoiding any official commemora...,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News,,,Subjective Emotional Appeal,Subjective,Subjective Emotional Appeal
4,0,alexis rodzianko whose greatgrandfather was sp...,0,2017-11-05 00:00:00,2017,11.0,5,Associated Press,Russia struggles with legacy of 1917 Bolshevik...,https://www.foxnews.com/world/russia-struggles...,RELIGION,Fox News,,,Subjective,Subjective Emotional Appeal,Subjective Emotional Appeal
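
The three runs do not always agree (rows 3 and 4 above each have one dissenting label). One common way to reconcile repeated labels is a majority vote; the sketch below assumes that approach, which the notebook does not itself confirm, and `majority_label` is a hypothetical helper.

```python
from collections import Counter

def majority_label(labels, fallback=None):
    """Return the label given by more than half of the runs, else the fallback."""
    if not labels:
        return fallback
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else fallback

# The three raw labels shown for rows 3 and 4 above
row_3 = ["Subjective Emotional Appeal", "Subjective", "Subjective Emotional Appeal"]
row_4 = ["Subjective", "Subjective Emotional Appeal", "Subjective Emotional Appeal"]

print(majority_label(row_3))
print(majority_label(row_4))
```

With pandas, this could be applied row-wise over the three `classification_raw_*` columns; rows with no majority fall back to `None` so they can be reviewed by hand.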


In [43]:
df_final.to_csv('/content/drive/My Drive/Project 266 files/Modified_data.csv')
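
The `Unnamed: 0` column seen in the loaded data is the usual result of writing a CSV with the default index and reading it back without `index_col`. If that column is not wanted in later steps, passing `index=False` when saving avoids it. A small sketch on a toy DataFrame, using an in-memory buffer in place of the Drive path:

```python
import io
import pandas as pd

toy = pd.DataFrame({"sentence": ["a", "b"]})

# Default: the index is written as an unnamed first column
buf = io.StringIO()
toy.to_csv(buf)
buf.seek(0)
cols_with_index = pd.read_csv(buf).columns.tolist()

# index=False: only the real columns are written
buf = io.StringIO()
toy.to_csv(buf, index=False)
buf.seek(0)
cols_without_index = pd.read_csv(buf).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```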