# Sentiment Analysis with NLP

### Problem statement

As a small nation dependent on foreign investment, spending, and good relationships with other countries, Singapore has a vested interest in its public perception and international reputation. Public perception affects both the economy and foreign relations. For example, travel vlogs depicting Singapore in a good light are amplified in the online space, attracting more visitors and tourist spending. Singapore's good reputation for safety and political stability is also integral to attracting businesses to set up here. It is therefore important to monitor what the online sphere is saying about Singapore and to take relevant sentiments and perceptions as feedback for improvement.

The goal of this project is to build a classification model that can identify negative sentiment towards Singapore. We do this by training our model on YouTube comments from videos about Singapore. We will try different classification models, such as Naive Bayes and Random Forest. We will use the F1-score as the evaluation metric because it takes into account both precision (which drops as false positives increase) and recall (which drops as false negatives increase).
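
To make the metric concrete, here is a minimal sketch of how the F1-score combines precision and recall. The function name `precision_recall_f1` and the counts used below are hypothetical and chosen only for illustration; they are not results from this project.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts.

    tp: negative-sentiment comments correctly flagged
    fp: comments flagged as negative that were not
    fn: negative-sentiment comments the model missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, for illustration only
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays low when either precision or recall is low, which is why it suits a task where both missed and falsely flagged comments matter.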

### Video selection process

1. Filtered by most recent
   - Focused on videos about Singapore as a whole and/or its policies, rather than travel vlogs or businesses (e.g. Singapore Airlines)
   - Disregarded non-English videos
   - Disregarded videos with fewer than 100 comments or with comments disabled
2. Filtered by most views
   - Focused on videos about Singapore as a whole and/or its policies, rather than travel vlogs or businesses (e.g. Singapore Airlines)
   - Disregarded non-English videos
   - Disregarded videos with fewer than 100 comments or with comments disabled
3. Tried to get a mix of "positive" and "negative" sounding titles


#### YouTube videos used

|Video Title|Publish Date|Link|
|-|-|-|
|Singapore to hang first woman in nearly 20 years during executions set for this week: rights groups|26 Jul 2023|[Link](https://youtu.be/xJlgtV8L7Jc)|
|Inside Singapore’s deadly war on drugs: 101 East Documentary|19 Jan 2023|[Link](https://youtu.be/GL1JdIeoo4A)|
|The Dark Side of Singapore's Economic Miracle|9 Oct 2022|[Link](https://youtu.be/XDYy8z7krAI)|
|Why Singapore is One of the World's Richest Countries|9 Sep 2022|[Link](https://youtu.be/XSOgcpRbrCo)|
|Why Singapore Is Insanely Well Designed|14 Sep 2022|[Link](https://youtu.be/vyfJgJBB3Vk)|
|Singapore: The World's Only Successful Dictatorship?|3 Jul 2021|[Link](https://youtu.be/Hkxf4SC_SBk)|
|The Almost Perfect Country|26 Aug 2020|[Link](https://www.youtube.com/watch?v=8XNu282FkvM)|
|City of the Future: Singapore – Full Episode: National Geographic|24 Nov 2018|[Link](https://youtu.be/xi6r3hZe5Tg)|
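
The extraction function later in this notebook takes a video ID rather than a full link. The links above come in two forms (`youtu.be/<id>` and `youtube.com/watch?v=<id>`), so a small helper along these lines could recover the ID from either. The name `extract_video_id` is hypothetical and not part of the original notebook, and the sketch only handles the two URL forms that appear in the table.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a youtu.be or youtube.com/watch link, else None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in ("www.youtube.com", "youtube.com"):
        # Full links carry the ID in the "v" query parameter
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


links = [
    "https://youtu.be/xJlgtV8L7Jc",
    "https://www.youtube.com/watch?v=8XNu282FkvM",
]
video_ids = [extract_video_id(link) for link in links]
print(video_ids)
```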


### Extracting comments using Google's YouTube Data API

In [1]:
# import necessary libraries

from googleapiclient.discovery import build
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
import pandas as pd

In [2]:
# API credentials (key redacted; replace the placeholder with your own API key)

apikey = "#####"

In [3]:
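# get all top-level comments (replies are not included) from a video,
# following nextPageToken until the last page of results is reached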

def get_all_video_comments(video_id):
    youtube = build('youtube', 'v3', developerKey=apikey)

    videos, comments, dates = [], [], []
    nextPageToken = None

    while True:
        response = youtube.commentThreads().list(
            part='snippet',
            videoId=video_id,
            textFormat='plainText',
            maxResults=100,
            pageToken=nextPageToken
        ).execute()

        for item in response['items']:
            video = item['snippet']['videoId']
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            date = item['snippet']['topLevelComment']['snippet']['publishedAt']
            
            videos.append(video)
            comments.append(comment)
            dates.append(date)

        nextPageToken = response.get('nextPageToken')

        if not nextPageToken:
            break

    df = pd.DataFrame({'video_id':videos, 'comment':comments, 'comment_date': dates})
    
    return df

In [4]:
# save comments to CSV

def save_to_csv(df, csv_filename):
    df.to_csv(csv_filename, index=False)

In [5]:
# combine the functions above
# run in batches to keep an eye on the Google API quota

# def main():
#     video_id = videoIDs
#     df = get_all_video_comments(video_id)
#     csv_filename = "./data/comments.csv"
#     save_to_csv(df, csv_filename)
#     print("Comments extracted and saved to CSV.")


# if __name__ == "__main__":
#     main()
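
The commented-out cell above handles one video per run. One possible way to process several videos in a single pass, without losing earlier results if a later request fails (for example when the API quota runs out), is sketched below. The helper `collect_comments` and its `fetch` parameter are hypothetical and not part of the original notebook; the sketch assumes `fetch` is any function that takes a video ID and returns that video's comments, such as `get_all_video_comments`.

```python
def collect_comments(video_ids, fetch):
    """Call fetch(video_id) for each ID and keep successes and failures apart.

    Returns (collected, failed):
      collected maps video_id -> whatever fetch returned
      failed    maps video_id -> a description of the error raised
    """
    collected, failed = {}, {}
    for video_id in video_ids:
        try:
            collected[video_id] = fetch(video_id)
        except Exception as exc:
            # Broad catch on purpose in this sketch: record the failure
            # and continue, so the video can be retried in a later batch.
            failed[video_id] = repr(exc)
    return collected, failed


# Stand-in fetcher for demonstration only; it makes no API calls.
def fake_fetch(video_id):
    if video_id == "bad_id":
        raise ValueError("quota exceeded (simulated)")
    return [f"comment on {video_id}"]


collected, failed = collect_comments(["id_1", "bad_id", "id_2"], fake_fetch)
print(sorted(collected), sorted(failed))
```

With the notebook's own function this might be called as `collect_comments(video_ids, get_all_video_comments)`, after which the per-video DataFrames could be combined with `pd.concat(list(collected.values()), ignore_index=True)` and passed to `save_to_csv`.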