<a href="https://colab.research.google.com/github/tc3518/text-analysis-final/blob/main/tc3518_text_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Public Sentiments Toward AI Companions: A YouTube Comment Sentiment Analysis

## 1. Research Question
With the advancement of large language models and the widespread use of generative AI tools, some users have begun to view AI models as friends or even romantic partners. As AI companion technologies become increasingly integrated into emotional, relational, and even intimate aspects of daily life, policymakers face new questions about digital well-being, psychological safety, and the ethical governance of human–AI interactions. Because public sentiment is a key indicator of emerging regulatory concerns, this study asks: How do YouTube users emotionally respond to AI companionship, and what might these reactions imply for ongoing policy debates around the safety and regulation of emotional AI?

## 2. Literature Review
Existing research on AI companions and social robots highlights both their psychological benefits and their ethical dilemmas. Meta-analytic evidence demonstrates that social robots can meaningfully reduce loneliness and depressive symptoms by providing emotional support and consistent interaction (Yen et al., 2024). These therapeutic effects suggest that AI companions may serve as accessible tools for individuals experiencing social isolation or psychological distress. Similarly, studies on social robot interventions in mental health contexts show generally positive outcomes, though results vary depending on user characteristics and the specific design of the intervention (Guemghar et al., 2022).

However, scholars have also raised concerns about the ethical and social implications of forming intimate bonds with AI agents. Shank, Koike, and Loughnan (2025) argue that emerging forms of “artificial intimacy,” including AI romance and AI companions, may create risks such as emotional dependency, distorted expectations of relationships, and the potential exploitation of vulnerable users. They emphasize the need for careful oversight and ethical guidelines as these technologies become more immersive and personalized.

## 3. Research Methods
This study investigates public sentiment toward AI companion technologies by analyzing YouTube comments from videos discussing various forms of human–AI companionship. Data were collected using the YouTube Data API v3. To capture a broad range of content related to AI companionship, six targeted keyword queries were used: “ai companion,” “ai friendship,” “ai relationship,” “ai romantic relationship,” “ai boyfriend,” and “ai girlfriend.” For each query, the API returned up to 50 of the most relevant videos based on YouTube’s default relevance-ranking algorithm. For each retrieved video, up to 300 publicly available top-level comments were collected, producing a dataset of user-generated responses reflecting diverse perspectives on AI companions. URLs and non-English comments were removed, as the VADER sentiment analyzer is optimized for English. VADER was then used to compute sentiment scores for each comment, generating a compound value from –1 to +1. Comments were categorized as positive (compound above 0.05), negative (compound below –0.05), or neutral (otherwise). Descriptive statistical analysis was used to summarize overall sentiment patterns across the dataset.

The combination of YouTube API data collection and VADER sentiment analysis is well suited to answering this study’s research question. Because debates surrounding AI companionship unfold largely on public digital platforms, YouTube comments provide an authentic window into spontaneous, user-generated reactions that are difficult to capture through surveys. Using the YouTube API ensures systematic and replicable data collection across multiple keywords associated with AI companionship, while VADER offers an efficient and valid way to quantify emotional tone in large-scale comment datasets. Together, these methods directly fit the goal of mapping real-world public responses to emerging AI companionship.
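
As a rough feasibility check, the request volume implied by this design can be estimated before any data are collected. The sketch below assumes per-request quota costs of 100 units for `search.list` and 1 unit each for `videos.list` and `commentThreads.list`, together with a default daily allowance of 10,000 units. These figures are assumptions recalled from the API's quota documentation and should be verified against the current version. All variable names are illustrative.

```python
import math

# Assumed quota costs per request (verify against the current quota docs).
COST_SEARCH = 100
COST_VIDEOS = 1
COST_COMMENT_THREADS = 1
DAILY_QUOTA = 10_000  # assumed default daily allowance

# Design parameters taken from the methods description.
n_queries = 6
videos_per_query = 50
search_page_size = 50
max_comments_per_video = 300
comment_page_size = 100
video_batch_size = 50

# Upper bounds on the number of requests for each endpoint.
search_calls = n_queries * math.ceil(videos_per_query / search_page_size)
max_videos = n_queries * videos_per_query
video_calls = math.ceil(max_videos / video_batch_size)
comment_calls = max_videos * math.ceil(max_comments_per_video / comment_page_size)

total_units = (
    search_calls * COST_SEARCH
    + video_calls * COST_VIDEOS
    + comment_calls * COST_COMMENT_THREADS
)

print(f"search calls:  {search_calls}")
print(f"video calls:   {video_calls}")
print(f"comment calls: {comment_calls} (upper bound)")
print(f"estimated quota units: {total_units} of {DAILY_QUOTA}")
```

Under these assumptions the search step dominates the cost per request, while comment collection dominates the number of requests.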

### 3.0 Install & Imports

In [None]:
!pip -q install --upgrade nltk
!pip -q install requests tqdm
!pip install -q kaleido
!pip install -q langdetect

from langdetect import detect, LangDetectException
import re

import pandas as pd
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

import os
import json
from urllib.parse import urlencode

import requests
from tqdm import tqdm

import plotly.express as px
import plotly.io as pio

nltk.download('vader_lexicon')

sia = SentimentIntensityAnalyzer()


[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


### 3.1 Data Collection Using API

#### 3.1.1 Get YouTube API Key

In [None]:
import os
from getpass import getpass

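# Prompt for the key at runtime so that it is never stored in the notebook or its outputs.
if not os.environ.get("YOUTUBE_API_KEY"):
    os.environ["YOUTUBE_API_KEY"] = getpass("Paste your YouTube Data API v3 key: ")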

# Quick sanity check
assert os.environ.get("YOUTUBE_API_KEY"), "API key not set — please run the cell and paste your key."


#### 3.1.2 Helper: call the YouTube Data API

In [None]:
# Define a small utility function yt_get() that standardizes how we call the YouTube Data API v3.

API_KEY = os.environ.get("YOUTUBE_API_KEY") # Read the API key from environment variables.
BASE_URL = "https://www.googleapis.com/youtube/v3" # Base endpoint for all YouTube Data API v3 resources.

# If the API key is missing, stop immediately and show a clear error message.
if not API_KEY:
    raise ValueError("Missing API key. Set os.environ['YOUTUBE_API_KEY'] first.")

def yt_get(resource: str, params: dict) -> dict:
    """Call YouTube Data API v3.
    - resource: e.g., 'search', 'videos', 'commentThreads'
    - params: dict of query params (we append the API key here)
    Returns parsed JSON as a Python dict.
    """

    # Append the API key to the parameters.
    q = {**params, "key": API_KEY}

    # Construct the request URL.
    url = f"{BASE_URL}/{resource}?{urlencode(q)}"

    # Send the GET request to the API server.
    r = requests.get(url, timeout=30)

    # If the request failed (invalid parameters, quota exceeded, etc.), raise an error immediately so the user can fix the issue.
    r.raise_for_status()

    # Convert the response from JSON text into a Python dictionary.
    return r.json()
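
Because `yt_get()` sends the key as a query parameter, the full request URL, key included, is part of the text of any `requests.HTTPError` raised by `raise_for_status()`. Printing such an exception can therefore expose the key in notebook output. The sketch below shows one way to mask a query parameter before logging a URL; `redact_query_param` is a hypothetical helper, not part of the original notebook, and the example URL uses placeholder values.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redact_query_param(url: str, name: str = "key", mask: str = "REDACTED") -> str:
    """Return `url` with the value of query parameter `name` replaced by `mask`."""
    parts = urlsplit(url)
    pairs = [
        (k, mask if k == name else v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Placeholder URL for illustration only.
example = (
    "https://www.googleapis.com/youtube/v3/commentThreads"
    "?part=snippet&videoId=VIDEO_ID&key=SECRET"
)
safe_url = redact_query_param(example)
print(safe_url)
```

In an `except requests.HTTPError as e:` handler, one option is to print `e.response.status_code` together with `redact_query_param(e.response.url)` rather than the exception text itself.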

#### 3.1.3 Search videos for the topics (collect video IDs)

In [None]:
# Use the YouTube Search API ("search" endpoint) to find videos related to the keywords.

# List of topic keywords for AI companionship.
QUERIES = [
    "ai companion",
    "ai friendship",
    "ai relationship",
    "ai romantic relationship",
    "ai boyfriend",
    "ai girlfriend",
]

TARGET_VIDEOS_PER_QUERY = 50   # Number of videos to collect for EACH keyword.
MAX_RESULTS = 50               # Max videos returned per API page (YouTube API limit).
video_hits = []                # This list will store metadata for ALL retrieved videos.

# Loop over each query term separately so that each one contributes
# up to TARGET_VIDEOS_PER_QUERY videos to the final dataset.
for q in QUERIES:
    print(f"\nSearching videos for query: {q}")

    page_token = None          # Used for pagination (moving through multiple result pages).
    collected = 0              # Counter for how many videos we have collected for this query.

    # tqdm shows a progress bar for the current query.
    with tqdm(total=TARGET_VIDEOS_PER_QUERY, desc=f"Query = '{q}'") as pbar:
        # Continue requesting result pages until we hit the per-query target or there are no more videos available.
        while collected < TARGET_VIDEOS_PER_QUERY:

            # Parameters for the YouTube Search API.
            params = {
                "part": "snippet",         # 'snippet' includes title, description, channel, etc.
                "q": q,                    # Current keyword.
                "type": "video",           # Only return videos (no channels or playlists).
                "maxResults": MAX_RESULTS, # Up to 50 results per page (API maximum).
                "order": "relevance",      # Sort by relevance to the query (default behavior).
            }

            # If we already fetched at least one page, request the next page.
            if page_token:
                params["pageToken"] = page_token

            # Call the helper function defined earlier to hit the API.
            data = yt_get("search", params)

            # The "items" list contains individual video search results.
            items = data.get("items", [])

            # If there are no more items, stop searching for this query.
            if not items:
                break

            # Extract useful metadata from each returned video.
            for it in items:
                vid = it.get("id", {}).get("videoId")
                if not vid:
                    continue  # Skip if for some reason there is no videoId.

                snip = it.get("snippet", {})

                # Append one row of video metadata to our list.
                video_hits.append({
                    "query": q,                           # Which keyword this video came from.
                    "video_id": vid,
                    "publishedAt": snip.get("publishedAt"),
                    "title": snip.get("title"),
                    "channelId": snip.get("channelId"),
                    "channelTitle": snip.get("channelTitle"),
                })

                collected += 1      # Update per-query counter.
                pbar.update(1)      # Update progress bar.

                # Stop if we have reached the target number of videos for this query.
                if collected >= TARGET_VIDEOS_PER_QUERY:
                    break

            # Prepare for the next page of results, if available.
            page_token = data.get("nextPageToken")

            # If the API did not return a nextPageToken, there are no more pages.
            if not page_token:
                break

# Convert the collected list of videos into a DataFrame for later steps.
videos_df = pd.DataFrame(video_hits)
videos_df.head()


Searching videos for query: ai companion


Query = 'ai companion': 100%|██████████| 50/50 [00:00<00:00, 124.46it/s]



Searching videos for query: ai friendship


Query = 'ai friendship': 100%|██████████| 50/50 [00:00<00:00, 95.05it/s]



Searching videos for query: ai relationship


Query = 'ai relationship': 100%|██████████| 50/50 [00:00<00:00, 152.45it/s]



Searching videos for query: ai romantic relationship


Query = 'ai romantic relationship': 100%|██████████| 50/50 [00:00<00:00, 99.22it/s]



Searching videos for query: ai boyfriend


Query = 'ai boyfriend': 100%|██████████| 50/50 [00:00<00:00, 109.41it/s]



Searching videos for query: ai girlfriend


Query = 'ai girlfriend': 100%|██████████| 50/50 [00:00<00:00, 98.19it/s]


index,query,video_id,publishedAt,title,channelId,channelTitle
0,ai companion,-w4JrIxFZRA,2025-01-17T12:00:01Z,Can AI Companions Help Heal Loneliness? | Euge...,UCAuUUnT6oDeKwE6v1NGQxug,TED
1,ai companion,QGLGq8WIMzM,2022-04-15T10:45:49Z,The Rise of A.I. Companions [Documentary],UC4QZ_LsYcvcq7qOsOhpAX4A,ColdFusion
2,ai companion,gDfhxDHh2XY,2025-05-09T18:20:41Z,How to Turn ChatGPT into a True AI Companion,UCchPKgIzyzUjpAalcAmJTRw,Greener Thinking
3,ai companion,qL7kr1ckgDE,2024-10-08T17:21:43Z,HP AI Companion,UCqcEzOKaA4nc0gISklN0Ryw,HP
4,ai companion,_d08BZmdZu8,2025-05-04T11:14:42Z,Why people are falling in love with A.I. compa...,UC0L1suV8pVgO4pCAIBNGx5w,60 Minutes Australia


In [None]:
len(video_hits)

300
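
The count above is the number of search hits, not the number of distinct videos: a video returned for several queries appears once per query. The comment-fetching progress bar later in this notebook counts 234 videos, which is consistent with some overlap between queries. The sketch below illustrates how such overlap could be tracked, using made-up video IDs; the variable names are illustrative only.

```python
from collections import defaultdict

# Hypothetical (query, video_id) search hits; the IDs are placeholders.
hits = [
    ("ai companion", "vid_A"),
    ("ai friendship", "vid_A"),
    ("ai friendship", "vid_B"),
    ("ai girlfriend", "vid_C"),
    ("ai boyfriend", "vid_C"),
    ("ai girlfriend", "vid_C"),
]

# Map each video to the distinct queries that returned it, in first-seen order.
queries_by_video = defaultdict(list)
for query, video_id in hits:
    if query not in queries_by_video[video_id]:
        queries_by_video[video_id].append(query)

n_hits = len(hits)
n_unique = len(queries_by_video)
multi_query = {v: qs for v, qs in queries_by_video.items() if len(qs) > 1}

print(f"{n_hits} hits, {n_unique} unique videos")
for video_id, qs in multi_query.items():
    print(f"{video_id} was returned by: {', '.join(qs)}")
```

Keeping the list of queries per video makes it possible to report sentiment by keyword later without double-counting comments.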

#### 3.1.4 Enrich videos: titles, descriptions, and stats

In [None]:
# After collecting a list of video IDs from the search step, we now retrieve detailed metadata for each video

# A helper generator to split a long list into smaller batches.
def chunked(seq, size):
    for i in range(0, len(seq), size):
        yield seq[i:i+size]


# Extract the unique list of video IDs collected earlier.
video_ids = videos_df["video_id"].dropna().unique().tolist()


video_details = []   # This list will store metadata for each video.

# Loop over batches of up to 50 IDs at a time.
for batch in tqdm(list(chunked(video_ids, 50)), desc="Fetching video details"):

    # Parameters for the "videos" endpoint.
    params = {
        "part": "snippet,statistics",  # Retrieve both descriptive info and numeric stats.
        "id": ",".join(batch),         # Comma-separated list of video IDs.
        "maxResults": 50,              # API limit.
    }

    # Call our previously defined API helper.
    data = yt_get("videos", params)

    # Iterate through the returned items (each corresponds to a video).
    for it in data.get("items", []):
        snip = it.get("snippet", {})       # General metadata.
        stats = it.get("statistics", {})   # Numeric engagement statistics.

        # Append structured metadata into our list.
        video_details.append({
            "video_id": it.get("id"),
            "title": snip.get("title"),
            "description": snip.get("description"),
            "publishedAt": snip.get("publishedAt"),
            "channelTitle": snip.get("channelTitle"),

            # Numeric stats arrive as strings; convert to integers.
            "viewCount": int(stats.get("viewCount", 0) or 0),
            "likeCount": int(stats.get("likeCount", 0) or 0),
            "commentCount": int(stats.get("commentCount", 0) or 0),
        })


# Convert the collected results into a DataFrame for easier analysis and merging later.
video_details_df = pd.DataFrame(video_details)
video_details_df.head()

Fetching video details: 100%|██████████| 5/5 [00:00<00:00,  8.69it/s]


index,video_id,title,description,publishedAt,channelTitle,viewCount,likeCount,commentCount
0,-w4JrIxFZRA,Can AI Companions Help Heal Loneliness? | Euge...,AI companions could either be the cure to our ...,2025-01-17T12:00:01Z,TED,83236,2081,257
1,QGLGq8WIMzM,The Rise of A.I. Companions [Documentary],Become smarter in 5 minutes by signing up for ...,2022-04-15T10:45:49Z,ColdFusion,1085103,37971,4204
2,gDfhxDHh2XY,How to Turn ChatGPT into a True AI Companion,What if ChatGPT could be more than just a chat...,2025-05-09T18:20:41Z,Greener Thinking,14061,729,181
3,qL7kr1ckgDE,HP AI Companion,"From summarizing 10,000 words in seconds to bo...",2024-10-08T17:21:43Z,HP,83511,729,162
4,_d08BZmdZu8,Why people are falling in love with A.I. compa...,The rise of artificial intelligence companions...,2025-05-04T11:14:42Z,60 Minutes Australia,1446623,21993,7678


#### 3.1.5 Fetch top‑level comments for each video (with pagination)

In [None]:
# For every video in our dataset, call the YouTube "commentThreads" endpoint to collect top-level comments.

# This list will store all comments from all videos.
all_comments = []

# Iterate over every video_id that we previously collected and enriched in video_details_df.
for vid in tqdm(video_details_df["video_id"].tolist(), desc="Fetching comments"):
    page_token = None   # Used to move through pages of comments.
    fetched = 0         # Counter to track how many comments we've collected for this video.

    try:
        # Keep requesting pages of comments until there are no more or we hit our per-video cap (300 comments).
        while True:
            # Parameters for the "commentThreads" endpoint.
            params = {
                "part": "snippet",     # 'snippet' includes the top-level comment text and metadata.
                "videoId": vid,        # Current video we are collecting comments from.
                "maxResults": 100,     # API maximum per page for commentThreads.
                "order": "relevance",  # Sort by 'relevance'; use 'time' for chronological order.
                # "textFormat" is omitted (the API default is "html"); we read the raw "textOriginal" field below.
            }

            # If we already have a page_token from a previous request, include it to fetch the next page of comments.
            if page_token:
                params["pageToken"] = page_token

            # Call our API helper to hit the "commentThreads" endpoint.
            data = yt_get("commentThreads", params)
            items = data.get("items", [])

            # Loop over each returned comment thread item.
            for it in items:
                # Each item contains a "topLevelComment" with its own snippet.
                top = it.get("snippet", {}).get("topLevelComment", {})
                s = top.get("snippet", {})

                # Append a clean, structured record for each top-level comment.
                all_comments.append({
                    "video_id": vid,
                    "comment_id": top.get("id"),
                    "author": s.get("authorDisplayName"),
                    "publishedAt": s.get("publishedAt"),
                    "likeCount": s.get("likeCount", 0),
                    "text": s.get("textOriginal", ""),  # raw comment text
                })
                fetched += 1   # Increment per-video comment counter.

            # Prepare the next page of results, if any.
            page_token = data.get("nextPageToken")

            # If there is no nextPageToken, then we have reached the last page of comments for this video.
            if not page_token:
                break  # no more pages

            # Safety cap: stop if we have already collected 300 comments
            if fetched >= 300:
                break

    # If the API returns an HTTP error (e.g., comments disabled, restricted, or quota/permission issues), skip this video and continue with the rest.
    except requests.HTTPError as e:
        print(f"Skipping {vid} due to HTTP error: {e}")
        continue

# Convert the accumulated list of comment dictionaries into a DataFrame.
comments_df = pd.DataFrame(all_comments)
comments_df.head(3)

Fetching comments:   5%|▌         | 12/234 [00:06<01:17,  2.86it/s]

Skipping O14XEGt-XsY due to HTTP error: 403 Client Error: Forbidden for url: https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId=O14XEGt-XsY&maxResults=100&order=relevance&key=[REDACTED]


Fetching comments:  12%|█▏        | 27/234 [00:12<01:04,  3.22it/s]

Skipping JgLVX4z19wg due to HTTP error: 403 Client Error: Forbidden for url: https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId=JgLVX4z19wg&maxResults=100&order=relevance&key=[REDACTED]


Fetching comments:  33%|███▎      | 77/234 [00:35<00:42,  3.69it/s]

Skipping hX6dr5FTRuE due to HTTP error: 403 Client Error: Forbidden for url: https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId=hX6dr5FTRuE&maxResults=100&order=relevance&key=[REDACTED]


Fetching comments:  47%|████▋     | 110/234 [00:52<01:04,  1.93it/s]

Skipping A5crWeLKDwc due to HTTP error: 403 Client Error: Forbidden for url: https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId=A5crWeLKDwc&maxResults=100&order=relevance&key=[REDACTED]


Fetching comments:  48%|████▊     | 112/234 [00:53<01:01,  1.99it/s]

Skipping 7r-T_nQCIkw due to HTTP error: 403 Client Error: Forbidden for url: https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId=7r-T_nQCIkw&maxResults=100&order=relevance&key=[REDACTED]


Fetching comments:  51%|█████▏    | 120/234 [00:57<00:52,  2.17it/s]

Skipping Yn88JO8AVWA due to HTTP error: 403 Client Error: Forbidden for url: https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId=Yn88JO8AVWA&maxResults=100&order=relevance&key=[REDACTED]


Fetching comments:  89%|████████▉ | 209/234 [01:50<00:24,  1.01it/s]

Skipping EATERfrTla0 due to HTTP error: 403 Client Error: Forbidden for url: https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId=EATERfrTla0&maxResults=100&order=relevance&key=[REDACTED]


Fetching comments: 100%|██████████| 234/234 [02:08<00:00,  1.82it/s]


index,video_id,comment_id,author,publishedAt,likeCount,text
0,-w4JrIxFZRA,UgzFhN5WMrHYDebVO7J4AaABAg,@pavllann,2025-01-17T16:27:03Z,79,I watched an episode of Black Mirror that star...
1,-w4JrIxFZRA,Ugw3Koj5EM2TCWdwa_R4AaABAg,@akhmadwahid3119,2025-01-20T08:40:22Z,8,"I was born fatherless, and I really had a stro..."
2,-w4JrIxFZRA,UgxCOdiSr9HUrVgdkcx4AaABAg,@HazyDaisyWasHere,2025-03-14T10:01:36Z,22,I think people need to stop worrying about wha...


In [None]:
len(comments_df)

34691
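
The pagination logic in the comment-fetching cell can be exercised without network access. The sketch below replaces the API with a fake page source (`fetch_page` is a hypothetical stand-in, and its integer tokens are an assumption made for illustration) and applies the same stopping rule: stop when there is no next page or once the cap has been reached. Because the cap is only checked after a full page has been added, it is exact only when it is a multiple of the page size, as it is with 300 and 100.

```python
def fetch_page(total_available, page_token, page_size=100):
    """Fake API call: return (items, next_token) for a video with
    `total_available` comments. Tokens are plain integer offsets."""
    start = page_token or 0
    end = min(start + page_size, total_available)
    items = list(range(start, end))
    next_token = end if end < total_available else None
    return items, next_token


def collect(total_available, cap=300):
    """Collect items page by page until the pages run out or the cap is hit."""
    fetched, token = [], None
    while True:
        items, token = fetch_page(total_available, token)
        fetched.extend(items)
        if not token or len(fetched) >= cap:
            break
    return fetched


# Number of comments collected for videos with different comment totals.
counts = {n: len(collect(n)) for n in (0, 42, 250, 300, 1000)}
print(counts)
```

Videos with fewer comments than the cap contribute everything they have, so heavily commented videos are capped while small ones are not.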

#### 3.1.6 Clean and filter comments (Data Cleaning)

In [None]:
# Removing unusable rows, obvious noise, and non-English comments.

# Replace None/NaN with empty strings, then strip leading/trailing whitespace.
comments_df["text"] = comments_df["text"].fillna("").astype(str).str.strip()

# Drop rows where text is now empty (no usable content).
comments_df = comments_df[comments_df["text"] != ""].copy()

# Remove URLs and collapse extra whitespace.
def remove_noise(text: str) -> str:
    # Remove http/https URLs
    text = re.sub(r"http\S+", " ", text)
    text = re.sub(r"www\.\S+", " ", text)
    # Collapse multiple spaces/newlines into a single space
    text = re.sub(r"\s+", " ", text)
    return text.strip()

comments_df["text"] = comments_df["text"].apply(remove_noise)


# Keep only English-language comments
def is_english(text: str) -> bool:
    """
    Detect the language of a comment.
    Returns True if detected as English, otherwise False.
    If detection fails, we treat it as non-English to be conservative.
    """
    try:
        return detect(text) == "en"
    except LangDetectException:
        return False

comments_df["is_english"] = comments_df["text"].apply(is_english)
comments_df = comments_df[comments_df["is_english"]].copy()


# Remove duplicate comments (same comment_id)
if "comment_id" in comments_df.columns:
    comments_df = comments_df.drop_duplicates(subset=["comment_id"]).copy()
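
To make the effect of the cleaning step concrete, the sketch below applies the same `remove_noise` function to a few made-up comments. The sample strings are illustrative and do not come from the dataset.

```python
import re


def remove_noise(text: str) -> str:
    """Strip URLs and collapse runs of whitespace, as in the cleaning cell."""
    text = re.sub(r"http\S+", " ", text)
    text = re.sub(r"www\.\S+", " ", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


# Made-up comments for illustration.
samples = [
    "Great video!\n\nMore at https://example.com/page?x=1   thanks",
    "see www.example.org for details",
    "https://example.com/only-a-link",
]

cleaned = [remove_noise(s) for s in samples]
for before, after in zip(samples, cleaned):
    print(repr(before), "->", repr(after))
```

A comment that consists only of a link becomes an empty string after cleaning. In the cell above, such rows appear to be removed by the language filter, since `is_english` returns `False` when detection fails. Separately, the `langdetect` documentation notes that detection is non-deterministic by default and suggests setting `DetectorFactory.seed` before the first call if reproducible results are needed.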

### 3.2 Sentiment Analysis Using VADER

#### 3.2.1 Score text fields

In [None]:
# Use the VADER sentiment analyzer (sia) to compute a 'compound' sentiment score for:
#       - video titles
#       - video descriptions
#       - comment text (if any)

# Helper function: takes a text value and returns ONLY the 'compound' score.
# Non-string values (None, NaN) are scored as an empty string.
def compound_score(text):
    return sia.polarity_scores(text if isinstance(text, str) else "")["compound"]


# Score video titles and descriptions
video_details_df["title_compound"] = (
    video_details_df["title"]
    .fillna("")
    .apply(compound_score)      # Apply VADER compound scoring
)

video_details_df["description_compound"] = (
    video_details_df["description"]
    .fillna("")
    .apply(compound_score)
)


# Score comment text (if comments exist)
if not comments_df.empty:
    comments_df["compound"] = (
        comments_df["text"]
        .fillna("")
        .apply(compound_score)
    )


#### 3.2.2 Aggregate to video level

In [None]:
# After computing VADER sentiment for each individual comment, this step aggregates (summarizes) sentiment by video.

# Thresholds used to convert the VADER compound score into labels (scores exactly at ±0.05 are labeled neutral below).
POS, NEG = 0.05, -0.05

# If we *have* comments, aggregate them
if not comments_df.empty:

    # Assign a discrete sentiment label based on the compound value.
    comments_df["sentiment_label"] = comments_df["compound"].apply(
        lambda c: "pos" if c > POS else ("neg" if c < NEG else "neu")
    )

    # Group by video_id and compute per-video summary statistics.
    agg = (
        comments_df.groupby("video_id").agg(
            n_comments=("comment_id", "count"),           # number of comments
            mean_compound=("compound", "mean"),           # average sentiment
            pct_pos=("sentiment_label", lambda s: (s == "pos").mean()),
            pct_neg=("sentiment_label", lambda s: (s == "neg").mean()),
            pct_neu=("sentiment_label", lambda s: (s == "neu").mean()),
        ).reset_index()
    )

# If we have *no* comments, create an empty placeholder so that the merge with video_details_df still works.
else:
    agg = pd.DataFrame(
        columns=["video_id", "n_comments", "mean_compound", "pct_pos", "pct_neg", "pct_neu"]
    )

# Merge video-level sentiment with earlier video metadata
summary = (
    video_details_df.merge(agg, on="video_id", how="left")
    .assign(
        # Round title/description compound scores for readability
        title_compound=lambda d: d["title_compound"].round(3),
        description_compound=lambda d: d["description_compound"].round(3),

        # Round sentiment metrics to 3 decimals (or convert to %)
        mean_compound=lambda d: d["mean_compound"].round(3),
        pct_pos=lambda d: (d["pct_pos"] * 100).round(1),
        pct_neg=lambda d: (d["pct_neg"] * 100).round(1),
        pct_neu=lambda d: (d["pct_neu"] * 100).round(1),
    )
)


# Select and order columns for display
summary_cols = [
    "video_id", "channelTitle", "publishedAt", "viewCount", "likeCount", "commentCount",
    "title_compound", "description_compound",
    "n_comments", "mean_compound", "pct_pos", "pct_neg", "pct_neu", "title"
]


# Show the top 10 videos with the highest mean sentiment
summary[summary_cols].sort_values(by=["mean_compound"], ascending=False).head(10)

index,video_id,channelTitle,publishedAt,viewCount,likeCount,commentCount,title_compound,description_compound,n_comments,mean_compound,pct_pos,pct_neg,pct_neu,title
40,su2yhJUC1zg,Reality Not Found,2024-09-01T13:30:25Z,1307,24,3,0.572,0.0,1.0,0.973,100.0,0.0,0.0,What If We Had the Perfect AI Companion?
182,4UNNwbOlh4g,Kindroidpolly,2025-05-11T17:40:08Z,1942,25,2,0.627,0.501,1.0,0.893,100.0,0.0,0.0,My AI boyfriend Is so handsome
30,FRKE1W92Un0,Personal Human AI,2025-01-22T19:45:53Z,29103,3654,1,0.188,0.705,1.0,0.804,100.0,0.0,0.0,Not Alone. Not Anymore. | Personal Human AI - ...
66,WRL98QJi7So,TEDx Talks,2023-05-22T16:12:15Z,3802,65,4,0.477,0.992,3.0,0.796,100.0,0.0,0.0,Friends with AI? It's complicated. | Marisa T...
133,zJE4XlcKSSA,Irish Independent,2023-08-15T05:00:28Z,6421,10,3,0.0,0.0,3.0,0.551,100.0,0.0,0.0,Can AI relationships fill a void? @MalieCoyne ...
26,I4NObpfGJWc,Future Ai Robots,2025-07-01T15:08:03Z,28360,503,4,0.459,0.963,1.0,0.511,100.0,0.0,0.0,Smiling Together: Human Meets Her AI Companion...
5,uujDFUZpVXE,TEDx Talks,2025-01-27T16:47:38Z,4251,114,16,0.0,0.987,12.0,0.504,75.0,16.7,8.3,How AI Companions Will Change Your Life | Rand...
76,axQKOlqF2JQ,The Podcast Today,2024-10-05T08:14:19Z,2465,46,1,0.494,0.0,1.0,0.42,100.0,0.0,0.0,ChatGPT Your AI Friend Who Knows Everything
140,48n12cN8M3E,Bankless,2025-04-30T19:00:09Z,7326,102,12,0.557,0.0,8.0,0.405,75.0,0.0,25.0,People are now falling in love with AI #ai #ch...
38,Fj9AEvRD82A,Obscure Nerd VR,2025-11-07T01:15:03Z,1041,75,13,0.0,0.871,8.0,0.4,75.0,0.0,25.0,This AI Companion App has WILD Animated Avatar...
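
The labeling and aggregation rules can be traced by hand on a small example. The sketch below uses made-up compound scores for a single hypothetical video and mirrors the strict inequalities used in the aggregation cell. The convention commonly cited for VADER uses `>= 0.05` and `<= -0.05` instead, so scores exactly at the thresholds are the only ones treated differently.

```python
from collections import Counter

POS, NEG = 0.05, -0.05


def label(compound: float) -> str:
    """Labeling rule from the aggregation cell (strict inequalities)."""
    if compound > POS:
        return "pos"
    if compound < NEG:
        return "neg"
    return "neu"


# Made-up compound scores for one hypothetical video.
scores = [0.62, 0.05, 0.0, -0.05, -0.48, 0.31, -0.01, 0.88]

labels = [label(s) for s in scores]
counts = Counter(labels)

# Share of each label as a percentage, plus the mean compound score.
shares = {k: round(100 * counts[k] / len(scores), 1) for k in ("pos", "neg", "neu")}
mean_compound = sum(scores) / len(scores)

print(labels)
print(shares)
print(f"mean compound: {mean_compound:.3f}")
```

Several rows in the table above have `n_comments` equal to 1, so their mean and percentages reflect a single comment. Ranking by mean sentiment may be more informative after applying a minimum comment count.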


#### 3.2.3 Plotly Visualizations

In [None]:
# Set a renderer suitable for Colab. Alternatives: 'notebook_connected', 'svg', 'png'
pio.renderers.default = "colab"

# --- 1) Bar charts: mean sentiment for the top 10 videos by number of fetched comments, then by views ---

if "summary" in globals() and not summary.empty and summary["n_comments"].notna().any():

    top_comments = summary.sort_values("n_comments", ascending=False).head(10).copy()

    # Truncate long titles
    top_comments["title_short"] = (
        top_comments["title"].str.slice(0, 60)
        + top_comments["title"].apply(lambda t: "..." if len(str(t)) > 60 else "")
    )

    fig_top_comments_sent = px.bar(
        top_comments,
        x="title_short",
        y="mean_compound",
        hover_data=["title", "channelTitle", "viewCount", "likeCount", "n_comments"],
        title="Top 10 most commented videos (mean sentiment)",
        labels={
            "title_short": "Video title (truncated)",
            "mean_compound": "Mean compound sentiment"
        },
    )

    fig_top_comments_sent.update_layout(xaxis_tickangle=30)
    fig_top_comments_sent.show()

    fig_top_comments_sent.write_html(
        "plot_top10_most_commented_sentiment.html",
        include_plotlyjs="cdn",
        full_html=True
    )

else:
    print("No comment data available to plot top-commented sentiment chart.")


if "summary" in globals() and not summary.empty and summary["viewCount"].notna().any():

    top_views = summary.sort_values("viewCount", ascending=False).head(10).copy()

    # Title truncation
    top_views["title_short"] = (
        top_views["title"].str.slice(0, 60)
        + top_views["title"].apply(lambda t: "..." if len(str(t)) > 60 else "")
    )

    fig_top_views_sent = px.bar(
        top_views,
        x="title_short",
        y="mean_compound",
        hover_data=["title", "channelTitle", "viewCount", "likeCount", "n_comments"],
        title="Top 10 most viewed videos (mean sentiment)",
        labels={
            "title_short": "Video title (truncated)",
            "mean_compound": "Mean compound sentiment"
        },
    )

    fig_top_views_sent.update_layout(xaxis_tickangle=30)
    fig_top_views_sent.show()

    fig_top_views_sent.write_html(
        "plot_top10_most_viewed_sentiment.html",
        include_plotlyjs="cdn",
        full_html=True
    )

else:
    print("No viewCount data available to plot top-viewed sentiment chart.")

In [None]:
# --- 2) Scatter: Relationship between viewCount and mean comment sentiment ---
if "summary" in globals() and not summary.empty and summary["mean_compound"].notna().any():

    # Drop videos with missing mean_compound to avoid errors.
    scatter_df = summary.dropna(subset=["mean_compound"]).copy()

    # Use log scale for views if view counts vary widely.
    fig_scatter = px.scatter(
        scatter_df,
        x="viewCount",
        y="mean_compound",
        hover_name="title",
        hover_data=["channelTitle", "likeCount", "n_comments"],
        title="View count vs. mean comment sentiment",
        labels={"viewCount": "Views", "mean_compound": "Mean compound sentiment"},
    )
    fig_scatter.update_xaxes(type="log")

    fig_scatter.show()

    # Save the scatter plot.
    fig_scatter.write_html("plot_views_vs_sentiment.html", include_plotlyjs="cdn", full_html=True)

else:
    print("No sentiment summary to plot. Ensure the aggregation step ran successfully.")
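
A visual reading of the scatter plot can be complemented by a rank correlation. The sketch below computes Spearman's rho in plain Python on made-up numbers; the helper names are hypothetical and the values are not taken from the dataset. Because Spearman's rho depends only on ranks, it is unaffected by the log scaling applied to the x-axis.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up view counts and mean compound scores for six hypothetical videos.
views = [1_300, 4_200, 29_000, 83_000, 1_085_000, 1_446_000]
sentiment = [0.40, 0.05, 0.22, 0.10, 0.18, 0.12]

rho = spearman(views, sentiment)
print(f"Spearman rho = {rho:.3f}")
```

With the notebook's data, the two sequences would be the `viewCount` and `mean_compound` columns of `scatter_df`. A value near zero would be consistent with the absence of a monotonic relationship.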

### 3.3 Save datasets

In [None]:
# Export tidy CSVs for later analysis or visualization
videos_df.to_csv("videos_search_hits.csv", index=False)
video_details_df.to_csv("video_details.csv", index=False)
comments_df.to_csv("video_comments.csv", index=False)
summary.to_csv("video_sentiment_summary.csv", index=False)

print("Saved: videos_search_hits.csv, video_details.csv, video_comments.csv, video_sentiment_summary.csv")


Saved: videos_search_hits.csv, video_details.csv, video_comments.csv, video_sentiment_summary.csv


## 4. Discussion
The bar plots show that the most commented and most viewed videos attract discussion that is highly engaged but emotionally moderate. Among the ten videos with the largest number of comments, mean sentiment scores cluster around neutral or mildly positive values. In this sample, commenters' attitudes toward AI companions are neutral to slightly positive, and extensive comment activity does not necessarily reflect emotional endorsement. Judging by their titles, these videos often involve debates about ethical risks, societal consequences, or the psychological implications of AI companionship. Such topics invite viewers to share differing perspectives, which produces high engagement without a strongly positive or negative average sentiment. A similar pattern appears in the ten most viewed videos: despite substantial viewership, this content generates neither moral panic nor enthusiasm. It is worth noting that, judging by their titles, some of the ten most viewed videos appear to have no direct connection to the AI companions explored in this project. Instead, they seem to be humorous AI-generated short videos on the theme of romantic relationships. This imprecision is an inherent drawback of collecting data through keyword searches against the API.


The scatterplot of view count versus sentiment reinforces this pattern: mean sentiment remains close to neutral across the range of view counts. There is no clear relationship between public attention and emotional valence, as videos with millions of views display sentiment scores similar to those with only a few thousand views. Most data points have a mean compound score above zero, and only a few fall below the zero line. This suggests that commenters maintain a neutral to slightly positive attitude toward AI companions, and that this sentiment does not vary with the popularity of a video.

The overall neutral-to-positive public sentiment suggests that users do not currently perceive AI companions as an urgent societal threat, but the high engagement surrounding ethical concerns indicates that the public is actively negotiating the boundaries of acceptable human–AI intimacy. Policymakers should treat this as an early window of opportunity: regulations can be introduced before strong polarization or backlash takes shape. Given that the most discussed videos involve psychological risks, dependency issues, and the potential manipulation of vulnerable users, regulatory frameworks should prioritize transparency, data protection, and safeguards against emotional exploitation.

## References
Guemghar, I., et al. (2022). Social robot interventions in mental health care and their outcomes.

Shank, D. B., Koike, M., & Loughnan, S. (2025). Artificial intimacy: Ethical issues of AI romance. Trends in Cognitive Sciences, 29(6), 499–501.

Yen, H. Y., et al. (2024). The effect of social robots on depression and loneliness: A meta-analysis. Psychiatry Research.