# YouTube Channel Analysis PJT

## In this project, I analyze the official YouTube channel of the K-pop group LE SSERAFIM.

# 1. Introduction
## 1.1 Background
LE SSERAFIM is a popular K-pop girl group that debuted in May 2022. The group is known for producing a lot of entertaining content, especially on its official YouTube channel. YouTube content, including music videos, now plays a big part in establishing an artist, and K-pop artists are generally front-runners in using social content effectively to boost their popularity. The best-known example is BTS, who built a strong bond with their fans ("ARMY") through clever use of social media content. LE SSERAFIM, from Source Music (a subsidiary of HYBE), is following in the footsteps of BTS, and I thought it would be interesting to gain some insight into their YouTube content performance.

I want to analyze the overall performance of their content and learn which types of content resonate with their fans. Finally, I want to suggest some content ideas that might do well given past performance. The scope of the project is limited to the LE SSERAFIM channel; it does not compare against other K-pop groups' channels or other comparable YouTube channels.

## 1.2 Objectives
In this project, I will focus on the following:

- Get familiar with the YouTube API and use it to gather YouTube channel data
- Analyze video metrics to find out what types of content are popular among fans:
    - What types of content get the most views?
    - What types of content get the most engagement from fans?
    - Which content didn't perform well, and why?
    - How does video performance change over time?
- Use NLP techniques to gain insight into fan reactions:
    - Explore up to 100 top-level comments per video to gauge fan reactions
    - Is there a meaningful difference in comment reactions across content types?
    - What kinds of questions or requests appear in the comment sections?
    - Do the comment sections suggest any content ideas?
- Come up with content ideas based on the insights gained from the analysis above
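
To make the engagement question concrete, here is a minimal sketch of one possible metric. The definition (likes plus comments, divided by views) is an assumption for illustration rather than an official YouTube metric, and the rows are made-up toy data, not values from the channel.

```python
import pandas as pd

# Toy data standing in for the video statistics; the numbers are made up.
videos = pd.DataFrame({
    "title": ["MV", "Vlog", "Dance practice"],
    "viewCount": [1_000_000, 200_000, 500_000],
    "likeCount": [50_000, 20_000, 30_000],
    "commentCount": [5_000, 4_000, 1_000],
})

# Assumed definition: interactions (likes + comments) per view.
videos["engagement_rate"] = (
    videos["likeCount"] + videos["commentCount"]
) / videos["viewCount"]

# Rank videos from most to least engaging under this definition.
ranked = videos.sort_values("engagement_rate", ascending=False)
print(ranked[["title", "engagement_rate"]])
```

A ratio like this lets a small vlog outrank a music video with far more raw views, which is the kind of contrast the engagement question is after.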

## 1.3 Project process
1. Get the channel's video data and comment data from the LE SSERAFIM channel using the YouTube Data API v3.
2. Preprocess the data and derive new features for better analysis.
3. Perform exploratory data analysis.
4. Draw conclusions.

## 1.4 Dataset
### Data Source
For this project, I collected the dataset myself using the YouTube Data API v3.

### Data Limitation
The data is a real-world dataset suitable for research purposes. However, given the API quota limit of 10,000 units per day, I analyzed only the LE SSERAFIM channel and not other comparable channels. I will rely on my domain knowledge to compare the LE SSERAFIM YouTube channel with other comparable channels.
Comparing the performance of the LE SSERAFIM channel with other K-pop groups' channels would be interesting and could be the next step of this project.

Also, comments are limited to 100 top-level comments per video due to the same API quota limit. The video metrics are cumulative totals (total views and total engagement counts), which makes it hard to compare videos on an equal footing, because older videos have had more time to accumulate views. It would be nice to have 7-day or 14-day video metrics, but the API does not offer such an option. Instead, I will use a basic discount method to account for differences in upload time.
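
As a rough illustration of such a discount, one simple option is to divide total views by the number of days a video has been online. This is only a sketch: the `collected_at` snapshot time and the toy rows are hypothetical, and views per day is an assumed stand-in, not necessarily the exact discount applied later in the analysis.

```python
import pandas as pd

# Toy rows; in the project these would come from the video statistics table.
videos = pd.DataFrame({
    "video_id": ["a", "b"],
    "publishedAt": ["2022-05-02T09:00:00Z", "2023-05-01T09:00:00Z"],
    "viewCount": [120_000_000, 30_000_000],
})

# Hypothetical snapshot time: when the statistics were downloaded.
collected_at = pd.Timestamp("2023-06-01T00:00:00Z")

# Age of each video in days at collection time.
published = pd.to_datetime(videos["publishedAt"], utc=True)
days_online = (collected_at - published).dt.total_seconds() / 86_400

# Clip to at least one day so very recent uploads do not blow up the ratio.
videos["views_per_day"] = videos["viewCount"] / days_online.clip(lower=1)
print(videos[["video_id", "views_per_day"]])
```

Views are usually front-loaded in the first days after upload, so this normalization still favors recent videos somewhat; it is a crude correction, not a substitute for true 7-day or 14-day metrics.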

### Ethics of data source
According to the YouTube API guide, the YouTube API is free to use and open to anyone who has created an API key. As long as the API user stays within the YouTube API quota limit, there is no issue in using the YouTube API to collect data. Also, the data itself is public and can be viewed on the YouTube channel, so the data source raises no privacy issue.


In [123]:
# Import necessary libraries
from googleapiclient.discovery import build
import pandas as pd
import seaborn as sns
from IPython.display import JSON
import yt_api_key as api

In [124]:
# Get the API_KEY and build API service
channel_id = 'UCs-QBT4qkj_YiQw1ZntDO3g'

youtube = build('youtube', 'v3', developerKey=api.API_KEY)

In [125]:
# Define a function to get the basic channel stats and the uploads playlist id
def get_channel_stats(youtube, channel_id):
    
    request = youtube.channels().list(
        part="snippet,contentDetails,statistics",
        id=channel_id)
    response = request.execute()
    
    all_data = []
    
    for item in response['items']:
        data = {'channel_name': item['snippet']['title'],
                'subscribers': item['statistics']['subscriberCount'],
                'total_views': item['statistics']['viewCount'],
                'video_count': item['statistics']['videoCount'],
                'playlist_id': item['contentDetails']['relatedPlaylists']['uploads']
               }
        
        all_data.append(data)
    
    return pd.DataFrame(all_data)


# Get the video ids from the LESSERAFIM Channel
playlist_id = 'UUs-QBT4qkj_YiQw1ZntDO3g'

def get_video_ids(youtube, playlist_id):
    
    video_ids = []
    
    request = youtube.playlistItems().list(
        part="snippet,contentDetails",
        playlistId= playlist_id,
        maxResults = 50
    )

    response = request.execute()
    
    for item in response['items']:
        video_ids.append(item['contentDetails']['videoId'])
        
    next_page_token = response.get('nextPageToken')
    
    while next_page_token is not None:
        request = youtube.playlistItems().list(
            part="snippet,contentDetails",
            playlistId= playlist_id,
            maxResults = 50,
            pageToken = next_page_token            
            )

        response = request.execute()
        
        for item in response['items']:
            video_ids.append(item['contentDetails']['videoId'])

        next_page_token = response.get('nextPageToken')

    return video_ids


# Get the video stats

def get_video_stats(youtube, video_ids):

    all_video_stat = []
    
    for i in range(0, len(video_ids), 50):
        request = youtube.videos().list(
                part="snippet,contentDetails,statistics",
                id=','.join(video_ids[i:i+50])
            )
        response = request.execute()

        for video in response['items']:
            stats = {'snippet': ['publishedAt', 'title', 'description', 'tags'],
                     'contentDetails': ['duration'],
                     'statistics': ['viewCount','likeCount','favoriteCount','commentCount']}

            video_stat ={}
            video_stat['video_id'] = video['id']

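            # Use distinct loop names so the batch index `i` above is not shadowed
            for part in stats.keys():
                for field in stats[part]:
                    # Fields such as tags or likeCount may be absent; store None then
                    video_stat[field] = video.get(part, {}).get(field)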

            all_video_stat.append(video_stat)
        
    return pd.DataFrame(all_video_stat)


# Get up to 100 top-level comments from each video (first page only;
# the API orders threads by time unless order='relevance' is passed)
def get_comments_in_videos(youtube, video_ids):
    
    all_comments = []
    
    for video_id in video_ids:
        try:
            request = youtube.commentThreads().list(
                part="snippet,replies",
                videoId=video_id,
                maxResults = 100
            )

            response = request.execute()

            for comment in response['items']:
                comments = {}
                comments['video_id'] = comment['snippet']['videoId']

                toplevel = {'snippet': ['authorDisplayName', 'textOriginal', 'likeCount', 'publishedAt']}

                for i in toplevel['snippet']:
                    comments[i] = comment['snippet']['topLevelComment']['snippet'][i]

                comments['reply_count'] = comment['snippet']['totalReplyCount']

                all_comments.append(comments)
                
        except Exception as error:
            # Typically raised when comments are disabled for the video
            print(f'No comment available for {video_id}: {error}')
            
    return pd.DataFrame(all_comments)
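
The pagination in `get_video_ids` repeats the request logic before and inside the loop. As an alternative sketch, the Google API Python client provides `list_next`, which builds the request for the following page and returns `None` when there are no more pages. The function name `collect_video_ids` is hypothetical, and this version assumes only the `contentDetails` part is needed.

```python
def collect_video_ids(youtube, playlist_id):
    """Return every video ID in a playlist, following pagination."""
    video_ids = []
    playlist_items = youtube.playlistItems()

    request = playlist_items.list(
        part="contentDetails",
        playlistId=playlist_id,
        maxResults=50,
    )

    # list_next returns None once there is no further page.
    while request is not None:
        response = request.execute()
        video_ids.extend(
            item["contentDetails"]["videoId"]
            for item in response.get("items", [])
        )
        request = playlist_items.list_next(request, response)

    return video_ids
```

Called as `collect_video_ids(youtube, playlist_id)`, it should return the same list as `get_video_ids` while keeping the page handling in one place.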

In [126]:
# Get the channel stats
channel_info = get_channel_stats(youtube, channel_id)

In [127]:
channel_info

| | channel_name | subscribers | total_views | video_count | playlist_id |
|---|---|---|---|---|---|
| 0 | LE SSERAFIM | 2620000 | 710646918 | 318 | UUs-QBT4qkj_YiQw1ZntDO3g |
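
One detail to keep in mind: the Data API returns the counts under `statistics` as strings, so they need to be converted before any arithmetic. The sketch below rebuilds the row above by hand (it does not call the API) and converts the count columns with `pd.to_numeric`; the `views_per_video` column is a hypothetical derived metric added only for illustration.

```python
import pandas as pd

# The channel row as the API would deliver it: counts are strings.
channel_info = pd.DataFrame([{
    "channel_name": "LE SSERAFIM",
    "subscribers": "2620000",
    "total_views": "710646918",
    "video_count": "318",
}])

# Convert the count columns from strings to numbers.
count_columns = ["subscribers", "total_views", "video_count"]
channel_info[count_columns] = channel_info[count_columns].apply(pd.to_numeric)

# Hypothetical derived metric: average views per uploaded video.
channel_info["views_per_video"] = (
    channel_info["total_views"] / channel_info["video_count"]
)
print(channel_info.dtypes)
```

The same conversion applies to `viewCount`, `likeCount`, and `commentCount` in the video statistics table.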


In [21]:
# Get the video ids for the Le sserafim channel
video_ids = get_video_ids(youtube, playlist_id)

In [111]:
# Create a dataframe with the video_ids
video_id_list = pd.DataFrame(video_ids, columns=['video_id'])

In [37]:
# Get the video stats for all the videos of Le sserafim channel
video_stat = get_video_stats(youtube, video_ids)
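
The `duration` field comes back as an ISO 8601 duration string such as `PT3M25S`, which is awkward to sort or plot. A minimal sketch of a converter to seconds follows; `duration_to_seconds` is a hypothetical helper name, and the pattern assumes durations use only days, hours, minutes, and seconds.

```python
import re

# Matches strings like PT45S, PT3M25S, PT1H2M3S, or P1DT1S.
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)

def duration_to_seconds(duration: str) -> int:
    """Convert an ISO 8601 duration string to a number of seconds."""
    match = _DURATION_RE.match(duration)
    if match is None:
        raise ValueError(f"Unrecognized duration: {duration!r}")
    # Components that are absent from the string count as zero.
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (
        parts["days"] * 86_400
        + parts["hours"] * 3_600
        + parts["minutes"] * 60
        + parts["seconds"]
    )

print(duration_to_seconds("PT3M25S"))  # 205
```

With the statistics loaded, something like `video_stat['duration'].apply(duration_to_seconds)` would add a numeric length column, for example to separate short clips from full episodes.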

In [None]:
# Get top 100 comments of each video
comments = get_comments_in_videos(youtube, video_ids)

In [115]:
# Get video id list into csv file
video_id_list.to_csv('LS_video_ids.csv', index=False)
# Save video stat data into csv file
video_stat.to_csv('video_stat.csv', index=False)
# Save comments data in csv file
comments.to_csv('comments.csv', index=False)

In [117]:
# Read video id list csv file
video_id_list = pd.read_csv('LS_video_ids.csv')
# Get video_stat csv file data
video_stat = pd.read_csv('video_stat.csv')
# Get comments csv file
comments = pd.read_csv('comments.csv')
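
One side effect of the CSV round trip is worth noting: `tags` is a list in the API response, but after `to_csv` and `read_csv` it comes back as a plain string such as `"['a', 'b']"`, and missing tags come back as `NaN`. The sketch below shows one way to restore the lists with `ast.literal_eval`, using an in-memory buffer and toy rows; `parse_tags` is a hypothetical helper name.

```python
import ast
import io

import pandas as pd

# Toy frame: one video with tags, one without.
original = pd.DataFrame({
    "video_id": ["a", "b"],
    "tags": [["LE SSERAFIM", "HYBE"], None],
})

# Round trip through CSV in memory instead of a file on disk.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

def parse_tags(value):
    """Turn the stringified list back into a list; missing tags become []."""
    # Missing tags come back as NaN (a float), not as a string.
    if isinstance(value, str):
        return ast.literal_eval(value)
    return []

restored["tags"] = restored["tags"].apply(parse_tags)
print(restored)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for text read from a file.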