# YouTube Trending Videos Analysis PJT

## In this project, I track the videos on YouTube's Trending tab and build a Tableau dashboard from them.

# 1. Introduction
## 1.1 Background
YouTube has a Trending tab that shows the videos currently trending in a given country. Since YouTube is a leading content platform in many countries, watching the trending videos in different countries is a quick way to spot content trends. I wanted to create a Tableau dashboard that tracks trending videos for three countries (Korea, Japan, and the USA) and use it to analyze content trends in each of them. The scope of the project is limited to these three countries' trending videos, updated weekly.

## 1.2 Objectives
In this project, I will focus on learning the following:

- Get familiar with the YouTube API and use it to collect trending videos
- Use cron to update the data weekly
- Use Tableau to create an interactive dashboard
- Analyze the trending videos for each country to find out which types of content are popular where:
    - What type of content is the hot trend right now in Korea, Japan, and the USA?
    - Which content formats are gaining popularity?
    - Which content formats are less popular?
    - Which channels entered the trending chart most often?

## 1.3 Project process
1. Get the trending video data for each country using YouTube Data API v3, and update the data weekly via cron
2. Preprocess data and engineer new features
3. Create a Tableau dashboard
4. Analyze data with Tableau dashboard

## 1.4 Dataset
### Data Source
For this project, I collected the dataset myself using YouTube Data API v3.

### Data Limitation
The data is real-world data suitable for research purposes. However, because of the API quota limit of 10,000 units per day, I collect trending videos for only 3 countries (Korea, Japan, USA). The data is also updated weekly rather than daily, since the trending chart does not change much from day to day.

### Ethics of data source
According to the YouTube API guide, the YouTube API is free to use for anyone who has created an API key. As long as the user stays within the YouTube API quota limit, there is no issue with using the API to collect data. The data itself is also public information that anyone can see on YouTube, so the data source raises no privacy issues.


In [55]:
# Import basic libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime

# Import API-related libraries
from googleapiclient.discovery import build
from IPython.display import JSON

# Import API KEY from the config file
import sys
sys.path.append('/Users/minguyeo/Documents/coding/pythonPJT/config')
import yt_api_key as api

# Enable Korean Font
from matplotlib import font_manager, rc
import platform

if platform.system() == 'Windows':
    # For Windows users
    font_name = font_manager.FontProperties(fname="c:/Windows/Fonts/malgun.ttf").get_name()
    rc('font', family=font_name)
else:
    # For Mac users
    rc('font', family='AppleGothic')

plt.rcParams['axes.unicode_minus'] = False

# 2. Data collection with YouTube Data API v3

First, I created an API key in the Google Cloud Platform (GCP) console and enabled YouTube Data API v3 for my account. I saved the API key in a separate config directory so that I can import it without exposing the key in the notebook. Next, I checked the YouTube Data API documentation to find out how to get the trending videos for each region (Korea, Japan, USA), and wrote a `get_trending_video` function that collects the statistics of the trending videos for each country through the API.
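The notebook imports the key from a private config module (`yt_api_key`, which exposes `API_KEY`). A common alternative, sketched below under the assumption that the key is exported as an environment variable, keeps the secret out of both the notebook and the repository. `load_api_key` and the variable name `YOUTUBE_API_KEY` are hypothetical choices for illustration.

```python
import os


def load_api_key(env_var: str = "YOUTUBE_API_KEY") -> str:
    """Return the YouTube Data API key stored in an environment variable.

    YOUTUBE_API_KEY is a hypothetical variable name; use whatever name
    you exported in your shell or cron environment.
    """
    key = os.environ.get(env_var)
    if not key:
        # Fail loudly instead of sending requests without a key
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key
```

Usage: `youtube = build('youtube', 'v3', developerKey=load_api_key())`. Cron runs jobs with a minimal environment, so the variable has to be defined where the job is launched (for example, in the crontab itself).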

In [16]:
# Build API service
youtube = build('youtube', 'v3', developerKey=api.API_KEY)

regions = ['KR', 'JP', 'US']

The region codes for the three countries are:
- Korea = KR
- Japan = JP
- United States = US

In [10]:
# Find the region code for Korea, Japan, USA
request = youtube.i18nRegions().list(
        part="snippet",
        hl="en_US"
    )
response = request.execute()
print(response['items'])
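Printing the whole `i18nRegions` response is hard to scan. The sketch below picks out the codes by country name, assuming each item carries the two-letter code in `snippet.gl` and the English display name in `snippet.name`. `find_region_codes` is a hypothetical helper, and the exact display names (e.g. "South Korea") are an assumption to check against the printed response.

```python
def find_region_codes(items, names):
    """Map each requested country name to its region code.

    `items` is the list under response['items'] returned by
    youtube.i18nRegions().list(part="snippet", hl="en_US").execute().
    Names that do not appear in the response are left out.
    """
    wanted = set(names)
    return {
        item["snippet"]["name"]: item["snippet"]["gl"]
        for item in items
        if item["snippet"]["name"] in wanted
    }
```

Usage: `find_region_codes(response['items'], ['South Korea', 'Japan', 'United States'])`.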

In [66]:
def get_trending_video(youtube, regions):
    """Collect snippet, contentDetails and statistics fields for every
    trending ("mostPopular") video in each region, following pagination."""

    # Fields to keep from each part of the API response
    stats = {'snippet': ['channelId', 'channelTitle', 'title', 'publishedAt',
                         'description', 'thumbnails', 'tags'],
             'contentDetails': ['duration'],
             'statistics': ['viewCount', 'likeCount', 'commentCount']}

    all_video_stat = []

    for country in regions:
        next_page_token = None  # the first request has no page token

        while True:
            # Request one page (up to 50 videos) of this region's trending chart
            request = youtube.videos().list(
                part="snippet,contentDetails,statistics",
                chart="mostPopular",
                maxResults=50,
                regionCode=country,  # same region on every page
                pageToken=next_page_token,
            )
            response = request.execute()

            for video in response['items']:
                video_stat = {'region': country, 'video_id': video['id']}

                for part, fields in stats.items():
                    for field in fields:
                        # Some fields (e.g. tags, likeCount) can be missing; store None then
                        video_stat[field] = video.get(part, {}).get(field)

                all_video_stat.append(video_stat)

            # Move to the next page, or stop when this region has no more pages
            next_page_token = response.get('nextPageToken')
            if next_page_token is None:
                break

    return pd.DataFrame(all_video_stat)
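`get_trending_video` keeps requesting pages until a response no longer contains `nextPageToken`. Separated from the API client so it can be checked offline, that loop might look like the sketch below. `collect_all_pages` and the `fetch_page` callable are hypothetical stand-ins for `youtube.videos().list(...).execute()`.

```python
from typing import Callable, Optional


def collect_all_pages(fetch_page: Callable[[Optional[str]], dict]) -> list:
    """Follow nextPageToken until the last page and return all items.

    fetch_page(token) must return one response dict with an 'items' list
    and, when more results exist, a 'nextPageToken' string.
    """
    items = []
    token = None  # the first request is sent without a page token
    while True:
        response = fetch_page(token)
        items.extend(response["items"])
        token = response.get("nextPageToken")
        if token is None:  # no more pages
            return items
```

With the real client, whatever `fetch_page` wraps must send the same `regionCode` on every page request; otherwise later pages come from a different country's chart.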


In [67]:
trending_vid = get_trending_video(youtube, regions)

In [82]:
trending_vid.head()

| | region | video_id | channelId | channelTitle | title | publishedAt | description | thumbnails | tags | duration | viewCount | likeCount | commentCount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KR | YudHcBIxlYw | UCOmHUn--16B90oW2L6FRR3A | BLACKPINK | JISOO - ‘꽃(FLOWER)’ M/V | 2023-03-31T04:00:14Z | JISOO - ‘꽃(FLOWER)’ \n\nABC 도레미만큼 착했던 나\n그 눈빛이... | {'default': {'url': 'https://i.ytimg.com/vi/Yu... | [YG Entertainment, YG, 와이지, K-pop, BLACKPINK, ... | PT3M5S | 69746003 | 6691845 | 946726 |
| 1 | KR | LsrJNUT0eTk | UCQ2O-iftmnlfrBuNsUUTofQ | 채널 십오야 | 🧳EP.1-1ㅣ살벌한 킬러 고객님들도 심장 뛰게 만드는 출장ㅣ🧳출장십오야2 X 길복순 | 2023-04-02T09:10:21Z | #유료광고포함 #전도연 #설경구 #김시아 #이솜 #이연 #길복순 #출장십오야2\n\... | {'default': {'url': 'https://i.ytimg.com/vi/Ls... | [나영석, 나PD, 아간세, 아이슬란드간세끼, 이수근, 수근세끼, 은지원, 지원세끼... | PT21M34S | 598202 | 7230 | 490 |
| 2 | KR | U6z9qdx538s | UCQ2O-iftmnlfrBuNsUUTofQ | 채널 십오야 | 🧳EP.1-2ㅣ킬러 고객 맞춤 '신상 게임'의 등장에 줄줄이 쓰러짐ㅣ🧳출장십오야2 ... | 2023-04-02T09:34:11Z | #유료광고포함 #전도연 #설경구 #김시아 #이솜 #이연 #길복순 #출장십오야2\n\... | {'default': {'url': 'https://i.ytimg.com/vi/U6... | [나영석, 나PD, 아간세, 아이슬란드간세끼, 이수근, 수근세끼, 은지원, 지원세끼... | PT16M35S | 299944 | 4917 | 224 |
| 3 | KR | 1WEAJ-DFkHE | UCX6OQ3DkcsbYNE6H8uQQuVA | MrBeast | \$1 vs \$500,000 Plane Ticket! | 2023-04-01T20:00:04Z | Check out ALL of MrBeast’s awesome jobs or dis... | {'default': {'url': 'https://i.ytimg.com/vi/1W... | | PT12M20S | 44486408 | 2584946 | 84790 |
| 4 | KR | Yvqz-BYBnp4 | UCBJeMCIeLQos7wacox4hmLQ | Serie A | Napoli-Milan 0-4 \| Leao and the Rossoneri stun... | 2023-04-02T21:15:30Z | The unpredictable unfolds at the Diego Maradon... | {'default': {'url': 'https://i.ytimg.com/vi/Yv... | [Ronaldo, Serie A, Dybala, highlights, Juventu... | PT3M26S | 1835161 | 44784 | 2248 |


In [84]:
trending_vid.columns

Index(['region', 'video_id', 'channelId', 'channelTitle', 'title',
       'publishedAt', 'description', 'thumbnails', 'tags', 'duration',
       'viewCount', 'likeCount', 'commentCount'],
      dtype='object')

### Change column names

In [85]:
columns = {'channelId': 'channel_id',
           'channelTitle': 'channel_name',
           'publishedAt': 'upload_date',
           'viewCount': 'view',
           'likeCount': 'like',
           'commentCount': 'comment'}
trending_vid = trending_vid.rename(columns=columns)

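### Check for null and empty values

Only `tags`, `like`, and `comment` contain null values (90, 11, and 6 rows, respectively). They are left as they are: a null here simply means the API did not return that field for the video (for example, a video with no tags), so nothing needs to be imputed.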

In [87]:
trending_vid.isnull().sum()

region           0
video_id         0
channel_id       0
channel_name     0
title            0
upload_date      0
description      0
thumbnails       0
tags            90
duration         0
view             0
like            11
comment          6
dtype: int64
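`isnull()` only counts missing values; an empty string (for example, a blank `description`) or an empty tag list would pass unnoticed. A minimal sketch that counts those as well, assuming "empty" means a zero-length or whitespace-only string, or a zero-length list, tuple, or dict. `count_empty` is a hypothetical helper.

```python
import pandas as pd


def count_empty(df: pd.DataFrame) -> pd.Series:
    """Count values per column that are null or empty."""

    def is_empty(value) -> bool:
        if value is None:
            return True
        if isinstance(value, str):
            return value.strip() == ""
        if isinstance(value, (list, tuple, dict)):
            return len(value) == 0
        # pd.isna catches NaN/NaT for the remaining scalar values
        return bool(pd.isna(value))

    # Apply the check element-wise and sum the True values per column
    return df.apply(lambda col: col.map(is_empty).sum())
```

Usage: `count_empty(trending_vid)` returns one count per column, like `isnull().sum()` but with empty strings and lists included.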

### Reformat values and change data types

Convert `upload_date` to datetime and `duration` to timedelta, and cast the count columns to numeric types.

In [89]:
import isodate

# Convert upload_date (ISO 8601 string in UTC) to a timezone-naive datetime
trending_vid['upload_date'] = pd.to_datetime(trending_vid['upload_date']).dt.tz_convert(None)

# Convert duration (ISO 8601 duration string, e.g. PT3M5S) to timedelta
trending_vid['duration'] = trending_vid['duration'].apply(isodate.parse_duration)

# Convert count columns to numbers; like/comment become float because they contain NaN
trending_vid[['view', 'like', 'comment']] = trending_vid[['view', 'like', 'comment']].apply(pd.to_numeric)

# Make sure description and title are strings
trending_vid['description'] = trending_vid['description'].astype(str)
trending_vid['title'] = trending_vid['title'].astype(str)
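YouTube reports `duration` as an ISO 8601 duration string such as `PT21M34S`. `isodate.parse_duration` handles the general format; the sketch below is a standard-library-only alternative that covers only the day/hour/minute/second subset (a simplifying assumption: strings with years, months, or weeks are rejected), which makes the conversion easy to check by hand. `parse_iso_duration` is a hypothetical helper.

```python
import re
from datetime import timedelta

# Day/hour/minute/second subset of ISO 8601 durations, e.g. "P1DT2H", "PT3M5S"
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_iso_duration(text: str) -> timedelta:
    """Convert an ISO 8601 duration such as 'PT21M34S' to a timedelta."""
    match = _DURATION_RE.match(text)
    if match is None:
        raise ValueError(f"Unsupported duration: {text!r}")
    # Keep only the components that are present in the string
    parts = {name: int(value) for name, value in match.groupdict().items() if value}
    return timedelta(**parts)
```

For example, `PT21M34S` becomes 21 × 60 + 34 = 1,294 seconds. Applied to the column, `trending_vid['duration'].apply(parse_iso_duration).dt.total_seconds()` would give the length as a plain number of seconds.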

In [92]:
trending_vid.head()

| | region | video_id | channel_id | channel_name | title | upload_date | description | thumbnails | tags | duration | view | like | comment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KR | YudHcBIxlYw | UCOmHUn--16B90oW2L6FRR3A | BLACKPINK | JISOO - ‘꽃(FLOWER)’ M/V | 2023-03-31 04:00:14 | JISOO - ‘꽃(FLOWER)’ \n\nABC 도레미만큼 착했던 나\n그 눈빛이... | {'default': {'url': 'https://i.ytimg.com/vi/Yu... | [YG Entertainment, YG, 와이지, K-pop, BLACKPINK, ... | 0 days 00:03:05 | 69746003 | 6691845.0 | 946726.0 |
| 1 | KR | LsrJNUT0eTk | UCQ2O-iftmnlfrBuNsUUTofQ | 채널 십오야 | 🧳EP.1-1ㅣ살벌한 킬러 고객님들도 심장 뛰게 만드는 출장ㅣ🧳출장십오야2 X 길복순 | 2023-04-02 09:10:21 | #유료광고포함 #전도연 #설경구 #김시아 #이솜 #이연 #길복순 #출장십오야2\n\... | {'default': {'url': 'https://i.ytimg.com/vi/Ls... | [나영석, 나PD, 아간세, 아이슬란드간세끼, 이수근, 수근세끼, 은지원, 지원세끼... | 0 days 00:21:34 | 598202 | 7230.0 | 490.0 |
| 2 | KR | U6z9qdx538s | UCQ2O-iftmnlfrBuNsUUTofQ | 채널 십오야 | 🧳EP.1-2ㅣ킬러 고객 맞춤 '신상 게임'의 등장에 줄줄이 쓰러짐ㅣ🧳출장십오야2 ... | 2023-04-02 09:34:11 | #유료광고포함 #전도연 #설경구 #김시아 #이솜 #이연 #길복순 #출장십오야2\n\... | {'default': {'url': 'https://i.ytimg.com/vi/U6... | [나영석, 나PD, 아간세, 아이슬란드간세끼, 이수근, 수근세끼, 은지원, 지원세끼... | 0 days 00:16:35 | 299944 | 4917.0 | 224.0 |
| 3 | KR | 1WEAJ-DFkHE | UCX6OQ3DkcsbYNE6H8uQQuVA | MrBeast | \$1 vs \$500,000 Plane Ticket! | 2023-04-01 20:00:04 | Check out ALL of MrBeast’s awesome jobs or dis... | {'default': {'url': 'https://i.ytimg.com/vi/1W... | | 0 days 00:12:20 | 44486408 | 2584946.0 | 84790.0 |
| 4 | KR | Yvqz-BYBnp4 | UCBJeMCIeLQos7wacox4hmLQ | Serie A | Napoli-Milan 0-4 \| Leao and the Rossoneri stun... | 2023-04-02 21:15:30 | The unpredictable unfolds at the Diego Maradon... | {'default': {'url': 'https://i.ytimg.com/vi/Yv... | [Ronaldo, Serie A, Dybala, highlights, Juventu... | 0 days 00:03:26 | 1835161 | 44784.0 | 2248.0 |


### Engineer new features

Extract the thumbnail image URL and add the collection year and week number.

In [121]:
# Get the medium-quality image URL from each row's thumbnails dict
trending_vid['thumbnail_url'] = trending_vid['thumbnails'].apply(lambda x: x['medium']['url'])

# Drop original thumbnails column
trending_vid = trending_vid.drop('thumbnails', axis=1)
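The lambda above assumes every video has a `medium` thumbnail; should one ever be missing, `x['medium']` raises a `KeyError` and the whole cell fails. A defensive variant, sketched under the assumption that falling back to another resolution (`high`, then `default`) is acceptable for the dashboard. `pick_thumbnail_url` is a hypothetical helper.

```python
from typing import Optional


def pick_thumbnail_url(thumbnails, preferred=("medium", "high", "default")) -> Optional[str]:
    """Return the URL of the first available thumbnail size, or None."""
    if not isinstance(thumbnails, dict):
        return None  # e.g. a missing (None/NaN) thumbnails value
    for size in preferred:
        info = thumbnails.get(size)
        if info and "url" in info:
            return info["url"]
    return None
```

Usage: `trending_vid['thumbnail_url'] = trending_vid['thumbnails'].apply(pick_thumbnail_url)`.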

In [114]:
# Save the ISO year and week of the collection date in the year and week columns
# (both come from isocalendar(), so they stay consistent around New Year)
iso_year, iso_week, _ = datetime.date.today().isocalendar()
trending_vid['year'] = iso_year
trending_vid['week'] = f'W{iso_week}'
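Mixing the calendar year (`datetime.today().year`) with the ISO week number from `isocalendar()` can produce mismatched labels around New Year: 1 January 2021, for example, falls in ISO week 53 of 2020. Taking the year from the ISO calendar as well keeps the pair consistent. `week_label` is a hypothetical helper.

```python
import datetime


def week_label(day: datetime.date) -> tuple:
    """Return (ISO year, 'W<week>') for a date, both from the ISO calendar."""
    iso_year, iso_week, _ = day.isocalendar()
    return iso_year, f"W{iso_week}"
```

If the labels are ever sorted as text, `W9` sorts after `W14`; zero-padding with `f"W{iso_week:02d}"` avoids that, at the cost of renaming any files already saved with unpadded labels.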

In [122]:
trending_vid.head()

| | region | video_id | channel_id | channel_name | title | upload_date | description | tags | duration | view | like | comment | thumbnail_url | week | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KR | YudHcBIxlYw | UCOmHUn--16B90oW2L6FRR3A | BLACKPINK | JISOO - ‘꽃(FLOWER)’ M/V | 2023-03-31 04:00:14 | JISOO - ‘꽃(FLOWER)’ \n\nABC 도레미만큼 착했던 나\n그 눈빛이... | [YG Entertainment, YG, 와이지, K-pop, BLACKPINK, ... | 0 days 00:03:05 | 69746003 | 6691845.0 | 946726.0 | https://i.ytimg.com/vi/YudHcBIxlYw/mqdefault.jpg | W14 | 2023 |
| 1 | KR | LsrJNUT0eTk | UCQ2O-iftmnlfrBuNsUUTofQ | 채널 십오야 | 🧳EP.1-1ㅣ살벌한 킬러 고객님들도 심장 뛰게 만드는 출장ㅣ🧳출장십오야2 X 길복순 | 2023-04-02 09:10:21 | #유료광고포함 #전도연 #설경구 #김시아 #이솜 #이연 #길복순 #출장십오야2\n\... | [나영석, 나PD, 아간세, 아이슬란드간세끼, 이수근, 수근세끼, 은지원, 지원세끼... | 0 days 00:21:34 | 598202 | 7230.0 | 490.0 | https://i.ytimg.com/vi/LsrJNUT0eTk/mqdefault.jpg | W14 | 2023 |
| 2 | KR | U6z9qdx538s | UCQ2O-iftmnlfrBuNsUUTofQ | 채널 십오야 | 🧳EP.1-2ㅣ킬러 고객 맞춤 '신상 게임'의 등장에 줄줄이 쓰러짐ㅣ🧳출장십오야2 ... | 2023-04-02 09:34:11 | #유료광고포함 #전도연 #설경구 #김시아 #이솜 #이연 #길복순 #출장십오야2\n\... | [나영석, 나PD, 아간세, 아이슬란드간세끼, 이수근, 수근세끼, 은지원, 지원세끼... | 0 days 00:16:35 | 299944 | 4917.0 | 224.0 | https://i.ytimg.com/vi/U6z9qdx538s/mqdefault.jpg | W14 | 2023 |
| 3 | KR | 1WEAJ-DFkHE | UCX6OQ3DkcsbYNE6H8uQQuVA | MrBeast | \$1 vs \$500,000 Plane Ticket! | 2023-04-01 20:00:04 | Check out ALL of MrBeast’s awesome jobs or dis... | | 0 days 00:12:20 | 44486408 | 2584946.0 | 84790.0 | https://i.ytimg.com/vi/1WEAJ-DFkHE/mqdefault.jpg | W14 | 2023 |
| 4 | KR | Yvqz-BYBnp4 | UCBJeMCIeLQos7wacox4hmLQ | Serie A | Napoli-Milan 0-4 \| Leao and the Rossoneri stun... | 2023-04-02 21:15:30 | The unpredictable unfolds at the Diego Maradon... | [Ronaldo, Serie A, Dybala, highlights, Juventu... | 0 days 00:03:26 | 1835161 | 44784.0 | 2248.0 | https://i.ytimg.com/vi/Yvqz-BYBnp4/mqdefault.jpg | W14 | 2023 |


### Save the cleaned data to a CSV file (weekly)

In [123]:
trending_vid.to_csv(f'trending_videos_W{datetime.datetime.today().isocalendar()[1]}.csv', index=False)
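`to_csv` writes the timedelta `duration` column as text such as `0 days 00:03:05`, which Tableau will most likely read as a string rather than a number. One option, sketched here as an assumption rather than a step of this notebook, is to add a numeric seconds column before calling `to_csv`. `add_duration_seconds` and the `duration_sec` column name are hypothetical.

```python
import pandas as pd


def add_duration_seconds(df: pd.DataFrame, column: str = "duration") -> pd.DataFrame:
    """Return a copy of df with a numeric 'duration_sec' column (in seconds)."""
    out = df.copy()
    # Timedelta -> float seconds, which is written to CSV as a plain number
    out["duration_sec"] = pd.to_timedelta(out[column]).dt.total_seconds()
    return out
```

Usage: `add_duration_seconds(trending_vid).to_csv(...)` with the same file name as above. Returning a copy leaves `trending_vid` itself unchanged.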

### Combine all weekly files into a master CSV file