# USING AN API TO EXTRACT DATA FROM ANY YOUTUBE CHANNEL

Last month, I came across the video [Python YouTube API Tutorial: Calculating the Duration of a Playlist](https://www.youtube.com/watch?v=coZbOM6E47I&t=16s), which is part of a tutorial series on the YouTube API. It shows how to calculate the duration of any playlist on YouTube, and it inspired me to work on my first personal data science project. The idea is simple: extract and analyze data from YouTube.   
The first step of the project is to collect data for a specific YouTube channel by retrieving the metrics of each video uploaded to that channel. The data is then saved so that it can be used later without running the script again.   
The second step is to use data science tools to analyze the data and get insights from it. We can look for the most popular videos on the channel, the most watched playlist, the relationship between video duration and number of views, the relationship between video duration and number of comments, or the ratio between likes and dislikes. We can ask all the questions we want and try to answer them as well as we can, given the complexity of the problem and the fact that we only have access to public data.


##  Creating an API Key

First things first, we need a YouTube API key. I used this video https://www.youtube.com/watch?v=th5_9woFJmk&t=2s to set up my API key and install the packages we need. It's a clear and well-explained video. By the end of it, you will be able to make your first YouTube API request.

In [71]:
from googleapiclient.discovery import build
import os
import pandas as pd
import re
from datetime import date
from dotenv import load_dotenv
import json

## Hiding the API key
We will store the API key in a file called `.env` and use the `dotenv` module to read it.  
For details, see http://jonathansoma.com/lede/foundations-2019/classes/apis/keeping-api-keys-secret/

In [15]:
load_dotenv()
API_KEY = os.getenv('api_key')
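The cell above returns `None` silently if the key is missing, and the error only shows up later when the first request fails. As a minimal sketch, assuming the `.env` file contains a line such as `api_key=YOUR_KEY`, a small helper can fail early instead. The name `require_env` is hypothetical and is not part of `dotenv`.

```python
import os


def require_env(name):
    """Return the value of an environment variable or fail loudly.

    `require_env` is a hypothetical helper used for illustration only.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file."
        )
    return value
```

After `load_dotenv()`, it could be used as `API_KEY = require_env('api_key')`.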

## Building a service object

Before using the YouTube API to make requests, we need to build a service object.
We will use the [`build()`](https://googleapis.github.io/google-api-python-client/docs/epy/googleapiclient.discovery-module.html#build) function to create it. We need to specify the name of the service (`youtube` in our case), the API version (`v3`), and a developer key.
For more information, you can always check the [Getting Started](https://github.com/googleapis/google-api-python-client/blob/master/docs/start.md) document in the [google-api-python-client documentation](https://github.com/googleapis/google-api-python-client).


In [16]:
youtube = build('youtube', 'v3', developerKey=API_KEY)

## Some basic statistics about a YouTube channel

We are ready to make our first request. Since our goal is to collect data for a specific YouTube channel, we need a parameter that uniquely identifies the channel.   
To request information about a particular channel, we call the `channels.list` method. To identify the channel, we can use either the channel ID or the username associated with that channel.  
Perhaps you are wondering how to find the ID of a channel? Me too.  
One way to do it, based on this post on [stackoverflow](https://stackoverflow.com/questions/14366648/how-can-i-get-a-channel-id-from-youtube), is to look for either `data-channel-external-id` or `externalId` in the source code of the channel page. If you find a better solution, please share it with us.
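The manual search described above can be sketched with the `re` module. This is only an illustration: the function name `find_channel_id` is hypothetical, and the two patterns assume that the markers appear in the page source as `data-channel-external-id="..."` or `"externalId":"..."`, which may change at any time.

```python
import re

# Patterns for the two markers mentioned above. The exact markup of a
# channel page is an assumption, not something guaranteed by YouTube.
_CHANNEL_ID_PATTERNS = [
    re.compile(r'data-channel-external-id="([\w-]+)"'),
    re.compile(r'"externalId"\s*:\s*"([\w-]+)"'),
]


def find_channel_id(page_source):
    """Return the first channel ID found in the page source, or None."""
    for pattern in _CHANNEL_ID_PATTERNS:
        match = pattern.search(page_source)
        if match:
            return match.group(1)
    return None
```

The page source itself would have to be saved from the browser or downloaded separately; that step is left out here.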


In this project we will use the YouTube channel [Corey Schafer](https://www.youtube.com/channel/UCCezIgC97PvUuR4_gbFUs5g) as an example, because this project is inspired by his YouTube API tutorial. Thanks, [Corey Schafer](https://coreyms.com/).  

In [8]:
user_name = 'schafer5' 
channel_id = 'UCCezIgC97PvUuR4_gbFUs5g'



request = youtube.channels().list(
        part="snippet,statistics",
        forUsername=user_name
    )
response = request.execute()

In [9]:
print(json.dumps(response, indent=4,sort_keys=True))

{
    "etag": "aD7rOc_sF67s8eWxhgruzPlAe6I",
    "items": [
        {
            "etag": "aX6eukNYTw5H5kY1kzRn7RpybXk",
            "id": "UCCezIgC97PvUuR4_gbFUs5g",
            "kind": "youtube#channel",
            "snippet": {
                "country": "US",
                "customUrl": "coreyms",
                "description": "Welcome to my Channel. This channel is focused on creating tutorials and walkthroughs for software developers, programmers, and engineers. We cover topics for all different skill levels, so whether you are a beginner or have many years of experience, this channel will have something for you.\n\nWe've already released a wide variety of videos on topics that include: Python, Git, Development Environments, Terminal Commands, SQL, Programming Terms, JavaScript, Computer Science Fundamentals, and plenty of other tips and tricks which will help you in your career.\n\n\nIf you enjoy these videos and would like to support my channel, I would greatly appreciate any

We can request more than one channel at once by passing a list of channel IDs.     
We created this list by selecting the top 10 channels from the [Top Programmer Guru](https://noonies.tech/award/top-programming-guru) list.

In [12]:
channel_ids = ["UCWv7vMbMWH4-V0ZXdmDpPBA", "UC29ju8bIPH5as8OGnQzwJyA", "UCCezIgC97PvUuR4_gbFUs5g", "UC4JX40jDee_tINbkjycV4Sg", "UCNU_lfiiWBdtULKOw6X0Dig", "UC8butISFwT-Wl7EV0hUK0BQ", "UCXgGY0wkgOzynnHvSEVmE3A", "UCqrILQNl5Ed9Dz6CGMyvMTQ", "UCStj-ORBZ7TGK1FwtGAUgbQ","UCZUyPT9DkJWmS_DzdOi7RIA"  ]

In [13]:
len(channel_ids)

10

In [51]:
request = youtube.channels().list(
        part="snippet,contentDetails,statistics",
        id=channel_ids
    )
response = request.execute()

In [52]:
print(json.dumps(response, indent=4,sort_keys=True))

{
    "etag": "3-mUwVBsVBcbzkDZsgsS7Lv-jbY",
    "items": [
        {
            "contentDetails": {
                "relatedPlaylists": {
                    "favorites": "",
                    "likes": "",
                    "uploads": "UUXgGY0wkgOzynnHvSEVmE3A"
                }
            },
            "etag": "p3Sw7HjRKZ73xL2bvoLubYj0gRk",
            "id": "UCXgGY0wkgOzynnHvSEVmE3A",
            "kind": "youtube#channel",
            "snippet": {
                "country": "IN",
                "customUrl": "hiteshchoudharydotcom",
                "description": "Website: https://courses.LearnCodeOnline.in\nHey there everyone, Hitesh here back again with another video!\nThis means I create a lot of videos, every single week. I cover a wide range of subjects like programming, what's latest in tech, new frameworks, open-source products etc. I keep my interest in a wide area of tech like Javascript, Python, PHP, Machine Learning, etc.\n\n\nFor the Business purpose, Sponsorships

#### Let's see if your favorite coding channels are in the top 10.

In [59]:
for item in response['items']:
    print(item['snippet']['title'])

Hitesh Choudhary
Caleb Curry
Traversy Media
Clever Programmer
freeCodeCamp.org
Programming with Mosh
Tech With Tim
Krish Naik
Programming Hero
Corey Schafer


#### Let's print the last item

In [60]:
# `item` still holds the last element of the loop above
print(json.dumps(item, indent=4, sort_keys=True))

{
    "contentDetails": {
        "relatedPlaylists": {
            "favorites": "",
            "likes": "",
            "uploads": "UUCezIgC97PvUuR4_gbFUs5g"
        }
    },
    "etag": "C8N4JRLugZOO_iMOMAWrr1ktibo",
    "id": "UCCezIgC97PvUuR4_gbFUs5g",
    "kind": "youtube#channel",
    "snippet": {
        "country": "US",
        "customUrl": "coreyms",
        "description": "Welcome to my Channel. This channel is focused on creating tutorials and walkthroughs for software developers, programmers, and engineers. We cover topics for all different skill levels, so whether you are a beginner or have many years of experience, this channel will have something for you.\n\nWe've already released a wide variety of videos on topics that include: Python, Git, Development Environments, Terminal Commands, SQL, Programming Terms, JavaScript, Computer Science Fundamentals, and plenty of other tips and tricks which will help you in your career.\n\n\nIf you enjoy these videos and would like to

#### Let's store the result in a DataFrame

In [53]:
# One empty list per column we want to keep
columns = ['channelId', 'title', 'description', 'country', 'viewCount',
           'subscriberCount', 'videoCount', 'publishedAt', 'uploads']
channels_stat = {column: [] for column in columns}

for item in response['items']:
    
    channels_stat['channelId'].append(item['id'])
    channels_stat['title'].append(item['snippet']['title'])
    channels_stat['description'].append(item['snippet']['description'])
    channels_stat['country'].append(item['snippet']['country'])
    channels_stat['viewCount'].append(item['statistics']['viewCount'])
    channels_stat['videoCount'].append(item['statistics']['videoCount'])
    channels_stat['subscriberCount'].append(item['statistics']['subscriberCount'])
    channels_stat['publishedAt'].append(item['snippet']['publishedAt'])
    channels_stat['uploads'].append(item['contentDetails']['relatedPlaylists']['uploads'])

channels_stat

{'channelId': ['UCXgGY0wkgOzynnHvSEVmE3A',
  'UCZUyPT9DkJWmS_DzdOi7RIA',
  'UC29ju8bIPH5as8OGnQzwJyA',
  'UCqrILQNl5Ed9Dz6CGMyvMTQ',
  'UC8butISFwT-Wl7EV0hUK0BQ',
  'UCWv7vMbMWH4-V0ZXdmDpPBA',
  'UC4JX40jDee_tINbkjycV4Sg',
  'UCNU_lfiiWBdtULKOw6X0Dig',
  'UCStj-ORBZ7TGK1FwtGAUgbQ',
  'UCCezIgC97PvUuR4_gbFUs5g'],
 'title': ['Hitesh Choudhary',
  'Caleb Curry',
  'Traversy Media',
  'Clever Programmer',
  'freeCodeCamp.org',
  'Programming with Mosh',
  'Tech With Tim',
  'Krish Naik',
  'Programming Hero',
  'Corey Schafer'],
 'description': ["Website: https://courses.LearnCodeOnline.in\nHey there everyone, Hitesh here back again with another video!\nThis means I create a lot of videos, every single week. I cover a wide range of subjects like programming, what's latest in tech, new frameworks, open-source products etc. I keep my interest in a wide area of tech like Javascript, Python, PHP, Machine Learning, etc.\n\n\nFor the Business purpose, Sponsorships and invitation, reach out at hi

In [54]:
for item in channels_stat:
    print(item, len(channels_stat[item]))

channelId 10
title 10
description 10
country 10
viewCount 10
subscriberCount 10
videoCount 10
publishedAt 10
uploads 10


In [55]:
channels_stat = pd.DataFrame.from_dict(channels_stat)
channels_stat

Unnamed: 0,channelId,title,description,country,viewCount,subscriberCount,videoCount,publishedAt,uploads
0,UCXgGY0wkgOzynnHvSEVmE3A,Hitesh Choudhary,Website: https://courses.LearnCodeOnline.in\nH...,IN,37201703,633000,1011,2011-10-24T10:25:16Z,UUXgGY0wkgOzynnHvSEVmE3A
1,UCZUyPT9DkJWmS_DzdOi7RIA,Caleb Curry,Programming Made Fun and Simple \n\nHigh qual...,US,27748158,376000,1381,2009-08-18T18:32:42Z,UUZUyPT9DkJWmS_DzdOi7RIA
2,UC29ju8bIPH5as8OGnQzwJyA,Traversy Media,Traversy Media features the best online web de...,US,139223547,1530000,878,2009-10-30T21:33:14Z,UU29ju8bIPH5as8OGnQzwJyA
3,UCqrILQNl5Ed9Dz6CGMyvMTQ,Clever Programmer,You can find awesome programming lessons here!...,US,40116997,959000,590,2016-03-12T08:59:15Z,UUqrILQNl5Ed9Dz6CGMyvMTQ
4,UC8butISFwT-Wl7EV0hUK0BQ,freeCodeCamp.org,Learn to code for free.,US,185736127,3620000,1146,2014-12-16T21:18:48Z,UU8butISFwT-Wl7EV0hUK0BQ
5,UCWv7vMbMWH4-V0ZXdmDpPBA,Programming with Mosh,I train professional software engineers that c...,AU,78962399,1770000,160,2014-10-07T00:40:53Z,UUWv7vMbMWH4-V0ZXdmDpPBA
6,UC4JX40jDee_tINbkjycV4Sg,Tech With Tim,"Learn programming, software engineering, machi...",CA,49660216,658000,587,2014-04-23T01:57:10Z,UU4JX40jDee_tINbkjycV4Sg
7,UCNU_lfiiWBdtULKOw6X0Dig,Krish Naik,"I work as a Lead Data Scientist, pioneering in...",IN,26087780,371000,1044,2012-02-11T04:05:06Z,UUNU_lfiiWBdtULKOw6X0Dig
8,UCStj-ORBZ7TGK1FwtGAUgbQ,Programming Hero,Learning is boring? Not any more. \nProgrammin...,US,4749894,198000,36,2019-04-13T16:32:45Z,UUStj-ORBZ7TGK1FwtGAUgbQ
9,UCCezIgC97PvUuR4_gbFUs5g,Corey Schafer,Welcome to my Channel. This channel is focused...,US,57450289,778000,230,2006-05-31T22:49:22Z,UUCezIgC97PvUuR4_gbFUs5g
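
In the table above, each value in the `uploads` column looks like the `channelId` with the `UC` prefix replaced by `UU`. This is only a pattern observed in this output, not a documented guarantee, so the value returned in `contentDetails.relatedPlaylists.uploads` remains the reliable source. The sketch below checks the pattern on three rows copied from the table; `guess_uploads_playlist_id` is a hypothetical name.

```python
def guess_uploads_playlist_id(channel_id):
    """Guess the uploads playlist ID from a channel ID.

    Assumption: the uploads playlist ID is the channel ID with the
    'UC' prefix replaced by 'UU', as observed in the table above.
    """
    if not channel_id.startswith("UC"):
        raise ValueError(f"Unexpected channel ID format: {channel_id!r}")
    return "UU" + channel_id[2:]


# (channelId, uploads) pairs copied from the table above
observed = {
    "UCXgGY0wkgOzynnHvSEVmE3A": "UUXgGY0wkgOzynnHvSEVmE3A",
    "UC8butISFwT-Wl7EV0hUK0BQ": "UU8butISFwT-Wl7EV0hUK0BQ",
    "UCCezIgC97PvUuR4_gbFUs5g": "UUCezIgC97PvUuR4_gbFUs5g",
}

for channel_id, uploads in observed.items():
    print(channel_id, guess_uploads_playlist_id(channel_id) == uploads)
```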


In [56]:
channels_stat.to_csv('channelsDB.csv')

In [57]:
channels_stat = pd.read_csv('channelsDB.csv', index_col=0)

In [58]:
channels_stat

Unnamed: 0,channelId,title,description,country,viewCount,subscriberCount,videoCount,publishedAt,uploads
0,UCXgGY0wkgOzynnHvSEVmE3A,Hitesh Choudhary,Website: https://courses.LearnCodeOnline.in\nH...,IN,37201703,633000,1011,2011-10-24T10:25:16Z,UUXgGY0wkgOzynnHvSEVmE3A
1,UCZUyPT9DkJWmS_DzdOi7RIA,Caleb Curry,Programming Made Fun and Simple \n\nHigh qual...,US,27748158,376000,1381,2009-08-18T18:32:42Z,UUZUyPT9DkJWmS_DzdOi7RIA
2,UC29ju8bIPH5as8OGnQzwJyA,Traversy Media,Traversy Media features the best online web de...,US,139223547,1530000,878,2009-10-30T21:33:14Z,UU29ju8bIPH5as8OGnQzwJyA
3,UCqrILQNl5Ed9Dz6CGMyvMTQ,Clever Programmer,You can find awesome programming lessons here!...,US,40116997,959000,590,2016-03-12T08:59:15Z,UUqrILQNl5Ed9Dz6CGMyvMTQ
4,UC8butISFwT-Wl7EV0hUK0BQ,freeCodeCamp.org,Learn to code for free.,US,185736127,3620000,1146,2014-12-16T21:18:48Z,UU8butISFwT-Wl7EV0hUK0BQ
5,UCWv7vMbMWH4-V0ZXdmDpPBA,Programming with Mosh,I train professional software engineers that c...,AU,78962399,1770000,160,2014-10-07T00:40:53Z,UUWv7vMbMWH4-V0ZXdmDpPBA
6,UC4JX40jDee_tINbkjycV4Sg,Tech With Tim,"Learn programming, software engineering, machi...",CA,49660216,658000,587,2014-04-23T01:57:10Z,UU4JX40jDee_tINbkjycV4Sg
7,UCNU_lfiiWBdtULKOw6X0Dig,Krish Naik,"I work as a Lead Data Scientist, pioneering in...",IN,26087780,371000,1044,2012-02-11T04:05:06Z,UUNU_lfiiWBdtULKOw6X0Dig
8,UCStj-ORBZ7TGK1FwtGAUgbQ,Programming Hero,Learning is boring? Not any more. \nProgrammin...,US,4749894,198000,36,2019-04-13T16:32:45Z,UUStj-ORBZ7TGK1FwtGAUgbQ
9,UCCezIgC97PvUuR4_gbFUs5g,Corey Schafer,Welcome to my Channel. This channel is focused...,US,57450289,778000,230,2006-05-31T22:49:22Z,UUCezIgC97PvUuR4_gbFUs5g


In [43]:
channels_stat.country.value_counts()

US    6
IN    2
AU    1
CA    1
Name: country, dtype: int64

In [49]:
channels_stat.videoCount.sort_values()

1      36
2     160
7     230
6     587
0     590
4     878
5    1011
8    1043
9    1146
3    1381
Name: videoCount, dtype: int64

In [45]:
channels_stat.describe(include='all')

Unnamed: 0,channelId,title,description,country,viewCount,subscriberCount,videoCount,publishedAt
count,10,10,10,10,10.0,10.0,10.0,10
unique,10,10,10,4,,,,10
top,UCWv7vMbMWH4-V0ZXdmDpPBA,Tech With Tim,Learn to code for free.,US,,,,2009-08-18T18:32:42Z
freq,1,1,1,6,,,,1
mean,,,,,64693710.0,1089300.0,706.2,
std,,,,,56269700.0,1021399.0,457.846602,
min,,,,,4749894.0,198000.0,36.0,
25%,,,,,30111540.0,440250.0,319.25,
50%,,,,,44888610.0,718000.0,734.0,
75%,,,,,73584370.0,1387250.0,1035.0,


### The next step
We can already make some comparisons between channels, but if we want to go deeper we need more data. We will collect more data for each channel, one by one.

In [65]:
def getVideosId(youtube, channelId):
    '''
    Get the list of all video ids in a YouTube channel

    Args:
        youtube (service object): the YouTube API service object
        channelId (string): the channel id

    Return:
        a list of video ids
    '''
    videosIdList = []
    nextPageToken = None

    while True:

        # request one page of search results (at most 50 items)
        request = youtube.search().list(
            part="snippet",
            channelId=channelId,
            maxResults=50,
            regionCode='US',
            pageToken=nextPageToken,
        )
        response = request.execute()

        for item in response['items']:

            # search results can also contain playlists and channels
            if item['id']['kind'] == "youtube#video":

                videosIdList.append(item['id']['videoId'])

        # stop when there is no further page
        nextPageToken = response.get('nextPageToken')
        if not nextPageToken:
            break

    return videosIdList
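
The function above relies on `search().list`. A possible alternative, sketched here under the assumption that the uploads playlist contains every public video of the channel, is to walk the playlist stored in `contentDetails.relatedPlaylists.uploads` with `playlistItems().list`, which the quota documentation lists as a much cheaper call. The function name `get_upload_video_ids` is hypothetical.

```python
def get_upload_video_ids(youtube, uploads_playlist_id):
    """Collect the video IDs of an uploads playlist, page by page.

    `youtube` is the service object created with build();
    `uploads_playlist_id` is the value of relatedPlaylists.uploads.
    """
    video_ids = []
    next_page_token = None

    while True:
        response = youtube.playlistItems().list(
            part="contentDetails",
            playlistId=uploads_playlist_id,
            maxResults=50,
            pageToken=next_page_token,
        ).execute()

        video_ids.extend(
            item["contentDetails"]["videoId"] for item in response["items"]
        )

        # stop when the API does not return a token for another page
        next_page_token = response.get("nextPageToken")
        if not next_page_token:
            break

    return video_ids
```

With the DataFrame built earlier, it could be called as `get_upload_video_ids(youtube, channels_stat.loc[9, 'uploads'])`.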

### A list of all videos in a YouTube channel
We will only work with one channel, because the API has a default quota of 10,000 units per day.

In [63]:
channelId = channels_stat.loc[9, 'channelId']
videosIdList = getVideosId(youtube, channelId)

In [64]:
len(videosIdList)

230
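
To get a feel for what this request costs, here is a rough estimate. It assumes the costs listed in the YouTube Data API quota documentation (100 units per `search.list` call, 1 unit for most other list calls such as `playlistItems.list`) and the default allowance of 10,000 units per day. The constant and function names are mine.

```python
import math

DAILY_QUOTA = 10_000      # default daily allowance, in units
SEARCH_LIST_COST = 100    # assumed cost of one search.list call
SIMPLE_LIST_COST = 1      # assumed cost of one playlistItems.list call
PAGE_SIZE = 50            # maxResults used in the requests above


def pages_needed(n_items, page_size=PAGE_SIZE):
    """Number of paginated calls needed to fetch n_items."""
    return math.ceil(n_items / page_size)


n_videos = 230
pages = pages_needed(n_videos)

print(pages)                      # 5 calls
print(pages * SEARCH_LIST_COST)   # 500 units with search.list
print(pages * SIMPLE_LIST_COST)   # 5 units with a 1-unit list call
```

Under these assumptions, listing one channel of this size through `search.list` uses about 5% of the daily quota, which is why collecting all ten channels this way quickly becomes expensive.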

In [72]:
today = date.today()

In [73]:
f'The channel {channels_stat.loc[9, "title"]} has  { len(videosIdList)} videos until {today}.'

'The channel Corey Schafer has  230 videos until 2021-06-02.'

### A table of all playlists in a YouTube channel

In [76]:
def getPlaylistId(youtube, channelId):

    '''
    Get the list of all playlists for a given channelId

    Args:
        youtube (service object): the YouTube API service object
        channelId (string): the channel id

    Return:

        df (DataFrame): dataframe with the following columns
            playlistId | title | description | itmCount | channelId

    '''
    pl_dict = {'playlistId':[], 'title': [], 'description': [] ,'itmCount':[], 'channelId':[]}

    nextPageToken = None

    while True:

        pl_request = youtube.playlists().list(
            part='contentDetails,snippet',
            channelId=channelId,
            maxResults=50,
            pageToken=nextPageToken,
        )
        pl_response = pl_request.execute()

        for item in pl_response['items']:

            pl_dict['playlistId'].append(item['id'])
            pl_dict['title'].append(item['snippet']['title'])
            pl_dict['description'].append(item['snippet']['description'])
            pl_dict['itmCount'].append(item['contentDetails']['itemCount'])
            pl_dict['channelId'].append(channelId)

        # read the token from this function's own response (pl_response),
        # not from the global `response` of an earlier cell
        nextPageToken = pl_response.get('nextPageToken')

        if not nextPageToken:
            break

    df = pd.DataFrame.from_dict(pl_dict)

    return df

In [77]:
playlistDb = getPlaylistId(youtube, channelId)

In [14]:
playlistDb

Unnamed: 0,playlistId,title,description,itmCount,channelId
0,PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS,Pandas Tutorials,,11,UCCezIgC97PvUuR4_gbFUs5g
1,PL-osiE80TeTvipOqomVEeZ1HRrcEvtZB_,Matplotlib Tutorials,"In this Python Programming series, we will be ...",10,UCCezIgC97PvUuR4_gbFUs5g
2,PL-osiE80TeTtoQCKZ03TU5fNfx2UY6U4p,Django Tutorials,"Python Django Tutorials. In this series, we wi...",17,UCCezIgC97PvUuR4_gbFUs5g
3,PL-osiE80TeTs4UjLw5MM6OjgkjFeUxCYH,Flask Tutorials,"Python Flask Tutorials. In this series, we wil...",15,UCCezIgC97PvUuR4_gbFUs5g
4,PL-osiE80TeTvviVL0pJGX5mZCo7CAvIuf,Career Advice,"Career Advice for Programmers, Developers, and...",6,UCCezIgC97PvUuR4_gbFUs5g
5,PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7,Python Programming Beginner Tutorials,"In these Python Beginner Tutorials, we will be...",26,UCCezIgC97PvUuR4_gbFUs5g
6,PL-osiE80TeTt66h8cVpmbayBKlMTuS55y,Python - Setting up a Python Environment,Python Development Environment Tutorials. Ther...,9,UCCezIgC97PvUuR4_gbFUs5g
7,PL-osiE80TeTsqhIuOqKhwlXsIBIdSeYtc,Python OOP Tutorials - Working with Classes,Python Object-Oriented Tutorials. In this seri...,6,UCCezIgC97PvUuR4_gbFUs5g
8,PL-osiE80TeTt9WQbFm0uoXG8CrMy_xj5Z,Channel Updates,Channel Updates for Corey Schafer's YouTube Ch...,6,UCCezIgC97PvUuR4_gbFUs5g
9,PL-osiE80TeTsKOdPrKeSOp4rN3mza8VHN,SQL Tutorials,SQL Tutorials. An in-depth look at the SQL lan...,5,UCCezIgC97PvUuR4_gbFUs5g


In [78]:
playlistDb.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   playlistId   21 non-null     object
 1   title        21 non-null     object
 2   description  21 non-null     object
 3   itmCount     21 non-null     int64 
 4   channelId    21 non-null     object
dtypes: int64(1), object(4)
memory usage: 968.0+ bytes


What the table above is missing is some statistics about each playlist, such as the number of views and the duration.
We cannot get this information directly from the YouTube API, so we have to work around it.
One way to do this is to go through each video in a playlist.

In [79]:
def getPlaylistItems(youtube, playlist_id):
    '''
    Return the video ids in a given playlist
    Args:
        youtube (service object): the YouTube API service object
        playlist_id (string): the playlist id
    Return:
        dict: {'videoId': list, 'playlistId': list}
    '''

    nextPageToken = None
    playlist_items = {'videoId': [], 'playlistId': []}

    while True:
        pl_request = youtube.playlistItems().list(
            part='contentDetails',
            playlistId=playlist_id,
            maxResults=50,
            pageToken=nextPageToken,
        )

        pl_response = pl_request.execute()

        for item in pl_response['items']:

            video_id = item['contentDetails']['videoId']
            playlist_items['playlistId'].append(playlist_id)
            playlist_items['videoId'].append(video_id) # a video can be in more than one playlist

        nextPageToken = pl_response.get('nextPageToken')

        if not nextPageToken:
            break

    return playlist_items

In [80]:

playlists_items = {'videoId': [], 'playlistId': []}
# list of all the playlists
playlistIds = playlistDb['playlistId'].values 

def dictUpdate(dict1, dict2):
    
    '''
        Concatenate the values of two dictionaries that have the same keys
    '''
    
    dict3 = {}

    for key in dict1:

        dict3[key] = dict1[key] + dict2[key]

    return dict3

In [81]:
# get the items in each playlist
for playlist_id in playlistIds:

    playlist_items = getPlaylistItems(youtube, playlist_id)

    playlists_items = dictUpdate(playlists_items, playlist_items)

Let's save the results in a DataFrame.

In [82]:
playlistItemsDB = pd.DataFrame.from_dict(playlists_items)
playlistItemsDB['channelId'] = channelId


In [84]:
playlistItemsDB.head()

Unnamed: 0,videoId,playlistId,channelId
0,ZyhVh-qRZPA,PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS,UCCezIgC97PvUuR4_gbFUs5g
1,zmdjNSmRXF4,PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS,UCCezIgC97PvUuR4_gbFUs5g
2,W9XjRYFkkyw,PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS,UCCezIgC97PvUuR4_gbFUs5g
3,Lw2rlcxScZY,PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS,UCCezIgC97PvUuR4_gbFUs5g
4,DCDe29sIKcE,PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS,UCCezIgC97PvUuR4_gbFUs5g


In [85]:
playlistItemsDB.shape

(316, 3)

In [88]:
len(playlistItemsDB['videoId'].unique())

218

The table has 316 rows but only 218 unique video ids: a video can be in more than one playlist, and it can also belong to no playlist at all.
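
To make both cases concrete, here is a small sketch with made-up ids that separates videos listed in several playlists from videos listed in none. The function name `playlist_membership` and the sample data are hypothetical; with the real data, the pairs would come from `playlistItemsDB` and the full set of ids from `videosIdList`.

```python
from collections import Counter


def playlist_membership(pairs, all_video_ids):
    """Split videos into 'shared' (in several playlists) and 'orphans'.

    pairs: iterable of (video_id, playlist_id) tuples
    all_video_ids: iterable of every video id of the channel
    """
    # set() removes a video listed twice in the same playlist
    counts = Counter(video_id for video_id, _ in set(pairs))
    shared = sorted(v for v, n in counts.items() if n > 1)
    orphans = sorted(set(all_video_ids) - set(counts))
    return shared, orphans


# Made-up sample: v1 is in two playlists, v3 is in none
sample_pairs = [("v1", "plA"), ("v2", "plA"), ("v1", "plB")]
sample_videos = ["v1", "v2", "v3"]

print(playlist_membership(sample_pairs, sample_videos))
```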

### A table of all videos in a YouTube channel
We will create a dataset with the statistics of each video in the channel.

In [89]:
def getVideoStat(youtube, videos_id_list):

    '''
    Get statistics about each video in a list and append them to the
    global dictionary vid_dict, which must exist before the call.
    Args:
        youtube (service object): the YouTube API service object
        videos_id_list (list): a list of video ids, with at most 50 elements

    '''
    videos_request = youtube.videos().list(
        part='contentDetails,statistics,snippet',
        id=','.join(videos_id_list),
    )

    videos_response = videos_request.execute()

    for item in videos_response['items']:

        snippet = item['snippet']
        statistics = item['statistics']

        # the playlistId column will be added later using a join
        vid_dict['videoId'].append(item['id'])
        vid_dict['title'].append(snippet['title'])
        # tags and some counters can be missing from a response,
        # so use .get() to store None instead of raising a KeyError
        vid_dict['tags'].append(snippet.get('tags'))
        vid_dict['viewCount'].append(statistics.get('viewCount'))
        vid_dict['likeCount'].append(statistics.get('likeCount'))
        vid_dict['dislikeCount'].append(statistics.get('dislikeCount'))
        vid_dict['commentCount'].append(statistics.get('commentCount'))
        vid_dict['duration'].append(item['contentDetails']['duration'])
        vid_dict['date'].append(snippet['publishedAt'])
        vid_dict['channelId'].append(snippet['channelId'])



In [91]:
def make_chunks(data, chunk_size):
    
    '''Split data into chunks of at most chunk_size elements'''
    
    return [data[i:i+chunk_size] for i in range(0, len(data), chunk_size)]

In [92]:
chunks = make_chunks(videosIdList, 50)

In [93]:
len(chunks)

5

In [54]:
getVideoStat(youtube, chunks[0])

In [94]:
list_columns = ['videoId','title', 'tags', 'viewCount', 'likeCount', 'dislikeCount', 'commentCount', 'duration','channelId','date']
vid_dict = {key : [] for key in list_columns}

for chunk in chunks:
    getVideoStat(youtube, chunk)

videosDB = pd.DataFrame.from_dict(vid_dict)

videosDB

Unnamed: 0,videoId,title,tags,viewCount,likeCount,dislikeCount,commentCount,duration,channelId,date
0,sugvnHA7ElY,Python Tutorial: if __name__ == '__main__',"[Python, Programming, Computer Science, Video ...",1471483,39704,607,1330,PT8M43S,UCCezIgC97PvUuR4_gbFUs5g,2015-03-23T06:04:35Z
1,ZDa-Z5JzLYM,Python OOP Tutorial 1: Classes and Instances,"[Python, Classes, Object Oriented, OOP, Python...",2688184,73312,433,3137,PT15M24S,UCCezIgC97PvUuR4_gbFUs5g,2016-06-20T17:00:03Z
2,jCzT9XFZ5bw,Python OOP Tutorial 6: Property Decorators - G...,"[Python, Property, Python Tutorial, Property D...",593299,19994,86,741,PT9M33S,UCCezIgC97PvUuR4_gbFUs5g,2016-08-19T16:30:01Z
3,bD05uGo_sVI,Python Tutorial: Generators - How to use them ...,"[Python (Programming Language), How-to (Websit...",604802,13821,148,564,PT11M14S,UCCezIgC97PvUuR4_gbFUs5g,2015-08-17T16:30:01Z
4,GfxJYp9_nJA,Python Tutorial: Namedtuple - When and why sho...,"[Python, Python Tutorials, Python Tutorial, Py...",72179,1965,11,87,PT7M21S,UCCezIgC97PvUuR4_gbFUs5g,2015-07-07T17:30:00Z
...,...,...,...,...,...,...,...,...,...,...
225,qxzp4X6sfGo,"JavaScript Arrays: Properties, Methods, and Ma...","[JavaScript, JavaScript (Programming Language)...",4432,104,0,15,PT7M43S,UCCezIgC97PvUuR4_gbFUs5g,2015-03-18T12:11:03Z
226,4qMJN1pY_aw,"Channel Update: Code Snippets, New Rewards, an...","[Channel Update, Patreon, Update, Corey Schafe...",3485,150,0,27,PT7M6S,UCCezIgC97PvUuR4_gbFUs5g,2017-04-17T16:30:00Z
227,_63O1hgJTaQ,Lab Puppy playing fetch in a creek,"[Dog, Puppy, Chocolate Lab, Labrador, Labrador...",8787,200,0,3,PT43S,UCCezIgC97PvUuR4_gbFUs5g,2014-04-25T15:23:56Z
228,vRapY8xJwn8,Welcome to my Channel,"[Introduction, Intro, Channel Intro, Youtube, ...",392621,490,38,39,PT1M23S,UCCezIgC97PvUuR4_gbFUs5g,2015-11-16T06:17:57Z
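
The `duration` column holds ISO 8601 durations such as `PT8M43S`, which are not directly usable for the duration questions raised in the introduction. As a minimal sketch, assuming the values only use days, hours, minutes and seconds, they can be converted to seconds with the `re` module. The name `duration_to_seconds` is hypothetical.

```python
import re

# Matches values such as PT43S, PT8M43S, PT1H2M3S or P1DT2H.
# Assumption: no years, months, weeks or fractional seconds.
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(duration):
    """Convert an ISO 8601 duration string to a number of seconds."""
    match = _DURATION_RE.match(duration)
    if not match:
        raise ValueError(f"Unsupported duration format: {duration!r}")
    parts = {key: int(value) if value else 0
             for key, value in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])


print(duration_to_seconds("PT8M43S"))   # 523
print(duration_to_seconds("PT43S"))     # 43
```

On the DataFrame, this could be applied with `videosDB['duration'].apply(duration_to_seconds)`.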


It's time to save all the data we collected (`videosDB`, `playlistItemsDB` and `playlistDb`) to `csv` files, so that we can use it later.

In [95]:
videosDB.to_csv('videosDB.csv')
playlistItemsDB.to_csv('playlistItemsDB.csv')
playlistDb.to_csv('playlistDb.csv')