# 3. Retrieval of Video Statistics

Beyond selecting videos, we also want to collect as much information about them as possible. To do so, we use a different method of the YouTube Data API: `videos.list`. It lets us retrieve a range of statistics and metadata for every video in either of our samples.
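
For orientation, each entry in the `items` list of a `videos.list` response is a nested dictionary with one sub-dictionary per requested part. The sketch below illustrates that shape and one way to read from it without raising on missing keys. The `sample_item` values are made-up placeholders rather than real API output, and `safe_get` is a hypothetical helper that is not part of the Google client library.

```python
# Illustrative only: the shape of one entry in response["items"].
# All values below are made-up placeholders, not real API output.
sample_item = {
    "id": "VIDEO_ID_PLACEHOLDER",
    "snippet": {"publishedAt": "2020-01-01T00:00:00Z"},
    "contentDetails": {"duration": "PT4M13S", "definition": "hd"},
    "statistics": {"viewCount": "1000", "likeCount": "10"},
}

def safe_get(item, *keys, default=None):
    """Walk nested dictionaries, returning `default` if any key is missing."""
    current = item
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Counts arrive as strings; absent statistics simply have no key.
print(safe_get(sample_item, "statistics", "viewCount"))
print(safe_get(sample_item, "statistics", "commentCount"))
```

Returning a default instead of raising keeps one missing statistic from interrupting a long collection run.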

In [None]:
# Overall Document Prep:
from pyforest import * # This library quickly imports most of the relevant Data Science libraries
directory = '####'     # Set a working directory

## 3.1 Set Up the Service

In [5]:
# Allocate credentials:
from googleapiclient.discovery import build

# API key
key = "#####"

# Session Build
youtube = build('youtube', 'v3', developerKey = key)

## 3.2 Define the RetrieveStats() Function

In [14]:
def RetrieveStats(df):

    # Map each data point to collect to the response part that holds it:
    fields = {"publishedAt"   : "snippet",
              "duration"      : "contentDetails",
              "definition"    : "contentDetails",
              "viewCount"     : "statistics",
              "likeCount"     : "statistics",
              "dislikeCount"  : "statistics",  # since December 2021 only returned to the video owner
              "favoriteCount" : "statistics",
              "commentCount"  : "statistics"}

    # Initialize dictionary for the data points to collect:
    stats = {field: [] for field in fields}

    # Execute one request per Video ID
    for current, vid in enumerate(df['Video.ID'], start = 1):

        # Formalize Request:
        request = youtube.videos().list(
                part = "snippet,statistics,contentDetails",
                id   = vid
            )

        # Save Response
        response = request.execute()

        # Videos that are no longer available come back with an empty 'items' list
        items = response.get('items', [])
        item  = items[0] if items else {}

        # Store the data in the dictionary (NaN where a value is missing)
        for field, part in fields.items():
            stats[field].append(item.get(part, {}).get(field, np.nan))

        # Progress report (counts the rows of df itself, not of a global dataframe):
        if current % 50 == 0:
            print(f'We are {(current/len(df)*100):.1f}% of the way there!')

    # Store data as a dataframe (aligned on df's index) and concatenate it to the original:
    stats = pd.DataFrame(stats, index = df.index)
    df = pd.concat([df, stats], axis = 1)

    # Response summary (number of videos with at least one missing statistic):
    print(f"\nWe couldn't find at least 1 statistic for {stats.isna().any(axis = 1).sum()} videos. \nSee a data loss overview below: \n")
    print(df.isna().sum())

    return df
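
Issuing one request per video works, but the `id` parameter of `videos.list` also accepts a comma-separated list of IDs (to my knowledge up to 50 per call), which reduces the number of requests considerably. The following is a minimal sketch of the batching step only; `chunked` and `index_items_by_id` are hypothetical helper names, and the commented usage assumes the `youtube` service from section 3.1.

```python
def chunked(ids, size=50):
    """Yield consecutive slices of `ids` with at most `size` elements each."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def index_items_by_id(items):
    """Map each returned item to its video ID so missing videos are easy to spot."""
    return {item["id"]: item for item in items}

# Hypothetical usage with the `youtube` service built in section 3.1:
# for batch in chunked(df['Video.ID']):
#     response = youtube.videos().list(
#         part = "snippet,statistics,contentDetails",
#         id   = ",".join(batch)
#     ).execute()
#     found = index_items_by_id(response.get("items", []))
```

A batched response only contains the videos that were found, so results should be matched back to the dataframe by ID rather than by position.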

## 3.3 Retrieve and Save Video Statistics

In [15]:
# Read data
raw_sample_V1 = pd.read_csv(f"{directory}Sample_V1.csv", sep = ';')
raw_sample_V1.reset_index(inplace = True) # done to improve the progress report.

raw_sample_V1 = RetrieveStats(raw_sample_V1)

We are 6.0% of the way there!
We are 11.9% of the way there!
We are 17.9% of the way there!
We are 29.8% of the way there!
We are 35.8% of the way there!
We are 41.8% of the way there!
We are 53.7% of the way there!
We are 59.7% of the way there!
We are 65.6% of the way there!
We are 71.6% of the way there!
We are 77.6% of the way there!
We are 83.5% of the way there!
We are 89.5% of the way there!
We are 95.5% of the way there!

We couldn't find at least 1 statistic for 12 videos. 
See a data loss overview below: 

index             0
Unnamed: 0        0
Video.ID          0
Title             0
Channel_Name      0
publishedAt       0
duration          0
definition        0
viewCount        12
likeCount        12
dislikeCount     12
favoriteCount     0
commentCount     12
publishedAt       0
duration          0
definition        0
viewCount        12
likeCount        12
dislikeCount     12
favoriteCount     0
commentCount     12
dtype: int64


In [47]:
# Save the data to csv (index = False avoids an extra 'Unnamed: 0' column on reload):
raw_sample_V1.to_csv(f"{directory}Sample_V1.csv", sep = ';', index = False)
#raw_sample_V2.to_csv(f"{directory}Sample_V2.csv", sep = ';', index = False)
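
The `duration` values collected in this section are ISO 8601 duration strings such as `PT4M13S`, so they usually need converting before analysis. Below is a minimal sketch of one possible conversion to seconds, assuming only day, hour, minute and second components occur; `duration_to_seconds` is a hypothetical helper, not part of any library used in this notebook.

```python
import re

# Matches durations such as 'PT4M13S', 'PT1H2M3S' or 'P1DT1S'.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)

def duration_to_seconds(value):
    """Convert an ISO 8601 duration string to seconds; return None if it cannot be parsed."""
    if not isinstance(value, str):
        return None  # e.g. NaN for videos without a duration
    match = _DURATION.match(value)
    if match is None:
        return None
    parts = {name: int(number or 0) for name, number in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])

print(duration_to_seconds("PT4M13S"))
```

The helper could be applied to the column with `Series.map`, and the count columns, which the API returns as strings, could be converted with `pd.to_numeric`.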