# 1. Sampling Version 1

We'll start by selecting videos for the study. As defined in the plan of approach, we will study two separate samples: one drawn from a fixed set of selected channels, and one that covers the general domain. This section concerns the former.

In [None]:
# Overall Document Prep:
from pyforest import * # This library quickly imports most of the relevant Data Science libraries
directory = '####'     # Set a working directory

## 1.1 Set Up the Service

We'll be using the YouTube Data API (v3) to select videos.

In [None]:
# Allocate credentials:
from googleapiclient.discovery import build

# Api Keys
api_key = "####"

# Session Build
youtube = build('youtube', 'v3', developerKey = api_key)
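
Hard-coding the key in the notebook makes it easy to leak when the file is shared. Below is a minimal sketch of one alternative, assuming the key is stored in an environment variable. The variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are hypothetical and not part of the original notebook.

```python
import os

def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in the given environment variable.

    The default variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The returned value could then be passed as `developerKey` to `build`, in place of the literal string.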

## 1.2 Generate the Sample Using the Service

Below we generate the sample by using the service: we pass the channel IDs and populate DataFrames with the results. Since the present study focuses on science communication channels, the dictionary below lists five such channels; these may be swapped for others at will.

In [4]:
# Variant 1 Sample Generation
channelids = {'Veritasium'  : 'UCHnyfMqiRRG1u-2MsSQLbXA',
              'VSauce'      : 'UC6nSFpj9HTCZ5t-N3Rm3-HA',
              'Kurzgesagt'  : 'UCsXVk37bltHxD1rDPwtNM8Q',
              'Mark Rober'  : 'UCY1kMZp36IQSyNx_9h4mpCg',
              'asapSCIENCE' : 'UCC552Sd-3nyi_tk2BudLUzA'}

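n     = 600                          # Target number of search results per channel
pages = range(1, (int(n/50+1)))      # One request per page of 50 results
order = 'rating'                     # Note: not passed to the search requests below

Raw_sample_V1 = pd.DataFrame(columns = ["Video.ID", "Title", "Channel_Name"])

# Iterate through channels; each item is a (name, channel ID) pair
for channelid in channelids.items():
    
    print(channelid[0])
    
    # Request up to n/50 pages to reach n; a search returns at most 50 results per page.
    for i in pages: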

        if i == 1:
            
            # Search Request
            request = youtube.search().list(
                part       = "snippet",
                type       = "video",
                maxResults = 50,              # The API returns at most 50 results per page
                channelId  = channelid[1],
            )

            # Save response
            response = request.execute()

            # Unpack response
            rows = []
            
            for item in response['items']:

                rows.append([item['id']['videoId'],
                             item['snippet']['title'],
                             item['snippet']['channelTitle']])

            video_sample = pd.DataFrame(rows, columns = ["Video.ID", "Title", "Channel_Name"])
            print(f'{len(video_sample)} out of {n}')
        
        else:
            try:   
                # Search Request
                request = youtube.search().list(
                    part       = "snippet",
                    type       = "video",
                    maxResults = 50,              # The API returns at most 50 results per page
                    channelId  = channelid[1],
                    pageToken  = response['nextPageToken']    
                ) 

                # Save response
                response = request.execute()

                # Unpack response
                rows = []

                for item in response['items']:

                    rows.append([item['id']['videoId'],
                                item['snippet']['title'],
                                item['snippet']['channelTitle']])

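                video_sample_temp = pd.DataFrame(rows, columns = ["Video.ID", "Title", "Channel_Name"])
                # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
                video_sample = pd.concat([video_sample, video_sample_temp], ignore_index=True)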
                print(f'{len(video_sample)} out of {n}')
            
            except KeyError:
                # The last response has no 'nextPageToken': there are no further pages
                print("Results exhausted")
                break
    
    #Cleaning:
    to_delete = ['#short', 
                 ' prank']
    video_sample = video_sample[~video_sample['Title'].str.contains('|'.join(to_delete))] 
    
    #Sampling:
    # Note: replace=True draws with replacement, so a video can be selected more than once.
    if len(video_sample) > 200:
        sample = video_sample.sample(n=200, 
                                     random_state=123,
                                     replace = True)
    else:
        sample = video_sample
    
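    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    Raw_sample_V1 = pd.concat([Raw_sample_V1, sample], ignore_index=True)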
    print(f'The {channelid[0]} sample has been saved! \n')

#Output:
Raw_sample_V1.to_csv(f'{directory}Raw_Sample_V1.csv', 
                     sep=';', 
                     index=False, 
                     encoding='utf-8')

Veritasium
50 out of 600
100 out of 600
150 out of 600
200 out of 600
250 out of 600
300 out of 600
306 out of 600
Results exhausted
The Veritasium sample has been saved! 

VSauce
50 out of 600
100 out of 600
150 out of 600
200 out of 600
250 out of 600
300 out of 600
350 out of 600
366 out of 600
Results exhausted
The VSauce sample has been saved! 

Kurzgesagt
50 out of 600
100 out of 600
141 out of 600
Results exhausted
The Kurzgesagt sample has been saved! 

Mark Rober
50 out of 600
98 out of 600
Results exhausted
The Mark Rober sample has been saved! 

asapSCIENCE
50 out of 600
100 out of 600
150 out of 600
200 out of 600
250 out of 600
300 out of 600
350 out of 600
359 out of 600
Results exhausted
The asapSCIENCE sample has been saved! 
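
The paging logic in the cell above (a first request without a token, then follow-up requests that pass `nextPageToken` until it is absent) can be expressed as a single loop. The following is a minimal sketch, not the notebook's own code: `collect_rows`, `make_stub`, and `fetch_page` are hypothetical names, and `fetch_page` stands in for a function that executes one search request and returns the response dictionary. A stub response source is used so the sketch runs without network access or an API key.

```python
def collect_rows(fetch_page, max_pages):
    """Follow nextPageToken links and return [video_id, title, channel] rows.

    fetch_page(token) is a hypothetical callable: it receives a page token
    (None for the first page) and returns a dict shaped like a search response.
    """
    rows, token = [], None
    for _ in range(max_pages):
        response = fetch_page(token)
        for item in response.get("items", []):
            rows.append([item["id"]["videoId"],
                         item["snippet"]["title"],
                         item["snippet"]["channelTitle"]])
        token = response.get("nextPageToken")
        if token is None:  # no further pages: results exhausted
            break
    return rows


def make_stub(pages):
    """Build a fake fetch_page over a list of pages (each a list of titles)."""
    def fetch_page(token):
        index = 0 if token is None else int(token)
        items = [{"id": {"videoId": f"vid{index}_{k}"},
                  "snippet": {"title": title, "channelTitle": "Demo"}}
                 for k, title in enumerate(pages[index])]
        response = {"items": items}
        if index + 1 < len(pages):
            response["nextPageToken"] = str(index + 1)
        return response
    return fetch_page


# Two stub pages: the first holds two videos, the second holds one.
demo_rows = collect_rows(make_stub([["a", "b"], ["c"]]), max_pages=12)
print(len(demo_rows))  # prints 3
```

With the real service, `fetch_page` could wrap the `youtube.search().list(...).execute()` call used above, adding `pageToken` only when a token is available. Folding the first and later requests into one loop removes the duplicated unpacking code and the need to catch `KeyError` to detect the last page.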

