# How to find a YouTube channel Id?

One way to get a channel ID is to open the source code of the YouTube channel page and search for the ID ([see this post](https://stackoverflow.com/questions/14366648/how-can-i-get-a-channel-id-from-youtube)).   
Now imagine that you want to retrieve the IDs for a whole list of YouTube channels, anywhere from a handful to several hundred. Doing this by hand does not scale, so we need a better solution.   
An alternative is to use the YouTube API. Most API requests need the channel ID in the first place, but we can work around this by sending a [`search`](https://developers.google.com/youtube/v3/guides/implementation/search) request with the name of the channel. The drawback of this approach is that it can return more than one result, because channel names are not necessarily unique. In addition, each search request costs 100 units, which can be an issue given that the quota limit is 10,000 units per day ([quota usage](https://developers.google.com/youtube/v3/getting-started?hl=en#quota)).
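To make the quota constraint concrete, here is a rough back-of-the-envelope calculation. It assumes exactly one `search` request per channel and uses the two figures quoted above; the constant and function names are illustrative, and 71 is the number of rows in the dataset loaded further down.

```python
# Figures quoted in the text above; check the quota documentation for current values.
SEARCH_COST_UNITS = 100
DAILY_QUOTA_UNITS = 10_000


def quota_needed(n_channels: int, cost_per_search: int = SEARCH_COST_UNITS) -> int:
    """Units consumed when issuing one search request per channel."""
    return n_channels * cost_per_search


# How many searches fit in one day, and what the 71-row dataset would cost.
max_searches_per_day = DAILY_QUOTA_UNITS // SEARCH_COST_UNITS
print(max_searches_per_day)  # 100
print(quota_needed(71))      # 7100
```

Under these assumptions a single pass over the dataset fits within one day's quota, but a second pass on the same day would not.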
 


In [1]:
from googleapiclient.discovery import build
import pandas as pd
import json
from dotenv import load_dotenv
import os

## Load the data

First, we load a dataset of the YouTube channels whose IDs we are looking for. The dataset was scraped from the [Top Programmer Guru](https://noonies.tech/award/top-programming-guru) web page using the [`selenium`](https://github.com/SeleniumHQ/) framework. The code is available [here](https://github.com/zakicode19/YouTubeApi/blob/main/web_scraping.py).

In [2]:
channelList = pd.read_csv('data/top-programming-guru.csv', index_col=0)

In [3]:
channelList.head()

|   | channelName | url |
|---|---|---|
| 0 | Programming with Mosh | https://www.youtube.com/c/programmingwithmosh/... |
| 1 | Traversy Media | https://www.youtube.com/user/TechGuyWeb |
| 2 | Corey Schafer | https://www.youtube.com/user/schafer5 |
| 3 | Tech With Tim | https://m.youtube.com/channel/UC4JX40jDee_tINb... |
| 4 | Krish Naik | https://www.youtube.com/user/krishnaik06/playl... |


In [4]:
channelList.shape

(71, 2)

In [6]:
len(channelList)

71

### Clean the data

We are going to check whether there are any duplicates in the `channelName` column, and then in the `url` column.

In [7]:
channelList['channelName'].str.lower().value_counts() #[channelList.channelName.value_counts()>1]

the coding train    2
michael reeves      2
tim corey           1
fireship            1
techlead            1
                   ..
programming hero    1
codingwithmitch     1
dani                1
wesbos              1
gaurav sen          1
Name: channelName, Length: 69, dtype: int64

We find two duplicated names, which we are going to remove.

In [8]:
channelList.channelName[(channelList['channelName'].str.lower() == 'michael reeves') |
                        (channelList['channelName'].str.lower() == 'the coding train')]

14      Michael Reeves
16    The Coding Train
48    the coding train
68      Michael Reeves
Name: channelName, dtype: object

We could use the [`drop_duplicates`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html) method, but it compares strings case-sensitively, so it would only catch one of the two channels (`The Coding Train` and `the coding train` differ in case). Instead, we are going to use the [`drop`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html) method.

In [9]:
channelList.drop(labels=[48, 68], inplace=True)

In [10]:
channelList.shape

(69, 2)
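As an alternative to hard-coding the row labels, the case-insensitive duplicates can be located programmatically. The sketch below is not what the notebook ran; it uses a small stand-in frame (`sample`, a hypothetical name) built from the four rows shown above and relies on `Series.duplicated` applied to the lower-cased names.

```python
import pandas as pd

# Small stand-in for channelList, using the four rows shown above.
sample = pd.DataFrame(
    {"channelName": ["Michael Reeves", "The Coding Train",
                     "the coding train", "Michael Reeves"]},
    index=[14, 16, 48, 68],
)

# Flag every row whose lower-cased name has already appeared earlier.
is_repeat = sample["channelName"].str.lower().duplicated(keep="first")

# Keep only the first occurrence of each name.
deduped = sample[~is_repeat]
print(deduped.index.tolist())  # [14, 16]
```

The rows that get dropped here are 48 and 68, the same labels passed to `drop` above.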

Let's check the `url` column.

In [11]:
channelList.url.value_counts().head()

https://www.youtube.com/c/Telusko                               3
https://www.youtube.com/c/HusseinNasser-software-engineering    2
https://m.youtube.com/channel/UCsBjURrPoezykLs9EqgamOA          1
https://www.youtube.com/c/Elfocrash                             1
https://www.youtube.com/channel/UCV0qA-eDDICsRR9rPcnG7tw        1
Name: url, dtype: int64
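Some URLs in the dataset already have the form `/channel/<id>`, so for those rows the ID can be read straight from the URL without spending any quota. The helper below is a minimal sketch under that assumption; `channel_id_from_url` is a hypothetical name, and URLs of the `/c/...` or `/user/...` form still need the search request.

```python
from typing import Optional
from urllib.parse import urlparse


def channel_id_from_url(url: str) -> Optional[str]:
    """Return the ID from a '/channel/<id>' URL, or None for other URL forms."""
    # Split the path into its non-empty segments, e.g. ['channel', 'UC...'].
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "channel":
        return parts[1]
    return None


print(channel_id_from_url("https://www.youtube.com/channel/UCV0qA-eDDICsRR9rPcnG7tw"))
print(channel_id_from_url("https://www.youtube.com/c/Telusko"))
```

The first call prints the ID segment of the URL and the second prints `None`.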

In [12]:
duplicate_url = channelList.url.value_counts()[channelList.url.value_counts()>1].index
duplicate_url

Index(['https://www.youtube.com/c/Telusko', 'https://www.youtube.com/c/HusseinNasser-software-engineering'], dtype='object')

In [14]:
channelList[channelList.url == duplicate_url[0]]

|   | channelName | url |
|---|---|---|
| 10 | naveen Reddy | https://www.youtube.com/c/Telusko |
| 27 | Telusko | https://www.youtube.com/c/Telusko |
| 44 | Keith Galli | https://www.youtube.com/c/Telusko |


We notice that three different channels have the same URL. After a quick search on YouTube, we found that `naveen Reddy` and `Telusko` reference the same channel, the one run by `Navin Reddy`. For the [`Keith Galli`](https://www.youtube.com/c/KGMIT/featured) channel, we looked up the correct URL on YouTube.

In [15]:
channelList.loc[44, 'url'] = 'https://www.youtube.com/c/KGMIT/featured'

In [16]:
channelList[channelList.url == duplicate_url[1]]

|   | channelName | url |
|---|---|---|
| 26 | Ben Awad | https://www.youtube.com/c/HusseinNasser-softwa... |
| 29 | Hussein Nasser | https://www.youtube.com/c/HusseinNasser-softwa... |


We will correct the URL for the `Ben Awad` channel.

In [17]:
channelList.loc[26, 'url'] = 'https://www.youtube.com/c/BenAwad97/featured'

After the correction of the wrong url, we will delete any remaining duplicates.

In [18]:
channelList.drop_duplicates(subset='url',ignore_index=True, inplace=True)

In [19]:
channelList.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 68 entries, 0 to 67
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   channelName  68 non-null     object
 1   url          68 non-null     object
dtypes: object(2)
memory usage: 1.2+ KB


### Save the data
After removing the duplicate rows, we save the clean data in a new `csv` file.  

In [20]:
channelList.to_csv('data/channelListDB.csv')

In [40]:
# store the channel names in an array
channels = channelList.channelName.values

## Load the API key

In [47]:
load_dotenv()
API_KEY = os.getenv('api_key2')
youtube = build('youtube', 'v3', developerKey=API_KEY)
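`os.getenv` returns `None` when the variable is missing, in which case the problem would likely only surface later, when a request is executed. A small guard can fail earlier with a clearer message. This is a sketch, not part of the original notebook, and `require_env` is a hypothetical helper name.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value
```

With this helper, the key could be loaded as `API_KEY = require_env('api_key2')` right after `load_dotenv()`.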

In [48]:
request = youtube.search().list(
        part="snippet",
        maxResults=1, # limit the search to one result
        q="Programming with Mosh",
        type="channel", # restrict the search to YouTube channels
    )
response = request.execute()
print(response)

{'kind': 'youtube#searchListResponse', 'etag': 'Ltkdgt53Y_gMBAUDiqNjeSmcn8s', 'nextPageToken': 'CAEQAA', 'regionCode': 'FR', 'pageInfo': {'totalResults': 480, 'resultsPerPage': 1}, 'items': [{'kind': 'youtube#searchResult', 'etag': '_nOCjlI1BhrARdHe0BcGaGkAw-I', 'id': {'kind': 'youtube#channel', 'channelId': 'UCWv7vMbMWH4-V0ZXdmDpPBA'}, 'snippet': {'publishedAt': '2014-10-07T00:40:53Z', 'channelId': 'UCWv7vMbMWH4-V0ZXdmDpPBA', 'title': 'Programming with Mosh', 'description': 'I train professional software engineers that companies love to hire. My courses: http://codewithmosh.com My blog: http://programmingwithmosh.com Connect on ...', 'thumbnails': {'default': {'url': 'https://yt3.ggpht.com/ytc/AAUvwnj82Lirw0dg6V5pJWAcWdG22OESyldUcDwAFEqQWg=s88-c-k-c0xffffffff-no-rj-mo'}, 'medium': {'url': 'https://yt3.ggpht.com/ytc/AAUvwnj82Lirw0dg6V5pJWAcWdG22OESyldUcDwAFEqQWg=s240-c-k-c0xffffffff-no-rj-mo'}, 'high': {'url': 'https://yt3.ggpht.com/ytc/AAUvwnj82Lirw0dg6V5pJWAcWdG22OESyldUcDwAFEqQWg=s8

In [49]:
json.dumps(response['items'][0]['snippet'],indent=4)

'{\n    "publishedAt": "2014-10-07T00:40:53Z",\n    "channelId": "UCWv7vMbMWH4-V0ZXdmDpPBA",\n    "title": "Programming with Mosh",\n    "description": "I train professional software engineers that companies love to hire. My courses: http://codewithmosh.com My blog: http://programmingwithmosh.com Connect on ...",\n    "thumbnails": {\n        "default": {\n            "url": "https://yt3.ggpht.com/ytc/AAUvwnj82Lirw0dg6V5pJWAcWdG22OESyldUcDwAFEqQWg=s88-c-k-c0xffffffff-no-rj-mo"\n        },\n        "medium": {\n            "url": "https://yt3.ggpht.com/ytc/AAUvwnj82Lirw0dg6V5pJWAcWdG22OESyldUcDwAFEqQWg=s240-c-k-c0xffffffff-no-rj-mo"\n        },\n        "high": {\n            "url": "https://yt3.ggpht.com/ytc/AAUvwnj82Lirw0dg6V5pJWAcWdG22OESyldUcDwAFEqQWg=s800-c-k-c0xffffffff-no-rj-mo"\n        }\n    },\n    "channelTitle": "Programming with Mosh",\n    "liveBroadcastContent": "upcoming",\n    "publishTime": "2014-10-07T00:40:53Z"\n}'

In [50]:
json.dumps(response['items'][0]['snippet']['channelId'], indent=4)

'"UCWv7vMbMWH4-V0ZXdmDpPBA"'

### Display the result of the first request

In [51]:
for item in response['items']:
    print(item['snippet']['title'])
    print(item['snippet']['channelId'])
    print(item['id']['kind'])
    print('*' * 10)

Programming with Mosh
UCWv7vMbMWH4-V0ZXdmDpPBA
youtube#channel
**********


### Initialize dictionary to store the results

In [52]:
data = {'channelName': [], 'title': [], 'id':[], 'kind':[]}

### Add the first result to dictionary 

In [53]:
id = response['items'][0]['snippet']['channelId']
title = response['items'][0]['snippet']['title']
kind = response['items'][0]['id']['kind']

In [54]:
channel = channelList.loc[0, 'channelName']
channel

'Programming with Mosh'

In [55]:
data['channelName'].append(channel)
data['title'].append(title)
data['id'].append(id)
data['kind'].append(kind)

In [56]:
data

{'channelName': ['Programming with Mosh'],
 'title': ['Programming with Mosh'],
 'id': ['UCWv7vMbMWH4-V0ZXdmDpPBA'],
 'kind': ['youtube#channel']}

### Retrieve the ID for each channel in the dataset

In [57]:
for channel in channels[1:]:

  request = youtube.search().list(
        part="snippet",
        maxResults=1,
        q=channel,
        type="channel"
    )
  
  response = request.execute()

  id = response['items'][0]['snippet']['channelId']
  title = response['items'][0]['snippet']['title']
  kind = response['items'][0]['id']['kind']

  data['channelName'].append(channel)
  data['title'].append(title)
  data['id'].append(id)
  data['kind'].append(kind)
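The loop above indexes `response['items'][0]` directly, which raises an `IndexError` if a search comes back with no items. The sketch below shows one way to guard against that; `first_channel_hit` is a hypothetical helper, and `sample_response` is a trimmed copy of the response printed earlier, used here so the example runs without calling the API.

```python
from typing import Optional, Tuple


def first_channel_hit(response: dict) -> Optional[Tuple[str, str, str]]:
    """Return (title, channel_id, kind) for the first search hit, or None if there is none."""
    items = response.get("items", [])
    if not items:
        return None
    top = items[0]
    return top["snippet"]["title"], top["snippet"]["channelId"], top["id"]["kind"]


# Trimmed copy of the search response shown earlier in the notebook.
sample_response = {
    "items": [{
        "id": {"kind": "youtube#channel", "channelId": "UCWv7vMbMWH4-V0ZXdmDpPBA"},
        "snippet": {"channelId": "UCWv7vMbMWH4-V0ZXdmDpPBA",
                    "title": "Programming with Mosh"},
    }]
}

print(first_channel_hit(sample_response))
print(first_channel_hit({"items": []}))
```

Inside the loop, a `None` result could be recorded as a missing value for that channel and checked by hand afterwards.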

In [None]:
data

In [58]:
df = pd.DataFrame.from_dict(data)

In [59]:
df.head()

|   | channelName | title | id | kind |
|---|---|---|---|---|
| 0 | Programming with Mosh | Programming with Mosh | UCWv7vMbMWH4-V0ZXdmDpPBA | youtube#channel |
| 1 | Traversy Media | Traversy Media | UC29ju8bIPH5as8OGnQzwJyA | youtube#channel |
| 2 | Corey Schafer | Corey Schafer | UCCezIgC97PvUuR4_gbFUs5g | youtube#channel |
| 3 | Tech With Tim | Tech With Tim | UC4JX40jDee_tINbkjycV4Sg | youtube#channel |
| 4 | Krish Naik | Krish Naik | UCNU_lfiiWBdtULKOw6X0Dig | youtube#channel |


In [60]:
df['kind'].unique()

array(['youtube#channel'], dtype=object)

In [61]:
df.shape

(68, 4)

Let's check if there is a difference between the channel name and the title.

In [62]:
(df.channelName != df.title).sum()

22

Let's display the rows where the `channelName` and the `title` are different.

In [63]:
df[df.channelName != df.title]

|   | channelName | title | id | kind |
|---|---|---|---|---|
| 9 | programming Hero | Programming Hero | UCStj-ORBZ7TGK1FwtGAUgbQ | youtube#channel |
| 10 | naveen Reddy | Telusko | UC59K-uG2A5ogwIrHw4bmlEg | youtube#channel |
| 11 | Code Basics | codebasics | UCh9nVJoWXmFb7sLApWGcLPQ | youtube#channel |
| 12 | Maximilian Schwarzmüller | Academind | UCSJbGtTlrDami-tDGPUV9-w | youtube#channel |
| 19 | Code with Harry | CodeWithHarry | UCeVMnSShP_Iviwkknt83cww | youtube#channel |
| 22 | Tim Corey | IAmTimCorey | UC-ptWR16ITQyYOglXyQmpzw | youtube#channel |
| 24 | WesBos | Wes Bos | UCoebwHSTvwalADTJhps0emA | youtube#channel |
| 25 | Gajesh S. Naik | Gajesh S Naik | UC7PWnwwqMSqAXQkKXqxRkMw | youtube#channel |
| 33 | Scott Hansellman | Scott Hanselman | UCL-fHOdarou-CR2XUmK48Og | youtube#channel |
| 37 | Joma | JOMA | UCRqKPt2ZRU21RSl3DF81-0w | youtube#channel |


There are 22 rows where the value of the `channelName` column differs from the value of the `title` column. One reason is that the [Top Programmer Guru](https://noonies.tech/award/top-programming-guru) web page sometimes uses the name of the channel owner as `channelName` instead of the official `title`; another reason is case sensitivity.  
We perform this verification because, as mentioned at the beginning, the search method can return more than one result, so we need to make sure that we got the right channel ID.   
We check the difference between the `channelName` column and the `title` column again, this time ignoring case.

In [64]:
(df.channelName.str.lower() != df.title.str.lower()).sum()

13

In [65]:
df[df.channelName.str.lower() != df.title.str.lower()]

|   | channelName | title | id | kind |
|---|---|---|---|---|
| 10 | naveen Reddy | Telusko | UC59K-uG2A5ogwIrHw4bmlEg | youtube#channel |
| 11 | Code Basics | codebasics | UCh9nVJoWXmFb7sLApWGcLPQ | youtube#channel |
| 12 | Maximilian Schwarzmüller | Academind | UCSJbGtTlrDami-tDGPUV9-w | youtube#channel |
| 19 | Code with Harry | CodeWithHarry | UCeVMnSShP_Iviwkknt83cww | youtube#channel |
| 22 | Tim Corey | IAmTimCorey | UC-ptWR16ITQyYOglXyQmpzw | youtube#channel |
| 24 | WesBos | Wes Bos | UCoebwHSTvwalADTJhps0emA | youtube#channel |
| 25 | Gajesh S. Naik | Gajesh S Naik | UC7PWnwwqMSqAXQkKXqxRkMw | youtube#channel |
| 33 | Scott Hansellman | Scott Hanselman | UCL-fHOdarou-CR2XUmK48Og | youtube#channel |
| 41 | Programming With ERik | Program With Erik | UCshZ3rdoCLjDYuTR_RBubzw | youtube#channel |
| 42 | Forest Knight | ForrestKnight | UC2WHjPDvbE6O328n17ZGcfg | youtube#channel |
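Several of the remaining mismatches differ only in spacing or punctuation (for example `Code Basics` vs `codebasics`). One way to shrink the list further before checking by hand is to compare a normalized form of both strings. The sketch below is an optional extra step, not something the notebook ran; `normalize` is a hypothetical helper and the pairs are copied from the table above.

```python
import re


def normalize(name: str) -> str:
    """Lower-case a name and drop everything except ASCII letters and digits.

    Note: non-ASCII letters such as 'ü' are dropped as well.
    """
    return re.sub(r"[^a-z0-9]", "", name.lower())


# (channelName, title) pairs taken from the table above.
pairs = [
    ("Code Basics", "codebasics"),
    ("WesBos", "Wes Bos"),
    ("Gajesh S. Naik", "Gajesh S Naik"),
    ("Tim Corey", "IAmTimCorey"),
]

# Keep only the pairs that still differ after normalization.
still_different = [(name, title) for name, title in pairs
                   if normalize(name) != normalize(title)]
print(still_different)  # [('Tim Corey', 'IAmTimCorey')]
```

Pairs that still differ, such as an owner's name versus a channel title, need a manual check on YouTube either way.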


Now we only have `13` rows left to verify. A quick check on YouTube shows that there is only one error, in row 52. Let's correct the error and save the data.

In [66]:
channel_52 = {'title': 'IIMB Inventors Inventing Machine Business', 'id':'UCWym_j-OGCzIfz8K4vXtU6g'}

In [67]:
df.loc[52, 'title'] = 'IIMB Inventors Inventing Machine Business'
df.loc[52, 'id'] = 'UCWym_j-OGCzIfz8K4vXtU6g'

In [68]:
df.loc[52]

channelName                                  Prajwal NH 
title          IIMB Inventors Inventing Machine Business
id                              UCWym_j-OGCzIfz8K4vXtU6g
kind                                     youtube#channel
Name: 52, dtype: object

We are going to store the data in a `csv` file.

In [69]:
df.to_csv('data/channelsID.csv')