## Crawling Data (YouTube)

### Crawling Data from YouTube

The **goal** of this program is to crawl (collect) the comments on a YouTube video using the **YouTube Data API v3**. Before running it, make sure you have enabled the YouTube Data API service and generated an **API key**.

If you do not have an **API key** yet, follow these brief steps:
1. Log in to the Google Developer Console (https://console.developers.google.com/) with your Google account.
2. Create a new project and fill in the requested fields.
3. On the project page, enable API services and search for **YouTube Data API v3**.
4. From the dashboard, create a credential so the API can be used: click **Create Credentials** (**Buat Kredensial** in the Indonesian interface) and complete the form.
5. You can then view the API key on the **Credentials** tab.
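
Once the key exists, it is generally safer to read it from the environment than to paste it into a notebook cell that may be shared. The following is a minimal sketch, assuming the key has been stored in an environment variable; the variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are hypothetical and not part of the original program.

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in an environment variable (hypothetical name)."""
    # Strip whitespace so a stray space or newline from copy-paste is ignored
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key first.")
    return key
```

With this helper, the later assignment could read `api_key = load_api_key()` instead of a literal string.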



In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
%cd /content/drive/My Drive/prosaindata/

/content/drive/My Drive/prosaindata


In [None]:
# Install libraries
!pip install sastrawi
!pip install swifter
!pip install gensim

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
# Import libraries
import pandas as pd
from googleapiclient.discovery import build
import numpy as np
from string import punctuation
import re
import nltk

In [None]:
# Define a function for crawling the data
def video_comments(video_id):
	# list that collects every top-level comment and every reply
	replies = []

	# creating youtube resource object
	youtube = build('youtube', 'v3', developerKey=api_key)

	# request the first page of comment threads for the video
	video_response = youtube.commentThreads().list(part='snippet,replies', videoId=video_id).execute()

	# process one page per iteration until no pages remain
	while video_response:
		
		# extracting required info
		# from each result object
		for item in video_response['items']:
			
			# Extract the timestamp and author of the top-level comment
			published = item['snippet']['topLevelComment']['snippet']['publishedAt']
			user = item['snippet']['topLevelComment']['snippet']['authorDisplayName']

			# Extract the comment text and its like count
			comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
			likeCount = item['snippet']['topLevelComment']['snippet']['likeCount']

			replies.append([published, user, comment, likeCount])
			
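			# number of replies to this comment
			replycount = item['snippet']['totalReplyCount']

			# defensive check: the 'replies' key may be absent even when replycount > 0
			if replycount > 0 and 'replies' in item:
				# iterate over the replies included in this response
				for reply in item['replies']['comments']: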
					
					# Extract reply
					published = reply['snippet']['publishedAt']
					user = reply['snippet']['authorDisplayName']
					repl = reply['snippet']['textDisplay']
					likeCount = reply['snippet']['likeCount']
					
					# Store the reply in the list
					replies.append([published, user, repl, likeCount])

		# fetch the next page if there is one, otherwise stop
		if 'nextPageToken' in video_response:
			video_response = youtube.commentThreads().list(
					part = 'snippet,replies',
					pageToken = video_response['nextPageToken'], 
					videoId = video_id
				).execute()
		else:
			break
	#endwhile
	return replies
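
The pagination logic can be exercised without spending API quota by separating it from the network call. The sketch below is one way to do that, not the original implementation. It assumes a hypothetical `fetch_page(page_token)` callable that returns one `commentThreads().list(...).execute()` response as a dict. The names `thread_to_rows` and `iter_comment_rows` are illustrative only.

```python
def thread_to_rows(item):
    """Flatten one commentThreads item into [publishedAt, author, text, likeCount] rows."""
    top = item['snippet']['topLevelComment']['snippet']
    rows = [[top['publishedAt'], top['authorDisplayName'],
             top['textDisplay'], top['likeCount']]]
    # Defensive assumption: treat a missing 'replies' key as "no replies returned"
    for reply in item.get('replies', {}).get('comments', []):
        s = reply['snippet']
        rows.append([s['publishedAt'], s['authorDisplayName'],
                     s['textDisplay'], s['likeCount']])
    return rows


def iter_comment_rows(fetch_page):
    """Yield rows from every page, following nextPageToken until it is absent."""
    token = None  # None means "first page"
    while True:
        response = fetch_page(token)
        for item in response.get('items', []):
            yield from thread_to_rows(item)
        token = response.get('nextPageToken')
        if not token:
            break
```

With a real client, `fetch_page` could wrap `youtube.commentThreads().list(part='snippet,replies', videoId=video_id, pageToken=token).execute()`, leaving out `pageToken` on the first call. Keeping the flattening step in its own function also makes it easy to test against a hand-written response dict.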


In [None]:
# fill in your own API key
api_key = 'YOUR_API_KEY'  # placeholder; never publish a real key in a shared notebook

# Enter the video ID
# example video URL: https://www.youtube.com/watch?v=5tucmKjOGi8
video_id = "KtntKGlmuZw"  # fill in the video ID (the value after `v=` in the URL)

# Call function
comments = video_comments(video_id)

comments

[['2023-05-11T04:11:55Z',
  'mujiati cantik',
  'jaga wibawanya, pak GP. jgn cengar cengir.',
  0],
 ['2023-05-11T04:05:47Z',
  'Reza Rizal Ys',
  'AKU PILIH RAKYAT BIASA AJA YAITU ANIES BASWEDAN YG PRO RAKYAT BUKAN PETUGAS PARTAI 😏',
  0],
 ['2023-05-11T01:22:44Z', 'NINIEN S', 'hindari jauh2 penggemar video', 0],
 ['2023-05-11T01:19:30Z',
  'Nasrun SMD',
  'Bagi saya cuman ada calon ofsi yang bisa di pilih. Pak.Prabowo dan Pak Ganjar.',
  0],
 ['2023-05-11T00:00:41Z', 'Zaen Arif', 'Konten pkokkkkkk', 0],
 ['2023-05-10T23:32:22Z',
  'Ama agus Kudrotilah',
  'Wajah koruptor mnyesakan Dada,,,kalo mbunuh di bolehkan ,,,😅akan koruptor diduluin,,,😂',
  0],
 ['2023-05-10T23:28:28Z',
  'Ama agus Kudrotilah',
  'Kasihan lambang Islam dukung ci porno yg tiada akhlaq tiada adab,,otax cabul mau djadikan pmimpin,,,parah negeri ini akan jadi negri binatang,,,gada morall,,,gantung koruptr',
  0],
 ['2023-05-10T22:25:41Z', 'Bbg Susilo', 'Setuju banget', 0],
 ['2023-05-10T20:05:26Z',
  'Eko Waluyo',
 

In [None]:
# Convert the results to a DataFrame
df = pd.DataFrame(comments, columns=['publishedAt', 'authorDisplayName', 'text', 'likeCount'])
df

| | publishedAt | authorDisplayName | text | likeCount |
|---|---|---|---|---|
| 0 | 2023-05-11T04:11:55Z | mujiati cantik | jaga wibawanya, pak GP. jgn cengar cengir. | 0 |
| 1 | 2023-05-11T04:05:47Z | Reza Rizal Ys | AKU PILIH RAKYAT BIASA AJA YAITU ANIES BASWEDA... | 0 |
| 2 | 2023-05-11T01:22:44Z | NINIEN S | hindari jauh2 penggemar video | 0 |
| 3 | 2023-05-11T01:19:30Z | Nasrun SMD | Bagi saya cuman ada calon ofsi yang bisa di pi... | 0 |
| 4 | 2023-05-11T00:00:41Z | Zaen Arif | Konten pkokkkkkk | 0 |
| ... | ... | ... | ... | ... |
| 1701 | 2023-05-10T14:42:51Z | Seraphine Tan | Bacot | 0 |
| 1702 | 2023-05-10T13:37:23Z | Andre Cebong | @Rajawali&#39;88 tepat skali | 0 |
| 1703 | 2023-05-10T13:28:35Z | supardi pardi | @Rajawali&#39;88 dan apakan anda lebih mudeng,... | 0 |
| 1704 | 2023-05-10T13:21:39Z | Ari Torong | Begono bro..yen kowe arep ng semarang pilih ml... | 0 |
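
The `&#39;` sequences in the rows above are HTML entities, because `textDisplay` is returned as HTML-formatted text by default. The following is a minimal post-processing sketch. It assumes the row layout `[publishedAt, authorDisplayName, text, likeCount]` used above and timestamps without fractional seconds, as in the sample output. `clean_row` is a hypothetical helper, not part of the original notebook.

```python
import html
from datetime import datetime, timezone


def clean_row(row):
    """Decode HTML entities in the text and parse the UTC timestamp."""
    published, user, text, like_count = row
    # Assumes the 'YYYY-MM-DDTHH:MM:SSZ' form seen in the sample output
    parsed = datetime.strptime(published, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return [parsed, user, html.unescape(text), like_count]
```

The DataFrame could then be built from `[clean_row(r) for r in comments]`. An alternative is to request plain text from the API with the `textFormat='plainText'` parameter of `commentThreads().list`, which avoids the decoding step.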


In [None]:
%cd /content/drive/My Drive/prosaindata/tugas/Dataset

In [None]:
# Save the crawl results to CSV
df.to_csv('youtube_comments.csv', index=False)
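
Because the comments contain commas, quotes and emoji, it can be worth checking that the saved file reads back intact. The sketch below shows a pandas-free round trip using only the standard `csv` module. `save_rows`, `load_rows` and `COLUMNS` are hypothetical names, and this is an illustration rather than a replacement for `df.to_csv`.

```python
import csv

COLUMNS = ['publishedAt', 'authorDisplayName', 'text', 'likeCount']


def save_rows(rows, path):
    """Write comment rows to a UTF-8 CSV file with a header line."""
    # newline='' lets the csv module control line endings and quoting
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)


def load_rows(path):
    """Read the CSV back as a list of dicts keyed by column name."""
    with open(path, newline='', encoding='utf-8') as f:
        return list(csv.DictReader(f))
```

Values read back through `csv.DictReader` are strings, so `likeCount` needs an explicit `int(...)` conversion if it is used numerically.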