💾 Data collection

Using the youtube-scraper package, data from 1,000 YouTube videos were collected at a cost of 2,200 YouTube Data API v3 quota units, that is, 22% of the default daily quota of 10,000 units.

Execution time (Obtaining 100 videos for each query):

image

From scraping/ run:

go mod tidy
go run main.go

📝 Data "cleansing"

Using regular expressions in Python, unwanted characters such as emojis, URLs, and line breaks were removed.
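As a rough illustration of this kind of cleanup, here is a minimal sketch using only Python's standard `re` module. The patterns and the `clean_text` helper are hypothetical and are not taken from cleansing.ipynb. The emoji ranges cover only the most common Unicode blocks, so the notebook's actual expressions may differ.

```python
import re

# Hypothetical patterns for illustration; not the ones used in cleansing.ipynb.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\U0001F1E6-\U0001F1FF"  # regional indicator letters (flags)
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector-16 and zero-width joiner
    "]+"
)
WHITESPACE_RE = re.compile(r"\s+")  # also matches \n and \r


def clean_text(text: str) -> str:
    """Remove URLs and common emojis, then collapse line breaks and extra spaces."""
    text = URL_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_text("Check this 🔥🔥\nhttps://youtu.be/abc123 now!\r\n"))
```

Replacing matches with a space rather than an empty string keeps neighbouring words from being glued together. The final whitespace pass then collapses the leftover gaps.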

From cleansing/ run:

(optional) virtualenv -p python3 venv
(optional) source venv/bin/activate

pip install notebook 
jupyter notebook

Then, open the cleansing.ipynb file.