# Motivation
I want to try downloading some YouTube videos using [pytube](https://github.com/pytube/pytube), a Python library meant for working with YouTube content. 

# Setup
The cells below will set up the rest of the notebook.

I'll start by changing my working directory to the root of the repo. 

In [1]:
# Change to the root of the repo
%cd ..

C:\Data\Personal Study\Programming\neural-needle-drop


Next, I'll import some libraries. 

In [2]:
# Import statements
from pytube import YouTube
from pytube import Channel
import pandas as pd
from pathlib import Path
from time import time

# Experimentation


### Downloading Audio
I'm going to follow [pytube's quickstart guide](https://pytube.io/en/latest/user/quickstart.html) and try to download the audio for [this TheNeedleDrop review](https://www.youtube.com/watch?v=ue2o_EokaIw).

In [3]:
# Declare the YouTube object, which we'll call video
video = YouTube("https://www.youtube.com/watch?v=ue2o_EokaIw")

# Print the title of the video
print(f"The title of the video in question is:\n{video.title}")

The title of the video in question is:
Oneohtrix Point Never - Magic Oneohtrix Point Never ALBUM REVIEW


In [4]:
# Print the description of the video
print(f"The description of this video is as follows:\n{video.description}")

The description of this video is as follows:
Listen: https://www.youtube.com/watch?v=w5azY0dH67U

As an overview of Daniel Lopatin's musical exploits, Magic OPN isn't quite as spectacular as it could have been.

More electronic reviews: https://www.youtube.com/playlist?list=PLP4CSgl7K7ormX2pL9h0inES2Ub630NoL

Subscribe: http://bit.ly/1pBqGCN

Patreon: https://www.patreon.com/theneedledrop

Official site: http://theneedledrop.com

Twitter: http://twitter.com/theneedledrop

Instagram: https://www.instagram.com/afantano

TND Twitch: https://www.twitch.tv/theneedledrop

FAV TRACKS: THE WHETHER CHANNEL, NO NIGHTMARES, TALES FROM THE TRASH STRATUM, IMAGO, NOTHING'S SPECIAL

LEAST FAV TRACK: I DON'T LOVE ME ANYMORE

ONEOHTRIX POINT NEVER - MAGIC ONEOHTRIX POINT NEVER / 2020 / WARP / NEO-PSYCH, AMBIENT, HYPNAGOGIC POP, ODE TO RADIO

6/10

Y'all know this is just my opinion, right?


Seems *really* easy. Let's see if we can download things. YouTube serves its media using "DASH" - [Dynamic Adaptive Streaming over HTTP](https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP) - which means a single video is available as several separate streams of differing type and quality. With pytube, you can list all of the available streams and then download whichever ones you want. 

Below, I've got a loop that finds the highest-bitrate mp4 audio stream.

In [5]:
# Find the highest-bitrate mp4 audio stream
highest_bitrate_mp4_audio_stream = None
highest_bitrate_found = 0
for stream in video.streams.filter(only_audio=True):
    if (stream.mime_type == "audio/mp4"):
        stream_bitrate = int(stream.abr.split("kbps")[0])
        if (stream_bitrate > highest_bitrate_found):
            highest_bitrate_mp4_audio_stream = stream
            highest_bitrate_found = stream_bitrate
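
As a side note, the same selection could be sketched as a small helper built around `max()`. This is only an illustration: `parse_kbps` and `pick_highest_bitrate` are hypothetical names (not part of pytube), and the `SimpleNamespace` objects are stand-ins that only mimic the `mime_type` and `abr` attributes used in the loop above, so the sketch runs without a network connection.

```python
from types import SimpleNamespace


def parse_kbps(abr):
    """Turn a bitrate label such as '128kbps' into the integer 128."""
    return int(abr.replace("kbps", ""))


def pick_highest_bitrate(streams, mime_type="audio/mp4"):
    """Return the stream of the given MIME type with the highest bitrate.

    Returns None if no stream matches. Streams whose abr is missing are skipped.
    """
    candidates = [s for s in streams if s.mime_type == mime_type and s.abr]
    return max(candidates, key=lambda s: parse_kbps(s.abr), default=None)


# Stand-in objects (assumption: real streams expose the same two attributes)
fake_streams = [
    SimpleNamespace(mime_type="audio/mp4", abr="48kbps"),
    SimpleNamespace(mime_type="audio/mp4", abr="128kbps"),
    SimpleNamespace(mime_type="audio/webm", abr="160kbps"),
]

best = pick_highest_bitrate(fake_streams)
print(best.abr)  # prints "128kbps"
```

With the real objects, the call would presumably look like `pick_highest_bitrate(video.streams.filter(only_audio=True))`, since that query is already iterated over in the loop above.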

Now that I've got this stream, I should be able to download it.

In [6]:
# Create a data/ folder if it doesn't exist
data_folder_path = Path("data/")
data_folder_path.mkdir(exist_ok=True, parents=True)

# Time how long it takes to download the .m4a audio for this video
start_time = time()
highest_bitrate_mp4_audio_stream.download(output_path=data_folder_path, 
                                          filename="test_audio_download.m4a",
                                          skip_existing=False)
print(f"audio downloaded to {data_folder_path}/test_audio_download.m4a in {time()-start_time:.2f}sec")

audio downloaded to data/test_audio_download.m4a in 4.57sec


Really quickly: I actually want this to be an MP3 file instead. The cell below will convert the file using ffmpeg, which needs to be installed and available on the PATH. 

In [7]:
# This function converts an m4a file to mp3 using ffmpeg
def convert_m4a_to_mp3(input_file_path, output_file_path):
    
    # Import the necessary library
    import subprocess
    
    # Build the ffmpeg command as a list of arguments, so that it runs without
    # a shell on any OS and paths containing spaces are passed through intact
    command = ["ffmpeg", "-i", str(input_file_path), str(output_file_path)]

    # Run the command, raising CalledProcessError if ffmpeg exits with an error
    subprocess.run(command, check=True)
    
# Run the function defined above
convert_m4a_to_mp3(f"{data_folder_path}/test_audio_download.m4a", 
                   f"{data_folder_path}/test_audio_download.mp3")
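
If I end up converting a lot of files, a slightly more defensive version might be worth having. The sketch below is an assumption on my part rather than anything from pytube or the cell above: `build_ffmpeg_args` and `convert_audio` are hypothetical helper names, and the default bitrate of `192k` is an arbitrary choice. It uses ffmpeg's `-y` (overwrite without asking), `-vn` (ignore any video stream), and `-b:a` (audio bitrate) options, and checks that ffmpeg is on the PATH before trying to run it.

```python
import shutil
import subprocess


def build_ffmpeg_args(input_path, output_path, bitrate="192k", overwrite=True):
    """Build the ffmpeg argument list for an audio conversion."""
    args = ["ffmpeg"]
    if overwrite:
        # -y overwrites an existing output file instead of prompting
        args.append("-y")
    # -vn drops any video stream; -b:a sets the target audio bitrate
    args += ["-i", str(input_path), "-vn", "-b:a", bitrate, str(output_path)]
    return args


def convert_audio(input_path, output_path, **kwargs):
    """Run ffmpeg, failing early with a clear error if it is not installed."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg was not found on the PATH")
    subprocess.run(build_ffmpeg_args(input_path, output_path, **kwargs), check=True)


# Inspect the command without actually running ffmpeg
print(build_ffmpeg_args("data/test_audio_download.m4a",
                        "data/test_audio_download.mp3"))
```

Keeping the argument building separate from the `subprocess.run` call makes it possible to check the command before anything is executed.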

### Downloading Channel Information
Now: another question. Can I use `pytube` to get all of the videos from TheNeedleDrop's channel? According to [their "Using Channels" tutorial](https://pytube.io/en/latest/user/channel.html#using-channels): probably, yeah. 

In [16]:
# Creating a Channel object for theneedledrop
theneedledrop_channel = Channel('https://www.youtube.com/c/theneedledrop')

# Determining all of his videos
all_video_urls = theneedledrop_channel.video_urls
print(f"theneedledrop has {len(all_video_urls):,} videos.")

theneedledrop has 3,971 videos.


Nice - I've got the URLs for **ALL** of Anthony Fantano's videos. Let's see if I can guess some sort of pattern to pick out the reviews themselves (versus additional content). 

**Actually...** I don't really *care* about differentiating the reviews right now. In theory, I want to download *all* of his videos, and then classify whether they're reviews or not later. 
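
Downloading thousands of videos will probably take more than one sitting, so it would help to know which URLs still need fetching. Here's a rough sketch of how that bookkeeping might work. It rests on a few assumptions: that each URL has the `watch?v=<id>` shape seen earlier in this notebook, and that each audio file gets saved as `<video id>.m4a` in the data folder (a naming convention I'm making up here). `video_id_from_url` and `pending_downloads` are hypothetical helpers, not pytube functions.

```python
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url):
    """Extract the value of the 'v' query parameter, or None if it is absent."""
    ids = parse_qs(urlparse(url).query).get("v")
    return ids[0] if ids else None


def pending_downloads(video_urls, data_dir):
    """Return the URLs whose '<video id>.m4a' file is not yet in data_dir."""
    data_dir = Path(data_dir)
    pending = []
    for url in video_urls:
        video_id = video_id_from_url(url)
        if video_id is None:
            # Skip anything that doesn't look like a watch URL
            continue
        if not (data_dir / f"{video_id}.m4a").exists():
            pending.append(url)
    return pending


print(video_id_from_url("https://www.youtube.com/watch?v=ue2o_EokaIw"))  # prints "ue2o_EokaIw"
```

Something like `pending_downloads(all_video_urls, data_folder_path)` would then give the list of URLs left to download, so an interrupted run could pick up where it left off.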