# Motivation
I wanted to play around with [OpenAI's Whisper](https://github.com/openai/whisper), so I've created this notebook. 

# Setup
The cells below will help to set up the rest of the notebook.

I'll start by changing directories to the root of the repo. 

In [1]:
# Change the directory to the root of the repo
%cd ..

C:\Data\Personal Study\Programming\neural-needle-drop


Next, I'll import the libraries I'll need.

In [2]:
# Import statements
import subprocess
import whisper
from pathlib import Path
from time import time
import torch

Finally, I'll set up Whisper by loading the model. Since I have an 8GB GPU, I should be able to load [their medium.en model](https://github.com/openai/whisper#available-models-and-languages), which the README lists as needing roughly 5 GB of VRAM. I'm using the English-only model, since TheNeedleDrop reviews are in English. According to the README, the `.en` models tend to perform better for English-only applications, though the difference is less pronounced for the larger `small.en` and `medium.en` models.

In [3]:
# Determining whether we'll use GPU or CPU
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load in the model
model_type = "medium.en"
start_time = time()
model = whisper.load_model(model_type, device=DEVICE)
print(f"Loaded the '{model_type}' model into {DEVICE} in {time()-start_time:.2f} seconds")

Loaded the 'medium.en' model into cuda in 7.60 seconds


In [7]:
# I also want to load in the base model so that I can compare the two models' transcriptions
base_model = whisper.load_model("base.en", device=DEVICE)

# Experimentation
Now that I've got the model loaded in, I want to test things out. I'm going to follow their [Python usage](https://github.com/openai/whisper#python-usage) quickstart within the repo's README. 

First question: how long does it take me to transcribe a single Anthony Fantano review? (The next couple of cells assume that you've run the **Experiments** Section of the **`Pytubes Experiments`** notebook.)

In [4]:
# Declare the path to the test file
test_file_path = Path("data/test_audio_download.mp3")

# Time how long the transcription takes
start_time = time()
transcription = model.transcribe(str(test_file_path))
print(f"Transcription took {time()-start_time:.2f} seconds.")

Transcription took 9.3e+01 seconds.
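A raw time in seconds is hard to judge without the length of the audio. Here is a minimal sketch of a real-time factor (processing time divided by audio duration), assuming the end timestamp of the last entry in the result's `"segments"` list is a reasonable stand-in for the audio length. The `real_time_factor` helper and the `fake_result` dictionary are hypothetical names used for illustration, not part of Whisper.

```python
def real_time_factor(result: dict, elapsed_seconds: float) -> float:
    """Return processing time divided by the approximate audio duration.

    Assumes `result` has the shape returned by `model.transcribe`, i.e. a
    "segments" list whose items carry "start" and "end" times in seconds.
    """
    segments = result.get("segments", [])
    if not segments:
        raise ValueError("result has no segments to estimate the duration from")

    # Approximation: treat the end of the last segment as the audio length
    audio_seconds = segments[-1]["end"]
    if audio_seconds <= 0:
        raise ValueError("estimated audio duration must be positive")

    return elapsed_seconds / audio_seconds


# Hypothetical stand-in for a transcription result (not real data)
fake_result = {
    "segments": [
        {"start": 0.0, "end": 4.0, "text": " Hi everyone"},
        {"start": 4.0, "end": 10.0, "text": " it's time for a review"},
    ]
}

rtf = real_time_factor(fake_result, elapsed_seconds=5.0)
print(f"Real-time factor: {rtf:.2f} (below 1.0 means faster than real time)")
```

In the notebook, the same call would take `transcription` and the elapsed time measured in the cell above.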


Next: how *good* is the transcription? I'm going to print the first 1,000 characters of it to understand what's going on. 

In [11]:
transcription["text"][:1000]

" Yeah, yeah, hi everyone orange man almost gone here the Internet's busiest music nerd And it's time for a review of the new one Oh tricks point never album Magic one Oh tricks point never this is the latest LP from prolific composer producer sonic alchemist Daniel Lopatin aka one Oh tricks point never This I believe is his fourth full-length LP with the legendary Warp records and the project seems like a pretty huge concept for him an artistic self-portrait of sorts Maybe the records title flow and various interludes are all deeply inspired by the world of radio pretty interesting We're finally getting a deep dive on something like this especially considering the one Oh tricks point never name is a play on a radio station's frequency numbers anyway now when I understood this was the conceptual direction of this new album that excited me because personally I do have a lot of Passion for the world of radio ever since really I had a little boom box that I could tape my favorite songs of

Wow. It's practically perfect. This is about the first minute of the review. 
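To check how much of the review a given stretch of text covers, the segment timestamps can be used instead of a character count. This is a rough sketch, assuming the result's `"segments"` entries each carry `"end"` and `"text"` fields; `text_up_to` and `fake_result` are hypothetical names introduced here for illustration.

```python
def text_up_to(result: dict, max_seconds: float) -> str:
    """Join the text of every segment that ends at or before `max_seconds`."""
    parts = [
        segment["text"]
        for segment in result.get("segments", [])
        if segment["end"] <= max_seconds
    ]
    # Segment texts usually start with a space, so a plain join keeps spacing
    return "".join(parts).strip()


# Hypothetical stand-in for a transcription result (not real data)
fake_result = {
    "segments": [
        {"start": 0.0, "end": 30.0, "text": " Hi everyone"},
        {"start": 30.0, "end": 58.5, "text": " it's time for a review"},
        {"start": 58.5, "end": 75.0, "text": " of the new album"},
    ]
}

first_minute = text_up_to(fake_result, max_seconds=60)
print(first_minute)
print(f"{len(first_minute)} characters in the first minute")
```

Running `text_up_to(transcription, 60)` in the notebook would show how many characters the first minute actually contains.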

Let's transcribe it again, but this time using the base model. I want to compare the accuracy of the two models.
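Once both transcriptions exist, one rough way to put a number on how closely they agree is a word-level similarity ratio from the standard library's `difflib`. This is only a sketch: it measures agreement between two texts, not accuracy against a ground truth, and it is not a word error rate. The `word_similarity` helper and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def word_similarity(text_a: str, text_b: str) -> float:
    """Return a 0-to-1 similarity ratio computed over lowercase words."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    # autojunk=False avoids difflib's heuristic of ignoring frequent items
    # in long sequences, which would otherwise discount common words
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return matcher.ratio()


# Hypothetical sample strings standing in for two transcriptions
medium_text = "the quick brown fox"
base_text = "The quick red fox"

print(f"Word-level similarity: {word_similarity(medium_text, base_text):.2f}")
```

With real results, the two inputs would be the `"text"` values from the medium and base transcriptions.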