In [1]:
# deps
!pip3 install youtube_transcript_api openai

Collecting youtube_transcript_api
  Downloading youtube_transcript_api-0.6.2-py3-none-any.whl.metadata (15 kB)
Collecting openai
  Downloading openai-1.16.2-py3-none-any.whl.metadata (21 kB)
Collecting requests (from youtube_transcript_api)
  Using cached requests-2.31.0-py3-none-any.whl.metadata (4.6 kB)
Collecting anyio<5,>=3.5.0 (from openai)
  Using cached anyio-4.3.0-py3-none-any.whl.metadata (4.6 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Using cached distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Using cached httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)
Collecting pydantic<3,>=1.9.0 (from openai)
  Downloading pydantic-2.6.4-py3-none-any.whl.metadata (85 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85.1/85.1 kB 9.8 MB/s eta 0:00:00
Collecting sniffio (from openai)
  Using cached sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting tqdm>4 (from openai)
  Usi

In [4]:
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter

import os
import textwrap
import re
from openai import OpenAI
client = OpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
)

PROMPT_STRING = "Write a detailed summary of the following, with the starting and ending timestamps formatted nicely in Markdown (with proper spacing and line breaks). Use different levels of headings with corresponding bullet points supporting them:\n\n<<SUMMARY>>\n"

# Get transcript for given YouTube video id
video_id = "wjZofJX0v4M"
transcript = YouTubeTranscriptApi.get_transcript(video_id)
# Format transcript using TextFormatter from youtube_transcript_api library
formatter = TextFormatter()
transcript = formatter.format_transcript(transcript)
print(transcript)

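# Transcript length in characters (only a rough proxy for video length;
# not used below)
video_length = len(transcript)

# Maximum number of characters per chunk.
# Larger chunks mean fewer API calls, but each request must still fit
# within the model's context window.
chunk_size = 16000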

# Wrap the transcript in chunks of characters
chunks = textwrap.wrap(transcript, chunk_size)

summaries = list()

# For each chunk of characters, generate a summary
for chunk in chunks:
    prompt = PROMPT_STRING.replace("<<SUMMARY>>", chunk)

    # Generate a summary of this chunk with the Chat Completions API.
    # A different chat model can be substituted via the `model` argument
    # if cost is a concern.
    response = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model="gpt-3.5-turbo",
    )
    # Collapse runs of whitespace (including newlines) into single spaces
    summary = re.sub(r"\s+", " ", response.choices[0].message.content)
    summaries.append(summary)

# Join the chunk summaries into one string
chunk_summaries = " ".join(summaries)
prompt = PROMPT_STRING.replace("<<SUMMARY>>", chunk_summaries)

# Generate a final summary from the chunk summaries
response = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="gpt-3.5-turbo",
)
# Note: collapsing whitespace also flattens the Markdown line breaks
# that the prompt asks for.
final_summary = re.sub(r"\s+", " ", response.choices[0].message.content)

# Print out all of the summaries
for idx, summary in enumerate(summaries):
    print(f"({idx}) - {summary}\n")

print(f"(Final Summary) - {final_summary}")
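
The prompt in the cell above asks for starting and ending timestamps, but `TextFormatter` emits only the caption text, so the model never sees any timing information. Below is a minimal sketch of one way to keep it. It assumes each transcript entry is a dict with `text`, `start`, and `duration` keys (the shape `get_transcript` returns in the 0.6.x release installed above). The helper names `format_timestamp`, `timestamped_lines`, and `chunk_lines` are hypothetical and not part of either library, and the sample timings are made up for illustration.

```python
def format_timestamp(seconds):
    """Render a number of seconds as MM:SS, or H:MM:SS past one hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def timestamped_lines(entries):
    """Turn transcript entries into '[MM:SS] text' lines."""
    return [
        # split()/join() flattens any newlines inside a caption
        f"[{format_timestamp(entry['start'])}] {' '.join(entry['text'].split())}"
        for entry in entries
    ]


def chunk_lines(lines, max_chars):
    """Group whole lines into chunks of at most max_chars characters.

    A single line longer than max_chars becomes its own oversized chunk.
    """
    chunks, current, current_len = [], [], 0
    for line in lines:
        added = len(line) + (1 if current else 0)  # +1 for the joining newline
        if current and current_len + added > max_chars:
            chunks.append("\n".join(current))
            current, current_len = [], 0
            added = len(line)
        current.append(line)
        current_len += added
    if current:
        chunks.append("\n".join(current))
    return chunks


# Hypothetical sample entries standing in for the get_transcript() result.
sample_entries = [
    {"text": "The initials GPT stand for", "start": 0.0, "duration": 2.5},
    {"text": "Generative Pretrained\nTransformer.", "start": 2.5, "duration": 2.0},
    {"text": "A later line.", "start": 3725.4, "duration": 1.5},
]
lines = timestamped_lines(sample_entries)
chunks = chunk_lines(lines, max_chars=80)
for chunk in chunks:
    print(chunk)
    print("---")
```

Because chunks are built from whole lines, a timestamp is never split from its caption, which `textwrap.wrap` on the flattened text cannot guarantee. Each chunk could then be substituted into `PROMPT_STRING` in place of the plain-text chunk.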

The initials GPT stand for Generative Pretrained Transformer.
So that first word is straightforward enough, these are bots that generate new text.
Pretrained refers to how the model went through a process of learning 
from a massive amount of data, and the prefix insinuates that there's 
more room to fine-tune it on specific tasks with additional training.
But the last word, that's the real key piece.
A transformer is a specific kind of neural network, a machine learning model, 
and it's the core invention underlying the current boom in AI.
What I want to do with this video and the following chapters is go through 
a visually-driven explanation for what actually happens inside a transformer.
We're going to follow the data that flows through it and go step by step.
There are many different kinds of models that you can build using transformers.
Some models take in audio and produce a transcript.
This sentence comes from a model going the other way around, 
producing synthetic speech just