# YouTube Video Transcript Summarizer with Q&A

This project allows users to enter a search topic or a YouTube video link to automatically fetch the transcript, generate a summary, and ask questions about the video content. If the transcript lacks the necessary information, the system performs a web search to provide a more accurate answer.

In [None]:
!pip install -q requests youtube-transcript-api langchain langchain_community chromadb langchain_groq pytube tavily-python langchain-tavily

In [None]:
import os
from google.colab import userdata

YOUTUBE_API_KEY = userdata.get("YouTube")
os.environ["GROQ_API_KEY"] = userdata.get("Groq")
os.environ['TAVILY_API_KEY'] = userdata.get("Tavily")

The YouTube Data API v3 is used to search for videos and extract the ID of the first result, so users can enter a general topic instead of a specific video URL.

In [None]:
import requests

def search_youtube_video(query):
  url = "https://www.googleapis.com/youtube/v3/search"
  params = {
      "part": "snippet",
      "q": query,
      "type": "video",
      "maxResults": 1,
      "key": YOUTUBE_API_KEY
  }

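  # Use a timeout and surface HTTP errors (e.g. an invalid key or exceeded quota)
  response = requests.get(url, params=params, timeout=10)
  response.raise_for_status()
  results = response.json()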

  if results.get("items"):
    video = results["items"][0]
    video_id = video["id"]["videoId"]
    title = video["snippet"]["title"]
    print(f"Found YouTube video: {title} (https://www.youtube.com/watch?v={video_id})")
    return video_id

  return None
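
To make the response handling above concrete, here is a minimal offline sketch of pulling the first video ID out of a search response. The `sample_response` dictionary is hand-written to mirror only the fields the function reads (`items`, `id.videoId`, `snippet.title`); it is not captured API output, and `first_video_id` is a hypothetical helper name.

```python
def first_video_id(results):
    """Return the videoId of the first search hit, or None if there is none."""
    items = results.get("items") or []
    if not items:
        return None
    # .get() makes an unexpected item shape yield None instead of a KeyError.
    return items[0].get("id", {}).get("videoId")


# Hand-written sample (an assumption, not real API output).
sample_response = {
    "items": [
        {"id": {"videoId": "abcdefghijk"}, "snippet": {"title": "Example title"}}
    ]
}

print(first_video_id(sample_response))
print(first_video_id({"items": []}))
```

An empty `items` list is the normal "no results" case, which is why it maps to `None` rather than an exception.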

Using the video ID, this function attempts to fetch the English transcript through the YouTubeTranscriptApi.

In [None]:
from youtube_transcript_api import YouTubeTranscriptApi

def fetch_transcript(video_id):
  try:
    transcript = YouTubeTranscriptApi().fetch(video_id, languages=['en', 'en-US', 'en-GB'])
    text = " ".join([entry.text for entry in transcript])
    return text
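  except Exception as e:
    # Report why the fetch failed (e.g. transcripts disabled or no English track)
    print(f"Transcript fetch failed: {e}")
    return None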

Because transcripts can be very long, this function splits the text into overlapping chunks. This chunking helps ensure that the vector search engine can effectively find relevant pieces of text when the user asks questions, without losing important context.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_into_chunks(text):
  splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
  documents = splitter.create_documents([text])
  return documents
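
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a plain sliding window. This is a simplified sketch, not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break on separators such as paragraphs and sentences); `sliding_chunks` is a hypothetical helper.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=100):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# 1200 characters of filler text, just to show the window arithmetic.
sample_text = "".join(str(i % 10) for i in range(1200))
windows = sliding_chunks(sample_text)
print([len(w) for w in windows])
# The tail of one chunk is repeated at the head of the next.
print(windows[0][-100:] == windows[1][:100])
```

The repeated 100 characters are what keep a sentence that straddles a boundary retrievable from at least one chunk.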

This function converts the text chunks into vector embeddings using a pre-trained SentenceTransformer model. These embeddings are stored in a Chroma database, which enables fast similarity search.

In [None]:
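# These classes live in langchain_community; the langchain.* paths are deprecated
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma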

def store_in_db(documents):
  embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
  vectorstore = Chroma.from_documents(documents, embedding_model, persist_directory="chroma_db")
  return vectorstore
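
As a rough intuition for what similarity search does, the sketch below ranks chunks by cosine similarity to a query. It uses word counts as a toy stand-in for embeddings; the real pipeline uses dense vectors from the SentenceTransformer model and Chroma's index, and the helper names here are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real sentence embedding)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, chunks):
    """Return the chunk whose vector is closest to the query vector."""
    q = embed(query)
    return max(chunks, key=lambda c: cosine_similarity(q, embed(c)))


toy_chunks = [
    "the speaker explains how neural networks learn",
    "a recipe for baking sourdough bread at home",
]
print(most_similar("how do neural networks learn", toy_chunks))
```

Unlike this word-count toy, learned embeddings can also match text that shares meaning but not vocabulary.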

This cell sets up the LLM, the prompt templates, and the chains used to route user questions and perform retrieval-augmented generation (RAG).

In [None]:
from langchain_groq import ChatGroq
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser, StrOutputParser
from langchain import hub

llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0.0)

router_system_msg = """
You are given a user's question and some context from a video transcript.
Decide whether the question can be answered using only the given context.

If yes, return "Can respond".
If not, return "Search required".

Return a JSON with a single key 'decision'. No explanation or extra text.
"""

router_user_msg = "Context:\n{context}\n\nQuestion: {question}"

router_prompt = PromptTemplate(
    template=router_system_msg + "\n\n" + router_user_msg,
    input_variables=["question", "context"]
)

router_chain = router_prompt | llm | JsonOutputParser()

rag_prompt = hub.pull("feefe/rag_llama3_1")
rag_chain = rag_prompt | llm | StrOutputParser()
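
The router is asked for a JSON object with a single `decision` key, but a model can still return malformed output. One possible way to handle that defensively is sketched below; `normalize_decision` is a hypothetical helper, and falling back to a web search on any unexpected output is an assumption about the desired behavior, not something the notebook does.

```python
import json


def normalize_decision(raw):
    """Map router output to 'can respond' or 'search required'.

    Accepts either a JSON string or an already-parsed dict. Anything
    unexpected falls back to 'search required' (an assumed safe default).
    """
    try:
        data = json.loads(raw) if isinstance(raw, str) else raw
    except json.JSONDecodeError:
        return "search required"
    if not isinstance(data, dict):
        return "search required"
    decision = str(data.get("decision", "")).strip().lower()
    return decision if decision == "can respond" else "search required"


print(normalize_decision('{"decision": "Can respond"}'))
print(normalize_decision("Sure! The answer is yes."))
```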

If the router decides the transcript doesn’t have enough information, this function performs a web search using the TavilySearch tool. It joins the content of the top results into a single document, which then serves as external context for the RAG chain to generate a relevant answer.

In [None]:
from langchain_tavily import TavilySearch
from langchain.schema import Document

def do_web_search(query: str, max_res=3) -> Document:
  web_search_tool = TavilySearch(max_results=max_res, search_depth='advanced')
  results = web_search_tool.invoke({"query": query})
  results_content = "\n".join([res["content"] for res in results['results']])
  return Document(page_content=results_content)
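
The joining step can be tried offline with a hand-written response. The dictionary below mirrors only the shape the function above relies on (a `results` list whose entries have a `content` field); it is an assumption for illustration, not real search output, and `combine_results` is a hypothetical helper that additionally skips entries without content.

```python
def combine_results(response):
    """Join the 'content' field of each search result with newlines."""
    contents = [res.get("content", "") for res in response.get("results", [])]
    # Skip empty entries so the context has no blank lines.
    return "\n".join(c for c in contents if c)


# Hand-written sample (an assumption, not real search output).
sample_search = {
    "results": [
        {"content": "First snippet."},
        {"title": "No content field here"},
        {"content": "Second snippet."},
    ]
}
print(combine_results(sample_search))
```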

This is the core function for processing questions and generating answers. It retrieves context from the transcript vector store and uses the router to determine whether that context is sufficient. If it is not, or if the transcript-based answer looks like a non-answer, the function performs a web search and answers using the RAG prompt.

In [None]:
def search_and_answer(qa_chain, query):
  # Retrievers are runnables; invoke() replaces the deprecated get_relevant_documents()
  retrieved_docs = qa_chain.retriever.invoke(query)
  transcript_context = "\n".join(doc.page_content for doc in retrieved_docs[:3])

  decision_response = router_chain.invoke({
      "question": query,
      "context": transcript_context
  })
  decision = decision_response.get("decision", "").strip().lower()

  if decision == "can respond":
    response = qa_chain.invoke(query)
    result = response.get("result", "") if isinstance(response, dict) else response

    no_answer_phrases = ["i don't", "not sure", "cannot answer", "no information", "not mentioned", "not related"]

    if result.strip() and all(phrase not in result.lower() for phrase in no_answer_phrases):
      return result

  print("No relevant information in transcript. Switching to web search...")
  web_doc = do_web_search(query)
  rag_response = rag_chain.invoke({
      "question": query,
      "context": web_doc.page_content
  })

  return rag_response
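
The control flow above can be exercised without any model calls by passing in stub callables. This is a sketch under that assumption: `answer_with_fallback` and `is_usable_answer` are hypothetical names, and the phrase list is copied from the function above.

```python
NO_ANSWER_PHRASES = ["i don't", "not sure", "cannot answer",
                     "no information", "not mentioned", "not related"]


def is_usable_answer(text):
    """True if the text is non-empty and contains no 'non-answer' phrase."""
    lowered = text.lower()
    return bool(text.strip()) and all(p not in lowered for p in NO_ANSWER_PHRASES)


def answer_with_fallback(decision, transcript_answer, web_answer):
    """Return (source, answer); both answer arguments are zero-argument callables."""
    if decision == "can respond":
        result = transcript_answer()
        if is_usable_answer(result):
            return "transcript", result
    # Either the router asked for a search or the transcript answer was unusable.
    return "web", web_answer()


print(answer_with_fallback("can respond", lambda: "It covers gradient descent.", lambda: "from the web"))
print(answer_with_fallback("can respond", lambda: "I don't know.", lambda: "from the web"))
```

Note that substring matching is a blunt filter: a valid answer that happens to contain "not sure" would also trigger the web search.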

A RetrievalQA chain is built to generate factual answers from the transcript. Its temperature is set to zero, which makes the output more deterministic and reduces the chance of hallucinations.

In [None]:
from langchain.chains import RetrievalQA

def build_factual_chain(vectorstore):
  factual_llm = ChatGroq(model_name="llama-3.1-8b-instant", temperature=0.0)
  factual_chain = RetrievalQA.from_chain_type(llm=factual_llm, retriever=vectorstore.as_retriever())

  return factual_chain

A separate prompt-based chain handles summarization, using a temperature of 0.4 to allow some creativity while keeping the output controlled. To handle long transcripts that exceed the model's context limit, each chunk is summarized individually. These partial summaries are then combined and summarized again to produce the final summary.

In [None]:
summary_llm = ChatGroq(model_name="llama-3.1-8b-instant", temperature=0.4)

summary_prompt = PromptTemplate.from_template("""
You are a helpful assistant. Analyze the following video transcript, extract the key points and provide a structured, concise summary.

Transcript:
{transcript}

Return a concise paragraph summarizing the video without any introductory phrases.
""")

summary_chain = summary_prompt | summary_llm | StrOutputParser()

def summarize_transcript(docs):
  summaries = []
  for doc in docs:
    partial = summary_chain.invoke({"transcript": doc.page_content})
    summaries.append(partial)

  summary = summary_chain.invoke({"transcript": "\n".join(summaries)})
  return summary
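
The two-stage pattern above (summarize each chunk, then summarize the joined partial summaries) can be checked with a stub in place of the LLM. In this sketch `summarize_map_reduce` and `fake_summarize` are hypothetical names; the stub simply truncates its input and records each call.

```python
def summarize_map_reduce(chunks, summarize):
    """Summarize each chunk, then summarize the joined partial summaries."""
    partials = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partials))


calls = []


def fake_summarize(text):
    """Stub summarizer: record the input and keep its first 12 characters."""
    calls.append(text)
    return text[:12]


toy_texts = ["alpha " * 10, "beta " * 10, "gamma " * 10]
final = summarize_map_reduce(toy_texts, fake_summarize)
# One call per chunk plus one final call over the joined partial summaries.
print(len(calls))
```

The number of model calls grows linearly with the number of chunks, which is worth keeping in mind with 500-character chunks and rate-limited APIs.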

A simple utility function that wraps text so the model's output is easier to read:

In [None]:
def wrap_and_print(text, width=150):
  import textwrap
  wrapped_text = textwrap.fill(text, width=width)
  print(wrapped_text)

This section handles user input of a topic or YouTube video link and supports interactive Q&A until the user exits.

In [None]:
from pytube import extract

user_input = input("Enter a topic or YouTube video link: ").strip()

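# Treat the input as a link if it points at YouTube (including short youtu.be links)
if "youtube.com" in user_input or "youtu.be" in user_input:
  video_id = extract.video_id(user_input)
else:
  video_id = search_youtube_video(user_input)

# Stop the cell here; in a notebook, exit() may also shut down the kernel
if not video_id:
  raise SystemExit("No video found.")

transcript = fetch_transcript(video_id)
if not transcript:
  raise SystemExit("Couldn't fetch transcript.")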

docs = split_into_chunks(transcript)

summary = summarize_transcript(docs)
print("\nVideo Summary:")
wrap_and_print(summary)

vectordb = store_in_db(docs)
factual_chain = build_factual_chain(vectordb)

while True:
  question = input("\nAsk a question (or type 'exit'): ").strip()
  # ("exit") is a plain string, so `in` would match substrings and empty input
  if question.lower() == "exit":
    break

  answer = search_and_answer(factual_chain, question)

  print("\nAnswer:")
  wrap_and_print(answer)
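
The link-versus-topic decision at the top of this cell can also be sketched with the standard library alone. The helper below is hypothetical and covers only `/watch?v=...` and `youtu.be/...` links; other URL forms (for example Shorts or embed links) are not handled here, whereas the notebook relies on `pytube.extract.video_id` for parsing.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(user_input):
    """Return the video ID from a YouTube link, or None for anything else."""
    candidate = user_input if "://" in user_input else "https://" + user_input
    try:
        parsed = urlparse(candidate)
        host = (parsed.hostname or "").lower()
    except ValueError:
        # Free-text topics can contain characters that are invalid in a URL.
        return None
    if host in ("www.youtube.com", "youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        return None
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    return None


print(extract_video_id("https://www.youtube.com/watch?v=abcdefghijk"))
print(extract_video_id("https://youtu.be/abcdefghijk?t=10"))
print(extract_video_id("python tutorial"))
```

A `None` result means the input should be treated as a search topic rather than a link.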