# **RAG using Video as Context**

This notebook walks through building a retrieval-augmented generation (RAG) pipeline that uses a video as its knowledge source. The finished pipeline answers questions based on the video's content.

## **Setup**

In [None]:
%pip install indexify indexify-extractor-sdk pytube indexify-langchain langchain langchain-openai

# Download Indexify Server
!curl https://www.tensorlake.ai | sh

# Download Extractors
!indexify-extractor download hub://audio/whisper-asr
!indexify-extractor download hub://video/audio-extractor
!indexify-extractor download hub://text/chunking
!indexify-extractor download hub://embedding/minilm-l6

After installing the libraries and downloading the server and the extractors, restart the runtime. Then start the Indexify server and connect the extractors to it.

Open two terminals and run the following commands:

```bash
# Terminal 1
./indexify server -d

# Terminal 2
indexify-extractor join-server
```
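Before continuing, it can help to confirm that the server is actually accepting connections. The snippet below is a minimal sketch using only the standard library. The helper name `is_port_open` is hypothetical, and the address `localhost:8900` is an assumption about the server's default; adjust it if your server listens elsewhere.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# Assumption: the Indexify server listens on localhost:8900.
print("server reachable:", is_port_open("localhost", 8900))
```

If this prints `False`, check that the command in Terminal 1 is still running before moving on.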

## **RAG Pipeline**

In Indexify, we need to create an Extraction Graph to extract the information from the video. Think of it as a pipeline that will process the video and extract the necessary information.

For this example, we will use the following extraction policies:
1. Extract the audio from the video (`audio-extractor`)
2. Transcribe the audio to text (`whisper-asr`)
3. Chunk the text into paragraphs (`chunk-extractor`)
4. Generate the embeddings and index the paragraphs (`minilm-l6`)

An extraction policy defines how information is extracted from a content source: which extractor runs, and on which input.
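The four steps above form a chain in which each policy consumes the output of the previous one. The sketch below models that idea in plain Python, independent of Indexify: the policy names, the `source` key, and the `execution_order` helper are all hypothetical and only illustrate how the dependencies determine the processing order.

```python
from typing import Dict, List, Optional

# Hypothetical policy records. "source" is the upstream policy whose output
# this policy consumes; None means it reads the uploaded video directly.
POLICIES: List[Dict[str, Optional[str]]] = [
    {"name": "embeddings", "extractor": "minilm-l6", "source": "chunks"},
    {"name": "chunks", "extractor": "chunk-extractor", "source": "transcript"},
    {"name": "audio", "extractor": "audio-extractor", "source": None},
    {"name": "transcript", "extractor": "whisper-asr", "source": "audio"},
]


def execution_order(policies: List[Dict[str, Optional[str]]]) -> List[str]:
    """Return policy names so that every policy comes after its source."""
    by_name = {p["name"]: p for p in policies}
    ordered: List[str] = []
    visiting: set = set()

    def visit(name: str) -> None:
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"cycle detected at policy {name!r}")
        if name not in by_name:
            raise ValueError(f"unknown content source {name!r}")
        visiting.add(name)
        source = by_name[name]["source"]
        if source is not None:
            visit(source)  # process the upstream policy first
        visiting.discard(name)
        ordered.append(name)

    for policy in policies:
        visit(policy["name"])
    return ordered


print(execution_order(POLICIES))
```

Even though the records are listed out of order, the result starts with the audio step and ends with the embedding step, because each policy waits for the one it reads from.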

In [12]:
from indexify import IndexifyClient, ExtractionGraph
client = IndexifyClient()

In [14]:
client.extractors()

[Extractor(name=tensorlake/audio-extractor, description=Extract audio from video, input_params={'properties': {}, 'title': 'AudioExtractorConfig', 'type': 'object'}, input_mime_types=['video', 'video/mp4', 'video/mov', 'video/avi'], outputs={}),
 Extractor(name=tensorlake/chunk-extractor, description=Text Chunk Extractor, input_params={'properties': {'chunk_size': {'default': 100, 'title': 'Chunk Size', 'type': 'integer'}, 'headers_to_split_on': {'default': [], 'items': {'type': 'string'}, 'title': 'Headers To Split On', 'type': 'array'}, 'overlap': {'default': 0, 'title': 'Overlap', 'type': 'integer'}, 'text_splitter': {'default': 'recursive', 'enum': ['char', 'recursive', 'markdown', 'html'], 'title': 'Text Splitter', 'type': 'string'}}, 'title': 'ChunkExtractionInputParams', 'type': 'object'}, input_mime_types=['text/plain'], outputs={}),
 Extractor(name=tensorlake/minilm-l6, description=MiniLM-L6 Sentence Transformer, input_params=None, input_mime_types=['text/plain'], outputs={'em

In [13]:
extraction_graph_spec = """
name: "video-knowledgebase"
extraction_policies:
   - extractor: "tensorlake/audio-extractor"
     name: "audio_clips"

   - extractor: "tensorlake/whisper-asr"
     name: "transcription"
     content_source: "audio_clips"

   - extractor: "tensorlake/chunk-extractor"
     name: "transcription_chunks"
     input_params:
        chunk_size: 1000
        overlap: 250
     content_source: "transcription"

   - extractor: "tensorlake/minilm-l6"
     name: "transcription-embedding"
     content_source: "transcription_chunks"
"""

extraction_graph = ExtractionGraph.from_yaml(extraction_graph_spec)
client.create_extraction_graph(extraction_graph)                                            
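To make the `chunk_size` and `overlap` parameters concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification, not the extractor's actual implementation: the extractor listing above shows a default `text_splitter` of `recursive`, whereas this hypothetical `chunk_text` helper simply slides a character window over the text.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 250) -> List[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2000))
pieces = chunk_text(sample)
print([len(p) for p in pieces])
```

With a 2,000-character input, the windows start at offsets 0, 750, and 1500, so consecutive chunks share 250 characters. The overlap reduces the chance that a sentence relevant to a question is split across two chunks with no chunk containing it whole.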

## **Upload Video to Indexify**

In [None]:
from pytube import YouTube
import os
yt = YouTube("https://www.youtube.com/watch?v=cplSUhU2avc")
file_name = "state_of_the_union_2024.mp4"
if not os.path.exists(file_name):
    video = yt.streams.filter(progressive=True, file_extension="mp4").order_by("resolution").desc().first()
    video.download(filename=file_name)

In [None]:
# Use the same file name that the download cell saved the video under.
client.upload_file(extraction_graphs="video-knowledgebase", path="state_of_the_union_2024.mp4")

## **RAG with Indexify**

In [None]:
from indexify_langchain import IndexifyRetriever

params = {
    "name": "video-knowledgebase.transcription-embedding.embedding",
    "top_k": 50
}

retriever = IndexifyRetriever(client=client, params=params)
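Conceptually, `top_k` asks the index for the chunks whose embeddings are closest to the embedding of the question. The sketch below illustrates that idea with toy two-dimensional vectors and cosine similarity. It is an illustration only: the `cosine_similarity` and `top_k` helpers are hypothetical, and the similarity metric that Indexify uses internally is an assumption here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          indexed: List[Tuple[str, Sequence[float]]],
          k: int) -> List[str]:
    """Return the texts of the k indexed chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy index: (chunk text, embedding) pairs with made-up 2-D embeddings.
toy_index = [("chunk a", [1.0, 0.0]), ("chunk b", [0.0, 1.0]), ("chunk c", [1.0, 1.0])]
print(top_k([1.0, 0.1], toy_index, k=2))
```

A larger `top_k` passes more context to the language model, at the cost of a longer prompt and a higher chance of including loosely related chunks.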

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

In [None]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI(openai_api_key="xxx")  # replace "xxx" with your OpenAI API key
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
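The chain above retrieves context for the question, fills the prompt template, calls the model, and parses the reply into a string. The sketch below reproduces only the prompt-filling step in plain Python, so that the text the model receives is easy to inspect. The `build_prompt` helper and the sample chunks are hypothetical. In the real chain the retriever returns document objects rather than plain strings, so the exact formatting of the context will differ.

```python
from typing import List

TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def build_prompt(chunks: List[str], question: str) -> str:
    """Join retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, question=question)


# Made-up chunks standing in for retrieved transcript passages.
example = build_prompt(
    ["First retrieved passage.", "Second retrieved passage."],
    "What does the speaker say about climate?",
)
print(example)
```

Printing the assembled prompt like this is a quick way to check whether the retrieved passages actually contain the information the question asks for.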

In [None]:
chain.invoke("What is President Biden doing to address climate change, and what evidence does he provide?")

'Biden is taking significant action on climate by cutting carbon emissions in half by 2030, creating clean energy jobs, launching the Climate Corps, and working towards environmental justice. He mentions that the world is facing a climate crisis and that all Americans deserve the freedom to be safe. Biden also mentions that America is safer today than when he took office and provides statistics on murder rates and violent crime decreasing.'