<a href="https://colab.research.google.com/github/video-db/videodb-cookbook/blob/scene-index-2/integrations/llama-index/simple_video_rag_with_readers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
# VideoDB Readers 

### Note: Draft Notebook & Integrations


## 🛠️ Setup connection

###  Requirements

To connect to VideoDB, get an API key and create a connection. The simplest way is to set the `VIDEO_DB_API_KEY` environment variable. You can get a key from the 👉🏼 [VideoDB Console](https://console.videodb.io). (Free for the first 50 uploads, **no credit card required!**)

Get your `OPENAI_API_KEY` from the OpenAI platform; the `llama_index` response synthesizer needs it.

<!-- > Set the `OPENAI_API_KEY` & `VIDEO_DB_API_KEY` environment variable with your API keys. -->

In [None]:
import os

os.environ["OPENAI_API_KEY"] = ""
os.environ["VIDEO_DB_API_KEY"] = ""
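
Empty strings in the cell above still count as "set", so a forgotten key only surfaces later as an authentication error. A minimal sketch of an early check follows; `missing_keys` and `REQUIRED_KEYS` are hypothetical names introduced here for illustration, not part of `videodb` or `llama_index`.

```python
import os

# The two variables this notebook relies on.
REQUIRED_KEYS = ("OPENAI_API_KEY", "VIDEO_DB_API_KEY")


def missing_keys(required=REQUIRED_KEYS, environ=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


# Report any key that still needs a value before continuing.
missing = missing_keys()
if missing:
    print("Missing API keys:", ", ".join(missing))
```

Running this right after setting the variables makes a blank key visible before any upload or indexing call is made.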

### Installing Dependencies

To get started, we'll need to install the following packages:

- `llama-index`
- `llama-index-readers-videodb`
- `videodb`

In [None]:
%pip install llama-index
%pip install videodb

In [None]:
%pip install git+https://github.com/video-db/llama_index@add-videodb-readers#subdirectory=llama-index-integrations/readers/llama-index-readers-videodb

### Data Ingestion

Let's upload a few video files first. You can use any `public URL`, `YouTube link`, or `local file` on your system. The first 50 uploads are free!

In [None]:
from videodb import connect

# connect to VideoDB
conn = connect()

# upload videos to default collection in VideoDB
print("uploading first video")
video = conn.upload(url="https://www.youtube.com/watch?v=zdPSaMuLRso&list=PL6duHAYny9nNuTHN2EKNfuhmmRKCJD7os&index=5")

> * `coll = conn.get_collection()`: Returns the default collection object.
> * `coll.get_videos()`: Returns a list of all the videos in a collection.
> * `coll.get_video(video_id)`: Returns the Video object for the given `video_id`.
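
The three calls listed above can be combined as in the sketch below. The helper names `summarize_collection` and `find_video` are hypothetical, and the `name` attribute on a video is an assumption (read defensively with `getattr`); only `get_collection()`, `get_videos()`, `get_video(video_id)`, and `video.id` come from this notebook.

```python
def summarize_collection(conn):
    """Return (id, name) pairs for every video in the default collection."""
    coll = conn.get_collection()  # default collection
    # `name` is assumed to exist on a video; fall back to None if it does not.
    return [(v.id, getattr(v, "name", None)) for v in coll.get_videos()]


def find_video(conn, video_id):
    """Fetch a single video from the default collection by its id."""
    return conn.get_collection().get_video(video_id)
```

With the connection created earlier, `summarize_collection(conn)` lists what has been uploaded so far, and `find_video(conn, video.id)` retrieves the same video again in a later session.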

### Index Spoken Words 

In [None]:
video.index_spoken_words()

### Index Scenes


In [None]:
from videodb import SceneExtractionType

index_id = video.index_scenes(extraction_type=SceneExtractionType.time_based, extraction_config={"time": 30}, model_name="gemini-1.5-pro")
scenes = video.get_scene_index(index_id)

# NOTE: replace this with your own scene collection id
scn_col_id = "tt30sf"

### SimpleRAG

In [None]:
from llama_index.core import VectorStoreIndex
from llama_index.readers.videodb import VideoDBReader

import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)


reader = VideoDBReader(video_ids=[video.id], base_url="https://api.videodb.io")

video_documents = reader.load_data(video_id=video.id, load_scene=True, load_transcript=True, scn_col_id=scn_col_id)

In [None]:
from llama_index.core.node_parser import SentenceSplitter 

splitter = SentenceSplitter(
    chunk_size=500,
    chunk_overlap=20,
)
video_nodes = splitter.get_nodes_from_documents(video_documents)

In [None]:
reader.post_process_transcript_nodes(video_nodes, video_documents)

In [None]:
# Inspect the nodes produced by the splitter above
for node in video_nodes:
    print(node.metadata)
    print(node.text)

In [None]:
# Further pipeline
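
One possible next step, given that `VectorStoreIndex` is imported above but not yet used, is to index the nodes and query them. This is only a sketch under that assumption, not the notebook's final pipeline. `build_query_engine` and its `index_cls` parameter are hypothetical names; `index_cls` exists only so the helper can be tried without `llama-index` installed. With llama-index defaults, indexing and querying call OpenAI, which is why `OPENAI_API_KEY` must be set.

```python
def build_query_engine(nodes, index_cls=None):
    """Index the given nodes and return a query engine over them."""
    if index_cls is None:
        # Imported lazily; by default this is llama-index's vector index.
        from llama_index.core import VectorStoreIndex

        index_cls = VectorStoreIndex
    index = index_cls(nodes)  # embeds and stores the nodes
    return index.as_query_engine()
```

For example, `engine = build_query_engine(video_nodes)` followed by `print(engine.query("What is this video about?"))` would answer a question from the transcript and scene nodes; the question text is just a placeholder.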