<a href="https://colab.research.google.com/github/vkrisvasan/llamaKV/blob/main/llamaindexYouTubekv1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [13]:
#This notebook loads a YouTube transcript about Neuralink,
#splits it into smaller chunks, and builds a question-answering system over it.
#It uses the large language model "llama-3.1-8b-instant" (via Groq) to generate answers and
#the embedding model "sentence-transformers/all-MiniLM-L6-v2" to embed the chunks.
#The embeddings are stored in a VectorStoreIndex from llama_index, and the index and
#its associated data are saved to a local directory.
#LlamaIndex acts as a bridge between the YouTube transcript and the LLM:
#it ingests, indexes and queries the data so that the LLM can
#respond to user queries about the transcript.

# Install required packages
!pip install youtube-transcript-api llama-index-readers-youtube-transcript llama-index llama-index-llms-groq groq llama-index-embeddings-huggingface -q

In [2]:
# Import necessary modules from llama_index
from llama_index.core import (
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
    Settings,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.groq import Groq

In [3]:
# Import os and getpass for handling credentials
import os
import getpass
# Prompt for credentials if not found in environment variables
credential_names = ["GROQ_API_KEY"]
for credential in credential_names:
  if credential not in os.environ:
    os.environ[credential]=getpass.getpass("Provide your..." + credential)

Provide your...GROQ_API_KEY··········


In [4]:
from google.colab import userdata
HF_TOKEN = userdata.get('HF_TOKEN')

In [5]:
# Import YoutubeTranscriptReader and load data from YouTube links
from llama_index.readers.youtube_transcript import YoutubeTranscriptReader

loader = YoutubeTranscriptReader()
documents = loader.load_data(
    ytlinks=["https://www.youtube.com/watch?v=Kbk9BiPhm7o"] #Elon Musk Nolan BCI NeuraLink Future of Humanity
)

In [6]:
# Initialize SentenceSplitter for text splitting
text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=200)
# Split documents into nodes
nodes = text_splitter.get_nodes_from_documents(documents, show_progress=True)

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]
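
The effect of chunk_size and chunk_overlap can be illustrated with a simplified, word-based sliding window. This is only a sketch: SentenceSplitter counts tokens and tries to respect sentence boundaries, which this toy version does not do. The function name chunk_words and the sample text are hypothetical and not part of llama_index.

```python
def chunk_words(text, chunk_size=8, chunk_overlap=2):
    """Split text into windows of chunk_size words; neighbours share chunk_overlap words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


# Hypothetical sample: 20 placeholder words w0 .. w19
sample = " ".join(f"w{i}" for i in range(20))
chunks = chunk_words(sample, chunk_size=8, chunk_overlap=2)
for i, chunk in enumerate(chunks):
    print(i, chunk)
```

Each chunk repeats the last two words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.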

In [7]:
# Print information about loaded documents and nodes
print(f"Loaded {len(documents)} documents")
print(f"Split into {len(nodes)} nodes")
print(f"nodes [0] {nodes[0].metadata} ")

Loaded 1 documents
Split into 140 nodes
nodes [0] {'video_id': 'Kbk9BiPhm7o'} 


In [8]:
# Configure LLM and embedding model settings

Settings.llm = Groq(model="llama-3.1-8b-instant",api_key=os.environ["GROQ_API_KEY"])

Settings.embed_model = HuggingFaceEmbedding(
    #model_name="BAAI/bge-small-en-v1.5"
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

In [9]:
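# Create VectorStoreIndex from documents, splitting them with the SentenceSplitter defined above
# (node_parser expects a parser, not a list of nodes; transformations is the supported argument)
vector_index = VectorStoreIndex.from_documents(documents, transformations=[text_splitter], show_progress=True)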

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/140 [00:00<?, ?it/s]

In [10]:
# Persist the index to disk, then reload it from storage and create a query engine
vector_index.storage_context.persist(persist_dir="./storage_mini")
storage_context = StorageContext.from_defaults(persist_dir="./storage_mini")

index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()

In [11]:
# Perform queries and print responses
query = "summarise the document"
resp = query_engine.query(query)
print(resp)

The speaker, who has a background of being a thoughtful and introspective person, has always been fascinated by the human brain and its workings. They believe that understanding how the brain encodes information, generates desires, and creates suffering could lead to better solutions for humanity's problems. They draw parallels between human behavior and that of non-human primates, suggesting that by studying their social structures and behaviors, we can gain insights into human nature.

The speaker also touches on the idea that humans are driven by basic desires such as companionship, sex, food, and power, and that understanding these drives can help us reduce the complexity of human behavior. They mention their own experiences, including a trip to the Amazon jungle, where they observed the primal nature of life and the importance of social status and mating.

The conversation also delves into the concept of asking the right questions, with the speaker suggesting that the meaning of h
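
Behind a query like the one above, a vector index typically embeds the question, ranks the stored nodes by similarity to it, and hands the best matches to the LLM as context. The retrieval step can be sketched with plain Python. The three-dimensional vectors and node names below are made-up stand-ins for real embeddings, and cosine similarity is assumed as the scoring function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, node_vecs, k=2):
    """Return the ids of the k nodes whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), node_id)
              for node_id, vec in node_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [node_id for _, node_id in scored[:k]]


# Hypothetical toy embeddings standing in for the transcript nodes
toy_nodes = {
    "node_a": [1.0, 0.0, 0.0],
    "node_b": [0.7, 0.7, 0.0],
    "node_c": [0.0, 0.0, 1.0],
}
toy_query = [0.9, 0.1, 0.0]

best = top_k(toy_query, toy_nodes, k=2)
print(best)
```

Only the retrieved nodes reach the LLM, which is one reason a broad request such as "summarise the document" may reflect a few passages rather than the whole transcript.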

In [16]:
# Simple question-answering loop over the document; type "exit" to quit.
# Each question is answered independently (no chat history is kept).
while True:
  query = input("Ask a question: ")
  if query.lower() == "exit":
    break
  response = query_engine.query(query)
  print(response)

Ask a question: give deep context about speakers
The speakers in this conversation seem to be individuals with a high level of expertise and interest in cutting-edge technology, particularly in the fields of brain-computer interfaces and neurotechnology. They appear to be familiar with various concepts and terminology related to these fields, such as neural signals, electrode placement, and calibration processes.

One of the speakers mentions having an RFID chip implanted in their body, which suggests that they are part of a community of individuals who are experimenting with biohacking and implanting microchips for various purposes, including storing private data and demonstrating technological capabilities.

The speakers also discuss the potential for brain-computer interfaces to revolutionize the way people interact with technology, comparing it to the development of the piano, which allowed people to create music in a way that was previously unimaginable. They express excitement ab