<a href="https://colab.research.google.com/github/vkrisvasan/llamaKV/blob/main/llamaindexYouTubekv1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# This notebook loads a YouTube transcript about Neuralink, splits it into smaller
# chunks, and uses them to build a question-answering system over the transcript.
# It uses the large language model "llama-3.1-8b-instant" (served by Groq) to generate
# answers and the embedding model "sentence-transformers/all-MiniLM-L6-v2" to embed the chunks.
# LlamaIndex acts as the bridge between the transcript and the LLM: it ingests, indexes,
# and queries the data. A VectorStoreIndex holds the chunk embeddings and is saved,
# together with its associated data, to a local directory, and the LLM answers user
# queries about the transcript from the retrieved chunks.

# Install required packages
!pip install youtube-transcript-api
!pip install llama-index-readers-youtube-transcript
!pip install llama-index llama-index-llms-groq groq llama-index-embeddings-huggingface

Collecting youtube-transcript-api
  Downloading youtube_transcript_api-0.6.2-py3-none-any.whl.metadata (15 kB)
Downloading youtube_transcript_api-0.6.2-py3-none-any.whl (24 kB)
Installing collected packages: youtube-transcript-api
Successfully installed youtube-transcript-api-0.6.2
Collecting llama-index-readers-youtube-transcript
  Downloading llama_index_readers_youtube_transcript-0.1.4-py3-none-any.whl.metadata (2.0 kB)
Collecting llama-index-core<0.11.0,>=0.10.1 (from llama-index-readers-youtube-transcript)
  Downloading llama_index_core-0.10.62-py3-none-any.whl.metadata (2.4 kB)
Collecting dataclasses-json (from llama-index-core<0.11.0,>=0.10.1->llama-index-readers-youtube-transcript)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting deprecated>=1.2.9.3 (from llama-index-core<0.11.0,>=0.10.1->llama-index-readers-youtube-transcript)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl.metadata (5.4 kB)
Collecting dirtyjson<2.0.0,>=1.0.8 (from llama-i

In [2]:
# Import necessary modules from llama_index
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
    Settings
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.groq import Groq

In [18]:
# Import os and getpass for handling credentials
import os
import getpass

# Prompt for any credential that is not already set as an environment variable
credential_names = ["GROQ_API_KEY"]
for credential in credential_names:
    if credential not in os.environ:
        os.environ[credential] = getpass.getpass(f"Enter your {credential}: ")

In [4]:
# Read the Hugging Face token from Colab's Secrets panel (works only inside Colab)
from google.colab import userdata
HF_TOKEN = userdata.get('HF_TOKEN')

In [5]:
# Import YoutubeTranscriptReader and load data from YouTube links
from llama_index.readers.youtube_transcript import YoutubeTranscriptReader

loader = YoutubeTranscriptReader()
documents = loader.load_data(
    #ytlinks=["https://www.youtube.com/watch?v=i3OYlaoj-BM"]
    #ytlinks=["https://www.youtube.com/watch?v=rzD5i-6eUO0&t"]
    ytlinks=["https://www.youtube.com/watch?v=Kbk9BiPhm7o"]  # Elon Musk, Nolan: BCI, Neuralink and the future of humanity
)
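
The reader stores each video's ID in the document metadata (the `nodes[0].metadata` printout further down shows `{'video_id': 'Kbk9BiPhm7o'}`). If you want to validate or deduplicate links before calling `load_data`, a small standalone helper can pull the ID out of a URL. This is a sketch using only the standard library, not the reader's own parsing code; `youtube_video_id` is a hypothetical name, and only the `watch?v=` and `youtu.be/` link shapes are handled.

```python
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> str:
    """Extract the video ID from a youtube.com/watch or youtu.be link (hypothetical helper)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host in ("youtu.be", "www.youtu.be"):
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/")
    if host.endswith("youtube.com") and parsed.path == "/watch":
        # Standard links carry the ID in the "v" query parameter; other
        # parameters (such as the bare "&t" in one of the links above) are ignored.
        ids = parse_qs(parsed.query).get("v")
        if ids:
            return ids[0]
    raise ValueError(f"unrecognised YouTube URL: {url}")


links = [
    "https://www.youtube.com/watch?v=i3OYlaoj-BM",
    "https://www.youtube.com/watch?v=rzD5i-6eUO0&t",
    "https://www.youtube.com/watch?v=Kbk9BiPhm7o",
]
print([youtube_video_id(link) for link in links])
# ['i3OYlaoj-BM', 'rzD5i-6eUO0', 'Kbk9BiPhm7o']
```

Raising on unrecognised links, rather than returning an empty string, makes a bad URL fail before any network call is made.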

In [6]:
# Initialize SentenceSplitter for text splitting
text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=200)
# Split documents into nodes
nodes = text_splitter.get_nodes_from_documents(documents, show_progress=True)

In [7]:
# Print information about loaded documents and nodes
print(f"Loaded {len(documents)} documents")
print(f"Split into {len(nodes)} nodes")
print(f"nodes [0] {nodes[0].metadata} ")

Loaded 1 documents
Split into 140 nodes
nodes [0] {'video_id': 'Kbk9BiPhm7o'} 
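
`SentenceSplitter(chunk_size=1024, chunk_overlap=200)` produced 140 nodes from a single transcript. How the two parameters interact can be sketched with a simplified sliding window: each new chunk starts `chunk_size - chunk_overlap` units after the previous one, so neighbouring chunks share `chunk_overlap` units of text. This is a simplification under stated assumptions: the real splitter counts tokens with a tokenizer and tries to break on sentence boundaries, whereas this sketch counts whitespace-separated words and cuts anywhere. `sliding_window_chunks` is a hypothetical helper.

```python
from typing import List


def sliding_window_chunks(words: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split `words` into windows of `chunk_size` that overlap by `chunk_overlap` words.

    Simplified stand-in for SentenceSplitter: counts words, not tokens,
    and ignores sentence boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if not words:
        return []
    step = chunk_size - chunk_overlap  # distance between consecutive chunk starts
    chunks = []
    start = 0
    while True:
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
        start += step
    return chunks


words = [f"w{i}" for i in range(10)]
chunks = sliding_window_chunks(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
# ['w0', 'w1', 'w2', 'w3']
# ['w3', 'w4', 'w5', 'w6']
# ['w6', 'w7', 'w8', 'w9']
```

With this simplified window, N units of text larger than `chunk_size` yield ceil((N - chunk_overlap) / (chunk_size - chunk_overlap)) chunks, here ceil(9 / 3) = 3. A larger overlap therefore means more chunks (and more embeddings to compute) for the same text.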


In [8]:
# Configure LLM and embedding model settings

Settings.llm = Groq(model="llama-3.1-8b-instant", api_key=os.environ["GROQ_API_KEY"])

Settings.embed_model = HuggingFaceEmbedding(
    #model_name="BAAI/bge-small-en-v1.5"
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

In [9]:
# Build the VectorStoreIndex directly from the nodes created by the SentenceSplitter above
vector_index = VectorStoreIndex(nodes, show_progress=True)

In [10]:
# Persist the index to ./storage_mini, reload it from disk, and create a query engine
vector_index.storage_context.persist(persist_dir="./storage_mini")
storage_context = StorageContext.from_defaults(persist_dir="./storage_mini")

index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()
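
Each `query_engine.query(...)` call below embeds the question with `Settings.embed_model`, retrieves the stored nodes whose embeddings are most similar to it, and hands those nodes plus the question to `Settings.llm`. The retrieval step can be sketched with plain cosine similarity. The three-dimensional vectors below are made up for illustration (all-MiniLM-L6-v2 actually produces 384-dimensional embeddings), `retrieve_top_k` is a hypothetical helper rather than LlamaIndex's retriever, and `k=2` mirrors what we understand to be LlamaIndex's default `similarity_top_k`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_vec: Sequence[float], chunk_vecs: List[Sequence[float]], k: int = 2
) -> List[Tuple[int, float]]:
    """Return (chunk index, score) for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Made-up 3-D "embeddings" for four chunks and one query.
chunk_vecs = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [1.0, 0.2, 0.0]

for index, score in retrieve_top_k(query_vec, chunk_vecs, k=2):
    print(f"chunk {index}: similarity {score:.3f}")
```

This prints chunks 0 and 2, in that order; chunks 1 and 3 never reach the LLM. In the notebook the same limit applies to the 140 transcript nodes, so broad questions see only a small slice. Passing a larger value, e.g. `index.as_query_engine(similarity_top_k=5)`, widens the slice at the cost of a longer prompt.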

In [11]:
# Perform queries and print responses
query = "summarise the document"
resp = query_engine.query(query)
print(resp)

The speaker's interest in the human brain and its workings stems from their introspective nature as a child. They believe that understanding how the brain encodes information, generates desires, and produces suffering could lead to better solutions for humanity's problems. This idea is explored through the lens of primatology, where the speaker studies chimpanzees and bonobos to gain insights into human behavior. The speaker also draws parallels between human behavior and the natural world, suggesting that many of life's struggles are driven by basic desires such as companionship, sex, and power.

The conversation also touches on the importance of asking the right questions in the search for meaning in life. The speaker suggests that increasing the diversity of people asking questions and seeking consciousness can lead to a better understanding of human existence. The value of asking questions is highlighted through the example of communicating with someone who cannot speak, where the 
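
The summary above centres on a few themes (the speaker's interest in the brain, primatology, the meaning of life) rather than the whole multi-guest conversation. A likely reason is that the vector query engine answers from only the few retrieved chunks, not all 140. A whole-document summary needs every chunk to contribute; one common approach is hierarchical ("tree") summarization: summarize each chunk, then repeatedly summarize groups of summaries until one remains. The sketch below only illustrates that idea, with a deterministic stand-in (`keep_part_labels`, hypothetical) in place of the LLM so the data flow is visible; `tree_summarize` and `fan_in` are likewise hypothetical names.

```python
from typing import Callable, List


def tree_summarize(chunks: List[str], summarize: Callable[[str], str], fan_in: int = 3) -> str:
    """Summarise every chunk, then merge groups of `fan_in` summaries until one remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not chunks:
        return ""
    level = [summarize(chunk) for chunk in chunks]  # one summary per chunk
    while len(level) > 1:
        # Summarise each group of neighbouring summaries into one
        level = [
            summarize("\n".join(level[i:i + fan_in]))
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


calls: List[str] = []


def keep_part_labels(text: str) -> str:
    """Stand-in for an LLM call: keeps only the "partN" labels so we can trace them."""
    calls.append(text)
    return " ".join(word for word in text.split() if word.startswith("part"))


chunks = [f"part{i} some transcript text" for i in range(7)]
print(tree_summarize(chunks, keep_part_labels))  # part0 part1 part2 part3 part4 part5 part6
print(len(calls))  # 11: 7 chunk summaries + 3 group summaries + 1 final summary
```

Every chunk's label survives to the final result, which is the property a top-k query cannot offer. In LlamaIndex the corresponding route would be along the lines of `SummaryIndex(nodes).as_query_engine(response_mode="tree_summarize")`, which sends every node to the LLM and so costs many more calls than a top-k query.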

In [12]:
query = "Generate 5 difficult quiz questions with answer from the document "
resp = query_engine.query(query)
print(resp)

Here are 5 difficult quiz questions with answers based on the provided context:

1. What is the name of the TV show that the speaker, Bliss, mentions as an example of a discussion about the meaning of life?

Answer: The West Wing

2. According to Nolan, what was the likely cause of his diving accident that left him paralyzed?

Answer: A stray fist, elbow, knee, or foot to the side of his head.

3. What is the name of the drink that Elon Musk mentions as having a lot of caffeine, which is supposed to keep him up till the next day?

Answer: Nitro

4. According to Elon Musk, what is the current number of electrodes that are providing signals in Nolan's neuralink implant?

Answer: 400

5. What is the estimated rate at which the neuralink team hopes to achieve regulatory approvals for human participants, with a goal of 10 participants by the end of the year?

Answer: The rate of regulatory approvals is not explicitly stated, but Elon Musk mentions that it will depend on the regulatory appro
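
The quiz comes back as free text whose layout is chosen by the LLM, so it can change between runs (the last answer above is also cut off). To reuse the questions, for example to check each answer against the transcript, a lenient parser can turn the numbered `1. ...` / `Answer: ...` layout into pairs. `parse_quiz` is a hypothetical helper that assumes exactly this layout; the sample text is copied from the response above.

```python
import re
from typing import List, Tuple


def parse_quiz(text: str) -> List[Tuple[str, str]]:
    """Pair each numbered question ("1. ...") with the next "Answer: ..." line."""
    pairs = []
    question = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        numbered = re.match(r"\d+\.\s+(.*)", line)
        if numbered:
            question = numbered.group(1)
        elif line.startswith("Answer:") and question is not None:
            pairs.append((question, line[len("Answer:"):].strip()))
            question = None  # wait for the next numbered question
    return pairs


# Excerpt copied from the quiz response above.
sample = """3. What is the name of the drink that Elon Musk mentions as having a lot of caffeine, which is supposed to keep him up till the next day?

Answer: Nitro

4. According to Elon Musk, what is the current number of electrodes that are providing signals in Nolan's neuralink implant?

Answer: 400
"""

for question, answer in parse_quiz(sample):
    print(f"Q: {question}\nA: {answer}\n")
```

A question with no following `Answer:` line (as with a truncated response) is simply dropped rather than paired with the wrong answer.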

In [13]:
query = "Generate 5 hypothetical questions and answers from the document assuming that the scenario is applied to solve a retail business context"
resp = query_engine.query(query)
print(resp)

Here are 5 hypothetical questions and answers from the document, assuming the scenario is applied to solve a retail business context:

**Q1:** How can we ensure that our retail customers with mobility impairments can interact with our digital signage and kiosks in a seamless way?

**A1:** We can implement a similar BCI system to enable customers to control digital devices using their minds, allowing them to navigate and interact with our retail technology in a more accessible and autonomous way.

**Q2:** What are the key factors to consider when designing a retail experience for customers with mobility impairments?

**A2:** We should focus on creating a user-centered design that prioritizes accessibility and digital autonomy, taking into account the unique challenges and needs of customers with mobility impairments, such as the ability to interact with digital devices using their minds.

**Q3:** How can we measure the success of our retail technology in terms of accessibility and user 

In [14]:
query = "who is lexfridman"
resp = query_engine.query(query)
print(resp)

Lex Fridman is the host of the podcast that features a conversation with Elon Musk, DJ, Matthew, Bliss, and Nolan.


In [15]:
query = "explain in detail about the ai model that is used in neuralink"
resp = query_engine.query(query)
print(resp)

The AI model used in Neuralink is a machine learning model that decodes neural signals from the brain into a set of outputs that can be used to control a device, such as a cursor. This model is referred to as the "neuralink application" or B1 app.

The model is designed to run on an external device, such as a phone or computer, and receives data from the Neuralink implant via Bluetooth. The implant itself processes the neural signals and sends only the interesting data, such as the occurrence of a spike, to the external device.

The AI model uses this data to decode the intended cursor movement, allowing the user to control the cursor with their thoughts. The model is trained to recognize patterns in the neural signals and map them to specific actions, such as moving the cursor up, down, left, or right.

The model is described as a "very simple machine learning model," suggesting that it is a relatively basic implementation of a neural network. However, the model is still capable of ac
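
The response describes the decoder only as a simple machine-learning model that maps neural signals to cursor movement; the exact model is not specified here. As a generic illustration (explicitly not Neuralink's model), one of the simplest decoders of this kind is a linear map from per-channel spike counts to 2-D cursor velocity, fitted by least squares. All data below is synthetic and noise-free, and the channel count, bin count, and Poisson rate are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_bins, n_channels = 500, 16

# Synthetic spike counts: one row per time bin, one column per recording channel.
spikes = rng.poisson(lam=3.0, size=(n_bins, n_channels)).astype(float)

# Hypothetical "true" linear relationship between channels and cursor velocity (vx, vy).
true_weights = rng.normal(size=(n_channels, 2))
velocity = spikes @ true_weights  # noise-free targets, for illustration only

# Fit the decoder: find weights minimising ||spikes @ weights - velocity||^2.
weights, _, _, _ = np.linalg.lstsq(spikes, velocity, rcond=None)

# Decode a new time bin into a 2-D cursor velocity.
new_bin = rng.poisson(lam=3.0, size=(1, n_channels)).astype(float)
print("decoded (vx, vy):", new_bin @ weights)
```

Because the synthetic targets are noise-free, the fitted weights recover the "true" ones; with noisy real recordings the fit would only approximate the underlying mapping.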

In [16]:
query = "explain top 10 impacts of neuralink - cover both 6 positive impacts and 4 negative impacts"
resp = query_engine.query(query)
print(resp)

Here are the top 10 impacts of Neuralink, covering both positive and negative aspects:

**Positive Impacts:**

1. **Revolutionizing Brain-Computer Interfaces (BCIs)**: Neuralink's technology has the potential to revolutionize the way humans interact with computers, enabling people to control devices with their minds.
2. **Treatment of Paralysis and Neurological Disorders**: Neuralink's implantable brain-machine interface (BMI) could potentially treat paralysis, depression, anxiety, and other neurological disorders by allowing people to control devices with their thoughts.
3. **Enhanced Cognition and Memory**: Neuralink's technology could potentially enhance human cognition and memory by allowing people to upload and download information directly to their brains.
4. **Improved Communication**: Neuralink's BMI could potentially enable people to communicate more effectively with each other, especially for those with speech or hearing impairments.
5. **Advancements in Neuroscience**: Neura

In [17]:
# Print the full transcript text of each loaded document

for doc in documents:
  print(doc.text)


the following is a conversation with
Elon Musk DJ sa Matthew McDougall Bliss
Chapman and Nolan arbaugh about
neuralink and the future of
humanity Elon DJ Matthew and Bliss are
of course part of the amazing neuralink
team and Nolan is the first human to
have a neuralink device implanted in his
brain I speak with each of them
individually so use time stamps to jump
around or as I recommend go hardcore and
listen to the whole thing this is the
longest podcast I've ever done it's a
fascinating super technical and wide-
ranging conversation and I loved every
minute of it and now dear friends here's
Elon Musk his fifth time on this The Lex
fredman
podcast drinking coffee or water
water I'm so over caffeinated right now
do you want some caffeine I mean sure
there's a there a Nitro
drink this supposed to keep you up till
like you know tomorrow afternoon
basically yeah I don't so what does
Nitro it's just got a lot of caffeine or
something don't ask questions it's
called Nitro do you need to kn
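
The loop above prints the transcript with the caption line breaks intact, so the output is not really in paragraph format. The auto-generated captions carry no punctuation, so a realistic target is one reflowed block: collapse the line breaks, then wrap to a fixed width. `as_paragraph` is a hypothetical helper built on the standard `textwrap` module; the sample is the opening of the transcript shown above.

```python
import textwrap


def as_paragraph(text: str, width: int = 88) -> str:
    """Join caption lines into one paragraph and wrap it to `width` columns."""
    # str.split() with no argument splits on any run of whitespace, including newlines
    return textwrap.fill(" ".join(text.split()), width=width)


sample = """the following is a conversation with
Elon Musk DJ sa Matthew McDougall Bliss
Chapman and Nolan arbaugh about
neuralink and the future of
humanity"""
print(as_paragraph(sample, width=60))

# In the notebook, each loaded document could be printed the same way:
# for doc in documents:
#     print(as_paragraph(doc.text))
```

Splitting the text into genuine paragraphs would need sentence boundaries, which these captions do not provide.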