<a href="https://colab.research.google.com/github/vkrisvasan/llamaKV/blob/main/llamaindexYouTubekv_naiveRAGwithnoTracking.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# This notebook loads a YouTube transcript about fine-tuning Llama,
# splits it into smaller chunks, and builds a question-answering system on top of it.
# It uses the LLM "llama-3.1-8b-instant" (served through Groq) to generate answers and
# the embedding model "sentence-transformers/all-MiniLM-L6-v2" to embed the chunks.
# The embeddings are held in a llama_index VectorStoreIndex, which is then
# persisted to a local directory and reloaded from it.
# LlamaIndex acts as the bridge between the transcript and the LLM:
# it ingests, indexes, and queries the data to answer user questions about the transcript.

# Install required packages
!pip install youtube-transcript-api llama-index-readers-youtube-transcript llama-index llama-index-llms-groq groq llama-index-embeddings-huggingface -q

In [None]:
# Import necessary modules from llama_index
from llama_index.core import (
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
    Settings,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.groq import Groq

In [None]:
# Import os and getpass for handling credentials
import os
import getpass
# Prompt for credentials if not found in environment variables
credential_names = ["GROQ_API_KEY"]
for credential in credential_names:
  if credential not in os.environ:
    os.environ[credential]=getpass.getpass("Provide your..." + credential)

Provide your...GROQ_API_KEY··········


In [None]:
from google.colab import userdata
HF_TOKEN = userdata.get('HF_TOKEN')

In [None]:
# Import YoutubeTranscriptReader and load data from YouTube links
from llama_index.readers.youtube_transcript import YoutubeTranscriptReader

loader = YoutubeTranscriptReader()
documents = loader.load_data(
    #ytlinks=["https://www.youtube.com/watch?v=Kbk9BiPhm7o"] #Elon Musk Nolan BCI NeuraLink Future of Humanity
    ytlinks=["https://www.youtube.com/watch?v=pK8u4QfdLx0"]  #Tutorial on fine-tuning Llama 3
)

In [None]:
# Initialize SentenceSplitter for text splitting
text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=200)
# Split documents into nodes
nodes = text_splitter.get_nodes_from_documents(documents, show_progress=True)

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

In [None]:
# Print information about loaded documents and nodes
print(f"Loaded {len(documents)} documents")
print(f"Split into {len(nodes)} nodes")
print(f"nodes [0] {nodes[0].metadata} ")

Loaded 1 documents
Split into 8 nodes
nodes [0] {'video_id': 'pK8u4QfdLx0'} 
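
To make `chunk_size` and `chunk_overlap` more concrete, here is a minimal sketch of a sliding-window splitter. It is a simplification under stated assumptions: it counts whitespace-separated words, whereas `SentenceSplitter` measures tokens and tries to respect sentence boundaries. `sliding_window_chunks` is a hypothetical helper written for illustration, not part of llama_index.

```python
def sliding_window_chunks(words, chunk_size, chunk_overlap):
    """Split a list of words into chunks of at most chunk_size words.

    Each chunk starts with the last chunk_overlap words of the previous chunk.
    Illustrative only: the real SentenceSplitter works on tokens and sentences.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the current window already reached the end of the text
    return chunks


# Toy input: ten placeholder words instead of a real transcript
words = [f"w{i}" for i in range(10)]
chunks = sliding_window_chunks(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

With a larger overlap, neighbouring chunks share more context, at the cost of producing more chunks to embed.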


In [None]:
# Configure LLM and embedding model settings

Settings.llm = Groq(model="llama-3.1-8b-instant",api_key=os.environ["GROQ_API_KEY"])

Settings.embed_model = HuggingFaceEmbedding(
    #model_name="BAAI/bge-small-en-v1.5"
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

In [None]:
# Create a VectorStoreIndex from the documents, chunking them with the SentenceSplitter defined above
vector_index = VectorStoreIndex.from_documents(documents, transformations=[text_splitter], show_progress=True)

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/8 [00:00<?, ?it/s]
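
Once each of the 8 chunks has an embedding, retrieval conceptually reduces to ranking the chunks by similarity to the query embedding. The sketch below shows that idea in plain Python with toy 3-dimensional vectors (the real all-MiniLM-L6-v2 embeddings have 384 dimensions). `cosine_similarity` and `top_k_by_cosine` are hypothetical helpers for illustration and are not how llama_index implements its vector store internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity to anything
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vec, chunk_vecs, k=2):
    """Return the k (chunk_id, score) pairs most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(chunk_id, score) for score, chunk_id in scored[:k]]


# Toy embeddings (made-up values, not produced by a real model)
toy_chunks = {
    "chunk-0": [1.0, 0.0, 0.0],
    "chunk-1": [0.0, 1.0, 0.0],
    "chunk-2": [0.7, 0.7, 0.0],
}
query_vec = [1.0, 0.1, 0.0]
for chunk_id, score in top_k_by_cosine(query_vec, toy_chunks, k=2):
    print(chunk_id, round(score, 3))
```

In LlamaIndex, the number of retrieved chunks can be adjusted with the `similarity_top_k` argument of `as_query_engine`.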

In [None]:
# Persist the index to disk, reload it from storage, and create a query engine
vector_index.storage_context.persist(persist_dir="./storage_mini")
storage_context = StorageContext.from_defaults(persist_dir="./storage_mini")

index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()
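
The cell above persists the index and immediately reloads it. On later runs, re-embedding could be skipped by loading from the persist directory when it already exists. Below is a minimal sketch of that pattern; the build and load steps are passed in as callables so the example runs without llama_index. `load_or_build`, `fake_build`, and `fake_load` are hypothetical names used only for this illustration.

```python
import os
import tempfile


def load_or_build(persist_dir, build, load):
    """Call load(persist_dir) if the directory has content, else build(persist_dir)."""
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        return load(persist_dir)
    return build(persist_dir)


# Stand-ins for the real steps; they only write and read a marker file
def fake_build(persist_dir):
    os.makedirs(persist_dir, exist_ok=True)
    with open(os.path.join(persist_dir, "index.txt"), "w") as f:
        f.write("toy index")
    return "built"


def fake_load(persist_dir):
    with open(os.path.join(persist_dir, "index.txt")) as f:
        f.read()
    return "loaded"


with tempfile.TemporaryDirectory() as tmp:
    demo_dir = os.path.join(tmp, "storage_demo")
    first = load_or_build(demo_dir, fake_build, fake_load)   # directory missing
    second = load_or_build(demo_dir, fake_build, fake_load)  # directory now populated
print(first, second)
```

In this notebook, `build` could wrap `VectorStoreIndex.from_documents(...)` followed by `storage_context.persist(persist_dir=...)`, and `load` could wrap `load_index_from_storage(StorageContext.from_defaults(persist_dir=...))`.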

In [None]:
# Perform queries and print responses
query = "summarise the document"
resp = query_engine.query(query)
print(resp)

The document discusses fine-tuning language models (LLMs) to create engaging content and improve their performance on specific tasks. It mentions the importance of tailoring content generation to a particular audience or domain, and how fine-tuning can make a model's performance better. The author also shares their experience with implementing fine-tuning on the Llama free model using Google Colab, a free software that allows users to write and execute code in cells. They provide a step-by-step guide on how to prepare and load a range of quantized language models, including the new 15 trillion parameter model, and how to format data sets to train the model. The author emphasizes that fine-tuning is not limited to machine learning experts and encourages users to follow along and experiment with the process.
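
To check which chunks an answer was based on, the response object can be inspected for its source nodes. The helper below is a sketch under assumptions: it relies on the attribute names `source_nodes`, `score`, `node.metadata`, and `node.get_content()` as I understand the LlamaIndex response API, so verify them against the installed version. Stub objects with a made-up score stand in for a real response so the example runs without an API key; `summarize_sources` is a hypothetical helper.

```python
from types import SimpleNamespace


def summarize_sources(response, preview_chars=60):
    """Return one summary line per retrieved chunk: score, metadata, text preview."""
    lines = []
    for item in response.source_nodes:
        # Collapse newlines and repeated spaces so the preview stays on one line
        text = " ".join(item.node.get_content().split())
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(
            f"score={score} meta={item.node.metadata} text={text[:preview_chars]!r}"
        )
    return lines


# Stub response mimicking the assumed shape of a real one (score is invented)
stub_node = SimpleNamespace(
    metadata={"video_id": "pK8u4QfdLx0"},
    get_content=lambda: "fine-tuning is adapting\na pre-trained llm",
)
stub_response = SimpleNamespace(
    source_nodes=[SimpleNamespace(node=stub_node, score=0.5)]
)
for line in summarize_sources(stub_response):
    print(line)
```

With a real query, `summarize_sources(resp)` would be called on the object returned by `query_engine.query(...)`.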


In [None]:
query = "Generate 5 difficult quiz questions with answer from the document "
resp = query_engine.query(query)
print(resp)

Here are 5 difficult quiz questions with answers based on the provided context:

1. What is the name of the Google software being used in the video, which splits the code into cells and is similar to a Jupyter Notebook?

Answer: Google Collab.

2. What is the name of the framework being used for fine-tuning the model, which allows for efficient updating of a fraction of the parameters and enhances training speed?

Answer: Laura.

3. What is the name of the dataset being used for data preparation, which has 50,000 rows and is loaded in VS Code?

Answer: Alpaka dataset from YMA.

4. What is the name of the model being used for fine-tuning, which is trained on 15 trillion tokens and is optimized for efficiency with 4-bit quantization?

Answer: Llama 3 8B.

5. What is the name of the feature in Google Collab that allows users to access a personalized AI strategy and future-proof themselves and their business?

Answer: Personalized AI strategy (available to users who join the community duri

In [None]:
query = "Generate 5 hypothetical questions and answers from the document assuming that the scenario is applied to solve a retail business context"
resp = query_engine.query(query)
print(resp)

Here are 5 hypothetical questions and answers from the document assuming that the scenario is applied to solve a retail business context:

Q: How can we use fine-tuning to improve the performance of our language model in generating product descriptions for our e-commerce website?
A: By fine-tuning our language model on a dataset of product descriptions from our specific retail domain, we can create a model that is tailored to our brand's tone and style, resulting in more engaging and effective product descriptions.

Q: What are the benefits of using a 15-trillion token language model in a retail context, such as product recommendation or customer service chatbots?
A: The 15-trillion token language model can provide more accurate and relevant product recommendations, as well as more effective customer service responses, due to its larger training dataset and optimized efficiency.

Q: How can we use a UI-based system, such as GPT-4 or another open-source model, to integrate our fine-tune

In [None]:
query = "who are speakers"
resp = query_engine.query(query)
print(resp)

The speaker in this video appears to be a machine learning expert or enthusiast who is explaining how to fine-tune a language model, specifically Llama 3, for a specific use case. They seem to be knowledgeable about the topic but also acknowledge their own limitations and encourage viewers to follow along and learn from the demonstration.
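
The query cells in this notebook repeat the same three lines with a different question each time. A small loop can run a list of questions in one go. This is a sketch: `run_queries` and `EchoEngine` are hypothetical names, and the stub engine only echoes the question so the example runs without calling the LLM.

```python
def run_queries(engine, queries):
    """Run each question through engine.query and collect the answers as strings."""
    results = {}
    for question in queries:
        cleaned = question.strip()  # drop stray leading/trailing whitespace
        results[cleaned] = str(engine.query(cleaned))
    return results


class EchoEngine:
    """Stand-in for a real query engine; returns a canned string."""

    def query(self, text):
        return f"(stub answer for: {text})"


answers = run_queries(EchoEngine(), ["summarise the document", "who are speakers "])
for question, answer in answers.items():
    print(f"Q: {question}\nA: {answer}\n")
```

Passing the notebook's `query_engine` instead of `EchoEngine()` would send each question to the real index.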


In [None]:
query = "explain in detail about core of the finetuning"
resp = query_engine.query(query)
print(resp)

At the core of fine-tuning lies the concept of adapting a pre-trained language model to a specific task or domain by adjusting a small portion of its parameters. This process leverages the power of the pre-trained model, which has been trained on a massive dataset, to achieve improved performance on a smaller, more focused dataset.

The pre-trained model's weights are updated incrementally using optimization algorithms, such as gradient descent, based on the new dataset. This incremental update process allows the model to learn from the new data and adjust its parameters to better fit the specific task or domain.

The key to fine-tuning lies in the fact that only a small portion of the model's parameters are adjusted, rather than retraining the entire model from scratch. This approach is more data-efficient and cost-effective, as it leverages the pre-trained model's knowledge and avoids the need for extensive retraining.

The core of fine-tuning can be broken down into several key comp

In [None]:
query = "explain top 10 impacts of finetuning - cover both 6 positive impacts and 4 negative impacts"
resp = query_engine.query(query)
print(resp)

Fine-tuning can have a significant impact on the performance and usability of a language model. Here are the top 10 impacts of fine-tuning, covering both positive and negative effects:

**Positive Impacts:**

1. **Improved Accuracy**: Fine-tuning allows a language model to adapt to a specific task or domain, leading to improved accuracy and relevance of its outputs.
2. **Enhanced Performance**: By adjusting a small portion of the model's parameters, fine-tuning can significantly enhance the model's performance on a specific task, often reaching 10 times better results.
3. **Cost-Effectiveness**: Fine-tuning leverages the power of pre-trained language models, which can cost tens of millions of dollars to train, at a fraction of the cost, often just a few cents or dollars.
4. **Data Efficiency**: Fine-tuning can achieve excellent results even with smaller data sets, making it an attractive option for those with limited data.
5. **Customization**: Fine-tuning allows for customization of t

In [None]:
query = "explain llama 3 8b instruct model"
resp = query_engine.query(query)
print(resp)

Llama 3 8B is a type of pre-trained language model that has been optimized for efficiency with a specific quantization technique, allowing it to be more compact and faster to run. This model is based on a transformer architecture, which is a popular choice for natural language processing tasks.

The "8B" in Llama 3 8B refers to the number of parameters in the model, which is approximately 8 billion. This is a large number of parameters, indicating that the model has a high degree of complexity and is capable of learning a wide range of patterns and relationships in language.

The instruction model aspect of Llama 3 8B suggests that it is designed to perform a specific set of tasks, such as answering questions, generating text, or completing tasks based on a set of instructions. This is in contrast to other types of language models that may be more general-purpose or focused on specific domains.

Overall, Llama 3 8B is a powerful and efficient language model that is well-suited for a wi
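
The loader returns the transcript as short, caption-length lines. To read it as a continuous paragraph instead, the text can be reflowed with the standard library. This is a minimal sketch; `reflow_transcript` is a hypothetical helper, and the sample string is the opening of the transcript used in this notebook.

```python
import textwrap


def reflow_transcript(text, width=80):
    """Join caption-style lines into one paragraph wrapped at the given width."""
    single_line = " ".join(text.split())  # collapse newlines and extra spaces
    return textwrap.fill(single_line, width=width)


sample = (
    "my name is David Andre and in this video\n"
    "I'll teach you how to fine tune llama\n"
    "free so that it performs 10 times better"
)
paragraph = reflow_transcript(sample, width=60)
print(paragraph)
```

Applied to the loaded data, this would be `print(reflow_transcript(doc.text))` for each `doc` in `documents`.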

In [None]:
# Print the raw transcript text of each loaded document

for doc in documents:
  print(doc.text)


my name is David Andre and in this video
I'll teach you how to fine tune llama
free so that it performs 10 times better
for your specific use case let's start
with what even is fine tuning and I made
this explanation in plain English so
that anybody can understand fine-tuning
is adapting a pre-trained llm like gbd4
or in this case Lama 3 to a specific
task or domain it involves adjusting a
small portion of the parameters on a
more focused data set so you know when a
new model releases what everybody needs
to know is how many parameters it has we
have llama 3 8B and always that number
like 8B or 70b that's the number of
parameters so we're adjusting just a
small number of them to make it more
focused on a specific thing fine tuning
customizes the outputs to be more
relevant and accurate for your use case
here's the power of fine tuning cost
Effectiveness it leverages the power of
pre-trained llms which cost tens of
millions of dollar if not hundreds of
millions to train and we can just 