<a href="https://colab.research.google.com/github/towardsai/ai-tutor-rag-system/blob/main/notebooks/12-Improve_Query.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install Packages and Setup Variables

In [None]:
!pip install -q llama-index==0.10.5 llama-index-vector-stores-chroma==0.1.7 openai==1.12.0 llama-index-finetuning llama-index-embeddings-huggingface llama-index-readers-web tiktoken==0.6.0 chromadb==0.4.22 pandas==2.2.0 html2text sentence_transformers pydantic kaleido==0.2.1

In [None]:
# Allows running asyncio in environments with an existing event loop, like Jupyter notebooks.

import nest_asyncio

nest_asyncio.apply()

In [None]:
import os

# Set "OPENAI_API_KEY" in the environment; the OpenAI client reads it later. Replace the placeholder with your own key.
os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY"
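
Hard-coding a key in a notebook makes it easy to leak by accident. Here is a minimal sketch, assuming you would rather read the key from the environment and prompt only when it is missing. `resolve_api_key` and `PLACEHOLDER` are hypothetical names for illustration, not part of the OpenAI or LlamaIndex API:

```python
import os
from getpass import getpass

PLACEHOLDER = "OPENAI_API_KEY"  # the dummy value used in the cell above


def resolve_api_key(env, prompt=getpass):
    """Return a usable key from `env`, prompting only if none is set.

    `env` is any dict-like mapping (os.environ in a notebook).
    """
    key = env.get("OPENAI_API_KEY", "")
    if not key or key == PLACEHOLDER:
        # Fall back to an interactive prompt so the key never lands in the notebook.
        key = prompt("OpenAI API key: ")
    if not key:
        raise ValueError("No OpenAI API key provided")
    return key
```

In a notebook this could be used as `os.environ["OPENAI_API_KEY"] = resolve_api_key(os.environ)`.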

# Load a Model

In [None]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(temperature=0.9, model="gpt-3.5-turbo", max_tokens=512)

# Load the Dataset (CSV)

## Download

The dataset includes several articles from the Towards AI blog that explain the LLaMA2 model in depth. Read the dataset as a list of rows, one per article.

In [None]:
!curl -o ./mini-llama-articles.csv https://raw.githubusercontent.com/AlaFalaki/tutorial_notebooks/main/data/mini-llama-articles.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  169k  100  169k    0     0   362k      0 --:--:-- --:--:-- --:--:--  362k


In [None]:
import csv

rows = []

# Load the CSV file and collect its rows.
with open("./mini-llama-articles.csv", mode="r", encoding="utf-8") as file:
  csv_reader = csv.reader(file)

  for idx, row in enumerate(csv_reader):
    if idx == 0:
      continue  # Skip header row
    rows.append(row)

# The number of rows (articles) in the dataset.
len(rows)

14

In [None]:
from llama_index.core import Document

# Convert each row (article) to a Document object so the LlamaIndex framework can process it.
documents = [Document(text=row[1], metadata={"title": row[0], "url": row[2], "source_name": row[3]}) for row in rows]

# Create a Vector Store

In [None]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
# Create a client and a new collection.
# chromadb.PersistentClient saves data to disk at the given path
# (chromadb.EphemeralClient would keep it in memory instead).
chroma_client = chromadb.PersistentClient(path="./mini-llama-articles")
chroma_collection = chroma_client.create_collection("mini-llama-articles")
# Wrap the collection in a vector store object that LlamaIndex can use.
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)


# Transforming

In [None]:
from llama_index.core.text_splitter import TokenTextSplitter

text_splitter = TokenTextSplitter(
    separator=" ", chunk_size=512, chunk_overlap=128
)
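
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a minimal sketch of a sliding-window split. It is not `TokenTextSplitter`'s implementation (which counts tokens with a tokenizer and respects separators); `sliding_window_chunks` is a hypothetical helper, and whitespace-separated words stand in for real tokens:

```python
def sliding_window_chunks(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of `chunk_size` that share `chunk_overlap` tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


# Toy input: whitespace "tokens" stand in for real tokenizer output.
words = [f"w{i}" for i in range(10)]
chunks = sliding_window_chunks(words, chunk_size=4, chunk_overlap=2)
```

Each chunk starts `chunk_size - chunk_overlap` tokens after the previous one, so neighbouring chunks repeat the overlapping tokens. With the notebook's settings (512 and 128), each new chunk would begin 384 tokens after the previous one under this simplified scheme.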

In [None]:
from llama_index.core.extractors import (
    SummaryExtractor,
    QuestionsAnsweredExtractor,
    KeywordExtractor,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.ingestion import IngestionPipeline

pipeline = IngestionPipeline(
    transformations=[
        text_splitter,
        QuestionsAnsweredExtractor(questions=3, llm=llm),
        SummaryExtractor(summaries=["prev", "self"], llm=llm),
        KeywordExtractor(keywords=10, llm=llm),
        OpenAIEmbedding(),
    ],
    vector_store=vector_store
)

nodes = pipeline.run(documents=documents, show_progress=True);

Parsing nodes:   0%|          | 0/14 [00:00<?, ?it/s]

100%|██████████| 108/108 [00:48<00:00,  2.22it/s]
100%|██████████| 108/108 [00:55<00:00,  1.94it/s]
100%|██████████| 108/108 [00:23<00:00,  4.56it/s]


Generating embeddings:   0%|          | 0/108 [00:00<?, ?it/s]

In [None]:
len( nodes )

108

In [None]:
# Create your index
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex.from_vector_store(vector_store)

# Query Dataset

## Default

In [None]:
gpt3 = OpenAI(temperature=0, model="gpt-3.5-turbo")


In [None]:
# Define a query engine that retrieves the related pieces of text
# and uses an LLM to formulate the final answer.
query_engine = vector_index.as_query_engine(llm=gpt3)

res = query_engine.query("Provide highest parameter size for CodeLlama  model and WizardCoder model ?")
res.response

'CodeLlama model size is not provided in the given context. The WizardCoder model size is 34B.'

In [None]:
for src in res.source_nodes:
  print("Node ID\t", src.node_id)
  print("Title\t", src.metadata['title'])
  print("Text\t", src.text)
  print("Score\t", src.score)
  print("-_"*20)

Node ID	 c2d2ae9d-49b8-4493-bf08-dc4a22ed274e
Title	 WizardCoder: Why It's the Best Coding Model Out There
Text	 an LLM, resulting in a new model called WizardCoder. The fine-tuning process involves training the LLM on the instruction data to improve its ability to generate coherent and fluent text in response to various inputs.  Prompt Format For WizardCoder, the Prompt should be as follows:  Best Use Cases WizardCoder can be used for a variety of code-related tasks, including code generation, code completion, and code summarization. Here are some examples of input prompts that can be used with the model: Code generation: Given a description of a programming task, generate the corresponding code. Example input: "Write a Python function that takes a list of integers as input and returns the sum of all even numbers in the list."Code completion: Given an incomplete code snippet, complete the code. Example input: "def multiply(a, b): \n return a * b _"Code summarization: Given a long code

# Multi-Step Query Engine

## GPT-4

In [None]:
from llama_index.core import ServiceContext

gpt4 = OpenAI(temperature=0, model="gpt-4o")
service_context_gpt4 = ServiceContext.from_defaults(llm=gpt4)

In [None]:
from llama_index.core.indices.query.query_transform.base import StepDecomposeQueryTransform

step_decompose_transform_gpt4 = StepDecomposeQueryTransform(llm=gpt4, verbose=True)

In [None]:
from llama_index.core.query_engine.multistep_query_engine import MultiStepQueryEngine

query_engine_gpt4 = vector_index.as_query_engine(service_context=service_context_gpt4)
query_engine_gpt4 = MultiStepQueryEngine(
    query_engine=query_engine_gpt4,
    query_transform=step_decompose_transform_gpt4,
    index_summary="Used to answer questions about the LLaMA2 Model",
)

In [None]:
response_gpt4 = query_engine_gpt4.query("Provide the highest parameter sizes for CodeLlama model and WizardCoder model?")

> Current query: Provide the highest parameter sizes for CodeLlama model and WizardCoder model?
> New query: What is the highest parameter size for the LLaMA2 Model?
> Current query: Provide the highest parameter sizes for CodeLlama model and WizardCoder model?
> New query: What is the highest parameter size for the CodeLlama model?
> Current query: Provide the highest parameter sizes for CodeLlama model and WizardCoder model?
> New query: What is the highest parameter size for the WizardCoder model?

In [None]:
response_gpt4.response

'The highest parameter size for the CodeLlama model is 34B, and the highest parameter size for the WizardCoder model is also 34B.'

In [None]:
for src in response_gpt4.source_nodes:
  print("Node ID\t", src.node_id)
  print("Text\t", src.text)
  print("Score\t", src.score)
  print("-_"*20)

Node ID	 8cf4998a-67dc-4417-b71b-c646ce286ddf
Text	 
Question: What is the highest parameter size for the LLaMA2 Model?
Answer: The highest parameter size for the LLaMA2 Model is 7 billion parameters.
Score	 None
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
Node ID	 4ff1d598-d525-45ec-981f-44b5cd6c62b5
Text	 
Question: What is the highest parameter size for the CodeLlama model?
Answer: The highest parameter size for the Code Llama model is 34B.
Score	 None
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
Node ID	 c5bb381c-aea8-4d0e-92d7-8beff8c7c94b
Text	 
Question: What is the highest parameter size for the WizardCoder model?
Answer: The highest parameter size for the WizardCoder model is 34B.
Score	 None
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
Node ID	 6403bb55-08b5-4c4d-9397-ddebc637e186
Text	 a commodity GPU and then fine-tune them. There are two parts to this- Quantisation and Parameter Efficient Tuning. The real magic of this is that a laptop with a sufficient recent GPU (having Tensor Core

## GPT-3.5

In [None]:
from llama_index.core import ServiceContext
from llama_index.core.indices.query.query_transform.base import StepDecomposeQueryTransform
from llama_index.core.query_engine.multistep_query_engine import MultiStepQueryEngine

gpt3 = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context_gpt3 = ServiceContext.from_defaults(llm=gpt3)

step_decompose_transform_gpt3 = StepDecomposeQueryTransform(llm=gpt3, verbose=True)

query_engine_gpt3 = vector_index.as_query_engine(service_context=service_context_gpt3)
query_engine_gpt3 = MultiStepQueryEngine(
    query_engine=query_engine_gpt3,
    query_transform=step_decompose_transform_gpt3,
    index_summary="Used to answer questions about the LLaMA2 Model",
)

In [None]:
response_gpt3 = query_engine_gpt3.query("Provide the highest parameter sizes for CodeLlama model and WizardCoder model?")

> Current query: Provide the highest parameter sizes for CodeLlama model and WizardCoder model?
> New query: What are the specific parameter sizes for the CodeLlama model and WizardCoder model in the LLaMA2 Model?
> Current query: Provide the highest parameter sizes for CodeLlama model and WizardCoder model?
> New query: What are the parameter sizes for the WizardCoder model in the LLaMA2 Model?
> Current query: Provide the highest parameter sizes for CodeLlama model and WizardCoder model?
> New query: What are the specific parameter sizes for the WizardCoder model in the LLaMA2 Model?

In [None]:
response_gpt3.response

'The highest parameter size for the CodeLlama model is 34 billion (34B), and the highest parameter size for the WizardCoder model is also 34 billion (34B).'

# Subquestion Query Engine


In [None]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
query_engine_tools = [
    QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            name="LlamaIndex",
            description="Used to answer questions about the LLaMA2 Model",
        ),
    ),
]

sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    use_async=True,
)

In [None]:
response = sub_question_engine.query(
    "Provide the highest parameter sizes for CodeLlama model and WizardCoder model?"
)

Generated 2 sub questions.
[LlamaIndex] Q: What are the highest parameter sizes for the CodeLlama model?
[LlamaIndex] Q: What are the highest parameter sizes for the WizardCoder model?
[LlamaIndex] A: The highest parameter sizes for the WizardCoder model are 34B and 15B.
[LlamaIndex] A: The highest parameter sizes for the Code Llama model are 34B.

In [None]:
response.response

'The highest parameter size for the CodeLlama model is 34B, and for the WizardCoder model, it is 34B and 15B.'

# HyDE Transform

In [None]:
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine.transform_query_engine import TransformQueryEngine
query_engine = vector_index.as_query_engine()
hyde = HyDEQueryTransform(include_original=True)
hyde_query_engine = TransformQueryEngine(query_engine, hyde)

In [None]:
response = hyde_query_engine.query("How many parameters does CodeLLaMA model have?")

In [None]:
response.response

'Code Llama model has three different variants characterized by their parameter sizes of 7B, 13B, and 34B.'

In [None]:
for src in response.source_nodes:
  print("Node ID\t", src.node_id)
  print("Text\t", src.text)
  print("Score\t", src.score)
  print("-_"*20)

Node ID	 360a12a0-bdff-46f2-b46b-72d01b6983d7
Text	 Inside Code Llama The release of Code Llama does not include a single model but three different variants, characterized by their parameter sizes of 7B, 13B, and 34B. Each of these models has been trained on an extensive pool of 500B tokens encompassing code and code-related information. Notably, the 7B and 13B base and instruct models have been endowed with fill-in-the-middle (FIM) competence, empowering them to seamlessly insert code into existing code structures. This attribute equips them to handle tasks like code completion right from the outset.The trio of models caters to distinct requisites concerning serving and latency. For instance, the 7B model boasts the ability to operate on a single GPU. While the 34B model stands out for yielding optimal outcomes and elevating coding assistance, the smaller 7B and 13B versions excel in speed, making them fitting for low-latency tasks such as real-time code completion. Meta AI's innovati

In [None]:
query_bundle = hyde("How many parameters does CodeLLaMA model have?")

In [None]:
hyde_doc = query_bundle.embedding_strs[0]

In [None]:
hyde_doc

'The CodeLLaMA model has a total of 12 parameters. These parameters include the learning rate, batch size, number of layers, number of neurons in each layer, activation functions, dropout rate, optimizer, loss function, and metrics. Each parameter plays a crucial role in determining the performance and accuracy of the CodeLLaMA model in predicting code smells and maintaining code quality. By fine-tuning these parameters, developers can optimize the model to achieve the best results for their specific codebase and project requirements."'
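
The hypothetical document above is factually wrong: it lists training hyperparameters rather than model sizes. Retrieval still lands on the right node, plausibly because `include_original=True` keeps the original query among the strings that get embedded. The sketch below illustrates the idea with a toy bag-of-words "embedding" and a two-document corpus, both invented for illustration. It assumes the embeddings of the hypothetical document and the query are averaged into one search vector, which appears to be how LlamaIndex aggregates `embedding_strs`; treat that as an assumption and check the library source for your version.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding': lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def mean_vector(vectors):
    """Average several sparse vectors component-wise."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Invented example strings; none of them come from the dataset.
query = "how many parameters does code llama have"
hypothetical = "code llama is released in three sizes with 7b 13b and 34b parameters"
corpus = {
    "code_llama": "code llama comes in three variants with parameter sizes of 7b 13b and 34b",
    "wizardcoder": "wizardcoder is fine tuned on instruction data for code generation",
}

# Mimic include_original=True: embed the hypothetical answer and the query, then average.
hyde_vector = mean_vector([embed(hypothetical), embed(query)])
ranked = sorted(corpus, key=lambda name: cosine(hyde_vector, embed(corpus[name])), reverse=True)
```

In this toy setup the answer-shaped text shares more vocabulary with the relevant document than the short question does, which is the intuition behind HyDE. A real embedding model compares meaning rather than exact words, so the effect on actual data may differ.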