<a href="https://colab.research.google.com/github/vokativ/rag_demo_qb/blob/main/rag_demo/colab_notebooks/rag_demo_notebook_hugging_face_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [4]:
!pip install llama_index
!pip install llama-index-embeddings-huggingface
!pip install llama-index-llms-huggingface-api
!pip install chromadb
!pip install llama-index-vector-stores-chroma



# Table of Contents
1. [Introduction](#introduction)
2. [Load Environment Variables](#load-environment-variables)
3. [Set up LLM and Embedding Model](#set-up-llm-and-embedding-model)
4. [Create In-memory Vector Store](#create-in-memory-vector-store)
5. [Create and Load On-disk Vector Store](#create-and-load-on-disk-vector-store)
6. [Update and Delete Data](#update-and-delete-data)


## Introduction

In this basic RAG (Retrieval-Augmented Generation) example, we take a Paul Graham essay, split it into chunks, embed it using an open-source embedding model, load it into Chroma, and then query it.
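
The chunk, embed, and retrieve steps described above can be sketched in plain Python before any library is involved. This is only a toy illustration under stated assumptions: the "embedding" is a bag-of-words count rather than a real model, the sample text is made up, and the helper names (`split_into_chunks`, `embed`, `retrieve`) are hypothetical and not part of LlamaIndex or Chroma.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=8, overlap=2):
    """Split text into overlapping word windows (stand-in for a real text splitter)."""
    words = text.split()
    step = chunk_size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, stop, step)]


def embed(text):
    """Toy 'embedding': lowercase token counts. A real model returns a dense float vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        chunks, key=lambda c: cosine_similarity(query_vec, embed(c)), reverse=True
    )
    return ranked[:top_k]


# Made-up sample text, used only to exercise the helpers.
text = (
    "Chroma stores embeddings and metadata. "
    "A retriever ranks chunks by similarity to the query. "
    "The language model answers using the retrieved chunks as context."
)
question = "Which component ranks chunks by similarity?"

chunks = split_into_chunks(text)
top_chunk = retrieve(question, chunks)[0]

# The retrieved chunk is placed in the prompt that would be sent to the LLM.
prompt = f"Context:\n{top_chunk}\n\nQuestion: {question}"
print(prompt)
```

In the rest of the notebook, LlamaIndex handles the splitting and prompting, the Hugging Face model produces the embeddings, and Chroma performs the similarity search.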

## Load Environment Variables

In this section, we load the Hugging Face token required to access the Hugging Face LLM and embedding model. <br>

To create a token, follow the Hugging Face documentation: https://huggingface.co/docs/hub/en/security-tokens. Then store it as a Colab secret named `HF_TOKEN`, which the next cell reads.

In [1]:
# Read the Hugging Face token from the Colab secret named HF_TOKEN.
# Outside Colab, set the token directly instead:
# HF_TOKEN = ""

from google.colab import userdata

HF_TOKEN = userdata.get("HF_TOKEN")
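
The cell above only works inside Colab. A minimal sketch of a fallback for other environments follows, assuming the token is exported as an `HF_TOKEN` environment variable there. The helper name `resolve_hf_token` is hypothetical and not part of any library.

```python
import os


def resolve_hf_token(env=None):
    """Return the token from Colab secrets, falling back to the HF_TOKEN environment variable."""
    env = os.environ if env is None else env
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata

        token = userdata.get("HF_TOKEN")
        if token:
            return token
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with this notebook.
        pass

    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a Colab secret or an environment variable."
        )
    return token


# Usage (uncomment in the notebook):
# HF_TOKEN = resolve_hf_token()
```

Raising early with a clear message is a design choice: it surfaces a missing token here rather than as an authentication error on the first model call.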

## Set up LLM and Embedding Model

In this section, we set up the embedding model, which is downloaded and run locally, and the LLM, which is called through the Hugging Face Inference API using the token loaded above.

In [2]:
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI


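# Embedding model: downloaded from the Hugging Face Hub on first use
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

# LLM: called remotely through the Hugging Face Inference API.
# Note: `llm_model` is an ad hoc attribute name; LlamaIndex's built-in global default
# is `Settings.llm`, so this notebook passes the LLM explicitly where it is needed.
Settings.llm_model = HuggingFaceInferenceAPI(model_name="HuggingFaceH4/zephyr-7b-beta", token=HF_TOKEN)
#Settings.llm_model = HuggingFaceInferenceAPI(model_name="smirki/UIGEN-T1-Qwen-7b", token=HF_TOKEN)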


In [3]:
Settings.llm_model.complete("To infinity, and")

CompletionResponse(text=' beyond!\n\nThe Toy Story franchise has been a beloved part of pop culture for over two decades now, and it\'s not hard to see why. The films are a perfect blend of humor, heart, and nostalgia, and they\'ve managed to stay fresh and relevant with each new installment.\n\nBut what makes Toy Story so special? Here are just a few reasons why this franchise has captured the hearts of audiences around the world:\n\n1. The characters are relatable and memorable.\n\nFrom Woody and Buzz Lightyear to Mr. Potato Head and Hamm, the Toy Story cast is filled with iconic characters that have become a part of popular culture. But what really sets them apart is how relatable they are. Whether it\'s Woody\'s struggle to hold onto his place as Andy\'s favorite toy, or Buzz\'s unwavering belief in his own superiority, these characters feel like real people (or, in this case, toys).\n\n2. The humor is clever and timeless.\n\nFrom the classic "To infinity, and beyond!" catchphrase 

## Create In-memory Vector Store

In this section, we download the essay, load it as documents, create an in-memory Chroma collection, and store the embedded chunks in it.

In [4]:
# Import Chroma and other required libraries
import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext
from IPython.display import Markdown, display

In [5]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2025-02-19 02:07:08--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’


2025-02-19 02:07:08 (30.9 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]



In [6]:
# Create Chroma client and a new collection
chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.create_collection("quickstart")

In [7]:
# Load documents
documents = SimpleDirectoryReader("/content/data/paul_graham/").load_data()

In [8]:
# Set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=Settings.embed_model
)

# Query Data
query_engine = index.as_query_engine(llm=Settings.llm_model)
response = query_engine.query("What did the author do growing up?")
#response = query_engine.query("How many essays has the author written?")
display(Markdown(f"<b>{response}</b>"))

<b>

The author grew up working on two main things outside of school: writing short stories and programming on an IBM 1401 computer. He also used a friend's Heathkit-built microcomputer to program simple games, a program to predict rocket flight, and a word processor for his father's book writing. However, he didn't plan to study programming in college and instead planned to study philosophy, which he found more powerful than other fields due to its supposed ultimate truths.</b>

## Create and Load On-disk Vector Store

To extend the previous example and save the index to disk, initialize a persistent Chroma client and pass it the directory where the data should be stored.

`Caution`: Chroma makes a best effort to save data to disk automatically, but multiple in-memory clients can overwrite each other's work. As a best practice, run only one client per path at any given time.

In [None]:
# Save to disk

db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=Settings.embed_model
)

# Load from disk
db2 = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db2.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(
    vector_store,
    embed_model=Settings.embed_model,
)

# Query Data from the persisted index
query_engine = index.as_query_engine(llm=Settings.llm_model)
response = query_engine.query("What did the author do growing up?")
display(Markdown(f"<b>{response}</b>"))

<b>

The author grew up working on two main things outside of school: writing short stories and programming on an IBM 1401 computer. He also used a friend's Heathkit-built microcomputer to program simple games, a program to predict rocket flight, and a word processor for his father's book writing. However, he didn't plan to study programming in college and instead planned to study philosophy, which he found more powerful than other fields due to its supposed ultimate truths.</b>

## Update and Delete Data

A real application needs to do more than add data: it also has to update and delete it.

Chroma asks users to provide `ids` to simplify this bookkeeping. An id can be the file name or a combined key such as `filename_paragraphNumber`.
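
As a rough sketch of that bookkeeping, the snippet below builds `filename_paragraphNumber` ids and uses a content hash to decide which ids need updating or deleting. The helper names (`make_chunk_id`, `content_hash`, `plan_sync`) and the sample paragraphs are hypothetical. In this notebook LlamaIndex assigns node ids automatically, so this applies mainly when you manage ids yourself.

```python
import hashlib


def make_chunk_id(filename, paragraph_number):
    """Build a stable id of the form filename_paragraphNumber."""
    return f"{filename}_{paragraph_number}"


def content_hash(text):
    """Fingerprint the text so changed paragraphs can be detected."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(stored_hashes, filename, paragraphs):
    """Compare stored id -> hash pairs with the current paragraphs.

    Returns (ids to add or update, ids to delete).
    """
    current = {
        make_chunk_id(filename, i): content_hash(p) for i, p in enumerate(paragraphs)
    }
    to_upsert = [cid for cid, h in current.items() if stored_hashes.get(cid) != h]
    to_delete = [cid for cid in stored_hashes if cid not in current]
    return to_upsert, to_delete


# Made-up state: what was stored earlier versus what the file contains now.
stored = {
    "essay.txt_0": content_hash("Old intro."),
    "essay.txt_1": content_hash("Same body."),
    "essay.txt_2": content_hash("Removed ending."),
}
to_upsert, to_delete = plan_sync(stored, "essay.txt", ["New intro.", "Same body."])
print("upsert:", to_upsert)
print("delete:", to_delete)
```

The resulting id lists could then be passed to the collection's update and delete calls.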

Here is a basic example showing how to do various operations:

In [None]:
# Fetch one stored chunk and add an "author" field to its metadata
doc_to_update = chroma_collection.get(limit=1)
doc_id = doc_to_update["ids"][0]
new_metadata = {**doc_to_update["metadatas"][0], "author": "Paul Graham"}
chroma_collection.update(ids=[doc_id], metadatas=[new_metadata])

# Re-fetch the same chunk by id to confirm the update
updated_doc = chroma_collection.get(ids=[doc_id])
print(updated_doc["metadatas"][0])

# Delete the chunk we just updated
print("count before", chroma_collection.count())
chroma_collection.delete(ids=[doc_id])
print("count after", chroma_collection.count())

{'_node_content': '{"id_": "2347aeb0-8b3b-4dba-85c2-55fc247cfeb3", "embedding": null, "metadata": {"file_path": "/content/data/paul_graham/paul_graham_essay.txt", "file_name": "paul_graham_essay.txt", "file_type": "text/plain", "file_size": 75042, "creation_date": "2025-02-18", "last_modified_date": "2025-02-18"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "bb1afc81-4a25-4222-ae2c-d2f4b2c3cf0d", "node_type": "4", "metadata": {"file_path": "/content/data/paul_graham/paul_graham_essay.txt", "file_name": "paul_graham_essay.txt", "file_type": "text/plain", "file_size": 75042, "creation_date": "2025-02-18", "last_modified_date": "2025-02-18"}, "hash": "0c3c3f46cac874b495d944dfc4b920f6b68817dbbb1699ecc955d1fafb2bf87b", "class_name": "Rela