<a href="https://colab.research.google.com/github/vokativ/rag_demo_qb/blob/main/rag_demo/colab_notebooks/rag_demo_notebook_hugging_face_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install llama_index
!pip install llama-index-embeddings-huggingface
!pip install llama-index-llms-huggingface-api
!pip install chromadb
!pip install llama-index-vector-stores-chroma

# Table of Contents
1. [Introduction](#introduction)
2. [Load Environment Variables](#load-environment-variables)
3. [Set up LLM and Embedding Model](#set-up-llm-and-embedding-model)
4. [Create In-memory Vector Store](#create-in-memory-vector-store)
5. [Create and Load On-disk Vector Store](#create-and-load-on-disk-vector-store)
6. [Update and Delete Data](#update-and-delete-data)


## Introduction

In this basic RAG (Retrieval-Augmented Generation) example, we take a Paul Graham essay, split it into chunks, embed it using an open-source embedding model, load it into Chroma, and then query it.
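
The chunking step mentioned above can be sketched in plain Python. This is only an illustration, assuming fixed-size character windows with some overlap; `chunk_text`, `chunk_size`, and `overlap` are hypothetical names, and the splitter LlamaIndex applies when building the index is more sophisticated than this.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij" * 5  # 50 characters of stand-in text
chunks = chunk_text(sample, chunk_size=20, overlap=5)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.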

## Load Environment Variables

In this section, we load the Hugging Face access token used to call Hugging Face LLMs and embedding models. In Colab, store the token as a secret named `HF_TOKEN`.

To create a token, follow the Hugging Face guide: https://huggingface.co/docs/hub/en/security-tokens
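
Outside Colab, the `google.colab` module is not available, so a fallback can be useful. The helper below is a minimal sketch, assuming the token may also be exported as an environment variable; `load_hf_token` is a hypothetical name and not part of any library.

```python
import os


def load_hf_token(name: str = "HF_TOKEN") -> str:
    """Return the token from Colab secrets if available, else from the environment."""
    try:
        # google.colab only exists inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    token = None
    if userdata is not None:
        try:
            token = userdata.get(name)
        except Exception:
            # Secret missing or notebook access not granted; fall back below.
            token = None
    if not token:
        token = os.environ.get(name)
    if not token:
        raise RuntimeError(f"{name} is not set; create a token and store it first.")
    return token
```

With this in place, `HF_TOKEN = load_hf_token()` works both in Colab and in a local Jupyter session, and it fails early with a clear message when no token is configured.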

In [1]:
# Read the Hugging Face token from Colab's secrets manager.
# Outside Colab, set it directly instead: HF_TOKEN = "<your token>"
from google.colab import userdata

HF_TOKEN = userdata.get('HF_TOKEN')

## Set up LLM and Embedding Model

In this section, we configure the Hugging Face LLM and the embedding model, using the token loaded above.

In [None]:
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI


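# Embedding model: downloaded and run locally
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
# LLM: called remotely through the Hugging Face Inference API.
# Note: `llm_model` is a custom attribute name used throughout this notebook;
# LlamaIndex's own global default is `Settings.llm`, so the LLM is passed
# explicitly wherever it is needed (e.g. `as_query_engine(llm=...)`).
Settings.llm_model = HuggingFaceInferenceAPI(model_name="HuggingFaceH4/zephyr-7b-beta", token=HF_TOKEN)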
#Settings.llm_model = HuggingFaceInferenceAPI(model_name="smirki/UIGEN-T1-Qwen-7b", token=HF_TOKEN)

In [None]:
# Smoke test: send a short prompt to confirm the LLM endpoint responds.
# Alternative prompts:
#Settings.llm_model.complete("To infinity, and")
#Settings.llm_model.complete("An offer you can not refuse")
Settings.llm_model.complete("To be, or not to be")

## Create In-memory Vector Store

In this section, we download the essay, load it as documents, create an in-memory Chroma collection, embed the documents into it, and run a query against the resulting index.
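
To make the idea of a vector store concrete, here is a toy version in plain Python. It is a sketch only: word counts stand in for the real embedding model, `ToyVectorStore` is a hypothetical class rather than Chroma's API, and the sample sentences are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Keeps (id, text, vector) rows in a list, so nothing survives a restart."""

    def __init__(self):
        self.rows = []

    def add(self, doc_id: str, text: str) -> None:
        self.rows.append((doc_id, text, embed(text)))

    def query(self, question: str, top_k: int = 1) -> list[tuple[str, float]]:
        q = embed(question)
        scored = [(doc_id, cosine(q, vec)) for doc_id, _, vec in self.rows]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


store = ToyVectorStore()
store.add("c0", "the cat sleeps on the sofa")
store.add("c1", "the stock market closed higher today")
store.add("c2", "python is a programming language")
best = store.query("which programming language", top_k=1)
print(best[0][0])  # prints c2
```

An in-memory Chroma collection plays the same role as `ToyVectorStore` here: it holds ids, texts, and vectors for the lifetime of the process and returns the closest matches for a query vector.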

In [5]:
# Import Chroma and other required libraries
import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext
from IPython.display import Markdown, display

In [None]:
!mkdir -p 'data/paul_graham/'
#!mkdir -p 'data/shakespeare/'
#!mkdir -p 'data/apple_financial/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
#!wget 'https://www.gutenberg.org/cache/epub/100/pg100.txt' -O 'data/shakespeare/pg100.txt'
#!wget 'https://www.apple.com/newsroom/pdfs/fy2025-q1/FY25_Q1_Consolidated_Financial_Statements.pdf' -O 'data/apple_financial/FY25_Q1_Consolidated_Financial_Statements.pdf'

In [10]:
# Create an in-memory (ephemeral) Chroma client and a new collection
chroma_client = chromadb.EphemeralClient()
# When re-running this cell, uncomment the next line to drop the old collection first:
#chroma_client.delete_collection("quickstart")
chroma_collection = chroma_client.create_collection("quickstart")

In [11]:
# Load documents
documents = SimpleDirectoryReader("/content/data/paul_graham/").load_data()
#documents = SimpleDirectoryReader("/content/data/shakespeare/").load_data()
#documents = SimpleDirectoryReader("/content/data/apple_financial/").load_data()

In [None]:
documents

In [None]:
# Set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=Settings.embed_model
)

# Query Data
query_engine = index.as_query_engine(llm=Settings.llm_model)
response = query_engine.query("What did the author do growing up?")
#response = query_engine.query("Sonnet 1")# times out
#response = query_engine.query("What was the growth of sales?") #this will be very wrong
display(Markdown(f"<b>{response}</b>"))

## Create and Load On-disk Vector Store

To extend the previous example and save the index to disk, initialize a persistent Chroma client and pass it the directory where the data should be stored.

`Caution`: Chroma makes a best effort to save data to disk automatically; however, multiple in-memory clients can overwrite each other's work. As a best practice, run only one client per path at any given time.
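
One way to follow the one-client-per-path practice is an application-level lock file. The sketch below is an assumption-laden illustration, not a Chroma feature: `single_client` is a hypothetical helper, Chroma itself never checks the lock, and a crashed process would leave a stale lock file that has to be removed by hand.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def single_client(path: str):
    """Hold an exclusive lock file next to `path` for the duration of the block."""
    lock_path = os.path.abspath(path) + ".lock"
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"another client already holds {path}") from None
    try:
        yield path
    finally:
        os.close(fd)
        os.remove(lock_path)


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "chroma_db")
    with single_client(db_path):
        try:
            with single_client(db_path):
                second_opened = True
        except RuntimeError:
            second_opened = False  # the second attempt is refused
    # The lock is released, so a later client can take the path again.
    with single_client(db_path):
        reopened = True

print(second_opened, reopened)  # prints False True
```

In a notebook, the persistent client would be created inside the `with single_client(...)` block, so a second kernel pointed at the same directory fails fast instead of silently competing for the same files.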

In [None]:
# Save to disk

db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=Settings.embed_model
)

# Load from disk
db2 = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db2.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(
    vector_store,
    embed_model=Settings.embed_model,
)

# Query Data from the persisted index
query_engine = index.as_query_engine(llm=Settings.llm_model)
response = query_engine.query("What did the author do growing up?")
display(Markdown(f"<b>{response}</b>"))

## Update and Delete Data

When building toward a real application, you will need to go beyond adding data and also update and delete it.

Chroma asks users to provide `ids` to simplify this bookkeeping. An id can be the name of the file or a combined key such as `filename_paragraphNumber`.
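
The two id schemes can be sketched as follows. Both helpers are hypothetical and shown only to illustrate the naming idea; the content-hash variant is an added assumption for cases where the id should change whenever the chunk text changes.

```python
import hashlib


def make_chunk_id(filename: str, paragraph_number: int) -> str:
    """Readable id in the `filename_paragraphNumber` style."""
    return f"{filename}_{paragraph_number}"


def make_content_id(filename: str, text: str) -> str:
    """Id that also changes when the chunk text changes."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return f"{filename}_{digest}"


paragraphs = ["First paragraph.", "Second paragraph."]
ids = [make_chunk_id("paul_graham_essay.txt", i) for i, _ in enumerate(paragraphs)]
print(ids)  # prints ['paul_graham_essay.txt_0', 'paul_graham_essay.txt_1']
```

Ids built this way are stable across runs, so the same chunk can be targeted again by a later `update` or `delete` call.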

Here is a basic example that updates one document's metadata and then deletes that document:

In [None]:
# Fetch one stored chunk and add an `author` field to its metadata
doc_to_update = chroma_collection.get(limit=1)
doc_to_update["metadatas"][0] = {
    **doc_to_update["metadatas"][0],
    "author": "Paul Graham",
}
chroma_collection.update(
    ids=[doc_to_update["ids"][0]], metadatas=[doc_to_update["metadatas"][0]]
)
updated_doc = chroma_collection.get(limit=1)
print(updated_doc["metadatas"][0])

# Delete the document we just updated
print("count before", chroma_collection.count())
chroma_collection.delete(ids=[doc_to_update["ids"][0]])
print("count after", chroma_collection.count())