# Llama 3.1 RAG Agent with LlamaIndex

<a target="_blank" href="https://colab.research.google.com/github/ytang07/ai_agents_cookbooks/blob/main/llamaindex/llama31_8b_rag_agent.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This notebook walks you through building a LlamaIndex ReActAgent using Llama 3.1 70B. We use [OctoAI](https://octo.ai) as our embeddings and LLM provider.

## Install Dependencies

In [None]:
# ! pip install -qU llama-index llama-index-llms-openai llama-index-readers-file octoai llama-index-llms-octoai llama-index-embeddings-octoai llama-index-embeddings-openai llama-index-llms-openai-like

# ! pip freeze | grep llama-index-core
# ! pip freeze | grep embeddings-openai

## Set Up API Keys
To run the rest of the notebook you will need an OctoAI API key. You can sign up for an account [here](https://octoai.cloud/). If you need further guidance, see OctoAI's [documentation page](https://octo.ai/docs/getting-started/how-to-create-octoai-access-token). The cell below reads the key from a `.env` file via `load_dotenv()`; uncomment the `getpass` lines to enter it interactively instead.

In [3]:
from os import environ
# from getpass import getpass
# environ["OCTOAI_API_KEY"] = getpass("Input your OCTOAI API key: ")
from dotenv import load_dotenv

load_dotenv()

OCTOAI_API_KEY = environ["OCTOAI_API_KEY"]
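If the key is missing, `environ["OCTOAI_API_KEY"]` fails with a bare `KeyError`. Here is a minimal sketch of a friendlier check using only the standard library. `require_env` is a hypothetical helper name chosen for illustration, not part of any library used in this notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    # Treat an unset variable and an empty/whitespace-only value the same way.
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Export it or add it to a .env file "
            "before running the rest of the notebook."
        )
    return value
```

After `load_dotenv()`, you could then write `OCTOAI_API_KEY = require_env("OCTOAI_API_KEY")` in place of the direct lookup.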

## Import libraries and set up LlamaIndex

In [4]:
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.embeddings.octoai import OctoAIEmbedding
from llama_index.core import Settings as LlamaGlobalSettings
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai_like import OpenAILike

# Set the default model to use for embeddings
LlamaGlobalSettings.embed_model = OctoAIEmbedding()

# Create an llm object to use for the QueryEngine and the ReActAgent
llm = OpenAILike(
    model="meta-llama-3.1-70b-instruct",
    api_base="https://text.octoai.run/v1",
    api_key=environ["OCTOAI_API_KEY"],
    context_window=40000,
    is_function_calling_model=True,
    is_chat_model=True,
)


## Load Documents

In [5]:
try:
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/luma"
    )
    luma_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/profiles"
    )
    profile_index = load_index_from_storage(storage_context)

    index_loaded = True
except Exception:
    # No persisted index was found (or it failed to load); build it in the next cell.
    index_loaded = False
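The `try`/`except` above treats every failure as "no index yet". If you would rather tell "not built yet" apart from other errors, you can check the storage folder explicitly first. The sketch below uses only the standard library; `is_persisted` is a hypothetical helper, and the two file names are an assumption about the default persistence layout, so compare them with what you actually see under `./storage` after the first run.

```python
from pathlib import Path

# Assumption: a persisted index directory contains at least these two files.
# Check your own ./storage/<name> folder and adjust the tuple if it differs.
EXPECTED_FILES = ("docstore.json", "index_store.json")


def is_persisted(persist_dir: str) -> bool:
    """Return True if persist_dir looks like a previously persisted index."""
    directory = Path(persist_dir)
    return all((directory / name).is_file() for name in EXPECTED_FILES)
```

For example, `is_persisted("./storage/luma") and is_persisted("./storage/profiles")` could replace the `index_loaded` flag, leaving real loading errors free to surface.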

This is where we create our vector indexes by computing an embedding vector for each chunk. You only need to run this once; later runs load the persisted indexes from `./storage`.

In [11]:
if not index_loaded:
    # load data
    luma_docs = SimpleDirectoryReader(
        input_files=["./luma.json"]
    ).load_data()
    profile_docs = SimpleDirectoryReader(
        input_files=["./profiles/vikash.pdf"]
    ).load_data()
    
    # build index
    luma_index = VectorStoreIndex.from_documents(luma_docs, show_progress=True)
    profile_index = VectorStoreIndex.from_documents(profile_docs, show_progress=True)

    # persist index
    luma_index.storage_context.persist(persist_dir="./storage/luma")
    profile_index.storage_context.persist(persist_dir="./storage/profiles")

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/5 [00:00<?, ?it/s]

Parsing nodes:   0%|          | 0/3 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/3 [00:00<?, ?it/s]

Now create the query engines.

In [12]:
luma_engine = luma_index.as_query_engine(similarity_top_k=3, llm=llm)

profile_engine = profile_index.as_query_engine(similarity_top_k=3, llm=llm)
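To build some intuition for `similarity_top_k=3`: the query is embedded, compared against the stored chunk embeddings, and the three closest chunks are handed to the LLM as context. The toy sketch below illustrates that idea with cosine similarity in plain Python. It is not LlamaIndex's implementation, and the function names and vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=3):
    """Return the ids of the k chunks most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Toy 2-dimensional "embeddings"; real embeddings have hundreds of dimensions.
chunks = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [-1.0, 0.0]}
best_two = top_k([1.0, 0.0], chunks, k=2)
```

A larger `similarity_top_k` gives the LLM more context per question at the cost of a longer prompt.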

We can now wrap the query engines as tools for the agent to use.

Because there is one query engine per document, we define one tool for each. The agent relies on each tool's name and description to decide which one to call.

In [13]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=luma_engine,
        metadata=ToolMetadata(
            name="luma_10k",
            description=(
                "Provides information about Luma events. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=profile_engine,
        metadata=ToolMetadata(
            name="profile_10k",
            description=(
                "Provides information about attendee profiles. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]

## Creating the Agent
Now we have all the elements needed to create a LlamaIndex ReActAgent.

In [14]:
agent = ReActAgent.from_tools(
    query_engine_tools,
    llm=llm,
    verbose=True,
    max_iterations=10,  # upper bound on the number of ReAct reasoning steps
)

Now we can interact with the agent and ask a question.

In [None]:
response = agent.chat("Which luma events are upcoming? and which event matches vikash's profile?")
print(str(response))

> Running step 73ba50fa-7396-4624-a61f-b4376f2fc533. Step input: Which luma events are upcoming? and which event matches vikash's profile?
Thought: The current language of the user is: English. I need to use the luma_10k tool to find out which Luma events are upcoming and the profile_10k tool to find out which event matches Vikash's profile.
Action: luma_10k
Action Input: {'input': 'upcoming Luma events'}
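The verbose trace above shows the shape of a single ReAct step: a `Thought`, the name of the tool to call (`Action`), and the arguments for it (`Action Input`). The sketch below shows how such a step could be pulled apart with the standard library. It is illustrative only: the agent has its own output parser, and `parse_react_step` is a hypothetical helper. It assumes the input is printed as a Python dict literal, as in the trace, which is why it uses `ast.literal_eval`.

```python
import ast
import re

# Matches one Thought / Action / Action Input triple as printed in the trace.
STEP_PATTERN = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*"
    r"Action:\s*(?P<action>[^\n]+?)\s*"
    r"Action Input:\s*(?P<action_input>\{.*\})",
    re.DOTALL,
)


def parse_react_step(text):
    """Split a printed ReAct step into thought, tool name, and tool arguments."""
    match = STEP_PATTERN.search(text)
    if match is None:
        raise ValueError("no Thought/Action/Action Input triple found")
    return {
        "thought": match.group("thought"),
        "action": match.group("action"),
        # The trace prints the arguments as a Python dict literal.
        "action_input": ast.literal_eval(match.group("action_input")),
    }


trace = (
    "Thought: I need to use the luma_10k tool to find upcoming events.\n"
    "Action: luma_10k\n"
    "Action Input: {'input': 'upcoming Luma events'}\n"
)
step = parse_react_step(trace)
```

Here `step["action"]` names the tool the agent chose, and `step["action_input"]` holds the question it passes to that tool's query engine.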