# Llama 3.1 RAG Agent with LlamaIndex

<a target="_blank" href="https://colab.research.google.com/github/ytang07/ai_agents_cookbooks/blob/main/llamaindex/llama31_8b_rag_agent.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This notebook walks you through building a LlamaIndex ReActAgent using Llama 3.1 70B. We use [OctoAI](https://octo.ai) as the embeddings and LLM provider.

## Install Dependencies

In [None]:
# ! pip install -qU llama-index llama-index-llms-openai llama-index-readers-file octoai llama-index-llms-octoai llama-index-embeddings-octoai llama-index-embeddings-openai llama-index-llms-openai-like python-dotenv

# ! pip freeze | grep llama-index-core
# ! pip freeze | grep embeddings-openai

## Setup API Keys
To run the rest of the notebook you need an OctoAI API key. You can sign up for an account [here](https://octoai.cloud/). For further guidance, see OctoAI's [documentation page](https://octo.ai/docs/getting-started/how-to-create-octoai-access-token). The cell below reads the key from a `.env` file using `python-dotenv`; uncomment the `getpass` lines to enter it interactively instead.

In [3]:
from os import environ
# from getpass import getpass
# environ["OCTOAI_API_KEY"] = getpass("Input your OCTOAI API key: ")
from dotenv import load_dotenv

load_dotenv()

OCTOAI_API_KEY = environ["OCTOAI_API_KEY"]
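
If no `.env` file is available (for example in Colab), the key can be requested interactively as a fallback. The sketch below shows one way to combine both paths. `get_api_key` is a hypothetical helper, not part of OctoAI or LlamaIndex, and it takes the environment mapping and the prompt function as parameters so it can be tried without a real key.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping


def get_api_key(
    name: str = "OCTOAI_API_KEY",
    env: MutableMapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the key from `env`, prompting for it only when it is missing."""
    key = env.get(name, "").strip()
    if not key:
        key = prompt(f"Input your {name}: ").strip()
        if not key:
            raise ValueError(f"{name} is required to run this notebook")
        env[name] = key  # cache so later cells can read it from the environment
    return key
```

Calling `get_api_key()` after `load_dotenv()` would return the value from `.env` when present and only prompt otherwise.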

## Import Libraries and Set Up LlamaIndex

In [12]:
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.embeddings.octoai import OctoAIEmbedding
from llama_index.core import Settings as LlamaGlobalSettings
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai_like import OpenAILike

# Set the default model to use for embeddings
LlamaGlobalSettings.embed_model = OctoAIEmbedding()

# Create an llm object to use for the QueryEngine and the ReActAgent
llm = OpenAILike(
    model="meta-llama-3.1-70b-instruct",
    api_base="https://text.octoai.run/v1",
    api_key=environ["OCTOAI_API_KEY"],
    context_window=100000,
    is_function_calling_model=True,
    is_chat_model=True,
)


## Load Documents

In [33]:
try:
    # Load previously persisted indexes so embeddings are not recomputed
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/events"
    )
    event_index = load_index_from_storage(storage_context)

    storage_context1 = StorageContext.from_defaults(
        persist_dir="./storage/profiles"
    )
    profile_index = load_index_from_storage(storage_context1)

    index_loaded = True
except Exception:
    # Nothing persisted yet (or loading failed): build the indexes in the next cell
    index_loaded = False
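
An alternative to relying on an exception is to check for the persisted files before loading. The sketch below assumes that a persisted index directory contains a `docstore.json` and an `index_store.json` file, which is what the default `persist()` call typically writes; treat the file names as an assumption and adjust them if your LlamaIndex version differs. `is_persisted` is a hypothetical helper.

```python
from pathlib import Path

# Assumed file names written by the default persist() call; verify for your version.
EXPECTED_FILES = ("docstore.json", "index_store.json")


def is_persisted(persist_dir: str) -> bool:
    """Return True when `persist_dir` exists and holds every expected file."""
    root = Path(persist_dir)
    return root.is_dir() and all((root / name).is_file() for name in EXPECTED_FILES)
```

With it, the flag could be computed as `is_persisted("./storage/events") and is_persisted("./storage/profiles")` before deciding whether to load or build.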

This is where we create the vector indexes by computing an embedding vector for each chunk of the documents. You only need to run this once; on later runs the indexes are loaded from `./storage`.

In [35]:
if not index_loaded:
    # load data
    event_docs = SimpleDirectoryReader(
        input_files=["./luma.json", "./meetup_events.json"]
    ).load_data()
    profile_docs = SimpleDirectoryReader(
        input_files=["./profiles/kris.pdf"]
    ).load_data()
    
    # build index
    event_index = VectorStoreIndex.from_documents(event_docs, show_progress=True)
    profile_index = VectorStoreIndex.from_documents(profile_docs, show_progress=True)

    # persist index
    event_index.storage_context.persist(persist_dir="./storage/events")
    profile_index.storage_context.persist(persist_dir="./storage/profiles")

Now create the query engines.

In [36]:
event_engine = event_index.as_query_engine(similarity_top_k=3, llm=llm)

profile_engine = profile_index.as_query_engine(similarity_top_k=3, llm=llm)
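
Setting `similarity_top_k=3` means each query is answered from the three chunks whose embeddings are most similar to the query embedding. The toy example below illustrates that idea with cosine similarity over hand-written vectors. It is a sketch of the concept, not LlamaIndex's retrieval code; `top_k_chunks` and the sample vectors are made up for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query: Sequence[float], chunks: Dict[str, Sequence[float]], k: int = 3
) -> List[str]:
    """Return the ids of the k chunks most similar to the query, best first."""
    ranked = sorted(
        chunks, key=lambda cid: cosine_similarity(query, chunks[cid]), reverse=True
    )
    return ranked[:k]


# Made-up 2-d "embeddings" keyed by chunk id.
chunks = {
    "hackathon": [0.9, 0.1],
    "wine_tasting": [0.1, 0.9],
    "ml_meetup": [0.8, 0.3],
    "book_club": [0.3, 0.8],
}
print(top_k_chunks([1.0, 0.0], chunks, k=3))
# prints: ['hackathon', 'ml_meetup', 'book_club']
```

Only the selected chunks are passed to the LLM as context, so a larger `similarity_top_k` trades a longer prompt for broader coverage.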

We can now define the query engines as tools that will be used by the agent.

Because each index has its own query engine, we define one tool per engine.

In [37]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=event_engine,
        metadata=ToolMetadata(
            name="event_10k",
            description=(
                "Provides information about events. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=profile_engine,
        metadata=ToolMetadata(
            name="profile_10k",
            description=(
                "Provides information about attendee profiles. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]

## Creating the Agent
Now we have all the elements to create a LlamaIndex ReActAgent.

In [41]:
agent = ReActAgent.from_tools(
    query_engine_tools,
    llm=llm,
    verbose=True,
    max_iterations=20,  # upper bound on reasoning/tool-call steps per request
)
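
A ReAct agent alternates between reasoning, calling a tool, and reading the tool's observation until it can answer or runs out of iterations. The loop below is a deliberately simplified sketch of that pattern with a scripted stand-in for the LLM. It is not how `ReActAgent` is implemented; `run_react_loop`, `scripted_llm`, and the toy tool functions are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A step is either ("call", tool_name, tool_input) or ("answer", text, "").
Step = Tuple[str, str, str]


def run_react_loop(
    question: str,
    decide: Callable[[str, List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 10,
) -> str:
    """Ask `decide` for the next step until it answers or the budget runs out."""
    observations: List[str] = []
    for _ in range(max_iterations):
        kind, name_or_answer, tool_input = decide(question, observations)
        if kind == "answer":
            return name_or_answer
        # Tools are looked up by name, which is why each tool needs a unique name.
        tool = tools.get(name_or_answer)
        if tool is None:
            observations.append(f"Error: unknown tool {name_or_answer!r}")
        else:
            observations.append(tool(tool_input))
    raise RuntimeError("Reached max iterations without a final answer")


def scripted_llm(question: str, observations: List[str]) -> Step:
    """Stand-in for the LLM: fetch the profile, then events, then answer."""
    if len(observations) == 0:
        return ("call", "profile_10k", "What are the user's interests?")
    if len(observations) == 1:
        return ("call", "event_10k", f"Events related to: {observations[0]}")
    return ("answer", f"Recommended: {observations[1]}", "")


toy_tools = {
    "profile_10k": lambda q: "open source AI",
    "event_10k": lambda q: "Open Source AI Hackathon",
}
print(run_react_loop("What should I attend?", scripted_llm, toy_tools))
# prints: Recommended: Open Source AI Hackathon
```

The iteration cap matters for multi-step prompts: if the budget is smaller than the number of tool calls the task needs, the loop stops before a final answer is produced.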

Now we can interact with the agent and ask a question.

In [42]:
response = agent.chat("""
Using the profile data available through the query engine, focus on the user's interests, groups, and what they're looking for from events. Then, follow these steps:
Use the profile_10k query engine to retrieve the user's profile data. This data contains the user's interests, groups they belong to, and what they're looking for in events. Never retrieve profile information as events.
Access the event query engine to retrieve event data. Each event has details including start and end times, summary, description, location, and URL.
Compare the user's interests, group affiliations, and preferences with the event summaries and descriptions.
Evaluate how well each event aligns with the user's interests and what they're looking for in events. This alignment is the primary factor in determining how "fun" or enjoyable the event will be for the user. Note that time frame and distance from the user's location should not be considered in this evaluation.
Create a ranked list of the top 10 events that are most likely to be enjoyable for the user, based on this evaluation. Aim for a diverse selection of events if the user has varied interests. If 10 suitable events are not available, include as many as possible and note this in your response. Be sure to include the "Open Source AI Hackathon" as we know they create amazing events.
For each event in the list, provide: a) The event summary b) A brief explanation of why this event is likely to be enjoyable for the user, referencing specific interests, group affiliations, or preferences from their profile c) The event date and location d) The event URL for more information
Present the list in a numbered format, starting with the most highly recommended event at number 1.
Conclude with a friendly message encouraging the user to check out these events and mentioning that they can ask for more recommendations if needed.
Remember to maintain a friendly and helpful tone throughout the response.""")
print(str(response))

> Running step aea98f7a-27b4-4f0e-929b-bbe82bf05f30. Step input: 
Using the profile data available through the query engine, focus on the user's interests, groups, and what they're looking for from events. Then, follow these steps:
Use the profile_10k query engine to retrieve the user's profile data. This data contains the user's interests, groups they belong to, and what they're looking for in events. Never retrieve profile information as events.
Access the event query engine to retrieve event data. Each event has details including start and end times, summary, description, location, and URL.
Compare the user's interests, group affiliations, and preferences with the event summaries and descriptions.
Evaluate how well each event aligns with the user's interests and what they're looking for in events. This alignment is the primary factor in determining how "fun" or enjoyable the event will be for the user. Note that time frame and distance from the user's location should not be consider