---
description: Cookbook that showcases Opik's integration with the LlamaIndex Python SDK
---

# Using Opik with LlamaIndex

This notebook showcases how to use Opik with LlamaIndex. [LlamaIndex](https://github.com/run-llama/llama_index) is a flexible data framework for building LLM applications:
> LlamaIndex is a "data framework" to help you build LLM apps. It provides the following tools:
>
> - Offers data connectors to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.).
> - Provides ways to structure your data (indices, graphs) so that this data can be easily used with LLMs.
> - Provides an advanced retrieval/query interface over your data: Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output.
> - Allows easy integrations with your outer application framework (e.g. with LangChain, Flask, Docker, ChatGPT, anything else).

For this guide, we will download a Paul Graham essay and use it as our data source. We will then query the essay with LlamaIndex.

## Creating an account on Comet.com

[Comet](https://www.comet.com/site?from=llm&utm_source=opik&utm_medium=colab&utm_content=llamaindex&utm_campaign=opik) provides a hosted version of the Opik platform. [Simply create an account](https://www.comet.com/signup?from=llm&=opik&utm_medium=colab&utm_content=llamaindex&utm_campaign=opik) and grab your API key.

> You can also run the Opik platform locally; see the [installation guide](https://www.comet.com/docs/opik/self-host/overview/?from=llm&utm_source=opik&utm_medium=colab&utm_content=llamaindex&utm_campaign=opik) for more information.

In [17]:
# !pip install opik llama-index llama-index-agent-openai llama-index-llms-openai --upgrade --quiet

In [18]:
import opik

opik.configure(use_local=False)

OPIK: Opik is already configured. You can check the settings by viewing the config file at /Users/akshaypachaar/.opik.config


## Preparing our environment

First, we will set up the API key this notebook needs by configuring the required environment variable:

In [19]:
import os
import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
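
The `getpass` prompt above works well in an interactive notebook, but it blocks when the same code runs unattended (for example in CI). As a minimal sketch for that case, the hypothetical helper `require_env` below (not part of Opik or LlamaIndex) fails fast when a variable is missing instead of prompting:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank.

    `require_env` is an illustrative helper, not an Opik or LlamaIndex API.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

In a non-interactive run you could call `require_env("OPENAI_API_KEY")` at the top of the script so that a missing key surfaces immediately rather than at the first LLM call.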

Next, we will download the Paul Graham essay:

In [20]:
import os
import requests

# Create directory if it doesn't exist
os.makedirs("./data/paul_graham/", exist_ok=True)

# Download the file using requests
url = "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
response = requests.get(url, timeout=30)
response.raise_for_status()  # Fail early instead of saving an error page
with open("./data/paul_graham/paul_graham_essay.txt", "wb") as f:
    f.write(response.content)
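
When the notebook is re-run, the download cell fetches the file again each time. The sketch below shows one way to skip the download when the file is already present. It uses only the standard library, and the names `fetch_bytes` and `download_if_missing` are hypothetical helpers introduced here for illustration:

```python
from pathlib import Path
from typing import Callable, Union
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    """Fetch a URL and return the body; urlopen raises on HTTP error statuses."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def download_if_missing(
    url: str,
    dest: Union[str, Path],
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> bool:
    """Download `url` to `dest` unless a non-empty file already exists there.

    Returns True if a download happened, False if the existing file was reused.
    """
    dest = Path(dest)
    if dest.exists() and dest.stat().st_size > 0:
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    data = fetch(url)
    if not data:
        raise ValueError(f"Downloaded file from {url} is empty")
    dest.write_bytes(data)
    return True
```

Calling `download_if_missing(url, "./data/paul_graham/paul_graham_essay.txt")` with the `url` defined above would then be a no-op on later runs. The `fetch` parameter is only there so the function can be exercised without network access.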

## Using LlamaIndex

### Configuring the Opik integration

You can register the Opik callback handler globally through LlamaIndex's `Settings` object:

In [2]:
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from opik.integrations.llama_index import LlamaIndexCallbackHandler

opik_callback_handler = LlamaIndexCallbackHandler()
Settings.callback_manager = CallbackManager([opik_callback_handler])

Now that the callback handler is configured, all traces will automatically be logged to Opik.

### Using LlamaIndex

The first step is to load the data into LlamaIndex. We will use the `SimpleDirectoryReader` to load the data from the `data/paul_graham` directory. We will also build a vector store index over the loaded documents and create a query engine from it.

In [3]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/akshayp/redirect/projects?name=Default%20Project.


We can now query the index using the `query_engine` object:

In [4]:
response = query_engine.query("What did the author do growing up?") 
print(response)

The author worked on writing short stories and programming, starting with early attempts on an IBM 1401 using Fortran in 9th grade. Later, the author transitioned to working with microcomputers, building simple games and a word processor on a TRS-80. Additionally, the author initially planned to study philosophy in college but switched to AI due to a lack of interest in philosophy courses.


You can now go to the Opik app to see the trace:

![LlamaIndex trace in Opik](https://raw.githubusercontent.com/comet-ml/opik/main/apps/opik-documentation/documentation/static/img/cookbook/llamaIndex_cookbook.png)

In [14]:
str(response)

'The author worked on writing short stories and programming, starting with early attempts on an IBM 1401 using Fortran in 9th grade. Later, the author transitioned to working with microcomputers, building simple games and a word processor on a TRS-80. Additionally, the author initially planned to study philosophy in college but switched to AI due to a lack of interest in philosophy courses.'

## Evaluation

We start by loading a small test dataset of questions, reference answers, and supporting context:

In [21]:
import pandas as pd

df = pd.read_csv("data/test.csv")
df.head()

| Unnamed: 0 | Question | Answer | Context |
| --- | --- | --- | --- |
| 0 | What was the very first programming language P... | He used an early version of Fortran on the IBM... | The language we used was an early version of F... |
| 1 | Which microcomputer did Paul Graham’s father f... | A TRS-80. | Computers were expensive in those days and it ... |
| 2 | What was the name of the startup Paul Graham c... | Viaweb. | We started a new company we called Viaweb, aft... |
| 3 | Which friend of Paul Graham was the person res... | Robert Tappan Morris (often referred to as “Ro... | I remember when my friend Robert Morris got ki... |
| 4 | What was the title of the second Lisp book tha... | *ANSI Common Lisp.* | So with my unerring nose for financial opportu... |


In [22]:
from opik import Opik

client = Opik()
dataset = client.get_or_create_dataset(name="Test dataset")

In [23]:
qa_pairs = [
    {"input": row["Question"], "expected_output": row["Answer"], "context": row["Context"]} 
    for _, row in df.iterrows()
]
qa_pairs[0]

{'input': 'What was the very first programming language Paul Graham used when he began learning to program on the IBM 1401?',
 'expected_output': 'He used an early version of Fortran on the IBM 1401.',
 'context': 'The language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in the card reader and press a button to load the program into memory and run it.'}
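
Empty CSV cells come back from pandas as `NaN` rather than as strings, so it can be worth checking the records before uploading them. The following is a minimal sketch of such a check. `find_invalid_items` and `REQUIRED_KEYS` are hypothetical names introduced here, and the required keys simply mirror the fields used in `qa_pairs`:

```python
# Keys every dataset item is expected to carry (mirrors the qa_pairs fields)
REQUIRED_KEYS = ("input", "expected_output", "context")


def find_invalid_items(items, required_keys=REQUIRED_KEYS):
    """Return (index, key) pairs for fields that are missing, non-string, or blank."""
    problems = []
    for i, item in enumerate(items):
        for key in required_keys:
            value = item.get(key)
            # NaN from pandas is a float, so the isinstance check catches it too
            if not isinstance(value, str) or not value.strip():
                problems.append((i, key))
    return problems
```

An empty result from `find_invalid_items(qa_pairs)` suggests that every record has the three text fields populated.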

In [24]:

dataset.insert(qa_pairs)

Next, we rebuild the query engine that serves as the LLM application under evaluation:

In [25]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()

In [26]:
from opik import Opik, track
from opik.evaluation import evaluate
from opik.evaluation.metrics import (
    Hallucination,
    AnswerRelevance,
    ContextPrecision,
    ContextRecall
)
from opik.integrations.openai import track_openai
import openai

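# Wrap the OpenAI client so that direct OpenAI calls are traced by Opik.
# Note: the query engine below does not use this client.
openai_client = track_openai(openai.OpenAI())

# Model name recorded in the experiment configuration
MODEL = "gpt-3.5-turbo"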

@track
def my_llm_application(input: str) -> str:
    response = query_engine.query(input)
    return str(response)

# Define the evaluation task
def evaluation_task(x):
    return {
        "output": my_llm_application(x['input'])
    }

# Fetch the dataset populated above
client = Opik()
dataset = client.get_or_create_dataset(name="Test dataset")

# Define the metrics
hallucination_metric = Hallucination()
answer_relevance_metric = AnswerRelevance()
context_precision_metric = ContextPrecision()
context_recall_metric = ContextRecall()



evaluation = evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[hallucination_metric, answer_relevance_metric, context_precision_metric, context_recall_metric],
    experiment_config={
        "model": MODEL
    }
)

Evaluation: 100%|██████████| 5/5 [00:20<00:00,  4.17s/it]
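
Note that `evaluation_task` returns only an `output` key, while the metrics also need fields such as `input`, `expected_output`, and `context`. Our understanding is that these are taken from the dataset item, which is combined with the task output before scoring. The plain-Python sketch below illustrates that idea only. It is not Opik's implementation, and `build_scoring_inputs` and `exact_match` are hypothetical names (the toy metric stands in for the LLM-based metrics used above):

```python
def build_scoring_inputs(dataset_item: dict, task_output: dict) -> dict:
    """Combine a dataset item with the task output (task output wins on key clashes)."""
    return {**dataset_item, **task_output}


def exact_match(output: str, expected_output: str, **ignored) -> float:
    """Toy metric: 1.0 if the answers match after trimming and lowercasing, else 0.0."""
    return 1.0 if output.strip().lower() == expected_output.strip().lower() else 0.0


# First record of the test dataset shown earlier
dataset_item = {
    "input": "What was the very first programming language Paul Graham used "
    "when he began learning to program on the IBM 1401?",
    "expected_output": "He used an early version of Fortran on the IBM 1401.",
    "context": "The language we used was an early version of Fortran. You had to "
    "type programs on punch cards, then stack them in the card reader and press "
    "a button to load the program into memory and run it.",
}

# Stand-in for what evaluation_task would return (assumed answer, for illustration)
task_output = {"output": "  he used an early version of fortran on the ibm 1401.  "}

scoring_inputs = build_scoring_inputs(dataset_item, task_output)
score = exact_match(**scoring_inputs)
print(sorted(scoring_inputs), score)
```

Under this reading, the key names in the dataset (`input`, `expected_output`, `context`) matter: they have to line up with the argument names each metric expects.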
