<center>
    <p style="text-align:center">
        <img alt="phoenix logo" src="https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg" width="200"/>
        <br>
        <a href="https://docs.arize.com/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q">Community</a>
    </p>
</center>
<h1 align="center">Tracing and Evaluating a LlamaIndex Application</h1>

LlamaIndex provides high-level APIs that enable users to build powerful applications in a few lines of code. However, it can be challenging to understand what is going on under the hood and to pinpoint the cause of issues. Phoenix makes your LLM applications *observable* by visualizing the underlying structure of each call to your query engine and surfacing problematic `spans` of execution based on latency, token count, or other evaluation metrics.

In this tutorial, you will:
- Build a simple query engine using LlamaIndex that uses retrieval-augmented generation to answer questions over the Arize documentation,
- Record trace data in [OpenInference tracing](https://github.com/Arize-ai/openinference) format using the `LlamaIndexInstrumentor`,
- Inspect the traces and spans of your application to identify sources of latency and cost,
- Export your trace data as a pandas dataframe and run [LLM Evals](https://docs.arize.com/phoenix/concepts/llm-evals) to measure the precision@k of the query engine's retrieval step.

ℹ️ This notebook requires an OpenAI API key.

## 1. Install Dependencies and Import Libraries

Install Phoenix, LlamaIndex, and OpenAI.
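
Before running the imports, it can help to confirm that the required packages are actually installed. The following is a minimal sketch, not part of Phoenix or LlamaIndex: the helper `missing_modules` is a hypothetical name, and the module list is taken from the import statements used in this notebook.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the names from `module_names` that cannot be found for import."""
    missing = []
    for name in module_names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises if a parent package is absent or the name is malformed.
            found = False
        if not found:
            missing.append(name)
    return missing


# Top-level modules imported in this notebook.
required = ["phoenix", "llama_index", "openai", "openinference", "gcsfs", "nest_asyncio"]
print("Missing modules:", missing_modules(required) or "none")
```

Any name that is reported as missing needs to be installed before you continue.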

Import libraries.

In [1]:
import json
import os
from getpass import getpass
from urllib.request import urlopen

import nest_asyncio
import openai
import pandas as pd
from gcsfs import GCSFileSystem  # used below to read the pre-built index from cloud storage
from llama_index.core import (
    Settings,
    StorageContext,
    load_index_from_storage,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from tqdm import tqdm

import phoenix as px
from phoenix.evals import (
    HallucinationEvaluator,
    OpenAIModel,
    QAEvaluator,
    RelevanceEvaluator,
    run_evals,
)
from phoenix.session.evaluation import get_qa_with_reference, get_retrieved_documents
from phoenix.trace import DocumentEvaluations, SpanEvaluations

nest_asyncio.apply()  # needed for concurrent evals in notebook environments
pd.set_option("display.max_colwidth", 1000)

## 2. Launch Phoenix

You can run Phoenix in the background to collect trace data emitted by any LlamaIndex application that has been instrumented with OpenInference, for example via the `LlamaIndexInstrumentor` used later in this notebook. Phoenix also supports LlamaIndex's [one-click observability](https://gpt-index.readthedocs.io/en/latest/end_to_end_tutorials/one_click_observability.html), which automatically instruments your LlamaIndex application. Consult our [integration guide](https://docs.arize.com/phoenix/integrations/llamaindex) for a more detailed explanation of how to instrument your LlamaIndex application.

Launch Phoenix and follow the instructions in the cell output to open the Phoenix UI (the UI should be empty because we have yet to run the LlamaIndex application).

(session := px.launch_app()).view()

## 3. Configure Your OpenAI API Key

Set your OpenAI API key if it is not already set as an environment variable.

if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
openai.api_key = openai_api_key
os.environ["OPENAI_API_KEY"] = openai_api_key

## 4. Build Your LlamaIndex Application

This example uses a `RetrieverQueryEngine` over a pre-built index of the Arize documentation, but you can use whatever LlamaIndex application you like.

Download our pre-built index of the Arize docs from cloud storage and instantiate your storage context.

file_system = GCSFileSystem(project="public-assets-275721")
index_path = "arize-phoenix-assets/datasets/unstructured/llm/llama-index/arize-docs/index/"
storage_context = StorageContext.from_defaults(
    fs=file_system,
    persist_dir=index_path,
)

Enable Phoenix tracing via `LlamaIndexInstrumentor`. Phoenix uses OpenInference traces - an open-source standard for capturing and storing LLM application traces that enables LLM applications to seamlessly integrate with LLM observability solutions such as Phoenix.

from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

from phoenix.otel import register

tracer_provider = register(endpoint="http://127.0.0.1:6006/v1/traces")
LlamaIndexInstrumentor().instrument(skip_dep_check=True, tracer_provider=tracer_provider)

We are now ready to instantiate our query engine, which will perform retrieval-augmented generation (RAG). A query engine is a generic interface in LlamaIndex that allows you to ask questions over your data. It takes in a natural language query and returns a rich response. It is built on top of retrievers, and you can compose multiple query engines to achieve more advanced capabilities.

Settings.llm = OpenAI(model="gpt-4o")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
index = load_index_from_storage(
    storage_context,
)
query_engine = index.as_query_engine()
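
To make the retrieve-then-generate flow concrete, here is a conceptual sketch of what a RAG query does. It is not LlamaIndex's implementation: the functions `retrieve` and `build_prompt`, the prompt wording, and the toy two-dimensional embeddings are all assumptions made for illustration. A real application would obtain embeddings from an embedding model and send the final prompt to an LLM.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding, chunks, top_k=2):
    """Return the top_k (score, text) pairs, ranked by descending similarity."""
    scored = [(cosine_similarity(query_embedding, emb), text) for text, emb in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


def build_prompt(question, retrieved):
    """Stuff the retrieved text into a prompt for the LLM (hypothetical wording)."""
    context = "\n\n".join(text for _, text in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Toy 2-d embeddings; these values are made up.
chunks = [
    ("Phoenix records traces.", [1.0, 0.0]),
    ("Arize monitors models.", [0.0, 1.0]),
    ("Spans make up a trace.", [0.9, 0.1]),
]
retrieved = retrieve([1.0, 0.0], chunks, top_k=2)
print(build_prompt("What does Phoenix record?", retrieved))
```

Each stage of this flow (embedding, retrieval, LLM call) is what shows up as a separate span once tracing is enabled.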

## 5. Run Your Query Engine and View Your Traces in Phoenix

We've compiled a list of commonly asked questions about Arize. Let's download the sample queries and take a look.

queries_url = "http://storage.googleapis.com/arize-phoenix-assets/datasets/unstructured/llm/context-retrieval/arize_docs_queries.jsonl"
queries = []
with urlopen(queries_url) as response:
    for line in response:
        line = line.decode("utf-8").strip()
        data = json.loads(line)
        queries.append(data["query"])
queries[:5]

Let's run the first 5 queries and view the traces in Phoenix.

for query in tqdm(queries[:5]):
    query_engine.query(query)

And just for fun, ask your own question!

response = query_engine.query("What is Arize and how can it help me as an AI Engineer?")
print(response)

Check the Phoenix UI as your queries run. Your traces should appear in real time.

Open the Phoenix UI with the link below if you haven't already and click through the queries to better understand how the query engine is performing. For each trace you will see a break

Phoenix can be used to understand and troubleshoot your application by surfacing:
 - **Application latency** - highlighting slow invocations of LLMs, Retrievers, etc.
 - **Token Usage** - view the breakdown of token usage with LLMs to surface your most expensive LLM calls
 - **Runtime Exceptions** - critical runtime exceptions such as rate-limiting are captured as exception events
 - **Retrieved Documents** - view all the documents retrieved during a retriever call and the score and order in which they were returned
 - **Embeddings** - view the embedding text used for retrieval and the underlying embedding model
 - **LLM Parameters** - view the parameters used when calling out to an LLM to debug things like temperature and the system prompts
 - **Prompt Templates** - figure out what prompt template is used during the prompting step and what variables were used
 - **Tool Descriptions** - view the description and function signature of the tools your LLM has been given access to
 - **LLM Function Calls** - if using OpenAI or another model that supports function calls, you can view the function selection and function messages in the input messages to the LLM

<img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/RAG_trace_details.png" alt="Trace Details View on Phoenix" style="width:100%; height:auto;">
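
The same latency view can be approximated outside the UI: a span's latency is simply its end time minus its start time. The sketch below uses made-up span records; the field names `name`, `start_time`, and `end_time` and the helper `slowest_spans` are assumptions for illustration, not the Phoenix API.

```python
from datetime import datetime, timedelta


def slowest_spans(spans, top_n=3):
    """Return (name, latency_seconds) pairs for the top_n slowest spans."""
    latencies = [
        (span["name"], (span["end_time"] - span["start_time"]).total_seconds())
        for span in spans
    ]
    return sorted(latencies, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical spans from a single trace; names and timings are made up.
t0 = datetime(2024, 1, 1, 12, 0, 0)
spans = [
    {"name": "query", "start_time": t0, "end_time": t0 + timedelta(seconds=2.5)},
    {"name": "retrieve", "start_time": t0, "end_time": t0 + timedelta(seconds=0.4)},
    {
        "name": "llm",
        "start_time": t0 + timedelta(seconds=0.4),
        "end_time": t0 + timedelta(seconds=2.5),
    },
]
for name, seconds in slowest_spans(spans):
    print(f"{name}: {seconds:.2f}s")
```

In this toy trace the LLM call accounts for most of the query's latency, which is the kind of pattern the trace view is meant to expose.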

print(f"🚀 Open the Phoenix UI if you haven't already: {session.url}")

## 6. Export and Evaluate Your Trace Data

You can export your trace data as a pandas dataframe for further analysis and evaluation.

In this case, we will export our `retriever` spans into two separate dataframes:
- `queries_df`, in which the retrieved documents for each query are concatenated into a single column,
- `retrieved_documents_df`, in which each retrieved document is "exploded" into its own row to enable the evaluation of each query-document pair in isolation.
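
The difference between the two shapes can be sketched in plain Python. This is a conceptual illustration, not how Phoenix builds the dataframes: the record layout, the helper names, and the sample documents are assumptions, and the blank-line separator is inferred from the `reference` column in the output further down.

```python
def concatenate_documents(retrievals):
    """One row per retriever span: join all retrieved documents into one string."""
    return [
        {
            "span_id": r["span_id"],
            "input": r["input"],
            "reference": "\n\n".join(doc["content"] for doc in r["documents"]),
        }
        for r in retrievals
    ]


def explode_documents(retrievals):
    """One row per (span, document) pair, keeping each document's rank and score."""
    return [
        {
            "span_id": r["span_id"],
            "document_position": position,
            "input": r["input"],
            "reference": doc["content"],
            "document_score": doc["score"],
        }
        for r in retrievals
        for position, doc in enumerate(r["documents"])
    ]


# Hypothetical retriever span with two retrieved documents.
retrievals = [
    {
        "span_id": "span-a",
        "input": "How do I enable tracing?",
        "documents": [
            {"content": "Tracing is enabled by instrumenting the app.", "score": 0.82},
            {"content": "Phoenix collects spans over HTTP.", "score": 0.74},
        ],
    }
]
print(len(concatenate_documents(retrievals)), "concatenated row")
print(len(explode_documents(retrievals)), "exploded rows")
```

The concatenated form suits evaluations of the final answer against all context at once, while the exploded form suits per-document judgments.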

This will enable us to compute multiple kinds of evaluations, including:
- relevance: Are the retrieved documents relevant to the query?
- Q&A correctness: Does your application's response correctly answer the question, given the retrieved context?
- hallucinations: Is your application making up false information?
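
Once every query-document pair has a binary relevance score, retrieval metrics such as precision@k follow directly: for each retriever span, take the top k documents by position and compute the fraction that are relevant. The sketch below assumes rows of `(span_id, document_position, score)` with a score of 1 for relevant and 0 otherwise; the helper `precision_at_k` and the sample rows are hypothetical.

```python
from collections import defaultdict


def precision_at_k(rows, k):
    """Map each span_id to the fraction of relevant documents among its top k.

    `rows` are (span_id, document_position, score) tuples where score is
    1 for a relevant document and 0 otherwise.
    """
    by_span = defaultdict(list)
    for span_id, position, score in rows:
        by_span[span_id].append((position, score))
    result = {}
    for span_id, docs in by_span.items():
        top_k = sorted(docs)[:k]  # order by document_position
        # Divide by k, so a span with fewer than k documents is penalized.
        result[span_id] = sum(score for _, score in top_k) / k
    return result


# Hypothetical relevance scores for two retriever spans.
rows = [
    ("span-a", 0, 1), ("span-a", 1, 1), ("span-a", 2, 0),
    ("span-b", 0, 0), ("span-b", 1, 0), ("span-b", 2, 0),
]
print(precision_at_k(rows, k=2))
```

Averaging the per-span values gives a single precision@k figure for the retrieval step. Dividing by k rather than by the number of documents returned is a design choice; use the latter if short result lists should not be penalized.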

In [2]:
from utils import (
    get_phoenix_endpoint,
    process_messages,
    run_agent,
    tools,
    update_sql_gen_prompt,
)
px.Client()



<phoenix.session.client.Client at 0x292a5fa90>

In [3]:
help(get_retrieved_documents)

Help on function get_retrieved_documents in module phoenix.trace.dsl.helpers:

get_retrieved_documents(obj: phoenix.trace.dsl.helpers.CanQuerySpans, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, project_name: Optional[str] = None, stop_time: Optional[datetime.datetime] = None, timeout: Optional[int] = 5) -> pandas.core.frame.DataFrame



In [4]:
queries_df = get_qa_with_reference(px.Client(), project_name='appointment_prep_agent')
queries_df

context.span_id,input,output,reference
1b63b8bf6829fb03,what have i mentioned in recent reflections?,"{\n ""type"": ""text"",\n ""message"": ""Recently, you've mentioned feeling stressed, especially related to work, and not feeling well. It's great that you're seeking support. Is there anything specific you'd like to focus on in your first session?""\n}","Context: A user named Sol is expressing their feelings of stress and seeking support from the assistant.\n\nContent: User: I am not feeling good today\nAssistant: {\n ""type"": ""text"",\n ""message"": ""I'm sorry to hear that you're not feeling good today, Sol. Would you like to talk about what's been bothering you? I'm here to listen.""\n}\nUser: Sure\nAssistant: {\n ""type"": ""text"",\n ""message"": ""Thank you for sharing, Sol. What's been on your mind lately? I'm here to help you work through it.""\n}\nUser: Things have been so stressful at work today. And that's all I want to share today. Thanks\nAssistant: {\n ""type"": ""completed"",\n ""message"": ""Thank you for sharing that with me, Sol. It's completely okay to take things one step at a time. If you ever want to talk more or need support, I'm here for you. Take care!""\n}\n\nContext: A user named Sol expresses feeling unwell and shares that work has been stressful, while the assistant offers support and a listening ear.\n\nContent: User:..."
2149c5eaa0a59e6c,what have i mentioned in recent reflections?,"{\n ""type"": ""text"",\n ""message"": ""Recently, you've mentioned feeling stressed, especially related to work, and not feeling well. It's great that you're seeking support. Is there anything specific you'd like to focus on in your first session?""\n}","Context: A user named Sol is expressing their feelings of stress and seeking support from the assistant.\n\nContent: User: I am not feeling good today\nAssistant: {\n ""type"": ""text"",\n ""message"": ""I'm sorry to hear that you're not feeling good today, Sol. Would you like to talk about what's been bothering you? I'm here to listen.""\n}\nUser: Sure\nAssistant: {\n ""type"": ""text"",\n ""message"": ""Thank you for sharing, Sol. What's been on your mind lately? I'm here to help you work through it.""\n}\nUser: Things have been so stressful at work today. And that's all I want to share today. Thanks\nAssistant: {\n ""type"": ""completed"",\n ""message"": ""Thank you for sharing that with me, Sol. It's completely okay to take things one step at a time. If you ever want to talk more or need support, I'm here for you. Take care!""\n}\n\nContext: A user named Sol expresses feeling unwell and shares that work has been stressful, while the assistant offers support and a listening ear.\n\nContent: User:..."
98fee675909b4751,what have i mentioned in recent reflections?,"{\n ""type"": ""text"",\n ""message"": ""Recently, you've mentioned feeling stressed, especially related to work, and not feeling well. It's great that you're seeking support. Is there anything specific you'd like to focus on in your first session?""\n}","Context: A user named Sol is expressing their feelings of stress and seeking support from the assistant.\n\nContent: User: I am not feeling good today\nAssistant: {\n ""type"": ""text"",\n ""message"": ""I'm sorry to hear that you're not feeling good today, Sol. Would you like to talk about what's been bothering you? I'm here to listen.""\n}\nUser: Sure\nAssistant: {\n ""type"": ""text"",\n ""message"": ""Thank you for sharing, Sol. What's been on your mind lately? I'm here to help you work through it.""\n}\nUser: Things have been so stressful at work today. And that's all I want to share today. Thanks\nAssistant: {\n ""type"": ""completed"",\n ""message"": ""Thank you for sharing that with me, Sol. It's completely okay to take things one step at a time. If you ever want to talk more or need support, I'm here for you. Take care!""\n}\n\nContext: A user named Sol expresses feeling unwell and shares that work has been stressful, while the assistant offers support and a listening ear.\n\nContent: User:..."


In [5]:
retrieved_documents_df = get_retrieved_documents(px.Client(), project_name='appointment_prep_agent')
retrieved_documents_df



context.span_id,document_position,context.trace_id,input,reference,document_score
b2c21623c8b7e944,0,c8c59f449e47cd513016b84ba0383a72,recent reflections,"Context: A user named Sol is expressing their feelings of stress and seeking support from the assistant.\n\nContent: User: I am not feeling good today\nAssistant: {\n ""type"": ""text"",\n ""message"": ""I'm sorry to hear that you're not feeling good today, Sol. Would you like to talk about what's been bothering you? I'm here to listen.""\n}\nUser: Sure\nAssistant: {\n ""type"": ""text"",\n ""message"": ""Thank you for sharing, Sol. What's been on your mind lately? I'm here to help you work through it.""\n}\nUser: Things have been so stressful at work today. And that's all I want to share today. Thanks\nAssistant: {\n ""type"": ""completed"",\n ""message"": ""Thank you for sharing that with me, Sol. It's completely okay to take things one step at a time. If you ever want to talk more or need support, I'm here for you. Take care!""\n}",0.202546
b2c21623c8b7e944,1,c8c59f449e47cd513016b84ba0383a72,recent reflections,"Context: A user named Sol expresses feeling unwell and shares that work has been stressful, while the assistant offers support and a listening ear.\n\nContent: User: I am not feeling good today\nAssistant: {\n ""type"": ""text"",\n ""message"": ""I'm sorry to hear that you're not feeling good today, Sol. Would you like to talk about what's been bothering you? I'm here to listen.""\n}\nUser: Sure\nAssistant: {\n ""type"": ""text"",\n ""message"": ""Thank you for sharing, Sol. What's been on your mind lately? I'm here to help you work through it.""\n}\nUser: Things have been so stressful at work today. And that's all I want to share today. Thanks\nAssistant: {\n ""type"": ""completed"",\n ""message"": ""Thank you for sharing that with me, Sol. It's completely okay to take things one step at a time. If you ever want to talk more or need support, I'm here for you. Take care!""\n}",0.201367
b2c21623c8b7e944,2,c8c59f449e47cd513016b84ba0383a72,recent reflections,"Context: A user named Sol initiates a conversation with the assistant, expressing that they are not feeling well, and the assistant responds with empathy and offers to listen.\n\nContent: User: Hello\nAssistant: {\n ""type"": ""text"",\n ""message"": ""Hi Sol! How are you feeling today? I'm here to listen if there's anything on your mind.""\n}\nUser: Hello. I am not feeling good today\nAssistant: {\n ""type"": ""text"",\n ""message"": ""I'm sorry to hear that you're not feeling good today, Sol. Would you like to talk about what's been bothering you? I'm here to listen.""\n}",0.123959
d7ff722c64a68cfe,0,f5a61dc5ed2ed220e0af69ce30e6b6cd,recent reflections,"Context: A user named Sol is expressing their feelings of stress and seeking support from the assistant.\n\nContent: User: I am not feeling good today\nAssistant: {\n ""type"": ""text"",\n ""message"": ""I'm sorry to hear that you're not feeling good today, Sol. Would you like to talk about what's been bothering you? I'm here to listen.""\n}\nUser: Sure\nAssistant: {\n ""type"": ""text"",\n ""message"": ""Thank you for sharing, Sol. What's been on your mind lately? I'm here to help you work through it.""\n}\nUser: Things have been so stressful at work today. And that's all I want to share today. Thanks\nAssistant: {\n ""type"": ""completed"",\n ""message"": ""Thank you for sharing that with me, Sol. It's completely okay to take things one step at a time. If you ever want to talk more or need support, I'm here for you. Take care!""\n}",0.202546
d7ff722c64a68cfe,1,f5a61dc5ed2ed220e0af69ce30e6b6cd,recent reflections,"Context: A user named Sol expresses feeling unwell and shares that work has been stressful, while the assistant offers support and a listening ear.\n\nContent: User: I am not feeling good today\nAssistant: {\n ""type"": ""text"",\n ""message"": ""I'm sorry to hear that you're not feeling good today, Sol. Would you like to talk about what's been bothering you? I'm here to listen.""\n}\nUser: Sure\nAssistant: {\n ""type"": ""text"",\n ""message"": ""Thank you for sharing, Sol. What's been on your mind lately? I'm here to help you work through it.""\n}\nUser: Things have been so stressful at work today. And that's all I want to share today. Thanks\nAssistant: {\n ""type"": ""completed"",\n ""message"": ""Thank you for sharing that with me, Sol. It's completely okay to take things one step at a time. If you ever want to talk more or need support, I'm here for you. Take care!""\n}",0.201367
d7ff722c64a68cfe,2,f5a61dc5ed2ed220e0af69ce30e6b6cd,recent reflections,"Context: A user named Sol initiates a conversation with the assistant, expressing that they are not feeling well, and the assistant responds with empathy and offers to listen.\n\nContent: User: Hello\nAssistant: {\n ""type"": ""text"",\n ""message"": ""Hi Sol! How are you feeling today? I'm here to listen if there's anything on your mind.""\n}\nUser: Hello. I am not feeling good today\nAssistant: {\n ""type"": ""text"",\n ""message"": ""I'm sorry to hear that you're not feeling good today, Sol. Would you like to talk about what's been bothering you? I'm here to listen.""\n}",0.123959
b9d8b6bc25c9217c,0,d5b2cfe0cdcbf588c8788c3e0259a8bf,recent reflections,"Context: A user named Sol is expressing their feelings of stress and seeking support from the assistant.\n\nContent: User: I am not feeling good today\nAssistant: {\n ""type"": ""text"",\n ""message"": ""I'm sorry to hear that you're not feeling good today, Sol. Would you like to talk about what's been bothering you? I'm here to listen.""\n}\nUser: Sure\nAssistant: {\n ""type"": ""text"",\n ""message"": ""Thank you for sharing, Sol. What's been on your mind lately? I'm here to help you work through it.""\n}\nUser: Things have been so stressful at work today. And that's all I want to share today. Thanks\nAssistant: {\n ""type"": ""completed"",\n ""message"": ""Thank you for sharing that with me, Sol. It's completely okay to take things one step at a time. If you ever want to talk more or need support, I'm here for you. Take care!""\n}",0.202546
b9d8b6bc25c9217c,1,d5b2cfe0cdcbf588c8788c3e0259a8bf,recent reflections,"Context: A user named Sol expresses feeling unwell and shares that work has been stressful, while the assistant offers support and a listening ear.\n\nContent: User: I am not feeling good today\nAssistant: {\n ""type"": ""text"",\n ""message"": ""I'm sorry to hear that you're not feeling good today, Sol. Would you like to talk about what's been bothering you? I'm here to listen.""\n}\nUser: Sure\nAssistant: {\n ""type"": ""text"",\n ""message"": ""Thank you for sharing, Sol. What's been on your mind lately? I'm here to help you work through it.""\n}\nUser: Things have been so stressful at work today. And that's all I want to share today. Thanks\nAssistant: {\n ""type"": ""completed"",\n ""message"": ""Thank you for sharing that with me, Sol. It's completely okay to take things one step at a time. If you ever want to talk more or need support, I'm here for you. Take care!""\n}",0.201367
b9d8b6bc25c9217c,2,d5b2cfe0cdcbf588c8788c3e0259a8bf,recent reflections,"Context: A user named Sol initiates a conversation with the assistant, expressing that they are not feeling well, and the assistant responds with empathy and offers to listen.\n\nContent: User: Hello\nAssistant: {\n ""type"": ""text"",\n ""message"": ""Hi Sol! How are you feeling today? I'm here to listen if there's anything on your mind.""\n}\nUser: Hello. I am not feeling good today\nAssistant: {\n ""type"": ""text"",\n ""message"": ""I'm sorry to hear that you're not feeling good today, Sol. Would you like to talk about what's been bothering you? I'm here to listen.""\n}",0.123959


Next, define your evaluation model and your evaluators.

Evaluators are built on top of language models and prompt the LLM to assess the quality of responses, the relevance of retrieved documents, etc., and provide a quality signal even in the absence of human-labeled data. Pick an evaluator type and instantiate it with the language model you want to use to perform evaluations using our battle-tested evaluation templates.
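
Conceptually, an LLM-based evaluator fills a prompt template with the row's fields, asks the model for a verdict, and maps the free-text reply onto a fixed set of allowed labels. The sketch below illustrates that idea only; it is not the Phoenix implementation. The `classify` helper, the template wording, and the stub models standing in for a real LLM are all assumptions.

```python
def classify(llm, template, rails, **variables):
    """Fill the template, call the LLM, and snap its reply onto one of the rails."""
    prompt = template.format(**variables)
    reply = llm(prompt).strip().lower()
    # Check longer labels first so "incorrect" is not mistaken for "correct".
    for label in sorted(rails, key=len, reverse=True):
        if label in reply:
            return label
    return None  # the reply could not be parsed into an allowed label


# Hypothetical template; real evaluation templates are considerably longer.
RELEVANCE_TEMPLATE = (
    "Question: {query}\n"
    "Document: {reference}\n"
    "Is the document relevant to the question? Answer 'relevant' or 'unrelated'."
)


def stub_llm(prompt):
    """Stand-in for a real model call; always answers 'Relevant'."""
    return "  Relevant\n"


label = classify(
    stub_llm,
    RELEVANCE_TEMPLATE,
    rails=["relevant", "unrelated"],
    query="How do I enable tracing?",
    reference="Tracing is enabled by instrumenting the app.",
)
print(label)
```

Restricting the output to a small set of labels is what makes the results easy to aggregate into scores later on.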

In [6]:
# Alternative: evaluate with an OpenAI model directly.
# eval_model = OpenAIModel(
#     model="gpt-4o",
# )

from sm_llm.agents.phoenix_utility import get_aps_azure_openai

eval_model = get_aps_azure_openai()

hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_correctness_evaluator = QAEvaluator(eval_model)
relevance_evaluator = RelevanceEvaluator(eval_model)

hallucination_eval_df, qa_correctness_eval_df = run_evals(
    dataframe=queries_df,
    evaluators=[hallucination_evaluator, qa_correctness_evaluator],
    provide_explanation=True,
)
relevance_eval_df = run_evals(
    dataframe=retrieved_documents_df,
    evaluators=[relevance_evaluator],
    provide_explanation=True,
)[0]

px.Client().log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df),
    SpanEvaluations(eval_name="QA Correctness", dataframe=qa_correctness_eval_df),
    DocumentEvaluations(eval_name="Relevance", dataframe=relevance_eval_df),
)

AWS secret, assistant_platform/global/env, could not be retrieved; no secrets from AWS were incorporated into config:
An error occurred (ExpiredTokenException) when calling the GetSecretValue operation: The security token included in the request is expired
Attempting to instrument while already instrumented


🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: appointment_prep_agent
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: https://app.phoenix.arize.com/v1/traces
|  Transport: HTTP + protobuf
|  Transport Headers: {'api_key': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.






In [7]:
hallucination_eval_df

context.span_id,label,score,explanation
1b63b8bf6829fb03,hallucinated,1,"To determine if the answer is factual or hallucinated, we need to compare the information in the answer with the details provided in the reference text. The query asks about recent reflections, and the reference text provides a conversation where Sol mentions feeling stressed, particularly due to work, and not feeling well. The answer states that Sol has mentioned feeling stressed, especially related to work, and not feeling well, which aligns with the reference text. However, the answer also includes a statement about seeking support and asking if there's anything specific Sol would like to focus on in a session. The reference text does not mention anything about a session or seeking support in a structured way like a session. Therefore, the part about focusing on something specific in a session is not supported by the reference text, making that portion of the answer a hallucination."
2149c5eaa0a59e6c,factual,0,"To determine if the answer is factual or hallucinated, we need to compare the information in the answer with the details provided in the reference text. \n\n1. The query asks about recent reflections, which implies looking at what the user, Sol, has mentioned in their recent interactions.\n\n2. The reference text shows that Sol has expressed not feeling good and mentioned that things have been stressful at work. This is consistent across multiple interactions where Sol states they are not feeling well and that work has been stressful.\n\n3. The answer states that Sol has mentioned feeling stressed, especially related to work, and not feeling well. This aligns with the reference text where Sol explicitly mentions these feelings.\n\n4. The answer also includes a statement about seeking support, which is implied by Sol's engagement with the assistant and the assistant's offer to listen and provide support.\n\n5. The final part of the answer asks if there is anything specific Sol would..."
98fee675909b4751,hallucinated,1,"To determine if the answer is factual or hallucinated, we need to compare the information in the answer with the details provided in the reference text. \n\n1. The query asks about what has been mentioned in recent reflections. The reference text provides several interactions between a user named Sol and an assistant.\n\n2. In the reference text, Sol mentions feeling stressed, particularly due to work, and not feeling well. This is consistent across multiple interactions where Sol expresses not feeling good and mentions stress at work.\n\n3. The answer states: ""Recently, you've mentioned feeling stressed, especially related to work, and not feeling well."" This aligns with the reference text where Sol indeed mentions these feelings.\n\n4. The answer also includes: ""It's great that you're seeking support. Is there anything specific you'd like to focus on in your first session?"" While the first part about seeking support is implied by Sol's interaction with the assistant, the mention ..."


In [8]:
qa_correctness_eval_df

context.span_id,label,score,explanation
1b63b8bf6829fb03,correct,1,"To determine if the answer is correct, we need to compare the content of the answer with the information provided in the reference text. The question asks about what has been mentioned in recent reflections. The reference text shows multiple interactions where the user, Sol, mentions feeling stressed, particularly due to work, and not feeling well. The answer states that Sol has mentioned feeling stressed, especially related to work, and not feeling well, which aligns with the information in the reference text. Additionally, the answer acknowledges Sol's seeking of support, which is consistent with the context of the interactions. Therefore, the answer correctly summarizes the recent reflections mentioned by Sol in the reference text."
2149c5eaa0a59e6c,correct,1,"To determine if the answer is correct, we need to compare the content of the answer with the information provided in the reference text. The question asks about what has been mentioned in recent reflections. The reference text shows that Sol has mentioned feeling stressed, particularly due to work, and not feeling well. The answer states: ""Recently, you've mentioned feeling stressed, especially related to work, and not feeling well."" This matches the information in the reference text. Additionally, the answer acknowledges Sol's seeking support, which is consistent with the context of the conversation. Therefore, the answer correctly reflects the recent reflections mentioned in the reference text."
98fee675909b4751,correct,1,"To determine if the answer is correct, we need to compare the content of the answer with the information provided in the reference text. The question asks about what has been mentioned in recent reflections. The reference text shows that Sol has mentioned feeling stressed, particularly due to work, and not feeling well. The answer states: ""Recently, you've mentioned feeling stressed, especially related to work, and not feeling well."" This matches the information in the reference text. Additionally, the answer acknowledges Sol's seeking support, which is consistent with the context of the conversation. Therefore, the answer correctly reflects the recent reflections mentioned in the reference text."


In [9]:
relevance_eval_df

context.span_id,document_position,label,score,explanation
b2c21623c8b7e944,0,unrelated,0,"The question 'recent reflections' is quite vague and does not provide specific context or details about what is being asked. It could refer to a variety of topics such as personal reflections, reflections on a recent event, or reflections in a specific context. The reference text, however, is a conversation between a user named Sol and an assistant, where Sol is expressing feelings of stress related to work. The conversation does not mention any reflections, recent or otherwise. It focuses on Sol's current emotional state and the assistant's supportive responses. Therefore, the reference text does not contain information about 'recent reflections' as it is not discussed or mentioned in the conversation. Thus, the reference text is unrelated to the question."
b2c21623c8b7e944,1,unrelated,0,"The question ""recent reflections"" is quite vague and does not provide specific context or details about what is being asked. It could refer to recent thoughts, experiences, or insights. The reference text, however, is a conversation between a user named Sol and an assistant, where Sol shares feeling unwell and stressed due to work. The assistant offers support and a listening ear. This conversation does not directly address or provide information about ""recent reflections"" as it is more about Sol's current feelings and stress rather than reflections or insights. Therefore, the reference text does not contain information that can help answer the question about recent reflections."
b2c21623c8b7e944,2,unrelated,0,"The question 'recent reflections' is quite vague and does not provide specific context or details about what is being asked. It could refer to recent thoughts, experiences, or feedback. The reference text, however, is a conversation between a user named Sol and an assistant, where Sol expresses not feeling well and the assistant offers to listen. This interaction does not provide any information about 'recent reflections' as it does not mention any reflections, thoughts, or feedback from Sol or the assistant. Therefore, the reference text does not contain information that can help answer the question about 'recent reflections'."
d7ff722c64a68cfe,0,relevant,1,"The question ""recent reflections"" is quite vague and does not provide specific context or details about what is being asked. It could refer to recent thoughts, experiences, or feelings. The reference text involves a conversation where a user named Sol is expressing feelings of stress and seeking support from an assistant. The conversation includes Sol's recent reflections on feeling stressed at work. This aligns with the idea of recent reflections, as Sol is reflecting on their current emotional state and experiences. Therefore, the reference text contains information that could be considered relevant to the question if the question is interpreted as seeking recent reflections or thoughts from an individual."
d7ff722c64a68cfe,1,relevant,1,"The question ""recent reflections"" is quite vague and does not provide specific details about what is being asked. It could refer to recent thoughts, experiences, or feelings. The reference text involves a conversation where a user named Sol shares their recent feelings of stress related to work. This could be considered a form of recent reflection, as Sol is reflecting on their current emotional state and experiences. Therefore, the reference text contains information that could be relevant to the question if it is interpreted as seeking recent reflections or thoughts from an individual."
d7ff722c64a68cfe,2,unrelated,0,"The question is 'recent reflections,' which is quite vague and does not provide specific context or details about what kind of reflections are being referred to. The reference text is a conversation between a user named Sol and an assistant, where Sol expresses not feeling well, and the assistant offers empathy and a listening ear. The reference text does not mention any reflections, recent or otherwise, nor does it provide any information that could be interpreted as reflections. Therefore, the reference text does not contain information relevant to answering the question about 'recent reflections.'"
b9d8b6bc25c9217c,0,relevant,1,"The question is 'recent reflections,' which is quite vague and does not provide specific context or details about what is being asked. The reference text, however, is a conversation between a user named Sol and an assistant, where Sol expresses feelings of stress and seeks support. The conversation includes Sol's recent reflections on their stressful day at work. This aligns with the idea of 'recent reflections' as it involves Sol reflecting on their current emotional state and experiences. Therefore, the reference text contains information that can be considered relevant to the question, as it provides an example of recent reflections from Sol's perspective."
b9d8b6bc25c9217c,1,unrelated,0,"The question 'recent reflections' is quite vague and does not provide specific context or details about what is being asked. The reference text involves a conversation between a user named Sol and an assistant, where Sol expresses feeling unwell and mentions stress at work. The assistant offers support and a listening ear. However, there is no explicit mention of 'recent reflections' or any content that directly addresses reflections or introspective thoughts. The reference text is more about providing emotional support rather than discussing reflections. Therefore, the reference text does not contain information that directly answers or relates to the question 'recent reflections'."
b9d8b6bc25c9217c,2,unrelated,0,"The question 'recent reflections' is quite vague and does not provide specific context or details about what is being asked. It could refer to recent thoughts, experiences, or feedback. The reference text, however, is a conversation between a user named Sol and an assistant, where Sol expresses not feeling well and the assistant offers to listen. This interaction does not provide any information about 'recent reflections' as it does not mention any reflections, thoughts, or feedback from Sol or the assistant. Therefore, the reference text does not contain information that can help answer the question about 'recent reflections'."


Your evaluations should now appear as annotations on the appropriate spans in Phoenix.

![A view of the Phoenix UI with evaluation annotations](https://storage.googleapis.com/arize-phoenix-assets/assets/docs/notebooks/evals/traces_with_evaluation_annotations.png)
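
The per-document relevance scores shown above can be rolled up into a precision@k figure for each retrieval span. The following is a minimal sketch, not the notebook's own code: the column names (`span_id`, `document_position`, `score`) and the helper `precision_at_k` are assumptions made for illustration, and the sample rows are copied from two of the spans in the output above.

```python
import pandas as pd

# Scores copied from two spans in the output above.
# The column names are assumptions, not necessarily those Phoenix uses.
evals = pd.DataFrame(
    {
        "span_id": ["d7ff722c64a68cfe"] * 3 + ["b9d8b6bc25c9217c"] * 3,
        "document_position": [0, 1, 2, 0, 1, 2],
        "score": [1, 1, 0, 1, 0, 0],
    }
)


def precision_at_k(df: pd.DataFrame, k: int) -> pd.Series:
    """Return, per span, the fraction of the top-k retrieved documents scored relevant."""
    # Keep only documents ranked in the first k positions (positions are 0-indexed).
    top_k = df[df["document_position"] < k]
    # The mean of the binary scores within each span is that span's precision@k.
    return top_k.groupby("span_id")["score"].mean()


p_at_2 = precision_at_k(evals, k=2)
print(p_at_2)
```

Averaging the resulting series (for example, `p_at_2.mean()`) would give a single summary number across all spans, at the cost of hiding which queries retrieved poorly.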

## 7. Final Thoughts

LLM Traces and the accompanying OpenInference Tracing specification are designed to be a category of telemetry data used to understand the execution of LLMs and the surrounding application context, such as retrieval from vector stores and the use of external tools such as search engines or APIs. They let you understand the inner workings of the individual steps your application takes while also giving you visibility into how your system is running and performing as a whole.

LLM Evals are designed for simple, fast, and accurate LLM-based evaluations. They let you quickly benchmark the performance of your LLM application and help you identify the problematic spans of execution.

For more details on Phoenix, LLM Tracing, and LLM Evals, check out our [documentation](https://docs.arize.com/phoenix/).