# Measuring Retrieval Quality

Retrieval quality can be measured in several ways, from LLM-based evaluations to embedding similarity. This example explores two of them: an LLM-based relevance score and an embedding distance.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/expositional/frameworks/llama_index/llama_index_retrievalquality.ipynb)

## Setup

### Install dependencies
Let's install the dependencies for this notebook if we don't have them already.

In [None]:
# Quote the html2text specifier so the shell does not treat ">=" as a redirect.
# ! pip install trulens_eval==0.24.0 llama_index==0.10.11 "html2text>=2020.1.16"

### Add API keys
For this example, you will need OpenAI and Hugging Face keys. In the code below, the OpenAI key is used by the LLM that answers queries and by the LLM-based relevance feedback, while the embeddings come from a Hugging Face sentence-transformers model.

In [None]:
import os
os.environ["OPENAI_API_KEY"] = "..."
os.environ["HUGGINGFACE_API_KEY"] = "..."
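
Before running the rest of the notebook, it can help to confirm that both keys are actually set. The following is a minimal sketch: `missing_keys` is a hypothetical helper, not part of TruLens or LlamaIndex, and it assumes the `"..."` placeholder should count as unset.

```python
import os

REQUIRED_KEYS = ["OPENAI_API_KEY", "HUGGINGFACE_API_KEY"]


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the names in `required` that are absent, empty, or still the "..." placeholder."""
    return [name for name in required if env.get(name, "").strip() in ("", "...")]


# Hypothetical check against the current process environment.
missing = missing_keys(os.environ)
if missing:
    print("Set these environment variables before continuing:", ", ".join(missing))
```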

### Import from LlamaIndex and TruLens

In [None]:
from trulens_eval import Feedback, Tru, TruLlama
from trulens_eval.feedback import Embeddings
from trulens_eval.feedback.provider.openai import OpenAI

tru = Tru()
tru.reset_database()

### Create Simple LLM Application

This example uses LlamaIndex, which internally uses an OpenAI LLM to generate answers.

In [None]:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.core import VectorStoreIndex
from llama_index.legacy import ServiceContext
from llama_index.readers.web import SimpleWebPageReader

# Load the essay as plain text.
documents = SimpleWebPageReader(
    html_to_text=True
).load_data(["http://paulgraham.com/worked.html"])

# Embed chunks with a Hugging Face sentence-transformers model.
embed_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Retrieve the 5 most similar chunks per query
# (the retriever's keyword is similarity_top_k, not top_k).
query_engine = index.as_query_engine(similarity_top_k=5)

### Send your first request

In [None]:
response = query_engine.query("What did the author do growing up?")
print(response)

## Initialize Feedback Function(s)

In [None]:
import numpy as np

# Initialize provider class
openai = OpenAI()

# Question/statement relevance between question and each context chunk.
f_qs_relevance = (
    Feedback(openai.qs_relevance)
    .on_input()
    .on(TruLlama.select_source_nodes().node.text)
    .aggregate(np.mean)
)
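
The feedback function above produces one relevance score per retrieved chunk, and `aggregate(np.mean)` reduces them to a single number per record. As a rough illustration of how the choice of aggregator changes that number, here is a sketch with made-up scores (the values are hypothetical, not output from the model). It uses only the standard library, so it runs without any API key.

```python
from statistics import mean

# Hypothetical per-chunk relevance scores in [0, 1] for one query (made-up values).
chunk_scores = [0.9, 0.8, 0.1, 0.7, 0.5]

summary = {
    "mean": mean(chunk_scores),  # average quality of the retrieved set
    "min": min(chunk_scores),    # worst chunk: penalizes any irrelevant hit
    "max": max(chunk_scores),    # best chunk: was at least one good hit retrieved?
}
print(summary)
```

The mean rewards a retrieved set that is relevant overall, while the minimum exposes a single off-topic chunk that the mean would smooth over. In principle, a different aggregator such as `np.min` could be passed to `aggregate` in the same way.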

In [None]:
f_embed = Embeddings(embed_model=embed_model)

f_embed_dist = (
    Feedback(f_embed.cosine_distance)
    .on_input()
    .on(TruLlama.select_source_nodes().node.text)
    .aggregate(np.mean)
)
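
To make the embedding-based metric concrete, the sketch below computes a cosine distance between a query vector and each chunk vector, then averages the results, mirroring the `aggregate(np.mean)` step. It assumes cosine distance is defined as 1 minus cosine similarity. The vectors are toy values rather than real embeddings, and the `cosine_distance` function here is a local illustration, not the TruLens implementation.

```python
import math
from statistics import mean


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Toy 3-dimensional "embeddings" (hypothetical values, not model output).
query_vec = [1.0, 0.0, 0.0]
chunk_vecs = [
    [1.0, 0.0, 0.0],  # same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]

# One distance per chunk, then a single averaged value for the record.
distances = [cosine_distance(query_vec, chunk) for chunk in chunk_vecs]
mean_distance = mean(distances)
print(distances, mean_distance)
```

Because this is a distance, lower values mean a chunk points in a direction closer to the query, which is the opposite orientation from the relevance score.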

## Instrument app for logging with TruLens

In [None]:
tru_query_engine_recorder = TruLlama(query_engine,
    app_id='LlamaIndex_App1',
    feedbacks=[f_qs_relevance, f_embed_dist])

In [None]:
# Use the recorder as a context manager: queries made inside it are logged and evaluated.
with tru_query_engine_recorder as recording:
    query_engine.query("What did the author do growing up?")

## Explore in a Dashboard

In [None]:
tru.run_dashboard() # open a local streamlit app to explore

# tru.stop_dashboard() # stop if needed

Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard.

Note: Feedback functions evaluated in deferred mode can be seen on the "Progress" page of the TruLens dashboard.

## Or view results directly in your notebook

In [None]:
tru.get_records_and_feedback(app_ids=[])[0] # pass an empty list of app_ids to get all