# Ground Truth Evaluations

In this quickstart you will create and evaluate a simple LLM app using ground truth. Ground truth evaluation is especially useful during early LLM experiments, when you have a small set of example queries that are critical to get right.

Ground truth evaluation works by measuring how closely an LLM response agrees with the verified response for the same query.
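To make the idea concrete, here is a minimal sketch of the lookup-then-compare pattern. It is a stand-in, not TruLens's own scoring: the helper names `lookup_expected` and `toy_agreement` are hypothetical, and plain string similarity from the standard library's `difflib` is swapped in for whatever agreement measure the feedback function actually applies.

```python
from difflib import SequenceMatcher


def lookup_expected(golden_set, query):
    """Return the verified response for `query`, or None if it is not in the set."""
    for item in golden_set:
        if item["query"] == query:
            return item["response"]
    return None


def toy_agreement(golden_set, query, response):
    """Score a response against its verified answer on a 0-1 scale.

    Hypothetical stand-in: uses character-level similarity, which only
    approximates real semantic agreement.
    """
    expected = lookup_expected(golden_set, query)
    if expected is None:
        return None  # no ground truth available for this query
    return SequenceMatcher(None, expected.lower(), response.lower()).ratio()


example_set = [
    {"query": "who invented the lightbulb?", "response": "Thomas Edison"},
]

score_exact = toy_agreement(example_set, "who invented the lightbulb?", "Thomas Edison")
score_wrong = toy_agreement(example_set, "who invented the lightbulb?", "Nikola Tesla")
score_unknown = toy_agreement(example_set, "who invented the telephone?", "Bell")
```

An exact match scores 1.0, an unrelated answer scores lower, and a query with no verified response yields `None` instead of a misleading number.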

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/quickstart/groundtruth_evals.ipynb)

## Add API keys
For this quickstart, you will need an OpenAI API key.

In [None]:
# ! pip install trulens_eval==0.19.2 openai==1.3.7

In [2]:
import os
os.environ["OPENAI_API_KEY"] = "..."

In [3]:
from trulens_eval import Tru

tru = Tru()

## Create Simple LLM Application

In [4]:
from openai import OpenAI
from trulens_eval.tru_custom_app import instrument

oai_client = OpenAI()  # reads OPENAI_API_KEY from the environment


class APP:
    @instrument  # lets TruLens record calls to this method
    def completion(self, prompt):
        # Send the prompt as a single user message and return the reply text.
        response = oai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            temperature=0,
            messages=[
                {
                    "role": "user",
                    "content": f"Please answer the question: {prompt}",
                }
            ],
        )
        return response.choices[0].message.content


llm_app = APP()

## Initialize Feedback Function(s)

In [5]:
from trulens_eval import Feedback
from trulens_eval.feedback import GroundTruthAgreement

golden_set = [
    {"query": "who invented the lightbulb?", "response": "Thomas Edison"},
    {"query": "¿quien invento la bombilla?", "response": "Thomas Edison"}
]

f_groundtruth = Feedback(GroundTruthAgreement(golden_set).agreement_measure, name = "Ground Truth").on_input_output()

✅ In Ground Truth, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Ground Truth, input response will be set to __record__.main_output or `Select.RecordOutput` .
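Because every query in the golden set is paired with one verified response, it can help to sanity-check the list before building the feedback function. The sketch below assumes the same `query`/`response` dictionary shape used above; `validate_golden_set` is a hypothetical helper, not part of TruLens.

```python
def validate_golden_set(golden_set):
    """Return a list of human-readable problems; an empty list means the set looks fine."""
    problems = []
    seen_queries = set()
    for i, item in enumerate(golden_set):
        # Each entry needs a non-empty string for both keys.
        for key in ("query", "response"):
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {i}: missing or empty '{key}'")
        # A repeated query would make the expected answer ambiguous.
        query = item.get("query")
        if isinstance(query, str):
            if query in seen_queries:
                problems.append(f"entry {i}: duplicate query {query!r}")
            seen_queries.add(query)
    return problems


good_set = [
    {"query": "who invented the lightbulb?", "response": "Thomas Edison"},
    {"query": "¿quien invento la bombilla?", "response": "Thomas Edison"},
]
bad_set = [
    {"query": "who invented the lightbulb?", "response": ""},
    {"query": "who invented the lightbulb?", "response": "Thomas Edison"},
]

good_problems = validate_golden_set(good_set)
bad_problems = validate_golden_set(bad_set)
```

Returning a list of problems, instead of raising on the first one, lets you fix a small hand-written set in a single pass.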


## Instrument the app for logging with TruLens

In [6]:
# Wrap llm_app with TruLens so its calls are recorded and evaluated
from trulens_eval import TruCustomApp
tru_app = TruCustomApp(llm_app, app_id = 'LLM App v1', feedbacks = [f_groundtruth])

In [7]:
# The instrumented app can operate as a context manager:
with tru_app as recording:
    llm_app.completion("¿quien invento la bombilla?")
    llm_app.completion("who invented the lightbulb?")

## See results

In [8]:
tru.get_leaderboard(app_ids=[tru_app.app_id])

| app_id | Ground Truth | positive_sentiment | Human Feedack | latency | total_cost |
|---|---|---|---|---|---|
| LLM App v1 | 1.0 | 0.38994 | 1.0 | 1.75 | 7.6e-05 |
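The leaderboard shows one row per app. As a rough sketch of how such a row could be derived, the example below assumes each column is the mean of that value across the app's records. The per-record numbers and the `leaderboard_row` helper are hypothetical, chosen only for illustration.

```python
from statistics import mean

# Hypothetical per-record results for one app (illustrative values only).
records = [
    {"app_id": "LLM App v1", "Ground Truth": 1.0, "latency": 2.0},
    {"app_id": "LLM App v1", "Ground Truth": 1.0, "latency": 1.5},
]


def leaderboard_row(records, app_id, columns):
    """Average each requested column over all records belonging to `app_id`."""
    rows = [r for r in records if r["app_id"] == app_id]
    return {col: mean(r[col] for r in rows) for col in columns}


row = leaderboard_row(records, "LLM App v1", ["Ground Truth", "latency"])
```

Under this assumption, a Ground Truth score of 1.0 means every recorded response fully agreed with its verified answer.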
