In [1]:
%load_ext autoreload
%autoreload 2
from pathlib import Path
import sys

sys.path.append(str(Path().cwd().parent.parent.parent.resolve()))

# Quickstart

In this quickstart, you will create a simple LlamaIndex app, log it with TruLens, and get feedback on an LLM response.

## Setup

### Install dependencies
Install the dependencies for this notebook if you don't already have them.

In [12]:
! pip list

Package                       Version
----------------------------- -----------
accelerate                    0.19.0
aiohttp                       3.8.4
aiosignal                     1.3.1
altair                        5.0.1
anyio                         3.7.0
appnope                       0.1.3
argilla                       1.8.0
argon2-cffi                   21.3.0
argon2-cffi-bindings          21.2.0
arrow                         1.2.3
asttokens                     2.2.1
async-timeout                 4.0.2
attrs                         23.1.0
backcall                      0.2.0
backoff                       2.2.1
backports.functools-lru-cache 1.6.4
backports.zoneinfo            0.2.1
beautifulsoup4                4.12.2
bitarray                      2.7.5
bleach                        6.0.0
blinker                       1.6.2
cachetools                    5.3.1
certifi                       2023.5.7
cffi                          1.15.1
chardet                       5.1.0
charset-nor

In [2]:
# !pip install trulens-eval
# !pip install llama_index==0.7.0

### Add API keys
For this quickstart, you will need OpenAI and Hugging Face API keys.

In [3]:
# import os
# os.environ["OPENAI_API_KEY"] = "..."
# os.environ["HUGGINGFACE_API_KEY"] = "..."
from trulens_eval.keys import check_or_set_keys

check_or_set_keys(
    OPENAI_API_KEY="to fill in",
    HUGGINGFACE_API_KEY="to fill in",
    PINECONE_API_KEY="to fill in",
    PINECONE_ENV="to fill in"
)

✅ Key OPENAI_API_KEY set from environment (same value found in .env file at /Users/piotrm/Dropbox/repos/github/trulens/.env).
✅ Key HUGGINGFACE_API_KEY set from environment (same value found in .env file at /Users/piotrm/Dropbox/repos/github/trulens/.env).
✅ Key PINECONE_API_KEY set from environment (same value found in .env file at /Users/piotrm/Dropbox/repos/github/trulens/.env).
✅ Key PINECONE_ENV set from environment (same value found in .env file at /Users/piotrm/Dropbox/repos/github/trulens/.env).
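If you want to verify the keys yourself before running the rest of the notebook, the check can be sketched with the standard library alone. This is a minimal illustration, not how `check_or_set_keys` is implemented; the helper name `missing_keys` and the placeholder values are assumptions made for this example.

```python
import os

# The two keys this quickstart says it needs.
REQUIRED_KEYS = ("OPENAI_API_KEY", "HUGGINGFACE_API_KEY")


def missing_keys(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping (placeholder values) instead of the real environment:
example_env = {"OPENAI_API_KEY": "sk-placeholder", "HUGGINGFACE_API_KEY": ""}
print(missing_keys(REQUIRED_KEYS, example_env))
```

Calling `missing_keys(REQUIRED_KEYS)` with no second argument checks the real process environment.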


### Import from LlamaIndex and TruLens

In [4]:
# Imports main tools:
from trulens_eval import TruLlama, Feedback, Tru, feedback
tru = Tru()

### Create Simple LLM Application

This example uses LlamaIndex which internally uses an OpenAI LLM.

In [5]:
# LlamaIndex starter example from: https://gpt-index.readthedocs.io/en/latest/getting_started/starter_example.html
# In order to run this, download into data/ Paul Graham's Essay 'What I Worked On' from https://github.com/jerryjliu/llama_index/blob/main/examples/paul_graham_essay/data/paul_graham_essay.txt 

from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()

### Send your first request

In [6]:
response = query_engine.query("What did the author do growing up?")
print(response)


Growing up, the author wrote short stories, experimented with programming on an IBM 1401, nagged his father to buy a TRS-80, wrote simple games, a program to predict how high his model rockets would fly, and a word processor. He also studied philosophy in college, but switched to AI after finding it boring. He worked on building the infrastructure of the web, wrote essays, and did some painting. He also had dinners for a group of friends every Thursday night and bought a building in Cambridge.


## Initialize Feedback Function(s)

In [7]:
import numpy as np

# Initialize the Hugging Face and OpenAI feedback function collection classes:
hugs = feedback.Huggingface()
openai = feedback.OpenAI()

# Define a language match feedback function using HuggingFace.
f_lang_match = Feedback(hugs.language_match).on_input_output()
# By default this will check language match on the main app input and main app
# output.

# Question/answer relevance between overall question and answer.
f_qa_relevance = Feedback(openai.relevance).on_input_output()

# Question/statement relevance between question and each context chunk.
f_qs_relevance = Feedback(openai.qs_relevance).on_input().on(
    TruLlama.select_source_nodes().node.text
).aggregate(np.min)

✅ In language_match, input text1 will be set to *.__record__.main_input or `Select.RecordInput` .
✅ In language_match, input text2 will be set to *.__record__.main_output or `Select.RecordOutput` .
✅ In relevance, input prompt will be set to *.__record__.main_input or `Select.RecordInput` .
✅ In relevance, input response will be set to *.__record__.main_output or `Select.RecordOutput` .
✅ In qs_relevance, input question will be set to *.__record__.main_input or `Select.RecordInput` .
✅ In qs_relevance, input statement will be set to *.__record__.app.query.rets.source_nodes[:].node.text .
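The `.aggregate(np.min)` step above collapses one relevance score per retrieved chunk into a single value. The sketch below shows what that choice means, using made-up scores and plain Python in place of NumPy; the `aggregate` helper and the numbers are illustrative assumptions, not TruLens internals.

```python
from statistics import mean

# Hypothetical per-chunk relevance scores in [0, 1], one per retrieved source node.
chunk_scores = [0.9, 0.4, 0.7]


def aggregate(scores, how=min):
    """Collapse per-chunk scores into a single feedback value."""
    return how(scores)


worst_case = aggregate(chunk_scores, min)  # the value np.min would give for this list
average = aggregate(chunk_scores, mean)    # a more forgiving alternative
print(worst_case, round(average, 3))
```

Taking the minimum makes the feedback value reflect the least relevant chunk, so a single off-topic retrieval lowers the score. Averaging would hide it behind the better chunks.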


## Instrument the query engine for logging with TruLens

In [8]:
tru_query_engine = TruLlama(query_engine,
    app_id='LlamaIndex_App1',
    feedbacks=[f_lang_match, f_qa_relevance, f_qs_relevance])

✅ app LlamaIndex_App1 -> default.sqlite
✅ feedback def. feedback_definition_hash_a2972a1d61f07ca7f4fb930ffa1dfd18 -> default.sqlite
✅ feedback def. feedback_definition_hash_8912d6418f3eef2ab79dfd62645b00d9 -> default.sqlite
✅ feedback def. feedback_definition_hash_2bf9e5ee101927596a827f3b4909789c -> default.sqlite


In [9]:
# Instrumented query engine can operate like the original:
llm_response = tru_query_engine.query("What did the author do growing up?")

print(llm_response)


Growing up, the author wrote short stories, experimented with programming on an IBM 1401, nagged his father to buy a TRS-80 computer, wrote simple games, a program to predict how high his model rockets would fly, and a word processor. He also studied philosophy in college, switched to AI, and worked on building the infrastructure of the web. He wrote essays and published them online, had dinners for a group of friends every Thursday night, painted, and bought a building in Cambridge.
✅ record record_hash_089fa363b280a3c2f811147c5ca9cc06 from LlamaIndex_App1 -> default.sqlite
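Conceptually, an instrumented engine behaves like the original while recording each call. The sketch below illustrates that wrapper idea with stand-in classes. `RecordingEngine` and `EchoEngine` are hypothetical names for this example only, and the code does not reflect how `TruLlama` is actually implemented.

```python
class RecordingEngine:
    """Hypothetical wrapper: forwards query() and keeps a log of each call."""

    def __init__(self, engine, app_id):
        self.engine = engine
        self.app_id = app_id
        self.records = []

    def query(self, text):
        # Delegate to the wrapped engine, then record input and output.
        response = self.engine.query(text)
        self.records.append(
            {"app_id": self.app_id, "input": text, "output": str(response)}
        )
        return response


class EchoEngine:
    """Stand-in for a real query engine so the sketch runs offline."""

    def query(self, text):
        return f"echo: {text}"


wrapped = RecordingEngine(EchoEngine(), app_id="LlamaIndex_App1")
answer = wrapped.query("What did the author do growing up?")
print(answer)
print(len(wrapped.records))
```

Because the wrapper returns the underlying response unchanged, calling code does not need to know whether it is talking to the original engine or the recording one.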

## Explore in a Dashboard

In [11]:
tru.run_dashboard(force=True, _dev=Path().cwd().parent.parent.parent.resolve()) # open a local streamlit app to explore

# tru.stop_dashboard() # stop if needed

Starting dashboard ...


Accordion(children=(VBox(children=(VBox(children=(Label(value='STDOUT'), Output())), VBox(children=(Label(valu…

Dashboard started at http://192.168.1.251:8502 .


<subprocess.Popen at 0x16e280af0>

Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard.

### Leaderboard

Understand how your LLM application is performing at a glance. Once you've set up logging and evaluation in your application, the leaderboard shows key performance statistics, including cost and average feedback value, across all of your LLM apps. As you iterate on new versions of your LLM application, you can compare their performance across all of the quality metrics you've set up.

Note: Average feedback values are returned and displayed in a range from 0 (worst) to 1 (best).

![Chain Leaderboard](https://www.trulens.org/Assets/image/Leaderboard.png)

To dive deeper into a particular chain, click "Select Chain".
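The "average feedback value" on the leaderboard can be pictured as a per-app mean over recorded scores. The following sketch computes it for made-up rows; the second app ID, the scores, and the `leaderboard` helper are all hypothetical, and the dashboard's actual computation may differ.

```python
from collections import defaultdict

# Hypothetical feedback results: (app_id, feedback name, score in [0, 1]).
results = [
    ("LlamaIndex_App1", "relevance", 0.8),
    ("LlamaIndex_App1", "relevance", 0.6),
    ("LlamaIndex_App1", "language_match", 1.0),
    ("LlamaIndex_App2", "relevance", 0.5),
]


def leaderboard(rows):
    """Average each feedback score per (app, feedback name) pair."""
    buckets = defaultdict(list)
    for app_id, name, score in rows:
        buckets[(app_id, name)].append(score)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


board = leaderboard(results)
for (app_id, name), avg in sorted(board.items()):
    print(f"{app_id:16} {name:15} {avg:.2f}")
```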

### Understand chain performance with Evaluations
 
To learn more about the performance of a particular chain or LLM, select it to view its evaluations at the record level. LLM quality is assessed through feedback functions: extensible methods for scoring the quality of LLM responses that can be applied to any downstream LLM task. Out of the box, we provide a number of feedback functions for assessing model agreement, sentiment, relevance, and more.

The evaluations tab provides record-level metadata and feedback on the quality of your LLM application.

![Evaluations](https://www.trulens.org/Assets/image/Leaderboard.png)
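To make "feedback function" concrete: at its simplest, it is a function that takes text from a record and returns a score between 0 and 1. The sketch below is a deliberately naive keyword-overlap scorer written for illustration. `keyword_overlap` is a hypothetical name, not one of the built-in feedback functions, and whether a plain function like this can be passed to `Feedback` in your installed version is an assumption to check against the library's documentation.

```python
def keyword_overlap(question: str, answer: str) -> float:
    """Fraction of distinct question words that also appear in the answer (0 to 1)."""
    q_words = {w.strip(".,?!").lower() for w in question.split()}
    a_words = {w.strip(".,?!").lower() for w in answer.split()}
    q_words.discard("")  # drop tokens that were punctuation only
    if not q_words:
        return 0.0
    return len(q_words & a_words) / len(q_words)


score = keyword_overlap(
    "What did the author do growing up?",
    "Growing up, the author wrote short stories.",
)
print(round(score, 2))
```

A scorer like this is far cruder than the model-based relevance functions used earlier, but it shows the shape of the contract: text in, a value in the 0 to 1 range out.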

### Deep dive into full chain metadata

Click on a record to see all of the details of your chain stack and underlying LLM, as captured by the instrumented app (`tru_query_engine` in this notebook).

![Explore a Chain](https://www.trulens.org/Assets/image/Chain_Explore.png)

If you prefer the raw format, you can quickly get it using the "Display full chain json" or "Display full record json" buttons at the bottom of the page.

Note: Feedback functions evaluated in deferred mode can be seen on the "Progress" page of the TruLens dashboard.

## Or view results directly in your notebook

In [None]:
tru.get_records_and_feedback(app_ids=[])[0] # pass an empty list of app_ids to get all