# Quickstart

In this quickstart you will create a simple LlamaIndex app, log it with TruLens, and get feedback on an LLM response.

## Setup
### Add API keys
For this quickstart you will need OpenAI and Hugging Face API keys.

In [None]:
!pip install trulens-eval

In [1]:
import os
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own key; never commit real keys
os.environ["HUGGINGFACE_API_KEY"] = "hf_..."  # replace with your own key
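
To keep keys out of the notebook itself, one option is to prompt for them only when they are not already set in the environment. The following is a minimal sketch using only the standard library; `ensure_key` is a hypothetical helper name, not part of TruLens or LlamaIndex.

```python
import os
from getpass import getpass


def ensure_key(name, prompt=getpass):
    """Return environment variable `name`, prompting once if it is unset or empty.

    `ensure_key` is a hypothetical helper for this quickstart. The `prompt`
    argument defaults to getpass so the typed key is not echoed to the screen.
    """
    value = os.environ.get(name)
    if not value:
        value = prompt(f"Enter {name}: ")
        os.environ[name] = value  # make the key visible to libraries that read the environment
    return value


# Typical notebook usage (commented out so the cell does not block when run non-interactively):
# ensure_key("OPENAI_API_KEY")
# ensure_key("HUGGINGFACE_API_KEY")
```

Keys that are already exported in your shell are picked up without a prompt, so the same cell works both locally and in an automated run.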

### Import from LlamaIndex and TruLens

In [2]:
from IPython.display import JSON

# Imports main tools:
from trulens_eval import TruLlama, Feedback, Huggingface, Tru, Query
tru = Tru()


### Create Simple LLM Application

This example uses LlamaIndex which internally uses an OpenAI LLM.

In [3]:
# LlamaIndex starter example from: https://gpt-index.readthedocs.io/en/latest/getting_started/starter_example.html
# To run this, download Paul Graham's essay 'What I Worked On' into data/ from https://github.com/jerryjliu/llama_index/blob/main/examples/paul_graham_essay/data/paul_graham_essay.txt

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
index = GPTVectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()

### Send your first request

In [4]:
response = query_engine.query("What did the author do growing up?")
print(response)


Growing up, the author wrote short stories, programmed on an IBM 1401, and nagged his father to buy him a TRS-80 microcomputer. He wrote simple games, a program to predict how high his model rockets would fly, and a word processor. He also studied philosophy in college, but switched to AI after becoming bored with it. He then took art classes at Harvard and applied to art schools, eventually attending RISD.


## Initialize Feedback Function(s)

In [5]:
# Initialize Huggingface-based feedback function collection class:
hugs = Huggingface()

# Define a language match feedback function using HuggingFace.
f_lang_match = Feedback(hugs.language_match).on(
    text1=Query.RecordInput, text2=Query.RecordOutput
)

huggingface api: 0requests [00:00, ?requests/s]

## Instrument the query engine for logging with TruLens

In [6]:
tru_query_engine = TruLlama(query_engine,
    app_id='LlamaIndex_App1',
    feedbacks=[f_lang_match])

✅ app LlamaIndex_App1 -> default.sqlite
✅ feedback def. feedback_definition_hash_54236801baf0231f54d65198ca19d765 -> default.sqlite


In [7]:
import logging
import sys

root = logging.getLogger()
root.setLevel(logging.DEBUG)

handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
root.addHandler(handler)
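
Setting the root logger to DEBUG turns on debug output for every library in the process, which can be noisy. A narrower sketch, assuming you only care about the `openai` and `llama_index` loggers (the names that appear in the log output of this notebook), attaches the handler to those loggers only. `enable_debug_for` is a hypothetical helper; use it instead of the root-level setup, not in addition to it, or messages will be printed twice.

```python
import logging
import sys


def enable_debug_for(logger_names, stream=sys.stdout):
    """Attach one DEBUG-level stream handler to each named logger.

    Hypothetical helper: child loggers such as `llama_index.indices.utils`
    inherit the level and handler of their parent `llama_index` logger.
    """
    handler = logging.StreamHandler(stream)
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(
        logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    )
    for name in logger_names:
        logger = logging.getLogger(name)
        logger.setLevel(logging.DEBUG)
        logger.addHandler(handler)
    return handler


handler = enable_debug_for(["openai", "llama_index"])
```

Other libraries, such as `urllib3`, then stay at their default level.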

In [8]:
type(query_engine).__bases__

(llama_index.indices.query.base.BaseQueryEngine,)

In [9]:
# Instrumented query engine can operate like the original:
llm_response = tru_query_engine.query("What did the author do growing up?")

display(llm_response)

2023-06-14 17:00:49,184 - openai - DEBUG - message='Request to OpenAI API' method=post path=https://api.openai.com/v1/embeddings
2023-06-14 17:00:49,189 - openai - DEBUG - api_version=None data='{"input": ["What did the author do growing up?"], "model": "text-embedding-ada-002", "encoding_format": "base64"}' message='Post details'
2023-06-14 17:00:49,284 - urllib3.connectionpool - DEBUG - https://api.openai.com:443 "POST /v1/embeddings HTTP/1.1" 200 None
2023-06-14 17:00:49,286 - openai - DEBUG - message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=30 request_id=3fef5d4262b387c24c8fb94f68f8b7af response_code=200
2023-06-14 17:00:49,297 - llama_index.indices.utils - DEBUG - > Top 2 nodes:
> [Node 9241300b-508e-4714-bc26-b2eda44d5625] [Similarity score:             0.808594] 		

What I Worked On

February 2021

Before college the two main things I worked on, outside of s...
> [Node 82fe5cbf-a5cd-4bfa-98ec-4d3c785f2993] [Similarity score:             0.805

2023-06-14 17:00:56,222 - urllib3.connectionpool - DEBUG - https://api.openai.com:443 "POST /v1/completions HTTP/1.1" 200 None
2023-06-14 17:00:56,223 - openai - DEBUG - message='OpenAI API response' path=https://api.openai.com/v1/completions processing_ms=6817 request_id=790c8461ba46278f9f12cfc95e73b27e response_code=200
2023-06-14 17:00:56,225 - llama_index.llm_predictor.base - DEBUG - 
Growing up, the author wrote short stories, programmed on an IBM 1401, and nagged his father to buy him a TRS-80 microcomputer. He wrote simple games, a program to predict how high his model rockets would fly, and a word processor. He also studied philosophy in college, but switched to AI after becoming bored with it. He then took art classes at Harvard and applied to art schools, eventually attending RISD.
2023-06-14 17:00:56,229 - llama_index.indices.response.base_builder - DEBUG - > Initial prompt template: Context information is below. 
---------------------
What I Worked On

February 2021

Before

2023-06-14 17:00:56,231 - llama_index.indices.response.base_builder - DEBUG - > Initial response: 
Growing up, the author wrote short stories, programmed on an IBM 1401, and nagged his father to buy him a TRS-80 microcomputer. He wrote simple games, a program to predict how high his model rockets would fly, and a word processor. He also studied philosophy in college, but switched to AI after becoming bored with it. He then took art classes at Harvard and applied to art schools, eventually attending RISD.
2023-06-14 17:00:56,233 - llama_index.token_counter.token_counter - INFO - > [get_response] Total LLM token usage: 1920 tokens
2023-06-14 17:00:56,234 - llama_index.token_counter.token_counter - INFO - > [get_response] Total embedding token usage: 0 tokens


AssertionError: No information recorded in call.

In [None]:
list(tru_query_engine.instrumented())


## Explore in a Dashboard

In [None]:
tru.run_dashboard() # open a local streamlit app to explore

# tru.stop_dashboard() # stop if needed

### Leaderboard

Understand how your LLM application is performing at a glance. Once you've set up logging and evaluation in your application, you can view key performance statistics including cost and average feedback value across all of your LLM apps using the chain leaderboard. As you iterate new versions of your LLM application, you can compare their performance across all of the different quality metrics you've set up.

Note: Average feedback values are returned and displayed in a range from 0 (worst) to 1 (best).
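
To make the averaging concrete, here is a minimal sketch of how a per-app average feedback value could be computed from individual records. The record layout, the second app id, and the scores are made up for illustration; this is not how the dashboard is implemented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: one dict per logged call, with a feedback score in [0, 1].
records = [
    {"app_id": "LlamaIndex_App1", "language_match": 1.0},
    {"app_id": "LlamaIndex_App1", "language_match": 0.5},
    {"app_id": "LlamaIndex_App2", "language_match": 0.0},
]


def average_feedback(records, feedback_name):
    """Group records by app_id and return the mean of one feedback score per app."""
    scores_by_app = defaultdict(list)
    for record in records:
        scores_by_app[record["app_id"]].append(record[feedback_name])
    return {app_id: mean(scores) for app_id, scores in scores_by_app.items()}


averages = average_feedback(records, "language_match")
```

Comparing these per-app averages side by side is the same comparison the leaderboard lets you make across versions of an application.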

![Chain Leaderboard](https://www.trulens.org/Assets/image/Leaderboard.png)

To dive deeper on a particular chain, click "Select Chain".

### Understand chain performance with Evaluations
 
To learn more about the performance of a particular chain or LLM, select it to view its evaluations at the record level. LLM quality is assessed with feedback functions: extensible methods for scoring the quality of LLM responses that can be applied to any downstream LLM task. Out of the box we provide a number of feedback functions for assessing model agreement, sentiment, relevance and more.
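
Conceptually, a feedback function maps text from a record to a score between 0 and 1. The sketch below is a toy illustration of that idea in plain Python: `keyword_overlap` is a hypothetical function, not one of the feedback functions TruLens ships, and wiring a custom function into `Feedback` is not covered here.

```python
import re


def keyword_overlap(prompt: str, response: str) -> float:
    """Toy feedback: fraction of distinct prompt words that also appear in the response.

    Returns a value in [0, 1], where 1 means every prompt word is echoed in the response.
    """
    prompt_words = set(re.findall(r"[a-z0-9']+", prompt.lower()))
    response_words = set(re.findall(r"[a-z0-9']+", response.lower()))
    if not prompt_words:
        return 0.0  # no words to compare against
    return len(prompt_words & response_words) / len(prompt_words)


score = keyword_overlap(
    "What did the author do growing up?",
    "Growing up, the author wrote short stories.",
)
```

A real feedback function would usually call a model or an API rather than count words, but the contract is the same: text in, bounded score out.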

The evaluations tab provides record-level metadata and feedback on the quality of your LLM application.

![Evaluations](https://www.trulens.org/Assets/image/Leaderboard.png)

### Deep dive into full chain metadata

Click on a record to dive deep into all of the details of your chain stack and underlying LLM, captured by `tru_query_engine`.

![Explore a Chain](https://www.trulens.org/Assets/image/Chain_Explore.png)

If you prefer the raw format, you can quickly get it using the "Display full chain json" or "Display full record json" buttons at the bottom of the page.

Note: Feedback functions evaluated in deferred mode can be seen on the "Progress" page of the TruLens dashboard.

## Or view results directly in your notebook

In [None]:
tru.get_records_and_feedback(app_ids=[])[0] # pass an empty list of app_ids to get all