# Quickstart

In this quickstart you will create a simple LLM Chain and learn how to log it and get feedback on an LLM response.

## Setup
### Add API keys
For this quickstart you will need OpenAI and Hugging Face API keys.

In [2]:
%load_ext autoreload
%autoreload 2
import os
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your OpenAI API key
os.environ["HUGGINGFACE_API_KEY"] = "hf_..."  # replace with your Hugging Face API key
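
Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the keys are either already exported in your shell or typed in at a prompt. `ensure_key` is a hypothetical helper, not part of TruLens or LangChain:

```python
import os
from getpass import getpass


def ensure_key(name: str) -> str:
    """Return the value of environment variable `name`, prompting if it is unset."""
    value = os.environ.get(name)
    if not value:
        # getpass hides the typed value so it never appears in notebook output.
        value = getpass(f"Enter {name}: ")
        os.environ[name] = value
    return value
```

Calling `ensure_key("OPENAI_API_KEY")` and `ensure_key("HUGGINGFACE_API_KEY")` at the top of the notebook would then replace the two assignments above.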

### Import from LangChain and TruLens

In [3]:
from IPython.display import JSON

# Imports main tools:
from trulens_eval import TruChain, Feedback, Huggingface, Tru, Query
tru = Tru()

# imports from langchain to build app
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts.chat import ChatPromptTemplate, PromptTemplate
from langchain.prompts.chat import HumanMessagePromptTemplate

### Create Simple LLM Application

This example uses the LangChain framework and an OpenAI LLM.

In [4]:
full_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template=
        "Provide a helpful response with relevant background information for the following: {prompt}",
        input_variables=["prompt"],
    )
)

chat_prompt_template = ChatPromptTemplate.from_messages([full_prompt])

llm = OpenAI(temperature=0.9, max_tokens=128)

chain = LLMChain(llm=llm, prompt=chat_prompt_template, verbose=True)

### Send your first request

In [5]:
prompt_input = '¿que hora es?'

In [6]:
llm_response = chain(prompt_input)

display(llm_response)

> Entering new LLMChain chain...
Prompt after formatting:
Human: Provide a helpful response with relevant background information for the following: ¿que hora es?

> Finished chain.


{'prompt': '¿que hora es?',
 'text': '\n\nLa hora actual en tu ubicación depende de la zona horaria en la que te encuentras. Para averiguar la hora en tu zona, puedes usar una herramienta de conversión de zonas horarias en línea para encontrar la hora actual. También puedes consultar la zona horaria en tu dispositivo para obtener información precisa.'}

## Initialize Feedback Function(s)

In [7]:
# Initialize Huggingface-based feedback function collection class:
hugs = Huggingface()

# Define a language match feedback function using HuggingFace.
f_lang_match = Feedback(hugs.language_match).on(
    text1=Query.RecordInput, text2=Query.RecordOutput
)

huggingface api: 0requests [00:00, ?requests/s]

## Instrument chain for logging with TruLens

In [8]:
truchain = TruChain(chain,
    chain_id='Chain3_ChatApplication',
    feedbacks=[f_lang_match],
    tru = tru)

✅ chain Chain3_ChatApplication -> default.sqlite
✅ feedback def. feedback_definition_hash_ef50204d7ba1af7567641ca2fffb444c -> default.sqlite


In [9]:
# Instrumented chain can operate like the original:
llm_response = truchain(prompt_input)

display(llm_response)

> Entering new LLMChain chain...

> Finished chain.


{'prompt': '¿que hora es?',
 'text': '\n\nLa hora actual es [hora en la zona horaria actual]. El mundo se divide en 24 zonas horarias diferentes, con una hora diferente para cada lugar. Para encontrar la hora exacta en la zona horaria actual, consulte una herramienta de hora local como el reloj mundial en línea.'}

✅ record record_hash_a2919b3a75a800dcca0d166c9af7d2ed from Chain3_ChatApplication -> default.sqlite


## Explore in a Dashboard

In [10]:
#tru.run_dashboard() # open a local streamlit app to explore

# tru.run_dashboard(_dev=True) # if running from repo
# tru.stop_dashboard() # stop if needed

### Chain Leaderboard

Understand how your LLM application is performing at a glance. Once you've set up logging and evaluation in your application, you can view key performance statistics, including cost and average feedback value, across all of your LLM apps using the chain leaderboard. As you iterate on new versions of your LLM application, you can compare their performance across all of the quality metrics you've set up.

Note: Average feedback values are returned and displayed in a range from 0 (worst) to 1 (best).

![Chain Leaderboard](https://www.trulens.org/Assets/image/Leaderboard.png)

To dive deeper on a particular chain, click "Select Chain".
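
The leaderboard's "average feedback value" can be pictured as a per-chain mean of the individual feedback scores. The following is a rough sketch of that aggregation in plain Python, not the dashboard's actual implementation. The scores are a subset of the `language_match` values from the `tru.get_records_and_feedback` output later in this quickstart, and `average_by_chain` is a hypothetical helper:

```python
from collections import defaultdict
from statistics import mean

# (chain_id, language_match) pairs taken from the records table in this quickstart.
rows = [
    ("Chain1_ChatApplication", 0.992349),
    ("Chain1_ChatApplication", 0.007279),
    ("Chain3_ChatApplication", 0.99655),
    ("Chain3_ChatApplication", 0.99419),
]


def average_by_chain(rows):
    """Group feedback values by chain_id and return the mean of each group."""
    grouped = defaultdict(list)
    for chain_id, value in rows:
        grouped[chain_id].append(value)
    return {chain_id: mean(values) for chain_id, values in grouped.items()}


leaderboard = average_by_chain(rows)
for chain_id, avg in sorted(leaderboard.items()):
    print(f"{chain_id}: {avg:.3f}")
```

A single mismatched record pulls the first chain's average down sharply, which is the kind of signal the leaderboard is meant to surface.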

### Understand chain performance with Evaluations
 
To learn more about the performance of a particular chain or LLM, select it to view its evaluations at the record level. LLM quality is assessed through feedback functions. Feedback functions are extensible methods for determining the quality of LLM responses and can be applied to any downstream LLM task. Out of the box we provide a number of feedback functions for assessing model agreement, sentiment, relevance and more.

The evaluations tab provides record-level metadata and feedback on the quality of your LLM application.

![Evaluations](https://www.trulens.org/Assets/image/Leaderboard.png)

### Deep dive into full chain metadata

Click on a record to dive deep into all of the details of your chain stack and underlying LLM, captured by `TruChain`.

![Explore a Chain](https://www.trulens.org/Assets/image/Chain_Explore.png)

If you prefer the raw format, you can quickly get it using the "Display full chain json" or "Display full record json" buttons at the bottom of the page.

Note: Feedback functions evaluated in the deferred manner can be seen in the "Progress" page of the TruLens dashboard.

## Or view results directly in your notebook

In [11]:
tru.get_records_and_feedback(chain_ids=[])[0] # pass an empty list of chain_ids to get all

Unnamed: 0,record_id,chain_id,input,output,record_json,tags,ts,cost_json,chain_json,total_tokens,total_cost,language_match,language_match_calls
0,record_hash_016d5d7f77bb9da0881d4149fdd4dd92,Chain1_ChatApplication,This will be logged by deferred evaluator.,\n\nDeferred Evaluator is an automated scoring...,"{""record_id"": ""record_hash_016d5d7f77bb9da0881...",,2023-06-08 09:42:51.602106,"{""n_tokens"": 134, ""cost"": 0.00268}","{""chain_id"": ""Chain1_ChatApplication"", ""feedba...",134,0.00268,0.992349,[{'args': {'text1': 'This will be logged by de...
1,record_hash_1c188102dd1b2349cdb67882fe3d8d8a,Chain1_ChatApplication,que hora es?,\n\nLa hora actual es 12:35.,"{""record_id"": ""record_hash_1c188102dd1b2349cdb...",,2023-06-08 09:42:41.008099,"{""n_tokens"": 30, ""cost"": 0.0006000000000000001}","{""chain_id"": ""Chain1_ChatApplication"", ""feedba...",30,0.0006,0.007279,"[{'args': {'text1': 'que hora es?', 'text2': '..."
2,record_hash_783160580bb6a9798f76e1fb0495061d,Chain1_ChatApplication,This will be automatically logged.,\n\nThis means that the action you are taking ...,"{""record_id"": ""record_hash_783160580bb6a9798f7...",,2023-06-08 09:42:39.554762,"{""n_tokens"": 85, ""cost"": 0.0017}","{""chain_id"": ""Chain1_ChatApplication"", ""feedba...",85,0.0017,0.989833,[{'args': {'text1': 'This will be automaticall...
3,record_hash_9865e5b6907c71e9de97fa6a6e5807bb,Chain1_ChatApplication,This will be automatically logged.,\n\nIn order to keep your records organized an...,"{""record_id"": ""record_hash_9865e5b6907c71e9de9...",,2023-06-08 09:38:05.778958,"{""n_tokens"": 116, ""cost"": 0.00232}","{""chain_id"": ""Chain1_ChatApplication"", ""feedba...",116,0.00232,0.994001,[{'args': {'text1': 'This will be automaticall...
4,record_hash_a002097b06f96f8a6e7cf476888901ee,Chain1_ChatApplication,This will be automatically logged.,\n\nThis means that any time you perform an ac...,"{""record_id"": ""record_hash_a002097b06f96f8a6e7...",,2023-06-08 14:02:47.324103,"{""n_tokens"": 85, ""cost"": 0.0017}","{""chain_id"": ""Chain1_ChatApplication"", ""feedba...",85,0.0017,0.99743,[{'args': {'text1': 'This will be automaticall...
5,record_hash_a2919b3a75a800dcca0d166c9af7d2ed,Chain3_ChatApplication,¿que hora es?,\n\nLa hora actual es [hora en la zona horaria...,"{""record_id"": ""record_hash_a2919b3a75a800dcca0...",,2023-06-08 14:10:22.818910,"{""n_tokens"": 114, ""cost"": 0.00228}","{""chain_id"": ""Chain3_ChatApplication"", ""feedba...",114,0.00228,,
6,record_hash_a8897f746a480e35f8ce33bd11821cf8,Chain1_ChatApplication,que hora es?,"\n\nRespuesta: Actualmente, dependiendo de la ...","{""record_id"": ""record_hash_a8897f746a480e35f8c...",,2023-06-08 14:02:58.328323,"{""n_tokens"": 93, ""cost"": 0.00186}","{""chain_id"": ""Chain1_ChatApplication"", ""feedba...",93,0.00186,0.997064,"[{'args': {'text1': 'que hora es?', 'text2': '..."
7,record_hash_ae39fd378185b4ba992212110024860b,Chain3_ChatApplication,¿que hora es?,\n\nLa hora actual es las 11:30pm.,"{""record_id"": ""record_hash_ae39fd378185b4ba992...",,2023-06-08 09:42:30.386223,"{""n_tokens"": 34, ""cost"": 0.00068}","{""chain_id"": ""Chain3_ChatApplication"", ""feedba...",34,0.00068,0.99655,"[{'args': {'text1': '¿que hora es?', 'text2': ..."
8,record_hash_b2bc294ccddc92e7d4fffc0276a0ae78,Chain3_ChatApplication,¿que hora es?,\n\nLa hora actual depende de la ubicación. Si...,"{""record_id"": ""record_hash_b2bc294ccddc92e7d4f...",,2023-06-08 09:37:55.874846,"{""n_tokens"": 120, ""cost"": 0.0024}","{""chain_id"": ""Chain3_ChatApplication"", ""feedba...",120,0.0024,0.99419,"[{'args': {'text1': '¿que hora es?', 'text2': ..."
9,record_hash_ba1041598fdeecf93759b73c2c7b5343,Chain3_ChatApplication,¿que hora es?,\n\nLa hora actual es la hora del reloj del si...,"{""record_id"": ""record_hash_ba1041598fdeecf9375...",,2023-06-08 14:06:03.861315,"{""n_tokens"": 96, ""cost"": 0.00192}","{""chain_id"": ""Chain3_ChatApplication"", ""feedba...",96,0.00192,0.994699,"[{'args': {'text1': '¿que hora es?', 'text2': ..."
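
Once the records are in hand, a common next step is to pick out the ones that scored poorly or have no feedback yet. A minimal sketch, using plain dictionaries as stand-ins for three rows of the table above. The `0.5` threshold and the `split_by_threshold` helper are assumptions made for illustration:

```python
# Three rows from the table above, reduced to the fields needed here.
records = [
    {"record_id": "record_hash_1c188102dd1b2349cdb67882fe3d8d8a", "language_match": 0.007279},
    {"record_id": "record_hash_a8897f746a480e35f8ce33bd11821cf8", "language_match": 0.997064},
    # This row has an empty language_match column in the table (no feedback yet).
    {"record_id": "record_hash_a2919b3a75a800dcca0d166c9af7d2ed", "language_match": None},
]


def split_by_threshold(records, column, threshold=0.5):
    """Return (low_scoring_ids, pending_ids) for the given feedback column."""
    low, pending = [], []
    for record in records:
        value = record[column]
        if value is None:
            pending.append(record["record_id"])
        elif value < threshold:
            low.append(record["record_id"])
    return low, pending


low, pending = split_by_threshold(records, "language_match")
print("low scores:", low)
print("awaiting feedback:", pending)
```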


# Logging

## Automatic Logging

The simplest way to log with TruLens is to wrap your chain with `TruChain` and pass the `tru` argument, as shown in the quickstart.

This is done like so:

In [12]:
truchain = TruChain(
    chain,
    chain_id='Chain1_ChatApplication',
    tru=tru
)
truchain("This will be automatically logged.")

✅ chain Chain1_ChatApplication -> default.sqlite

> Entering new LLMChain chain...
⚡ feedback feedback_result_hash_bc6fdabde84b50adee5539f41e4b5cda on record_hash_a2919b3a75a800dcca0d166c9af7d2ed -> default.sqlite

> Finished chain.


{'prompt': 'This will be automatically logged.',
 'text': '\n\nThis means the data or information that you enter into the application will be securely stored and tracked. This includes any changes or updates that are made to the data or information. Logging this information helps with tracking and improving user experience, since you can analyze the information to identify potential areas for improvement. It can also help you easily access specific information and keep your system secure.'}

Feedback functions can also be logged automatically by passing them as a list to the `feedbacks` argument.

In [13]:
truchain = TruChain(
    chain,
    chain_id='Chain1_ChatApplication',
    feedbacks=[f_lang_match], # feedback functions
    tru=tru
)
truchain("This will be automatically logged.")

✅ record record_hash_72334ec4f5e07d9ae580f6435d6832c1 from Chain1_ChatApplication -> default.sqlite
✅ chain Chain1_ChatApplication -> default.sqlite
✅ feedback def. feedback_definition_hash_ef50204d7ba1af7567641ca2fffb444c -> default.sqlite

> Entering new LLMChain chain...

> Finished chain.


{'prompt': 'This will be automatically logged.',
 'text': '\n\nSure thing! All your interactions with our chatbot will be automatically logged and stored securely. This helps us improve our service and provide you with the most up to date and accurate information.'}

✅ record record_hash_cc2b453d49b048c5d224984627207dba from Chain1_ChatApplication -> default.sqlite

## Manual Logging

### Wrap with TruChain to instrument your chain

In [14]:
tc = TruChain(chain, chain_id='Chain1_ChatApplication')

`feedback_mode` is FeedbackMode.WITH_CHAIN_THREAD but `tru` was not specified. Reverting to FeedbackMode.NONE .





### Set up logging and instrumentation

Making the first call to your wrapped LLM Application will now also produce a log or "record" of the chain execution.


In [15]:
prompt_input = 'que hora es?'
gpt3_response, record = tc.call_with_record(prompt_input)

> Entering new LLMChain chain...
⚡ feedback feedback_result_hash_cf92f0820b94232543823684516f0dae on record_hash_cc2b453d49b048c5d224984627207dba -> default.sqlite

> Finished chain.


We can log the records, but first we need to log the chain itself.

In [16]:
tru.add_chain(chain=truchain)

✅ chain Chain1_ChatApplication -> default.sqlite


Then we can log the record:

In [17]:
tru.add_record(record)

✅ record record_hash_f884d5a92e02dc17fb432696199662b2 from Chain1_ChatApplication -> default.sqlite


'record_hash_f884d5a92e02dc17fb432696199662b2'

### Evaluate Quality

Following the request to your app, you can then evaluate LLM quality using feedback functions. This is completed in a sequential call to minimize latency for your application, and evaluations will also be logged to your local machine.

To get feedback on the quality of your LLM, you can use any of the provided feedback functions or add your own.

To assess your LLM quality, pass the feedback functions as a list to the `feedback_functions` argument of `tru.run_feedback_functions()`.


In [18]:
feedback_results = tru.run_feedback_functions(
    record=record,
    feedback_functions=[f_lang_match]
)
display(feedback_results)

[FeedbackResult(feedback_result_id='feedback_result_hash_1fc6f5b6e992197dd723e5edf8e1f3a7', record_id='record_hash_f884d5a92e02dc17fb432696199662b2', chain_id='Chain1_ChatApplication', feedback_definition_id='feedback_definition_hash_ef50204d7ba1af7567641ca2fffb444c', last_ts=datetime.datetime(2023, 6, 8, 14, 10, 39, 728403), status=<FeedbackResultStatus.DONE: 'done'>, cost=Cost(n_tokens=0, cost=0.0), tags='', name='language_match', calls=[FeedbackCall(args={'text1': 'que hora es?', 'text2': '\n\nLa hora actual es la hora del reloj en la ubicación en la que te encuentras. Si deseas saber la hora exacta en la zona horaria actual, también puedes consultar la hora mundial en línea.'}, ret=0.9977105877696886)], result=0.9977105877696886, error=None)]

After capturing feedback, you can then log it to your local database.

In [19]:
f_lang_match

Feedback(implementation=Method(obj=Obj(cls=Class(name='Huggingface', module=Module(package_name='trulens_eval', module_name='trulens_eval.tru_feedback')), id=4930116176, init_kwargs={}), name='language_match'), aggregator=Function(module=Module(package_name='numpy', module_name='numpy'), cls=None, name='mean'), feedback_definition_id='feedback_definition_hash_ef50204d7ba1af7567641ca2fffb444c', selectors={'text1': JSONPath().__record__.main_input, 'text2': JSONPath().__record__.main_output}, imp=<bound method Huggingface.language_match of Huggingface(endpoint=<trulens_eval.provider_apis.Endpoint object at 0x125db9c50>)>, agg=<function mean at 0x110d7f100>)

In [20]:
feedback_results

[FeedbackResult(feedback_result_id='feedback_result_hash_1fc6f5b6e992197dd723e5edf8e1f3a7', record_id='record_hash_f884d5a92e02dc17fb432696199662b2', chain_id='Chain1_ChatApplication', feedback_definition_id='feedback_definition_hash_ef50204d7ba1af7567641ca2fffb444c', last_ts=datetime.datetime(2023, 6, 8, 14, 10, 39, 728403), status=<FeedbackResultStatus.DONE: 'done'>, cost=Cost(n_tokens=0, cost=0.0), tags='', name='language_match', calls=[FeedbackCall(args={'text1': 'que hora es?', 'text2': '\n\nLa hora actual es la hora del reloj en la ubicación en la que te encuentras. Si deseas saber la hora exacta en la zona horaria actual, también puedes consultar la hora mundial en línea.'}, ret=0.9977105877696886)], result=0.9977105877696886, error=None)]

In [21]:
tru.add_feedbacks(feedback_results)

⚡ feedback feedback_result_hash_1fc6f5b6e992197dd723e5edf8e1f3a7 on record_hash_f884d5a92e02dc17fb432696199662b2 -> default.sqlite


### Out-of-band Feedback evaluation

In the above example, the feedback function evaluation is done in the same process as the chain evaluation. The alternative approach is to use the provided persistent evaluator, started via `tru.start_evaluator`. Then set the `feedback_mode` for `TruChain` to `deferred` to let the evaluator handle the feedback functions.

For demonstration purposes, we start the evaluator here but it can be started in another process.

In [22]:
truchain: TruChain = TruChain(
    chain,
    chain_id='Chain1_ChatApplication',
    feedbacks=[f_lang_match],
    tru=tru,
    feedback_mode="deferred"
)

tru.start_evaluator()
truchain("This will be logged by deferred evaluator.")
tru.stop_evaluator()

✅ chain Chain1_ChatApplication -> default.sqlite
✅ feedback def. feedback_definition_hash_ef50204d7ba1af7567641ca2fffb444c -> default.sqlite
Looking for things to do. Stop me with `tru.stop_evaluator()`.

> Entering new LLMChain chain...
Starting run for row 15.
⚡ feedback feedback_result_hash_f4208386b9fa850d92c39e345800cccc on record_hash_d01c3cba3aba060589ef45a21edb29a9 -> default.sqlite
⚡ feedback feedback_result_hash_f4208386b9fa850d92c39e345800cccc on record_hash_d01c3cba3aba060589ef45a21edb29a9 -> default.sqlite

> Finished chain.
Evaluator stopped.
✅ record record_hash_96fec64040783dcbba8e79fe0e11fddc from Chain1_ChatApplication -> default.sqlite
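
Conceptually, deferred evaluation separates the application, which only enqueues records, from a worker that scores them later. The following is a simplified sketch of that pattern using only the standard library. It is not how TruLens implements its evaluator, and `run_deferred` and the toy feedback function are hypothetical:

```python
import queue
import threading


def run_deferred(records, feedback_fn):
    """Score each (record_id, text) pair in a background worker thread."""
    pending = queue.Queue()
    results = {}

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more work
                break
            record_id, text = item
            results[record_id] = feedback_fn(text)

    evaluator = threading.Thread(target=worker)
    evaluator.start()            # comparable to starting the evaluator
    for record in records:
        pending.put(record)      # the app enqueues work and moves on
    pending.put(None)
    evaluator.join()             # comparable to stopping the evaluator
    return results


# Toy feedback: 1.0 if the text is non-empty, else 0.0.
scores = run_deferred(
    [("r1", "hello"), ("r2", "")],
    lambda text: 1.0 if text else 0.0,
)
print(scores)
```

The benefit of this split is that slow feedback calls never add latency to the request that produced the record.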


# Out-of-the-box Feedback Functions
See: <https://www.trulens.org/trulens_eval/api/tru_feedback/>

## Relevance

This evaluates the *relevance* of the LLM response to the given text by LLM prompting.

Relevance is currently only available with OpenAI ChatCompletion API.

## Sentiment

This evaluates the *positive sentiment* of either the prompt or response.

Sentiment is currently available to use with OpenAI, HuggingFace or Cohere as the model provider.

* The OpenAI sentiment feedback function prompts a Chat Completion model to rate the sentiment from 1 to 10, and then scales the response down to 0-1.
* The HuggingFace sentiment feedback function returns a raw score from 0 to 1.
* The Cohere sentiment feedback function uses the classification endpoint and a small set of examples stored in `feedback_prompts.py` to return either a 0 or a 1.
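
To make the OpenAI variant concrete, here is a rough sketch of turning a model's 1 to 10 rating into a score on the 0 to 1 scale. Dividing by 10 is an assumption made for illustration, since the text above does not state the exact formula, and `scale_rating` is a hypothetical helper rather than a TruLens function:

```python
import re


def scale_rating(model_reply: str) -> float:
    """Extract the first integer from 1 to 10 in a reply and map it onto 0-1."""
    match = re.search(r"\b(10|[1-9])\b", model_reply)
    if match is None:
        raise ValueError(f"no rating found in {model_reply!r}")
    # Assumed scaling: a rating of 10 maps to 1.0 and a rating of 1 maps to 0.1.
    return int(match.group(1)) / 10


print(scale_rating("Rating: 8"))
```

Parsing defensively matters here because a chat model may wrap the number in extra words.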

## Model Agreement

Model agreement uses OpenAI to attempt an honest answer at your prompt with system prompts for correctness, and then evaluates the agreement of your LLM response to this model on a scale from 1 to 10. The agreement with each honest bot is then averaged and scaled from 0 to 1.

## Language Match

This evaluates if the language of the prompt and response match.

Language match is currently only available to use with HuggingFace as the model provider. This feedback function returns a score in the range from 0 to 1, where 1 indicates match and 0 indicates mismatch.

## Toxicity

This evaluates the toxicity of the prompt or response.

Toxicity is currently only available to be used with HuggingFace, and uses a classification endpoint to return a score from 0 to 1. The feedback function is negated as not_toxicity, and returns a 1 if not toxic and a 0 if toxic.

## Moderation

The OpenAI Moderation API is made available for use as feedback functions. The categories are hate, hate/threatening, self-harm, sexual, sexual/minors, violence, and violence/graphic. Each is negated (e.g., `not_hate`) so that a 0 indicates that the moderation rule is violated. These feedback functions return a score in the range 0 to 1.
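
Both the toxicity and moderation functions are negated so that higher is always better. A minimal sketch of that convention, assuming the underlying classifier returns a score between 0 and 1. `negate` is a hypothetical helper, not a TruLens function:

```python
def negate(score: float) -> float:
    """Flip a 0-1 score so that 1 means the undesirable property is absent."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    return 1.0 - score


# A toxicity score of 0.25 becomes a not_toxicity score of 0.75.
print(negate(0.25))
```

Keeping every feedback function on the same "1 is best" scale is what lets the leaderboard average them meaningfully.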

# Adding new feedback functions

Feedback functions are an extensible framework for evaluating LLMs. You can add your own feedback functions to evaluate the qualities required by your application by updating `trulens_eval/tru_feedback.py`. If your contributions would be useful for others, we encourage you to contribute to TruLens!

Feedback functions are organized by model provider into Provider classes.

The process for adding new feedback functions is:
1. Create a new Provider class or locate an existing one that applies to your feedback function. If your feedback function does not rely on a model provider, you can create a standalone class:

In [23]:
from trulens_eval import Provider

class StandAlone(Provider):
    pass

⚡ feedback feedback_result_hash_d92a5b53e3a79222f35777b3dc2385ea on record_hash_96fec64040783dcbba8e79fe0e11fddc -> default.sqlite


2. Add a new feedback function method to your selected class. Your new method can either take a single text (str) as a parameter or both prompt (str) and response (str). It should return a float between 0 (worst) and 1 (best).

In [24]:
class StandAlone(Provider):
    def feedback(self, text: str) -> float:
        """
        Describe how the model works.

        Parameters:
            text (str): Text to evaluate.
            Can also be prompt (str) and response (str).

        Returns:
            float: A value between 0 (worst) and 1 (best).
        """
        score = 0.0  # placeholder: replace with your own scoring logic
        return score
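
As an illustration of a feedback function that needs no model provider, the sketch below scores how concise a response is. It uses a plain class so that it runs without TruLens installed. In practice the method would live on a `Provider` subclass as described above. The class name, method name, and 50-word target are all hypothetical:

```python
class StandAloneSketch:
    """Stand-in for a Provider subclass, written in plain Python."""

    def conciseness(self, text: str, target_words: int = 50) -> float:
        """Return 1.0 for text at or under target_words, decaying toward 0 beyond it."""
        n_words = len(text.split())
        if n_words <= target_words:
            return 1.0
        return target_words / n_words


provider = StandAloneSketch()
print(provider.conciseness("La hora actual es 12:35."))
```

The method follows the contract from step 2: it takes a single text argument and returns a float between 0 (worst) and 1 (best).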

In [26]:
gpt3_response, record = tc.call_with_record(prompt_input)

> Entering new LLMChain chain...

> Finished chain.


In [27]:
record

Record(record_id='record_hash_6f1f4bfc9032e183b94a35dac69149f6', chain_id='Chain1_ChatApplication', cost=Cost(n_tokens=62, cost=0.00124), ts=datetime.datetime(2023, 6, 8, 14, 10, 48, 530952), tags='', main_input='que hora es?', main_output='\n\nLa hora actual es [ time ]. Esto se determina por la rotación de la Tierra alrededor del Sol y se mide en Zona Horaria Universal (UTC).', main_error='None', calls=[RecordChainCall(chain_stack=(RecordChainCallMethod(path=JSONPath().chain, method=MethodIdent(module_name='langchain.chains.llm', class_name='LLMChain', method_name='_call')),), args={'inputs': {'prompt': 'que hora es?'}}, rets={'text': '\n\nLa hora actual es [ time ]. Esto se determina por la rotación de la Tierra alrededor del Sol y se mide en Zona Horaria Universal (UTC).'}, error=None, start_time=datetime.datetime(2023, 6, 8, 14, 10, 45, 329934), end_time=datetime.datetime(2023, 6, 8, 14, 10, 48, 530740), pid=44277, tid=39466436)])

In [28]:
f_lang_match = Feedback(hugs.language_match).on(
    text1=Query.RecordInput, text2=Query.RecordOutput
)

In [29]:
feedback_results = tru.run_feedback_functions(
    record=record,
    feedback_functions=[f_lang_match]
)

In [30]:
feedback_results

[FeedbackResult(feedback_result_id='feedback_result_hash_13b60d050cd1e0912dddb0851ca60915', record_id='record_hash_6f1f4bfc9032e183b94a35dac69149f6', chain_id='Chain1_ChatApplication', feedback_definition_id='feedback_definition_hash_ef50204d7ba1af7567641ca2fffb444c', last_ts=datetime.datetime(2023, 6, 8, 14, 10, 48, 653412), status=<FeedbackResultStatus.DONE: 'done'>, cost=Cost(n_tokens=0, cost=0.0), tags='', name='language_match', calls=[FeedbackCall(args={'text1': 'que hora es?', 'text2': '\n\nLa hora actual es [ time ]. Esto se determina por la rotación de la Tierra alrededor del Sol y se mide en Zona Horaria Universal (UTC).'}, ret=0.9979190410085721)], result=0.9979190410085721, error=None)]

In [31]:
tru.add_feedbacks(feedback_results)

⚡ feedback feedback_result_hash_13b60d050cd1e0912dddb0851ca60915 on record_hash_6f1f4bfc9032e183b94a35dac69149f6 -> default.sqlite


# Start Working Flow

In [59]:
z = truchain("This will be automatically logged.")

> Entering new LLMChain chain...

> Finished chain.
✅ record record_hash_391870bd47cf45dd9d62a058a27e7c83 from Chain1_ChatApplication -> default.sqlite
⚡ feedback feedback_result_hash_f6374fbb47e521daebcccad1264e1eeb on record_hash_391870bd47cf45dd9d62a058a27e7c83 -> default.sqlite


In [60]:
z

{'prompt': 'This will be automatically logged.',
 'text': '\n\nThis response is referring to data which is being kept and tracked automatically. This type of data logging is commonly used in software programs or applications in which the data collected is used to track and analyze user activity, trends, or other factors. By logging certain data points, businesses and organizations can have a better understanding of how their users interact with their systems.'}

In [58]:
record.record_id
tc.chain_id

'Chain1_ChatApplication'

In [62]:
from trulens_eval.schema import FeedbackResult
# Simple Feedback add
tru.db.insert_feedback(FeedbackResult(name="thumbs up result", 
                                      record_id="record_hash_391870bd47cf45dd9d62a058a27e7c83",
                                      chain_id="Chain1_ChatApplication", 
                                      result=False))

⚡ feedback feedback_result_hash_31e6dc231f64b9274854ceaa0a1f5801 on record_hash_391870bd47cf45dd9d62a058a27e7c83 -> default.sqlite


In [54]:
tru.run_dashboard()

Starting dashboard ...


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>


  You can now view your Streamlit app in your browser.

  Network URL: http://192.168.0.168:8564
  External URL: http://96.60.0.140:8564

  For better performance, install the Watchdog module:

  $ xcode-select --install
  $ pip install watchdog
            


openai api: 0requests [00:00, ?requests/s]
huggingface api: 0requests [00:00, ?requests/s]
cohere api: 0requests [00:00, ?requests/s]
2023-06-08 16:59:36.389 Serialization of dataframe to Arrow table was unsuccessful due to: ("Could not convert <FeedbackResultStatus.NONE: 'none'> with type FeedbackResultStatus: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column status with type object'). Applying automatic fixes for column types to make the dataframe Arrow-compatible.


In [47]:
tru.db.insert_chain_id("myapp")
#tru._insert_or_replace_vals(table=self.TABLE_CHAINS, vals=("myapp",""))

✅ chain myapp -> default.sqlite


'myapp'

In [32]:
# What is a record: Needed because it is something that can be serialized by us
record

Record(record_id='record_hash_6f1f4bfc9032e183b94a35dac69149f6', chain_id='Chain1_ChatApplication', cost=Cost(n_tokens=62, cost=0.00124), ts=datetime.datetime(2023, 6, 8, 14, 10, 48, 530952), tags='', main_input='que hora es?', main_output='\n\nLa hora actual es [ time ]. Esto se determina por la rotación de la Tierra alrededor del Sol y se mide en Zona Horaria Universal (UTC).', main_error='None', calls=[RecordChainCall(chain_stack=(RecordChainCallMethod(path=JSONPath().chain, method=MethodIdent(module_name='langchain.chains.llm', class_name='LLMChain', method_name='_call')),), args={'inputs': {'prompt': 'que hora es?'}}, rets={'text': '\n\nLa hora actual es [ time ]. Esto se determina por la rotación de la Tierra alrededor del Sol y se mide en Zona Horaria Universal (UTC).'}, error=None, start_time=datetime.datetime(2023, 6, 8, 14, 10, 45, 329934), end_time=datetime.datetime(2023, 6, 8, 14, 10, 48, 530740), pid=44277, tid=39466436)])

In [36]:
from trulens_eval import Record
# Need to instantiate my record
# Need to explain that Records are serialized class dictionaries
# Need to explain that main_input and main_output are the first-class items

# A record_id should be assigned
my_record = Record(record_id="1234", 
                   chain_id="myapp", 
                   main_input="This is an input query?", 
                   main_output="This is the LLM App output response", 
                   my_thumbs_field=True)
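
The notes above describe records as serializable, dictionary-like classes. The sketch below illustrates that idea with a standard-library dataclass. `SimpleRecord` is a hypothetical stand-in and does not reproduce the fields or behavior of the TruLens `Record` class:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SimpleRecord:
    """Hypothetical stand-in holding the four fields used in the cell above."""
    record_id: str
    chain_id: str
    main_input: str
    main_output: str


original = SimpleRecord(
    record_id="1234",
    chain_id="myapp",
    main_input="This is an input query?",
    main_output="This is the LLM App output response",
)

# Round-trip through JSON: class -> dict -> string -> dict -> class.
serialized = json.dumps(asdict(original))
restored = SimpleRecord(**json.loads(serialized))
print(restored == original)
```

Being able to round-trip a record like this is what makes it possible to store it in a database and evaluate it later.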

In [37]:
from trulens_eval import Provider

class StandAlone(Provider):
    def my_human_feedback(self, text: str, thumbs: bool) -> float:
        """
        Convert a human thumbs-up/down rating into a feedback score.

        Parameters:
            text (str): Text that was rated. Not used in the score itself.
            thumbs (bool): True for thumbs up, False for thumbs down.

        Returns:
            float: A value between 0 (worst) and 1 (best).
        """
        return float(thumbs)

In [38]:
class MyHugs(Huggingface):
    # Browse available models at https://huggingface.co/models
    # This example uses https://huggingface.co/ProsusAI/finbert
    # The model card shows the hosted inference API for the model.

    def my_hugs_feedback(self, text1: str, text2: str) -> float:
        # text2 is accepted so the selectors can pass it, but only text1 is scored.
        HUGS_SENTIMENT_API_URL = "https://api-inference.huggingface.co/models/ProsusAI/finbert"
        max_length = 500  # assumed truncation limit for text sent to the API

        payload = {"inputs": text1[:max_length]}
        hf_response = self.endpoint.post(
            url=HUGS_SENTIMENT_API_URL, payload=payload, timeout=30
        )
        scores = {r['label']: r['score'] for r in hf_response}

        # Assumes the model reports a 'positive' label; returns its score in [0, 1].
        return scores['positive']

In [39]:
my_hugs = MyHugs()
my_standalone = StandAlone()

In [40]:
my_feedback_function_hugs = Feedback(my_hugs.my_hugs_feedback).on(
    text1=Query.RecordInput, text2=Query.RecordOutput
)

In [41]:
my_feedback_function_standalone = Feedback(my_standalone.my_human_feedback).on(
    text=Query.RecordOutput, thumbs=Query.Record.my_thumbs_field
)

In [48]:
feedback_results = tru.run_feedback_functions(
    record=my_record,
    feedback_functions=[my_feedback_function_standalone, my_feedback_function_hugs]
)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

In [None]:
# What is a feedback_results: Needed because it is something that can be serialized by us


In [None]:
feedback_results

In [None]:
tru.add_feedbacks(feedback_results)

# Start Ideal Flow

In [None]:
from trulens_eval import Provider

class StandAlone(Provider):
    pass

In [None]:
class StandAlone(Provider):
    def feedback(self, text: str) -> float:
        """
        Describe how the model works.

        Parameters:
            text (str): Text to evaluate.
            Can also be prompt (str) and response (str).

        Returns:
            float: A value between 0 (worst) and 1 (best).
        """
        score = 0.0  # placeholder: replace with your own scoring logic
        return score

In [None]:
my_feedback_function_standalone = Feedback(standalone.my_human_feedback).on(
    text1=Query.RecordInput, text2=Query.RecordOutput
)

In [None]:
my_feedback_function_hugs = Feedback(hugs.my_hugs_feedback).on(
    text1=Query.RecordInput, text2=Query.RecordOutput
)

In [None]:
feedback_results = tru.run_feedback_functions(
    record=record,
    feedback_functions=[my_feedback_function_standalone, my_feedback_function_hugs]
)

In [None]:
tru.add_feedbacks(feedback_results)

In [None]:
# See dashboard