
[FEAT] Running on existing data #1215

Open
funkjo opened this issue Jun 14, 2024 · 7 comments
Assignees
Labels
enhancement New feature or request question Further information is requested

Comments


funkjo commented Jun 14, 2024

Feature Description
An easier way to obtain feedback results when running feedbacks through the virtual recorder and virtual app.

Reason
I am following the documentation here, and after running

for record in data:
    virtual_recorder.add_record(record)

I am unable to figure out how to obtain the results of the question-relevance feedback function.

Importance of Feature
The value this unlocks is the ability to easily run feedback functions on large amounts of data after it has been logged, rather than during the runtime of the LLM application.
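As context for what a batch-style workflow over already-logged data could look like, here is a rough, library-agnostic sketch. The `score_relevance` function is a hypothetical placeholder for a real feedback function, not a trulens_eval API:

```python
# Hypothetical sketch: apply a feedback-style scorer to logged rows.
# `score_relevance` is a placeholder heuristic, not part of trulens_eval.
records = [
    {"prompt": "Where is Germany?", "response": "Germany is in Europe"},
    {"prompt": "What is the capital of France?", "response": "The capital of France is Paris"},
]

def score_relevance(prompt: str, response: str) -> float:
    # Placeholder heuristic: fraction of prompt words echoed in the response.
    prompt_words = set(prompt.lower().rstrip("?").split())
    response_words = set(response.lower().split())
    return len(prompt_words & response_words) / len(prompt_words)

# Batch evaluation over the logged records, independent of app runtime.
results = [score_relevance(r["prompt"], r["response"]) for r in records]
```

In a real setup the scorer would be an LLM-backed feedback function, but the shape of the loop is the same: iterate over stored records and attach a score to each.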

@funkjo funkjo added the enhancement New feature or request label Jun 14, 2024
@sfc-gh-jreini
Contributor

Hey @funkjo - you can wait for computation and then display feedback results using the following:

for feedback, feedback_result in rec.wait_for_feedback_results().items():
    print(feedback.name, feedback_result.result)

Does that help?

@sfc-gh-jreini sfc-gh-jreini self-assigned this Jun 14, 2024
@sfc-gh-jreini sfc-gh-jreini added the question Further information is requested label Jun 14, 2024

funkjo commented Jun 18, 2024

@sfc-gh-jreini thanks for the comment. I'm not seeing any results when trying that out; the code runs but prints nothing.

Here is the full code FYI:

import pandas as pd

data = {
    'prompt': ['Where is Germany?', 'What is the capital of France?'],
    'response': ['Germany is in Europe', 'The capital of France is Paris'],
    'context': ['Germany is a country located in Europe.', 'France is a country in Europe and its capital is Paris.']
}
df = pd.DataFrame(data)
df.head()

from trulens_eval import Select
from trulens_eval.tru_virtual import VirtualApp
from trulens_eval.tru_virtual import VirtualRecord

virtual_app = dict(
    llm=dict(
        modelname="some llm component model name"
    ),
    template="information about the template I used in my app",
    debug="all of these fields are completely optional"
)

virtual_app = VirtualApp(virtual_app) # can start with the prior dictionary
virtual_app[Select.RecordCalls.llm.maxtokens] = 1024

retriever_component = Select.RecordCalls.retriever
virtual_app[retriever_component] = "this is the retriever component"

context_call = retriever_component.get_context

data_dict = df.to_dict('records')

data = []

for record in data_dict:
    rec = VirtualRecord(
        main_input=record['prompt'],
        main_output=record['response'],
        calls={
            context_call: dict(
                args=[record['prompt']],
                rets=[record['context']]
            )
        }
    )
    data.append(rec)


from trulens_eval.feedback.provider import OpenAI
from trulens_eval.feedback.feedback import Feedback
from trulens_eval.feedback.provider.openai import AzureOpenAI as fAzureOpenAI
from azure.identity import EnvironmentCredential
from dotenv import load_dotenv
load_dotenv()
import os


# Get token so you don't need API key anymore in env
def getToken():
    """ gets the AzureAD token from AzureAD """
    credential = EnvironmentCredential()
    token = credential.get_token("get token url")
    print("token acquired ")
    return token

fopenai = fAzureOpenAI(
                        api_key=getToken().token,
                        azure_deployment=os.environ["CHATGPT_DEPLOYMENT"],
                        azure_endpoint=os.environ["OPENAI_URL"],
                        deployment_name=os.environ["CHATGPT_MODEL"],
                        api_version=os.environ["OPENAI_API_VERSION"]
)

# Select context to be used in feedback. We select the return values of the
# virtual `get_context` call in the virtual `retriever` component. Names are
# arbitrary except for `rets`.
context = context_call.rets[:]

# Question/statement relevance between question and each context chunk.
f_context_relevance = (
    Feedback(fopenai.qs_relevance)
    .on_input()
    .on(context)
)

from trulens_eval.tru_virtual import TruVirtual

virtual_recorder = TruVirtual(
    app_id="a virtual app",
    app=virtual_app,
    feedbacks=[f_context_relevance]
)

temp = []

for record in data:
    result = virtual_recorder.add_record(record=record, feedback_mode=f_context_relevance)
    temp.append(result)

for feedback, feedback_result in rec.wait_for_feedback_results().items():
    print(feedback.name, feedback_result.result)
    print('hello')

@funkjo

funkjo commented Jun 18, 2024

@sfc-gh-jreini when I print the output of add_record that I saved in a temp list, I see the following:

[VirtualRecord(record_id='record_hash_77668b896f8e4bbb87eb94da85fa8a2e', app_id='a virtual app', cost=Cost(n_requests=0, n_successful_requests=0, n_classes=0, n_tokens=0, n_stream_chunks=0, n_prompt_tokens=0, n_completion_tokens=0, cost=0.0), perf=Perf(start_time=datetime.datetime(2024, 6, 18, 13, 4, 55, 202525), end_time=datetime.datetime(2024, 6, 18, 13, 4, 55, 202526)), ts=datetime.datetime(2024, 6, 18, 13, 4, 55, 202525), tags='', meta=None, main_input='Where is Germany?', main_output='Germany is in Europe', main_error=None, calls=[RecordAppCall(call_id='85f88090-6962-4db7-9bb1-b97bc3c51ecd', stack=[RecordAppCallMethod(path=Lens(), method=Method(obj=Obj(cls=trulens_eval.tru_virtual.VirtualApp, id=0, init_bindings=None), name='root')), RecordAppCallMethod(path=Lens().app.retriever, method=Method(obj=Obj(cls=trulens_eval.tru_virtual.VirtualApp, id=0, init_bindings=None), name='get_context'))], args=['Where is Germany?'], rets=['Germany is a country located in Europe.'], error=None, perf=Perf(start_time=datetime.datetime(2024, 6, 18, 13, 4, 55, 202525), end_time=datetime.datetime(2024, 6, 18, 13, 4, 55, 202526)), pid=0, tid=0), RecordAppCall(call_id='b123973a-4b42-4c96-85c4-65b01eeefec3', stack=[RecordAppCallMethod(path=Lens(), method=Method(obj=Obj(cls=trulens_eval.tru_virtual.VirtualApp, id=0, init_bindings=None), name='root'))], args=['Where is Germany?'], rets=['Germany is in Europe'], error=None, perf=Perf(start_time=datetime.datetime(2024, 6, 18, 13, 4, 55, 202525), end_time=datetime.datetime(2024, 6, 18, 13, 4, 55, 202526)), pid=0, tid=0)], feedback_and_future_results=None, feedback_results=None),
 VirtualRecord(record_id='record_hash_3a6ffcbb0f012c7572f6d463f409a9ca', app_id='a virtual app', cost=Cost(n_requests=0, n_successful_requests=0, n_classes=0, n_tokens=0, n_stream_chunks=0, n_prompt_tokens=0, n_completion_tokens=0, cost=0.0), perf=Perf(start_time=datetime.datetime(2024, 6, 18, 13, 4, 55, 208946), end_time=datetime.datetime(2024, 6, 18, 13, 4, 55, 208947)), ts=datetime.datetime(2024, 6, 18, 13, 4, 55, 208946), tags='', meta=None, main_input='What is the capital of France?', main_output='The capital of France is Paris', main_error=None, calls=[RecordAppCall(call_id='66de71b8-2931-4f3f-8d07-808b9b81263d', stack=[RecordAppCallMethod(path=Lens(), method=Method(obj=Obj(cls=trulens_eval.tru_virtual.VirtualApp, id=0, init_bindings=None), name='root')), RecordAppCallMethod(path=Lens().app.retriever, method=Method(obj=Obj(cls=trulens_eval.tru_virtual.VirtualApp, id=0, init_bindings=None), name='get_context'))], args=['What is the capital of France?'], rets=['France is a country in Europe and its capital is Paris.'], error=None, perf=Perf(start_time=datetime.datetime(2024, 6, 18, 13, 4, 55, 208946), end_time=datetime.datetime(2024, 6, 18, 13, 4, 55, 208947)), pid=0, tid=0), RecordAppCall(call_id='6db08e44-694e-4fcd-b4a6-49c15bf1b898', stack=[RecordAppCallMethod(path=Lens(), method=Method(obj=Obj(cls=trulens_eval.tru_virtual.VirtualApp, id=0, init_bindings=None), name='root'))], args=['What is the capital of France?'], rets=['The capital of France is Paris'], error=None, perf=Perf(start_time=datetime.datetime(2024, 6, 18, 13, 4, 55, 208946), end_time=datetime.datetime(2024, 6, 18, 13, 4, 55, 208947)), pid=0, tid=0)], feedback_and_future_results=None, feedback_results=None)]

at the very end, feedback_results=None

@sfc-gh-jreini

sfc-gh-jreini commented Jun 19, 2024

I think the issue here might be your setting of feedback mode. The feedback mode should be set when you set up the recorder, and f_context_relevance is not a valid value for this argument:

for record in data:
    result = virtual_recorder.add_record(record=record, feedback_mode=f_context_relevance)
    temp.append(result)

To run feedbacks immediately when the record is added, this argument can be left out altogether.

for record in data:
    result = virtual_recorder.add_record(record=record)
    temp.append(result)

To run feedbacks in deferred mode, you can add it to the recorder setup.

virtual_recorder = TruVirtual(
    app_id="a virtual app",
    app=virtual_app,
    feedbacks=[f_context_relevance, f_groundedness, f_qa_relevance],
    feedback_mode="deferred"  # optional
)

for record in data:
    result = virtual_recorder.add_record(record=record)
    temp.append(result)
    
tru.start_evaluator()

Read more about feedback mode
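For what it's worth, the deferred pattern described above resembles a plain futures workflow, which may help explain why feedback_results reads as None on a freshly added record. A minimal stdlib sketch (this is an analogy, not trulens_eval code):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_feedback(prompt: str) -> float:
    # Stand-in for a feedback function that takes time to evaluate.
    time.sleep(0.1)
    return 1.0

executor = ThreadPoolExecutor(max_workers=2)
future = executor.submit(slow_feedback, "Where is Germany?")

# Right after submission the result may not be available yet,
# analogous to feedback_results=None on a freshly added record.
still_running = not future.done()

# Blocking on the future is the analogue of wait_for_feedback_results():
# it waits for the computation to finish and returns the value.
result = future.result()
executor.shutdown()
```

The FAILED status seen later in this thread is different from this "not ready yet" state: a failed future has completed, but with an error rather than a result.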

@funkjo

funkjo commented Jun 20, 2024

I made the suggested change and removed the feedback_mode parameter from add_record:

for feedback, feedback_result in rec.wait_for_feedback_results().items():
    print(feedback.name, feedback_result.result)
    print('hello')

output:
qs_relevance None
hello

rec.wait_for_feedback_results().items() has the following output:

dict_items([(FeedbackDefinition(qs_relevance,
	selectors={'question': Lens().__record__.main_input, 'context': Lens().__record__.app.retriever.get_context.rets[:]},
	if_exists=None
), qs_relevance (FeedbackResultStatus.FAILED) = None
)])

@sfc-gh-jreini
Contributor

Thanks @funkjo - was able to replicate this. Will work on a solution this week.

@funkjo

funkjo commented Jun 28, 2024

Thanks @sfc-gh-jreini, any update on this?
