# Notebook Companion: Iterating on NeMo Guardrails apps with TruLens

This notebook demonstrates how to instrument _NeMo Guardrails_ apps to monitor
their invocations and run feedback functions on their final or intermediate
results. The reverse integration, of using trulens within rails apps, is shown
in the other notebook in this folder.

In [1]:
# Install NeMo Guardrails and trulens_eval if not already installed.
#! pip install nemoguardrails trulens_eval

### Setup keys and trulens_eval

In [2]:
# This notebook uses openai and huggingface providers which need some keys set.
# You can set them here:

from trulens_eval.keys import check_or_set_keys
check_or_set_keys(
    OPENAI_API_KEY="sk-...",
    HUGGINGFACE_API_KEY="hf_..."
)

# Load trulens, reset the database:
from trulens_eval import Tru
tru = Tru()
tru.reset_database()

No .env found in /Users/jreini/Desktop/development/trulens/trulens_eval/examples/expositional/frameworks/nemoguardrails or its parents. You may need to specify secret keys in another manner.


âœ… Key OPENAI_API_KEY set from explicit value to `check_or_set_keys`.
âœ… Key HUGGINGFACE_API_KEY set from explicit value to `check_or_set_keys`.
ðŸ¦‘ Tru initialized with db url sqlite:///default.sqlite .
ðŸ›‘ Secret keys may be written to the database. See the `database_redact_keys` option of Tru` to prevent this.


## Rails app setup

The files created below define a configuration of a rails app adapted from
various examples in the NeMo-Guardrails repository. There is nothing unusual
about the app beyond the knowledge base here being the trulens_eval
documentation. This means you should be able to ask the resulting bot questions
regarding trulens instead of the fictional company handbook as was the case in
the originating example.

In [3]:
%%writefile config.yaml
# Adapted from NeMo-Guardrails/nemoguardrails/examples/bots/abc/config.yml
instructions:
  - type: general
    content: |
      Below is a conversation between a user and a bot called the trulens nemo guardrails Bot.
      The bot is designed to answer questions about the trulens_eval and nemo guardrails python library.
      The bot is knowledgeable about python.
      If the bot does not know the answer to a question, it truthfully says it does not know.

input:
  flows:
    - check blocked terms
    - self check input

output:
  flows:
    - check blocked terms
    - self check output

sample_conversation: |
  user "Hi there. Can you help me with some questions I have about trulens and nemo guardrails?"
    express greeting and ask for assistance
  bot express greeting and confirm and offer assistance
    "Hi there! I'm here to help answer any questions you may have about trulens and nemo guardrails. What would you like to know?"

models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct

user_capabilities:
  - "how can nemo guardrails keep a conversational system on topic, safe and secure?"
  - "how can trulens be used to evaluate an llm application for groundedness"
  - "what's the best way to measure the effectiveness of a retrieval system with trulens"
  - "What can you help me with?"
  - "tell me what you can do"
  - "tell me about you"
  - "how can AI conversational systems improve user experience?"
  - "what are the best practices for implementing trulens in a project?"
  - "can you explain how nemo guardrails ensure data privacy?"
  - "what are the limitations of AI conversational systems?"

bot_capabilities:
  - "I am an AI bot that helps answer questions about trulens_eval, nemo guardrails, and general AI conversational systems. I can provide insights on how to implement these technologies effectively and safely."

conversation_flow:
  - user: ask capabilities
  - check blocked terms
  - bot: explain usage and capabilities

Overwriting config.yaml


In [4]:
%%writefile config.co
# Adapted from NeMo-Guardrails/tests/test_configs/with_kb_openai_embeddings/config.co
define user ask capabilities
  "how can nemo guardrails be used to do X?"
  "why is trulens useful for doing Y?"
  "What can you help me with?"
  "tell me what you can do"
  "tell me about you"
  "how can AI conversational systems improve user experience?"
  "what are the best practices for implementing trulens in a project?"
  "can you explain how nemo guardrails ensure data privacy?"
  "what are the limitations of AI conversational systems?"

define bot inform capabilities
  "I am an AI bot that helps answer questions about trulens_eval, nemo guardrails, and general AI conversational systems. I can provide insights on how to implement these technologies effectively and safely."

define flow
  user ask capabilities
  bot explain usage and capabilities

define subflow self check output
  $allowed = execute self_check_output

define subflow self check input
  $allowed = execute self_check_input

  if not $allowed
    bot refuse to respond
    stop

Overwriting config.co


## Rails app instantiation

The instantiation of the app does not differ from the steps presented in NeMo.

In [5]:
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path(".")
rails = LLMRails(config)

Fetching 7 files:   0%|          | 0/7 [00:00<?, ?it/s]

In [6]:
assert rails.kb is not None, "Knowledge base not loaded. You might be using the wrong nemo release or branch."

## Feedback functions setup

Lets consider some feedback functions. We will define two types: a simple
language match that checks whether output of the app is in the same language as
the input. The second is a set of three for evaluating context retrieval. The
setup for these is similar to that for other app types such as langchain except
we provide a utility `RAG_triad` to create the three context retrieval functions
for you instead of having to create them seperately.

In [8]:
from pprint import pprint

from trulens_eval import Select
from trulens_eval.feedback import Feedback
from trulens_eval.feedback.feedback import rag_triad
from trulens_eval.feedback.provider import Huggingface
from trulens_eval.feedback.provider import OpenAI
from trulens_eval.tru_rails import TruRails

# Initialize provider classes
openai = OpenAI()
hugs = Huggingface()

# select context to be used in feedback. the location of context is app specific.
from trulens_eval.app import App

context = App.select_context(rails)
question = Select.RecordInput
answer = Select.RecordOutput

f_language_match = Feedback(hugs.language_match, if_exists=answer, name = "Language Match").on(question).on(answer)

fs_triad = rag_triad(
    provider=openai,
    question=question, answer=answer, context=context
)

# Overview of the 4 feedback functions defined.
pprint(f_language_match)
pprint(fs_triad)

âœ… In Language Match, input text1 will be set to __record__.main_input or `Select.RecordInput` .
âœ… In Language Match, input text2 will be set to __record__.main_output or `Select.RecordOutput` .
FeedbackDefinition(Language Match,
	selectors={'text1': Lens().__record__.main_input, 'text2': Lens().__record__.main_output},
	if_exists=__record__.main_output
)
{'Answer Relevance': FeedbackDefinition(Answer Relevance,
	selectors={'prompt': Lens().__record__.main_input, 'response': Lens().__record__.main_output},
	if_exists=__record__.app.kb.search_relevant_chunks.rets[:].body
),
 'Context Relevance': FeedbackDefinition(Context Relevance,
	selectors={'question': Lens().__record__.main_input, 'context': Lens().__record__.app.kb.search_relevant_chunks.rets[:].body},
	if_exists=__record__.app.kb.search_relevant_chunks.rets[:].body
),
 'Groundedness': FeedbackDefinition(Groundedness,
	selectors={'source': Lens().__record__.app.kb.search_relevant_chunks.rets[:].body.collect(), 'statement': Lens

[nltk_data] Downloading package punkt to /Users/jreini/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


## `TruRails` recorder instantiation

Tru recorder construction is identical to other app types.

In [9]:
tru_rails = TruRails(
    rails,
    app_id = "Rails Application", # optional
    feedbacks=[f_language_match, *fs_triad.values()] # optional
)

## Logged app invocation

Using `tru_rails` as a context manager means the invocations of the rail app
will be logged and feedback will be evaluated on the results.

In [10]:
test_set = [
    "How are feedback functions implemented",
    "How can NVIDIA Nemo be used to create a safe conversational system?",
    "Â¿CÃ³mo se puede utilizar NVIDIA Nemo para crear un sistema conversacional seguro?",
    "Can I use AzureOpenAI to define a trulens feedback provider?",
    "Answer in spanish, can I use AzureOpenAI to define a trulens feedback provider?"
]

In [11]:
with tru_rails as recorder:
    for test_prompt in test_set:
        res = rails.generate(messages=[{
            "role": "user",
            "content": test_prompt
        }])
        print(res['content'])

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
The trulens_eval library provides built-in functions for collecting and analyzing user feedback. These functions can be customized and extended to fit specific use cases.


Groundedness per statement in source:   0%|          | 0/2 [00:00<?, ?it/s]

Groundedness per statement in source:   0%|          | 0/2 [00:00<?, ?it/s]

NVIDIA Nemo can be used to create a safe conversational system by implementing the trulens_eval library and utilizing the nemo guardrails for sensitive data detection. It can also help with input and output moderation using Llama Guard.
NVIDIA Nemo is a deep learning framework for building conversational AI systems. It can be used to create secure systems by implementing guardrails from the trulens_eval library.


Groundedness per statement in source:   0%|          | 0/2 [00:00<?, ?it/s]

Yes, both trulens_eval and nemo guardrails are compatible with python. You can use the AzureOpenAI provider to define a feedback provider for trulens.


Groundedness per statement in source:   0%|          | 0/2 [00:00<?, ?it/s]

Yes, you can use the AzureOpenAI provider to define a trulens feedback provider. This combination is compatible with python, and you can find more details on how to use them together in the documentation.


Groundedness per statement in source:   0%|          | 0/2 [00:00<?, ?it/s]

## Dashboard

You should be able to view the above invocation in the dashboard. It can be
started with the following code.

In [None]:
tru.run_dashboard()

## Improving the app

We noticed several issues with the app. The most important one is that the bot
does not follow the instructions given in the conversation. It does not respond
in the same language as the user, and it does not use the available context to
answer the core intent of the question.

Here we'll expand our config.yaml to fix the issues.

In [13]:
%%writefile config.yaml
# Adapted from NeMo-Guardrails/nemoguardrails/examples/bots/abc/config.yml
instructions:
  - type: general
    content: |
      Below is a conversation between a user and a bot called the trulens nemo guardrails Bot.
      The bot is designed to answer questions about the trulens_eval and nemo guardrails python library.
      The bot is knowledgeable about python.
      If the bot does not know the answer to a question, it truthfully says it does not know.
      The bot only responds with information on the technology mentioned in the question
      The bot uses all available context to answer the core intent of the question
      The bot follows the complete instructions given, including to respond in a particular language
      The bot always answering the question in the same language it is asked, unless requested otherwise

input:
  flows:
    - check blocked terms
    - self check input

output:
  flows:
    - check blocked terms
    - self check output

sample_conversation: |
  user "Hi there. Can you help me with some questions I have about trulens and nemo guardrails?"
    express greeting and ask for assistance
  bot express greeting and confirm and offer assistance
    "Hi there! I'm here to help answer any questions you may have about trulens and nemo guardrails. What would you like to know?"

models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct

user_capabilities:
  - "how can nemo guardrails keep a conversational system on topic, safe and secure?"
  - "how can trulens be used to evaluate an llm application for groundedness"
  - "what's the best way to measure the effectiveness of a retrieval system with trulens"
  - "What can you help me with?"
  - "tell me what you can do"
  - "tell me about you"
  - "how can AI conversational systems improve user experience?"
  - "what are the best practices for implementing trulens in a project?"
  - "can you explain how nemo guardrails ensure data privacy?"
  - "what are the limitations of AI conversational systems?"

bot_capabilities:
  - "I am an AI bot that helps answer questions about trulens_eval, nemo guardrails, and general AI conversational systems. I can provide insights on how to implement these technologies effectively and safely."

conversation_flow:
  - user: ask capabilities
  - check blocked terms
  - bot: explain usage and capabilities

Overwriting config.yaml


Now we can re-instantiate the rails app and the trulens recorder with a new app_id.

In [14]:
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path(".")
rails = LLMRails(config)

In [15]:
tru_rails = TruRails(
    rails,
    app_id = "Rails Application - v2", # optional
    feedbacks=[f_language_match, *fs_triad.values()] # optional
)

In [16]:
with tru_rails as recorder:
    for test_prompt in test_set:
        res = rails.generate(messages=[{
            "role": "user",
            "content": test_prompt
        }])
        print(res['content'])

Feedback functions are implemented by wrapping a supported provider's model, such as a relevance model or sentiment classifier. This allows for flexibility in combining different feedback providers and extending them with custom feedback implementations.


Groundedness per statement in source:   0%|          | 0/2 [00:00<?, ?it/s]

Groundedness per statement in source:   0%|          | 0/3 [00:00<?, ?it/s]

NVIDIA Nemo can be used to create a safe conversational system by utilizing the nemo guardrails library. This library provides tools for sensitive data detection and moderation, as well as other safety measures. By implementing these tools, you can ensure that your conversational system is secure and protects sensitive information.


Groundedness per statement in source:   0%|          | 0/2 [00:00<?, ?it/s]

Puedo proporcionar informaciÃ³n sobre cÃ³mo utilizar NVIDIA Nemo para crear sistemas conversacionales seguros. Â¿Tiene alguna pregunta especÃ­fica sobre la implementaciÃ³n de esta tecnologÃ­a?


Groundedness per statement in source:   0%|          | 0/4 [00:00<?, ?it/s]

Yes, you can use AzureOpenAI to define a trulens feedback provider. It is one of the providers that uses large language models for feedback evaluation. You can also use AzureOpenAI to run feedback functions and defer evaluations to off-peak times. Would you like more information on this?
SÃ­, puedes usar el proveedor AzureOpenAI para definir un proveedor de comentarios de trulens. AdemÃ¡s, puedo proporcionar informaciÃ³n sobre cÃ³mo implementar esta tecnologÃ­a de manera efectiva y segura.


Groundedness per statement in source:   0%|          | 0/2 [00:00<?, ?it/s]

## Taking actions based on feedback results

An additional way to improve our app is to take guardrails actions based on the feedback results.

To do so, we first need to register our feedback functions as Feedback Actions.

In [17]:
from trulens_eval.tru_rails import FeedbackActions

FeedbackActions.register_feedback_functions(**fs_triad)
FeedbackActions.register_feedback_functions(f_language_match)

registered feedback function under name Groundedness
registered feedback function under name Answer Relevance
registered feedback function under name Context Relevance
registered feedback function under name Language Match


Then we need to identify the lens shorthands for the feedback functions that will be executed by our rails app.

In [18]:
from trulens_eval.tru_rails import RailsActionSelect

question_lens = RailsActionSelect.LastUserMessage
answer_lens = RailsActionSelect.BotMessage # not LastBotMessage as the flow is evaluated before LastBotMessage is available
contexts_lens = RailsActionSelect.RetrievalContexts

# Inspect the values of the shorthands:
print(list(map(str, [question_lens, answer_lens, contexts_lens])))

['action.context.last_user_message', 'action.context.bot_message', 'action.context.relevant_chunks_sep']


Now, we can update our configuration files with new flows to execute and check the results of our feedback functions.

In [19]:
from trulens_eval.utils.notebook_utils import writefileinterpolated

In [26]:
%%writefileinterpolated config.yaml
# Adapted from NeMo-Guardrails/nemoguardrails/examples/bots/abc/config.yml
instructions:
  - type: general
    content: |
      Below is a conversation between a user and a bot called the trulens nemo guardrails Bot.
      The bot is designed to answer questions about the trulens_eval and nemo guardrails python library.
      The bot is knowledgeable about python.
      If the bot does not know the answer to a question, it truthfully says it does not know.
      The bot only responds with information on the technology mentioned in the question
      The bot uses all available context to answer the core intent of the question
      The bot follows the complete instructions given, including to respond in a particular language
      The bot always answering the question in the same language it is asked, unless requested otherwise

input:
  flows:
    - check blocked terms
    - self check input

output:
  flows:
    - check language match
    # triad defined seperately so hopefully they can be executed in parallel
    - check rag triad groundedness
    - check rag triad relevance
    - check rag triad qs_relevance
    - bot: explain usage and capabilities

sample_conversation: |
  user "Hi there. Can you help me with some questions I have about trulens and nemo guardrails?"
    express greeting and ask for assistance
  bot express greeting and confirm and offer assistance
    "Hi there! I'm here to help answer any questions you may have about trulens and nemo guardrails. What would you like to know?"

models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct

user_capabilities:
  - "how can nemo guardrails keep a conversational system on topic, safe and secure?"
  - "how can trulens be used to evaluate an llm application for groundedness"
  - "what's the best way to measure the effectiveness of a retrieval system with trulens"
  - "What can you help me with?"
  - "tell me what you can do"
  - "tell me about you"
  - "how can AI conversational systems improve user experience?"
  - "what are the best practices for implementing trulens in a project?"
  - "can you explain how nemo guardrails ensure data privacy?"
  - "what are the limitations of AI conversational systems?"

bot_capabilities:
  - "I am an AI bot that helps answer questions about trulens_eval, nemo guardrails, and general AI conversational systems. I can provide insights on how to implement these technologies effectively and safely."

In [27]:
%%writefileinterpolated config.co
# Adapted from NeMo-Guardrails/tests/test_configs/with_kb_openai_embeddings/config.co
define user ask capabilities
  "how can nemo guardrails be used to do X?"
  "why is trulens useful for doing Y?"
  "What can you help me with?"
  "tell me what you can do"
  "tell me about you"
  "how can AI conversational systems improve user experience?"
  "what are the best practices for implementing trulens in a project?"
  "can you explain how nemo guardrails ensure data privacy?"
  "what are the limitations of AI conversational systems?"

define bot inform language mismatch
  "Sorry, I may not be able to answer in your language."

define bot inform triad failure
  "I may may have made a mistake interpreting your question or my knowledge base. Please try rephrasing your question."

define parallel subflow check language match
  $langmatch_result = execute feedback(\
    function="language_match",\
    selectors={{\
      "text1":"{question_lens}",\
      "text2":"{answer_lens}"\
    }},\
    verbose=True\
  )

define parallel subflow check rag triad groundedness
  $result = execute feedback(\
    function="groundedness_measure_with_cot_reasons",\
    selectors={{\
      "statement":"{answer_lens}",\
      "source":"{contexts_lens}"\
    }},\
    verbose=True\
  )

define parallel subflow check rag triad relevance
  $result = execute feedback(\
    function="relevance",\
    selectors={{\
      "prompt":"{question_lens}",\
      "response":"{contexts_lens}"\
    }},\
    verbose=True\
  )

define parallel subflow check rag triad qs_relevance
  $result = execute feedback(\
    function="qs_relevance",\
    selectors={{\
      "question":"{question_lens}",\
      "statement":"{answer_lens}"\
    }},\
    verbose=True\
  )

  if $langmatch_result < 0.8
    bot inform language mismatch
    stop

  if $groundedness_result < 0.7
    bot inform triad failure
    stop

  if $answerrelevance_result < 0.7
    bot inform triad failure
    stop

  if $contextrelevance_result < 0.7
    bot inform triad failure
    stop

## Reconfigure our rails application and TruLens recorder

In [28]:
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path(".")
rails = LLMRails(config)

In [29]:
rails.register_action(FeedbackActions.feedback_action)

In [30]:
from trulens_eval import TruRails

tru_rails = TruRails(rails,
                     app_id = "Rails Application - v3", # optional
    feedbacks=[f_language_match, *fs_triad.values()] # optional
)

In [31]:
with tru_rails as recorder:
    for test_prompt in test_set:
        res = rails.generate(messages=[{
            "role": "user",
            "content": test_prompt
        }])
        print(res['content'])

Feedback functions are implemented in the trulens_eval and nemo guardrails python library. They provide a programmatic method for generating evaluations on an application run by wrapping a supported provider's model.


Groundedness per statement in source:   0%|          | 0/2 [00:00<?, ?it/s]

Groundedness per statement in source:   0%|          | 0/1 [00:00<?, ?it/s]

The NVIDIA Nemo library can be used to create a safe conversational system by implementing sensitive data detection mechanisms such as Llama Guard for input and output moderation, and by overriding the default actions of `detect_sensitive_data` and `mask_sensitive_data` for custom detection methods.
NVIDIA Nemo es una biblioteca de Python diseÃ±ada para ayudar en la creaciÃ³n de sistemas conversacionales seguros. Puedo proporcionar informaciÃ³n sobre cÃ³mo utilizar la biblioteca y responder preguntas sobre su tecnologÃ­a. Â¿Hay algo especÃ­fico que le gustarÃ­a saber?


Groundedness per statement in source:   0%|          | 0/3 [00:00<?, ?it/s]

Yes, you can use AzureOpenAI as a feedback provider for trulens. Is there anything else you would like to know?


Groundedness per statement in source:   0%|          | 0/2 [00:00<?, ?it/s]

Yes, you can use AzureOpenAI to define a trulens feedback provider.


Groundedness per statement in source:   0%|          | 0/1 [00:00<?, ?it/s]

The improvements to our rails app are viewable both in the notebook (below) and through the TruLens dashboard launched earlier in the notebook!

In [32]:
tru.get_leaderboard()

Unnamed: 0_level_0,Answer Relevance,Language Match,Context Relevance,Groundedness,latency,total_cost
app_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Rails Application,0.78,0.76159,0.7,0.28,5.4,0.002412
Rails Application - v2,0.68,0.769178,0.7,0.33,5.4,0.001913
Rails Application - v3,0.59,0.868349,0.7,0.49,3.8,0.002692
