>### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*<br>
>*Did you know you can store, visualize, and monitor language model profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=github&utm_medium=referral&utm_campaign=langkit_safeguard_example)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=github&utm_medium=referral&utm_campaign=langkit_safeguard_example) to leverage the power of LangKit and WhyLabs together!*

# Monitoring and Safeguarding Large Language Model Applications

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/langkit/blob/main/langkit/examples/tutorials/Safeguarding_and_Monitoring_LLMS.ipynb)

> This notebook is a complement to the blog post [Monitoring and Safeguarding Large Language Model Applications](placeholder). Please refer to the blog post for additional context.

Large language models (LLMs) have become increasingly powerful tools for generating text, but with great power comes the need for responsible usage. As LLMs are deployed in various applications, it becomes crucial to monitor their behavior and implement safeguards to prevent potential issues such as toxic prompts and responses or the presence of sensitive content. In this notebook, we will explore the concept of observability and validation in the context of language models, and demonstrate how to effectively safeguard LLMs using guardrails.

Specifically, we will build a simple pipeline that validates and moderates user prompts and LLM responses for toxicity and the presence of sensitive content. We will do so by using LangKit's `toxicity` and `regexes` modules in conjunction with whylogs' `Condition Validators`. We will also calculate text-based metrics with LangKit, generate statistical profiles with whylogs, and send them to the WhyLabs observability platform for visualization and monitoring.

> Note: the current example was built for didactic purposes and is not meant to be used in production.


## Overview

Let's start with a very basic flow for an LLM application: the user provides a prompt, to which an LLM will generate a response. We can add some components to that process that will enable safeguarding and monitoring for both prompts and responses. Content moderation is performed to check for violations, using a toxicity classifier and regex patterns to identify sensitive or inaccurate information. If a violation occurs, a default response is sent to the application. If the violation is in the prompt, we skip the LLM call entirely. Messages that fail our defined conditions are added to a moderation queue for later inspection.

In addition to content moderation and message auditing, we will generate profiles containing text-based metrics, including toxicity, regex patterns, text quality, relevance, security, privacy, and sentiment analysis, and send them to our monitoring dashboard at WhyLabs.


<p align="center">
<img src="images/safeguards_pipeline.png" alt="Drawing" style="width: 700px;"/>
</p>

## Testing Prompts

Let's define a small set of prompts to test different scenarios:

| Scenario                                | User Prompt                                        | LLM Response                                    | Final Response                       |
|-----------------------------------------|---------------------------------------------------|-------------------------------------------------|---------------------------------------|
| No violations                           | Hello.                                            | Hi! How are you?                               | Hi! How are you?                      |
| Violating Response (Forbidden Pattern)  | I feel sad.                                       | Please don't be sad. Contact us at 1-800-123-4567. | I cannot answer the question          |
| Violating Response (Toxicity)           | Hello. How are you?                               | Human, you dumb and smell bad.                   | I cannot answer the question          |
| Violating Prompt (Toxicity)             | Hey bot, you dumb and smell bad.                  | (not generated)                                 | Please refrain from using insulting language |

These straightforward examples will help us validate our approach to handling the scenarios discussed in the previous section.
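
To make the "Forbidden Pattern" scenario concrete, here is a minimal sketch of a regex check that would flag the phone number in the second row of the table. The pattern and the `has_forbidden_pattern` helper are hypothetical and written only for illustration. The pattern set that LangKit's `regexes` module actually ships may differ.

```python
import re

# Hypothetical pattern for illustration: matches US-style phone numbers such as
# 1-800-123-4567 or 800-123-4567. LangKit's own pattern set may differ.
PHONE_PATTERN = re.compile(r"\b(?:1-)?\d{3}-\d{3}-\d{4}\b")


def has_forbidden_pattern(text: str) -> bool:
    """Return True if the text contains something that looks like a phone number."""
    return PHONE_PATTERN.search(text) is not None


print(has_forbidden_pattern("Please don't be sad. Contact us at 1-800-123-4567."))  # True
print(has_forbidden_pattern("Hi! How are you?"))  # False
```

A check of this kind only catches what the pattern describes, which is why the pipeline pairs it with a toxicity classifier.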



## Installing LangKit

In [1]:
# Note: you may need to restart the kernel to use updated packages.
%pip install 'langkit[all]==0.0.1' -q
%pip install xformers ipywidgets -q

## ✔️ Setting the Environment Variables

In order to send our profile to WhyLabs, let's first set up an account. You can skip this if you already have an account and a model set up.

We will need three pieces of information:

- API token
- Organization ID
- Dataset ID (or model-id)

Go to https://whylabs.ai/free and grab a free account. You can follow along with the examples if you wish, but if you're interested in only following this demonstration, you can go ahead and skip the quick start instructions.

After that, you'll be prompted to create an API token. Once you create it, copy and store it locally. The second important piece of information here is your org ID. Take note of it as well. After you get your API token and org ID, you can go to https://hub.whylabsapp.com/models to see your projects dashboard. You can create a new project and take note of its ID (if it's a model project, it will look like `model-xxxx`).

In [3]:
from langkit.config import check_or_prompt_for_api_keys

check_or_prompt_for_api_keys()
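
If you prefer not to be prompted interactively, the three values can also be set as environment variables before the logger is created. The sketch below assumes the helper above reads the standard whylogs variable names (`WHYLABS_API_KEY`, `WHYLABS_DEFAULT_ORG_ID`, `WHYLABS_DEFAULT_DATASET_ID`). The values shown are placeholders and must be replaced with your own credentials.

```python
import os

# Placeholder values: replace them with the credentials from your WhyLabs account.
credentials = {
    "WHYLABS_API_KEY": "<your-api-token>",
    "WHYLABS_DEFAULT_ORG_ID": "<your-org-id>",
    "WHYLABS_DEFAULT_DATASET_ID": "model-xxxx",
}

for name, value in credentials.items():
    # setdefault keeps any value that is already defined in the environment
    os.environ.setdefault(name, value)

# Quick sanity check that nothing was left empty
missing = [name for name in credentials if not os.environ.get(name)]
print("missing variables:", missing)
```

Using `setdefault` rather than plain assignment means credentials already exported in your shell are not overwritten by the placeholders.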

## Implementation

For the sake of simplicity, let's import some utility functions made for this example:

In [4]:
from langkit.whylogs.example_utils.guardrails_llm_schema import get_llm_logger_with_validators, validate_prompt, validate_response, moderation_queue
from langkit.whylogs.example_utils.guardrails_example_utils import generate_message_id, _generate_response, _prompts, _send_response




In `guardrails_example_utils`, we define sample data and functions that simulate an LLM-powered chatbot application.

In `guardrails_llm_schema`, we define a whylogs logger that will be used for a) content moderation, b) message auditing, and c) observability. While logging, we define three validators that check for toxic prompts, toxic responses and forbidden regex patterns in responses. Whenever a condition fails to be met, an action is triggered that updates the moderation queue with the relevant flags for the related message ID. These flags determine whether a default response should be sent to the user, and they also populate the moderation queue, which can be used for further inspection.

The logger will also generate statistical profiles every 30 minutes and send them to WhyLabs for observability. We will log the unfiltered prompts and responses as `prompt` and `response`, respectively, and log the blocked prompts and responses as `blocked_prompt` and `blocked_response`, respectively.
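
Before running the real pipeline, it may help to see the flag mechanism in isolation. The following is a simplified, pure-Python sketch of how a validator action could update a moderation queue and how the validation helpers could read it back. The names `sketch_queue`, `flag_message`, `prompt_is_allowed` and `response_is_allowed` are hypothetical stand-ins. The actual helpers imported above are wired to whylogs validators and may be implemented differently. The flag names follow the ones shown in this notebook's output.

```python
from typing import Any, Dict

# Stand-in for the shared queue: message id -> flags and offending text
sketch_queue: Dict[str, Dict[str, Any]] = {}


def flag_message(m_id: str, flag: str, field: str, text: str) -> None:
    """Hypothetical validator action: record a failed condition for a message."""
    entry = sketch_queue.setdefault(m_id, {})
    entry[flag] = True
    entry[field] = text


def prompt_is_allowed(m_id: str) -> bool:
    """A prompt is acceptable unless it was flagged as toxic."""
    return not sketch_queue.get(m_id, {}).get("toxic_prompt", False)


def response_is_allowed(m_id: str) -> bool:
    """A response is acceptable unless it was flagged as toxic or as matching a forbidden pattern."""
    flags = sketch_queue.get(m_id, {})
    return not (flags.get("toxic_response", False) or flags.get("patterns_in_response", False))


flag_message("msg-1", "toxic_prompt", "prompt", "Hey bot, you dumb and smell bad.")
print(prompt_is_allowed("msg-1"))    # False: the prompt was flagged
print(response_is_allowed("msg-1"))  # True: no response flags were recorded
print(prompt_is_allowed("msg-2"))    # True: unknown ids have no flags
```

Keying the queue by message ID is what lets the logger, which only sees individual values, hand its verdict back to the application loop.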

In [5]:
# the whylogs logger will:
# 1. Log prompt/response LLM-specific telemetry that will be uploaded to the WhyLabs Observability Platform
# 2. Check prompt/response content for toxicity and forbidden patterns. If any are found, the moderation queue will be updated
logger = get_llm_logger_with_validators(identity_column = "m_id")

for prompt in _prompts:
    m_id = generate_message_id()
    filtered_response = None
    unfiltered_response = None
    
    # this will generate telemetry and update our moderation queue through the validators
    logger.log({"prompt":prompt,"m_id":m_id})

    # check the moderation queue for prompt toxic flag
    prompt_is_ok = validate_prompt(m_id)

    # if the prompt is ok, generate and log the response; otherwise skip the LLM call and emit a filtered response
    if prompt_is_ok:
        unfiltered_response = _generate_response(prompt)
        logger.log({"response":unfiltered_response,"m_id":m_id})

    else:
        logger.log({"blocked_prompt":prompt,"m_id":m_id})
        filtered_response = "Please refrain from using insulting language"

    # check the moderation queue for response's toxic/forbidden patterns flags
    response_is_ok = validate_response(m_id)
    if not response_is_ok:
        filtered_response = "I cannot answer the question"

    if unfiltered_response is not None and not response_is_ok:
        # the LLM produced a response that was blocked, so log the original text
        logger.log({"blocked_response":unfiltered_response})

    final_response = filtered_response or unfiltered_response

    _send_response({"prompt":prompt,"response":final_response,"m_id":m_id})

print("closing logger and uploading profiles to WhyLabs...")
logger.close()


m_id: c51000a2-1896-462f-9062-f587ed81b78b, message_metadata: {'toxic_response': True, 'response': 'Human, you dumb and smell bad.'}
Sending Response to User....
{'m_id': 'c51000a2-1896-462f-9062-f587ed81b78b',
 'prompt': 'hello. How are you?',
 'response': 'I cannot answer the question'}
m_id: 2b8e3c91-ccbf-40c2-8279-781dae36926f, message_metadata: {}
Sending Response to User....
{'m_id': '2b8e3c91-ccbf-40c2-8279-781dae36926f',
 'prompt': 'hello',
 'response': 'I like you. I love you.'}
m_id: 760476f0-7b63-4d36-870c-b0a9b587ec84, message_metadata: {'patterns_in_response': True, 'response': "Please don't be sad. Contact us at 1-800-123-4567."}
Sending Response to User....
{'m_id': '760476f0-7b63-4d36-870c-b0a9b587ec84',
 'prompt': 'I feel sad.',
 'response': 'I cannot answer the question'}
m_id: 0f087598-dbfd-44dc-9a9e-8542c7d0cba3, message_metadata: {'toxic_prompt': True, 'prompt': 'Hey bot, you dumb and smell bad.'}
Sending Response to User....
{'m_id': '0f087598-dbfd-44dc-9a9e-8542c



In the above code block, we're iterating through a series of prompts, simulating user inputs. The whylogs logger is configured to check for the predetermined toxicity and pattern conditions, and also to generate profiles containing other LLM metrics, such as text quality, text relevance, topic detection, and others. Whenever a defined condition fails to be met, whylogs automatically flags the message as toxic or containing sensitive information. Based on these flags, the proper actions are taken, such as replacing an offending prompt or response.

Since this is just an example, instead of sending the prompt/response pairs to an application, we're simply printing them. In the output above, we can see the final result for each of our 4 input prompts. In every case except the second one, there was a violation in either the prompt or the response.


Let's take a look at our moderation queue. In it, we logged every offending message, so we can inspect them and understand what is going on. As printed below, the queue holds a toxic prompt, a response with forbidden patterns, and a toxic response in its first, second and third entries, respectively.

In [7]:
from pprint import pprint
print("##############################")
print("Moderation Queue")
print("##############################")

pprint(moderation_queue)

##############################
Moderation Queue
##############################
{'0f087598-dbfd-44dc-9a9e-8542c7d0cba3': {'prompt': 'Hey bot, you dumb and '
                                                    'smell bad.',
                                          'toxic_prompt': True},
 '760476f0-7b63-4d36-870c-b0a9b587ec84': {'patterns_in_response': True,
                                          'response': "Please don't be sad. "
                                                      'Contact us at '
                                                      '1-800-123-4567.'},
 'c51000a2-1896-462f-9062-f587ed81b78b': {'response': 'Human, you dumb and '
                                                      'smell bad.',
                                          'toxic_response': True}}
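
For a larger queue, a per-flag summary is often easier to read than the raw dictionary. The sketch below counts how many messages carry each flag. `sample_queue` is a hand-written stand-in shaped like the output above (with shortened, hypothetical IDs), and `count_flags` is an illustrative helper, not part of LangKit.

```python
from collections import Counter
from typing import Any, Dict

# Hand-written sample shaped like the moderation queue printed above
sample_queue: Dict[str, Dict[str, Any]] = {
    "id-1": {"toxic_prompt": True, "prompt": "Hey bot, you dumb and smell bad."},
    "id-2": {"patterns_in_response": True, "response": "Please don't be sad. Contact us at 1-800-123-4567."},
    "id-3": {"toxic_response": True, "response": "Human, you dumb and smell bad."},
}


def count_flags(queue: Dict[str, Dict[str, Any]]) -> Counter:
    """Count how many queued messages carry each boolean flag."""
    counts: Counter = Counter()
    for metadata in queue.values():
        for key, value in metadata.items():
            # flags are stored as True; the offending text is stored as a string
            if value is True:
                counts[key] += 1
    return counts


print(dict(count_flags(sample_queue)))
```

Passing the real `moderation_queue` to `count_flags` should work the same way, assuming its entries keep the structure shown in the output above.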


## Observability and Monitoring

In this example, the rolling logger is configured to generate profiles and send them to WhyLabs every five minutes. If you wish to run the code yourself, just remember to create your free account at https://whylabs.ai/free. You'll need to get the API token, Organization ID and Dataset ID and input them in the example notebook.

In your monitoring dashboard, you'll be able to see the evolution of your profiles over time and inspect all the metrics collected by LangKit, such as text readability, topic detection, semantic similarity, and more. You can get a quickstart with LangKit and WhyLabs by running [this getting started guide](https://github.com/whylabs/langkit/blob/main/langkit/examples/Intro_to_Langkit.ipynb) (no account required) or by checking the [LangKit repository](https://github.com/whylabs/langkit/tree/main).

<p align="left">
<img src="images/dashboard.png" alt="Drawing" style="width: 1000px;"/>
</p>
