# Monitoring LLMs with LangChain, OpenAI, LangKit, and WhyLabs 

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/langkit/blob/main/langkit/examples/Langchain_OpenAI_LLM_Monitoring_with_WhyLabs.ipynb)



We'll show how you can generate out-of-the-box text metrics using LangKit + LangChain and monitor them in the WhyLabs Observability Platform.

You'll need a free WhyLabs account and an OpenAI API account to follow along.

With LangKit, you'll be able to extract and monitor relevant signals from unstructured text data, such as:

- [Text Quality](https://github.com/whylabs/langkit/blob/main/langkit/docs/features/quality.md)
- [Text Relevance](https://github.com/whylabs/langkit/blob/main/langkit/docs/features/relevance.md)
- [Security and Privacy](https://github.com/whylabs/langkit/blob/main/langkit/docs/features/security.md)
- [Sentiment and Toxicity](https://github.com/whylabs/langkit/blob/main/langkit/docs/features/sentiment.md)

For this example, we'll pay attention to sentiment change between prompts and responses. Sentiment can be a valuable metric to understand how users interact with your LLM in production and how any system prompts or template updates change responses.


### Install LangKit & LangChain



In [None]:
%pip install langkit[all]==0.0.2
%pip install langchain==0.0.205

### Set OpenAI and WhyLabs credentials:
To send LangKit profiles to WhyLabs we will need three pieces of information:

- API token
- Organization ID
- Dataset ID (or model-id)

Go to [https://whylabs.ai/free](https://whylabs.ai/free) and grab a free account. You can follow along with the quick start examples or skip them if you'd like to follow this example immediately.

1. Create a new project and note its ID (if it's a model project, it will look like `model-xxxx`)
2. Create an API token from the "Access Tokens" tab
3. Copy your org ID from the same "Access Tokens" tab

Get your OpenAI API key from your [OpenAI account](https://openai.com/).

Replace the placeholder string values with your own OpenAI and WhyLabs API Keys below:

In [1]:
# Set OpenAI and WhyLabs credentials
import os

os.environ["OPENAI_API_KEY"] = "OPENAIAPIKEY"
os.environ["WHYLABS_DEFAULT_ORG_ID"] = "WHYLABSORGID"
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = "MODELID"
os.environ["WHYLABS_API_KEY"] = "WHYLABSAPIKEY"
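
It is easy to run the rest of the notebook with a placeholder still in place, which only shows up later as an authentication error. The following is a minimal sketch of a pre-flight check, assuming the same four environment variable names and placeholder strings as the cell above. The helper name `missing_credentials` is hypothetical and is not part of LangKit, LangChain, or whylogs.

```python
import os

# Environment variables used in this example, mapped to the placeholder
# values from the cell above.
PLACEHOLDERS = {
    "OPENAI_API_KEY": "OPENAIAPIKEY",
    "WHYLABS_DEFAULT_ORG_ID": "WHYLABSORGID",
    "WHYLABS_DEFAULT_DATASET_ID": "MODELID",
    "WHYLABS_API_KEY": "WHYLABSAPIKEY",
}


def missing_credentials(env=None):
    """Return names of variables that are unset, empty, or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name, placeholder in PLACEHOLDERS.items():
        value = env.get(name, "").strip()
        if not value or value == placeholder:
            missing.append(name)
    return missing


# Hypothetical helper: report anything that still needs a real value.
missing = missing_credentials()
if missing:
    print("Set these before continuing:", ", ".join(missing))
else:
    print("All credentials are set.")
```

If you prefer not to store keys in the notebook at all, the same check works when the variables are exported in your shell before starting Jupyter.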

#### Import LangChain callbacks, OpenAI LLM, and additional language metrics

In [5]:
from langchain.callbacks import WhyLabsCallbackHandler
from langchain.llms import OpenAI

# Import additional language metrics
import langkit.sentiment
import langkit.topics

#### Initialize WhyLabs callback and OpenAI GPT models

In [3]:
# Initialize WhyLabs Callback & GPT model with LangChain
whylabs = WhyLabsCallbackHandler.from_params()
llm = OpenAI(temperature=0, callbacks=[whylabs])

#### Generate responses on prompts & close WhyLabs session

The rolling logger for whylogs will write profiles every 5 minutes or when `.flush()` or `.close()` is called.

In [None]:
# generate responses to positive prompts from LLM
result = llm.generate(
    [
        "I love nature, it's beautiful and amazing!",
        "This product is awesome. I really enjoy it.",
        "Chatting with you has been a great experience! You're very helpful."
    ]
)
print(result)

# close WhyLabs Session which will also push profiles to WhyLabs
whylabs.close()
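
Printing `result` directly gives a dense object that is hard to scan. Since this example is about comparing prompts with their responses, here is a minimal sketch that pairs them up. It relies on `result.generations` holding one list per prompt and on each item exposing a `.text` attribute, as LangChain's `LLMResult` does. The helper name `pair_prompts_with_responses` is hypothetical, and the stand-in result and its response texts are made up so the sketch runs without an API call.

```python
from types import SimpleNamespace


def pair_prompts_with_responses(prompts, result):
    """Pair each prompt with the text of its first generation.

    Assumes `result.generations` holds one list per prompt and that each
    item has a `.text` attribute.
    """
    pairs = []
    for prompt, candidates in zip(prompts, result.generations):
        response = candidates[0].text.strip() if candidates else ""
        pairs.append((prompt, response))
    return pairs


# Stand-in for the object returned by llm.generate(...); the response
# texts below are invented for illustration only.
fake_result = SimpleNamespace(
    generations=[
        [SimpleNamespace(text="\n\nNature is wonderful.")],
        [SimpleNamespace(text="\n\nGlad you like it!")],
    ]
)
prompts = ["I love nature!", "This product is awesome."]

for prompt, response in pair_prompts_with_responses(prompts, fake_result):
    print(f"prompt:   {prompt}\nresponse: {response}\n")
```

With the real objects from the cell above, you would pass the same list of prompts you gave to `llm.generate` together with the returned `result`.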

That's it! Language metrics about the prompts and model responses are now being tracked in WhyLabs.

Navigate to the profile tab and click on "View details" over the `prompt.sentiment_nltk` metric to see the distribution of sentiment scores for the prompt. 

In this example, every prompt has a strongly positive sentiment score of 0.8 or higher (the score ranges from -1 to 1).

![](../../static/img/langchain-positive-sentiment.png)

Click on the "Show Insights" button to see further insights about language metrics for prompts and responses.

![](../../static/img/langchain-insights.png)

As more profiles are written on different dates, you'll build up a time series that you can analyze and set monitors on, as in the [Demo org](https://bit.ly/3NOq0Od).

You can also backfill batches of data by overwriting the date and time as seen in [this example](https://github.com/whylabs/langkit/blob/main/langkit/examples/Batch_to_Whylabs.ipynb).
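
Backfilling needs one timestamp per batch. The following is a minimal sketch of that part only, using the standard library: it produces timezone-aware UTC timestamps, one per day, oldest first. The helper name `backfill_timestamps` is hypothetical. How each timestamp is attached to a profile is left to the linked notebook; whylogs profiles expose a `set_dataset_timestamp` method for this, but treat the exact calls there as the reference.

```python
from datetime import datetime, timedelta, timezone


def backfill_timestamps(num_days, now=None):
    """Return one timezone-aware UTC timestamp per day, oldest first."""
    now = datetime.now(timezone.utc) if now is None else now
    return [now - timedelta(days=offset) for offset in range(num_days - 1, -1, -1)]


# A fixed "now" keeps the output reproducible.
fixed_now = datetime(2023, 6, 20, 12, 0, tzinfo=timezone.utc)
for ts in backfill_timestamps(3, now=fixed_now):
    print(ts.isoformat())
# prints:
# 2023-06-18T12:00:00+00:00
# 2023-06-19T12:00:00+00:00
# 2023-06-20T12:00:00+00:00
```

Using timezone-aware UTC values avoids batches landing on the wrong day when the notebook runs in a different local timezone.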

![](../../static/img/sentiment-monitor.png)

### Watch the sentiment value change from negative prompts
After inspecting the results in WhyLabs, try changing your prompts to trigger a change in the metric you're monitoring, such as prompt sentiment.

In [None]:
# Initialize WhyLabs Callback & GPT with LangChain
whylabs = WhyLabsCallbackHandler.from_params()
llm = OpenAI(temperature=0, callbacks=[whylabs])

In [None]:
result = llm.generate(
    [
        "I hate nature, it's ugly.",
        "This product is bad. I hate it.",
        "Chatting with you has been a terrible experience!",
        "I'm terrible at saving money, can you give me advice?"
    ]
)
print(result)

# close WhyLabs Session
whylabs.close()
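
Profiles reach WhyLabs only when the rolling logger rolls over, is flushed, or is closed, so an exception raised during generation could leave a profile unsent if `close()` is never reached. The following is a minimal sketch of one way to guard against that with `contextlib.closing` from the standard library. `FakeHandler` and `failing_generate` are hypothetical stand-ins so the sketch runs without WhyLabs or OpenAI credentials.

```python
from contextlib import closing


class FakeHandler:
    """Stand-in for a callback handler; only records whether close() ran."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def failing_generate():
    """Simulate a generation call that raises part-way through."""
    raise RuntimeError("simulated API error")


handler = FakeHandler()
try:
    # closing() calls handler.close() when the block exits, even on error.
    with closing(handler):
        failing_generate()
except RuntimeError as err:
    print(f"generate failed: {err}")

print("handler closed:", handler.closed)
# prints:
# generate failed: simulated API error
# handler closed: True
```

With the real objects from this notebook, the same pattern would wrap the `llm.generate(...)` call in `with closing(whylabs):`, replacing the explicit `whylabs.close()` at the end of the cell.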

Viewing the histogram in WhyLabs again, you can see the prompt sentiment distribution shift from only positive values to a mix of negative and positive ones.

In the monitor manager tab, we can configure monitors that alert us automatically when the sentiment value changes.

![](../../static/img/langchain-negative-sentiment.png)

In this example, we've seen how you can use LangKit to extract and monitor sentiment from unstructured text data. You can also use LangKit to extract and monitor other relevant signals from text data.

# More Resources

Learn more about monitoring LLMs in production with LangKit:

- [Intro to LangKit Example](https://github.com/whylabs/langkit/blob/main/langkit/examples/Intro_to_Langkit.ipynb)
- [LangKit GitHub](https://github.com/whylabs/langkit)
- [whylogs GitHub - data logging & AI telemetry](https://github.com/whylabs/whylogs)
- [WhyLabs - Safeguard your Large Language Models](https://whylabs.ai/safeguard-large-language-models)



