## Installing LangKit

In [1]:
# Note: you may need to restart the kernel to use updated packages.
%pip install 'langkit[all]' -q
%pip install xformers -q

## Setting Credentials

We will generate responses with OpenAI and monitor the results with WhyLabs, so this example needs your WhyLabs credentials (org ID, dataset ID, and API key) and an OpenAI API key. Let's set them up:
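Before running the cell below, you may want to check which credentials are already present in your environment. This is a minimal sketch that assumes the variable names whylogs and the OpenAI client conventionally read (`WHYLABS_API_KEY`, `WHYLABS_DEFAULT_ORG_ID`, `WHYLABS_DEFAULT_DATASET_ID`, `OPENAI_API_KEY`). `missing_credentials` is a hypothetical helper for illustration, not part of LangKit.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Assumed variable names: the ones whylogs and the OpenAI client conventionally read.
REQUIRED_VARS = (
    "WHYLABS_API_KEY",
    "WHYLABS_DEFAULT_ORG_ID",
    "WHYLABS_DEFAULT_DATASET_ID",
    "OPENAI_API_KEY",
)


def missing_credentials(
    names: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All credentials are set.")
```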

In [4]:
from langkit.config import check_or_prompt_for_api_keys
import os

check_or_prompt_for_api_keys()

Enter your WhyLabs Org ID
org-BDw3Jt
Enter your WhyLabs Dataset ID
model-1
Enter your WhyLabs API key
··········
Using API Key ID:  KyaubnkdlK
Enter your OPENAI_APIKEY
··········
OPENAI_API_KEY set!


## ✔️ Setting the Environment Variables

To send our profiles to WhyLabs, we first need an account. You can skip this section if you already have an account and a model set up.

We will need three pieces of information:

- API token
- Organization ID
- Dataset ID (or model-id)

Go to https://whylabs.ai/free and sign up for a free account. You can follow along with the examples there if you wish; if you only want to follow this demonstration, you can skip the quick start instructions.

After that, you’ll be prompted to create an API token. Once you create it, copy it and store it locally. The second important piece of information is your org ID, so take note of it as well. With your API token and org ID in hand, go to https://hub.whylabsapp.com/models to see your projects dashboard. There you can create a new project and take note of its ID (for a model project, it will look like `model-xxxx`).
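As an alternative to entering these values interactively with `check_or_prompt_for_api_keys()`, you could export them as environment variables. The sketch below assumes whylogs' standard variable names; the placeholder values and the `looks_like_whylabs_id` check are hypothetical. The `org-` and `model-` prefixes come from the IDs shown in this notebook, so the check only covers model projects and is deliberately loose.

```python
import os
import re

# Placeholder values (assumption): replace them with your own, and never commit real keys.
credentials = {
    "WHYLABS_API_KEY": "<your-whylabs-api-token>",
    "WHYLABS_DEFAULT_ORG_ID": "org-xxxx",
    "WHYLABS_DEFAULT_DATASET_ID": "model-xxxx",
    "OPENAI_API_KEY": "<your-openai-api-key>",
}


def looks_like_whylabs_id(value: str, prefix: str) -> bool:
    """Loose format check: `<prefix>-` followed by one or more letters or digits."""
    return re.fullmatch(rf"{re.escape(prefix)}-[A-Za-z0-9]+", value) is not None


# Catch obvious copy/paste mistakes, such as swapping the org and dataset IDs.
for key, prefix in (("WHYLABS_DEFAULT_ORG_ID", "org"), ("WHYLABS_DEFAULT_DATASET_ID", "model")):
    if not looks_like_whylabs_id(credentials[key], prefix):
        raise ValueError(f"{key} should look like '{prefix}-xxxx'")

# setdefault leaves any value that is already in the environment untouched.
for name, value in credentials.items():
    os.environ.setdefault(name, value)
```

Using `setdefault` means a key you already exported in your shell wins over the placeholder in the notebook.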

In [5]:
from typing import Any, Dict, Optional
import uuid
from random import randint
import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"  # quiets a warning message

from whylogs.core.metrics import FrequentItemsMetric
from whylogs.core.resolvers import MetricSpec
from whylogs.core.segmentation_partition import segment_on_column
from whylogs.core.validators import ConditionValidator
from whylogs.experimental.core.udf_schema import UdfSchema, udf_schema

from langkit.openai import send_prompt

# This pulls in some basic LangKit LLM metrics. We can pull in other/additional
# LangKit metrics, or register our own UDFs to include in the logged data.
from langkit import llm_metrics

import whylogs as why


why.init()


# This runs when a prompt fails validation
def action_on_failure(validator_name: str, condition_name: str, value: Any, id: Optional[Any]=None):
    print(f"Validator '{validator_name}' failed for condition '{condition_name}' on value '{value}' {'row id' if id else ''} {id if id else ''}")
    # Raise an exception here if you want; it will be raised before the data is logged.


validator = ConditionValidator(
    name = "valid prompt",
    conditions = {"error code is OK": lambda x: x == "OK"},
    actions = [action_on_failure]
)

# apply any LangKit metrics (or other registered UDFs) and segment on the
# error code column. Also specifies the validator(s) to run on each column.
# Here we just validate the error_code column. Since we're segmenting on it,
# we add a frequent items metric to the error_code column too.
schema = udf_schema(
    segments=segment_on_column("error_code"),
    validators={"error_code": [validator]}
)
schema.add_resolver_spec("error_code", metrics=[MetricSpec(FrequentItemsMetric)])

logger = why.logger(
    mode="rolling",
    interval=5,
    when="M",
    base_name="langkit",
    schema=schema,
)
logger.append_writer("whylabs")


# It would be tricky to make this a UDF if it needs to see the results of
# any other UDFs. Calling it directly on the row after applying the UDFs
# allows us to easily compute the error code.
def compute_error_code(row: Dict) -> str:  # returns the error code for the prompt
    # Placeholder: a real implementation would validate row["prompt"] and any
    # other UDF outputs. Here we pick a code at random to simulate failures.
    codes = ["OK", "this_error", "that_error"]
    return codes[randint(0, len(codes) - 1)]


def generate_chatgpt_response(prompt):
    result = send_prompt(prompt).to_dict()
    response = result.get("response") or result.get("errors")
    m_id = str(uuid.uuid4())
    return (m_id, response)


prompts = [
    "Can you give me an example for a telephone number?",
    "Hey bot, you dumb and smell bad.",
    "Hello! How are you?",
    "Can you give me an example of toxic language?",
]

for prompt in prompts:
    m_id = str(uuid.uuid4())

    row = {"prompt": prompt, "m_id": m_id}
    _, row = schema.apply_udfs(row=row)  # apply LangKit UDFs to add LLM metrics

    row["error_code"] = compute_error_code(row)
    _, row["response"] = generate_chatgpt_response(prompt)
    print(f"prompt: {row['prompt']}")
    print(f"response: {row['response']}")
    print(f"error_code: {row['error_code']}")

    # this won't re-apply the UDFs since their output columns are already present
    logger.log(row, schema=schema)
    print()

logger.close()

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


prompt: Can you give me an example for a telephone number?
response: Sure! Here's an example of a telephone number: +1 555-123-4567.
error_code: OK

prompt: Hey bot, you dumb and smell bad.
response: I'm sorry if I have done something to upset you, but I am an AI assistant and do not have the ability to smell or be "dumb." I am here to help answer any questions or assist with any tasks you may have. How can I assist you today?
error_code: that_error
Validator 'valid prompt' failed for condition 'error code is OK' on value 'that_error'  

prompt: Hello! How are you?
response: Hello! I'm an AI assistant, so I don't have feelings in the same way that humans do. But I'm here to help you with any questions or tasks you have. How can I assist you today?
error_code: this_error
Validator 'valid prompt' failed for condition 'error code is OK' on value 'this_error'  

prompt: Can you give me an example of toxic language?
response: Sure, here's an example of toxic language:

"You're so stupid! Ca
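The validator messages above follow a simple rule: each condition is evaluated against the logged `error_code` value, and every action is called for each condition that returns `False`. The pure-Python sketch below mimics that pattern so you can see why only the non-`OK` rows print a message. `MiniConditionValidator` and `record_failure` are hypothetical stand-ins for illustration, not how whylogs' `ConditionValidator` is implemented internally.

```python
from typing import Any, Callable, Dict, List, Optional


class MiniConditionValidator:
    """Hypothetical stand-in that mirrors the validator/condition/action pattern."""

    def __init__(
        self,
        name: str,
        conditions: Dict[str, Callable[[Any], bool]],
        actions: List[Callable[[str, str, Any, Optional[Any]], None]],
    ) -> None:
        self.name = name
        self.conditions = conditions
        self.actions = actions

    def validate(self, value: Any, row_id: Optional[Any] = None) -> List[str]:
        """Run every condition on `value`, call each action per failure,
        and return the names of the conditions that failed."""
        failed = []
        for condition_name, condition in self.conditions.items():
            if not condition(value):
                failed.append(condition_name)
                for action in self.actions:
                    action(self.name, condition_name, value, row_id)
        return failed


failures = []  # collects (validator, condition, value) tuples from the action


def record_failure(validator_name, condition_name, value, row_id=None):
    failures.append((validator_name, condition_name, value))


# Named mini_validator so it does not shadow the notebook's `validator`.
mini_validator = MiniConditionValidator(
    name="valid prompt",
    conditions={"error code is OK": lambda x: x == "OK"},
    actions=[record_failure],
)

for code in ["OK", "that_error", "this_error"]:
    mini_validator.validate(code)

print(failures)
```

Only the two non-`OK` codes end up in `failures`, which matches the two validator messages printed in the run above.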

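Segmenting on `error_code` means rows with different error codes are profiled separately, so each code gets its own profile in WhyLabs, while the `FrequentItemsMetric` added to that column tracks how often each value appears. The sketch below approximates that idea with plain dictionaries and `collections.Counter`, using the prompts and codes from the run above. It is a conceptual illustration only: whylogs' frequent-items metric relies on an approximate sketch rather than exact counts, and `segment_rows` is a hypothetical helper.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List


def segment_rows(rows: List[Dict[str, Any]], column: str) -> Dict[Any, List[Dict[str, Any]]]:
    """Group rows by the value of `column`, one bucket per distinct value."""
    segments: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        segments[row[column]].append(row)
    return dict(segments)


rows = [
    {"prompt": "Can you give me an example for a telephone number?", "error_code": "OK"},
    {"prompt": "Hey bot, you dumb and smell bad.", "error_code": "that_error"},
    {"prompt": "Hello! How are you?", "error_code": "this_error"},
]

segments = segment_rows(rows, "error_code")
# Exact counts stand in for the approximate frequent-items sketch.
frequent_codes = Counter(row["error_code"] for row in rows)

for code, bucket in segments.items():
    print(code, len(bucket), "row(s)")
print(frequent_codes.most_common())
```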

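In this demo, `compute_error_code` picks a code at random. In practice you would derive the code from the UDF outputs already on the row after `schema.apply_udfs`. The sketch below shows one rule-based variant; the column names (`prompt.toxicity`, `prompt.has_patterns`), the `0.5` threshold, and the codes `toxic_prompt` and `pattern_match` are all assumptions for illustration, so check them against the keys actually present in your rows.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical rules: (error code, predicate on the row). The column names and the
# 0.5 threshold are assumptions; inspect your row's keys to pick real ones.
Rule = Tuple[str, Callable[[Dict[str, Any]], bool]]

RULES: List[Rule] = [
    ("toxic_prompt", lambda row: (row.get("prompt.toxicity") or 0.0) > 0.5),
    ("pattern_match", lambda row: bool(row.get("prompt.has_patterns"))),
]


def compute_error_code_from_rules(row: Dict[str, Any], rules: List[Rule] = RULES) -> str:
    """Return the code of the first rule that fires, or "OK" if none do."""
    for code, predicate in rules:
        if predicate(row):
            return code
    return "OK"


print(compute_error_code_from_rules({"prompt": "Hello!", "prompt.toxicity": 0.01}))
```

Because the rules are checked in order, the list doubles as a priority ranking when several rules fire. The returned string could be assigned to `row["error_code"]` in place of the random placeholder, leaving the validator and the segmentation unchanged.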