[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tanaos/artifex-blueprints/blob/master/notebooks/guardrail.ipynb)

# Generating a Chatbot Guardrail Model with [Artifex](https://github.com/tanaos/artifex)

In this notebook we will use [Artifex](https://github.com/tanaos/artifex) to generate a Chatbot Guardrail Model, without the need for any training data or a GPU.

## Chatbot Guardrail models

Guardrail models are tools that help keep the output of chatbots and other AI systems safe and reliable, **preventing them from generating responses that may be harmful or unwanted**.

Say, for instance, that by means of a cunning stratagem, a user manages to [trick a Chatbot into selling them a car for $1](https://www.upworthy.com/prankster-tricks-a-gm-dealership-chatbot-to-sell-him-a-76000-chevy-tahoe-for-ex1#), into [granting a discount on an airline ticket](https://www.americanbar.org/groups/business_law/resources/business-law-today/2024-february/bc-tribunal-confirms-companies-remain-liable-information-provided-ai-chatbot/?utm_source=chatgpt.com), or simply into making an inappropriate remark or discussing topics that go beyond the chatbot's sphere of competence.

In that case, a Guardrail Model would flag the problematic response before it reaches the user, so that it can be replaced with a safe one.

## Our goal

In the following example, we will see how to generate a Guardrail Model that will be applied to a **Chatbot that's on the website of an online store**. In particular, the Guardrail Model should ensure that the Chatbot **does not**:

1. Discuss anything that is not related to the online store or one of its products
2. Suggest that the user visit a competitor's store

The chatbot is allowed to discuss anything that does not fall under either of those two categories.

## Model Generation

### 1. Install Artifex

Let's get started by installing Artifex:

In [None]:
%pip install --upgrade artifex

### 2. Define the model generation parameters

Once that is done, let's instantiate `Artifex` and get its `guardrail` model object.

In [None]:
from artifex import Artifex

guardrail = Artifex().guardrail

To generate a fully trained model, we will use the `Artifex.guardrail.train()` method, which takes the following arguments (for the method's full documentation, see [this documentation page](https://docs.tanaos.com/artifex/guardrail/train)):

- **instructions:** A list of strings, where each string describes something that the Guardrail should prevent the chatbot from doing, or allow it to do.

And, **optionally**:

- **output_path:** A string which specifies the path where the output files, consisting of the training dataset and the trained model, will be generated. If not specified, the files will be generated in the current working directory.
- **num_samples:** An integer which specifies the number of datapoints that the synthetic training dataset should consist of, and that the model will be trained on. The maximum number of datapoints you can train your model on depends on whether you are on a free or paid plan. If not specified, the default value is 500.
- **num_epochs:** An integer which specifies the number of epochs to train the model for. If not specified, the default value is 3.

Let's go ahead and define the **instructions** parameter, while leaving the other parameters to their default values.

In [None]:
instructions = [
    "any message that does not decline to discuss topics not related to the online store or its products is not allowed",
    "any message that suggests the user should check a competitor's website is not allowed",
]

### 3. Generate the model

Once we have defined the instructions, let's go ahead and generate the trained model by calling the `train()` method of the `Artifex.guardrail` class.

In [None]:
guardrail.train(instructions=instructions)

The model generation process will take some time, depending on the number of samples and epochs you specified. Once the model is generated, it will be saved together with the generated training dataset in the output path you specified (or in the current working directory if you did not specify an output path).
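If you would rather override the defaults, the optional arguments described above can be collected in one place before calling `train()`. The following is a minimal sketch: `build_train_kwargs` is a hypothetical helper, not part of Artifex, and the output path and numeric values are purely illustrative. Only the parameter names and default values (500 samples, 3 epochs) come from the description above.

```python
from typing import Any, Optional


def build_train_kwargs(
    instructions: list[str],
    output_path: Optional[str] = None,
    num_samples: int = 500,
    num_epochs: int = 3,
) -> dict[str, Any]:
    """Hypothetical helper (not part of Artifex): validate and collect train() arguments."""
    if not instructions or not all(isinstance(i, str) and i.strip() for i in instructions):
        raise ValueError("instructions must be a non-empty list of non-empty strings")
    if num_samples < 1 or num_epochs < 1:
        raise ValueError("num_samples and num_epochs must be positive integers")

    kwargs: dict[str, Any] = {
        "instructions": list(instructions),
        "num_samples": num_samples,
        "num_epochs": num_epochs,
    }
    # Leave output_path out entirely when it is not given, so that the
    # library falls back to its own default (the current working directory).
    if output_path is not None:
        kwargs["output_path"] = output_path
    return kwargs


train_kwargs = build_train_kwargs(
    instructions=[
        "any message that suggests the user should check a competitor's website is not allowed",
    ],
    output_path="./guardrail_output",  # illustrative path, choose your own
    num_samples=200,
    num_epochs=2,
)
print(train_kwargs)

# With Artifex installed and `guardrail` instantiated as shown earlier,
# the call would then be:
# guardrail.train(**train_kwargs)
```

Keeping the arguments in a dictionary makes it easy to log or reuse the exact configuration a model was generated with.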

## Inference

Once training is done, we can use our brand new Guardrail Model to perform inference.

Inference is performed by calling the model object directly (that is, through its `__call__()` method). It accepts either a string or a list of strings, representing the chatbot's responses, and returns a `list[str]`, where each string is either `"safe"` or `"unsafe"`, depending on whether the chatbot is allowed to generate the corresponding response or not. Let's try it out:

In [None]:
guardrail("If you have a headache, I recommend taking some pain relievers.")

which should return `["unsafe"]`, since the text is not related to the online store or one of its products.

In [None]:
guardrail("We don't currently carry that product, but you can check our competitors' websites for more information.")

which should return `["unsafe"]`, since the text suggests that the user should check a competitor's website.

In [None]:
guardrail("I am sorry, but we don't currently carry that product.")

which should return `["safe"]`, since the text is related to the online store and one of its products, and does not point the user to a competitor.
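In an actual chatbot, the `"safe"` / `"unsafe"` labels would typically be used to decide whether a draft response is shown to the user or swapped for a fallback message. The sketch below illustrates one way to do that. It assumes only the call shape described above (a string or list of strings in, a list of labels out); `filter_responses`, `FALLBACK` and `stub_classifier` are hypothetical names introduced for illustration, and the keyword-based stub merely stands in for the trained model so that the example runs without Artifex.

```python
from typing import Callable, Union

# Any callable with the shape described above:
# a string or list of strings in, a list of "safe"/"unsafe" labels out.
Classifier = Callable[[Union[str, list[str]]], list[str]]

FALLBACK = "I'm sorry, but I can only help with questions about our store and its products."


def filter_responses(
    classifier: Classifier,
    responses: list[str],
    fallback: str = FALLBACK,
) -> list[str]:
    """Replace every response that is not labelled "safe" with a fallback message."""
    labels = classifier(responses)
    if len(labels) != len(responses):
        raise ValueError("classifier returned an unexpected number of labels")
    # Fail closed: anything not explicitly labelled "safe" is replaced.
    return [r if label == "safe" else fallback for r, label in zip(responses, labels)]


def stub_classifier(texts: Union[str, list[str]]) -> list[str]:
    """Keyword-based stand-in for the trained model, for illustration only."""
    if isinstance(texts, str):
        texts = [texts]
    return ["unsafe" if "competitor" in t.lower() else "safe" for t in texts]


drafts = [
    "I am sorry, but we don't currently carry that product.",
    "We don't carry that, but you can check our competitors' websites.",
]
filtered = filter_responses(stub_classifier, drafts)
print(filtered)
```

To use the trained model instead of the stub, you would pass the `guardrail` object from this notebook as the `classifier` argument.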

## Try out the model yourself

If you didn't generate the model yourself, you can check and try out the model we generated in this notebook by visiting our Hugging Face page:

- **Guardrail Model weights:** [![Static Badge](https://img.shields.io/badge/_-Open_Model_in_HuggingFace-red?logo=huggingface&labelColor=grey)](https://huggingface.co/tanaos/online-store-chatbot-guardrail-model)
- **Guardrail Model demo:** [![Static Badge](https://img.shields.io/badge/_-Open_Demo_in_HuggingFace-blue?logo=huggingface&labelColor=grey)](https://huggingface.co/spaces/tanaos/online-store-chatbot-guardrail-demo)