>### 🚩 *Create a free WhyLabs account to complete this example!*<br> 
>*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylabs-free-sign-up?utm_source=github&utm_medium=referral&utm_campaign=Local_Models)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=github&utm_medium=referral&utm_campaign=Local_Models) to leverage the power of whylogs and WhyLabs together!*

# Using Langkit with Local Models

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/LanguageToolkit/blob/main/langkit/examples/Local_Models.ipynb)

Some LangKit modules download models from the internet. This isn't always possible, for example when running in an environment without internet access. In this example, we show how to use LangKit with locally stored models.
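If you would rather have a missing local model fail immediately than attempt a download, you can also put the Hugging Face libraries into offline mode. This is a minimal sketch that assumes LangKit's model-backed modules load their models through `transformers` and `sentence-transformers`, which honor the `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` environment variables. Those variables are read when the libraries are imported, so set them before importing `langkit`.

```python
import os

# Assumption: LangKit loads its models via Hugging Face libraries, which read
# these variables at import time, so set them before any `langkit` import.
os.environ["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: make no HTTP calls
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers: only use local files
```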

Let's start by installing LangKit:

```
%pip install langkit[all]==0.0.31 -q
```

We also assume the models are already available in local folders. For example, you can download them with the commands below.

The model weights are stored with Git LFS, so make sure `git-lfs` is installed. On Debian/Ubuntu, you can install and enable it by running:

`sudo apt-get install git-lfs && git lfs install`

```
!git clone https://huggingface.co/martin-ha/toxic-comment-model local-toxicity-model
!git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 local-sentence-transformers
```

```
Cloning into 'local-toxicity-model'...
remote: Enumerating objects: 40, done.
remote: Total 40 (delta 0), reused 0 (delta 0), pack-reused 40
Unpacking objects: 100% (40/40), 301.27 KiB | 414.00 KiB/s, done.
Cloning into 'local-sentence-transformers'...
remote: Enumerating objects: 49, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 49 (delta 0), reused 0 (delta 0), pack-reused 46
Unpacking objects: 100% (49/49), 316.57 KiB | 311.00 KiB/s, done.
Filtering content: 100% (3/3), 260.15 MiB | 16.47 MiB/s, done.
```
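If `git-lfs` is missing, `git clone` still succeeds, but the large weight files are checked out as small text *pointer* files, and loading the model fails later with a less obvious error. A quick sanity check is to look for files that still begin with the Git LFS pointer header. The sketch below uses a hypothetical helper, `find_lfs_pointers`, and assumes the folder names from the clone commands above.

```python
from pathlib import Path
from typing import List

# First line of a Git LFS pointer file (Git LFS spec v1).
LFS_POINTER_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir: str) -> List[str]:
    """Hypothetical helper: list files under model_dir that are still LFS pointers."""
    root = Path(model_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"model folder not found: {root}")
    pointers = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip directories and Git's own metadata under .git/
        if not path.is_file() or rel.parts[0] == ".git":
            continue
        with path.open("rb") as f:
            if f.read(len(LFS_POINTER_HEADER)) == LFS_POINTER_HEADER:
                pointers.append(rel.as_posix())
    return pointers


for model_dir in ["local-toxicity-model", "local-sentence-transformers"]:
    if not Path(model_dir).is_dir():
        print(f"{model_dir}: folder not found")
        continue
    pointers = find_lfs_pointers(model_dir)
    print(f"{model_dir}: " + ("OK" if not pointers else f"LFS pointers only: {pointers}"))
```

If any pointers are reported, install `git-lfs` and run `git lfs pull` inside that folder to fetch the actual files.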


`martin-ha/toxic-comment-model` is the model currently used by the `toxicity` module, and `sentence-transformers/all-MiniLM-L6-v2` is used to generate embeddings in both the `themes` and `input_output` modules. We can pass the local paths through a `LangKitConfig` when initializing the modules:

```python
from langkit import themes
from langkit import toxicity
from langkit import input_output

from langkit import LangKitConfig

# Point LangKit at the local folders cloned above instead of Hugging Face Hub model IDs
local_config = LangKitConfig(
    toxicity_model_path="local-toxicity-model",
    transformer_name="local-sentence-transformers",
)

# Initialize each module with the local configuration
toxicity.init(config=local_config)
themes.init(config=local_config)
input_output.init(config=local_config)
```
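The paths above are relative, so they are resolved against the notebook's working directory. If a folder isn't found there, `transformers` and `sentence-transformers` typically treat the string as a Hugging Face Hub model ID and try to download it, which fails without network access. A minimal sketch of a hypothetical `resolve_local_model_path` helper that fails fast instead, assuming each local model folder contains the `config.json` that Hugging Face model repositories normally include:

```python
from pathlib import Path


def resolve_local_model_path(path: str) -> str:
    """Hypothetical helper: return an absolute model folder path or raise."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_dir():
        raise FileNotFoundError(f"local model folder not found: {resolved}")
    # Assumption: the folder is a Hugging Face model repo with a config.json
    if not (resolved / "config.json").is_file():
        raise FileNotFoundError(f"no config.json in {resolved}")
    return str(resolved)
```

You could then build the config as `LangKitConfig(toxicity_model_path=resolve_local_model_path("local-toxicity-model"), transformer_name=resolve_local_model_path("local-sentence-transformers"))`, so a wrong working directory surfaces as a clear `FileNotFoundError`.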

If we also want a local version of the `llm_metrics` module, we need to import the other modules it includes: `textstat`, `regexes`, and `sentiment`. `regexes` and `textstat` are lightweight and don't require external artifacts, so we can use them in a network-restricted environment. `sentiment`, however, downloads artifacts from the internet, so let's replace it with `vader_sentiment`, which yields the same results as `sentiment` without downloading artifacts at runtime.

```python
# regexes and textstat need no external artifacts; vader_sentiment replaces sentiment
from langkit import regexes
from langkit import vader_sentiment
from langkit import textstat
```
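To check that this combination really runs without network access, one option is to block outgoing connections while the metrics are computed. This is a generic testing technique, not a LangKit feature: the hypothetical `block_network` context manager below temporarily replaces `socket.socket.connect`, `socket.socket.connect_ex`, and `socket.getaddrinfo` so that any connection attempt or DNS lookup from Python code in this process raises an error. It also blocks local socket connections, so use it only around the code you want to verify.

```python
import socket
from contextlib import contextmanager


class NetworkAccessBlocked(RuntimeError):
    """Raised when code tries to use the network inside block_network()."""


@contextmanager
def block_network():
    """Hypothetical helper: make socket connections and DNS lookups raise."""
    original_connect = socket.socket.connect
    original_connect_ex = socket.socket.connect_ex
    original_getaddrinfo = socket.getaddrinfo

    def _blocked(*args, **kwargs):
        raise NetworkAccessBlocked("network access attempted inside block_network()")

    socket.socket.connect = _blocked
    socket.socket.connect_ex = _blocked
    socket.getaddrinfo = _blocked
    try:
        yield
    finally:
        # Restore the original implementations even if the body raised
        socket.socket.connect = original_connect
        socket.socket.connect_ex = original_connect_ex
        socket.getaddrinfo = original_getaddrinfo
```

For example, you could wrap the `extract` call that follows in `with block_network():`. For a stricter check, run the imports and `init` calls in a fresh kernel inside the block too, since modules that fetch artifacts may do so when they are imported or initialized.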

Now, we should have an equivalent version of `llm_metrics` that doesn't require internet access. Let's check the results for a toy example:

```python
from whylogs.experimental.core.udf_schema import udf_schema
from langkit import extract

# Build a schema from the UDFs registered by the LangKit modules imported above
text_schema = udf_schema()
result = extract({"prompt": "I like you. I love you", "response": "thanks!"}, schema=text_schema)

result
```

{'prompt': 'I like you. I love you',
 'response': 'thanks!',
 'prompt.jailbreak_similarity': 0.2522321939468384,
 'response.refusal_similarity': 0.1535428911447525,
 'prompt.toxicity': 0.006519913673400879,
 'response.toxicity': 0.0011597275733947754,
 'response.relevance_to_prompt': 0.23008441925048828,
 'prompt.has_patterns': None,
 'response.has_patterns': None,
 'prompt.vader_sentiment': 0.7717,
 'response.vader_sentiment': 0.4926,
 'prompt.flesch_reading_ease': 119.19,
 'response.flesch_reading_ease': 121.22,
 'prompt.automated_readability_index': -6.7,
 'response.automated_readability_index': 12.0,
 'prompt.aggregate_reading_level': 1.0,
 'response.aggregate_reading_level': 0.0,
 'prompt.syllable_count': 6,
 'response.syllable_count': 1,
 'prompt.lexicon_count': 6,
 'response.lexicon_count': 1,
 'prompt.sentence_count': 2,
 'response.sentence_count': 1,
 'prompt.character_count': 17,
 'response.character_count': 7,
 'prompt.letter_count': 16,
 'response.letter_count': 6,
 'prompt