# Tracking Sentiment and Toxicity Scores in Text with Langkit

In this example, we'll show how to track sentiment and toxicity scores in text with Langkit.

As an example, we'll use the [tweet_eval dataset](https://huggingface.co/datasets/tweet_eval). We'll use its `hate` subset, which contains tweets labeled as hateful (label `1`) or not hateful (label `0`).

```python
from datasets import load_dataset

# Stream the training split of the "hate" subset instead of downloading it in full.
tweets = load_dataset("tweet_eval", "hate", split="train", streaming=True)
comments = iter(tweets)
```



## Initializing the Metrics

To initialize the `toxicity` and `sentiment` metrics, we import the respective modules from `langkit`. Importing them registers the metrics automatically, so we can use them right away by building a schema with `generate_udf_schema`. We then pass that schema to whylogs so that it knows which metrics to track.

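```python
from whylogs.experimental.core.metrics.udf_metric import generate_udf_schema
from whylogs.core.schema import DeclarativeSchema

# Importing these modules registers langkit's toxicity and sentiment metrics.
from langkit.toxicity import *  # noqa: F401,F403
from langkit.sentiment import *  # noqa: F401,F403

# Build a whylogs schema from the metrics registered above.
text_schema = DeclarativeSchema(generate_udf_schema())
```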
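The schema above is built from metrics that registered themselves when their modules were imported. The toy registry below illustrates that register-on-import idea in plain Python. It is a hypothetical sketch, not langkit's or whylogs' actual code: `register_metric`, `generate_schema`, `score_row`, and the two example metrics are all made-up names standing in for the real machinery.

```python
from typing import Callable, Dict, List

# Hypothetical global registry (a stand-in for the library's internal one).
_REGISTRY: Dict[str, Callable[[str], float]] = {}


def register_metric(name: str) -> Callable[[Callable[[str], float]], Callable[[str], float]]:
    """Decorator that records a text-scoring function under `name` when the module loads."""
    def decorator(func: Callable[[str], float]) -> Callable[[str], float]:
        _REGISTRY[name] = func
        return func
    return decorator


# In a real package these would live in separate modules; importing a module
# runs its decorators, which is what fills the registry.
@register_metric("text_length")
def text_length(text: str) -> float:
    return float(len(text))


@register_metric("exclamation_count")
def exclamation_count(text: str) -> float:
    return float(text.count("!"))


def generate_schema() -> List[str]:
    """Toy analogue of a schema generator: list every metric registered so far."""
    return sorted(_REGISTRY)


def score_row(text: str) -> Dict[str, float]:
    """Apply every registered metric to one piece of text."""
    return {name: func(text) for name, func in _REGISTRY.items()}


print(generate_schema())            # ['exclamation_count', 'text_length']
print(score_row("I love flowers!"))  # {'text_length': 15.0, 'exclamation_count': 1.0}
```

The design choice this mirrors is that callers never list metrics explicitly: whatever has been imported is what the schema picks up, which is why the imports in the cell above matter even though nothing from them is referenced by name.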

## Profiling the Data

Now we're set to log our data.

To make sure the metrics make sense, we will profile two separate groups of data:
- hateful comments: tweets labeled as hateful (label `1`)
- non-hateful comments: tweets labeled as non-hateful (label `0`)

We can expect hateful comments to have a higher toxicity score and a lower sentiment score than non-hateful comments.

Let's see if our metrics will reflect that.

```python
import whylogs as why

# Create one profile per group by logging a generic seed comment.
# Note that each seed comment is included in that profile's statistics.
non_hateful_profile = why.log({"comment": "I love flowers."}, schema=text_schema).profile()
hateful_profile = why.log({"comment": "I hate biscuits."}, schema=text_schema).profile()

# Route the first 200 tweets to a profile based on their label (0 = non-hateful, 1 = hateful).
for _ in range(200):
    comment = next(comments)
    if comment["label"] == 0:
        non_hateful_profile.track({"comment": comment["text"]})
    else:
        hateful_profile.track({"comment": comment["text"]})
```
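The loop above pulls a fixed number of examples from the stream and routes each one by its label. A stdlib-only sketch of the same routing is shown below, using `itertools.islice` to bound the stream. The records are made up to mimic the shape of tweet_eval rows, and `route_by_label` is a hypothetical helper, not part of Langkit or whylogs.

```python
from itertools import islice
from typing import Dict, Iterable, List


def route_by_label(records: Iterable[Dict], limit: int) -> Dict[int, List[str]]:
    """Group the texts of at most `limit` records by their label."""
    groups: Dict[int, List[str]] = {}
    # islice stops cleanly if the stream has fewer than `limit` items,
    # whereas calling next() inside range(limit) would raise StopIteration.
    for record in islice(records, limit):
        groups.setdefault(record["label"], []).append(record["text"])
    return groups


# Made-up records shaped like tweet_eval rows: {"text": str, "label": 0 or 1}.
fake_stream = iter([
    {"text": "what a lovely morning", "label": 0},
    {"text": "some hateful remark", "label": 1},
    {"text": "off to the beach", "label": 0},
])

groups = route_by_label(fake_stream, limit=200)
print(len(groups[0]), len(groups[1]))  # 2 1
```

In the notebook, each routed text would be passed to the matching profile's `track` call instead of being collected in a list, so the raw tweets never need to be kept in memory.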


With both profiles built, we can compare the mean sentiment and toxicity scores of the two groups (hateful and non-hateful):

```python
# Convert each profile to a pandas DataFrame once, then read the mean of each score.
hateful_df = hateful_profile.view().to_pandas()
non_hateful_df = non_hateful_profile.view().to_pandas()

# .iloc[0] selects the first (and only) logged column by position.
hateful_sentiment = hateful_df["udf/sentiment_nltk:distribution/mean"].iloc[0]
non_hateful_sentiment = non_hateful_df["udf/sentiment_nltk:distribution/mean"].iloc[0]

hateful_toxicity = hateful_df["udf/toxicity:distribution/mean"].iloc[0]
non_hateful_toxicity = non_hateful_df["udf/toxicity:distribution/mean"].iloc[0]

print("######### Sentiment #########")
print(f"The average sentiment score for the hateful comments is {hateful_sentiment}")
print(f"The average sentiment score for the non-hateful comments is {non_hateful_sentiment}")

print("######### Toxicity #########")
print(f"The average toxicity score for the hateful comments is {hateful_toxicity}")
print(f"The average toxicity score for the non-hateful comments is {non_hateful_toxicity}")
```

```
######### Sentiment #########
The average sentiment score for the hateful comments is -0.37580107526881734
The average sentiment score for the non-hateful comments is -0.062103669724770626
######### Toxicity #########
The average toxicity score for the hateful comments is 0.37868364139269756
The average toxicity score for the non-hateful comments is 0.13610724024816392
```

Both metrics move in the expected direction: the hateful group has a lower mean sentiment and a higher mean toxicity than the non-hateful group.
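The `distribution/mean` values read above are maintained incrementally as rows are tracked, without keeping every comment in memory. The sketch below shows one standard way to do that: Welford's running mean and variance, plus the usual pairwise formula for merging two partial summaries (which is what makes separately built profiles combinable). It illustrates the technique only; whylogs' internal implementation may differ, `RunningStats` is a hypothetical name, and the scores are made up.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Streaming count, mean, and variance (Welford's algorithm) with a merge step."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Welford update: shift the mean, then accumulate the squared deviation.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two partial summaries as if all values had been seen by one.
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 when fewer than two values were seen.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


# Made-up scores split across two partial summaries, then merged.
left, right = RunningStats(), RunningStats()
for score in [0.1, 0.4]:
    left.update(score)
for score in [0.7, 0.2]:
    right.update(score)

combined = left.merge(right)
print(combined.n, round(combined.mean, 6))  # 4 0.35
```

The same arithmetic explains the effect of the seed comments: each profile's mean is taken over the 200 routed tweets in its group plus its one seed, so the seed's score carries a weight of one divided by the group size plus one.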
