# 📓 Hotspots: detecting suspicious features in your evals

TruLens Hotspots is a tool for detecting suspicious _features_ in your evaluation results. For instance, it can detect that a specific word in the input substantially lowers the evaluation score.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/examples/quickstart/hotspots.ipynb)

In [12]:
# !pip install trulens-hotspots

## Running Hotspots on a CSV file

Hotspots can be run on its own, without using any other TruLens features.



What you need for TruLens Hotspots is a **data frame with per-sample evaluation scores** (it does not matter which specific evaluation metric you use).

So let's read a CSV file.

In [6]:
from pandas import read_csv

df = read_csv("../../tests/files/sample.csv.gz")

df

| | score | id | text | gold | predicted |
|---|---|---|---|---|---|
| 0 | 18.532006 | 662ed514d56f7bc8743aa6f23794c731 | rin 11K ui i rsognfd inlriliinnts i>r the town... | 1838.834247 | 1857.366252 |
| 1 | 22.427336 | 0c3ac40edfe6a167ab692fdb9219a93c | ton County feel an interest in. tn great is-\n... | 1857.691781 | 1880.119116 |
| 2 | 13.356245 | b298097f3afd2f8c06b61fa2308ec725 | But at our own doors we have evidence ten\ning... | 1847.012329 | 1833.656084 |
| 3 | 6.576518 | 1d50cf957a6a9cbbe0ee7773a72a76d4 | The wonderful Flexibility and great comfort\na... | 1867.541096 | 1860.964578 |
| 4 | 2.033201 | 5a7297b76de00c7d9e1fb159384238c0 | Illinois.—The Legislature met at Ya:.ualia\non... | 1826.083562 | 1828.116763 |
| ... | ... | ... | ... | ... | ... |
| 157 | 25.538300 | b76777718b4bd54271490e7a9005ad45 | . /trusted, government or nature? by which, we... | 1827.675342 | 1853.213642 |
| 158 | 4.601610 | 066778f64e92c1b3a46e180011e7ab21 | started up to get it, getting out on the\nnxf ... | 1881.628767 | 1877.027158 |
| 159 | 0.042690 | 155dc7dc3340eb39ec9d78d218aa83f1 | question, he would leave the meeting as eiilig... | 1842.765753 | 1842.723063 |
| 160 | 12.130832 | 011cac8b2caddc9f570319d352a578bc | the Spectator is pleased to say. He even conde... | 1844.042350 | 1831.911518 |
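
In this sample, the `score` values appear to match the absolute difference between `gold` and `predicted` (for the first row, 1857.366252 - 1838.834247 is about 18.532). The following is a minimal sketch of how such a per-sample score column could be computed with pandas, assuming absolute error is the metric. The `sample` frame is hypothetical and holds only the first three rows shown above.

```python
import pandas as pd

# First three rows of the sample above: gold and predicted publication years.
sample = pd.DataFrame(
    {
        "gold": [1838.834247, 1857.691781, 1847.012329],
        "predicted": [1857.366252, 1880.119116, 1833.656084],
    }
)

# Assumption: the score is the per-sample absolute error, so lower is better.
sample["score"] = (sample["predicted"] - sample["gold"]).abs()

print(sample)
```

Any other per-sample metric would work the same way, as long as each row ends up with a single numeric score.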


Next, define a TruLens Hotspots configuration. You must specify the column that holds the evaluation score (`score_column`), and you can list irrelevant columns to be skipped (`skip_columns`). If your evaluation metric is lower-is-better, like Mean Absolute Error or Mean Squared Error, you need to state that explicitly (`the_lower_the_better=True`).

In [13]:
from trulens.hotspots import hotspots_as_df, HotspotsConfig

hotspots_config = HotspotsConfig(score_column="score", skip_columns=["id"], the_lower_the_better=True)

Running Hotspots is simple:

In [14]:
hotspots_df = hotspots_as_df(hotspots_config, df)

hotspots_df

| | hotspot | # | avg score | deterioration | opportunity | p-value |
|---|---|---|---|---|---|---|
| 0 | "gold" smaller than 1872.8129331968685 | 81 | 21.890299 | 9.659035 | -4.829517 | 2.6e-05 |
| 1 | "predicted" greater than or equal to 1854.4750... | 121 | 17.571753 | 2.018962 | -1.50799 | 0.428918 |
| 2 | "text" contains "ma" | 6 | 40.009869 | 23.831745 | -0.882657 | 0.000749 |
| 3 | "text" contains "our" | 51 | 22.894854 | 8.514593 | -2.68052 | 0.000463 |
| 4 | "text" contains "certainly" | 6 | 32.267838 | 15.791943 | -0.584887 | 0.0039 |
| 5 | "text" contains "best" | 19 | 23.095829 | 6.836907 | -0.801859 | 0.003165 |
| 6 | "text" contains "lay" | 5 | 30.188588 | 13.545889 | -0.418083 | 0.006496 |
| 7 | "text" contains "custom" | 5 | 32.990408 | 16.436939 | -0.507313 | 0.006496 |
| 8 | "text" contains "individual" | 6 | 32.310946 | 15.836709 | -0.586545 | 0.020685 |
| 9 | "text" contains "sell" | 6 | 30.313601 | 13.762543 | -0.509724 | 0.034526 |
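
Because the result is an ordinary data frame, it can be filtered and sorted with plain pandas. The sketch below keeps only hotspots with a small p-value and orders them by deterioration. It rebuilds a few rows of the output above by hand so that it runs without TruLens; the name `hotspots_sample` and the 0.05 threshold are assumptions made for illustration, not Hotspots defaults.

```python
import pandas as pd

# A few rows copied from the Hotspots output above (values as displayed there,
# including the truncated threshold in the second hotspot).
hotspots_sample = pd.DataFrame(
    {
        "hotspot": [
            '"gold" smaller than 1872.8129331968685',
            '"predicted" greater than or equal to 1854.4750...',
            '"text" contains "ma"',
            '"text" contains "our"',
        ],
        "#": [81, 121, 6, 51],
        "avg score": [21.890299, 17.571753, 40.009869, 22.894854],
        "deterioration": [9.659035, 2.018962, 23.831745, 8.514593],
        "p-value": [2.6e-05, 0.428918, 0.000749, 0.000463],
    }
)

# Assumption: 0.05 is a conventional cut-off chosen for this example only.
P_VALUE_THRESHOLD = 0.05

# Keep hotspots below the threshold, largest deterioration first.
significant = (
    hotspots_sample[hotspots_sample["p-value"] < P_VALUE_THRESHOLD]
    .sort_values("deterioration", ascending=False)
    .reset_index(drop=True)
)

print(significant[["hotspot", "#", "deterioration", "p-value"]])
```

With these rows, the `"predicted"` hotspot (p-value 0.428918) is dropped and the remaining three are kept.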


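The task being evaluated is predicting the publication year of a short historical text. Older texts are the hardest: samples whose gold year is earlier than about 1872.8 have an average score of about 21.9 (lower is better here). Some specific words in the text, such as "ma" or "our", also make the overall score worse.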