### Vectara HHEM Evaluator Quickstart

In this quickstart, you'll learn how to use the HHEM evaluator feedback function from TruLens in your application. The Vectara HHEM evaluator, or Hughes Hallucination Evaluation Model, is a tool for determining whether a summary produced by a large language model (LLM) might contain hallucinated information.

- **Purpose:** The Vectara HHEM evaluator analyzes both inputs (the generated text and the source text it was based on) and assigns a score indicating whether the response is likely to contain hallucinations.
- **Score:** The returned value is a floating-point number between zero and one that is read as a boolean outcome: a score below 0.5 indicates a high likelihood of hallucination, and a score above 0.5 indicates a low likelihood of hallucination.
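
The thresholding rule above can be sketched as a small helper. This is only an illustration: the function name `interpret_hhem_score` and the label strings are hypothetical, and treating a score of exactly 0.5 as "consistent" is an assumption, since the rule only covers values below and above 0.5.

```python
def interpret_hhem_score(score: float, threshold: float = 0.5) -> str:
    """Map an HHEM score in [0, 1] to a coarse label.

    Scores below the threshold suggest a likely hallucination; scores at or
    above it suggest the output is consistent with its source (the handling
    of exactly 0.5 is an assumption).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return "likely hallucinated" if score < threshold else "likely consistent"


print(interpret_hhem_score(0.13))
print(interpret_hhem_score(0.92))
```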

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/models/Vectara_HHEM_evaluator.ipynb)

### Install Dependencies

Run the cell below to install the packages used in this notebook to demonstrate Vectara's HHEM model.
- Uncomment the cell below if you haven't yet installed LangChain or TruEra's TruLens.

In [None]:
#!pip install langchain==0.0.354 langchain-community==0.0.20 langchain-core==0.1.23 trulens_eval

### Import Utilities

We're using LangChain utilities to facilitate RAG retrieval and demonstrate Vectara's HHEM.
- Run the cell below to get started.

In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader, DirectoryLoader
from langchain_community.vectorstores import Chroma
import json, getpass, os

### Preprocess Your Data
Run the cell below to split the document text into chunks to load into Chroma.
These chunks are our primary sources for evaluation.

In [14]:
loader = DirectoryLoader('./data/', glob="./*.txt", loader_cls=TextLoader)
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
texts = text_splitter.split_documents(documents)
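
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraphs and newlines); the helper name `chunk_with_overlap` is hypothetical and only illustrates how consecutive chunks share their boundary characters.

```python
def chunk_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows where neighbours share `chunk_overlap` characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the previous window has already reached the end of the text.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


chunks = chunk_with_overlap("abcdefghijklmnopqrstuvwxy", chunk_size=10, chunk_overlap=2)
print(chunks)
```

Each chunk starts with the last two characters of the previous one, which is the same idea the `chunk_overlap=50` setting applies at a larger scale.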

### E5 Embeddings
According to [Improving Text Embeddings with Large Language Models](https://arxiv.org/pdf/2401.00368.pdf), E5 embeddings trained using only synthetic data and fewer than 1k training steps achieve strong performance on highly competitive text embedding benchmarks without using any labeled data. Furthermore, when fine-tuned with a mixture of synthetic and labeled data, the model sets new state-of-the-art results on the BEIR and MTEB benchmarks. It also requires a unique prompting mechanism.
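
The prompting mechanism mentioned above is an instruction prefix placed in front of each query. The following is a minimal sketch, assuming the `Instruct: ...\nQuery: ...` template published on the `intfloat/multilingual-e5-large-instruct` model card; the task description string is an illustrative choice, and documents are assumed to be embedded without any prefix. Whether the embedding wrapper used in this notebook applies such a prefix is not stated here, so treat this as background rather than a required step.

```python
def get_detailed_instruct(task_description: str, query: str) -> str:
    """Prefix a query with a one-sentence task instruction (assumed E5-instruct template)."""
    return f"Instruct: {task_description}\nQuery: {query}"


# Illustrative task description for a retrieval use case (an assumption, not from this notebook).
task = "Given a web search query, retrieve relevant passages that answer the query"
print(get_detailed_instruct(task, "What is Vint Cerf"))
```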

In [17]:
inference_api_key = getpass.getpass("Enter your HF Inference API Key:\n\n")


Enter your HF Inference API Key:

 ········


In [18]:
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

embedding_function = HuggingFaceInferenceAPIEmbeddings(
    api_key=inference_api_key, model_name="intfloat/multilingual-e5-large-instruct"
)



### Initialize a Vector Store

Here we're using Chroma, our standard solution for all vector store requirements.
- Run the cell below to initialize the vector store.

In [19]:
db = Chroma.from_documents(texts, embedding_function)

### Wrap a Simple RAG application with TruLens
- **Retrieval:** get relevant docs from the vector DB.
- **Generate completions:** get a response from the LLM.

Run the cell below to create a RAG class whose instrumented methods record the context and the LLM response for evaluation.

In [21]:
from trulens_eval.tru_custom_app import instrument
import requests

class Rag:
    def __init__(self):
        pass
    
    @instrument
    def retrieve(self, query: str) -> str:
        docs = db.similarity_search(query)
        # Concatenate the content of the documents
        content = ''.join(doc.page_content for doc in docs)
        return content
        
    @instrument
    def generate_completion(self, content: str, query: str) -> str:
        url = "https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"
        headers = {
            # Reuse the HF Inference API key entered earlier in the notebook
            "Authorization": f"Bearer {inference_api_key}",
            "Content-Type": "application/json"
        }

        data = {
            "inputs": f"answer the following question from the information given Question:{query}\nInformation:{content}\n"
        }

        try:
            response = requests.post(url, headers=headers, json=data)
            response.raise_for_status()
            response_data = response.json()

            # Extract the generated text from the response
            generated_text = response_data[0]['generated_text']
            # Remove the input text from the generated text
            response_text = generated_text[len(data['inputs']):]

            return response_text
        except requests.exceptions.RequestException as e:
            print("Error:", e)
            return None
            
    @instrument
    def query(self, query: str) -> str:
        context_str = self.retrieve(query)
        completion = self.generate_completion(context_str, query)
        return completion
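
`generate_completion` slices the prompt off the front of `generated_text`, which assumes the endpoint echoes the prompt before the completion. A slightly more defensive sketch is shown below; the helper name `strip_prompt` is hypothetical, and it only removes the prefix when the response really starts with the prompt.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the completion, dropping the echoed prompt if it is present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    # The prompt was not echoed back, so keep the response unchanged.
    return generated_text


prompt = "Question: Who wrote TCP/IP?\nInformation: ...\n"
print(strip_prompt(prompt + "Vint Cerf and Bob Kahn.", prompt))
print(strip_prompt("Vint Cerf and Bob Kahn.", prompt))
```

Both calls print the same completion, whereas blind slicing would truncate the second response.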


### Instantiate the Application
- Run the cells below to instantiate the RAG application and initialize TruLens.

In [22]:
rag1 = Rag()

In [28]:
from trulens_eval import Feedback, Huggingface, Tru, Select
tru = Tru()
tru.reset_database()

### Initialize HHEM Feedback Function
HHEM takes two inputs:

1. The summary or answer generated by the LLM.
2. The original source text that the LLM used to generate the summary or answer (the retrieval context).


In [40]:
huggingface_provider = Huggingface()
f_hhem_score = (
    Feedback(huggingface_provider.hallucination_evaluator, name="HHEM_Score")
    .on(Select.RecordCalls.generate_completion.rets)
    .on(Select.RecordCalls.retrieve.rets)
)

✅ In HHEM_Score, input model_output will be set to __record__.app.generate_completion.rets .
✅ In HHEM_Score, input retrieved_text_chunks will be set to __record__.app.retrieve.rets .


### Record The HHEM Score
- Run the cell below to add the HHEM feedback function to the list of feedbacks that TruLens will record.

In [41]:
feedbacks = [f_hhem_score]

### Wrap the Custom RAG with TruCustomApp and Add HHEM Feedback for Evaluation
- It's as simple as running the cell below to complete the application and feedback wrapper.

In [42]:
from trulens_eval import TruCustomApp
tru_rag = TruCustomApp(
    rag1,
    app_id='RAG v1',
    feedbacks=feedbacks,
)

### Run the App

In [43]:
with tru_rag as recording:
    rag1.query("What is Vint Cerf")

In [44]:
tru.get_leaderboard(app_ids=["RAG v1"])

| app_id | Context Relevance | HHEM_Score | latency | total_cost |
|--------|-------------------|------------|---------|------------|
| RAG v1 | 0.205199 | 0.133374 | 18.0 | 0.0 |


### Explore in a Dashboard

In [45]:
tru.run_dashboard()

Starting dashboard ...
Config file already exists. Skipping writing process.
Credentials file already exists. Skipping writing process.
Dashboard already running at path:   Network URL: http://192.168.0.104:8501



<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>