<a href="https://colab.research.google.com/github/vectara/example-notebooks/blob/main/notebooks/hallucination_mitigation/vhc-langchain-integration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# VHC (Vectara Hallucination Corrector) with LangChain Integration

This notebook demonstrates how to integrate Vectara's HHEM (Hughes Hallucination Evaluation Model) and VHC (Vectara Hallucination Corrector) with a standard LangChain RAG workflow.

## Installation and Setup

In [8]:
!pip install --quiet langchain langchain_openai langchain_community langchain_chroma langgraph requests python-dotenv chromadb termcolor

## Environment Setup

Set up your environment variables. You'll need:
- `VECTARA_API_KEY`: Your Vectara API key (for HHEM and VHC)
- `OPENAI_API_KEY`: Your OpenAI API key (for LangChain RAG)

In [2]:
import os
import json
import requests
from typing import List, Dict, Any, Optional
from dotenv import load_dotenv
from termcolor import colored

# Load environment variables
load_dotenv()

# Set your API keys here or in your environment
os.environ["VECTARA_API_KEY"] = os.getenv("VECTARA_API_KEY", "<YOUR_VECTARA_API_KEY>")
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY", "<YOUR_OPENAI_API_KEY>")

# Verify API keys are set
if not os.getenv('VECTARA_API_KEY') or os.getenv('VECTARA_API_KEY') == '<YOUR_VECTARA_API_KEY>':
    raise EnvironmentError("VECTARA_API_KEY environment variable is not set.")

if not os.getenv('OPENAI_API_KEY') or os.getenv('OPENAI_API_KEY') == '<YOUR_OPENAI_API_KEY>':
    raise EnvironmentError("OPENAI_API_KEY environment variable is not set.")

print("Environment variables configured successfully")

Environment variables configured successfully


## Vectara HHEM and VHC Client

Create a client for interacting with Vectara's HHEM and VHC endpoints:

In [3]:
class VectaraClient:
    """Client for interacting with Vectara HHEM and VHC endpoints"""
    
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.session = requests.Session()
        self.base_url = "https://api.vectara.io"
    
    def evaluate_factual_consistency(self, query: str, response: str, documents: List[str]) -> Dict[str, Any]:
        """Evaluate factual consistency using HHEM"""
        
        payload = {
            "generated_text": response,
            "source_texts": documents
        }
        
        headers = {
            "Content-Type": "application/json",
            "Accept": "application/json",
            "x-api-key": self.api_key
        }
        
        try:
            # Use a separate name so the `response` argument is not shadowed
            http_response = self.session.post(
                f"{self.base_url}/v2/evaluate_factual_consistency",
                json=payload,
                headers=headers,
                timeout=30
            )
            http_response.raise_for_status()
            return http_response.json()
            
        except requests.exceptions.RequestException as e:
            raise RuntimeError(f"HHEM API request failed: {e}")
        except json.JSONDecodeError as e:
            raise RuntimeError(f"Failed to parse HHEM response: {e}")
    
    def correct_hallucinations(
        self, 
        query: str, 
        generated_text: str, 
        documents: List[str],
        model_name: str = "vhc-large-1.0"
    ) -> Dict[str, Any]:
        """Correct hallucinations using VHC"""
        
        payload = {
            "generated_text": generated_text,
            "query": query,
            "documents": [{"text": doc} for doc in documents],
            "model_name": model_name
        }
        
        headers = {
            "Content-Type": "application/json",
            "Accept": "application/json",
            "x-api-key": self.api_key
        }
        
        try:
            response = self.session.post(
                f"{self.base_url}/v2/hallucination_correctors/correct_hallucinations",
                json=payload,
                headers=headers,
                timeout=60
            )
            response.raise_for_status()
            
            data = response.json()
            corrected_text = data.get("corrected_text", "")
            corrections = data.get("corrections", [])
            
            return {
                "corrected_text": corrected_text,
                "corrections": corrections,
                "original_text": generated_text
            }
            
        except requests.exceptions.RequestException as e:
            raise RuntimeError(f"VHC API request failed: {e}")
        except json.JSONDecodeError as e:
            raise RuntimeError(f"Failed to parse VHC response: {e}")

# Initialize Vectara client
vectara_client = VectaraClient(os.getenv("VECTARA_API_KEY"))
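
HHEM returns a factual-consistency score. As a rough illustration of how such a score might be consumed downstream, the sketch below flags a response when its score falls below a cutoff. It assumes the score lies in [0, 1] with higher values meaning stronger support from the source texts; the `needs_correction` helper and the 0.5 default threshold are hypothetical choices for illustration, not part of the Vectara API.

```python
from typing import Optional


def needs_correction(score: Optional[float], threshold: float = 0.5) -> bool:
    """Return True when a factual-consistency score suggests a likely hallucination.

    Assumption: scores lie in [0, 1] and higher means more consistent.
    The 0.5 default is an illustrative cutoff; tune it for your data.
    """
    if score is None:
        # No score available (e.g. the evaluation was skipped or failed),
        # so there is nothing to flag.
        return False
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return score < threshold


print(needs_correction(0.12))  # low score: flagged
print(needs_correction(0.93))  # high score: not flagged
```

A stricter threshold trades more VHC calls for fewer unsupported claims slipping through, so the right value depends on the application.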

## Setup: LangChain RAG Chains

Let's create two LangChain RAG chains:
- The first responds with "I don't know" when it cannot answer from the source text.
- The second is instructed to draw on its internal knowledge, which can produce hallucinations relative to the retrieved context.

In [4]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma  # provided by the langchain_chroma package installed above
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.documents import Document

# Initialize LLM and embeddings
llm = ChatOpenAI(model="gpt-4o", temperature=0.0)
embeddings = OpenAIEmbeddings()

# Create knowledge base with more detailed but still limited information
sample_docs = [
    "The Eiffel Tower is a wrought-iron lattice tower located in Paris, France. It was designed by Alexandre Gustave Eiffel.",
    "Leonardo da Vinci was an Italian Renaissance polymath who painted the Mona Lisa. He lived from 1452 to 1519.",
    "William Shakespeare was an English playwright and poet who wrote Romeo and Juliet. He is considered the greatest writer in the English language.",
    "The Great Wall of China is an ancient series of walls and fortifications built to protect Chinese states from invasions.",
    "Albert Einstein was a German-born theoretical physicist who developed the theory of relativity. He won the Nobel Prize in Physics in 1921.",
    "The Statue of Liberty is a neoclassical sculpture located on Liberty Island in New York Harbor.",
]

# Create documents and vector store (in-memory only)
documents = [Document(page_content=text) for text in sample_docs]
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embeddings
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Create two different RAG prompts for different behaviors

# RAG Chain 1: Conservative - only answers based on context, doesn't hallucinate
rag_prompt_conservative = ChatPromptTemplate.from_template("""
You are a precise assistant that only provides information based on the given context.
If the context doesn't contain enough information to answer the question completely, respond with "I don't know" or "The provided context doesn't contain enough information to answer this question."

Do NOT use your external knowledge. Only use the information provided in the context.

Context: {context}

Question: {question}

Answer based ONLY on the context provided:
""")

# RAG Chain 2: Expansive - fills in details from external knowledge, prone to hallucinations
rag_prompt_expansive = ChatPromptTemplate.from_template("""
You are a knowledgeable assistant. Based on the provided context, answer the question with specific details and facts.
If the context doesn't have complete information, use your extensive knowledge to provide a comprehensive answer with specific details like dates, measurements, costs, visitor numbers, and other precise facts.

Context: {context}

Question: {question}

Provide a detailed answer with specific facts and figures:
""")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Create both RAG chains
rag_chain_conservative = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt_conservative
    | llm
    | StrOutputParser()
)

rag_chain_expansive = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt_expansive
    | llm
    | StrOutputParser()
)
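
For readers new to the pipe syntax, here is a dependency-free sketch of the data flow each chain performs: the question is sent to the retriever, the retrieved texts are joined into a context string, and both values fill the prompt template. The `fake_retriever`, `build_prompt`, `CORPUS`, and `TEMPLATE` names are hypothetical stand-ins, and keyword overlap replaces the embedding search; no LangChain or OpenAI calls are made.

```python
from typing import List

# Hypothetical stand-in corpus; the notebook uses a Chroma vector store instead.
CORPUS = [
    "The Eiffel Tower is a wrought-iron lattice tower located in Paris, France.",
    "Leonardo da Vinci was an Italian Renaissance polymath who painted the Mona Lisa.",
    "The Statue of Liberty is a neoclassical sculpture located on Liberty Island.",
]

TEMPLATE = "Context: {context}\n\nQuestion: {question}"


def fake_retriever(question: str, k: int = 2) -> List[str]:
    """Rank texts by shared lowercase words (a toy substitute for embeddings)."""
    words = set(question.lower().split())
    ranked = sorted(
        CORPUS,
        key=lambda text: len(words & set(text.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str) -> str:
    """Mirror {"context": retriever | format_docs, "question": passthrough}."""
    context = "\n\n".join(fake_retriever(question))
    return TEMPLATE.format(context=context, question=question)


prompt = build_prompt("Who painted the Mona Lisa?")
print(prompt)
```

In the real chains, the resulting prompt is then passed to the LLM and the reply is reduced to a plain string by `StrOutputParser`.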

### Enhanced HHEM + VHC Pipeline Function

In [5]:
def rag_with_hhem_vhc_pipeline(query: str, use_conservative: bool = True) -> Dict[str, Any]:
    """Complete pipeline: RAG -> HHEM evaluation -> VHC correction -> Post-correction HHEM"""

    # Step 1: Choose RAG chain and get response
    rag_chain = rag_chain_conservative if use_conservative else rag_chain_expansive
    chain_type = "Conservative" if use_conservative else "Expansive"
    rag_response = rag_chain.invoke(query)

    # Step 2: Get source documents for evaluation
    source_docs = retriever.invoke(query)
    context_texts = [doc.page_content for doc in source_docs]

    # Step 3: Check if we should skip HHEM for "I don't know" responses
    skip_hhem = (use_conservative and
                ("don't know" in rag_response.lower() or
                 "doesn't contain enough information" in rag_response.lower() or
                 "not enough information" in rag_response.lower()))

    if skip_hhem:
        fcs_score = None
        hhem_result = {}
        corrected_text = rag_response
        corrections = []
        vhc_result = {"corrected_text": rag_response, "corrections": []}
        post_correction_fcs_score = None
        post_correction_hhem_result = {}
    else:
        # Pre-correction HHEM evaluation
        try:
            hhem_result = vectara_client.evaluate_factual_consistency(
                query=query,
                response=rag_response,
                documents=context_texts
            )
            fcs_score = hhem_result.get("score", 0.0)
        except Exception as e:
            fcs_score = None
            hhem_result = {}

        # VHC correction
        try:
            vhc_result = vectara_client.correct_hallucinations(
                query=query,
                generated_text=rag_response,
                documents=context_texts
            )
            corrected_text = vhc_result["corrected_text"]
            corrections = vhc_result["corrections"]
        except Exception as e:
            corrected_text = rag_response
            corrections = []
            vhc_result = {"corrected_text": rag_response, "corrections": []}

        # Post-correction HHEM evaluation (only if VHC made corrections)
        if corrections and corrected_text != rag_response:
            try:
                post_correction_hhem_result = vectara_client.evaluate_factual_consistency(
                    query=query,
                    response=corrected_text,
                    documents=context_texts
                )
                post_correction_fcs_score = post_correction_hhem_result.get("score", 0.0)
            except Exception as e:
                post_correction_fcs_score = None
        else:
            post_correction_fcs_score = fcs_score  # Same as original if no corrections

    return {
        "query": query,
        "original_response": rag_response,
        "context_documents": context_texts,
        "fcs_score": fcs_score,
        "hhem_result": hhem_result,
        "corrected_response": corrected_text,
        "corrections": corrections,
        "post_correction_fcs_score": post_correction_fcs_score,
        "skipped_evaluation": skip_hhem
    }
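
One way to consume the dictionary returned by the pipeline is to reduce it to a short status line. The sketch below assumes only the keys shown in the return statement above; the `summarize_result` helper and its wording are hypothetical, and the sample dictionary uses made-up scores purely to exercise the code.

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> str:
    """Describe a pipeline result in one line, tolerating missing scores."""
    if result.get("skipped_evaluation"):
        return "Evaluation skipped: the model declined to answer."

    before = result.get("fcs_score")
    after = result.get("post_correction_fcs_score")
    n_corrections = len(result.get("corrections") or [])

    if before is None or after is None:
        return f"{n_corrections} correction(s); score unavailable."
    return (
        f"{n_corrections} correction(s); "
        f"score {before:.3f} -> {after:.3f} ({after - before:+.3f})"
    )


# Made-up values for illustration only; not real HHEM output.
sample = {
    "skipped_evaluation": False,
    "fcs_score": 0.25,
    "post_correction_fcs_score": 0.75,
    "corrections": [{"original_text": "...", "explanation": "..."}],
}
print(summarize_result(sample))
```

Comparing against `None` explicitly, rather than relying on truthiness, keeps a legitimate score of 0.0 from being reported as unavailable.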

## Example 1: RAG Chain with no hallucination

In [6]:
query = "What are top 3 works by Leonardo Devinci?"

result1 = rag_with_hhem_vhc_pipeline(query, use_conservative=True)

print(colored("Query:", attrs=["bold"]), f"{result1['query']}")
print(colored("Response:", attrs=["bold"]), f"{result1['original_response']}")
print(colored("HHEM Score:", attrs=["bold"]), f"{result1['fcs_score']:.3f}" if result1['fcs_score'] is not None else "N/A")


Query: What are top 3 works by Leonardo Devinci?
Response: The provided context doesn't contain enough information to answer this question.
HHEM Score: N/A


## Example 2: RAG Chain with hallucination

In [7]:
query = "What are top 3 works by Leonardo Devinci?"

result2 = rag_with_hhem_vhc_pipeline(query, use_conservative=False)

print(colored("Query:", attrs=["bold"]), f"{result2['query']}\n")
print(colored("Original Response:", attrs=["bold"]), f"{result2['original_response']}\n")
print(colored("Pre-correction HHEM Score:", attrs=["bold"]), f"{result2['fcs_score']:.3f}" if result2['fcs_score'] is not None else "N/A")

if result2['corrections']:
    print(colored("Post-correction HHEM Score:", attrs=["bold"]), f"{result2['post_correction_fcs_score']:.3f}" if result2['post_correction_fcs_score'] is not None else "N/A")
    print(colored("Corrected Response:", attrs=["bold"]), f"{result2['corrected_response']}")

    print("\n")
    print(colored("Corrections made:", attrs=["bold"]))
    for i, correction in enumerate(result2['corrections'], 1):
        original = correction.get('original_text', '')
        explanation = correction.get('explanation', '')
        print(f"  {i}. Removed: '{original}...' - {explanation}")
else:
    print(colored("VHC:", attrs=["bold"]), "No corrections needed")

Query: What are top 3 works by Leonardo Devinci?

Original Response: Leonardo da Vinci, a quintessential Renaissance polymath, is renowned for his contributions to art, science, and engineering. Among his artistic masterpieces, three works stand out as his most celebrated:

1. **Mona Lisa**: Painted between 1503 and 1506, the Mona Lisa is arguably Leonardo's most famous work and one of the most recognized paintings in the world. It is housed in the Louvre Museum in Paris, France. The painting is renowned for the subject's enigmatic expression, the use of sfumato (a technique of softening transitions between colors), and its detailed background. The Mona Lisa's fame is also partly due to its theft in 1911, which brought it significant international attention.

2. **The Last Supper**: Created between 1495 and 1498, this mural is located in the Convent of Santa Maria delle Grazie in Milan, Italy. The Last Supper depicts the moment Jesus announces that one of his disciples 

## Summary

This notebook demonstrated the integration of Vectara's HHEM and VHC with standard LangChain workflows.
We've seen that when a LangChain RAG pipeline hallucinates, HHEM identifies the hallucination and VHC can correct it.

For more information:
- [Vectara Documentation](https://docs.vectara.com/)
- [HHEM API Reference](https://docs.vectara.com/docs/rest-api/evaluate-factual-consistency)
- [VHC API Reference](https://docs.vectara.com/docs/rest-api/correct-hallucinations)
- [LangChain Documentation](https://python.langchain.com/)