# Vectara Hallucination Corrector

Despite their impressive capabilities, LLMs still hallucinate. Where creativity is required, hallucinations can be acceptable or even desirable, but most enterprise use cases need a response that can be trusted.

HHEM (Hughes Hallucination Evaluation Model) is a model built specifically to help LLM practitioners measure hallucinations. It is available on the [Hugging Face Hub](https://huggingface.co/vectara/hallucination_evaluation_model), and a public [leaderboard](https://huggingface.co/spaces/vectara/leaderboard) shows how likely various LLMs (both commercial and open source) are to hallucinate.

VHC (Vectara Hallucination Corrector) is the next step in the fight against hallucinations. It takes a generated response, together with the query and the source documents, and returns a corrected version of that response.

Let's demonstrate this with a few examples:

In [1]:
examples = [
    {
        "query": "Where did the conference take place?",
        "contexts": [
            "The annual tech summit was held at the San Francisco Moscone Center this year.",
            "Attendees flew in from across North America and Europe."
        ],
        "answer": "It took place at the Berlin International Congress Center"
    },
    {
        "query": "Who painted the Mona Lisa?",
        "contexts": [
            "Leonardo da Vinci completed the Mona Lisa in the early 16th century.",
            "The painting is housed in the Louvre Museum in Paris.",
            "It is famed for the subject’s enigmatic smile.",
            "Art historians credit da Vinci’s sfumato technique for its realism.",
            "Michelangelo painted the Sistine Chapel ceiling."
        ],
        "answer": "It was painted by Michelangelo"
    },
    {
        "query": "What is the capital city of Australia and how many states does it have?",
        "contexts": [
            "Australia is a federation comprising six states and two major mainland territories.",
            "Its largest city by population is Sydney.",
            "Its national parliament is seated in the Australian Capital Territory, in the city of Canberra.",
            "The city was selected as a compromise between Sydney and Melbourne."
        ],
        "answer": "The capital of Australia is Melbourne. Australia has 5 states."
    },
    {
        "query": "What is the most important city in San Francisco Bay Area?",
        "contexts": [
            "San Francisco is the most important city in the bay area.",
            "Palo Alto is the most important city in the bay area.",
            "Santa Clara is the home for many silicon valley companies."
        ],
        "answer": "Santa Clara is the most important city in the bay area."
    },
    {
        "query": "What were the key findings and methodology of the 'Project Stardust' clinical trial?",
        "contexts": [
            """
            ## Clinical Trial Final Report: Project Stardust\n\n**Publication Date:** July 15, 2025\n\n**Abstract:** 
            This report details the outcomes of the 'Project Stardust' clinical trial, a Phase III, double-blind, 
            placebo-controlled study designed to evaluate the efficacy and safety of the investigational drug 
            'Solara' for treating chronic migraines. 
            The study involved 2,500 participants across 50 sites in North America and Europe over a period of 24 months. \n\n
            **Methodology:** Participants were randomized into two arms. The experimental arm (n=1,250) received a 100mg daily dose of Solara, 
            while the control arm (n=1,250) received a visually identical placebo. 
            The primary endpoint was the mean reduction in monthly migraine days (MMD) from baseline to the final 6-month period. 
            Secondary endpoints included a 50% responder rate and changes in the Migraine Disability Assessment (MIDAS) score. 
            Data was collected via electronic patient diaries and quarterly clinic visits. 
            Statistical analysis was performed using an ANCOVA model.
            """,
            """
            ### Key Findings & Results\n\nThe primary endpoint was met with high statistical significance (p < 0.001). 
            The Solara group experienced a mean reduction of 8.2 MMD, compared to a 2.1 MMD reduction in the placebo group. 
            The 50% responder rate was 68% for Solara versus 25% for placebo. A significant improvement in MIDAS scores was also observed. 
            The safety profile was favorable, with the most common adverse events being mild nausea and fatigue, 
            reported by less than 5% of participants in the experimental arm.
            """,
            """
            **Related Trials:** A separate study, 'Project Dawn', is currently in Phase II, investigating 'Solara' for tension headaches. 
            Preliminary data is not yet available. Funding for Project Stardust was provided by OmniHealth Corp.
            """
        ],
        "answer": """
            Project Stardust was a small, Phase I trial with 150 participants in Asia focused on a new drug called 'Lunara' for skin conditions. 
            The study used an open-label methodology and found that the drug was ineffective, causing significant adverse side effects
            like severe rashes in over 30% of the participants. The primary conclusion was that development of Lunara 
            should be discontinued immediately.
        """
    },
    {
        "query": "Summarize the server configuration for the production environment and list the lead DevOps engineer's contact info.",
        "contexts": [
            '''{
                "system_documentation": {
                    "environments": [
                        {
                            "name": "Development",
                            "specs": {
                                "server_count": 5,
                                "cpu_cores": 4,
                                "ram_gb": 16,
                                "storage_type": "SSD",
                                "os": "Ubuntu 22.04"
                            },
                            "purpose": "For feature development and initial testing."
                        },
                        {
                            "name": "Staging",
                            "specs": {
                                "server_count": 10,
                                "cpu_cores": 8,
                                "ram_gb": 32,
                                "storage_type": "SSD",
                                "os": "Ubuntu 22.04"
                            },
                            "purpose": "For pre-production testing and QA."
                        },
                        {
                            "name": "Production",
                            "specs": {
                                "server_count": 50,
                                "cpu_cores": 16,
                                "ram_gb": 128,
                                "storage_type": "NVMe SSD",
                                "os": "Red Hat Enterprise Linux 9"
                            },
                            "purpose": "Live customer-facing environment."
                        }
                    ],
                    "last_updated": "2025-08-01T10:00:00Z"
                }
            }''',
            """
            **Internal Memo: Team Directory Update**\n\nThis memo confirms the updated roles and contact information for the Infrastructure team.\n
            - **Director of Engineering:** Dr. Evelyn Reed (ereed@example.com)\n
            - **Lead Site Reliability Engineer:** Ben Carter (bcarter@example.com)\n
            - **Lead DevOps Engineer:** Maria Garcia (mgarcia@example.com)\n
            - **Systems Architect:** Kenji Tanaka (ktanaka@example.com)\n
            Please direct all urgent production issues to the SRE on-call rotation.
            """,
            "An old email from 2023 mentions that the previous DevOps lead, Sam Wilson, was responsible for the initial server setup on Debian."
        ],
        "answer": """
            The production environment runs on 10 staging servers, each with 8 CPU cores and 32GB of RAM on an Ubuntu operating system. 
            The lead DevOps engineer is Sam Wilson, and he can be reached at swilson@example.com for any questions about the Debian-based setup.
            """
    },
    {
        "query": "What were the Q4 2024 financial results for the 'Gaming' division in the Europe region, and what did the CEO's commentary say about it?",
        "contexts": [
            """
            ## Q4 2024 Earnings Report - Global Tech Inc.\n\n
            **CEO Commentary:** "This has been a landmark quarter for Global Tech. 
            Our strategic pivot towards cloud services has yielded exceptional results, with that division seeing a 40% year-over-year growth. 
            While our Gaming division faced headwinds in the competitive European market, the strong performance of our new 'Odyssey' 
            console in North America and Asia drove the division's overall positive revenue. 
            We are confident our European strategy will rebound in the coming year."
            """,
            """
### Financial Results by Division and Region (in millions USD)

| Division | Region | Q3 2024 Revenue | Q4 2024 Revenue | Q4 2024 Profit |
| :--- | :--- | :--- | :--- | :--- |
| Cloud | N. America | 1,200 | 1,500 | 600 |
| Cloud | Europe | 800 | 950 | 350 |
| Cloud | Asia | 750 | 850 | 300 |
| **Gaming** | **N. America** | **900** | **1,100** | **250** |
| **Gaming** | **Europe** | **650** | **580** | **-50** |
| **Gaming**| **Asia** | **700** | **820** | **180** |
| Hardware | All Regions | 400 | 350 | 20 |
""",
            """
            **Analyst Note:** The drop in European gaming revenue is likely attributable to a delayed release of a key software title 
            and aggressive pricing from competitors in the region. The negative profit margin indicates significant marketing spend that
            did not yield the expected returns for the quarter.
            """
        ],
        "answer": """
            The Gaming division in Europe had a spectacular fourth quarter. Revenue grew to $950 million with a profit of $350 million. 
            The CEO's commentary highlighted this success, stating that the European gaming market was their strongest performer and the 
            primary driver of growth for the entire company, thanks to the launch of the 'Odyssey' console in Germany.
        """
    }
]

In [2]:
import requests
import os
import json

BOLD = '\033[1m'
GREEN = '\033[92m'
RED = '\033[91m'
END = '\033[0m' # The reset code


In [3]:
if not os.getenv('VECTARA_API_KEY'):
    raise EnvironmentError("VECTARA_API_KEY environment variable is not set.")

session = requests.Session()

def call_vhc(query, answer, contexts):
    """Calls the Vectara Hallucination Corrector (VHC) endpoint synchronously."""
    payload = {
        "generated_text": answer,
        "query": query,
        "documents": [{"text": c} for c in contexts],
        "model_name": "vhc-large-1.0"
    }
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "x-api-key": os.getenv("VECTARA_API_KEY")
    }

    # Perform the POST request
    response = session.post(
        "https://api.vectara.io/v2/hallucination_correctors/correct_hallucinations",
        json=payload,
        headers=headers,
        timeout=30
    )
    # Raise exception for HTTP errors (4xx/5xx)
    response.raise_for_status()

    data = response.json()
    corrected_text = data.get("corrected_text", "")
    corrections = data.get("corrections", [])

    return corrected_text, corrections
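
Judging by the output recorded in this notebook, each entry in `corrections` carries an `original_text`, a `corrected_text`, and an `explanation`. As a sanity check, we can try to replay those corrections against the original answer ourselves. The sketch below is not part of the Vectara API: `apply_corrections` is a hypothetical helper, and it assumes that each `original_text` appears verbatim in the answer, which holds for the recorded examples but may not hold in general.

```python
def apply_corrections(answer, corrections):
    """Replay each correction against `answer`, in order.

    Hypothetical helper (not part of the Vectara API). Returns the patched
    text and the corrections whose `original_text` was not found verbatim.
    """
    patched = answer
    unmatched = []
    for c in corrections:
        original = c.get("original_text", "")
        replacement = c.get("corrected_text", "")
        if original and original in patched:
            # Replace only the first occurrence to avoid touching repeats.
            patched = patched.replace(original, replacement, 1)
        else:
            unmatched.append(c)
    return patched, unmatched


# Sample data copied from the first example; the explanation is omitted here.
sample_answer = "It took place at the Berlin International Congress Center"
sample_corrections = [
    {
        "original_text": "It took place at the Berlin International Congress Center",
        "corrected_text": "It took place at the San Francisco Moscone Center.",
    }
]

patched, unmatched = apply_corrections(sample_answer, sample_corrections)
print(patched)
print(f"Unmatched corrections: {len(unmatched)}")
```

A non-empty `unmatched` list would suggest that the spans returned by the service do not line up character for character with the answer, in which case the `corrected_text` returned by `call_vhc` is the safer value to use.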

In [4]:
for ex in examples:
    corrected, corrections = call_vhc(ex['query'], ex['answer'], ex['contexts'])
    
    print(f"{BOLD}The query is:{END} {ex['query']}")
    print(f"{BOLD + RED}The original response is:{END} {ex['answer']}")
    print(f"{BOLD + GREEN}The corrected response is:{END} {corrected}")

    print("\n")
    print(f"{BOLD}Corrections:{END}")
    for c in corrections:
        print(json.dumps(c, indent=2))
    print("\n")
    print("="*50 + "\n")

The query is: Where did the conference take place?
The original response is: It took place at the Berlin International Congress Center
The corrected response is: It took place at the San Francisco Moscone Center.


Corrections:
{
  "original_text": "It took place at the Berlin International Congress Center",
  "corrected_text": "It took place at the San Francisco Moscone Center.",
  "explanation": "The response states the conference took place at the Berlin International Congress Center, but the source explicitly states it was held at the San Francisco Moscone Center. This is a direct contradiction."
}



The query is: Who painted the Mona Lisa?
The original response is: It was painted by Michelangelo
The corrected response is: It was painted by Leonardo da Vinci.


Corrections:
{
  "original_text": "It was painted by Michelangelo",
  "corrected_text": "It was painted by Leonardo da Vinci.",
  "explanat

As the examples show, VHC can identify and correct hallucinations in both short and long texts, including source data formatted as Markdown, tables, and JSON.
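
For longer answers it can be hard to spot what actually changed between the original and the corrected response. One option is a word-level diff using Python's standard `difflib` module. This is a minimal sketch rather than anything VHC provides: `word_diff` is a hypothetical helper, and splitting on whitespace is a simplifying assumption that treats punctuation as part of the adjacent word.

```python
import difflib


def word_diff(original, corrected):
    """Return (tag, old_words, new_words) for every span that differs.

    Hypothetical helper; `tag` is one of difflib's opcode names
    ('replace', 'delete', 'insert').
    """
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


# Original and corrected responses taken from the Mona Lisa example.
changes = word_diff(
    "It was painted by Michelangelo",
    "It was painted by Leonardo da Vinci.",
)
for tag, old, new in changes:
    print(f"{tag}: {old!r} -> {new!r}")
# prints: replace: 'Michelangelo' -> 'Leonardo da Vinci.'
```

The same function can be called inside the display loop with `ex['answer']` and `corrected` to list only the changed spans for each example.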