# **NLG Hallucination Detection**

A hallucination occurs when a model confidently presents an incorrect or non-factual response to an end user. Hallucinations are among the most common criticisms of, and risks associated with, LLMs and natural language generation. Mitigating hallucinations is critical for user safety and application adoption.

There are two types of hallucinations...
1. Open-domain: False claims that contradict a ground truth
2. Closed-domain: Deviations from the context of a specific reference text (e.g., the retrieved documents in RAG)  
  
...and there are two categories of hallucination prevention:
1. Mitigation: Methods applied before a response is generated, using techniques such as prompt engineering (e.g., instructing the model to say "I don't know") or RAG
2. Detection: Analyzing the response after it is generated. Detection methods can be used in-line to prevent a hallucinated response from being presented to a user

Most state-of-the-art detection methods to date involve sampling multiple responses from your chat model and analyzing the entire response set to identify hallucination risk. [How to Perform Hallucination Detection by Mark Chen](https://towardsdatascience.com/how-to-perform-hallucination-detection-for-llms-b8cb8b72e697) provides a great overview of the latest and most effective methods.
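
As a rough illustration of the sampling idea, the sketch below scores each sentence of a response against a set of resampled responses and averages the result. The names `toy_contradiction` and `hallucination_risk` are hypothetical, and the word-overlap scorer is only a stand-in so the example runs without downloading a model; in practice an NLI model or an LLM judge would fill that role.

```python
from statistics import mean
from typing import Callable, Sequence


def _tokens(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(".,!?;:").lower() for w in text.split()} - {""}


def toy_contradiction(sentence: str, sample: str) -> float:
    """Toy stand-in scorer (assumption, not a real NLI model):
    fraction of the sentence's words that do not appear in the sample."""
    words = _tokens(sentence)
    if not words:
        return 0.0
    return len(words - _tokens(sample)) / len(words)


def hallucination_risk(
    sentences: Sequence[str],
    samples: Sequence[str],
    scorer: Callable[[str, str], float] = toy_contradiction,
) -> list[float]:
    """Average each sentence's score over all sampled responses.
    Higher values mean the samples support the sentence less."""
    if not samples:
        raise ValueError("at least one sampled response is required")
    return [mean(scorer(s, sample) for sample in samples) for s in sentences]


# Sentences from the response shown to the user.
sentences = ["Paris is the capital of France.", "It was founded in 1999."]
# Additional responses sampled from the same model for the same prompt.
samples = ["Paris is the capital of France.", "The capital of France is Paris."]

risk = hallucination_risk(sentences, samples)
print(risk)
```

The first sentence is supported by every sample and gets a low score, while the second appears in none of them and gets a high one. Swapping in a stronger `scorer` changes the quality of the signal but not the aggregation logic.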

**This notebook** will use Azure PromptFlow to implement 4 techniques of **_hallucination detection_** inline with a chat or Q&A response:
1. **Hughes Hallucination Evaluation Model**: Open-source fine-tuned deberta-v3-base model for hallucination detection. 
2. **Natural Language Inference (NLI) Contradiction Score**: Use the SelfCheckGPT framework + the open-source nli-deberta-v3-base language model to evaluate sample contradiction
3. **Self-Consistency Chain-of-Thought (CoT)**: Use the SelfCheckGPT framework + the open-source mistral-7B-Instruct-v0.2 language model to evaluate sample consistency
4. **ChainPoll**: Make multiple calls to GPT-4 to poll for hallucination likelihood
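
To make the polling idea in the last technique concrete, here is a minimal sketch: ask a judge the same yes/no question several times and report the fraction of "yes" votes as the hallucination score. The function `chainpoll_score` and the `judge` callable are hypothetical names introduced for illustration. In a real flow the judge would wrap an LLM call with a chain-of-thought prompt; here it is a canned stub so the example runs offline.

```python
from typing import Callable


def chainpoll_score(
    question: str,
    answer: str,
    judge: Callable[[str, str], bool],
    n_polls: int = 5,
) -> float:
    """Poll the judge n_polls times and return the fraction of polls
    in which it flagged the answer as containing a hallucination."""
    if n_polls < 1:
        raise ValueError("n_polls must be at least 1")
    votes = sum(1 for _ in range(n_polls) if judge(question, answer))
    return votes / n_polls


# Stub judge (assumption): replays canned verdicts instead of calling an LLM.
_canned_verdicts = iter([True, False, True, True, False])


def stub_judge(question: str, answer: str) -> bool:
    return next(_canned_verdicts)


score = chainpoll_score(
    "Who wrote Hamlet?",
    "Hamlet was written by Christopher Marlowe.",
    judge=stub_judge,
    n_polls=5,
)
print(score)
```

Because each poll is an independent sample from the judge, more polls give a smoother score at the cost of more model calls.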
  

**_Go Deeper_**  
    
- HHEM: [Hughes Hallucination Evaluation Model (HHEM)](https://huggingface.co/vectara/hallucination_evaluation_model)  
- Contradiction Scoring: [Self-contradictory Hallucinations of Large Language Models](https://arxiv.org/pdf/2305.15852v1.pdf) & [With a Little Push, NLI Models can Robustly and Efficiently Predict Faithfulness](https://arxiv.org/pdf/2305.16819.pdf)  
- Consistency Scoring: [Semantic Consistency for Assuring Reliability of LLMs](https://arxiv.org/pdf/2308.09138.pdf)
- ChainPoll: [ChainPoll: A High Efficacy Method for LLM Hallucination Detection](https://arxiv.org/pdf/2310.18344v1.pdf)  
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative LLMs ([Paper](https://arxiv.org/pdf/2303.08896.pdf)/[Code](https://github.com/potsawee/selfcheckgpt))  
  
**_How do I know my detection is strong?_**
- To test your detection framework, you can use labeled open-source datasets such as...
  - HaluEval ([Paper](https://arxiv.org/abs/2305.11747)/[Code](https://github.com/RUCAIBox/HaluEval)) (closed-domain)
  - [TruthfulQA](https://github.com/sylinrl/TruthfulQA) (open-domain)
- Or bring your own labeled dataset that is tailored to your use case
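
One simple way to quantify detection strength on a labeled dataset is to threshold the detector's scores and compute precision, recall, and F1 against the labels. The sketch below assumes scores in [0, 1] where higher means more likely hallucinated, and labels where `True` marks a hallucinated response. The function name `evaluate_detector`, the 0.5 threshold, and the sample values are illustrative only, not results from any dataset.

```python
from typing import Sequence


def evaluate_detector(
    scores: Sequence[float],
    labels: Sequence[bool],
    threshold: float = 0.5,
) -> dict[str, float]:
    """Compare thresholded detector scores with ground-truth labels."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    tp = fp = fn = 0
    for score, is_hallucination in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_hallucination:
            tp += 1  # correctly flagged hallucination
        elif flagged and not is_hallucination:
            fp += 1  # factual response wrongly flagged
        elif not flagged and is_hallucination:
            fn += 1  # hallucination that slipped through

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up scores and labels, purely to exercise the function.
example_scores = [0.9, 0.2, 0.7, 0.4]
example_labels = [True, False, False, True]

metrics = evaluate_detector(example_scores, example_labels, threshold=0.5)
print(metrics)
```

Sweeping `threshold` over a range of values shows the trade-off between blocking too many factual responses (low precision) and letting hallucinations through (low recall).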
  
**_Prerequisites_**  
  
- Ensure that your environment is set up by completing the steps outlined in [0_setup.ipynb](./0_setup.ipynb)

In [None]:
# TODO