### Using Open Source Models from Huggingface for Specialized Rag Analysis 

**Introduction:** in this notebook you can use a combination of open source models to evaluate a RAG application. This notebook uses a purpose built evaluation model from [vectara](https://huggingface.co/vectara/hallucination_evaluation_model). This model is purpose built to evaluate LLM hallucination by comparing the RAG retrieval output against the semantic search outputs. The LLM selected for retrieval is [NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO) which was available at the time. **you might need to adapt this end-point based on availability!** 

**ðŸ¤—Huggingface:** huggingface hosts a variety of models. We're using the specialized [e5-mistral embeddings model](https://huggingface.co/intfloat/e5-mistral-7b-instruct) from microsoft ai. This model requires custom code to take advantage of its embeddings specialized for technical tasks.  The great thing about community focal points like huggingface is that you can use the latest most interesting models , including those that use custom code and libraries , all hosted on huggingface for storage and serving.

**End-To-End Open Source Pipelines Using Trulens:** the great thing about trulens is that you can build a complete end-to-end pipeline or application using only open source models, including for the evaluation. 

### Let's Get Started

Now we need to install and import various trulens and application dependencies. Run the cell below to get started :

In [42]:
import sys
sys.path.append("trulens\trulens_eval")
from trulens_eval.feedback.provider.hugs import Huggingface


## build an end to end application

we need to wrap our application with trulens evaluators. Below we'll build a simple application that uses selected open source models that require custom code or using huggingface. 

### Import Libraries 

run the cell below to make the required imports:

In [1]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
import os
from langchain.document_loaders import DirectoryLoader
from dotenv import load_dotenv
from langchain.llms import OpenAI
import json
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from openai import OpenAI
from langchain_openai import OpenAIEmbeddings
import openai
import chromadb

### Instantiate the document loader

run the cells below to instantiate a document loader and load your data:

In [2]:
loader = DirectoryLoader('./data/', glob="./*.txt", loader_cls=TextLoader)
documents = loader.load()


### Process the data

as always, we need to process the data into text chunks for retrieval. Run the cells below :

In [3]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=90, chunk_overlap=20)
texts = text_splitter.split_documents(documents)
print(len(texts))

1291


In [4]:
metadata_list = [doc.metadata for doc in texts]
text_content_list = [doc.page_content for doc in texts]
id_list=id_list = ["doc" + str(i + 1) for i in range(len(texts))]

### Use Chroma as a Vector Store:

Storing and indexing text chunks requires a lot of thought and engineering, that's why we're using chroma to store our embeddings. We're creating embeddings using the [e5 model served on huggingface](https://huggingface.co/spaces/Tonic1/e5) , then storing them using Chroma's vector store.   Run the cells below to instantiate Chroma and the embeddings client:

In [43]:
from chromadb import Documents, EmbeddingFunction, Embeddings
# class E5EmbeddingFunction(EmbeddingFunction):
#     def __init__(self, api_link: str, model_name: str):
#         if not api_link:
#             raise ValueError("Please provide a api end point.")

#         if not model_name:
#             raise ValueError("Please provide the model name.")
#         self._api_link = api_link
#         self._model_name = model_name

class E5EmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        try:
            from gradio_client import Client
            client = Client("Tonic1/e5")
            
            # Send request with all input documents
            response = client.predict("ArguAna", input, api_name="/compute_embeddings")
            
            if response:
                # Extract embeddings from the response
                embeddings_data = response.get('data', [])
                embeddings = []
                for embedding_list in embeddings_data:
                    embeddings.extend(embedding_list)
                    
                return embeddings  # Return embeddings
                
            else:
                print("Warning: Empty response received.")
                return None
                
        except Exception as e:
            print(f"Error in E5EmbeddingFunction: {e}")
            return None


### Instantiate Chroma Client

simply run the cells below to instantiate the chroma client used to retrieve text chunks using the Chroma API :

In [30]:
chroma_client=chromadb.Client()

In [31]:
import chromadb.utils.embedding_functions as embedding_functions
embedding_function =E5EmbeddingFunction()

In [35]:
collection_name = "texts"
existing_collections = [collection.name for collection in chroma_client.list_collections()]

if collection_name in existing_collections:
    chroma_client.delete_collection(collection_name)
    print(f"Info: Existing collection '{collection_name}' deleted.")
else:
    print("No collection found")

Info: Existing collection 'texts' deleted.


# Run Embeddings

Now we're ready to send our text chunks and index our embeddings using Chroma Vector Store. Run the cels below to get started :

In [36]:
vector_store = chroma_client.get_or_create_collection(name="texts",
                                                      embedding_function=embedding_function)
vector_store.add(ids=id_list, documents=text_content_list,metadatas=metadata_list)

Loaded as API: https://tonic1-e5.hf.space âœ”
Error in E5EmbeddingFunction: Unsupported protocol: sse_v2


Exception occurred invoking consumer for subscription 0c3e53395c4f41598bc845cedf195f4bto topic persistent://default/default/68e3a204-ff98-4cc2-86b3-3fd28ddba269 object of type 'NoneType' has no len()


### Retrieve text Chunks based on a similarity search

the great thing about embeddings is the speed of the similarity search. Get started below :

In [62]:
query = "What is Slack's plan for incorporating AI into its platform?"
docs = vectordb.similarity_search(query)
print(docs[0].page_content)

NameError: name 'vectordb' is not defined

In [None]:
content = ''

for doc in docs:
    content += ''.join(doc.page_content)

In [None]:
huggingface_provider = Huggingface()
score = huggingface_provider.hallucination_evaluator(response,content)
print(score)

In [None]:
import requests

url = "https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"
headers = {
    "Authorization": "Bearer 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",
    "Content-Type": "application/json"
}
data = {
    "inputs": "Can you please let us know more details about your question?"
}

response = requests.post(url, headers=headers, json=data)
print(response.json())


In [45]:
from gradio_client import Client

client = Client("Tonic1/e5")
result = client.predict(
		"ArguAna",	# Literal['ArguAna', 'ClimateFEVER', 'DBPedia', 'FEVER', 'FiQA2018', 'HotpotQA', 'MSMARCO', 'NFCorpus', 'NQ', 'QuoraRetrieval', 'SCIDOCS', 'SciFact', 'Touche2020', 'TRECCOVID']  in 'Select a Task' Dropdown component
		"Hello!!",	# str  in 'ðŸ“–Input Text' Textbox component
		api_name="/compute_embeddings"
)
print(result)

Loaded as API: https://tonic1-e5.hf.space âœ”


ValueError: Unsupported protocol: sse_v2