# LangChain Quickstart

In this quickstart you will create a simple LLM Chain and learn how to log it and get feedback on an LLM response.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/quickstart/langchain_quickstart.ipynb)

## 0. Setup
### 0.1. Import statements & add API keys
For this quickstart you will need OpenAI and LangChain (LangSmith) API keys.

In [1]:
#pip install -U langchain
#! pip install trulens_eval==0.21.0 openai==1.3.7 langchain chromadb langchainhub bs4

In [2]:
# Imports main tools:
from trulens_eval import TruChain, Feedback, Huggingface, Tru
from trulens_eval.schema import FeedbackResult
tru = Tru()
tru.reset_database()

# Imports from langchain to build app
import bs4
from langchain import hub
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import StrOutputParser
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain_core.runnables import RunnablePassthrough

import langchain 
print(langchain.__version__) 

import trulens_eval
print(trulens_eval.__version__)

import openai 
print(openai.__version__) #version update 

import os 
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder: never commit a real key
os.environ["LANGCHAIN_API_KEY"] = "ls__..."  # placeholder: never commit a real key

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_PROJECT"] ="Bocconi-chat"

🦑 Tru initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `Tru` to prevent this.
0.1.5
0.21.0
1.11.1
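
The setup cell assigns the API keys directly in the notebook. A minimal sketch of an alternative, assuming the keys are exported in the shell before the notebook starts: read them from the environment and fail early when one is missing. The helpers `require_env` and `mask` and the variable `DEMO_API_KEY` are hypothetical names used only for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running the notebook.")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Mask a secret for logging, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Demo with a dummy variable (hypothetical name) so the sketch runs anywhere.
os.environ["DEMO_API_KEY"] = "sk-demo-1234"
demo_key = require_env("DEMO_API_KEY")
print(mask(demo_key))
```

In the notebook itself the same helper could be called with `"OPENAI_API_KEY"` and `"LANGCHAIN_API_KEY"`, so that no key value ever appears in a saved cell or its output.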


## 1. Implementation 

### 1.1. Load documents & Create Vector stores

In [3]:
#LOAD DOCUMENTS 
from langchain.document_loaders import UnstructuredMarkdownLoader
from langchain.text_splitter import MarkdownHeaderTextSplitter

file_path = "../../Data/Scraping_Bocconi_converted_no_dup_check.md"
with open(file_path, 'r', encoding='utf-8') as file:
    markdown_content = file.read()

#CREATE VECTOR STORE
headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
    ("####", "Header 4"),]


from langchain_openai import OpenAIEmbeddings
from langchain_openai import ChatOpenAI

markdown_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on)
splits = markdown_splitter.split_text(markdown_content)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
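
To make the splitting step more concrete, here is a simplified, pure-Python sketch of header-based splitting. It is not the LangChain implementation: the function `split_markdown_by_headers` and the sample text are assumptions for illustration, and edge cases such as fenced code blocks are ignored. It shows how each chunk ends up carrying the headers above it as metadata.

```python
import re


def split_markdown_by_headers(text, headers_to_split_on):
    """Split markdown into chunks, attaching the active headers as metadata.

    Simplified illustration: fenced code blocks and other edge cases are ignored.
    """
    marker_to_name = dict(headers_to_split_on)
    chunks, current_lines, active = [], [], {}

    def flush():
        # Emit the text collected so far (if any) with a copy of the active headers.
        content = "\n".join(current_lines).strip()
        if content:
            chunks.append({"page_content": content, "metadata": dict(active)})
        current_lines.clear()

    for line in text.splitlines():
        match = re.match(r"^(#+)\s+(.*)$", line)
        if match and match.group(1) in marker_to_name:
            flush()
            level = len(match.group(1))
            # A new header invalidates headers at the same or a deeper level.
            for marker, name in headers_to_split_on:
                if len(marker) >= level:
                    active.pop(name, None)
            active[marker_to_name[match.group(1)]] = match.group(2).strip()
        else:
            current_lines.append(line)
    flush()
    return chunks


# Hypothetical sample document reusing header names that appear in the metadata above.
sample = """# Alloggi on campus
## Domanda Alloggi
Text of the first section.
## Vivere le residenze
### Rappresentanti di residenza
Text of the second section.
"""
headers = [("#", "Header 1"), ("##", "Header 2"), ("###", "Header 3")]
chunks = split_markdown_by_headers(sample, headers)
for chunk in chunks:
    print(chunk["metadata"], "->", chunk["page_content"])
```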


### 1.2. Base checks 
In this section we check that there are no duplicates among the documents we retrieve, and we go over the basics of similarity search.

In [4]:
len(splits)

55

In [5]:
splits[0]

Document(page_content='Se sei già ospite in una delle residenze Bocconi e sei interessato a\xa0rimanere nello stesso alloggio\xa0anche per il prossimo anno accademico, puoi verificare le informazioni relative alla\xa0Domanda di Conferma alloggio.\xa0La conferma alloggio è riferita alla stessa camera da te già occupata durante l’a.a. 2022-23. Gli alloggi verranno confermati per il periodo di assegnazione standard (fine agosto 2023-fine giugno 2024), che risponde alle necessità accademiche della maggior parte degli studenti.', metadata={'Header 1': 'Alloggi on campus', 'Header 2': 'Domanda Alloggi', 'Header 3': 'Posso confermare il mio alloggio per il prossimo anno accademico 2023-24?', 'Header 4': 'Posso confermare il mio alloggio per il prossimo anno accademico 2023-24?'})
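
The cells above only inspect the number of splits and the first chunk. A possible way to actually check for duplicates, sketched under the assumption that two chunks count as duplicates when their text matches after whitespace normalization and lowercasing (the helper `find_duplicate_chunks` and the sample texts are hypothetical):

```python
import hashlib
from collections import defaultdict


def find_duplicate_chunks(texts):
    """Group chunk indices by a hash of their normalized text."""
    groups = defaultdict(list)
    for index, text in enumerate(texts):
        # str.split() also splits on non-breaking spaces such as "\xa0".
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


texts = [
    "Check-in is at reception.",
    "check-in\xa0is at  reception.",
    "Guests must leave by midnight.",
]
duplicates = find_duplicate_chunks(texts)
print(duplicates)
```

With the notebook's data, the input would be something like `[d.page_content for d in splits]`; an empty result means no duplicates under this definition.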

---
Here we explore how the embeddings behave by relying exclusively on similarity search.
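
As a rough mental model, similarity search ranks documents by how close their embedding vectors are to the query vector. The sketch below uses cosine similarity on made-up 3-dimensional vectors; the real embeddings come from `OpenAIEmbeddings` and the metric actually used depends on how the vector store is configured, so treat the numbers and names here as assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the k document ids most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [(doc_id, round(score, 3)) for score, doc_id in scored[:k]]


# Toy vectors (hypothetical): one per document topic.
doc_vecs = {
    "representatives": [0.9, 0.1, 0.0],
    "guests": [0.1, 0.9, 0.1],
    "laundry": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]
ranking = top_k(query_vec, doc_vecs, k=2)
print(ranking)
```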


In [6]:
question = " Sarei interassato a ricoprire il ruolo di rappresentante di residenza, cosa dovrei fare? " 
docs = vectorstore.similarity_search(question,k=3)

In [7]:
docs[0].page_content

"Per ognuna delle residenze universitarie Bocconi, gli studenti ospiti hanno la possibilità di eleggere due rappresentanti di Residenza che saranno i loro referenti all'interno della Residenza e gestiranno le richieste di utilizzo del fondo per le attività del tempo libero.  \nGli attuali Rappresentanti delle Residenze Bocconi, in carica fino a settembre 2023, sono elencati di seguito:  \nResidenza  \nRappresentanti  \nBLIGNY  \nGiovanni Barbaro, Rocco Totaro  \nBOCCONI  \nFrancesco Citti, Flavia Villar Notario  \nCASTIGLIONI  \nAntonio Diciolla, Tomas Rosso  \nDUBINI  \nStefano Graziosi, Martina Giametta  \nISONZO  \nAlessandra Massaro, Giuseppe Liotta  \nJAVOTTE  \nGiorgio Armillis, Giulio Belardinelli  \nSPADOLINI  \nCamilla Raspino, Andrea Torre  \nSe sei interessato a ricoprire il ruolo di Rappresentante di residenza, puoi fare riferimento al Regolamento elezione rappresentanti delle Residenze.  \nInoltre, ti suggeriamo di confrontarti preventivamente con i rappresentanti in caric

--- 
Here we will identify some **failure modes**: 

### 1.3. Create RAG - Different Retrievers 
#### 1.3.1. Basic 

In [8]:
retriever = vectorstore.as_retriever()

prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

#https://python.langchain.com/docs/use_cases/question_answering/local_retrieval_qa#qa-with-retrieval
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [9]:
rag_chain.invoke(question)

"Per diventare un rappresentante di residenza, puoi fare riferimento al Regolamento elezione rappresentanti delle Residenze e confrontarti con i rappresentanti in carica per comprendere l'importanza del ruolo. L'elezione dei rappresentanti della Residenza si tiene ogni anno il primo martedì di ottobre."

#### 1.3.2. Compressor


In [10]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

In [23]:
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever()
)

#### 1.3.3. Self query 

In [12]:
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

metadata_field_info = [
    AttributeInfo(
        name="Header 1",
        description="a primary category or a general topic. It introduces the broader theme under which more specific information is grouped. In a retrieval task, it acts as the first level of data filtering or organization, offering a broad overview of the context or subject area.",
        type="string",
    ),
    AttributeInfo(
        name="Header 2",
        description="This is a subtheme or subcategory of Header 1. It provides a further level of detail, focusing on a specific aspect of the main theme. It serves to refine the search or understanding within the general topic defined by Header 1, guiding the user towards more targeted information.",
        type="string",
    ),
    AttributeInfo(
        name="Header 3",
        description="This represents an even more specific subdivision of Header 2. This level may contain rules, guidelines, or particular details concerning the subtheme. In a retrieval task, this header helps to focus on very specific aspects within the subcategory, making the search even more targeted. ",
        type="string",
    ),
    AttributeInfo(
        name="Header 4",
        description="This is the most specific level, typically formulated as a question or a very precise statement. It serves to direct the user or the retrieval system towards a highly detailed and specific answer or information, often of a practical or operational nature. It's the level that directly responds to the user's questions or needs.",
        type="string",
    ),
]

document_content_description = "Frequently asked questions"


self_retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,
    document_content_description, #
    metadata_field_info,          #
    verbose= True
)
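
Conceptually, a self-query retriever asks the LLM to turn the question into a query string plus a metadata filter, and then searches only the documents that match the filter. The sketch below skips the LLM step and hard-codes a structured query to show the filtering idea; `matches`, `filtered_search`, the keyword-count ranking and the sample documents are all hypothetical stand-ins, not LangChain internals.

```python
def matches(metadata, filters):
    """True when every filter key has exactly the requested value."""
    return all(metadata.get(key) == value for key, value in filters.items())


def filtered_search(docs, filters, keyword):
    """Keep documents matching the metadata filters, then rank by keyword count."""
    candidates = [d for d in docs if matches(d["metadata"], filters)]
    return sorted(
        candidates,
        key=lambda d: d["page_content"].lower().count(keyword.lower()),
        reverse=True,
    )


docs = [
    {"page_content": "Residents elect two representatives per residence.",
     "metadata": {"Header 1": "Alloggi on campus", "Header 2": "Vivere le residenze"}},
    {"page_content": "Representatives of the applicant must sign the form.",
     "metadata": {"Header 1": "Alloggi on campus", "Header 2": "Domanda Alloggi"}},
    {"page_content": "Laundry rooms are open every day.",
     "metadata": {"Header 1": "Alloggi on campus", "Header 2": "Vivere le residenze"}},
]

# In the real retriever an LLM would produce this structure from the user question.
structured = {"query": "representatives", "filter": {"Header 2": "Vivere le residenze"}}
results = filtered_search(docs, structured["filter"], structured["query"])
for doc in results:
    print(doc["page_content"])
```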

#### 1.3.4. Combination of multiple retrievers

In [14]:
from langchain.retrievers import ContextualCompressionRetriever

multi_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_type = "mmr")
)
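
The `mmr` search type stands for maximal marginal relevance: results are picked one at a time, trading relevance to the query against similarity to what was already selected. The following is a small greedy sketch on toy 2-dimensional vectors, not the LangChain code; the vectors, the ids and the `lambda_mult` weighting used here are assumptions for illustration.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy maximal marginal relevance over a dict of id -> vector."""
    selected = []
    remaining = list(doc_vecs)
    while remaining and len(selected) < k:
        def score(doc_id):
            relevance = cosine(query_vec, doc_vecs[doc_id])
            # Penalize similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[doc_id], doc_vecs[s]) for s in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


doc_vecs = {
    "rules": [1.0, 0.0],
    "rules_near_duplicate": [0.99, 0.05],
    "deadlines": [0.6, 0.8],
}
query_vec = [0.9, 0.3]

diverse = mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5)
relevance_only = mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0)
print(diverse)
print(relevance_only)
```

With `lambda_mult=1.0` the selection reduces to plain relevance ranking and returns the two near-identical chunks; with `0.5` the second pick is the less similar "deadlines" chunk.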

#### 1.3.5. Examples 

In [15]:
question = "Come posso diventare rappresentate di residenza?"

In [16]:
rag_chain.invoke(question)

"Per diventare rappresentante di residenza, puoi fare riferimento al Regolamento elezione rappresentanti delle Residenze e presentare la tua candidatura entro il 20 settembre. L'elezione dei rappresentanti delle residenze si tiene ogni anno il primo martedì di ottobre dalle ore 8.00 alle 19.00."

In [17]:
compression_retriever.invoke(question)



[Document(page_content="Per ognuna delle residenze universitarie Bocconi, gli studenti ospiti hanno la possibilità di eleggere due rappresentanti di Residenza che saranno i loro referenti all'interno della Residenza e gestiranno le richieste di utilizzo del fondo per le attività del tempo libero.  \nSe sei interessato a ricoprire il ruolo di Rappresentante di residenza, puoi fare riferimento al Regolamento elezione rappresentanti delle Residenze.  \nInoltre, ti suggeriamo di confrontarti preventivamente con i rappresentanti in carica, per conoscere la loro esperienza e comprendere a fondo l’importanza del ruolo.  \nL'elezione dei rappresentanti della Residenza si tiene ogni anno il primo martedì di ottobre.  \nDi seguito si riassumono le principali scadenze relative all’elezione dei rappresentanti di Residenza.", metadata={'Header 1': 'Alloggi on campus', 'Header 2': 'Vivere le residenze', 'Header 3': 'Rappresentanti di residenza', 'Header 4': 'Rappresentanti di residenza'}),
 Document

In [19]:
def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)



Document 1:

Per ognuna delle residenze universitarie Bocconi, gli studenti ospiti hanno la possibilità di eleggere due rappresentanti di Residenza che saranno i loro referenti all'interno della Residenza e gestiranno le richieste di utilizzo del fondo per le attività del tempo libero.  
Se sei interessato a ricoprire il ruolo di Rappresentante di residenza, puoi fare riferimento al Regolamento elezione rappresentanti delle Residenze.  
Inoltre, ti suggeriamo di confrontarti preventivamente con i rappresentanti in carica, per conoscere la loro esperienza e comprendere a fondo l’importanza del ruolo.  
L'elezione dei rappresentanti della Residenza si tiene ogni anno il primo martedì di ottobre.  
Di seguito si riassumono le principali scadenze relative all’elezione dei rappresentanti di Residenza.
----------------------------------------------------------------------------------------------------
Document 2:

Se non ti sei trovato/a bene nell'alloggio assegnato e vorresti cambiarlo, puo

In [20]:
self_retriever.invoke(question)

[Document(page_content="Per ognuna delle residenze universitarie Bocconi, gli studenti ospiti hanno la possibilità di eleggere due rappresentanti di Residenza che saranno i loro referenti all'interno della Residenza e gestiranno le richieste di utilizzo del fondo per le attività del tempo libero.  \nGli attuali Rappresentanti delle Residenze Bocconi, in carica fino a settembre 2023, sono elencati di seguito:  \nResidenza  \nRappresentanti  \nBLIGNY  \nGiovanni Barbaro, Rocco Totaro  \nBOCCONI  \nFrancesco Citti, Flavia Villar Notario  \nCASTIGLIONI  \nAntonio Diciolla, Tomas Rosso  \nDUBINI  \nStefano Graziosi, Martina Giametta  \nISONZO  \nAlessandra Massaro, Giuseppe Liotta  \nJAVOTTE  \nGiorgio Armillis, Giulio Belardinelli  \nSPADOLINI  \nCamilla Raspino, Andrea Torre  \nSe sei interessato a ricoprire il ruolo di Rappresentante di residenza, puoi fare riferimento al Regolamento elezione rappresentanti delle Residenze.  \nInoltre, ti suggeriamo di confrontarti preventivamente con i 

### 1.4. Create RAG - Different chains 

### 1.5. Question Answering 

- Extra (lowest priority): try some prompt engineering and see how the results change.

In [None]:
print(prompt)

#### 1.5.1. RetrievalQA

In [24]:
#5
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.  Keep the answer as concise as possible. Always say "Se hai bisogno di ulteriori informazioni, non esitare a chiedere!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)


llm_name = "gpt-3.5-turbo"
llm = ChatOpenAI(model_name=llm_name, temperature=0)

#CHAINS WITH DIFFERENT RETRIEVERS 
#Basic

rqa_base = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

#Using compression retriever
rqa_compressed = RetrievalQA.from_chain_type(
    llm,
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)
#Using self_query retriever
rqa_self = RetrievalQA.from_chain_type(
    llm,
    retriever=self_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)
#Using multiretriever (mmr + compression)
rqa_multi = RetrievalQA.from_chain_type(
    llm,
    retriever=multi_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)


# DIFFERENT CHAIN TYPES 

rqa_base_ref = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    chain_type="refine"
)


rqa_base_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    chain_type="map_reduce"
)
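
The chain types differ mainly in how the retrieved chunks reach the LLM. As a rough sketch of the general patterns (not the LangChain implementation, and with illustrative prompt wording): "stuff" sends all chunks in one call, "map_reduce" answers per chunk and then combines, and "refine" updates a running answer chunk by chunk. `fake_llm` is a hypothetical stand-in that only counts calls.

```python
def fake_llm(prompt):
    """Stand-in for a chat model: counts calls and returns a short tag."""
    fake_llm.calls += 1
    return f"<answer {fake_llm.calls}>"


fake_llm.calls = 0


def stuff(docs, question, llm):
    # One call with every chunk concatenated into the context.
    context = "\n\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


def map_reduce(docs, question, llm):
    # One call per chunk, then one call to combine the partial answers.
    partials = [llm(f"Context:\n{doc}\n\nQuestion: {question}") for doc in docs]
    combined = "\n".join(partials)
    return llm(f"Partial answers:\n{combined}\n\nQuestion: {question}")


def refine(docs, question, llm):
    # Answer from the first chunk, then revise the answer once per remaining chunk.
    answer = llm(f"Context:\n{docs[0]}\n\nQuestion: {question}")
    for doc in docs[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context:\n{doc}\n\n"
            f"Refine the answer to: {question}"
        )
    return answer


def count_calls(strategy, docs, question):
    fake_llm.calls = 0
    strategy(docs, question, fake_llm)
    return fake_llm.calls


docs = ["chunk A", "chunk B", "chunk C", "chunk D"]
call_counts = {
    fn.__name__: count_calls(fn, docs, "How do I become a representative?")
    for fn in (stuff, map_reduce, refine)
}
print(call_counts)
```

The call counts hint at the trade-off: the multi-call strategies cope with contexts that do not fit in a single prompt, at the price of more LLM calls and latency.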

### 1.6. Langchain evaluation 
To access the results from the dashboard you can use the following [link](https://smith.langchain.com/o/917d7cd4-4420-5477-8a36-902a60673259/projects?paginationState=%7B%22pageIndex%22%3A0%2C%22pageSize%22%3A10%7D&chartedColumn=latency_p50)

#### 1.6.1. Single question eval

In [25]:
rqa_list = [rqa_base, rqa_compressed, rqa_self, rqa_multi, rqa_base_ref, rqa_base_mr]  

In [27]:
import pandas as pd

# Assuming rqa_base and rqa_compressed are your model instances
# Map model names to instances
models = {
    "rqa_base": rqa_base,
    "rqa_compressed": rqa_compressed,
    "rqa_self": rqa_self,
    "rqa_multi": rqa_multi,
    "rqa_base_ref": rqa_base_ref, 
    "rqa_base_mr": rqa_base_mr
}

# Function to invoke models
def invoke_model_with_inputs(model, inputs):
    print(f"🤖 starting execution of the model: {model}") 
    result = models[model].invoke(inputs)
    return result

# Initialize an empty list to collect data
data = []

# Iterate over your models dictionary and invoke them
for model_name, model_instance in models.items():
    # Invoke the model with a question and get the result
    result = invoke_model_with_inputs(model_name, question)

    # Extract the question and answer from the result
    question_asked = result["query"]
    answer_received = result["result"]
    
    # Append a dictionary with model name, question, and answer to the data list
    data.append({"Model Name": model_name, "Question": question_asked, "Answer": answer_received})

# Convert the list of dictionaries to a DataFrame
df = pd.DataFrame(data)

# Display the DataFrame
df


🤖 starting execution of the model: rqa_base
🤖 starting execution of the model: rqa_compressed
🤖 starting execution of the model: rqa_self
🤖 starting execution of the model: rqa_multi
🤖 starting execution of the model: rqa_base_ref
🤖 starting execution of the model: rqa_base_mr

| | Model Name | Question | Answer |
|---|---|---|---|
| 0 | rqa_base | Come posso diventare rappresentate di residenza? | Per diventare rappresentante di residenza, puo... |
| 1 | rqa_compressed | Come posso diventare rappresentate di residenza? | Per diventare rappresentante di residenza, puo... |
| 2 | rqa_self | Come posso diventare rappresentate di residenza? | Per diventare rappresentante di residenza, puo... |
| 3 | rqa_multi | Come posso diventare rappresentate di residenza? | Puoi diventare rappresentante di residenza seg... |
| 4 | rqa_base_ref | Come posso diventare rappresentate di residenza? | Grazie per aver fornito ulteriori dettagli. Al... |
| 5 | rqa_base_mr | Come posso diventare rappresentate di residenza? | Per diventare rappresentante di residenza, dev... |


#### 1.6.2. Multiple question eval - TODO

#### 1.6.3. Visualization 

In [None]:
df

In [28]:
# Extended visualization
from tabulate import tabulate

# Assuming you have a DataFrame named df
# Display the DataFrame with tabulate
print(tabulate(df, headers='keys', tablefmt='psql'))

+----+----------------+--------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|    | Model Name     | Question                                         | Answer                                                                                                                                                                                                                                                                                                       

## 2. Memory and Sourcing 

### 2.1. Memory 

In [None]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True 
)

In [None]:
# New type of chain: it keeps the chat history and combines it with each new question to create a standalone question
from langchain.chains import ConversationalRetrievalChain
retriever=vectorstore.as_retriever()
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory
)
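
The "standalone question" step mentioned in the comment can be pictured as a prompt that shows the model the chat history and the follow-up, and asks for a self-contained rewrite. The sketch below only builds such a prompt; the template wording and the helper `build_condense_prompt` are assumptions and do not reproduce the prompt LangChain actually uses.

```python
def build_condense_prompt(chat_history, follow_up):
    """Render the history and a follow-up into a 'standalone question' prompt."""
    history_text = "\n".join(f"{role}: {text}" for role, text in chat_history)
    return (
        "Given the conversation below, rephrase the follow-up question "
        "so that it can be understood on its own.\n\n"
        f"Chat history:\n{history_text}\n\n"
        f"Follow-up question: {follow_up}\n"
        "Standalone question:"
    )


chat_history = [
    ("Human", "Quali sono le dotazioni disponibili all'interno delle camere?"),
    ("AI", "(answer from the first turn)"),
]
condense_prompt = build_condense_prompt(chat_history, "Per quanto riguarda la cucina?")
print(condense_prompt)
```

A follow-up such as "Per quanto riguarda la cucina?" carries little meaning for the retriever on its own, which is why the rewritten question, rather than the raw follow-up, is what gets sent to retrieval.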

In [None]:
question = "Quali sono le dotazioni disponibili all'interno delle camere? "
result = qa({"question": question})

In [None]:
result["answer"] 

In [None]:
question = "Per quanto riguarda la cucina?"
result = qa({"question": question})

In [None]:
result["answer"] 

In [None]:
question = "Sono quindi comuni?"
result = qa({"question": question})

In [None]:
result["answer"] 

In [None]:
# Comparison with model with no memory 


In [None]:
rqa_base.invoke(question)

### 2.2. Sourcing 
https://python.langchain.com/docs/use_cases/question_answering/sources

In [None]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [None]:
from langchain_core.runnables import RunnableParallel

rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)

rag_chain_with_source.invoke("Cosa troverà nella stanza in residenza? ")

## 3. Initialize Feedback Function(s)
The feedback functions defined here are reused across the different chains so that their results can be compared.
N.B. In case of problems, refer to the langchain_quickstart in this folder, or to: [Optimize RAG application - Trulens](https://colab.research.google.com/drive/1bjplY8jIUYtkiKzM4tXmZ5U5U10BaiCd)

In [29]:
from trulens_eval import TruChain, Feedback, Huggingface, Tru
from trulens_eval.schema import FeedbackResult
tru = Tru()
tru.reset_database()

In [33]:
from trulens_eval.feedback.provider import OpenAI
from trulens_eval.feedback import Groundedness
from trulens_eval.app import App
import numpy as np

# Initialize the provider class. It is named `provider` rather than `openai`
# so that it does not shadow the `openai` module imported during setup.
provider = OpenAI()

# Select the context to be used in feedback. The location of the context is app specific.
context = App.select_context(rqa_base)

# Define a groundedness feedback function
grounded = Groundedness(groundedness_provider=provider)
f_groundedness = (
    Feedback(grounded.groundedness_measure_with_cot_reasons)
    .on(context.collect()) # collect context chunks into a list
    .on_output()
    .aggregate(grounded.grounded_statements_aggregator)
)

# Question/answer relevance between overall question and answer.
f_qa_relevance = Feedback(provider.relevance).on_input_output()
# Question/statement relevance between question and each context chunk.
f_context_relevance = (
    Feedback(provider.qs_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

✅ In groundedness_measure_with_cot_reasons, input source will be set to __record__.app.retriever.get_relevant_documents.rets.collect() .
✅ In groundedness_measure_with_cot_reasons, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In qs_relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In qs_relevance, input statement will be set to __record__.app.retriever.get_relevant_documents.rets .
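
The context relevance feedback scores the question against each retrieved chunk and then aggregates the per-chunk scores with a mean. The sketch below mirrors that shape with a toy word-overlap scorer in place of the LLM-based provider; `keyword_overlap`, `context_relevance` and the sample chunks are hypothetical.

```python
from statistics import mean


def keyword_overlap(question, chunk):
    """Toy relevance score in [0, 1]: share of question words found in the chunk."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


def context_relevance(question, chunks, scorer=keyword_overlap, aggregate=mean):
    """Score every chunk, then aggregate the scores into one value."""
    scores = [scorer(question, chunk) for chunk in chunks]
    return aggregate(scores), scores


question = "residence entry rules"
chunks = [
    "entry to the residence requires an id",
    "laundry rooms are open daily",
]
overall, per_chunk = context_relevance(question, chunks)
print(round(overall, 3), [round(score, 3) for score in per_chunk])
```

One irrelevant chunk pulls the aggregate down even when another chunk is a good match, which is the kind of retrieval issue this metric is meant to surface. Note that the sketch does not handle an empty chunk list.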


### 3.1. Instrument chain for logging with TruLens


In [34]:
#OK 
tru_recorder = TruChain(rqa_base,
    app_id='Chain1_ChatApplication',
    feedbacks=[f_qa_relevance, f_context_relevance, f_groundedness])

In [35]:
with tru_recorder as recording:
    llm_response = rqa_base.invoke("Come funziona l'ingresso in residenza")

print(llm_response)

{'query': "Come funziona l'ingresso in residenza", 'result': 'Per accedere in residenza, è necessario presentarsi alla reception con un documento di identità in corso di validità. Non ci sono limitazioni orarie di ingresso o di uscita per gli studenti assegnatari di posto alloggio nelle residenze. Se hai bisogno di ulteriori informazioni, non esitare a chiedere!', 'source_documents': [Document(page_content="Se lo desideri, puoi invitare ospiti esterni nella tua camera o appartamento, facendo attenzione a rispettare alcune condizioni essenziali riportate di seguito:  \nla presenza di ospiti esterni in residenza è consentita esclusivamente tra le ore 7.00 e le ore 24.00;\nall'arrivo dell'ospite esterno, il residente viene avvisato telefonicamente e deve recarsi alla reception, dove dovrà compilare l'apposito registro indicando nome, cognome del proprio ospite e anche numero di matricola se bocconiano. Qualora vi fossero più ospiti contemporaneamente, il residente avrà cura di compilare u

In [None]:
rqa_base.invoke("Come funziona l'ingresso in residenza")

In [38]:
tru_recorder2 = TruChain(rqa_compressed,
    app_id='Chain2_ChatApplication',
    feedbacks=[f_qa_relevance, f_context_relevance, f_groundedness])

with tru_recorder2 as recording:
    llm_response = rqa_compressed.invoke("What is the purpose of the source?")

display(llm_response)

A new object of type <class 'langchain.chains.llm.LLMChain'> at 0x178bcd380 is calling an instrumented method <function Chain.__call__ at 0x13b1ed7e0>. The path of this call may be incorrect.
Guessing path of new object is app based on other object (0x17bcfab00) using this function.
A new object of type <class 'langchain.chains.llm.LLMChain'> at 0x178bcd380 is calling an instrumented method <function Chain.invoke at 0x13b1ed000>. The path of this call may be incorrect.
Guessing path of new object is app based on other object (0x17bcfab00) using this function.
A new object of type <class 'langchain.chains.llm.LLMChain'> at 0x178bcd380 is calling an instrumented method <function LLMChain._call at 0x13b221cf0>. The path of this call may be incorrect.
Guessing path of new object is app.combine_documents_chain.llm_chain based on other object (0x17fc97200) using this function.
A new object of type <class 'langchain.chains.llm.LLMChain'> at 0x178bcd380 is calling an instrumented method <funct

{'query': 'What is the purpose of the source?',
 'result': 'The purpose of the source is to provide information about the regulations and procedures regarding the presence of external guests in the Bocconi residences. Se hai bisogno di ulteriori informazioni, non esitare a chiedere!',
 'source_documents': [Document(page_content="Le\xa0Open reservation\xa0mensili\xa0sono assegnazioni residuali degli alloggi, aperte a tutti gli studenti Bocconi\xa0regolarmente iscritti all'a.a. 2023-24\xa0e si svolgeranno regolarmente in base alla disponibilità di camere, allo scopo di soddisfare il più possibile la richiesta di alloggio da parte degli studenti anche in corso d'anno.\xa0Solo gli studenti realmente interessati a prendere possesso di una camera sono invitati a presentare domanda in questa fase in modo da non limitare o esaurire la possibilità di scelta per altri studenti.", metadata={'Header 1': 'Alloggi on campus', 'Header 2': "Disponibilità di alloggi in corso d'anno", 'Header 3': 'Open 

In [None]:


# Wrap the RetrievalQA chains (not the bare retrievers), so that the recorded
# app has the same structure that the feedback selectors were built for.
tru_recorder3 = TruChain(rqa_self,
    app_id='ChainSelf_ChatApplication',
    feedbacks=[f_qa_relevance, f_context_relevance, f_groundedness])

tru_recorder4 = TruChain(rqa_multi,
    app_id='Chainmulti_ChatApplication',
    feedbacks=[f_qa_relevance, f_context_relevance, f_groundedness])



### 3.2. Retrieve records and feedback (single question) 

In [None]:
# The record of the app invocation can be retrieved from the `recording`:

rec = recording.get() # use .get if only one record
#recs = recording.records # use .records if multiple

#display(rec)

In [None]:
# The results of the feedback functions can be retrieved from the record. These
# are `Future` instances (see `concurrent.futures`). You can use `as_completed`
# to wait until they have finished evaluating.

from concurrent.futures import as_completed

for feedback_future in as_completed(rec.feedback_results):
    feedback, feedback_result = feedback_future.result()

    feedback: Feedback
    feedback_result: FeedbackResult

    display(feedback.name, feedback_result.result)


In [36]:
records, feedback = tru.get_records_and_feedback(app_ids=["Chain1_ChatApplication"])

records.head()

Unnamed: 0,app_id,app_json,type,record_id,input,output,tags,record_json,cost_json,perf_json,ts,qs_relevance,relevance,groundedness_measure_with_cot_reasons,qs_relevance_calls,relevance_calls,groundedness_measure_with_cot_reasons_calls,latency,total_tokens,total_cost
0,Chain1_ChatApplication,"{""tru_class_info"": {""name"": ""TruChain"", ""modul...",RetrievalQA(langchain.chains.retrieval_qa.base),record_hash_a9bd13ee554ef331a2a5ecce13d0344e,"""What is the purpose of the source?""","""The purpose of the source is to provide infor...",-,"{""record_id"": ""record_hash_a9bd13ee554ef331a2a...","{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2024-02-11T10:06:12.081655"", ""...",2024-02-11T10:06:15.869259,0.2,0.8,0.666667,[{'args': {'question': 'What is the purpose of...,[{'args': {'prompt': 'What is the purpose of t...,[{'args': {'source': [[{'page_content': 'Prend...,3,0,0.0
1,Chain1_ChatApplication,"{""tru_class_info"": {""name"": ""TruChain"", ""modul...",RetrievalQA(langchain.chains.retrieval_qa.base),record_hash_e638e4681d4a8119f75d35b12ae9050d,"""Come funziona l'ingresso in residenza""","""Per accedere in residenza, \u00e8 necessario ...",-,"{""record_id"": ""record_hash_e638e4681d4a8119f75...","{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2024-02-11T12:02:15.920255"", ""...",2024-02-11T12:02:20.144078,0.7,0.8,0.666667,[{'args': {'question': 'Come funziona l'ingres...,[{'args': {'prompt': 'Come funziona l'ingresso...,"[{'args': {'source': [[{'page_content': ""Se lo...",4,0,0.0


In [39]:
records, feedback = tru.get_records_and_feedback(app_ids=["Chain2_ChatApplication"])

records.head()

Unnamed: 0,app_id,app_json,type,record_id,input,output,tags,record_json,cost_json,perf_json,ts,relevance,relevance_calls,qs_relevance,groundedness_measure_with_cot_reasons,qs_relevance_calls,groundedness_measure_with_cot_reasons_calls,latency,total_tokens,total_cost
0,Chain2_ChatApplication,"{""tru_class_info"": {""name"": ""TruChain"", ""modul...",RetrievalQA(langchain.chains.retrieval_qa.base),record_hash_82c1718437743d2218f1bf7573adad6a,"""What is the purpose of the source?""","""The purpose of the source is to provide instr...",-,"{""record_id"": ""record_hash_82c1718437743d2218f...","{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2024-02-11T10:08:38.980215"", ""...",2024-02-11T10:08:41.554121,1.0,[{'args': {'prompt': 'What is the purpose of t...,,,,,2,0,0.0
1,Chain2_ChatApplication,"{""tru_class_info"": {""name"": ""TruChain"", ""modul...",RetrievalQA(langchain.chains.retrieval_qa.base),record_hash_a36a8c42f294a7572da82b951d30d3f5,"""Qual'\u00e8 lo scopo delle resources""","""Lo scopo delle risorse \u00e8 soddisfare la r...",-,"{""record_id"": ""record_hash_a36a8c42f294a7572da...","{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2024-02-11T10:11:05.620955"", ""...",2024-02-11T10:11:07.597926,0.8,[{'args': {'prompt': 'Qual'è lo scopo delle re...,,,,,1,0,0.0
2,Chain2_ChatApplication,"{""tru_class_info"": {""name"": ""TruChain"", ""modul...",RetrievalQA(langchain.chains.retrieval_qa.base),record_hash_8613c4a5a268a0eff59d9d2bd30b58f8,"""Qual'\u00e8 lo scopo delle resources""","""Lo scopo delle risorse \u00e8 soddisfare la r...",-,"{""record_id"": ""record_hash_8613c4a5a268a0eff59...","{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2024-02-11T10:19:15.525348"", ""...",2024-02-11T10:19:17.560468,0.8,[{'args': {'prompt': 'Qual'è lo scopo delle re...,,,,,2,0,0.0
3,Chain2_ChatApplication,"{""tru_class_info"": {""name"": ""TruChain"", ""modul...",RetrievalQA(langchain.chains.retrieval_qa.base),record_hash_e0d7899fe11543a491239cdc651558b5,"""Qual'\u00e8 lo scopo delle resources""","""Lo scopo delle risorse \u00e8 soddisfare la r...",-,"{""record_id"": ""record_hash_e0d7899fe11543a4912...","{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2024-02-11T10:29:43.583349"", ""...",2024-02-11T10:29:45.818678,0.8,[{'args': {'prompt': 'Qual'è lo scopo delle re...,0.2,1.0,[{'args': {'question': 'Qual'è lo scopo delle ...,[{'args': {'source': [[{'page_content': 'Prend...,2,0,0.0
4,Chain2_ChatApplication,"{""tru_class_info"": {""name"": ""TruChain"", ""modul...",RetrievalQA(langchain.chains.retrieval_qa.base),record_hash_aee35080a2acb51697b47b2e516e73cc,"""What is the purpose of the source?""","""The purpose of the source is to provide infor...",-,"{""record_id"": ""record_hash_aee35080a2acb51697b...","{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2024-02-11T12:17:41.573854"", ""...",2024-02-11T12:18:08.172560,0.7,[{'args': {'prompt': 'What is the purpose of t...,0.2,0.0,[{'args': {'question': 'What is the purpose of...,"[{'args': {'source': [[{'page_content': ""Le\xa...",26,0,0.0


In [40]:
tru.get_leaderboard(app_ids=[])

app_id,relevance,qs_relevance,groundedness_measure_with_cot_reasons,latency,total_cost
Chain2_ChatApplication,0.82,0.2,0.5,2.4,0.0
Chain1_ChatApplication,0.8,0.45,0.666667,3.5,0.0


### 3.3. Multiple questions evaluations

In [None]:
eval_questions = []
with open('eval_questions.txt', 'r') as file:
    for line in file:
        # Strip the trailing newline and surrounding whitespace
        item = line.strip()
        print(item)
        eval_questions.append(item)
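
The loop above keeps every line of the file, including blank ones, which would then be sent to the chain as empty questions. A minimal sketch of a loader that skips them, assuming one question per line; `load_questions` is a hypothetical helper name and not part of TruLens or LangChain:

```python
from pathlib import Path


def load_questions(path):
    """Return the non-empty, stripped lines of a text file (hypothetical helper)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

With this helper the cell reduces to `eval_questions = load_questions("eval_questions.txt")`.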

In [None]:
for question in eval_questions:
    with tru_recorder as recording:
        rag_chain.invoke(question)

In [None]:
self_retriever.invoke("Vorrei prenotare un alloggio a tariffa intera per l'a.a. 2023-24. Come posso procedere?")

In [None]:
for question in eval_questions:
    with tru_recorder3 as recording:
        self_retriever.invoke(question)
        
        #__record__.app.first.steps.context.first.get_relevant_documents

In [None]:
for question in eval_questions:
    with tru_recorder4 as recording:
        self_retriever.invoke(question)

In [None]:
for question in eval_questions:
    with tru_recorder2 as recording:
        rag_chain_compressed.invoke(question)
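
The four loops above differ only in the recorder and the app being invoked, and a single failing question would abort the whole run. A minimal sketch of a shared helper, assuming the recorder can be re-entered as a context manager the way the loops above use it; `run_questions` is a hypothetical name, and `nullcontext` stands in for a recorder so the example runs on its own:

```python
from contextlib import nullcontext


def run_questions(invoke, questions, recorder=None):
    """Call `invoke` on each question inside `recorder`, collecting failures."""
    recorder = recorder if recorder is not None else nullcontext()
    answers, failures = {}, {}
    for question in questions:
        try:
            with recorder:
                answers[question] = invoke(question)
        except Exception as exc:  # keep going so one bad question does not stop the run
            failures[question] = repr(exc)
    return answers, failures
```

In the notebook this would be called as, for example, `run_questions(rag_chain.invoke, eval_questions, tru_recorder)`.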

In [None]:
records, feedback = tru.get_records_and_feedback(app_ids=[])
records.head()

In [None]:
import pandas as pd

pd.set_option("display.max_colwidth", None)
records[["input", "output"] + feedback]

In [None]:
tru.get_leaderboard(app_ids=[])
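
The leaderboard shows one row per app, with each feedback score averaged over that app's records. The sketch below illustrates that aggregation in plain Python. It is not TruLens's implementation, and the `records` list and its numbers are invented for illustration:

```python
from collections import defaultdict
from statistics import mean


def leaderboard(records, metrics):
    """Average each metric per app_id, ignoring missing (None) scores."""
    by_app = defaultdict(lambda: defaultdict(list))
    for rec in records:
        for metric in metrics:
            value = rec.get(metric)
            if value is not None:
                by_app[rec["app_id"]][metric].append(value)
    return {
        app: {metric: mean(values) for metric, values in scores.items()}
        for app, scores in by_app.items()
    }


# Hypothetical records; the numbers are invented for this example.
records = [
    {"app_id": "app_a", "relevance": 1.0, "qs_relevance": 0.25},
    {"app_id": "app_a", "relevance": 0.5, "qs_relevance": None},
    {"app_id": "app_b", "relevance": 0.75, "qs_relevance": 0.5},
]
board = leaderboard(records, ["relevance", "qs_relevance"])
```

Records whose feedback has not been computed yet (empty cells in the records table) are skipped here rather than counted as zero.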

### 3.4. Explore in a Dashboard
For reference, see the [`Tru` API documentation](https://www.trulens.org/trulens_eval/api/tru/#trulens_eval.trulens_eval.tru.Tru). An excerpt of the `run_dashboard` signature follows:
def run_dashboard(
        self,
        port: Optional[int] = 8501,
        address: Optional[str] = None,
        force: bool = False,
        _dev: Optional[Path] = None
    ) -> Process:
        """
        Run a streamlit dashboard to view logged results and apps.

        Args:
            - port: int: port number to pass to streamlit through server.port.

In [37]:
tru = Tru()
#tru.reset_database()
tru.run_dashboard(port = 8503) # open a local streamlit app to explore

# tru.stop_dashboard() # stop if needed

Starting dashboard ...
Config file already exists. Skipping writing process.
Credentials file already exists. Skipping writing process.



Dashboard started at http://10.10.130.79:8503 .


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

In [None]:
# Shell commands: run these in a terminal, not in a notebook cell.
conda activate aienv
cd Finetuning/BOT_V3_Langchain
# The port problem was solved by changing the port number in tru.py.

Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard.

Note: Feedback functions evaluated in the deferred manner can be seen in the "Progress" page of the TruLens dashboard.
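
The port conflict mentioned above can also be avoided without editing `tru.py`: find a free port first and pass it to `run_dashboard(port=...)`, as the notebook already does with `port = 8503`. A minimal sketch using only the standard library; `find_free_port` is a hypothetical helper, and there is a small race window between checking the port and Streamlit binding it:

```python
import socket


def find_free_port(start=8501, attempts=20, host="127.0.0.1"):
    """Return the first port in [start, start + attempts) that can be bound."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use, try the next one
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")


port = find_free_port()
# tru.run_dashboard(port=port)  # assumes `tru` is the Tru() instance created earlier
```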


In [None]:
# The recorder is initialized as prebuilt; more study is needed to understand how to actually implement it.

In [None]:
tru.get_leaderboard(app_ids=[])

## 4. UI 

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import (
    CharacterTextSplitter,
    MarkdownHeaderTextSplitter,  # used by load_db below
    RecursiveCharacterTextSplitter,
)
from langchain.vectorstores import Chroma, DocArrayInMemorySearch  # Chroma is used by load_db
from langchain.document_loaders import TextLoader, PyPDFLoader
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI

In [None]:
def load_db(file_path, chain_type, k):
    # NOTE: chain_type is accepted for the callers below, but the chain type
    # is currently hard-coded to "map_reduce" when the chain is built.
    # load the markdown file as plain text
    with open(file_path, 'r') as file:
        markdown_content = file.read()
    headers_to_split_on = [
        ("#", "Header 1"),
        ("##", "Header 2"),
        ("###", "Header 3"),
        ("####", "Header 4"),
    ]

    # split documents on markdown headers
    markdown_splitter = MarkdownHeaderTextSplitter(
        headers_to_split_on=headers_to_split_on)
    splits = markdown_splitter.split_text(markdown_content)
    # define embedding
    embeddings = OpenAIEmbeddings()
    # create vector database from data, reusing the embedding defined above
    vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
    # define a retriever that returns the top-k chunks
    retriever = vectorstore.as_retriever(search_kwargs={"k": k})
    # create a chatbot chain. Memory is managed externally.
    qa = ConversationalRetrievalChain.from_llm(
        llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
        chain_type="map_reduce",
        retriever=retriever,
        return_source_documents=True,
        return_generated_question=True,
    )
    return qa
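
To make the header-based splitting step concrete, here is a simplified stand-in written in plain Python. It is not LangChain's `MarkdownHeaderTextSplitter` and it ignores edge cases such as `#` lines inside fenced code; `split_by_headers` and the sample text in the usage note are hypothetical:

```python
def split_by_headers(text, headers):
    """Split markdown into chunks, each tagged with the headers above it.

    `headers` is a list of (marker, name) pairs such as ("#", "Header 1").
    """
    chunks, metadata, buffer = [], {}, []

    def flush():
        # Emit the buffered lines as one chunk with a copy of the current headers.
        content = "\n".join(buffer).strip()
        if content:
            chunks.append({"content": content, "metadata": dict(metadata)})
        buffer.clear()

    for line in text.splitlines():
        stripped = line.strip()
        # Require a space after the marker so "##" is not mistaken for "#".
        match = next(((m, n) for m, n in headers if stripped.startswith(m + " ")), None)
        if match is None:
            buffer.append(line)
            continue
        flush()
        marker, name = match
        # Drop headers at the same or a deeper level before recording the new one.
        for other_marker, other_name in headers:
            if len(other_marker) >= len(marker):
                metadata.pop(other_name, None)
        metadata[name] = stripped[len(marker):].strip()
    flush()
    return chunks
```

For example, `split_by_headers("# A\nintro\n## B\nbody", [("#", "Header 1"), ("##", "Header 2")])` yields two chunks, the second carrying both `Header 1` and `Header 2` in its metadata, which is what lets the retriever return a chunk together with its section context.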


In [None]:
import panel as pn
import param

class cbfs(param.Parameterized):
    chat_history = param.List([])
    answer = param.String("")
    db_query  = param.String("")
    db_response = param.List([])
    
    def __init__(self,  **params):
        super(cbfs, self).__init__( **params)
        self.panels = []
        self.loaded_file = "../../Data/Scraping_Bocconi_converted_no_dup_check.md"
        self.qa = load_db(self.loaded_file, "stuff", 4)
    
    def call_load_db(self, count):
        if count == 0 or file_input.value is None:  # init or no file specified :
            return pn.pane.Markdown(f"Loaded File: {self.loaded_file}")
        else:
            file_input.save("temp.pdf")  # local copy
            self.loaded_file = file_input.filename
            button_load.button_style="outline"
            self.qa = load_db("temp.pdf", "stuff", 4)
            button_load.button_style="solid"
        self.clr_history()
        return pn.pane.Markdown(f"Loaded File: {self.loaded_file}")

    def convchain(self, query):
        if not query:
            return pn.WidgetBox(pn.Row('User:', pn.pane.Markdown("", width=600)), scroll=True)
        result = self.qa({"question": query, "chat_history": self.chat_history})
        self.chat_history.extend([(query, result["answer"])])
        self.db_query = result["generated_question"]
        self.db_response = result["source_documents"]
        self.answer = result['answer'] 
        self.panels.extend([
            pn.Row('User:', pn.pane.Markdown(query, width=600)),
            pn.Row('ChatBot:', pn.pane.Markdown(self.answer, width=600, styles={'background-color': '#F6F6F6'}))
        ])
        inp.value = ''  #clears loading indicator when cleared
        return pn.WidgetBox(*self.panels,scroll=True)

    @param.depends('db_query')  # parameter name must match exactly (no trailing space)
    def get_lquest(self):
        if not self.db_query:
            return pn.Column(
                pn.Row(pn.pane.Markdown("Last question to DB:", styles={'background-color': '#F6F6F6'})),
                pn.Row(pn.pane.Str("no DB accesses so far"))
            )
        return pn.Column(
            pn.Row(pn.pane.Markdown("DB query:", styles={'background-color': '#F6F6F6'})),
            pn.pane.Str(self.db_query)
        )

    @param.depends('db_response', )
    def get_sources(self):
        if not self.db_response:
            return 
        rlist=[pn.Row(pn.pane.Markdown(f"Result of DB lookup:", styles={'background-color': '#F6F6F6'}))]
        for doc in self.db_response:
            rlist.append(pn.Row(pn.pane.Str(doc)))
        return pn.WidgetBox(*rlist, width=600, scroll=True)

    @param.depends('convchain', 'clr_history') 
    def get_chats(self):
        if not self.chat_history:
            return pn.WidgetBox(pn.Row(pn.pane.Str("No History Yet")), width=600, scroll=True)
        rlist=[pn.Row(pn.pane.Markdown(f"Current Chat History variable", styles={'background-color': '#F6F6F6'}))]
        for exchange in self.chat_history:
            rlist.append(pn.Row(pn.pane.Str(exchange)))
        return pn.WidgetBox(*rlist, width=600, scroll=True)

    def clr_history(self,count=0):
        self.chat_history = []
        return 
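
The history handling in `convchain` comes down to appending `(question, answer)` tuples and passing the whole list back to the chain on the next call. A minimal sketch with a stand-in chain; `ChatSession` and `echo_chain` are hypothetical and exist only so the example runs without an API key or a UI:

```python
class ChatSession:
    """Keep (question, answer) pairs and pass them to the chain on each turn."""

    def __init__(self, chain):
        self.chain = chain
        self.chat_history = []

    def ask(self, query):
        result = self.chain({"question": query, "chat_history": self.chat_history})
        self.chat_history.append((query, result["answer"]))
        return result["answer"]

    def clear(self):
        self.chat_history = []


def echo_chain(inputs):
    """Stand-in for the real chain: reports how many prior turns it received."""
    turns = len(inputs["chat_history"])
    return {"answer": f"{inputs['question']} (after {turns} earlier turns)"}


session = ChatSession(echo_chain)
first = session.ask("What is the purpose of the source?")
second = session.ask("And who maintains it?")
```

Because memory is managed outside the chain, clearing the list (as `clr_history` does) is enough to start a new topic.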


In [None]:
import panel as pn

# Load the Panel extension
pn.extension()

In [None]:
cb = cbfs()

file_input = pn.widgets.FileInput(accept='.pdf')
button_load = pn.widgets.Button(name="Load DB", button_type='primary')
button_clearhistory = pn.widgets.Button(name="Clear History", button_type='warning')
button_clearhistory.on_click(cb.clr_history)
inp = pn.widgets.TextInput( placeholder='Enter text here…')

bound_button_load = pn.bind(cb.call_load_db, button_load.param.clicks)
conversation = pn.bind(cb.convchain, inp) 

jpg_pane = pn.pane.Image( './img/convchain.jpg')

tab1 = pn.Column(
    pn.Row(inp),
    pn.layout.Divider(),
    pn.panel(conversation,  loading_indicator=True, height=300),
    pn.layout.Divider(),
)
tab2= pn.Column(
    pn.panel(cb.get_lquest),
    pn.layout.Divider(),
    pn.panel(cb.get_sources ),
)
tab3= pn.Column(
    pn.panel(cb.get_chats),
    pn.layout.Divider(),
)
tab4=pn.Column(
    pn.Row( file_input, button_load, bound_button_load),
    pn.Row( button_clearhistory, pn.pane.Markdown("Clears chat history. Can use to start a new topic" )),
    pn.layout.Divider(),
    pn.Row(jpg_pane.clone(width=400))
)
dashboard = pn.Column(
    pn.Row(pn.pane.Markdown('# ChatWithYourData_Bot')),
    pn.Tabs(('Conversation', tab1), ('Database', tab2), ('Chat History', tab3),('Configure', tab4))
)
dashboard