<a href="https://colab.research.google.com/github/vwcmorais/healthcheck/blob/master/langchain_toy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [50]:
# !pip install langchain langchain-openai beautifulsoup4 langchain_community
# !pip install beautifulsoup4
!pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.8.0.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.7 kB)
Downloading faiss_cpu-1.8.0.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.0 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.8.0.post1


In [28]:
import os
from google.colab import userdata
from langchain_openai import ChatOpenAI, OpenAI

from langchain_core.messages import HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

In [5]:
llm = ChatOpenAI()

In [9]:
llm.invoke("how can langsmith help with testing?", max_tokens=50)

AIMessage(content='Langsmith can help with testing in the following ways:\n\n1. Automated Testing: Langsmith can be used to create automated test scripts for testing software applications. This can help in speeding up the testing process and ensuring that all aspects of the application are thoroughly', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 50, 'prompt_tokens': 15, 'total_tokens': 65, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'length', 'logprobs': None}, id='run-c2016291-bba7-469a-a01e-068dd0f326d9-0', usage_metadata={'input_tokens': 15, 'output_tokens': 50, 'total_tokens': 65})

In [20]:
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world class technical documentation writer."),
    ("user", "{input}")
])

output_parser = StrOutputParser()

In [21]:
chain = prompt | llm | output_parser

In [22]:
chain.invoke({"input": "Qual o autor de Asa-branca?"})  # Portuguese for "Who wrote Asa-branca?"

'A música "Asa-branca" foi composta por Luiz Gonzaga em parceria com Humberto Teixeira. A canção se tornou um dos maiores sucessos da música popular brasileira e um hino do forró.'
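The `|` operator in `prompt | llm | output_parser` chains components so that each step's output feeds the next one. As a rough illustration only, the sketch below reproduces that idea in plain Python. `Step`, `fake_llm`, and the other names are hypothetical stand-ins, not LangChain classes, and no model is called.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a chain component; wraps a plain function."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a first, then feeds its result to b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Three toy steps mirroring prompt -> model -> parser.
prompt_step = Step(lambda variables: f"user: {variables['input']}")
fake_llm = Step(lambda text: {"content": text.upper()})  # pretend model reply
parser_step = Step(lambda message: message["content"])

toy_chain = prompt_step | fake_llm | parser_step
result = toy_chain.invoke({"input": "hello"})
print(result)
```

Each stage only needs to agree with its neighbours on input and output types, which is why the parser can be swapped without touching the prompt or the model.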

### Chat models

- **Chat models** are often backed by LLMs but are tuned specifically for conversation.
- Their provider APIs use a different interface than pure text-completion models.
- Instead of a single string, they take a list of chat messages as input and return an AI message as output.

### LLMs

- **LLMs** in LangChain are pure text-completion models.
- The APIs they wrap take a string prompt as input and return a string completion.
- OpenAI's GPT-3 is implemented as an LLM.

In [24]:
llm = OpenAI()
chat_model = ChatOpenAI(model="gpt-3.5-turbo-0125")

Both llm and chat_model are objects that hold the configuration for a particular model.

The main difference between them is their input and output schemas:
- LLM objects take a string as input and return a string.
- ChatModel objects take a list of messages as input and return a message.
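To make the two schemas concrete, here is a minimal sketch with plain Python types. `Message`, `toy_completion_model`, and `toy_chat_model` are hypothetical names invented for illustration; they only mimic the shape of the inputs and outputs and do not call any model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """Hypothetical minimal chat message: a role plus text content."""
    role: str
    content: str


def toy_completion_model(prompt: str) -> str:
    # String in, string out (the LLM-style interface).
    return f"completion for: {prompt}"


def toy_chat_model(messages: List[Message]) -> Message:
    # List of messages in, one AI message out (the chat-model-style interface).
    last_user_text = messages[-1].content
    return Message(role="ai", content=f"reply to: {last_user_text}")


text = "Name a sock company."
completion = toy_completion_model(text)
reply = toy_chat_model([Message(role="human", content=text)])
print(type(completion).__name__, "|", type(reply).__name__)
```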

In [26]:
text = "What would be a good company name for a company that makes colorful socks?"
messages = [HumanMessage(content=text)]

llm.invoke(text)

'\n\n"Rainbow Threads" or "Vibrant Socks Co."'

In [27]:
chat_model.invoke(messages)

AIMessage(content='Rainbow Sock Co.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 22, 'total_tokens': 28, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-0eb9237a-617c-4dd7-8a97-65be15f3b775-0', usage_metadata={'input_tokens': 22, 'output_tokens': 6, 'total_tokens': 28})

In [31]:
prompt = PromptTemplate.from_template("What is a good name for a company that makes {product}?")
formatted_prompt = prompt.format(product="colorful socks") # generate prompt string
llm.invoke(formatted_prompt) # invoke with the prompt string

'\n\n"Rainbow Socks Co."'

In [32]:
template = "You are a helpful assistant that translates {input_language} to {output_language}."
human_template = "{text}"

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", template),
    ("human", human_template),
])
chat_prompt.format_messages(input_language="English", output_language="French", text="I love programming.")

[SystemMessage(content='You are a helpful assistant that translates English to French.', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='I love programming.', additional_kwargs={}, response_metadata={})]
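The formatted result above is simply the list of message templates with their placeholders filled in. A minimal sketch of that step, assuming messages are represented as `(role, text)` tuples; `format_chat_messages` is a hypothetical helper, not a LangChain function.

```python
from typing import List, Tuple


def format_chat_messages(
    templates: List[Tuple[str, str]], **variables: str
) -> List[Tuple[str, str]]:
    """Fill the {placeholders} in each (role, template) pair."""
    return [(role, template.format(**variables)) for role, template in templates]


templates = [
    ("system", "You are a helpful assistant that translates {input_language} to {output_language}."),
    ("human", "{text}"),
]
formatted = format_chat_messages(
    templates,
    input_language="English",
    output_language="French",
    text="I love programming.",
)
for role, content in formatted:
    print(f"{role}: {content}")
```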

In [39]:
chain = chat_prompt | llm | output_parser
# invoke with all the required variables
chain.invoke({"input_language": "English", "output_language": "French", "text": "I love programming."})

"\nSystem: J'aime la programmation."

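The `System:` prefix in the output above is most likely a side effect of piping a chat prompt into the completion model `llm` rather than `chat_model`: the messages are rendered as one string with role labels, and the model continues that transcript. The sketch below shows such a flattening under an assumed `Role: content` layout; `flatten_messages` is a hypothetical helper.

```python
from typing import List, Tuple


def flatten_messages(messages: List[Tuple[str, str]]) -> str:
    """Render (role, content) pairs as one 'Role: content' line per message."""
    return "\n".join(f"{role.capitalize()}: {content}" for role, content in messages)


flat_prompt = flatten_messages([
    ("system", "You are a helpful assistant that translates English to French."),
    ("human", "I love programming."),
])
print(flat_prompt)
```

Using `chat_model` in the chain keeps the messages structured, so the reply should not need a role label.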
In [42]:
from langchain.output_parsers import CommaSeparatedListOutputParser

output_parser = CommaSeparatedListOutputParser()

template = "Generate a list of 5 {text}.\n\n{format_instructions}"

chat_prompt = ChatPromptTemplate.from_template(template)
chat_prompt = chat_prompt.partial(format_instructions=output_parser.get_format_instructions())

chain = chat_prompt | chat_model | output_parser
chain.invoke({"text": "colors"})

['red', 'blue', 'green', 'yellow', 'purple']
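An output parser turns the model's raw text into a structured value. For a comma-separated reply, the core of that conversion can be sketched in a few lines; `parse_comma_separated` is a hypothetical helper, and the input string is a made-up model reply.

```python
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    """Split a model reply such as 'red, blue, green' into clean items."""
    return [item.strip() for item in text.split(",") if item.strip()]


colors = parse_comma_separated("red, blue, green, yellow, purple")
print(colors)
```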

# Retrieval Chain

To answer the original question (how LangSmith can help with testing) properly, the LLM needs additional context.
- We can supply that context via retrieval.
- Retrieval is useful when there is too much data to pass to the LLM directly.
- A retriever fetches only the most relevant pieces, and only those are passed to the LLM.
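The idea of fetching only the most relevant pieces can be sketched without any model. The example below ranks passages by cosine similarity over word counts, which is a deliberately crude stand-in for learned embeddings; all function names and passages are hypothetical.

```python
import math
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: List[str], k: int = 1) -> List[str]:
    """Return the k passages most similar to the query."""
    query_vec = bag_of_words(query)
    ranked = sorted(passages, key=lambda p: cosine(query_vec, bag_of_words(p)), reverse=True)
    return ranked[:k]


passages = [
    "LangSmith lets you visualize test results",
    "Socks come in many colors",
    "FAISS is a library for similarity search",
]
top = retrieve("how to visualize test results", passages)
print(top)
```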


In [47]:
# Load external data (a web page) into documents
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide")

docs = loader.load()



In [49]:
# Convert the data into Embeddings
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [52]:
# Use the embedding model to ingest documents into a vector store.

# We use FAISS, a simple local vector store.
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)

# Embed the chunks and store their vector representations
vector = FAISS.from_documents(documents, embeddings)
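Splitting matters because each chunk is embedded and retrieved on its own. `RecursiveCharacterTextSplitter` splits on a hierarchy of separators; the sketch below uses a much simpler fixed-size window with overlap, only to show why neighbouring chunks share some text. `split_with_overlap` and its parameters are hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


chunks = split_with_overlap("abcdefghij", chunk_size=4, overlap=1)
print(chunks)
```

The overlap means a sentence cut at a chunk boundary still appears, at least partly, in both neighbouring chunks.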

In [54]:
# Now that we have this data indexed in a vectorstore, we will create a retrieval chain.

# This chain will take an incoming question, look up relevant documents,
# then pass those documents along with the original question into an LLM and ask it to answer the original question.
from langchain.chains.combine_documents import create_stuff_documents_chain

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:
<context>
{context}
</context>
Question: {input}""")

# First, let's set up the chain that takes a question and the retrieved documents and generates an answer.
document_chain = create_stuff_documents_chain(llm, prompt)

In [55]:
from langchain_core.documents import Document

document_chain.invoke({
    "input": "how can langsmith help with testing?",
    "context": [Document(page_content="langsmith can let you visualize test results")]
})

'\nAnswer: Langsmith can help visualize test results.'
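"Stuffing" means placing the text of every supplied document into the prompt's context slot. The sketch below shows that assembly step using the same prompt wording as the cell above; `stuff_documents` is a hypothetical helper, and joining documents with a blank line is an assumption made for the example.

```python
from typing import List

PROMPT_TEMPLATE = (
    "Answer the following question based only on the provided context:\n"
    "<context>\n{context}\n</context>\n"
    "Question: {input}"
)


def stuff_documents(question: str, page_contents: List[str]) -> str:
    """Join all document texts and place them in the prompt's context slot."""
    context = "\n\n".join(page_contents)  # assumed separator between documents
    return PROMPT_TEMPLATE.format(context=context, input=question)


stuffed_prompt = stuff_documents(
    "how can langsmith help with testing?",
    ["langsmith can let you visualize test results"],
)
print(stuffed_prompt)
```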

In [57]:
from langchain.chains import create_retrieval_chain

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

In [58]:
response = retrieval_chain.invoke({"input": "how can langsmith help with testing?"})
print(response["answer"])



Answer: LangSmith allows developers to create datasets, which are collections of inputs and reference outputs, and use these to run tests on their LLM applications. These test cases can be uploaded in bulk, created on the fly, or exported from application traces. LangSmith also makes it easy to run custom evaluations (both LLM and heuristic based) to score test results. Additionally, the comparison view feature allows developers to see how different versions of their applications perform on the same data points, making it easier to identify and troubleshoot any issues. 
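The retrieval chain combines two steps: fetch documents for the question, then answer using only those documents. A minimal sketch of that flow follows; the retriever and answer functions are toy stand-ins (no vector store, no LLM), and all names are hypothetical.

```python
from typing import Callable, Dict, List


def run_retrieval_chain(
    question: str,
    retrieve: Callable[[str], List[str]],
    answer: Callable[[str, List[str]], str],
) -> Dict[str, object]:
    """Fetch context for the question, then answer using that context."""
    context = retrieve(question)
    return {"input": question, "context": context, "answer": answer(question, context)}


# Toy stand-ins: a fixed lookup instead of a vector store, and an echo instead of an LLM.
def toy_retrieve(question: str) -> List[str]:
    return ["LangSmith lets you run tests on datasets."]


def toy_answer(question: str, context: List[str]) -> str:
    return f"Based on {len(context)} document(s): {context[0]}"


response = run_retrieval_chain("how can langsmith help with testing?", toy_retrieve, toy_answer)
print(response["answer"])
```

Returning the retrieved context alongside the answer makes it possible to inspect which documents the answer was based on.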


In [56]:
llm.invoke("how can langsmith help with testing?")

'\nLangsmith, as a language translation and localization company, can help with testing in the following ways:\n\n1. Language Testing: Langsmith can provide language testing services to ensure that the translated content is accurate and culturally appropriate. This can include linguistic testing, where native speakers review the translated content for accuracy and fluency, and cultural testing, where experts ensure that the content is suitable for the target audience.\n\n2. Localization Testing: Langsmith can also perform localization testing to ensure that the translated content is functioning correctly in the target language and culture. This can include testing for language-specific characters, date and time formats, currency conversions, and other cultural aspects that may affect the functionality of the content.\n\n3. User Experience Testing: In addition to language and localization testing, Langsmith can also help with user experience testing. This involves testing the translated

# Conversation Retrieval Chain
The chain we've created so far can only answer single questions.

Chatbots are one of the main types of LLM applications that people are building.

To support them, we can still use the create_retrieval_chain function, but we need to change two things:
- The retrieval step should take the whole conversation history into account, not just the most recent input.
- The final LLM chain should likewise take the whole history into account.
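A follow-up such as "Tell me how" is a poor search query on its own, so the retrieval step first needs a query that reflects the conversation. The sketch below shows one way to structure that, assuming the raw input is used when there is no history. `history_aware_query` and `toy_rewrite` are hypothetical, and `toy_rewrite` stands in for the LLM call that would normally generate the query.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (role, content) pairs


def history_aware_query(
    chat_history: History,
    user_input: str,
    rewrite: Callable[[History, str], str],
) -> str:
    """Use the raw input when there is no history; otherwise ask `rewrite` for a query."""
    if not chat_history:
        return user_input
    return rewrite(chat_history, user_input)


def toy_rewrite(chat_history: History, user_input: str) -> str:
    # Stand-in for the LLM call: prepend the first human turn for context.
    first_human = next(content for role, content in chat_history if role == "human")
    return f"{first_human} {user_input}"


history = [("human", "Can LangSmith help test my LLM applications?"), ("ai", "Yes!")]
query = history_aware_query(history, "Tell me how", toy_rewrite)
print(query)
```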


In [59]:
# Updating Retrieval
# In order to update retrieval, we will create a new chain.

from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

# First we need a prompt that we can pass into an LLM to generate this search query
prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
    ("user", "Given the above conversation, generate a search query to look up to get information relevant to the conversation")
])

retriever_chain = create_history_aware_retriever(llm, retriever, prompt)

In [61]:
# We can test this out by passing in an instance where the user asks a follow-up question.
from langchain_core.messages import HumanMessage, AIMessage

chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]

retriever_chain.invoke({
    "chat_history": chat_history,
    "input": "Tell me how"
})

[Document(metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | 🦜️🛠️ LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we’ll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they’re just starting their journey.', 'language': 'en'}, page_content='Skip to main contentGo to API DocsSearchRegionUSEUGo to AppQuick StartUser GuideTracingEvaluationProduction Monitoring & AutomationsPrompt HubProxyPricingSelf-HostingCookbookThis is outdated documentation for 🦜️🛠️ LangSmith, which is no longer actively maintained.For up-to-date documentation, see the latest version.User GuideOn this pageLangSmith User GuideLangSmith is a platform for LLM applic

In [62]:
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the user's questions based on the below context:\n\n{context}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
])

document_chain = create_stuff_documents_chain(llm, prompt)

retrieval_chain = create_retrieval_chain(retriever_chain, document_chain)
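The prompt above places three things in order: a system message carrying the retrieved context, the earlier turns, and the new user input. A minimal sketch of that ordering with `(role, content)` tuples follows; `build_messages` is a hypothetical helper, and the context string is made up for the example.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


def build_messages(context: str, chat_history: List[Message], user_input: str) -> List[Message]:
    """System message with context first, then prior turns, then the new user turn."""
    system = ("system", f"Answer the user's questions based on the below context:\n\n{context}")
    return [system, *chat_history, ("user", user_input)]


messages_out = build_messages(
    context="LangSmith supports datasets and evaluations.",
    chat_history=[("user", "Can LangSmith help test my LLM applications?"), ("ai", "Yes!")],
    user_input="Tell me how",
)
for role, content in messages_out:
    print(role, "->", content)
```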

In [65]:
# We can now test this out end-to-end:

chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]
retrieval_chain.invoke({
    "chat_history": chat_history,
    "input": "Tell me how"
})

{'chat_history': [HumanMessage(content='Can LangSmith help test my LLM applications?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Yes!', additional_kwargs={}, response_metadata={})],
 'input': 'Tell me how',
 'context': [Document(metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | 🦜️🛠️ LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we’ll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they’re just starting their journey.', 'language': 'en'}, page_content='Skip to main contentGo to API DocsSearchRegionUSEUGo to AppQuick StartUser GuideTracingEvaluationProduction Monitoring & AutomationsPrompt HubProxyPricingSelf-HostingCookbook

In [67]:

chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]
retrieval_chain.invoke({
    "chat_history": chat_history,
    "input": "seja direto, em portugues"  # Portuguese for "be direct, in Portuguese"
})

{'chat_history': [HumanMessage(content='seja direto, em portugues', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Yes!', additional_kwargs={}, response_metadata={})],
 'input': 'seja direto, em portugues',
 'context': [Document(metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | 🦜️🛠️ LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we’ll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they’re just starting their journey.', 'language': 'en'}, page_content='LangSmith User Guide | 🦜️🛠️ LangSmith'),
  Document(metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | \uf8ffü