<a href="https://colab.research.google.com/github/tonybaloney/introduction-to-langchain-workshop/blob/main/introduction-to-langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to LangChain

LangChain is a framework for developing applications powered by language models.

- GitHub: https://github.com/langchain-ai/langchain
- Docs: https://python.langchain.com/docs/get_started

## Outline
1. Main components -- Model, Prompt Template, Output Parser
2. Chains
3. Memory
4. Retriever (RAG)
5. Evaluation


## 0. Installation

In [48]:
!pip install langchain==0.0.349



## 1. Main Components

### 1.1. Model


https://python.langchain.com/docs/modules/model_io/

#### Model - Hugging Face (google/flan-t5-xxl)

In [49]:
!pip install huggingface_hub==0.19.4



In [50]:
import os
from langchain.llms import HuggingFaceHub
from google.colab import userdata

os.environ["HUGGINGFACEHUB_API_TOKEN"] = userdata.get('HUGGINGFACEHUB_API_TOKEN')

repo_id = "google/flan-t5-xxl"  # See https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads for some other options


llm = HuggingFaceHub(
    repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 64}
)

SecretNotFoundError: ignored

In [None]:
text = "What would be a good company name for a company that makes colorful socks?"
print(llm(text))

#### Model - Local (LlamaCPP)

In [None]:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python

In [None]:
!wget https://gpt4all.io/models/gguf/mistral-7b-instruct-v0.1.Q4_0.gguf

In [None]:
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/content/mistral-7b-instruct-v0.1.Q4_0.gguf",
    n_gpu_layers=200,
    n_ctx=32000,
    verbose=True,
)

In [None]:
text = "What would be a good company name for a company that makes colorful socks?"
print(llm(text))

#### Model - OpenAI

In [51]:
!pip install openai==1.3.9



In [52]:
import os
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from google.colab import userdata

# Get it from https://platform.openai.com/account/api-keys
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["OPENAI_API_BASE"] = userdata.get('OPENAI_API_BASE')

##### LLM Model

LLMs in LangChain refer to pure text completion models. The APIs they wrap take a string prompt as input and output a string completion.

In [53]:
from langchain.llms import AzureOpenAI
from google.colab import userdata

text = "What would be a good company name for a company that makes colorful socks?"

llm = AzureOpenAI(temperature=0, deployment_name="text-davinci-002", model_name="davinci-002", api_version="2023-05-15")
print(f"Using LLM model: {llm.model_name}")

Using LLM model: davinci-002




In [54]:
text = "What would be a good company name for a company that makes colorful socks?"

print(llm(text))

 I'm starting a company that makes colorful socks. I need a good company name. I'm thinking of something like "Sock It To Me" or "Sock It Up" or "Sock It Up" or "Sock It Up" or "Sock It Up" or "Sock It Up" or "Sock It Up" or "Sock It Up" or "Sock It Up" or "Sock It Up" or "Sock It Up" or "Sock It Up" or "Sock It Up" or "

2017-07-20 17:00:00

2017-07-20 17:00:00

2017-07-20 17:00:00

2017-07-20 17:00:00

2017-07-20 17:00:00

2017-07-20 17:00:00

2017-07-20 17:00:00

2017-07-20 17:00:00

2017-07-20 17:00:00

2017-07-20 17:00:00

2017-07-20 17:00:00

2017-07-20 17:


##### Chat Model

Chat models are often backed by LLMs but are tuned specifically for conversation. Crucially, their provider APIs use a different interface from pure text completion models: instead of a single string, they take a list of chat messages as input. Each message is usually labeled with its speaker (typically one of "System", "AI", or "Human"). They return an AI chat message as output.
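
To make the difference between the two interfaces concrete, here is a minimal sketch that uses only the standard library. `Message`, `fake_completion_model`, and `fake_chat_model` are hypothetical stand-ins invented for illustration; they are not LangChain classes and no real model is called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    role: str      # "system", "human", or "ai"
    content: str


def fake_completion_model(prompt: str) -> str:
    """Text-completion shape: one string in, one string out."""
    return prompt + " ...(completion)"


def fake_chat_model(messages: List[Message]) -> Message:
    """Chat shape: a list of role-tagged messages in, one AI message out."""
    last_human = [m for m in messages if m.role == "human"][-1]
    return Message(role="ai", content=f"You said: {last_human.content}")


completion = fake_completion_model("Name a sock company:")
reply = fake_chat_model([
    Message("system", "You are a helpful assistant."),
    Message("human", "Name a sock company."),
])
print(type(completion).__name__, "|", reply.role, "|", reply.content)
```

The point is only the shape of the inputs and outputs: a plain string for the completion model, a list of labeled messages for the chat model.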

In [55]:
from langchain.schema import HumanMessage
from langchain.chat_models import AzureChatOpenAI
os.environ["OPENAI_API_VERSION"] = userdata.get("OPENAI_API_VERSION")
chat = AzureChatOpenAI(azure_deployment="gpt-4")
print(f"Using chat model: {chat.model_name}")

Using chat model: gpt-3.5-turbo




In [56]:
messages = [HumanMessage(content="What would be a good company name for a company that makes colorful socks?")]

chat(messages)

AIMessage(content='1. "Vibrant Steps"\n2. "Kaleidoscope Socks"\n3. "Rainbow Toes"\n4. "BoldSox"\n5. "Spectrum Strides"\n6. "Colorful Comforts"\n7. "Chroma Crew"\n8. "Sunny Socks Co."\n9. "ColorPop Hosiery"\n10. "Prism Peds"\n11. "Technicolor Treads"\n12. "Saturated Steps"\n13. "Feet Fiestas"\n14. "Artistic Ankles"\n15. "Dazzling Digits"\n16. "Socks Spectrum"\n17. "Rainbow Wraps".\n18. "Hue Huggers"\n19. "Bursting Bright Socks"\n20. "Vivid Velvet Socks".')

#### Bonus: Using LLM as a question-answering model

In [57]:
text = """Question: What would be a good company name for a company that makes colorful socks?

Let's think step by step.

Answer: """

In [58]:
print(llm(text))

1. What is the name of the company?

2. What is the name of the company?

3. What is the name of the company?

4. What is the name of the company?

5. What is the name of the company?

6. What is the name of the company?

7. What is the name of the company?

8. What is the name of the company?

9. What is the name of the company?

10. What is the name of the company?

11. What is the name of the company?

12. What is the name of the company?

13. What is the name of the company?

14. What is the name of the company?

15. What is the name of the company?

16. What is the name of the company?

17. What is the name of the company?

18. What is the name of the company?

19. What is the name of the company?

20. What is the name of the company?

21. What is the name of the company?

22. What is the name of the company?

23. What is the name of the company?

24. What is the name of the company?

25. What is the name of the company?

26. What is the name


### 1.2. Prompt Templates

Prompt templates are pre-defined recipes for generating prompts for language models.

https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/
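
As a rough mental model, a prompt template is a string with named placeholders plus a check that every placeholder gets a value. The sketch below uses only the standard library; `SimplePromptTemplate` is a hypothetical class written for illustration and is not how LangChain implements `PromptTemplate`.

```python
from string import Formatter


class SimplePromptTemplate:
    def __init__(self, template: str):
        self.template = template
        # Collect the {placeholder} names found in the template.
        self.input_variables = sorted(
            {name for _, name, _, _ in Formatter().parse(template) if name}
        )

    def format(self, **kwargs: str) -> str:
        # Fail early if a placeholder has no value.
        missing = [v for v in self.input_variables if v not in kwargs]
        if missing:
            raise KeyError(f"Missing prompt variables: {missing}")
        return self.template.format(**kwargs)


sketch_prompt = SimplePromptTemplate("Question: {question}\n\nAnswer: ")
print(sketch_prompt.input_variables)
print(sketch_prompt.format(question="What is a good name for a sock company?"))
```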

In [59]:
from langchain.prompts import PromptTemplate

template = """Question: {question}

Let's think step by step, and then summarize the final answer in this format:

Answer: """

prompt = PromptTemplate(template=template, input_variables=["question"])

In [60]:
prompt_text = prompt.format(question="What is a good name for a company that makes video games")
print(prompt_text)

Question: What is a good name for a company that makes video games

Let's think step by step, and then summarize the final answer in this format:

Answer: 


In [61]:
print(llm(prompt_text))

1. The name of the company should be short and easy to remember. 2. The name should be easy to pronounce. 3. The name should be easy to spell. 4. The name should be easy to pronounce. 5. The name should be easy to spell. 6. The name should be easy to pronounce. 7. The name should be easy to spell. 8. The name should be easy to pronounce. 9. The name should be easy to spell. 10. The name should be easy to pronounce. 11. The name should be easy to spell. 12. The name should be easy to pronounce. 13. The name should be easy to spell. 14. The name should be easy to pronounce. 15. The name should be easy to spell. 16. The name should be easy to pronounce. 17. The name should be easy to spell. 18. The name should be easy to pronounce. 19. The name should be easy to spell. 20. The name should be easy to pronounce. 21. The name should be easy to spell. 22. The name should be easy to pronounce. 23. The name should be easy to spell


### 1.3. Output Parser

Language models output text. But many times you may want to get more structured information than just text back. This is where output parsers come in.

In [62]:
customer_review = """\
This leaf blower is pretty amazing.  It has four settings:\
candle blower, gentle breeze, windy city, and tornado. \
It arrived in two days, just in time for my wife's \
anniversary present. \
I think my wife liked it so much she was speechless. \
So far I've been the only one using it, and I've been \
using it every other morning to clear the leaves on our lawn. \
It's slightly more expensive than the other leaf blowers \
out there, but I think it's worth it for the extra features.
"""

In [63]:
review_template = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.

Format the output as JSON with the following keys:
gift
delivery_days
price_value

text: {text}
"""

In [64]:
prompt_template = PromptTemplate(template=review_template, input_variables=["text"],)

In [65]:
answer = llm(prompt_template.format(text=customer_review))
print(answer)

text: I bought this leaf blower for my wife for her birthday. It arrived in two days, and she was so happy with it that she was speechless. I've been using it every other morning to clear the leaves on our lawn. It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features.

text: I bought this leaf blower for my wife for her birthday. It arrived in two days, and she was so happy with it that she was speechless. I've been using it every other morning to clear the leaves on our lawn. It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features.

text: I bought this leaf blower for my wife for her birthday. It arrived in two days, and she was so happy with it that she was speechless. I've been using it every other morning to clear the leaves on our lawn. It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features.

text: I

In [66]:
# This raises a TypeError because `answer` is a plain string, not a dict
answer['gift']

TypeError: ignored

#### Using output parsers

In [68]:
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

In [69]:
gift_schema = ResponseSchema(name="gift",
                             description="Was the item purchased\
                             as a gift for someone else? \
                             Answer True if yes,\
                             False if not or unknown.")
delivery_days_schema = ResponseSchema(name="delivery_days",
                                      description="How many days\
                                      did it take for the product\
                                      to arrive? If this \
                                      information is not found,\
                                      output -1.")
price_value_schema = ResponseSchema(name="price_value",
                                    description="Extract any\
                                    sentences about the value or \
                                    price, and output them as a \
                                    comma separated Python list.")

response_schemas = [gift_schema,
                    delivery_days_schema,
                    price_value_schema]

output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [70]:
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"gift": string  // Was the item purchased                             as a gift for someone else?                              Answer True if yes,                             False if not or unknown.
	"delivery_days": string  // How many days                                      did it take for the product                                      to arrive? If this                                       information is not found,                                      output -1.
	"price_value": string  // Extract any                                    sentences about the value or                                     price, and output them as a                                     comma separated Python list.
}
```


In [71]:
review_template_with_instructions = """\
For the following text, extract the following information:

{format_instructions}

text: {text}
"""

In [72]:
prompt_template_with_output = PromptTemplate(template=review_template_with_instructions, input_variables=["text"], partial_variables={"format_instructions": format_instructions})

In [73]:
prompt_and_model = prompt_template_with_output | llm
output = prompt_and_model.invoke({"text": customer_review})

In [74]:
# Note: if the model's reply does not follow the format instructions, parsing raises an error.
# That is useful, because we learn early that the model cannot handle the task we gave it.

result = output_parser.invoke(output)
print(result)

OutputParserException: ignored

In [None]:
result["gift"]
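
The format instructions above ask the model to reply with a fenced json snippet. A parser for that kind of reply can be sketched with the standard library as follows. `parse_json_snippet` and `sample_reply` are hypothetical names, the reply is hand-written rather than real model output, and this is an assumption about the general approach, not LangChain's actual implementation.

```python
import json
import re

FENCE = "`" * 3  # built programmatically to avoid a literal fence inside this example


def parse_json_snippet(reply: str, expected_keys: list) -> dict:
    """Extract the first fenced json block from a reply and check its keys."""
    pattern = re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, re.DOTALL)
    if match is None:
        raise ValueError("No fenced json block found in the model reply")
    parsed = json.loads(match.group(1))
    missing = [k for k in expected_keys if k not in parsed]
    if missing:
        raise ValueError(f"Reply is missing keys: {missing}")
    return parsed


# A hand-written reply standing in for real model output.
sample_reply = (
    "Here is the result:\n"
    + FENCE + "json\n"
    + '{"gift": "True", "delivery_days": "2", "price_value": "slightly more expensive"}\n'
    + FENCE
)
parsed_reply = parse_json_snippet(sample_reply, ["gift", "delivery_days", "price_value"])
print(parsed_reply["gift"])
```

A reply without the fenced block (such as the repetitive completion output seen earlier) makes the sketch raise, which mirrors why the real parser fails on a model that ignores the instructions.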

### Bonus: Chaining Components

Because all of these objects implement the `Runnable` interface, they can be chained together.

More info: https://python.langchain.com/docs/expression_language/why
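
The `|` syntax can look like magic, so here is a minimal sketch of the idea using Python's `__or__` operator hook. `Step`, `fill_prompt`, `fake_llm`, and `count_words` are hypothetical names for illustration; the real `Runnable` interface offers much more (batching, streaming, async).

```python
class Step:
    """Wraps a function so steps can be composed with the | operator."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # The combined step feeds this step's output into the next one.
        return Step(lambda value: other.invoke(self.invoke(value)))


fill_prompt = Step(lambda inputs: f"Review: {inputs['text']}")
fake_llm = Step(lambda prompt_text: prompt_text.upper())  # stand-in for a model call
count_words = Step(lambda reply: len(reply.split()))      # stand-in for an output parser

pipeline = fill_prompt | fake_llm | count_words
print(pipeline.invoke({"text": "great leaf blower"}))
```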

In [None]:
output_chain = prompt_template_with_output | llm | output_parser

In [None]:
output_chain.invoke({"text": customer_review})

## 2. Chains

Using an LLM in isolation is fine for simple applications, but more complex applications require chaining LLMs - either with each other or with other components.

LangChain provides two high-level frameworks for “chaining” components. The legacy approach is to use the Chain interface. The updated approach is to use the LangChain Expression Language (LCEL).

https://python.langchain.com/docs/modules/chains/

### Old way

The legacy interface for “chained” applications. We define a Chain very generically as a sequence of calls to components, which can include other chains.

In [None]:
from langchain.chains import LLMChain

llm_chain = LLMChain(prompt=prompt_template_with_output, llm=llm, output_parser=output_parser)

llm_chain.run(text=customer_review)

### New way

LCEL provides an intuitive and readable syntax for composition.

In [None]:
from langchain_core.runnables import RunnablePassthrough

chain = (
 { "text": RunnablePassthrough() }
 | prompt_template_with_output
 | llm
 | output_parser
)
chain.invoke(customer_review)
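
The dict at the start of the chain deserves a note: each value in the dict receives the same input, and the results are collected under the matching keys. A passthrough simply returns its input unchanged. The sketch below shows that behavior with plain functions; `run_mapping` and `passthrough` are hypothetical helpers, not LangChain APIs.

```python
def run_mapping(mapping: dict, value):
    """Apply every callable in the mapping to the same input value."""
    return {key: func(value) for key, func in mapping.items()}


def passthrough(value):
    """Return the input unchanged."""
    return value


mapped = run_mapping({"text": passthrough, "length": len}, "nice blower")
print(mapped)
```

This is why `chain.invoke(customer_review)` can take a bare string: the mapping step turns it into the `{"text": ...}` dict that the prompt template expects.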

## 3. Memory

Most LLM applications have a conversational interface. An essential component of a conversation is being able to refer to information introduced earlier in the conversation.

https://python.langchain.com/docs/modules/memory/

### Manipulating the memory

In [75]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")

In [76]:
print(memory.load_memory_variables({})['history'])

Human: hi!
AI: what's up?


In [77]:
memory.save_context({"input": "how yo doin'"}, {"output": "fine. thank you!"})

In [78]:
print(memory.load_memory_variables({})['history'])

Human: hi!
AI: what's up?
Human: how yo doin'
AI: fine. thank you!
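
Conceptually, a buffer memory is just a list of turns that gets rendered into a transcript string for the prompt. The following standard-library sketch reproduces the transcript above; `BufferMemory` is a hypothetical class for illustration and only loosely mirrors `ConversationBufferMemory`.

```python
class BufferMemory:
    """Keeps every turn and renders them as a 'Human:/AI:' transcript."""

    def __init__(self):
        self.turns = []  # list of (speaker, text) tuples

    def add_user_message(self, text: str) -> None:
        self.turns.append(("Human", text))

    def add_ai_message(self, text: str) -> None:
        self.turns.append(("AI", text))

    def save_context(self, inputs: dict, outputs: dict) -> None:
        # One call records a full exchange: the user input, then the AI output.
        self.add_user_message(inputs["input"])
        self.add_ai_message(outputs["output"])

    def history(self) -> str:
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)


sketch_memory = BufferMemory()
sketch_memory.add_user_message("hi!")
sketch_memory.add_ai_message("what's up?")
sketch_memory.save_context({"input": "how yo doin'"}, {"output": "fine. thank you!"})
print(sketch_memory.history())
```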


In [79]:
template = """You are a nice chatbot having a conversation with a human.

New human question: {question}
Response:"""
prompt = PromptTemplate.from_template(template)

#### Without memory

In [None]:
from langchain.chains import LLMChain

no_memory_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=False,
)

In [None]:
no_memory_chain({"question": "Hello, My name is Junior"})

In [None]:
no_memory_chain({"question": "I have just introduced myself. What is my name?"})

#### With memory - LLMChain

In [None]:
# Notice that "chat_history" is present in the prompt template
template = """You are a nice chatbot having a conversation with a human.

Previous conversation:
{chat_history}

New human question: {question}
Response:"""
prompt = PromptTemplate.from_template(template)
# Notice that we need to align the `memory_key`
memory = ConversationBufferMemory(memory_key="chat_history")
with_memory_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=False,
    memory=memory  # the chain now reads from and writes to this memory
)

In [None]:
with_memory_chain({"question": "ay yo!"})

In [None]:
memory.save_context({"input": "how yo doin' My name is Junior. Nice to meet you."}, {"output": "Nice to meet you, Junior!. I am fine. Thank you!"})

In [None]:
# Some models, such as those hosted on Hugging Face, might not always answer correctly

with_memory_chain({"question": "what is my name again?"})

#### With memory - ConversationChain

In [None]:
from langchain.chains import ConversationChain

conversation = ConversationChain(
    llm=llm,
    verbose=False,
)

In [None]:
print(conversation.predict(input="how yo doin' My name is Junior. Nice to meet you."))

In [None]:
print(conversation.predict(input="what is my name again?"))

### Bonus: Different types of Memory

More info: https://python.langchain.com/docs/modules/memory/types/
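
A summary memory differs from a buffer in that it stores a running summary instead of the full transcript, which is why it needs something that can summarize. The sketch below injects the summarizer as a plain function. `SummaryMemory` and `toy_summarizer` are hypothetical; the toy summarizer merely keeps the human lines and stands in for the LLM call that a real summary memory would make.

```python
class SummaryMemory:
    """Stores a running summary that is updated after every exchange."""

    def __init__(self, summarize):
        self.summarize = summarize  # callable(previous_summary, new_lines) -> str
        self.summary = ""

    def save_context(self, user_text: str, ai_text: str) -> None:
        new_lines = f"Human: {user_text}\nAI: {ai_text}"
        self.summary = self.summarize(self.summary, new_lines)


def toy_summarizer(previous_summary: str, new_lines: str) -> str:
    """Stand-in for an LLM call: keep only what the human said."""
    prefix = "Human: "
    human_lines = [
        line[len(prefix):] for line in new_lines.splitlines() if line.startswith(prefix)
    ]
    return (previous_summary + " " + " ".join(human_lines)).strip()


summary_memory = SummaryMemory(toy_summarizer)
summary_memory.save_context("My name is Junior.", "Nice to meet you!")
summary_memory.save_context("I like the color red.", "Red is a great color.")
print(summary_memory.summary)
```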

In [None]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory

conversation = ConversationChain(
    llm=llm,
    verbose=True,
    memory=ConversationSummaryMemory(llm=llm) # Note that the ConversationSummaryMemory will need an LLM as an input to do the summarization
)

In [None]:
# Notice the prompt that ConversationChain formats for us

conversation.predict(input="how yo doin' My name is Junior. Nice to meet you.")

In [None]:
conversation.predict(input="I like the color red. My favorite subject is Math.")

In [None]:
conversation.predict(input="Can you guess my favorite subject?")

## 4. Retriever (RAG)

Many LLM applications require user-specific data that is not part of the model's training set. The primary way of accomplishing this is through Retrieval Augmented Generation (RAG).

More info: https://python.langchain.com/docs/modules/data_connection/

***Note: LlamaCPP model users should skip this, as it might crash the notebook due to limited resources***

In [80]:
import requests

url = "https://raw.githubusercontent.com/hwchase17/chat-your-data/master/state_of_the_union.txt"
res = requests.get(url)
with open("state_of_the_union.txt", "w") as f:
  f.write(res.text)

In [81]:
# Document Loader
from langchain.document_loaders import TextLoader
loader = TextLoader('./state_of_the_union.txt')
documents = loader.load()

In [82]:
# Text Splitter
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
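
Roughly speaking, a character-based splitter cuts the text on a separator and then merges neighboring pieces until a chunk would exceed the size limit. The sketch below illustrates that idea with the standard library. `split_text` is a hypothetical helper, it ignores overlap, and it is an approximation for intuition rather than the library's exact algorithm.

```python
def split_text(text: str, chunk_size: int, separator: str = "\n\n") -> list:
    """Split on a separator, then greedily merge pieces up to chunk_size characters."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either the merged chunk still fits, or this is the first piece of a
            # new chunk (a single oversized piece is kept whole).
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph is a bit longer."
for chunk in split_text(sample, chunk_size=40):
    print(repr(chunk))
```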

#### Hugging Face Embeddings

In [None]:
!pip install sentence_transformers==2.2.2 faiss-cpu==1.7.4

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings()

In [None]:
from langchain.vectorstores import FAISS

db = FAISS.from_documents(docs, embeddings)

In [None]:
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)

#### LlamaCPP Embeddings

***Note: LlamaCPP model users should skip this, as it might crash the notebook due to limited resources***

In [None]:
!pip install chromadb==0.4.19 tiktoken==0.5.2

In [None]:
from langchain.embeddings import LlamaCppEmbeddings

embeddings = LlamaCppEmbeddings(model_path="/content/mistral-7b-instruct-v0.1.Q4_0.gguf")

In [None]:
from langchain.vectorstores import Chroma

db = Chroma.from_documents(docs, embeddings)

In [None]:
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)

#### OpenAI Embeddings

In [83]:
!pip install sentence_transformers==2.2.2 faiss-cpu==1.7.4 tiktoken==0.5.2

In [84]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

In [None]:
from langchain.vectorstores import FAISS

db = FAISS.from_documents(docs, embeddings)

In [None]:
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)

#### Using Retriever

LangChain supports many different retrieval algorithms, and retrieval is one of the areas where it adds the most value.

In [None]:
retriever = db.as_retriever()
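
To build intuition for what a vector-store retriever does, the sketch below ranks chunks by cosine similarity to the query and stuffs the best match into a prompt. Everything here is a simplifying assumption: `embed` is a toy word-count embedding (real embeddings are dense vectors from a model), and `toy_chunks` is hand-written text, not the speech loaded above.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list, k: int = 1) -> list:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


toy_chunks = [
    "The president nominated a judge to the Supreme Court.",
    "The economy added many new jobs last year.",
    "Infrastructure spending will repair roads and bridges.",
]
question = "Who did the president nominate to the court?"
top = retrieve(question, toy_chunks, k=1)
context = "\n\n".join(top)
print(f"Context:\n{context}\n\nQuestion: {question}")
```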

In [None]:
from langchain_core.runnables import RunnablePassthrough

template = """Answer the question based only on the following context:

{context}

Question: {question}

Let's think step by step.

Answer:
"""

prompt = PromptTemplate.from_template(template)

def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm

)

In [None]:
answer = chain.invoke("Who is Ketanji Brown Jackson?")
print(answer)

In [None]:
import langchain

langchain.debug = True
chain.invoke("What does the speech say about Russia?")
langchain.debug = False

## 5. Evaluation

Building applications with language models involves many moving parts. One of the most critical components is ensuring that the outcomes produced by your models are reliable and useful across a broad array of inputs, and that they work well with your application's other software components.

More info: https://python.langchain.com/docs/guides/evaluation/


### Generating test datasets and evaluating accuracy

In [None]:
!pip install langchain[docarray]==0.0.349 openai==1.3.9 huggingface_hub==0.19.4

In [None]:
import requests

url = "https://raw.githubusercontent.com/Ryota-Kawamura/LangChain-for-LLM-Application-Development/main/OutdoorClothingCatalog_1000.csv"
res = requests.get(url)
with open("OutdoorClothingCatalog_1000.csv", "w") as f:
  f.write(res.text)

In [None]:
from langchain.llms import OpenAI
from langchain.llms import HuggingFaceHub
from langchain.chat_models import ChatOpenAI


os.environ["HUGGINGFACEHUB_API_TOKEN"] = userdata.get('HUGGINGFACEHUB_API_TOKEN')
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

hugging_face_repo_id = "google/flan-t5-xxl"  # See https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads for some other options


openai_llm = OpenAI(temperature=0)
hf_llm = HuggingFaceHub(
    repo_id=hugging_face_repo_id, model_kwargs={"temperature": 0.5, "max_length": 64}
)
chat_model = ChatOpenAI()

In [None]:
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch
from langchain.chains import RetrievalQA

file = "./OutdoorClothingCatalog_1000.csv"
loader = CSVLoader(file_path=file)
data = loader.load()

index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [None]:
print(data[10].page_content)

In [None]:
test_data_examples = [
    {
        "query": "Does the Cozy Comfort Pullover Set have side pockets?",
        "answer": "Yes"
    },
]

# But how can we automate the generation of these question-answer pairs?

In [None]:
from langchain.evaluation.qa import QAGenerateChain

example_gen_chain = QAGenerateChain.from_llm(chat_model)

In [None]:
generated_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)

In [None]:
generated_examples[0]

In [None]:
print(data[0].page_content)

In [None]:
# Define the model under test

qa = RetrievalQA.from_chain_type(
    llm=hf_llm,
    chain_type="stuff",
    retriever=index.vectorstore.as_retriever(),
    verbose=True,
)

In [None]:
qa.run(test_data_examples[0]["query"])

In [None]:
test_data = []
for example in generated_examples:
  test_data.append(example['qa_pairs'])

test_data

In [None]:
predictions = qa.apply(test_data)
predictions

`answer` is the expected answer to the question.

`result` is what the model under test predicts.

Notice that `answer` and `result` are not an exact 1-to-1 match, but they may still say the same thing.

This is why we need **another** LLM to judge whether the expected answer and the predicted result mean the same thing.
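
A quick way to see the problem is to try grading by string comparison. In the sketch below, `exact_match` is a hypothetical naive grader and `predicted_answer` is a made-up prediction written for illustration, not output from the model above.

```python
def exact_match(expected: str, predicted: str) -> bool:
    """Naive grader: compare answers after trimming whitespace and lowercasing."""
    return expected.strip().lower() == predicted.strip().lower()


expected_answer = "Yes"
predicted_answer = "Yes, the Cozy Comfort Pullover Set has side pockets."  # made up

print(exact_match(expected_answer, "yes "))            # same words, different casing
print(exact_match(expected_answer, predicted_answer))  # same meaning, different words
```

The second comparison fails even though both answers agree, which is the gap an LLM-based grader is meant to close.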

In [None]:
from langchain.evaluation.qa import QAEvalChain

eval_chain = QAEvalChain.from_llm(openai_llm)

In [None]:
graded_outputs = eval_chain.evaluate(test_data, predictions)

In [None]:
for i, eg in enumerate(generated_examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    print("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
    print('Verdict: ' + graded_outputs[i]['results'])
    print()

### Bonus: Evaluator

In [None]:
from langchain.evaluation import load_evaluator
from langchain.evaluation import EvaluatorType

evaluator = load_evaluator(EvaluatorType.CRITERIA, criteria="conciseness")

eval_result = evaluator.evaluate_strings(
    prediction="What's 2+2? That's an elementary question. The answer you're looking for is that two and two is four.",
    input="What's 2+2?",
)

print(f'Evaluation value (Y/N): {eval_result["value"]}')
print(f'Evaluation score: {eval_result["score"]}')
print(f'Evaluation reasoning: {eval_result["reasoning"]}')

In [None]:
eval_result = evaluator.evaluate_strings(
    prediction="four.",
    input="What's 2+2?",
)

print(f'Evaluation value (Y/N): {eval_result["value"]}')
print(f'Evaluation score: {eval_result["score"]}')
print(f'Evaluation reasoning: {eval_result["reasoning"]}')

#### Different type of evaluator -- Labeled Criteria

In [None]:
evaluator = load_evaluator("labeled_criteria", criteria="correctness")

# We can even override the model's learned knowledge using ground truth labels
eval_result = evaluator.evaluate_strings(
    input="What is the capital of the US?",
    prediction="Bangkok",
    reference="The capital of the US is Washington D.C.",
)

print(f'Evaluation value (Y/N): {eval_result["value"]}')
print(f'Evaluation score: {eval_result["score"]}')
print(f'Evaluation reasoning: {eval_result["reasoning"]}')

In [None]:
evaluator = load_evaluator("labeled_criteria", criteria="correctness")

# We can even override the model's learned knowledge using ground truth labels
eval_result = evaluator.evaluate_strings(
    input="What is the capital of the US?",
    prediction="Washington D.C.",
    reference="The capital of the US is Washington D.C.",
)

print(f'Evaluation value (Y/N): {eval_result["value"]}')
print(f'Evaluation score: {eval_result["score"]}')
print(f'Evaluation reasoning: {eval_result["reasoning"]}')