# LangGraph and LangSmith - Agentic RAG Powered by LangChain

In the following notebook we'll complete the following tasks:

- 🤝 Breakout Room #1:
  1. Install required libraries
  2. Set Environment Variables
  3. Creating our Tool Belt
  4. Creating Our State
  5. Creating and Compiling A Graph!
  

- 🤝 Breakout Room #2:
  1. Creating an Evaluation Dataset
  2. Adding Evaluators
  3. Evaluating

# 🤝 Breakout Room #1

## LangGraph - Building Cyclic Applications with LangChain

LangGraph is a tool that leverages LangChain Expression Language to build coordinated multi-actor and stateful applications that include cyclic behaviour.

### Why Cycles?

In essence, we can think of a cycle in our graph as a more robust and customizable loop. It allows us to keep our application agent-forward while still giving the powerful functionality of traditional loops.

Due to the inclusion of cycles over loops, we can also compose rather complex flows through our graph in a much more readable and natural fashion, effectively allowing us to recreate application flowcharts in code in an almost 1-to-1 fashion.

### Why LangGraph?

Beyond the agent-forward approach - we can easily compose and combine traditional "DAG" (directed acyclic graph) chains with powerful cyclic behaviour due to the tight integration with LCEL. This means it's a natural extension to LangChain's core offerings!

## Task 1: Dependencies

We'll first install all our required libraries.

In [1]:
!pip install -qU langchain langchain_openai langgraph arxiv duckduckgo-search

## Task 2: Environment Variables

We'll want to set both our OpenAI API key and our LangSmith environment variables.

In [2]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [3]:
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"AIE1 - LangGraph - {uuid4().hex[0:8]}"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key: ")

## Task 3: Creating our Tool Belt

As is usually the case, we'll want to equip our agent with a toolbelt to help answer questions and add external knowledge.

There's a tonne of tools in the [LangChain Community Repo](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools) but we'll stick to a couple just so we can observe the cyclic nature of LangGraph in action!

We'll leverage:

- [Duck Duck Go Web Search](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/ddg_search)
- [Arxiv](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/arxiv)

#### 🏗️ Activity #1:

Please add the tools to use into our toolbelt.

> NOTE: Each tool in our toolbelt should be a LangChain tool object (a `BaseTool` instance), as the two tools below are.

In [1]:
from langchain_community.tools.ddg_search import DuckDuckGoSearchRun
from langchain_community.tools.arxiv.tool import ArxivQueryRun

tool_belt = [
    DuckDuckGoSearchRun(description='A wrapper around DuckDuckGo Search. Useful for when you need to answer questions about current topics. Input should be a search query.'),
    ArxivQueryRun(description='A wrapper around Arxiv Search. Useful for when you need to answer questions of scientific or technical nature. Input should be a search query.'),
]

### Actioning with Tools

Now that we've created our tool belt - we need to create a process that will let us leverage them when we need them.

We'll use the built-in [`ToolExecutor`](https://github.com/langchain-ai/langgraph/blob/main/langgraph/prebuilt/tool_executor.py) to do so.

In [28]:
from langgraph.prebuilt import ToolExecutor

tool_executor = ToolExecutor(tool_belt)

### Model

Now we can set up our model! We'll leverage the familiar OpenAI model suite for this example, but OpenAI is not *required* by LangGraph. LangGraph works with any model, though smaller models may struggle with agentic tool use, so the LangGraph team recommends sticking with:

- OpenAI's GPT-3.5 and GPT-4
- Anthropic's Claude
- Google's Gemini

> NOTE: Because we're leveraging the OpenAI function calling API, we'll need to use OpenAI *for this specific example* (or any other service that exposes an OpenAI-style function calling API).

In [29]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(temperature=0)

Now that we have our model set-up, let's "put on the tool belt", which is to say: We'll bind our LangChain formatted tools to the model in an OpenAI function calling format.

In [30]:
from langchain_core.utils.function_calling import convert_to_openai_function

functions = [convert_to_openai_function(t) for t in tool_belt]
model = model.bind_functions(functions)
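For intuition, here is a plain-Python sketch of roughly what the model receives after binding: one OpenAI-style function definition per tool, made of a name, a description, and a JSON schema for the arguments. The helper `tool_to_function_schema` is hypothetical, and the exact strings LangChain's `convert_to_openai_function` emits may differ; this assumes each tool takes a single string argument named `query`.

```python
import json

def tool_to_function_schema(name: str, description: str) -> dict:
    """Hypothetical helper: approximate OpenAI function definition for a
    tool that takes a single string argument called `query`."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {
                # Argument description text is an assumption for this sketch
                "query": {"type": "string", "description": "search query to look up"},
            },
            "required": ["query"],
        },
    }

# Named example_functions so it does not overwrite the notebook's `functions`
example_functions = [
    tool_to_function_schema(
        "duckduckgo_search",
        "Useful for when you need to answer questions about current topics.",
    ),
    tool_to_function_schema(
        "arxiv",
        "Useful for when you need to answer questions of scientific or technical nature.",
    ),
]

# The model picks a tool using only this name/description/schema text
print(json.dumps(example_functions[0], indent=2))
```

Since the descriptions are the main signal the model has, this is also why we overrode them with more specific text in the tool belt.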

#### ❓ Question #1:

How does the model determine which tool to use?

The model itself decides. When we call `bind_functions`, each tool is converted (via `convert_to_openai_function`) into a function definition containing its name, its description, and a JSON schema for its arguments, and these definitions are sent to OpenAI with every request. Given the conversation so far, the LLM chooses whether to answer directly or to emit a `function_call` naming one of those functions together with JSON-encoded arguments.

That choice is driven by how well the query matches each tool's name and description, which is why the descriptions we passed to `DuckDuckGoSearchRun` and `ArxivQueryRun` matter. LangChain does not add keyword detection or pattern matching on top of this; tool selection happens inside the model's function-calling behaviour, so clear and clearly distinct tool descriptions generally lead to better choices.

LangGraph plays a different role. The conditional edge does not pick a tool: it only checks whether the model requested one and routes the state to the action node (or to END). The action node then runs whichever tool the model named, and the cycle lets the model make further tool choices based on the results it now sees in the state.

## Putting the State in Stateful

Earlier we used this phrasing:

`coordinated multi-actor and stateful applications`

So what does that "stateful" mean?

To put it simply - we want to have some kind of object which we can pass around our application that holds information about what the current situation (state) is. Since our system will be constructed of many parts moving in a coordinated fashion - we want to be able to ensure we have some commonly understood idea of that state.

LangGraph leverages a `StateGraph`, which uses an `AgentState` object to pass information between the various nodes of the graph.

There are more options than what we'll see below, but this `AgentState` is a `TypedDict` with a single key, `messages`, whose value is a `Sequence` of `BaseMessage` objects. Because the key is annotated with `operator.add`, any messages a node returns are appended to the existing list rather than replacing it.

Let's think about a simple example to help understand exactly what this means (we'll simplify a great deal to try and clearly communicate what state is doing):

1. We initialize our state object:
  - `{"messages" : []}`
2. Our user submits a query to our application.
  - New State: `HumanMessage(#1)`
  - `{"messages" : [HumanMessage(#1)]}`
3. We pass our state object to an Agent node which is able to read the current state. It will use the last `HumanMessage` as input. It gets some kind of output which it will add to the state.
  - New State: `AgentMessage(#1, additional_kwargs={"function_call" : "WebSearchTool"})`
  - `{"messages" : [HumanMessage(#1), AgentMessage(#1, ...)]}`
4. We pass our state object to a "conditional edge" (more on this later), which reads the last message in the state to determine whether we need to use a tool. It can do this reliably because every node shares the same state object!

In [31]:
from typing import TypedDict, Annotated, Sequence
import operator
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
  messages: Annotated[Sequence[BaseMessage], operator.add]
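To see what the `operator.add` annotation buys us, here is a plain-Python sketch of the merge step: when a node returns a partial update, the reducer stored in the key's `Annotated` metadata combines the old and new values, so the message lists are concatenated. The `apply_update` helper and `SketchState` class are hypothetical stand-ins (not LangGraph internals), and plain strings stand in for `BaseMessage` objects.

```python
import operator
from typing import Annotated, Sequence, TypedDict, get_type_hints

class SketchState(TypedDict):
    messages: Annotated[Sequence[str], operator.add]

def apply_update(state: dict, update: dict) -> dict:
    """Hypothetical merge: apply each key's reducer if it has one, else overwrite."""
    hints = get_type_hints(SketchState, include_extras=True)
    new_state = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        new_state[key] = reducer(state[key], value) if reducer else value
    return new_state

# Mirrors steps 1-3 of the walkthrough above
sketch_state = {"messages": []}
sketch_state = apply_update(sketch_state, {"messages": ["HumanMessage(#1)"]})
sketch_state = apply_update(sketch_state, {"messages": ["AIMessage(#1, function_call=...)"]})
print(sketch_state["messages"])
```

Each node only has to return its *new* messages; the annotation takes care of accumulating the history.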

## It's Graphing Time!

Now that we have state, and we have tools, and we have an LLM - we can finally start making our graph!

Let's take a second to refresh ourselves about what a graph is in this context.

Graphs, also called networks in some circles, are a collection of connected objects.

The objects in question are typically called nodes, or vertices, and the connections are called edges.

Let's look at a simple graph.

![image](https://i.imgur.com/2NFLnIc.png)

Here, we're using the coloured circles to represent the nodes and the yellow lines to represent the edges. In this case, we're looking at a fully connected graph, where each node is connected by an edge to every other node.

If we were to think about nodes in the context of LangGraph - we would think of a function, or an LCEL runnable.

If we were to think about edges in the context of LangGraph - we might think of them as "paths to take" or "where to pass our state object next".

Let's create some nodes and expand on our diagram.

> NOTE: Due to the tight integration with LCEL - we can comfortably create our nodes in an async fashion!

In [32]:
from langgraph.prebuilt import ToolInvocation
import json
from langchain_core.messages import FunctionMessage

def call_model(state):
  messages = state["messages"]
  response = model.invoke(messages)
  return {"messages" : [response]}

def call_tool(state):
  last_message = state["messages"][-1]

  action = ToolInvocation(
      tool=last_message.additional_kwargs["function_call"]["name"],
      tool_input=json.loads(
          last_message.additional_kwargs["function_call"]["arguments"]
      )
  )

  response = tool_executor.invoke(action)

  function_message = FunctionMessage(content=str(response), name=action.tool)

  return {"messages" : [function_message]}
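`call_tool` depends on the shape of the `function_call` entry in the model's `additional_kwargs`: a tool `name` plus an `arguments` field that arrives as a JSON *string*, which is why `json.loads` is needed. The sketch below reproduces just that parsing on a plain dict shaped like the `AIMessage` in the first example output; `parse_function_call` is a hypothetical helper, not part of LangGraph.

```python
import json

def parse_function_call(additional_kwargs: dict) -> tuple[str, dict]:
    """Hypothetical helper: return (tool name, decoded arguments) from an
    OpenAI-style function_call entry."""
    call = additional_kwargs["function_call"]
    # `arguments` is a JSON-encoded string, not a dict
    return call["name"], json.loads(call["arguments"])

additional_kwargs = {
    "function_call": {
        "name": "duckduckgo_search",
        "arguments": '{"query":"RAG in the context of Large Language Models"}',
    }
}

tool_name, tool_input = parse_function_call(additional_kwargs)
print(tool_name, tool_input)
```

Since the arguments are generated by the model, you may want to catch `json.JSONDecodeError` in a production version of `call_tool`.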

We now have two nodes:

- `call_model` is a node that will...well...call the model
- `call_tool` is a node which will call a tool

Let's start adding nodes! We'll update our diagram along the way to keep track of what this looks like!


In [33]:
from langgraph.graph import StateGraph, END

workflow = StateGraph(AgentState)

workflow.add_node("agent", call_model)
workflow.add_node("action", call_tool)

Let's look at what we have so far:

![image](https://i.imgur.com/md7inqG.png)

Next, we'll add our entrypoint. All our entrypoint does is indicate which node is called first.

In [34]:
workflow.set_entry_point("agent")

![image](https://i.imgur.com/wNixpJe.png)

Now we want to build a "conditional edge" which will use the output state of a node to determine which path to follow.

We can help conceptualize this by thinking of our conditional edge as a conditional in a flowchart!

Notice how our function simply checks if there is a "function_call" kwarg present.

Then we create an edge where the origin node is our agent node and our destination node is *either* the action node or the END (finish the graph).

It's important to highlight that the dictionary passed in as the third parameter (the mapping) should be created with the possible outputs of our conditional function in mind. In this case `should_continue` outputs either `"end"` or `"continue"` which are subsequently mapped to the action node or the END node.

In [35]:
def should_continue(state):
  last_message = state["messages"][-1]

  if "function_call" not in last_message.additional_kwargs:
    return "end"

  return "continue"

workflow.add_conditional_edges(
    "agent",
    should_continue,
    {
        "continue" : "action",
        "end" : END
    }
)
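Stripped of LangGraph, a conditional edge amounts to "call the router function, then look its return value up in the mapping". The plain-Python sketch below mimics that lookup with dicts standing in for messages; `should_continue_sketch`, `route_sketch`, and the string `END_NODE` (a stand-in for LangGraph's `END`) are hypothetical names chosen so they don't shadow the real objects above.

```python
END_NODE = "__end__"  # stand-in for langgraph.graph.END in this sketch

def should_continue_sketch(state: dict) -> str:
    last_message = state["messages"][-1]
    if "function_call" not in last_message["additional_kwargs"]:
        return "end"
    return "continue"

# Same shape as the mapping passed to add_conditional_edges
edge_mapping = {"continue": "action", "end": END_NODE}

def route_sketch(state: dict) -> str:
    """Return the name of the next node, as the conditional edge would."""
    return edge_mapping[should_continue_sketch(state)]

tool_request = {"messages": [{"additional_kwargs": {"function_call": {"name": "arxiv"}}}]}
final_answer = {"messages": [{"additional_kwargs": {}}]}
print(route_sketch(tool_request))
print(route_sketch(final_answer))
```

This is also why every value the router can return must appear as a key in the mapping: an unmapped return value would have nowhere to go.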

Let's visualize what this looks like.

![image](https://i.imgur.com/8ZNwKI5.png)

Finally, we can add our last edge which will connect our action node to our agent node. This is because we *always* want our action node (which is used to call our tools) to return its output to our agent!

In [36]:
workflow.add_edge("action", "agent")

Let's look at the final visualization.

![image](https://i.imgur.com/NWO7usO.png)

All that's left to do now is to compile our workflow - and we're off!

In [37]:
app = workflow.compile()

#### ❓ Question #2:

- Is there any specific limit to how many times we can cycle?

LangGraph does cap execution by default, though it counts graph steps rather than cycles: every run is subject to a `recursion_limit` (25 steps by default at the time of writing), and a run that hits it without reaching END raises a `GraphRecursionError`. Below that cap, the number of cycles is decided by the graph's own logic (here, the agent keeps looping for as long as the model requests tools) and by practical constraints such as latency, API cost, and rate limits.

- How could we impose a limit to the number of cycles?

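The simplest option is LangGraph's `recursion_limit` setting, passed in the config when invoking the graph, for example `app.invoke(inputs, {"recursion_limit": 10})`; the run raises an error once that many steps are exceeded. For finer control over cycles specifically, we can introduce a stateful mechanism that tracks how many iterations the workflow has completed. Here are a few approaches: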

- State Variable: Incorporate a state variable into the workflow that counts the number of cycles. At each iteration, the variable is incremented. Conditional logic can then be used to check if the count has reached a predefined maximum, at which point the cycle can be terminated or redirected to a concluding node.

- Conditional Edges: Use conditional edges within LangGraph to evaluate the current cycle count against a threshold. If the count exceeds the limit, the workflow can be directed towards an exit path or a node designed to handle such situations.

- Limit in the Application Logic: Implement the cycle limit directly in the application logic that orchestrates the workflow. This approach allows for more flexibility in managing how the limit is enforced and can incorporate additional logic based on the outcome of reaching the limit.

- Time-based Limit: Instead of, or in addition to, a count-based limit, a time-based limit could be set to ensure that the workflow does not exceed a certain duration. This approach might be useful in scenarios where execution time is a critical factor.

- Evaluation Control: Add a node or check that evaluates whether the current response already meets the required evaluation criteria and, if it does, halts the workflow.

By implementing such mechanisms, we can control the execution flow of cyclic applications within LangGraph, ensuring that workflows complete in a timely manner and do not consume excessive resources.
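The state-variable and conditional-edge approaches can be combined, as in the plain-Python sketch below. A scripted stand-in for the model always requests another tool, the action step increments a hypothetical `loop_count` field in the state, and the router ends the run once `MAX_LOOPS` is reached. `scripted_model`, `run_graph`, and the other names are hypothetical; in real LangGraph you would add the counter to your state `TypedDict` and perform the check inside your conditional-edge function. For simplicity, the nodes here return the whole state instead of partial updates.

```python
MAX_LOOPS = 3  # hypothetical cap on agent -> action cycles

def scripted_model(state: dict) -> dict:
    """Stand-in for the LLM: always asks for another tool call."""
    return {"role": "ai", "function_call": {"name": "duckduckgo_search"}}

def call_model_node(state: dict) -> dict:
    return {**state, "messages": state["messages"] + [scripted_model(state)]}

def call_tool_node(state: dict) -> dict:
    result = {"role": "function", "content": "search results"}
    return {"messages": state["messages"] + [result], "loop_count": state["loop_count"] + 1}

def route_with_limit(state: dict) -> str:
    last = state["messages"][-1]
    if "function_call" not in last:
        return "end"
    if state["loop_count"] >= MAX_LOOPS:
        return "end"  # cap reached: stop even though the model wants a tool
    return "continue"

def run_graph(question: str) -> dict:
    state = {"messages": [{"role": "human", "content": question}], "loop_count": 0}
    while True:
        state = call_model_node(state)        # "agent" node
        if route_with_limit(state) == "end":  # conditional edge
            return state
        state = call_tool_node(state)         # "action" node, then back to agent

final_state = run_graph("What is QLoRA?")
print(final_state["loop_count"], len(final_state["messages"]))
```

Note that this run ends on an unanswered tool request; in practice you would probably route to a node that asks the model for a best-effort final answer instead of ending immediately.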

## Using Our Graph

Now that we've created and compiled our graph - we can call it *just as we'd call any other* `Runnable`!

Let's try out a few examples to see how it fares:

In [38]:
from langchain_core.messages import HumanMessage

inputs = {"messages" : [HumanMessage(content="What is RAG in the context of Large Language Models? When did it break onto the scene?")]}

app.invoke(inputs)

{'messages': [HumanMessage(content='What is RAG in the context of Large Language Models? When did it break onto the scene?'),
  AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"query":"RAG in the context of Large Language Models"}', 'name': 'duckduckgo_search'}}, response_metadata={'finish_reason': 'function_call', 'logprobs': None}),
  FunctionMessage(content='Key Takeaways. RAG is a relatively new artificial intelligence technique that can improve the quality of generative AI by allowing large language model (LLMs) to tap additional data resources without retraining. RAG models build knowledge repositories based on the organization\'s own data, and the repositories can be continually updated to ... The beauty of RAG lies in its ability to enable a language model to draw upon and leverage your own data to generate responses. While base models are traditionally trained on specific, point-in-time data, ensuring their effectiveness in performing tasks and adapti

Let's look at what happened:

1. Our state object was populated with our request
2. The state object was passed into our entry point (agent node) and the agent node added an `AIMessage` to the state object and passed it along the conditional edge
3. The conditional edge received the state object, found the "function_call" `additional_kwarg`, and sent the state object to the action node
4. The action node ran the requested tool (here, DuckDuckGo search), added its output to the state object as a `FunctionMessage`, and passed it along the edge to the agent node
5. The agent node added a response to the state object and passed it along the conditional edge
6. The conditional edge received the state object, could not find the "function_call" `additional_kwarg` and passed the state object to END where we see it output in the cell above!

Now let's look at an example that involves multiple tool calls - all with the same flow!

In [39]:
inputs = {"messages" : [HumanMessage(content="What is QLoRA in Machine Learning? Are their any papers that could help me understand? Once you have that information, can you look up the bio of the first author on the QLoRA paper?")]}

app.invoke(inputs)

{'messages': [HumanMessage(content='What is QLoRA in Machine Learning? Are their any papers that could help me understand? Once you have that information, can you look up the bio of the first author on the QLoRA paper?'),
  AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"query":"QLoRA in Machine Learning"}', 'name': 'duckduckgo_search'}}, response_metadata={'finish_reason': 'function_call', 'logprobs': None}),
  FunctionMessage(content="Balancing this tradeoff is a key contribution of QLoRA. QLoRA. QLoRA (or Quantized Low-Rank Adaptation) combines 4 ingredients to get the most out of a machine's limited memory without sacrificing model performance. I will briefly summarize key points from each. More details are available in the QLoRA paper [4]. Ingredient 1: 4-bit NormalFloat As the world of machine learning evolves, tools like HuggingFace and the open-source LLMs movement have significantly simplified the… · 12 min read · Jan 18, 2024 1 Our results show that

#### 🏗️ Activity #2:

Please write out the steps the agent took to arrive at the correct answer.

1. Receiving the Input: The question was wrapped in a `HumanMessage` and placed in the state's `messages` list.

2. Agent Node: Because the entry point is fixed to `"agent"`, the state went straight to `call_model`. The model read the question and, as the output above shows, responded with a `function_call` to `duckduckgo_search` with the query `"QLoRA in Machine Learning"`.

3. Conditional Edge: `should_continue` found the `function_call` and returned `"continue"`, routing the state to the action node.

4. Action Node: `call_tool` parsed the tool name and arguments, ran the search through the `ToolExecutor`, and appended the results to the state as a `FunctionMessage`.

5. Back to the Agent: The fixed edge from `action` to `agent` returned the state to the model, which now saw the search results alongside the original question.

6. Further Cycles: The question also asks for relevant papers and for the first author's bio, so the model could request additional tool calls (for example, the Arxiv tool for papers). Each request repeated steps 3-5 and added another `FunctionMessage` to the state.

7. Final Answer: Once the model had enough information, it replied without a `function_call`; `should_continue` returned `"end"`, and the graph finished with that `AIMessage` as the last message in the state.


### Pre-processing for LangSmith

To do a little bit more preprocessing, let's wrap our LangGraph agent in a simple chain.

In [40]:
def convert_inputs(input_object):
  return {"messages" : [HumanMessage(content=input_object["question"])]}

def parse_output(input_state):
  return input_state["messages"][-1].content

agent_chain = convert_inputs | app | parse_output
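Neither `convert_inputs` nor `parse_output` is a `Runnable`, but the pipe works because LCEL coerces plain functions into runnables when the other side of `|` already is one (here, `app`). Conceptually the chain is ordinary left-to-right function composition, as this plain-Python sketch shows; `compose`, `fake_app`, and the `_sketch` functions are hypothetical stand-ins, not LangChain APIs, and plain strings stand in for messages.

```python
from functools import reduce

def compose(*steps):
    """Feed each step's output into the next, left to right (like `a | b | c`)."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)

def convert_inputs_sketch(input_object: dict) -> dict:
    return {"messages": [input_object["question"]]}

def fake_app(state: dict) -> dict:
    # Stand-in for the compiled graph: appends a canned answer to the messages
    return {"messages": state["messages"] + ["RAG stands for Retrieval-augmented generation."]}

def parse_output_sketch(state: dict) -> str:
    return state["messages"][-1]

agent_chain_sketch = compose(convert_inputs_sketch, fake_app, parse_output_sketch)
print(agent_chain_sketch({"question": "What is RAG?"}))
```

Wrapping the graph this way lets LangSmith feed it `{"question": ...}` dicts from the dataset and receive a plain string back for the evaluators.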

In [41]:
agent_chain.invoke({"question" : "What is RAG?"})

"RAG stands for Retrieval-augmented generation (RAG). It is an AI framework that enhances generative AI models with facts from external sources to improve the quality of generated responses. RAG combines retrieval-based and generative models to enhance AI systems' understanding and generation of human-like text."

# 🤝 Breakout Room #2

## Task 1: Creating An Evaluation Dataset

Just as we saw last week, we'll want to create a dataset to test our Agent's ability to answer questions.

In order to do this - we'll want to provide some questions and some answers. Let's look at how we can create such a dataset below.

```python
questions = [
    "What optimizer is used in QLoRA?",
    "What data type was created in the QLoRA paper?",
    "What is a Retrieval Augmented Generation system?",
    "Who authored the QLoRA paper?",
    "What is the most popular deep learning framework?",
    "What significant improvements does the LoRA system make?"
]

answers = [
    {"must_mention" : ["paged", "optimizer"]},
    {"must_mention" : ["NF4", "NormalFloat"]},
    {"must_mention" : ["ground", "context"]},
    {"must_mention" : ["Tim", "Dettmers"]},
    {"must_mention" : ["PyTorch", "TensorFlow"]},
    {"must_mention" : ["reduce", "parameters"]},
]
```

#### 🏗️ Activity #3:

Please create a dataset in the above format with at least 5 questions.

In [42]:
questions = [
    "What is the capital of France?",
    "Who is the author of 'To Kill a Mockingbird'?",
    "What is the chemical symbol for gold?",
    "What is the largest planet in our solar system?",
    "Who painted the Mona Lisa?"
]

answers = [
    {"must_mention": ["Paris"]},
    {"must_mention": ["Harper Lee"]},
    {"must_mention": ["Au"]},
    {"must_mention": ["Jupiter"]},
    {"must_mention": ["Leonardo da Vinci"]}
]

Now we can add our dataset to our LangSmith project using the following code which we saw last Thursday!

In [43]:
from langsmith import Client

client = Client()
dataset_name = f"Retrieval Augmented Generation - Evaluation Dataset - {uuid4().hex[0:8]}"

dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Questions about the QLoRA Paper to Evaluate RAG over the same paper."
)

client.create_examples(
    inputs=[{"question" : q} for q in questions],
    outputs=answers,
    dataset_id=dataset.id,
)

#### ❓ Question #3:

How are the correct answers associated with the questions?

> NOTE: Feel free to indicate if this is problematic or not

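In this notebook the association is positional: `client.create_examples` pairs `inputs[i]` with `outputs[i]`, so each question is linked to whichever `must_mention` dictionary sits at the same index in `answers`. This is simple but fragile, since reordering one list without the other mislabels examples without raising an error, and a `must_mention` list is only a partial stand-in for a full reference answer. More generally, here is how correct answers are typically associated with questions when building evaluation datasets, and where that can go wrong: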

- Association Process

Manual Annotation: In many cases, the correct answers are associated with questions through a process of manual annotation. Subject matter experts or annotators review the questions and provide the correct answers based on their knowledge or research. This method ensures high-quality and accurate associations but is time-consuming and labor-intensive.

Automated Techniques: Automated methods might be used to associate answers with questions, especially in large datasets. This can involve using existing models to predict answers or extracting answers from source texts based on question context. While faster, these methods can introduce errors or biases based on the underlying models or data sources.

Crowdsourcing: Crowdsourcing platforms allow for the distribution of annotation tasks to a large number of people, potentially speeding up the process and diversifying the pool of knowledge. However, this approach requires careful quality control to ensure the reliability of the associations.

- Potential Problems

Accuracy and Reliability: Ensuring the correctness of the associated answers is a significant challenge. Incorrect associations can lead to misleading evaluations of models trained or tested on the dataset.

Bias: The process of associating answers with questions can introduce bias, particularly if the data sources or annotators have specific perspectives or if automated methods rely on biased datasets.

Scalability: Manually associating answers with questions is not easily scalable to very large datasets commonly used in training and evaluating AI models.

Ambiguity: Some questions might have multiple correct answers, or their answers might depend on context not captured within the dataset. Handling such ambiguities effectively is crucial for the utility of the dataset.

For this notebook's small dataset, the answers were written manually, so the main risks are accuracy and ambiguity rather than scalability. Addressing such problems generally involves robust quality control, diverse data and annotators, and possibly hybrid approaches that balance accuracy, bias, and scalability.
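Because `create_examples` pairs the two lists by index, one defensive pattern is to keep each question right next to its answer and derive both lists from that single source, checking their lengths before upload. The plain-Python sketch below restructures the Activity #3 data this way; `qa_pairs`, `example_inputs`, and `example_outputs` are hypothetical names, not something LangSmith requires.

```python
qa_pairs = [
    ("What is the capital of France?", ["Paris"]),
    ("Who is the author of 'To Kill a Mockingbird'?", ["Harper Lee"]),
    ("What is the chemical symbol for gold?", ["Au"]),
    ("What is the largest planet in our solar system?", ["Jupiter"]),
    ("Who painted the Mona Lisa?", ["Leonardo da Vinci"]),
]

# Derive both lists from one source so index i always refers to the same example
example_inputs = [{"question": q} for q, _ in qa_pairs]
example_outputs = [{"must_mention": phrases} for _, phrases in qa_pairs]

assert len(example_inputs) == len(example_outputs), "questions and answers are misaligned"
```

These lists could then be passed as `inputs=example_inputs, outputs=example_outputs` to `client.create_examples`.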

## Task 2: Adding Evaluators

Now we can add a custom evaluator to see if our responses contain the expected information.

We'll be using a fairly naive substring-match process to determine if our response contains specific strings.

In [44]:
from langsmith.evaluation import EvaluationResult, run_evaluator

@run_evaluator
def must_mention(run, example) -> EvaluationResult:
    prediction = run.outputs.get("output") or ""
    required = example.outputs.get("must_mention") or []
    score = all(phrase in prediction for phrase in required)
    return EvaluationResult(key="must_mention", score=score)

#### ❓ Question #4:

What are some ways you could improve this metric as-is?

> NOTE: Alternatively you can suggest where gaps exist in this method.

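This particular metric has several concrete gaps: the check is case-sensitive, it uses raw substring matching (so `"Au"` would also match inside `"August"`), it ignores synonyms and paraphrases, and it returns all-or-nothing rather than partial credit. Beyond fixing those, a metric can be improved by making it reflect the performance, fairness, or reliability of the system it measures more accurately. Some broader strategies: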

- Enhancing Accuracy and Reliability

Increase Data Diversity: Ensure the evaluation dataset covers a wide range of scenarios, languages, dialects, and domains. A diverse dataset helps in assessing the model's performance across different conditions.

Incorporate Domain Expertise: Engaging domain experts in the development or refinement of metrics can provide insights that improve the metric's relevance and accuracy for specific applications.

Refine Ground Truth: Improve the quality of the ground truth data used for evaluation. This could involve more rigorous annotation processes or revising existing annotations for clarity and correctness.

- Addressing Bias

Bias Detection and Correction: Implement methods to detect and correct bias in both the evaluation data and the metric itself. This might involve analyzing the metric's performance across various demographic groups or scenarios.

Expand Evaluation Criteria: Introduce additional criteria or sub-metrics that specifically measure bias or fairness, providing a more holistic view of the model's performance.

- Improving Interpretability

Enhance Transparency: Make the metric more understandable by clearly explaining its components and how it calculates scores. Transparent metrics are more easily scrutinized and refined.

User-Centric Design: Consider the end-users of the metric (e.g., researchers, developers, or policymakers) and adapt the metric to be more aligned with their needs and understanding.

- Identifying Gaps

Conduct Gap Analysis: Regularly review the metric's performance and its alignment with the desired outcomes. Identify areas where the metric fails to capture important aspects of performance or where it might produce misleading results.

Engage with the Community: Solicit feedback from a broad range of stakeholders, including users affected by the model's decisions. Community input can highlight overlooked aspects or potential improvements.

- Alternative Suggestions

Multi-Metric Evaluation: Instead of relying on a single metric, use a combination of metrics that together provide a comprehensive evaluation of the model's performance.

Dynamic Metrics: Develop metrics that can adapt over time to reflect changing societal norms, technology advancements, or shifts in the model's application context.

Improving a metric or addressing gaps in its method involves a continuous process of assessment, feedback, and refinement. By applying these strategies, the effectiveness and fairness of AI systems can be better evaluated, leading to more reliable and equitable outcomes.
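As one concrete improvement to the metric itself, the sketch below ignores case, matches whole words instead of raw substrings, and returns partial credit instead of all-or-nothing. `must_mention_score` is a hypothetical plain function; to use it in LangSmith you could call it from inside a `@run_evaluator` function like `must_mention` above and pass its result as the `score`.

```python
import re

def must_mention_score(prediction: str, required: list[str]) -> float:
    """Fraction of required phrases found as whole words, ignoring case."""
    if not required:
        return 1.0
    hits = 0
    for phrase in required:
        # \b anchors stop "Au" from matching inside words such as "August"
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, prediction, flags=re.IGNORECASE):
            hits += 1
    return hits / len(required)

print(must_mention_score("The chemical symbol for gold is Au.", ["Au"]))              # 1.0
print(must_mention_score("It was discovered in August.", ["Au"]))                     # 0.0
print(must_mention_score("QLoRA uses paged optimizers.", ["paged", "optimizer"]))     # 0.5
```

The last example shows the trade-off: whole-word matching also rejects plurals such as "optimizers", so in practice you might accept a list of variants per phrase or fall back to an LLM-based check.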

Now that we have created our custom evaluator - let's initialize our `RunEvalConfig` with it, and a few others:

- `"criteria"` adds the default criteria evaluator, which in this case means "helpfulness"
- `"cot_qa"` uses a Chain of Thought prompt to judge whether the response is correct, using the example's reference outputs as context.

In [45]:
from langchain.smith import RunEvalConfig, run_on_dataset

eval_config = RunEvalConfig(
    custom_evaluators=[must_mention],
    evaluators=[
        "criteria",
        "cot_qa",
    ],
)

## Task 3: Evaluating

All that is left to do is evaluate our agent's responses!

In [46]:
client.run_on_dataset(
    dataset_name=dataset_name,
    llm_or_chain_factory=agent_chain,
    evaluation=eval_config,
    verbose=True,
    project_name=f"RAG Pipeline - Evaluation - {uuid4().hex[0:8]}",
    project_metadata={"version": "1.0.0"},
)

View the evaluation results for project 'RAG Pipeline - Evaluation - aea38f41' at:
https://smith.langchain.com/o/2f142be8-b860-5498-aaf0-55a67fe7e836/datasets/cebf9262-180b-4867-a341-bd77686964fa/compare?selectedSessions=6b9aea70-c94d-48df-8619-d9bf3679f4d0

View all tests for Dataset Retrieval Augmented Generation - Evaluation Dataset - 5e646c3d at:
https://smith.langchain.com/o/2f142be8-b860-5498-aaf0-55a67fe7e836/datasets/cebf9262-180b-4867-a341-bd77686964fa

| | feedback.helpfulness | feedback.COT Contextual Accuracy | feedback.must_mention | error | execution_time | run_id |
|---|---|---|---|---|---|---|
| count | 5.0 | 5.0 | 5 | 0.0 | 5.0 | 5 |
| unique | | | 1 | 0.0 | | 5 |
| top | | | True | | | b600b3f6-8bd7-439a-ad4e-fd0f0ac88187 |
| freq | | | 5 | | | 1 |
| mean | 1.0 | 1.0 | | | 1.175753 | |
| std | 0.0 | 0.0 | | | 0.335969 | |
| min | 1.0 | 1.0 | | | 0.867373 | |
| 25% | 1.0 | 1.0 | | | 0.929941 | |
| 50% | 1.0 | 1.0 | | | 1.001336 | |
| 75% | 1.0 | 1.0 | | | 1.532214 | |


{'project_name': 'RAG Pipeline - Evaluation - aea38f41',
 'results': {'35abdb83-d9c4-4ba7-88ae-9a7a4c6b35d4': {'input': {'question': 'What is the chemical symbol for gold?'},
   'feedback': [EvaluationResult(key='helpfulness', score=1, value='Y', comment='The criterion for this task is "helpfulness". \n\nThe submission provides the correct answer to the question asked in the input. The question asks for the chemical symbol for gold, and the submission correctly states that the chemical symbol for gold is Au. \n\nThe submission is helpful because it provides the information asked for in the input. It is insightful because it provides the correct scientific information. It is appropriate because it directly answers the question without adding unnecessary or irrelevant information. \n\nTherefore, the submission meets the criterion. \n\nY', correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('d17eca97-9933-4e18-9e12-c319f2ae51c0'))}, source_run_id=None, target_run_id=None),
    E