# API Inference Example with Claude 3.5 / OpenAI

This Jupyter notebook demonstrates the capabilities of the latest Claude 3.5 model through the Anthropic and OpenAI APIs. It showcases the performance of one of the largest language models available via API. For simple tasks such as classification we use GPT-4o-mini, a cheap and fast model that is more than capable enough for that work.

### Key Features:
1. **Advanced Output**: Compared to open-source alternatives, you'll notice more refined formatting and nuanced responses.
2. **LangSmith Integration**: We've included trace links from LangSmith, allowing you to examine the step-by-step process of the model's reasoning.

### What to Expect:
- Insights into the model's decision-making process
- A comparison point for evaluating other language models


In [1]:
# Load packages from the src directory
import sys
from IPython.display import Markdown, display, Image
sys.path.append('../src')

from langchain_core.messages import HumanMessage
from vectrix_graphs import default_flow

# Display the graph
display(Image(default_flow.get_graph().draw_mermaid_png()))



<IPython.core.display.Image object>

In [3]:
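# Ask the question (named `messages` to avoid shadowing the built-in `input`)
messages = [HumanMessage(content="How does self-attention work?")]

# Run the graph and render the final message as Markdown
response = await default_flow.ainvoke({"messages": messages})
display(Markdown(response['messages'][-1].content))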



Based on the provided sources, here's an explanation of how self-attention works:

Self-attention, also known as intra-attention, is an attention mechanism that relates different positions within a single sequence to compute a representation of that sequence¹. In the context of the Transformer model, self-attention operates in the following way:

In the encoder's self-attention layers, all of the keys, values, and queries originate from the same place - specifically, from the output of the previous layer in the encoder. This allows each position in the encoder to attend to all positions in the previous layer². 

One notable advantage of self-attention is that it can potentially create more interpretable models. Research has shown that individual attention heads learn to perform different tasks, and many demonstrate behavior that relates to both the syntactic and semantic structure of sentences³.

Self-attention has proven successful in various applications, including:
- Reading comprehension
- Abstractive summarization
- Textual entailment
- Learning task-independent sentence representations¹

References:
1. "Self-attention, sometimes called intra-attention is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence. Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations"
2. "The encoder contains self-attention layers. In a self-attention layer all of the keys, values and queries come from the same place, in this case, the output of the previous layer in the encoder. Each position in the encoder can attend to all positions in the previous layer of the encoder."
3. "As side benefit, self-attention could yield more interpretable models. We inspect attention distributions from our models and present and discuss examples in the appendix. Not only do individual attention heads clearly learn to perform different tasks, many appear to exhibit behavior related to the syntactic and semantic structure of the sentences."
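
The answer above describes self-attention in words. As a rough illustration, the following sketch computes single-head scaled dot-product self-attention, softmax(QKᵀ/√d_k)·V, in plain Python. The projection matrices `w_q`, `w_k`, and `w_v`, the toy dimensions, and the random inputs are illustrative assumptions and are not part of this notebook or its sources; a practical implementation would use a tensor library and several heads.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x has shape (seq_len, d_model). Queries, keys, and values are all
    projections of the same input x, which is what makes this self-attention.
    Returns the attended output and the attention weights.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)  # one row of scores per query position
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Toy initialisation; real models learn these values during training."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Assumed toy sizes: 3 positions, model width 4, key/value width 2
rng = random.Random(0)
seq_len, d_model, d_k = 3, 4, 2
x = random_matrix(seq_len, d_model, rng)
w_q = random_matrix(d_model, d_k, rng)
w_k = random_matrix(d_model, d_k, rng)
w_v = random_matrix(d_model, d_k, rng)

output, weights = self_attention(x, w_q, w_k, w_v)
for position, row in enumerate(weights):
    print(position, [round(w, 3) for w in row])
```

Each printed row shows how strongly one position attends to every position in the sequence, and each row sums to 1 because of the softmax.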

In [4]:
# Ask a question the indexed sources do not cover
messages = [HumanMessage(content="How tall is the Eiffel Tower?")]

# Run the graph and render the final message as Markdown
response = await default_flow.ainvoke({"messages": messages})
display(Markdown(response['messages'][-1].content))

Based on the provided sources, I cannot answer the question "How tall is the Eiffel Tower?" The given sources contain information about machine translation models, training data, and model variations. There is no information about the Eiffel Tower or its height in any of the provided source materials.

In [5]:
# Send a greeting instead of a question
messages = [HumanMessage(content="Hello, how are you?")]

# Run the graph and render the final message as Markdown
response = await default_flow.ainvoke({"messages": messages})
display(Markdown(response['messages'][-1].content))



Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?
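
The three runs above show three different behaviors: an answer grounded in retrieved sources, a refusal when the sources do not cover the question, and a direct reply to small talk. The internals of `default_flow` are not shown in this notebook, so the sketch below is only a hypothetical illustration of how such routing could be structured. Every name in it (`classify_intent`, `retrieve`, `run`, `SOURCES`) is an assumption, and keyword matching stands in for the model calls and vector search a real flow would presumably use.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for an indexed document store
SOURCES = [
    "Self-attention, sometimes called intra-attention is an attention "
    "mechanism relating different positions of a single sequence.",
    "The encoder contains self-attention layers.",
]

GREETINGS = {"hello", "hi", "hey"}
STOPWORDS = {"how", "does", "is", "the", "a", "an", "of", "what", "are", "you"}


@dataclass
class Answer:
    route: str  # "direct", "grounded", or "declined"
    text: str


def tokenize(text):
    """Lowercase a string and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def classify_intent(question):
    """Stand-in for a cheap classification call to a small model."""
    tokens = tokenize(question)
    return "chitchat" if tokens and tokens[0] in GREETINGS else "question"


def retrieve(question, sources):
    """Toy retrieval: keep sources that share a content word with the question."""
    keywords = set(tokenize(question)) - STOPWORDS
    return [s for s in sources if keywords & set(tokenize(s))]


def run(question):
    """Route a question: small talk, grounded answer, or refusal."""
    if classify_intent(question) == "chitchat":
        return Answer("direct", "Reply without consulting any sources.")
    hits = retrieve(question, SOURCES)
    if not hits:
        return Answer("declined", "The provided sources do not cover this.")
    return Answer("grounded", f"Answer composed from {len(hits)} source(s).")


for q in ["How does self-attention work?",
          "How tall is the Eiffel Tower?",
          "Hello, how are you?"]:
    print(q, "->", run(q).route)
```

Separating the cheap classification step from the grounded answering step is what would let a small model handle routing while the larger model is reserved for composing cited answers.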