## Models and Messages 

This notebook shows **how to use LangChain models directly** (no agents) and how **message objects** work.

We’ll use:

- **Groq** chat models via `ChatGroq` (fast, OpenAI-compatible API).
- **HuggingFace** embeddings via `HuggingFaceEmbeddings`.

Topics covered:

- **Models**: Chat models, LLMs, and embedding models
- **Messages**: The core I/O format for conversational AI
- **Runnables**: LangChain's composable abstraction for building pipelines


In [4]:
import os
from dotenv import load_dotenv
load_dotenv()
os.environ["GROQ_API_KEY"]=os.getenv("GROQ_API_KEY")

from langchain_groq import ChatGroq

llm = ChatGroq(
    model="llama-3.3-70b-versatile",
    temperature=0.7,
)

response = llm.invoke("Say 'Hello, LangChain!' in a creative way.")
print(response.content)

"Hello, LangChain!" echoed through the digital realm, as a chorus of code and conversation came together in perfect harmony, marking the beginning of a beautiful friendship between humans and AI.


### LangChain model interfaces: quick overview

- **Chat models**: take a sequence of messages and return a message.   
- **Embeddings models**: turn text into vectors for similarity / search.   

Key methods:

- `.invoke(input)` – single call
- `.batch(list_of_inputs)` – run many calls in parallel
- `.stream(input)` – (used later) stream tokens/chunks as they are generated

### Why LangChain Wraps Models?
- **Portability**: Switch between OpenAI, Groq, Anthropic with minimal code changes
- **Unified API**: Same interface across all providers
- **Enhanced Features**: Built-in retry logic, caching, tracing

### Messages: The Core I/O Format

LangChain uses **message objects** instead of raw strings to represent conversational interactions.

### Why Messages?
- **Structure**: Clear separation of roles (system, user, assistant)
- **Metadata**: Attach additional info (timestamps, sources, tool calls)
- **Type Safety**: Easier to validate and debug

### Common Message Types

| Type | Role | Purpose |
|------|------|---------|
| `SystemMessage` | system | Instructions/context for the model |
| `HumanMessage` | user | User input/queries |
| `AIMessage` | assistant | Model-generated responses |
| `ToolMessage` | tool | Results from tool/function calls (covered later) |

In [3]:
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

messages = [
    SystemMessage(content="You are a helpful AI assistant specializing in Python programming."),
    HumanMessage(content="What is a list comprehension?")
]

response = llm.invoke(messages)

print("Response type:", type(response))
print("Response content:", response.content)
print("\nFull response object:")
print(response)

Response type: <class 'langchain_core.messages.ai.AIMessage'>
Response content: **List Comprehension Definition**

A list comprehension is a compact way to create lists in Python. It consists of brackets containing the expression, which is executed for each element, along with the `for` loop to loop over the elements.

**Basic Syntax**

The basic syntax of a list comprehension is as follows:
```python
new_list = [expression for element in iterable]
```
Here:
- `expression` is the operation you want to perform on each element.
- `element` is the temporary variable used to represent each element in the `iterable`.
- `iterable` is the list, tuple, or other iterable you want to process.

**Example**

Here's an example of using a list comprehension to square each number in a list:
```python
numbers = [1, 2, 3, 4, 5]
squared_numbers = [x**2 for x in numbers]
print(squared_numbers)  # Output: [1, 4, 9, 16, 25]
```
**Conditional List Comprehension**

You can also add a condition to filter the 

In [3]:
# Simulate a multi-turn conversation
conversation_history = [
    SystemMessage(content="You are a concise technical explainer."),
    HumanMessage(content="Explain transformers in one sentence."),
]

# First turn
response1 = llm.invoke(conversation_history)
print("Turn 1 (AI):", response1.content)

conversation_history.append(AIMessage(content=response1.content))

# Second turn - ask follow-up
conversation_history.append(
    HumanMessage(content="Now explain attention mechanism in the same way.")
)

response2 = llm.invoke(conversation_history)
print("\nTurn 2 (AI):", response2.content)

# View full conversation
print("\n" + "="*50)
print("Full Conversation History:")
for msg in conversation_history:
   msg.pretty_print()

Turn 1 (AI): Transformers are a type of neural network architecture that use self-attention mechanisms to weigh and combine input elements, allowing for highly effective processing of sequential data such as text and speech.

Turn 2 (AI): The attention mechanism is a technique that enables neural networks to focus on specific parts of the input data by assigning weighted importance to each element, allowing the model to selectively concentrate on relevant information.

Full Conversation History:

You are a concise technical explainer.

Explain transformers in one sentence.

Transformers are a type of neural network architecture that use self-attention mechanisms to weigh and combine input elements, allowing for highly effective processing of sequential data such as text and speech.

Now explain attention mechanism in the same way.


In [4]:
# Messages can carry metadata
message_with_metadata = HumanMessage(
    content="Translate this to French: Hello, world!",
    additional_kwargs={"user_id": "12345", "session": "abc"}
)

response = llm.invoke([message_with_metadata])

# Inspect response metadata
print("Content:", response.content)
print("\nResponse Metadata:")
print(f"  Model: {response.response_metadata.get('model_name')}")
print(f"  Tokens used: {response.response_metadata.get('token_usage')}")
print(f"  Finish reason: {response.response_metadata.get('finish_reason')}")

Content: Bonjour, monde !

Response Metadata:
  Model: llama-3.3-70b-versatile
  Tokens used: {'completion_tokens': 5, 'prompt_tokens': 44, 'total_tokens': 49, 'completion_time': 0.008872069, 'completion_tokens_details': None, 'prompt_time': 0.001960832, 'prompt_tokens_details': None, 'queue_time': 0.053055007, 'total_time': 0.010832901}
  Finish reason: stop


## Working with Chat Models

Chat models are the primary interface for modern LLMs in LangChain.

### Initialization Parameters
```python
ChatGroq(
    model="llama-3.3-70b-versatile",  # Model identifier
    temperature=0.7,                   # Randomness (0=deterministic, 1=creative)
    max_tokens=500,                    # Max response length
    timeout=30,                        # Request timeout
    max_retries=2,                     # Retry failed requests
)
```

### Common Patterns

**Single-turn Q&A:**
```python
llm.invoke("Your question here")
```

**Few-shot learning:**
```python
llm.invoke([
    SystemMessage(content="Classify sentiment as positive/negative/neutral."),
    HumanMessage(content="I love this product! → Positive"),
    HumanMessage(content="It's okay. → Neutral"),
    HumanMessage(content="This is terrible. → ?")
])
```

**Role-based system prompts:**
```python
llm.invoke([
    SystemMessage(content="You are a Shakespearean poet."),
    HumanMessage(content="Describe programming in verse.")
])
```

In [5]:
# Process multiple queries in one call (more efficient than loop)
queries = [
    [HumanMessage(content="What is machine learning?")],
    [HumanMessage(content="What is deep learning?")],
    [HumanMessage(content="What is reinforcement learning?")]
]

# Batch invoke
responses = llm.batch(queries)

for i, response in enumerate(responses, 1):
    print(f"{i}. {response.pretty_print()}")
    print()


**Machine Learning Overview**

Machine learning (ML) is a subset of artificial intelligence (AI) that involves the use of algorithms and statistical models to enable machines to perform tasks without being explicitly programmed. It allows systems to learn from data, identify patterns, and make predictions or decisions with minimal human intervention.

**Key Characteristics:**

1. **Data-driven**: ML relies on large datasets to train and improve models.
2. **Automated learning**: ML algorithms can learn from data without being explicitly programmed.
3. **Pattern recognition**: ML models can identify complex patterns in data.
4. **Prediction and decision-making**: ML models can make predictions or decisions based on learned patterns.

**Types of Machine Learning:**

1. **Supervised Learning**: The model is trained on labeled data to learn the relationship between input and output.
2. **Unsupervised Learning**: The model is trained on unlabeled data to identify patterns or structure.
3. 

### Embedding Models
Embeddings convert text into dense numerical vectors that capture semantic meaning.

### Why Embeddings?
- **Semantic Search**: Find similar documents based on meaning, not keywords
- **RAG**: Retrieve relevant context for language models
- **Classification**: Use vectors as features for ML models
- **Clustering**: Group similar texts together

### How They Work
```
"Paris is the capital of France" 
    ↓ embedding model ↓
[0.234, -0.123, 0.456, ..., 0.789]  # 768-dimensional vector
```

Similar texts produce similar vectors (high cosine similarity).

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings

# Initialize embeddings model
embeddings_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2",
    model_kwargs={"device": "cpu"},  # Use "cuda" if you have GPU
    encode_kwargs={"normalize_embeddings": True}  # Normalize to unit length
)

print("Embeddings model loaded")
print(f"Model: {embeddings_model.model_name}")

# Embed single text
text = "Transformers revolutionized natural language processing."
embedding = embeddings_model.embed_query(text)

print(f"Text: {text}")
print(f"Embedding dimension: {len(embedding)}")
print(f"First 10 values: {embedding[:10]}")

✅ Embeddings model loaded
Model: sentence-transformers/all-mpnet-base-v2
Text: Transformers revolutionized natural language processing.
Embedding dimension: 768
First 10 values: [0.06474940478801727, 0.04542214050889015, -0.023475218564271927, 0.02088870294392109, -0.020095515996217728, -0.005383903626352549, 0.026434894651174545, 0.007855139672756195, -0.07080433517694473, -0.08333080261945724]


### Runnable Pipelines (LangChain Core Pattern)

### What is a Runnable?
A `Runnable` is LangChain's universal abstraction for composable components.

**Key idea:** Everything implements the same interface:
- `.invoke(input)` - Synchronous execution
- `.batch(inputs)` - Process multiple inputs
- `.stream(input)` - Streaming output
- `.ainvoke()` / `.abatch()` / `.astream()` - Async versions

### Why Runnables Matter?
- **Composability**: Chain components using `|` operator - Lanchain expression language
- **Consistency**: Same API for models, prompts, retrievers, agents
- **Debugging**: Built-in tracing and logging
- **Production**: Easy to swap implementations

### Basic Pattern
```python
chain = prompt | model | output_parser
result = chain.invoke({"topic": "AI"})
```

### Components that are Runnables:
- ✅ Models (chat, LLM, embeddings)
- ✅ Prompts (prompt templates)
- ✅ Output parsers (extract structured data)
- ✅ Retrievers (vector search)
- ✅ Tools (functions)
- ✅ Agents (orchestrators)

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Create a prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that explains {topic} concepts concisely."),
    ("human", "Explain {concept} in 2 sentences.")
])

# Chain: prompt → model
chain = prompt | llm | StrOutputParser()

# Invoke chain with inputs
# Now result is just a string (not AIMessage object) due to the output parser
result = chain.invoke({
    "topic": "machine learning",
    "concept": "gradient descent"
})

# print(result.content) # If output parser is not used the result is an AIMessage object
print(result)

Gradient descent is an optimization algorithm used in machine learning to minimize the loss function of a model by iteratively adjusting its parameters in the direction of the negative gradient, which is the direction of the steepest descent. By repeatedly calculating the gradient of the loss function and updating the parameters, the model converges to a local or global minimum, resulting in the best possible fit to the training data.


In [8]:
# Process multiple inputs through the chain
inputs = [
    {"topic": "databases", "concept": "indexing"},
    {"topic": "networking", "concept": "TCP/IP"},
    {"topic": "algorithms", "concept": "binary search"}
]

results = chain.batch(inputs)

for i, result in enumerate(results, 1):
    print(f"{i}. {result}\n")

1. Indexing in databases is a technique that improves query performance by creating a data structure that facilitates quick lookup and retrieval of specific data, similar to an index in a book. By indexing specific columns, the database can rapidly locate and access the required data, reducing the time it takes to execute queries and enhancing overall database efficiency.

2. The TCP/IP (Transmission Control Protocol/Internet Protocol) is a set of communication protocols that enables devices to communicate over the internet, with TCP ensuring reliable data transfer and IP providing device addressing and routing. This protocol suite allows devices to establish connections, exchange data, and manage communication flows, making it the foundation of the modern internet and enabling global networking and communication.

3. Binary search is an efficient algorithm that finds an item in a sorted list by repeatedly dividing the list in half and searching for the item in one of the two halves. T

In [9]:
from langchain_core.runnables import chain

# Decorate any function to make it a Runnable
# The @chain decorator converts a normal Python function into a Runnable
@chain
def uppercase_output(ai_message):
    """Custom processing: convert response to uppercase."""
    return ai_message.content.upper()

# Chain: prompt → model → custom function
chain = prompt | llm | uppercase_output

result = chain.invoke({
    "topic": "programming",
    "concept": "recursion"
})

print(result)

RECURSION IS A PROGRAMMING TECHNIQUE WHERE A FUNCTION CALLS ITSELF REPEATEDLY UNTIL IT REACHES A BASE CASE THAT STOPS THE RECURSION, ALLOWING THE FUNCTION TO SOLVE PROBLEMS BY BREAKING THEM DOWN INTO SMALLER INSTANCES OF THE SAME PROBLEM. THIS PROCESS UNWINDS AS EACH RECURSIVE CALL RETURNS, COMBINING THE RESULTS TO PRODUCE THE FINAL SOLUTION, MAKING RECURSION A POWERFUL TOOL FOR SOLVING COMPLEX PROBLEMS WITH A SIMPLE AND ELEGANT CODE STRUCTURE.


In [None]:
# View the chain components
simple_chain = prompt | llm | StrOutputParser()

print("Chain structure:")
print(simple_chain)

# Get input/output schema (useful for debugging)
print("\nInput schema:")
print(simple_chain.input_schema.schema())

print("\nOutput schema:")
print(simple_chain.output_schema.schema())

### 9. Extra: Parallel pipelines with RunnableMap / RunnableParallel

LangChain’s `RunnableParallel` (alias `RunnableMap`) lets you run multiple independent runnables on the same input — returning a dictionary of results. :contentReference[oaicite:5]{index=5}  
Below are examples:

- Wrapping a Python function with `RunnableLambda`.  
- Running embedding generation + simple text processing in parallel.

This pattern is powerful when you want to derive **multiple outputs** from the same input (e.g. summary + metadata + embeddings), or do **parallel tasks** for efficiency.  

You can integrate this with prompt → model → post-processing chains, agents, or RAG pipelines.


In [19]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel



joke_chain = (
    ChatPromptTemplate.from_template("tell me a joke about {topic}") | llm
)
poem_chain = (
    ChatPromptTemplate.from_template("write a 2-line poem about {topic}")
    | llm
)

runnable = RunnableParallel(joke=joke_chain, poem=poem_chain)

runnable.invoke({"topic": "bear"})
for msg in runnable.invoke({"topic": "bear"}).values():
    msg.pretty_print()


Why did the bear go to the doctor?

Because it had a grizzly cough!

In the forest, a bear does roam, 
Its gentle strength, a wondrous home.
