# 🔗 LangChain + LLM Abstractions

In this notebook, we'll move from the raw OpenAI SDK to **LangChain**, which provides higher-level abstractions that make LLMs easier to use.

We'll cover:
1. Setting up `ChatOpenAI`.  
2. Prompt templates & the **LangChain Expression Language (LCEL)**.  
3. Varying parameters like `temperature` and `top_p`.  
4. **Streaming** responses with callbacks.  
5. **Batching & parallel calls** with `.batch()`.  
6. Getting **structured outputs** with Pydantic models.  
7. Swapping configs at runtime (`.bind`, `.with_config`).  
8. (Optional) Using Azure OpenAI through LangChain.

By the end, you'll see how LangChain **simplifies interaction, chaining, and integration** with LLMs.

In [1]:
import os
from typing import Any

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.3,   # same knobs as OpenAI
    # top_p=0.9,
    api_key=os.getenv("OPENAI_API_KEY"),
)
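
To build intuition for the `temperature` and `top_p` knobs set above, here is a minimal pure-Python sketch of the textbook technique behind such sampling parameters. The toy logits and the helper names `apply_temperature` and `top_p_filter` are hypothetical and only for illustration; this is not a description of how the OpenAI service is implemented internally.

```python
import math


def apply_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Softmax over logits divided by temperature (temperature must be > 0)."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    kept = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# Hypothetical next-token scores, made up for this example.
toy_logits = {"rain": 2.0, "sun": 1.0, "snow": 0.1}

sharp = apply_temperature(toy_logits, 0.3)   # low temperature: mass concentrates on "rain"
flat = apply_temperature(toy_logits, 1.5)    # high temperature: distribution flattens
nucleus = top_p_filter(apply_temperature(toy_logits, 1.0), 0.9)

print(sharp)
print(flat)
print(nucleus)
```

Lower temperatures make the most likely token dominate, while `top_p` trims the unlikely tail before sampling. In this toy case the least likely token is dropped at `top_p=0.9`.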

In [2]:
llm.invoke("Give me two quick tips for learning Python.")

AIMessage(content="Sure! Here are two quick tips for learning Python:\n\n1. **Practice Regularly**: Consistency is key when learning a programming language. Set aside time each day or week to practice coding. Work on small projects, solve coding challenges on platforms like LeetCode or HackerRank, and try to build something that interests you.\n\n2. **Utilize Online Resources**: Take advantage of the wealth of online resources available. Websites like Codecademy, freeCodeCamp, and Coursera offer interactive courses, while documentation and tutorials on the official Python website can provide in-depth knowledge. Engaging with communities on forums like Stack Overflow or Reddit can also help you get answers to your questions and learn from others' experiences.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 145, 'prompt_tokens': 16, 'total_tokens': 161, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reason

In [None]:
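# Note: message tuples passed directly to invoke() are sent as-is,
# so the {text} placeholder below is NOT filled in.
llm.invoke(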
    [
        ("system", "You are a helpful, concise assistant."),
        ("user", "Summarize this in 2 bullets:\n\n{text}"),
    ]
)

AIMessage(content="Sure! Please provide the text you'd like me to summarize.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 12, 'prompt_tokens': 30, 'total_tokens': 42, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CPWzwNRVCPYwmaoVU2kGYICEyNFKn', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--3a0fab8f-394f-436a-bca0-43495c94132b-0', usage_metadata={'input_tokens': 30, 'output_tokens': 12, 'total_tokens': 42, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
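
The reply above asks for the text because the `{text}` placeholder reached the model verbatim. As a rough analogy, using only Python's built-in `str.format` rather than LangChain's own templating machinery, the difference between sending a raw template and rendering it first looks like this. The sample input string is made up for illustration.

```python
template = "Summarize this in 2 bullets:\n\n{text}"

# Sent as-is: the placeholder is still literal text.
raw_message = template

# Rendered first: the placeholder is replaced by the input value.
rendered_message = template.format(text="Transformers use attention to weigh context.")

print(repr(raw_message))
print(repr(rendered_message))
```

A prompt template performs this rendering step for you, so the model only ever sees the filled-in message.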

In [10]:
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful, concise assistant."),
    ("user", "Summarize this in 2 bullets:\n\n{text}")
])

In [11]:
prompt

ChatPromptTemplate(input_variables=['text'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='You are a helpful, concise assistant.'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['text'], input_types={}, partial_variables={}, template='Summarize this in 2 bullets:\n\n{text}'), additional_kwargs={})])

In [14]:
text_example = "Transformers use attention to weigh context; embeddings turn tokens into vectors."

In [15]:
prompt_rendered = prompt.invoke({"text": text_example})
prompt_rendered

ChatPromptValue(messages=[SystemMessage(content='You are a helpful, concise assistant.', additional_kwargs={}, response_metadata={}), HumanMessage(content='Summarize this in 2 bullets:\n\nTransformers use attention to weigh context; embeddings turn tokens into vectors.', additional_kwargs={}, response_metadata={})])

In [16]:
response = llm.invoke(prompt_rendered)

In [18]:
print(response.content)

- Transformers utilize attention mechanisms to prioritize context in processing information.
- Embeddings convert tokens into vector representations for effective model input.


In [19]:
content = StrOutputParser().invoke(response)
print(content)

- Transformers utilize attention mechanisms to prioritize context in processing information.
- Embeddings convert tokens into vector representations for effective model input.


In [24]:
chain1 = prompt | llm

In [22]:
chain1.invoke({"text": text_example})

AIMessage(content='- Transformers utilize attention mechanisms to prioritize context in processing data.\n- Embeddings convert tokens into vector representations for better understanding and manipulation.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 26, 'prompt_tokens': 42, 'total_tokens': 68, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CPXB4Vdldpsa4adgGlhvNSyJa8PHd', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--eb575178-fda2-4264-ab18-1127973de7d1-0', usage_metadata={'input_tokens': 42, 'output_tokens': 26, 'total_tokens': 68, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

In [26]:
chain = prompt | llm | StrOutputParser()
response = chain.invoke({"text": text_example})

In [27]:
print(response)

- Transformers utilize attention mechanisms to prioritize context in processing data.
- Embeddings convert tokens into vector representations for better model understanding.
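
To see why `prompt | llm | StrOutputParser()` behaves like a single callable pipeline, here is a simplified sketch of operator-based composition. The `Step` class and the three toy steps are hypothetical stand-ins, not LangChain's actual implementation; they only show that `a | b` can build a new object whose `invoke` feeds the output of `a` into `b`.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|` chaining."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins for the prompt, the model, and the output parser.
toy_prompt = Step(lambda inputs: f"Summarize: {inputs['text']}")
toy_model = Step(lambda prompt_text: {"content": prompt_text.upper()})
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_model | toy_parser
result = toy_chain.invoke({"text": "attention weighs context"})
print(result)  # SUMMARIZE: ATTENTION WEIGHS CONTEXT
```

Each stage only needs to accept what the previous stage returns, which is why the parser can be appended to or removed from a chain without touching the other steps.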


In [28]:
from langchain_core.callbacks import BaseCallbackHandler

class PrintHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        print(token, end="")

stream_llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.7,
    streaming=True,              # emit tokens as they are generated
    callbacks=[PrintHandler()],  # the handler prints each token on arrival
)
(stream_llm | StrOutputParser()).invoke("Stream a 5-sentence haiku about rain.")
print()

Whispers of the rain,  
Dancing on the windowpanes,  
Nature's soft embrace.  
Puddles form like mirrors,  
Reflecting gray skies' tears.
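
The callback pattern used above can be pictured with a small pure-Python sketch: a producer yields pieces of text, and a handler is called for each piece as it arrives. The names `fake_token_stream` and `run_with_callback` are hypothetical, and splitting on spaces is only a stand-in for real model tokens.

```python
from typing import Callable, Iterator


def fake_token_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a model that yields its reply piece by piece."""
    for word in text.split(" "):
        yield word + " "


def run_with_callback(stream: Iterator[str], on_new_token: Callable[[str], None]) -> str:
    """Call the handler for each token as it arrives, then return the full text."""
    pieces = []
    for token in stream:
        on_new_token(token)
        pieces.append(token)
    return "".join(pieces).strip()


seen = []
full_text = run_with_callback(fake_token_stream("Whispers of the rain"), seen.append)
print(seen)
print(full_text)
```

The handler sees partial output immediately, while the caller still receives the complete string at the end. That matches the cell above, where tokens are printed live and `invoke` still returns the final text.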


In [29]:
chain = llm | StrOutputParser()

In [30]:
questions = [
    "What is overfitting?",
    "Explain dropout in one line.",
    "Contrast precision vs recall briefly."
]
# Runnable.batch runs the inputs concurrently and returns answers in input order
answers = chain.batch(questions)

In [31]:
for q, a in zip(questions, answers):
    print(f"Q: {q}\nA: {a}\n")


Q: What is overfitting?
A: Overfitting is a common problem in machine learning and statistical modeling where a model learns the training data too well, capturing noise and fluctuations rather than the underlying patterns. This results in a model that performs exceptionally well on the training dataset but poorly on unseen data or test datasets. 

Key characteristics of overfitting include:

1. **High Training Accuracy, Low Test Accuracy**: The model shows excellent performance on the training data but fails to generalize to new, unseen data.

2. **Complex Models**: Overfitting often occurs with overly complex models that have too many parameters relative to the amount of training data. Such models can fit the training data very closely, including its noise.

3. **Insufficient Data**: When the amount of training data is limited, models may overfit because they do not have enough examples to learn the true underlying distribution.

To mitigate overfitting, several techniques can be empl
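
`.batch()` takes a list of inputs and returns a list of outputs in the same order, which is why `zip(questions, answers)` lines up above. One plausible way to get that behavior is a thread pool, sketched below with the standard library. `fake_answer` and `toy_batch` are hypothetical helpers, the sleep stands in for network latency, and LangChain's internals may differ in detail.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_answer(question: str) -> str:
    """Hypothetical stand-in for one slow model call."""
    time.sleep(0.05)
    return f"answer to: {question}"


def toy_batch(inputs: list[str], max_workers: int = 4) -> list[str]:
    """Run the calls concurrently; executor.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_answer, inputs))


toy_questions = ["What is overfitting?", "Explain dropout in one line."]
toy_answers = toy_batch(toy_questions)
print(toy_answers)
```

Because the calls overlap while waiting on I/O, the total wall-clock time is closer to one call than to the sum of all calls.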

In [32]:
from pydantic import BaseModel, Field

class Flashcard(BaseModel):
    term: str = Field(..., description="Short term")
    definition: str = Field(..., description="One-sentence definition")

structured_llm = llm.with_structured_output(Flashcard)  # Let LC coax JSON -> model
card = structured_llm.invoke("Create a flashcard about 'positional encoding' in Transformers.")
card

Flashcard(term='Positional Encoding', definition='A technique used in Transformers to inject information about the position of tokens in a sequence, allowing the model to understand the order of the input data.')

In [34]:
card.term

'Positional Encoding'
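
Conceptually, structured output means turning the model's JSON text into a typed object and rejecting anything that does not fit the schema. The sketch below imitates that step with only the standard library: `ToyFlashcard` and `parse_flashcard` are hypothetical, the JSON string is hand-written rather than real model output, and a dataclass stands in for the Pydantic model used above.

```python
import json
from dataclasses import dataclass


@dataclass
class ToyFlashcard:
    term: str
    definition: str


def parse_flashcard(raw_json: str) -> ToyFlashcard:
    """Parse a JSON string and check that the expected string fields are present."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field_name in ("term", "definition"):
        if not isinstance(data.get(field_name), str):
            raise ValueError(f"missing or non-string field: {field_name}")
    return ToyFlashcard(term=data["term"], definition=data["definition"])


# Hand-written JSON standing in for a model reply.
raw = '{"term": "Positional Encoding", "definition": "Adds token-order information to Transformer inputs."}'
toy_card = parse_flashcard(raw)
print(toy_card.term)
```

Downstream code can then rely on attribute access such as `toy_card.term` instead of parsing free-form text.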