# Chat Models in LangChain and Configuration

## Bootstrap

⚓--- Before proceeding further, it is very important that you do the following: --- 👾

Select the 🗝 (key) icon in the left pane and add your OpenAI API key as a secret, with Name set to "OPENAI_API_KEY" and Value set to the key. Then grant it notebook access so that this notebook can run.

Run the two cells below in order before running any other cells. Wait until a number appears in place of '*' or '[ ]'. Below the second cell you should see "✅ Ready: Models (ChatOpenAI, OpenAIEmbeddings)".

In [None]:
!pip install -q langchain langchain-openai langchain-community

In [None]:
# Bootstrap: environment & imports
from google.colab import userdata

key = userdata.get('OPENAI_API_KEY')  # reads the secret added via the 🗝 (key) pane
if not key:
    raise RuntimeError("Add OPENAI_API_KEY via the 🗝 (key) icon and grant it notebook access.")

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.callbacks import StdOutCallbackHandler, CallbackManager
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
from langchain_core.runnables import RunnableLambda

print("✅ Ready: Models (ChatOpenAI, OpenAIEmbeddings)")

## Chat model

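Chat models are a core component of LangChain: they take a list of messages as input and return a message as output. OpenAI models have their own integration, `ChatOpenAI`, provided by the `langchain-openai` package.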

In [None]:
llm = ChatOpenAI(
    model="gpt-4o-mini", api_key=key
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant."),
    ("user", "List 3 use-cases of chat models for developers.")
])

chain = prompt | llm | StrOutputParser()
result = chain.invoke({})
print("\n--- Final Output ---")
print(result)

## Configuration and Control

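`temperature` controls the trade-off between creativity and determinism: lower values make the output more deterministic, higher values make it more varied. `streaming=True` lets the model return its response in chunks, which you can consume with `.stream` when required.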

`max_tokens` is a hard cap on generated tokens.

In [None]:
controlled = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.0,     # more deterministic
    max_tokens=120,      # hard cap on generated tokens
    top_p=1.0,           # nucleus sampling; 1.0 = off
    streaming=True,
    api_key=key
)

ctl_chain = (
    ChatPromptTemplate.from_messages([
        ("system", "Be precise and brief."),
        ("user", "Explain {topic} in <= 80 tokens.")
    ])
    | controlled
    | StrOutputParser()
)

print(ctl_chain.invoke({"topic": "temperature vs top_p"}))
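
To build some intuition for `temperature` and `top_p`, here is a minimal sketch in plain Python of how the two settings reshape a next-token distribution. The logits and the helper names (`softmax_with_temperature`, `nucleus_filter`) are made up for illustration. This is the textbook idea behind the parameters, not how `ChatOpenAI` or the OpenAI API implements them.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]  # assumes temperature > 0
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]

toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])

filtered = nucleus_filter(softmax_with_temperature(toy_logits, 1.0), 0.9)
print("top_p=0.9:", [round(p, 3) for p in filtered])
```

The sketch divides by the temperature, so it only accepts values above zero; a setting of `temperature=0.0` is usually understood as "always pick the most likely token".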

## ChatOpenAI Low-level control

With ChatOpenAI, you can invoke the LLM directly with a list of messages rather than a PromptTemplate. The result is an `AIMessage`, so the metadata and other information beyond the text content can also be accessed.

In [None]:
messages = [
    SystemMessage(content="You translate to French, concisely."),
    HumanMessage(content="Translate: 'Good morning, team. Let's start.'")
]

raw_resp = llm.invoke(messages)
print("\n--- Raw AIMessage ---")
print(type(raw_resp), raw_resp)

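print("\n--- As text ---")
print(raw_resp.content)

print("\n--- Metadata ---")
print(raw_resp.response_metadata)  # provider details returned alongside the text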

## Fallbacks and Timeouts

Use `.with_fallbacks([...])` to try backup models if the primary fails. To limit how long a request may take, pass `timeout=...` (in seconds) when constructing `ChatOpenAI`.

In [None]:
primary = ChatOpenAI(model="gpt-4o-mini", temperature=0.2, timeout=5, api_key=key)  # per-request time limit in seconds
backup  = ChatOpenAI(model="gpt-4o-mini", temperature=0.8, api_key=key)  # in practice use a different provider/model

# Attach the fallback to the model itself, so the backup receives the same
# formatted messages as the primary rather than the raw input dict.
robust = (
    ChatPromptTemplate.from_messages([
        ("system", "Answer crisply."),
        ("user", "{q}")
    ])
    | primary.with_fallbacks([backup])
    | StrOutputParser()
)

print(robust.invoke({"q": "Give me 3 bullets on LCEL."}))

The timeout is a setting of the model, so a different limit means a different model instance. Like this:

```python
patient = ChatOpenAI(model="gpt-4o-mini", temperature=0.2, timeout=15, api_key=key)
print(patient.invoke("Give me 3 bullets on LCEL.").content)
```
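
Conceptually, a fallback chain tries each candidate in order and returns the first result that does not raise. The sketch below shows that idea in plain Python, with hypothetical stand-in functions (`flaky_primary`, `steady_backup`) in place of real models and a made-up helper `call_with_fallbacks`. It illustrates the behaviour only and is not LangChain's implementation.

```python
def call_with_fallbacks(primary, fallbacks, payload):
    """Try primary first, then each fallback in order; re-raise the first error if all fail."""
    first_error = None
    for candidate in [primary, *fallbacks]:
        try:
            return candidate(payload)
        except Exception as err:  # a real setup might only catch specific error types
            if first_error is None:
                first_error = err
    raise first_error

def flaky_primary(question):
    # Hypothetical stand-in for a model call that exceeds its time limit
    raise TimeoutError("primary took too long")

def steady_backup(question):
    # Hypothetical stand-in for a backup model that answers normally
    return f"backup answer to: {question}"

print(call_with_fallbacks(flaky_primary, [steady_backup], "What is LCEL?"))
```

Note that every candidate receives the same `payload`, which is why a fallback should accept the same kind of input as the step it replaces.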

## Embedding Models

Besides the LLMs used in an application, embedding models are another type of model: they convert chunks of text into vectors that can be stored in a vector database and compared for similarity.

In [None]:
emb = OpenAIEmbeddings(model="text-embedding-3-small", api_key=key)  # fast & cheap for learning

texts = [
    "LangChain composes LLM apps using runnables and chains.",
    "Vector embeddings map text to numeric vectors for similarity.",
    "Bananas are a good source of potassium."
]

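vecs = emb.embed_documents(texts)  # one vector per input text
query = "How do I represent text for similarity search?"
qvec = emb.embed_query(query)      # a single vector for the query
print(len(vecs), "document vectors of length", len(vecs[0]))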
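
The vectors become useful once they are compared. A common choice is cosine similarity; the minimal sketch below ranks documents against a query using only the standard library. The three-dimensional vectors are made-up stand-ins (real embedding vectors are much longer), and `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, not LangChain functions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return (document index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Made-up toy vectors standing in for real embeddings
toy_docs = [[0.9, 0.1, 0.0], [0.7, 0.7, 0.1], [0.0, 0.1, 0.9]]
toy_query = [0.6, 0.8, 0.0]

for index, score in rank_by_similarity(toy_query, toy_docs):
    print(index, round(score, 3))
```

In this notebook, `rank_by_similarity(qvec, vecs)` would score the three entries of `texts` against `query`.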