# Week 3 — **LangChain Essentials I (Complete, Annotated)**
**Covers Sections 3.1 → 3.9**

This single notebook is designed for **Colab or VS Code**. It uses a small open‑source LLM
(`flan-t5-small`) so you can run everything locally without an API key (the model weights are downloaded once on first use). Later, you can swap in a hosted LLM by changing one cell.

### What you will build
1. An LLM “first chain” (prompt → answer).
2. Dense + sparse retrieval (FAISS + BM25) and a **hybrid ranker**.
3. A tiny RAG system (retrieval + prompt fusion) with evaluation helpers.
4. Practical **metrics** (Precision@k, Recall@k, AP, nDCG), a rough latency model, and a simple system accuracy bound.
5. Optional observability hooks (LangSmith).

> **Tip:** Run cells **top to bottom**. Each section includes a short explanation of *why* and *what to expect*.


## 0) Setup

Installs minimal libraries:
- `langchain`, `langchain-core`, `langchain-community`: core wrappers
- `langchain-huggingface`, `langchain-text-splitters`: Hugging Face integration and text splitting
- `sentence-transformers`: embeddings
- `faiss-cpu`: vector store
- `rank-bm25`: sparse baseline
- `transformers`, `accelerate`: open LLM runtime
- `numpy`, `scipy`: math utilities


In [None]:
!pip -q install langchain langchain-community langchain-text-splitters sentence-transformers faiss-cpu rank-bm25 transformers accelerate numpy scipy

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m23.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m31.4/31.4 MB[0m [31m19.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m64.7/64.7 kB[0m [31m3.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m50.9/50.9 kB[0m [31m2.5 MB/s[0m eta [36m0:00:00[0m
[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests==2.32.4, but you have requests 2.32.5 which is incompatible.[0m[31m
[0m

In [None]:
!pip install -qU langchain langchain-core langchain-huggingface transformers accelerate

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/106.2 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m106.2/106.2 kB[0m [31m3.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m467.2/467.2 kB[0m [31m9.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m155.4/155.4 kB[0m [31m9.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m45.8/45.8 kB[0m [31m3.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m56.8/56.8 kB[0m [31m3.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m207.6/207.6 kB[0m [31m15.5 MB/s[0m eta [36m0:00:00[0m
[?25h

## 0.1) Shared Imports & Utilities

These helpers are reused across sections:

- `wrap_blocks`: neat wrapping for contexts in prompts
- Metric functions: `precision_at_k`, `recall_at_k`, `average_precision`, `dcg_at_k`, `ndcg_at_k`


In [3]:
import os, time, math, textwrap, numpy as np
from math import log2
from typing import List

def wrap_blocks(texts: List[str], width=92):
    return "\n\n".join(textwrap.fill(t.strip(), width) for t in texts)

# --- Retrieval metrics ---
def precision_at_k(rel, k):
    rel_k = np.array(rel[:k])
    return float(rel_k.sum()/k) if k>0 else 0.0

def recall_at_k(rel, k, total_relevant):
    rel_k = int(np.array(rel[:k]).sum())
    return float(rel_k/total_relevant) if total_relevant>0 else 0.0

def average_precision(rel, total_relevant):
    hits, precs = 0, []
    for i, r in enumerate(rel, start=1):
        if r==1:
            hits += 1; precs.append(hits/i)
    return float(sum(precs)/total_relevant) if total_relevant>0 else 0.0

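def dcg_at_k(rels):
    # enumerate() is 0-based, so rank = i+1 and the discount is log2(rank+1) = log2(i+2)
    return float(sum((2**r-1)/log2(i+2) for i, r in enumerate(rels)))

def ndcg_at_k(rels, ideal_rels):
    # ideal_rels is the best possible ordering (all relevant items ranked first)
    dcg = dcg_at_k(rels); idcg = dcg_at_k(ideal_rels)
    return float(dcg/idcg) if idcg>0 else 0.0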

## 3.1 — Your First Chain (LLM-only)

**Goal:** Prove the plumbing: a prompt template + an LLM.  
We wrap `flan-t5-small` in a LangChain LLM so later you can swap the model
without touching downstream code.

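**Expected output:** a short paragraph about LangChain. `flan-t5-small` is a very small model, so the text is often fluent but factually wrong, as the sample output below shows.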
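The idea of swapping the model without touching downstream code can be illustrated with a toy pipeline. This is only a sketch of the composition pattern, not LangChain's actual `Runnable` implementation; `Step`, `toy_prompt`, `toy_llm`, and `toy_parser` are hypothetical names used for illustration.

```python
class Step:
    """Toy pipeline step; an illustration only, NOT LangChain's Runnable."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # a | b builds a new step that feeds a's output into b
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)


toy_prompt = Step(lambda d: f"Question: {d['q']}\nAnswer:")
toy_llm = Step(lambda text: text.upper())        # hypothetical stand-in for a model
toy_parser = Step(lambda text: text.strip())

toy_chain = toy_prompt | toy_llm | toy_parser
toy_out = toy_chain.invoke({"q": "hi"})
print(toy_out)

# Swapping the "model" changes one step; the prompt and parser stay untouched.
other_llm = Step(lambda text: text.lower())
swapped_out = (toy_prompt | other_llm | toy_parser).invoke({"q": "hi"})
print(swapped_out)
```

Because each step only sees the previous step's output, replacing `toy_llm` does not require changes anywhere else in the chain.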

In [54]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
from langchain_huggingface import HuggingFacePipeline
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

tok = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

# Generation settings that encourage longer, less repetitive answers
pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tok,
    max_new_tokens=200,        # upper limit on generated tokens
    min_new_tokens=60,         # keep the model from stopping after a few tokens
    do_sample=True,            # sample instead of greedy/beam decoding
    temperature=0.8,           # moderate randomness
    top_p=0.9,                 # nucleus sampling cutoff
    no_repeat_ngram_size=3,    # block repeated 3-grams to avoid short loops
    length_penalty=1.2,        # beam-search setting; has no effect while num_beams=1
    early_stopping=False,      # beam-search setting; has no effect while num_beams=1
)

llm = HuggingFacePipeline(pipeline=pipe)

# Ask for the length you want
template = """You are a helpful assistant.
Write a clear 3–5 sentence answer.

Question: {q}
Answer:"""
prompt = PromptTemplate.from_template(template)

chain = prompt | llm | StrOutputParser()

print(chain.invoke({"q": "What is LangChain?"}))

Device set to use cpu


LangChain is a commune in the Haute-Seine department in the north-eastern region of France. It is located on the northern edge of the French Alps and is named after the French colonial explorer, Charles de Gaulle, who was the first French colonist to become a colonist.


## 3.2 — Math/CS Mini: Retrieval Metrics & Simple Bounds

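We compute **Precision@k**, **Recall@k**, **Average Precision (AP)**, and **nDCG** on a dummy relevance list.
Then we sketch a latency model and a system upper bound: `A_system ≤ A_retriever × A_LLM`. The bound treats an answer as correct only when retrieval succeeds and the LLM then answers correctly from that context, so improving either stage alone has a capped effect.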

**Interpretation guide:**  
- High Precision, low Recall ⇒ clean but fetching too little.  
- High Recall, low Precision ⇒ noisy; tighten filters or index.  
- nDCG rewards *early* relevant hits (rank awareness).
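The rank-awareness point can be checked with two toy rankings that contain the same number of hits at different positions. This is a minimal sketch: the lists `early` and `late` are made up, and the ideal ordering here is simply the retrieved list sorted by relevance (an assumption that differs from the notebook cell, which builds the ideal list from `total_relevant`).

```python
from math import log2


def dcg(rels):
    # rank = i + 1, discount = log2(rank + 1)
    return sum((2 ** r - 1) / log2(i + 2) for i, r in enumerate(rels))


def ndcg(rels):
    # Assumption: ideal ordering = the same relevance labels, best first
    best = dcg(sorted(rels, reverse=True))
    return dcg(rels) / best if best > 0 else 0.0


def precision(rels):
    return sum(rels) / len(rels)


early = [1, 1, 0, 0, 0]   # both hits at the top
late = [0, 0, 0, 1, 1]    # same hits, pushed to the bottom

print("precision:", precision(early), precision(late))   # identical
print("nDCG     :", round(ndcg(early), 3), round(ndcg(late), 3))
```

Precision cannot tell the two rankings apart, while nDCG scores the ranking with early hits higher.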

In [24]:
rels = [1,0,1,1,0,0,1,0,1,0]   # relevance of ranks 1..10
k = 10
total_relevant = 8
ideal = [1]*min(total_relevant,k) + [0]*max(0,k-total_relevant)

print("Precision@10:", precision_at_k(rels,k))
print("Recall@10   :", recall_at_k(rels,k,total_relevant))
print("AP          :", average_precision(rels,total_relevant))
print("nDCG@10     :", ndcg_at_k(rels, ideal))

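def latency_model(T_embed=0.005, T_retrieve=0.002, T_rank=0.003,
                  tok_in=800, tok_out=180, tok_rate=120):
    """Toy latency in seconds: fixed per-stage times plus LLM time.

    LLM time is approximated as (tok_in + tok_out) / tok_rate, i.e. prompt and
    output tokens are assumed to be processed at the same tokens-per-second rate.
    """
    T_llm = (tok_in + tok_out)/tok_rate
    return T_embed + T_retrieve + T_rank + T_llm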

print("Total latency (s):", round(latency_model(),3))

def system_bound(A_retriever, A_llm):
    return A_retriever*A_llm

print("System bound baseline:", system_bound(0.90, 0.85))
print("System bound +5+5   :", system_bound(0.95, 0.90))

Precision@10: 0.5
Recall@10   : 0.625
AP          : 0.4429563492063492
nDCG@10     : 0.6488081217425109
Total latency (s): 8.177
System bound baseline: 0.765
System bound +5+5   : 0.855
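To see where the 8.177 s in the toy latency model comes from, the total can be split per stage. The sketch below reuses the same default numbers as `latency_model`; the function name `latency_breakdown` is hypothetical.

```python
def latency_breakdown(T_embed=0.005, T_retrieve=0.002, T_rank=0.003,
                      tok_in=800, tok_out=180, tok_rate=120):
    """Return (per-stage seconds, total seconds) for the toy latency model."""
    stages = {
        "embed": T_embed,
        "retrieve": T_retrieve,
        "rank": T_rank,
        "llm": (tok_in + tok_out) / tok_rate,
    }
    return stages, sum(stages.values())


stages, total = latency_breakdown()
for name, secs in stages.items():
    print(f"{name:<9}{secs:8.3f} s  {secs / total:6.1%}")
print(f"{'total':<9}{total:8.3f} s")

# Under this model, shortening the prompt is the main lever on latency.
_, total_short = latency_breakdown(tok_in=400)
print(f"total with tok_in=400: {total_short:.3f} s")
```

With these defaults the LLM stage accounts for nearly all of the total, so trimming retrieved context matters far more than speeding up retrieval.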


## 3.3 — Dense + Sparse + Hybrid Retrieval

We’ll build three views over a tiny corpus:
- **Dense**: sentence embeddings + FAISS (semantic meaning).
- **Sparse (BM25)**: keyword exactness (lexical match).
- **Hybrid**: a z‑scored blend `score = α⋅dense + (1−α)⋅sparse`.

**Expected output:** top‑3 chunks for a query, printed with scores.
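Why z-score before blending? Cosine similarities and BM25 scores live on different scales, so a raw weighted sum tends to be dominated by whichever score has the larger range. The sketch below uses made-up scores for three chunks (the numbers are hypothetical, not taken from the corpus in this notebook) and the standard library only.

```python
from statistics import mean, pstdev


def zscore(xs):
    # Population standard deviation, matching numpy's default std()
    mu, sd = mean(xs), pstdev(xs)
    sd = sd if sd > 0 else 1.0
    return [(x - mu) / sd for x in xs]


def blend(dense, sparse, alpha):
    return [alpha * d + (1 - alpha) * s for d, s in zip(dense, sparse)]


def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)


dense = [0.82, 0.35, 0.30]   # hypothetical cosine similarities
sparse = [0.0, 7.5, 0.4]     # hypothetical BM25 scores (much larger range)

raw = blend(dense, sparse, alpha=0.6)
fused = blend(zscore(dense), zscore(sparse), alpha=0.6)

raw_top, fused_top = argmax(raw), argmax(fused)
print("raw blend top chunk     :", raw_top)
print("z-scored blend top chunk:", fused_top)
```

Even with `alpha=0.6` favouring the dense score, the raw blend picks the chunk with the large BM25 value; after z-scoring, the weight `alpha` behaves as intended and the semantically closest chunk wins.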

In [25]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer, util
from rank_bm25 import BM25Okapi
from langchain_huggingface import HuggingFaceEmbeddings  # the langchain_community import is deprecated
from langchain_community.vectorstores import FAISS as LCFAISS
import numpy as np

corpus = (
    "Aspirin reduces pain and fever but may irritate the stomach lining.\n"
    "Ibuprofen is an NSAID; avoid on an empty stomach.\n"
    "Paracetamol (acetaminophen) is preferred for children with fever.\n"
    "Vitamin C supports immune function.\n"
    "Consult a doctor before mixing painkillers or if you have ulcers."
)
chunks = RecursiveCharacterTextSplitter(chunk_size=120, chunk_overlap=20).split_text(corpus)

# Dense FAISS
emb = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectordb = LCFAISS.from_texts(chunks, emb)

# Sparse BM25
bm25 = BM25Okapi([c.lower().split() for c in chunks])

# Dense scoring helper
st = SentenceTransformer('all-MiniLM-L6-v2')
vecs = st.encode(chunks, normalize_embeddings=True)

def dense_scores(q):
    qv = st.encode(q, normalize_embeddings=True)
    return util.cos_sim(qv, vecs)[0].cpu().numpy()

def sparse_scores(q):
    return np.array(bm25.get_scores(q.lower().split()), dtype=float)

def zscore(x):
    x = np.array(x, dtype=float); sd = x.std() if x.std()>0 else 1.0
    return (x - x.mean())/sd

def hybrid_rank(q, k=3, alpha=0.6):
    ds, ss = dense_scores(q), sparse_scores(q)
    hyb = alpha*zscore(ds) + (1-alpha)*zscore(ss)
    idx = np.argsort(-hyb)[:k]
    return [(i, float(hyb[i])) for i in idx]

def show(q, mode="hybrid", k=3, alpha=0.6):
    if mode=="dense":
        s = dense_scores(q); idx = np.argsort(-s)[:k]; pairs = [(i,float(s[i])) for i in idx]
    elif mode=="sparse":
        s = sparse_scores(q); idx = np.argsort(-s)[:k]; pairs = [(i,float(s[i])) for i in idx]
    else:
        pairs = hybrid_rank(q,k,alpha)
    print("\nQuery:", q)
    for r,(i,score) in enumerate(pairs,1):
        print(f"[{r}] ({score:+.3f}) {chunks[i]}")
    return [i for i,_ in pairs]

_ = show("preferred painkiller for kids", "hybrid", alpha=0.7)


Query: preferred painkiller for kids
[1] (+1.228) Paracetamol (acetaminophen) is preferred for children with fever.
Vitamin C supports immune function.
[2] (-0.114) Consult a doctor before mixing painkillers or if you have ulcers.
[3] (-1.114) Aspirin reduces pain and fever but may irritate the stomach lining.
Ibuprofen is an NSAID; avoid on an empty stomach.


### 3.3b — Prompt Fusion (RAG)

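We convert retrieved chunks into a **context block** and inject it into a template.
The system prompt tells the LLM to use only this context and to reply “insufficient context” otherwise.

**Expected output:** ideally a one-sentence grounded answer. A model as small as `flan-t5-small` may not follow the instruction; in the sample run below it simply echoes the context.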
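Since a small model may ignore the "insufficient context" instruction, one option is to handle the empty-retrieval case in code rather than in the prompt. The following is a minimal sketch under that assumption; `build_prompt`, `answer`, `SYSTEM_SKETCH`, and the `generate` callable are hypothetical names, and `generate` stands in for whatever LLM call you use.

```python
import textwrap

SYSTEM_SKETCH = "Use only the provided context; if missing, say 'insufficient context'."


def build_prompt(question, retrieved, width=92):
    """Return a prompt string, or None when nothing was retrieved."""
    if not retrieved:
        return None
    # Number each chunk so the answer can be traced back to its source
    blocks = [textwrap.fill(f"[{n}] {text.strip()}", width)
              for n, text in enumerate(retrieved, start=1)]
    context = "\n\n".join(blocks)
    return f"{SYSTEM_SKETCH}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"


def answer(question, retrieved, generate):
    """Short-circuit in code when there is no context; otherwise call the model."""
    prompt = build_prompt(question, retrieved)
    if prompt is None:
        return "insufficient context"
    return generate(prompt)


demo = build_prompt("Which drug is preferred for children?",
                    ["Paracetamol (acetaminophen) is preferred for children with fever."])
print(demo)
print(answer("Anything?", [], generate=lambda p: "unused"))
```

The guard only covers the case where retrieval returns nothing; it does not detect context that is present but irrelevant.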


In [27]:
from langchain_core.prompts import ChatPromptTemplate

SYSTEM = ("You are a concise medical assistant. Use only the provided context; "
          "if missing, say 'insufficient context'. One-sentence answer.")
prompt_rag = ChatPromptTemplate.from_messages([
    ("system", SYSTEM),
    ("human", "Context:\n{context}\n\nQuestion: {q}\nAnswer:")
])

def format_ctx(idxs):
    return wrap_blocks([chunks[i] for i in idxs])

def rag_answer(q, alpha=0.6):
    idxs = show(q, "hybrid", 3, alpha)
    ctx = format_ctx(idxs)
    # invoke() returns PromptValue → then .to_string()
    prompt_value = prompt_rag.invoke({"context": ctx, "q": q})
    return llm.invoke(prompt_value.to_string())



print(rag_answer("Which painkiller is preferred for children with fever?", alpha=0.7))


Query: Which painkiller is preferred for children with fever?
[1] (+1.353) Paracetamol (acetaminophen) is preferred for children with fever.
Vitamin C supports immune function.
[2] (-0.364) Aspirin reduces pain and fever but may irritate the stomach lining.
Ibuprofen is an NSAID; avoid on an empty stomach.
[3] (-0.988) Consult a doctor before mixing painkillers or if you have ulcers.
Paracetamol (acetaminophen) is preferred for children with fever. Vitamin C supports immune function. Aspirin reduces pain and fever but may irritate the stomach lining. Ibuprofen is an NSAID; avoid on an empty stomach. Consult a doctor before mixing painkillers or if you have ulcers.


### 3.3c — α‑Tuning with Tiny Qrels

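We simulate ground truth (qrels) and sweep α to see which blend gives the best **Recall@3**.
This mirrors how you’d tune the hybrid weight in production with a validation set. On this toy corpus the sweep is uninformative: the retrieval output in 3.3 suggests the splitter produces only three chunks, so the top 3 always contains every chunk and Recall@3 is 1.00 for any α.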
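An α sweep only discriminates when `k` is smaller than the number of candidates. The sketch below runs a Recall@1 sweep over hand-made scores for two queries (all numbers are hypothetical and chosen so that one query favours the dense score and the other the sparse score); it uses the standard library only.

```python
from statistics import mean, pstdev


def zscore(xs):
    mu, sd = mean(xs), pstdev(xs)
    return [(x - mu) / (sd if sd > 0 else 1.0) for x in xs]


def top_k(dense, sparse, alpha, k):
    zd, zs = zscore(dense), zscore(sparse)
    fused = [alpha * d + (1 - alpha) * s for d, s in zip(zd, zs)]
    return sorted(range(len(fused)), key=lambda i: -fused[i])[:k]


# Hypothetical per-chunk scores and gold chunk ids for two queries
toy_runs = [
    {"dense": [0.9, 0.2, 0.1],  "sparse": [1.0, 1.5, 0.2], "gold": {0}},
    {"dense": [0.5, 0.45, 0.1], "sparse": [0.2, 6.0, 0.1], "gold": {1}},
]


def mean_recall(alpha, k=1):
    recalls = []
    for run in toy_runs:
        got = set(top_k(run["dense"], run["sparse"], alpha, k))
        recalls.append(len(run["gold"] & got) / len(run["gold"]))
    return mean(recalls)


sweep = {a: mean_recall(a) for a in (0.0, 0.5, 1.0)}
best_alpha = max(sweep, key=sweep.get)
for a, r in sweep.items():
    print(f"alpha={a:.1f} -> mean Recall@1={r:.2f}")
print("best alpha:", best_alpha)
```

Here pure sparse and pure dense each miss one query, while the blend at `alpha=0.5` retrieves both gold chunks; this is the kind of signal the sweep is meant to surface.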


In [28]:
qrels = {
    "children fever": {2},
    "stomach upset NSAID": {0,1},
}

def indices_for(q,a): return [i for i,_ in hybrid_rank(q,3,a)]
def recall3(q,a):
    got=set(indices_for(q,a)); gold=qrels[q]
    return len(gold & got)/len(gold)

for a in [0.0,0.3,0.6,0.8,1.0]:
    # np is already imported at the top of the notebook
    r = np.mean([recall3(q,a) for q in qrels])
    print(f"alpha={a:.1f} → mean Recall@3={r:.2f}")

alpha=0.0 → mean Recall@3=1.00
alpha=0.3 → mean Recall@3=1.00
alpha=0.6 → mean Recall@3=1.00
alpha=0.8 → mean Recall@3=1.00
alpha=1.0 → mean Recall@3=1.00


## 3.4 — Your First RAG Chain (LangChain `LCEL`)


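This wraps the previous steps into one reusable chain. The retriever returns `Document` objects, and passing them to the prompt unformatted inserts their `repr`, which is why `Document(...)` text leaks into the sample answer.  
Optionally enable **LangSmith** tracing by uncommenting the two environment lines and adding your key.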
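One common way to avoid feeding raw `Document` objects to the prompt is to join their `page_content` into a single string first. The sketch below uses a small stand-in class (`Doc` is hypothetical; it only mimics the `page_content` and `metadata` attributes of LangChain's `Document`) so that it runs without LangChain installed.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    # Keep only the text; drop ids, metadata, and the object repr
    return "\n\n".join(d.page_content for d in docs)


docs = [
    Doc("Aspirin reduces pain and fever but may irritate the stomach lining."),
    Doc("Consult a doctor before mixing painkillers or if you have ulcers."),
]

naive_context = str(docs)          # what an unformatted list turns into
clean_context = format_docs(docs)  # what the prompt should receive

print(naive_context)
print("---")
print(clean_context)
```

In the chain, this would typically be wired in as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`, relying on LCEL coercing a plain function into a runnable step; treat that wiring as an assumption to verify against your installed LangChain version.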


In [53]:
# Optional: enable LangSmith tracing for observability
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-key"

from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

# Define the retriever
retriever = vectordb.as_retriever(search_kwargs={"k":3})

# Define the prompt template
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use a maximum of three sentences and keep the answer as concise as possible.
{context}
Question: {question}
Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)

# Define the RAG chain using LCEL
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

q = "What should I do before mixing painkillers?"
t0=time.time(); out = rag_chain.invoke(q); dt=time.time()-t0
print("Answer:", out)
print(f"Latency: {dt*1000:.1f} ms")

Answer: Consult a doctor before mixing painkillers or if you have ulcers. [Document(id='30b9865e-5e59-491a-ae59-d195cd4c366a', metadata='Aspirin reduces pain and fever but may irritate the stomach lining.']
Latency: 8304.1 ms


## 3.6 — Mini Evaluator Helpers

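Quick functions to evaluate **Recall@k** across multiple queries for our toy corpus. The gold ids in `qrels_full` must be chunk indices: the retrieval output in 3.3 suggests the corpus yields only three chunks (0 to 2), so id `4` can never be retrieved, which would explain why mean Recall@3 stays at 0.67 for every α.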
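A quick pre-flight check can catch gold ids that do not correspond to any chunk before they silently depress recall. This is a minimal sketch: `out_of_range_ids` is a hypothetical helper, the dictionary is copied from the cell below, and `n_chunks=3` is an assumption read off the printed retrieval output rather than a value confirmed by the notebook (in the notebook you would pass `len(chunks)`).

```python
def out_of_range_ids(qrels, n_chunks):
    """Map each query to the gold ids that are not valid chunk indices."""
    bad = {}
    for query, gold in qrels.items():
        missing = sorted(i for i in gold if not 0 <= i < n_chunks)
        if missing:
            bad[query] = missing
    return bad


qrels_copy = {
    "preferred for children": {2},
    "avoid on empty stomach": {1},
    "ulcers and mixing drugs": {4},
}

# Assumption: the splitter produced 3 chunks; use len(chunks) in the notebook
problems = out_of_range_ids(qrels_copy, n_chunks=3)
print(problems)
```

An empty result means every gold id points at a real chunk; it does not confirm that the id points at the *right* chunk, which still needs a manual look at `chunks`.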


In [51]:
qrels_full = {
    "preferred for children": {2},
    "avoid on empty stomach": {1},
    "ulcers and mixing drugs": {4},
}

def eval_recall_at_k(queries, k=3, alpha=0.6):
    recs=[]
    for q in queries:
        idxs = [i for i,_ in hybrid_rank(q,k,alpha)]
        got=set(idxs); gold=qrels_full.get(q,set())
        recs.append(len(gold & got)/len(gold) if gold else 0.0)
    return float(np.mean(recs))

queries = list(qrels_full.keys())
for a in [0.3, 0.5, 0.7, 0.9]:
    print(f"alpha={a}: mean Recall@3={eval_recall_at_k(queries,3,a):.2f}")

alpha=0.3: mean Recall@3=0.67
alpha=0.5: mean Recall@3=0.67
alpha=0.7: mean Recall@3=0.67
alpha=0.9: mean Recall@3=0.67


## 3.7 — Capstone Packaging (what to include)

Use this structure in your repo submission:


In [52]:
print("""
week3_langchain_essentials/
├── main.ipynb      # this notebook
├── readme.md       # describe pipeline, metrics, and decisions
└── config.yaml     # model + retriever config (alpha, k, model_name)
""")


week3_langchain_essentials/
├── main.ipynb      # this notebook
├── readme.md       # describe pipeline, metrics, and decisions
└── config.yaml     # model + retriever config (alpha, k, model_name)



## 3.9 — Final Helper: `ask()`

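Tiny convenience wrapper so you can quickly query the RAG chain from 3.4. Note that `rag_chain` retrieves with the dense FAISS retriever; it does not call `hybrid_rank`.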
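Because `rag_chain` retrieves with FAISS only, a hybrid variant of `ask()` would have to call the ranker itself. A minimal sketch of such a wrapper follows, with all dependencies injected so it can run on its own; `make_ask_hybrid` and the toy stand-ins are hypothetical, and in the notebook you would pass `hybrid_rank`, `chunks`, and `llm.invoke` instead.

```python
def make_ask_hybrid(rank, chunks, generate, k=3, alpha=0.6):
    """Build an ask() function from a ranker, a chunk list, and an LLM callable.

    rank(question, k, alpha) must return [(chunk_index, score), ...],
    which is the shape hybrid_rank returns in this notebook.
    """
    def ask_hybrid(question):
        idxs = [i for i, _ in rank(question, k, alpha)]
        context = "\n\n".join(chunks[i] for i in idxs)
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return generate(prompt)
    return ask_hybrid


# Toy stand-ins so the sketch runs without models or indexes
toy_chunks = ["chunk A", "chunk B", "chunk C"]


def toy_rank(question, k, alpha):
    return [(2, 1.0), (0, 0.5), (1, -1.5)][:k]


def echo(prompt):
    return prompt   # a real LLM call would go here


ask_hybrid = make_ask_hybrid(toy_rank, toy_chunks, echo, k=2)
result = ask_hybrid("demo question?")
print(result)
```

Injecting the ranker and the generator keeps the wrapper independent of any particular model or index, which also makes it easy to test.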


In [50]:
def ask(question):
    return rag_chain.invoke(question)

print(ask("Does aspirin irritate the stomach?"))

Reduces pain and fever but may irritate the stomach lining. (id='962556ae-068c-40a5-84b7-24760c58487a', metadata='Aspirin Reduces Pain and Fever'), Document(id="30b9865e-5e59-491a-ae59-d195cd4c366a'), metadata= 'Paracetamol (acidaminophen) is preferred for children with fever.nVitamin C supports immune function.')]
