<a href="https://colab.research.google.com/github/towardsai/ai-tutor-rag-system/blob/notebook%2Faman/notebooks/Observablity_And_Tracing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Install Dependencies

In [1]:
!pip install -q langchain-openai==0.3.33 langchain==0.3.27


## Setting Up the Environment

In [2]:
import os
# os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"
# os.environ["LANGCHAIN_API_KEY"] = "<LANGCHAIN_API_KEY>"
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_PROJECT"] = "<LANGCHAIN_PROJECT>"

from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["LANGCHAIN_API_KEY"] = userdata.get('LangChain_API_Key')
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "demo-project"
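
Tracing silently does nothing useful if one of these variables is missing, so a quick check before running the later cells can save some confusion. The snippet below is a minimal sketch: `REQUIRED_VARS` and `missing_env_vars` are hypothetical names introduced here for illustration, not part of LangChain or LangSmith, and the list of variables simply mirrors the ones set in the cell above.

```python
import os

# Assumption: these are the four variables set in the cell above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_PROJECT",
)


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All tracing variables are set.")
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.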

## Tracing LangChain calls

In [3]:
from langchain_openai import ChatOpenAI
from langchain import hub

prompt = hub.pull("wikianes/section_writer")
llm = ChatOpenAI(model="gpt-5-mini")

chain = prompt | llm
output = chain.invoke({"section_description": "Quantum Physics"})
print(output.content)

I’m missing one key detail: “Quantum Physics” is not a surgical procedure, so I can’t sensibly produce procedure-specific preoperative anesthetic guidance for it. Do you mean a particular operation (for example, “quadriceps tendon repair,” “quaternary iliofemoral arthroplasty,” or a different named procedure)? Please confirm the intended surgical procedure or section title.

If you intended a general preoperative section template for WikiAnesthesia (useful until you tell me the exact procedure), here is a concise, WikiAnesthesia‑style preoperative section you can drop into any procedure page and then I can tailor it further to the exact surgery:

Preoperative assessment
- Focused history: baseline functional status, exercise tolerance, cardiopulmonary disease, bleeding history, prior anesthetic complications, airway problems, concurrent medications, and opioid/benzodiazepine tolerance.
- Comorbidities: identify and document active cardiac, pulmonary, renal, hepatic, endocrine, neurolog

## Tracing OpenAI calls

In [4]:
from openai import OpenAI
from langsmith.wrappers import wrap_openai

# wrap_openai patches the OpenAI client so that its calls are logged to LangSmith as traces
ai_client = wrap_openai(OpenAI())


def retrieve_documents(inquiry: str):
    docs = [
        "Vector databases enable efficient semantic search for LLMs.",
        "Retrieval-Augmented Generation (RAG) improves LLM accuracy with relevant context.",
        "Large Language Models (LLMs) have set new benchmarks in NLP tasks.",
        "Transformer architectures have revolutionized many areas of machine learning.",
        "Embedding models convert text into vector representations for similarity comparisons."
    ]
    return docs

def ask_qa(query):
    context = retrieve_documents(query)
    system_prompt = f"""
    You are an assistant for question-answering tasks.
    Use the following pieces of retrieved context to answer the question.
    Context: {context}
    """

    return ai_client.responses.create(
        instructions=system_prompt,
        input=query,
        model="gpt-5-mini",
        reasoning={'effort':'low'} # For Reasoning Models
    )

In [5]:
ask_qa("what is vector db?")

Response(id='resp_68d138d45f6c81a1a9276e626f228da106ce867c79a9ec01', created_at=1758542036.0, error=None, incomplete_details=None, instructions="\n    You are an assistant for question-answering tasks.\n    Use the following pieces of retrieved context to answer the question.\n    Context: ['Vector databases enable efficient semantic search for LLMs.', 'Retrieval-Augmented Generation (RAG) improves LLM accuracy with relevant context.', 'Large Language Models (LLMs) have set new benchmarks in NLP tasks.', 'Transformer architectures have revolutionized many areas of machine learning.', 'Embedding models convert text into vector representations for similarity comparisons.']\n    ", metadata={}, model='gpt-5-mini-2025-08-07', object='response', output=[ResponseReasoningItem(id='rs_68d138d4d5b481a1ab51cdca73d5f77306ce867c79a9ec01', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseOutputMessage(id='msg_68d138d5f84881a1b4feab828aa5dad006ce867c79a9ec01'

In [6]:
ask_qa("what is vector db?").output[1].content[0].text

'A vector database (vector DB) is a specialized database designed to store and search high-dimensional numeric vectors — typically embeddings produced by models that convert text, images, or other data into dense vectors. Instead of exact-match queries on keywords, a vector DB lets you find items that are semantically similar by comparing vector distances (e.g., cosine similarity, Euclidean distance).\n\nKey points:\n- Purpose: Enable efficient semantic search and similarity matching for applications like semantic text search, recommendation, image retrieval, and Retrieval-Augmented Generation (RAG) for LLMs.\n- How it works: You index embeddings (vector representations) of documents or items. At query time, you convert the query to an embedding and run a nearest-neighbor search to find the most similar stored vectors.\n- Technology: Uses specialized indexes and algorithms (approximate nearest neighbor methods such as HNSW, IVF, PQ) to search quickly at scale.\n- Benefits: Much better 
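
Indexing with `output[1]` works here because the reasoning item comes first and the assistant message second, as the printed `Response` shows. A less position-dependent approach is to pick out the message items by type. The sketch below assumes the shape visible in that output (`type='reasoning'` and `type='message'` items, with `output_text` parts inside the message); `extract_output_text` and `fake_response` are hypothetical names, and the stand-in objects only mimic the real SDK classes so the example runs without an API call. Recent versions of the `openai` Python SDK also expose a convenience `output_text` property on the response, which may be simpler if it is available in your installed version.

```python
from types import SimpleNamespace


def extract_output_text(response):
    """Concatenate the text of every 'output_text' part found in 'message' items."""
    parts = []
    for item in response.output:
        if getattr(item, "type", None) != "message":
            continue  # skip reasoning items and any other non-message items
        for part in item.content or []:
            if getattr(part, "type", None) == "output_text":
                parts.append(part.text)
    return "".join(parts)


# Hypothetical stand-in that mimics the shape of the Response printed above.
fake_response = SimpleNamespace(
    output=[
        SimpleNamespace(type="reasoning", content=None),
        SimpleNamespace(
            type="message",
            content=[
                SimpleNamespace(
                    type="output_text",
                    text="A vector database stores embeddings.",
                )
            ],
        ),
    ]
)

print(extract_output_text(fake_response))
```

With a real response, the same call would be `extract_output_text(ask_qa("what is vector db?"))`.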

## Using the LangSmith traceable decorator

In [7]:
from openai import OpenAI
from langsmith import traceable

openai_client = OpenAI()

@traceable(run_type="llm", name="OpenAI call")
def call_llm(model, system_prompt, query):
    result = openai_client.responses.create(
        instructions=system_prompt,
        input=query,
        model=model,
        reasoning={"effort": "low"},  # only applies to reasoning models
    )
    # output[0] is the reasoning item; output[1] holds the assistant message
    return result.output[1].content[0].text


@traceable(run_type="retriever")
def retrieve_documents(inquiry: str):
    docs = [
        "Vector databases enable efficient semantic search for LLMs.",
        "Retrieval-Augmented Generation (RAG) improves LLM accuracy with relevant context.",
        "Large Language Models (LLMs) have set new benchmarks in NLP tasks.",
        "Transformer architectures have revolutionized many areas of machine learning.",
        "Embedding models convert text into vector representations for similarity comparisons."
    ]
    return docs

@traceable(run_type="chain")
def ask_qa(query):
    context = retrieve_documents(query)
    system_prompt = f"""
    You are an assistant for question-answering tasks.
    Use the following pieces of retrieved context to answer the question.
    Context: {context}
    """

    return call_llm("gpt-5-mini", system_prompt, query)

In [8]:
ask_qa("What are large language models", langsmith_extra={"metadata": {"user": "test_user@gmail.com"}})

'Large language models (LLMs) are machine learning models designed to understand and generate human language. Key points:\n\n- Architecture and training\n  - Most modern LLMs use the transformer architecture, which enabled breakthroughs in language tasks through self-attention mechanisms.\n  - They’re trained on very large text corpora (web pages, books, code, etc.) using unsupervised or self-supervised objectives (e.g., next-token prediction), which lets them learn grammar, facts, reasoning patterns, and linguistic structure.\n\n- What they can do\n  - Generate fluent text, summarize, translate, answer questions, write code, extract information, and perform many other NLP tasks often at or above human benchmarks.\n  - Produce vector embeddings: internal or external embedding models convert text into high-dimensional vectors useful for similarity comparisons.\n\n- How they’re used in systems\n  - In many applications LLM outputs are combined with retrieval systems (Retrieval-Augmented 