# NojdarBot: Multilingual Medical Question Answering with Generative AI

Welcome to this Kaggle notebook where we build a complete medical question answering system for Kurdish speakers using generative AI. By the end of this notebook, you'll have created a working system that translates Kurdish medical questions, retrieves relevant information, and provides accurate answers in the user's native language.

## 1. The Problem: Medical Information Access for Kurdish Speakers

Access to accurate medical information in languages other than English presents a significant global healthcare challenge. For the ~30 million Kurdish speakers worldwide, particularly those who speak Sorani Kurdish, finding reliable medical information in their native language is extremely difficult.

Traditional translation services often fail with medical content due to:
- Lack of specialized medical vocabulary
- Cultural context misalignment
- Inaccurate translations that could lead to dangerous medical misunderstandings

## 2. How Generative AI Solves This Problem

NojdarBot leverages multiple AI models in a custom pipeline:

1. **Translation**: Gemini 1.5 Pro translates Kurdish questions to English
2. **Retrieval**: ChromaDB vector database retrieves medical knowledge
3. **Answer Generation**: Gemini generates medically accurate responses
4. **Evaluation**: Claude 3.5 Haiku assesses response quality
5. **Back-Translation**: Specialized translation preserves medical terminology

The following sections walk through building this system step by step.
"""

"""
## 3. Environment Setup

First, we install the necessary libraries and import dependencies. We'll need:
- `chromadb`: For vector storage of medical knowledge
- `google-generativeai`: To access Gemini models for translation and answer generation
- `anthropic`: To access Claude models for quality evaluation
- `langgraph`: For building our processing pipeline

We also import standard utilities for working with data, regex, and JSON

### 3.1 Environment Setup

In [9]:
!pip install -q chromadb google-generativeai anthropic langgraph
import ast
import os
import re
import numpy as np
import chromadb
from google import genai
from google.genai import types
from google.api_core import retry
import json
from datetime import datetime

Next, we define an enumeration to represent the quality rating levels for our evaluation system.
This gives us a consistent way to categorize and reference the quality of generated answers.

### 3.2 Enumerations for Quality Ratings


In [10]:
import enum

class AnswerRating(enum.Enum):
    VERY_GOOD = '5'
    GOOD = '4'
    OK = '3'
    BAD = '2'
    VERY_BAD = '1'

## 4. API Configuration and Model Access

In this section, we set up access to the AI models we'll use:

1. **Gemini API**: For translation and answer generation
2. **Claude API**: For quality evaluation of generated answers

We use Kaggle secrets to securely access API keys, which protects them from being exposed in the notebook.

In [11]:
from kaggle_secrets import UserSecretsClient
from anthropic import Anthropic


user_secrets = UserSecretsClient()
api_key = user_secrets.get_secret("GOOGLE_API_KEY")
client = genai.Client(api_key=api_key)


claude_api_key = user_secrets.get_secret("ANTHROPIC_API_KEY")
claude_client = Anthropic(api_key=claude_api_key)

## 5. Translation Pipeline: The Intake Node

This first node in our pipeline handles the translation of Sorani Kurdish medical questions to English.

The function:
1. Receives user input in Kurdish
2. Uses Gemini 1.5 Pro with a specialized prompt to translate the question
3. Stores both the input and translation in the state object
4. Maintains conversation history for context-aware translations

This translation step is critical - the quality of all subsequent steps depends on accurate translation of the medical terminology.


In [12]:
def intake_node(state: dict) -> dict:
    # === Step 1: Prepare conversation history prompt ===
    history_prompt = "\n".join(
        [f"User: {turn['user']}\nAgent: {turn['agent']}" for turn in state.get("chat_history", [])]
    )
    
    # === Step 2: Construct translation prompt with history context ===
    prompt = (
        f"{history_prompt}\n"
        f"User: Translate this Sorani Kurdish medical question to English:\n{state['user_input']}\n"
        f"Agent:"
    )

    # === Step 3: Configure and call the translation model ===
    config = types.GenerateContentConfig(temperature=0.3)
    response = client.models.generate_content(
        model="gemini-1.5-pro-latest",
        config=config,
        contents=[prompt]
    )

    # === Step 4: Process result and update state ===
    translated = response.text.strip()
    state["translated_input"] = translated

    state["chat_history"] = state.get("chat_history", [])
    state["chat_history"].append({
        "user": state["user_input"],
        "agent": translated
    })

    return state

## 6. Knowledge Base: Building the Medical Information Retrieval System

In the next sections, we build our Retrieval-Augmented Generation (RAG) system, which provides the knowledge base for our medical questions.

The steps include:
1. Parsing a medical Q&A dataset (MedQuAD) into structured data
2. Creating vector embeddings of these Q&A pairs
3. Storing the embeddings in ChromaDB for semantic search
4. Setting up efficient retrieval functionality

This approach ensures our answers are grounded in reliable medical information rather than being hallucinated.

# 6.1 Dataset Parser for MedQuAD

In [13]:
import xml.etree.ElementTree as ET

def parse_medquad_folder(base_path):
    qa_pairs = []
    for root_dir, _, files in os.walk(base_path):
        for fname in files:
            if fname.endswith(".xml"):
                fpath = os.path.join(root_dir, fname)
                try:
                    tree = ET.parse(fpath)
                    root = tree.getroot()
                    for qa in root.findall(".//QAPair"):
                        question_elem = qa.find("Question")
                        answer_elem = qa.find("Answer")
                        if question_elem is not None and answer_elem is not None:
                            question = question_elem.text.strip() if question_elem.text else None
                            answer = answer_elem.text.strip() if answer_elem.text else None
                            if question and answer:
                                qa_pairs.append({
                                    "question": question,
                                    "answer": answer,
                                    "source_file": fname,
                                    "source_folder": os.path.basename(root_dir),
                                    "pid": qa.attrib.get("pid", "N/A"),
                                    "qid": question_elem.attrib.get("qid", "N/A"),
                                    "qtype": question_elem.attrib.get("qtype", "N/A")
                                })
                except Exception as e:
                    print(f"❌ Error parsing {fname}: {e}")
    return qa_pairs

### 6.2 Vector Database Setup with ChromaDB

Now we initialize our vector database to store and retrieve medical knowledge efficiently. ChromaDB provides:
- Persistent storage of document embeddings
- Fast similarity search using cosine distance
- Metadata storage for source tracking

The system checks if the database already exists. If not, it processes the MedQuAD dataset, 
extracts question-answer pairs, generates embeddings using Gemini's embedding model, and stores them in ChromaDB.

In [14]:
# === Step 1: Folder path for MedQuAD on Kaggle ===
folder = "/kaggle/input/medquad-master-zip/MedQuAD-master"

# === Step 2: Initialize Chroma Persistent Client ===
chroma_client = chromadb.PersistentClient(path="./chroma_storage")
collection = chroma_client.get_or_create_collection(
    name="medical-knowledge",
    metadata={
        "hnsw:space": "cosine",
        "hnsw:search_ef": 100,
        "hnsw:construction_ef": 100,
    }
)

# === Step 3: Check if collection is already populated ===
if collection.count() == 0:
    print("No documents found in Chroma. Ingesting and embedding now...")

    # === Step 4: Define folders to parse ===
    folders_to_parse = [
        "1_CancerGov_QA", "2_GARD_QA", "3_GHR_QA", "4_MPlus_Health_Topics_QA",
        "5_NIDDK_QA", "6_NINDS_QA", "7_SeniorHealth_QA", "9_CDC_QA"
    ]

    retrievable_docs = []
    for name in folders_to_parse:
        folder_path = os.path.join(folder, name)
        docs = parse_medquad_folder(folder_path)
        print(f"Extracted {len(docs)} question–answer pairs from {folder_path}")
        retrievable_docs.extend(docs)

    # === Step 5: Build corpus and metadata ===
    corpus = [f"{item['question']}\n{item['answer']}" for item in retrievable_docs]
    metadatas = [
        {
            "qid": item["qid"],
            "qtype": item["qtype"],
            "pid": item["pid"],
            "source_file": item["source_file"],
            "source_folder": item["source_folder"]
        }
        for item in retrievable_docs
    ]

    # === Step 6: Define Gemini Embedding Function with batching ===
    is_retriable = lambda e: isinstance(e, genai.errors.APIError) and e.code in {429, 503}

    @retry.Retry(predicate=is_retriable, timeout=300.0)
    def safe_embed_text_batches(corpus, task_type="retrieval_document", batch_size=100):
        all_embeddings = []
        for start in range(0, len(corpus), batch_size):
            end = min(start + batch_size, len(corpus))
            batch = corpus[start:end]
            print(f"Embedding batch {start} to {end}...")
            try:
                response = client.models.embed_content(
                    model="models/text-embedding-004",
                    contents=batch,
                    config=types.EmbedContentConfig(task_type=task_type)
                )
                batch_embeddings = [e.values for e in response.embeddings]
                all_embeddings.extend(batch_embeddings)
            except Exception as e:
                print(f"Error embedding batch {start}-{end}: {e}")
        return all_embeddings

    print("Generating document embeddings using Gemini...")
    doc_embeddings = safe_embed_text_batches(corpus)

    # === Step 7: Upload to Chroma in batches ===
    def batch_add_to_chroma(collection, corpus, embeddings, metadatas, batch_size=5000):
        total = len(corpus)
        for start in range(0, total, batch_size):
            end = min(start + batch_size, total)
            print(f"Adding Chroma batch {start} to {end}...")
            collection.add(
                documents=corpus[start:end],
                embeddings=embeddings[start:end],
                metadatas=metadatas[start:end],
                ids=[f"doc_{i}" for i in range(start, end)]
            )

    batch_add_to_chroma(
        collection=collection,
        corpus=corpus,
        embeddings=doc_embeddings,
        metadatas=metadatas,
        batch_size=5000
    )

    # === Step 8: Done
    print(f"Successfully indexed {len(corpus)} documents into ChromaDB.")

else:
    print(f"ChromaDB already contains {collection.count()} documents. Skipping ingestion.")

ChromaDB already contains 15848 documents. Skipping ingestion.


In [15]:
print(dir(collection))

['__annotations__', '__class__', '__class_getitem__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__orig_bases__', '__parameters__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__weakref__', '_client', '_data_loader', '_embed', '_embed_record_set', '_embedding_function', '_is_protocol', '_model', '_transform_get_response', '_transform_peek_response', '_transform_query_response', '_update_model_after_modify_success', '_validate_and_prepare_add_request', '_validate_and_prepare_delete_request', '_validate_and_prepare_get_request', '_validate_and_prepare_query_request', '_validate_and_prepare_update_request', '_validate_and_prepare_upsert_request', '_validate_modify_request', 'add', 'configuration', 'configuration_json', 'count', 'database', 'delete',

In [16]:
print(f"Collection type: {type(collection).__name__}")

Collection type: Collection


In [17]:
if hasattr(collection, "_client"):
    print(f"Client type: {type(collection._client).__name__}")
    
    if hasattr(collection._client, "get_settings"):
        settings = collection._client.get_settings()
        print(f"Settings: {settings}")

Client type: RustBindingsAPI
Settings: environment='' chroma_api_impl='chromadb.api.rust.RustBindingsAPI' chroma_server_nofile=None chroma_server_thread_pool_size=40 tenant_id='default' topic_namespace='default' chroma_server_host=None chroma_server_headers=None chroma_server_http_port=None chroma_server_ssl_enabled=False chroma_server_ssl_verify=None chroma_server_api_default_path=<APIVersion.V2: '/api/v2'> chroma_server_cors_allow_origins=[] is_persistent=True persist_directory='./chroma_storage' chroma_memory_limit_bytes=0 chroma_segment_cache_policy=None allow_reset=False chroma_auth_token_transport_header=None chroma_client_auth_provider=None chroma_client_auth_credentials=None chroma_server_auth_ignore_paths={'/api/v2': ['GET'], '/api/v2/heartbeat': ['GET'], '/api/v2/version': ['GET'], '/api/v1': ['GET'], '/api/v1/heartbeat': ['GET'], '/api/v1/version': ['GET']} chroma_overwrite_singleton_tenant_database_access_from_auth=False chroma_server_authn_provider=None chroma_server_authn

## 7. Query Processing: Retrieval and Answer Generation

With our knowledge base in place, we now implement the core RAG functionality:

1. **Query Embedding**: Convert the translated question into a vector representation
2. **Semantic Search**: Find the most relevant medical information in our database
3. **Answer Generation**: Create a comprehensive answer based on the retrieved information

This multi-step process ensures that our responses are both relevant and factually accurate.

# 7.1 Query Embedding Function


In [18]:
@retry.Retry(predicate=lambda e: isinstance(e, genai.errors.APIError) and e.code in {429, 503}, timeout=300.0)
def embed_text(text, task_type="retrieval_query"):
    response = client.models.embed_content(
        model="models/text-embedding-004",
        contents=[text],
        config=types.EmbedContentConfig(task_type=task_type)
    )
    return response.embeddings[0].values

### 7.2 Retrieval Node: Finding Relevant Medical Information

The retriever node is responsible for:
1. Converting the English question into a vector embedding
2. Searching the medical knowledge base for similar content
3. Filtering results by a similarity threshold
4. Structuring the results for the answer generation phase

This semantic search approach helps find conceptually relevant information even when exact keyword matches aren't present.
"""

In [19]:
def retriever_node(state: dict) -> dict:
    query_en = state["translated_input"]

    # === Step 1: Embed query
    query_embedding = embed_text(query_en)

    # === Step 2: Chroma query with distances
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=5,
        include=["documents", "metadatas", "distances"]
    )

    # === Step 3: Distance-based filtering
    threshold = 0.4
    filtered_chunks = []
    for doc, meta, dist in zip(results["documents"][0], results["metadatas"][0], results["distances"][0]):
        if dist < threshold:
            q_a = doc.split("\n", 1)
            question = q_a[0] if len(q_a) > 0 else ""
            answer = q_a[1] if len(q_a) > 1 else ""
            filtered_chunks.append({"question": question, "answer": answer, **meta})

    # === Step 4: Update state
    state["retrieved_chunks"] = filtered_chunks
    state["retrieved_metadata"] = results["metadatas"][0]
    return state

### 7.3 Answer Generation: The Recommender Node

The recommender node transforms retrieved medical information into a coherent answer:

1. It formats the retrieved context into a structured prompt
2. Provides clear instructions to the Gemini model on how to use the context
3. Generates a factual, comprehensive answer grounded in the retrieved information
4. Maintains conversation history for follow-up questions

The prompt engineering here is critical for ensuring the model only uses the provided information and doesn't hallucinate medical facts.
"""

In [20]:
def recommender_node(state: dict) -> dict:
    """
    Constructs a final prompt from retrieved chunks and generates an answer.
    """
    question_en = state["translated_input"]
    context_list = state["retrieved_chunks"]

    # Format context into a clean list of Q:A pairs
    context_str = "\n".join(
        f"Q: {r['question']}\nA: {r['answer']}" for r in context_list
    )

    # Build the final prompt for Gemini
    prompt = f"""You are a medical assistant.
Use the information below to answer the patient's question accurately and clearly.

Context:
{context_str}

Question:
{question_en}

Be factual. If unsure, say you don’t know.
Don't summarize. Keep the same level of detail."""

    config = types.GenerateContentConfig(temperature=0.3)

    response = client.models.generate_content(
        model="gemini-1.5-pro-latest",
        config=config,
        contents=[prompt]
    )

    state["ai_response_en"] = response.text.strip()
    state["chat_history"] = state.get("chat_history", [])
    state["chat_history"].append({
        "user": state["user_input"],
        "agent": state["ai_response_en"]
    })
    print("\n🧠 Chat History So Far:")
    for i, turn in enumerate(state["chat_history"], 1):
        print(f"\nTurn {i}:")
        print(f"User: {turn['user']}")
        print(f"Agent: {turn['agent']}")
    return state

## 8. Quality Assurance: Evaluating and Refining Responses

To ensure our medical answers meet high standards, we implement an automated evaluation system using Claude 3.5 Haiku.

The evaluation assesses four key dimensions:
1. **Instruction following**: Does the answer address the specific question?
2. **Groundedness**: Is the answer based only on the provided medical context?
3. **Completeness**: Does it fully address all aspects of the question?
4. **Fluency**: Is the answer clear and easy to understand?

This evaluation provides a quality score and detailed feedback that helps improve the system.

# 8.1 Evaluation Prompt Template

In [21]:
QA_PROMPT = """# Instruction
You are an expert evaluator. Your task is to evaluate the quality of AI-generated answers to medical questions.

# Evaluation
## Metric Definition
You will assess instruction following, groundedness, completeness, and fluency.

## Criteria
Instruction following: Does it answer the question asked?
Groundedness: Is it based only on the provided context?
Completeness: Does it fully answer?
Fluency: Is it easy to read?

## Rating Rubric
5: Very good  
4: Good  
3: Okay  
2: Bad  
1: Very bad

# User Input
{prompt}

# AI-generated Response
{response}
"""

# 8.2 Evaluator Node Implementation


In [22]:
def evaluator_node(state: dict) -> dict:
    question_en = state["translated_input"]
    context_str = "\n".join(
        f"Q: {r['question']}\nA: {r['answer']}" for r in state["retrieved_chunks"]
    )

    formatted_prompt = QA_PROMPT.format(
        prompt=f"Context:\n{context_str}\n\nQuestion:\n{question_en}",
        response=state["ai_response_en"]
    )

    verbose_eval_response = claude_client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=1024,
        temperature=0.0,
        messages=[
            {"role": "user", "content": formatted_prompt}
        ]
    )
    verbose_eval = verbose_eval_response.content[0].text

    score_response = claude_client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=10,
        temperature=0.0,
        messages=[
            {"role": "user", "content": formatted_prompt},
            {"role": "assistant", "content": verbose_eval},
            {"role": "user", "content": "Give a rating between 1 and 5, only the number."}
        ]
    )
    score_text = score_response.content[0].text.strip()

    match = re.search(r"[1-5]", score_text)
    rating_value = int(match.group()) if match else 3

    state["ai_rating_text"] = verbose_eval
    state["ai_rating_score"] = str(rating_value)
    state["ai_rating_confidence"] = rating_value / 5.0

    return state

## 9. Back-Translation: Converting Answers to Kurdish

Once we have a high-quality English answer, we need to translate it back to Sorani Kurdish in a way that preserves medical accuracy.

The controlled translation approach:
1. Uses a specialized prompt that emphasizes preserving medical details
2. Instructs the model to keep medical terms in English if no clear Kurdish equivalent exists
3. Prevents summarization to ensure all medical information is retained
4. Handles translation errors gracefully with fallbacks

This specialized translation is critical for ensuring medical information remains accurate across languages.


# 9.1 Controlled Translation Node

In [23]:
def controlled_translate_node(state: dict) -> dict:
    """
    Translates the AI-generated English answer into Sorani Kurdish (CKB),
    preserving medical accuracy and detail, using Gemini.
    """
    text = state["ai_response_en"]

    prompt = f"""
    Translate the following English medical answer to **Sorani Kurdish**, keeping the same level of detail.

    Do **not summarize**. Translate all sentences unless medically unsafe.

    If any English medical term has no clear Kurdish equivalent, keep it in Latin/English script.

    Answer to translate:
    {text}
    """

    config = types.GenerateContentConfig(temperature=0.3)

    try:
        response = client.models.generate_content(
            model="gemini-1.5-pro-latest",
            config=config,
            contents=[prompt]
        )
        translated = response.text.strip()
    except Exception as e:
        print(f"Gemini translation error: {e}")
        translated = "Translation failed. Please try again."

    state["ai_response_ckb"] = translated
    return state


### 9.2 Medical Terminology Extraction

To enhance transparency and understanding, we extract key medical terms from the English answer:

1. The function identifies medical terminology using Gemini
2. Returns a structured list of medical terms in lowercase
3. Provides these terms as part of the response
4. Helps users understand specialized terminology

This feature helps bridge the knowledge gap by highlighting important medical concepts that might require further explanation.


# 9.2 Medical Term Extraction Node

In [24]:
def extract_terms_node(state: dict) -> dict:
    """
    Extracts medical terms from the English answer using Gemini,
    and stores them in the state as a list of lowercase strings.
    """
    import ast

    answer_text = state.get("ai_response_en", "")

    prompt = """
Extract the **medical terms** from the following medical answer.
Return them as a **Python list** of lowercase strings in strict syntax (e.g., ["diabetes", "insulin"]).
Only return the list — no explanation.

Answer:
""" + answer_text

    config = types.GenerateContentConfig(temperature=0.3)

    try:
        response = client.models.generate_content(
            model="gemini-1.5-pro-latest",
            config=config,
            contents=[prompt]
        )

        raw = response.text.strip()

        # === Try direct parsing
        if raw.startswith("[") and raw.endswith("]"):
            parsed = ast.literal_eval(raw)
            if isinstance(parsed, list):
                state["medical_terms"] = parsed
                return state

        # === Fallback: parse assignment-style lines
        for line in raw.splitlines():
            if "medical_terms" in line and "=" in line:
                _, list_str = line.split("=", 1)
                parsed = ast.literal_eval(list_str.strip())
                if isinstance(parsed, list):
                    state["medical_terms"] = parsed
                    return state

        # === If nothing works
        raise ValueError("Output could not be parsed as a list")

    except Exception as e:
        print(f"Term extraction failed: {e}")
        state["medical_terms"] = []

    return state

## 10. Fallback Mechanism: Search Grounding

If our primary RAG system can't find relevant information, we implement a fallback using Gemini's built-in search capability:

1. The function sends the question to Gemini 2.0 with search tools enabled
2. Gemini grounds its response in up-to-date web information
3. The system marks the response as coming from web search
4. This provides a safety net for questions outside our medical database

This ensures we can still provide useful information even when our primary knowledge base lacks relevant content.


In [25]:
def search_grounding_node(state: dict) -> dict:
    query = state["translated_input"]

    print("🔎 NojdarBot is grounding via Gemini’s built-in search tool...")

    config_with_search = types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
        temperature=0.3
    )

    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[f"""Search online and answer this medical question **with links or source names** if possible:{query}"""],        
        config=config_with_search
    )

    answer_text = response.candidates[0].content.parts[0].text
    state["ai_response_en"] = answer_text
    state["grounded_from_search"] = True
    return state

## 11. State Management: Defining System Data Flow

In this section, we define the state object that flows through our pipeline. Each field represents a specific piece of data that is:
1. Generated by one node
2. Consumed by subsequent nodes
3. Eventually used to build the final response

The `BotState` class uses Python's `TypedDict` to define a structured state object with:
- `user_input`: Original Kurdish question from the user
- `translated_input`: English translation of the question
- `retrieved_chunks`: Medical information passages retrieved from our knowledge base
- `retrieved_metadata`: Source information for the retrieved chunks
- `ai_response_en`: Generated answer in English
- `ai_response_ckb`: Translated answer in Kurdish
- `ai_rating_score`: Quality evaluation score (1-5)
- `ai_rating_confidence`: Confidence score for the evaluation (0.0-1.0)
- `ai_rating_text`: Detailed feedback on answer quality
- `medical_terms`: Extracted medical terminology with definitions
- `fallback`: Flag indicating if the system should use fallback mechanisms

This strongly-typed approach ensures consistent data flow and helps identify potential issues during development.


In [26]:
from typing import TypedDict, Optional, List
from langgraph.graph import StateGraph, END
class BotState(TypedDict):
    user_input: str
    translated_input: Optional[str]
    retrieved_chunks: Optional[List[dict]]
    retrieved_metadata: Optional[List[dict]]
    ai_response_en: Optional[str]
    ai_response_ckb: Optional[str]
    ai_rating_score: Optional[str]
    ai_rating_confidence: Optional[float]
    ai_rating_text: Optional[str]
    medical_terms: Optional[List[str]]
    fallback: Optional[bool]

## 12. System Integration: Building the Complete Pipeline

With all components defined, we now build the complete processing pipeline using LangGraph:

1. Define the state structure that will flow through our pipeline
2. Create a graph connecting all processing nodes in the correct order
3. Set up conditional paths for quality-based fallback mechanisms
4. Compile the graph into an executable pipeline

The graph workflow proceeds as follows:
1. **Intake**: Translates user input from Kurdish to English
2. **Retriever**: Finds relevant medical information in the knowledge base
3. **Recommender**: Generates a medical answer using retrieved information
4. **Evaluator**: Assesses answer quality and determines if fallback is needed
5. **Conditional Path**:
   - If quality is sufficient → proceed to translation
   - If quality is insufficient → trigger search-based fallback mechanism
6. **Translator**: Converts English answer back to Kurdish
7. **Term Extractor**: Identifies and explains key medical terminology

This structured approach makes the system modular, extensible, and easy to debug when issues arise. The conditional branching ensures quality control, triggering fallback mechanisms when needed to maintain answer reliability.


In [27]:
graph = StateGraph(BotState)

# Add all your nodes
graph.add_node("intake", intake_node)
graph.add_node("retriever", retriever_node)
graph.add_node("recommender", recommender_node)
graph.add_node("evaluator", evaluator_node)
graph.add_node("search_grounding", search_grounding_node)  # ← Added
graph.add_node("translator", controlled_translate_node)
graph.add_node("term_extractor", extract_terms_node)

# Set entry point
graph.set_entry_point("intake")

# Core flow
graph.add_edge("intake", "retriever")
graph.add_edge("retriever", "recommender")
graph.add_edge("recommender", "evaluator")

# Conditional path: fallback → search_grounding → translator
graph.add_conditional_edges(
    "evaluator",
    lambda state: "search_grounding" if state.get("fallback", False) else "translator",
    {
        "search_grounding": "search_grounding",
        "translator": "translator"
    }
)

# Ensure translator runs even after fallback
graph.add_edge("search_grounding", "translator")

# Final steps
graph.add_edge("translator", "term_extractor")
graph.add_edge("term_extractor", END)

# Compile the graph
nojdarbot_graph = graph.compile()

## 13. Pipeline Invocation: Stateless Function Interface

This function provides a clean, stateless interface to the NojdarBot pipeline. It:
1. Takes a single Kurdish medical question as input
2. Initializes the state object with this input
3. Invokes the full processing pipeline
4. Extracts and formats the final results into a structured response

The returned dictionary contains:
- The Kurdish answer summary
- The original English answer (for reference)
- Urgency assessment (currently fixed as "low")
- Extracted medical terms with definitions
- Source information for transparency
- Context passages used to generate the answer
- Evaluation metrics including quality rating and confidence

This function makes it easy to integrate NojdarBot into various applications without needing to understand the internal state management details.


In [28]:
def nojdarbot_pipeline(user_input_ckb: str) -> dict:
    input_state = {
        "user_input": user_input_ckb
    }
    final_state = nojdarbot_graph.invoke(input_state)

    return {
        "summary": final_state.get("ai_response_ckb"),
        "english_answer": final_state.get("ai_response_en"),
        "urgency": "low",
        "medical_terms": final_state.get("medical_terms", []),
        "sources": final_state.get("retrieved_chunks", []),
        "context_used": [f"{r['question']}\n{r['answer']}" for r in final_state.get("retrieved_chunks", [])],
        "evaluation": {
            "rating_label": final_state.get("ai_rating_score"),
            "rating_value": int(final_state.get("ai_rating_confidence", 0.0) * 5),
            "details": final_state.get("ai_rating_text")
    }}

## 14. User Interaction: Human-in-the-Loop Node

This function implements the human interaction component of our system, allowing for:
1. Displaying the system's response to the user
2. Collecting the next user input
3. Handling exit commands in multiple languages
4. Updating the state with new user input

This follows the standard LangGraph pattern for human-in-the-loop nodes, making it easy to integrate with the rest of the pipeline. The function is designed to work in interactive environments like Jupyter notebooks and command-line interfaces.


In [29]:
def human_node(state: BotState) -> BotState:
    """Display the last model message to the user, and receive the user's input.
    This follows the same pattern as the LangGraph notebook example."""
    
    # If there's a response to display, show it first
    if "ai_response_ckb" in state and state["ai_response_ckb"]:
        print("\nNojdarBot:", state["ai_response_ckb"])
    
    # Get user input
    user_input = input("\n🧑‍⚕️ You: ")
    
    # Check for exit commands
    if user_input.lower() in {"q", "quit", "exit", "goodbye", "دەرچوون"}:
        return state | {"finished": True, "user_input": user_input}
    
    # Return the state with the new user input
    return state | {"user_input": user_input}

## 15. User Interface: Displaying Results

This function handles the presentation layer of NojdarBot, providing two display modes:

1. **Detailed Mode**: Comprehensive view with:
   - English and Kurdish answers
   - Confidence indicators with color coding
   - Medical terminology explanations
   - Source attribution
   - Session tracking information
   - Fallback warnings when appropriate

2. **Minimal Mode**: Streamlined view with:
   - Kurdish answer prominently displayed
   - Collapsible English translation
   - Basic confidence indicator
   - Session ID for tracking

The function supports right-to-left text rendering for Kurdish and uses Markdown formatting for clear visual hierarchy. The confidence badges are color-coded based on the system's confidence level (green for high, orange for medium, red for low confidence), providing users with an immediate visual cue about answer reliability.


In [30]:
from IPython.display import Markdown, display
import uuid
from datetime import datetime

def display_nojdarbot(response: dict, mode: str = "detailed"):
    """
    Displays NojdarBot's response in a user-friendly format, with options for
    detailed or minimal output. Includes timestamp, session ID, confidence
    indicator, and handling for fallback responses.
    """
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    session_id = str(uuid.uuid4())[:8]
    eval_data = response.get("evaluation", {})
    confidence = eval_data.get("rating_value", 0) / 5.0

    # Determine confidence badge color
    if confidence >= 0.8:
        color = "#4CAF50"  # Green
    elif confidence >= 0.6:
        color = "#FF9800"  # Orange
    else:
        color = "#F44336"  # Red

    if mode == "minimal":
        # Minimal Display
        display(Markdown(f"""
### 🩺 **NojdarBot Response**

<div dir="rtl" style="text-align: right; font-family: 'Noto Naskh Arabic', 'Segoe UI', sans-serif; font-size: 16px; background-color: #eef9f0; padding: 12px; border-radius: 8px; border-left: 4px solid #4CAF50;">
{response.get("summary", "نەدۆزرایەوە.")}
</div>

<details>
<summary><strong>English Version</strong></summary>
<blockquote>{response.get("english_answer", "N/A")}</blockquote>
</details>

<small><span style="color: white; background-color: {color}; padding: 4px 8px; border-radius: 5px;">
Confidence: {eval_data.get("rating_value", "?")}/5 ({eval_data.get("rating_label", "N/A")})
</span></small>

<p style="color: grey; font-size: 12px;">Session ID: {session_id} | {timestamp}</p>
"""))
        return

    # Detailed Display
    separator = "\n\n"
    summary = response.get("summary") or ""

    # Fallback warning
    if summary.startswith("ببورە"):
        display(Markdown(f"""
> ⚠️ **No confident answer found**. NojdarBot returned a fallback response.
> Try rephrasing your question or ask a simpler one.
"""))

    # Main Answers
    display(Markdown("### 🩺 **NojdarBot Medical Assistant**"))

    # Web Search Grounding Notice
    if response.get("grounded_from_search"):
        display(Markdown("> 🌐 This answer was generated using real-time web search."))

    # English Answer block
    display(Markdown(f"""
#### **English Answer**
> {response.get('english_answer', 'N/A')}
"""))

    # Kurdish Answer
    display(Markdown(f"""
#### **Kurdish Answer**
<div dir="rtl" style="text-align: right; font-family: 'Noto Naskh Arabic', 'Segoe UI', sans-serif; font-size: 16px; background-color: #f9f9f9; padding: 10px; border-radius: 6px;">
{response.get('summary', 'N/A')}
</div>

<small><span style="color: white; background-color: {color}; padding: 4px 8px; border-radius: 5px;">
Confidence: {eval_data.get("rating_value", "?")}/5 ({eval_data.get("rating_label", "N/A")})
</span></small>
"""))

    # Context
    display(Markdown(f"""
#### **Context Used**
<pre style="background-color: #f7f7f7; padding: 10px; border-radius: 5px; white-space: pre-wrap;">
{separator.join(response.get("context_used", []))}
</pre>

#### **Detected Medical Terms**
<code>{', '.join(response.get('medical_terms', []))}</code>
"""))

    # Evaluation Details
    display(Markdown(f"""
#### **Evaluation Explanation**
<details>
<summary><strong>Show full evaluation (click to expand)</strong></summary>
<div style="padding: 10px; background-color: #f9f9f9; margin-top: 5px;">
{eval_data.get('details', 'N/A')}
</div>
</details>
"""))

    # Sources
    display(Markdown("#### **Context Sources**"))
    for i, item in enumerate(response.get("sources", []), 1):
        display(Markdown(f"""
<details>
<summary><strong>{i}. Question:</strong> {item.get('question', 'N/A')}</summary>
<div style="padding: 10px; background-color: #f9f9f9; margin-top: 5px;">
<strong>Answer:</strong> {item.get('answer', 'N/A')}
</div>
</details>
"""))

    # Footer
    display(Markdown(f"""
<p style="color: grey; font-size: 12px;">Session ID: {session_id} | Generated on: {timestamp}</p>
"""))

### 15.2 Kurdish Answer Display Section

The Kurdish answer block is displayed with specific considerations for right-to-left text:

1. **RTL Text Direction**: Uses `dir="rtl"` HTML attribute to ensure proper text flow
2. **Right Alignment**: Text aligned right for natural Kurdish reading experience
3. **Font Selection**: Prioritizes 'Noto Naskh Arabic' for best Kurdish character rendering
4. **Visual Styling**:
   - Light green background (`#eef9f0`) for better readability
   - 12px padding for comfortable text spacing
   - Rounded corners (8px border-radius) for modern UI appearance
   - Left border accent (4px solid green) for visual hierarchy
5. **Fallback Detection**: Special handling for fallback answers that start with "ببورە"

This section uses HTML within Markdown to achieve the necessary styling and text direction
control, which is essential for Kurdish (Sorani) text display in Jupyter environments.


In [31]:
FEEDBACK_LOG_PATH = "logs/feedback_log.jsonl"

if os.path.exists(FEEDBACK_LOG_PATH):
    with open(FEEDBACK_LOG_PATH, "r", encoding="utf-8") as f:
        feedback_log = [json.loads(line) for line in f.readlines()]
else:
    feedback_log = []

log function

In [32]:
def log_feedback(entry: dict, path: str = "logs/feedback_log.jsonl"):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def collect_user_feedback(question, answer, context_list, structured_eval, confidence, verbose_eval, user_id="anon"):
    print("ئەم وەڵامەت پێ چۆن بوو؟")
    print("[1] زۆر خراپ   [2] خراپ   [3] خراپ نەبوو   [4] باش   [5] زۆر باش")

    try:
        choice = int(input("هەڵبژاردن (1–5): ").strip())
        if choice not in range(1, 6):
            raise ValueError
    except:
        print("⚠️ هەڵەیەک ڕوویدا.'٣' هەڵبژێردرا.")
        choice = 3  # Default to neutral


    timestamp = datetime.now().isoformat()
    entry = {
        "timestamp": timestamp,
        "user_id": user_id,
        "question": question,
        "answer": answer,
        "context": context_list,
        "gemini_rating": structured_eval.name,
        "confidence": confidence,
        "reasoning": verbose_eval,
        "user_rating_num": choice,
        "agent_version": "v1.0",  # in case your prompt changes later
        "mode": "notebook"
    }

    
    feedback_log.append(entry)
    log_feedback(entry)  # Save immediately to disk too
    print("✅ Feedback saved!")

##  Test Q

In [33]:
def interactive_nojdarbot_session():
    from IPython.display import display
    import uuid

    print("🩺 پرسیاری تەندروستی خۆت بە کوردی (سۆرانی) بنووسە.")
    print("بنووسه 'q' یان 'quit' بۆ تەواوکردنی وتووێژ.\n")

    last_response = None
    last_question = None

    while True:
        try:
            user_input = input("🧑‍⚕️ تۆ: ").strip()
        except (KeyboardInterrupt, EOFError):
            print("\nSession interrupted. Proceeding to feedback.")
            break

        if user_input.lower() in {"q", "quit", "exit"}:
            print("\nGot it — let's collect feedback on the last answer.")
            break

        if not user_input:
            print("تکایە پرسیارێک بنووسە.")
            continue

        display_mode = ""
        while display_mode not in {"detailed", "minimal"}:
            display_mode = input("Please choose either 'detailed' if you want English context also or 'minimal' for Kurdish only.").strip().lower()
            if display_mode not in {"detailed", "minimal"}:
                print("Please choose either 'detailed' if you want English context also or 'minimal' for Kurdish only.")

        last_question = user_input
        state = {"user_input": user_input,
                "chat_history": []}
        final_state = nojdarbot_graph.invoke(state)

        last_response = {
            "summary": final_state.get("ai_response_ckb"),
            "english_answer": final_state.get("ai_response_en"),
            "urgency": "low",
            "medical_terms": final_state.get("medical_terms", []),
            "sources": final_state.get("retrieved_chunks", []),
            "context_used": [f"{r['question']}\n{r['answer']}" for r in final_state.get("retrieved_chunks", [])],
            "evaluation": {
                "rating_label": final_state.get("ai_rating_score"),
                "rating_value": int(final_state.get("ai_rating_confidence", 0.0) * 5),
                "details": final_state.get("ai_rating_text")
            },
            "ai_rating_confidence": final_state.get("ai_rating_confidence", 0.0),
            "ai_rating_text": final_state.get("ai_rating_text", "")
        }

        display_nojdarbot(last_response, mode=display_mode)

    # === Collect feedback after loop ends ===
    if last_response:
        try:
            score = last_response["evaluation"]["rating_value"]
            structured_enum = AnswerRating(str(score))
        except Exception:
            structured_enum = AnswerRating.OK

        collect_user_feedback(
            question=last_question,
            answer=last_response["english_answer"],
            context_list=last_response["context_used"],
            structured_eval=structured_enum,
            confidence=last_response["ai_rating_confidence"],
            verbose_eval=last_response["ai_rating_text"]
        )
    else:
        print("⚠️ No valid response was generated to collect feedback on.")

In [34]:
interactive_nojdarbot_session()

🩺 پرسیاری تەندروستی خۆت بە کوردی (سۆرانی) بنووسە.
بنووسه 'q' یان 'quit' بۆ تەواوکردنی وتووێژ.



🧑‍⚕️ تۆ:  ماوەی دوو هەفتەیە کۆکەیەکی وشکم هەیە و تام هەست پێناکەم. دەبێت چی بکەم؟
Please choose either 'detailed' if you want English context also or 'minimal' for Kurdish only. minimal



🧠 Chat History So Far:

Turn 1:
User: ماوەی دوو هەفتەیە کۆکەیەکی وشکم هەیە و تام هەست پێناکەم. دەبێت چی بکەم؟
Agent: A dry cough and loss of taste can be related to several things, including respiratory infections. Since you've been experiencing these symptoms for two weeks, it's important to see a doctor to determine the cause and receive appropriate treatment.  They will be able to evaluate your specific situation and advise you on the best course of action.



### 🩺 **NojdarBot Response**

<div dir="rtl" style="text-align: right; font-family: 'Noto Naskh Arabic', 'Segoe UI', sans-serif; font-size: 16px; background-color: #eef9f0; padding: 12px; border-radius: 8px; border-left: 4px solid #4CAF50;">
کۆکەی وشک و لەدەستدانی تام (Loss of taste) دەکرێت پەیوەندییان بە چەندین شتەوە هەبێت، لەوانە هەوکردنی کۆئەندامی هەناسە. چونکە ئەم نیشانانەت بۆ ماوەی دوو هەفتەیە هەیە، گرنگە سەردانی پزیشک بکەیت بۆ دیاریکردنی هۆکارەکەی و وەرگرتنی چارەسەری گونجاو. ئەوان دەتوانن دۆخەکەت بە وردی هەڵبسەنگێنن و باشترین ڕێگەی چارەسەرکردنت پێ بڵێن.
</div>

<details>
<summary><strong>English Version</strong></summary>
<blockquote>A dry cough and loss of taste can be related to several things, including respiratory infections. Since you've been experiencing these symptoms for two weeks, it's important to see a doctor to determine the cause and receive appropriate treatment.  They will be able to evaluate your specific situation and advise you on the best course of action.</blockquote>
</details>

<small><span style="color: white; background-color: #4CAF50; padding: 4px 8px; border-radius: 5px;">
Confidence: 4/5 (4)
</span></small>

<p style="color: grey; font-size: 12px;">Session ID: 394f3071 | 2025-04-21 06:47:50</p>


🧑‍⚕️ تۆ:  دەتوانی دەرمانێکم پێ بدەیت؟
Please choose either 'detailed' if you want English context also or 'minimal' for Kurdish only. minimal



🧠 Chat History So Far:

Turn 1:
User: دەتوانی دەرمانێکم پێ بدەیت؟
Agent: I cannot give you any medication. I am a medical assistant and am not authorized to prescribe or dispense medication.  You would need to speak with a doctor or other licensed prescriber.
Term extraction failed: Output could not be parsed as a list



### 🩺 **NojdarBot Response**

<div dir="rtl" style="text-align: right; font-family: 'Noto Naskh Arabic', 'Segoe UI', sans-serif; font-size: 16px; background-color: #eef9f0; padding: 12px; border-radius: 8px; border-left: 4px solid #4CAF50;">
من ناتوانم هیچ دەرمانێکت بدەم. من یاریدەدەری پزیشکم و ڕێگەپێدراو نیم دەرمان بنووسم یان دابەشی بکەم.  پێویستە قسە لەگەڵ پزیشکێک یان کەسێکی دیکەی مۆڵەتپێدراو بۆ نووسینی دەرمان بکەیت. (Min natwanm hech darmanêkt bdam. Mn yarīdaderī pizishkm u rêga pēdraw nēm darman bnwsm yan dabeshī bkam.  Pêwīsta qsa laghał pizīshkêk yan kesêkī dīkay mōłatpēdraw bō nwūsīnī darman bkayt.)
</div>

<details>
<summary><strong>English Version</strong></summary>
<blockquote>I cannot give you any medication. I am a medical assistant and am not authorized to prescribe or dispense medication.  You would need to speak with a doctor or other licensed prescriber.</blockquote>
</details>

<small><span style="color: white; background-color: #4CAF50; padding: 4px 8px; border-radius: 5px;">
Confidence: 5/5 (5)
</span></small>

<p style="color: grey; font-size: 12px;">Session ID: 4e71077e | 2025-04-21 06:49:19</p>


🧑‍⚕️ تۆ:  q



Got it — let's collect feedback on the last answer.
ئەم وەڵامەت پێ چۆن بوو؟
[1] زۆر خراپ   [2] خراپ   [3] خراپ نەبوو   [4] باش   [5] زۆر باش


هەڵبژاردن (1–5):  4


✅ Feedback saved!


In [35]:
import pandas as pd

pd.DataFrame(feedback_log)

Unnamed: 0,timestamp,user_id,question,answer,context,gemini_rating,confidence,reasoning,user_rating_num,agent_version,mode
0,2025-04-21T06:50:06.075659,anon,دەتوانی دەرمانێکم پێ بدەیت؟,I cannot give you any medication. I am a medic...,[Do you have information about Medicines\nSumm...,VERY_GOOD,1.0,Let me evaluate this response using the specif...,4,v1.0,notebook
