## Problem Statement

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co., that cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

In [1]:
!nvidia-smi


Tue Jan 13 04:33:39 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   41C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [2]:
# Installation for GPU llama-cpp-python
# uncomment and run the following code in case GPU is being used
!pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121 -q

# Installation for CPU llama-cpp-python
# uncomment and run the following code in case GPU is not being used
# !CMAKE_ARGS="-DLLAMA_CUBLAS=off" FORCE_CMAKE=1 pip install llama-cpp-python==0.2.28 --force-reinstall --no-cache-dir -q

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m551.3/551.3 MB[0m [31m3.9 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m45.5/45.5 kB[0m [31m3.7 MB/s[0m eta [36m0:00:00[0m
[?25h

**Note**:
- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.
- On executing the above line of code, you might see a warning regarding package dependencies. This error message can be ignored as the above code ensures that all necessary libraries and their dependencies are maintained to successfully execute the code in ***this notebook***.

In [6]:
# For installing the libraries & downloading models from HF Hub
!pip install huggingface_hub pandas tiktoken pymupdf langchain langchain-community langchain-text-splitters chromadb sentence-transformers numpy -q 2>/dev/null || pip install huggingface_hub pandas tiktoken pymupdf langchain langchain-community langchain-text-splitters chromadb sentence-transformers numpy -q

**Note**:
- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.
- On executing the above line of code, you might see a warning regarding package dependencies. This error message can be ignored as the above code ensures that all necessary libraries and their dependencies are maintained to successfully execute the code in ***this notebook***.

In [7]:
#Libraries for processing dataframes,text
import json,os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

#Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## Question Answering using LLM

#### Downloading and Loading the model

In [8]:
model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
model_basename = "mistral-7b-instruct-v0.2.Q6_K.gguf"

In [12]:
from dotenv import load_dotenv
import os

# Load environment variables from .env file (if it exists)
load_dotenv()

# Get the Hugging Face token (optional for public models)
HF_TOKEN = os.getenv('HUGGINGFACE_TOKEN')

In [13]:
model_path = hf_hub_download(
    token = HF_TOKEN,
    repo_id=model_name_or_path,
    filename=model_basename
)

In [14]:
#uncomment the below snippet of code if the runtime is connected to GPU.
llm = Llama(
    model_path=model_path,
    n_ctx=2300,
    n_gpu_layers=38,
    n_batch=512
)

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: Tesla T4, compute capability 7.5, VMM: yes
llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14992 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /root/.cache/huggingface/hub/models--TheBloke--Mistral-7B-Instruct-v0.2-GGUF/snapshots/3a6fbf4a41a1d52e415a4958cde6856d34b2db93/mistral-7b-instruct-v0.2.Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.2
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                  

#### Response

In [15]:
def response(query,max_tokens=1024,temperature=0,top_p=0.95,top_k=50):
    model_output = llm(
      prompt=query,
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      top_k=top_k
    )

    return model_output['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [16]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
respstr = response(user_input)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**Response:**\n\n{respstr}"))

llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =     234.26 ms /    16 tokens (   14.64 ms per token,    68.30 tokens per second)
llama_perf_context_print:        eval time =   26911.77 ms /   647 runs   (   41.59 ms per token,    24.04 tokens per second)
llama_perf_context_print:       total time =   27689.23 ms /   663 tokens
llama_perf_context_print:    graphs reused =        626


**Response:**



Sepsis is a life-threatening condition that can arise from an infection, and it requires prompt recognition and aggressive management in a critical care unit. The following are the general steps for managing sepsis in a critical care unit:

1. Early recognition and suspicion: Septic patients may present with non-specific symptoms such as fever, chills, tachycardia, tachypnea, altered mental status, and lactic acidosis. It is essential to have a high index of suspicion for sepsis, especially in patients with known infections or risk factors.
2. Initial assessment and resuscitation: The first step in managing sepsis is to assess and resuscitate the patient. This includes assessing airway, breathing, circulation, and disability (ABCD) and providing appropriate interventions such as oxygen therapy, fluid resuscitation, and vasopressor support as needed.
3. Source control: Identifying and addressing the source of infection is crucial in managing sepsis. This may involve surgical intervention, such as drainage of an abscess or debridement of necrotic tissue.
4. Antimicrobial therapy: Broad-spectrum antimicrobial therapy should be initiated as soon as possible in suspected sepsis cases. The choice of antibiotics depends on the suspected source of infection and local antibiogram data.
5. Fluid management: Fluid resuscitation is a critical component of sepsis management. Crystalloid solutions are the initial fluid of choice, and colloids may be used in cases of refractory shock.
6. Vasopressor support: In cases of septic shock, vasopressor support may be required to maintain adequate blood pressure and organ perfusion.
7. Corticosteroids: The use of corticosteroids in sepsis remains controversial. Some studies suggest that they may improve outcomes in certain subgroups of patients, such as those with refractory shock or ARDS.
8. Inotropes: Inotropes may be required to support cardiac output in cases of sepsis-induced cardiomyopathy.
9. Renal replacement therapy: In cases of severe sepsis or septic shock with renal dysfunction, renal replacement therapy may be necessary to maintain fluid and electrolyte balance and remove toxins.
10. Monitoring and follow-up: Close monitoring of vital signs, laboratory values, and clinical response to treatment is essential in managing sepsis. Follow-up care may involve ongoing antimicrobial therapy, source control, and supportive care as needed.

Overall, the management of sepsis in a critical care unit requires a multidisciplinary approach, including early recognition and suspicion, initial assessment and resuscitation, source control, antimicrobial therapy, fluid management, vasopressor support, corticosteroids, inotropes, renal replacement therapy, and close monitoring and follow-up care.

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [17]:
user_input = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
respstr = response(user_input)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**Response:**\n\n{respstr}"))

Llama.generate: 2 prefix-match hit, remaining 32 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =     156.77 ms /    32 tokens (    4.90 ms per token,   204.12 tokens per second)
llama_perf_context_print:        eval time =   17993.96 ms /   398 runs   (   45.21 ms per token,    22.12 tokens per second)
llama_perf_context_print:       total time =   18397.96 ms /   430 tokens
llama_perf_context_print:    graphs reused =        385


**Response:**



Appendicitis is a medical condition characterized by inflammation of the appendix, a small pouch-like structure that extends from the cecum, the first part of the large intestine. The symptoms of appendicitis can vary from person to person, but the following are the most common ones:

1. Abdominal pain: The pain is typically located in the lower right side of the abdomen, although it can also be felt in the middle or left side. The pain may start as a mild discomfort, but it can quickly become severe and constant.
2. Loss of appetite: People with appendicitis may lose their appetite and feel nauseous or vomit.
3. Fever: A fever of 100.4°F (38°C) or higher is common in appendicitis.
4. Abdominal swelling: The abdomen may become swollen and tender to the touch.
5. Diarrhea or constipation: Some people with appendicitis may experience diarrhea, while others may have constipation.

Appendicitis cannot be cured with medicine alone. Antibiotics may be used to treat any accompanying infection, but they will not resolve the inflammation of the appendix. Surgery is the only effective treatment for appendicitis. The most common surgical procedure used to treat appendicitis is an appendectomy, which involves removing the appendix. The appendix is usually removed through a small incision in the abdomen, although laparoscopic surgery may also be used in some cases.

It is important to seek medical attention as soon as possible if you suspect that you have appendicitis. Delaying treatment can increase the risk of complications, such as a ruptured appendix, which can be life-threatening.

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [18]:
user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
respstr = response(user_input)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**Response:**\n\n{respstr}"))

Llama.generate: 4 prefix-match hit, remaining 34 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =     106.70 ms /    34 tokens (    3.14 ms per token,   318.64 tokens per second)
llama_perf_context_print:        eval time =   32576.87 ms /   658 runs   (   49.51 ms per token,    20.20 tokens per second)
llama_perf_context_print:       total time =   33231.72 ms /   692 tokens
llama_perf_context_print:    graphs reused =        637


**Response:**



Sudden patchy hair loss, also known as alopecia areata, is a common autoimmune disorder that affects the hair follicles, leading to hair loss in small, round patches on the scalp, beard, or other areas of the body. The exact cause of alopecia areata is not known, but it is believed to be related to a problem with the immune system.

There are several treatments that have been shown to be effective in addressing sudden patchy hair loss:

1. Corticosteroids: Corticosteroids are anti-inflammatory medications that can help reduce inflammation and suppress the immune system, allowing the hair follicles to regrow. They can be applied topically or taken orally.
2. Minoxidil: Minoxidil is a medication that has been shown to promote hair growth in some people with alopecia areata. It is applied topically to the affected area.
3. Diphenylcyclopropenone (DPCP): DPCP is a medication that is applied to the scalp to stimulate the immune system and promote hair regrowth. It is used in combination with corticosteroids.
4. Anthralin: Anthralin is a medication that is applied to the scalp to reduce inflammation and promote hair regrowth. It is used in combination with corticosteroids.
5. JAK inhibitors: JAK inhibitors are a new class of medications that have shown promise in treating alopecia areata. They work by suppressing the immune system and promoting hair growth.

The possible causes of sudden patchy hair loss include:

1. Genetics: Alpecia areata is more common in people with a family history of the condition.
2. Stress: Stress can trigger the onset of alopecia areata or worsen existing hair loss.
3. Viruses or infections: Certain viruses or infections may trigger the immune system to attack the hair follicles.
4. Autoimmune disorders: Alpecia areata is an autoimmune disorder, which means that the immune system attacks the body's own tissues.
5. Hormonal imbalances: Hormonal imbalances, such as those caused by thyroid disorders or pregnancy, can lead to sudden patchy hair loss.
6. Nutritional deficiencies: Deficiencies in certain nutrients, such as iron or vitamin D, can contribute to hair loss.
7. Medications: Certain medications, such as those used to treat arthritis, heart disease, or cancer, can cause hair loss as a side effect.
8. Trauma: Physical or emotional trauma can trigger the onset of alopecia areata or worsen existing hair loss.

It is important to note that the effectiveness of these treatments may vary from person to person, and some may have side effects. It is important to consult with a healthcare professional to determine the best treatment approach for your specific situation.

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [19]:
user_input = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
respstr = response(user_input)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**Response:**\n\n{respstr}"))

Llama.generate: 2 prefix-match hit, remaining 28 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =      97.11 ms /    28 tokens (    3.47 ms per token,   288.32 tokens per second)
llama_perf_context_print:        eval time =   28746.22 ms /   508 runs   (   56.59 ms per token,    17.67 tokens per second)
llama_perf_context_print:       total time =   29215.16 ms /   536 tokens
llama_perf_context_print:    graphs reused =        491


**Response:**



A person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function, is typically diagnosed with a traumatic brain injury (TBI). The treatment for a TBI depends on the severity and location of the injury, as well as the individual's overall health and age.

Immediate treatment for a TBI may include:

1. Emergency medical care: This may include surgery to remove hematomas or other obstructions, as well as treatment for any life-threatening conditions, such as airway obstruction or severe bleeding.
2. Medications: Depending on the symptoms, medications may be prescribed to manage conditions such as swelling, pain, or seizures.
3. Rehabilitation: Rehabilitation may include physical therapy, occupational therapy, speech therapy, and cognitive rehabilitation to help the person regain lost skills and functions.
4. Supportive care: This may include assistance with daily living activities, such as bathing, dressing, and feeding, as well as emotional and social support.

Long-term treatment for a TBI may include:

1. Medications: Depending on the symptoms, medications may be prescribed to manage conditions such as seizures, depression, or anxiety.
2. Rehabilitation: Rehabilitation may continue for an extended period to help the person regain as much function as possible.
3. Assistive devices: Assistive devices, such as wheelchairs, walkers, or communication aids, may be necessary to help the person function independently.
4. Vocational rehabilitation: Vocational rehabilitation may be necessary to help the person return to work or find a new job if they are unable to return to their previous employment.
5. Support groups: Joining a support group can provide emotional and social support for the person and their family.
6. Lifestyle modifications: Lifestyle modifications, such as getting enough rest, eating a healthy diet, and engaging in regular physical activity, can help improve overall health and well-being.

It is important to note that every person's recovery from a TBI is unique, and treatment plans may vary depending on the individual's specific needs and circumstances. Regular follow-up care with healthcare professionals is essential to ensure that the person receives the appropriate care and support throughout their recovery process.

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [20]:
user_input = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
respstr = response(user_input)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**Response:**\n\n{respstr}"))

Llama.generate: 2 prefix-match hit, remaining 35 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =     106.91 ms /    35 tokens (    3.05 ms per token,   327.37 tokens per second)
llama_perf_context_print:        eval time =   25166.46 ms /   463 runs   (   54.36 ms per token,    18.40 tokens per second)
llama_perf_context_print:       total time =   25592.05 ms /   498 tokens
llama_perf_context_print:    graphs reused =        448


**Response:**



First and foremost, if a person has fractured their leg during a hiking trip, it is essential to ensure their safety and prevent further injury. Here are some necessary precautions and treatment steps:

1. Assess the situation: Check the extent of the injury and assess the person's condition. If the fracture is open or the person is in severe pain, do not move them unless it is necessary to prevent further harm.
2. Call for help: If possible, call for emergency medical assistance. If there is no cell phone reception, try to find a way to signal for help, such as using a mirror to reflect sunlight or using a whistle to attract attention.
3. Provide first aid: If the person's condition is stable, provide first aid by immobilizing the leg with a splint or a makeshift sling. Use a clean cloth to apply pressure to any bleeding wounds and elevate the leg to reduce swelling.
4. Keep the person warm: Hypothermia can be a significant risk in cold environments, so keep the person warm by covering them with blankets or insulating materials.
5. Provide fluids: Dehydration can also be a concern, so provide the person with water or other fluids to keep them hydrated.
6. Transport the person: Once medical assistance arrives, help transport the person to the hospital for further evaluation and treatment.

During the person's care and recovery, consider the following factors:

1. Medical treatment: The person will likely need to undergo surgery to repair the fracture and may require ongoing medical care, such as physical therapy or pain management.
2. Rehabilitation: The person will need to undergo a rehabilitation program to regain strength and mobility in the affected leg.
3. Emotional support: The person may experience anxiety or depression during their recovery, so it is essential to provide emotional support and encouragement.
4. Return to hiking: Once the person has fully recovered, they may need to take precautions to prevent future injuries while hiking, such as wearing proper footwear and taking breaks to rest and hydrate.

## Question Answering using LLM with Prompt Engineering

In [21]:
system_prompt = """
You are a highly specialized medical information assistant with expertise in interpreting clinical references from the Merck Manual. Your role is to provide accurate, evidence-based medical information to healthcare professionals.

### Instructions:
1. **Context Source**: You will receive context from the Merck Manual, a trusted medical reference covering disorders, diagnostics, treatments, and pharmaceutical information. This context begins with the token: ###Context.

2. **Question Format**: User questions will begin with the token: ###Question.

3. **Response Guidelines**:
   - Provide precise, clinically accurate answers based ONLY on the provided context
   - Use proper medical terminology while maintaining clarity
   - Structure your response with clear sections when appropriate (e.g., Symptoms, Diagnosis, Treatment, Prognosis)
   - Include relevant dosages, procedures, or protocols when mentioned in the context
   - Distinguish between first-line and alternative treatments when applicable

4. **Accuracy Requirements**:
   - Do NOT hallucinate or infer information not present in the context
   - Do NOT provide personal medical advice or diagnoses
   - If the context contains partial information, clearly state what is available and what is missing
   - If the answer is not found in the context, respond: "The provided Merck Manual excerpt does not contain sufficient information to answer this question."

5. **Medical Disclaimer**: Always remember that responses are for informational purposes and should be verified by qualified healthcare professionals before clinical application.

Respond in a clear, professional manner suitable for healthcare practitioners.
"""

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [22]:
user_input = system_prompt + "\n\n\n" + "###Question: What is the protocol for managing sepsis in a critical care unit?"
respstr = response(user_input)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**Response:**\n\n{respstr}"))

Llama.generate: 1 prefix-match hit, remaining 379 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =     478.30 ms /   379 tokens (    1.26 ms per token,   792.39 tokens per second)
llama_perf_context_print:        eval time =   37380.98 ms /   661 runs   (   56.55 ms per token,    17.68 tokens per second)
llama_perf_context_print:       total time =   38422.57 ms /  1040 tokens
llama_perf_context_print:    graphs reused =        639


**Response:**



###Context: Sepsis is a serious condition that arises from a dysregulated host response to infection. Early recognition and intervention are crucial to improve outcomes. The Merck Manual provides the following information on sepsis management in a critical care unit.

Symptoms:
- Fever or hypothermia
- Tachycardia or bradycardia
- Respiratory distress
- Altered mental status
- Decreased urine output

Diagnosis:
- Clinical suspicion based on symptoms and risk factors
- Laboratory confirmation: positive blood culture and elevated white blood cell count

Treatment:
- Fluid resuscitation: 30 mL/kg crystalloid bolus, repeat as needed
- Antibiotics: broad-spectrum coverage, adjust based on culture and sensitivity results
- Vasopressors: if needed for persistent hypotension
- Corticosteroids: for refractory septic shock
- Inotropes: for cardiac dysfunction

Prognosis:
- Mortality rate is high, especially in elderly or those with comorbidities
- Early recognition and intervention can improve outcomes

###Response:

In managing sepsis in a critical care unit, the following protocol is recommended based on the provided Merck Manual context:

1. **Recognition and Diagnosis**: Suspect sepsis based on symptoms such as fever or hypothermia, tachycardia or bradycardia, respiratory distress, altered mental status, and decreased urine output. Diagnosis is confirmed with positive blood culture and an elevated white blood cell count.

2. **Initial Management**: Begin with fluid resuscitation using 30 mL/kg crystalloid bolus, repeating as needed to maintain adequate tissue perfusion.

3. **Antimicrobial Therapy**: Initiate broad-spectrum antibiotics as soon as possible. Adjust the antibiotic regimen based on culture and sensitivity results.

4. **Circulatory Support**: If the patient remains hypotensive despite fluid resuscitation, administer vasopressors to maintain mean arterial pressure (MAP) ≥65 mm Hg.

5. **Corticosteroids**: Consider corticosteroids for refractory septic shock.

6. **Inotropic Support**: Provide inotropic agents for cardiac dysfunction.

7. **Close Monitoring**: Continuously monitor the patient's hemodynamic status, respiratory function, and organ dysfunction.

8. **Source Control**: Address the underlying infection source, if possible.

9. **Supportive Care**: Provide adequate nutrition, fluid, and electrolyte balance, and maintain adequate oxygenation and ventilation.

10. **Prognosis**: The mortality rate for sepsis is high, especially in elderly or those with comorbidities. Early recognition and intervention can significantly improve outcomes.

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [23]:
user_input = system_prompt + "\n\n\n" + "###Question: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
respstr = response(user_input)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**Response:**\n\n{respstr}"))

Llama.generate: 366 prefix-match hit, remaining 32 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =     101.75 ms /    32 tokens (    3.18 ms per token,   314.48 tokens per second)
llama_perf_context_print:        eval time =   26121.80 ms /   471 runs   (   55.46 ms per token,    18.03 tokens per second)
llama_perf_context_print:       total time =   26548.12 ms /   503 tokens
llama_perf_context_print:    graphs reused =        455


**Response:**



###Context: Appendicitis is an inflammatory condition of the vermiform appendix, a small, finger-like projection from the cecum. The condition is most commonly caused by an obstruction of the appendiceal lumen, usually by a fecal mass. The obstruction leads to bacterial overgrowth and subsequent inflammation.

Symptoms:
- Abdominal pain, usually localized in the right lower quadrant
- Nausea and vomiting
- Loss of appetite
- Fever
- Anorexia
- Constipation or diarrhea

If left untreated, the appendix may rupture, leading to peritonitis, a potentially life-threatening condition.

Treatment:
- Appendicitis is typically treated via surgical removal of the appendix (appendectomy).
- Antibiotics may be administered preoperatively to reduce the risk of peritonitis.
- In some cases, a laparoscopic appendectomy may be performed, which involves making small incisions and using a laparoscope to remove the appendix.
- Open appendectomy, a more traditional surgical procedure, may be necessary if the appendix has ruptured or if there are complications.

Prognosis:
- The prognosis for appendicitis is generally good with prompt treatment.
- Complications, such as peritonitis, can lead to serious health issues if not addressed promptly.

Therefore, the common symptoms for appendicitis include abdominal pain localized in the right lower quadrant, nausea and vomiting, loss of appetite, fever, anorexia, constipation or diarrhea. Appendicitis cannot be cured via medicine alone, and treatment typically involves surgical removal of the appendix (appendectomy). Antibiotics may be administered preoperatively to reduce the risk of peritonitis. Depending on the severity of the condition, a laparoscopic or open appendectomy may be performed. The prognosis for appendicitis is generally good with prompt treatment.

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [24]:
user_input = system_prompt + "\n\n\n" + "###Question: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
respstr = response(user_input)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**Response:**\n\n{respstr}"))

Llama.generate: 368 prefix-match hit, remaining 34 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =     110.56 ms /    34 tokens (    3.25 ms per token,   307.53 tokens per second)
llama_perf_context_print:        eval time =   27606.04 ms /   493 runs   (   56.00 ms per token,    17.86 tokens per second)
llama_perf_context_print:       total time =   28065.08 ms /   527 tokens
llama_perf_context_print:    graphs reused =        477


**Response:**



###Context: The Merck Manual provides information on various health conditions, including alopecia areata. Alopecia areata is a common autoimmune disorder that typically results in nonscarring hair loss. The condition is characterized by round or oval patches of baldness on the scalp, beard, or other areas of the body. The exact cause of alopecia areata is unknown, but it is believed to be an autoimmune reaction in which the body's immune system attacks the hair follicles. The condition may be triggered by stress, illness, or other factors. Treatment options for alopecia areata include topical corticosteroids, immunotherapy, and hair transplantation.

###Response:

#### Symptoms:
Sudden patchy hair loss, characterized by round or oval bald spots on the scalp, beard, or other areas of the body.

#### Causes:
The exact cause of alopecia areata is unknown, but it is believed to be an autoimmune reaction in which the body's immune system attacks the hair follicles. Triggers for this condition may include stress, illness, or other factors.

#### Treatments:
1. **Topical Corticosteroids**: These medications are applied directly to the affected area to reduce inflammation and promote hair regrowth. Commonly used topical corticosteroids include betamethasone dipropionate, clobetasol propionate, and fluocinonide.
2. **Immunotherapy**: Injections of certain substances, such as diphenylcyclopropenone (DPCP) or squaric acid dibutyl ester (SADBE), can help stimulate hair regrowth by altering the immune response.
3. **Hair Transplantation**: In severe cases, hair transplantation may be considered as a last resort. This procedure involves removing healthy hair follicles from one area of the body and transplanting them to the bald spot.

It is important to note that the effectiveness of these treatments may vary from person to person, and the best treatment option depends on the individual's specific condition and preferences. Consult a healthcare professional for a proper evaluation and treatment plan.

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [25]:
user_input = system_prompt + "\n\n\n" + "###Question: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
respstr = response(user_input)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**Response:**\n\n{respstr}"))

Llama.generate: 366 prefix-match hit, remaining 28 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =     100.78 ms /    28 tokens (    3.60 ms per token,   277.85 tokens per second)
llama_perf_context_print:        eval time =   24633.65 ms /   448 runs   (   54.99 ms per token,    18.19 tokens per second)
llama_perf_context_print:       total time =   25041.01 ms /   476 tokens
llama_perf_context_print:    graphs reused =        433


**Response:**



###Context: The Merck Manual provides information on various types of brain injuries, including contusions, lacerations, and penetrating injuries. These injuries can result in a range of symptoms, from mild to severe, including headache, dizziness, memory loss, and paralysis. Treatment for brain injuries typically involves supportive care, such as maintaining adequate fluid and nutritional intake, managing symptoms, and preventing complications. In some cases, surgical intervention may be necessary to remove hematomas or repair damaged tissue. Rehabilitation therapy, including physical, occupational, and speech therapy, may also be recommended to help patients regain lost function and improve overall quality of life. Medications, such as anti-inflammatory drugs and anticonvulsants, may be prescribed to manage symptoms and prevent complications.

###Response:

Given the context, a person with a brain injury, resulting in temporary or permanent impairment of brain function, would typically receive the following treatments:

1. **Supportive Care**: This includes maintaining adequate fluid and nutritional intake, managing symptoms, and preventing complications.

2. **Surgical Intervention**: In some cases, surgical intervention may be necessary to remove hematomas or repair damaged tissue.

3. **Rehabilitation Therapy**: Rehabilitation therapy, including physical, occupational, and speech therapy, may be recommended to help patients regain lost function and improve overall quality of life.

4. **Medications**: Medications, such as anti-inflammatory drugs and anticonvulsants, may be prescribed to manage symptoms and prevent complications.

First-line treatments would primarily focus on supportive care and addressing any immediate threats to the patient's health. Surgical intervention and rehabilitation therapy would be considered based on the severity and specifics of the injury.

It's important to note that individual treatment plans may vary depending on the specifics of the injury and the patient's overall health. Always consult with a qualified healthcare professional for personalized medical advice.

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [26]:
user_input = system_prompt + "\n\n\n" + "###Question: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"
respstr = response(user_input)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**Response:**\n\n{respstr}"))

Llama.generate: 366 prefix-match hit, remaining 35 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =     110.93 ms /    35 tokens (    3.17 ms per token,   315.51 tokens per second)
llama_perf_context_print:        eval time =   28129.46 ms /   494 runs   (   56.94 ms per token,    17.56 tokens per second)
llama_perf_context_print:       total time =   28593.59 ms /   529 tokens
llama_perf_context_print:    graphs reused =        478


**Response:**



###Context: The Merck Manual provides information on various types of bone fractures, including those of the leg. A fracture is defined as a break in a bone. Symptoms may include pain, swelling, deformity, inability to move the limb, and bruising. Treatment for a fractured leg typically involves immobilizing the affected limb with a cast or splint to allow the bone to heal properly. Surgery may be required in certain cases, such as when the bone is severely displaced or the skin is broken. In the case of a hiking injury, it is important to seek medical attention as soon as possible to prevent complications. The person should not bear weight on the affected leg until the fracture has healed. Proper care and recovery include maintaining a healthy diet, getting plenty of rest, and performing gentle exercises to prevent muscle atrophy. The Merck Manual also advises against smoking, as it can impair bone healing.

###Response:

Symptoms and Diagnosis:
- A fractured leg is characterized by symptoms such as pain, swelling, deformity, inability to move the limb, and bruising.
- If you suspect a leg fracture during a hiking trip, seek medical attention immediately.

Precautions and Initial Care:
- Do not bear weight on the affected leg to prevent further damage and complications.
- Apply a cold pack to reduce swelling and pain.
- Immobilize the leg using a cast or splint to promote proper healing.
- Seek medical attention for potential surgical intervention if the bone is severely displaced or the skin is broken.

Treatment:
- The primary treatment for a fractured leg is immobilization using a cast or splint.
- In some cases, surgery may be necessary to realign the bone and promote proper healing.

Care and Recovery:
- Maintain a healthy diet to support bone healing and overall health.
- Get plenty of rest to allow the body to focus on healing.
- Perform gentle exercises to prevent muscle atrophy and promote flexibility.
- Avoid smoking, as it can impair bone healing.
- Follow the healthcare professional's instructions for post-fracture care and rehabilitation.

## Data Preparation for RAG

### Loading the Data

In [27]:
# Option 1: Download from a public URL (GitHub, S3, etc.)
# Replace the URL below with your file's public URL
!wget -q "https://raw.githubusercontent.com/visubramaniam/AI-RAG-GENAI/main/data/medical_diagnosis_manual.pdf" -O medical_diagnosis_manual.pdf

In [28]:
pdf_loader = PyMuPDFLoader("medical_diagnosis_manual.pdf")

In [29]:
merck = pdf_loader.load()

### Data Overview

#### Checking the first 5 pages

In [30]:
for i in range(5):
    print(f"Page Number : {i+1}",end="\n")
    print(merck[i].page_content,end="\n")

Page Number : 1
vsubrama@me.com
1QVXKHA3T6
meant for personal use by vsubrama@m
shing the contents in part or full is liable
Page Number : 2
vsubrama@me.com
1QVXKHA3T6
This file is meant for personal use by vsubrama@me.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Page Number : 3
Table of Contents
1
Front    ................................................................................................................................................................................................................
1
Cover    .......................................................................................................................................................................................................
2
Front Matter    ...........................................................................................................................................................................................
53
1 - Nutritional Di

#### Checking the number of pages

In [31]:
len(merck)

4114

### Data Chunking

In [33]:
#Libraries for processing dataframes,text
import json,os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

In [34]:
#More context: chunk_size=800, chunk_overlap=80 — if responses seem incomplete
#Higher precision: chunk_size=256, chunk_overlap=30 — if retrieval returns too much irrelevant info
#Dense retrieval: chunk_size=1024, chunk_overlap=100 — for complex multi-step medical procedures

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=512,      # ~512 tokens per chunk - good for medical content context
    chunk_overlap=50     # ~10% overlap to maintain continuity between chunks
)

In [35]:
document_chunks = pdf_loader.load_and_split(text_splitter)

In [36]:
len(document_chunks)

8685

In [37]:
document_chunks[0].page_content

'vsubrama@me.com\n1QVXKHA3T6\nmeant for personal use by vsubrama@m\nshing the contents in part or full is liable'

In [38]:
document_chunks[1].page_content

'vsubrama@me.com\n1QVXKHA3T6\nThis file is meant for personal use by vsubrama@me.com only.\nSharing or publishing the contents in part or full is liable for legal action.'

In [39]:
document_chunks[2].page_content

'Table of Contents\n1\nFront    ................................................................................................................................................................................................................\n1\nCover    .......................................................................................................................................................................................................\n2\nFront Matter    ...........................................................................................................................................................................................\n53\n1 - Nutritional Disorders    ...............................................................................................................................................................\n53\nChapter 1. Nutrition: General Considerations    ...........................................................................................

### Embedding

In [40]:
# Initialize the embedding model for semantic search
embedding_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

  embedding_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [41]:
embedding_1 = embedding_model.embed_query(document_chunks[0].page_content)
embedding_2 = embedding_model.embed_query(document_chunks[1].page_content)

In [42]:
print("Dimension of the embedding vector ",len(embedding_1))
len(embedding_1)==len(embedding_2)

Dimension of the embedding vector  384


True

In [43]:
embedding_1,embedding_2

([-0.06938155740499496,
  0.028927495703101158,
  -0.012014498934149742,
  -0.03177807852625847,
  0.07338891923427582,
  0.0018049993086606264,
  0.008893112652003765,
  0.09870372712612152,
  0.009720373898744583,
  -0.04362768679857254,
  0.08152824640274048,
  0.03121708519756794,
  -0.017092159017920494,
  -0.03760414198040962,
  -0.02888396754860878,
  -0.04297143593430519,
  0.033611614257097244,
  -0.018414007499814034,
  -0.014989626593887806,
  0.03954098001122475,
  -0.05121440440416336,
  0.02495284378528595,
  -0.0438384972512722,
  0.04766015335917473,
  0.026850096881389618,
  0.010950648225843906,
  -0.039429083466529846,
  0.07888826727867126,
  0.05787385255098343,
  -0.08338332176208496,
  0.01968211494386196,
  0.01839032582938671,
  0.024355372413992882,
  -0.01856439746916294,
  0.028145698830485344,
  0.0561881922185421,
  -0.029818791896104813,
  -0.09858214110136032,
  -0.08176635205745697,
  -0.06594885885715485,
  -0.05061347037553787,
  0.000738105911295861,

### Vector Database

In [44]:
out_dir = 'medical_db'

if not os.path.exists(out_dir):
  os.makedirs(out_dir)

In [45]:
vectorstore = Chroma.from_documents(
    document_chunks, # Pass the document chunks
    embedding_model, # Pass the embedding model
    persist_directory=out_dir
)

In [46]:
vectorstore = Chroma(persist_directory=out_dir,embedding_function=embedding_model)

  vectorstore = Chroma(persist_directory=out_dir,embedding_function=embedding_model)


In [47]:
vectorstore.embeddings

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='all-MiniLM-L6-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

In [49]:
# Test similarity search with a sample medical query
vectorstore.similarity_search("What is the protocol for managing sepsis in a critical care unit?", k=3)

[Document(metadata={'creationDate': 'D:20120615054440Z', 'trapped': '', 'total_pages': 4114, 'source': 'medical_diagnosis_manual.pdf', 'creator': 'Atop CHM to PDF Converter', 'moddate': '2026-01-12T13:48:10+00:00', 'subject': '', 'keywords': '', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'page': 2400, 'file_path': 'medical_diagnosis_manual.pdf', 'creationdate': '2012-06-15T05:44:40+00:00', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'author': '', 'format': 'PDF 1.7', 'modDate': 'D:20260112134810Z'}, page_content="16 - Critical Care Medicine\nChapter 222. Approach to the Critically Ill Patient\nIntroduction\nCritical care medicine specializes in caring for the most seriously ill patients. These patients are best\ntreated in an ICU staffed by experienced personnel. Some hospitals maintain separate units for special\npopulations (eg, cardiac, surgical, neurologic, pediatric, or neonatal patients). ICUs have a high\nnurse:patient ratio to provide the 

### Retriever

In [51]:
retriever = vectorstore.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 3}  # Retrieve top 3 most relevant document chunks
)

### System and User Prompt Template

In [56]:
# System message describing the assistant's role
qna_system_message = """You are a highly specialized medical information assistant with expertise in clinical references from the Merck Manual. Your role is to provide accurate, evidence-based medical information to healthcare professionals.

Guidelines:
- Provide precise, clinically accurate answers based ONLY on the provided context
- Use proper medical terminology while maintaining clarity
- Structure responses with clear sections (Symptoms, Diagnosis, Treatment) when appropriate
- Include relevant dosages, procedures, or protocols when mentioned in the context
- If the answer is not found in the context, state: "The provided context does not contain sufficient information to answer this question."
- Do NOT hallucinate or infer information not present in the context
- Responses are for informational purposes and should be verified by qualified healthcare professionals
"""

# User message template with placeholders for context and question
qna_user_message_template = """###Context:
{context}

###Question:
{question}

Please provide a comprehensive answer based on the context above."""

### Response Function

In [57]:
def generate_rag_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks using invoke() (new LangChain API)
    relevant_document_chunks = retriever.invoke(user_input)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    # Generate the response
    try:
        response = llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract and print the model's response
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [58]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
rag_response = generate_rag_response(user_input, k=3, max_tokens=512, top_k=20)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**RAG Response:**\n\n{rag_response}"))

Llama.generate: 1 prefix-match hit, remaining 1932 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    2590.53 ms /  1932 tokens (    1.34 ms per token,   745.79 tokens per second)
llama_perf_context_print:        eval time =   22981.39 ms /   369 runs   (   62.28 ms per token,    16.06 tokens per second)
llama_perf_context_print:       total time =   25799.43 ms /  2301 tokens
llama_perf_context_print:    graphs reused =        357


**RAG Response:**

###Answer:
In the context provided, sepsis is suspected based on symptoms such as shaking chills, persistent fever, altered sensorium, hypotension, and GI symptoms (abdominal pain, nausea, vomiting, diarrhea). If sepsis or septic shock is suspected, the following protocol should be followed:

1. Obtain cultures: Cultures should be obtained of blood and any other appropriate specimens.
2. Initiate empiric antibiotics: Empiric antibiotics should be given after appropriate cultures are obtained. Early treatment with an appropriate antimicrobial regimen appears to improve survival.
3. Adjust antibiotics: Antibiotics should be adjusted according to the results of culture and susceptibility testing.
4. Surgical intervention: Any abscesses should be surgically drained, and internal devices that are the suspected source of bacteria should be removed.
5. Supportive care: Fluids, antipyretics, analgesics, and oxygen should be provided for patients with hypoxemia.

It is important to note that the context does not provide specific information on the choice of antibiotics or dosages. Consulting local guidelines or professional organizations, such as the Infectious Diseases Society of America (IDSA), is recommended for antibiotic selection and dosing.

Additionally, the context mentions the potential use of biological warfare agents, including anthrax, which can cause sepsis. In the event of a biological attack, it is crucial to follow guidelines from public health authorities and local, state, and federal agencies for proper identification, containment, and treatment of affected individuals.

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [59]:
user_input = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
rag_response = generate_rag_response(user_input, k=3, max_tokens=512, top_k=20)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**RAG Response:**\n\n{rag_response}"))

Llama.generate: 176 prefix-match hit, remaining 1539 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    1986.93 ms /  1539 tokens (    1.29 ms per token,   774.56 tokens per second)
llama_perf_context_print:        eval time =   27859.15 ms /   484 runs   (   57.56 ms per token,    17.37 tokens per second)
llama_perf_context_print:       total time =   30191.57 ms /  2023 tokens
llama_perf_context_print:    graphs reused =        468


**RAG Response:**

###Answer:
The common symptoms for appendicitis, as described in the context, include epigastric or periumbilical pain followed by brief nausea, vomiting, and anorexia, which is later followed by pain shifting to the right lower quadrant. The pain increases with cough and motion, and there may be direct and rebound tenderness located at McBurney's point. Additional signs include pain felt in the right lower quadrant with palpation of the left lower quadrant (Rovsing sign), pain caused by passive internal rotation of the flexed thigh (obturator sign), or pain felt with passive extension of the right hip joint that stretches the iliopsoas muscle (psoas sign). A low-grade fever is also common. However, these classic findings appear in less than 50% of patients, and many variations of symptoms and signs occur.

Appendicitis cannot be cured via medicine alone. The standard treatment for appendicitis is surgical removal of the appendix, which is called appendectomy. This procedure is typically performed as an open or laparoscopic appendectomy. The delay in treatment increases the likelihood of perforation and subsequent complications, so a negative appendectomy rate of 15% is considered acceptable. If the appendix is perforated, antibiotics should be continued until the patient's temperature and WBC count have normalized or for a fixed course, according to the surgeon's preference. If surgery is impossible, antibiotics can improve the survival rate but are not curative. In cases where a large inflammatory mass is found involving the appendix, terminal ileum, and cecum, resection of the entire mass and ileocolostomy may be preferable. In late cases where a pericolic abscess has already formed, the abscess is drained either by an ultrasound-guided percutaneous catheter or by open operation (with appendectomy to follow at a later date). A Meckel's diverticulum in a patient under the age of 40 should be removed concomitantly with the appendectomy unless extensive inflammation around the appendix prevents the procedure.

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [60]:
user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
rag_response = generate_rag_response(user_input, k=3, max_tokens=512, top_k=20)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**RAG Response:**\n\n{rag_response}"))

Llama.generate: 176 prefix-match hit, remaining 1314 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    1743.17 ms /  1314 tokens (    1.33 ms per token,   753.80 tokens per second)
llama_perf_context_print:        eval time =   29794.10 ms /   511 runs   (   58.31 ms per token,    17.15 tokens per second)
llama_perf_context_print:       total time =   31943.69 ms /  1825 tokens
llama_perf_context_print:    graphs reused =        494


**RAG Response:**

###Answer:
Sudden patchy hair loss, also known as alopecia areata, is a common autoimmune disorder that affects genetically susceptible individuals, often resulting in circular or oval bald spots on the scalp or other hair-bearing areas. The exact cause of alopecia areata is unclear, but it is believed to be triggered by environmental factors in individuals with a predisposition.

The treatment options for alopecia areata depend on the severity and extent of hair loss. For mild cases, topical treatments such as:

1. Corticosteroids: These can be applied directly to the affected area to reduce inflammation and stimulate hair growth.
2. Minoxidil: A topical medication that can help promote hair growth and slow down hair loss.
3. Anthralin: A topical medication that can help reduce inflammation and promote hair regrowth.
4. Immunotherapy: Diphencyprone or squaric acid dibutylester can be used to stimulate an immune response and promote hair regrowth.
5. Psoralen plus ultraviolet A (PUVA): A combination of a photosensitizing agent and ultraviolet A light, which can help slow down hair loss and promote regrowth.

For more severe cases, systemic treatments may be required, such as:

1. Corticosteroids: Oral or intralesional corticosteroids can be used to reduce inflammation and promote hair regrowth.
2. Antimalarials: Oral antimalarials, such as hydroxychloroquine or quinacrine, can be used to suppress the immune system and reduce inflammation.
3. Immunosuppressants: Systemic immunosuppressants, such as cyclosporine or methotrexate, can be used to suppress the immune system and reduce inflammation.

It is important to note that the effectiveness of these treatments can vary from person to person, and some may have side effects. Additionally, hair regrowth may take several months to become noticeable.

Other causes of sudden patchy hair loss include infections, such as tinea capitis, and traction alopecia, which is caused by physical stress on the hair

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [61]:
user_input = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
rag_response = generate_rag_response(user_input, k=3, max_tokens=512, top_k=20)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**RAG Response:**\n\n{rag_response}"))

Llama.generate: 176 prefix-match hit, remaining 1142 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    1472.06 ms /  1142 tokens (    1.29 ms per token,   775.78 tokens per second)
llama_perf_context_print:        eval time =   29097.41 ms /   511 runs   (   56.94 ms per token,    17.56 tokens per second)
llama_perf_context_print:       total time =   30946.62 ms /  1653 tokens
llama_perf_context_print:    graphs reused =        494


**RAG Response:**

###Answer:
The context discusses the management of traumatic brain injury (TBI) and spinal cord injury, focusing on the importance of early intervention by rehabilitation specialists for maximal functional recovery. For a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function, the following treatments are recommended based on the information provided:

1. Rehabilitation: Early intervention by rehabilitation specialists is crucial for patients with brain injuries. Rehabilitation services should be planned as early as possible and include a team approach that combines physical, occupational, and speech therapy, skill-building activities, and counseling to meet the patient's social and emotional needs. The goal is to prevent secondary disabilities, such as pressure ulcers, joint contractures, and pneumonia, and to help patients regain cognitive and emotional functioning.

2. Prevention of complications: During the acute phase, daily routine care is necessary to prevent complications, such as contractures, pressure ulcers, and pneumonia. Measures to prevent other complications, such as orthostatic hypotension, atelectasis, deep venous thrombosis, and pulmonary embolism, should also be taken.

3. Imaging studies: Brain imaging, such as CT or MRI, is required to diagnose and characterize central lesions, including infarcts, hemorrhages, masses, and focal atrophy.

4. Physical and occupational therapy: There is no specific medical treatment for brain injuries. Physical and occupational therapy may modestly improve functioning but is more often useful for making the environment safer and for providing devices that help patients circumvent the primary deficit.

5. Family education: Family education is an essential part of the rehabilitation process.

For spinal cord injuries, the treatment varies depending on the level and extent of the injury. Complete transsection causes flaccid paralysis, while partial transsection causes spastic paralysis of muscles innervated by the affected segment. The affected area must be immobilized surgically or nonsurgically as soon as possible and throughout the acute phase to prevent complications.

There is no specific medical treatment for the cognitive dysfunction that often accompanies brain injuries. Drugs that slow the symptomatic progression of dementia do not

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [62]:
user_input = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
rag_response = generate_rag_response(user_input, k=3, max_tokens=512, top_k=20)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**RAG Response:**\n\n{rag_response}"))

Llama.generate: 1317 prefix-match hit, remaining 1 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =   29997.56 ms /   512 runs   (   58.59 ms per token,    17.07 tokens per second)
llama_perf_context_print:       total time =   30379.70 ms /   513 tokens
llama_perf_context_print:    graphs reused =        495


**RAG Response:**

###Answer:
The context discusses the management of traumatic brain injury (TBI) and spinal cord injury, focusing on the importance of early intervention by rehabilitation specialists for maximal functional recovery. For a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function, the following treatments are recommended based on the information provided in the context:

1. Rehabilitation: Early intervention by rehabilitation specialists is crucial for patients with brain injury. Rehabilitation services should be planned as early as possible and include a team approach that combines physical, occupational, and speech therapy, skill-building activities, and counseling to meet the patient's social and emotional needs. The goal is to prevent secondary disabilities, such as pressure ulcers, joint contractures, and pneumonia, and to help patients regain cognitive and emotional functioning.

2. Prevention of complications: During the acute phase, daily routine care is necessary to prevent complications, such as contractures, pressure ulcers, and pneumonia. Measures to prevent other complications, such as orthostatic hypotension, atelectasis, deep venous thrombosis, and pulmonary embolism, should also be taken.

3. Imaging studies: Brain imaging, such as CT or MRI, is required to diagnose and characterize central lesions, including infarcts, hemorrhages, masses, and focal atrophy.

4. Physical and occupational therapy: There is no specific medical treatment for brain injury. Physical and occupational therapy may modestly improve functioning but is more often useful for making the environment safer and for providing devices that help patients circumvent the primary deficit.

5. Family education: Family education is an essential part of the rehabilitation process.

For spinal cord injury, the treatment varies depending on the level and extent of the injury. Complete transsection causes flaccid paralysis, while partial transsection causes spastic paralysis of muscles innervated by the affected segment. The affected area must be immobilized surgically or nonsurgically as soon as possible and throughout the acute phase to prevent complications.

There is no specific medical treatment for the cognitive dysfunction that often accompanies brain injury. Drugs that slow the symptomatic progression of dement

### Fine-tuning

In [63]:
user_input = "What is the protocol for managing sepsis in a critical care unit?"
rag_response = generate_rag_response(user_input,temperature=0.5)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**RAG Response:**\n\n{rag_response}"))


Llama.generate: 176 prefix-match hit, remaining 1757 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    2277.53 ms /  1757 tokens (    1.30 ms per token,   771.45 tokens per second)
llama_perf_context_print:        eval time =    7092.29 ms /   127 runs   (   55.84 ms per token,    17.91 tokens per second)
llama_perf_context_print:       total time =    9431.35 ms /  1884 tokens
llama_perf_context_print:    graphs reused =        122


**RAG Response:**

###Answer:
In the context provided, sepsis is suspected based on symptoms such as shaking chills, persistent fever, altered sensorium, hypotension, and GI symptoms. The management of sepsis in a critical care unit involves several steps:

1. Diagnosis:
If sepsis is suspected, cultures are obtained of blood and any other appropriate specimens. Empiric antibiotics are given after appropriate cultures are obtained to improve survival.

2. Treatment:
- Antibiotics: Early treatment with an appropriate antimicrobial reg

In [64]:
user_input = " What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"
rag_response = generate_rag_response(user_input,temperature=0.5)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**RAG Response:**\n\n{rag_response}"))

Llama.generate: 176 prefix-match hit, remaining 1539 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    2061.37 ms /  1539 tokens (    1.34 ms per token,   746.59 tokens per second)
llama_perf_context_print:        eval time =    7614.76 ms /   127 runs   (   59.96 ms per token,    16.68 tokens per second)
llama_perf_context_print:       total time =    9737.11 ms /  1666 tokens
llama_perf_context_print:    graphs reused =        122


**RAG Response:**

Answer:
The common symptoms for appendicitis, as described in the context, include epigastric or periumbilical pain followed by brief nausea, vomiting, and anorexia, which is later followed by right lower quadrant pain that increases with cough and motion. Classic signs are right lower quadrant direct and rebound tenderness located at McBurney's point. Other signs include pain with palpation of the left lower quadrant (Rovsing sign), pain with passive extension of the right hip joint (psoas sign), or pain caused by passive internal

In [65]:
user_input = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"
rag_response = generate_rag_response(user_input,temperature=0.5)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**RAG Response:**\n\n{rag_response}"))

Llama.generate: 176 prefix-match hit, remaining 1314 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    1741.94 ms /  1314 tokens (    1.33 ms per token,   754.33 tokens per second)
llama_perf_context_print:        eval time =    7267.58 ms /   127 runs   (   57.23 ms per token,    17.47 tokens per second)
llama_perf_context_print:       total time =    9068.16 ms /  1441 tokens
llama_perf_context_print:    graphs reused =        122


**RAG Response:**

Answer:

Sudden patchy hair loss, also known as alopecia areata, is a common autoimmune disorder that affects genetically susceptible individuals, often triggered by unclear environmental factors (Merck Manual, 2019). The scalp and beard are the most commonly affected areas, but any hairy part of the body can be involved. This condition can range from a few bald patches to total hair loss (alopecia universalis).

The primary goal in managing alopecia areata is to stimulate hair growth. Several treatment options are

In [66]:
user_input = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"
rag_response = generate_rag_response(user_input,temperature=0.5)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**RAG Response:**\n\n{rag_response}"))

Llama.generate: 176 prefix-match hit, remaining 1142 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    1476.20 ms /  1142 tokens (    1.29 ms per token,   773.61 tokens per second)
llama_perf_context_print:        eval time =    7381.40 ms /   127 runs   (   58.12 ms per token,    17.21 tokens per second)
llama_perf_context_print:       total time =    8918.33 ms /  1269 tokens
llama_perf_context_print:    graphs reused =        122


**RAG Response:**

###Answer:
The context discusses the management of traumatic brain injury (TBI) and spinal cord injury, focusing on the importance of early intervention by rehabilitation specialists for maximal functional recovery. For a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function, the following treatments are recommended:

1. Rehabilitation: Early intervention by rehabilitation specialists is essential for patients with TBI to establish baseline findings and prioritize treatment. Rehabilitation services should include prevention of secondary disabilities, such as pressure ulcers, joint contract

In [67]:
user_input = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery??"
rag_response = generate_rag_response(user_input,temperature=0.5)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**RAG Response:**\n\n{rag_response}"))

Llama.generate: 176 prefix-match hit, remaining 1561 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    2059.52 ms /  1561 tokens (    1.32 ms per token,   757.94 tokens per second)
llama_perf_context_print:        eval time =    7541.79 ms /   127 runs   (   59.38 ms per token,    16.84 tokens per second)
llama_perf_context_print:       total time =    9661.32 ms /  1688 tokens
llama_perf_context_print:    graphs reused =        122


**RAG Response:**

Answer:

Symptoms:
- Leg fracture

Diagnosis:
- Based on the context, a leg fracture is diagnosed through physical examination and possibly imaging studies (x-rays).

Treatment:

1. Initial Management:
   - Hemorrhagic shock, if present, is treated in the emergency department.
   - Immobilize the affected area to prevent further injury and decrease pain. This can be done using a splint, a sling, or, in severe cases, surgical hardware like pins, screws,

## Output Evaluation

Let us now use the LLM-as-a-judge method to check the quality of the RAG system on two parameters - retrieval and generation. We illustrate this evaluation based on the answeres generated to the question from the previous section.

- We are using the same Mistral model for evaluation, so basically here the llm is rating itself on how well he has performed in the task.

In [71]:
groundedness_rater_system_message = """You are an expert evaluator assessing the groundedness of AI-generated medical responses. Your task is to determine whether the answer is fully supported by the provided context.

### Evaluation Criteria:
- **Groundedness**: The answer should ONLY contain information that is explicitly stated or directly inferable from the provided context.
- An answer is considered "grounded" if every claim, fact, or recommendation can be traced back to the context.
- An answer is "not grounded" if it contains hallucinations, unsupported claims, or information not present in the context.

### Rating Scale (1-5):
1 - Not Grounded: The answer contains significant information not found in the context (hallucinations)
2 - Poorly Grounded: Most claims are unsupported by the context
3 - Partially Grounded: Some claims are supported, but key information is fabricated
4 - Mostly Grounded: Nearly all information comes from the context with minor unsupported details
5 - Fully Grounded: Every statement in the answer is directly supported by the context

### Instructions:
1. Carefully read the context, question, and answer
2. Identify each claim or fact in the answer
3. Verify if each claim is present in the context
4. Provide your rating and a brief justification

Respond in the following format:
**Rating**: [1-5]
**Justification**: [Brief explanation of your rating]
"""

In [72]:
relevance_rater_system_message = """You are an expert evaluator assessing the relevance of AI-generated medical responses. Your task is to determine whether the answer appropriately addresses the user's question.

### Evaluation Criteria:
- **Relevance**: The answer should directly address what the user is asking about.
- A relevant answer focuses on the specific medical topic, symptoms, treatments, or protocols mentioned in the question.
- An irrelevant answer may discuss unrelated topics, provide off-topic information, or fail to address the core question.

### Rating Scale (1-5):
1 - Not Relevant: The answer does not address the question at all
2 - Slightly Relevant: The answer touches on the topic but misses the main question
3 - Partially Relevant: The answer addresses some aspects but omits key parts of the question
4 - Mostly Relevant: The answer addresses the question well with minor omissions
5 - Fully Relevant: The answer comprehensively and directly addresses all aspects of the question

### Instructions:
1. Carefully read the question and the answer
2. Identify the key aspects the question is asking about
3. Evaluate how well the answer addresses each aspect
4. Provide your rating and a brief justification

Respond in the following format:
**Rating**: [1-5]
**Justification**: [Brief explanation of your rating]
"""

In [73]:
user_message_template = """###Context:
{context}

###Question:
{question}

###Answer:
{answer}

Please evaluate the above answer based on the provided context and question."""

In [74]:
def generate_ground_relevance_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks using invoke() (new LangChain API)
    relevant_document_chunks = retriever.invoke(user_input)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine user_prompt and system_message to create the prompt
    prompt = f"""[INST]{qna_system_message}\n
                {'user'}: {qna_user_message_template.format(context=context_for_query, question=user_input)}
                [/INST]"""

    response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    answer =  response["choices"][0]["text"]

    # Combine user_prompt and system_message to create the prompt
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    # Combine user_prompt and system_message to create the prompt
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=answer)}
                [/INST]"""

    response_1 = llm(
            prompt=groundedness_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    response_2 = llm(
            prompt=relevance_prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            stop=['INST'],
            )

    return response_1['choices'][0]['text'],response_2['choices'][0]['text']

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [76]:
ground,rel = generate_ground_relevance_response(user_input="What is the protocol for managing sepsis in a critical care unit?",max_tokens=150)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**Groundedness Evaluation:**\n\n{ground}"))

display(Markdown(f"**Relevance Evaluation:**\n\n{rel}"))

Llama.generate: 1947 prefix-match hit, remaining 1 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =    8071.83 ms /   150 runs   (   53.81 ms per token,    18.58 tokens per second)
llama_perf_context_print:       total time =    8144.48 ms /   151 tokens
llama_perf_context_print:    graphs reused =        144
Llama.generate: 6 prefix-match hit, remaining 2253 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    2899.80 ms /  2253 tokens (    1.29 ms per token,   776.95 tokens per second)
llama_perf_context_print:        eval time =    2568.74 ms /    44 runs   (   58.38 ms per token,    17.13 tokens per second)
llama_perf_context_print:       total time =    5486.80 ms /  2297 tokens
llama_perf_context_print:   

**Groundedness Evaluation:**

 **Rating**: 5
**Justification**: The answer is fully grounded in the context. It accurately identifies the symptoms of sepsis and the diagnostic steps, which include obtaining cultures and suspecting bacterem

**Relevance Evaluation:**

 **Rating**: 5
**Justification**: The answer directly addresses the question by outlining the steps involved in managing sepsis in a critical care unit, which includes suspicion and diagnosis, initial assessment, and treatment. The answer references the context by mentioning symptoms and the importance of prompt identification and

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [77]:
ground,rel = generate_ground_relevance_response(user_input="What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?",max_tokens=150)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**Groundedness Evaluation:**\n\n{ground}"))

display(Markdown(f"**Relevance Evaluation:**\n\n{rel}"))

Llama.generate: 6 prefix-match hit, remaining 1724 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    2216.69 ms /  1724 tokens (    1.29 ms per token,   777.74 tokens per second)
llama_perf_context_print:        eval time =    8772.38 ms /   149 runs   (   58.88 ms per token,    16.99 tokens per second)
llama_perf_context_print:       total time =   11061.45 ms /  1873 tokens
llama_perf_context_print:    graphs reused =        144
Llama.generate: 6 prefix-match hit, remaining 2035 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    2690.53 ms /  2035 tokens (    1.32 ms per token,   756.36 tokens per second)
llama_perf_context_print:        eval time =    3889.14 ms /    62 runs   (   62.73 ms per token,    15.94 tokens per second)
llama_perf_context_print:       total time =    6606.20 ms /  2097 tokens
llama_perf_context_print:   

**Groundedness Evaluation:**

 **Rating**: 5
**Justification**: The answer accurately and completely summarizes the common symptoms of appendicitis as described in the context. It also clarifies that appendicitis cannot be cured via medicine and requires surgical removal. The answer is fully grounded in the provided context.

**Relevance Evaluation:**

 **Rating**: 5
**Justification**: The answer directly addresses the user's question by first listing the common symptoms for appendicitis, which aligns with the information provided in the context. The answer then clarifies that appendicitis cannot be cured via medicine and requires surgical treatment, which is also consistent with the context. The answer also mentions the surgical procedure, which is appendectomy. The answer is comprehensive and directly addresses all aspects of the question.

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [78]:
ground,rel = generate_ground_relevance_response(user_input="What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?",max_tokens=150)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**Groundedness Evaluation:**\n\n{ground}"))

display(Markdown(f"**Relevance Evaluation:**\n\n{rel}"))

Llama.generate: 6 prefix-match hit, remaining 1499 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    1886.06 ms /  1499 tokens (    1.26 ms per token,   794.78 tokens per second)
llama_perf_context_print:        eval time =    7947.61 ms /   149 runs   (   53.34 ms per token,    18.75 tokens per second)
llama_perf_context_print:       total time =    9905.72 ms /  1648 tokens
llama_perf_context_print:    graphs reused =        144
Llama.generate: 6 prefix-match hit, remaining 1810 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    2306.94 ms /  1810 tokens (    1.27 ms per token,   784.59 tokens per second)
llama_perf_context_print:        eval time =    4762.49 ms /    84 runs   (   56.70 ms per token,    17.64 tokens per second)
llama_perf_context_print:       total time =    7106.24 ms /  1894 tokens
llama_perf_context_print:   

**Groundedness Evaluation:**

 **Rating**: 5
**Justification**: The answer is fully grounded as it only contains information directly from the context. The answer correctly identifies alopecia areata as the cause of sudden patchy hair loss and lists the effective treatments mentioned in the context, which include topical and intralesional corticosteroids, topical minoxidil, and topical immunotherapy.

**Relevance Evaluation:**

 **Rating**: 5
**Justification**: The answer directly addresses the question by providing effective treatments for sudden patchy hair loss, which is a common symptom of alopecia areata. The answer also mentions the possible cause of alopecia areata and explains that it is an autoimmune disorder. The treatments listed in the answer, such as topical and intralesional corticosteroids, topical minoxidil, and topical immunotherapy, are all mentioned in the context as potential treatments for alopecia areata. Therefore, the answer is fully relevant to the question.

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [79]:
ground,rel = generate_ground_relevance_response(user_input="What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?",max_tokens=150)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**Groundedness Evaluation:**\n\n{ground}"))

display(Markdown(f"**Relevance Evaluation:**\n\n{rel}"))

Llama.generate: 6 prefix-match hit, remaining 1327 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    1714.80 ms /  1327 tokens (    1.29 ms per token,   773.85 tokens per second)
llama_perf_context_print:        eval time =    9104.06 ms /   149 runs   (   61.10 ms per token,    16.37 tokens per second)
llama_perf_context_print:       total time =   10893.26 ms /  1476 tokens
llama_perf_context_print:    graphs reused =        143
Llama.generate: 6 prefix-match hit, remaining 1638 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    2137.40 ms /  1638 tokens (    1.30 ms per token,   766.35 tokens per second)
llama_perf_context_print:        eval time =    4555.44 ms /    76 runs   (   59.94 ms per token,    16.68 tokens per second)
llama_perf_context_print:       total time =    6726.69 ms /  1714 tokens
llama_perf_context_print:   

**Groundedness Evaluation:**

 **Rating**: 5
**Justification**: The answer is fully grounded as it only contains information directly from the context regarding the recommended treatments for a person with a brain injury. The answer mentions early intervention by rehabilitation specialists, prevention of secondary disabilities, cognitive therapy, and physical and occupational therapy, all of which are explicitly stated in the context.

**Relevance Evaluation:**

 **Rating**: 5
**Justification**: The answer directly addresses the question by mentioning the recommended treatments for a person with a physical injury to brain tissue, which includes early intervention by rehabilitation specialists, prevention of secondary disabilities, and physical and occupational therapy. The answer also provides additional details about the specific aspects of rehabilitation and the importance of cognitive therapy for patients with severe cognitive dysfunction. The context provided in the question is also referenced in the answer, ensuring relevance.

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [80]:
ground,rel = generate_ground_relevance_response(user_input="What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?",max_tokens=150)

# Display the response as formatted markdown
from IPython.display import display, Markdown
display(Markdown(f"**Groundedness Evaluation:**\n\n{ground}"))

display(Markdown(f"**Relevance Evaluation:**\n\n{rel}"))

Llama.generate: 6 prefix-match hit, remaining 1746 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    2200.02 ms /  1746 tokens (    1.26 ms per token,   793.63 tokens per second)
llama_perf_context_print:        eval time =    8251.80 ms /   149 runs   (   55.38 ms per token,    18.06 tokens per second)
llama_perf_context_print:       total time =   10523.22 ms /  1895 tokens
llama_perf_context_print:    graphs reused =        143
Llama.generate: 6 prefix-match hit, remaining 2057 prompt tokens to eval
llama_perf_context_print:        load time =     234.54 ms
llama_perf_context_print: prompt eval time =    2631.18 ms /  2057 tokens (    1.28 ms per token,   781.78 tokens per second)
llama_perf_context_print:        eval time =    6595.64 ms /   120 runs   (   54.96 ms per token,    18.19 tokens per second)
llama_perf_context_print:       total time =    9282.50 ms /  2177 tokens
llama_perf_context_print:   

**Groundedness Evaluation:**

 **Rating**: 5
**Justification**: The answer is fully grounded as it only contains information that is explicitly stated or directly inferable from the context. The answer accurately summarizes the necessary precautions and treatment steps for a person who has fractured their leg, including assessing the injury, immobilizing the injury with splinting, and considering surgical intervention if necessary. The answer also mentions the use of RICE (rest, ice, compression, and elevation) for soft-tissue injuries, which is a recommended treatment method mentioned in the context.

**Relevance Evaluation:**

 **Rating**: 5
**Justification**: The answer directly addresses the question by outlining the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip. The answer covers immobilization techniques, pain management, and the importance of seeking medical attention for life-threatening injuries. The answer also touches upon the recovery process, including physical therapy and daily exercises. Overall, the answer comprehensively covers all aspects of the question.

## Actionable Insights and Business Recommendations

### Key Findings from the RAG Implementation

#### 1. **Performance Comparison: Base LLM vs. RAG-Enhanced LLM**

| Approach | Strengths | Limitations |
|----------|-----------|-------------|
| **Base LLM (No Context)** | General medical knowledge, quick responses | May hallucinate, lacks source verification, potentially outdated information |
| **LLM + Prompt Engineering** | Better structured responses, clearer formatting | Still relies on training data, no access to specific medical references |
| **RAG-Enhanced LLM** | Grounded in Merck Manual content, traceable sources, reduced hallucinations | Limited by context window (2300 tokens), retrieval quality dependent on chunking |

#### 2. **Evaluation Results Summary**
Based on the LLM-as-a-Judge evaluation:
- **Groundedness scores (1-5)**: Measures how well answers are supported by retrieved context
- **Relevance scores (1-5)**: Measures how well answers address the specific medical questions
- The RAG system demonstrates improved factual accuracy when context is properly retrieved

---

### Actionable Insights

#### **Insight 1: Information Retrieval Quality is Critical**
- The chunking strategy (512 tokens, 50 token overlap) directly impacts response quality
- Smaller chunks (256 tokens) may improve precision for specific drug dosages
- Larger chunks (800 tokens) may improve context for complex procedures

#### **Insight 2: Context Window Constraints Require Optimization**
- The 2300 token context window limits the amount of retrieved context that can be processed
- Evaluation prompts must be carefully managed to avoid overflow
- Consider summarization techniques for longer retrieved passages

#### **Insight 3: Medical Terminology Handling**
- The system effectively retrieves relevant medical content using semantic similarity
- The all-MiniLM-L6-v2 embedding model (384 dimensions) provides good medical term understanding
- Consider domain-specific medical embeddings for improved retrieval accuracy

#### **Insight 4: Response Structure Improves Usability**
- Structured prompts with clear sections (Symptoms, Diagnosis, Treatment) enhance readability
- Healthcare professionals benefit from standardized response formats

---

### Business Recommendations

#### **1. For Healthcare Implementation**

| Recommendation | Priority | Impact | Effort |
|----------------|----------|--------|--------|
| Deploy as clinical decision support tool | High | High | Medium |
| Implement human-in-the-loop verification | Critical | High | Low |
| Add citation tracking to source pages | High | Medium | Medium |
| Create specialty-specific modules | Medium | High | High |

#### **2. Technical Enhancements**

**Short-term (1-3 months):**
- ✅ Implement response caching for frequently asked questions
- ✅ Add logging for audit trails and compliance
- ✅ Deploy monitoring for response quality metrics

**Medium-term (3-6 months):**
- 🔄 Upgrade to larger context window models (8K+ tokens)
- 🔄 Implement hybrid search (semantic + keyword) for improved retrieval
- 🔄 Add multi-turn conversation support for follow-up questions

**Long-term (6-12 months):**
- 📋 Fine-tune domain-specific embedding models
- 📋 Integrate with Electronic Health Records (EHR) systems
- 📋 Implement patient-specific context injection

#### **3. Risk Mitigation**

| Risk | Mitigation Strategy |
|------|---------------------|
| **Hallucination** | Mandatory human review for critical decisions; confidence scoring |
| **Outdated Information** | Regular Merck Manual updates; version tracking |
| **Context Retrieval Failures** | Fallback to broader search; alert when confidence is low |
| **Regulatory Compliance** | HIPAA-compliant deployment; audit logging; disclaimer enforcement |

#### **4. ROI Considerations**

- **Time Savings**: Estimated 30-50% reduction in medical reference lookup time
- **Accuracy Improvement**: Reduced reliance on memory; consistent access to current guidelines
- **Training Support**: Valuable tool for medical residents and continuing education
- **Scalability**: Single system can serve multiple departments and specialties

---

### Future Development Roadmap

```
Phase 1: Pilot Deployment
├── Single department trial (e.g., Internal Medicine)
├── Collect user feedback and accuracy metrics
└── Refine prompts and retrieval parameters

Phase 2: Expanded Rollout
├── Multi-specialty deployment
├── Integration with hospital information systems
└── Mobile access for on-call physicians

Phase 3: Advanced Features
├── Multi-modal support (images, lab results)
├── Personalized recommendations based on patient history
└── Predictive analytics integration
```

---

### Conclusion

This RAG-based medical AI solution demonstrates the feasibility of combining large language models with authoritative medical references like the Merck Manual. The key success factors are:

1. **Quality retrieval** - Proper chunking and embedding strategies
2. **Grounded responses** - Answers based on retrieved context, not hallucinations
3. **Structured outputs** - Clear, actionable medical information
4. **Continuous evaluation** - LLM-as-a-judge methodology for quality assurance

**Next Steps**: Conduct a controlled pilot study with healthcare professionals to validate real-world performance and gather domain expert feedback for further refinement.

<font size=6 color='blue'>Power Ahead</font>
___