<a href="https://colab.research.google.com/github/vksfdc2019/VamseeK.guthub.io/blob/main/Medical_Assistant_Problem_Statement_Final.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <Center> Problem Statement - Medical Assistant </Center>

### Business Context

The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.

Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.

To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness.

**Common Questions to Answer**

**1. Diagnostic Assistance**: "What are the common symptoms and treatments for pulmonary embolism?"

**2. Drug Information**: "Can you provide the trade names of medications used for treating hypertension?"

**3. Treatment Plans**: "What are the first-line options and alternatives for managing rheumatoid arthritis?"

**4. Specialty Knowledge**: "What are the diagnostic steps for suspected endocrine disorders?"

**5. Critical Care Protocols**: "What is the protocol for managing sepsis in a critical care unit?"

### Objective

As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness.

### Data Description

The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co. that cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.

The manual is provided as a PDF with over 4,000 pages divided into 23 sections.

## Installing and Importing Necessary Libraries and Dependencies

### Installing llama-cpp-python

In [2]:
# Installation for GPU llama-cpp-python
# run the following line when a GPU runtime is being used (comment it out otherwise)
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

# Installation for CPU llama-cpp-python
# uncomment and run the following code in case GPU is not being used
# !CMAKE_ARGS="-DLLAMA_CUBLAS=off" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

### Installing huggingface, pandas, tiktoken and related dependencies

In [3]:
# For installing the libraries & downloading models from HF Hub
!pip install huggingface_hub==0.23.2 pandas==1.5.3 tiktoken==0.6.0 pymupdf==1.25.1 langchain==0.1.1 langchain-community==0.0.13 chromadb==0.4.22 sentence-transformers==2.3.1 numpy==1.25.2 -q

### Uninstalling unsupported versions of the libraries and installing new supported libraries

In [None]:
# Uninstall numpy and then reinstall it, forcing the correct version
!pip uninstall numpy -y
!pip install numpy==1.25.2 # Use the version you specified in your initial pip install

# Now install all your other libraries.
# Crucially, ensure the pandas version is compatible with this numpy version.
# If you keep getting errors, try specifying a slightly older pandas version, e.g., 1.4.x or 1.3.x
!pip install huggingface_hub==0.23.2 pandas==1.5.3 tiktoken==0.6.0 pymupdf==1.25.1 langchain==0.1.1 langchain-community==0.0.13 chromadb==0.4.22 sentence-transformers==2.3.1 -q

# Restart your runtime after running these cells
import os
os.kill(os.getpid(), 9) # This will restart the Colab runtime

Found existing installation: numpy 1.25.2
Uninstalling numpy-1.25.2:
  Successfully uninstalled numpy-1.25.2
Collecting numpy==1.25.2
  Using cached numpy-1.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.6 kB)
Using cached numpy-1.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
Installing collected packages: numpy
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires pandas==2.2.2, but you have pandas 1.5.3 which is incompatible.
dask-cudf-cu12 25.2.2 requires pandas<2.2.4dev0,>=2.0, but you have pandas 1.5.3 which is incompatible.
peft 0.15.2 requires huggingface_hub>=0.25.0, but you have huggingface-hub 0.23.2 which is incompatible.
diffusers 0.34.0 requires huggingface-hub>=0.27.0, but you have huggingface-hub 0.23.2 which is incompatible.
xarray 2025.3.1 requires pandas>=2.1, 

ERROR: Operation cancelled by user
^C


### Display the version of numpy

In [1]:
# Import the numpy library for numerical computations
import numpy

# Print the version number of the installed numpy package
print(numpy.__version__)

1.25.2
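
Because the install cells above reported dependency conflicts, it can help to confirm the other pinned packages as well, not only numpy. The following is a minimal sketch, assuming the pins from the install cell are the versions that should be active; the helper name `find_version_mismatches` and the `PINNED` table are hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip install cell above (assumed to be the intended versions)
PINNED = {
    "numpy": "1.25.2",
    "pandas": "1.5.3",
    "tiktoken": "0.6.0",
    "pymupdf": "1.25.1",
    "langchain": "0.1.1",
    "langchain-community": "0.0.13",
    "chromadb": "0.4.22",
    "sentence-transformers": "2.3.1",
    "huggingface_hub": "0.23.2",
}


def find_version_mismatches(pins):
    """Return {package: (expected, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# Report every package whose installed version differs from its pin
for name, (expected, installed) in find_version_mismatches(PINNED).items():
    print(f"{name}: expected {expected}, found {installed}")
```

If the loop prints nothing, every pinned package is present at the expected version.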


In [2]:
# Libraries for processing dataframes and text
import json   # For handling JSON data
import os     # For interacting with the operating system (file paths, environment variables, etc.)
import tiktoken  # For tokenization, often used with language models
import pandas as pd  # For working with tabular data using dataframes

# Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain.text_splitter import RecursiveCharacterTextSplitter  # For splitting long texts into smaller chunks
from langchain_community.document_loaders import PyMuPDFLoader  # For loading PDF documents using PyMuPDF
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings  # For creating embeddings with sentence transformers
from langchain_community.vectorstores import Chroma  # For storing and searching embeddings in a vector store (ChromaDB)

# Libraries for downloading and loading the LLM (Large Language Model)
from huggingface_hub import hf_hub_download  # For downloading models and files from the Hugging Face Hub
from llama_cpp import Llama  # For loading and running Llama models (GGUF format) using llama.cpp backend

## Question Answering using LLM

#### Downloading and Loading the model

In [3]:
# Specify the name of the Hugging Face Hub repository containing the model
model_name = "TheBloke/Llama-2-7B-Chat-GGUF"

# Specify the filename of the desired model weights file in the repo
model_file = "llama-2-7b-chat.Q4_K_M.gguf"

# Download the model file from the Hugging Face Hub and get the local path where it's saved
model_path = hf_hub_download(model_name, filename=model_file)

# Load the Llama language model from the downloaded file,
# setting the context window to 4096 tokens and offloading all model layers to the GPU (n_gpu_layers=-1) for faster inference
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


llama-2-7b-chat.Q4_K_M.gguf:   0%|          | 0.00/4.08G [00:00<?, ?B/s]

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
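
The loading cell above hard-codes `n_gpu_layers=-1`, which only makes sense on a GPU runtime with the GPU build of llama-cpp-python installed. As a rough sketch, the value could be chosen at run time instead. The helper `pick_n_gpu_layers` is hypothetical, and checking for `nvidia-smi` on the PATH is only a heuristic: it says nothing about whether llama-cpp-python was actually built with GPU support.

```python
import shutil


def pick_n_gpu_layers(which=shutil.which):
    """Return -1 (offload all layers) if `nvidia-smi` is on PATH, else 0 (CPU only).

    `which` is injectable so the heuristic can be exercised without a GPU.
    """
    return -1 if which("nvidia-smi") else 0


n_gpu_layers = pick_n_gpu_layers()
print(f"n_gpu_layers={n_gpu_layers}")
```

The result could then be passed as `Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=n_gpu_layers)`.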


#### Response

### Function to Generate Response from Llama Language Model Based on User Prompt

In [4]:
# Function to generate, process, and return the response from the LLM
def generate_llama_response(user_prompt):

    # Define a system message to instruct the LLM on how to respond
    system_message = """
    [INST]<<SYS>> Respond to the user question based on the user prompt<</SYS>>[/INST]
    """

    # Combine the user prompt and system message to form the full prompt for the model
    prompt = f"{user_prompt}\n{system_message}"

    # Generate a response from the LLaMA language model with specified generation parameters
    response = llm(
        prompt=prompt,           # Pass the full prompt to the model
        max_tokens=1024,         # Set the maximum number of tokens in the output
        temperature=0.01,        # Control randomness; lower values make output more deterministic
        top_p=0.95,              # Nucleus sampling parameter (probability mass to sample from)
        repeat_penalty=1.2,      # Penalize repeated phrases in the output
        top_k=50,                # Limit sampling to the top_k most probable next tokens
        stop=['INST'],           # Stop generating when 'INST' is encountered
        echo=False               # Do not include the prompt in the output
    )

    # Extract the generated text from the model's response dictionary
    response_text = response["choices"][0]["text"]
    return response_text         # Return only the generated response text
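
For comparison, the chat layout published for Llama 2 places the system message inside `<<SYS>>` tags at the start of a single `[INST]` block, followed by the user message and a closing `[/INST]`. The function above uses a different arrangement (user prompt first, then the tagged system message). The sketch below builds the published layout; the helper name `build_llama2_chat_prompt` is hypothetical, and the leading `<s>` token is assumed to be added by the tokenizer rather than written into the string.

```python
def build_llama2_chat_prompt(system_message: str, user_message: str) -> str:
    """Build a single-turn prompt in the Llama 2 chat layout.

    Layout: [INST] <<SYS>>\\n{system}\\n<</SYS>>\\n\\n{user} [/INST]
    """
    return (
        f"[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n"
        f"{user_message.strip()} [/INST]"
    )


prompt = build_llama2_chat_prompt(
    "Respond to the user question based on the user prompt",
    "What is the protocol for managing sepsis in a critical care unit?",
)
print(prompt)
```

Whether this layout changes answer quality for this notebook's queries would need to be checked by re-running them.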

- **`max_tokens`**: This parameter **specifies the maximum number of tokens that the model should generate** in response to the prompt.

- **`temperature`**: This parameter **controls the randomness of the generated response**. A higher temperature value will result in a more random response, while a lower temperature value will result in a more predictable response.

- **`top_p`**: This parameter **controls the diversity of the generated response by establishing a cumulative probability cutoff for token selection**. A higher value of top_p will result in a more diverse response, while a lower value will result in a less diverse response.

- **`repeat_penalty`**: This parameter **controls the penalty for repeating tokens in the generated response**. A higher value of repeat_penalty will result in a lower probability of repeating tokens, while a lower value will result in a higher probability of repeating tokens.

- **`top_k`**: This parameter **controls the maximum number of most-likely next tokens to consider** when generating the response at each step.

- **`stop`**: This parameter is a **list of strings (stop sequences) that end response generation** as soon as any of them appears in the generated text.

- **`echo`**: This parameter **controls whether the input (prompt) to the model should be returned** in the model response.
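
To make the interaction of `temperature`, `top_k`, and `top_p` concrete, here is a toy sketch in plain Python. It is not llama.cpp's sampler: the function name, the example logits, and the order of the three steps are assumptions chosen for readability, and real libraries differ in the order in which they apply these filters.

```python
import math


def filter_next_token_probs(logits, temperature=1.0, top_k=50, top_p=0.95):
    """Toy illustration: return renormalized probabilities of the tokens that
    survive temperature scaling, top-k truncation, and top-p truncation.

    `temperature` must be greater than 0 in this sketch.
    """
    # Temperature: dividing logits by a small value sharpens the distribution
    scaled = {tok: logit / temperature for tok, logit in logits.items()}

    # Softmax, subtracting the maximum for numerical stability
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

    # Top-k: keep only the k most probable tokens
    ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


# Made-up logits for four candidate next tokens
toy_logits = {"sepsis": 2.0, "shock": 1.0, "fever": 0.0, "rash": -1.0}

print(filter_next_token_probs(toy_logits, temperature=0.01))          # near-greedy
print(filter_next_token_probs(toy_logits, temperature=1.0, top_k=2))  # two candidates
```

With a temperature as low as 0.01, the top token alone already exceeds the `top_p` cutoff, so only one candidate survives. This is why such settings behave almost deterministically.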


In [5]:
# Define a function to generate a response from the language model using the provided query and generation parameters
def response(query, max_tokens=128, temperature=0, top_p=0.95, top_k=50):
    # Generate a response from the LLM using the input prompt and specified parameters
    model_output = llm(
        prompt=query,            # The input prompt to generate a response for
        max_tokens=max_tokens,   # Maximum number of tokens in the generated output
        temperature=temperature, # Controls randomness; lower values make the output more deterministic
        top_p=top_p,             # Nucleus sampling: probability mass for candidate tokens
        top_k=top_k              # Only consider the top_k most probable tokens at each step
    )

    # Extract and return the generated text from the model's output
    return model_output['choices'][0]['text']

# Question Answering using LLM

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [6]:
# Define the user's question about sepsis management in a critical care unit
user_prompt = "What is the protocol for managing sepsis in a critical care unit?"

# Generate a response from the Llama language model using the user prompt
# (stored in `answer` so the `response` function defined earlier is not overwritten)
answer = generate_llama_response(user_prompt)

# Print the generated response to the console
print(answer)

 Managing sepsis in a critical care unit requires a coordinated and timely approach that includes several key steps. Here are some of the protocols for managing sepsis in a critical care unit:
1. Early recognition and activation of sepsis protocol: The first step is to recognize the signs and symptoms of sepsis early on and activate the hospital's sepsis protocol. This typically involves notifying the attending physician, who will then initiate treatment.
2. Administering antibiotics: Once sepsis has been confirmed, the next step is to administer broad-spectrum antibiotics effective against common causes of sepsis, such as gram-positive and gram-negative bacteria. The choice of antibiotic will depend on the suspected pathogen and the patient's medical history.
3. Monitoring vital signs: Close monitoring of vital signs is crucial in managing sepsis. This includes tracking temperature, blood pressure, heart rate, respiratory rate, and oxygen saturation levels.
4. Providing fluid resuscit

**Comments/Observations on the LLM-Generated Response: Query 1**

- Emphasizes the importance of early recognition and rapid activation of sepsis protocols.
- Highlights the need for prompt administration of broad-spectrum antibiotics.
- Stresses continuous monitoring of vital signs and organ function to detect deterioration.
- Recommends individualized fluid resuscitation and vasopressor support for hemodynamic stability.
- Addresses pain management with a preference for non-opioid analgesics initially.
- Underlines the necessity of identifying and treating underlying infections.
- Suggests close management of hyperglycemia to prevent complications.
- Advocates for a multidisciplinary team approach to ensure comprehensive patient care.
- Overall, the protocols are evidence-based and align with current best practices for sepsis management.

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

In [7]:
# Define the user's question about symptoms and treatment options for appendicitis
user_prompt = "What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?"

# Generate a response from the Llama language model using the defined user prompt
# (stored in `answer` so the `response` function defined earlier is not overwritten)
answer = generate_llama_response(user_prompt)

# Print the generated response to the console
print(answer)

Llama.generate: prefix-match hit


 Of course! I'd be happy to help you understand more about appendicitis and its treatment options. 
Common Symptoms of Appendicitis:
The common symptoms of appendicitis include:
1. Abdominal pain: The most distinctive sign of appendicitis is abdominal pain, which usually starts near the navel and then moves to the lower right side of the abdomen. The pain can be sudden and severe, and may worsen with movement or deep breathing.
2. Nausea and vomiting: Patients with appendicitis often experience nausea and vomiting, which can lead to dehydration and electrolyte imbalances.
3. Loss of appetite: A person with appendicitis may lose their appetite due to the severe abdominal pain and discomfort.
4. Fever: Appendicitis often causes a high fever, which can range from 100°F to 102°F (37.8°C to 39°C).
5. Abdominal tenderness: The abdomen is usually tender to the touch in the lower right quadrant, especially around the navel.
6. Rigidity and guarding: As the appendix inflames, the muscles in the


**Comments/Observations on the LLM-Generated Response: Query 2**

- The response provides a clear and structured overview of appendicitis, covering both symptoms and treatment options.
- Common symptoms are listed in detail, accurately reflecting clinical presentation.
- The explanation distinguishes between symptom management with antibiotics and definitive treatment via surgery, which is medically accurate.
- Both laparoscopic and open appendectomy procedures are described, highlighting when each might be used.
- The response emphasizes the urgency of seeking medical attention, which is important for patient safety.
- Overall, the information is comprehensive, factually correct, and presented in a patient-friendly manner.
- The tone is supportive and informative, making the content accessible to non-medical readers.

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

In [8]:
# Define the user's question about treatments and causes of sudden patchy hair loss (localized bald spots)
user_prompt = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"

# Generate a response from the Llama language model using the defined user prompt
# (stored in `answer` so the `response` function defined earlier is not overwritten)
answer = generate_llama_response(user_prompt)

# Print the generated response to the console
print(answer)

Llama.generate: prefix-match hit


 Sure! I'd be happy to help you with your query. Sudden patchy hair loss, also known as localized bald spots or alopecia areata, can be a distressing condition that affects both men and women. Here are some possible causes and effective treatments for addressing sudden patchy hair loss:
Causes of Sudden Patchy Hair Loss:
1. Autoimmune Disorders: In this condition, the immune system mistakenly attacks healthy hair follicles, leading to hair loss. Alopecia areata is a common autoimmune disorder that can cause patchy hair loss.
2. Genetics: If you have a family history of balding or alopecia areata, you may be more prone to sudden patchy hair loss.
3. Hormonal Imbalances: Hormonal fluctuations during pregnancy, menopause, or thyroid disorders can cause sudden patchy hair loss in some individuals.
4. Stress and Emotional Distress: Prolonged stress and emotional distress can lead to telogen effluvium, a condition where there is an increase in the number of hair follicles that stop growing a


**Comments/Observations on the LLM-Generated Response: Query 3**

- The response provides a comprehensive overview of both causes and treatments for sudden patchy hair loss.
- Causes are clearly categorized, covering autoimmune, genetic, hormonal, stress-related, and nutritional factors.
- Treatment options are detailed and include medical (medications, corticosteroids), procedural (PRP therapy, hair transplantation), and lifestyle interventions (diet, stress reduction, hair care).
- The explanation of each treatment is concise and accurate, reflecting current clinical practices.
- The response emphasizes the importance of identifying the underlying cause for effective management.
- The tone is supportive and informative, making the information accessible to a broad audience.
- Overall, the content is well-structured, factually correct, and offers actionable advice for individuals experiencing patchy hair loss.

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

In [9]:
# Define the user's question about recommended treatments for brain injuries causing impairment
user_prompt = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"

# Generate a response from the Llama language model using the defined user prompt
# (stored in `answer` so the `response` function defined earlier is not overwritten)
answer = generate_llama_response(user_prompt)

# Print the generated response to the console
print(answer)

Llama.generate: prefix-match hit


 Depending on the severity and location of the injury, various treatments may be recommended for a person who has sustained a physical injury to brain tissue resulting in temporary or permanent impairment of brain function. These treatments can include:
1. Medications: To manage symptoms such as pain, inflammation, and swelling, medications such as analgesics, anti-inflammatory drugs, and corticosteroids may be prescribed.
2. Rehabilitation therapy: Physical, occupational, and speech therapy can help restore lost functions and improve cognitive, motor, and communication skills.
3. Surgery: In some cases, surgery may be necessary to relieve pressure on the brain or repair damaged blood vessels.
4. Cognitive rehabilitation: This type of therapy helps individuals with cognitive impairments resulting from brain injury to improve their memory, attention, and other cognitive functions.
5. Behavioral interventions: These may include counseling and psychotherapy to help the individual cope wit

**Comments/Observations on the LLM-Generated Response: Query 4**

- The response provides a thorough overview of treatment options for brain injuries, covering both medical and supportive interventions.
- It accurately lists a range of therapies, including medications, rehabilitation, surgery, cognitive and behavioral interventions, and assistive technologies.
- The inclusion of home modifications and support groups demonstrates a holistic approach to patient care.
- Alternative therapies are mentioned, reflecting awareness of complementary treatment options.
- The response emphasizes individualized care and the importance of professional guidance, which aligns with best clinical practices.
- Overall, the information is comprehensive, well-structured, and accessible to a broad audience.

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

In [10]:
# Define the user's question about precautions and treatment for a leg fracture sustained during hiking
user_prompt = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"

# Generate a response from the Llama language model using the provided user prompt
# (stored in `answer` so the `response` function defined earlier is not overwritten)
answer = generate_llama_response(user_prompt)

# Print the model's generated response to the console
print(answer)

Llama.generate: prefix-match hit


 If you have fractured your leg while hiking, it is essential to seek medical attention as soon as possible. Here are some necessary precautions and treatment steps:
Precautions:
1. Immobilize the injured limb: Use a splint or bandage to immobilize the affected leg to prevent further injury or movement. This will help reduce pain and promote healing.
2. Apply ice packs: Ice can be applied to the affected area to reduce swelling and relieve pain. Wrap an ice pack in a cloth to avoid direct contact with the skin.
3. Elevate the injured limb: Elevating the leg above heart level can help reduce swelling and promote blood flow.
4. Monitor for signs of complications: Keep an eye out for signs of infection, such as redness, swelling, or increased pain. If you notice any of these symptoms, seek medical attention immediately.
Treatment Steps:
1. X-rays and imaging tests: A healthcare professional will take x-rays to determine the severity of the fracture and create a treatment plan. Depending o

**Comments/Observations on the LLM-Generated Response: Query 5**

- The response provides a clear, step-by-step guide for immediate precautions and treatment following a leg fracture.
- It emphasizes the importance of immobilization, ice application, elevation, and monitoring for complications, which are standard first-aid measures.
- The treatment section accurately covers diagnostic imaging, medication, immobilization, and the role of physical therapy in recovery.
- Care and recovery considerations such as follow-up, rest, compression garments, and nutrition are included, reflecting a holistic approach.
- The advice to seek prompt medical attention and avoid self-treatment is appropriate and prioritizes patient safety.
- The information is well-structured, comprehensive, and accessible to non-medical readers.
- Overall, the response aligns with best practices for fracture management and provides actionable, practical guidance.

# Question Answering using LLM with Prompt Engineering

### LLM parameter tuning (at least 5 combinations, as per the rubric)

### Combination - 1 :  Low randomness for more deterministic, factual output

In [11]:
# Define a function to generate an LLM response using a specific combination of parameters for sepsis management queries
def generate_llama_response_combo1(user_prompt):
    # Define a system message providing context and specific instruction for summarizing clinical guidelines
    system_message = """
    [INST]<<SYS>> Summarize the most up-to-date clinical guidelines for sepsis management in the ICU, with a focus on evidence-based practices.<</SYS>>[/INST]
    """
    # Combine the user prompt and the system message to form the full prompt sent to the language model
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model (llm) with the combined prompt and specified parameter values for generation
    response = llm(
        prompt=prompt,            # The full prompt including user input and instructions
        max_tokens=600,           # Maximum number of tokens to generate in the response
        temperature=0.2,          # Low randomness for more deterministic, factual output
        top_p=0.9,                # Nucleus sampling parameter to limit diversity of output
        repeat_penalty=1.1,       # Slight penalty to discourage repeated phrases
        top_k=40,                 # Only consider the top 40 tokens at each step for next token prediction
        stop=["INST"],            # Stop generating output if the string "INST" is encountered
        echo=False                # Do not include the prompt in the response output
    )
    # Return the text of the generated response from the model's output
    return response["choices"][0]["text"]

### Combination - 2 : Moderate randomness/creativity of the output, no additional penalty for repeating phrases

In [12]:
# Define a function to generate an LLM response using a second set of parameters for sepsis management queries
def generate_llama_response_combo2(user_prompt):
    # Define a system message to instruct the LLM to provide a detailed, step-by-step treatment protocol
    system_message = """
    [INST]<<SYS>> Provide a detailed step-by-step treatment protocol for managing sepsis in an ICU.<</SYS>>[/INST]
    """
    # Combine the user prompt and system message into a single prompt for the LLM
    prompt = f"{user_prompt}\n{system_message}"
    # Call the LLM with the prompt and a specific set of generation parameters
    response = llm(
        prompt=prompt,          # The full prompt including user input and instructions
        max_tokens=768,         # Sets the maximum length for the generated response (in tokens)
        temperature=0.4,        # Controls randomness/creativity of the output (moderate value)
        top_p=0.85,             # Nucleus sampling: considers tokens with cumulative probability up to 0.85
        repeat_penalty=1.0,     # No additional penalty for repeating phrases
        top_k=60,               # Considers only the top 60 most likely tokens at each step
        stop=["INST"],          # Halts output generation if the string "INST" is encountered
        echo=False              # Does not include the prompt itself in the output
    )
    # Extract and return the generated text response from the LLM output
    return response["choices"][0]["text"]

### Combination - 3 : Higher randomness/creativity for more variety, slight penalty to discourage repeated phrases in the output

In [13]:
# Define a function to generate an LLM response using a third combination of parameters for sepsis management queries
def generate_llama_response_combo3(user_prompt):
    # Define a system message to instruct the LLM to compare current and older ICU sepsis protocols, focusing on improvements and rationale
    system_message = """
    [INST]<<SYS>> Explain how current ICU sepsis protocols differ from older practices, emphasizing the improvements and rationale behind them.<</SYS>>[/INST]
    """
    # Combine the user prompt and system message into a single prompt for the LLM
    prompt = f"{user_prompt}\n{system_message}"
    # Call the LLM with the prompt and a specific set of generation parameters
    response = llm(
        prompt=prompt,          # The full prompt including user input and instructions
        max_tokens=700,         # Set the maximum length for the generated response (in tokens)
        temperature=0.6,        # Controls the randomness/creativity of the output (higher value allows more variety)
        top_p=0.95,             # Nucleus sampling: considers tokens with cumulative probability up to 0.95
        repeat_penalty=1.1,     # Slight penalty to discourage repeated phrases in the output
        top_k=50,               # Considers only the top 50 most likely tokens at each step
        stop=["INST"],          # Halts output generation if the string "INST" is encountered
        echo=False              # Does not include the prompt itself in the output
    )
    # Extract and return the generated text response from the LLM output
    return response["choices"][0]["text"]
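
The functions in this notebook append the `[INST]<<SYS>> ... <</SYS>>[/INST]` block after the user question. If the loaded model is a Llama 2 chat model (an assumption; the model file is not shown in this section), its published chat template instead wraps the system text in `<<SYS>>` tags inside a single `[INST] ... [/INST]` pair, with the user question following the system block. The helper below is a minimal sketch of that layout. `build_llama2_prompt` is a hypothetical name, and switching to it would change the outputs shown in this notebook.

```python
def build_llama2_prompt(system_message: str, user_prompt: str) -> str:
    """Assemble a single-turn prompt in the Llama 2 chat layout (assumed model family)."""
    # Strip stray whitespace left over from triple-quoted strings
    system_message = system_message.strip()
    user_prompt = user_prompt.strip()
    # The system block sits inside the [INST] pair, ahead of the user question
    return (
        f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
        f"{user_prompt} [/INST]"
    )


example = build_llama2_prompt(
    "Explain how current ICU sepsis protocols differ from older practices.",
    "What is the protocol for managing sepsis in a critical care unit?",
)
print(example)
```

The beginning-of-sequence token is usually added by the tokenizer, so it is left out of the string here.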

### Combination 4 – Very low randomness (temperature 0.1) with a strong repeat penalty

In [14]:
# Define a function to generate an LLM response using a fourth combination of parameters for sepsis management queries
def generate_llama_response_combo4(user_prompt):
    # Define a system message instructing the LLM to format its response as a checklist summarizing key ICU interventions for sepsis management
    system_message = """
    [INST]<<SYS>> Format your response as a checklist summarizing ICU interventions for managing sepsis, from diagnosis to stabilization.<</SYS>>[/INST]
    """
    # Combine the user prompt and system message into a single prompt for the LLM
    prompt = f"{user_prompt}\n{system_message}"
    # Call the LLM with the prompt and a specific set of generation parameters
    response = llm(
        prompt=prompt,          # The full prompt, including user input and checklist formatting instruction
        max_tokens=400,         # Sets the maximum length for the generated response (in tokens)
        temperature=0.1,        # Very low randomness for highly focused and factual output
        top_p=0.7,              # Nucleus sampling: considers tokens with cumulative probability up to 0.7 (less diversity)
        repeat_penalty=1.3,     # Strong penalty to discourage repeated phrases or words in the output
        top_k=30,               # Considers only the top 30 most likely tokens at each step
        stop=["INST"],          # Halts output generation if the string "INST" is encountered
        echo=False              # Does not include the prompt itself in the output
    )
    # Extract and return the generated text response from the LLM output
    return response["choices"][0]["text"]
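
To make the `top_k` and `top_p` comments above concrete, here is a toy sketch of how the two filters narrow the candidate set before a token is sampled. It is a simplification under stated assumptions: it works on a hand-written probability table rather than real model logits, it applies top-k first and top-p second, and `top_k_top_p_filter` and the token names are hypothetical.

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Return the renormalised distribution left after top-k, then top-p filtering."""
    # Rank tokens from most to least likely and keep only the top_k best
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Renormalise so the surviving probabilities sum to 1
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    # Keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalise once more over the final candidate set
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


toy = {"fluids": 0.5, "antibiotics": 0.3, "vasopressors": 0.15, "steroids": 0.05}
# Settings in the spirit of Combination 4: small top_k and top_p=0.7
print(top_k_top_p_filter(toy, top_k=3, top_p=0.7))
# Settings in the spirit of Combination 5: large top_k and top_p=1.0
print(top_k_top_p_filter(toy, top_k=80, top_p=1.0))
```

With the tighter settings only the two most likely toy tokens survive, while the looser settings keep all four.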

### Combination 5 – High randomness (temperature 0.8) with no nucleus cutoff (top_p 1.0)

In [15]:
# Define a function to generate an LLM response using a fifth combination of parameters for sepsis management queries
def generate_llama_response_combo5(user_prompt):
    # Define a system message instructing the LLM to explain ICU sepsis management as if teaching a junior medical resident, using clear and conversational language
    system_message = """
    [INST]<<SYS>> Imagine you're explaining ICU sepsis management to a junior medical resident. Use clear, educational, and conversational language.<</SYS>>[/INST]
    """
    # Combine the user prompt and system message into a single prompt for the LLM
    prompt = f"{user_prompt}\n{system_message}"
    # Call the LLM function with the prompt and a specific set of generation parameters
    response = llm(
        prompt=prompt,          # The complete prompt, including user input and instructional context
        max_tokens=650,         # Set the maximum length for the generated response (in tokens)
        temperature=0.8,        # Higher randomness for a more conversational and educational tone
        top_p=1.0,              # Nucleus threshold of 1.0 keeps every candidate that survives top_k (no nucleus cutoff)
        repeat_penalty=1.0,     # No additional penalty for repeated phrases
        top_k=80,               # Considers the top 80 most likely tokens at each step
        stop=["INST"],          # Stops generation if the string "INST" is encountered
        echo=False              # Does not include the prompt itself in the generated output
    )
    # Extract and return the generated text response from the LLM output
    return response["choices"][0]["text"]
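
The `temperature` values used across the five combinations (0.1 to 0.8) rescale the model's logits before they are turned into probabilities. The sketch below illustrates the standard temperature-scaled softmax on three made-up logits; `softmax_with_temperature` is a hypothetical helper for illustration, not part of the notebook or of any library.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability before exponentiating
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]
for t in (0.1, 0.6, 0.8):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate almost all probability on the top token, which is why Combination 4 behaves nearly deterministically, while higher temperatures leave more mass on the alternatives.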

### Wrapper function to run all the combinations

In [16]:
def run_all_llama_response_combos(user_prompt):
    results = {
        "Combo 1 – Clinical Guideline Summary": generate_llama_response_combo1(user_prompt),
        "Combo 2 – Step-by-Step Protocol": generate_llama_response_combo2(user_prompt),
        "Combo 3 – Modern vs. Traditional": generate_llama_response_combo3(user_prompt),
        "Combo 4 – Protocol Checklist": generate_llama_response_combo4(user_prompt),
        "Combo 5 – Teaching Style Explanation": generate_llama_response_combo5(user_prompt),
    }

    for title, response in results.items():
        print(f"\n=== {title} ===\n{response}\n{'=' * 60}")
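
The five generator functions differ only in their system message and sampling values, so the same behaviour can be expressed with one function and a table of presets. The sketch below assumes `llm` is the callable used throughout this notebook and is passed in explicitly; `SAMPLING_PRESETS` and `generate_with_preset` are hypothetical names, and the numbers are copied from the combinations defined in this notebook.

```python
# Sampling values copied from the five combinations in this notebook
SAMPLING_PRESETS = {
    "combo1": dict(max_tokens=600, temperature=0.2, top_p=0.9, repeat_penalty=1.1, top_k=40),
    "combo2": dict(max_tokens=768, temperature=0.4, top_p=0.85, repeat_penalty=1.0, top_k=60),
    "combo3": dict(max_tokens=700, temperature=0.6, top_p=0.95, repeat_penalty=1.1, top_k=50),
    "combo4": dict(max_tokens=400, temperature=0.1, top_p=0.7, repeat_penalty=1.3, top_k=30),
    "combo5": dict(max_tokens=650, temperature=0.8, top_p=1.0, repeat_penalty=1.0, top_k=80),
}


def generate_with_preset(llm, user_prompt, system_message, preset_name):
    """Run one generation using a named set of sampling parameters."""
    params = SAMPLING_PRESETS[preset_name]
    # Same prompt layout as the notebook: question first, then the system block
    prompt = f"{user_prompt}\n{system_message}"
    response = llm(prompt=prompt, stop=["INST"], echo=False, **params)
    return response["choices"][0]["text"]
```

Keeping the system message as an argument also means a new query topic needs only new strings, not five new function definitions.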

### Query 1: What is the protocol for managing sepsis in a critical care unit?

In [17]:
# Define the question about sepsis management protocols in a critical care unit
question = "What is the protocol for managing sepsis in a critical care unit?"

# Call the function to run all predefined Llama model response parameter combinations with the given question
run_all_llama_response_combos(question)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit



=== Combo 1 – Clinical Guideline Summary ===
 Managing sepsis in a critical care unit requires prompt and effective interventions to address the complex physiological derangements that occur in this life-threatening condition. The Surviving Sepsis Campaign (SSC) and the Society of Critical Care Medicine (SCCM) have developed evidence-based guidelines for sepsis management in the intensive care unit (ICU). Here are some key updates from the most recent clinical guidelines:
1. Early recognition and activation of sepsis protocols: The SSC/SCCM guidelines emphasize the importance of early recognition and activation of sepsis protocols, including prompt administration of antibiotics and fluid resuscitation.
2. Vasopressor therapy: The guidelines recommend using vasopressors to maintain mean arterial pressure (MAP) ≥65 mmHg in septic patients with hypotension (defined as MAP <65 mmHg). However, the choice of vasopressor should be based on the patient's hemodynamic status and other factors.


### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

### Combination 1 – Clinical Guideline Summary (Deterministic)

In [18]:
# Define a function to generate an LLM response using the first combination of parameters for sepsis management queries
def generate_llama_response_combo1(user_prompt):
    # Create a system message instructing the LLM to summarize current clinical guidelines for ICU sepsis management, focusing on evidence-based practices
    system_message = """
    [INST]<<SYS>> Summarize the most up-to-date clinical guidelines for sepsis management in the ICU, with a focus on evidence-based practices.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the LLM (language model) with the combined prompt and a specific set of parameters for text generation
    response = llm(
        prompt=prompt,          # The full prompt to be sent to the language model
        max_tokens=600,         # Set the maximum number of tokens for the generated response
        temperature=0.2,        # Low temperature for a more deterministic, less random response
        top_p=0.9,              # Nucleus sampling parameter: considers tokens with cumulative probability up to 0.9
        repeat_penalty=1.1,     # Slight penalty to discourage repeated text in the output
        top_k=40,               # Considers only the top 40 most likely tokens for each step
        stop=["INST"],          # Stops generating if the string "INST" appears in the output
        echo=False              # Exclude the input prompt from the response output
    )
    # Return the generated text from the response dictionary
    return response["choices"][0]["text"]
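
Because each function caps the answer with `max_tokens`, a response can end mid-sentence. Assuming `llm` is a llama-cpp-python `Llama` object (the `Llama.generate` log lines suggest this), the returned dictionary follows an OpenAI-style completion layout in which each choice carries a `finish_reason` of `"stop"` or `"length"`. The hypothetical helper below sketches how that field could be used to flag truncated answers.

```python
def extract_text_and_status(response):
    """Return the generated text and whether it was cut off by max_tokens."""
    choice = response["choices"][0]
    text = choice["text"].strip()
    # "length" means the token limit was hit; "stop" means a stop string or end of sequence
    truncated = choice.get("finish_reason") == "length"
    return text, truncated


# Hand-built dictionary standing in for a real model response
sample = {"choices": [{"text": " The SCCM recommends a goal-direc", "finish_reason": "length"}]}
text, truncated = extract_text_and_status(sample)
print(text)
print("truncated:", truncated)
```

If many answers come back truncated, raising `max_tokens` for that combination is the first thing to try.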

### Combination 2 – Step-by-Step Protocol (Process-Oriented)

In [19]:
# Define a function to generate a response from the LLM using the second parameter combination for sepsis management queries
def generate_llama_response_combo2(user_prompt):
    # Create a system message instructing the LLM to provide a detailed, step-by-step treatment protocol for sepsis in an ICU
    system_message = """
    [INST]<<SYS>> Provide a detailed step-by-step treatment protocol for managing sepsis in an ICU.<</SYS>>[/INST]
    """
    # Combine the user prompt and the system message into a single prompt string for the LLM
    prompt = f"{user_prompt}\n{system_message}"
    # Call the LLM with the constructed prompt and specific generation parameters
    response = llm(
        prompt=prompt,          # The complete prompt to be sent to the language model
        max_tokens=768,         # Set the maximum number of tokens to generate in the response
        temperature=0.4,        # Set the randomness/creativity of the output (moderate value)
        top_p=0.85,             # Nucleus sampling: consider tokens with cumulative probability up to 0.85
        repeat_penalty=1.0,     # A value of 1.0 applies no penalty for repeated phrases
        top_k=60,               # Consider the top 60 most likely tokens at each generation step
        stop=["INST"],          # Stop generating output if the string "INST" is encountered
        echo=False              # Do not include the input prompt in the output
    )
    # Return the generated text from the model's response
    return response["choices"][0]["text"]

### Combination 3 – Modern vs. Traditional (Comparative, Narrative)

In [20]:
# Define a function to generate a response from the LLM using the third set of parameters for sepsis management queries
def generate_llama_response_combo3(user_prompt):
    # Create a system message instructing the LLM to explain differences between current and older ICU sepsis protocols,
    # with an emphasis on improvements and the rationale behind changes
    system_message = """
    [INST]<<SYS>> Explain how current ICU sepsis protocols differ from older practices, emphasizing the improvements and rationale behind them.<</SYS>>[/INST]
    """
    # Combine the user prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model (llm) with the combined prompt and a specific set of generation parameters
    response = llm(
        prompt=prompt,          # The full prompt containing user input and system instruction
        max_tokens=700,         # Maximum number of tokens to generate in the response
        temperature=0.6,        # Controls randomness/creativity of the output (moderate value for balanced results)
        top_p=0.95,             # Nucleus sampling: considers tokens with cumulative probability up to 0.95 (more diversity)
        repeat_penalty=1.1,     # Slight penalty to discourage repeated phrases in the output
        top_k=50,               # Considers only the top 50 most likely tokens for each step
        stop=["INST"],          # Stops generation if the string "INST" appears in the output
        echo=False              # Excludes the prompt itself from the output
    )
    # Return the generated text from the response dictionary
    return response["choices"][0]["text"]

### Combination 4 – Protocol Checklist (Concise & Actionable)

In [21]:
# Define a function to generate a response from the LLM using the fourth set of parameters for sepsis management queries
def generate_llama_response_combo4(user_prompt):
    # Create a system message instructing the LLM to format its response as a checklist summarizing ICU interventions for sepsis, from diagnosis to stabilization
    system_message = """
    [INST]<<SYS>> Format your response as a checklist summarizing ICU interventions for managing sepsis, from diagnosis to stabilization.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the LLM with the constructed prompt and specific generation parameters
    response = llm(
        prompt=prompt,          # The complete prompt to be sent to the language model
        max_tokens=400,         # Set the maximum number of tokens for the generated response
        temperature=0.1,        # Very low temperature for a highly focused and factual output
        top_p=0.7,              # Nucleus sampling: consider tokens with cumulative probability up to 0.7 (less diversity)
        repeat_penalty=1.3,     # Strong penalty to discourage repeated phrases or words in the output
        top_k=30,               # Consider only the top 30 most likely tokens at each generation step
        stop=["INST"],          # Stop generating output if the string "INST" is encountered
        echo=False              # Do not include the input prompt in the output
    )
    # Return the generated text from the model's response
    return response["choices"][0]["text"]
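
The `repeat_penalty` values in these combinations range from 1.0 (no effect) to 1.3 (strong). The sketch below shows one common form of the rule, in which the logit of every recently generated token is divided by the penalty when positive and multiplied by it when negative. It is an illustration on made-up numbers, assuming the backend follows this rule; `apply_repeat_penalty` is a hypothetical helper.

```python
def apply_repeat_penalty(logits, recent_tokens, penalty):
    """Lower the logits of tokens that already appeared in the recent output."""
    adjusted = dict(logits)
    for token in set(recent_tokens):
        if token in adjusted:
            value = adjusted[token]
            # Positive logits shrink, negative logits become more negative
            adjusted[token] = value / penalty if value > 0 else value * penalty
    return adjusted


toy_logits = {"sepsis": 2.6, "the": -1.0, "fluids": 1.0}
print(apply_repeat_penalty(toy_logits, ["sepsis", "the"], penalty=1.3))
print(apply_repeat_penalty(toy_logits, ["sepsis", "the"], penalty=1.0))
```

With a penalty of 1.0 the logits are unchanged, which matches the "no penalty" comments on Combinations 2 and 5.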

### Combination 5 – Teaching Style (Conversational & Educational)

In [22]:
# Define a function to generate a response from the LLM using the fifth set of parameters for sepsis management queries
def generate_llama_response_combo5(user_prompt):
    # Create a system message instructing the LLM to explain ICU sepsis management as if teaching a junior medical resident, using clear and conversational language
    system_message = """
    [INST]<<SYS>> Imagine you're explaining ICU sepsis management to a junior medical resident. Use clear, educational, and conversational language.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the LLM with the constructed prompt and specific generation parameters
    response = llm(
        prompt=prompt,          # The complete prompt to be sent to the language model
        max_tokens=650,         # Set the maximum number of tokens for the generated response
        temperature=0.8,        # Higher temperature for more creativity and conversational style
        top_p=1.0,              # Nucleus threshold of 1.0 keeps every candidate that survives top_k (no nucleus cutoff)
        repeat_penalty=1.0,     # A value of 1.0 applies no penalty for repeated phrases
        top_k=80,               # Consider the top 80 most likely tokens at each generation step
        stop=["INST"],          # Stop generating output if the string "INST" is encountered
        echo=False              # Do not include the input prompt in the output
    )
    # Return the generated text from the model's response
    return response["choices"][0]["text"]

### Wrapper function to run all the combinations

In [23]:
# Define a function to run all Llama response combinations and print their outputs
def run_all_llama_response_combos(user_prompt):
    # Call each of the predefined response combo functions with the user prompt,
    # and store their outputs in a dictionary with descriptive titles as keys
    results = {
        "Combo 1 – Clinical Guideline Summary": generate_llama_response_combo1(user_prompt),
        "Combo 2 – Step-by-Step Protocol": generate_llama_response_combo2(user_prompt),
        "Combo 3 – Modern vs. Traditional": generate_llama_response_combo3(user_prompt),
        "Combo 4 – Protocol Checklist": generate_llama_response_combo4(user_prompt),
        "Combo 5 – Teaching Style Explanation": generate_llama_response_combo5(user_prompt),
    }

    # Iterate over each key-value pair in the results dictionary
    for title, response in results.items():
        # Print the section header, the response, and a separator line for readability
        print(f"\n=== {title} ===\n{response}\n{'=' * 60}")
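
The wrapper above builds the whole `results` dictionary before printing, so nothing appears until all five generations have finished, and an exception in any one of them discards the rest. A possible alternative, sketched below, prints each answer as soon as it is ready and records failures instead of stopping. `run_combos_incrementally` and its `combos` argument (a list of title and function pairs) are hypothetical names.

```python
def run_combos_incrementally(combos, user_prompt):
    """Run each (title, generator) pair in turn, printing results as they arrive."""
    results = {}
    for title, generate in combos:
        try:
            results[title] = generate(user_prompt)
        except Exception as exc:
            # Record the failure and keep going with the remaining combinations
            results[title] = f"[generation failed: {exc}]"
        print(f"\n=== {title} ===\n{results[title]}\n{'=' * 60}")
    return results


# Stand-in generators so the sketch runs without a model loaded
demo_combos = [
    ("Demo A", lambda prompt: f"Echo: {prompt}"),
    ("Demo B", lambda prompt: prompt.upper()),
]
demo_results = run_combos_incrementally(demo_combos, "sepsis protocol")
```

In the notebook, `combos` would hold the five titles paired with `generate_llama_response_combo1` through `generate_llama_response_combo5`.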

### Run the query

In [24]:
# Assign the sepsis management question to a variable
question = "What is the protocol for managing sepsis in a critical care unit?"

# Call the function to run all Llama response combinations using the question as input
run_all_llama_response_combos(question)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit



=== Combo 1 – Clinical Guideline Summary ===
 Managing sepsis in a critical care unit requires prompt and effective interventions to address the complex needs of these patients. The Surviving Sepsis Campaign (SSC) and the Society of Critical Care Medicine (SCCM) have developed evidence-based guidelines for sepsis management in the ICU, which are regularly updated to reflect new research findings. Here is a summary of the most recent clinical guidelines for managing sepsis in the ICU:
1. Early recognition and activation of sepsis protocols: Rapid identification and activation of sepsis protocols are critical to ensure timely interventions and improve outcomes. The SSC recommends using a validated screening tool, such as the Quick Sequential Organ Failure Assessment (qSOFA), to identify patients at risk of developing sepsis.
2. Resuscitation: Early recognition of hypoperfusion and shock requires prompt resuscitation with vasopressors, fluids, and oxygen. The SCCM recommends a goal-direc

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

### Combination 1 – Medical Summary (Causes + Treatments)

In [25]:
# Define a function to generate an LLM response using the first combination of parameters for hair loss queries
def generate_llama_response_combo1(user_prompt):
    # Create a system message instructing the LLM to provide a concise summary of causes and treatments for sudden patchy hair loss,
    # including commonly seen scalp conditions such as alopecia areata
    system_message = """
    [INST]<<SYS>> Provide a concise summary of causes and treatments for sudden patchy hair loss, including commonly seen scalp conditions like alopecia areata.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model (llm) with the combined prompt and a specific set of generation parameters
    response = llm(
        prompt=prompt,          # The full prompt to be sent to the language model
        max_tokens=600,         # Set the maximum number of tokens for the generated response
        temperature=0.2,        # Low temperature for a more deterministic, less random response
        top_p=0.9,              # Nucleus sampling parameter: considers tokens with cumulative probability up to 0.9
        repeat_penalty=1.1,     # Slight penalty to discourage repeated text in the output
        top_k=40,               # Considers only the top 40 most likely tokens for each step
        stop=["INST"],          # Stops generating if the string "INST" appears in the output
        echo=False              # Exclude the input prompt from the response output
    )
    # Return the generated text from the response dictionary
    return response["choices"][0]["text"]

### Combination 2 – Step-by-Step Diagnosis and Care Plan

In [26]:
# Define a function to generate an LLM response using the second combination of parameters for hair loss queries
def generate_llama_response_combo2(user_prompt):
    # Create a system message instructing the LLM to outline a step-by-step diagnostic and treatment plan
    # for a patient presenting with sudden localized hair loss on the scalp
    system_message = """
    [INST]<<SYS>> Outline a step-by-step diagnostic and treatment plan for a patient presenting with sudden localized hair loss on the scalp.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model (llm) with the combined prompt and a specific set of generation parameters
    response = llm(
        prompt=prompt,          # The full prompt to be sent to the language model
        max_tokens=768,         # Set the maximum number of tokens for the generated response
        temperature=0.4,        # Moderate temperature for some creativity while mostly factual output
        top_p=0.85,             # Nucleus sampling: considers tokens with cumulative probability up to 0.85
        repeat_penalty=1.0,     # A value of 1.0 applies no penalty for repeated phrases
        top_k=60,               # Considers only the top 60 most likely tokens for each step
        stop=["INST"],          # Stops generating if the string "INST" appears in the output
        echo=False              # Exclude the input prompt from the response output
    )
    # Return the generated text from the response dictionary
    return response["choices"][0]["text"]

### Combination 3 – Causes: Then vs. Now (Comparative)

In [27]:
# Define a function to generate an LLM response using the third combination of parameters for hair loss queries
def generate_llama_response_combo3(user_prompt):
    # Create a system message instructing the LLM to compare historical beliefs and modern understanding
    # of the causes and treatments for patchy hair loss, including discussion of autoimmune factors
    system_message = """
    [INST]<<SYS>> Compare historical beliefs and modern understanding of the causes and treatments for patchy hair loss, including autoimmune factors.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model (llm) with the combined prompt and a specific set of generation parameters
    response = llm(
        prompt=prompt,          # The full prompt to be sent to the language model
        max_tokens=700,         # Set the maximum number of tokens for the generated response
        temperature=0.6,        # Moderate temperature for balanced creativity and factual output
        top_p=0.95,             # Nucleus sampling: considers tokens with cumulative probability up to 0.95
        repeat_penalty=1.1,     # Slight penalty to discourage repeated text in the output
        top_k=50,               # Considers only the top 50 most likely tokens for each step
        stop=["INST"],          # Stops generating if the string "INST" appears in the output
        echo=False              # Exclude the input prompt from the response output
    )
    # Return the generated text from the response dictionary
    return response["choices"][0]["text"]

### Combination 4 – Symptom/Treatment Checklist

In [28]:
# Define a function to generate an LLM response using the fourth combination of parameters for hair loss queries
def generate_llama_response_combo4(user_prompt):
    # Create a system message instructing the LLM to provide a checklist of possible causes for sudden patchy scalp hair loss
    # and include corresponding treatment options, both medical and lifestyle-based
    system_message = """
    [INST]<<SYS>> Provide a checklist of possible causes of sudden patchy scalp hair loss and corresponding treatment options, both medical and lifestyle-based.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model (llm) with the combined prompt and a specific set of generation parameters
    response = llm(
        prompt=prompt,          # The full prompt to be sent to the language model
        max_tokens=400,         # Set the maximum number of tokens for the generated response
        temperature=0.1,        # Very low temperature for factual, less creative output
        top_p=0.7,              # Nucleus sampling: considers tokens with cumulative probability up to 0.7 (less diversity)
        repeat_penalty=1.3,     # Strong penalty to discourage repeated text in the output
        top_k=30,               # Considers only the top 30 most likely tokens for each step
        stop=["INST"],          # Stops generating if the string "INST" appears in the output
        echo=False              # Exclude the input prompt from the response output
    )
    # Return the generated text from the response dictionary
    return response["choices"][0]["text"]

### Combination 5 – Patient-Friendly Explanation

In [29]:
# Define a function to generate an LLM response using the fifth combination of parameters for hair loss queries
def generate_llama_response_combo5(user_prompt):
    # Create a system message instructing the LLM to explain the causes and treatment options for patchy hair loss
    # to a patient, using simple and clear language, and including advice on when to see a dermatologist
    system_message = """
    [INST]<<SYS>> Explain the causes and treatment options for patchy hair loss to a patient in simple, clear language, including when to see a dermatologist.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model (llm) with the combined prompt and a specific set of generation parameters
    response = llm(
        prompt=prompt,          # The full prompt to be sent to the language model
        max_tokens=650,         # Set the maximum number of tokens for the generated response
        temperature=0.8,        # Higher temperature for more conversational and creative output
        top_p=1.0,              # Nucleus threshold of 1.0 keeps every candidate that survives top_k (no nucleus cutoff)
        repeat_penalty=1.0,     # A value of 1.0 applies no penalty for repeated phrases
        top_k=80,               # Considers only the top 80 most likely tokens for each step
        stop=["INST"],          # Stops generating if the string "INST" appears in the output
        echo=False              # Exclude the input prompt from the response output
    )
    # Return the generated text from the response dictionary
    return response["choices"][0]["text"]

### Wrapper Function: run_all_llama_response_combos_hairloss

In [30]:
# Define a function to run all Llama response combinations for hair loss queries and print their outputs
def run_all_llama_response_combos_hairloss(user_prompt):
    # Call each of the predefined response combo functions with the user prompt,
    # and store their outputs in a dictionary with descriptive titles as keys
    results = {
        "Combo 1 – Medical Summary": generate_llama_response_combo1(user_prompt),
        "Combo 2 – Step-by-Step Diagnosis/Treatment": generate_llama_response_combo2(user_prompt),
        "Combo 3 – Historical vs. Modern Causes": generate_llama_response_combo3(user_prompt),
        "Combo 4 – Checklist Format": generate_llama_response_combo4(user_prompt),
        "Combo 5 – Patient-Friendly Explanation": generate_llama_response_combo5(user_prompt),
    }

    # Iterate over each key-value pair in the results dictionary
    for title, response in results.items():
        # Print the section header, the response, and a separator line for readability
        print(f"\n=== {title} ===\n{response}\n{'=' * 60}")

### Run the code

In [31]:
# Assign the hair loss question to a variable
question = "What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?"

# Call the function to run all Llama response combinations for hair loss using the question as input
run_all_llama_response_combos_hairloss(question)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit



=== Combo 1 – Medical Summary ===
 Sudden patchy hair loss, also known as localized bald spots or alopecia areata, can be caused by a variety of factors, including:

1. Autoimmune disorders: In these conditions, the immune system mistakenly attacks healthy hair follicles, leading to hair loss. Alopecia areata is a common autoimmune disorder that causes patchy hair loss on the scalp.
2. Hormonal imbalances: Hormonal changes during pregnancy, menopause, or thyroid disorders can cause sudden patchy hair loss.
3. Infections: Fungal infections of the scalp, such as ringworm, can cause patchy hair loss.
4. Traction alopecia: This condition occurs when the hair is pulled too tightly, leading to hair breakage and balding.
5. Nutritional deficiencies: Deficiencies in iron, zinc, or other essential nutrients can cause patchy hair loss.
6. Stress: Prolonged stress can lead to telogen effluvium, a condition where the body enters a resting phase and stops growing new hair. This can result in sudde

### Query 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

### Combination 1 – Medical Summary (Generalized + Structured)

In [32]:
# Define a function to generate an LLM response using the first combination of parameters for TBI treatment queries
def generate_llama_response_combo1(user_prompt):
    # Create a system message instructing the LLM to provide a concise overview of treatments for traumatic brain injury (TBI),
    # including both short-term and long-term interventions
    system_message = """
    [INST]<<SYS>> Provide a concise overview of treatments for traumatic brain injury (TBI), covering both short-term and long-term interventions.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model (llm) with the combined prompt and a specific set of generation parameters
    response = llm(
        prompt=prompt,          # The complete prompt to be sent to the language model
        max_tokens=600,         # Set the maximum number of tokens for the generated response
        temperature=0.2,        # Low temperature for more focused and deterministic output
        top_p=0.9,              # Nucleus sampling: considers tokens with cumulative probability up to 0.9
        repeat_penalty=1.1,     # Slight penalty to discourage repeated text in the output
        top_k=40,               # Considers only the top 40 most likely tokens at each step
        stop=["INST"],          # Stops generating if the string "INST" appears in the output
        echo=False              # Exclude the input prompt from the response output
    )
    # Return the generated text from the response dictionary
    return response["choices"][0]["text"]

### Combination 2 – Step-by-Step Treatment Protocol

In [33]:
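# Define a function to generate an LLM response using the second combination of parameters for TBI treatment queries
def generate_llama_response_combo2(user_prompt):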
    # Define the system message that instructs the LLM to provide a step-by-step management plan for brain tissue injury
    system_message = """
    [INST]<<SYS>> Describe the step-by-step management plan for a patient with brain tissue injury, from acute care to rehabilitation and long-term support.<</SYS>>[/INST]
    """
    # Combine the user prompt and the system message to create the full prompt for the LLM
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model with the constructed prompt and specified generation parameters
    response = llm(
        prompt=prompt,           # The prompt to be given to the LLM
        max_tokens=768,          # The maximum number of tokens to generate in the response
        temperature=0.4,         # Controls randomness: lower values make output more focused and deterministic
        top_p=0.85,              # Nucleus sampling: considers tokens with cumulative probability up to 0.85
        repeat_penalty=1.0,      # Penalty for repeated phrases; 1.0 means no extra penalty
        top_k=60,                # Considers only the top 60 most likely tokens at each step
        stop=["INST"],           # Stop generating if the string "INST" is encountered
        echo=False               # Do not include the original prompt in the output
    )
    # Extract and return the generated text from the model's response
    return response["choices"][0]["text"]

### Combination 3 – Advances in Brain Injury Treatment (Comparative)

In [34]:
def generate_llama_response_combo3(user_prompt):
    # Define the system message that tells the LLM to compare traditional and modern approaches to treating traumatic brain injuries,
    # with an emphasis on innovations in neurorehabilitation and neurosurgery
    system_message = """
    [INST]<<SYS>> Compare traditional and modern approaches to treating traumatic brain injuries, highlighting innovations in neurorehabilitation and neurosurgery.<</SYS>>[/INST]
    """
    # Combine the user's prompt with the system message to create the full prompt for the LLM
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model with the constructed prompt and specific generation parameters
    response = llm(
        prompt=prompt,           # The full prompt string to provide context and instructions to the LLM
        max_tokens=700,          # The maximum number of tokens allowed in the generated response
        temperature=0.6,         # A moderate temperature for balanced creativity and determinism
        top_p=0.95,              # Nucleus sampling: considers tokens with cumulative probability up to 0.95
        repeat_penalty=1.1,      # Slight penalty to discourage repeated text in the output
        top_k=50,                # Considers only the top 50 most likely tokens at each generation step
        stop=["INST"],           # Stops generation if the string "INST" is encountered
        echo=False               # Do not include the input prompt in the generated output
    )
    # Extract and return the text generated by the LLM from the response object
    return response["choices"][0]["text"]

### Combination 4 – Checklist Format (Structured and Actionable)

In [35]:
def generate_llama_response_combo4(user_prompt):
    # Define the system message instructing the LLM to provide a checklist of treatment steps and supportive therapies
    # for patients with brain injuries, organized by severity and recovery stage
    system_message = """
    [INST]<<SYS>> Provide a checklist of treatment steps and supportive therapies for patients with brain injuries, categorized by severity and stage of recovery.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model with the constructed prompt and specified parameters
    response = llm(
        prompt=prompt,           # The full prompt to send to the LLM
        max_tokens=400,          # The maximum number of tokens to generate in the response
        temperature=0.1,         # Low temperature for highly focused and deterministic output
        top_p=0.7,               # Nucleus sampling: considers tokens with cumulative probability up to 0.7
        repeat_penalty=1.3,      # Strong penalty to reduce repeated phrases in the output
        top_k=30,                # Considers only the top 30 most likely tokens at each step
        stop=["INST"],           # Stops generating if the string "INST" appears in the output
        echo=False               # Do not include the original prompt in the generated output
    )
    # Extract and return the generated text from the model's response
    return response["choices"][0]["text"]

### Combination 5 – Patient-Centered Education Style

In [36]:
def generate_llama_response_combo5(user_prompt):
    # Define the system message instructing the LLM to explain brain injury treatments in simple, clear language
    # for a non-medical audience, and include information about recovery and available support options
    system_message = """
    [INST]<<SYS>> Explain brain injury treatments in clear, simple terms for a non-medical audience, including what recovery might look like and support options.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model with the constructed prompt and specified generation parameters
    response = llm(
        prompt=prompt,           # The complete prompt string for the LLM
        max_tokens=650,          # Maximum number of tokens to generate for the response
        temperature=0.8,         # Higher temperature for more conversational and creative output
        top_p=1.0,               # Nucleus sampling: considers tokens with cumulative probability up to 1.0 (maximum diversity)
        repeat_penalty=1.0,      # 1.0 applies no repetition penalty
        top_k=80,                # Considers only the top 80 most likely tokens at each generation step
        stop=["INST"],           # Stops generating if the string "INST" appears in the output
        echo=False               # Do not include the input prompt in the model's output
    )
    # Extract and return the generated text from the model's response
    return response["choices"][0]["text"]

###  Wrapper Function: run_all_llama_response_combos_braininjury

In [37]:
def run_all_llama_response_combos_braininjury(user_prompt):
    # Call each of the predefined brain injury response combo functions with the user prompt,
    # and store their outputs in a dictionary with descriptive titles as keys
    results = {
        "Combo 1 – Medical Summary": generate_llama_response_combo1(user_prompt),       # Concise overview of TBI treatments
        "Combo 2 – Step-by-Step Protocol": generate_llama_response_combo2(user_prompt), # Step-by-step management plan
        "Combo 3 – Advances in Treatment": generate_llama_response_combo3(user_prompt), # Traditional vs. modern approaches and innovations
        "Combo 4 – Checklist Format": generate_llama_response_combo4(user_prompt),      # Checklist of treatments by severity and stage
        "Combo 5 – Patient Education Style": generate_llama_response_combo5(user_prompt),# Simple explanation for patients/non-medical audience
    }

    # Iterate over each key-value pair in the results dictionary
    for title, response in results.items():
        # Print the section header, the generated response, and a separator line for clarity
        print(f"\n=== {title} ===\n{response}\n{'=' * 60}")
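
Each query section in this notebook defines `generate_llama_response_combo1` through `generate_llama_response_combo5` again, so the later definitions overwrite the earlier ones. One possible alternative, sketched below, keeps each combination's system message and sampling parameters as data and runs them through a single function. `ComboConfig`, `run_combos`, and `fake_llm` are hypothetical names introduced for this sketch. `fake_llm` only stands in for the notebook's `llm` object so the example runs without a model. Two of the five combinations are shown, with values copied from the cells above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ComboConfig:
    """Hypothetical container for one prompt/parameter combination."""
    title: str
    system_message: str
    max_tokens: int
    temperature: float
    top_p: float
    top_k: int
    repeat_penalty: float

BRAIN_INJURY_COMBOS = [
    ComboConfig(
        "Combo 2 – Step-by-Step Protocol",
        "Describe the step-by-step management plan for a patient with brain tissue injury, "
        "from acute care to rehabilitation and long-term support.",
        max_tokens=768, temperature=0.4, top_p=0.85, top_k=60, repeat_penalty=1.0,
    ),
    ComboConfig(
        "Combo 4 – Checklist Format",
        "Provide a checklist of treatment steps and supportive therapies for patients with "
        "brain injuries, categorized by severity and stage of recovery.",
        max_tokens=400, temperature=0.1, top_p=0.7, top_k=30, repeat_penalty=1.3,
    ),
]

def run_combos(llm, user_prompt, combos):
    """Call `llm` once per configuration and collect the generated text by title."""
    results = {}
    for combo in combos:
        # Same overall layout as the notebook's functions: question first, then the system block.
        prompt = f"{user_prompt}\n[INST]<<SYS>> {combo.system_message}<</SYS>>[/INST]"
        response = llm(
            prompt=prompt,
            max_tokens=combo.max_tokens,
            temperature=combo.temperature,
            top_p=combo.top_p,
            top_k=combo.top_k,
            repeat_penalty=combo.repeat_penalty,
            stop=["INST"],
            echo=False,
        )
        results[combo.title] = response["choices"][0]["text"]
    return results

def fake_llm(prompt, **params):
    """Stand-in for the real model so the sketch is runnable without llama_cpp."""
    return {"choices": [{"text": f"(stub reply, max_tokens={params['max_tokens']})"}]}

results = run_combos(fake_llm, "What treatments are recommended?", BRAIN_INJURY_COMBOS)
for title, text in results.items():
    print(f"\n=== {title} ===\n{text}\n{'=' * 60}")
```

With this layout, a new query only needs a new list of configurations, and the brain injury and leg fracture sets can coexist in the same session.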

### Run the code

In [39]:
# Assign the brain injury treatment question to a variable
question = "What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?"

# Call the function to run all Llama response combinations for brain injury using the question as input
run_all_llama_response_combos_braininjury(question)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit



=== Combo 1 – Medical Summary ===
 Traumatic brain injury (TBI) is a complex and multifaceted condition that requires a comprehensive treatment approach encompassing various interventions. The following is an overview of short-term and long-term treatments for TBI, including their goals, methods, and expected outcomes:
Short-Term Treatments (0-3 months):
1. Acute Management: Immediate management of TBI involves stabilizing the patient's vital signs, controlling bleeding, and addressing any life-threatening complications. This phase focuses on preventing further damage and ensuring the patient's safety.
2. Monitoring and Observation: Patients with mild to moderate TBI are closely monitored for changes in their condition, including neurological symptoms, cognitive function, and behavioral alterations. Regular assessments help identify any deterioration or improvement.
3. Medications: Various medications may be prescribed to manage symptoms such as pain, agitation, anxiety, and depressio

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

### Combination 1 – Medical Summary (Emergency + Recovery Focus)

In [40]:
def generate_llama_response_combo1(user_prompt):
    # Define the system message instructing the LLM to provide a concise medical summary
    # that covers emergency response, treatment steps, and recovery considerations for a leg fracture during hiking
    system_message = """
    [INST]<<SYS>> Provide a concise medical summary outlining emergency response, treatment steps, and recovery considerations for a leg fracture sustained during hiking.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model with the prompt and the specified generation parameters
    response = llm(
        prompt=prompt,           # The full prompt to send to the LLM
        max_tokens=600,          # Maximum number of tokens to generate for the response
        temperature=0.2,         # Low temperature for focused, deterministic output
        top_p=0.9,               # Nucleus sampling: considers tokens with cumulative probability up to 0.9
        repeat_penalty=1.1,      # Slight penalty to discourage repeated phrases
        top_k=40,                # Considers only the top 40 most likely tokens at each step
        stop=["INST"],           # Stops generating if the string "INST" appears in the output
        echo=False               # Do not include the prompt in the output
    )
    # Extract and return the generated text from the model's response
    return response["choices"][0]["text"]
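
The functions in this notebook place the user question before the `[INST]` block and leave the instruction body empty apart from the system message. The Llama 2 chat models were trained on a single-turn layout in which the system message sits inside `<<SYS>>` tags and the user message follows it within the same `[INST] ... [/INST]` pair. The helper below is a minimal sketch of that layout. Its name, `build_llama2_chat_prompt`, is hypothetical, the beginning-of-sequence token is left to the runtime, and whether this layout changes the answers in this notebook has not been tested here.

```python
def build_llama2_chat_prompt(system_message: str, user_message: str) -> str:
    """Assemble a single-turn prompt in the Llama 2 chat layout (hypothetical helper)."""
    return (
        "[INST] <<SYS>>\n"
        f"{system_message.strip()}\n"
        "<</SYS>>\n\n"
        f"{user_message.strip()} [/INST]"
    )

example_prompt = build_llama2_chat_prompt(
    "Provide a concise medical summary outlining emergency response, treatment steps, "
    "and recovery considerations for a leg fracture sustained during hiking.",
    "What are the necessary precautions and treatment steps for a person who has "
    "fractured their leg during a hiking trip?",
)
print(example_prompt)
```

The returned string can be passed as the `prompt` argument in place of `f"{user_prompt}\n{system_message}"`.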

### Combination 2 – Step-by-Step Protocol

In [41]:
def generate_llama_response_combo2(user_prompt):
    # Define the system message instructing the LLM to outline a step-by-step management process for a leg fracture,
    # starting from the injury on a hiking trail, through hospital care, to recovery
    system_message = """
    [INST]<<SYS>> Outline the step-by-step process to manage a leg fracture from the moment of injury on a hiking trail through hospital treatment and recovery.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model with the constructed prompt and specified generation parameters
    response = llm(
        prompt=prompt,           # The complete prompt to send to the LLM
        max_tokens=768,          # Maximum number of tokens allowed in the generated response
        temperature=0.4,         # Moderate temperature for a balance of determinism and creativity
        top_p=0.85,              # Nucleus sampling: considers tokens with cumulative probability up to 0.85
        repeat_penalty=1.0,      # Standard penalty for repeated text (no extra penalty)
        top_k=60,                # Considers only the top 60 most likely tokens at each step
        stop=["INST"],           # Stops generating if the string "INST" appears in the output
        echo=False               # Do not include the prompt itself in the output
    )
    # Extract and return the generated text from the model's response
    return response["choices"][0]["text"]

###  Combination 3 – Field vs. Clinical Treatment (Comparative)

In [42]:
def generate_llama_response_combo3(user_prompt):
    # Define the system message instructing the LLM to compare emergency field management of a fractured leg
    # with in-hospital treatment and rehabilitation, highlighting differences in priorities and tools used
    system_message = """
    [INST]<<SYS>> Compare emergency field management of a fractured leg with in-hospital treatment and rehabilitation, including differences in priorities and tools.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model with the constructed prompt and specified generation parameters
    response = llm(
        prompt=prompt,           # The complete prompt string for the LLM
        max_tokens=700,          # Maximum number of tokens allowed in the generated response
        temperature=0.6,         # Moderate temperature for balanced creativity and determinism
        top_p=0.95,              # Nucleus sampling: considers tokens with cumulative probability up to 0.95
        repeat_penalty=1.1,      # Slight penalty to discourage repeated phrases in the output
        top_k=50,                # Considers only the top 50 most likely tokens at each step
        stop=["INST"],           # Stops generating if the string "INST" appears in the output
        echo=False               # Do not include the prompt itself in the output
    )
    # Extract and return the generated text from the model's response
    return response["choices"][0]["text"]

### Combination 4 – Checklist Format (Quick Reference)

In [43]:
def generate_llama_response_combo4(user_prompt):
    # Define the system message instructing the LLM to provide a checklist of precautions,
    # immediate treatments, and recovery steps for a person with a leg fracture sustained while hiking
    system_message = """
    [INST]<<SYS>> Provide a checklist of precautions, immediate treatments, and recovery steps for a person with a leg fracture sustained while hiking.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model with the constructed prompt and specified generation parameters
    response = llm(
        prompt=prompt,           # The complete prompt to send to the LLM
        max_tokens=400,          # Maximum number of tokens to generate for the response
        temperature=0.1,         # Low temperature for focused and deterministic output
        top_p=0.7,               # Nucleus sampling: considers tokens with cumulative probability up to 0.7
        repeat_penalty=1.3,      # Strong penalty to reduce repeated phrases in the output
        top_k=30,                # Considers only the top 30 most likely tokens at each generation step
        stop=["INST"],           # Stops generating if the string "INST" appears in the output
        echo=False               # Do not include the prompt itself in the output
    )
    # Extract and return the generated text from the model's response
    return response["choices"][0]["text"]

### Combination 5 – Patient-Friendly Education Style

In [44]:
def generate_llama_response_combo5(user_prompt):
    # Define the system message instructing the LLM to explain, in simple, clear language,
    # what to do if someone breaks their leg while hiking, and what to expect during healing and recovery
    system_message = """
    [INST]<<SYS>> Explain in simple, clear language what someone should do if they break their leg while hiking, and what to expect during healing and recovery.<</SYS>>[/INST]
    """
    # Combine the user's prompt and the system message into a single prompt string
    prompt = f"{user_prompt}\n{system_message}"
    # Call the language model with the constructed prompt and specified generation parameters
    response = llm(
        prompt=prompt,           # The full prompt string for the LLM
        max_tokens=650,          # Maximum number of tokens to generate for the response
        temperature=0.8,         # Higher temperature for a more conversational and creative output
        top_p=1.0,               # Nucleus sampling: considers tokens with cumulative probability up to 1.0 (maximum diversity)
        repeat_penalty=1.0,      # Standard penalty for repeated text (no extra penalty)
        top_k=80,                # Considers only the top 80 most likely tokens at each generation step
        stop=["INST"],           # Stops generating if the string "INST" appears in the output
        echo=False               # Do not include the prompt itself in the output
    )
    # Extract and return the generated text from the model's response
    return response["choices"][0]["text"]

### Wrapper Function: run_all_llama_response_combos_legfracture

In [45]:
def run_all_llama_response_combos_legfracture(user_prompt):
    # Call each of the predefined leg fracture response combo functions with the user prompt,
    # and store their outputs in a dictionary with descriptive titles as keys
    results = {
        "Combo 1 – Medical Summary": generate_llama_response_combo1(user_prompt),        # Concise medical summary for leg fracture during hiking
        "Combo 2 – Step-by-Step Protocol": generate_llama_response_combo2(user_prompt),  # Step-by-step management process
        "Combo 3 – Field vs. Hospital Care": generate_llama_response_combo3(user_prompt),# Compare field and hospital/rehab care
        "Combo 4 – Checklist Format": generate_llama_response_combo4(user_prompt),       # Checklist of precautions and treatments
        "Combo 5 – Patient Education Style": generate_llama_response_combo5(user_prompt),# Simple explanation for patients/non-medical audience
    }

    # Iterate over each key-value pair in the results dictionary
    for title, response in results.items():
        # Print the section header, the generated response, and a separator line for clarity
        print(f"\n=== {title} ===\n{response}\n{'=' * 60}")

### Run the code

In [46]:
# Assign the leg fracture scenario question to a variable
question = "What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?"

# Call the function to run all Llama response combinations for leg fracture using the question as input
run_all_llama_response_combos_legfracture(question)

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit



=== Combo 1 – Medical Summary ===
 Emergency Response:
* Call for emergency medical services (EMS) if the person is unable to move or is experiencing severe pain.
* Immobilize the injured leg using a splint or makeshift immobilizing device to prevent further injury.
* Keep the affected area at or below the level of the heart to reduce swelling.
* Monitor vital signs and provide basic first aid as needed.
Treatment Steps:
* Transport the person to a medical facility for proper evaluation and treatment.
* Administer pain medication, such as ibuprofen or acetaminophen, as directed by a medical professional.
* Apply a cast or brace to immobilize the leg and promote healing.
* Monitor for signs of infection, such as redness, swelling, or increased pain.
* Follow up with a medical professional for further evaluation and treatment.
Recovery Considerations:
* Encourage the person to rest and avoid putting weight on the injured leg during the healing process.
* Provide emotional support and re

## Data Preparation for RAG

### Loading the data file provided

### Connect to Google Drive

In [47]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Load the file

In [48]:

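# Path to the PDF. This is a relative path, so the file must be in the current working directory;
# a copy stored on the mounted Drive would need a path under /content/drive instead.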
pdf_file = "medical_diagnosis_manual.pdf"
loader = PyMuPDFLoader(pdf_file)


### Data Overview

#### Checking the first 5 pages

In [50]:
documents = loader.load()
for i in range(min(5, len(documents))):
    # Print the page number from the document's metadata
    print(f"--- Page {documents[i].metadata['page']} ---")
    # Print the first 500 characters of the page content for preview
    print(documents[i].page_content[:500])
    print("\n")

--- Page 0 ---
vamseekrishna11@hotmail.com
RSKN0VL9FW
for personal use by vamseekrishna11@
shing the contents in part or full is liable 



--- Page 1 ---
vamseekrishna11@hotmail.com
RSKN0VL9FW
This file is meant for personal use by vamseekrishna11@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.



--- Page 2 ---
Table of Contents
1
Front    ................................................................................................................................................................................................................
1
Cover    .......................................................................................................................................................................................................
2
Front Matter    .................................


--- Page 3 ---
491
Chapter 44. Foot & Ankle Disorders    ............................................................................

#### Checking the number of pages

In [51]:
# Print the total number of pages by getting the length of the 'documents' list
print(f"Total number of pages: {len(documents)}")

Total number of pages: 4114


### Data Chunking

In [57]:
# Data Chunking

# Initialize a RecursiveCharacterTextSplitter object using the tiktoken encoder 'cl100k_base',
# with a maximum chunk size of 512 tokens and an overlap of 16 tokens between chunks.
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base', # Specify the tiktoken encoding to use for token counting.
    chunk_size=512,              # Set the maximum number of tokens per chunk.
    chunk_overlap=16             # Set the number of tokens to overlap between consecutive chunks.
)

# Split the loaded documents into smaller chunks using the text splitter.
chunks = text_splitter.split_documents(documents)

# Print the total number of chunks generated.
print(f"Number of chunks: {len(chunks)}")

# Print the content of the first chunk for a quick preview.
print(f"First chunk:\n{chunks[0].page_content}\n")

# Print out the first five chunks for inspection.
for i, chunk in enumerate(chunks[:5]):
    print(f"--- Chunk {i+1} ---")   # Print the current chunk number.
    print(chunk)                    # Print the chunk object (which may include text and metadata).
    print("\n")                     # Add a newline for readability between chunk outputs.

Number of chunks: 8473
First chunk:
vamseekrishna11@hotmail.com
RSKN0VL9FW
for personal use by vamseekrishna11@
shing the contents in part or full is liable

--- Chunk 1 ---
page_content='vamseekrishna11@hotmail.com\nRSKN0VL9FW\nfor personal use by vamseekrishna11@\nshing the contents in part or full is liable' metadata={'source': 'medical_diagnosis_manual.pdf', 'file_path': 'medical_diagnosis_manual.pdf', 'page': 0, 'total_pages': 4114, 'format': 'PDF 1.7', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'author': '', 'subject': '', 'keywords': '', 'creator': 'Atop CHM to PDF Converter', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'creationDate': 'D:20120615054440Z', 'modDate': 'D:20250614133033Z', 'trapped': ''}


--- Chunk 2 ---
page_content='vamseekrishna11@hotmail.com\nRSKN0VL9FW\nThis file is meant for personal use by vamseekrishna11@hotmail.com only.\nSharing or publishing the contents in part or full is liable for legal action.' metadata={'sour
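
To make `chunk_size` and `chunk_overlap` concrete, the sketch below applies a plain sliding window to a list of toy tokens. It is a deliberate simplification: `RecursiveCharacterTextSplitter` splits on separators and merges the pieces up to the token budget, so its real chunk boundaries will differ. The function name `sliding_chunks` and the small sizes are assumptions chosen so the overlap is easy to see.

```python
def sliding_chunks(tokens, chunk_size, chunk_overlap):
    """Split `tokens` into windows of `chunk_size` that share `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + chunk_size])
        # Stop once the current window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return windows

toy_tokens = [f"t{i}" for i in range(20)]
toy_chunks = sliding_chunks(toy_tokens, chunk_size=8, chunk_overlap=2)
for number, window in enumerate(toy_chunks, start=1):
    print(f"Chunk {number}: {window[0]} .. {window[-1]} ({len(window)} tokens)")
```

Each window repeats the last two tokens of the previous one, which is the role the 16-token overlap plays above: a sentence that straddles a boundary still appears intact in at least one chunk.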

### Embedding

### Choosing the embedding model

"all-MiniLM-L6-v2" is a small, fast sentence-embedding model that produces 384-dimensional vectors, which makes it a practical default when speed and memory use matter.
"all-mpnet-base-v2" is a larger alternative that generally yields higher-quality embeddings at the cost of slower encoding and more memory.
Either name can be passed as `model_name` in the cell below.

In [58]:
# Choose a strong, widely used all-purpose model
embedding_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")


### Creating a Vector Database and the Retriever

In [64]:

# Add the data into vector DB

# Create the Chroma vector database from your document chunks using the embedding model
vector_db = Chroma.from_documents(chunks, embedding_model)

# Create a retriever from the Chroma vector database with appropriate search method and k value
retriever = vector_db.as_retriever(
    search_type="similarity",   # Use similarity search for most retrieval tasks
    search_kwargs={"k": 5}      # Return the top 5 most similar documents
)

# Now, `retriever` can be used to retrieve relevant documents based on a query

# Example of using the retriever (replace with your actual query)
# query = "What is appendicitis?"
# relevant_docs = retriever.get_relevant_documents(query)
# print(relevant_docs)

print("Vector database created and populated with document chunks.")
print("Retriever created.")


Vector database created and populated with document chunks.
Retriever created.
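
The retriever above is configured with `search_type="similarity"` and `k=5`: the query is embedded, compared against every stored chunk vector, and the five closest chunks are returned. The sketch below reproduces that idea with toy three-dimensional vectors. Cosine similarity is used here as a common choice; the metric Chroma actually applies depends on how the collection is configured, and the vectors, texts, and function names are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vector, store, k=5):
    """Return the texts of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vector, vector), text) for text, vector in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy store: (chunk text, embedding). Real embeddings would have hundreds of dimensions.
toy_store = [
    ("pulmonary embolism overview", [0.9, 0.1, 0.0]),
    ("foot and ankle disorders", [0.0, 0.2, 0.9]),
    ("anticoagulant therapy", [0.7, 0.6, 0.1]),
]

hits = top_k_chunks([1.0, 0.2, 0.0], toy_store, k=2)
print(hits)
```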


### Testing the retriever: top 5 chunks for a sample query

In [65]:

# Example query to test the retriever
query = "What are the common symptoms and treatments for pulmonary embolism?"

# Use the retriever to get relevant documents for the query
relevant_docs = retriever.get_relevant_documents(query)

# Print the retrieved documents
print("Relevant documents retrieved by the retriever:")

# Print the number of relevant documents retrieved
print("Length of relevant documents: ")
print(len(relevant_docs))

# Loop through the retrieved documents and print their content
for i, doc in enumerate(relevant_docs):
    print(f"--- Document {i+1} ---")      # Print the document number
    print(doc.page_content)               # Print the content of the document
    print("\n")                           # Print a newline for readability between documents

Relevant documents retrieved by the retriever:
Length of relevant documents: 
5
--- Document 1 ---
Chapter 194. Pulmonary Embolism
Introduction
Pulmonary embolism (PE) is the occlusion of ≥ 1 pulmonary arteries by thrombi that originate
elsewhere, typically in the large veins of the lower extremities or pelvis. Risk factors are
conditions that impair venous return, conditions that cause endothelial injury or dysfunction,
and underlying hypercoagulable states. Symptoms are nonspecific and include dyspnea,
pleuritic chest pain, cough, and, in severe cases, syncope or cardiorespiratory arrest. Signs are
also nonspecific and may include tachypnea, tachycardia, hypotension, and a loud pulmonic
component of the 2nd heart sound. Diagnosis is based on a CT angiogram, ventilation/perfusion
scan, or a pulmonary arteriogram. Treatment is with anticoagulants and, sometimes, clot
dissolution with thrombolytics or surgical removal. Preventive measures include anticoagulants
and sometimes insertion o

## Loading the LLM for RAG

In the **generation step** of **Retrieval-Augmented Generation (RAG)**, the **LLM (Large Language Model)** receives the user's query together with the retrieved chunks and writes the answer in natural language. Its job is to synthesize the retrieved context into a coherent, relevant response, which lets it handle questions whose answer is spread across several chunks or is phrased differently from the source text.


### Downloading and loading the LLM

In [68]:
from huggingface_hub import hf_hub_download   # Import the function to download models and files from the Hugging Face Hub

from llama_cpp import Llama                   # Import the Llama class for running Llama models locally.
                                              # Note: This import may throw an error if you have installed the GPU version of llama_cpp
                                              # but have not properly enabled or configured your GPU.

In [69]:
model_name_or_path = "TheBloke/Llama-2-13B-chat-GGUF"         # Specify the Hugging Face repository or local path for the Llama-2-13B-chat model in GGUF format
model_basename = "llama-2-13b-chat.Q5_K_M.gguf"               # Specify the filename of the quantized model to use (Q5_K_M quantization, GGUF format)

In [73]:
model_path = hf_hub_download(                      # Call the hf_hub_download function to download a file from the Hugging Face Hub.
    repo_id=model_name_or_path,                    # Specify the repository ID or path (e.g., "TheBloke/Llama-2-13B-chat-GGUF") where the model is hosted.
    filename=model_basename                        # Specify the exact filename of the model weights to download (e.g., "llama-2-13b-chat.Q5_K_M.gguf").
)                                                  # The function returns the local file path to the downloaded model file and assigns it to model_path.

### System and User Prompt Template

### Response Function

In [66]:
def generate_rag_response(user_input,k=3,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    global qna_system_message,qna_user_message_template
    # Retrieve relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input,k=k)
    context_list = [d.page_content for d in relevant_document_chunks]

    # Combine document chunks into a single context
    context_for_query = ". ".join(context_list)

    user_message = qna_user_message_template.replace('{context}', context_for_query)
    user_message = user_message.replace('{question}', user_input)

    prompt = qna_system_message + '\n' + user_message

    # Generate the response
    try:
        response = llm(
                  prompt=prompt,
                  max_tokens=max_tokens,
                  temperature=temperature,
                  top_p=top_p,
                  top_k=top_k
                  )

        # Extract the generated text and strip surrounding whitespace
        response = response['choices'][0]['text'].strip()
    except Exception as e:
        response = f'Sorry, I encountered the following error: \n {e}'

    return response
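
`generate_rag_response` relies on two globals, `qna_system_message` and `qna_user_message_template`, that are not defined in this part of the notebook. The only constraint visible in the code is that the template must contain the literal placeholders `{context}` and `{question}`, because they are filled in with `str.replace`. The sketch below shows the prompt assembly on its own. The wording of both example strings is hypothetical and is not the notebook's actual prompt. `build_rag_prompt` is likewise a name introduced here.

```python
# Hypothetical wording, for illustration only; the notebook's real messages are not shown.
example_system_message = (
    "Answer the question using only the provided context. "
    "If the context does not contain the answer, say so."
)
example_user_message_template = "###Context\n{context}\n\n###Question\n{question}"

def build_rag_prompt(context_chunks, question,
                     system_message=example_system_message,
                     user_template=example_user_message_template):
    """Join the retrieved chunks and fill the template the same way generate_rag_response does."""
    context_for_query = ". ".join(context_chunks)
    user_message = user_template.replace("{context}", context_for_query)
    user_message = user_message.replace("{question}", question)
    return system_message + "\n" + user_message

demo_prompt = build_rag_prompt(
    ["Chunk about symptoms", "Chunk about treatment"],
    "What are the common symptoms and treatments for pulmonary embolism?",
)
print(demo_prompt)
```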

## Question Answering using RAG

### Query 1: What is the protocol for managing sepsis in a critical care unit?

### Query 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?

### Query 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?

### Query 4:  What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?

### Query 5: What are the necessary precautions and treatment steps for a person who has fractured their leg during a hiking trip, and what should be considered for their care and recovery?

### Fine-tuning

## Output Evaluation

In [None]:
groundedness_rater_system_message = ""

In [None]:
relevance_rater_system_message = ""

In [None]:
user_message_template = ""
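
The three strings above are left blank to be filled in. The evaluation function in this section fills `user_message_template` with `str.format`, passing `context`, `question`, and `answer`, so the template needs exactly those placeholders. Below is a sketch of what such strings could look like; the wording, the 1-to-5 scale, and the `example_` names are illustrative assumptions, not the notebook's intended answer.

```python
# Illustrative wording only; the notebook leaves these three strings blank.
example_groundedness_rater_system_message = (
    "You are evaluating an AI-generated answer. "
    "Rate from 1 to 5 how well the answer is supported by the provided context, "
    "and briefly justify the rating."
)
example_relevance_rater_system_message = (
    "You are evaluating an AI-generated answer. "
    "Rate from 1 to 5 how well the answer addresses the question, "
    "and briefly justify the rating."
)
# The placeholder names must match the keyword arguments passed to .format().
example_user_message_template = (
    "###Question\n{question}\n\n"
    "###Context\n{context}\n\n"
    "###Answer\n{answer}"
)

filled = example_user_message_template.format(
    context="Sample context.", question="Sample question?", answer="Sample answer."
)
print(filled)
```

Because the template goes through `str.format`, any literal braces in it would need to be doubled (`{{` and `}}`).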

In [None]:
def generate_ground_relevance_response(user_input, k=3, max_tokens=128, temperature=0, top_p=0.95, top_k=50):
    """Answer user_input with RAG, then rate the answer for groundedness and relevance."""
    # Retrieve the k most relevant document chunks
    relevant_document_chunks = retriever.get_relevant_documents(query=user_input, k=k)
    context_list = [d.page_content for d in relevant_document_chunks]
    context_for_query = ". ".join(context_list)

    # Combine the Q&A system message and the filled-in user message into the answer prompt
    qna_user_message = qna_user_message_template.format(context=context_for_query, question=user_input)
    prompt = f"""[INST]{qna_system_message}\n
                user: {qna_user_message}
                [/INST]"""

    response = llm(
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        stop=['INST'],
    )

    answer = response["choices"][0]["text"]

    # Both raters receive the same user message: the context, the question, and the generated answer
    rater_user_message = user_message_template.format(
        context=context_for_query, question=user_input, answer=answer
    )

    # Prompt asking whether the answer is supported by the retrieved context
    groundedness_prompt = f"""[INST]{groundedness_rater_system_message}\n
                user: {rater_user_message}
                [/INST]"""

    # Prompt asking whether the answer addresses the question
    relevance_prompt = f"""[INST]{relevance_rater_system_message}\n
                user: {rater_user_message}
                [/INST]"""

    response_1 = llm(
        prompt=groundedness_prompt,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        stop=['INST'],
    )

    response_2 = llm(
        prompt=relevance_prompt,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        stop=['INST'],
    )

    # Return the groundedness rating text first, then the relevance rating text
    return response_1['choices'][0]['text'], response_2['choices'][0]['text']
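
The evaluation function makes three model calls with identical sampling parameters and reads the text from each the same way. One possible way to factor that out is sketched below. `complete` and `fake_llm` are hypothetical names; `fake_llm` only imitates the `{'choices': [{'text': ...}]}` response shape used in this notebook so the sketch runs without a model.

```python
def complete(llm, prompt, max_tokens=128, temperature=0, top_p=0.95, top_k=50):
    """Call the model once and return only the generated text."""
    response = llm(
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        stop=["INST"],
    )
    return response["choices"][0]["text"]


def fake_llm(prompt, **kwargs):
    """Hypothetical stand-in that returns the same response shape as the real model."""
    return {"choices": [{"text": f"echo: {len(prompt)} chars"}]}


print(complete(fake_llm, "[INST]hello[/INST]"))
```

With a helper like this, the answer, groundedness, and relevance calls would each reduce to a single line, and a change to the sampling parameters would only need to be made in one place.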

## Actionable Insights and Business Recommendations

<font size=6 color='blue'>Power Ahead</font>
___