
# Problem Statement

## Enhancing Data-Driven Decision Making in Healthcare Using RAG (Retrieval-Augmented Generation)



### Objectives
1. Understand why RAG matters for data-driven healthcare analytics.
2. Implement each step of a RAG pipeline (preprocessing, embedding, vector indexing, retrieval, and generation) on healthcare provider (HCP) data.
   


## Data Description

### Dataset Overview
The dataset contains patient information and visit records from healthcare providers. Below is a description of each column in the dataset:

- **patient_id**: Unique identifier for each patient.
- **name**: Name of the patient.
- **age**: Age of the patient.
- **gender**: Gender of the patient.
- **medical_conditions**: Primary medical condition diagnosed for the patient.
- **current_medications**: Medications currently being taken by the patient.
- **visit_id**: Unique identifier for each visit.
- **visit_date**: Date of the patient's visit to the healthcare provider.
- **problem_description**: Description of the medical problem or symptoms reported by the patient during the visit.
- **doctor_notes**: Notes provided by the doctor during the visit, including any recommendations or follow-ups.
- **tests_ordered**: Medical tests that were ordered by the doctor during the visit.
- **test_results**: Results of the tests ordered, including key measurements and findings.
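
Before loading the full file, it can help to confirm that the CSV header actually contains the columns described above. The following is a minimal sketch using only the standard library; `EXPECTED_COLUMNS`, `missing_columns`, and the two-column sample are illustrative names and data, not part of the original notebook.

```python
import csv
import io

# Column names as they appear in the loaded data (note the plural `medical_conditions`).
EXPECTED_COLUMNS = [
    "patient_id", "name", "age", "gender", "medical_conditions",
    "current_medications", "visit_id", "visit_date", "problem_description",
    "doctor_notes", "tests_ordered", "test_results",
]

def missing_columns(csv_file, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    return [col for col in expected if col not in header]

# Hypothetical two-column sample, used only to show the check reporting gaps.
sample = io.StringIO("patient_id,name\nP00000,Damon Moore\n")
print(missing_columns(sample))
```

For the real file, the same function could be called on `open("patient_records_50k.csv", newline="")`; an empty list would mean every described column is present.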

---


In [1]:
import pandas as pd
from sentence_transformers import SentenceTransformer
import numpy as np
import faiss
import re

In [2]:
# Load the CSV data
file_path = r"patient_records_50k.csv"
data = pd.read_csv(file_path)
# Work with the first 500 rows only
data = data.iloc[:500, :]
data.head()

Unnamed: 0,patient_id,name,age,gender,medical_conditions,current_medications,visit_id,visit_date,problem_description,doctor_notes,tests_ordered,test_results
0,P00000,Damon Moore,38,Male,Hypertension,Vitamin D; Lisinopril; Albuterol; Levothyroxine,V0001,2024-08-13,Muscle Cramps,Recommend tests: Lisinopril. | Encourage a Lev...,Blood Pressure Monitoring,Blood Pressure Monitoring: 5.1
1,P00000,Damon Moore,38,Male,Hypertension,Vitamin D; Lisinopril; Albuterol; Levothyroxine,V0002,2024-01-04,Fatigue,Recommend tests: Albuterol. | Increase dosage ...,Blood Pressure Monitoring; Lipid Profile; Elec...,Blood Pressure Monitoring: 8.78; Lipid Profile...
2,P00000,Damon Moore,38,Male,Hypertension,Vitamin D; Lisinopril; Albuterol; Levothyroxine,V0003,2024-05-25,Joint Pain,Schedule a follow-up in Atorvastatin weeks for...,Electrolyte Panel,Electrolyte Panel: 6.93
3,P00000,Damon Moore,38,Male,Hypertension,Vitamin D; Lisinopril; Albuterol; Levothyroxine,V0004,2024-07-01,Fatigue,Recommend tests: Albuterol. | Encourage a Lisi...,Nerve Conduction Test; Lipid Profile; Electrol...,Nerve Conduction Test: 1.21; Lipid Profile: 8....
4,P00000,Damon Moore,38,Male,Hypertension,Vitamin D; Lisinopril; Albuterol; Levothyroxine,V0005,2024-09-26,Numbness in Feet,Patient shows signs of Atorvastatin. Recommend...,Lipid Profile; Electrolyte Panel,Lipid Profile: 8.52; Electrolyte Panel: 5.5


In [3]:
# Step 1: Text Preprocessing

def preprocess_text(text):
    """
    Function to preprocess text by removing special characters and converting to lowercase.
    """
    if not isinstance(text, str):
        text = ''  # Convert non-string values to empty strings
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    text = text.lower().strip()
    return text

# Combine relevant text columns for vectorization
data['combined_text'] = (
    data['medical_conditions'].fillna('') + ' ' +
    data['current_medications'].fillna('') + ' ' +
    data['problem_description'].fillna('') + ' ' +
    data['doctor_notes'].fillna('')
)

data['combined_text'] = data['combined_text'].apply(preprocess_text)
print(data['combined_text'].head())


0    hypertension vitamin d lisinopril albuterol le...
1    hypertension vitamin d lisinopril albuterol le...
2    hypertension vitamin d lisinopril albuterol le...
3    hypertension vitamin d lisinopril albuterol le...
4    hypertension vitamin d lisinopril albuterol le...
Name: combined_text, dtype: object
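
Because `preprocess_text` deletes special characters rather than replacing them, tokens joined by punctuation are fused (for example, `units/day` becomes `unitsday`). The sketch below reproduces the notebook's logic next to a hypothetical variant, `preprocess_text_spaced`, that substitutes a space and collapses repeated whitespace. The variant is an assumption offered for comparison, not the method the notebook uses, and whether it improves retrieval has not been measured here.

```python
import re

def preprocess_text(text):
    """Same logic as the notebook's function: drop special characters, lowercase."""
    if not isinstance(text, str):
        text = ''
    return re.sub(r'[^a-zA-Z0-9\s]', '', text).lower().strip()

def preprocess_text_spaced(text):
    """Hypothetical variant: replace special characters with a space, then collapse whitespace."""
    if not isinstance(text, str):
        text = ''
    text = re.sub(r'[^a-zA-Z0-9\s]', ' ', text)
    return re.sub(r'\s+', ' ', text).lower().strip()

# Shortened doctor note in the style of the dataset
note = "Increase dosage of Metformin to 13 units/day. | Recommend tests: Lisinopril."
print(preprocess_text(note))
print(preprocess_text_spaced(note))
```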


In [4]:
# Step 2: Vectorization using Embedding Model
# Using SentenceTransformer for embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(data['combined_text'].tolist(), convert_to_tensor=False)


In [5]:
# Step 3: Building the Vector Store
# Create a FAISS index
vector_dimension = embeddings[0].shape[0]
faiss_index = faiss.IndexFlatL2(vector_dimension)

# Convert embeddings to float32 for FAISS compatibility
embeddings = np.array(embeddings).astype('float32')

# Add embeddings to the FAISS index
faiss_index.add(embeddings)
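
`IndexFlatL2` performs an exhaustive search: every query is compared with every stored vector, and the distances it reports are squared Euclidean distances. The NumPy sketch below imitates that behaviour on toy 2-D vectors so the ranking can be checked by hand. `flat_l2_search` is an illustrative helper, not a FAISS function, and the toy vectors stand in for the 384-dimensional `all-MiniLM-L6-v2` embeddings.

```python
import numpy as np

def flat_l2_search(index_vectors, query_vectors, top_k):
    """Exhaustive nearest-neighbour search using squared L2 distance."""
    # Pairwise differences: shape (n_queries, n_indexed, dim)
    diffs = query_vectors[:, None, :] - index_vectors[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    # Sort each row by distance and keep the top_k closest
    order = np.argsort(sq_dists, axis=1)[:, :top_k]
    return np.take_along_axis(sq_dists, order, axis=1), order

# Toy vectors: three "records" and one "query"
toy_index = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]], dtype='float32')
toy_query = np.array([[0.9, 0.1]], dtype='float32')
distances, indices = flat_l2_search(toy_index, toy_query, top_k=2)
print(indices, distances)
```

This brute-force approach is reasonable for a few hundred records; for much larger collections, FAISS also offers approximate index types that trade some accuracy for speed.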

In [6]:
# Step 4: Implementing Search Capabilities
def search_similar_records(query, top_k=5):
    """
    Function to search for similar patient records based on a query.
    """
    # Preprocess and vectorize the query
    query = preprocess_text(query)
    query_embedding = model.encode([query], convert_to_tensor=False).astype('float32')

    # Search the FAISS index
    distances, indices = faiss_index.search(query_embedding, top_k)

    # Retrieve the corresponding patient records
    results = data.iloc[indices[0]]
    return results
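
The function above discards the distances, and it passes the raw indices straight to `iloc`. FAISS pads its result with `-1` when fewer than `top_k` vectors are available, and `iloc[-1]` would silently return the last row. The sketch below shows one way to keep each distance with its record and skip the padding. `pair_hits` and the sample values are hypothetical, and plain lists stand in for the FAISS output and the DataFrame.

```python
def pair_hits(distances, indices, records):
    """Pair each valid hit with its record and distance, skipping -1 padding."""
    hits = []
    for dist, idx in zip(distances, indices):
        if idx < 0:  # padding entry: no neighbour was found for this slot
            continue
        hits.append({"row": int(idx), "distance": float(dist), "record": records[idx]})
    return hits

# Hypothetical search output for one query over a two-record store with top_k=3
records = ["note for row 0", "note for row 1"]
hits = pair_hits([0.12, 0.57, 3.4e38], [1, 0, -1], records)
for hit in hits:
    print(hit["row"], round(hit["distance"], 2), hit["record"])
```

With 500 indexed rows and `top_k=5` the padding case does not arise in this notebook, but keeping the distances makes it easier to judge how close a retrieved record really is.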


In [7]:
# Example usage of the search function
query = "What tests are recommended for frequent urination?"
similar_records = search_similar_records(query)
print(similar_records[['doctor_notes']])


                                          doctor_notes
307  Increase dosage of Metformin to 13 units/day. ...
310  Encourage a Aspirin diet and regular 20 exerci...
207  Patient shows signs of Insulin. Recommend 9. |...
42   Increase dosage of Vitamin D to 15 units/day. ...
130  Recommend tests: Atorvastatin. | Schedule a fo...


In [8]:
similar_records[['doctor_notes']].iloc[0,0]

'Increase dosage of Metformin to 13 units/day. | Encourage a Metformin diet and regular 13 exercise. | Recommend tests: Lisinopril.'

In [9]:
query ="What tests are recommended for patients with chronic back pain and numbness?"
similar_records = search_similar_records(query)
similar_records[['doctor_notes']].iloc[0,0]

'Patient shows signs of Metformin. Recommend 12. | Increase dosage of Albuterol to 9 units/day. | Recommend tests: Losartan.'



In [12]:
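from langchain_community.llms import Together
import os

# Read the Together API key from the environment instead of hardcoding it in the notebook
if "TOGETHER_API_KEY" not in os.environ:
    raise RuntimeError("Set the TOGETHER_API_KEY environment variable before running this cell.")
llm = Together(model="meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo")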



In [13]:
def generate_answer(query):
    """
    Function to generate an answer using the Together LLaMA model based on the retrieved patient records.
    """
    # Retrieve similar records
    similar_records = search_similar_records(query)
    # Prepare context for generation (kept local rather than in a global variable)
    context = "\n".join(similar_records['combined_text'].tolist())
    template = """Use the following pieces of context to answer the question at the end.
    If you don't know the answer, just say you don't know, don't try to make up an answer.
    Your answer should be helpful and informative, but not too long.
    The answer should explain the concept in a way that is easy to understand for someone who is not an expert in the field.
    The answer should explain the concept in one or two lines and, if needed, cover other aspects in bullet points.
    Use short bullet points and keep the answer as concise as possible.
    
    {context}
    
    Question: {question}
    
    Helpful Answer:"""
    prompt = template.format(context=context, question=query)
    
    # Generate answer using Together model
    response = llm.generate([prompt]).generations[0][0].text
    return response
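
Because the template is written inside an indented function body, every line after the first reaches the model with four leading spaces. The sketch below shows one way to build the prompt without that indentation, using `textwrap.dedent`, and to cap the context length. `PROMPT_TEMPLATE`, `build_prompt`, and `max_chars` are hypothetical names, and the template text is shortened for illustration.

```python
from textwrap import dedent

# Shortened template; dedent strips the common leading indentation
PROMPT_TEMPLATE = dedent("""\
    Use the following pieces of context to answer the question at the end.
    If you don't know the answer, say so rather than guessing.

    {context}

    Question: {question}

    Helpful Answer:""")

def build_prompt(context_chunks, question, max_chars=2000):
    """Join retrieved chunks, cap the context length, and fill the template."""
    context = "\n".join(context_chunks)[:max_chars]
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(["hypertension lisinopril fatigue"], "What tests are recommended?")
print(prompt)
```

The character cap is a crude guard against overly long prompts; a token-based limit would be more precise but depends on the model's tokenizer.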

In [14]:
# Example usage of the RAG pipeline
query = "What tests are recommended for patients with numbness?"
answer = generate_answer(query)
print(answer)


 The recommended tests for patients with numbness in feet include:
    • Vitamin D test
    • Blood tests to check for anemia, hypertension, and hypothyroidism
    • Kidney function tests for patients with chronic kidney disease
    • Blood glucose tests for patients with insulin and metformin
    • Lipid profile tests for patients with atorvastatin
    • Thyroid function tests for patients with levothyroxine
    • Electrolyte tests for patients with lisinopril and losartan
    • Complete blood count (CBC) test for patients with amlodipine and aspirin
    • Pulmonary function tests for patients with albuterol
    
    Note: These tests are recommended based on the provided context and may not be a comprehensive list of tests for patients with numbness in feet. A healthcare professional should be consulted for a thorough evaluation and diagnosis.
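
The context passed to the model comes from `combined_text`, which is built from conditions, medications, problem descriptions, and doctor notes but leaves out `tests_ordered` and `test_results`. For questions about tests, it may help to render those fields into the context as well. The sketch below is one possible way to do that under this assumption; `record_to_context` and its default field list are hypothetical, and the sample values are copied from the first row of the data preview.

```python
def record_to_context(record, fields=("medical_conditions", "problem_description",
                                      "tests_ordered", "test_results")):
    """Render one visit record as 'field: value' lines for the prompt context."""
    lines = []
    for field in fields:
        value = record.get(field)
        if value:  # skip missing or empty fields
            lines.append(f"{field}: {value}")
    return "\n".join(lines)

# Values taken from the first row shown by data.head()
visit = {
    "medical_conditions": "Hypertension",
    "problem_description": "Muscle Cramps",
    "tests_ordered": "Blood Pressure Monitoring",
    "test_results": "Blood Pressure Monitoring: 5.1",
}
print(record_to_context(visit))
```

In the notebook, a similar string could be produced per retrieved row (for example via `similar_records.to_dict("records")`) and joined to form the context, while the embeddings continue to be computed from `combined_text`.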
