# AAMR - Data Analysis

**_NOTE_**: This notebook has been tested in the following environment:

* Python version = 3.10.13

## Overview

Analyze the data requirements for AAMR model development.

### Objective

Identify the data elements required for training.

### Dataset

For demo purposes, we plan to use CSV datasets.

### Costs 

This tutorial uses billable components of Google Cloud:

* NA


## Data Analysis

We plan to use two datasets:

1. A list of medications with their dose and ingredient information. Medi-Span is a candidate data source for this.
2. Each patient's current medications.




In [3]:
%%capture
import pandas as pd
import numpy as np

# Make numpy values easier to read.
np.set_printoptions(precision=3, suppress=True)

import tensorflow as tf
from tensorflow.keras import layers

### Medication Dataset

Treat this as the master dataset from which recommendations are drawn.

Below are sample data elements for medications.

##### Medication Id, Medication Name, Medication Ingredient, Medication Dose, Form / Dispensable
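The ingredient and dose elements listed above may arrive combined in a single field, as in the `Composition` column of the sample file used in this section (for example `Amoxycillin (500mg) + Clavulanic Acid (125mg)`). A minimal sketch of splitting such a string into ingredient/dose pairs follows. The helper name `split_composition` and the assumed `Name (dose) + Name (dose)` pattern are illustrative assumptions, not part of the notebook.

```python
import re

# Assumed pattern for one component: "<ingredient> (<dose>)".
COMPONENT = re.compile(r"^\s*(?P<ingredient>.+?)\s*\((?P<dose>[^()]+)\)\s*$")


def split_composition(composition):
    """Split a composition string into (ingredient, dose) pairs.

    Hypothetical helper: components that do not match the assumed
    pattern are returned with a dose of None instead of being dropped.
    """
    pairs = []
    for part in composition.split("+"):
        match = COMPONENT.match(part)
        if match:
            pairs.append((match.group("ingredient"), match.group("dose")))
        else:
            pairs.append((part.strip(), None))
    return pairs


pairs = split_composition("Amoxycillin (500mg) + Clavulanic Acid (125mg)")
print(pairs)
```

Returning `None` for unmatched components keeps malformed or truncated entries visible so they can be reviewed rather than silently lost.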

In [12]:
med_data = pd.read_csv(
    "data/Medicine_Details.csv",
    names=["Medicine Name","Composition","Uses","Side_effects","Image URL","Manufacturer","Excellent Review %","Average Review %","Poor Review %"])

med_data.head()

Unnamed: 0,Medicine Name,Composition,Uses,Side_effects,Image URL,Manufacturer,Excellent Review %,Average Review %,Poor Review %
0,Medicine Name,Composition,Uses,Side_effects,Image URL,Manufacturer,Excellent Review %,Average Review %,Poor Review %
1,Avastin 400mg Injection,Bevacizumab (400mg),Cancer of colon and rectum Non-small cell lun...,Rectal bleeding Taste change Headache Noseblee...,"https://onemg.gumlet.io/l_watermark_346,w_480,...",Roche Products India Pvt Ltd,22,56,22
2,Augmentin 625 Duo Tablet,Amoxycillin (500mg) + Clavulanic Acid (125mg),Treatment of Bacterial infections,Vomiting Nausea Diarrhea Mucocutaneous candidi...,"https://onemg.gumlet.io/l_watermark_346,w_480,...",Glaxo SmithKline Pharmaceuticals Ltd,47,35,18
3,Azithral 500 Tablet,Azithromycin (500mg),Treatment of Bacterial infections,Nausea Abdominal pain Diarrhea,"https://onemg.gumlet.io/l_watermark_346,w_480,...",Alembic Pharmaceuticals Ltd,39,40,21
4,Ascoril LS Syrup,Ambroxol (30mg/5ml) + Levosalbutamol (1mg/5ml)...,Treatment of Cough with mucus,Nausea Vomiting Diarrhea Upset stomach Stomach...,"https://onemg.gumlet.io/l_watermark_346,w_480,...",Glenmark Pharmaceuticals Ltd,24,41,35
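In the output above, row 0 repeats the column names. That is what `pd.read_csv` produces when `names=` is supplied for a file that already has a header line, because the header is then read as data. A minimal sketch of the difference, using a small in-memory CSV (the three-column layout is a shortened assumption, not the full file):

```python
import io

import pandas as pd

# Small in-memory stand-in for a CSV file that already has a header line.
csv_text = (
    "Medicine Name,Composition,Manufacturer\n"
    "Azithral 500 Tablet,Azithromycin (500mg),Alembic Pharmaceuticals Ltd\n"
    "Avastin 400mg Injection,Bevacizumab (400mg),Roche Products India Pvt Ltd\n"
)
columns = ["Medicine Name", "Composition", "Manufacturer"]

# names= alone: the header line is kept as the first data row.
with_header_row = pd.read_csv(io.StringIO(csv_text), names=columns)

# names= with header=0: the header line is consumed and replaced by names.
without_header_row = pd.read_csv(io.StringIO(csv_text), names=columns, header=0)

print(len(with_header_row), len(without_header_row))
```

If the file's own header is acceptable, omitting `names=` altogether has the same effect on the row count.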


### Patient Medications Dataset

These are the medications a patient is currently receiving, based on their current CKD stage.
For example, CKD Stage 1 patients will have a set of medications that they currently take daily.

We assume that the current medications are the best available choice, since they reflect doctor recommendations.

Other criteria, such as the patient's diabetic status and past medical history, could also affect the current medication.
For the sake of the demo, we do not use those additional feature attributes for training.
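As a rough sketch of leaving those attributes out, the columns can be dropped from the DataFrame before training. The column names follow the sample patient file loaded later in this section. Treating `Diabetic`, `Medication_History`, and `Chronic_Health_Conditions` as the columns that stand for diabetic status and past medical history is an assumption made for illustration.

```python
import pandas as pd

# A few rows shaped like the sample patient file (subset of columns only).
patients = pd.DataFrame({
    "Diabetic": [1, 0, 0],
    "Prescription": ["Galantamine", "Galantamine", "Galantamine"],
    "Dosage in mg": [12.0, 12.0, 12.0],
    "Age": [60, 61, 69],
    "Medication_History": ["No", "Yes", "No"],
    "Chronic_Health_Conditions": ["Diabetes", "Heart Disease", "Heart Disease"],
})

# Assumed mapping of "additional feature attributes" to column names.
EXCLUDED_COLUMNS = ["Diabetic", "Medication_History", "Chronic_Health_Conditions"]

# drop() returns a new frame, so the original data stays available.
training_frame = patients.drop(columns=EXCLUDED_COLUMNS)
print(list(training_frame.columns))
```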


Below are sample data elements for patient medications.

##### Patient Id, Patient Medication Id, Patient Medication Dose, Patient Medication Form, Patient Medication Ingredient, Patient CKD Stage

In [43]:
patient_health_data = pd.read_csv(
    "data/patients_health_data.csv",
    names=["Diabetic","AlcoholLevel","HeartRate","BloodOxygenLevel","BodyTemperature","Weight","MRI_Delay","Prescription","Dosage in mg","Age","Education_Level","Dominant_Hand","Gender","Family_History","Smoking_Status","APOE_E4","Physical_Activity","Depression_Status","Cognitive_Test_Scores","Medication_History","Nutrition_Diet","Sleep_Quality","Chronic_Health_Conditions","Dementia"])

patient_health_data.head()

Unnamed: 0,Diabetic,AlcoholLevel,HeartRate,BloodOxygenLevel,BodyTemperature,Weight,MRI_Delay,Prescription,Dosage in mg,Age,...,Smoking_Status,APOE_E4,Physical_Activity,Depression_Status,Cognitive_Test_Scores,Medication_History,Nutrition_Diet,Sleep_Quality,Chronic_Health_Conditions,Dementia
0,1,0.084974,98,96.230743,36.224852,57.563978,36.421028,Galantamine,12.0,60,...,Current Smoker,Negative,Sedentary,No,10,No,Low-Carb Diet,Poor,Diabetes,0
1,0,0.016973,78,93.032122,36.183874,56.832335,31.157633,Galantamine,12.0,61,...,Former Smoker,Positive,Moderate Activity,No,1,Yes,Low-Carb Diet,Poor,Heart Disease,1
2,0,0.009,89,93.566504,37.326321,59.759066,37.640435,Galantamine,12.0,69,...,Former Smoker,Negative,Moderate Activity,No,8,No,Mediterranean Diet,Poor,Heart Disease,0
3,0,0.086437,60,93.90651,37.03062,58.266471,50.673992,Donepezil,23.0,78,...,Never Smoked,Negative,Mild Activity,Yes,5,Yes,Balanced Diet,Poor,Hypertension,1
4,1,0.150747,67,97.508994,36.062121,67.705027,27.810601,Memantine,20.0,77,...,Never Smoked,Positive,Mild Activity,No,0,Yes,Low-Carb Diet,Good,Diabetes,1


## Load CSV data into a tf.data Dataset

### Data loading with tensor slices from a pandas DataFrame

In [40]:
med_dataset = tf.data.Dataset.from_tensor_slices(dict(med_data))

In [29]:
patient_dataset = tf.data.Dataset.from_tensor_slices(dict(patient_health_data))
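To see what `from_tensor_slices(dict(...))` yields, a small sketch can help: each element is a dictionary keyed by column name, holding one row's value per column. The three rows below are taken from the sample patient output, restricted to three columns for brevity. The variable names are illustrative.

```python
import pandas as pd
import tensorflow as tf

# Three columns from the sample patient data, three rows.
frame = pd.DataFrame({
    "Prescription": ["Galantamine", "Donepezil", "Memantine"],
    "Dosage in mg": [12.0, 23.0, 20.0],
    "Age": [60, 78, 77],
})

# dict(frame) maps column name -> column; each column becomes one tensor,
# and the dataset slices all of them along the first (row) axis.
ds = tf.data.Dataset.from_tensor_slices(dict(frame))

# element_spec describes the dtype and shape of one row per column.
print(ds.element_spec)

# Pull the first row and convert its tensors to NumPy values for display.
first = next(iter(ds))
print({name: value.numpy() for name, value in first.items()})
```

String columns come back as byte strings (`tf.string`), so they usually need a lookup or encoding layer before being fed to a model.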

### Data loading with tf.data.experimental.make_csv_dataset

In [36]:
# Note: storage.cloud.google.com is the browser endpoint and requires sign-in;
# for a publicly readable object, the storage.googleapis.com form of the URL
# is the one that serves the raw file.
med_file_path = tf.keras.utils.get_file("med_data.csv", "https://storage.cloud.google.com/vc-model-training/data/med_data.csv")

Downloading data from https://storage.cloud.google.com/vc-model-training/data/med_data.csv
   8192/Unknown - 0s 0us/step

In [37]:
med_file_path

'/home/jupyter/.keras/datasets/med_data.csv'

In [None]:
# No label column specified
dataset = tf.data.experimental.make_csv_dataset(med_file_path, batch_size=2)
iterator = dataset.as_numpy_iterator()
print(dict(next(iterator)))
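# Prints a dictionary of batched features. The shape below is illustrative
# only; Feature_A and Feature_B are placeholder names, not columns of this file:
# OrderedDict([('Feature_A', array([1, 4], dtype=int32)),
#              ('Feature_B', array([b'a', b'd'], dtype=object))])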
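When a label column is needed, `make_csv_dataset` can split it off through `label_name`. The sketch below writes a tiny CSV to a temporary directory and reads it back. Using `Dementia` as the label is an assumption for illustration, as are the file name and the reduced column set. Passing `num_epochs=1` and `shuffle=False` makes the dataset finite and ordered, which differs from the defaults of repeating indefinitely and shuffling.

```python
import os
import tempfile

import pandas as pd
import tensorflow as tf

# Tiny stand-in for the patient file: values taken from the sample output.
sample = pd.DataFrame({
    "Age": [60, 61, 69, 78],
    "Prescription": ["Galantamine", "Galantamine", "Galantamine", "Donepezil"],
    "Dementia": [0, 1, 0, 1],
})
csv_path = os.path.join(tempfile.mkdtemp(), "patients_sample.csv")
sample.to_csv(csv_path, index=False)

# label_name pulls one column out of the feature dictionary, so each
# element becomes a (features, labels) pair.
labeled = tf.data.experimental.make_csv_dataset(
    csv_path,
    batch_size=2,
    label_name="Dementia",  # assumed label column for this sketch
    num_epochs=1,           # read the file once instead of repeating
    shuffle=False,          # keep file order so batches are predictable
)

features, labels = next(iter(labeled))
print(list(features.keys()), labels.numpy())
```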