# Code Challenge - Una Health

## ⭐ Data

We've prepared sample data for three patients: `aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa`, `bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb` and `cccccccc-cccc-cccc-cccc-cccccccccccc`. Each patient is identified by a [universally unique identifier](https://en.wikipedia.org/wiki/Universally_unique_identifier) (UUID).

The data consists of:

- A `csv` file with the patient's historic blood glucose levels: `levels_all.csv`. We're interested in blood glucose readings with an `Aufzeichnungstyp` of either `0` or `1`. These appear in different columns because they are different types (collected automatically every 15 minutes vs. scanned manually by the patient), but they are readings from the same sensor and should be treated as such. The blood glucose reading is recorded in `Glukosewert-Verlauf mg/dL` or `Glukose-Scan mg/dL`. All timestamps in this file are UTC.
- A `csv` file with the tracked meals of the patient: `activities_all.csv`. Each meal is identified by a UUID.
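
The merging of the two reading columns described above could look roughly like the sketch below. It is not part of the provided material: the helper name `combine_glucose_readings`, the output column `glucose_mg_dl` and the toy rows are assumptions made for illustration, and the mapping of `Aufzeichnungstyp` `1` to the scan column is inferred from the description rather than confirmed.

```python
import pandas as pd

HISTORY_COL = "Glukosewert-Verlauf mg/dL"  # automatic readings (Aufzeichnungstyp 0)
SCAN_COL = "Glukose-Scan mg/dL"            # manual scans (presumably Aufzeichnungstyp 1)


def combine_glucose_readings(levels: pd.DataFrame) -> pd.DataFrame:
    """Keep record types 0 and 1 and merge both reading columns into one.

    `glucose_mg_dl` is a hypothetical column name chosen for this sketch.
    """
    readings = levels[levels["Aufzeichnungstyp"].isin([0, 1])].copy()
    # Use the automatic value where present, otherwise fall back to the scan value.
    readings["glucose_mg_dl"] = readings[HISTORY_COL].fillna(readings[SCAN_COL])
    return readings


# Toy rows (not taken from the real files); type 6 stands in for any other record type.
example = pd.DataFrame({
    "Aufzeichnungstyp": [0, 1, 6],
    HISTORY_COL: [77.0, None, None],
    SCAN_COL: [None, 80.0, None],
})
combined = combine_glucose_readings(example)
```

Treating both columns as one series keeps later plots and window calculations independent of how a reading was captured.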

## 📉 Tasks
- **Visualise** - Create plots of the historic blood glucose level for each patient and of the blood glucose level after meals (we usually look at `timestamp_start` of the meal + 3 hours' worth of data). Feel free to group and slice the data for the individual meals as you see fit.

- **Interpret** - What conclusions can you draw when looking at the combined data of historic blood glucose levels and tracked meals for an individual patient and for certain meal types of an individual patient? What clusters (if any) do you find? What additional information (if any) do you need? What clustering methods would you apply?
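
The post-meal window from the Visualise task (`timestamp_start` + 3 hours) can be sketched as follows. This is one possible reading of the task, not a prescribed method: the function name `post_meal_readings`, the `minutes_after_meal` column, the half-open interval and the toy data are all assumptions. It also assumes that both the reading timestamps and the meal start are timezone-aware UTC values.

```python
import pandas as pd

POST_MEAL_WINDOW = pd.Timedelta(hours=3)


def post_meal_readings(levels, meal_start, time_col="Gerätezeitstempel"):
    """Return readings with meal_start <= timestamp < meal_start + 3 hours."""
    timestamps = levels[time_col]
    in_window = (timestamps >= meal_start) & (timestamps < meal_start + POST_MEAL_WINDOW)
    window = levels[in_window].copy()
    # Minutes since the meal, so that different meals can share one x-axis.
    window["minutes_after_meal"] = (window[time_col] - meal_start).dt.total_seconds() / 60
    return window.sort_values(time_col)


# Toy data (hypothetical values), already in UTC.
toy_levels = pd.DataFrame({
    "Gerätezeitstempel": pd.to_datetime(
        ["2021-02-15 07:15", "2021-02-15 07:30", "2021-02-15 09:00", "2021-02-15 10:30"],
        utc=True,
    ),
    "glucose": [90.0, 92.0, 120.0, 95.0],
})
meal_start = pd.Timestamp("2021-02-15 07:30", tz="UTC")
window = post_meal_readings(toy_levels, meal_start)
```

Expressing each reading as minutes after the meal makes it straightforward to overlay or average the curves of several meals of the same type.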

In [1]:
#Libraries
import os
import pandas as pd
import numpy as np
from scipy import stats
import plotly.express as px
import dateutil.parser
from datetime import datetime, timezone, timedelta

In [2]:
#Data loading
data_path = 'data'
a_UUID = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa'
b_UUID = 'bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb'
c_UUID = 'cccccccc-cccc-cccc-cccc-cccccccccccc'
# activities_all.csv: column names are on the first line (header=0)
a_activities_all = pd.read_csv(f'{data_path}/{a_UUID}/activities_all.csv', header=0).dropna(axis=1, how='all')
b_activities_all = pd.read_csv(f'{data_path}/{b_UUID}/activities_all.csv', header=0).dropna(axis=1, how='all')
c_activities_all = pd.read_csv(f'{data_path}/{c_UUID}/activities_all.csv', header=0).dropna(axis=1, how='all')
# levels_all.csv: column names are on the second line (header=1); the line above them is skipped
a_levels_all = pd.read_csv(f'{data_path}/{a_UUID}/levels_all.csv', header=1).dropna(axis=1, how='all')
b_levels_all = pd.read_csv(f'{data_path}/{b_UUID}/levels_all.csv', header=1).dropna(axis=1, how='all')
c_levels_all = pd.read_csv(f'{data_path}/{c_UUID}/levels_all.csv', header=1).dropna(axis=1, how='all')

In [3]:
#Check loading

In [4]:
a_activities_all.head()

Unnamed: 0,id,user_id,record_type,description,timestamp_start,timestamp_end
0,aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa00,aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa,MEAL_BREAKFAST,"40 g Haferflocken, 230 g Joghurt 0,3 %, 90 g B...",2021-02-15T08:30:00+01:00,
1,aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa01,aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa,MEAL_LUNCH,"98 g Möhren-Walnuss-VK-Brot, 87 g Gurke, 55 g ...",2021-02-15T12:45:00+01:00,
2,aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa02,aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa,MEAL_SNACK,"Mandarine, Teelöffel Erdnussmuß",2021-02-15T16:15:00+01:00,
3,aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa03,aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa,ACTVITY_EASY,spazieren,2021-02-15T17:00:00+01:00,2021-02-15T17:30:00+01:00
4,aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa04,aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa,MEAL_DINNER,"50 g BasmatiVollkorn Reis, 20g Currypaste, 15 ...",2021-02-15T19:30:00+01:00,


In [5]:
b_activities_all.head()

Unnamed: 0,id,user_id,record_type,description,timestamp_start
0,bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbb00,bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb,MEAL_BREAKFAST,2 Pott Kaffe +Zucker,2021-02-19T06:30:00+01:00
1,bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbb01,bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb,MEAL_BREAKFAST,Gemüsesuppe instant,2021-02-19T07:30:00+01:00
2,bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbb02,bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb,MEAL_SNACK,Gemüsesuppe instant + Grüner Tee ohne alles,2021-02-19T09:00:00+01:00
3,bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbb03,bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb,DRINK,Kaffee ohne alles,2021-02-19T10:00:00+01:00
4,bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbb04,bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb,MEAL_LUNCH,"Kartoffeln (150 g), Quark 250 g, Bohnen 250g ,...",2021-02-19T12:45:00+01:00


In [6]:
c_activities_all.head()

Unnamed: 0,id,user_id,record_type,description,timestamp_start
0,cccccccc-cccc-cccc-cccc-cccccccccc00,cccccccc-cccc-cccc-cccc-cccccccccccc,MEAL_BREAKFAST,"3 Vollkorntoast, 75g Schmelzkäse, 35g Butter, ...",2021-02-17T06:00:00+01:00
1,cccccccc-cccc-cccc-cccc-cccccccccc01,cccccccc-cccc-cccc-cccc-cccccccccccc,MEAL_LUNCH,"350g Spinat,250g Kartoffeln, 2 Spiegeleier, 40...",2021-02-17T13:45:00+01:00
2,cccccccc-cccc-cccc-cccc-cccccccccc02,cccccccc-cccc-cccc-cccc-cccccccccccc,MEAL_DINNER,"1 Vollkornbrot, 25g Butter, 65g Romadur, 20g E...",2021-02-17T19:45:00+01:00
3,cccccccc-cccc-cccc-cccc-cccccccccc03,cccccccc-cccc-cccc-cccc-cccccccccccc,MEAL_LUNCH,"140g gebackener Leberkäse, 150g Erbsen-Möhren,...",2021-02-19T11:45:00+01:00
4,cccccccc-cccc-cccc-cccc-cccccccccc04,cccccccc-cccc-cccc-cccc-cccccccccccc,MEAL_DINNER,"1 Vollkornbrötchen, 70g Leberkäse, 100g Mixed ...",2021-02-19T18:30:00+01:00


In [7]:
a_levels_all.head()

Unnamed: 0,Gerät,Seriennummer,Gerätezeitstempel,Aufzeichnungstyp,Glukosewert-Verlauf mg/dL,Glukose-Scan mg/dL
0,FreeStyle LibreLink,1D48A10E-DDFB-4888-8158-026F08814832,18-02-2021 10:57,0,77.0,
1,FreeStyle LibreLink,1D48A10E-DDFB-4888-8158-026F08814832,18-02-2021 11:12,0,78.0,
2,FreeStyle LibreLink,1D48A10E-DDFB-4888-8158-026F08814832,18-02-2021 11:27,0,78.0,
3,FreeStyle LibreLink,1D48A10E-DDFB-4888-8158-026F08814832,18-02-2021 11:42,0,76.0,
4,FreeStyle LibreLink,1D48A10E-DDFB-4888-8158-026F08814832,18-02-2021 11:57,0,75.0,


In [17]:
b_levels_all.head()

Unnamed: 0,Gerät,Seriennummer,Gerätezeitstempel,Aufzeichnungstyp,Glukosewert-Verlauf mg/dL,Glukose-Scan mg/dL
0,FreeStyle LibreLink,e09bb0f0-018b-429b-94c7-62bb306a0136,2021-02-10 09:08:00+00:00,0,138.0,
1,FreeStyle LibreLink,e09bb0f0-018b-429b-94c7-62bb306a0136,2021-02-10 09:25:00+00:00,0,139.0,
2,FreeStyle LibreLink,e09bb0f0-018b-429b-94c7-62bb306a0136,2021-02-10 09:40:00+00:00,0,139.0,
3,FreeStyle LibreLink,e09bb0f0-018b-429b-94c7-62bb306a0136,2021-02-10 09:55:00+00:00,0,138.0,
4,FreeStyle LibreLink,e09bb0f0-018b-429b-94c7-62bb306a0136,2021-02-10 10:10:00+00:00,0,140.0,


In [9]:
c_levels_all.head()

Unnamed: 0,Gerät,Seriennummer,Gerätezeitstempel,Aufzeichnungstyp,Glukosewert-Verlauf mg/dL,Glukose-Scan mg/dL
0,FreeStyle LibreLink,e09bb0f0-018b-429b-94c7-62bb306a0136,10-02-2021 09:08,0,138.0,
1,FreeStyle LibreLink,e09bb0f0-018b-429b-94c7-62bb306a0136,10-02-2021 09:25,0,139.0,
2,FreeStyle LibreLink,e09bb0f0-018b-429b-94c7-62bb306a0136,10-02-2021 09:40,0,139.0,
3,FreeStyle LibreLink,e09bb0f0-018b-429b-94c7-62bb306a0136,10-02-2021 09:55,0,138.0,
4,FreeStyle LibreLink,e09bb0f0-018b-429b-94c7-62bb306a0136,10-02-2021 10:10,0,140.0,


# Data normalization
## 1 - Convert UTC+1 to UTC

In [10]:
def iso_to_utc_date(df, col):
    # Parse ISO 8601 strings (including their UTC offset) and convert them to UTC datetimes
    return [dateutil.parser.isoparse(date).astimezone(timezone.utc) for date in df[col]]

In [11]:
a_activities_all["timestamp_start"].head()

0     2021-02-15T08:30:00+01:00
1     2021-02-15T12:45:00+01:00
2     2021-02-15T16:15:00+01:00
3     2021-02-15T17:00:00+01:00
4     2021-02-15T19:30:00+01:00
5     2021-02-17T08:15:00+01:00
6     2021-02-17T08:30:00+01:00
7     2021-02-17T12:15:00+01:00
8     2021-02-17T16:00:00+01:00
9     2021-02-17T19:30:00+01:00
10    2021-02-19T08:30:00+01:00
11    2021-02-19T12:00:00+01:00
12    2021-02-19T15:45:00+01:00
13    2021-02-19T19:00:00+01:00
Name: timestamp_start, dtype: object

In [12]:
a_activities_all["timestamp_start"] = iso_to_utc_date(a_activities_all, "timestamp_start")
b_activities_all["timestamp_start"] = iso_to_utc_date(b_activities_all, "timestamp_start")
c_activities_all["timestamp_start"] = iso_to_utc_date(c_activities_all, "timestamp_start")

In [13]:
a_activities_all["timestamp_start"].head()

0    2021-02-15 07:30:00+00:00
1    2021-02-15 11:45:00+00:00
2    2021-02-15 15:15:00+00:00
3    2021-02-15 16:00:00+00:00
4    2021-02-15 18:30:00+00:00
5    2021-02-17 07:15:00+00:00
6    2021-02-17 07:30:00+00:00
7    2021-02-17 11:15:00+00:00
8    2021-02-17 15:00:00+00:00
9    2021-02-17 18:30:00+00:00
10   2021-02-19 07:30:00+00:00
11   2021-02-19 11:00:00+00:00
12   2021-02-19 14:45:00+00:00
13   2021-02-19 18:00:00+00:00
Name: timestamp_start, dtype: datetime64[ns, UTC]
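
The same conversion can also be written without an explicit loop. The sketch below assumes that every value is an ISO 8601 string with a UTC offset, as in the output above; the function name `iso_to_utc_series` is hypothetical. With `utc=True`, `pd.to_datetime` takes the offset into account and returns timezone-aware UTC timestamps.

```python
import pandas as pd


def iso_to_utc_series(series: pd.Series) -> pd.Series:
    """Parse ISO 8601 strings with a UTC offset and convert them to UTC."""
    return pd.to_datetime(series, utc=True)


# Two values copied from the `timestamp_start` column shown above.
sample = pd.Series(["2021-02-15T08:30:00+01:00", "2021-02-15T12:45:00+01:00"])
converted = iso_to_utc_series(sample)
```

The vectorised call returns a `Series` directly, so it can be assigned back to the column in the same way as the list returned by `iso_to_utc_date`.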

## 2 - Parse dd-mm-yyyy timestamps as UTC

In [14]:
def str_to_utc(df, col):
    # The timestamps are already in UTC, so only the timezone info is attached (no shift)
    return [datetime.strptime(date, "%d-%m-%Y %H:%M").replace(tzinfo=timezone.utc) for date in df[col]]

In [15]:
a_levels_all["Gerätezeitstempel"] = str_to_utc(a_levels_all, "Gerätezeitstempel")
b_levels_all["Gerätezeitstempel"] = str_to_utc(b_levels_all, "Gerätezeitstempel")
c_levels_all["Gerätezeitstempel"] = str_to_utc(c_levels_all, "Gerätezeitstempel")

In [16]:
a_levels_all["Gerätezeitstempel"]

0      2021-02-18 10:57:00+00:00
1      2021-02-18 11:12:00+00:00
2      2021-02-18 11:27:00+00:00
3      2021-02-18 11:42:00+00:00
4      2021-02-18 11:57:00+00:00
                  ...           
1194   2021-02-18 09:42:00+00:00
1195   2021-02-18 09:57:00+00:00
1196   2021-02-18 10:12:00+00:00
1197   2021-02-18 10:27:00+00:00
1198   2021-02-18 10:42:00+00:00
Name: Gerätezeitstempel, Length: 1199, dtype: datetime64[ns, UTC]
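
In the output above the first row (10:57) is later than the last row (10:42) on the same day, so the rows do not appear to be in chronological order. Sorting by timestamp before plotting avoids lines that jump back in time. The sketch below combines parsing and sorting; the helper name `parse_and_sort_levels` and the toy values are assumptions, and it relies on the timestamps being strings in `dd-mm-yyyy HH:MM` form that are already in UTC.

```python
import pandas as pd


def parse_and_sort_levels(levels: pd.DataFrame, col: str = "Gerätezeitstempel") -> pd.DataFrame:
    """Parse dd-mm-yyyy HH:MM strings as UTC and order the rows chronologically."""
    parsed = levels.copy()
    # utc=True marks the naive timestamps as UTC without shifting them.
    parsed[col] = pd.to_datetime(parsed[col], format="%d-%m-%Y %H:%M", utc=True)
    return parsed.sort_values(col).reset_index(drop=True)


# Toy rows (hypothetical glucose values) in non-chronological order.
toy = pd.DataFrame({
    "Gerätezeitstempel": ["18-02-2021 10:57", "18-02-2021 09:42", "18-02-2021 10:42"],
    "Glukosewert-Verlauf mg/dL": [77.0, 80.0, 79.0],
})
ordered = parse_and_sort_levels(toy)
```

The function works on a copy, so the original frame is left untouched and the step can be re-run safely in a notebook.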

# Visualization

In [None]:
# Get record types: factorize() returns a tuple of (integer codes, unique record types)
report_type = pd.concat([a_activities_all["record_type"], b_activities_all["record_type"], c_activities_all["record_type"]]).factorize()
report_type
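
The activity files contain more than meals (for example `DRINK` and `ACTVITY_EASY` in the previews above), while the tasks focus on meals. One way to separate them, sketched under the assumption that every meal record type starts with `MEAL_`, is shown below; the helper name `meals_only` is hypothetical, and whether drinks should also count as meals is left open.

```python
import pandas as pd


def meals_only(activities: pd.DataFrame) -> pd.DataFrame:
    """Keep only the rows whose record_type starts with 'MEAL_'."""
    return activities[activities["record_type"].str.startswith("MEAL_")]


# Record types taken from the previews above, in a toy frame.
toy_activities = pd.DataFrame({
    "record_type": ["MEAL_BREAKFAST", "DRINK", "ACTVITY_EASY", "MEAL_DINNER"],
})
meal_types = meals_only(toy_activities)["record_type"].unique().tolist()
```

Filtering by prefix keeps the original spelling of each record type, so the result can be used directly as a grouping key for the post-meal plots.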