### Imports

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("../data/CS_FRMGHAM2_CLEAN_TRAIN.csv")
df.head()

Unnamed: 0,RANDID,TOTCHOL,SYSBP,BMI,DIABETES,PREVSTRK,PREVHYP,PERIOD,ANYCHD,GLUCOSE
0,2448,195,106.0,26.97,No,0,0,1,Yes,77
1,6238,250,121.0,28.73,No,0,0,1,No,76
2,6238,260,105.0,29.43,No,0,0,2,No,86
3,6238,237,108.0,28.5,No,0,0,3,No,71
4,9428,245,127.5,25.34,No,0,0,1,No,70


### Transformation

Drop the measurement columns (`TOTCHOL`, `SYSBP`, `BMI`, `GLUCOSE`), which are not used when building the transactions.

In [3]:
df = df.drop(["TOTCHOL", "SYSBP", "BMI", "GLUCOSE"], axis=1)
df.head()

Unnamed: 0,RANDID,DIABETES,PREVSTRK,PREVHYP,PERIOD,ANYCHD
0,2448,No,0,0,1,Yes
1,6238,No,0,0,1,No
2,6238,No,0,0,2,No
3,6238,No,0,0,3,No
4,9428,No,0,0,1,No


Each patient (`RANDID`) appears in multiple rows, one per examination period (`PERIOD`).

In [4]:
df.sort_values("RANDID").head(10)

Unnamed: 0,RANDID,DIABETES,PREVSTRK,PREVHYP,PERIOD,ANYCHD
0,2448,No,0,0,1,Yes
1,6238,No,0,0,1,No
2,6238,No,0,0,2,No
3,6238,No,0,0,3,No
4,9428,No,0,0,1,No
5,9428,No,0,0,2,No
6,10552,No,0,1,1,No
7,10552,No,0,1,2,No
8,11252,No,0,0,1,No
9,11252,No,0,0,2,No
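
As a quick check of this layout, the number of rows per patient can be counted, together with a test that no (`RANDID`, `PERIOD`) pair is repeated. This is a minimal sketch on a small frame that copies the first few `RANDID`/`PERIOD` values shown above; the names `demo`, `rows_per_patient` and `has_duplicates` are illustrative and not part of the notebook.

```python
import pandas as pd

# Small stand-in for df, copying RANDID/PERIOD values from the output above.
demo = pd.DataFrame({
    "RANDID": [2448, 6238, 6238, 6238, 9428, 9428],
    "PERIOD": [1, 1, 2, 3, 1, 2],
})

# Number of examination rows recorded for each patient.
rows_per_patient = demo.groupby("RANDID").size()

# True if any patient has the same examination period listed twice.
has_duplicates = demo.duplicated(["RANDID", "PERIOD"]).any()

print(rows_per_patient)
print(has_duplicates)
```

On the real data, the same two calls can be run against `df` directly.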


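To prepare for the transformation to transactions, each patient is treated as one transaction, so the rows are collapsed to one per `RANDID`. The remaining attributes are aggregated with their maximum value: for the 0/1 columns this is 1 if the condition was recorded in any period, and for the `Yes`/`No` columns the maximum is lexicographic, so `Yes` wins over `No`. This matches the values from the latest examination period only as long as a condition, once recorded, is never reset in a later period.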

In [5]:
df_agg = df.groupby("RANDID").max().reset_index()
df_agg.head()

Unnamed: 0,RANDID,DIABETES,PREVSTRK,PREVHYP,PERIOD,ANYCHD
0,2448,No,0,0,1,Yes
1,6238,No,0,0,3,No
2,9428,No,0,0,2,No
3,10552,No,0,1,2,No
4,11252,No,0,0,2,No
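
To illustrate how `max()` behaves here, the sketch below compares it with explicitly taking the row from the latest period. The frame `demo` is hypothetical (it is not taken from the dataset) and deliberately contains a patient whose `DIABETES` flag is `Yes` in period 2 but `No` in period 3, so the two approaches disagree.

```python
import pandas as pd

# Hypothetical data: patient 1 is "Yes" in period 2 but "No" in period 3.
demo = pd.DataFrame({
    "RANDID": [1, 1, 1, 2, 2],
    "PERIOD": [1, 2, 3, 1, 2],
    "DIABETES": ["No", "Yes", "No", "No", "No"],
    "PREVHYP": [0, 0, 1, 0, 0],
})

# max(): "Yes" > "No" as strings and 1 > 0 as numbers, so a condition
# is kept if it was recorded in any period.
ever = demo.groupby("RANDID").max().reset_index()

# Latest period: sort by PERIOD and keep the last row of each patient.
latest = (
    demo.sort_values(["RANDID", "PERIOD"])
    .groupby("RANDID")
    .tail(1)
    .reset_index(drop=True)
)

print(ever)
print(latest)
```

For patient 1, `ever` reports `DIABETES` as `Yes` while `latest` reports `No`. If the conditions in the real data never revert between periods, both approaches give the same result.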


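Transform the aggregated data into transactions: one row per (`RANDID`, `CONDITION`) pair for every condition a patient has. Patients with none of the four conditions produce no rows.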

In [6]:
# Each condition column and the value that marks it as present.
conditions = {"DIABETES": "Yes", "PREVSTRK": 1, "PREVHYP": 1, "ANYCHD": "Yes"}

# Collect one row per condition a patient has, then build the DataFrame
# once instead of calling pd.concat inside the loop.
rows = []
for _, data in df_agg.iterrows():
  for condition, present in conditions.items():
    if data[condition] == present:
      rows.append({"RANDID": data["RANDID"], "CONDITION": condition})

df_trans = pd.DataFrame(rows, columns=["RANDID", "CONDITION"])

df_trans.head(10)

Unnamed: 0,RANDID,CONDITION
0,2448,ANYCHD
1,10552,PREVHYP
2,11263,DIABETES
3,11263,PREVHYP
4,11263,ANYCHD
5,12629,PREVHYP
6,12629,ANYCHD
7,14367,PREVHYP
8,16365,PREVHYP
9,20375,PREVHYP
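
The same long format could also be produced without an explicit loop. The sketch below is one possible approach using `DataFrame.melt`, shown on a small hypothetical frame `df_agg_demo` (its values are illustrative, not from the dataset); the names `flags`, `long_form` and `trans_demo` are likewise assumptions for this example.

```python
import pandas as pd

# Hypothetical stand-in for df_agg with the same columns.
df_agg_demo = pd.DataFrame({
    "RANDID": [1, 2, 3],
    "DIABETES": ["No", "Yes", "No"],
    "PREVSTRK": [0, 0, 0],
    "PREVHYP": [0, 1, 1],
    "PERIOD": [1, 3, 2],
    "ANYCHD": ["Yes", "Yes", "No"],
})

# One boolean flag per condition, in the order used by the notebook.
flags = pd.DataFrame({
    "RANDID": df_agg_demo["RANDID"],
    "DIABETES": df_agg_demo["DIABETES"].eq("Yes"),
    "PREVSTRK": df_agg_demo["PREVSTRK"].eq(1),
    "PREVHYP": df_agg_demo["PREVHYP"].eq(1),
    "ANYCHD": df_agg_demo["ANYCHD"].eq("Yes"),
})

# Reshape to one row per (patient, condition) and keep only present ones.
long_form = flags.melt(id_vars="RANDID", var_name="CONDITION", value_name="PRESENT")
trans_demo = (
    long_form[long_form["PRESENT"]]
    .drop(columns="PRESENT")
    .sort_values("RANDID", kind="stable")  # stable sort keeps condition order
    .reset_index(drop=True)
)

print(trans_demo)
```

The stable sort keeps the conditions of each patient in column order, so the rows come out in the same order as the loop produces when `df_agg` is sorted by `RANDID`.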


### Exports

In [7]:
df_trans.to_csv("../data/CS_FRMGHAM2_TRANS.csv", index=False)