## This notebook merges the information from:
- longitudinal_data.csv
- longitudinal_events.csv
- medications.csv
- prozeduren.csv

### Problem

The longitudinal data contains all patients from the longitudinal cohort and needs to be joined with an injections table. 

Currently, medications.csv is merged into longitudinal_events.csv, which is then aggregated into the MeasurementSeq object.

The main problem is that medications.csv is incomplete. More accurate information on administered injections is found in the procedures table (prozeduren.csv), in records with code 5-156.9 in the ICPML column.

**Issue 1:**

We need to find out which of the exported OCTs are treatment naive w.r.t. previously administered injections at the LMU eye clinic.

**Issue 2:**

We need to export a csv file with patient_id, laterality, first_recorded_injection_date, for all LMU naive patients.

**Issue 3:**

Produce a list of all sequences not naive to LMU eye clinic and investigate if more OCTs for these sequences can be exported.

**Issue 4:**

Generate a new version of the longitudinal events csv that merges the information from the OPS code table and the medications.csv table.

In [1]:
import os
import glob
import pandas as pd
import numpy as np
import seaborn as sns

WORK_SPACE = "/home/olle/PycharmProjects/LODE/workspace"

longitudinal_pd = pd.read_csv(os.path.join(WORK_SPACE, "sequence_data/longitudinal_data.csv"))
segmentation_statistics_pd = pd.read_csv(os.path.join(WORK_SPACE, "sequence_data/segmentation_statistics_vol.csv"))

## load data csv files

#### longitudinal cohort data

Here we keep only AMD patients and only records where an OCT is available.

In [2]:
longitudinal_path = "/home/olle/PycharmProjects/LODE/workspace/sequence_data/longitudinal_data.csv"
longitudinal_pd = pd.read_csv(longitudinal_path)

longitudinal_pd = longitudinal_pd.loc[longitudinal_pd.diagnosis == "AMD"]
print(longitudinal_pd.shape)
longitudinal_pd = longitudinal_pd.loc[longitudinal_pd['oct?'] == True]
print(longitudinal_pd.shape)

(28304, 15)
(17683, 15)


In [3]:
longitudinal_pd.head()

Unnamed: 0.1,Unnamed: 0,patient_id,laterality,study_date,oct_path,fundus_path,thickness_path,visual_acuity,logMAR,oct?,visus?,thickness?,fundus?,diagnosis_raw,diagnosis
55,55,53955,L,2013-11-26,/storage/groups/ml01/datasets/raw/2018_LMUAuge...,/storage/groups/ml01/datasets/raw/2018_LMUAuge...,/storage/groups/ml01/datasets/projects/2018161...,,,True,False,True,True,AMD,AMD
56,56,53955,L,2013-04-30,/storage/groups/ml01/datasets/raw/2018_LMUAuge...,/storage/groups/ml01/datasets/raw/2018_LMUAuge...,/storage/groups/ml01/datasets/projects/2018161...,,,True,False,True,True,AMD,AMD
57,57,53955,L,2013-05-23,/storage/groups/ml01/datasets/raw/2018_LMUAuge...,/storage/groups/ml01/datasets/raw/2018_LMUAuge...,/storage/groups/ml01/datasets/projects/2018161...,,,True,False,True,True,AMD,AMD
58,58,53955,L,2014-04-24,/storage/groups/ml01/datasets/raw/2018_LMUAuge...,/storage/groups/ml01/datasets/raw/2018_LMUAuge...,/storage/groups/ml01/datasets/projects/2018161...,FZ,2.2,True,True,True,True,AMD,AMD
59,59,53955,L,2013-08-26,/storage/groups/ml01/datasets/raw/2018_LMUAuge...,/storage/groups/ml01/datasets/raw/2018_LMUAuge...,/storage/groups/ml01/datasets/projects/2018161...,,,True,False,True,True,AMD,AMD


#### longitudinal events

Here we load the data and filter for the records receiving injections

In [4]:
events_path = "/home/olle/PycharmProjects/LODE/workspace/sequence_data/longitudinal_events.csv"

events_pd = pd.read_csv(events_path)

# filter for only records which received an injection
injection_bool = events_pd["injection?"] == True

# take an explicit copy so that adding a column does not trigger SettingWithCopyWarning
events_inj_pd = events_pd.loc[injection_bool].copy()
events_inj_pd["study_date_dt"] = pd.to_datetime(events_inj_pd.study_date)

events_date_check = events_inj_pd.groupby(['patient_id', 'laterality'])['study_date_dt'].nsmallest(1) 

events_date_check = pd.DataFrame(events_date_check)
events_date_check = events_date_check.reset_index()[["patient_id", "laterality", "study_date_dt"]]


In [5]:
# save file
events_date_check.to_csv("/home/olle/PycharmProjects/LODE/workspace/sequence_data/events_check.csv")

In [6]:
# check first recorded injection
min(events_date_check.study_date_dt)

Timestamp('2013-10-11 00:00:00')
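
The "first recorded date per eye" lookup used above can also be written with `groupby(...).min()`. The following is a minimal sketch on made-up toy data (the frame `toy_events` and its values are assumptions for illustration, not rows from longitudinal_events.csv):

```python
import pandas as pd

# Toy stand-in for events_inj_pd; all values are invented for illustration.
toy_events = pd.DataFrame({
    "patient_id": [1, 1, 1, 2],
    "laterality": ["L", "L", "R", "L"],
    "study_date": ["2014-03-01", "2013-10-11", "2015-01-05", "2016-07-20"],
})
toy_events["study_date_dt"] = pd.to_datetime(toy_events["study_date"])

# Earliest date per eye; as_index=False keeps the keys as ordinary columns,
# so no reset_index / column re-selection is needed afterwards.
first_dates = (
    toy_events
    .groupby(["patient_id", "laterality"], as_index=False)["study_date_dt"]
    .min()
)
print(first_dates)
```

This yields one row per (patient_id, laterality) pair, which is the same shape the `nsmallest(1)` plus `reset_index()` steps produce.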

#### medications

In [7]:
med_path = "/home/olle/PycharmProjects/LODE/workspace/dwh_tables/medications.csv"
med_pd = pd.read_csv(med_path)

med_pd["study_date_dt"] = pd.to_datetime(med_pd.DAT)

med_pd_check = med_pd.groupby(['PATNR', 'AUGE'])['study_date_dt'].nsmallest(1) 

In [8]:
med_pd_check = pd.DataFrame(med_pd_check)
med_pd_check = med_pd_check.reset_index()[["PATNR", "AUGE", "study_date_dt"]]

In [9]:
# confirm that first recorded date should be the same as for longitudinal_events.csv above
min(med_pd_check.study_date_dt)

Timestamp('2013-10-11 00:00:00')

#### OPS codes

Here we filter for records with the injection code 5-156.9 in the ICPML column.

In [10]:
proz_path = "/home/olle/PycharmProjects/LODE/workspace/dwh_tables/prozeduren.csv"

proz_pd = pd.read_csv(proz_path)
# events_pd.groupby(["patient_id", "laterality"]).min("study_date")

proz_pd["study_date_dt"] = pd.to_datetime(proz_pd.DAT)

injections_bool = proz_pd.ICPML == "5-156.9"

proz_inj_pd = proz_pd.loc[injections_bool]

proz_inj_check = proz_inj_pd.groupby(['PATNR', 'LOK'])['study_date_dt'].nsmallest(1) 

In [11]:
proz_pd.PATNR.drop_duplicates()

0              3
1              6
8              7
17            14
24            15
           ...  
910340    376141
910341    376144
910354    376145
910355    376146
910362    376148
Name: PATNR, Length: 120508, dtype: int64

In [12]:
proz_inj_check = pd.DataFrame(proz_inj_check)
proz_inj_check = proz_inj_check.reset_index()[["PATNR", "LOK", "study_date_dt"]]

In [13]:
# confirm that the first observed is before that of the event and medication file
min(proz_inj_check.study_date_dt)

Timestamp('2005-01-03 00:00:00')

#### Join Proz and longitudinal data & medications table

we extract all eyes in the longitudinal cohort for joining against the proz and med tables

In [14]:
longitudinal_pd["study_date_dt"] = pd.to_datetime(longitudinal_pd.study_date)

longitudinal_eyes = longitudinal_pd[["patient_id", "laterality"]].drop_duplicates()

In [15]:
long_keys = ["patient_id", "laterality"]

med_keys = ['PATNR', 'AUGE']
proz_keys = ['PATNR', 'LOK']

# get all injection from med table for the eyes in the longitudinal cohort
long_med = pd.merge(longitudinal_eyes, med_pd, left_on=long_keys, right_on=med_keys, how="inner")
long_med = long_med[["patient_id", "laterality", "study_date_dt", "MED"]]

# get all injection from proz table for the eyes in the longitudinal cohort
long_proz = pd.merge(longitudinal_eyes, proz_inj_pd, left_on=long_keys, right_on=proz_keys, how="inner")
long_proz = long_proz[["patient_id", "laterality", "study_date_dt", "ICPML"]]

print("Number of eyes in cohort", longitudinal_eyes.shape[0])
print("Number of injections for these eyes in med table", long_med.shape[0])
print("Number of injections in the proz table", long_proz.shape[0])

Number of eyes in cohort 1090
Number of injections for these eyes in med table 3094
Number of injections in the proz table 16514


In [16]:
# drop any rows without data
long_proz = long_proz.dropna(subset=["ICPML", "study_date_dt"])
long_med = long_med.dropna(subset=["MED", "study_date_dt"])

print("get shapes after dropping nans")
print(long_med.shape[0], long_proz.shape[0])

get shapes after dropping nans
3073 16514
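
The join of cohort eyes against a table with differently named key columns can be sketched on toy data as follows. The frames `eyes` and `proc` and their values are assumptions for illustration; only the column names follow the notebook:

```python
import pandas as pd

# Toy cohort: one row per eye.
eyes = pd.DataFrame({"patient_id": [1, 2], "laterality": ["L", "R"]})

# Toy procedures table using the DWH column names (PATNR, LOK, DAT, ICPML).
proc = pd.DataFrame({
    "PATNR": [1, 1, 2, 3],
    "LOK": ["L", "L", "L", "R"],
    "DAT": ["2008-01-07", "2008-02-11", "2010-05-05", "2011-01-01"],
    "ICPML": ["5-156.9"] * 4,
})
proc["study_date_dt"] = pd.to_datetime(proc["DAT"])

# Inner join: keep only procedure rows that belong to an eye in the cohort.
joined = pd.merge(
    eyes, proc,
    left_on=["patient_id", "laterality"],
    right_on=["PATNR", "LOK"],
    how="inner",
)
joined = joined[["patient_id", "laterality", "study_date_dt", "ICPML"]]
joined = joined.dropna(subset=["ICPML", "study_date_dt"])

n_injections = joined.shape[0]
n_eyes_with_injection = joined.drop_duplicates(["patient_id", "laterality"]).shape[0]
print(n_injections, n_eyes_with_injection)
```

In the toy data only eye (1, L) has matching procedure rows, so the injection count and the eye count differ, which is the same distinction the notebook draws between rows and eyes.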


#### Show number of eyes in longitudinal cohort that have injection data in med and proz table

We see that the med table only has injection data for 428 out of 1090 eyes, while the Proz table has injection data for 1083 of 1090 eyes. Clearly most of the AMD sequences are actually treated with injections, and that data is available in the Proz table.

In [17]:
print("Number of longitudinal eyes with injections in MED table", 
      long_med.drop_duplicates(long_keys).shape[0])

print("Number of longitudinal eyes with injections in Proz table", 
      long_proz.drop_duplicates(long_keys).shape[0])

Number of longitudinal eyes with injections in MED table 428
Number of longitudinal eyes with injections in Proz table 1083


In [18]:
long_proz.head()

Unnamed: 0,patient_id,laterality,study_date_dt,ICPML
0,53955,L,2007-11-29,5-156.9
1,53955,L,2008-01-07,5-156.9
2,53955,L,2008-02-11,5-156.9
3,53955,L,2008-06-24,5-156.9
4,53955,L,2008-09-08,5-156.9


#### find dates that are in Proz but not in MED

In [19]:
id_proz = long_proz.patient_id.astype(str) + "_" + long_proz.laterality + "_" + \
long_proz.study_date_dt.astype(str)

id_med = long_med.patient_id.astype(str) + "_" + long_med.laterality + "_" + \
long_med.study_date_dt.astype(str)

In [20]:
print(f"of  {id_proz.shape[0]} unique injections in procedure table; {sum(id_proz.isin(id_med))} are found in the medication table")

print(f"of  {id_med.shape[0]} unique injections in medication table; {sum(id_med.isin(id_proz))} are found in the procedure table")

of  16514 unique injections in procedure table; 1686 are found in the medication table
of  3073 unique injections in medication table; 2704 are found in the procedure table


Here we see that the medication table contains only a small share of the injections recorded in the procedure table (1686 of 16514), but there are also injections recorded in the medication table that are missing from the procedure table (369 of 3073). We will therefore use the information from both tables.
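
An alternative to comparing concatenated string ids is an outer merge with `indicator=True`, which labels each key as present in the left table, the right table, or both. A minimal sketch on invented toy frames (`toy_proz`, `toy_med` are assumptions, not the real tables):

```python
import pandas as pd

keys = ["patient_id", "laterality", "study_date_dt"]

toy_proz = pd.DataFrame({
    "patient_id": [1, 1, 2],
    "laterality": ["L", "L", "R"],
    "study_date_dt": pd.to_datetime(["2014-01-10", "2014-02-14", "2015-03-03"]),
})
toy_med = pd.DataFrame({
    "patient_id": [1, 2],
    "laterality": ["L", "R"],
    "study_date_dt": pd.to_datetime(["2014-02-14", "2016-06-06"]),
})

# De-duplicate on the keys first so each injection date is counted once.
overlap = pd.merge(
    toy_proz.drop_duplicates(keys),
    toy_med.drop_duplicates(keys),
    on=keys, how="outer", indicator=True,
)

# "_merge" is left_only (Proz only), right_only (MED only) or both.
counts = overlap["_merge"].astype(str).value_counts().to_dict()
print(counts)
```

Because the keys are de-duplicated before merging, the counts refer to distinct (eye, date) combinations rather than raw rows.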

#### merge tables to final table

In [21]:
long_proz_keys = ["patient_id", "laterality", "study_date_dt"]
long_proz_med = pd.merge(long_proz, long_med, left_on = long_proz_keys, right_on = long_proz_keys, how="outer")

In [22]:
patient_id = 53955
long_proz_med[long_proz_med.patient_id == patient_id]

Unnamed: 0,patient_id,laterality,study_date_dt,ICPML,MED
0,53955,L,2007-11-29,5-156.9,
1,53955,L,2008-01-07,5-156.9,
2,53955,L,2008-02-11,5-156.9,
3,53955,L,2008-06-24,5-156.9,
4,53955,L,2008-09-08,5-156.9,
5,53955,L,2008-12-18,5-156.9,
6,53955,L,2009-01-29,5-156.9,
7,53955,L,2009-03-04,5-156.9,
8,53955,L,2009-06-26,5-156.9,
9,53955,L,2009-07-24,5-156.9,


#### join in OCT data

In [23]:
longitudinal_pd.head(), longitudinal_pd.shape

(    Unnamed: 0  patient_id laterality  study_date  \
 55          55       53955          L  2013-11-26   
 56          56       53955          L  2013-04-30   
 57          57       53955          L  2013-05-23   
 58          58       53955          L  2014-04-24   
 59          59       53955          L  2013-08-26   
 
                                              oct_path  \
 55  /storage/groups/ml01/datasets/raw/2018_LMUAuge...   
 56  /storage/groups/ml01/datasets/raw/2018_LMUAuge...   
 57  /storage/groups/ml01/datasets/raw/2018_LMUAuge...   
 58  /storage/groups/ml01/datasets/raw/2018_LMUAuge...   
 59  /storage/groups/ml01/datasets/raw/2018_LMUAuge...   
 
                                           fundus_path  \
 55  /storage/groups/ml01/datasets/raw/2018_LMUAuge...   
 56  /storage/groups/ml01/datasets/raw/2018_LMUAuge...   
 57  /storage/groups/ml01/datasets/raw/2018_LMUAuge...   
 58  /storage/groups/ml01/datasets/raw/2018_LMUAuge...   
 59  /storage/groups/ml01/datasets

In [24]:
longitudinal_slim_pd = longitudinal_pd[["patient_id", "laterality", "study_date_dt", "oct?"]]

In [25]:
join_keys = ["patient_id", "laterality", "study_date_dt"]
long_proz_med_oct = pd.merge(long_proz_med, longitudinal_slim_pd, left_on = join_keys, 
                             right_on = join_keys, how="outer")


Below is an example of a patient who is not treatment naive at the LMU clinic: the injections start years before the first exported OCT.

In [26]:
long_proz_med_oct[long_proz_med_oct.patient_id == patient_id]

Unnamed: 0,patient_id,laterality,study_date_dt,ICPML,MED,oct?
0,53955,L,2007-11-29,5-156.9,,
1,53955,L,2008-01-07,5-156.9,,
2,53955,L,2008-02-11,5-156.9,,
3,53955,L,2008-06-24,5-156.9,,
4,53955,L,2008-09-08,5-156.9,,
5,53955,L,2008-12-18,5-156.9,,
6,53955,L,2009-01-29,5-156.9,,
7,53955,L,2009-03-04,5-156.9,,
8,53955,L,2009-06-26,5-156.9,,
9,53955,L,2009-07-24,5-156.9,,


#### get number of untreated eyes

In [27]:
untreated = long_proz_med_oct.groupby(["patient_id", "laterality"])[["ICPML", "MED"]].apply(lambda x: x.isnull().all())

# an eye is untreated only if it has no injection in the Proz (ICPML) table and none in the MED table
untreated_bool = untreated.ICPML & untreated.MED

print("number of untreated eyes", sum(untreated_bool))

number of untreated eyes 7
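
A minimal sketch of the untreated-eye check on toy data, assuming an eye counts as untreated only when neither ICPML nor MED is filled on any of its rows (the frame `toy` and its values are invented for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 3],
    "laterality": ["L", "L", "R", "R", "L"],
    "ICPML": ["5-156.9", None, None, None, None],
    "MED": [None, None, None, None, "Lucentis"],
})

# Row-level flag: an injection is recorded in at least one of the two sources.
toy["has_injection"] = toy[["ICPML", "MED"]].notna().any(axis=1)

# An eye is untreated when none of its rows carries an injection.
untreated = ~toy.groupby(["patient_id", "laterality"])["has_injection"].any()
print(untreated)
```

Eye (3, L) has no procedure code but does have a medication entry, so it is counted as treated here; checking the ICPML column alone would miss that case.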


In [28]:
# keep the keys as columns and store the flag under a name that matches its meaning
untreated_bool_pd = untreated_bool.rename("untreated").reset_index()

#### drop untreated eyes

In [29]:
longitudinal_slim_pd[longitudinal_slim_pd.patient_id == patient_id].head()

Unnamed: 0,patient_id,laterality,study_date_dt,oct?
55,53955,L,2013-11-26,True
56,53955,L,2013-04-30,True
57,53955,L,2013-05-23,True
58,53955,L,2014-04-24,True
59,53955,L,2013-08-26,True


In [30]:
print("Number of longitudinal eyes after joining the Procedure, MED and OCT tables", 
      long_proz_med_oct.drop_duplicates(long_keys).shape[0])

Number of longitudinal eyes after joining the Procedure, MED and OCT tables 1090


In [31]:
long_proz_med_first = long_proz_med_oct.groupby(['patient_id', 'laterality'])['study_date_dt'].nsmallest(1)
long_proz_med_first = pd.DataFrame(long_proz_med_first)
long_proz_med_first = long_proz_med_first.reset_index()[['patient_id', 'laterality', "study_date_dt"]]

In [32]:
print("Number of eyes with a first recorded date in the joined table:", long_proz_med_first.shape[0])

Number of eyes with a first recorded date in the joined table: 1090


#### compare patients

In [33]:
save_dir = "/home/olle/PycharmProjects/LODE/workspace/sequence_data_checks"
patient_id = 261772
long_proz_med_oct_patient = long_proz_med_oct[long_proz_med_oct.patient_id == patient_id]
long_proz_med_oct_patient.to_csv(os.path.join(save_dir, str(patient_id)+".csv"))
long_proz_med_oct_patient

Unnamed: 0,patient_id,laterality,study_date_dt,ICPML,MED,oct?
141,261772,R,2011-06-08,5-156.9,,
142,261772,R,2011-07-06,5-156.9,,
143,261772,R,2013-04-24,5-156.9,,
144,261772,R,2013-05-22,5-156.9,,
145,261772,R,2013-06-26,5-156.9,,
146,261772,R,2013-08-21,5-156.9,,
147,261772,R,2013-09-27,5-156.9,,
148,261772,R,2013-11-08,5-156.9,,
17940,261772,R,2013-08-01,,,True


In [34]:
not_naive_margin = 0
not_naive = 0
treated_eyes = 0

naive_to_lmu = [[], []]
not_naive_to_lmu = [[], []]

for row in long_proz_med_oct[["patient_id", "laterality"]].drop_duplicates().itertuples():
    long_proz_med_oct_patient = long_proz_med_oct[long_proz_med_oct.patient_id == row.patient_id]
    long_proz_med_oct_rec = long_proz_med_oct_patient[long_proz_med_oct_patient.laterality == row.laterality]
    
    injections = long_proz_med_oct_rec.dropna(subset=["ICPML", "MED"], how='all')
    oct_ = long_proz_med_oct_rec.dropna(subset=["oct?"])
    
    
    if oct_.size == 0:
        print("patient treated but has no OCT", row.patient_id, row.laterality)
        treated_eyes += 1
        continue
    
    if injections.size == 0:
        print("patient untreated", row.patient_id, row.laterality)
        continue
        
    delta = min(injections.study_date_dt) - min(oct_.study_date_dt)
    if delta.days < -90:
        not_naive_margin += 1

    if delta.days < 0:
        not_naive += 1
        not_naive_to_lmu[0].append(row.patient_id)
        not_naive_to_lmu[1].append(row.laterality)
    
    else:
        naive_to_lmu[0].append(row.patient_id)
        naive_to_lmu[1].append(row.laterality)
        
    treated_eyes += 1
        
print("not naive in eye clinic", not_naive, "not naive with 90 day margin", not_naive_margin)
print("treated cases", treated_eyes)

patient untreated 195054 L
patient untreated 33402 R
patient untreated 102325 L
patient untreated 319682 L
patient untreated 108005 R
patient untreated 299501 L
patient untreated 516 L
not naive in eye clinic 652 not naive with 90 day margin 593
treated cases 1083
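
The per-eye loop above compares the first injection date with the first OCT date. The same rule can be sketched without an explicit loop. The toy frame and the helper names `first_inj`, `first_oct` and `summary` are assumptions for illustration; the naive rule (first injection not before first OCT) is the one used in the loop:

```python
import pandas as pd

keys = ["patient_id", "laterality"]

toy = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2, 3],
    "laterality": ["L", "L", "L", "R", "R", "L"],
    "study_date_dt": pd.to_datetime([
        "2008-01-07", "2013-04-30", "2013-05-23",  # eye 1/L: injected long before first OCT
        "2017-05-23", "2017-07-04",                # eye 2/R: OCT first, injection later
        "2016-01-01",                              # eye 3/L: OCT only, never injected
    ]),
    "ICPML": ["5-156.9", None, None, None, "5-156.9", None],
    "MED": [None] * 6,
    "oct?": [None, True, True, True, True, True],
})

has_inj = toy[["ICPML", "MED"]].notna().any(axis=1)
first_inj = toy[has_inj].groupby(keys)["study_date_dt"].min().rename("first_injection")
first_oct = toy[toy["oct?"].notna()].groupby(keys)["study_date_dt"].min().rename("first_oct")

# Keep only eyes that have both an injection and an OCT.
summary = pd.concat([first_inj, first_oct], axis=1).dropna()
summary["delta_days"] = (summary["first_injection"] - summary["first_oct"]).dt.days
summary["naive_to_lmu"] = summary["delta_days"] >= 0
print(summary)
```

Eyes without any injection drop out at the `dropna()` step, which corresponds to the "patient untreated" branch of the loop.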


### Using the above information to answer Issues 1, 2, 3

**Issue 1:**

We need to find out which of the exported OCTs are treatment naive w.r.t. previously administered injections at the LMU eye clinic.

In [35]:
naive_to_lmu_pd = pd.DataFrame(naive_to_lmu).T
naive_to_lmu_pd = naive_to_lmu_pd.rename(columns={0:"patient_id", 1: "laterality"})

print("number of naive patient to LMU", naive_to_lmu_pd.shape[0])

number of naive patient to LMU 431


Example of an LMU treatment-naive patient:

In [36]:
patient_id = 502 # naive_to_lmu_pd.iloc[0].patient_id
laterality = "L" # naive_to_lmu_pd.iloc[0].laterality


long_proz_med_oct[(long_proz_med_oct.patient_id == patient_id) & (long_proz_med_oct.laterality == laterality)]

Unnamed: 0,patient_id,laterality,study_date_dt,ICPML,MED,oct?
13815,502,L,2017-07-04,5-156.9,,True
13816,502,L,2017-08-29,5-156.9,,True
13817,502,L,2017-10-24,5-156.9,,True
13818,502,L,2018-01-16,5-156.9,,True
13819,502,L,2018-04-10,5-156.9,,True
13820,502,L,2018-07-03,5-156.9,,True
13821,502,L,2018-09-18,5-156.9,,
25658,502,L,2017-05-23,,,True
25659,502,L,2016-08-01,,,True
25660,502,L,2016-09-06,,,True


**Issue 2:**

We need to export a csv file with patient_id, laterality, first_recorded_injection_date, for all LMU naive patients.

In [37]:
keys = ["patient_id", "laterality"]
lmu_naive = pd.merge(naive_to_lmu_pd, long_proz_med_oct, left_on=keys, right_on=keys, 
                                     how="left")

# consider only dates where an injection has been administered
lmu_naive = lmu_naive.dropna(subset=["ICPML", "MED"], how="all")

lmu_naive_ = lmu_naive.groupby(['patient_id', 'laterality'])['study_date_dt'].nsmallest(1)

lmu_naive_pd = pd.DataFrame(lmu_naive_)
lmu_naive_pd = lmu_naive_pd.reset_index()[["patient_id", "laterality", "study_date_dt"]]

In [38]:
lmu_naive_pd

Unnamed: 0,patient_id,laterality,study_date_dt
0,18,R,2017-09-26
1,502,L,2017-07-04
2,709,L,2016-08-08
3,1263,L,2015-07-31
4,1263,R,2015-08-14
...,...,...,...
426,359944,R,2018-07-19
427,360270,L,2017-11-30
428,363725,R,2018-02-08
429,365435,L,2018-03-06


View a few examples to see if the date was calculated correctly

In [39]:
patient_id = 1263 # naive_to_lmu_pd.iloc[0].patient_id
laterality = "L" # naive_to_lmu_pd.iloc[0].laterality


long_proz_med_oct[(long_proz_med_oct.patient_id == patient_id) & (long_proz_med_oct.laterality == laterality)]

Unnamed: 0,patient_id,laterality,study_date_dt,ICPML,MED,oct?
2204,1263,L,2015-07-31,5-156.9,,
2205,1263,L,2015-08-28,5-156.9,,
2206,1263,L,2015-09-25,5-156.9,,
2207,1263,L,2016-02-29,5-156.9,,True
2208,1263,L,2016-04-01,5-156.9,,True
2209,1263,L,2016-04-27,5-156.9,,True
2210,1263,L,2016-10-21,5-156.9,,
2211,1263,L,2016-11-18,5-156.9,,True
2212,1263,L,2016-12-19,5-156.9,,True
2213,1263,L,2017-02-08,5-156.9,,True


In [40]:
patient_id = 363725 # naive_to_lmu_pd.iloc[0].patient_id
laterality = "R" # naive_to_lmu_pd.iloc[0].laterality

long_proz_med_oct[(long_proz_med_oct.patient_id == patient_id) & (long_proz_med_oct.laterality == laterality)]

Unnamed: 0,patient_id,laterality,study_date_dt,ICPML,MED,oct?
16785,363725,R,2018-02-08,5-156.9,,True
16786,363725,R,2018-03-08,5-156.9,,True
16787,363725,R,2018-04-09,5-156.9,,True
16788,363725,R,2018-05-09,5-156.9,,
16789,363725,R,2018-06-07,5-156.9,,True
16790,363725,R,2018-07-05,5-156.9,,True
27474,363725,R,2018-05-08,,,True
27475,363725,R,2018-01-22,,,True
27476,363725,R,2018-08-02,,,True


This seems correct; we save the file for manual verification of naivety

In [41]:
# rename column
lmu_naive_pd = lmu_naive_pd.rename(columns={"study_date_dt": "first_injection_date"})
lmu_naive_pd.to_csv(os.path.join(save_dir, "check_naive_patients.csv"))

**Issue 3:**

Produce a list of all sequences not naive to LMU eye clinic and investigate if more OCTs for these sequences can be exported.

In [42]:
#### get eyes not naive to LMU eye clinic

In [53]:
# filter out any patients naive to LMU from current batch
lmu_naive_bool = long_proz_med_oct[["patient_id"]].isin(lmu_naive_pd["patient_id"].tolist())
long_prozedure_not_naive = long_proz_med_oct[~lmu_naive_bool["patient_id"]]

print("Number of dates with not naive records ", long_prozedure_not_naive.shape[0])

print("Number of patients with not naive records ", 
      long_prozedure_not_naive.patient_id.drop_duplicates().shape[0])

# remove all patients with no treatment

# drop any date with no injection
injected_dates = long_prozedure_not_naive.dropna(subset=["ICPML", "MED"], how='all')

# get patients with treatment
treated_not_naive_patients = injected_dates.patient_id.drop_duplicates()

treated_patient_bool = long_prozedure_not_naive.patient_id.isin(treated_not_naive_patients)

long_prozedure_not_naive_treated = long_prozedure_not_naive[treated_patient_bool]

print("Number of dates with not naive treated records ", long_prozedure_not_naive_treated.shape[0])


print("Number of sequences with not naive treated records ", 
      long_prozedure_not_naive_treated[["patient_id", "laterality"]].drop_duplicates().shape[0])


print("Number of patients with not naive treated records ", 
      long_prozedure_not_naive_treated.patient_id.drop_duplicates().shape[0])

long_prozedure_not_naive_treated.patient_id.drop_duplicates().reset_index().to_csv("patients_tb_exported.csv")

Number of dates with not naive records  15079
Number of patients with not naive records  423
Number of dates with not naive treated records  14971
Number of sequences with not naive treated records  569
Number of patients with not naive treated records  416


#### Issue 4:

Merge the OPS code and medications table for complete and correct data on treatments w.r.t. injections

In [44]:
# copy so that filling MED below modifies an independent frame rather than a slice
events_ops_medications = long_proz_med_oct.dropna(subset=["ICPML", "MED"], how="all").copy()

print("Number of untreated dates only for OCT", long_proz_med_oct.shape[0] - events_ops_medications.shape[0])

events_ops_medications["MED"] = events_ops_medications["MED"].fillna("Unknown")
events_ops_medications

Number of untreated dates only for OCT 9955


Unnamed: 0,patient_id,laterality,study_date_dt,ICPML,MED,oct?
0,53955,L,2007-11-29,5-156.9,Unknown,
1,53955,L,2008-01-07,5-156.9,Unknown,
2,53955,L,2008-02-11,5-156.9,Unknown,
3,53955,L,2008-06-24,5-156.9,Unknown,
4,53955,L,2008-09-08,5-156.9,Unknown,
...,...,...,...,...,...,...
17899,170213,L,2014-12-01,,Lucentis,True
17900,59831,L,2015-01-22,,Lucentis,True
17901,45625,R,2014-09-15,,Lucentis,
17902,247097,R,2014-01-20,,Lucentis,True


In [45]:
keys = ["patient_id", "study_date_dt", "laterality"]
events_columns =["patient_id", "study_date_dt", "laterality", "injection?", "iol?"]
longitudinal_events_ops = pd.merge(events_ops_medications, events_inj_pd[events_columns], 
                                   left_on=keys, right_on=keys, how="left")

print(events_ops_medications.shape[0], longitudinal_events_ops.shape[0],events_inj_pd.shape)

longitudinal_events_ops.head()

17904 17946 (3508, 8)


Unnamed: 0,patient_id,laterality,study_date_dt,ICPML,MED,oct?,injection?,iol?
0,53955,L,2007-11-29,5-156.9,Unknown,,,
1,53955,L,2008-01-07,5-156.9,Unknown,,,
2,53955,L,2008-02-11,5-156.9,Unknown,,,
3,53955,L,2008-06-24,5-156.9,Unknown,,,
4,53955,L,2008-09-08,5-156.9,Unknown,,,
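
The row count grows from 17904 to 17946 in the left merge above. One plausible cause (an assumption, not verified here) is that some (patient_id, study_date_dt, laterality) keys occur more than once in the events table, since a left merge repeats a left row once per matching right row. A minimal sketch of the effect on invented toy frames:

```python
import pandas as pd

keys = ["patient_id", "study_date_dt", "laterality"]

left = pd.DataFrame({
    "patient_id": [1, 2],
    "study_date_dt": pd.to_datetime(["2014-01-20", "2014-09-15"]),
    "laterality": ["R", "R"],
    "MED": ["Lucentis", "Unknown"],
})
# The right table holds the same key twice.
right = pd.DataFrame({
    "patient_id": [1, 1],
    "study_date_dt": pd.to_datetime(["2014-01-20", "2014-01-20"]),
    "laterality": ["R", "R"],
    "injection?": [True, True],
})

# Plain left merge: the matching left row is duplicated.
plain_merge = left.merge(right, on=keys, how="left")

# De-duplicating the right side on the keys keeps one row per left row;
# validate="many_to_one" raises if the right keys are still not unique.
deduped_merge = left.merge(
    right.drop_duplicates(keys), on=keys, how="left", validate="many_to_one"
)
print(len(plain_merge), len(deduped_merge))
```

De-duplicating before the merge keeps the row count of the left table, whereas calling `drop_duplicates()` on the merged result only removes rows that are identical in every column.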


In [46]:
events_ops_medications

Unnamed: 0,patient_id,laterality,study_date_dt,ICPML,MED,oct?
0,53955,L,2007-11-29,5-156.9,Unknown,
1,53955,L,2008-01-07,5-156.9,Unknown,
2,53955,L,2008-02-11,5-156.9,Unknown,
3,53955,L,2008-06-24,5-156.9,Unknown,
4,53955,L,2008-09-08,5-156.9,Unknown,
...,...,...,...,...,...,...
17899,170213,L,2014-12-01,,Lucentis,True
17900,59831,L,2015-01-22,,Lucentis,True
17901,45625,R,2014-09-15,,Lucentis,
17902,247097,R,2014-01-20,,Lucentis,True


In [47]:
longitudinal_events_ops = longitudinal_events_ops.rename(columns={"study_date_dt": "study_date"})
longitudinal_events_ops_un = longitudinal_events_ops.drop_duplicates()

longitudinal_events_ops_un.to_csv(os.path.join(WORK_SPACE, "sequence_data/longitudinal_events_ops.csv"))

In [48]:
longitudinal_events_ops_un["injections_joint?"]

KeyError: 'injections_joint?'