In [None]:
%pip install pandas
%pip install seaborn
%pip install matplotlib
%pip install numpy


In [99]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np 
import random 

# Time in Therapeutic Range - Rosendaal Method

Time in therapeutic range (TTR) is a commonly used marker of quality control, in principle for any drug but mainly for anticoagulants. The literature describes several approaches, and there is debate about which method is best.

The most commonly used is the traditional approach: the number of in-range blood tests divided by the total number of blood tests.
Its flaw is that it ignores the time elapsed between blood tests.

For a patient P who receives INR blood tests over a certain time period, with values:

time 1 , INR value 1

time 2, INR value 2 

time 3 , INR value 3 

Under the traditional method, for this patient P: TTRtrad = number of in-range tests / 3

The traditional method is not robust in situations such as the one below:

target INR 2-3

day 1, INR 1.0

day 2, INR 5.0

day 3, INR 1.0 

According to the traditional method, this patient would have 0 time in therapeutic range, as none of the recorded blood values is in range: 0/3.
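The traditional calculation above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the name `ttr_traditional` and the inclusive treatment of the range bounds are assumptions made for this example.

```python
def ttr_traditional(inr_values, lower=2.0, upper=3.0):
    """Traditional TTR: in-range tests divided by total tests.

    Hypothetical helper for illustration; the bounds are treated as inclusive.
    """
    in_range = sum(lower <= v <= upper for v in inr_values)
    return in_range / len(inr_values)


# The three tests from the example above: none is within 2-3.
print(ttr_traditional([1.0, 5.0, 1.0]))  # 0.0
```

Note that the test dates never enter the calculation, which is exactly the flaw described above.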

Rosendaal (1993) improved on the traditional method by using linear interpolation between two consecutive blood values. Under Rosendaal, a straight line is drawn between (time 1, value 1) and (time 2, value 2). In the above scenario, the patient's real-world INR is therefore assumed to pass through every value between 1.0 and 5.0 between the tests on day 1 and day 2. Although the INR of 1.0 at time t1 is below the range, there must be some time 't' during which it was in range on the way to an INR of 5.0 at time t2.

The monitored period is two days: from the beginning of the first interval on day 1 (blood draw with resultant INR 1.0) to the end of the last interval on day 3 (INR 1.0 again).

Thus, according to the Rosendaal method, TTRrosendaal = 0.5 days. Each one-day interval spends (3 - 2) / (5 - 1) = 0.25 days in range, giving 0.5 of the 2 monitored days (25%).
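The 0.5-day figure can be checked with a short sketch of the interpolation. The helper name `fraction_in_range` and the data layout are assumptions made for this example; it treats the range bounds as inclusive and is not the `ttrcalc` function built later in this notebook.

```python
def fraction_in_range(v1, v2, lower=2.0, upper=3.0):
    """Fraction of the straight line from v1 to v2 that lies within [lower, upper]."""
    if v1 == v2:
        # Flat line: entirely in range or entirely out of range.
        return 1.0 if lower <= v1 <= upper else 0.0
    lo, hi = min(v1, v2), max(v1, v2)
    # Length of the overlap between [lo, hi] and [lower, upper], never negative.
    overlap = max(0.0, min(hi, upper) - max(lo, lower))
    return overlap / (hi - lo)


# (day, INR) pairs from the example above.
readings = [(1, 1.0), (2, 5.0), (3, 1.0)]

days_in_range = 0.0
days_monitored = 0.0
# Pair each reading with the next one to form the intervals.
for (t1, v1), (t2, v2) in zip(readings, readings[1:]):
    interval = t2 - t1
    days_in_range += interval * fraction_in_range(v1, v2)
    days_monitored += interval

print(days_in_range, days_in_range / days_monitored)  # 0.5 0.25
```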

(add graph to easily comprehend this).


### Unfortunately, no easy code or package is available in R or Python.

Whilst https://rdrr.io/github/anticoagulation/warfarin/ may suggest that there is a package,
a review of the repositories at https://github.com/badgettrg did not yield one, and neither did https://github.com/opencpu/opencpu



## Requirements of the Function

1. Given a desired range, e.g. an INR value of 2.0 to 3.0, there are 3 possible outcomes for a particular INR value: below the range, in the range, above the range.

Therefore, between two INR values there are 3 ^ 2 = 9 possible outcomes.

2. The function should be able to handle all 9 possible scenarios.

For interval 'i', there will be two values, 'v1' and 'v2', at the beginning and end of this interval.

interval "i" / value at beginning of interval "v1" / value at end of interval "v2" /

3. In addition, this formula should be able to handle missing values ("NA - 2.0" etc.).
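The 9 scenarios in the requirements above can be enumerated mechanically. In this small sketch the helper `classify` and the label strings are assumptions made for the example, and the bounds are treated as inclusive.

```python
from itertools import product


def classify(v, lower=2.0, upper=3.0):
    """Label a single INR value relative to the target range (hypothetical helper)."""
    if v < lower:
        return "lower"
    if v > upper:
        return "higher"
    return "in-range"


states = ["lower", "in-range", "higher"]
# Every (start, end) combination of the three states: 3 ^ 2 = 9 scenarios.
scenarios = [f"{start} to {end}" for start, end in product(states, repeat=2)]

for number, label in enumerate(scenarios, start=1):
    print(number, label)
```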



In [100]:
data = {
    "situation": [1,2,3,4,5,6,7,8,9],
    "v1":[1,1,1,2.5,2.5,2.5,5,5,5],
    "v2": [0.9,2.5,5,0.9,2.5,5,0.9,2.5,5],
    "comment":["lower to lower","lower to in-range","lower to higher","in-range to lower","in-range to in-range","in-range to higher","higher to lower","higher to in-range","higher to higher"]
}

testdf = pd.DataFrame(data)

testdf
# this dataframe describes above situations, where range is 2.0 - 3.0 

Unnamed: 0,situation,v1,v2,comment
0,1,1.0,0.9,lower to lower
1,2,1.0,2.5,lower to in-range
2,3,1.0,5.0,lower to higher
3,4,2.5,0.9,in-range to lower
4,5,2.5,2.5,in-range to in-range
5,6,2.5,5.0,in-range to higher
6,7,5.0,0.9,higher to lower
7,8,5.0,2.5,higher to in-range
8,9,5.0,5.0,higher to higher


### Function Description

Given the above requirements, I have built this function, inspired by {cite:p}`Razouki 2014`.
Razouki et al. released their calculation in the supplementary material as an Excel spreadsheet. The spreadsheet is well worth reviewing to understand the conceptualisation behind their approach.

Given the constraints of Excel formulas, Razouki's spreadsheet calculates for one patient at a time, with data in the format already mentioned above: the first time interval is t2-t1, with blood value v1 at its beginning and v2 at its end, and so on for the following intervals.

(t1,v1)
(t2,v2) 
(t3,v3)

The results from this function were cross-checked manually as well as against Razouki's spreadsheet.

For manual checking, I recommend simply plotting all 9 possible scenarios and counting the time elapsed within the range.

Razouki's spreadsheet makes use of logical multiplication (e.g., TRUE * TRUE = 1) to deal with missing data. In Python, however, we are able to use if/else flow statements with a calculation for each specific scenario.
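To illustrate the difference between the two styles, here is a rough sketch. It shows only the general idea of logical multiplication and does not reproduce the formulas in Razouki's spreadsheet; the helper names `is_valid`, `masked_fraction` and `branched_fraction` are made up for this example.

```python
import math


def is_valid(v):
    """True when a blood value is present (neither None nor NaN)."""
    return v is not None and not math.isnan(v)


def masked_fraction(v1, v2, fraction):
    # Spreadsheet style: booleans act as 1/0, so one missing value zeroes the term.
    return is_valid(v1) * is_valid(v2) * fraction


def branched_fraction(v1, v2, fraction):
    # Python style: an explicit branch can return a distinct marker for missing data.
    if not (is_valid(v1) and is_valid(v2)):
        return None
    return fraction


print(True * True)                                  # 1
print(masked_fraction(1.0, float("nan"), 0.25))     # 0.0
print(branched_fraction(1.0, float("nan"), 0.25))   # None
```

The branching version keeps "missing" distinguishable from "zero time in range", which the multiplication trick cannot do on its own.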

https://pubmed.ncbi.nlm.nih.gov/25185245/

Razouki is mainly interested in the quantitative time in therapeutic range. In addition, I have generated 'low' and 'higher' to denote intervals that lie entirely below or above the range. This enables correlation of 'low' times with thrombotic events and 'higher' times with hemorrhagic events.


In [None]:
def ttrcalc(v1, v2, upper=3.0, lower=2.0):
    """Fraction of the interval between v1 and v2 spent in the target range.

    Uses linear interpolation between the two values. lower and upper are the
    inclusive bounds of the target range. Handles the 9 possible scenarios and
    returns NaN when either value is missing.
    """
    if pd.isna(v1) or pd.isna(v2):
        # comparisons with NaN are always False, so missing values are handled first
        return np.nan

    # vdiff is the absolute difference between the two values
    vdiff = abs(v2 - v1)

    if v1 < lower and v2 < lower:
        # situation 1: both values are lower than the range
        return "low"

    elif v1 < lower and lower <= v2 <= upper:
        # situation 2: lower to in-range
        return (v2 - lower) / vdiff

    elif v1 < lower and v2 > upper:
        # situation 3: lower to higher, crossing the whole range
        return (upper - lower) / vdiff

    elif lower <= v1 <= upper and v2 < lower:
        # situation 4: in-range to lower
        return (v1 - lower) / vdiff

    elif lower <= v1 <= upper and lower <= v2 <= upper:
        # situation 5: both values are in range
        return 1

    elif lower <= v1 <= upper and v2 > upper:
        # situation 6: in-range to higher
        return (upper - v1) / vdiff

    elif v1 > upper and v2 < lower:
        # situation 7: higher to lower, crossing the whole range
        return (upper - lower) / vdiff

    elif v1 > upper and lower <= v2 <= upper:
        # situation 8: higher to in-range
        return (upper - v2) / vdiff

    elif v1 > upper and v2 > upper:
        # situation 9: both values are higher than the range
        return "higher"
    

# Test ttrcalc function

Goals :
- handle all possible scenarios, 1 to 9
- handle missing data
- pass a manual cross-check

In [None]:
testdf['col_3'] = testdf.apply(lambda x: ttrcalc(x.v1, x.v2), axis=1)
testdf

In [None]:
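# Let's test it more comprehensively.
# Every output of this function should be a low/higher label or a fraction between 0 and 1 (0 - 100%).
# Two lists of 20 random values: interval start values (randlist) and end values (randlist1).
randlist = np.random.uniform(0, 6, size=20).round(2).tolist()
randlist1 = np.random.uniform(0, 6, size=20).round(2).tolist()

randlist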

In [None]:
d = {"v1": randlist, "v2": randlist1}
# Let's make a dataframe that encompasses all these values.
df = pd.DataFrame(d)
df

# Let's apply the function by feeding it the columns of the dataframe.
# LHS: makes a new column, col_3. RHS: calls ttrcalc via apply, with a lambda
# that passes each row's v1 and v2 to it. axis=1 applies the function row-wise.
df['col_3'] = df.apply(lambda x: ttrcalc(x.v1, x.v2), axis=1)

df
# This shows the code works.

In [None]:
d = {"v1": [1, 1, 1, np.nan, 2.5, 2.5, 5, 5, 5], "v2": [1, 2.5, 5.0, 0.9, 2.5, np.nan, 2.0, 5, 7]}
# In this scenario, the dataframe pdf covers a mix of the scenarios, including missing values.
pdf = pd.DataFrame(d)

pdf["col_3"] = pdf.apply(lambda x: ttrcalc(x.v1, x.v2), axis=1)

pdf
# This was cross-checked against the original information, and NaNs are handled individually.

In [None]:
#this function should work with na values in the middle too. 

print(ttrcalc(1,5))
print(ttrcalc(5,1))
print(ttrcalc(1,np.nan))
print(ttrcalc(np.nan,2.5))
print(ttrcalc(2.5,0.9))

In [None]:
# Now let's see whether data engineered from (t, v) pairs yields the same result.

# Traditionally, the information is presented as (time, value) pairs:
# (t1, v1)
# (t2, v2)
# (t3, v3)
# (tn, vn)

# Thus we need to reshape it into one row per interval (interval, start value, end value):
# i1, v1, v2
# i2, v2, v3
# i3, v3, v4
# i(n-1), v(n-1), vn

# in a dataframe format.

fdt = {
    "t" : [1,2,3,4,5,6,7,8,9,10],
    "v" : [1.1,1.5,2,2.5,3,3.5,5,2.5,4,2.0],
}

fdg = pd.DataFrame(fdt)
fdg



## Note on datashape.

In the traditional tidy format, each observation (row) holds one value,
thus t1,v1 etc.

Applying ttrcalc() to data in this format is quite complex, as it involves calling the previous row's values. This is easily done in Excel but leads to messier code in Python.

The cleaner way is to reshape the data. Effectively, we are interested in time in therapeutic range, so we calculate intervals of elapsed time. For tests at t1,t2...tn there are 'n-1' intervals.

The intervals get their own column. Each interval then has its beginning and end values, v1 and v2 respectively, in their own separate columns.

This makes it easy to pass the values to ttrcalc and then to compute time in range, which is interval x fraction in range.

Please see below for re-shaping fdg into the desired format.

In [None]:
# Reshape into the format we'd like.
fdg["t2"] = fdg["t"].shift(1)    # time of the previous test (start of the interval)
fdg["i"] = fdg["t"] - fdg["t2"]  # i: length of the interval
fdg["v1"] = fdg["v"].shift(1)    # value at the beginning of the interval
fdg["v2"] = fdg["v"]             # value at the end of the interval

# Sense check: n tests give n-1 intervals; the first row has no previous test, so it is NaN.
# Sense check: v1 is the beginning of the interval and v2 is the end of the interval.
fdg["col_3"] = fdg.apply(lambda x: ttrcalc(x.v1, x.v2), axis=1)

In [None]:
fdg
#this cross checked with the manual checks.
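As a possible next step, each interval's length can be split into days below, in and above the range, which gives the "interval x fraction in range" total as well as the low and high times. The sketch below is plain Python under stated assumptions: `split_interval` is a hypothetical helper rather than part of `ttrcalc`, the bounds are treated as inclusive, and missing values are not handled. The readings are the same (t, v) pairs as in `fdt`.

```python
def split_interval(t1, v1, t2, v2, lower=2.0, upper=3.0):
    """Return (days_below, days_in, days_above) for one interval, by linear interpolation."""
    days = t2 - t1
    if v1 == v2:
        # Flat line: the whole interval falls in a single band.
        frac_below = 1.0 if v1 < lower else 0.0
        frac_above = 1.0 if v1 > upper else 0.0
    else:
        lo, hi = min(v1, v2), max(v1, v2)
        # Share of the line below `lower` and above `upper`, never negative.
        frac_below = max(0.0, min(hi, lower) - lo) / (hi - lo)
        frac_above = max(0.0, hi - max(lo, upper)) / (hi - lo)
    frac_in = 1.0 - frac_below - frac_above
    return days * frac_below, days * frac_in, days * frac_above


t = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
v = [1.1, 1.5, 2, 2.5, 3, 3.5, 5, 2.5, 4, 2.0]

total_below = total_in = total_above = 0.0
# Pair each reading with the next one: n readings give n-1 intervals.
for i in range(len(t) - 1):
    below, inside, above = split_interval(t[i], v[i], t[i + 1], v[i + 1])
    total_below += below
    total_in += inside
    total_above += above

monitored = t[-1] - t[0]
print(f"below: {total_below:.2f}, in range: {total_in:.2f}, above: {total_above:.2f} days")
print(f"TTR: {100 * total_in / monitored:.1f}%")
```

Dividing the in-range days by the monitored days gives the overall percentage, while the below and above totals are the quantities one would correlate with thrombotic and hemorrhagic events.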