# *Growth Profiler* Data Analysis

The *Growth Profiler* is a high-throughput device for growth characterization. It automatically measures growth via optical density (OD) in 96-well plates and stores the results in csv files. This workflow teaches how to analyse growth data in csv format.

## Loading libraries
Some general Python libraries are needed to make useful functions available. Growth curve analysis is handled by the `batchslopes` library ([Link](https://github.com/uliebal/gp_analytics)), which contains functions designed specifically for analysing *Growth Profiler* data.

In [None]:
import numpy as np
import pandas as pd
import os
import warnings
import sys
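sys.path.append('..')  # add the parent directory to the import path (for batchslopes)
from batchslopes import *
# Silence deprecation warnings; newer NumPy versions keep VisibleDeprecationWarning in np.exceptions
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=getattr(np, "exceptions", np).VisibleDeprecationWarning)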
import matplotlib.pyplot as plt

## Data input and description

The standard *Growth Profiler* csv file starts with a few rows of general information on the experimental conditions. These metadata are ignored in the regression analysis, but make sure that the row with the column headers is read where the actual data starts. The first column should contain the time measurements; all other columns are OD values. The *Growth Profiler* usually measures time in minutes, whereas other experimental data, such as from *RecExpSim*, use hours as the time unit. The workflow unifies the time-based analysis by dividing the time vector by the variable `TimeUnit`.

The data is partitioned into bins (variable `Partition`), and a linear regression on the logarithmic data is performed in each bin. The bin with the best correlation coefficient is then selected and partitioned further until the correlation coefficient stops improving. The correlation coefficient and the corresponding slope of the logarithmic OD, which is the growth rate, are reported.
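
To make the procedure more concrete, here is a minimal sketch of the idea in plain NumPy. It is not the `batchslopes` implementation: the function name `best_bin_regression`, the `min_points` guard and the synthetic data are assumptions made for illustration. The sketch relies on the fact that during exponential growth $OD(t) = OD_0 e^{\mu t}$, so $\ln OD(t) = \ln OD_0 + \mu t$ and the slope of the logarithmic data equals the growth rate $\mu$.

```python
import numpy as np

def best_bin_regression(t, od, partition=2, min_points=4):
    """Illustrative sketch only (hypothetical helper, not part of batchslopes).

    Repeatedly splits the current range into `partition` bins, fits a line to
    ln(OD) in each bin, and descends into the bin with the highest R2 until
    R2 no longer improves or the bins would hold fewer than `min_points` points.
    Returns a dict, or None if the data is too short for a single split.
    """
    t = np.asarray(t, dtype=float)
    log_od = np.log(np.asarray(od, dtype=float))
    best = None
    start, stop = 0, len(t)
    while stop - start >= partition * min_points:
        edges = np.linspace(start, stop, partition + 1).astype(int)
        fits = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            slope, intercept = np.polyfit(t[lo:hi], log_od[lo:hi], 1)
            r = np.corrcoef(t[lo:hi], log_od[lo:hi])[0, 1]
            fits.append({'r2': r**2, 'slope': slope, 'intercept': intercept,
                         'start': int(lo), 'stop': int(hi)})
        top = max(fits, key=lambda f: f['r2'])
        if best is not None and top['r2'] <= best['r2']:
            break  # the correlation no longer improves
        best = top
        start, stop = top['start'], top['stop']
    return best

# Synthetic curve: exponential growth with rate 0.3 for t < 16, then a slowdown
t = np.arange(32, dtype=float)
log_curve = np.where(t < 16, -3.0 + 0.3 * t,
                     1.8 + 0.2 * np.log1p(np.clip(t - 16, 0, None)))
result = best_bin_regression(t, np.exp(log_curve), partition=2)
print(result)
```

On this synthetic curve the selected bin lies in the exponential phase, so the reported slope should be close to the growth rate of 0.3 that was used to generate the data.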

**Important**: Check the decimal separator. Science typically uses the English convention with a decimal point (`10,000.23`), but the *Growth Profiler* or your local system might generate a csv file with the German convention of a decimal comma (`10.000,23`).
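
As a quick check of what the `decimal` argument does, the sketch below reads the same small table twice with `pd.read_csv`. The file content is made up, and the semicolon column separator (`sep=';'`) is an assumption about how a German-style export might look; adjust it to your actual file.

```python
import io
import pandas as pd

# Made-up file content in the German convention:
# ';' separates the columns and ',' marks the decimals.
german_csv = "Time (min);A1;A2\n0;0,105;0,098\n30;0,131;0,120\n"

# Without decimal=',' the OD values are read as text, not as numbers
df_wrong = pd.read_csv(io.StringIO(german_csv), sep=';')
# With decimal=',' the OD values are parsed as floats
df_right = pd.read_csv(io.StringIO(german_csv), sep=';', decimal=',')

print(pd.api.types.is_numeric_dtype(df_wrong['A1']))
print(pd.api.types.is_numeric_dtype(df_right['A1']))
```

If the OD columns are not numeric after loading, a mismatched decimal separator is a likely cause.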


**Input:**
 - `File`: string, growth experiment csv-file name
 - `skiprows`: integer, number of metadata lines before the header row (equal to the line index of the header row when counting from 0)
 - `decimal`: string ('.' or ','), decimal separator
 - `TimeUnit`: integer, 60 if time is measured in minutes, 1 if it is measured in hours
 - `Partition`: integer, decides the number of bins on which regression is performed
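
The following sketch shows how `skiprows` and `TimeUnit` work together on a tiny made-up file with two metadata lines and time in minutes. The file content and the names `raw`, `demo` and `t_hours` are illustrative assumptions, not part of the workflow.

```python
import io
import pandas as pd

# Made-up file: two metadata lines, then the header row, then the data
raw = (
    "Experiment: demo plate\n"
    "Operator: n/a\n"
    "Time (min),A1,A2\n"
    "0,0.10,0.11\n"
    "30,0.13,0.15\n"
    "60,0.17,0.20\n"
)

skiprows = 2    # two metadata lines, so the header row has index 2
TimeUnit = 60   # time is given in minutes

demo = pd.read_csv(io.StringIO(raw), skiprows=skiprows, decimal='.')
t_hours = demo[demo.columns[0]].values / TimeUnit  # time vector in hours

print(list(demo.columns))
print(t_hours)
```

After the division, growth rates obtained from the regression are expressed per hour regardless of the unit used in the file.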

In [None]:
File = os.path.join('data', 'Strain_characterization_1.csv')       #GP_20210507_171028_MTP03_GValue
skiprows = 0
decimal = '.'
TimeUnit = 1
Partition = 2
df = pd.read_csv(File, skiprows=skiprows, decimal=decimal)
# Drop the 'Input_Image' column if the file contains one
if 'Input_Image' in df.columns:
    df = df.drop(columns='Input_Image')
TimeAx = df.columns[0]  # time column, e.g. 'Time (min)' for the Growth Profiler
Exp0 = df.columns[1]    # first experiment, used in the single experiment analysis
print('Column headers: ', df.columns)
df.plot(x=TimeAx, y=Exp0)

# Parameters passed to CorrectedOD in the all experiment analysis
GVexp = 2
eexp = -10

## Single Experiment Analysis

Below, we check individual growth curves and their regression. This can be helpful if you want to examine a particular experiment.

In [None]:
t = df[TimeAx].values/TimeUnit
x = df[Exp0].values
myResult = DetectR2MaxSingle(t,x,Partition)
print(myResult)
plt.plot(t, np.log(x))
if myResult is not False:
    plt.scatter(myResult['time'], np.log(myResult['OD']))
    plt.plot(myResult['time'], myResult['Slope']*myResult['time'] + myResult['ycorrect'], 'r', label=r'$\mu$:{:.3f}, R2:{:.2f}'.format(myResult['Slope'], myResult['R2']))
    plt.legend()
    myResult['ID'] = Exp0

# myResult

## All Experiment Analysis

Below, all experiments in the csv file are analysed. The results are shown as plots with the regression range for each experiment, along with the growth rate and the correlation coefficient, and saved as `myPlots.svg`. The growth rates and correlation coefficients are additionally written to `newcsv.csv`. If the regressions look unconvincing, try a different partition.

In [None]:
myCols = df.columns[1:]

# Subplot grid with subplot_x rows and subplot_y columns, large enough for all experiments
subplot_x = int(round(np.sqrt(len(myCols))))
subplot_y = subplot_x + 1

OD = pd.DataFrame()
for i1 in myCols: 
    OD[i1] = CorrectedOD(df[i1], GVexp, eexp)
mu_list = []
r2_list = []

t = df[TimeAx].values/TimeUnit
# Create an empty figure; the individual axes are added with plt.subplot below
fig = plt.figure(figsize=[20,10])
for idx, myExp in enumerate(myCols):
    x = OD[myExp].values
    myResult = DetectR2MaxSingle(t,x,Partition)
    plt.subplot(subplot_x, subplot_y, idx+1)
    plt.plot(t, np.log(x))
    plt.title(myExp)
    if myResult is not False:
        plt.scatter(myResult['time'], np.log(myResult['OD']))
        plt.plot(myResult['time'], myResult['Slope']*myResult['time'] + myResult['ycorrect'], 'r', label=r'$\mu$:{:.3f}, R2:{:.2f}'.format(myResult['Slope'], myResult['R2']))
        myResult['ID'] = myExp
        plt.legend()
        # Collect mu and R2 for the results DataFrame
        mu_list.append(myResult['Slope'])
        r2_list.append(myResult['R2'])
    else:
        # No regression result for this experiment
        mu_list.append(False)
        r2_list.append(False)
    plt.tick_params(left=False, bottom=False, labelleft=False, labelbottom=False)
    
    
fig.text(0.5, 0.1, 'time', ha='center')
fig.text(0.1, 0.5, 'ln(OD)', va='center', rotation='vertical')
plt.savefig('myPlots.svg', format='svg')

In [None]:
csv = pd.DataFrame()
csv['Wells'] = myCols
csv['Mu value'] = mu_list
csv['R2 value'] = r2_list
csv.to_csv('newcsv.csv')