# High Throughput Growth Experiment Analysis in Biotechnology
***

## Background
Biotechnology requires knowledge of microbial physiology and metabolism. Two of the most important parameters are the growth rate and the substrate uptake rate. Both can be used to assess the effectiveness of biotechnological production strategies and to calculate biomass and product yields. To identify an organism with suitable growth parameters, multiple strains, mutants, or environmental conditions are tested, which results in high-throughput data. Computational approaches can speed up the analysis of such data.

There are different growth phenotypes: exponential, linear, saturated, and diauxic. Typical saturated microbial growth is divided into four phases: lag, log (exponential), stationary, and death.

Laws to describe a biomass $N$ include: 
- exponential law: 
  - $N(t) = N_0e^{\mu t}$
    - $N_0$: initial biomass, gDW/L
    - $\mu$: growth rate, /h
    - $t$: duration, h
- logistic growth law (Verhulst): 
  - $N(t) = \frac{K}{1+\left( \frac{K-N_0}{N_0}e^{-\mu t} \right)}$
    - $N_0$: initial biomass, gDW/L
    - $\mu$: growth rate, /h
    - $t$: duration, h
    - $K$: carrying capacity, max biomass, gDW/L
- Gompertz function: 
  - $N(t) = N_0e^{\left(\textrm{ln} \left(\frac{K}{N_0} \right)\left(1 - e^{-\mu t} \right) \right)}$
    - $N_0$: initial biomass, gDW/L
    - $\mu$: growth rate, /h
    - $t$: duration, h
    - $K$: carrying capacity, max biomass, gDW/L
- Baranyi model: complex model, see [literature](https://doi.org/10.1006/fmic.1999.0285)
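
The three closed-form laws above can be compared numerically. The following is a minimal sketch, assuming illustrative parameter values ($N_0 = 0.05$ gDW/L, $\mu = 0.5$ /h, $K = 5$ gDW/L) that are not taken from any experiment. The function names are hypothetical helpers and not part of `silvio`.

```python
import numpy as np

def exponential_growth(t, n0, mu):
    """Exponential law: N(t) = N0 * exp(mu * t)."""
    return n0 * np.exp(mu * t)

def logistic_growth(t, n0, mu, k):
    """Logistic (Verhulst) law with carrying capacity K."""
    return k / (1 + (k - n0) / n0 * np.exp(-mu * t))

def gompertz_growth(t, n0, mu, k):
    """Gompertz function with carrying capacity K."""
    return n0 * np.exp(np.log(k / n0) * (1 - np.exp(-mu * t)))

# Illustrative values only (assumptions, not measured data)
n0, mu, k = 0.05, 0.5, 5.0
t = np.linspace(0, 30, 7)  # time points in h

n_exp = exponential_growth(t, n0, mu)
n_log = logistic_growth(t, n0, mu, k)
n_gom = gompertz_growth(t, n0, mu, k)

for ti, a, b, c in zip(t, n_exp, n_log, n_gom):
    print(f"t={ti:5.1f} h  exponential={a:12.3f}  logistic={b:6.3f}  Gompertz={c:6.3f}")
```

All three curves start at $N_0$. The exponential law grows without bound, whereas the logistic and Gompertz curves level off at the carrying capacity $K$.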

The growth rate $\mu$ is a function of the substrate level and can be calculated via the Monod equation:
- $\mu = \frac{\mu_{max}S}{K_S + S}$
  - $\mu$: growth rate, /h
  - $\mu_{max}$: maximum growth rate, /h
  - $S$: Substrate concentration, mmol/L
  - $K_S$: half velocity constant, the value of S when $\frac{\mu}{\mu_{max}} = 0.5$, mmol/L
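
As a minimal sketch of how the Monod equation behaves, the following code evaluates it for a few substrate concentrations and recovers the parameters from the resulting values with a Lineweaver-Burk linearization. The parameter values ($\mu_{max} = 0.8$ /h, $K_S = 2$ mmol/L) and the function name `monod` are assumptions chosen for illustration, and the data are noise-free.

```python
import numpy as np

def monod(s, mu_max, k_s):
    """Monod equation: growth rate as a function of substrate concentration."""
    return mu_max * s / (k_s + s)

# Illustrative parameters (assumptions, not measured values)
mu_max, k_s = 0.8, 2.0                           # /h, mmol/L
s = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 50.0])   # mmol/L
mu = monod(s, mu_max, k_s)

# At S = K_S the growth rate is half of mu_max
mu_at_ks = monod(k_s, mu_max, k_s)

# Lineweaver-Burk linearization: 1/mu = (K_S/mu_max) * (1/S) + 1/mu_max
slope, intercept = np.polyfit(1 / s, 1 / mu, 1)
mu_max_fit = 1 / intercept
k_s_fit = slope * mu_max_fit

print(f"mu(K_S) = {mu_at_ks:.2f} /h")
print(f"recovered mu_max = {mu_max_fit:.2f} /h, K_S = {k_s_fit:.2f} mmol/L")
```

The linearization is convenient because it only needs `np.polyfit`, but it weights measurements at low substrate concentrations heavily. For noisy data, a nonlinear fit of the Monod equation itself is usually the more robust choice.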

## Objective
1. Determine the optimal growth temperature
2. Calculate the growth rate and substrate uptake rate at different substrate concentrations
3. Evaluate the growth yields

### Additional information:
- Hagen, Exponential growth of bacteria: Constant multiplication through division, American Journal of Physics, 2010. doi [10.1119/1.3483278](https://doi.org/10.1119/1.3483278)
- Verduyn et al., A theoretical evaluation of growth yields of yeasts, Antonie van Leeuwenhoek, 1991. doi [10.1007/BF00582119](https://doi.org/10.1007/BF00582119)
- Pirt, The maintenance energy of bacteria in growing cultures, Proceedings of the Royal Society B, 1965. doi [10.1098/rspb.1965.0069](https://doi.org/10.1098/rspb.1965.0069)

## Workflow

**1 Set-up of simulation environment**
 * *1.1 Loading Python libraries and functions*
 * *1.2 Seeding your individual organism*

**2 Optimal temperature experiment**
 * *2.1 Shake-flask experiment simulation and data export* 
 * *2.2 Python based data analysis*
 
**3 Optical density to dry weight conversion**
 * *3.1 Experiment setup (temperature, substrate concentrations, duration)*
 * *3.2 Data analysis*
 * *3.3 Data export to Excel* 

**4 Substrate uptake experiment**
 * *4.1 Experiment setup (temperature, substrate concentrations, duration)*
 * *4.2 Data analysis*
 * *4.3 Data export to Excel*

---

## Set-up of simulation environment
### Loading libraries
Loading libraries and fixing visualization. No user input necessary.

In [None]:
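# Loading of important functionalities for the notebook:
# Loading os, used for building file paths:
import os
# Loading numpy, a library for manipulation of numbers:
import numpy as np
# Loading pandas, a library for tabular data (used later to read csv files):
import pandas as pd
# Loading matplotlib, a library for visualization:
import matplotlib.pyplot as plt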
# Initialization, loading of all laboratory functionalities and stored models and information of the organisms:
# from FermProSimFun import MonodModel as Model

try:
    import silvio
except ImportError:
    print("silvio not found, installing...")
    %pip install silvio
    import silvio
from silvio.catalog.RecExpSim import RecExperiment, combine_data
from silvio.extensions.records.gene.crafted_gene import CraftedGene

from silvio.catalog.GroExpSim import *  # provides GrowthExperiment

Par_Bud = 10000

print('silvio version: ',silvio.__version__)
print('System ready')

### Lab setup

The physiological conditions of the simulation are defined. The choice of organism has no effect on the parameters.

**Resource cost:**
* **Free** 

**Input:** 
* **`mySeed`: Number, e.g., Student-ID (integer)**
* **`myInvest`: 1000-2000 EUR (10-20% of total budget, integer)**

**Output:** 
* Text with remaining budget, experiment failure rate and internal organism identifier.


In [None]:
mySeed = None  # Your student ID
myInvest = None # Your lab expenditure, choose anything between 1000-2000 EUR

exp = GrowthExperiment(mySeed, myInvest, Par_Bud)

Organism = 'ecol'
host = exp.create_host(Organism)
exp.print_status()

----
## 2 Optimal temperature experiment

### 2.1  Experiment set-up
You have to identify the optimal growth temperature, the corresponding maximum growth rate, and the maximum biomass of your strain by cultivating the cells at different temperatures. At each program start, the optimal temperature and the maximum biomass are randomly initialized. The optimal temperature is sampled from the range for **mesophilic microorganisms**, 20-40°C (see page 23, [Biotechnology: An illustrated primer](https://application.wiley-vch.de/books/sample/3527335153_c01.pdf)). Occasionally, a culture will not grow at all, so it can be helpful to measure replicates at the same temperature. However, be aware that each cultivation costs resources.

The results from the temperature experiments are stored in a comma-separated values (csv) file in the `Data` subfolder as `TempGrowthExperiment.csv`. You can find the csv file in the left navigation panel within the `biolabsim` folder. For quick inspection, double-click the csv file; to download it for analysis in Excel, right-click the file name and choose 'Download'.

If you want to run another set of experiments afterwards, or repeat individual experiments, change the file name (`FileName`) in the code cell below; otherwise results already generated may be overwritten.

Example: `temperatures = [22,24,26,28,30,32,34,36,38]`

**Resource cost:**
* **100 EUR**

**Input:**
* **`temperatures`: Temperature array (integer list)**
* **`FileName`: name of the output csv file (string)**

**Output:**
* File `TempGrowthExperiment.csv` in `Data` subfolder.

In [None]:
temperatures = None # Choose a list of temperatures between 20-40, e.g. [20, 25, 30, 35, 40]
FileName = 'TempGrowthExperiment.csv'
exp.measure_TemperatureGrowth(Organism, temperatures, FileName)
exp.print_status()

### Visualizing growth experiments
The experimental results are stored in the csv file. Analyse the data either in Python, using the following code with scrambled lines, or separately with, e.g., Excel, Origin, or GraphPad. Rearrange the lines of code into the correct order to visualize the logarithmic biomass over time.

In [None]:
# Rearrange the lines into the correct sequence for plotting

Time, Biomass = my_data[:,0], my_data[:,1:]
DataFile = os.path.join('..','Data','TempGrowthExperiment.csv')
LnBiomass = np.log(Biomass)
[plt.scatter(Time, X, label=Exp) for Exp,X in enumerate(LnBiomass.T)]
plt.legend([r'{}:{}$^\circ$C'.format(Idx, T) for Idx, T in enumerate(temperatures)], bbox_to_anchor=(1.05, 1), loc='upper left'); plt.xlabel('time, h'); plt.ylabel('ln(Biomass)')
my_data = np.genfromtxt(DataFile, delimiter=',', skip_header=1)

### Calculate linear regression
In the following cell, a linear regression with `np.polyfit` is performed on the logarithmic biomass values `LnBiomass` from the previous cell. The slope equals the growth rate, and the covariance matrix returned with `cov=True` provides the standard error of the slope.
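
The reason the slope equals the growth rate follows from taking the natural logarithm of the exponential law $N(t) = N_0e^{\mu t}$ given in the Background section:

$$\ln N(t) = \ln N_0 + \mu t$$

A straight-line fit of $\ln N$ against $t$ therefore has slope $\mu$ and intercept $\ln N_0$. This relation holds only for time points within the exponential phase, which is why the fit is restricted to the early part of the growth curve.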

In [None]:
# Enter the index of the experiment with the fastest growth and the index of the last time point of linear growth (integers).
# %load Snippets/snip_GrowthPars.py
Idx_optT = 5 # Insert the experiment ID for the optimal temperature
Linear_optT = 5 # Insert the index of the time point at which linearized, exponential growth ceases

# Rearrange the correct code sequence for calculating the growth and biomass parameters here.
print(f'Biomass: {MB_mean:.2f}±{MB_std:.2f} g/L')
MB_std = np.std(Biomass[Linear_optT:,Idx_optT])
GR, cov = np.polyfit(Time[:Linear_optT],LnBiomass[:Linear_optT,Idx_optT],1, cov=True)
MB_mean = np.mean(Biomass[Linear_optT:,Idx_optT])
print(f'Growth rate: {GR_mean:.2f}±{GR_std:.2f} /h')
GR_mean, GR_std = GR[0], np.sqrt(np.diag(cov))[0]


---
## 3 Optical density to dry weight conversion
In the literature, biomass is usually reported as gram dry weight (gDW). In the lab, biomass is typically measured as optical density (OD), so a conversion between these units needs to be established. Typically, a culture is grown for a period of time and harvested during exponential growth. All liquid is removed by drying in an oven or microwave, and the remaining biomass powder is weighed.

In the following, you conduct an experiment to determine the conversion factor. Choose the optimal temperature and appropriate values for cultivation time and substrate concentration so that the culture is harvested approximately in the exponential phase. The result is stored in a csv file in the `Data` folder.

**Resource cost:**
* **25 EUR/replicate**

**Input:**
* **`Temperature`: optimal temperature (integer)**
* **`FinalTime`: Hours to harvest biomass (integer)**
* **`SubstrateConc`: Initial glucose concentration in g/L (integer)**
* **`Replicates`: Parallel experiments for better statistics (integer)**

**Output:**
* File `OD-DryWeight_?C_?h_?gL-1.csv` with `?` as the choices for temperature, final time and substrate concentration.


In [None]:
Temperature = 30
FinalTime = 50
SubstrateConc = 10
Replicates = 3
FileName = f'OD-DryWeight_{Temperature}C_{FinalTime}h_{SubstrateConc}gL-1.csv'
exp.measure_DryWeight(Organism, Temperature, FinalTime, SubstrateConc, Replicates, FileName)

### Calculation of the conversion factor
The conversion factor is calculated by dividing the biomass dry weight by the OD. Multiplying an OD value by this factor gives the corresponding dry weight. Perform the analysis either with the Python code with scrambled lines or with Excel, Origin, GraphPad, etc.
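
Once the factor is known, applying it to a series of OD readings is a single multiplication. The following sketch assumes a hypothetical factor of 0.4 (gDW/L)/OD and made-up OD readings; replace both with your own values.

```python
import numpy as np

# Hypothetical conversion factor in (gDW/L)/OD; replace with your own result
od_to_dw = 0.4
# Hypothetical OD readings from a growth curve
od_readings = np.array([0.1, 0.5, 1.2, 2.0])

# Dry weight in gDW/L is the OD multiplied by the conversion factor
dry_weight = od_readings * od_to_dw
print(dry_weight)
```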

In [None]:
# Rearrange the correct code sequence for calculating the OD and gDW correlation.
# The code does not need any further input, it assumes the filename is the same as above.

my_data = np.genfromtxt(DataFile, delimiter=',', skip_header=1)
DataFile = os.path.join('..', 'Data',FileName)
OD2XAvg = round(np.average(DryWeight/OD),3)
print(f'The OD to dry weight conversion factor: {OD2XAvg}±{OD2XStd} (gDW/L)/OD')
OD2XStd = round(np.std(DryWeight/OD),3)
DryWeight, OD = my_data[:,1], my_data[:,3]


---
## 4 Substrate uptake experiment

During the experiment, the substrate concentration (g/L) and the biomass (OD) are measured. Use the optimal temperature you identified previously, and decide on the substrate concentration and the total sampling time. Make sure that both the exponential phase and the stationary phase are covered. You also set the number of samples per hour: more samples give better statistics but cost more. A night period recurs every 24 h, during which no measurements are taken for approximately six hours. You can choose when the night starts relative to the start of your experiment.

**Resource cost:**
* **~6 EUR/Sample**

**Input:**
* **`Temperature`: optimal temperature (integer)**
* **`SubstrateConc`: Initial glucose concentration in g/L (integer)**
* **`TotalTime`: Experiment duration in hours (integer)**
* **`SamplingTime`: Samples per hour, e.g. 2 means every half hour (integer)**
* **`NightStart`: Hours from the start of the experiment until the night begins; no samples are taken during the night, which recurs every 24 h (integer)**

**Output:**
* File `SubstrateGrowthExp_?C_?h_?gL-1.csv` with `?` as the choices for temperature, experiment duration and substrate concentration.
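
Before spending budget, it can help to estimate how many samples a given setting produces. The following is a rough sketch under stated assumptions: samples are taken at regular intervals from time zero up to and including `TotalTime`, each night lasts exactly 6 h, and each sample costs 6 EUR. The helper `estimate_sampling` is hypothetical, and the simulator's actual schedule and pricing may differ.

```python
def estimate_sampling(total_time, samples_per_hour, night_start,
                      night_length=6, cost_per_sample=6):
    """Rough count of daytime samples and their cost (assumed scheduling rules)."""
    n_steps = total_time * samples_per_hour
    n_samples = 0
    for step in range(n_steps + 1):
        t = step / samples_per_hour                 # sampling time in hours
        hours_since_night = (t - night_start) % 24  # position within the 24 h cycle
        is_night = t >= night_start and hours_since_night < night_length
        if not is_night:
            n_samples += 1
    return n_samples, n_samples * cost_per_sample

n_samples, cost = estimate_sampling(total_time=30, samples_per_hour=2, night_start=10)
print(f"{n_samples} samples, approx. {cost} EUR")
```

Shifting `night_start` so that the night falls into the lag or stationary phase keeps the daytime samples concentrated on the exponential phase.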


In [None]:
Temperature = None # °C
SubstrateConc = None # g/L
TotalTime = None  # hours
SamplingTime = None # sampling per hour
NightStart = None # Hours from the start of the experiment until the night begins. No measurements for 6h, repeating every 24h.
# The data is saved in the 'Data' subfolder. The file name describes the experiment (temperature, duration, substrate concentration).
FileName = f'SubstrateGrowthExp_{Temperature}C_{TotalTime}h_{SubstrateConc}gL-1.csv'

myDat = exp.measure_BiomassSubstrateExp(Organism, Temperature, [TotalTime, SamplingTime], SubstrateConc, NightStart, FileName)
exp.print_status()

### Visualization of experiment

The cell visualizes the latest experiment defined above by loading the associated csv file.

In [None]:
# loading of data csv file to dataframe
ExperimentFile = os.path.join('..', 'Data', FileName)
df = pd.read_csv(ExperimentFile, sep=',', header=0)
# Scatter plot of the data with two y axis for OD and substrate concentration
fig, ax1 = plt.subplots()
color = 'tab:red'
ax1.set_xlabel('time (h)')
ax1.set_ylabel('OD', color=color)
ax1.scatter(df['t'], df['OD'], color=color)
ax1.tick_params(axis='y', labelcolor=color)
ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis
color = 'tab:blue'
ax2.set_ylabel('Substrate', color=color)  # we already handled the x-label with ax1
ax2.scatter(df['t'], df['S'], color=color)
ax2.tick_params(axis='y', labelcolor=color)
fig.tight_layout()  # otherwise the right y-label is slightly clipped
plt.show()

## Check your results
Testing values of:
- Optimal temperature
- OD to dry weight correlation
- Growth rate
- Maximum biomass
- Growth yield
- Glucose concentration with half maximum uptake rate
- Glucose maximum uptake rate
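
Two of the quantities listed above, the growth yield and the specific glucose uptake rate, can be derived from the biomass and substrate time courses. The sketch below uses synthetic, noise-free data rather than simulator output, assumes that biomass has already been converted from OD to gDW/L, and neglects maintenance energy (see the Pirt reference), so that the uptake rate is simply $q_S = \mu / Y_{XS}$. All variable names are illustrative.

```python
import numpy as np

# Synthetic, noise-free example data (assumptions, not simulator output)
time = np.array([0.0, 1.0, 2.0, 3.0, 4.0])               # h
biomass = 0.1 * np.exp(0.5 * time)                        # gDW/L, converted from OD
yield_true = 0.45                                         # gDW per g glucose
substrate = 10.0 - (biomass - biomass[0]) / yield_true    # g/L glucose

# Growth yield: slope of produced biomass against consumed substrate
consumed = substrate[0] - substrate
produced = biomass - biomass[0]
growth_yield = np.polyfit(consumed, produced, 1)[0]

# Growth rate: slope of ln(biomass) against time in the exponential phase
growth_rate = np.polyfit(time, np.log(biomass), 1)[0]

# Specific substrate uptake rate, neglecting maintenance: q_S = mu / Y_XS
uptake_rate = growth_rate / growth_yield

print(f"Y_XS = {growth_yield:.2f} gDW/g, mu = {growth_rate:.2f} /h, "
      f"q_S = {uptake_rate:.2f} g/(gDW h)")
```

With real data, restrict both regressions to the exponential phase, and repeat the analysis at several initial substrate concentrations to obtain averages and standard deviations.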

In [None]:
# Enter your determined optimal values for the parameters here. Delete parameters that are not calculated.
Temperature = None
MaxBiomass = None
OD2X = None
GrowthYield_Avg = None
GrowthYield_Std = None
GrowthRate_Avg = None
GrowthRate_Std = None
GlcRateMax_Avg = None
GlcRateMax_Std = None
Ks_Avg = None
Ks_Std = None

Results = {'Temperature': Temperature,
           'MaxBiomass': MaxBiomass,
           'OD2X': OD2X,
           'GrowthYield_Avg': GrowthYield_Avg, 'GrowthYield_Std': GrowthYield_Std,
           'GrowthRate_Avg':GrowthRate_Avg, 'GrowthRate_Std':GrowthRate_Std, 
           'GlcRateMax_Avg': GlcRateMax_Avg, 'GlcRateMax_Std': GlcRateMax_Std,
           'Ks_Avg': Ks_Avg, 'Ks_Std': Ks_Std,
           }
exp.check_Results(Organism, Results)