<a href="https://colab.research.google.com/github/uliebal/RWTH-QMB1/blob/master/2001_GSMM_cobrapy_QuantMiBi.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Seminar Quantitative Microbiology: Hands-on simulation of genome-scale metabolic models
## Introduction

The seminar is a guide to working with genome-scale metabolic models (GSMM) of microorganisms. The example organism is *E. coli*, for which reliable models are available in online databases. The goal is to gain insight into flux distributions and the general metabolic state, and to find flux solutions that represent the highest possible overproduction of a metabolite, in our example succinate.

The seminar uses Jupyter notebooks, a way to run and visualize code in the web browser. A notebook is composed of a sequence of cells. A cell contains either text/comments, like this introduction, or Python code to be run. In this example the code is evaluated by the cloud service [Binder](https://mybinder.org/). The output of each code cell is shown directly below it. For an overview of Jupyter notebooks read [this review](https://www.nature.com/articles/d41586-018-07196-1). Another useful resource for developing Jupyter notebooks is [Google Colaboratory](https://colab.research.google.com).

The content of the seminar is adapted from a tutorial by the project Data-Driven Design of Cell Factories and Communities (DD-DeCaF) as well as workshop information available [here](https://biosustain.github.io/cell-factory-design-course/). More information on DD-DeCaF and the original files are [here](https://github.com/DD-DeCaF/tutorials). The simulations are performed using cobrapy (see [Ebrahim et al., 2013](https://doi.org/10.1186/1752-0509-7-74)), a widely used Python package for the analysis of genome-scale metabolic models.

**The steps of this tutorial are:**
 * [Set-up compute environment](#Python-setup)
   * loading programs for analysis (data analysis, plotting, cobrapy, escher)
   * downloading *E. coli* GSMM (iJO1366)
 * [Model Analysis](#Model-Analysis)
   * loading the model into cobrapy
   * examining model structure and details (metabolites, reactions, medium)
   * solving reaction flux distribution for optimal growth
   * investigating reaction statistics within the solution (NADH and ATP turnover)
   * visualizing flux activities in metabolic network
 * [Model Manipulation](#Model-Manipulation)
   * setting new objective function
   * visualizing optimal succinate flux distribution
   * generating production envelope for the effect of growth rate on succinate production rate
 

**Skills you learn:**
* Familiarizing with Jupyter notebooks
* Retrieving genome scale model files
* Manipulating and Simulating GSMM with cobrapy   


**Practical comments:**
* In some code cells you are supposed to enter input. The location at which input is required is marked by ''**?...?**''.

* Comments to explain details in the code block are written with a hash **plus blank**. 
Alternative commands and commands providing somewhat more detailed information are deactivated with a hash **without blank**.
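
As a minimal illustration of this comment convention (the variable names and values are made up for this example):

```python
# Sum up three example flux values (explanatory comment: hash plus blank)
example_fluxes = [1.5, 2.0, 0.5]
total_flux = sum(example_fluxes)
print(total_flux)

# The next line is an optional command (hash without blank); delete the hash to run it
#print(max(example_fluxes))
```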

## 1. Set-up compute environment <a id='Python-setup'></a>

Before we can analyse a GSMM, we have to set up the Python environment so that it includes the cobrapy toolbox, and download the GSMM.

### 1.1 Basic Python libraries 
We first load some libraries that facilitate data manipulation and plotting.

In [0]:
import sys # loading commands to control/navigate within the system architecture
# Loading pandas, a library for data manipulation
!{sys.executable} -m pip install pandas
import pandas as pd

# Loading numpy, a library for numerical computation
import numpy as np

# loading matplotlib, a library for visualization
!{sys.executable} -m pip install matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
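
The `!{sys.executable} -m pip install ...` lines install a package into the same Python interpreter that runs the notebook kernel. If you want to see beforehand which packages are already present, a check along the following lines may help. This is a sketch using only the standard library; the helper name `is_installed` is made up for this example.

```python
import importlib.util

def is_installed(package_name):
    """Return True if the package can be imported by the current interpreter."""
    return importlib.util.find_spec(package_name) is not None

# Report the status of the libraries used in this section
for name in ["pandas", "numpy", "matplotlib"]:
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```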


### 1.2 Git-Installation of cobrapy

In [0]:
from IPython.utils import io

# loading cobrapy, a library dedicated to the analysis of genome scale metabolic models
with io.capture_output() as capture: # capturing/hiding lengthy output
  !{sys.executable} -m pip install git+https://github.com/opencobra/cobrapy
  import cobra


### 1.3 Installation of Escher for visualization

[Escher](https://doi.org/10.1371/journal.pcbi.1004321) is a web-based tool for building, viewing, and sharing visualizations of biological pathways. These 'pathway maps' are a great way to contextualize data about metabolism.

[Official Documentation site](https://escher.readthedocs.io/en/v1.2.0/index.html)

* Take a quick look at the available metabolic maps. Is a map of iJO1366 among them?

In [0]:
with io.capture_output() as capture:
  !{sys.executable} -m pip install escher
  import escher
  from escher import Builder
# list of available maps
escher.list_available_maps() 


### 1.4 GSMM Download from BiGG

[BiGG](https://doi.org/10.1186/1471-2105-11-213) is a knowledgebase of Biochemically, Genetically and Genomically structured genome-scale metabolic network reconstructions. BiGG integrates several published genome-scale metabolic networks into one resource with standard nomenclature which allows components to be compared across different organisms. BiGG can be used to browse model content, visualize metabolic pathway maps, and export SBML files of the models for further analysis by external software packages. Users may follow links from BiGG to several external databases to obtain additional information on genes, proteins, reactions, metabolites and citations of interest.

In the following we download the *E. coli* model iJO1366 ([Orth et al., 2011](http://dx.doi.org/10.1038/msb.2011.65)), as well as the visualization JSON file.

In [0]:
!wget http://bigg.ucsd.edu/static/models/iJO1366.xml.gz

# loading a visualization file of the metabolic network. 
# For frequently used models, like iJO1366, Escher automatically retrieves the visualization file.  
#!wget http://bigg.ucsd.edu/static/models/iJO1366.json
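
`wget` is a shell command and may not be available on every system. As an alternative, the same download could be done in Python. The following is a sketch using only the standard library; the function name `fetch_if_missing` is made up for this example, and the URL is the one from the `wget` call above.

```python
import os
import urllib.request

def fetch_if_missing(url, filename):
    """Download url to filename unless the file already exists; return the filename."""
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)
    return filename

# Usage (requires internet access), same file as in the wget call:
#fetch_if_missing('http://bigg.ucsd.edu/static/models/iJO1366.xml.gz', 'iJO1366.xml.gz')
```

Skipping the download when the file is already present avoids fetching the model again each time the cell is re-run.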

## 2. Model Analysis <a id='Model-Analysis'></a>

First we look at general features of the model. Then we check which reaction is set as the objective and run a simulation to find the flux distribution that maximizes it.

### 2.1 Inspection of model features

Here we convert the GSMM file from the BiGG database into the 'model' data structure that the cobrapy toolbox can evaluate. Calling the variable 'model' on its own displays some basic features of the model.

In [0]:
# generating cobra variable from SBML/xml file
model = cobra.io.read_sbml_model('iJO1366.xml.gz')
model

# Display model objective (a reverse reaction is present as well, just for mathematical reasons)
#print(model.objective)

Generate a visualization of the stoichiometric matrix.

* What do you see on x-axis, y-axis? Complete the code!
* Why do the first few hundred columns look different?
* What are the connected diagonal rows?
* What do highly colored horizontal/vertical lines represent?

In [0]:
stoich_matrix = cobra.util.create_stoichiometric_matrix(model)
plt.spy(stoich_matrix, precision=0.01, markersize=.2)

plt.xlabel('?..?');
plt.ylabel('?..?');

### Details on the metabolite glyceraldehyde-3-phosphate

In the following we extract information on cytoplasmic glyceraldehyde-3-phosphate.
We start by finding metabolites that contain 'g3p_c'.

In [0]:
model.metabolites.query('g3p_c')

More specific information

In [0]:
model.metabolites.get_by_id('g3p_c')

# displays all reactions associated with g3p_c
#model.metabolites.g3p_c.reactions
# dot notation only works for 'proper' variable names, i.e. not for names starting with numbers (e.g. 10fthf_c)
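
Whether an ID can be used with dot notation follows Python's rules for variable names, which can be checked with the built-in string method `str.isidentifier()`. A small sketch using the two IDs mentioned above (note that Python keywords are a further exception that this check does not catch):

```python
# Map each metabolite ID to whether it is a valid Python name
dot_notation_ok = {}
for met_id in ["g3p_c", "10fthf_c"]:
    dot_notation_ok[met_id] = met_id.isidentifier()
    if dot_notation_ok[met_id]:
        print(f"{met_id}: model.metabolites.{met_id} works")
    else:
        print(f"{met_id}: use model.metabolites.get_by_id('{met_id}')")
```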

Explicit list of reactions with reactants

In [0]:
for reaction in model.metabolites.g3p_c.reactions:
    print(reaction, reaction.name)

Investigating a particular reaction: GAPD

The GPR field shows the 'Gene-Protein-Reaction' relationship, i.e. the ID(s) of the gene(s) associated with the reaction.

In [0]:
model.reactions.GAPD
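
A GPR rule is a Boolean expression over gene IDs: `and` joins genes that are all required (e.g. subunits of a complex), `or` joins alternatives (e.g. isozymes). The following sketch shows how such a rule could be evaluated for a given set of active genes. The gene names `geneA`, `geneB`, `geneC` and the helper `gpr_is_satisfied` are hypothetical and not part of cobrapy.

```python
import re

def gpr_is_satisfied(rule, active_genes):
    """Return True if the Boolean GPR rule holds for the given set of active genes."""
    def substitute(match):
        token = match.group(0)
        if token in ("and", "or"):
            return token  # keep the Boolean operators
        return str(token in active_genes)  # gene ID becomes 'True' or 'False'
    expression = re.sub(r"[A-Za-z_][A-Za-z0-9_]*", substitute, rule)
    # The expression now only contains True/False, and/or and parentheses
    return eval(expression, {"__builtins__": {}}, {})

toy_rule = "(geneA and geneB) or geneC"
print(gpr_is_satisfied(toy_rule, {"geneA", "geneB"}))  # complex is complete
print(gpr_is_satisfied(toy_rule, {"geneA"}))           # one subunit is missing
print(gpr_is_satisfied(toy_rule, {"geneC"}))           # alternative enzyme is present
```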

### 2.2 Simulation of optimal growth

In [0]:
# displaying which solver is used to perform FBA
#model.solver

solution = model.optimize()
print('Growth rate: {:.2f}'.format(solution.objective_value))
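
In its standard textbook form, the flux balance analysis (FBA) problem solved by `model.optimize()` is a linear program. Written out, with $S$ the stoichiometric matrix, $v$ the vector of reaction fluxes, $c$ the vector of objective coefficients, and $v^{\mathrm{lb}}$, $v^{\mathrm{ub}}$ the lower and upper flux bounds:

$$
\max_{v} \; c^{T} v \quad \text{subject to} \quad S\,v = 0, \quad v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}
$$

The constraint $S\,v = 0$ expresses the steady-state assumption that no metabolite accumulates or is depleted. The growth rate printed above is the optimal value of $c^{T} v$ when $c$ selects the biomass reaction.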

Models solved using FBA can be further analyzed by using summary methods, which output printed text to give a quick representation of model behavior. Calling the summary method on the entire model displays information on the input and output behavior of the model, along with the optimized objective.

* Check which medium components were not used for growth.
* Check which substrate was limiting growth.
* Which metabolic mode is used for catabolism (fermentation, or glycolysis + TCA cycle)?
* What is the major source of NADH/ATP?
* What is the major sink of NADH/ATP?


In [0]:
model.summary()

# information for specific reactions
#solution.fluxes.BIOMASS_Ec_iJO1366_core_53p95M
#solution.fluxes.ATPM

In addition, the input-output behavior of individual metabolites can also be inspected using summary methods. For instance, the following commands can be used to examine the overall redox balance of the model.

In [0]:
model.metabolites.nadh_c.summary()

Or to get a sense of the main energy production and consumption reactions

In [0]:
model.metabolites.atp_c.summary()
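
The bookkeeping behind such a metabolite summary is simple: the contribution of each reaction is its flux multiplied by the stoichiometric coefficient of the metabolite, and at steady state all contributions sum to zero. The sketch below illustrates this with a toy example. The reaction names, coefficients, and fluxes are invented for illustration and are not taken from iJO1366.

```python
# Toy stoichiometric coefficient of ATP per reaction (positive: produced, negative: consumed)
atp_coefficients = {"glycolysis": 2.0, "respiration": 4.0, "growth": -60.0, "maintenance": -1.0}
# Toy flux through each reaction
toy_fluxes = {"glycolysis": 10.0, "respiration": 5.0, "growth": 0.5, "maintenance": 10.0}

# Contribution of each reaction to the ATP balance
contributions = {rxn: atp_coefficients[rxn] * toy_fluxes[rxn] for rxn in atp_coefficients}
producing = {rxn: value for rxn, value in contributions.items() if value > 0}
consuming = {rxn: value for rxn, value in contributions.items() if value < 0}

print("Producing:", producing)
print("Consuming:", consuming)
print("Net balance:", sum(contributions.values()))
```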

### 2.3 Visualization

Escher was introduced in section 1.3. In this example the maps are exported directly as an HTML file.

The most helpful documentation is [here](https://escher.readthedocs.io/en/latest/escher-python.html).

In [0]:
# Escher wants the data to be stored in a specific format, a 'dictionary'
builder = Builder()
# Load an Escher map
builder.map_name = 'iJO1366.Central metabolism'
builder.model = model
builder.reaction_data = solution.fluxes
# Add some data for metabolites
#builder.metabolite_data = solution.shadow_prices
# Simplify the map by hiding some labels
builder.hide_secondary_metabolites = True
#builder.hide_all_labels = True
builder.reaction_scale = [
    { 'type': 'min', 'color': '#000000', 'size': 12 },
    { 'type': 'median', 'color': '#ffffff', 'size': 20 },
    { 'type': 'max', 'color': '#ff0000', 'size': 25 }
]
builder.reaction_scale_preset = 'GaBuRd'

# Make all the arrows three times as thick
builder.reaction_scale = [
    {k: v * 3 if k == 'size' else v for k, v in x.items()}
    for x in builder.reaction_scale
]
builder.save_html('escher_map_file.html')

## 3. Model Manipulation <a id='Model-Manipulation'></a>

### 3.1 Setting new objective

The objective function is determined from the objective_coefficient attribute of the objective reaction(s). Generally, a “biomass” function which describes the composition of metabolites which make up a cell is used.

The objective function can be changed by assigning Model.objective, which can be a reaction object (or just its name).

* Complete the code! (2x)

In [0]:
# copy model for changes to succinate
SucModel = model.copy()
#SucModel.reactions.query('suc')
#SucModel.reactions.EX_succ_e

# change objective to the synthesis of succinate
SucModel.objective = SucModel.reactions.EX_succ_e

# Because biomass is no longer the objective, the biomass flux will be zero when optimizing for succinate, since growth only drains carbon away from the product.
# Assuming that some growth still occurs, we raise the lower bound of the growth reaction
SucModel.reactions.get_by_id('BIOMASS_Ec_iJO1366_core_53p95M').lower_bound = ?..?

# storing the simulation in a variable
Suc_Solution = SucModel.optimize()

# print overview of exchange reactions
?...?
#SucModel.reactions.ATPM.flux


### 3.2 Visualization of succinate optimized flux distribution

The optimization of succinate production from glucose shows a theoretical conversion mechanism. Let's visualize the optimized reaction fluxes in Escher.

* Complete the code!
* Describe similarities and differences of the metabolism for succinate production compared to growth.

In [0]:
builder.reaction_data = ?...?

builder.save_html('escher_map_file.html')

### 3.3 Generating production envelopes
Production envelopes (aka phenotype phase planes) show distinct phases of optimal growth that differ in how two substrates are used. For more information, see [Edwards et al.](http://dx.doi.org/10.1002/bit.10047).

* Complete the code! (2x)

In [0]:
# setting lowest growth boundary back to '0'
SucModel.reactions.get_by_id('BIOMASS_Ec_iJO1366_core_53p95M').lower_bound = 0.

# Calculation of production envelope
prod_env = cobra.flux_analysis.production_envelope(
    SucModel,
    reactions=SucModel.reactions.BIOMASS_Ec_iJO1366_core_53p95M,
    objective=SucModel.reactions.EX_succ_e,
)
prod_env.head(3)



In [0]:
# Visualization
plt.plot(prod_env['BIOMASS_Ec_iJO1366_core_53p95M'], prod_env['carbon_yield_maximum'], color='blue', linestyle='solid', linewidth=2)
plt.xlabel('?...?');
plt.ylabel('?...?');
plt.title('Production Envelope of Succinate for different growth rates');

# save figure in Jupyter notebook
plt.savefig('production-envelope_suc-growth.png')
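
Conceptually, a production envelope is obtained by fixing the growth rate at a series of values between zero and its maximum and optimizing the product flux at each value. The following toy sketch mimics this with a single carbon balance in which the substrate is split between biomass and product. All parameter values are hypothetical and not taken from iJO1366; the sketch illustrates the idea and does not replace `production_envelope`.

```python
# Hypothetical toy parameters
substrate_uptake = 10.0       # mmol substrate / gDW / h
substrate_per_biomass = 10.0  # mmol substrate needed per gDW biomass
product_yield = 1.5           # mmol product per mmol substrate not used for growth

# Maximum growth rate if all substrate goes into biomass
max_growth = substrate_uptake / substrate_per_biomass

points = 5
envelope = []
for i in range(points):
    growth = max_growth * i / (points - 1)
    # Substrate that remains after the demand for growth is covered
    substrate_left = substrate_uptake - growth * substrate_per_biomass
    max_product = product_yield * substrate_left
    envelope.append((growth, max_product))

for growth, max_product in envelope:
    print(f"growth {growth:.2f} 1/h -> maximum product {max_product:.2f} mmol/gDW/h")
```

In this toy case the trade-off is a straight line; in a genome-scale model the envelope can contain several linear segments.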

## Further Tasks

An equally important organism for biotechnology is *S. cerevisiae*; one of its main models is iMM904. Your goal is to determine how the growth rate depends on the oxygen uptake rate.



1.   Find the iMM904 model in the BiGG database.
2.   Extract the map name from Escher for visualization.
3.   Check growth on glucose and visualize the fluxes.
4.   Use the production envelope command to check the dependence of growth on oxygen uptake.
  * Identify the reaction name for the oxygen exchange reaction.
  * Use as 'objective' the default biomass reaction and as 'reactions' the oxygen exchange reaction.
  * For plotting, examine the column header names in 'prod_env'.
  * Is the resulting graph visually appealing? Try using np.abs(prod_env['?...?']) to improve the figure.
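
The hint in the last step relates to the sign convention of exchange reactions, in which uptake is reported as a negative flux. The sketch below uses made-up values and the built-in `abs` on a plain list; `np.abs` applies the same operation element-wise to a column of `prod_env`.

```python
# Made-up oxygen exchange fluxes; negative values mean uptake
oxygen_exchange = [-20.0, -10.0, -5.0, 0.0]

# Convert to positive uptake rates for plotting
oxygen_uptake = [abs(flux) for flux in oxygen_exchange]
print(oxygen_uptake)
```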

# Further Literature

## Details on Jupyter in education
The following sites provide guides to using Jupyter notebooks for educational purposes:
 * https://jupyter4edu.github.io/jupyter-edu-book/index.html
 * https://github.com/jperkel/example_notebook
 * https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks#mathematics-physics-chemistry-biology
 * https://github.com/binder-examples/
 * https://nbviewer.jupyter.org/
 * https://jupyter.readthedocs.io/en/latest/index.html
 * https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/
 * https://matplotlib.org/index.html

## More on cobrapy
 * https://opencobra.github.io/cobrapy/
 * https://cobrapy.readthedocs.io/en/latest/getting_started.html
 * https://github.com/DD-DeCaF/tutorials
 * https://biosustain.github.io/cell-factory-design-course/

## Escher documentation
 * https://escher.readthedocs.io/en/v1.2.0/index.html
 * https://github.com/zakandrewking/escher/blob/master/docs/notebooks/COBRApy%20and%20Escher.ipynb
 * https://github.com/DD-DeCaF/tutorials/blob/master/escher-01.ipynb

## Further material on cameo
 * https://github.com/DD-DeCaF/tutorials
 * https://biosustain.github.io/cell-factory-design-course/
 * https://try.cameo.bio/user/M6JFMnvW35Vz/tree
 * Installation from within a notebook: `!pip install cameo`