# MEWpy Community Modeling

Author: Vitor Pereira, inspired by the work of Daniel Machado.

License: [CC BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)

-------

In this tutorial:

- You will learn how to perform flux balance analysis of microbial communities
using a model of the [central carbon metabolism of *E. coli*](https://journals.asm.org/doi/10.1128/ecosalplus.10.2.1).


## Install requirements 
To run this notebook, we first need to install the required packages:

In [1]:
! pip install -U -q mewpy cplex escher

ERROR: Could not find a version that satisfies the requirement cplex (from versions: none)
ERROR: No matching distribution found for cplex

Verify the installation:

In [2]:
import mewpy
mewpy.info()

MEWpy version: 0.1.34
Author: Vitor Pereira (2019-) | CEB University of Minho (2019-2023)
Contact: vmsapereira@gmail.com 

Available LP solvers: gurobi glpk
Default LP solver: gurobi 

Available ODE solvers: scikits scipy
Default ODE solver: scikits 

Optimization Problems: AbstractKOProblem AbstractOUProblem CofactorSwapProblem CommunityKOProblem ETFLGKOProblem ETFLGOUProblem GKOProblem GOUProblem GeckoKOProblem GeckoOUProblem KcatOptProblem KineticKOProblem KineticOUProblem MediumProblem OptORFProblem OptRamProblem RKOProblem ROUProblem 

Available EA engines: inspyred jmetal
Default EA engine: jmetal
Available EAs: GA NSGAII NSGAIII SA SPEA2 



**IMPORTANT**: The notebook requires a MEWpy version >= 0.1.35
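
The `mewpy.info()` output above reports version 0.1.34, which is below this minimum. As a minimal sketch, the version strings can be compared as tuples of integers. The helpers `parse_version` and `meets_minimum` are hypothetical (they are not part of MEWpy) and assume purely numeric, dot-separated version strings.

```python
def parse_version(text):
    """Turn '0.1.35' into (0, 1, 35); assumes numeric, dot-separated parts."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, required):
    """Return True when the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


print(meets_minimum("0.1.34", "0.1.35"))  # False
print(meets_minimum("0.1.35", "0.1.35"))  # True
```

If the check fails, re-running the `pip install -U` cell should bring the package up to date.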

### Run in Google colab

If you are running this notebook in Colab, perform the following steps; otherwise, skip this section.

In [3]:
%%bash
[[ ! -e /colabtools ]] && exit
! pip install -U -q PyDrive

In [4]:
if 'google.colab' in str(get_ipython()):
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive
    from google.colab import auth
    from oauth2client.client import GoogleCredentials

    auth.authenticate_user()
    gauth = GoogleAuth()
    gauth.credentials = GoogleCredentials.get_application_default()
    drive = GoogleDrive(gauth)

    model_file = drive.CreateFile({'id':'1o0XthuEOs28UJ4XTa9SfFSFofazV-2nN'})
    model_file.GetContentFile('e_coli_core.xml.gz')

## Setting up a community

We will create a synthetic microbial consortium with two *E. coli* mutants growing in minimal medium. In one of the mutants we will knock out the glucose transporter and in the other we will knock out the ammonium transporter.

In [5]:
from cobra.io import read_sbml_model
from mewpy import get_simulator

model = read_sbml_model('models/ec/e_coli_core.xml.gz')
wildtype = get_simulator(model)
solution = wildtype.simulate()
print(solution)
solution.find('EX')

Set parameter Username
Academic license - for non-commercial use only - expires 2024-12-11
objective: 0.8739215069684301
Status: OPTIMAL
Method:FBA


Reaction ID,Flux rate
EX_co2_e,22.809833
EX_glc__D_e,-10.0
EX_h_e,17.530865
EX_h2o_e,29.175827
EX_nh4_e,-4.765319
EX_o2_e,-21.799493
EX_pi_e,-3.214895


Now we create our two mutants (`glc_ko` and `nh4_ko`):

In [6]:
glc_ko = wildtype.copy()
glc_ko.id = 'glc_ko'
glc_ko.set_reaction_bounds('GLCpts', 0, 0)

Read LP format model from file /var/folders/fw/kbs61_l15j587pjbwf3_y8780000gn/T/tmpcimgyt8p.lp
Reading time = 0.00 seconds
: 72 rows, 190 columns, 720 nonzeros


In [7]:
nh4_ko = wildtype.copy()
nh4_ko.id = 'nh4_ko'
nh4_ko.set_reaction_bounds('NH4t', 0, 0)

Read LP format model from file /var/folders/fw/kbs61_l15j587pjbwf3_y8780000gn/T/tmpxd905hg4.lp
Reading time = 0.00 seconds
: 72 rows, 190 columns, 720 nonzeros


## Comparing models

Community models require that metabolites have the same identifiers across all models. MEWpy offers some functions to that end, computing the overlap of metabolites, reactions and uptakes among a list of models.

In [8]:
from mewpy.com import *
mets, rxns, over = jaccard_similarity_matrices([glc_ko, nh4_ko])

In [9]:
mets

,glc_ko,nh4_ko
glc_ko,1.0,1.0
nh4_ko,1.0,1.0


In [10]:
rxns

,glc_ko,nh4_ko
glc_ko,1.0,0.978947
nh4_ko,0.978947,1.0


In [11]:
over

,glc_ko,nh4_ko
glc_ko,1.0,1.0
nh4_ko,1.0,1.0
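
For reference, the Jaccard similarity of two sets is the size of their intersection divided by the size of their union. The sketch below applies that definition to two small, made-up sets of reaction identifiers. It only illustrates the metric and is not the code MEWpy runs internally.

```python
def jaccard(a, b):
    """Jaccard similarity: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(union)


# Toy reaction sets (illustrative only, not the full models)
rxns_a = {"PGI", "PFK", "GLCpts"}
rxns_b = {"PGI", "PFK", "NH4t"}

print(jaccard(rxns_a, rxns_b))  # 2 shared / 4 in total = 0.5
print(jaccard(rxns_a, rxns_a))  # 1.0
```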


## Building communities

**MEWpy** has some basic functionality for working with microbial communities. One example is the `CommunityModel` class, which creates a microbial community from a list of models of individual species:

In [44]:
from mewpy.model import CommunityModel
community = CommunityModel([glc_ko, nh4_ko], flavor='cobra')

In [45]:
sim = community.get_community_model()

Organism: 100%|███████████████████████████████████| 2/2 [00:00<00:00,  4.59it/s]


This community model ignores the environmental conditions that were specified in the original models (since these could be very different). 

To make our life easier, we will extract the nutrient composition specified in the wild-type model to use later.

In [46]:
from mewpy.simulation import Environment
M9 = Environment.from_model(wildtype)
M9

,lb,ub
EX_ac_e,0.0,1000.0
EX_acald_e,0.0,1000.0
EX_akg_e,0.0,1000.0
EX_co2_e,-1000.0,1000.0
EX_etoh_e,0.0,1000.0
EX_for_e,0.0,1000.0
EX_fru_e,0.0,1000.0
EX_fum_e,0.0,1000.0
EX_glc__D_e,-10.0,1000.0
EX_gln__L_e,0.0,1000.0


## Simulation using FBA

A very simple way to simulate a microbial community is to merge the individual models into a single model that mimics a "super organism", where each microbe lives inside its own compartment, and run a (conventional) FBA simulation for this *super organism*.

In [47]:
solution = sim.simulate(constraints=M9)

print(solution)
solution.find('EX')

objective: 0.40757209363986224
Status: OPTIMAL
Method:FBA


Reaction ID,Flux rate
EX_glc__D_e,-10.0
EX_h2o_e,31.248968
EX_h_e,16.351792
EX_nh4_e,-4.444818
EX_o2_e,-24.368743
EX_pi_e,-2.998671
EX_co2_e,25.311132
EX_glc__D_e_nh4_ko,-10.0
EX_glu__L_e_glc_ko,2.222409
EX_glu__L_e_nh4_ko,-2.222409
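
The identifiers in the table above (for example `EX_glu__L_e_glc_ko`) suggest that organism-specific reactions carry the organism ID as a suffix. The sketch below mimics that naming convention on two toy dictionaries. The function `namespaced` is hypothetical, and this is an assumption about the naming only, not MEWpy's merging code.

```python
def namespaced(reactions, organism):
    """Suffix every reaction ID with the organism ID (assumed convention)."""
    return {f"{rid}_{organism}": flux for rid, flux in reactions.items()}


# Toy per-organism flux dictionaries (values taken from the table above)
organisms = {
    "glc_ko": {"EX_glu__L_e": 2.222409},
    "nh4_ko": {"EX_glu__L_e": -2.222409, "EX_glc__D_e": -10.0},
}

merged = {}
for organism, reactions in organisms.items():
    merged.update(namespaced(reactions, organism))

for rid in sorted(merged):
    print(rid, merged[rid])
```

With this convention, the same exchange can carry opposite signs in the two organisms: here glutamate is secreted by `glc_ko` and taken up by `nh4_ko`.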


We can see that the model predicts a growth rate (total biomass per hour) similar to the wild type, with an efficient consumption of glucose and ammonium that results in respiratory metabolism.

But what is each organism doing, and are both organisms actually growing at the same rate?

Let's print the biomass flux for each organism:

In [48]:
solution.find('BIOMASS', sort=True,show_nulls=True)

Reaction ID,Flux rate
BIOMASS_Ecoli_core_w_GAM_glc_ko,0.407572
BIOMASS_Ecoli_core_w_GAM_nh4_ko,0.407572


We can also list the metabolites of the community model, and then the fluxes of a single organism:

In [49]:
sim.find_metabolites()

id,name,compartment,formula
glc__D_e_glc_ko,D-Glucose,e_glc_ko,C6H12O6
glc__D_e,D-Glucose,e,C6H12O6
gln__L_c_glc_ko,L-Glutamine,c_glc_ko,C5H10N2O3
gln__L_e_glc_ko,L-Glutamine,e_glc_ko,C5H10N2O3
gln__L_e,L-Glutamine,e,C5H10N2O3
...,...,...,...
fum_c_nh4_ko,Fumarate,c_nh4_ko,C4H2O4
fum_e_nh4_ko,Fumarate,e_nh4_ko,C4H2O4
g3p_c_nh4_ko,Glyceraldehyde 3-phosphate,c_nh4_ko,C3H5O6P
g6p_c_nh4_ko,D-Glucose 6-phosphate,c_nh4_ko,C6H11O9P


In [50]:
solution.find('nh4_ko')

Reaction ID,Flux rate
ACKr_nh4_ko,-7.810667
ACONTa_nh4_ko,1.607668
ACONTb_nh4_ko,1.607668
ACt2r_nh4_ko,-7.810667
AKGt2r_nh4_ko,-3.390348
ATPM_nh4_ko,8.39
ATPS4r_nh4_ko,10.578752
BIOMASS_Ecoli_core_w_GAM_nh4_ko,0.407572
CO2t_nh4_ko,-12.016587
CS_nh4_ko,1.607668


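Note that a plain FBA solution of a merged model does not by itself guarantee balanced growth: depending on how the community objective is defined, an optimum can exist in which only one organism grows while the other keeps an active metabolism (it exchanges metabolites with the environment and with the other organism) and acts as a bioconverter, with none of its flux used for growth. In the output shown here, both organisms carry the same biomass flux (0.407572).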

> Do you think this would be a stable consortium?

## Regularized Community FBA

Flux balance analysis (FBA) provides not one but potentially an infinite number of solutions. There are, however, a number of different strategies to select one particular solution from the set of all possible ones. One common approach is to select the most parsimonious solution, the one that minimizes the sum of all fluxes (pFBA).

Next, we simulate the community growth and select a solution based on L2 regularization of the growth of each organism in the community. That is, we look for the solution in which no organism grows too fast while each one approaches its individual growth rate:
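
To see why an L2 penalty favours balanced growth, consider the toy comparison below. It is not the optimization problem that `regComFBA` solves. It only ranks three made-up growth splits with the same total by their sum of squares, which is smallest for the even split.

```python
# Hypothetical (organism 1, organism 2) growth rates, all summing to 0.8
candidates = [(0.8, 0.0), (0.6, 0.2), (0.4, 0.4)]


def l2_penalty(growths):
    """Sum of squared growth rates (the L2 regularization term)."""
    return sum(g * g for g in growths)


for growths in candidates:
    print(growths, round(l2_penalty(growths), 2))

# The split with the smallest penalty is the balanced one
best = min(candidates, key=l2_penalty)
print(best)  # (0.4, 0.4)
```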

In [51]:
from mewpy.com import regComFBA

In [52]:
solution = regComFBA(community, constraints=M9, obj_frac=1)
solution.find('BIOMASS|growth', sort=True, show_nulls=True)

Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09


Reaction ID,Flux rate
BIOMASS_Ecoli_core_w_GAM_glc_ko,0.407572
BIOMASS_Ecoli_core_w_GAM_nh4_ko,0.407572
community_growth,0.407572


Or, equivalently, using the simulator:

In [53]:
solution = sim.simulate(method=regComFBA, constraints=M9, obj_frac=1)
solution.find('BIOMASS|growth', sort=True, show_nulls=True)

Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09


Reaction ID,Flux rate
BIOMASS_Ecoli_core_w_GAM_glc_ko,0.407572
BIOMASS_Ecoli_core_w_GAM_nh4_ko,0.407572
community_growth,0.407572


## Community Simulation with SteadyCom

**SteadyCom** by [Chan, et al (2017)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005539) is a recent community simulation method that takes into account the fact that to reach a stable composition the organisms need to grow at the same *specific growth rate* (1/h), which means that the *absolute growth rate* (gDW/h) of each organism is proportional to its *abundance* at steady-state (gDW).

Let's simulate the same community using SteadyCom:

In [54]:
solution = SteadyCom(community, constraints=M9)

Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09


Organism: 100%|███████████████████████████████████| 2/2 [00:00<00:00,  6.15it/s]


In this case the solution object shows the overall community growth rate and the relative abundance of each species:

In [55]:
solution

Community growth: 0.027466848121535575
glc_ko	1.0
nh4_ko	30.785477210087368
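
The relationship between specific growth rate, abundance and absolute growth can be worked through with the numbers printed above. This is a minimal sketch in plain Python. Normalizing the two reported values into fractions that sum to one is an assumption made here for readability, not something the `solution` object is shown to do.

```python
# Values copied from the SteadyCom output above
mu = 0.027466848121535575  # community (specific) growth rate, 1/h
abundance = {"glc_ko": 1.0, "nh4_ko": 30.785477210087368}

# Assumption: express the abundances as fractions of the total
total = sum(abundance.values())
fractions = {org: value / total for org, value in abundance.items()}

# At steady state every organism shares the same specific growth rate,
# so its absolute growth is proportional to its abundance
absolute = {org: mu * value for org, value in abundance.items()}

for org in abundance:
    print(f"{org}: fraction={fractions[org]:.4f}, absolute growth={absolute[org]:.4f}")
```

Because both organisms share the same `mu`, the ratio of their absolute growth values equals the ratio of their abundances.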

The `solution` object for community simulations implements a few additional features, such as enumerating all the cross-feeding interactions:

In [56]:
solution.cross_feeding(as_df=True).dropna().sort_values('rate', ascending=False)

,donor,receiver,compound,rate
6,nh4_ko,glc_ko,lac__D_e,31.460909
12,glc_ko,nh4_ko,pyr_e,31.254737
13,nh4_ko,glc_ko,ac_e,23.866599
4,nh4_ko,glc_ko,h_e,23.679204
18,glc_ko,nh4_ko,etoh_e,9.979772
14,glc_ko,nh4_ko,acald_e,8.736784
15,nh4_ko,glc_ko,akg_e,4.689488
1,glc_ko,nh4_ko,glu__L_e,4.610779
2,glc_ko,nh4_ko,h2o_e,2.254232


We can plot the fluxes of each mutant in a map to help with interpretation of the results:

In [57]:
from mewpy.visualization.escher import build_escher
if 'google.colab' in str(get_ipython()):
    from google.colab import output
    output.enable_custom_widget_manager()

build_escher(fluxes=solution.internal['glc_ko'])

Downloading Map from https://escher.github.io/1-0-0/6/maps/Escherichia%20coli/e_coli_core.Core%20metabolism.json


Builder(reaction_data={'ACALD': -18.71655537485597, 'ACALDt': -8.73678370807742, 'ACKr': 23.86659902629946, 'A…

In [58]:
build_escher(fluxes=solution.internal['nh4_ko'])

Downloading Map from https://escher.github.io/1-0-0/6/maps/Escherichia%20coli/e_coli_core.Core%20metabolism.json


Builder(reaction_data={'ACALD': 18.71655537485597, 'ACALDt': 8.73678370807742, 'ACKr': -23.86659902629946, 'AC…

## Explore alternative solutions

Unfortunately, one limitation of **SteadyCom**, which is exemplified by [Chan, et al (2017)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005539) in Figure 3 (reproduced below), is the variability in the solution space when the community is not growing at the maximum (theoretical) growth rate.

> Would you expect a synthetic community to grow at its maximum growth rate?

**MEWpy** implements a variability analysis function for the SteadyCom solution space. Let's see what happens if the community is growing at 90% of the theoretical maximum:

In [59]:
from mewpy.com import SteadyComVA
variability = SteadyComVA(community, obj_frac=0.9, constraints=M9)

print('Strain\tMin\tMax')
for strain, (lower, upper) in variability.items():
    print(f'{strain}\t{lower:.1%}\t{upper:.1%}')

Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09
Strain	Min	Max
glc_ko	0.0%	99.9%
nh4_ko	0.1%	100.0%


As you can see, there is a really large variability in this solution space. This means that we know in theory the two mutants **can** cooperate and survive in minimal media, but there is still a lot of uncertainty with regard to **how** they will achieve a stable consortium.

> How do you think we can reduce this uncertainty?

First, let's set the environmental conditions:

In [60]:
sim.set_environmental_conditions(M9)

We may now impose constraints on the growth of each organism, for example requiring that each organism grows at a rate of at least 0.1/h:

In [61]:
constraints={community.organisms_biomass['nh4_ko']:(0.1,1000), 
             community.organisms_biomass['glc_ko']:(0.1,1000)}
solution = sim.simulate(constraints=constraints)
solution

objective: 0.40757209363986224
Status: OPTIMAL
Method:FBA

In [62]:
solution.find('BIOMASS')

Reaction ID,Flux rate
BIOMASS_Ecoli_core_w_GAM_glc_ko,0.407572
BIOMASS_Ecoli_core_w_GAM_nh4_ko,0.407572
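
A quick way to confirm that a solution respects such bounds is to compare each flux with its `(lower, upper)` pair. The sketch below does this with the biomass values printed above. The helper `violated` is hypothetical and works on plain dictionaries rather than on MEWpy objects.

```python
def violated(fluxes, constraints):
    """Return the reaction IDs whose flux falls outside its (lb, ub) bounds."""
    return [
        rid
        for rid, (lb, ub) in constraints.items()
        if not lb <= fluxes[rid] <= ub
    ]


# Biomass fluxes copied from the table above
fluxes = {
    "BIOMASS_Ecoli_core_w_GAM_glc_ko": 0.407572,
    "BIOMASS_Ecoli_core_w_GAM_nh4_ko": 0.407572,
}
# Same bounds as in the simulation: at least 0.1/h for each organism
constraints = {rid: (0.1, 1000) for rid in fluxes}

print(violated(fluxes, constraints))  # []
```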


Alternatively, we might choose to impose relative growth rates for each of the organisms, as a proxy of the community composition:

In [63]:
community = CommunityModel([glc_ko, nh4_ko],
                           add_compartments=True,
                           merge_biomasses=True,
                           flavor='cobra')

In [64]:
sim = community.get_community_model()
sim.set_environmental_conditions(M9)

Organism: 100%|███████████████████████████████████| 2/2 [00:00<00:00,  6.17it/s]


In [65]:
solution = sim.simulate()
print(solution)
solution.find('BIOMASS')

objective: 0.40757209363986224
Status: OPTIMAL
Method:FBA


Reaction ID,Flux rate
BIOMASS_Ecoli_core_w_GAM_glc_ko,0.407572
BIOMASS_Ecoli_core_w_GAM_nh4_ko,0.407572


In [66]:
sim.find(community.biomass)

id,name,lb,ub,stoichiometry,gpr,annotations
community_growth,Community growth rate,0,inf,"{'Biomass_glc_ko': -1, 'Biomass_nh4_ko': -1}",,{}


By default, the relative abundances (relative growth rates) are equal. We may, however, change these ratios:

In [67]:
community.set_abundance({'glc_ko':1,'nh4_ko':2.5})
sim.simulate(method='pFBA').find('BIOMASS|growth')

Reaction ID,Flux rate
BIOMASS_Ecoli_core_w_GAM_glc_ko,0.105388
BIOMASS_Ecoli_core_w_GAM_nh4_ko,0.263471
community_growth,0.105388
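
As a sanity check, the ratio of the two biomass fluxes above should match the abundance ratio that was set (1 : 2.5). The sketch below verifies this with the printed values, within the rounding of the table.

```python
# Abundances passed to set_abundance and biomass fluxes from the table above
abundance = {"glc_ko": 1, "nh4_ko": 2.5}
biomass = {"glc_ko": 0.105388, "nh4_ko": 0.263471}

expected_ratio = abundance["nh4_ko"] / abundance["glc_ko"]
observed_ratio = biomass["nh4_ko"] / biomass["glc_ko"]

# Allow a small tolerance because the table values are rounded
matches = abs(observed_ratio - expected_ratio) < 1e-3
print(matches)  # True
```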


## SMETANA

**SMETANA** implements several algorithms to analyse cross-feeding interactions in microbial communities. These have been described in [Zelezniak et al, PNAS (2015)](https://www.pnas.org/doi/abs/10.1073/pnas.1421834112). Please read the paper for a more detailed explanation.

SCS (species coupling score): measures the dependency of one species on the presence of the others to survive.

In [68]:
sc_score(community)

Organism: 100%|███████████████████████████████████| 2/2 [00:00<00:00,  6.22it/s]


Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09


Attribute,Value
glc_ko,{'nh4_ko': 1.0}
nh4_ko,{'glc_ko': 1.0}


MUS (metabolite uptake score): measures how frequently a species needs to take up a metabolite to survive.

In [69]:
MUS = mu_score(community)
MUS

Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09


Attribute,Value
glc_ko,"{'ac_e': 0.06, 'acald_e': 0.28, 'akg_e': 0.2, ..."
nh4_ko,"{'ac_e': 0.0, 'acald_e': 0.0, 'akg_e': 0.0, 'c..."


In [70]:
MUS.glc_ko

{'ac_e': 0.06,
 'acald_e': 0.28,
 'akg_e': 0.2,
 'co2_e': 0.0,
 'etoh_e': 0.17,
 'for_e': 0.0,
 'fru_e': 0.0,
 'fum_e': 0.0,
 'glc__D_e': 0.0,
 'gln__L_e': 0.0,
 'glu__L_e': 0.0,
 'h_e': 0.05,
 'h2o_e': 0.09,
 'lac__D_e': 0.24,
 'mal__L_e': 0.0,
 'nh4_e': 1.0,
 'o2_e': 0.94,
 'pi_e': 1.0,
 'pyr_e': 0.3,
 'succ_e': 0.08}

In [71]:
MUS.nh4_ko

{'ac_e': 0.0,
 'acald_e': 0.0,
 'akg_e': 0.0,
 'co2_e': 0.0,
 'etoh_e': 0.0,
 'for_e': 0.0,
 'fru_e': 0.0,
 'fum_e': 0.0,
 'glc__D_e': 1.0,
 'gln__L_e': 0.0,
 'glu__L_e': 1.0,
 'h_e': 0.0,
 'h2o_e': 0.0,
 'lac__D_e': 0.0,
 'mal__L_e': 0.0,
 'nh4_e': 0.0,
 'o2_e': 0.0,
 'pi_e': 1.0,
 'pyr_e': 0.0,
 'succ_e': 0.0}
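
One way to read a "how frequently" score such as MUS is as the fraction of alternative uptake sets in which a metabolite appears. The sketch below computes that fraction for four made-up uptake sets. Both the sets and the counting scheme are assumptions for illustration, since how `mu_score` enumerates alternatives is not shown here.

```python
from collections import Counter

# Hypothetical alternative sets of metabolites that let a species grow
alternatives = [
    {"nh4_e", "pi_e", "pyr_e"},
    {"nh4_e", "pi_e", "lac__D_e"},
    {"nh4_e", "pi_e", "acald_e"},
    {"nh4_e", "pi_e", "pyr_e"},
]

# Count in how many alternatives each metabolite appears
counts = Counter(met for uptake in alternatives for met in uptake)
scores = {met: n / len(alternatives) for met, n in counts.items()}

for met in sorted(scores):
    print(met, scores[met])
```

A score of 1.0 means the metabolite is needed in every alternative, as reported above for `nh4_e` and `pi_e` in `glc_ko`.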

MPS (metabolite production score): measures the ability of a species to produce a metabolite

In [72]:
MPS = mp_score(community,environment=M9)
MPS

Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09


Attribute,Value
glc_ko,"{'etoh_e': 1, 'for_e': 1, 'h2o_e': 1, 'pyr_e':..."
nh4_ko,"{'etoh_e': 1, 'for_e': 1, 'h2o_e': 1, 'pyr_e':..."


MRO (metabolic resource overlap): calculates how much the species compete for the same metabolites.

In [73]:
score, MRO = mro_score(community,environment=M9)
print(score)
MRO

Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09
Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09
Set parameter FeasibilityTol to value 1e-09
Set parameter OptimalityTol to value 1e-09
0.5


Attribute,Value
community_medium,"{glc, gln, pi}"
individual_media,"{'glc_ko': {'gln', 'h2o', 'acald', 'pi', 'pyr'..."


In [74]:
MRO.individual_media.glc_ko

{'acald', 'gln', 'h2o', 'pi', 'pyr'}

In [75]:
MRO.individual_media.nh4_ko

{'glc', 'gln', 'pi'}
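
The score of 0.5 printed above can be reproduced from the two individual media. The sketch below divides the mean pairwise overlap by the mean medium size. This formula matches the printed value, but whether `mro_score` computes it in exactly this way is an assumption.

```python
from itertools import combinations

# Individual minimal media copied from the output above
media = {
    "glc_ko": {"acald", "gln", "h2o", "pi", "pyr"},
    "nh4_ko": {"glc", "gln", "pi"},
}

# Mean number of metabolites shared by each pair of species
pairs = list(combinations(media.values(), 2))
mean_overlap = sum(len(a & b) for a, b in pairs) / len(pairs)

# Mean size of the individual media
mean_size = sum(len(m) for m in media.values()) / len(media)

mro = mean_overlap / mean_size
print(mro)  # 0.5
```

Here the two species share `gln` and `pi` (2 metabolites) and the media contain 4 metabolites on average, giving 2 / 4 = 0.5.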