# L6b: Incorporating Gene Expression Logic into Flux Balance Analysis
In this lecture, we'll continue our discussion of Flux Balance Analysis (FBA), particularly what the constraints are saying in the flux estimation problem. Last time, we simplified the material balance and flux bounds constraints. Today, we'll discuss incorporating gene expression logic into the FBA problem. The key ideas of this lecture are:
* __Flux balance analysis (FBA)__ is a mathematical approach used to analyze the flow of metabolites through a metabolic network. It assumes a steady state where metabolite production, consumption, and transport rates are balanced. The FBA problem is formulated as a linear programming (LP) problem to maximize or minimize fluxes through the network, subject to constraints. 
* __Flux bounds constraints__ limit the range of possible fluxes through a metabolic network. These bounds can incorporate additional information, such as experimental data or prior knowledge about the system, into the FBA problem.
* __Boolean gene expression logic__ can be incorporated into the FBA problem by using gene-protein-reaction (GPR) rules to link gene expression levels to enzyme activity and metabolic fluxes. GPR rules define the relationship between genes, proteins, and reactions in a metabolic network, allowing for the integration of gene expression data into the FBA model. The GPR rules are logical expressions in [a boolean model](https://en.wikipedia.org/wiki/Boolean_algebra).

## PS2 (Preview): Flux Balance Analysis of the Urea Cycle in HL-60 Cells
In problem set 2 (PS2), we will explore the urea cycle in HL-60 cells using flux balance analysis. The [urea cycle](https://www.kegg.jp/pathway/hsa00220) is a crucial metabolic pathway that converts toxic ammonia into urea for excretion. While the urea cycle's role in [HL-60 cells, a human promyelocytic leukemia cell line](https://www.atcc.org/products/ccl-240?matchtype=b&network=g&device=c&adposition=&keyword=hl60%20cell%20line%20atcc&gad_source=1&gbraid=0AAAAADR6fpoOXsp8U8fXLd_E6sLTcwv24&gclid=CjwKCAiA5eC9BhAuEiwA3CKwQm0C1oE5_JjTpJ24VnTjZUZQVLivpPxmufDo7HdH5v3hN1XKnEf3ExoCvhwQAvD_BwE), is not directly established, these cells exhibit alterations in protein levels and proliferation rates when exposed to various compounds, which may indirectly affect nitrogen metabolism and related pathways.

* __Tasks__: We'll construct [a simplified model of the urea cycle](https://github.com/varnerlab/CHEME-5450-Lectures-Spring-2025/blob/main/lectures/week-5/L5c/docs/figs/Fig-Urea-cycle-Schematic.pdf), analyze its structure, determining reversibility of the reactions, some estimates for the bounds, and then compute the flux distribution through the network under different assumptions.

### References
1. [Al-Otaibi NAS, Cassoli JS, Martins-de-Souza D, Slater NKH, Rahmoune H. Human leukemia cells (HL-60) proteomic and biological signatures underpinning cryo-damage are differentially modulated by novel cryo-additives. Gigascience. 2019 Mar 1;8(3):giy155. doi: 10.1093/gigascience/giy155. PMID: 30535373; PMCID: PMC6394207.](https://pmc.ncbi.nlm.nih.gov/articles/PMC6394207/)
2. [Figarola JL, Weng Y, Lincoln C, Horne D, Rahbar S. Novel dichlorophenyl urea compounds inhibit proliferation of human leukemia HL-60 cells by inducing cell cycle arrest, differentiation and apoptosis. Invest New Drugs. 2012 Aug;30(4):1413-25. doi: 10.1007/s10637-011-9711-8. Epub 2011 Jul 5. PMID: 21728022.](https://pubmed.ncbi.nlm.nih.gov/21728022/)
3. [Caldwell RW, Rodriguez PC, Toque HA, Narayanan SP, Caldwell RB. Arginase: A Multifaceted Enzyme Important in Health and Disease. Physiol Rev. 2018 Apr 1;98(2):641-665. doi: 10.1152/physrev.00037.2016. PMID: 29412048; PMCID: PMC5966718.](https://pmc.ncbi.nlm.nih.gov/articles/PMC5966718/)

### Setup, Data, and Prerequisites
We set up the computational environment by including the `Include.jl` file, loading any needed resources, such as sample datasets, and setting up any required constants. 
* The `Include.jl` file also loads external packages, various functions that we will use in the exercise, and custom types to model the components of our problem. It checks for a `Manifest.toml` file; if it finds one, packages are loaded. Other packages are downloaded and then loaded.

In [45]:
include("Include.jl");

__Build the model__. To store all the problem data, we created [the `MyPrimalFluxBalanceAnalysisCalculationModel` type](src/Types.jl). Let's build one of these objects for our problem and store it in the `model::MyPrimalFluxBalanceAnalysisCalculationModel` variable. We also return the `rd::Dict{String, String}` dictionary, which maps the reaction name field (key) to the reaction string (value).
* __Builder (or factory) pattern__: For all custom types that we make, we'll use something like [the builder software pattern](https://en.wikipedia.org/wiki/Builder_pattern) to construct and initialize these objects. The calling syntax will be the same for all types: [a `build(...)` method](src/Factory.jl) will take the kind of thing we want to build in the first argument, and the data needed to make that type as [a `NamedTuple` instance](https://docs.julialang.org/en/v1/base/base/#Core.NamedTuple) in the second argument.
* __What's the story with the `let` block__? A [let block](https://docs.julialang.org/en/v1/manual/variables-and-scoping/#Let-Blocks) creates a new hard scope and new variable bindings each time they run. Thus, they act like a private scratch space, where data comes in (is captured by the block), but only what we want to be exposed comes out. 

In [47]:
model, rd = let

    # first, load the reaction file - and process it
    listofreactions = read_reaction_file(joinpath(_PATH_TO_DATA, "Network.net")); # load the reactions from the VFF reaction file
    S, species, reactions, rd = build_stoichiometric_matrix(listofreactions); # Builds the stochiometric matrix, species list, and the reactions list
    boundsarray = build_default_bounds_array(listofreactions); # Builds a default bounds model using the flat file flags

    # build the FBA model -
    model = build(MyPrimalFluxBalanceAnalysisCalculationModel, (
        S = S, # stoichiometric matrix
        fluxbounds = boundsarray, # these are the *default* bounds, we'll need to update with new info if we have it
        species = species, # list of species. The rows of S are in this order
        reactions = reactions, # list of reactions. The cols of S are in this order
        objective = length(reactions) |> R -> zeros(R), # this is empty, we'll need to set this
    ));

    # return -
    model, rd
end;

In [48]:
model.fluxbounds

19×2 Matrix{Float64}:
     0.0  1000.0
     0.0  1000.0
     0.0  1000.0
     0.0  1000.0
 -1000.0  1000.0
 -1000.0  1000.0
 -1000.0  1000.0
 -1000.0  1000.0
 -1000.0  1000.0
 -1000.0  1000.0
 -1000.0  1000.0
 -1000.0  1000.0
 -1000.0  1000.0
 -1000.0  1000.0
 -1000.0  1000.0
 -1000.0  1000.0
 -1000.0  1000.0
 -1000.0  1000.0
 -1000.0  1000.0

`Unhide` the code block below to see how we build a table of the reactions in the model [using the `pretty_tables(...)` method exported from the `PrettyTables.jl` package](https://github.com/ronisbr/PrettyTables.jl).

In [50]:
let
    df = DataFrame()
    reactions = model.reactions;

    for i ∈ eachindex(reactions)
        reactionstring = reactions[i] |> key -> rd[key];
        row_df = (
            name = reactions[i],
            string = reactionstring,
        );
        push!(df, row_df);
    end

    pretty_table(df, tf = tf_simple, alignment = :l)
end

 [1m name   [0m [1m string                                                                                                               [0m
 [90m String [0m [90m String                                                                                                               [0m
  v1       v1,M_ATP_c+M_L-Citrulline_c+M_L-Aspartate_c,M_AMP_c+M_Diphosphate_c+M_N-(L-Arginino)succinate_c,false
  v2       v2,M_N-(L-Arginino)succinate_c,M_Fumarate_c+M_L-Arginine_c,false
  v3       v3,M_L-Arginine_c+M_H2O_c,M_L-Ornithine_c+M_Urea_c,false
  v4       v4,M_Carbamoyl_phosphate_c+M_L-Ornithine_c,M_Orthophosphate_c+M_L-Citrulline_c,false
  v5       v5,2*M_L-Arginine_c+4*M_Oxygen_c+3*M_NADPH_c+3*M_H_c,2*M_Nitric_oxide_c+2*M_L-Citrulline_c+3*M_NADP_c+4*M_H2O_c,true
  b1       b1,[],M_Carbamoyl_phosphate_c,true
  b2       b2,[],M_L-Aspartate_c,true
  b3       b3,[],M_Fumarate_c,true
  b4       b4,[],M_Urea_c,true
  b5       b5,[],M_ATP_c,true
  b6       b6,[],M_AMP_c,true
  b7       b7,[],

Update the objective function. 

In [83]:
i = findfirst(x-> x=="b4", model.reactions);
objective = model.objective;
objective[i] = -1; # why negative 1?

FB = model.fluxbounds;
FB[i,1] = -1.0;

__Compute the optimal flux distribution__: Finally, let's compute the optimal metabolic distribution $\left\{\hat{v}_{i} \mid i = 1,2,\dots,\mathcal{R}\right\}$ by solving the [linear programming problem](). We solve the optimization problem by passing the `model::MyPrimalFluxBalanceAnalysisCalculationModel` to [the `solve(...)` method](src/Compute.jl). This method returns a `solution::Dict{String, Any}` dictionary, which holds information about the solution.
* __Why the [try-catch environment](https://docs.julialang.org/en/v1/base/base/#try)__? The [solve(...) method](src/Compute.jl) has an [@assert statement](https://docs.julialang.org/en/v1/base/base/#Base.@assert) to check if the calculation has converged. Thus, the solve method can [throw](https://docs.julialang.org/en/v1/base/base/#Core.throw) an [AssertionError](https://docs.julialang.org/en/v1/base/base/#Core.AssertionError) if the optimization problem fails to converge. To gracefully handle this case, we use a [try-catch construct](https://docs.julialang.org/en/v1/base/base/#try). See the [is_solved_and_feasible method from the JuMP package](https://jump.dev/JuMP.jl/stable/api/JuMP/#JuMP.is_solved_and_feasible) for more information.

In [54]:
solution = let
    
    solution = nothing; # initialize nothing for the solution
    try
        solution = solve(model); # call the solve method with our problem model -
    catch error
        println("error: $(error)"); # Oooooops! Looks like we have a *major malfunction*, problem didn't solve
    end

    # return solution
    solution
end;

__Flux table__: Let's use [the `pretty_tables(...)` method exported by the `PrettyTables.jl` package](https://github.com/ronisbr/PrettyTables.jl) to display the estimated optimal metabolic fluxes. `Unhide` the code block below to see how we constructed the flux table.

In [56]:
let

    # setup -
    S = model.S;
    flux_bounds_array = model.fluxbounds;
    number_of_reactions = size(S,2); # columns
	flux_table = Array{Any,2}(undef,number_of_reactions,5)
    flux = solution["argmax"];
    
    # populate the state table -
	for reaction_index = 1:number_of_reactions
		flux_table[reaction_index,1] = model.reactions[reaction_index]
		flux_table[reaction_index,2] = flux[reaction_index]
		flux_table[reaction_index,3] = flux_bounds_array[reaction_index,1]
		flux_table[reaction_index,4] = flux_bounds_array[reaction_index,2]
        flux_table[reaction_index,5] = model.reactions[reaction_index] |> key-> rd[key]
	end

    # header row -
	flux_table_header_row = (["Reaction","v̂ᵢ", "v̂ᵢ LB", "v̂ᵢ UB", "Reaction"],["","mmol/gDW-time", "mmol/gDW-time", "mmol/gDW-time", "N/A"]);
		
	# write the table -
	pretty_table(flux_table; header=flux_table_header_row, tf=tf_simple, alignment = :l)
end

 [1m Reaction [0m [1m v̂ᵢ            [0m [1m v̂ᵢ LB         [0m [1m v̂ᵢ UB         [0m [1m Reaction                                                                                                             [0m
 [90m          [0m [90m mmol/gDW-time [0m [90m mmol/gDW-time [0m [90m mmol/gDW-time [0m [90m N/A                                                                                                                  [0m
  v1         1000.0          0.0             1000.0          v1,M_ATP_c+M_L-Citrulline_c+M_L-Aspartate_c,M_AMP_c+M_Diphosphate_c+M_N-(L-Arginino)succinate_c,false
  v2         1000.0          0.0             1000.0          v2,M_N-(L-Arginino)succinate_c,M_Fumarate_c+M_L-Arginine_c,false
  v3         1000.0          0.0             1000.0          v3,M_L-Arginine_c+M_H2O_c,M_L-Ornithine_c+M_Urea_c,false
  v4         1000.0          0.0             1000.0          v4,M_Carbamoyl_phosphate_c+M_L-Ornithine_c,M_Orthophosphate_c+M_L-Citrulline_c,false
 

## A model for flux bounds
The flux bounds are important constraints in flux balance analysis calculations and the convex decomposition of the stoichiometric array. Beyond their role in the flux estimation problem, the flux bounds are _integrative_, i.e., these constraints integrate many types of genetic and biochemical information into the problem. A general model for these bounds is given by:
$$
\begin{align*}
-\delta_{j}\underbrace{\left[{V_{max,j}^{\circ}}\left(\frac{e}{e^{\circ}}\right)\theta_{j}\left(\dots\right){f_{j}\left(\dots\right)}\right]}_{\text{reverse: other functions or parameters?}}\leq\hat{v}_{j}\leq{V_{max,j}^{\circ}}\left(\frac{e}{e^{\circ}}\right)\theta_{j}\left(\dots\right){f_{j}\left(\dots\right)}
\end{align*}
$$
where $V_{max,j}^{\circ}$ denotes the maximum reaction velocity (units: `flux`) computed at some _characteristic enzyme abundance_. Thus, the maximum reaction velocity is given by:
$$
V_{max,j}^{\circ} \equiv k_{cat,j}^{\circ}e^{\circ}
$$
where $k_{cat,j}$ is the catalytic constant or turnover number for the enzyme (units: `1/time`) and $e^{\circ}$ is a characteristic enzyme abundance (units: `concentration`). The term $\left(e/e^{\circ}\right)$ is a correction to account for the _actual_ enzyme abundance catalyzing the reaction (units: `dimensionless`). The $\theta_{j}\left(\dots\right)\in\left[0,1\right]$ is the current fraction of maximial enzyme activity of enzyme $e$ in reaction $j$. The activity model $\theta_{j}\left(\dots\right)$ describes [allosteric effects](https://en.wikipedia.org/wiki/Allosteric_regulation) on the reaction rate, and is a function of the regulatory and the chemical state of the system, the concentration of substrates, products, and cofactors (units: `dimensionless`).
Finally, the $f_{j}\left(\dots\right)$ is a function describing the substrate (reactants) dependence of the reaction rate $j$ (units: `dimensionless`). 

* __Parameters__: We need estimates for the $k_{cat,j}^{\circ}$ for all enzymes in the system we are interested in and a _reasonable policy_ for specifying a characteristic value for $e^{\circ}$. In addition, the $\theta_{j}\left(\dots\right)$ and $f_{j}\left(\dots\right)$ models can also have associated parameters, e.g., saturation or binding constants, etc. Thus, we need to estimate these from literature studies or experimental data.
* __Reversibility__: Next, we need to estimate the binary direction parameter $\delta_{j}\in\left\{0,1\right\}$. The value of $\delta_{j}$ describes the reversibility of reaction $j$; if reaction $j$ is __reversible__ $\delta_{j}=1$. If reaction $j$ is __irreversible__ $\delta_{j}=0$

Today, let's focus on the $(e/e^{\circ})$ term. The $e$ term is the actual enzyme abundance in the system, and $e^{\circ}$ is a characteristic enzyme abundance. The ratio $(e/e^{\circ})$ is a correction to account for the _actual_ enzyme abundance catalyzing the reaction. How do we estimate the $e$ term?

## Models for gene expression logic
One of the shortcomings that we discussed about flux balance analysis was:
* __No regulation__. FBA may conflict with experimental data, especially when regulatory loops are excluded. These discrepancies reveal the limitations of relying only on stoichiometric information without considering complex cellular regulation. This can be fixed with [regulatory flux balance analysis](https://pubmed.ncbi.nlm.nih.gov/11708855/). Gene expression is _easy(ish)_, but allosteric regulation (activity) is hard.

Let's investigate how we could describe gene regulation in flux balance analysis. Suppose the flux problem we were interested in was composed of enzymes encoded by the genes $\mathcal{G}=1,2,\dots, N$.
The _action_ of each gene is described by two differential equations, one for mRNA concentration ($m_{j}$, units: `nmol/gDW`) and a second for the corresponding protein concentration ($p_{j}$, units: `nmol/gDW`):
$$
\begin{align*}
	\dot{m}_{j} &= r_{X,j}u_{j}\left(\dots\right) - \left(\theta_{m,j}+\mu\right)\cdot{m_{j}}+\lambda_{j}\quad{j=1,2,\dots,N}\\
	\dot{p}_{j} &= r_{L,j}w_{j}\left(\dots\right) - \left(\theta_{p,j}+\mu\right)\cdot{p_{j}}
\end{align*}
$$
Terms in the balances:
* _Transcription_: The term $r_{X,j}u_{j}\left(\dots\right)$ in the mRNA balance, which denotes the _regulated rate of transcription_ for gene $j$. This is 
the product of a _kinetic limit_ $r_{X,j}$ (units: `nmol/gDW-h`) and a transcription control function $0\leq{u_{j}\left(\dots\right)}\leq{1}$ (dimensionless).
The final term $\lambda_{j}$ is the _unregulated expression rate_ of mRNA $j$ (units: `nmol/gDW-time`), i.e., this is the _leak_ expression rate.
* _Translation_: The _regulated rate of translation_ of mRNA $j$, denoted by $r_{L,j}w_{j}$, is also the product of the
kinetic limit of translation (units: `nmol/gDW-time`) and a translational control term $0\leq{w_{j}\left(\dots\right)}\leq{1}$ (dimensionless).
* _Degradation_: Lastly, $\theta_{\star,j}$ denotes the first-order rate constant (units: `1/time`) governing degradation of protein and mRNA, and $\mu$ is the specific growth rate of the cell (units: `1/time`). We get the latter term using cell-specific concentration units (e.g., `nmol/gDW`).

### Steady-state assumption
We have publically said (without proof _yet_) that gene expression _is slow_ and metabolism _is fast_. This means that the mRNA and protein concentrations are at an approximate steady state, i.e., $\dot{m}_{j}=\dot{p}_{j}=0$ from the perspective of the metabolic network. This allows us to solve the gene expression equations for the steady-state mRNA and protein concentrations. Let's show the steps to compute the steady-state mRNA concentration $m^{\star}_{j}$:
$$
\begin{align*}
r_{X,j}u_{j}\left(\dots\right) - \left(\theta_{m,j}+\mu\right)\cdot{m_{j}}+\lambda_{j} & = \dot{m}_{j}\\
r_{X,j}u_{j}\left(\dots\right) - \left(\theta_{m,j}+\mu\right)\cdot{m^{\star}_{j}}+\lambda_{j} &= 0 \\
r_{X,j}u_{j}\left(\dots\right) + \lambda_{j} & = \left(\theta_{m,j}+\mu\right)\cdot{m^{\star}_{j}}\\
\frac{r_{X,j}u_{j}\left(\dots\right) + \lambda_{j}}{\theta_{m,j}+\mu} &= m^{\star}_{j}\quad\text{for }j=1,2,\dots,N\quad\blacksquare
\end{align*}
$$
Following the same steps, we can compute the steady-state protein concentration $p^{\star}_{j}$:
$$
\begin{equation*}
p^{\star}_{j} = \frac{r_{L,j}w_{j}\left(\dots\right)}{\theta_{p,j}+\mu}\quad\text{for }j=1,2,\dots,N\quad\blacksquare
\end{equation*}
$$

Some things to think about:
* _Nonlinearity_: The expressions for $m^{\star}_{j}$ and $p^{\star}_{j}$ are tricker than they may seem at first blush. The steady-state mRNA and protein concentrations are a function of the kinetic limits of transcription and translation, the control functions, the degradation rates, and the specific growth rate of the cell. At the mRNA level, the $u(...)$ model could be a function of metabolite and protein concentrations. While the kinetic limit of translation $r_{L,j}$ and the $w_{j}(...)$ terms will be functions of the mRNA concentrations and other factors, such as ribosome availability, etc.
* _Models?_ We need to formulate the control functions $u_{j}(...)$ and $w_{j}(...)$, and the degradation rates $\theta_{m,j}$ and $\theta_{p,j}$, and the kinetic limits $r_{X,j}$ and $r_{L,j}$ for each gene in the system. These will be functions of parameters that must be estimated from literature studies or experimental data.
* _Complication_. Finally, the enzymes catalyzing the reactions in the metabolic network are often complexes of different protein subunits, where a different gene encodes each subunit. We need to formulate the gene-protein-reaction (GPR) rules that link the genes to the proteins and the proteins to the reactions in the metabolic network. This will allow us to integrate the gene expression logic into the flux balance analysis problem.

### What are the kinetic limit expressions?
The kinetic limit expressions $r_{X,j}$ and $r_{L,j}$ are the maximum rates of transcription and translation, respectively. 
* __Strategy__: The key idea behind deriving transcription and translation kinetic limit expressions is that the polymerase (or ribosome) acts as a pseudo-enzyme; it binds a gene (or message), reads the gene (or message), and then dissociates. Thus, one propose a set of elementary reactions for transcription and translation, one of which we assumed was rate limiting, and then invoke the pseudo state assumption for each intermediate complex to develop the overall rate expression.

#### Transcription kinetic limit
The transcription kinetic limit $r_{X,j}$ is given by:
$$
\begin{equation*}
  r_{X,j} = V^{max}_{X,j}\left(\frac{\mathcal{G}_{j}}{\tau_{X,j}K_{X,j}+\left(1+\tau_{X,j}\right)\mathcal{G}_{j}+
  \mathcal{O}_{X,j}}\right)
\end{equation*}
$$
where $V^{max}_{X,j}$ denotes the maximum transcription rate (units: `nmol/gDW-time`) of gene $j$, $\mathcal{G}_{j}$ denotes the concentration of gene $j$ (units: `nmol/gDW`), $K_{X,j}$ denotes the saturation constant for transcription of gene $j$ (units: `nmol/gDW`), $\tau_{X,j}$ denotes the time constant for transcription (dimensionless) and:
$$
\begin{equation*}
  \mathcal{O}_{X,j} = \sum_{i=1,j}^{\mathcal{N}}\frac{K_{X,j}\tau_{X,j}}{K_{X,i}\tau_{X,i}}\left(1+\tau_{X,i}\right)\mathcal{G}_{i}
\end{equation*}
$$
denotes the coupling of the transcription of gene $j$ with the other genes in the system through
competition for RNA polymerase. The maximum transcription rate $V_{X,j}^{max}$ was formulated as:
$$
\begin{equation*}
	V_{X,j}^{max} \equiv \left[R_{X,T}\left(\frac{\dot{v}_{X}}{l_{G,j}}\right)\right]
\end{equation*}
$$
where $R_{X,T}$ denotes the total RNA polymerase concentration (units: `nmol/gDW`), $\dot{v}_{X}$ denotes the RNA polymerase elongation rate (units: `nt/time`) and $l_{G,j}$ denotes the length of gene $j$ in nucleotides (nt).

#### Translation kinetic limit
Similarly, we developed an expression for the translational kinetic limit:
$$
\begin{equation}
  r_{L,j} = V^{max}_{L,j}\left(\frac{m_{j}}{\tau_{L,j}K_{L,j}+\left(1+\tau_{L,j}\right)m_{j}+
  \mathcal{O}_{L,j}}\right)
\end{equation}
$$
where $V^{max}_{L,j}$ denotes the maximum translation rate (units: `nmol/gdw-time`), $K_{L,j}$ denotes the saturation constant for translation of mRNA message $j$ (units: `nmol/gDW`), $\tau_{L,j}$ denotes the time constant for translation of message $j$ (dimensionless) and:
$$
\begin{equation}
  \mathcal{O}_{L,j} = \sum_{i=1,j}^{\mathcal{N}}\frac{K_{L,j}\tau_{L,j}}{K_{L,i}\tau_{L,i}}\left(1+\tau_{L,i}\right)m_{i}
\end{equation}
$$
describes the coupling of the translation of mRNA $j$ with other messages in the system because of kinetic competition for available ribosomes. The maximum translation rate $V_{L,j}^{max}$ was formulated as:
$$
\begin{equation}
	V_{L,j}^{max} \equiv \left[K_{P} R_{L,T}\left(\frac{\dot{v}_{L}}{l_{P,j}}\right)\right]
\end{equation}
$$
where $R_{L,T}$ denotes the total ribosome pool, $K_{P}$ denotes the polysome amplification constant,
$\dot{v}_{L}$ denotes the ribosome elongation rate (units: `aa/time`), and $l_{P,j}$ denotes the length of protein $j$ (units: `aa`).

#### References
* [Ron Milo, Paul Jorgensen, Uri Moran, Griffin Weber, Michael Springer, BioNumbers—the database of key numbers in molecular and cell biology, Nucleic Acids Research, Volume 38, Issue suppl_1, 1 January 2010, Pages D750–D753, https://doi.org/10.1093/nar/gkp889](https://academic.oup.com/nar/article/38/suppl_1/D750/3112244)
* [Adhikari A, Vilkhovoy M, Vadhin S, Lim HE, Varner JD. Effective Biophysical Modeling of Cell-Free Transcription and Translation Processes. Front Bioeng Biotechnol. 2020 Nov 26;8:539081. doi: 10.3389/fbioe.2020.539081. PMID: 33324619; PMCID: PMC7726328.](https://pubmed.ncbi.nlm.nih.gov/33324619/)

## The basics of Boolean algebra
__Idea__: Suppose we let the $u(...)$ control functions be binary, i.e., $u_{j}\in\left\{0,1\right\}$, and the $w(...)$ control functions be binary, i.e., $w_{j}\in\left\{0,1\right\}$.

### Basic Boolean operations
In Boolean algebra, a variable $v$ can take on one of two possible binary values, `v = 0` or `v = 1`. Boolean variables describe the state of a process or object, where `v = 0` indicates that a process or object is in the `OFF` state, while `v = 1` indicates the process or object is in the `ON` state. 

The basic operations of Boolean algebra are Conjunction, Disjunction, and Negation. These Boolean operations are associated with the corresponding binary operators `AND`, and `OR` and the unary operator `NOT`, collectively referred to as the `basic` Boolean operators:

* __Conjunction__: Conjunction between two boolean variables $x$ and $y$, given the symbol $x\land y$, denotes the `AND` operation; The `AND` operation can be evaluated using the `min` operator or multiplication. 
* __Disjunction__: Disjunction between two boolean variables $x$ and $y$, given the symbol $x\lor y$, is the `OR` operation; The `OR` can be evaluated using the `max` operator or addition.
* __Negation__: Negation is a unary operator. Negation of variable $x$, given the symbol $\lnot x$, denotes the `NOT` operation; The `NOT` can be implemented by $1-x$.

The basic boolean operators can be combined into more complex `secondary` operators, and they follow the same laws as ordinary algebra. One common secondary operator is `exclusive OR` or `XOR` given the symbol $x\oplus y$, which is defined as:

$$x\oplus y = \left(x\lor y\right) \land \lnot \left(x \land y\right)$$

Boolean expressions can be expressed by tabulating their values in a truth table; the rows denote the possible $2^n$ variable permutations, where $n$ represents the number of boolean variables, while the columns contain the values of the boolean expression.

In [61]:
enumerate_binary_variable_cases(2)

4×2 Matrix{Int64}:
 0  0
 0  1
 1  0
 1  1

`Unhide` the code block below to see how we constructed the truth table for `n = 2` binary variables for `AND`, `OR` and `XOR` rules.

In [63]:
let

    # generate a truth table for basic boolean operations -
	number_of_variables = 2
	number_of_rows = 2^number_of_variables
	state_array = Array{Any,2}(undef, number_of_rows, 5)
	
	# generate the rows of the truth table -
	input_array = enumerate_binary_variable_cases(number_of_variables);
	for row_index ∈ 1:number_of_rows

		# get input values -
		x = input_array[row_index,1]
		y = input_array[row_index,2]

		# compute XOR -
		C1 = max(x,y)
		C2 = 1 - min(x,y)
		
		state_array[row_index,1] = x
		state_array[row_index,2] = y
		state_array[row_index,3] = min(x,y)
		state_array[row_index,4] = max(x,y)
		state_array[row_index,5] = min(C1,C2)
	end

	header_row = (["x", "y", "x AND y", "x OR y", "x XOR y"])
    pretty_table(state_array; header = header_row, tf=tf_simple)
end

 [1m x [0m [1m y [0m [1m x AND y [0m [1m x OR y [0m [1m x XOR y [0m
  0   0         0        0         0
  0   1         0        1         1
  1   0         0        1         1
  1   1         1        1         0


## Using Boolean algebra in flux balance analysis
__Idea__: Suppose we let the $u(...)$ control functions be binary, i.e., $u_{j}\in\left\{0,1\right\}$, and the $w(...)$ control functions be binary, i.e., $w_{j}\in\left\{0,1\right\}$. 

If we had boolean descriptions for the control functions, we could estimate the steady-state mRNA and protein concentrations. Then the steady-state mRNA and protein concentration expressions in the `ON` case ($u_{j} = 1$ and $w_{j} = 1$) are given by:
$$
\begin{align*}
m^{\star}_{j} &= \frac{r_{X,j} + \lambda_{j}}{\theta_{m,j}+\mu}\qquad\,p^{\star}_{j} = \frac{r_{L,j}}{\theta_{p,j}+\mu}\\
\end{align*}
$$
and in the `OFF` case with $\lambda_{j}>0$:
$$
\begin{align*}
m^{\star}_{j} &= \frac{\lambda_{j}}{\theta_{m,j}+\mu}\qquad\,p^{\star}_{j} = \frac{r_{L,j}}{\theta_{p,j}+\mu}\\
\end{align*}
$$
or in the `OFF` case if $\lambda_{j}=0$: 
$$
\begin{align*}
m^{\star}_{j} &= 0\qquad\,p^{\star}_{j} = 0\\
\end{align*}
$$

### Wrinkle: Going from protein to enzyme abundance
Once we have $p^{\star}_{j}$, we can compute the enzyme abundance $e_{j}$ in the system. However, there are three _base cases_ to consider:
* __One to one__: If enzyme $e$ corresponds directly to protein $p$, then we can use the $p^{\star}$ expression directly.
* __Multisubunit__: If $e_{j}$ is a complex of different protein subunits, we can use the gene-protein-reaction (GPR) rules to compute the enzyme abundance $e_{j}$ as an `AND` combination. The `AND` rule requires all subunits to be expressed.
* __Isoforms__: Alternatively, if $e_{j}$ is a single protein but there are multiple isoforms, we can use the GPR rules to compute the enzyme abundance $e_{j}$ as an `OR` combination.

Incorporating Boolean regulatory models, which are typically parameter-free, into flux balance analysis calculations (and ultimately metabolic design calculations) improves the ability of this type of mathematical model to simulate (predict) metabolic function.  

__Boolean logic in FBA example papers__:
* [Covert MW, Palsson BO. Constraints-based models: regulation of gene expression reduces the steady-state solution space. J Theor Biol. 2003 Apr 7;221(3):309-25. doi: 10.1006/jtbi.2003.3071. PMID: 12642111.](https://pubmed.ncbi.nlm.nih.gov/12642111/)
* [Orth JD, Fleming RM, Palsson BØ. Reconstruction and Use of Microbial Metabolic Networks: the Core Escherichia coli Metabolic Model as an Educational Guide. EcoSal Plus. 2010 Sep;4(1). doi: 10.1128/ecosalplus.10.2.1. PMID: 26443778.](https://pubmed.ncbi.nlm.nih.gov/26443778/)
* [Covert MW, Knight EM, Reed JL, Herrgard MJ, Palsson BO. Integrating high-throughput and computational data elucidates bacterial networks. Nature. 2004 May 6;429(6987):92-6. doi: 10.1038/nature02456. PMID: 15129285.](https://pubmed.ncbi.nlm.nih.gov/15129285/)

Look at [the core _E.coli_ model Orth et al EcoSal Plus 2010 paper](https://pubmed.ncbi.nlm.nih.gov/26443778/):

In [65]:
model_core = let
    modelid = "e_coli_core"; # model id to download
    path_to_saved_model_file = joinpath(_PATH_TO_DATA, "saved-model-$(modelid).jld2"); # we've already downloaded this model from the BiGG database
    model = load(path_to_saved_model_file)["model"];
end

Dict{String, Any} with 6 entries:
  "metabolites"  => Any[Dict{String, Any}("compartment"=>"e", "name"=>"D-Glucos…
  "id"           => "e_coli_core"
  "compartments" => Dict{String, Any}("c"=>"cytosol", "e"=>"extracellular space…
  "reactions"    => Any[Dict{String, Any}("name"=>"Phosphofructokinase", "metab…
  "version"      => "1"
  "genes"        => Any[Dict{String, Any}("name"=>"adhE", "id"=>"b1241", "notes…

Let's look up a reaction rule:

In [67]:
rule = let

    testid = "SUCDi";
    list_of_genes = model_core["reactions"];
    ī = nothing;
    for i ∈ eachindex(list_of_genes)
        gene_id = model_core["reactions"][i]["id"];
        if gene_id == testid
            ī = i;
            break;
        end
    end

    # get the rule
    rule = model_core["reactions"][i]["gene_reaction_rule"];

    # return -
    rule
end

"b2987 or b3493"

How many possible cases for this rule: $2^{n}$ where $n$ is the number of genes in the rule:

In [69]:
enumerate_binary_variable_cases(4)

16×4 Matrix{Int64}:
 0  0  0  0
 0  0  0  1
 0  0  1  0
 0  0  1  1
 0  1  0  0
 0  1  0  1
 0  1  1  0
 0  1  1  1
 1  0  0  0
 1  0  0  1
 1  0  1  0
 1  0  1  1
 1  1  0  0
 1  1  0  1
 1  1  1  0
 1  1  1  1

# Today?
That's a wrap! Let's review - what are some things we discussed today?