# L4c: Metabolic Networks and the Stoichiometric Matrix
---
In this lecture, we will begin our discussion of metabolism and metabolic networks, and in particular, we'll introduce the stoichiometric matrix and analyze its properties. The key ideas that we will discuss in this lecture include:
* __Metabolism and metabolic networks__: A metabolic network is the complete set of metabolic (chemical) processes determining a cell's biochemical state. It encompasses all the chemical reactions associated with metabolism, i.e., the breakdown of raw materials such as sugars (catabolism) and the production of macromolecules, e.g., proteins, lipids, etc. (anabolism).
* __A stoichiometric matrix__ is a mathematical representation of a metabolic network that encodes the relationships between reactants and products: rows correspond to metabolites, contained in the set $\mathcal{M}$, and columns correspond to reactions, contained in the set $\mathcal{R}$. Thus, the stoichiometric matrix $\mathbf{S}\in\mathbb{R}^{|\mathcal{M}|\times|\mathcal{R}|}$ holds the stoichiometric coefficients $\sigma_{ij}\in\mathbf{S}$ for $i=1,2,\dots,|\mathcal{M}|$ and $j=1,2,\dots,|\mathcal{R}|$.
* __Structural analysis of $\mathbf{S}$__: Structural analysis of the stoichiometric matrix involves examining its connectivity distribution and using tools such as eigendecomposition to explore the network's fundamental pathway structures and other topological properties. These types of analyses give us more insight into the structure of the network (and perhaps some indication of the importance of particular metabolites or reactions).

Check out the lecture notes: [here!](https://github.com/varnerlab/CHEME-5450-Lectures-Spring-2025/blob/main/lectures/week-4/L4c/docs/Notes.pdf)

---

## Metabolic networks and the stoichiometric matrix
A metabolic network encompasses all the chemical reactions associated with metabolism, i.e., the breakdown of raw materials such as sugars ([catabolism](https://en.wikipedia.org/wiki/Catabolism)) and the production of macromolecules, e.g., DNA, RNA, proteins, lipids, etc. ([anabolism](https://en.wikipedia.org/wiki/Anabolism)). These networks are curated for thousands of organisms and are available in various online databases.

Let's check out a few of these online metabolic databases:
* [Minoru Kanehisa, Miho Furumichi, Yoko Sato, Yuriko Matsuura, Mari Ishiguro-Watanabe, KEGG: biological systems database as a model of the real world, Nucleic Acids Research, Volume 53, Issue D1, 6 January 2025, Pages D672–D677, https://doi.org/10.1093/nar/gkae909](https://academic.oup.com/nar/article/53/D1/D672/7824602)
* [Karp PD, Billington R, Caspi R, Fulcher CA, Latendresse M, Kothari A, Keseler IM, Krummenacker M, Midford PE, Ong Q, Ong WK, Paley SM, Subhraveti P. The BioCyc collection of microbial genomes and metabolic pathways. Brief Bioinform. 2019 Jul 19;20(4):1085-1093. doi: 10.1093/bib/bbx085. PMID: 29447345; PMCID: PMC6781571.](https://pubmed.ncbi.nlm.nih.gov/29447345/)
* [Charles J Norsigian, Neha Pusarla, John Luke McConn, James T Yurkovich, Andreas Dräger, Bernhard O Palsson, Zachary King, BiGG Models 2020: multi-strain genome-scale models and expansion across the phylogenetic tree, Nucleic Acids Research, Volume 48, Issue D1, 08 January 2020, Pages D402–D406, https://doi.org/10.1093/nar/gkz1054](https://academic.oup.com/nar/article/48/D1/D402/5614178)

There are many other databases with information about enzymes and other biological numbers that we may be interested in:
* [Antje Chang, Lisa Jeske, Sandra Ulbrich, Julia Hofmann, Julia Koblitz, Ida Schomburg, Meina Neumann-Schaal, Dieter Jahn, Dietmar Schomburg, BRENDA, the ELIXIR core data resource in 2021: new developments and updates, Nucleic Acids Research, Volume 49, Issue D1, 8 January 2021, Pages D498–D508, https://doi.org/10.1093/nar/gkaa1025](https://academic.oup.com/nar/article/49/D1/D498/5992283)
* [Ron Milo, Paul Jorgensen, Uri Moran, Griffin Weber, Michael Springer, BioNumbers—the database of key numbers in molecular and cell biology, Nucleic Acids Research, Volume 38, Issue suppl_1, 1 January 2010, Pages D750–D753, https://doi.org/10.1093/nar/gkp889](https://academic.oup.com/nar/article/38/suppl_1/D750/3112244)

### Stoichiometric matrix
Suppose we have a set of biochemical reactions $\mathcal{R}$ involving the chemical species (metabolite) set $\mathcal{M}$. Then, the stoichiometric matrix $\mathbf{S}\in\mathbb{R}^{|\mathcal{M}|\times|\mathcal{R}|}$ has one row per chemical species and one column per reaction, where $|\mathcal{M}|$ denotes the number of chemical species and $|\mathcal{R}|$ denotes the number of reactions. The elements of the stoichiometric matrix $\sigma_{ij}\in\mathbf{S}$ are stoichiometric coefficients such that:
* $\sigma_{ij}>0$: Chemical species (metabolite) $i$ is _produced_ by reaction $j$. Species $i$ is a product of reaction $j$.
* $\sigma_{ij} = 0$: Chemical species (metabolite) $i$ does not participate in reaction $j$.
* $\sigma_{ij}<0$: Chemical species (metabolite) $i$ is _consumed_ by reaction $j$. Species $i$ is a reactant of reaction $j$.
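To make the sign convention concrete, here is a minimal Julia sketch on a small hypothetical network (the metabolites `A` to `D` and reactions `R1` to `R3` are made up for illustration and are not taken from BiGG). Each column of $\mathbf{S}$ is one reaction; reading off its negative and positive entries recovers that reaction's reactants and products.

```julia
# Hypothetical toy network (illustration only):
#   R1: A -> B
#   R2: B + C -> D
#   R3: D -> A + C
toy_metabolites = ["A", "B", "C", "D"]   # rows, the set M
toy_reactions   = ["R1", "R2", "R3"]     # columns, the set R

S_toy = [ -1.0   0.0   1.0;   # A
           1.0  -1.0   0.0;   # B
           0.0  -1.0   1.0;   # C
           0.0   1.0  -1.0 ]  # D

# σᵢⱼ < 0 marks a reactant of reaction j, σᵢⱼ > 0 marks a product
for (j, rxn) in enumerate(toy_reactions)
    reactants = toy_metabolites[S_toy[:, j] .< 0]
    products  = toy_metabolites[S_toy[:, j] .> 0]
    println(rxn, ": ", join(reactants, " + "), " -> ", join(products, " + "))
end
```

This prints `R1: A -> B`, `R2: B + C -> D`, and `R3: D -> A + C`, i.e., the columns of `S_toy` are enough to reconstruct the reactions.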

The stoichiometric matrix $\mathbf{S}$ is the digital representation of the biochemistry occurring inside some volume, i.e., inside the cell, in a test tube in the case of cell-free systems, or some abstract volume such as a compartment or pseudo compartment of interest.

Let's download a model from [the BiGG database](http://bigg.ucsd.edu/), construct its stoichiometric matrix, and check out its properties.

## Setup, Data, and Prerequisites
We set up the computational environment by including the `Include.jl` file, loading any needed resources, such as sample datasets, and setting up any required constants. The `Include.jl` file loads external packages, various functions that we will use in the exercise, and custom types to model the components of our problem.

In [4]:
include("Include.jl");

### Data
We developed a simple software development kit (SDK) against [the BiGG Models application programming interface at the University of California, San Diego](http://bigg.ucsd.edu/). The [BiGG Models database](http://bigg.ucsd.edu/) integrates published genome-scale metabolic networks into a single database with standardized nomenclature and structure. 
* [The BiGG models API](http://bigg.ucsd.edu/data_access) allows users to programmatically access genome-scale stoichiometric model reconstructions using a simple web API. There are `108` models of intracellular biochemistry occurring in various organisms (including humans) in the database (so far); [see here for a list of models](http://bigg.ucsd.edu/models).

We call the model download endpoint of [the BiGG models API](http://bigg.ucsd.edu/data_access) and then save the model file to disk (so we don't hit the API unless we have to). This call returns model information organized as [a Julia dictionary](https://docs.julialang.org/en/v1/base/collections/#Base.Dict) in the `model::Dict{String, Any}` variable. If a model file is saved, we use the cached file instead of making an API call.

In [6]:
model = let

    # build download endpoint -
    baseurl = "http://bigg.ucsd.edu"; # base url to download model
    modelid = "iYO844"; # model id to download
    path_to_saved_model_file = joinpath(_PATH_TO_DATA, "saved-model-$(modelid).jld2");

    # check: do we have a model file saved?
    model = nothing;
    if (isfile(path_to_saved_model_file) == false)
        
        endpoint = MyBiggModelsDownloadModelEndpointModel();
        endpoint.bigg_id = modelid;
        url = build(baseurl, endpoint)
        model = MyBiggModelsDownloadModelEndpointModel(url);

        # Before we move on, save this model for later (so we don't keep hitting the API)
        save(path_to_saved_model_file, Dict("model" => model));
    else
        model = load(path_to_saved_model_file)["model"];
    end
    model; # return the model (either saved, or downloaded)
end

Dict{String, Any} with 6 entries:
  "metabolites"  => Any[Dict{String, Any}("compartment"=>"c", "name"=>"L-2-Amin…
  "id"           => "iYO844"
  "compartments" => Dict{String, Any}("c"=>"cytosol", "e"=>"extracellular space…
  "reactions"    => Any[Dict{String, Any}("name"=>"Ethanolamine exchange", "met…
  "version"      => "1"
  "genes"        => Any[Dict{String, Any}("name"=>"atpF", "id"=>"BSU36850", "no…

__Gene records__: Each model has a set of genes associated with the model's chemical reactions. Each gene record has several subfields that give information about the gene; for example, the `annotation` subfield holds cross-references to external databases, such as the `refseq_name` field, which [is the name in the NCBI reference sequence database](https://www.ncbi.nlm.nih.gov/refseq/).

In [8]:
model["genes"][1]["annotation"]

Dict{String, Any} with 9 entries:
  "subtilist"         => Any["BG10817"]
  "sbo"               => "SBO:0000243"
  "EnsemblGenomes-Tr" => Any["CAB15702"]
  "interpro"          => Any["IPR002146", "IPR028987", "IPR005864"]
  "goa"               => Any["P37814"]
  "ncbigi"            => Any["2636210"]
  "refseq_locus_tag"  => Any["BSU36850"]
  "refseq_name"       => Any["atpF"]
  "EnsemblGenomes-Gn" => Any["BSU36850"]

__Metabolite records__: Each metabolite (chemical compound) in the network has an associated metabolite record with several fields. Let's take a look at the metabolite at index `1`. The `id` field holds an abbreviation or symbol associated with this metabolite.

In [10]:
model["metabolites"][1] # example metabolite record

Dict{String, Any} with 7 entries:
  "compartment" => "c"
  "name"        => "L-2-Amino-3-oxobutanoate"
  "formula"     => "C4H7NO3"
  "id"          => "2aobut_c"
  "charge"      => 0
  "notes"       => Dict{String, Any}("original_bigg_ids"=>Any["2aobut_c"])
  "annotation"  => Dict{String, Any}("sabiork"=>Any["6672"], "kegg.compound"=>A…

__Reaction records__: Similarly, each reaction in the network has a reaction record with several fields. Let's look at the reaction record at index `156`. The species involved in a reaction are contained in the `metabolites` field, which maps each species `id` to its stoichiometric coefficient in this particular reaction.

In [12]:
model["reactions"][156] # example reaction record

Dict{String, Any} with 9 entries:
  "name"               => "2  hydroxymethyl phenol transport inout via proton s…
  "metabolites"        => Dict{String, Any}("h_e"=>-1.0, "2hxmp_e"=>-1.0, "2hxm…
  "lower_bound"        => -999999.0
  "id"                 => "2HXMPt6"
  "notes"              => Dict{String, Any}("original_bigg_ids"=>Any["2HXMPt6"])
  "gene_reaction_rule" => ""
  "upper_bound"        => 999999.0
  "subsystem"          => "S_Transport"
  "annotation"         => Dict{String, Any}("metanetx.reaction"=>Any["MNXR94804…

#### Stoichiometric matrix
Next, let's build a stoichiometric matrix $\mathbf{S}$ using the metabolite and reaction records. We'll do this using nested [`for` loops](https://docs.julialang.org/en/v1/base/base/#for):
* _In the outer loop_: we iterate over the system's `metabolites` (chemical species) and select the `id` field from the `metabolites` record.
* _In the inner loop_: we iterate over each reaction. For each reaction record, we ask if this reaction has an entry for the current metabolite `id` value; if it does, we grab the stoichiometric coefficient $\sigma_{ij}\in\mathbf{S}$ corresponding to this metabolite and reaction.

In [14]:
S = let

    # get some data from the model -
    m = model["metabolites"]; # get list of metabolites
    r = model["reactions"]; # get list of reactions
    number_of_rows = length(m); # how many metabolites do we have? (rows)
    number_of_cols = length(r); # how many reactions do we have? (cols)
    S = zeros(number_of_rows,number_of_cols); # initialize an empty stoichiometric matrix

    # let's build a stm -
    for i ∈ eachindex(m)
        metabolite = m[i]["id"]; # we are checking if this metabolite is in the reaction record
        for j ∈ eachindex(r)
            reaction = r[j];
            if (haskey(reaction["metabolites"], metabolite) == true)
                S[i,j] = reaction["metabolites"][metabolite];
            end
        end
    end
    S; 
end;

In [15]:
S

990×1250 Matrix{Float64}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0   0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0   0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0   0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0  0.0  0.0  0.0  0.0  0.0
 ⋮          
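The nested loop above performs $|\mathcal{M}|\times|\mathcal{R}|$ dictionary lookups (about 1.2 million here), even though most entries of $\mathbf{S}$ are zero. A minimal alternative sketch, assuming the same record layout as the BiGG JSON shown earlier (a `metabolites` list of records with an `id` field, and a `reactions` list whose `metabolites` field maps ids to coefficients), visits only the nonzero entries and assembles a sparse matrix with `sparse(I, J, V, m, n)` from Julia's `SparseArrays` standard library. The `toy_model` dictionary and the `build_stoichiometric_matrix` function are hypothetical names used for illustration.

```julia
using SparseArrays

# hypothetical mini "model" with the same layout as the BiGG JSON used above
toy_model = Dict{String,Any}(
    "metabolites" => [Dict{String,Any}("id" => id) for id in ["A", "B", "C", "D"]],
    "reactions" => [
        Dict{String,Any}("id" => "R1", "metabolites" => Dict{String,Any}("A" => -1.0, "B" => 1.0)),
        Dict{String,Any}("id" => "R2", "metabolites" => Dict{String,Any}("B" => -1.0, "C" => -1.0, "D" => 1.0)),
        Dict{String,Any}("id" => "R3", "metabolites" => Dict{String,Any}("D" => -1.0, "A" => 1.0, "C" => 1.0)),
    ],
)

function build_stoichiometric_matrix(model::AbstractDict)
    mets = model["metabolites"]
    rxns = model["reactions"]
    row = Dict(m["id"] => i for (i, m) in enumerate(mets))  # metabolite id -> row index
    I, J, V = Int[], Int[], Float64[]                        # (row, column, value) triplets
    for (j, r) in enumerate(rxns)
        for (id, coeff) in r["metabolites"]   # only the nonzero entries of column j
            push!(I, row[id]); push!(J, j); push!(V, Float64(coeff))
        end
    end
    return sparse(I, J, V, length(mets), length(rxns))
end

S_toy = build_stoichiometric_matrix(toy_model)
Matrix(S_toy)   # dense copy for display
```

On the BiGG model, `Matrix(build_stoichiometric_matrix(model))` should reproduce `S`, provided every metabolite id referenced by a reaction also appears in the `metabolites` list (otherwise the `row[id]` lookup throws a `KeyError`).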

#### Binary stoichiometric matrix
Finally, let's compute the binary stoichiometric matrix $\bar{\mathbf{S}}$ by [calling the `binary(...)` method](src/Compute.jl). The binary stoichiometric matrix is constructed by replacing each non-zero entry of $\mathbf{S}$ with a `1`.

In [17]:
S̄ = binary(S) # convert all non-zero entries to 1

990×1250 Matrix{Int64}:
 0  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  
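The `binary(...)` implementation lives in `src/Compute.jl` and is not reproduced here. A one-line stand-in with the behavior described above could look like the sketch below; the name `binary_sketch` is hypothetical, and the toy matrix is the made-up network R1: A → B, R2: B + C → D, R3: D → A + C.

```julia
# Hypothetical stand-in for binary(...): 1 where an entry is nonzero, 0 otherwise
binary_sketch(A::AbstractMatrix{<:Real}) = Int.(A .!= 0)

S_toy = [ -1.0   0.0   1.0;
           1.0  -1.0   0.0;
           0.0  -1.0   1.0;
           0.0   1.0  -1.0 ]

S̄_toy = binary_sketch(S_toy)   # 4×3 Matrix{Int64}; the signs (directionality) are gone
```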

## Connectivity of a metabolic network
The connectivity of metabolites and reactions in a metabolic network provides insight into their importance. Altering the enzyme level of a highly connected reaction, or changing conditions that affect a highly connected metabolite, may yield a greater response than perturbing a weakly connected one.

* __Connectivity matrix__: We can explore the connectivity by constructing connectivity matrices from the binary stoichiometric matrix. In particular, we compute the metabolite connectivity matrix $\mathbf{C}_{m}$ and the reaction connectivity matrix $\mathbf{C}_{r}$ and look at some of their properties.

Let's start with the metabolite connectivity array $\mathbf{C}_{m}$.

### Metabolite connectivity
The metabolite connectivity matrix, defined as $\mathbf{C}_{m} \equiv \bar{\mathbf{S}}\bar{\mathbf{S}}^{\top}$, is an $|\mathcal{M}|\times|\mathcal{M}|$ symmetric array with the following features:
* __Diagonal elements__: The elements along the central diagonal $c_{ii}\in\mathbf{C}_{m}$ are the total number of reactions a particular metabolite participates in. However, because we removed the directionality when we computed the binary stoichiometric matrix, we have no information about whether the metabolite participates as a reactant or a product.
* __Off diagonal elements__: The off-diagonal elements $c_{ij}\in\mathbf{C}_{m}$ where $i\neq{j}$ describe how many reactions metabolite $i$ has in common with metabolite $j$, i.e., the number of joint reactions for the pair.
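Both properties can be checked by hand on the hypothetical toy network R1: A → B, R2: B + C → D, R3: D → A + C (made up for illustration): every metabolite takes part in two reactions, and C and D appear together in both R2 and R3.

```julia
using LinearAlgebra

# binary stoichiometric matrix of the hypothetical toy network
# rows: A, B, C, D; columns: R1, R2, R3
S̄_toy = [1 0 1;
         1 1 0;
         0 1 1;
         0 1 1]

Cₘ_toy = S̄_toy * transpose(S̄_toy)   # 4×4 metabolite connectivity matrix

diag(Cₘ_toy)    # reactions per metabolite: [2, 2, 2, 2]
Cₘ_toy[3, 4]    # reactions shared by C and D (R2 and R3): 2
Cₘ_toy[1, 2]    # reactions shared by A and B (R1 only): 1
```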

In [20]:
Cₘ = S̄*transpose(S̄) # metabolite connectivity matrix M x M matrix

990×990 Matrix{Int64}:
 2  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0   0  0  0  0  0  0  0
 0  2  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0   0  0  0  0  0  0  0
 0  0  2  0  0  0  0  0  0  0  0  0     0  0  0  0  0   0  0  0  0  0  0  0
 0  0  0  2  1  0  0  0  0  0  0  0     0  0  0  0  0   0  0  0  0  0  0  0
 0  0  0  1  5  1  0  0  0  0  0  0     0  0  0  0  0   0  0  0  0  0  0  0
 0  0  0  0  1  2  0  0  0  0  0  0  …  0  0  0  0  0   0  0  0  0  0  0  0
 0  0  0  0  0  0  2  0  0  0  0  0     0  0  0  0  0   0  0  0  0  0  0  0
 0  0  0  0  0  0  0  2  0  0  0  0     0  0  0  0  0   0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  6  1  0  0     0  0  0  0  0   0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  1  3  0  0     0  0  0  0  0   0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  2  0  …  0  0  0  0  0   0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  3     0  0  0  0  0   0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0   0  0  0  

In [21]:
argmax(diag(Cₘ)) |> i-> model["metabolites"][i] # maximum connectivity metabolite

Dict{String, Any} with 7 entries:
  "compartment" => "c"
  "name"        => "H+"
  "formula"     => "H"
  "id"          => "h_c"
  "charge"      => 1
  "notes"       => Dict{String, Any}("original_bigg_ids"=>Any["h_c"])
  "annotation"  => Dict{String, Any}("sabiork"=>Any["39"], "kegg.compound"=>Any…

Let's sort the diagonal elements $\text{diag}(\mathbf{C}_{m})$ from largest to smallest and then build [a table using the `pretty_table(...)` method exported by the `PrettyTables.jl` package](https://github.com/ronisbr/PrettyTables.jl). `Unhide` the code block below to see how we constructed the metabolite connectivity table.
* __Summary__: Several of the most highly connected metabolites (e.g., ATP, ADP, phosphate, and NAD) are associated with energy metabolism. Thus, if we manipulate [the energy metabolic pathways](https://www.genome.jp/kegg/pathway.html#energy) or otherwise perturb the energetics of the cell, we should expect a significant (for better or worse) response from the system. Interesting! But what about the reaction connectivity?

In [23]:
let
    df = DataFrame();
    d = diag(Cₘ);
    î = sortperm(d, rev=true);
    number_of_rows_in_table = 20;
    for i ∈ î[1:number_of_rows_in_table]
        m = model["metabolites"][i]
        row_df = (
            index = i,
            compartment = m["compartment"],
            name = m["name"],
            id = m["id"],
            connections = d[i]
        );
        push!(df, row_df) # capture the row
    end
    pretty_table(df, tf=tf_simple)
end

| index | compartment | name | id | connections |
|---:|:---:|:---|:---|---:|
| 596 | c | H+ | `h_c` | 616 |
| 588 | c | H2O H2O | `h2o_c` | 342 |
| 255 | c | ATP C10H12N5O13P3 | `atp_c` | 246 |
| 665 | c | Phosphate | `pi_c` | 210 |
| 231 | c | ADP C10H12N5O10P2 | `adp_c` | 208 |
| 598 | e | H+ | `h_e` | 143 |
| 693 | c | Nicotinamide adenine dinucleotide | `nad_c` | |

### Reaction connectivity
The reaction connectivity matrix, defined as $\mathbf{C}_{r} \equiv \bar{\mathbf{S}}^{\top}\bar{\mathbf{S}}$, is an $|\mathcal{R}|\times|\mathcal{R}|$ symmetric array with the following features:
* __Diagonal elements__: The elements along the central diagonal $c_{ii}\in\mathbf{C}_{r}$ are the total number of reactants and products of a particular reaction. However, because we removed the directionality when we computed the binary stoichiometric matrix, we cannot tell how many of these are reactants and how many are products, only the total participation number for the reaction.
* __Off diagonal elements__: The off-diagonal elements of the reaction matrix $c_{ij}\in\mathbf{C}_{r}$ where $i\neq{j}$ describe how many metabolites are shared between reaction $i$ and $j$, i.e., the number of joint metabolites for the pair.
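On the same hypothetical toy network (R1: A → B, R2: B + C → D, R3: D → A + C), the reaction connectivity can also be checked by hand. The last line shows a detail that matters for the `argmax(...)` calls in this notebook: when several diagonal entries tie for the maximum, `argmax` returns the first one.

```julia
using LinearAlgebra

# binary stoichiometric matrix of the hypothetical toy network
# rows: A, B, C, D; columns: R1, R2, R3
S̄_toy = [1 0 1;
         1 1 0;
         0 1 1;
         0 1 1]

Cᵣ_toy = transpose(S̄_toy) * S̄_toy   # 3×3 reaction connectivity matrix

diag(Cᵣ_toy)            # species per reaction: [2, 3, 3]
Cᵣ_toy[2, 3]            # species shared by R2 and R3 (C and D): 2
argmax(diag(Cᵣ_toy))    # 2: R2 and R3 tie, and argmax returns the first maximal index
```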

In [25]:
Cᵣ = transpose(S̄)*S̄ # reaction connectivity matrix R x R matrix

1250×1250 Matrix{Int64}:
 1  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0   0  0  0  0  0  0
 0  1  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0   0  0  0  0  0  0
 0  0  1  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0   0  0  0  0  0  0
 0  0  0  1  0  0  0  0  0  0  0  0  0     0  0  0  0  0   0  0  0  0  0  0
 0  0  0  0  1  0  0  0  0  0  0  0  0     0  0  0  0  0   0  0  0  0  0  0
 0  0  0  0  0  1  0  0  0  0  0  0  0  …  0  0  0  0  0   0  0  0  0  0  0
 0  0  0  0  0  0  1  0  0  0  0  0  0     0  0  0  0  0   0  0  0  0  0  0
 0  0  0  0  0  0  0  1  0  0  0  0  0     0  0  0  0  0   0  0  0  0  0  0
 0  0  0  0  0  0  0  0  1  0  0  0  0     0  0  0  0  0   0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  1  0  0  0     0  0  0  0  0   0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  1  0  0  …  0  0  0  0  0   0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  1  0     0  0  0  0  0   0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0   0  0

What is the most connected reaction? Find the index of the maximum diagonal element [using the `argmax(...)` method](https://docs.julialang.org/en/v1/base/collections/#Base.argmax), and then [pipe that index using the `|>` operator](https://docs.julialang.org/en/v1/manual/functions/#Function-composition-and-piping) to the reaction array. Save the most highly connected reaction in the `most_connected_reaction::Dict{String,Any}` variable:

In [27]:
most_connected_reaction = argmax(diag(Cᵣ)) |> i-> model["reactions"][i] # maximum connectivity reaction

Dict{String, Any} with 10 entries:
  "name"                  => "B subtilis biomass  demand "
  "metabolites"           => Dict{String, Any}("glu__L_c"=>-0.260378, "psetha_B…
  "lower_bound"           => 0.0
  "id"                    => "BIOMASS_BS_10"
  "notes"                 => Dict{String, Any}("original_bigg_ids"=>Any["BS_Bio…
  "gene_reaction_rule"    => ""
  "upper_bound"           => 999999.0
  "subsystem"             => "Biomass and maintenance functions"
  "objective_coefficient" => 1.0
  "annotation"            => Dict{String, Any}("metanetx.reaction"=>Any["MNXR13…

We can [call the `reactionstring(...)` method](src/Compute.jl) with the `metabolites` dictionary from a `reaction` dictionary to see the reaction string: 

In [29]:
test = reactionstring(most_connected_reaction["metabolites"])

"0.00056 psetha_BS_c + 0.000367 10fthf_c + 0.00018 gdp_c + 0.002347 gtca2_45_BS_c + 0.323093 lys__L_c + 0.081739 his__L_c + 0.101817 peptido_BS_c + 0.000503 gmp_c + 0.00467 amp_c + 1.5e-5 lipo4_24_BS_c + 0.038902 ctp_c + 0.022899 dttp_c + 6.6e-5 t12dg_BS_c + 6.0e-6 lipo2_" ⋯ 672 bytes ⋯ "2 tcam_BS_c + 0.266902 ala__L_c + 0.05699 cys__L_c + 0.00345 fe3_c + 0.147987 asn__L_c + 105.0 h2o_c + 0.000934 nadp_c + 8.6e-5 m12dg_BS_c + 0.148014 asp__L_c + 0.000266 mql7_c + 0.113326 met__L_c + 0.003624 gtca1_45_BS_c = 104.997414 adp_c + 104.985613 pi_c + 105.0 h_c"
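The course's `reactionstring(...)` lives in `src/Compute.jl`. Judging from the output above, it writes the consumed species (negative coefficients, printed as absolute values) to the left of `=` and the produced species to the right. A minimal sketch of that idea follows; `reactionstring_sketch` is a hypothetical name, and it sorts the metabolite ids so the output is deterministic, which the original may not do. Like the original, it takes a reaction's `metabolites` dictionary, not the full reaction record.

```julia
# Hypothetical re-implementation, NOT the course's reactionstring (see src/Compute.jl).
# Input: a reaction's "metabolites" dictionary mapping species id => coefficient.
function reactionstring_sketch(metabolites::AbstractDict)
    ids = sort(collect(keys(metabolites)))   # deterministic ordering
    lhs = [string(abs(metabolites[id]), " ", id) for id in ids if metabolites[id] < 0]  # reactants
    rhs = [string(metabolites[id], " ", id) for id in ids if metabolites[id] > 0]       # products
    return join(lhs, " + ") * " = " * join(rhs, " + ")
end

# ATP hydrolysis written as a BiGG-style metabolites dictionary
rxn = Dict{String,Any}("atp_c" => -1.0, "h2o_c" => -1.0, "adp_c" => 1.0, "pi_c" => 1.0, "h_c" => 1.0)
println(reactionstring_sketch(rxn))   # 1.0 atp_c + 1.0 h2o_c = 1.0 adp_c + 1.0 h_c + 1.0 pi_c
```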

Let's sort the diagonal elements $\text{diag}(\mathbf{C}_{r})$ from largest to smallest and then build [a table using the `pretty_table(...)` method exported by the `PrettyTables.jl` package](https://github.com/ronisbr/PrettyTables.jl). `Unhide` the code block below to see how we constructed the reaction connectivity table.
* __Summary__: The most connected reaction list makes intuitive sense (or maybe not) depending on the network we are looking at. For example, for networks with a `biomass` reaction (which describes the requirements to make more cells), we would intuitively expect those reactions to be highly ranked. However, for other networks that describe non-replicating systems, the importance of the reaction may be specific to the function of the cell.

In [65]:
let
    df = DataFrame();
    d = diag(Cᵣ);
    î = sortperm(d, rev=true);
    number_of_rows_in_table = 20;
    for i ∈ î[1:number_of_rows_in_table]
        m = model["reactions"][i]
        row_df = (
            index = i,
            id = m["id"],
            connections = d[i],
            reaction = reactionstring(m["metabolites"]),
        );
        push!(df, row_df) # capture the row
    end
    #pretty_table(df, tf=tf_simple)
    df
end

| Row | index | id | connections | reaction |
|---:|---:|:---|---:|:---|
| 1 | 332 | `BIOMASS_BS_10` | 63 | `0.00056 psetha_BS_c + 0.000367 10fthf_c + 0.00018 gdp_c + 0.002347 gtca2_45_BS_c + 0.323093 lys__L_c + 0.081739 his__L_c + 0.101817 peptido_BS_c + 0.000503 gmp_c + 0.00467 amp_c + 1.5e-5 lipo4_24_BS_c + 0.038902 ctp_c + 0.022899 dttp_c + 6.6e-5 t12dg_BS_c + 6.0e-6 lipo2_24_BS_c + 0.041501 utp_c + 0.000918 ppi_c + 0.017398 dgtp_c + 0.062667 gtp_c + 7.0e-6 lipo1_24_BS_c + 0.269905 ile__L_c + 0.110824 tyr__L_c + 0.408288 gly_c + 1.8e-5 lipo3_24_BS_c + 0.186317 thr__L_c + 0.101714 mg2_c + 5.0e-6 cdlp_BS_c + 0.01738 dctp_c + 105.053483 atp_c + 0.160642 pro__L_c + 0.175939 phe__L_c + 0.306734 val__L_c + 0.022982 datp_c + 0.346445 leu__L_c + 0.001042 cmp_c + 0.260378 glu__L_c + 0.00011 d12dg_BS_c + 0.016164 nad_c + 2.2e-5 lysylpgly_BS_c + 0.706312 k_c + 0.216213 ser__L_c + 0.003205 ca2_c + 0.001819 gtca3_45_BS_c + 0.193021 arg__L_c + 0.000251 cdp_c + 0.260335 gln__L_c + 0.054336 trp__L_c + 0.000176 pgly_BS_c + 0.000216 nadph_c + 0.003112 tcam_BS_c + 0.266902 ala__L_c + 0.05699 cys__L_c + 0.00345 fe3_c + 0.147987 asn__L_c + 105.0 h2o_c + 0.000934 nadp_c + 8.6e-5 m12dg_BS_c + 0.148014 asp__L_c + 0.000266 mql7_c + 0.113326 met__L_c + 0.003624 gtca1_45_BS_c = 104.997414 adp_c + 104.985613 pi_c + 105.0 h_c` |
| 2 | 1239 | `TECA2S45` | 13 | `45.0 atp_c + 45.0 ala__D_c + 45.0 h2o_c + 45.0 cdpglyc_c + 1.0 uacmam_c + 1.0 uacgam_c = 45.0 amp_c + 1.0 ump_c + 45.0 ppi_c + 1.0 gtca2_45_BS_c + 1.0 udp_c + 45.0 cmp_c + 91.0 h_c` |
| 3 | 316 | `AGPATr_BS` | 12 | `0.01 1ag3p_BS_c + 0.17 fa12coa_c + 0.07 fa11coa_c + 0.2 fa3coa_c + 0.34 fa4coa_c + 0.05 fa6coa_c + 0.03 strcoa_c + 0.01 fa1coa_c + 0.1 pmtcoa_c + 0.03 tdcoa_c = 1.0 coa_c + 0.01 12dag3p_BS_c` |
| 4 | 471 | `G3POA_BS` | 12 | `0.17 fa12coa_c + 0.07 fa11coa_c + 0.2 fa3coa_c + 0.34 fa4coa_c + 0.05 fa6coa_c + 1.0 glyc3p_c + 0.03 strcoa_c + 0.01 fa1coa_c + 0.1 pmtcoa_c + 0.03 tdcoa_c = 0.01 1ag3p_BS_c + 1.0 coa_c` |
| 5 | 758 | `LIPO3S24_BS` | 10 | `2400.0 atp_c + 2400.0 ala__D_c + 2400.0 h2o_c + 2400.0 cdpglyc_c + 1.0 d12dg_BS_c = 2400.0 amp_c + 1.0 lipo3_24_BS_c + 2400.0 ppi_c + 2400.0 cmp_c + 4800.0 h_c` |
| 6 | 1245 | `TECA3S45` | 10 | `1.0 h2o_c + 45.0 cdpglyc_c + 1.0 uacmam_c + 45.0 udpg_c + 1.0 uacgam_c = 1.0 gtca3_45_BS_c + 1.0 ump_c + 46.0 udp_c + 45.0 cmp_c + 91.0 h_c` |
| 7 | 362 | `ASNS1` | 9 | `1.0 atp_c + 1.0 h2o_c + 1.0 gln__L_c + 1.0 asp__L_c = 1.0 glu__L_c + 1.0 amp_c + 1.0 ppi_c + 1.0 h_c + 1.0 asn__L_c` |
| 8 | 403 | `CTPS2` | 9 | `1.0 atp_c + 1.0 h2o_c + 1.0 gln__L_c + 1.0 utp_c = 1.0 adp_c + 1.0 glu__L_c + 1.0 ctp_c + 1.0 pi_c + 2.0 h_c` |
| 9 | 420 | `CBPS` | 9 | `2.0 atp_c + 1.0 h2o_c + 1.0 gln__L_c + 1.0 hco3_c = 2.0 adp_c + 1.0 glu__L_c + 1.0 cbp_c + 1.0 pi_c + 2.0 h_c` |
| 10 | 511 | `FEDCabc` | 9 | `1.0 atp_c + 1.0 h2o_c + 2.0 cit_e + 1.0 fe3_e = 1.0 adp_c + 2.0 cit_c + 1.0 fe3_c + 1.0 pi_c + 1.0 h_c` |


## Other ways to estimate important reactions (metabolites)?
We assumed that connectivity is proportional to importance; what if this isn't the case? Could we have the stoichiometric matrix tell us what is important? Yes! We could decompose the stoichiometric matrix using tools from [eigendecomposition](https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix#). But there is a _gotcha_: the stoichiometric matrix is not _square_ (here, it is $990\times{1250}$). Thus, we cannot directly compute its [eigendecomposition](https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix#).

* __Idea__: Let's compute a square matrix that measures the _similarity_ of reactions (or metabolites) using a measure such as the [covariance](https://en.wikipedia.org/wiki/Covariance_matrix) or a [kernel function](https://en.wikipedia.org/wiki/Kernel_method). This gives us an idea about the relationship between the network's reactions (or metabolites) $i$ and $j$. Then, we can decompose that and look at the [eigenvalues and eigenvectors](https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors) to see what we can see. In particular, we'll look at the eigenvector corresponding to the largest eigenvalue.

For giggles, let's propose [a kernel function](https://en.wikipedia.org/wiki/Kernel_method) to compute the _similarity_ of the rows (metabolites) or the columns (reactions) and save it in the `k(x,y)::Function` variable:

In [33]:
k(x,y) = dot(x,y);

Next, compute the reaction (metabolite) similarity matrix $\hat{\mathbf{S}}$ and save it in the `Ŝ::Array{Float64,2}` variable. If we are considering reaction similarity, then $\hat{\mathbf{S}}\in\mathbb{R}^{|\mathcal{R}|\times|\mathcal{R}|}$; otherwise, the metabolite similarity matrix is $\hat{\mathbf{S}}\in\mathbb{R}^{|\mathcal{M}|\times|\mathcal{M}|}$.

In [35]:
Ŝ = let
    
    # get some data from the system -
    m = model["metabolites"] |> length # get the number of metabolites
    r = model["reactions"] |> length # get the number of reactions

    # Reaction similarity (active); comment this block out to switch to metabolite similarity -
    Ŝ = zeros(r,r); # initialize the reaction similarity matrix
    for i ∈ 1:r
        σᵢ = S[:,i]; # get the ith col (reaction)
        for j ∈ 1:r
            σⱼ = S[:,j]; # get the jth col (reaction)
            Ŝ[i,j] = k(σᵢ,σⱼ);
        end
    end

    # Uncomment me for metabolite similarity -
    # Ŝ = zeros(m,m); # initialize the metabolite similarity matrix
    # for i ∈ 1:m
    #     mᵢ = S[i,:]; # get the ith row (metabolite)
    #     for j ∈ 1:m
    #         mⱼ = S[j,:]; # get the jth row (metabolite)
    #         Ŝ[i,j] = k(mᵢ,mⱼ);
    #     end
    # end

    Ŝ
end;

In [36]:
Ŝ

1250×1250 Matrix{Float64}:
 1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0   0.0   0.0   0.0    0.0
 0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0   0.0   0.0    0.0
 0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0   0.0   0.0    0.0
 0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0     0.0   0.0   0.0   0.0    0.0
 0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0     0.0   0.0   0.0   0.0    0.0
 0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  …  0.0   0.0   0.0   0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0     0.0   0.0   0.0   0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0     0.0   0.0   0.0   0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0   0.0   0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0   0.0   0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0   0.0   0.0   0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0   0.0   0.0    0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0   0.0   0.0   0.0    0.0
 ⋮                      
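With the linear kernel `k(x,y) = dot(x,y)`, the double loop above computes the Gram matrix $\hat{\mathbf{S}} = \mathbf{S}^{\top}\mathbf{S}$ (or $\mathbf{S}\mathbf{S}^{\top}$ for metabolite similarity), so a single matrix product gives the same result. The check below uses the hypothetical toy network R1: A → B, R2: B + C → D, R3: D → A + C; `linear_kernel` is just a renamed copy of `k`. Unlike $\mathbf{C}_{r}$, which is built from the binary matrix $\bar{\mathbf{S}}$, $\hat{\mathbf{S}}$ keeps the signs and magnitudes of the coefficients: an off-diagonal entry gets a negative contribution from each metabolite that one reaction produces and the other consumes, and a diagonal entry is a sum of squared coefficients, so a reaction with large coefficients (the biomass reaction has coefficients around 105) can dominate.

```julia
using LinearAlgebra

linear_kernel(x, y) = dot(x, y)   # same kernel as k(x,y) above

S_toy = [ -1.0   0.0   1.0;
           1.0  -1.0   0.0;
           0.0  -1.0   1.0;
           0.0   1.0  -1.0 ]

n_rxn = size(S_toy, 2)   # number of reactions (columns)

# element by element, as in the double loop above
Ŝ_loop = [linear_kernel(S_toy[:, i], S_toy[:, j]) for i in 1:n_rxn, j in 1:n_rxn]

# one matrix product
Ŝ_gram = transpose(S_toy) * S_toy   # [2 -1 -1; -1 3 -2; -1 -2 3]

Ŝ_loop ≈ Ŝ_gram   # true
```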

__Compute the eigenvalues and the eigenvectors__. We use the built-in [`eigen(...)` function](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigen) to compute [the eigendecomposition](https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix#:~:text=In%20linear%20algebra%2C%20eigendecomposition%20is,be%20factorized%20in%20this%20way.). This function takes a square matrix $\mathbf{A}$ (and potentially some additional optional arguments) and returns [an `Eigen` factorization object](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.Eigen) holding the eigenvalues and eigenvectors.

In [38]:
λ,V = let

    # compute the eigendecomposition
    F = eigen(Ŝ); # compute the decomposition, returns eigen factorization
    λ = F.values; # get the values from F
    V = F.vectors; # get the vectors from F

    # return -
    λ,V
end;
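The decomposition can be checked by hand on the hypothetical toy network. The sketch below wraps the matrix in `Symmetric(...)`, which tells `eigen` to use a symmetric solver that returns real eigenvalues in ascending order (so the largest one is last). It also confirms that the eigenvalues of $\mathbf{S}^{\top}\mathbf{S}$ are the squared singular values of $\mathbf{S}$, which is the link between this approach and the singular value decomposition studies listed under Further Reading.

```julia
using LinearAlgebra

S_toy = [ -1.0   0.0   1.0;
           1.0  -1.0   0.0;
           0.0  -1.0   1.0;
           0.0   1.0  -1.0 ]
Ŝ_toy = transpose(S_toy) * S_toy

F = eigen(Symmetric(Ŝ_toy))   # real eigenvalues, ascending order
λ_toy = F.values               # ≈ [0, 3, 5]; 0 because the three columns of S_toy sum to zero
i_max = argmax(λ_toy)          # 3, the last (largest) eigenvalue
v_max = F.vectors[:, i_max]    # ≈ ±[0, 1, -1]/√2: loads only on R2 and R3

# eigenvalues of SᵀS are the squared singular values of S
sort(λ_toy; rev=true) ≈ svdvals(S_toy) .^ 2   # true
```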

__What's in the largest eigenvector?__ 

First, why are we looking only at the eigenvector associated with the largest eigenvalue? Let's borrow an idea from Google. [The Google PageRank algorithm](https://epubs.siam.org/doi/10.1137/050623280) uses the dominant eigenvalue and its corresponding eigenvector to assess the importance of web pages when searching the internet. Let's use [the `softmax(...)` function](src/Compute.jl) to transform this eigenvector into a probability vector (all entries are non-negative and sum to one). The [softmax](https://en.wikipedia.org/wiki/Softmax_function) for some vector $\mathbf{z}\in\mathbb{R}^{m}$ is defined as:
$$
\begin{equation}
\sigma(\mathbf{z})_{i} = \frac{e^{z_{i}}}{\sum_{j=1}^{m}e^{z_{j}}}\quad{i=1,2,\dots,m}
\end{equation}
$$
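where $\sigma(\mathbf{z})_{i}$ is the $i$th component of the transformed eigenvector (here $\sigma$ denotes the softmax, not a stoichiometric coefficient). We apply [the `argmax(...)` function](https://docs.julialang.org/en/v1/base/collections/#Base.argmax) to the transformed vector to get the index of its largest component. Because the softmax preserves the ordering of the components (the exponential is monotone and the denominator is shared), this is the same index as the largest component of the eigenvector itself. Keep in mind that an eigenvector is only defined up to sign: if $\mathbf{v}$ is an eigenvector, so is $-\mathbf{v}$, and the index returned by `argmax` can change depending on which sign the solver happens to return.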
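A minimal sketch of the softmax and of the sign issue, using the dominant eigenvector $\pm[0, 1, -1]/\sqrt{2}$ ($\lambda = 5$) of the hypothetical toy similarity matrix $\hat{\mathbf{S}} = \mathbf{S}^{\top}\mathbf{S}$ for the network R1: A → B, R2: B + C → D, R3: D → A + C. `softmax_sketch` is a hypothetical stand-in for the course's `softmax(...)` in `src/Compute.jl`; it subtracts $\max_{j} z_{j}$ before exponentiating, which leaves the result unchanged but avoids overflow. `canonical_sign` is one common convention, assumed here rather than taken from the lecture: flip the vector so that its largest-magnitude entry is positive.

```julia
function softmax_sketch(z::AbstractVector{<:Real})
    e = exp.(z .- maximum(z))   # shifting by the max does not change the ratios
    return e ./ sum(e)
end

# hypothetical convention: make the largest-magnitude entry positive
canonical_sign(v::AbstractVector{<:Real}) = v .* sign(v[argmax(abs.(v))])

v = [0.0, 1.0, -1.0] ./ sqrt(2)   # dominant eigenvector of the toy Ŝ, up to sign

argmax(softmax_sketch(v))                    # 2 (reaction R2)
argmax(softmax_sketch(-v))                   # 3 (reaction R3): same eigenvector, different answer
argmax(softmax_sketch(canonical_sign(-v)))   # 2 again, whichever sign the solver returned
```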

In [40]:
i,j,r̂,m̂ = let

    # setup -
    m = nothing;
    r = nothing;
    i = argmax(λ); # get the index of the largest eigenvalue i
    j = V[:,i] |> v-> softmax(v) |> v̂-> argmax(v̂); # get the ith eigenvector, apply a softmax, find the index of the maximum element

    # Uncomment me for reactions -
    r = model["reactions"][j]

    # Uncomment me for metabolites -
    # m = model["metabolites"][j]
    
    # return
    i,j,r,m
end;

In [67]:
r̂["metabolites"] |> reactionstring # pass the metabolites dictionary (not the full reaction record), as in the earlier reactionstring call


## Further Reading
Several publications with a similar theme, i.e., using matrix factorizations such as [singular value decomposition](https://en.wikipedia.org/wiki/Singular_value_decomposition) or other approaches such as analyzing the degree distribution, have studied the structural features of biologically derived networks such as stoichiometric arrays:
* [Price ND, Reed JL, Papin JA, Famili I, Palsson BO. Analysis of metabolic capabilities using singular value decomposition of extreme pathway matrices. Biophys J. 2003 Feb;84(2 Pt 1):794-804. doi: 10.1016/S0006-3495(03)74899-1. PMID: 12547764; PMCID: PMC1302660.](https://pubmed.ncbi.nlm.nih.gov/12547764/)
* [Famili I, Palsson BO. Systemic metabolic reactions are obtained by singular value decomposition of genome-scale stoichiometric matrices. J Theor Biol. 2003 Sep 7;224(1):87-96. doi: 10.1016/s0022-5193(03)00146-2. PMID: 12900206.](https://pubmed.ncbi.nlm.nih.gov/12900206/)
* [Barrett CL, Price ND, Palsson BO. Network-level analysis of metabolic regulation in the human red blood cell using random sampling and singular value decomposition. BMC Bioinformatics. 2006 Mar 13;7:132. doi: 10.1186/1471-2105-7-132. PMID: 16533395; PMCID: PMC1421444.](https://pubmed.ncbi.nlm.nih.gov/16533395/)
* [Broido, A.D., Clauset, A. Scale-free networks are rare. Nat Commun 10, 1017 (2019). https://doi.org/10.1038/s41467-019-08746-5](https://rdcu.be/d9Q02)

# Today?
That's a wrap! What are some things we discussed today?