# L4a: Metabolic Networks and the Stoichiometric Matrix

## Theory: What is a stoichiometric matrix?
Suppose we have a set of chemical (or biochemical) reactions $\mathcal{R}$ involving a set of chemical species (metabolites) $\mathcal{M}$. Then the stoichiometric matrix $\mathbf{S}\in\mathbb{R}^{|\mathcal{M}|\times|\mathcal{R}|}$ has one row per species and one column per reaction, where $|\mathcal{M}|$ denotes the number of chemical species and $|\mathcal{R}|$ denotes the number of reactions. The elements $\sigma_{ij}\in\mathbf{S}$ are the stoichiometric coefficients, such that:
* $\sigma_{ij}>0$: Chemical species (metabolite) $i$ is _produced_ by reaction $j$. Species $i$ is a product of reaction $j$.
* $\sigma_{ij} = 0$: Chemical species (metabolite) $i$ does not participate in reaction $j$.
* $\sigma_{ij}<0$: Chemical species (metabolite) $i$ is _consumed_ by reaction $j$. Species $i$ is a reactant of reaction $j$.

The stoichiometric matrix $\mathbf{S}$ is the digital representation of the biochemistry occurring inside the cell.
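
The sign convention above can be illustrated with a small hand-built example. This is only a sketch: the two-reaction network (`A -> B`, `B -> 2C`) and the names `S_toy`, `species_names`, and `reaction_names` are made up for illustration and do not come from any BiGG model.

```julia
# Toy network (hypothetical):
#   r1: A -> B
#   r2: B -> 2C
# Rows are species (A, B, C); columns are reactions (r1, r2).
S_toy = [-1.0  0.0;
          1.0 -1.0;
          0.0  2.0];

species_names = ["A", "B", "C"];
reaction_names = ["r1", "r2"];

# Classify every (species, reaction) pair by the sign of its coefficient
for (i, s) in enumerate(species_names), (j, r) in enumerate(reaction_names)
    σ = S_toy[i, j]
    role = σ > 0 ? "product" : (σ < 0 ? "reactant" : "not involved")
    println("$(s) in $(r): σ = $(σ) ($(role))")
end
```

Species `B` shows both signs in its row: it is a product of `r1` and a reactant of `r2`.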

## Setup, Data, and Prerequisites
We set up the computational environment by including the `Include.jl` file, loading any needed resources, such as sample datasets, and setting up any required constants. The `Include.jl` file loads external packages, various functions that we will use in the exercise, and custom types to model the components of our problem.

In [7]:
include("Include.jl");

### Data
We developed a simple software development kit (SDK) against [the BiGG Models application programming interface at the University of California, San Diego](http://bigg.ucsd.edu/). The [BiGG Models database](http://bigg.ucsd.edu/) integrates published genome-scale metabolic networks into a single database with standardized nomenclature and structure. 
* [The BiGG models API](http://bigg.ucsd.edu/data_access) allows users to programmatically access genome-scale stoichiometric model reconstructions using a simple web API. There are `108` models of intracellular biochemistry occurring in various organisms (including humans) in the database (so far); [see here for a list of models](http://bigg.ucsd.edu/models).
* Here, we'll first explore the [core metabolic model of Palsson and coworkers](https://pubmed.ncbi.nlm.nih.gov/26443778/), a scaled-down model of [carbohydrate metabolism](https://en.wikipedia.org/wiki/Carbohydrate_metabolism) in _E. coli_ with 72 metabolites and 95 reactions. We'll then look at other models in the database and compare their structure. Note that the outputs shown in this notebook were generated with `modelid = "iAB_RBC_283"`, so the record contents and matrix dimensions reflect that model rather than the core model.

We call the model download endpoint of [the BiGG models API](http://bigg.ucsd.edu/data_access) and then save the model file to disk (so we don't hit the API unless we have to). This call returns model information organized as [a Julia dictionary](https://docs.julialang.org/en/v1/base/collections/#Base.Dict) in the `model::Dict{String, Any}` variable. If a model file is saved, we use the cached file instead of making an API call.

In [10]:
model = let

    # build download endpoint -
    baseurl = "http://bigg.ucsd.edu"; # base url to download model
    modelid = "iAB_RBC_283"; # model id to download
    path_to_saved_model_file = joinpath(_PATH_TO_DATA, "saved-model-$(modelid).jld2");

    # check: do we have a model file saved?
    model = nothing;
    if (isfile(path_to_saved_model_file) == false)
        
        endpoint = MyBiggModelsDownloadModelEndpointModel();
        endpoint.bigg_id = modelid;
        url = build(baseurl, endpoint)
        model = MyBiggModelsDownloadModelEndpointModel(url);

        # Before we move on, save this model for later (so we don't keep hitting the API)
        save(path_to_saved_model_file, Dict("model" => model));
    else
        model = load(path_to_saved_model_file)["model"];
    end
    model; # return the model (either saved, or downloaded)
end

Dict{String, Any} with 6 entries:
  "metabolites"  => Any[Dict{String, Any}("compartment"=>"c", "name"=>"3-Phosph…
  "id"           => "iAB_RBC_283"
  "compartments" => Dict{String, Any}("c"=>"cytosol", "e"=>"extracellular space…
  "reactions"    => Any[Dict{String, Any}("name"=>"Sink pchol hs 18 1 18 1(c)",…
  "version"      => "1"
  "genes"        => Any[Dict{String, Any}("name"=>"NMRK1", "id"=>"Nrk1_AT1", "n…

__Metabolite records__: Each metabolite (chemical compound) in the network has an associated metabolite record with several fields. Let's take a look at the metabolite at index `1`.
* The key field for today in the metabolite record is the `id` field, an abbreviation or symbol associated with this metabolite.

In [27]:
model["metabolites"][1] # example metabolite record

Dict{String, Any} with 7 entries:
  "compartment" => "c"
  "name"        => "3-Phospho-D-glyceroyl phosphate"
  "formula"     => "C3H4O10P2"
  "id"          => "13dpg_c"
  "charge"      => -4
  "notes"       => Dict{String, Any}("original_bigg_ids"=>Any["13dpg_c"])
  "annotation"  => Dict{String, Any}("kegg.compound"=>Any["C00236"], "sbo"=>"SB…

__Reaction records__: Similarly, each reaction in the network has a reaction record with several fields. Let's look at the reaction record at index `25`.
* The key field for the reaction record is the `metabolites` field, which lists the stoichiometric coefficients associated with this particular reaction.
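
As a minimal sketch of how the `metabolites` field can be read, the snippet below splits a reaction's species into reactants and products by the sign of their coefficients. The record is hypothetical: the ids `R_toy`, `a_c`, `b_c`, `c_c` and their coefficients are invented and only mimic the shape of a BiGG reaction record.

```julia
# Hypothetical reaction record with the same layout as a BiGG reaction entry.
# The ids and coefficients are made up for illustration.
toy_reaction = Dict{String, Any}(
    "id" => "R_toy",
    "metabolites" => Dict{String, Any}("a_c" => -1.0, "b_c" => -1.0, "c_c" => 2.0),
);

# Negative coefficients mark reactants, positive coefficients mark products
reactants = [id for (id, σ) in toy_reaction["metabolites"] if σ < 0];
products  = [id for (id, σ) in toy_reaction["metabolites"] if σ > 0];
sort!(reactants); sort!(products); # Dict iteration order is not guaranteed

println("$(toy_reaction["id"]): ", join(reactants, " + "), " => ", join(products, " + "))
```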

In [32]:
model["reactions"][25] # example reaction record

Dict{String, Any} with 9 entries:
  "name"               => "N-Acetylneuraminate 9-phosphate pyruvate-lyase (pyru…
  "metabolites"        => Dict{String, Any}("h2o_c"=>-1.0, "acnamp_c"=>1.0, "ac…
  "lower_bound"        => 0.0
  "id"                 => "ACNAM9PL"
  "notes"              => Dict{String, Any}("original_bigg_ids"=>Any["ACNAM9PL"…
  "gene_reaction_rule" => "Nans_AT1"
  "upper_bound"        => 1000.0
  "subsystem"          => "Aminosugar Metabolism"
  "annotation"         => Dict{String, Any}("ec-code"=>Any["2.5.1.57"], "metane…

__Stoichiometric matrix__: Next, let's build a stoichiometric matrix $\mathbf{S}$ using the metabolite and reaction records. We'll do this using two for loops. 
* __Strategy__: In the outer loop, we iterate over the system's metabolites (chemical species) and select the `id` from the metabolites record for each metabolite. In the inner loop, we iterate over each reaction. For each reaction record, we ask if this reaction has an entry for the current metabolite `id` value; if it does, we grab the stoichiometric coefficient $\sigma_{ij}$ corresponding to this metabolite and reaction.

In [40]:
S = let

    # get some data from the model -
    m = model["metabolites"]; # get list of metabolites
    r = model["reactions"]; # get list of reactions
    number_of_rows = length(m); # how many metabolites do we have? (rows)
    number_of_cols = length(r); # how many reactions do we have? (cols)
    S = zeros(number_of_rows,number_of_cols); # initialize an empty stoichiometric matrix

    # let's build a stm -
    for i ∈ eachindex(m)
        metabolite = m[i]["id"]; # we are checking if this metabolite is in the reaction record
        for j ∈ eachindex(r)
            reaction = r[j];
            if (haskey(reaction["metabolites"], metabolite) == true)
                S[i,j] = reaction["metabolites"][metabolite];
            end
        end
    end
    S; 
end;
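
The same two-loop strategy can be checked on a model small enough to verify by hand. The sketch below assumes a made-up two-metabolite, two-reaction model (`toy_model`) and a hypothetical helper, `build_stoichiometric_matrix`, neither of which is part of `Include.jl`. It uses `get` with a default of `0.0` in place of the explicit `haskey` test, which is one possible alternative rather than the approach used above.

```julia
# Hypothetical model with the same layout as the BiGG model dictionary
toy_model = Dict{String, Any}(
    "metabolites" => Any[
        Dict{String, Any}("id" => "a_c"),
        Dict{String, Any}("id" => "b_c"),
    ],
    "reactions" => Any[
        Dict{String, Any}("id" => "R1", "metabolites" => Dict{String, Any}("a_c" => -1.0, "b_c" => 1.0)),
        Dict{String, Any}("id" => "R2", "metabolites" => Dict{String, Any}("b_c" => -1.0)),
    ],
);

# Hypothetical helper: rows are metabolites, columns are reactions
function build_stoichiometric_matrix(model::Dict{String, Any})
    m = model["metabolites"];
    r = model["reactions"];
    S = zeros(length(m), length(r));
    for i in eachindex(m), j in eachindex(r)
        # get returns 0.0 when the metabolite does not appear in the reaction
        S[i, j] = get(r[j]["metabolites"], m[i]["id"], 0.0);
    end
    return S
end

S_small = build_stoichiometric_matrix(toy_model);
display(S_small)
```

Reading the result column by column recovers the reactions: `R1` converts `a_c` to `b_c`, and `R2` consumes `b_c`.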

## What is a metabolic network's most important reaction (or metabolite)?
To explore this question, we'll have the stoichiometric matrix tell us what is important using [eigendecomposition](https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix#). However, the stoichiometric matrix is generally not square, so we cannot directly compute its eigendecomposition. Instead, suppose we compute [the covariance matrix](https://en.wikipedia.org/wiki/Covariance_matrix) between the columns of $\mathbf{S}$, i.e., between the stoichiometric vectors of the reactions, where each row (metabolite) acts as an observation. The result is a square, symmetric matrix that summarizes the relationship between each pair of reactions $i$ and $j$ in the network.
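
The idea can be sketched on a small made-up matrix. The values in `S_demo` are invented for illustration; the snippet assumes only the standard `Statistics` and `LinearAlgebra` libraries. `cov` treats the columns as variables by default, so a 3 × 4 matrix yields a 4 × 4 covariance matrix that `eigen` can decompose.

```julia
using LinearAlgebra, Statistics

# Made-up 3 species × 4 reactions matrix (not square)
S_demo = [-1.0  0.0  1.0  0.0;
           1.0 -1.0  0.0  0.0;
           0.0  1.0 -1.0 -1.0];

Σ_demo = cov(S_demo);            # covariance between columns (reactions)
println(size(Σ_demo))            # (4, 4)

# Wrapping in Symmetric guarantees real eigenvalues, sorted in ascending order
F = eigen(Symmetric(Σ_demo));
println(F.values)
```

Because only three rows contribute to the estimate here, the covariance matrix is rank deficient and several eigenvalues are numerically zero.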

### Covariance matrix $\mathbf{\Sigma}$
The covariance matrix is a square matrix that summarizes the variance and covariance of the features in the dataset.
Suppose we have a dataset $\mathcal{D} = \left\{\mathbf{x}_{1},\mathbf{x}_{2},\dots,\mathbf{x}_{n}\right\}$ where each $\mathbf{x}_{i}\in\mathbb{R}^{m}$ is a feature vector.
The covariance matrix $\mathbf{\Sigma}\in\mathbb{R}^{n\times{n}}$ is a real-valued symmetric matrix whose element $\Sigma_{ij}$ is the covariance of feature vectors $i$ and $j$, denoted as $\text{cov}\left(\mathbf{x}_{i},\mathbf{x}_{j}\right)$:
$$
\begin{equation}
    \Sigma_{ij} = \text{cov}\left(\mathbf{x}_{i},\mathbf{x}_{j}\right) = \sigma_{i}\,\sigma_{j}\,\rho_{ij}\qquad\text{for}\quad{i,j = 1,2,\dots,n}
\end{equation}
$$
where $\sigma_{i}$ denotes the standard deviation of the feature vector $\mathbf{x}_{i}$, $\sigma_{j}$ denotes the standard deviation of the feature vector $\mathbf{x}_{j}$, and $\rho_{ij}$ denotes the correlation between features $i$ and $j$ in the dataset $\mathcal{D}$. The correlation is given by:
$$
\begin{equation}
\rho_{ij} = \frac{\mathbb{E}\left[\left(\mathbf{x}_{i}-\mu_{i}\right)\left(\mathbf{x}_{j} - \mu_{j}\right)\right]}{\sigma_{i}\sigma_{j}}\qquad\text{for}\quad{i,j = 1,2,\dots,n}
\end{equation}
$$
where $\mathbb{E}(\cdot)$ denotes the expected value, and $\mu_{i}$ denotes the mean of the feature vector $\mathbf{x}_{i}$.
The diagonal elements of the covariance matrix $\Sigma_{ii}\in\mathbf{\Sigma}$ are the variances of features $i$,
while the off-diagonal elements $\Sigma_{ij}\in\mathbf{\Sigma}$ for $i\neq{j}$ measure the relationship between features 
$i$ and $j$ in the dataset $\mathcal{D}$.
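
The relationship $\Sigma_{ij} = \sigma_{i}\,\sigma_{j}\,\rho_{ij}$ can be checked numerically. The sketch below uses a made-up data matrix `X_check` whose columns play the role of the feature vectors, and relies only on `cov`, `cor`, and `std` from the standard `Statistics` library.

```julia
using Statistics

# Made-up data: 4 observations (rows) × 3 features (columns)
X_check = [1.0  2.0  0.5;
           2.0  1.0  1.5;
           3.0  4.0  2.5;
           4.0  3.0  0.0];

Σ_check = cov(X_check);                 # 3 × 3 covariance matrix
σ_check = vec(std(X_check, dims = 1));  # standard deviation of each feature
ρ_check = cor(X_check);                 # 3 × 3 correlation matrix

# Rebuild the covariance matrix element by element as σᵢ σⱼ ρᵢⱼ
Σ_rebuilt = (σ_check * σ_check') .* ρ_check;
println(isapprox(Σ_check, Σ_rebuilt))
```

The diagonal of `Σ_check` equals `σ_check .^ 2`, the variance of each feature, because every feature has a correlation of one with itself.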

In [51]:
Σ = cov(S)

469×469 Matrix{Float64}:
  0.00292398  -8.57471e-6  -8.57471e-6  …   8.57471e-6   8.57471e-6
 -8.57471e-6   0.00292398  -8.57471e-6      8.57471e-6   8.57471e-6
 -8.57471e-6  -8.57471e-6   0.00292398      8.57471e-6   8.57471e-6
 -8.57471e-6  -8.57471e-6  -8.57471e-6      8.57471e-6   8.57471e-6
 -8.57471e-6  -8.57471e-6  -8.57471e-6      8.57471e-6   8.57471e-6
 -8.57471e-6  -8.57471e-6  -8.57471e-6  …   8.57471e-6   8.57471e-6
 -8.57471e-6  -8.57471e-6  -8.57471e-6     -0.00292398   8.57471e-6
 -8.57471e-6  -8.57471e-6  -8.57471e-6      8.57471e-6  -0.00292398
 -8.57471e-6  -8.57471e-6  -8.57471e-6      8.57471e-6   8.57471e-6
 -8.57471e-6  -8.57471e-6  -8.57471e-6      8.57471e-6   8.57471e-6
 -8.57471e-6  -8.57471e-6  -8.57471e-6  …   8.57471e-6   8.57471e-6
  0.0          0.0          0.0             0.0          0.0
  1.71494e-5   1.71494e-5   1.71494e-5      0.00584795   0.00584795
  ⋮                                     ⋱               
  0.0          0.0          0.0          