# Lab 2d: Fun With Principal Component Analysis (PCA)
Fill me in

## Task 1: Setup, Data and Prerequisites
We set up the computational environment by including the `Include.jl` file, loading any needed resources, such as sample datasets, and setting up any required constants. The `Include.jl` file loads external packages, various functions that we will use in the exercise, and custom types to model the components of our problem.

In [3]:
include("Include.jl")

[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mPrecompiling IJuliaExt [2f4121a4-3b3a-5ce6-9c5e-1f2673ce168a] (cache misses: wrong dep version loaded (6), incompatible header (12))


### Data
We developed a simple software development kit (SDK) against [the BiGG Models application programming interface at the University of California, San Diego](http://bigg.ucsd.edu/). The [BiGG Models database](http://bigg.ucsd.edu/) integrates published genome-scale metabolic networks into a single database with standardized nomenclature and structure. 
* [The BiGG models API](http://bigg.ucsd.edu/data_access) allows users to programmatically access genome-scale stoichiometric model reconstructions using a simple web API. There are `108` models of intracellular biochemistry occurring in various organisms (including humans) in the database (so far); [see here for a list of models](http://bigg.ucsd.edu/models).
* Here, we'll first explore the [core metabolic model of Palsson and coworkers](https://pubmed.ncbi.nlm.nih.gov/26443778/), which is a scaled-down model of [carbohydrate metabolism](https://en.wikipedia.org/wiki/Carbohydrate_metabolism) in _E. coli_. This model has 72 metabolites and 95 reactions. We'll then look at other models to see how their structure compares. Note that the download cell below sets `modelid = "iAB_RBC_283"`, so the outputs shown in this notebook are for that model rather than the _E. coli_ core model.

We call the model download endpoint of [the BiGG models API](http://bigg.ucsd.edu/data_access) and save the model file to disk, so we only hit the API when no saved copy exists; if a saved model file is found, we load it instead of making an API call. Either way, the model information is returned as [a Julia dictionary](https://docs.julialang.org/en/v1/base/collections/#Base.Dict) in the `model::Dict{String, Any}` variable.

In [10]:
model = let

    # build download endpoint -
    baseurl = "http://bigg.ucsd.edu"; # base url to download model
    modelid = "iAB_RBC_283"; # model id to download
    path_to_saved_model_file = joinpath(_PATH_TO_DATA, "saved-model-$(modelid).jld2");

    # check: do we have a model file saved?
    model = nothing;
    if (isfile(path_to_saved_model_file) == false)
        
        endpoint = MyBiggModelsDownloadModelEndpointModel();
        endpoint.bigg_id = modelid;
        url = build(baseurl, endpoint)
        model = MyBiggModelsDownloadModelEndpointModel(url);

        # Before we move on, save this model for later (so we don't keep hitting the API)
        save(path_to_saved_model_file, Dict("model" => model));
    else
        model = load(path_to_saved_model_file)["model"];
    end
    model; # return the model (either saved, or downloaded)
end

Dict{String, Any} with 6 entries:
  "metabolites"  => Any[Dict{String, Any}("compartment"=>"c", "name"=>"3-Phosph…
  "id"           => "iAB_RBC_283"
  "compartments" => Dict{String, Any}("c"=>"cytosol", "e"=>"extracellular space…
  "reactions"    => Any[Dict{String, Any}("name"=>"Sink pchol hs 18 1 18 1(c)",…
  "version"      => "1"
  "genes"        => Any[Dict{String, Any}("name"=>"NMRK1", "id"=>"Nrk1_AT1", "n…

__Stoichiometric matrix__: Next, let's build a stoichiometric matrix $\mathbf{S}$ using the metabolite and reaction records. We'll do this using two for loops. 
* __Strategy__: In the outer loop, we iterate over the system's metabolites (chemical species) and select the `id` from the metabolites record for each metabolite. In the inner loop, we iterate over each reaction. For each reaction record, we ask if this reaction has an entry for the current metabolite `id` value; if it does, we grab the stoichiometric coefficient $\sigma_{ij}$ corresponding to this metabolite and reaction.

In [12]:
S = let

    # get some data from the model -
    m = model["metabolites"]; # get list of metabolites
    r = model["reactions"]; # get list of reactions
    number_of_rows = length(m); # how many metabolites do we have? (rows)
    number_of_cols = length(r); # how many reactions do we have? (cols)
    S = zeros(number_of_rows,number_of_cols); # initialize an empty stoichiometric matrix

    # let's build a stm -
    for i ∈ eachindex(m)
        metabolite = m[i]["id"]; # we are checking if this metabolite is in the reaction record
        for j ∈ eachindex(r)
            reaction = r[j];
            if (haskey(reaction["metabolites"], metabolite) == true)
                S[i,j] = reaction["metabolites"][metabolite];
            end
        end
    end
    S; 
end;
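
The two-loop strategy above can be checked by hand on a small network. The sketch below is only an illustration: the toy records (`toy_metabolites`, `toy_reactions`) and the helper `build_toy_stoichiometric_matrix` are hypothetical and are not part of the BiGG data or the lab's `Include.jl`. They mimic only the `id` and `metabolites` fields used in the cell above, for the network `A -> B` (R1) and `B -> 2C` (R2).

```julia
# Hypothetical toy records that mimic the "id" and "metabolites" fields of the BiGG model
toy_metabolites = [Dict("id" => "A"), Dict("id" => "B"), Dict("id" => "C")];
toy_reactions = [
    Dict("id" => "R1", "metabolites" => Dict("A" => -1.0, "B" => 1.0)), # A -> B
    Dict("id" => "R2", "metabolites" => Dict("B" => -1.0, "C" => 2.0)), # B -> 2C
];

# Hypothetical helper: rows are metabolites, columns are reactions
function build_toy_stoichiometric_matrix(metabolites, reactions)
    S = zeros(length(metabolites), length(reactions));
    for i ∈ eachindex(metabolites)
        id = metabolites[i]["id"]; # metabolite we are looking for
        for j ∈ eachindex(reactions)
            # coefficient if the metabolite participates in reaction j, zero otherwise
            S[i, j] = get(reactions[j]["metabolites"], id, 0.0);
        end
    end
    return S;
end

S_toy = build_toy_stoichiometric_matrix(toy_metabolites, toy_reactions);
```

Here `get` with a default of `0.0` replaces the explicit `haskey` check; both give the same matrix. Reactants appear with negative coefficients and products with positive ones, so the row for `B` has a `1.0` in the `R1` column and a `-1.0` in the `R2` column.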

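__Z-score scaling__: Next, we z-score the stoichiometric matrix $\mathbf{S}$ column by column: for each reaction (column), we subtract the column mean and divide by the column standard deviation.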

In [None]:
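Ŝ = let
    
    # get some data from the system -
    m = model["metabolites"] |> length # get the number of metabolites (rows)
    r = model["reactions"] |> length # get the number of reactions (cols)
    Ŝ = zeros(m, r); # initialize the scaled matrix

    for j ∈ 1:r
        col = S[:,j]; # get the jth col (reaction)
        μⱼ = mean(col); # mean of the col
        σⱼ = std(col); # std of the col
        if (σⱼ > 0) # constant cols have zero std, leave them at zero
            Ŝ[:,j] = (col .- μⱼ) ./ σⱼ; # z-score the jth col
        end
    end
    Ŝ; # return the scaled matrix
end;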
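
As a quick sanity check of the column-wise z-score, here is a minimal vectorized sketch on a small made-up matrix. The matrix `X` and the helper `zscore_columns` are hypothetical names introduced for illustration, and replacing a zero standard deviation with `1.0` is one possible way to handle constant columns, not necessarily the convention intended by the lab.

```julia
using Statistics

# Hypothetical 4×3 matrix; the third column is constant, so its std is zero
X = [1.0 2.0 5.0; 2.0 4.0 5.0; 3.0 6.0 5.0; 4.0 8.0 5.0];

# Hypothetical helper: z-score each column of X
function zscore_columns(X::AbstractMatrix{<:Real})
    μ = mean(X, dims = 1); # 1×n row of column means
    σ = std(X, dims = 1);  # 1×n row of column standard deviations
    σ_safe = map(s -> s > 0 ? s : 1.0, σ); # avoid dividing by zero for constant columns
    return (X .- μ) ./ σ_safe; # broadcast the row vectors over the rows of X
end

X̂ = zscore_columns(X);
```

After scaling, every column of `X̂` should have a mean of zero, and every non-constant column should have a standard deviation of one. The constant column is mapped to zeros, which mirrors the all-zero rows that can appear in a covariance matrix when a column has no variance.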

In [14]:
Σ = cov(S)

469×469 Matrix{Float64}:
  0.00292398  -8.57471e-6  -8.57471e-6  …   8.57471e-6   8.57471e-6
 -8.57471e-6   0.00292398  -8.57471e-6      8.57471e-6   8.57471e-6
 -8.57471e-6  -8.57471e-6   0.00292398      8.57471e-6   8.57471e-6
 -8.57471e-6  -8.57471e-6  -8.57471e-6      8.57471e-6   8.57471e-6
 -8.57471e-6  -8.57471e-6  -8.57471e-6      8.57471e-6   8.57471e-6
 -8.57471e-6  -8.57471e-6  -8.57471e-6  …   8.57471e-6   8.57471e-6
 -8.57471e-6  -8.57471e-6  -8.57471e-6     -0.00292398   8.57471e-6
 -8.57471e-6  -8.57471e-6  -8.57471e-6      8.57471e-6  -0.00292398
 -8.57471e-6  -8.57471e-6  -8.57471e-6      8.57471e-6   8.57471e-6
 -8.57471e-6  -8.57471e-6  -8.57471e-6      8.57471e-6   8.57471e-6
 -8.57471e-6  -8.57471e-6  -8.57471e-6  …   8.57471e-6   8.57471e-6
  0.0          0.0          0.0             0.0          0.0
  1.71494e-5   1.71494e-5   1.71494e-5      0.00584795   0.00584795
  ⋮                                     ⋱               
  0.0          0.0          0.0          

In [18]:
T = transpose(S)*S

469×469 Matrix{Float64}:
 1.0   0.0   0.0   0.0  0.0   0.0  …   0.0   0.0   0.0   0.0   0.0   0.0
 0.0   1.0   0.0   0.0  0.0   0.0     -1.0   0.0   0.0   0.0   0.0   0.0
 0.0   0.0   1.0   0.0  0.0   0.0      0.0  -1.0   0.0   0.0   0.0   0.0
 0.0   0.0   0.0   1.0  0.0   0.0      0.0   0.0  -1.0   0.0   0.0   0.0
 0.0   0.0   0.0   0.0  1.0   0.0      0.0   0.0   0.0   0.0   0.0   0.0
 0.0   0.0   0.0   0.0  0.0   1.0  …   0.0   0.0   0.0  -1.0   0.0   0.0
 0.0   0.0   0.0   0.0  0.0   0.0      0.0   0.0   0.0   0.0  -1.0   0.0
 0.0   0.0   0.0   0.0  0.0   0.0      0.0   0.0   0.0   0.0   0.0  -1.0
 0.0   0.0   0.0   0.0  0.0   0.0      0.0   0.0   0.0   0.0   0.0   0.0
 0.0   0.0   0.0   0.0  0.0   0.0      0.0   0.0   0.0   0.0   0.0   0.0
 0.0   0.0   0.0   0.0  0.0   0.0  …   0.0   0.0   0.0   0.0   0.0   0.0
 0.0   0.0   0.0   0.0  0.0   0.0      0.0   0.0   0.0   0.0   0.0   0.0
 0.0   0.0   0.0   0.0  0.0   0.0      2.0   2.0   2.0   2.0   2.0   2.0
 ⋮                        