# Lab 13d: Let's build an Observable Markov Model of Stock Returns
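In this lab, we'll estimate a fully observable Markov model of the `SPY` growth rate from historical price data, and then use the model to simulate sequences of discrete return states.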

## Setup

In [1]:
include("Include.jl");

[32m[1m  Activating[22m[39m project at `~/Desktop/julia_work/CHEME-4800-5800-Labs-AY-2024/week-13/Lab-13d`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-4800-5800-Labs-AY-2024/week-13/Lab-13d/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-4800-5800-Labs-AY-2024/week-13/Lab-13d/Manifest.toml`
[32m[1m    Updating[22m[39m registry at `~/.julia/registries/General.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-4800-5800-Labs-AY-2024/week-13/Lab-13d/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-4800-5800-Labs-AY-2024/week-13/Lab-13d/Manifest.toml`


## Prerequisites 

In [2]:
# initialize -
dataset = Dict{String,DataFrame}();
risk_free_rate = 0.05;

# load the full price dataset, remove firms with missing data -
original_dataset = MyPortfolioDataSet() |> x->x["dataset"];
maximum_number_trading_days = original_dataset["AAPL"] |> nrow;
for (ticker,data) ∈ original_dataset
    if (nrow(data) == maximum_number_trading_days)
        dataset[ticker] = data;
    end
end
my_list_of_tickers = keys(dataset) |> collect |> x->sort(x);

# what ticker?
ticker = "SPY"
idx_ticker = findfirst(x->x==ticker, my_list_of_tickers);

# compute the growth rate matrix -
market_matrix = μ(dataset, my_list_of_tickers, risk_free_rate = risk_free_rate) |> x-> transpose(x) |> Matrix;
Rₘ = market_matrix[idx_ticker, :]; # this is the growth rate of the price for the selected ticker

## Set up the states $\mathcal{S}$, the emission matrix $\mathbf{E}$, and the transition matrix $\mathbf{T}$

In [3]:
number_of_states = 5;
states = range(1,stop=number_of_states) |> collect;

Next, we set up the emission matrix $\mathbf{E}$. For this example, we assume that the states are __fully observable__, i.e., we can see the states directly. Thus, the emission matrix $\mathbf{E}$ is the identity matrix $\mathbf{I}$:

In [4]:
E = diagm(ones(number_of_states));
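To make the fully observable assumption concrete, here is one way to write it down. The symbols $O_t$ (the observation at step $t$), $S_t$ (the state at step $t$) and the row/column convention for $\mathbf{E}$ are our own notation, not something defined by the lab code; with $\mathbf{E} = \mathbf{I}$ the convention does not change the result:

$$
P(O_t = o \mid S_t = s) = E_{s,o} = \delta_{s,o} =
\begin{cases}
1 & \text{if } o = s \\
0 & \text{if } o \neq s
\end{cases}
$$

In other words, every observation tells us exactly which state the system is in, so the model reduces to an ordinary Markov chain on the states $\mathcal{S}$.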

To compute the transition matrix $\mathbf{T}$, we'll estimate the transition probabilities from the return data calculated in the `Prerequisites` section and saved in the `Rₘ` variable:
* Let's split the data into two blocks: the first (which we'll call `in-sample`) will be used to estimate the elements of the matrix $\mathbf{T}$, while the second (which we'll call `out-of-sample`) will be used for testing purposes.

In [5]:
split_fraction = 0.90;
insample_end_index = round(Int, split_fraction*length(Rₘ))
in_sample_dataset = Rₘ[1:insample_end_index]
out_of_sample_dataset = Rₘ[(insample_end_index+1):end];

In [6]:
d = fit_mle(Laplace, Rₘ); # use the *full* data set to establish the cutoffs

In [7]:
percentage_cutoff = range(0.0,stop=1.0,length=(number_of_states+1)) |> collect

6-element Vector{Float64}:
 0.0
 0.2
 0.4
 0.6
 0.8
 1.0

In [8]:
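# row s of bounds holds the lower (column 1) and upper (column 2) growth rate cutoffs for state s -
bounds = Array{Float64,2}(undef, number_of_states,2)
for s ∈ states
    bounds[s,1] = quantile(d,percentage_cutoff[s])   # lower bound: quantile at the left edge of the bin
    bounds[s,2] = quantile(d,percentage_cutoff[s+1]) # upper bound: quantile at the right edge of the bin
end
bounds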

5×2 Matrix{Float64}:
 -Inf        -1.40451
  -1.40451   -0.206636
  -0.206636   0.56462
   0.56462    1.76249
   1.76249   Inf
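The first and last bounds are infinite because the Laplace quantile function diverges at probabilities 0 and 1. As a rough illustration, the sketch below implements the standard closed-form Laplace quantile by hand. The function name `laplace_quantile` and the demo parameters are hypothetical; the lab itself calls `quantile(d, p)` on the fitted distribution `d`, and we do not reproduce the fitted parameters here.

```julia
# Closed-form quantile (inverse CDF) of a Laplace distribution with
# location μ (the median) and scale b > 0. Hypothetical helper for illustration.
function laplace_quantile(μ::Float64, b::Float64, p::Float64)::Float64
    if p ≤ 0.5
        return μ + b*log(2*p)       # lower half: log(0.0) = -Inf at p = 0
    else
        return μ - b*log(2*(1 - p)) # upper half: -log(0.0) = Inf at p = 1
    end
end

# illustrative parameters, NOT the values fitted in the lab -
μ_demo, b_demo = 0.0, 1.0;
cutoffs_demo = [laplace_quantile(μ_demo, b_demo, p) for p ∈ range(0.0, stop=1.0, length=6)]
```

With equal probability steps, each of the five bins carries the same probability mass under the fitted distribution, which is why the interior bins are narrower near the median than in the tails.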

In [9]:
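# encode each in-sample growth rate as the index of the state (bin) that contains it -
encoded_in_sample = Array{Int64,1}();
for i ∈ eachindex(in_sample_dataset)
    value = in_sample_dataset[i];

    class_index = 1; # default: lowest state
    for s ∈ states
        # bins are half-open: lower bound included, upper bound excluded
        if (bounds[s,1] ≤ value && value < bounds[s,2])
            class_index = s;
            break;
        end
    end
    push!(encoded_in_sample, class_index);
end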
encoded_in_sample;
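The nested loop above can also be written as a binary search over the interior cutoffs. The sketch below is one possible alternative, not the lab's implementation: the helper name `encode_state` is hypothetical, and the cutoffs are copied (rounded) from the `bounds` output shown earlier. It keeps the same half-open convention, i.e., a value equal to a cutoff falls into the higher state.

```julia
# interior cutoffs copied from the printed `bounds` matrix (the ±Inf endpoints are implicit) -
interior_cutoffs = [-1.40451, -0.206636, 0.56462, 1.76249];

# hypothetical helper: searchsortedlast returns the number of cutoffs ≤ value,
# so adding one gives the state index in 1:5
encode_state(value::Float64, cutoffs::Vector{Float64})::Int = searchsortedlast(cutoffs, value) + 1

demo_values = [-2.0, -1.40451, 0.0, 1.0, 5.0];
demo_states = [encode_state(v, interior_cutoffs) for v ∈ demo_values]
```

For these five demo values, the encoded states are `1, 2, 3, 4, 5`.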

In [17]:
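# count the observed transitions: T[i,j] is the number of times state i was followed by state j -
T = zeros(number_of_states, number_of_states)
for i ∈ 2:insample_end_index
    start_index = encoded_in_sample[i-1]; # state at the previous step
    stop_index = encoded_in_sample[i];    # state at the current step
    T[start_index,stop_index] += 1;
end
T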

5×5 Matrix{Float64}:
 110.0  44.0  28.0  44.0  75.0
  61.0  57.0  46.0  63.0  29.0
  46.0  55.0  77.0  57.0  36.0
  39.0  60.0  74.0  68.0  58.0
  45.0  40.0  47.0  66.0  75.0

In [11]:
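# normalize each row of the count matrix so that it sums to one -
T̂ = zeros(number_of_states, number_of_states)
for row ∈ states
    Z = sum(T[row,:]); # total number of transitions out of this state
    for col ∈ states
        T̂[row,col] = (1/Z)*T[row,col]
    end
end
T̂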

5×5 Matrix{Float64}:
 0.365449  0.146179  0.0930233  0.146179  0.249169
 0.238281  0.222656  0.179688   0.246094  0.113281
 0.169742  0.202952  0.284133   0.210332  0.132841
 0.130435  0.200669  0.247492   0.227425  0.19398
 0.164835  0.14652   0.172161   0.241758  0.274725
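The row normalization above is the usual count-based estimate $\hat{T}_{ij} = N_{ij}/\sum_{k} N_{ik}$, where $N_{ij}$ is the number of observed transitions from state $i$ to state $j$ (the symbol $N$ is our notation for the count matrix `T`). A compact sketch with broadcasting, using the counts copied from the output above and the hypothetical name `T_hat_demo`, might look like this:

```julia
# transition counts copied from the printed count matrix -
N = [110.0 44.0 28.0 44.0 75.0;
      61.0 57.0 46.0 63.0 29.0;
      46.0 55.0 77.0 57.0 36.0;
      39.0 60.0 74.0 68.0 58.0;
      45.0 40.0 47.0 66.0 75.0];

# divide every row by its row sum (sum(N, dims=2) is a 5×1 column of row totals) -
T_hat_demo = N ./ sum(N, dims=2);

# sanity check: every row of a transition matrix should sum to one -
row_sums = sum(T_hat_demo, dims=2)
```

Checking that each row sums to one is a cheap way to catch a state that was never visited in the in-sample data, since that row would contain `NaN` values after the division.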

In [12]:
model = build(MyObservableMarkovModel, (
    states = states,
    T = T̂, 
    E = E
));

## Simulate the Observable Markov Model (OMM)

In [22]:
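# approximate the stationary distribution: after many steps, every row of T̂^100 is (nearly) the same -
π̄ = (T̂^100) |> tmp -> Categorical(tmp[1,:]);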
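Raising the transition matrix to a large power is one way to approximate the stationary distribution. As a cross-check, the sketch below compares it with the left eigenvector of the transition matrix for eigenvalue 1. This is only an illustration under our own assumptions: the names `T_hat_check`, `stationary_power` and `stationary_eigen` are hypothetical, and the counts are copied from the count matrix printed earlier.

```julia
using LinearAlgebra

# rebuild the estimated transition matrix from the printed counts -
N_check = [110.0 44.0 28.0 44.0 75.0;
            61.0 57.0 46.0 63.0 29.0;
            46.0 55.0 77.0 57.0 36.0;
            39.0 60.0 74.0 68.0 58.0;
            45.0 40.0 47.0 66.0 75.0];
T_hat_check = N_check ./ sum(N_check, dims=2);

# approach 1: power method, take any row of a high matrix power -
stationary_power = (T_hat_check^100)[1, :];

# approach 2: the stationary distribution is the eigenvector of transpose(T̂)
# with eigenvalue 1, rescaled so that its entries sum to one -
F = eigen(Matrix(transpose(T_hat_check)));
k = argmin(abs.(F.values .- 1.0));   # index of the eigenvalue closest to 1
v = real.(F.vectors[:, k]);
stationary_eigen = v ./ sum(v);
```

If the two vectors agree, the power of 100 was large enough for the chain to forget its starting state.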

In [24]:
number_of_paths = 10;
number_of_steps = 252; # average number of trading days per year
archive = Array{Int64,2}(undef, number_of_steps, number_of_paths);
for i ∈ 1:number_of_paths
    start_state = rand(π̄);
    tmp = model(start_state, number_of_steps)
    for j ∈ 1:number_of_steps
        archive[j,i] = tmp[j]
    end
end
archive

252×10 Matrix{Int64}:
 4  3  3  1  4  3  4  4  1  4
 2  1  1  4  1  4  2  4  1  3
 1  5  4  3  5  5  5  4  5  5
 1  5  4  1  4  4  2  5  1  3
 4  4  3  4  2  2  3  5  1  4
 4  4  3  4  3  1  2  1  2  2
 4  5  5  2  2  5  4  1  4  3
 4  4  3  4  4  4  5  2  2  1
 5  3  4  3  3  2  4  1  4  4
 4  4  5  3  1  4  1  1  4  3
 3  4  5  4  1  3  3  1  1  3
 3  2  1  5  5  1  3  1  1  3
 1  2  5  1  4  1  2  5  5  3
 ⋮              ⋮           
 2  5  3  4  5  1  5  3  2  4
 1  2  1  2  5  1  1  1  1  2
 5  5  1  4  5  1  5  4  1  4
 5  3  1  4  5  1  4  4  4  3
 5  5  2  2  3  5  2  2  2  4
 5  4  1  5  2  5  1  2  5  3
 1  4  1  3  4  2  5  4  5  4
 4  5  2  4  2  4  4  4  4  5
 2  1  4  5  4  2  3  1  4  5
 1  1  2  5  3  2  1  5  4  1
 1  5  4  4  2  3  5  4  4  4
 1  4  2  5  2  4  4  4  1  4
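Each column of `archive` is one simulated path of state indices. We have not shown how `model(start_state, number_of_steps)` is implemented, so the sketch below is only one plausible way to generate such a path: it draws each next state from the row of the transition matrix for the current state by inverse-CDF sampling. The helper `simulate_chain`, the two-state matrix `P_demo`, and the choice to store the start state as the first entry are all assumptions made for illustration.

```julia
using Random

# hypothetical helper: simulate a Markov chain with row-stochastic transition matrix P
function simulate_chain(rng::AbstractRNG, P::Matrix{Float64}, start_state::Int, number_of_steps::Int)::Vector{Int}
    path = Vector{Int}(undef, number_of_steps);
    path[1] = start_state; # assumption: the first entry is the start state itself
    for t ∈ 2:number_of_steps
        cumulative = cumsum(P[path[t-1], :]);                  # CDF over the next states
        next_state = searchsortedfirst(cumulative, rand(rng)); # inverse-CDF sampling
        path[t] = min(next_state, length(cumulative));         # guard against round-off in the last CDF entry
    end
    return path
end

# illustrative two-state transition matrix, NOT the T̂ estimated in the lab -
P_demo = [0.9 0.1; 0.2 0.8];
demo_path = simulate_chain(MersenneTwister(1234), P_demo, 1, 252)
```

Passing the random number generator explicitly makes a simulated path reproducible, which is handy when comparing paths across runs.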