# Lab 15c: Learning an Optimal Trading Policy
As part of the `CHEME 5660 Quantitative Finance` course, we constructed an `expert` agent that reallocates its portfolio daily by solving the minimum-variance portfolio allocation problem with the new information generated during each trading day. We recorded this `expert`'s policy and propose to copy it using a deep learning model.

* Our strategy is an example of `Imitation Learning`, specifically `Behavioral Cloning`: we copy what an expert does by watching them. For more information on this idea, [check out CS237b from Stanford](https://web.stanford.edu/class/cs237b/).
* Let's implement the `policy` function $\pi(s)$ as a dictionary that uses a `state tuple` $s$ as its `key` and the corresponding action $a$ as the value. We'll keep this in the `policy` variable.
* We'll also store separate collections of the `states` and the `actions` so we can explore the `policy`. Both are also implemented as dictionaries.
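As a minimal sketch of the dictionary layout described above, the Julia snippet below builds a one-entry policy from made-up numbers. It assumes, based on the recorded data loaded in the Prerequisite step, that a state is a tuple of two `Vector{Float64}` and an action is a one-hot `Bool` matrix with one row per action class and one column per ticker. A plain `Matrix{Bool}` stands in for Flux's `OneHotMatrix`, and the names `toy_policy`, `toy_states` and `toy_actions` are hypothetical.

```julia
# Hypothetical toy state: a tuple of two vectors (values are made up)
toy_state = ([1.0, 2.0, 0.5], [0.1, -0.2, 0.3])

# Hypothetical toy action: 3 action classes (rows) × 3 tickers (columns), one `true` per column
toy_action = Bool[0 1 0;
                  1 0 0;
                  0 0 1]

# policy π(s): state tuple (key) → action matrix (value)
toy_policy = Dict{Tuple{Vector{Float64}, Vector{Float64}}, Matrix{Bool}}()
toy_policy[toy_state] = toy_action

# separate collections keyed by time step, also stored as dictionaries
toy_states = Dict{Int, Tuple{Vector{Float64}, Vector{Float64}}}(1 => toy_state)
toy_actions = Dict{Int, Matrix{Bool}}(1 => toy_action)

# look up the expert's action for the first recorded state
a = toy_policy[toy_states[1]]

# row index of the `true` entry in each column = action class chosen for that ticker
chosen = [findfirst(a[:, j]) for j in 1:size(a, 2)]
```

Using vectors as dictionary keys works because Julia hashes them by value, but mutating a key vector after insertion would break the lookup.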

The objective of this lab is to familiarize students with constructing and training artificial neural network (ANN) models, and with some of the conventions used in this application area.

## Tasks
* __Prerequisite__: Load policy, states and actions of the `expert` trader
* __Task 1__: Build `training` dataset from the `expert` policy
* __Task 2__: Build a Deep Learning model of the `expert` policy

## Setup

```julia
include("Include.jl");
```

[32m[1m    Updating[22m[39m git-repo `https://github.com/varnerlab/VLDecisionsPackage.jl.git`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5760-Labs-F23/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5760-Labs-F23/Manifest.toml`
[32m[1m    Updating[22m[39m git-repo `https://github.com/varnerlab/VLQuantitativeFinancePackage.jl.git`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5760-Labs-F23/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5760-Labs-F23/Manifest.toml`
[32m[1m  Activating[22m[39m project at `~/Desktop/julia_work/CHEME-5760-Labs-F23`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5760-Labs-F23/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5760-Labs-F23/Manifest.toml`
[32m[1m    Updating[22m[39m registry at `~/.julia/r

## Prerequisite: Load policy, states and actions of the `expert` trader

```julia
dataset = load(joinpath(_PATH_TO_DATA, "PolicyData-Testing-MinVar-Agent.jld2"));
```

```julia
states = dataset["states"];
actions = dataset["actions"];
policy = dataset["policy"];
```

```julia
states[1]
```

```
([4.9900842162455845, 5.152445652173913, 0.3092524377031419, 0.0, 0.0, 1.0049838661568913], [-0.01701574822598631, 0.1935452534460845, 0.47009619391386487, 0.2594984882083575, 0.11867979037912248, 0.15587517810289664])
```

```julia
actions[1]
```

```
3×6 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 ⋅  1  1  1  ⋅  ⋅
 1  ⋅  ⋅  ⋅  ⋅  1
 ⋅  ⋅  ⋅  ⋅  1  ⋅
```

## Task 1: Build `training` dataset from the `expert` policy
Let's use [Deep Learning](https://www.deeplearningbook.org) to compute a policy function $\pi_{\theta}(s)$ with the [Flux.jl](https://fluxml.ai) machine learning package (loaded by the `Include.jl` file). First, let's build a `training` dataset from the recorded `states` and `actions` of the `expert`.

```julia
# initialize storage for labeled training data: (state vector, one-hot action label) pairs
training_dataset = Vector{Tuple{Vector{Float32}, OneHotVector{UInt32}}}()
number_of_training_examples = length(states);
ticker_index = 6; # let's look at SPY (column 6 of each action matrix)
for i ∈ 1:number_of_training_examples
    
    state = states[i];
    action = actions[i];

    # flatten the two state vectors into a single Float32 feature vector
    single_state_vector = Float32.([state[1]..., state[2]...]);

    # the expert's one-hot action (buy, sell or hold) for the selected ticker
    action_vector = action[:, ticker_index];
    
    # make a training tuple -
    training_tuple = (
        single_state_vector, action_vector
    );
    
    # insert -
    push!(training_dataset, training_tuple);
end
```

```julia
training_dataset[1][1]
```

```
12-element Vector{Float32}:
  4.990084
  5.152446
  0.30925244
  0.0
  0.0
  1.0049839
 -0.017015748
  0.19354525
  0.4700962
  0.25949848
  0.11867979
  0.15587518
```
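The loop that builds `training_dataset` can be mirrored in plain Julia without Flux, as a sketch. Random toy data with the same shapes as the lab's data (two 6-element state vectors per day, a 3×6 one-hot action matrix per day) stand in for `states` and `actions`, and a `Vector{Bool}` stands in for Flux's `OneHotVector`. The helper `random_onehot` and the `toy_*` names are hypothetical.

```julia
using Random
Random.seed!(42)

# Hypothetical helper: a random one-hot matrix with one `true` entry per column
function random_onehot(nclasses::Int, ncols::Int)
    m = falses(nclasses, ncols)
    for j in 1:ncols
        m[rand(1:nclasses), j] = true
    end
    return m
end

# toy stand-ins for `states` (two 6-element vectors per day) and `actions` (3×6 one-hot per day)
toy_states = [(rand(6), rand(6)) for _ in 1:4]
toy_actions = [random_onehot(3, 6) for _ in 1:4]
ticker_index = 6   # column of the ticker whose actions we imitate

toy_training = Vector{Tuple{Vector{Float32}, Vector{Bool}}}()
for (s, a) in zip(toy_states, toy_actions)
    x = Float32.(vcat(s[1], s[2]))         # flatten the state tuple into 12 features
    y = Vector{Bool}(a[:, ticker_index])   # one-hot label for the chosen ticker
    push!(toy_training, (x, y))
end
```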

## Task 2: Build a Deep Learning model of the `expert` policy

```julia
number_of_input_states = length(training_dataset[1][1]);
number_of_classes = 3; # we have buy, sell and hold
number_of_nodes = 48;

# two hidden layers of 48 sigmoid units, a 3-unit sigmoid output layer, then a softmax over the classes
FFN_policy_model = Chain(Dense(number_of_input_states, number_of_nodes, σ), 
    Dense(number_of_nodes, number_of_nodes, σ), Dense(number_of_nodes, number_of_classes, σ), NNlib.softmax);
```
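To make the `Chain` concrete, here is a sketch of the same forward pass in Base Julia with randomly initialized weights. Each `Dense(in, out, σ)` layer computes `σ.(W * x .+ b)` with `W` of size `out × in`, and the softmax normalizes the final outputs into class probabilities. `ToyDense`, `toy_model` and the small random initialization are hypothetical stand-ins, not Flux's implementation or its default initializer.

```julia
using Random
Random.seed!(1)

sigmoid(z) = 1 / (1 + exp(-z))   # the logistic function σ

# numerically stable softmax: subtracting the maximum does not change the result
function softmax(z::AbstractVector)
    e = exp.(z .- maximum(z))
    return e ./ sum(e)
end

# Hypothetical stand-in for Dense(nin, nout, f): computes f.(W * x .+ b)
struct ToyDense{F}
    W::Matrix{Float32}   # size (nout, nin)
    b::Vector{Float32}   # size nout
    f::F                 # elementwise activation
end
ToyDense(nin::Int, nout::Int, f) =
    ToyDense(0.1f0 .* randn(Float32, nout, nin), zeros(Float32, nout), f)
(layer::ToyDense)(x) = layer.f.(layer.W * x .+ layer.b)

# same layer sizes as FFN_policy_model: 12 → 48 → 48 → 3, then softmax
layers = (ToyDense(12, 48, sigmoid), ToyDense(48, 48, sigmoid), ToyDense(48, 3, sigmoid))
toy_model(x) = softmax(foldl((h, layer) -> layer(h), layers; init = x))

x = rand(Float32, 12)   # a made-up 12-element state vector
p = toy_model(x)        # class probabilities
```

Because the last `Dense` layer applies σ before the softmax, every softmax input lies in (0, 1). Any two class probabilities can therefore differ by at most a factor of e ≈ 2.72, so no class can receive a probability above e/(e + 2) ≈ 0.58.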

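For the loss function $L(\theta)$, we'll use Flux's `logitbinarycrossentropy`, a version of the [cross entropy function](https://en.wikipedia.org/wiki/Cross-entropy), with the training labels $y_{i}$ encoded in [one-hot format](https://en.wikipedia.org/wiki/One-hot). This function treats the model output as logits (it applies a sigmoid internally), even though our model already ends in a softmax: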

```julia
# set up the loss function -
loss(x, y) = Flux.Losses.logitbinarycrossentropy(FFN_policy_model(x), y; agg = mean);
```
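The snippet below is a Base Julia sketch of two losses for a one-hot label `y`. The first is the categorical cross entropy $-\sum_{i} y_{i}\log p_{i}$, the usual choice when a model ends in a softmax. The second follows the per-element formula Flux documents for `logitbinarycrossentropy`, `(1 - y) * z - logσ(z)` averaged over elements, which treats its input `z` as logits. The probabilities `p` are made up, and the functions are hypothetical reimplementations rather than calls into Flux.

```julia
# numerically stable log(σ(z))
logsigmoid(z) = min(z, zero(z)) - log1p(exp(-abs(z)))
sigmoid(z) = 1 / (1 + exp(-z))

# categorical cross entropy for probabilities p and a one-hot label y
categorical_crossentropy(p, y) = -sum(y .* log.(p))

# binary cross entropy on logits z, averaged over the elements (agg = mean)
logit_bce(z, y) = sum((1 .- y) .* z .- logsigmoid.(z)) / length(z)

# ordinary binary cross entropy on probabilities q, averaged over the elements
bce(q, y) = -sum(y .* log.(q) .+ (1 .- y) .* log.(1 .- q)) / length(q)

p = [0.7, 0.2, 0.1]   # made-up softmax output
y = [1.0, 0.0, 0.0]   # one-hot label: class 1

ce = categorical_crossentropy(p, y)   # equals -log(0.7)
lbce = logit_bce(p, y)                # treats p as logits
lbce_check = bce(sigmoid.(p), y)      # the same quantity computed via σ(p)
```

In Flux, the usual pairings are a softmax output with `Flux.Losses.crossentropy`, or raw logits (no final softmax) with `Flux.Losses.logitcrossentropy`.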

```julia
# get a reference to the trainable model parameters -
θ = Flux.params(FFN_policy_model);
```

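Next, let's specify the optimization approach that we'll use to estimate the unknown model parameters $\theta$. One option is the [Momentum gradient descent algorithm](https://optimization.cbe.cornell.edu/index.php?title=Momentum): 
> Momentum is an extension to the gradient descent optimization algorithm that allows the search to build inertia in a direction in the search space and overcome the oscillations of noisy gradients and coast across flat spots of the search space

The cell below defines the Momentum hyperparameters, but the `Momentum` line is commented out; training actually uses Flux's `AdaDelta()` optimizer with its default settings, an adaptive method that scales each parameter's step using running averages of its past squared gradients and squared updates.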

```julia
λ = 0.01;  # learning rate (used only by Momentum)
β = 0.05;  # momentum parameter (used only by Momentum)
# opt = Momentum(λ, β);
opt = AdaDelta();  # the optimizer actually used for training
```
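Both update rules can be sketched in a few lines of Base Julia on a hypothetical one-dimensional objective $f(\theta) = (\theta - 3)^{2}$. The momentum version uses one common heavy-ball form, $v \leftarrow \rho v + \eta \nabla f(\theta)$ followed by $\theta \leftarrow \theta - v$, reusing the lab's λ = 0.01 as η and β = 0.05 as ρ. The AdaDelta version keeps running averages of squared gradients and squared updates, with assumed values ρ = 0.9 and ϵ = 1e-6. These illustrate the algorithms; they are not Flux's internal implementation.

```julia
# Hypothetical toy objective: f(θ) = (θ - 3)^2, minimized at θ = 3
f(θ) = (θ - 3)^2
∇f(θ) = 2 * (θ - 3)

# gradient descent with momentum (heavy-ball form):
# v ← ρ v + η ∇f(θ);  θ ← θ - v
function momentum_descent(θ, η, ρ, nsteps)
    v = 0.0
    for _ in 1:nsteps
        v = ρ * v + η * ∇f(θ)
        θ -= v
    end
    return θ
end

# AdaDelta: step size from running averages of squared gradients and squared updates
function adadelta_descent(θ, ρ, ϵ, nsteps)
    Eg2, Edx2 = 0.0, 0.0
    for _ in 1:nsteps
        g = ∇f(θ)
        Eg2 = ρ * Eg2 + (1 - ρ) * g^2                 # running average of g²
        Δθ = -sqrt(Edx2 + ϵ) / sqrt(Eg2 + ϵ) * g       # no learning rate needed
        Edx2 = ρ * Edx2 + (1 - ρ) * Δθ^2              # running average of Δθ²
        θ += Δθ
    end
    return θ
end

θ_momentum = momentum_descent(0.0, 0.01, 0.05, 1000)
θ_adadelta = adadelta_descent(0.0, 0.9, 1e-6, 200)
```

With the assumed ϵ = 1e-6, AdaDelta's first steps are on the order of $\sqrt{\epsilon}$, so `θ_adadelta` is still well short of 3 after 200 steps; AdaDelta has no learning rate to tune, but ϵ effectively sets how quickly it gets going.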

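We'll set the number of passes over the training data in the `number_of_epochs` variable (one complete pass over the data is called an `epoch`). To run the gradient-based estimation, we'll call the `train!(...)` function exported by the [Flux.jl](https://fluxml.ai) package once per epoch; each call loops over every example in `training_dataset` and updates $\theta$ after each one: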

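```julia
# monitor the loss on the first training example (a training point, not a held-out test point)
test_x, test_y = training_dataset[1][1], training_dataset[1][2]
number_of_epochs = 2000;
evalcb() = @show(loss(test_x, test_y))
throttled_cb = Flux.throttle(evalcb, 5)  # print at most once every 5 seconds
for i = 1:number_of_epochs
    Flux.train!(loss, θ, training_dataset, opt, cb = throttled_cb)
end
```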

```
loss(test_x, test_y) = 0.777251f0
loss(test_x, test_y) = 0.8062558f0
loss(test_x, test_y) = 0.80365986f0
loss(test_x, test_y) = 0.8039484f0
loss(test_x, test_y) = 0.8050162f0
loss(test_x, test_y) = 0.79285717f0
loss(test_x, test_y) = 0.7682772f0
loss(test_x, test_y) = 0.7553499f0
```
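Conceptually, each `Flux.train!` call in the training loop makes one pass (one epoch) over `training_dataset`, computing the gradient of the loss for each example and updating the parameters. The sketch below writes such a loop by hand for a much simpler hypothetical model, softmax regression on made-up 2-D data, using plain gradient descent and the standard result that the gradient of the categorical cross entropy with respect to the softmax inputs is $p - y$. It illustrates per-example, multi-epoch training; it is not a reimplementation of Flux or of the lab's network.

```julia
# stable softmax: subtracting the maximum does not change the result
function softmax(z::AbstractVector)
    e = exp.(z .- maximum(z))
    return e ./ sum(e)
end
crossentropy(p, y) = -sum(y .* log.(p))

# Hypothetical toy data: three well-separated 2-D clusters with labels 1, 2, 3
X = [[2.0, 0.1], [2.2, -0.1], [0.1, 2.0], [-0.1, 2.2], [-2.0, -2.0], [-2.2, -1.8]]
labels = [1, 1, 2, 2, 3, 3]
onehot(k, n) = [i == k ? 1.0 : 0.0 for i in 1:n]
Y = [onehot(k, 3) for k in labels]

# softmax regression: class probabilities p = softmax(W * x .+ b)
predict(W, b, x) = softmax(W * x .+ b)
mean_loss(W, b) = sum(crossentropy(predict(W, b, x), y) for (x, y) in zip(X, Y)) / length(X)

W = zeros(3, 2)
b = zeros(3)
η = 0.1                  # hypothetical learning rate
number_of_epochs = 100   # hypothetical; one epoch = one pass over all examples

loss_before = mean_loss(W, b)
for epoch in 1:number_of_epochs
    for (x, y) in zip(X, Y)          # one parameter update per example
        δ = predict(W, b, x) .- y    # ∂L/∂(W * x .+ b) = p - y
        W .-= η .* (δ * x')          # ∂L/∂W = δ xᵀ
        b .-= η .* δ                 # ∂L/∂b = δ
    end
end
loss_after = mean_loss(W, b)

# fraction of examples whose most probable class matches the label
accuracy = count(argmax(predict(W, b, x)) == k for (x, k) in zip(X, labels)) / length(X)
```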


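Finally, let's count how often the model's predicted action (the class with the largest output) matches the `expert`'s action across all the examples used for training:

```julia
correct_counter = 0;
for i ∈ 1:number_of_training_examples
    
    single_state_vector = Float32.([states[i][1]..., states[i][2]...]);
    ŷ = FFN_policy_model(single_state_vector);  # predicted class probabilities
    y = actions[i][:, ticker_index];            # expert's one-hot action
    
    if (argmax(ŷ) == argmax(y))
        correct_counter += 1
    end
end
```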

```julia
correct_counter/number_of_training_examples
```

```
0.6121883656509696
```
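The 0.612 above counts matches on the same examples the model was trained on, so it is a training accuracy. Below is a sketch of two common checks, using hypothetical helper names and made-up predictions: accuracy on a held-out block of later trading days (a chronological split, so training only uses days that precede the held-out ones) and a majority-class baseline, the accuracy of always predicting the most frequent action.

```julia
# fraction of examples whose predicted class (argmax of ŷ) matches the label class (argmax of y)
accuracy(Ŷ, Y) = count(argmax(ŷ) == argmax(y) for (ŷ, y) in zip(Ŷ, Y)) / length(Y)

# accuracy of always predicting the most frequent label class (a simple baseline)
function majority_baseline(Y)
    counts = Dict{Int, Int}()
    for y in Y
        c = argmax(y)
        counts[c] = get(counts, c, 0) + 1
    end
    return maximum(values(counts)) / length(Y)
end

# chronological split: the first `frac` of the days for training, the rest held out
function chronological_split(data::AbstractVector, frac::Real = 0.8)
    n_train = floor(Int, frac * length(data))
    return data[1:n_train], data[(n_train + 1):end]
end

# made-up labels (one-hot) and model outputs (class probabilities) for five days
Y = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0], [0, 0, 1]]
Ŷ = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.4, 0.3], [0.5, 0.25, 0.25], [0.1, 0.2, 0.7]]

acc = accuracy(Ŷ, Y)              # 4 of 5 match
baseline = majority_baseline(Y)   # class 1 appears on 3 of 5 days
train_days, test_days = chronological_split(collect(1:10))
```

In the lab, the held-out days would be scored with `FFN_policy_model` after training only on the earlier days, and that accuracy compared against the majority-class baseline of the held-out labels.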

## Disclaimer and Risks
__This content is offered solely for training and informational purposes__. No offer or solicitation to buy or sell securities or derivative products, or any investment or trading advice or strategy, is made, given, or endorsed by the teaching team. 

__Trading involves risk__. Carefully review your financial situation before investing in securities, futures contracts, options, or commodity interests. Past performance, whether actual or indicated by historical tests of strategies, is no guarantee of future performance or success. Trading is generally inappropriate for someone with limited resources, investment or trading experience, or a low risk tolerance. Only risk capital that is not required for living expenses.

__You are fully responsible for any investment or trading decisions you make__. Such decisions should be based solely on your evaluation of your financial circumstances, investment or trading objectives, risk tolerance, and liquidity needs.