## Example: Developing a Bernoulli Binary Bandit Ticker Picker Agent
In this example, we'll develop an agent that selects which stocks to include in a portfolio $\mathcal{P}$ by solving a multi-armed binary bandit problem using [ϵ-greedy Thompson sampling](https://arxiv.org/abs/1707.02038).

### Problem
* We have `N` agents independently analyze sequences of daily Open High Low Close (OHLC) data and rank-order their belief that ticker `XYZ` will return at least the risk-free rate in the next step. 
* We then sample the `world`. If ticker `XYZ` returns greater than or equal to the risk-free rate in the next time period, the agent receives a reward of `+1`. Otherwise, the agent receives a reward of `0`.
* Each agent develops a distribution of beliefs based on this experimentation, which is stored as a Beta distribution with parameters $(\alpha,\beta)$ for each ticker.
* Each ticker is an action in the set $\mathcal{A}=\left\{a_{1},a_{2},\dots,a_{K}\right\}$
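The belief update described above can be sketched as a Beta-Bernoulli update: a reward of `1` increments $\alpha$, and a reward of `0` increments $\beta$. The helper names `update_belief` and `posterior_after` are hypothetical and are not part of the course code library; this is a minimal sketch that assumes a uniform `Beta(1, 1)` starting belief.

```julia
# Hypothetical helper (not from the course code library):
# update the (α, β) belief for one ticker after a binary reward.
function update_belief(α::Float64, β::Float64, reward::Integer)
    if reward == 1
        return (α + 1.0, β)      # success: the ticker beat the risk-free rate
    else
        return (α, β + 1.0)      # failure: it did not
    end
end

# Apply a sequence of rewards, starting from the uniform belief Beta(1, 1).
function posterior_after(rewards)
    α, β = 1.0, 1.0
    for r in rewards
        α, β = update_belief(α, β, r)
    end
    return (α, β)
end

α, β = posterior_after([1, 1, 0])   # two successes, one failure
posterior_mean = α / (α + β)        # (1 + 2) / (2 + 3) = 0.6
```

The posterior mean $\alpha/(\alpha+\beta)$ is the agent's current estimate of the probability that the ticker beats the risk-free rate.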

## Setup

In [1]:
include("Include.jl");

[32m[1m    Updating[22m[39m git-repo `https://github.com/varnerlab/VLQuantitativeFinancePackage.jl.git`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5660-Examples-F23/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5660-Examples-F23/Manifest.toml`
[32m[1m  Activating[22m[39m project at `~/Desktop/julia_work/CHEME-5660-Examples-F23`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5660-Examples-F23/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5660-Examples-F23/Manifest.toml`
[32m[1m    Updating[22m[39m registry at `~/.julia/registries/General.toml`
[32m[1m    Updating[22m[39m git-repo `https://github.com/varnerlab/VLQuantitativeFinancePackage.jl.git`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5660-Examples-F23/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5660-Examples-F23/Manif

In [2]:
include(joinpath(_PATH_TO_SRC, "CHEME-5660-L14a-BanditProblems-CodeLibrary.jl"));

## Prerequisites: Load and clean the historical dataset
We gathered a daily open-high-low-close `dataset` for each firm in the [S&P500](https://en.wikipedia.org/wiki/S%26P_500) from `01-03-2018` through `11-17-2023`, along with data for a few exchange-traded funds and volatility products over the same period.

In [3]:
original_dataset = load(joinpath(_PATH_TO_DATA, 
        "SP500-Daily-OHLC-1-3-2018-to-11-17-2023.jld2")) |> x-> x["dataset"];

### Clean the data
Not all of the tickers in our dataset have the maximum number of trading days for various reasons, e.g., acquisition or de-listing events. Let's collect only those tickers with the maximum number of trading days.

* First, let's compute the number of records for a company that we know has the maximum value, e.g., `AAPL`, and save that value in the `maximum_number_trading_days` variable:

In [4]:
maximum_number_trading_days = original_dataset["AAPL"] |> nrow

1480

Now, let's iterate through our data and collect only those tickers that have `maximum_number_trading_days` records. We save that data in the `dataset::Dict{String,DataFrame}` variable:

In [5]:
dataset = Dict{String,DataFrame}();
for (ticker,data) ∈ original_dataset
    if (nrow(data) == maximum_number_trading_days)
        dataset[ticker] = data;
    end
end
dataset;

Let's get a list of the firms in the cleaned `dataset`, and save it in the `all_tickers` array:

In [6]:
all_tickers = keys(dataset) |> collect |> sort;
K = length(all_tickers)

459

## Initialize the `world` function
The `world` function takes an action `a`, i.e., chooses a `ticker` from the collection of stocks/ETFs that we are exploring, and asks whether the annualized return of this `ticker` over the next `buffersize` trading days, computed from the volume-weighted average price, was greater than or equal to the risk-free rate.
* If the return is greater than or equal to the risk-free rate, the `agent` receives a reward of `r = 1`. Otherwise, the `agent` receives a reward of `r = 0`.

In [7]:
function world(action::Int64, start::Int64, data::Dict{String, DataFrame}, tickers::Array{String,1}; 
        Δt::Float64 = (1.0/252.0), risk_free_rate::Float64 = 0.05, buffersize::Int64 = 1)::Int64

    # initialize: the default reward is 0 -
    result_flag = 0;

    # length of the investment horizon in years (buffersize trading days) -
    T = buffersize*Δt;

    # look up the ticker symbol for this action -
    ticker_symbol = tickers[action];
    
    # compute the average annualized growth rate over the horizon -
    price_df = data[ticker_symbol];
    time = range(start+1,(start+buffersize), step=1) |> collect
    buffer = Array{Float64,1}();
    for t ∈ time
        P₁ = price_df[t-1,  :volume_weighted_average_price]
        P₂ = price_df[t, :volume_weighted_average_price]
        R = (1/Δt)*log(P₂/P₁); # annualized log growth rate between t-1 and t
        push!(buffer,R);
    end
    μ = mean(buffer);

    # if we invested 1 USD in each alternative, how much would we have at the end of the horizon?
    W_risk_free = exp(risk_free_rate*T);
    W_ticker = exp(μ*T);
    
    # reward is 1 if the ticker did at least as well as the risk-free investment -
    if (W_ticker >= W_risk_free)
        result_flag = 1;
    end

    # return the binary reward -
    return result_flag;
end;
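To make the reward rule concrete, here is a small worked sketch of the one-step case (`buffersize = 1`) using made-up prices. The function name `one_step_reward` and the prices are illustrative assumptions; only the formulas mirror the `world` function above, and the default rate matches the `risk_free_rate = 0.047` value used in this example.

```julia
# Hypothetical one-step version of the reward rule, with made-up prices.
function one_step_reward(P₁::Float64, P₂::Float64;
        Δt::Float64 = (1.0/252.0), risk_free_rate::Float64 = 0.047)::Int64

    R = (1/Δt)*log(P₂/P₁)                   # annualized log growth rate
    W_ticker = exp(R*Δt)                    # wealth from 1 USD in the ticker (equals P₂/P₁)
    W_risk_free = exp(risk_free_rate*Δt)    # wealth from 1 USD at the risk-free rate
    return (W_ticker >= W_risk_free) ? 1 : 0
end

r_up_large = one_step_reward(100.0, 100.05)  # annualized rate ≈ 0.126, above 0.047
r_up_small = one_step_reward(100.0, 100.01)  # annualized rate ≈ 0.025, below 0.047
```

Note that a positive daily return is not sufficient for a reward of `1`: the second case gains value but still trails the risk-free alternative.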

Set the `risk_free_rate` variable:

In [8]:
risk_free_rate = 0.047;

## Set up the `ticker-picker` agent

First, let's specify the tickers that we want to examine in the `tickers` array, and store the number of tickers in the `K` variable:

In [9]:
tickers = ["MRK", "JNJ", "MET", "NFLX", "AAPL", "AMD", "MU", "INTC", "MSFT", "SPY", "SPYD", "MMM",
    "UNH", "JPM", "OXY", "TSLA", "PEP", "LMT", "CMCSA", "ECL", "SRE"];
K = length(tickers);

Next, we construct the `EpsilonSamplingModel` instance, which holds information about the ϵ-greedy sampling approach. In addition to the number of arms `K` and the `α` and `β` parameter vectors, the `EpsilonSamplingModel` type has an `ϵ` field, which controls the approximate fraction of `exploration` steps the algorithm takes; `exploration` steps are purely random. With `ϵ = 0.2`, roughly one step in five is exploratory.

In [10]:
model = EpsilonSamplingModel()
model.K = K; # tickers
model.α = ones(K); # initialize to uniform values
model.β = ones(K); # initialize to uniform values
model.ϵ = 0.2;
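One common way to combine ϵ-greedy exploration with Thompson sampling is sketched below; the `sample(...)` function in the course code library may differ in its details. The names `rand_beta_int` and `choose_action` are hypothetical. To keep the block free of package dependencies, it draws Beta samples from ratios of Gamma variates and so assumes integer-valued `α` and `β`, which holds when the parameters start at `1` and only change in unit increments.

```julia
using Random

# Draw from Beta(a, b) for positive integers a, b using X/(X + Y) with
# X ~ Gamma(a, 1) and Y ~ Gamma(b, 1); a Gamma variate with integer shape
# is a sum of unit exponentials.
function rand_beta_int(rng::AbstractRNG, a::Int, b::Int)
    x = sum(randexp(rng, a))
    y = sum(randexp(rng, b))
    return x / (x + y)
end

# Hypothetical action selection: explore with probability ϵ, otherwise
# draw one sample per arm from its Beta belief and pick the largest draw.
function choose_action(rng::AbstractRNG, α::Vector{Float64}, β::Vector{Float64}, ϵ::Float64)
    K = length(α)
    if rand(rng) < ϵ
        return rand(rng, 1:K)    # exploration: uniformly random arm
    end
    θ = [rand_beta_int(rng, Int(α[k]), Int(β[k])) for k in 1:K]
    return argmax(θ)             # exploitation: Thompson sampling step
end

rng = MersenneTwister(1)
α = [46.0, 3.0, 74.0]; β = [39.0, 10.0, 59.0]   # illustrative belief parameters
picks = [choose_action(rng, α, β, 0.2) for _ in 1:1000]
share = [count(==(k), picks) / 1000 for k in 1:3]  # fraction of picks per arm
```

Even with `ϵ = 0`, the sampling step still explores, because arms with few observations have wide Beta distributions and occasionally produce the largest draw.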

## Run a single `ticker-picker` agent and explore its preferences
Let's run a single `ticker-picker` agent and examine what it returns using the `sample(...)` function. 
* The `sample(...)` function takes the agent model `model::EpsilonSamplingModel`, the `world::Function`, the cleaned `dataset`, and your list of `tickers`, along with the `horizon` parameter, i.e., how many iterations we want the search to run for, and the `risk_free_rate`.
* The `sample(...)` function returns a dictionary whose keys are iteration indices and whose values are `K`$\times$`2` matrices holding the $(\alpha,\beta)$ parameters for each ticker (one row per ticker) at that iteration.
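The shape of that return value can be illustrated with a toy loop. This is only a sketch of the bookkeeping, not the library's `sample(...)` implementation: `run_bandit`, the purely random chooser, and the made-up success probabilities `p` are all assumptions introduced for illustration.

```julia
using Random

# Hypothetical bandit loop that records a K×2 snapshot of (α, β) per iteration.
# `choose(α, β)` returns an arm index; `reward_fn(a, t)` returns 0 or 1.
function run_bandit(choose::Function, reward_fn::Function, K::Int, horizon::Int)
    α = ones(K); β = ones(K)                    # uniform starting beliefs
    history = Dict{Int64, Matrix{Float64}}()
    for t in 1:horizon
        a = choose(α, β)
        r = reward_fn(a, t)
        if r == 1
            α[a] += 1.0
        else
            β[a] += 1.0
        end
        history[t] = hcat(α, β)                 # hcat copies, so each snapshot is independent
    end
    return history
end

rng = MersenneTwister(42)
p = [0.3, 0.5, 0.7]                             # made-up success probabilities per arm
history = run_bandit((α, β) -> rand(rng, 1:3),  # random chooser, for illustration only
    (a, t) -> (rand(rng) < p[a]) ? 1 : 0, 3, 200)
final_snapshot = history[200]                   # 3×2 matrix: column 1 is α, column 2 is β
```

Because each iteration adds exactly one count to one entry in this sketch, the entries of the snapshot at iteration `t` sum to `2K + t`.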

In [11]:
time_sample_results_dict_eps = sample(model, world, dataset, tickers; 
    horizon = (maximum_number_trading_days - 1), risk_free_rate = risk_free_rate)

Dict{Int64, Matrix{Float64}} with 1479 entries:
  1144 => [17.0 12.0; 10.0 11.0; … ; 26.0 23.0; 2.0 8.0]
  1175 => [19.0 14.0; 11.0 11.0; … ; 26.0 25.0; 2.0 8.0]
  719  => [4.0 8.0; 4.0 8.0; … ; 15.0 15.0; 2.0 7.0]
  1028 => [8.0 10.0; 7.0 10.0; … ; 23.0 21.0; 2.0 7.0]
  699  => [4.0 8.0; 4.0 8.0; … ; 15.0 15.0; 2.0 7.0]
  831  => [5.0 9.0; 6.0 9.0; … ; 16.0 15.0; 2.0 7.0]
  1299 => [38.0 26.0; 13.0 14.0; … ; 27.0 25.0; 2.0 9.0]
  1438 => [45.0 37.0; 18.0 17.0; … ; 28.0 29.0; 2.0 9.0]
  1074 => [11.0 10.0; 8.0 11.0; … ; 23.0 22.0; 2.0 8.0]
  319  => [4.0 7.0; 3.0 6.0; … ; 9.0 7.0; 2.0 4.0]
  687  => [4.0 8.0; 3.0 8.0; … ; 15.0 15.0; 2.0 7.0]
  1199 => [23.0 17.0; 11.0 11.0; … ; 26.0 25.0; 2.0 8.0]
  185  => [2.0 3.0; 3.0 4.0; … ; 4.0 4.0; 2.0 4.0]
  823  => [5.0 9.0; 6.0 9.0; … ; 16.0 15.0; 2.0 7.0]
  1090 => [12.0 11.0; 8.0 11.0; … ; 23.0 22.0; 2.0 8.0]
  420  => [4.0 8.0; 3.0 6.0; … ; 10.0 9.0; 2.0 5.0]
  1370 => [40.0 35.0; 15.0 15.0; … ; 28.0 28.0; 2.0 9.0]
  1437 => [45.0 37.0; 18

In [12]:
Z = time_sample_results_dict_eps[1479]

21×2 Matrix{Float64}:
 46.0  39.0
 18.0  18.0
 47.0  38.0
  3.0  10.0
 20.0  21.0
 10.0  16.0
 56.0  48.0
 33.0  28.0
  7.0  15.0
 68.0  58.0
 12.0  18.0
 43.0  43.0
 11.0  14.0
  2.0   8.0
 28.0  31.0
 74.0  59.0
 58.0  50.0
 40.0  38.0
 12.0  18.0
 28.0  30.0
  2.0   9.0
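One way to read a snapshot like `Z` is to convert each row to a posterior mean $\alpha/(\alpha+\beta)$. The sketch below copies three rows from the output above (rows 1, 4, and 16, which correspond to `MRK`, `NFLX`, and `TSLA` in the `tickers` array); the variable names are illustrative, and the observation count assumes every belief started at `(1, 1)` and gained one count per observation.

```julia
# Rows 1 (MRK), 4 (NFLX), and 16 (TSLA) of Z, copied from the output above.
Z_subset = [46.0 39.0; 3.0 10.0; 74.0 59.0]
labels = ["MRK", "NFLX", "TSLA"]

α = Z_subset[:, 1]
β = Z_subset[:, 2]

# Estimated probability of beating the risk-free rate for each ticker.
posterior_mean = α ./ (α .+ β)

# Number of recorded observations per ticker, assuming a (1, 1) starting belief.
observations = Int.(α .+ β .- 2)

for (label, m, n) in zip(labels, posterior_mean, observations)
    println(label, ": mean = ", round(m, digits = 4), ", observations = ", n)
end
```

For this agent, `NFLX` has both a low posterior mean and few observations: the agent stopped selecting it early, so its estimate is much less certain than those for `MRK` or `TSLA`.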

## Run a collection of `ticker-picker` agents and examine their preferences
Let's repeat the single-agent analysis with `N` agents by running the `sample(...)` method inside a `for` loop. We'll store the results of the last time point in the `agent_specific_data` array, which is an `Array{Beta,2}` with `K` rows and `number_of_agents` columns.
* The `agent_specific_data` array holds the `Beta` distributions for each agent, i.e., it holds the preferences of each agent (columns) for each `ticker` in our collection (rows).

In [13]:
number_of_agents = 10000;
trading_day_index = 1479
agent_specific_data = Array{Beta,2}(undef, K, number_of_agents);

for agent_index ∈ 1:number_of_agents
    
    # sample -
    time_sample_results_dict_eps = sample(model, world, dataset, tickers; horizon = (maximum_number_trading_days - 1), risk_free_rate = risk_free_rate);
    beta_array = build_beta_array(time_sample_results_dict_eps[trading_day_index]);

    # grab data for this agent -
    for k = 1:K
        agent_specific_data[k, agent_index] = beta_array[k]
    end
end

In [14]:
agent_specific_data[1,:] # Beta distributions for ticker 1 (row 1) across all agents

10000-element Vector{Beta}:
 Beta{Float64}(α=30.0, β=31.0)
 Beta{Float64}(α=80.0, β=68.0)
 Beta{Float64}(α=85.0, β=61.0)
 Beta{Float64}(α=66.0, β=49.0)
 Beta{Float64}(α=13.0, β=19.0)
 Beta{Float64}(α=1.0, β=6.0)
 Beta{Float64}(α=20.0, β=20.0)
 Beta{Float64}(α=14.0, β=20.0)
 Beta{Float64}(α=25.0, β=25.0)
 Beta{Float64}(α=37.0, β=31.0)
 Beta{Float64}(α=36.0, β=33.0)
 Beta{Float64}(α=16.0, β=18.0)
 Beta{Float64}(α=22.0, β=25.0)
 ⋮
 Beta{Float64}(α=2.0, β=8.0)
 Beta{Float64}(α=85.0, β=59.0)
 Beta{Float64}(α=21.0, β=22.0)
 Beta{Float64}(α=2.0, β=8.0)
 Beta{Float64}(α=21.0, β=24.0)
 Beta{Float64}(α=13.0, β=18.0)
 Beta{Float64}(α=37.0, β=25.0)
 Beta{Float64}(α=19.0, β=18.0)
 Beta{Float64}(α=17.0, β=20.0)
 Beta{Float64}(α=17.0, β=23.0)
 Beta{Float64}(α=13.0, β=18.0)
 Beta{Float64}(α=26.0, β=27.0)

## The wisdom of the collective
Now that we have preferences for the `N` agents (encoded as `Beta` distributions for each ticker), let's develop a consensus belief about which tickers to include in our portfolio $\mathcal{P}$. First, let's compute the agent-specific rank of each ticker, where `rank = 1` is the best, and `rank = K` is the worst. We'll store these values in the `preference_rank_array` array.

In [15]:
preference_rank_array = Array{Int,2}(undef, number_of_agents, K);
for agent ∈ 1:number_of_agents
        
    # ask the agent about their preference for each ticker -
    experience_distributions = agent_specific_data[:,agent]
    preference_vector = preference(experience_distributions, tickers) .|> x-> trunc(Int64,x) # trunc function is cool!
    
    # package -
    for i ∈ 1:K
        preference_rank_array[agent, i] = preference_vector[i];
    end
end
preference_rank_array

10000×21 Matrix{Int64}:
 10  19  21  13   1  12  18  20   2  …   8   4   5   9  16  11  17   3   7
  4  15   2  16   5   3   6   8   1     17   7  13  12  10  21  19   9  10
  1   9   2   3   6  17  20  12   9     13  15  14   7   4  20  11   8   5
  3  14  19  16  11  16  15   6  12     20   9   4   5   8  10  21  18   2
 16  19   9  20   7   8   6  21   3      4  15  13   1  11  12   2   5  14
 21  11   7  15   8  12  19  18   6  …   2   5  14  17  16   4  13  20   3
 10  12  15  17  18   4  19   8  21     14   1   6   2   7   3  11   5  16
 15   4  10  14   1   3  20  10   2     12  15   5  21   6  13  18  18   8
 10   9  16  21   1   3  12  20   2      8  18   4  19  13  14  15   7   6
  4  19  12  16   1   6   8  13   3     11   9  14   9   5   6  21  20  15
  6  21   3  11   4  12  18  13  10  …   1  17   8  20   9   2  15  19  15
 17  12  16  12   7  20   1   8  11     15  19   3   4  14   6   9  18   2
 17  14  20  16   8   1   3  14   4      5  13  11  21  12   9  19  10   7
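The internals of `preference(...)` are not shown here. As a hedged illustration of how a rank vector like the rows above could arise, the sketch below ranks tickers by a score (for example, the posterior mean), assigns tied scores their average rank, and truncates to integers. The repeated ranks visible in some rows are consistent with this kind of tie handling, but that is an inference from the output rather than a statement about the library; `tied_rank_descending` is a hypothetical name.

```julia
# Hypothetical ranking helper: rank 1 goes to the largest score, and
# tied scores share the average of the ranks they span.
function tied_rank_descending(scores::Vector{Float64})
    K = length(scores)
    order = sortperm(scores, rev = true)   # indices from best to worst
    ranks = zeros(Float64, K)
    i = 1
    while i <= K
        j = i
        # extend j over the run of scores equal to the score at position i
        while j < K && scores[order[j + 1]] == scores[order[i]]
            j += 1
        end
        average_rank = (i + j) / 2
        for m in i:j
            ranks[order[m]] = average_rank
        end
        i = j + 1
    end
    return ranks
end

scores = [0.5, 0.7, 0.5, 0.2]              # made-up scores for four tickers
ranks = tied_rank_descending(scores)       # [2.5, 1.0, 2.5, 4.0]
integer_ranks = trunc.(Int64, ranks)       # [2, 1, 2, 4]
```

Truncating an averaged rank such as `2.5` yields the same integer for both tied tickers, which is how duplicate ranks can appear within a single agent's row.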
 

### Compute the frequency dictionary
Let's count the number of times a `ticker` is ranked in the `top 3` across the `N` agents and then normalize by the number of agents, i.e., compute the frequency of being ranked in the `top 3`. We'll store this value in the `frequency_dictionary`, where the `ticker` is the key, and the frequency is the value.
* `Hypothesis`: Those `tickers` that are frequently ranked highly are likelier to beat an alternative `risk-free` investment, while `tickers` that are rarely ranked highly are less likely to outperform a `risk-free` alternative investment.

In [16]:
frequency_dictionary = Dict{String,Float64}();
for i ∈ eachindex(tickers)
    
    # compute the frequency -
    freq = count(r -> (1 <= r <= 3), preference_rank_array[:,i]) / number_of_agents
    
    # get the ticker -
    ticker = tickers[i];
    frequency_dictionary[ticker] = freq;
end

In [17]:
frequency_dictionary

Dict{String, Float64} with 21 entries:
  "MSFT"  => 0.3172
  "JPM"   => 0.1556
  "MRK"   => 0.1311
  "UNH"   => 0.1088
  "SPY"   => 0.2584
  "MU"    => 0.0877
  "SRE"   => 0.1636
  "TSLA"  => 0.1721
  "AMD"   => 0.1643
  "INTC"  => 0.103
  "MMM"   => 0.0521
  "ECL"   => 0.1085
  "LMT"   => 0.0713
  "OXY"   => 0.086
  "CMCSA" => 0.0788
  "JNJ"   => 0.0982
  "SPYD"  => 0.1257
  "NFLX"  => 0.1146
  "PEP"   => 0.1496
  "MET"   => 0.2158
  "AAPL"  => 0.2539
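To turn the frequency dictionary into a short list of candidates, one option is to sort its entries by value. The sketch below uses a subset of the values from the output above; `frequency_subset`, `ranked`, and `top_three` are illustrative names.

```julia
# A subset of the frequency values copied from the output above.
frequency_subset = Dict{String,Float64}(
    "MSFT" => 0.3172, "SPY" => 0.2584, "AAPL" => 0.2539,
    "MET" => 0.2158, "TSLA" => 0.1721, "MMM" => 0.0521)

# Sort the (ticker => frequency) pairs from most to least frequently top-ranked.
ranked = sort(collect(frequency_subset), by = last, rev = true)

# Keep the ticker symbols of the three highest-frequency entries.
top_three = first.(ranked[1:3])   # ["MSFT", "SPY", "AAPL"]
```

The same three tickers lead the full 21-entry dictionary shown above.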

### Disclaimer and Risks
__This content is offered solely for training and informational purposes__. No offer or solicitation to buy or sell securities or derivative products, or any investment or trading advice or strategy, is made, given, or endorsed by the teaching team.

__Trading involves risk__. Carefully review your financial situation before investing in securities, futures contracts, options, or commodity interests. Past performance, whether actual or indicated by historical tests of strategies, is no guarantee of future performance or success. Trading is generally inappropriate for someone with limited resources, limited investment or trading experience, or a low risk tolerance. Risk only capital that is not required for living expenses.

__You are fully responsible for any investment or trading decisions you make__. Such decisions should be based solely on your evaluation of your financial circumstances, investment or trading objectives, risk tolerance, and liquidity needs.