# `Example`: Training Binary Bernoulli Bandit Ticker Picker Agents
In this example, we create a group of `N` agents to assist us in selecting stocks for our portfolio $\mathcal{P}$. We will use the [$\epsilon$-Greedy Thompson Sampling algorithm](https://arxiv.org/abs/1707.02038) to solve a multi-armed Bernoulli (binary reward) bandit problem. The stocks we can select from are the arms of the bandit, and there are `K` of them in our choice set $\mathcal{C}$. We will feed daily return data to the group of `N` agents, who will learn online which tickers return at least a specified cutoff, such as the risk-free rate.

Each agent will independently analyze daily Open High Low Close (OHLC) data sequences and develop a ranking of its belief that a particular ticker will return at least the risk-free rate in the next step. This ranking is based on the probability $p_{a}$ that ticker $a$ returns at least the risk-free rate. Each agent expresses its uncertainty about $p_{a}$ as a belief distribution, represented by the parameters $(\alpha,\beta)$ of a [Beta distribution](https://en.wikipedia.org/wiki/Beta_distribution).
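
To make the belief update concrete, here is a minimal Julia sketch of the standard Beta-Bernoulli conjugate update, assuming each ticker's belief is stored as pseudo-counts $(\alpha,\beta)$ and each reward is `0` or `1`. The `BetaBelief` type and the `update_belief` and `posterior_mean` helpers are hypothetical names for illustration, not part of the teaching team's code.

```julia
# Hypothetical belief type: pseudo-counts of a Beta(α, β) distribution.
struct BetaBelief
    α::Float64
    β::Float64
end

# Conjugate update: a success (r = 1) increments α, a failure (r = 0) increments β.
update_belief(b::BetaBelief, r::Int) = BetaBelief(b.α + r, b.β + (1 - r))

# Posterior mean of the success probability p_a.
posterior_mean(b::BetaBelief) = b.α / (b.α + b.β)

# Start from the uniform prior Beta(1, 1) and fold in a short reward sequence.
belief = foldl(update_belief, [1, 0, 1, 1]; init = BetaBelief(1.0, 1.0))
println(belief)                  # BetaBelief(4.0, 2.0)
println(posterior_mean(belief))  # 0.6666666666666666
```

Starting from the uniform prior $\text{Beta}(1,1)$, three successes and one failure give $\text{Beta}(4,2)$, whose mean $4/6 \approx 0.667$ is the agent's current point estimate of $p_{a}$.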

Once the analysis is complete, we will ask the agents to provide us with their `top-N` tickers. Subsequently, we could use these picks to populate our portfolio $\mathcal{P}$.

## Tasks
* __Prerequisites__: We'll load the daily Open High Low Close (OHLC) dataset and select tickers with complete data. We'll also compute the annualized daily logarithmic return for each ticker in that dataset.
* __Task 1__: Develop a `world` function that agents will use to evaluate the possible actions and set up the ticker picker agents
    * `TODO`: Agents sample the `world`. If the ticker `XYZ` returns greater than or equal to the risk-free rate in the sample period, the agent receives a reward of `+1`. Otherwise, the agent receives a reward of `0`.
    * `TODO`: Initialize a collection of `N` ticker picker agents with an a priori belief that ticker `XYZ` is `good` or `bad`
* __Task 2__: In this task, we explore how the ticker picker agents learn which ticker symbols are better or worse bets.
    * `TODO`: Starting with a single ticker picker agent and a collection of `K` potential ticker symbols, we feed the agent daily (annualized) logarithmic return data and watch it evolve its belief system as new data is added.
* __Task 3__: Expand the single agent analysis to `N` agents, each exploring `K` potential ticker symbols.
    * `TODO`: After the agents complete their training, analyze their findings and have each agent compute its `top-N` ticker picks.  

## Setup
Several external `Julia` packages enable the computations in this example. To load the required packages and any custom codes the teaching team has developed to work with these packages, we [include](https://docs.julialang.org/en/v1/manual/code-loading/) the `Include.jl` file:

In [1]:
include("Include.jl");

## Prerequisites: Load and clean the historical dataset
We gathered daily open-high-low-close data for each firm in the [S&P500](https://en.wikipedia.org/wiki/S%26P_500) from `01-03-2018` until `12-29-2023`, along with data for a few exchange-traded funds and volatility products during that time. We'll use this data in the subsequent study, but we must load and clean it up first. In particular, in the `prerequisites` code block, we will:
* First, we'll load the data and remove tickers that do not have the maximum number of trading days. We'll store the cleaned data in the `dataset::Dict{String, DataFrame}` variable.
* Next, we'll get a list of all the tickers in the `dataset`, sort them alphabetically, and store them in the `all_tickers_array` variable.
* Finally, we'll compute the annualized logarithmic return for each ticker in the `dataset` using the `log_return_matrix(...)` function (assuming a `252`-day trading year) and store these values in the `all_firms_log_return_matrix` variable.

In [2]:
original_dataset = MyMarketDataSet() |> x-> x["dataset"];

### Clean the data
Not all of the tickers in our dataset have the maximum number of trading days, for various reasons, e.g., acquisition or delisting events. Let's collect only those tickers with the maximum number of trading days.

* First, let's compute the number of records for a company that we know has the maximum value, e.g., `AAPL`, and save that value in the `maximum_number_trading_days` variable:

In [3]:
maximum_number_trading_days = original_dataset["AAPL"] |> nrow

1508

Now, let's iterate through our data and collect only those tickers that have `maximum_number_trading_days` records. Save that data in the `dataset::Dict{String,DataFrame}` variable:

In [4]:
dataset = Dict{String,DataFrame}();
for (ticker,data) ∈ original_dataset
    if (nrow(data) == maximum_number_trading_days)
        dataset[ticker] = data;
    end
end
dataset;

Let's get a list of the firms in the cleaned-up `dataset` and save it in the `all_tickers_array` array:

In [5]:
all_tickers_array = keys(dataset) |> collect |> sort;
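
The cleaning loop and the alphabetical sort above can also be written with `filter` on a `Dict`, which passes each `key => value` pair to the predicate. The sketch below uses a hypothetical `Dict{String,Vector{Float64}}` of made-up prices so that it runs without DataFrames.jl; with the real `original_dataset`, the predicate would compare `nrow(p.second)` instead of `length(p.second)`.

```julia
# Hypothetical stand-in data: ticker => vector of daily prices (made-up values).
original_prices = Dict(
    "AAA" => [10.0, 10.2, 10.1, 10.4],
    "BBB" => [20.0, 19.8],              # e.g., delisted part-way through the period
    "CCC" => [5.0, 5.1, 5.3, 5.2],
)

# Reference length: the number of records for a ticker known to be complete.
max_days = length(original_prices["AAA"])

# Keep only complete tickers; filter on a Dict passes each key => value Pair.
clean_prices = filter(p -> length(p.second) == max_days, original_prices)

# Sorted ticker list, mirroring all_tickers_array.
clean_tickers = sort(collect(keys(clean_prices)))
println(clean_tickers)  # ["AAA", "CCC"]
```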

Finally, let's compute the annualized logarithmic growth rate time series for each ticker in `dataset` using the `log_return_matrix(...)` function:
> The `log_return_matrix(dataset::Dict{String, DataFrame}, firms::Array{String,1}; Δt::Float64 = (1.0/252.0), risk_free_rate::Float64 = 0.0) -> Array{Float64,2}` function takes the `dataset` argument (which holds `N` price values and other data for each ticker) and a `K`-dimensional list of ticker symbols, and returns the annualized growth rate $\mu$ time series as an $(N-1)\times{K}$ array, where the rows correspond to times and the columns correspond to tickers.

In [6]:
all_firms_log_return_matrix = log_return_matrix(dataset, all_tickers_array, 
    Δt = (1.0/252.0), risk_free_rate = 0.0);
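
The annualized log growth rate here is $\mu_{t} = \frac{1}{\Delta{t}}\ln\left(P_{t}/P_{t-1}\right)$ with $\Delta{t} = 1/252$, the same formula the `world(...)` function below applies to volume-weighted average prices. The sketch computes it for plain price vectors with a hypothetical `annualized_log_returns` helper. How `log_return_matrix(...)` uses its `risk_free_rate` argument is not shown, so the sketch leaves it out (it is `0.0` in the call above); the prices are made up.

```julia
# Hypothetical helper: annualized log growth rates for a single price series.
function annualized_log_returns(prices::Vector{Float64}; Δt::Float64 = 1.0/252.0)
    N = length(prices)
    μ = Vector{Float64}(undef, N - 1)          # N prices give N-1 returns
    for t in 2:N
        μ[t-1] = (1/Δt) * log(prices[t] / prices[t-1])
    end
    return μ
end

# Stack one column per ticker: an (N-1) × K matrix (rows = time, columns = tickers).
prices_by_ticker = [[100.0, 101.0, 100.5], [50.0, 50.0, 51.0]]
R = reduce(hcat, annualized_log_returns.(prices_by_ticker))
println(size(R))  # (2, 2)
```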

## __Task 1__: Initialize the `world` function and set up a ticker picker agent
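The `world` function takes an action `a`, i.e., the index of a `ticker` in the collection of stocks/ETFs that we are exploring, and asks whether the mean annualized log return of that `ticker`, computed from volume-weighted average prices over the `buffersize` trading days after `start`, was greater than or equal to the risk-free rate.
* If this return is greater than or equal to the risk-free rate, the `agent` receives a reward of `r = 1`. Otherwise, the `agent` receives a reward of `r = 0`. With the default `buffersize = 1`, this compares a single day's (annualized) return with the risk-free rate.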

In [7]:
function world(action::Int64, start::Int64, data::Dict{String, DataFrame}, tickers::Array{String,1}; 
        Δt::Float64 = (1.0/252.0), risk_free_rate::Float64 = 0.05, buffersize::Int64 = 1)::Int64

    # initialize -
    result_flag = 0;

    # length of the evaluation horizon (in years) -
    T = buffersize*Δt;

    # grab the ticker symbol for this action -
    ticker_symbol = tickers[action];
    
    # compute the mean annualized log return (VWAP-based) over the horizon -
    price_df = data[ticker_symbol];
    time_index = (start+1):(start+buffersize);
    buffer = Array{Float64,1}();
    for t ∈ time_index
        P₁ = price_df[t-1, :volume_weighted_average_price]
        P₂ = price_df[t, :volume_weighted_average_price]
        R = (1/Δt)*log(P₂/P₁);
        push!(buffer,R);
    end
    μ = mean(buffer);

    # growth of 1 USD invested in each asset over the horizon -
    W_risk_free = exp(risk_free_rate*T);
    W_ticker = exp(μ*T);
    
    # are we better or worse relative to the risk-free investment?
    if (W_ticker >= W_risk_free)
        result_flag = 1;
    end

    # default -
    return result_flag;
end;
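
The reward rule can be checked without the dataset. The hypothetical `bernoulli_reward` function below mirrors the logic of `world(...)` on a plain price vector instead of a `DataFrame` column; the prices are made up.

```julia
using Statistics

# Hypothetical, DataFrame-free version of the reward rule in world(...).
# prices: VWAP series for one ticker; start: index of the day before the window.
function bernoulli_reward(prices::Vector{Float64}, start::Int;
        Δt::Float64 = 1.0/252.0, risk_free_rate::Float64 = 0.05, buffersize::Int = 1)::Int
    T = buffersize * Δt                               # horizon length in years
    R = [(1/Δt) * log(prices[t] / prices[t-1]) for t in (start+1):(start+buffersize)]
    μ = mean(R)                                       # mean annualized log return
    # Growth of 1 USD in the ticker vs. the risk-free asset over the horizon.
    return exp(μ * T) >= exp(risk_free_rate * T) ? 1 : 0
end

prices = [100.0, 100.1, 99.9]
println(bernoulli_reward(prices, 1; risk_free_rate = 0.047))  # day 1 -> 2 is up 0.1%
println(bernoulli_reward(prices, 2; risk_free_rate = 0.047))  # day 2 -> 3 is down
```

Because $T>0$ and $\exp$ is increasing, the test $e^{\mu T} \geq e^{r_{f}T}$ is equivalent to $\mu \geq r_{f}$. With `risk_free_rate = 0.047` and `buffersize = 1`, a ticker earns a reward only if its one-day log return is at least $0.047/252 \approx 0.0187\%$.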

Set the annualized `risk_free_rate` variable (here, 4.7% per year):

In [8]:
risk_free_rate = 0.047;

### `TODO`: Set up a single ticker-picker agent

First, let's specify the tickers that we want to examine in the `tickers` array, and store the number of tickers in the `K` variable:

In [9]:
tickers = ["MRK", "JNJ", "MET", "NFLX", "AAPL", "AMD", "MU", "INTC", "MSFT", "SPY", "SPYD", "MMM",
    "UNH", "JPM", "OXY", "TSLA", "PEP", "LMT", "CMCSA", "ECL", "SRE", "BAC", "C", "WFC", "QQQ", "KR", "NOC", "GS"];
K = length(tickers);

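Next, we construct an `EpsilonSamplingModel` instance, which holds the parameters of the ϵ-greedy sampling approach: the number of tickers `K`, the Beta prior parameters `α` and `β` for each ticker, and the `ϵ` field, which controls the approximate fraction of `exploration` steps the algorithm takes (here `ϵ = 0.3`, i.e., roughly 30% of the steps). During `exploration` steps, the agent picks a ticker at random, ignoring its current beliefs; on the remaining steps, it exploits its Beta beliefs.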

In [20]:
model = build(EpsilonSamplingModel, (
        K = K, 
        α = ones(K), # Beta(1,1) prior: uniform on [0,1]
        β = ones(K), # Beta(1,1) prior: uniform on [0,1]
        ϵ = 0.3
));
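
The internals of `EpsilonSamplingModel` and `sample(...)` are not shown here, so the following is only a self-contained sketch of one common form of ϵ-greedy Thompson sampling on synthetic Bernoulli arms: with probability `ϵ` the agent explores by picking an arm uniformly at random; otherwise it draws one sample from each arm's Beta posterior and picks the largest draw. To avoid a package dependency, the hypothetical `rand_beta_int` helper samples a Beta distribution with positive integer parameters using an order-statistic identity; with Distributions.jl loaded, `rand(Beta(α, β))` is the natural choice. The helper names and the success probabilities in `p_true` are made up for illustration.

```julia
using Random

# Hypothetical helper: draw from Beta(a, b) for positive integers a and b.
# The a-th smallest of (a + b - 1) iid Uniform(0,1) draws is Beta(a, b)-distributed.
function rand_beta_int(rng::AbstractRNG, a::Int, b::Int)
    u = sort(rand(rng, a + b - 1))
    return u[a]
end

# Hypothetical helper: one ϵ-greedy Thompson sampling action choice.
function choose_action(rng::AbstractRNG, α::Vector{Int}, β::Vector{Int}, ϵ::Float64)
    K = length(α)
    if rand(rng) < ϵ
        return rand(rng, 1:K)                            # explore: uniform random arm
    end
    θ = [rand_beta_int(rng, α[k], β[k]) for k in 1:K]    # one posterior draw per arm
    return argmax(θ)                                     # exploit: best sampled arm
end

# Toy run on K = 3 synthetic Bernoulli arms (made-up success probabilities).
rng = MersenneTwister(42)
p_true = [0.3, 0.5, 0.7]
α = ones(Int, 3)      # Beta(1,1) prior for every arm
β = ones(Int, 3)
for t in 1:2000
    a = choose_action(rng, α, β, 0.3)
    r = rand(rng) < p_true[a] ? 1 : 0                    # Bernoulli reward
    α[a] += r                                            # success count
    β[a] += 1 - r                                        # failure count
end
println(α ./ (α .+ β))   # posterior mean of each arm's success probability
```

With `ϵ = 0.3`, roughly 30% of the steps are random, so every arm keeps being sampled occasionally, while the exploit steps concentrate on the arm whose posterior currently looks best.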

## __Task 2__: Run a single `ticker-picker` agent and explore its preferences
Let's run a single `ticker-picker` agent and examine what it returns using the `sample(...)` function. 
* The `sample(...)` function takes the agent model `model::EpsilonSamplingModel`, the `world::Function`, the cleaned `dataset`, and our list of `tickers`, along with the `horizon` parameter, i.e., how many iterations we want the search to run, and the `risk_free_rate`.
* The `sample(...)` function returns a dictionary whose keys are iteration indices and whose values are `K`×`2` matrices holding the $(\alpha,\beta)$ parameters for each ticker (rows).

In [11]:
time_sample_results_dict_eps = sample(model, world, dataset, tickers; 
    horizon = (maximum_number_trading_days - 1), risk_free_rate = risk_free_rate)

Dict{Int64, Matrix{Float64}} with 1507 entries:
  1144 => [11.0 14.0; 3.0 7.0; … ; 7.0 9.0; 1.0 5.0]
  1175 => [11.0 14.0; 3.0 7.0; … ; 7.0 9.0; 1.0 5.0]
  719  => [10.0 11.0; 1.0 4.0; … ; 3.0 6.0; 1.0 5.0]
  1028 => [10.0 13.0; 3.0 5.0; … ; 7.0 8.0; 1.0 5.0]
  699  => [10.0 11.0; 1.0 4.0; … ; 3.0 6.0; 1.0 5.0]
  831  => [10.0 12.0; 1.0 4.0; … ; 5.0 6.0; 1.0 5.0]
  1299 => [11.0 15.0; 3.0 7.0; … ; 8.0 11.0; 1.0 5.0]
  1438 => [11.0 16.0; 3.0 8.0; … ; 8.0 12.0; 1.0 5.0]
  1074 => [10.0 13.0; 3.0 6.0; … ; 7.0 8.0; 1.0 5.0]
  319  => [5.0 5.0; 1.0 4.0; … ; 3.0 5.0; 1.0 3.0]
  687  => [10.0 11.0; 1.0 4.0; … ; 3.0 6.0; 1.0 5.0]
  1199 => [11.0 15.0; 3.0 7.0; … ; 7.0 9.0; 1.0 5.0]
  185  => [5.0 3.0; 1.0 4.0; … ; 3.0 3.0; 1.0 2.0]
  823  => [10.0 12.0; 1.0 4.0; … ; 5.0 6.0; 1.0 5.0]
  1090 => [11.0 13.0; 3.0 7.0; … ; 7.0 8.0; 1.0 5.0]
  420  => [6.0 5.0; 1.0 4.0; … ; 3.0 5.0; 1.0 4.0]
  1370 => [11.0 16.0; 3.0 7.0; … ; 8.0 12.0; 1.0 5.0]
  1437 => [11.0 16.0; 3.0 8.0; … ; 8.0 12.0; 1.0 5.0]


In [12]:
Z = time_sample_results_dict_eps[1479]

28×2 Matrix{Float64}:
 11.0  16.0
  3.0   8.0
  5.0  10.0
 25.0  23.0
  1.0   5.0
 46.0  33.0
 17.0  17.0
 89.0  59.0
 42.0  28.0
 27.0  26.0
 30.0  26.0
  5.0  10.0
  4.0   8.0
  ⋮    
  2.0   5.0
 10.0  13.0
  2.0   8.0
  3.0   9.0
 31.0  26.0
 26.0  22.0
 29.0  27.0
 14.0  16.0
 65.0  46.0
 18.0  17.0
  8.0  12.0
  1.0   5.0
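
One simple way to read `Z` is through the posterior mean $\alpha/(\alpha+\beta)$ of each row. The sketch below uses the first eight rows printed above and assumes that row `k` corresponds to `tickers[k]`, column 1 holds $\alpha$, and column 2 holds $\beta$. The pull count additionally assumes the `Beta(1,1)` prior set in `build(...)` and that only the chosen ticker is updated at each step.

```julia
# First eight rows of Z as printed above (assumed: column 1 = α, column 2 = β).
Z_head = [11.0 16.0; 3.0 8.0; 5.0 10.0; 25.0 23.0; 1.0 5.0; 46.0 33.0; 17.0 17.0; 89.0 59.0]
names_head = ["MRK", "JNJ", "MET", "NFLX", "AAPL", "AMD", "MU", "INTC"]  # tickers[1:8]

α_head, β_head = Z_head[:, 1], Z_head[:, 2]
p_hat = α_head ./ (α_head .+ β_head)     # posterior mean estimate of p_a
n_pulls = α_head .+ β_head .- 2          # observations since the assumed Beta(1,1) prior

# List tickers from the highest to the lowest posterior mean.
for k in sortperm(p_hat, rev = true)
    println(rpad(names_head[k], 6), round(p_hat[k], digits = 3), "  (", Int(n_pulls[k]), " pulls)")
end
```

Rows with few pulls, such as $\alpha=1,\beta=5$, still have a wide Beta distribution, so their posterior means are much less reliable than those of heavily sampled rows.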

## __Task 3__: Run a collection of `N` ticker-picker agents and examine their preferences
Repeat the single-agent analysis with `N` agents by calling the `sample(...)` method inside a `for` loop. For each agent, we'll store the `Beta` distributions at trading day `trading_day_index = 1479` in the `K`×`number_of_agents` array `agent_specific_data::Array{Beta,2}`.
* The `agent_specific_data` array holds the `Beta` distributions for each agent, i.e., it holds the preferences for each agent (cols) for each `ticker` in our collection (rows).

In [13]:
number_of_agents = 10000;
trading_day_index = 1479
agent_specific_data = Array{Beta,2}(undef, K, number_of_agents);

for agent_index ∈ 1:number_of_agents
    
    # sample -
    time_sample_results_dict_eps = sample(model, world, dataset, tickers; horizon = (maximum_number_trading_days - 1), risk_free_rate = risk_free_rate);
    beta_array = build_beta_array(time_sample_results_dict_eps[trading_day_index]);

    # grab data for this agent -
    for k = 1:K
        agent_specific_data[k, agent_index] = beta_array[k]
    end
end

In [14]:
agent_specific_data[1,:] # data for each agent for ticker 1 for all the agents

10000-element Vector{Beta}:
 Beta{Float64}(α=19.0, β=21.0)
 Beta{Float64}(α=12.0, β=16.0)
 Beta{Float64}(α=4.0, β=8.0)
 Beta{Float64}(α=5.0, β=8.0)
 Beta{Float64}(α=5.0, β=9.0)
 Beta{Float64}(α=14.0, β=18.0)
 Beta{Float64}(α=16.0, β=16.0)
 Beta{Float64}(α=2.0, β=7.0)
 Beta{Float64}(α=33.0, β=27.0)
 Beta{Float64}(α=28.0, β=26.0)
 Beta{Float64}(α=19.0, β=19.0)
 Beta{Float64}(α=11.0, β=13.0)
 Beta{Float64}(α=9.0, β=14.0)
 ⋮
 Beta{Float64}(α=19.0, β=18.0)
 Beta{Float64}(α=29.0, β=28.0)
 Beta{Float64}(α=31.0, β=24.0)
 Beta{Float64}(α=36.0, β=30.0)
 Beta{Float64}(α=14.0, β=17.0)
 Beta{Float64}(α=4.0, β=9.0)
 Beta{Float64}(α=5.0, β=11.0)
 Beta{Float64}(α=18.0, β=16.0)
 Beta{Float64}(α=49.0, β=42.0)
 Beta{Float64}(α=24.0, β=21.0)
 Beta{Float64}(α=13.0, β=13.0)
 Beta{Float64}(α=3.0, β=9.0)

## The wisdom of the collective
Now that we have preferences for the `N` agents (encoded as `Beta` distributions for each ticker), let's develop a consensus belief about which tickers to include in our portfolio $\mathcal{P}$. First, let's compute the agent-specific rank of each ticker, where `rank = 1` is the best and `rank = K` is the worst. We'll store these values in the `preference_rank_array` array.

In [15]:
preference_rank_array = Array{Int,2}(undef, number_of_agents, K);
for agent ∈ 1:number_of_agents
        
    # ask an agent about their preference for each ticker -
    experience_distributions = agent_specific_data[:,agent]
    preference_vector = preference(experience_distributions, tickers) .|> x-> trunc(Int64,x) # convert the returned values to integer ranks
    
    # package -
    for i ∈ 1:K
        preference_rank_array[agent, i] = preference_vector[i];
    end
end
preference_rank_array

10000×28 Matrix{Int64}:
 16  11  20   2   7   4  23  14   6  …   3  11  11  18  17  15  26  11  25
 19  16   2   9  19  25   5  22   3     15  24  19  14  11  27  23  26   4
 23   8   1  15   8  26  22  15  11     19   7  17   3  13   5  27  10  23
 23  25   3   8  12  15  18  11   6     19   1   4  17  22  26   7  20  16
 22   8   4  15  15   5  28  21   9     18  15   3  23   1  26  24   2   7
 20  28   7  15  13  25   5  25  21  …  19  17   3  24  14  16   4  11  27
 17   6   2  17   5  25  26   9   3     12  10  14  17  15  13  21  28   8
 26  10  16  20  13  21  28  18   2      7  13   8  23  17   5  21  23  10
  5  22   7  11   8  19  24   4  23      1   3  14  28   2  26  18  21  12
  6   2  25  18  16  23   8   4  11     21   5  23  12   1   8   8  17  20
  9   2  17   9  14   5  25  21  19  …   4  16  18  26  20  13  27  28  22
 18  23   8  20  12  14  18  28   3     11  10   5  27  24  13  26   2  22
 21  27  27  19  11  14  20  14   8      6   1  14  26  17   4   5  16  18
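
How `preference(...)` scores each ticker is not shown here. As an illustration only, the hypothetical `ranks_from_scores` helper below converts any per-ticker score (here, the posterior mean of made-up $(\alpha,\beta)$ pairs) into ranks where `1` is best, using the double-`sortperm` idiom.

```julia
# Hypothetical helper: turn a score per ticker into ranks (1 = best, K = worst).
# sortperm(scores, rev = true) lists ticker indices from best to worst; the
# second sortperm inverts that permutation, giving each ticker's position.
ranks_from_scores(scores::Vector{Float64}) = sortperm(sortperm(scores, rev = true))

# Made-up (α, β) pairs for four tickers, scored by posterior mean.
α = [10.0, 6.0, 3.0, 20.0]
β = [12.0, 8.0, 7.0, 15.0]
scores = α ./ (α .+ β)
println(ranks_from_scores(scores))   # [2, 3, 4, 1]
```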
 

### Compute the frequency dictionary
Let's count the number of times each `ticker` is ranked in the `top 10` across the `N` agents and then normalize by the number of agents, i.e., compute the frequency with which each `ticker` is ranked in the `top 10`. We'll store these values in the `frequency_dictionary`, where the `ticker` is the key and the frequency is the value.
* `Hypothesis`: Tickers that are frequently ranked in the `top 10` are more likely to beat an alternative `risk-free` investment, while tickers that are rarely ranked there tend not to outperform a `risk-free` alternative investment.

In [16]:
frequency_dictionary = Dict{String,Float64}();
for i ∈ eachindex(tickers)
    
    # fraction of agents that rank this ticker in their top 10 -
    freq = count(≤(10), preference_rank_array[:,i]) / number_of_agents;
    
    # get the ticker -
    ticker = tickers[i];
    frequency_dictionary[ticker] = freq;
end

In [17]:
frequency_dictionary

Dict{String, Float64} with 28 entries:
  "MSFT"  => 0.525
  "JPM"   => 0.3714
  "C"     => 0.2594
  "MRK"   => 0.3556
  "UNH"   => 0.3219
  "SPY"   => 0.4807
  "BAC"   => 0.3408
  "MU"    => 0.2818
  "SRE"   => 0.3943
  "TSLA"  => 0.407
  "KR"    => 0.3857
  "AMD"   => 0.4083
  "INTC"  => 0.3138
  "MMM"   => 0.2498
  "NOC"   => 0.2808
  "ECL"   => 0.3352
  "LMT"   => 0.2935
  "GS"    => 0.2728
  "OXY"   => 0.3009
  "CMCSA" => 0.2793
  "JNJ"   => 0.3248
  "SPYD"  => 0.3519
  "NFLX"  => 0.3189
  "WFC"   => 0.3693
  "PEP"   => 0.3854
  ⋮       => ⋮
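
Task 3 asks for `top-N` ticker picks; at the collective level, one simple reading is to sort `frequency_dictionary` by value. The sketch below uses a subset of the values printed above (the full output is truncated), so its result reflects only that subset; `top_picks` is a hypothetical helper.

```julia
# A subset of frequency_dictionary as printed above.
freq = Dict("MSFT" => 0.525, "SPY" => 0.4807, "AMD" => 0.4083, "TSLA" => 0.407,
    "SRE" => 0.3943, "MMM" => 0.2498, "C" => 0.2594)

# Hypothetical helper: sort (ticker => frequency) pairs by frequency, highest
# first, and return the first n ticker symbols.
function top_picks(d::Dict{String,Float64}, n::Int)
    ranked = sort(collect(d), by = last, rev = true)
    return first.(ranked[1:min(n, length(ranked))])
end

println(top_picks(freq, 3))   # ["MSFT", "SPY", "AMD"]
```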

## Disclaimer and Risks
__This content is offered solely for training and informational purposes__. No offer or solicitation to buy or sell securities or derivative products or any investment or trading advice or strategy is made, given, or endorsed by the teaching team. 

__Trading involves risk__. Carefully review your financial situation before investing in securities, futures contracts, options, or commodity interests. Past performance, whether actual or indicated by historical tests of strategies, is no guarantee of future performance or success. Trading is generally inappropriate for someone with limited resources, investment or trading experience, or a low risk tolerance. Only risk capital that is not required for living expenses.

__You are fully responsible for any investment or trading decisions you make__. Such decisions should be based solely on evaluating your financial circumstances, investment or trading objectives, risk tolerance, and liquidity needs.