## `Example`: Developing a Bernoulli Binary Bandit Ticker Picker Agent
In this example, we'll develop a collection of agents to advise us which stocks we should include in a portfolio $\mathcal{P}$ by solving a multi-armed binary bandit problem using [ϵ-Greedy Thompson Sampling](https://arxiv.org/abs/1707.02038). The `K` actions (the arms of the bandit) are the potential `tickers` in the portfolio $\mathcal{P}$. After their analysis, we'll ask the agents to rank-order their preferred `tickers`.

### Problem
* We have `N` agents independently analyzing daily Open High Low Close (OHLC) data sequences and rank-ordering their belief that the ticker `XYZ` will return at least the risk-free rate in the next step. 
* We then sample the `world`. If the ticker `XYZ` returns greater than or equal to the risk-free rate in the sample period, the agent receives a reward of `+1`. Otherwise, the agent receives a reward of `0`.
* Each agent develops a distribution of beliefs for the probability $p_{a}$, i.e., the probability that `ticker` will meet or beat the risk-free rate, based on this experimentation. These beliefs are encoded in the parameters of a [Beta distribution](https://en.wikipedia.org/wiki/Beta_distribution).
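
To make the belief update concrete, here is a minimal sketch of the standard conjugate Beta-Bernoulli update, assuming that a reward of `+1` increments $\alpha$ and a reward of `0` increments $\beta$. The `update` and `run_updates` helpers and the reward sequence are hypothetical, they are not part of the notebook's library code.

```julia
# Conjugate Beta-Bernoulli update (assumed form): r = 1 increments α, r = 0 increments β.
update(α::Float64, β::Float64, r::Int) = (α + r, β + (1 - r))

# Apply a sequence of 0/1 rewards, starting from the uniform prior Beta(1, 1).
function run_updates(rewards::Vector{Int}; α::Float64 = 1.0, β::Float64 = 1.0)
    for r in rewards
        α, β = update(α, β, r)
    end
    return (α, β)
end

# Hypothetical reward sequence: three successes and two failures.
α, β = run_updates([1, 0, 1, 1, 0])

# The posterior mean α / (α + β) is a point estimate of p_a.
posterior_mean = α / (α + β)
println((α, β, posterior_mean))
```

With three successes and two failures, the belief moves from `Beta(1, 1)` to `Beta(4, 3)`, so the estimate of $p_{a}$ is $4/7$.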

## Setup

In [1]:
include("Include.jl");

## Prerequisites: Load and clean the historical dataset
We gathered a daily open-high-low-close `dataset` for each firm in the [S&P500](https://en.wikipedia.org/wiki/S%26P_500) from `01-03-2018` to `12-29-2023`, along with data for a few exchange-traded funds and volatility products during that time.

In [2]:
original_dataset = MyMarketDataSet() |> x-> x["dataset"];

### Clean the data
Not all of the tickers in our dataset have the maximum number of trading days for various reasons, e.g., acquisition or de-listing events. Let's collect only those tickers with the maximum number of trading days.

* First, let's compute the number of records for a company that we know has the maximum value, e.g., `AAPL`, and save that value in the `maximum_number_trading_days` variable:

In [3]:
maximum_number_trading_days = original_dataset["AAPL"] |> nrow

1508

Now, let's iterate through our data and collect only those tickers that have `maximum_number_trading_days` records. We save that data in the `dataset::Dict{String,DataFrame}` variable:

In [4]:
dataset = Dict{String,DataFrame}();
for (ticker,data) ∈ original_dataset
    if (nrow(data) == maximum_number_trading_days)
        dataset[ticker] = data;
    end
end
dataset;

Let's get a list of the firms that we have in the cleaned-up `dataset`, and save it in the `all_tickers` array:

In [5]:
all_tickers = keys(dataset) |> collect |> sort;
K = length(all_tickers)

460

## Initialize the `world` function
The `world` function takes action `a`, i.e., chooses a `ticker` from the collection of stocks/ETFs that we are exploring, and asks whether the return of this `ticker` over the sample period (one trading day by default) was greater than or equal to the risk-free rate.
* If the daily return is greater than or equal to the risk-free rate, the `agent` receives a reward of `r = 1`. Otherwise, the `agent` receives a `r = 0` reward.
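
As a quick worked example of this reward rule, the sketch below compares the growth of 1 USD invested in a ticker against 1 USD invested at the risk-free rate over a single trading day. The two prices are hypothetical (they are not taken from the dataset), and the annual rate is an illustrative value.

```julia
# Hypothetical prices on two consecutive trading days (not from the dataset).
P₁, P₂ = 100.00, 100.05
Δt = 1.0 / 252.0          # one trading day, measured in years
risk_free_rate = 0.047    # illustrative annualized, continuously compounded rate

# Annualized log return over the day.
μ = (1 / Δt) * log(P₂ / P₁)

# Wealth after one day per 1 USD invested in each alternative.
W_ticker = exp(μ * Δt)                  # reduces to P₂ / P₁ for a one-day horizon
W_risk_free = exp(risk_free_rate * Δt)

# Binary reward: 1 if the ticker at least matches the risk-free investment.
reward = W_ticker >= W_risk_free ? 1 : 0
println((μ, W_ticker, W_risk_free, reward))
```

For a one-day horizon, the test reduces to checking whether the price ratio `P₂/P₁` is at least `exp(risk_free_rate*Δt)`, so even a small daily gain clears the threshold.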

In [6]:
function world(action::Int64, start::Int64, data::Dict{String, DataFrame}, tickers::Array{String,1}; 
        Δt::Float64 = (1.0/252.0), risk_free_rate::Float64 = 0.05, buffersize::Int64 = 1)::Int64

    # initialize -
    result_flag = 0;

    # length of the investment horizon in years (buffersize trading days) -
    T = buffersize*Δt;

    # grab the ticker symbol that corresponds to this action -
    ticker_symbol = tickers[action];
    
    # compute the mean annualized log return of the VWAP over the horizon -
    price_df = data[ticker_symbol];
    time = range(start+1,(start+buffersize), step=1) |> collect
    buffer = Array{Float64,1}();
    for t ∈ time
        P₁ = price_df[t-1,  :volume_weighted_average_price]
        P₂ = price_df[t, :volume_weighted_average_price]
        R = (1/Δt)*log(P₂/P₁);
        push!(buffer,R);
    end
    μ = mean(buffer);

    # if we invested 1 USD in each alternative, how much would we have at the end of the horizon? -
    W_risk_free = exp(risk_free_rate*T);
    W_ticker = exp(μ*T);
    
    # are we better or worse relative to the risk-free investment?
    if (W_ticker >= W_risk_free)
        result_flag = 1;
    end

    # default -
    return result_flag;
end;

Set the `risk_free_rate` variable:

In [7]:
risk_free_rate = 0.047;

## Setup the `ticker-picker` agent

First, let's specify the tickers that we want to examine in the `tickers` array, and store the number of tickers in the `K` variable:

In [8]:
tickers = ["MRK", "JNJ", "MET", "NFLX", "AAPL", "AMD", "MU", "INTC", "MSFT", "SPY", "SPYD", "MMM",
    "UNH", "JPM", "OXY", "TSLA", "PEP", "LMT", "CMCSA", "ECL", "SRE", "BAC", "C", "WFC", "QQQ", "KR", "NOC", "GS"];
K = length(tickers);

Next, we construct the `EpsilonSamplingModel` instance, which holds information about the ϵ-greedy sampling approach. In addition to the number of actions `K` and the Beta parameters `α` and `β`, the `EpsilonSamplingModel` type has an `ϵ` field, which controls the approximate fraction of `exploration` steps the algorithm takes; `exploration` steps are purely random.

In [9]:
model = EpsilonSamplingModel()
model.K = K; # tickers
model.α = ones(K); # initialize to uniform values
model.β = ones(K); # initialize to uniform values
model.ϵ = 0.3;

## Run a single `ticker-picker` agent and explore its preferences
Let's run a single `ticker-picker` agent and examine what it returns using the `sample(...)` function. 
* The `sample(...)` function takes the agent model `model::EpsilonSamplingModel`, the `world::Function`, the cleaned `dataset`, and your list of `tickers`, along with the `horizon` parameter, i.e., how many iterations we want the search to run for, and the `risk_free_rate`.
* The `sample(...)` function returns a dictionary whose keys are the iteration indices and whose values are `K`$\times$`2` matrices holding the $(\alpha,\beta)$ parameters for each ticker at that iteration.
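
The internals of `sample(...)` are not shown in this notebook, so the following is only a sketch of how an ϵ-greedy Thompson sampling loop could look under stated assumptions. The names `toy_sample`, `toy_world`, `draw_beta`, and `p_true` are hypothetical, the world is simulated with made-up success probabilities instead of market data, and we assume column 1 of each snapshot holds $\alpha$ and column 2 holds $\beta$.

```julia
using Random

# Draw from Beta(a, b) for positive integer a, b: the a-th smallest of
# (a + b - 1) uniform draws follows Beta(a, b). Only valid for integer parameters.
draw_beta(rng::AbstractRNG, a::Int, b::Int) = sort(rand(rng, a + b - 1))[a]

# Hypothetical stand-in for `sample(...)`; `world_fn(a)` must return 0 or 1.
function toy_sample(rng::AbstractRNG, world_fn::Function, K::Int;
        horizon::Int = 100, ϵ::Float64 = 0.3)

    α = ones(Int, K)   # uniform prior Beta(1, 1) for every arm
    β = ones(Int, K)
    results = Dict{Int, Matrix{Float64}}()
    for t in 1:horizon
        if rand(rng) < ϵ
            a = rand(rng, 1:K)   # explore: choose an arm uniformly at random
        else
            # exploit: draw once from each arm's Beta belief, take the largest draw
            a = argmax([draw_beta(rng, α[k], β[k]) for k in 1:K])
        end
        r = world_fn(a)          # binary reward from the world
        α[a] += r
        β[a] += 1 - r
        results[t] = Float64.(hcat(α, β))   # K × 2 snapshot: column 1 = α, column 2 = β
    end
    return results
end

rng = MersenneTwister(1234)
p_true = [0.3, 0.5, 0.7]                       # made-up success probabilities
toy_world(a) = rand(rng) < p_true[a] ? 1 : 0   # simulated Bernoulli world
results = toy_sample(rng, toy_world, length(p_true); horizon = 500)
final = results[500]
```

Each iteration adds exactly one count to either $\alpha$ or $\beta$ of the chosen arm, so the entries of the final snapshot sum to `2K + horizon`.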

In [10]:
time_sample_results_dict_eps = sample(model, world, dataset, tickers; 
    horizon = (maximum_number_trading_days - 1), risk_free_rate = risk_free_rate)

Dict{Int64, Matrix{Float64}} with 1507 entries:
  1144 => [13.0 9.0; 11.0 13.0; … ; 25.0 21.0; 15.0 12.0]
  1175 => [14.0 10.0; 11.0 13.0; … ; 26.0 21.0; 15.0 12.0]
  719  => [8.0 7.0; 9.0 11.0; … ; 22.0 16.0; 5.0 6.0]
  1028 => [8.0 9.0; 11.0 13.0; … ; 25.0 21.0; 13.0 12.0]
  699  => [8.0 7.0; 9.0 11.0; … ; 22.0 14.0; 5.0 6.0]
  831  => [8.0 9.0; 11.0 12.0; … ; 23.0 17.0; 7.0 6.0]
  1299 => [19.0 15.0; 11.0 13.0; … ; 26.0 21.0; 18.0 14.0]
  1438 => [24.0 18.0; 11.0 13.0; … ; 28.0 23.0; 22.0 17.0]
  1074 => [9.0 9.0; 11.0 13.0; … ; 25.0 21.0; 14.0 12.0]
  319  => [4.0 3.0; 3.0 3.0; … ; 12.0 5.0; 4.0 4.0]
  687  => [8.0 7.0; 9.0 11.0; … ; 22.0 14.0; 5.0 6.0]
  1199 => [16.0 13.0; 11.0 13.0; … ; 26.0 21.0; 16.0 13.0]
  185  => [1.0 2.0; 2.0 3.0; … ; 2.0 2.0; 3.0 3.0]
  823  => [8.0 9.0; 11.0 12.0; … ; 23.0 17.0; 7.0 6.0]
  1090 => [11.0 9.0; 11.0 13.0; … ; 25.0 21.0; 14.0 12.0]
  420  => [4.0 4.0; 8.0 8.0; … ; 15.0 9.0; 4.0 5.0]
  1370 => [19.0 17.0; 11.0 13.0; … ; 28.0 21.0; 21.0 15.0]


In [11]:
Z = time_sample_results_dict_eps[1479]

28×2 Matrix{Float64}:
 24.0  18.0
 11.0  14.0
 18.0  16.0
 13.0  13.0
 29.0  22.0
  9.0  11.0
 13.0  13.0
  2.0   6.0
 10.0  15.0
 10.0  12.0
  7.0  11.0
 33.0  26.0
 38.0  28.0
  ⋮    
 51.0  38.0
 14.0  14.0
  4.0  10.0
 19.0  17.0
 12.0  14.0
 26.0  19.0
 37.0  30.0
 33.0  26.0
 29.0  24.0
 28.0  20.0
 28.0  24.0
 24.0  18.0

## Run a collection of `ticker-picker` agents and examine their preferences
Let's repeat the single-agent analysis with `N` agents by running the `sample(...)` method inside a `for` loop. We'll store the results at the `trading_day_index` time point in the `agent_specific_data::Array{Beta,2}` array, which has `K` rows and `number_of_agents` columns.
* The `agent_specific_data` array holds the `Beta` distributions for each agent, i.e., it holds the preferences for each agent (cols) for each `ticker` in our collection (rows).

In [12]:
number_of_agents = 10000;
trading_day_index = 1479
agent_specific_data = Array{Beta,2}(undef, K, number_of_agents);

for agent_index ∈ 1:number_of_agents
    
    # sample -
    time_sample_results_dict_eps = sample(model, world, dataset, tickers; horizon = (maximum_number_trading_days - 1), risk_free_rate = risk_free_rate);
    beta_array = build_beta_array(time_sample_results_dict_eps[trading_day_index]);

    # grab data for this agent -
    for k = 1:K
        agent_specific_data[k, agent_index] = beta_array[k]
    end
end

In [13]:
agent_specific_data[1,:] # Beta distributions for ticker 1 across all agents

10000-element Vector{Beta}:
 Beta{Float64}(α=33.0, β=32.0)
 Beta{Float64}(α=8.0, β=11.0)
 Beta{Float64}(α=26.0, β=23.0)
 Beta{Float64}(α=7.0, β=10.0)
 Beta{Float64}(α=20.0, β=18.0)
 Beta{Float64}(α=51.0, β=34.0)
 Beta{Float64}(α=11.0, β=16.0)
 Beta{Float64}(α=2.0, β=7.0)
 Beta{Float64}(α=6.0, β=12.0)
 Beta{Float64}(α=26.0, β=24.0)
 Beta{Float64}(α=21.0, β=20.0)
 Beta{Float64}(α=1.0, β=6.0)
 Beta{Float64}(α=44.0, β=29.0)
 ⋮
 Beta{Float64}(α=45.0, β=29.0)
 Beta{Float64}(α=9.0, β=12.0)
 Beta{Float64}(α=13.0, β=16.0)
 Beta{Float64}(α=53.0, β=35.0)
 Beta{Float64}(α=47.0, β=38.0)
 Beta{Float64}(α=17.0, β=16.0)
 Beta{Float64}(α=17.0, β=12.0)
 Beta{Float64}(α=11.0, β=13.0)
 Beta{Float64}(α=5.0, β=10.0)
 Beta{Float64}(α=2.0, β=7.0)
 Beta{Float64}(α=28.0, β=23.0)
 Beta{Float64}(α=30.0, β=30.0)

## The wisdom of the collective
Now that we have preferences for the `N` agents (encoded as `Beta` distributions for each ticker), let's develop a consensus belief about which tickers to include in our portfolio $\mathcal{P}$. First, let's compute the agent-specific rank of each ticker, where `rank = 1` is the best, and `rank = K` is the worst. We'll store these values in the `preference_rank_array` array.
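
The `preference(...)` function used in the next cell is not shown here, so as a rough illustration of rank-ordering, the sketch below ranks tickers by the posterior mean $\alpha/(\alpha+\beta)$ of each belief. This scoring rule is an assumption, and the $(\alpha,\beta)$ values are hypothetical. The output of the actual function contains repeated ranks within a row, so it presumably handles ties differently from this ordinal ranking.

```julia
# Hypothetical (α, β) pairs for four tickers held by a single agent.
α = [24.0, 11.0, 18.0, 2.0]
β = [18.0, 14.0, 16.0, 6.0]

# Score each ticker by its posterior mean (an assumed scoring rule).
scores = α ./ (α .+ β)

# Ordinal ranks: rank 1 goes to the highest score, rank K to the lowest.
ranks = invperm(sortperm(scores; rev = true))
println(ranks)
```

Here `sortperm` returns the ticker indices from best to worst, and `invperm` inverts that permutation so that `ranks[i]` is the rank of ticker `i`.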

In [14]:
preference_rank_array = Array{Int,2}(undef, number_of_agents, K);
for agent ∈ 1:number_of_agents
        
    # ask an agent for their preference ranking over the tickers -
    experience_distributions = agent_specific_data[:,agent]
    preference_vector = preference(experience_distributions, tickers) .|> x-> trunc(Int64,x) # trunc function is cool!
    
    # package -
    for i ∈ 1:K
        preference_rank_array[agent, i] = preference_vector[i];
    end
end
preference_rank_array

10000×28 Matrix{Int64}:
  7  16  27  26   6  19  25  14   1  …   3   9   4  23  17  13  20  22  21
 21   9  25  16  21  20   2  26   6     16  11  24  19  11   7  18  16  14
  6   5   9  22  25  12  12  14   9     24   1  23  17   4  15  20  11  27
 19  24  21   8   1  13   7  27  23      2  25  10  15  15  17   3  26  20
 11  25  28  18  20  17   2  27  12     15  24  21   9   7   9  22   3  14
  3  20   1  21   4   7  19  25  27  …  10  18  12  14   7  13  26   5  24
 18  27   2  19  17   7   6  11  19      4   2  25  23  16   2  23  10  13
 27   7   2   4  16  15  26  13  11     28   8  23  22  24   5  17   6  21
 24  11   2   8   6   9  24  18  21     13   1  28  17   3  26   5  20  26
  6   5  26   3  17  10  13  28  21     14  27   7  18   2   9   4  14  18
 11  18  16   4   5  22   1   2  25  …  19   9  21  13  13  10  17  26   7
 27  26   1  14   3  12  11  21  20     23   4  24  22  18   8   6  15  10
  1   4  18   8  21  26  19  23  11      3   9  13  22  10   2  25  27  28
 

### Compute the frequency dictionary
Let's count the number of times a `ticker` is ranked in the `top 10` across the `N` agents and then normalize by the number of agents, i.e., compute the frequency of being ranked in the `top 10`. We'll store this value in the `frequency_dictionary`, where the `ticker` is the key, and the frequency is the value.
* `Hypothesis`: Those `tickers` with high rank are likelier to beat an alternative `risk-free` investment, while low rank `tickers` do not outperform a `risk-free` alternative investment. 
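
To see the counting step on a small scale, here is a sketch that uses a toy rank matrix with four agents and three tickers. The matrix values and the `AAA`, `BBB`, `CCC` symbols are placeholders, and `count` is used as a compact alternative to `findall` followed by `length`.

```julia
# Toy rank matrix: 4 agents (rows) × 3 tickers (columns), hypothetical values.
toy_ranks = [ 1 12  5;
              3 11 20;
             15  2  8;
              9 10 30]
toy_tickers = ["AAA", "BBB", "CCC"]   # placeholder symbols
top_n = 10

# Fraction of agents that place each ticker at rank `top_n` or better.
toy_frequency = Dict(
    toy_tickers[i] => count(≤(top_n), toy_ranks[:, i]) / size(toy_ranks, 1)
    for i in eachindex(toy_tickers)
)
println(toy_frequency)
```

In this toy case, three of the four agents rank `AAA` in the top 10, so its frequency is `0.75`, while `BBB` and `CCC` each get `0.5`.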

In [15]:
frequency_dictionary = Dict{String,Float64}();
for i ∈ eachindex(tickers)
    
    # compute the frequency -
    freq = findall(x-> x ≤ 10, preference_rank_array[:,i]) |> x-> length(x) |> x-> x/number_of_agents
    
    # get the ticker -
    ticker = tickers[i];
    frequency_dictionary[ticker] = freq;
end

In [16]:
frequency_dictionary

Dict{String, Float64} with 28 entries:
  "MSFT"  => 0.5426
  "JPM"   => 0.3805
  "C"     => 0.248
  "MRK"   => 0.36
  "UNH"   => 0.3199
  "SPY"   => 0.4824
  "BAC"   => 0.3398
  "MU"    => 0.2878
  "SRE"   => 0.3922
  "TSLA"  => 0.404
  "KR"    => 0.39
  "AMD"   => 0.4057
  "INTC"  => 0.3212
  "MMM"   => 0.2587
  "NOC"   => 0.2922
  "ECL"   => 0.3266
  "LMT"   => 0.2731
  "GS"    => 0.2671
  "OXY"   => 0.2941
  "CMCSA" => 0.2858
  "JNJ"   => 0.3156
  "SPYD"  => 0.3549
  "NFLX"  => 0.3276
  "WFC"   => 0.3629
  "PEP"   => 0.3782
  ⋮       => ⋮

### Disclaimer and Risks
__This content is offered solely for training and  informational purposes__. No offer or solicitation to buy or sell securities or derivative products, or any investment or trading advice or strategy,  is made, given, or endorsed by the teaching team. 

__Trading involves risk__. Carefully review your financial situation before investing in securities, futures contracts, options, or commodity interests. Past performance, whether actual or indicated by historical tests of strategies, is no guarantee of future performance or success. Trading is generally inappropriate for someone with limited resources, investment or trading experience, or a low-risk tolerance.  Only risk capital that is not required for living expenses.

__You are fully responsible for any investment or trading decisions you make__. Such decisions should be based solely on your evaluation of your financial circumstances, investment or trading objectives, risk tolerance, and liquidity needs.