# Installing the Package
This notebook introduces the basics of using the Bandits.jl package.

First install `Bandits.jl` as below:

In [None]:
Pkg.add( "Bandits" )  # package names are passed without the ".jl" suffix

# Usage
To start using the package, load it with `using`:

In [None]:
using Bandits

The package is divided into three submodules:
* Algorithms
* Arms
* Experiments

`Algorithms` includes all the available algorithms. `Arms` includes the available arm models. `Experiments` is meant to hold code for running experiments and is a work in progress.

## Defining a bandit
For this demo, we will first create a bandit with 5 Bernoulli arms. You do not need a bandit with explicitly specified arms like this to use the algorithms, but this kind of bandit is useful for benchmarking new algorithms.
A bandit is an array of arms and can be defined as below:

In [None]:
bandit = [ Arms.Bernoulli(0.30),
            Arms.Bernoulli(0.40),
            Arms.Bernoulli(0.50),
            Arms.Bernoulli(0.60),
            Arms.Bernoulli(0.70)  ]

We can check the number of arms in the bandit with:

In [None]:
no_of_arms = length( bandit )

So we have a 5-armed Bernoulli bandit.

Each arm has three functions associated with it:
* `pull!()` - Pulls the arm and returns the reward associated with it. `pull!()` also changes the internal state of the arm if it is a Markovian/non-stationary arm.
* `tick!()` - Advances the internal state of a Markovian/non-stationary arm. It is not guaranteed to return anything meaningful; depending on the internal state, the return value may be junk.
* `reset!()` - Resets the internal state of a Markovian/non-stationary arm.
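
To make the roles of these three functions concrete, here is a minimal sketch of a toy non-stationary arm. `DecayingArm` and its decay factor are hypothetical and are not part of Bandits.jl; the package's own arm models may be implemented quite differently.

```julia
# Toy arm for illustration only; it is NOT one of the package's arm models.
# Its success probability shrinks a little at every time step.
mutable struct DecayingArm
    p0::Float64   # initial success probability
    p::Float64    # current success probability (the internal state)
end
DecayingArm(p0) = DecayingArm(p0, p0)

# Advance the internal state by one step without producing a reward.
tick!(arm::DecayingArm) = (arm.p *= 0.99; nothing)

# Draw a reward from the current state, then advance the state.
function pull!(arm::DecayingArm)
    reward = rand() < arm.p ? 1 : 0
    tick!(arm)
    return reward
end

# Restore the initial state, e.g. between rounds of an experiment.
reset!(arm::DecayingArm) = (arm.p = arm.p0; nothing)
```

For a stationary arm such as a Bernoulli arm there is no internal state to change, so only the reward returned by the pull matters.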

To pull the first arm:

In [None]:
Arms.pull!( bandit[1] )

The line above returns 0 or 1 at random, according to the arm's underlying probability distribution.
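
As a rough illustration of what such a draw looks like, the sketch below samples a Bernoulli reward directly with `rand()`. `bernoulli_pull` is a hypothetical helper written for this notebook and is not part of Bandits.jl.

```julia
# Hypothetical helper, not part of Bandits.jl: draw a Bernoulli reward.
# Returns 1 with probability p and 0 otherwise.
bernoulli_pull(p) = rand() < p ? 1 : 0

# Averaging many pulls should give a value close to p.
n = 10_000
empirical_mean = sum(bernoulli_pull(0.30) for _ in 1:n) / n
```

With this many pulls, `empirical_mean` should land close to 0.30, the success probability of the first arm in the bandit above.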

## Using Algorithms
Every algorithm has its own initializer, which depends on the parameters of that algorithm. Usually, the first parameter of the initializer is the number of arms of the bandit. As an example, an instance of the $\epsilon$-Greedy algorithm with $\epsilon = 0.10$ can be created as:

In [None]:
alg1 = Algorithms.epsGreedy( no_of_arms, 0.10 )
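
For intuition, the usual $\epsilon$-Greedy rule picks a uniformly random arm with probability $\epsilon$ and otherwise picks the arm with the highest estimated reward. The sketch below shows that rule on its own; `eps_greedy_choice` is a hypothetical function, and the package's implementation may differ in details such as tie-breaking.

```julia
# Hypothetical sketch of the epsilon-Greedy selection rule; this is not the
# package's code, and its implementation may differ in the details.
function eps_greedy_choice(estimates, eps)
    if rand() < eps
        return rand(1:length(estimates))   # explore: uniformly random arm
    else
        return findmax(estimates)[2]       # exploit: arm with the highest estimate
    end
end

estimates = [0.2, 0.5, 0.4]
eps_greedy_choice(estimates, 0.0)   # eps = 0 never explores, so this picks arm 2
```

A larger `eps` spends more pulls on exploration, which helps find the best arm sooner but lowers the long-run average reward.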

Now, let's run an experiment of $750$ timesteps and average it over $1000$ runs to get an average behaviour.

In [None]:
noOfRounds    = 1000
noOfTimesteps = 750

Create an array to hold the results of each round of play.

In [None]:
observations  = zeros( noOfTimesteps, noOfRounds );

Now we can run the $\epsilon$-Greedy algorithm over this bandit as below:

In [None]:
for _round = 1:noOfRounds
    Algorithms.reset!( alg1 )
    for arm ∈ bandit
        Arms.reset!( arm )
    end
    for _n = 1:noOfTimesteps
        armToPull = Algorithms.getArmIndex( alg1 )
        reward    = Arms.pull!( bandit[armToPull] )
        Algorithms.updateReward!( alg1, reward )
        observations[_n,_round] = reward
    end
end

The code above runs the algorithm for 750 timesteps in each of the 1000 rounds and saves the obtained rewards into `observations`. Note that we need to `reset!()` the algorithm and the arms between rounds.

To see the average behaviour, we can plot the average reward.

In [None]:
avgReward = mean( observations, 2 )

using PyPlot
plot( 1:noOfTimesteps, avgReward, label = Algorithms.info_str(alg1) )
legend()
grid()
PyPlot.ylabel( "Avg. Reward" )
PyPlot.xlabel( "Timesteps" );
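
In symbols, writing $x_{t,r}$ for the reward stored in `observations[t, r]` and $R$ for `noOfRounds` (notation introduced here for illustration only), the curve plotted above is the per-timestep average over rounds:

$$\bar{x}_t = \frac{1}{R} \sum_{r=1}^{R} x_{t,r}, \qquad t = 1, \dots, 750.$$

Since the best arm of this bandit has mean $0.70$, no algorithm can achieve an expected reward above $0.70$ per timestep, so the curve should stay at or below that value, up to sampling noise.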

Congrats! You have successfully completed the basics of the Bandits.jl package. You can go to the documentation page of the package to explore the available algorithms and arm models.

# Comparing Multiple Algorithms
You can also easily compare the performance of multiple algorithms with the package. First we'll look at the actual code for doing it. Later we will look into the `Experiments` section to automate this.

As above, we'll start by defining a bandit:

In [None]:
bandit1 = [ Arms.Bernoulli(0.25), Arms.Bernoulli(0.35), Arms.Bernoulli(0.45),
            Arms.Bernoulli(0.55), Arms.Bernoulli(0.65), Arms.Bernoulli(0.75) ]

no_of_arms = length( bandit1 )

We can define an array of the algorithms we want to test, along with their associated parameters, as:

In [None]:
test_algs = [ Algorithms.epsGreedy( no_of_arms, 0.10),
              Algorithms.epsGreedy( no_of_arms, 0.20),
              Algorithms.UCB1( no_of_arms ),
              Algorithms.TS( no_of_arms )    ];
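
For context, the textbook UCB1 rule pulls the arm with the largest index, where the index is the arm's empirical mean plus an exploration bonus that shrinks as the arm is pulled more often. The sketch below computes that index; `ucb1_index` is a hypothetical helper, and whether `Algorithms.UCB1` uses exactly this bonus term is an assumption, not something confirmed here.

```julia
# Hypothetical helper, not part of Bandits.jl: textbook UCB1 index of an arm.
#   mean_reward : empirical mean reward of the arm so far
#   pulls       : number of times the arm has been pulled (must be >= 1)
#   t           : total number of pulls made so far across all arms
ucb1_index(mean_reward, pulls, t) = mean_reward + sqrt(2 * log(t) / pulls)

# With equal empirical means, the less-pulled arm gets the larger index,
# so it is the one tried next.
ucb1_index(0.5, 10, 100) < ucb1_index(0.5, 2, 100)   # true
```

Unlike $\epsilon$-Greedy, this rule has no exploration parameter to tune, which is consistent with `Algorithms.UCB1` taking only the number of arms.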

We can run the experiment as:

In [None]:
fig = figure()

for _alg ∈ test_algs
    observations = zeros( noOfTimesteps, noOfRounds )
    for _r = 1:noOfRounds
        Algorithms.reset!( _alg )
        for _arm ∈ bandit1
            Arms.reset!( _arm )
        end
        for _n = 1:noOfTimesteps
            armToPull = Algorithms.getArmIndex( _alg )
            reward    = Arms.pull!( bandit1[armToPull] )
            Algorithms.updateReward!( _alg, reward )
            
            observations[_n,_r] = reward
        end
    end
    avgReward = mean( observations, 2 );
    plot( 1:noOfTimesteps, avgReward, label = Algorithms.info_str(_alg) )
end
ylabel( "Avg. Reward" )
xlabel( "Timesteps" )
title( "Comparison Plot (Averaged over $noOfRounds runs)" )
ax = gca()
ax[:set_ylim]( [0.00, 1.00] )
legend()
grid()

You can explore the above code by changing the bandit model and the algorithms being compared.