_**Monte Carlo simulations are like unit tests for bandit algorithms**_

- Bandit algorithms have to actively select which data to acquire and analyze that data in real time. They exemplify two types of learning that are not present in standard ML examples: active learning, in which the algorithm chooses which data it receives; and online learning, in which the algorithm analyzes data as it arrives and provides results on the fly.
- The behavior of the algorithm depends on the data it sees, but the data the algorithm sees depends on the behavior of the algorithm.

- A Monte Carlo simulation lets our implementation of a bandit algorithm actively make decisions about which data it will receive, because the simulation can provide simulated data in real time to the algorithm for analysis.

- In short, we are going to deal with the feedback cycle by coding up both our bandit algorithm and a simulation of the bandit's arms that the algorithm has to select between.
- The two pieces of code then work together to generate an example of how the algorithm might really function in production.
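
The feedback cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two arm probabilities in `true_rates` are made up, and the purely greedy selection rule is chosen for brevity; it is not one of the algorithms tested in this notebook.

```python
import random

rng = random.Random(0)        # local generator so the sketch is reproducible
true_rates = [0.2, 0.8]       # hypothetical arm probabilities, unknown to the learner
counts = [0, 0]               # how often each arm has been pulled
totals = [0.0, 0.0]           # total reward collected from each arm

for t in range(1000):
    # Behavior depends on data: try each arm once, then pick the best observed mean.
    if 0 in counts:
        arm = counts.index(0)
    else:
        observed = [totals[i] / counts[i] for i in range(len(counts))]
        arm = observed.index(max(observed))

    # Data depends on behavior: only the chosen arm produces a reward.
    reward = 1.0 if rng.random() < true_rates[arm] else 0.0
    counts[arm] += 1
    totals[arm] += reward

print(counts)  # pulls per arm; the split depends on the early rewards
```

Because the learner only ever sees rewards from arms it chose, an unlucky first draw can shape everything it observes afterwards, which is why the algorithm and the simulated arms have to be run together.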


_**Optimizing click-through rates**_
- Every time we show someone an ad, we will imagine that there's a fixed probability that they will click on the ad. The bandit algorithm will then estimate this probability and try to decide on a strategy for showing ads that maximizes the click-through rate.
_**Conversion rates for new users**_
- Every time a new visitor comes to our site who is not already a registered user, we will imagine that there's a fixed probability that they will register as a user after seeing the landing page. We will then estimate this probability and try to decide on a strategy for maximizing our conversion rate.
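
Both scenarios rest on the same assumption: each impression or visit succeeds with a fixed probability. A minimal sketch of estimating such a probability from simulated impressions follows; the function name `estimate_click_rate` and the 5% rate are hypothetical and used only for illustration.

```python
import random

def estimate_click_rate(true_rate, n_impressions, seed=0):
    """Simulate n_impressions ad views and return the observed click-through rate."""
    rng = random.Random(seed)  # local generator so the sketch is reproducible
    # Each impression is a click with probability true_rate.
    clicks = sum(1 for _ in range(n_impressions) if rng.random() < true_rate)
    return clicks / n_impressions

# The estimate is noisy for few impressions and settles near the true rate for many.
for n in (10, 1000, 100000):
    print(n, estimate_click_rate(0.05, n))
```

The noise at small sample sizes is the reason a bandit algorithm cannot simply trust its first few observations.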


In [1]:

import random

class BernoulliArm():
    '''
        Simulates a hypothetical arm.
        BernoulliArm: an arm that rewards you with a value of 1 with probability p
        and with a value of 0 the rest of the time (probability 1 - p).
    '''
    def __init__(self, p):
        self.p = p  # probability of a reward of 1
    def __repr__(self):
        return( 'BernoulliArm probability: {0:.2f}'.format(self.p) )
    def draw(self):
        # random.random() is uniform on [0, 1), so it exceeds p with probability 1 - p
        if random.random() > self.p:
            return 0.0
        else:
            return 1.0
    
BernoulliArm(0.2)

BernoulliArm probability: 0.20

In [2]:
def testing_algorithm(  algo, arms, num_sims, horizon ):
        '''
            algo : A bandit algorithm we want to test ( epsilon-greedy, UCB, Softmax, ...)
            arms : An array of arms we want to simulate draws from
            num_sims : A fixed number of simulations to run to average over the noise in each simulation
            horizon : The number of times each algorithm is allowed to pull on arms during each simulation. Any algorithm that's
            not terrible will eventually learn which arm is best; the interesting thing to study in a simulation is whether
            an algorithm does well when it only has 100 ( or 100k ) tries to find the best arm
        '''
        chosen_arms = [ 0.0 for i in range(num_sims * horizon) ] 
        rewards = [ 0.0 for i in range( num_sims *horizon)]
        cumulative_rewards = [ 0.0 for i in range(num_sims * horizon) ]
        sim_nums = [ 0.0 for i in range(num_sims*horizon)]
        times = [ 0.0 for i in range(num_sims*horizon)]
        
        for sim in range( num_sims ):
            sim += 1 # because range generates numbers from 0
            algo.initialize( len(arms) )
            
            for t in range( horizon) :
                t += 1
                index = ( sim - 1 ) * horizon + t - 1 
                
                sim_nums[index] = sim
                times[index] = t
                
                chosen_arm = algo.select_arm() # the algorithm returns the index of the arm to pull
                chosen_arms[index] = chosen_arm
                
                reward = arms[chosen_arms[index]].draw() # draws based on success rate
                rewards[index] = reward
                
                if t == 1 :
                    cumulative_rewards[index] = reward
                else:
                    cumulative_rewards[index] = cumulative_rewards[index-1] + reward
                    
                algo.update( chosen_arm, reward)
        return [ sim_nums, times, chosen_arms, rewards, cumulative_rewards]
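
The test harness above relies on only three methods of `algo`: `initialize(n_arms)`, `select_arm()` and `update(chosen_arm, reward)`. As a rough sketch of an object with that interface, here is a hypothetical `RandomChoice` baseline. It is not part of this notebook or of the `epsilonGreedy`, `SoftMax` or `UpperConfidenceBound` modules, and the deterministic reward rule in the demo loop is a stand-in for real arms.

```python
import random

class RandomChoice:
    """Hypothetical baseline: picks an arm uniformly at random and tracks running means."""

    def initialize(self, n_arms):
        self.counts = [0] * n_arms      # number of pulls per arm
        self.values = [0.0] * n_arms    # running mean reward per arm

    def select_arm(self):
        return random.randrange(len(self.counts))

    def update(self, chosen_arm, reward):
        self.counts[chosen_arm] += 1
        n = self.counts[chosen_arm]
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.values[chosen_arm] += (reward - self.values[chosen_arm]) / n

random.seed(0)
algo = RandomChoice()
algo.initialize(3)
for _ in range(300):
    arm = algo.select_arm()
    reward = 1.0 if arm == 2 else 0.0   # stand-in: only arm 2 ever pays out
    algo.update(arm, reward)

print(algo.counts, algo.values)
```

In the notebook, an object like this could be passed straight to the harness, for example `testing_algorithm(RandomChoice(), arms, num_sims, horizon)`, to give a baseline that never learns.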

In [6]:
'''
    Simulation for the standard epsilon-greedy algorithm with 5 arms whose success rates are [0.1, 0.1, 0.1, 0.1, 0.9]
'''
import numpy
import random
import math

import nbimporter
import epsilonGreedy

from epsilonGreedy import EpsilonGreedy

random.seed(1)

means = [0.1, 0.1, 0.1, 0.1, 0.9]
n_arms = len(means)
random.shuffle(means)
arms = [ BernoulliArm(mu) for mu in  means ]
print("Best arm is {0} with success rate of {1:.1f}". format( means.index( max(means) ) ,  max(means)))

fPointer = open('D:/BanditsSimulationDataSets/epsilon-greedy-with-multiple-epsilons.tsv', 'w')
for epsilon in [ 0.1, 0.2, 0.3, 0.4, 0.5] :
    algo = epsilonGreedy.EpsilonGreedy( epsilon )
    algo.initialize ( n_arms )
    results = testing_algorithm( algo, arms, 5000, 250)
    for i in range( len(results[0])):
        fPointer.write( str(epsilon) + '\t' )
        fPointer.write( '\t'.join( [ str( results[j][i] ) for j in range( len(results) )])  + '\n')
fPointer.close()

Best arm is 2 with success rate of 0.9


In [7]:
'''
    Simulation for the annealing epsilon-greedy algorithm with 5 arms whose success rates are [0.1, 0.1, 0.1, 0.1, 0.9]
'''

import numpy
import random
import math

import nbimporter
import epsilonGreedy

from epsilonGreedy import AnnealingEpsilonGreedy
from DebuggingBanditAlgorithms import BernoulliArm
from DebuggingBanditAlgorithms import testing_algorithm
random.seed(1)
means = [ 0.1, 0.1, 0.1, 0.1, 0.9]
n_arms = len(means)
random.shuffle(means)
arms = [  BernoulliArm(mu) for mu in means ]

f = open( 'D:/BanditsSimulationDataSets/annealing_epsilon_greedy_results.tsv', 'w')

algo = AnnealingEpsilonGreedy ( )
algo.initialize(n_arms)
results = testing_algorithm ( algo, arms, 5000, 250 )
for i in range( len(results[0]) ):
    f.write( '\t'.join( [ str( results[j][i]) for j in range( len(results) ) ] )  + '\n')
f.close()
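
Each row written above has five tab-separated columns, in the order returned by `testing_algorithm`: simulation number, time step, chosen arm, reward and cumulative reward. One way to summarize such a file is to average the reward at each time step across simulations. The sketch below assumes that five-column layout; the helper name `average_reward_by_time` and the four sample rows are made up for illustration and are not real results.

```python
import csv
import io
from collections import defaultdict

def average_reward_by_time(tsv_file):
    """Return {time_step: mean reward across simulations} for a 5-column results file."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sim_num, t, chosen_arm, reward, cumulative in csv.reader(tsv_file, delimiter='\t'):
        totals[int(t)] += float(reward)
        counts[int(t)] += 1
    return {t: totals[t] / counts[t] for t in sorted(totals)}

# Made-up sample: two simulations with a horizon of two time steps each.
sample = io.StringIO(
    "1\t1\t0\t0.0\t0.0\n"
    "1\t2\t2\t1.0\t1.0\n"
    "2\t1\t2\t1.0\t1.0\n"
    "2\t2\t2\t1.0\t2.0\n"
)
print(average_reward_by_time(sample))  # {1: 0.5, 2: 1.0}
```

Files that carry a leading parameter column, such as the epsilon or temperature runs, would need that extra field unpacked first.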

In [9]:
'''
    Simulation for the standard softmax algorithm with 5 arms whose success rates are [0.1, 0.1, 0.1, 0.1, 0.9]
'''
import numpy
import random
import math

import nbimporter
import epsilonGreedy

from SoftMax import SoftMax
from DebuggingBanditAlgorithms import BernoulliArm
from DebuggingBanditAlgorithms import testing_algorithm
random.seed(1)
means = [ 0.1, 0.1, 0.1, 0.1, 0.9]
n_arms = len(means)
random.shuffle(means)
arms = [  BernoulliArm(mu) for mu in means ]

f = open( 'D:/BanditsSimulationDataSets/standard_softmax_results.tsv', 'w')
for temperature in [ 0.1, 0.2, 0.3, 0.4, 0.5] :
    algo = SoftMax ( temperature )
    algo.initialize(n_arms)
    results = testing_algorithm ( algo, arms, 5000, 250 )
    for i in range( len(results[0]) ):
        f.write( str(temperature)  + '\t')
        f.write( '\t'.join( [ str( results[j][i]) for j in range( len(results) ) ] )  + '\n')
f.close()

In [10]:
'''
    Simulation for the annealing softmax algorithm with 5 arms whose success rates are [0.1, 0.1, 0.1, 0.1, 0.9]
'''

import numpy
import random
import math

import nbimporter
import SoftMax

from SoftMax import AnnealingSoftMax
from DebuggingBanditAlgorithms import BernoulliArm
from DebuggingBanditAlgorithms import testing_algorithm
random.seed(1)
means = [ 0.1, 0.1, 0.1, 0.1, 0.9]
n_arms = len(means)
random.shuffle(means)
arms = [  BernoulliArm(mu) for mu in means ]

f = open( 'D:/BanditsSimulationDataSets/annealing_softmax_results.tsv', 'w')

algo = AnnealingSoftMax ( )
algo.initialize(n_arms)
results = testing_algorithm ( algo, arms, 5000, 250 )
for i in range( len(results[0]) ):
    f.write( '\t'.join( [ str( results[j][i]) for j in range( len(results) ) ] )  + '\n')
f.close()

In [1]:
'''
    Simulation for the UCB1 algorithm with 5 arms whose success rates are [0.1, 0.1, 0.1, 0.1, 0.9]
'''

import numpy
import random
import math

import nbimporter
import UpperConfidenceBound

from UpperConfidenceBound import UCB1
from DebuggingBanditAlgorithms import BernoulliArm
from DebuggingBanditAlgorithms import testing_algorithm
random.seed(1)
means = [ 0.1, 0.1, 0.1, 0.1, 0.9]
n_arms = len(means)
random.shuffle(means)
arms = [  BernoulliArm(mu) for mu in means ]

f = open( 'D:/BanditsSimulationDataSets/ucb1_results.tsv', 'w')

algo = UCB1 ( )
algo.initialize(n_arms)
results = testing_algorithm ( algo, arms, 5000, 250 )
for i in range( len(results[0]) ):
    f.write( '\t'.join( [ str( results[j][i]) for j in range( len(results) ) ] )  + '\n')
f.close()


Importing Jupyter notebook from UpperConfidenceBound.ipynb
Importing Jupyter notebook from DebuggingBanditAlgorithms.ipynb
Importing Jupyter notebook from epsilonGreedy.ipynb
Importing Jupyter notebook from SoftMax.ipynb
