<a href="https://colab.research.google.com/github/vijaygwu/classideas/blob/main/BanditABTest.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

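```python
import numpy as np

class EpsilonGreedy:
    """Epsilon-greedy bandit: with probability epsilon explore a random arm,
    otherwise exploit the arm with the highest estimated reward."""

    def __init__(self, n_arms, epsilon=0.1):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.counts = np.zeros(n_arms)  # number of times each arm has been shown
        self.values = np.zeros(n_arms)  # running mean reward (conversion rate) per arm

    def select_arm(self):
        if np.random.random() < self.epsilon:
            # Explore: pick any arm uniformly at random
            return np.random.randint(self.n_arms)
        else:
            # Exploit: pick the best current estimate (np.argmax breaks ties toward index 0)
            return np.argmax(self.values)

    def update(self, chosen_arm, reward):
        self.counts[chosen_arm] += 1
        n = self.counts[chosen_arm]
        value = self.values[chosen_arm]
        # Incremental mean, equivalent to value + (reward - value) / n
        self.values[chosen_arm] = ((n - 1) / n) * value + (1 / n) * reward

# Simulated A/B test using epsilon-greedy
n_arms = 2  # Two designs: A (arm 0) and B (arm 1)
bandit = EpsilonGreedy(n_arms, epsilon=0.1)

n_users = 10000

# Simulated true conversion rates for A and B (unknown to the bandit)
conversion_rates = [0.05, 0.06]

for _ in range(n_users):
    chosen_arm = bandit.select_arm()
    # Reward is 1 (conversion) with probability conversion_rates[chosen_arm], otherwise 0
    reward = np.random.choice([0, 1], p=[1 - conversion_rates[chosen_arm], conversion_rates[chosen_arm]])
    bandit.update(chosen_arm, reward)

print(f"Estimated values (conversion rates) for A and B: {bandit.values}")
print(f"Number of times A and B were shown: {bandit.counts}")
```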


```text
Estimated values (conversion rates) for A and B: [0.0498633  0.05993963]
Number of times A and B were shown: [7681. 2319.]
```


A bandit algorithm lets an A/B test allocate traffic dynamically: as results come in, it sends more visitors to the variant that currently performs best. Traditional A/B testing, by contrast, splits traffic evenly between variants for the entire duration of the test. The usual trade-off is that a bandit collects more conversions while the test is running, whereas a fixed split gives a cleaner estimate of the difference between variants.
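To make "more conversions while the test runs" concrete, the sketch below computes expected conversions for a few traffic allocations using the rates simulated later in this notebook (5% for A, 6% for B). It assumes an idealized bandit that has already settled on B for the whole run and ignores the learning phase; the helper name `expected_conversions` is hypothetical.

```python
def expected_conversions(n_users, rates, shares):
    """Expected conversions when a fraction shares[i] of users sees variant i."""
    return n_users * sum(rate * share for rate, share in zip(rates, shares))

rates = [0.05, 0.06]  # simulated conversion rates for A and B
n_users = 10_000
epsilon = 0.1

# Classic A/B test: half of the users see each design
even_split = expected_conversions(n_users, rates, [0.5, 0.5])
# Idealized epsilon-greedy that already prefers B: A is shown only on
# exploration rounds that pick it (epsilon / 2), B on all other rounds
bandit_on_b = expected_conversions(n_users, rates, [epsilon / 2, 1 - epsilon / 2])
# Upper bound: every user sees the better design
all_b = expected_conversions(n_users, rates, [0.0, 1.0])

print(round(even_split), round(bandit_on_b), round(all_b))  # 550 595 600
```

The roughly 45 extra expected conversions over the even split are the benefit a bandit aims for; a real run earns less because it must first learn which design is better.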

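For simplicity, this example uses the epsilon-greedy algorithm as the bandit strategy: with probability ε (0.1 here) it shows a randomly chosen design to keep exploring, and otherwise it shows the design with the highest estimated conversion rate so far. Assume you have two web page designs, A and B, and you want to determine which one produces more user conversions (e.g., sign-ups or purchases).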
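The two rules implemented by select_arm and update can be written compactly. The notation is introduced here and is not part of the original code: $K$ is the number of arms, $\hat{\mu}_a$ stands for self.values[a], $n_a$ for self.counts[a] after it has been incremented, and $r \in \{0, 1\}$ is the observed reward.

$$
a_t =
\begin{cases}
\text{an arm drawn uniformly from } \{0, \dots, K-1\} & \text{with probability } \varepsilon,\\
\arg\max_a \hat{\mu}_a & \text{with probability } 1 - \varepsilon,
\end{cases}
$$

$$
\hat{\mu}_a \leftarrow \frac{n_a - 1}{n_a}\,\hat{\mu}_a + \frac{1}{n_a}\,r \;=\; \hat{\mu}_a + \frac{r - \hat{\mu}_a}{n_a}.
$$

With $\varepsilon = 0.1$ and $K = 2$, exploration picks each arm on $\varepsilon / K = 5\%$ of rounds, so whichever design is currently greedy receives about $1 - \varepsilon + \varepsilon / K = 95\%$ of the traffic.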

Set up the experiment:

Design A is variant (arm) 0.
Design B is variant (arm) 1.
The reward is 1 if a user converts and 0 otherwise.

Simulate user interactions:

For the sake of this example, we'll simulate user interactions. In a real-world scenario, you would integrate the bandit algorithm with your web server to serve designs and observe user interactions.

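In this example, we simulated 10,000 users visiting the website, with true conversion rates of 5% for Design A and 6% for Design B. The intent is that, as the bandit gathers evidence, it increasingly shows Design B, the better converter. The run shown above did not do this: A was shown 7,681 times and B only 2,319 times, even though the final estimates (about 0.050 for A and 0.060 for B) rank B higher. Two details of this implementation contribute. Both estimates start at 0 and np.argmax breaks ties in favor of the lowest index, so the greedy choice starts on A; and because B is then only shown on exploration rounds (ε/2 = 5% of users), the small gap between 5% and 6% can take many rounds to overcome early noise. The counts suggest the greedy choice stayed on A for most of the run before B's estimate overtook it. Because the code sets no random seed, each run produces different numbers.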
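One way to reduce the bias toward arm 0 and make runs reproducible is sketched below. It is a variant, not the original class: the name `EpsilonGreedyRandomTies` and the helper `run` are hypothetical, ties are broken uniformly at random with `np.flatnonzero`, and randomness comes from seeded `np.random.default_rng` generators. It does not guarantee that B wins; with rates this close, some seeds may still favor A, which is why it repeats the experiment over several seeds.

```python
import numpy as np

class EpsilonGreedyRandomTies:
    """Hypothetical epsilon-greedy variant with random tie-breaking and a seeded RNG."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        self.counts = np.zeros(n_arms)  # times each arm was shown
        self.values = np.zeros(n_arms)  # running mean reward per arm

    def select_arm(self):
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.n_arms))  # explore
        # Exploit, choosing uniformly among all arms tied for the best estimate
        best = np.flatnonzero(self.values == self.values.max())
        return int(self.rng.choice(best))

    def update(self, chosen_arm, reward):
        self.counts[chosen_arm] += 1
        # Incremental mean, same update as the original class
        self.values[chosen_arm] += (reward - self.values[chosen_arm]) / self.counts[chosen_arm]


def run(seed, n_users=10_000, rates=(0.05, 0.06), epsilon=0.1):
    """Hypothetical helper: simulate one A/B test with a fixed seed."""
    bandit = EpsilonGreedyRandomTies(len(rates), epsilon, seed)
    user_rng = np.random.default_rng(seed + 1)  # separate stream for simulated users
    for _ in range(n_users):
        arm = bandit.select_arm()
        reward = float(user_rng.random() < rates[arm])  # 1.0 on conversion
        bandit.update(arm, reward)
    return bandit

# A single run says little when the rates differ by one percentage point
for seed in range(3):
    b = run(seed)
    print(seed, b.values.round(4), b.counts.astype(int))
```

Running several seeds (or averaging over many) gives a better picture of how often the bandit ends up favoring B than a single unseeded run does.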

In practice, rather than simulating users, you would integrate this bandit into your web server's logic. When a user visits the site, select_arm determines which design to show, and the user's subsequent behavior (converting or not) provides the reward passed to the update method.
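The sketch below shows one way that wiring could look without assuming any particular web framework. The handler names `handle_visit` and `handle_outcome`, the in-memory `assignments` dict, and the `DESIGNS` list are hypothetical. The key point is that the reward arrives after the page is served, so the server must remember which arm each user saw. A compact copy of the epsilon-greedy class (same logic, condensed) is included so the block runs on its own, and a `threading.Lock` guards shared state because requests may arrive concurrently.

```python
import threading
import numpy as np

class EpsilonGreedy:
    """Condensed copy of the notebook's class so this block is self-contained."""

    def __init__(self, n_arms, epsilon=0.1):
        self.n_arms, self.epsilon = n_arms, epsilon
        self.counts = np.zeros(n_arms)
        self.values = np.zeros(n_arms)

    def select_arm(self):
        if np.random.random() < self.epsilon:
            return np.random.randint(self.n_arms)
        return int(np.argmax(self.values))

    def update(self, chosen_arm, reward):
        self.counts[chosen_arm] += 1
        n = self.counts[chosen_arm]
        self.values[chosen_arm] += (reward - self.values[chosen_arm]) / n

DESIGNS = ["A", "B"]       # arm 0 -> design A, arm 1 -> design B (hypothetical)
bandit = EpsilonGreedy(len(DESIGNS), epsilon=0.1)
assignments = {}           # user_id -> arm shown, awaiting an outcome (hypothetical store)
lock = threading.Lock()    # handlers may run on several threads at once

def handle_visit(user_id):
    """Hypothetical page handler: pick a design once per user and remember it."""
    with lock:
        arm = assignments.get(user_id)
        if arm is None:
            arm = int(bandit.select_arm())
            assignments[user_id] = arm
    return DESIGNS[arm]

def handle_outcome(user_id, converted):
    """Hypothetical hook, called once when the user's outcome is known."""
    with lock:
        arm = assignments.pop(user_id, None)  # one reward per assignment
        if arm is not None:
            bandit.update(arm, 1.0 if converted else 0.0)
```

For example, handle_visit("user-123") returns "A" or "B" and keeps returning the same design until handle_outcome("user-123", converted=True) records a reward of 1 for that arm. In a real deployment, "did not convert" has to be decided by a rule such as a time window, and the in-memory dict and lock would typically be replaced by shared storage if the site runs in several processes.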