**NOTE**: When your Jupyter notebook is open, double-click on this Markdown cell! You can delete this text and put your answer to the previous problem in here.

In [2]:
# Definitions for this lab...

# PMF - (Probability Mass Function) - Discrete distribution function returning the probability
# that a random variable equals a SPECIFIC value

# CDF - (Cumulative Distribution Function) - Probability that a random variable is less than or equal
# to a given value
    # Discrete - A step function (plt.step). It increases at each possible value of the random variable
    # Continuous - It's the integral of the PDF from negative infinity to the given value

# PDF - (Probability Density Function) - Continuous distribution function returning the relative
# likelihood of a random variable taking on a given value.
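As a small illustration of the three definitions above, here is a minimal sketch using `scipy.stats`. The coin-flip example and the standard normal are assumptions chosen only for illustration; they are not part of the lab scenario.

```python
import numpy as np
from scipy import stats

# Hypothetical small example: number of heads in 4 fair coin flips
coin = stats.binom(4, 0.5)
values = np.arange(0, 5)

pmf = coin.pmf(values)   # P(X = k) for k = 0..4
cdf = coin.cdf(values)   # P(X <= k) for k = 0..4

print("PMF:", pmf)
print("CDF:", cdf)
print("CDF equals running sum of PMF:", np.allclose(cdf, np.cumsum(pmf)))

# For a continuous variable, the PDF is a density, not a probability.
# Probabilities come from the CDF (the integral of the PDF).
z = stats.norm(0, 1)
prob_within_one_sd = z.cdf(1) - z.cdf(-1)
print("Standard normal density at 0:", z.pdf(0))
print("P(-1 <= Z <= 1):", prob_within_one_sd)
```

For the discrete case the CDF is just the running total of the PMF, which is why it looks like a step function when plotted.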

# "Fun with Loot Boxes" Lab

> Author: Caroline Schmitt, Matt Brems

### Scenario:

You're an analyst for [Zynga](https://en.wikipedia.org/wiki/Zynga), a gaming studio working on an event for an MMO (massively multiplayer online) game. This event is going to include **loot boxes**.

<img src="https://vignette.wikia.nocookie.net/2007scape/images/0/06/Culinaromancer%27s_chest.png/revision/latest?cb=20180403231423" alt="drawing" width="150"/> 

A loot box is basically a treasure chest in a game. This loot box can be opened to reveal a variety of items: some items are very rare and valuable, other items are common and less valuable. (You may consult [the esteemed Wikipedia](https://en.wikipedia.org/wiki/Loot_box) for a more extensive definition.)

In our specific game, suppose that loot boxes can be obtained in one of two ways: 
- After every three hours of playing the game, a user will earn one loot box.
- If the user wishes to purchase a loot box, they may pay $1 (in real money!) for a loot box.

These loot boxes are very good for our business!
- If a player earns a loot box, it means they are spending lots of time on the game. This often leads to advertisement revenue, they may tell their friends to join the game, etc.
- If the player purchases a loot box, it means we've earned $1 from our customer.

Suppose each loot box is opened to reveal either:
- magical elixir (super rare, very valuable), or
- nothing.

Whether each loot box contains the elixir or nothing is **random**. Our boss wants some guidance on what sort of randomness to use on these loot boxes! 
- If the magical elixir is too rare, then users may not be motivated to try to get them, because they believe they'll never find the magical elixir.
- If the magical elixir is too common, then users may not be motivated to try to get them, because the game has so much of the magical elixir that it isn't worthwhile to try to get it.

However, our boss isn't a math-y type person! When explaining things to our boss, we need to explain the impact of our choices on the game as concretely as possible.

### Version 1
In our first version of the game, we'll say that loot boxes contain magical elixir 15% of the time and nothing 85% of the time.

#### 1. Our boss asks, "If a user buys 100 loot boxes, how many elixirs will they get?" How would you respond?

In [None]:
# Answer: We can't guarantee how many elixirs one person would get. It's possible, though unlikely, that
# a player who is very lucky gets 100 elixirs out of 100 loot boxes. It's also possible, though unlikely,
# that a player who is very unlucky gets none. The expected value is 15 BUT that does not in any way mean
# every player gets 15 out of every 100 loot boxes.

In [4]:
# Expected Value (EV) is the average outcome of an event if repeated many times.
# It's calculated by multiplying each possible outcome by its probability and summing
# these products together.
# EV helps predict long-term results in situations involving chance.

def calculate_elixir_ev(elixir_probability, num_boxes):
    return elixir_probability * num_boxes

# Game parameters:
elixir_prob = 0.15 # 15% chance of getting an elixir
nothing_prob = 0.85 # 85% chance of not getting an elixir
num_loot_boxes = 100

expected_elixirs = calculate_elixir_ev(elixir_prob, num_loot_boxes)
print(f'Expected number of elixirs from {num_loot_boxes} loot boxes: {expected_elixirs}')

Expected number of elixirs from 100 loot boxes: 15.0


In [1]:
# EV and probability differ in that:

# Probability measures the likelihood of a specific outcome occurring, expressed as a number
# between 0 and 1 or as a percentage.

# EV quantifies the average result over many trials, often expressed in units (items, dollars, baht)
# rather than as a probability.
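To make the difference between the expected value and any single player's result concrete, here is a minimal simulation sketch. The 10,000 simulated players and the seed are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility

# Simulate 10,000 hypothetical players who each open 100 loot boxes
elixirs_per_player = rng.binomial(n=100, p=0.15, size=10_000)

average_elixirs = elixirs_per_player.mean()
share_exactly_15 = (elixirs_per_player == 15).mean()

print(f"Average elixirs per player: {average_elixirs:.2f}")
print(f"Fewest / most elixirs seen: {elixirs_per_player.min()} / {elixirs_per_player.max()}")
print(f"Share of players with exactly 15 elixirs: {share_exactly_15:.1%}")
```

The average lands close to 15, yet only a minority of simulated players get exactly 15 elixirs, which is the point of the answer to question 1.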

#### 2. Our boss asks, "How many loot boxes does someone have to purchase in order to definitely get elixir?" How would you respond?

In [None]:
# Answer: Someone could theoretically purchase any number of loot boxes and never get an elixir.
# We can't guarantee how many loot boxes one would need to purchase before finding an elixir.

# BUT, if each box has a 15% chance of containing an elixir, regardless of previous loot box purchases or openings, then, ON AVERAGE, a user would need
# to open about 7 loot boxes (1 / 0.15 is roughly 6.67) to get an elixir. BUT, again, this is just an average. Some users will need fewer loot boxes to get an elixir
# and some will need more. The actual number for any given user can vary
# widely due to the RANDOM nature of the process.
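One way to give the boss something concrete is to report how many boxes are needed for a given chance of at least one elixir. The sketch below assumes independent boxes with a 15% elixir chance; the helper name `boxes_for_confidence` and the target levels are hypothetical choices for illustration.

```python
import math

p = 0.15  # chance that a single loot box contains an elixir

# Average number of boxes until the first elixir (mean of a geometric distribution)
average_boxes = 1 / p

def boxes_for_confidence(target, p=p):
    """Smallest number of boxes giving at least `target` chance of one or more elixirs."""
    # P(no elixir in n boxes) = (1 - p) ** n, so solve (1 - p) ** n <= 1 - target
    return math.ceil(math.log(1 - target) / math.log(1 - p))

print(f"Average boxes until first elixir: {average_boxes:.2f}")
for target in (0.50, 0.90, 0.99):
    print(f"{target:.0%} chance of at least one elixir: {boxes_for_confidence(target)} boxes")
```

No finite number of boxes reaches 100%, but a statement such as "29 boxes gives at least a 99% chance" may be easier for a non-technical audience than "it can never be guaranteed."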

#### 3. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" This is a bit more complicated, so let's break it down before answering.

In [None]:
# chance 

#### 3a. Let's suppose my random variable $X$ counts up how many elixirs I observe out of my 100 loot boxes. Why is $X$ a discrete random variable?

In [6]:
# Discrete Random Variable - countable number of outcomes; distinct values

# Continuous Random Variable - uncountable number of outcomes

# X is discrete: it can take on the values 0, 1, 2, ..., 100. These are countable.
# We can't find 2.5 elixirs; we can't find 3.7 elixirs

#### 3b. Recall our discrete distributions: discrete uniform, Bernoulli, binomial, Poisson. Let's suppose my random variable $X$ counts up how many elixirs I observe out of my 100 loot boxes. What distribution is best suited for $X$? Why?
- Hint: It may help to consider getting the magical elixir a "success" and getting nothing a "failure."
- Two outcomes: elixir (success) or no elixir (failure)

In [7]:
# - Two outcomes: elixir (success) or no elixir (failure)

# Discrete Uniform - each outcome is equally likely
# Bernoulli - number of successes in one trial. But we have 100 trials here...
# Binomial - number of successes in 'n' independent trials
# Poisson - number of events in a fixed interval of time (no fixed number of trials)

# Winner: Binomial
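To check the choice of the binomial, the sketch below builds the elixir count directly as a sum of 100 independent success/failure trials and compares it with `stats.binom(100, 0.15)`. The 20,000 simulated players and the seed are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed

n, p = 100, 0.15

# Each row is one player; each column is one loot box (True = elixir)
boxes = rng.random((20_000, n)) < p
elixir_counts = boxes.sum(axis=1)

X = stats.binom(n, p)
print(f"Simulated mean: {elixir_counts.mean():.2f}   Binomial mean: {X.mean():.2f}")
print(f"Simulated var:  {elixir_counts.var():.2f}   Binomial var:  {X.var():.2f}")
```

The simulated mean and variance should sit close to the binomial values of 15 and 12.75.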

#### 3c. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" Use the probability mass function to answer the boss' question.

In [2]:
# PMF - (Probability Mass Function) - Discrete distribution function returning the probability
# that a random variable equals a SPECIFIC value

# Summation
# For loop - Why? range of values

In [5]:
# Best
from scipy import stats
p = 0.15 # Probability of success (getting elixir)
n = 100 # Number of trials (loot boxes)

X = stats.binom(n, p) # X is the binomial distribution

P = 1 # Start from total probability 1 and subtract the cases we don't want
for x in range(20 + 1):
    P = P - X.pmf(x)
# The for loop subtracts the probability of getting x elixirs for x in 0, 1, 2, ..., 20,
# leaving P(X > 20)

print(P)

0.06631976581888208


In [6]:
# "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?"
def calculate_elixir_probability():
    return sum(stats.binom.pmf(x, n=100, p=0.15) for x in range(21, 101))

result = calculate_elixir_probability()
print(f"The probability of getting more than 20 elixirs from 100 loot boxes is {result:.4f}")

The probability of getting more than 20 elixirs from 100 loot boxes is 0.0663



#### 3d. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" Use the cumulative distribution function to answer the boss' question.

In [7]:
# Used the cdf
# CDF - (Cumulative Distribution Function) - Probability that a random variable is less than or equal 
# to a given value

     # Discrete - As a step function (plt.step). It increases at each possible value of the random variable

print(f"Probability of getting more than 20 elixirs: {1 - stats.binom.cdf(20, 100, 0.15):.4f}")
# or...
print(f"Probability of getting more than 20 elixirs: {(1 - stats.binom.cdf(20, 100, 0.15))*100:.2f}%")

Probability of getting more than 20 elixirs: 0.0663
Probability of getting more than 20 elixirs: 6.63%
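SciPy also exposes the upper tail directly through the survival function `sf`, which computes P(X > k) without the explicit subtraction. A minimal sketch comparing the two approaches:

```python
from scipy import stats

n, p = 100, 0.15

via_cdf = 1 - stats.binom.cdf(20, n, p)   # 1 - P(X <= 20)
via_sf = stats.binom.sf(20, n, p)         # P(X > 20) directly

print(f"1 - cdf(20): {via_cdf:.4f}")
print(f"sf(20):      {via_sf:.4f}")
```

Both give the same answer here; `sf` tends to be the more accurate choice when the tail probability is extremely small.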


#### 3e. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" Answer your boss' question. *Remember that your boss is not a math-y person!*

In [None]:
# If a user were to earn or buy 100 loot boxes, there's about a 6.6% chance that they get
# more than 20 elixirs.

# Suppose there are 25,000 active users currently. If all 25,000 users earned 100 loot boxes,
# then we expect about 1,650 of those users to get more than 20 elixirs.

In [20]:
25_000 * 0.066

1650.0
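The 1,650 figure uses the rounded probability 0.066. As a hedged refinement, the sketch below uses the unrounded probability and treats each of the 25,000 users (a hypothetical count taken from the answer above) as an independent trial, which also gives a plausible range around the expected count.

```python
from scipy import stats

active_users = 25_000                      # hypothetical user count from the answer above
p_more_than_20 = stats.binom.sf(20, 100, 0.15)

# The number of "lucky" users is itself binomial: each user is a trial
lucky_users = stats.binom(active_users, p_more_than_20)
low, high = lucky_users.interval(0.95)

print(f"Expected users with more than 20 elixirs: {lucky_users.mean():,.0f}")
print(f"Range covering about 95% of outcomes: {low:,.0f} to {high:,.0f}")
```

A range like this may be easier to communicate than a single number, since it shows how much the count could move from month to month by chance alone.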

In [7]:
# For comparison: if the elixir chance were 10% instead of 15%, the probability of getting
# more than 20 elixirs out of 100 loot boxes would be about 0.0008 (0.08%).
# Answer: at our 15% elixir rate, the probability of getting more than 20 elixirs is about 0.066, or 6.6%.

The chance of a user getting more than 20 elixirs depends on how likely they are to get an elixir each time. For example, if there's a 10% chance of getting an elixir in each attempt and the user tries 100 times, the probability of getting more than 20 elixirs drops to about 0.0008 (0.08%), compared with about 6.6% at a 15% chance.

#### 4. Your boss wants to know how many people purchased how many loot boxes last month. 
> For example, last month, 70% of users did not purchase any loot boxes. 10% of people purchased one loot box. 5% of people purchased two loot boxes... and so on.

#### 4a. Recall our discrete distributions: discrete uniform, Bernoulli, binomial, Poisson. Let's suppose my random variable $Y$ counts up how many loot boxes each person purchased through the game last month. What distribution is best suited for $Y$? Why?

In [None]:
# Poisson distribution - models the number of successes we observe in a fixed amount of time,
# not a fixed amount of trials

# The Poisson distribution is often used to model count data, especially when the events are
# relatively rare and can occur any number of times within the given interval (last month).
# It's flexible enough to handle the varying probabilities we see in the data, unlike the discrete
# uniform or binomial distribution.
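To see what a Poisson model implies for $Y$, the sketch below tabulates the probabilities for a rate of 2.7 loot boxes per user per month. That rate is an assumption borrowed from question 4b; the percentages in the example for question 4 (such as 70% of users buying nothing) are illustrative and would correspond to a much lower rate.

```python
from scipy import stats

Y = stats.poisson(2.7)  # assumed rate: 2.7 loot boxes per user per month

for k in range(6):
    print(f"P(Y = {k}) = {Y.pmf(k):.3f}")
print(f"P(Y >= 5) = {Y.sf(4):.3f}")
```

Multiplying each probability by the number of users gives the kind of "how many people bought how many boxes" breakdown the boss asked for.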

#### 4b. Suppose that, on average, your customers purchased 2.7 loot boxes last month. In order for your revenue to be at least $500,000, at least how many users would you need on your platform? (Round your answer up to the nearest thousand.) 

In [8]:
# Show your work; leave your answer in a comment.

revenue = 500_000
avg_purchase = 2.7

# revenue / avg_purchase = number of users
revenue / avg_purchase

185185.18518518517

In [9]:
# // is floor division; adding 1 rounds up when the quotient is not a whole number
result = int(revenue // avg_purchase + 1)
print(f"Number of users needed: {result:,}")

Number of users needed: 185,186


In [10]:
import math
math.ceil(revenue / avg_purchase)

185186

In [30]:
# (Round your answer up to the nearest thousand.)
# round() goes to the NEAREST thousand, which here rounds DOWN. So how shall we round UP?
round((revenue / avg_purchase), -3)

185000.0

In [11]:
# Orr
# Divide by 1,000, round up with math.ceil, then multiply back
math.ceil(revenue / avg_purchase / 1000) * 1000  # 186000

186000

In [12]:
# Alternative: add 999 to the whole-user count before flooring to the nearest thousand
math.floor((math.ceil(revenue / avg_purchase) + 999) / 1000) * 1000

186000

In [None]:
# Answer: we need at least 185,186 users; rounding up to the nearest thousand gives 186,000.

#### 4c. Assume that your platform has the number of users you mentioned in your last answer. Suppose that your platform calls anyone who purchases 5 or more loot boxes in a month a "high value user." How much money do you expect to have earned from "high value users?" How about "low value users?"

In [36]:
# Set our total purchase amount to be 0
amount = 0

# Poisson distribution with a rate of 2.7 loot boxes per user per month
purchase_dist = stats.poisson(mu=2.7)

# Check values from 0 to 4
for purchases in range(5):
    # How many users are expected to purchase this many loot boxes?
    expected_users = 186_000 * purchase_dist.pmf(purchases)
    print(f"There are {expected_users} users expected to purchase {purchases} loot boxes.")
    # How much money would we make from those people? ($1 per box * boxes * number of individuals)
    print(f"\n We expect to make ${purchases * round(expected_users, 2)} from those expected to purchase {purchases} loot boxes")
    # Add the above quantity to amount
    amount += purchases * round(expected_users, 2)

print(f"\nWe expect to make ${round(186_000 * 2.7):,} from all users.")
# How much we expect to make from people buying 4 or fewer loot boxes (low value users)
print(f"\nWe expect to make ${round(amount):,} from 'low value users.'")
# How much we expect to make from people buying at least 5 loot boxes (high value users)
print(f"\nWe expect to make ${round(186_000 * 2.7) - round(amount):,} from 'high value users.'")

There are 12500.225369593454 users expected to purchase 0 loot boxes.

 We expect to make $0.0 from those expected to purchase 0 loot boxes
There are 33750.60849790233 users expected to purchase 1 loot boxes.

 We expect to make $33750.61 from those expected to purchase 1 loot boxes
There are 45563.32147216814 users expected to purchase 2 loot boxes.

 We expect to make $91126.64 from those expected to purchase 2 loot boxes
There are 41006.98932495134 users expected to purchase 3 loot boxes.

 We expect to make $123020.97 from those expected to purchase 3 loot boxes
There are 27679.71779434215 users expected to purchase 4 loot boxes.

 We expect to make $110718.88 from those expected to purchase 4 loot boxes

We expect to make $502,200 from all users.

We expect to make $358,617 from 'low value users.'

We expect to make $143,583 from 'high value users.'


In [13]:
# Panda Key and Panda Handler Air

# Set our total purchase amount to be 0
total_purchase = 0

# Check values from 0-4
for x in range(4+1):
    
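    # Probability that a user purchases x loot boxes (rate = 2.7 loot boxes / month)
    amount_distribution = stats.poisson(2.7)
    prob = amount_distribution.pmf(x)
    
    # How much money would we make from those people? ($1 per box * x boxes * expected number of users)
    # Note: result is the unrounded user count (185,186), so these totals differ slightly
    # from the versions that use 186,000 users
    y = x*prob*result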
    
    # Add in the above quantity to total purchase
    total_purchase += y

# How much we expect to make from people buying 4 or fewer loot boxes (low value users)
print(f"${total_purchase:,.0f}")

# How much we expect to make from people buying at least 5 loot boxes (high value users)
print(f"${500_000 - total_purchase:,.0f}")

$357,048
$142,952


In [14]:
# Jay, Kel, and Cha-aim (The triplets)
import numpy as np

roundup = 186_000
# A Poisson distribution fits when events occur independently at a constant average rate within a fixed interval.
# No random seed is set here, so these totals change slightly on each run.
outcomes = np.random.poisson(avg_purchase,int(roundup))

high_value_users = [item for item in outcomes if item>=5] # Purchase counts of users who bought >= 5 loot boxes
total_high_value_users =sum(high_value_users) # Loot boxes bought by high value users ($1 each)

low_value_users = sum(outcomes) - total_high_value_users # Loot boxes bought by everyone else
total_low_value_users = low_value_users * 1 # $1 per loot box

print(f'Money we expect to earn from high value users is ${total_high_value_users:,}')
print(f'Money we expect to earn from low value users is ${total_low_value_users:,}')

Money we expect to earn from high value users is $143,689
Money we expect to earn from low value users is $358,392


In [15]:

#Fan via discussion...
no_customer = 186_000
ro = 2.7 # rate of occurrence (average loot boxes per user per month)
customers_distribution = stats.poisson(ro)

# Low value users buy 0 to 4 loot boxes (including users who buy none); high value users buy 5 or more
low_value_users = round(sum(customers_distribution.pmf(k) for k in range(5))*no_customer)
high_value_users = no_customer - low_value_users

# Calculate expected purchases per user for low and high value users
expected_purchases_low = sum(k * customers_distribution.pmf(k) for k in range(5)) / sum(customers_distribution.pmf(k) for k in range(5))
expected_purchases_high = (ro * no_customer - expected_purchases_low * low_value_users) / high_value_users
#print(expected_purchases_low)
#print(expected_purchases_high)

low_value_revenue = round(low_value_users * expected_purchases_low)
high_value_revenue = round(high_value_users * expected_purchases_high)
#print(low_value_revenue)
#print(high_value_revenue)

print(f'Number of low value users: {low_value_users:,}')
print(f'Number of high value users: {high_value_users:,}')
print(f'Expected revenue from low value users: ${low_value_revenue:,}')
print(f'Expected revenue from high value users: ${high_value_revenue:,}')
print(f'Total expected revenue: ${low_value_revenue + high_value_revenue:,}')

# print(f'Least number of money that expect from high value users : {high_value_users * 5}')
# #print(f'Most Number of money that expect from low value users : {low_value_users * 4}')

Number of low value users: 160,501
Number of high value users: 25,499
Expected revenue from low value users: $358,617
Expected revenue from high value users: $143,583
Total expected revenue: $502,200
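The loops above can be cross-checked with a closed form. For a Poisson variable with mean $\mu$, $k \cdot P(Y = k) = \mu \cdot P(Y = k - 1)$, so the expected boxes bought by users with 4 or fewer purchases is $\mu \cdot P(Y \le 3)$ and the rest is $\mu \cdot P(Y \ge 4)$. A minimal sketch, assuming 186,000 users and $1 per loot box as in the cells above:

```python
from scipy import stats

users = 186_000
rate = 2.7                      # average loot boxes per user per month
Y = stats.poisson(rate)

total_revenue = users * rate * 1               # $1 per loot box
low_value_revenue = total_revenue * Y.cdf(3)   # users buying 0 to 4 boxes
high_value_revenue = total_revenue * Y.sf(3)   # users buying 5 or more boxes

print(f"Low value users:  ${low_value_revenue:,.0f}")
print(f"High value users: ${high_value_revenue:,.0f}")
```

This agrees with the PMF-based totals and avoids summing term by term, although the explicit loop is arguably easier to explain to a non-technical audience.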


#### 4d. Suppose that you want to summarize how many people purchased how many loot boxes last month for your boss. Since your boss isn't math-y, what are 2-4 summary numbers you might use to summarize this for your boss? (Your answers will vary here - use your judgment!)

In [None]:
# Non-code answer
# Expected revenue total / Expected revenue by user type
# Count of user types
# Average number of loot boxes purchased by users
# Total number of loot boxes purchased by users
# Ratio of low value users to high value users

#### 5. Your boss asks "How many loot boxes does it take before someone gets their first elixir?" Using `np.random.choice`, simulate how many loot boxes it takes someone to get their first elixir. 
- Start an empty list.
- Use control flow to have someone open loot boxes repeatedly.
- Once they open a loot box containing an elixir, record the number of loot boxes it took in the empty list.
- Repeat this process 100,000 times. 

This simulates how long it takes for someone to open a loot box containing elixir. Share the 5th, 25th, 50th, 75th, and 95th percentiles.

> You may find [this documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html)  and [this documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html) helpful.

In [16]:
import numpy as np

# Set random seed for reproducibility
np.random.seed(42)

# Create empty list to collect how long until each elixir is found
times_until_elixir = []

# Possible outcomes from opening a loot box
loot_box = ['elixir', 'nothing']

# Repeat the experiment 100,000 times; each experiment opens loot boxes until an elixir appears
for i in range(100_000):
    turns = 0

    while True:
        opened_loot_box = np.random.choice(loot_box, p = [0.15, 0.85])

        turns +=1

        # If we find an elixir, add turns to the empty list called 'times_until_elixir'
        if opened_loot_box == 'elixir':
            times_until_elixir.append(turns)
            break

In [17]:
# Share the 5th, 25th, 50th, 75th, and 95th percentiles.
print(f"The 5th percentile of number of turns to find an elixir is {int(np.percentile(times_until_elixir, 5))}.")
print(f"The 25th percentile of number of turns to find an elixir is {int(np.percentile(times_until_elixir, 25))}.")
print(f"The 50th percentile of number of turns to find an elixir is {int(np.percentile(times_until_elixir, 50))}.")
print(f"The 75th percentile of number of turns to find an elixir is {int(np.percentile(times_until_elixir, 75))}.")
print(f"The 95th percentile of number of turns to find an elixir is {int(np.percentile(times_until_elixir, 95))}.")

The 5th percentile of number of turns to find an elixir is 1.
The 25th percentile of number of turns to find an elixir is 2.
The 50th percentile of number of turns to find an elixir is 5.
The 75th percentile of number of turns to find an elixir is 9.
The 95th percentile of number of turns to find an elixir is 19.
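The number of boxes opened until the first elixir follows a geometric distribution, so the simulated percentiles can be checked against exact values. The sketch below uses `scipy.stats.geom` and a vectorized draw from NumPy's `Generator.geometric`; this is a swapped-in shortcut for comparison, not the `np.random.choice` loop the question asks for, and the seed is arbitrary.

```python
import numpy as np
from scipy import stats

p = 0.15
percentiles = [5, 25, 50, 75, 95]

# Exact percentiles of the geometric distribution (boxes until first elixir)
first_elixir = stats.geom(p)
exact = first_elixir.ppf(np.array(percentiles) / 100)

# Vectorized simulation: one draw per simulated player
rng = np.random.default_rng(42)  # arbitrary seed
simulated_turns = rng.geometric(p, size=100_000)
simulated = np.percentile(simulated_turns, percentiles)

for pct, e, s in zip(percentiles, exact, simulated):
    print(f"{pct}th percentile: exact {int(e)}, simulated {int(s)}")
```

The exact percentiles are 1, 2, 5, 9, and 19 boxes, which match the loop-based simulation above.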


### Version 2

After a substantial update to the game, suppose every loot box can be opened to reveal *one of four different* items:
- magical elixir (occurs 1% of the time, most valuable)
- golden pendant (occurs 9% of the time, valuable)
- steel armor (occurs 30% of the time, semi-valuable)
- bronze coin (occurs 60% of the time, least valuable)

#### 6. Suppose you want to repeat problem 5 above for the version 2 loot boxes, so you can track how many loot boxes are needed to get each item. (e.g. You'd like to be able to say that on average it takes 10 trials to get a golden pendant, 3 trials to get steel armor, and so on.) What Python datatype is the best way to store this data? Why?

In [18]:
# Dictionary - allows us to store key-value pairs
# Each key could be the name of the new items (magical elixir, bronze coin, etc.)
# Each value could be the simulated list of how many turns are needed to find each particular item

# Each key could be the name of the new items (magical elixir, bronze coin, etc.)
# Each value being the probability might be an option
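As a sketch of the dictionary idea, the code below stores, for each version 2 item, a list of how many boxes it took to see that item for the first time. The variable names, the 500 repetitions, and the seed are assumptions chosen to keep the example quick; the lab itself does not prescribe them.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed

items = ["magical elixir", "golden pendant", "steel armor", "bronze coin"]
probs = [0.01, 0.09, 0.30, 0.60]

# Key: item name. Value: list of turns needed to first find that item.
turns_until = {item: [] for item in items}

for _ in range(500):
    seen = set()
    turns = 0
    # Keep opening boxes until every item has appeared at least once
    while len(seen) < len(items):
        turns += 1
        item = str(rng.choice(items, p=probs))
        if item not in seen:
            seen.add(item)
            turns_until[item].append(turns)

mean_turns = {item: float(np.mean(t)) for item, t in turns_until.items()}
for item, avg in mean_turns.items():
    print(f"{item}: about {avg:.1f} boxes on average")
```

Keeping the raw lists as values means percentiles can later be computed per item with `np.percentile(turns_until[item], ...)`, just as in problem 5.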

#### 7. Suppose you and your boss want to measure whether "Version 2" is better than "Version 1." What metrics do you think are important to measure? (Your answers will vary here - use your judgment!)

In [40]:

# Panda Key and Panda Wrangler Air...

# 1. Total revenue
# 2. Purchases (dollars, time) per user per month
# 3. Time spent playing the game
# 4. Number of game invites (Accept and Download the game / Ignore / Decline)
# 5. Revenue generated from new leads

In [19]:
# The Triplets...

# Revenue: Track overall revenue generated by users from loot box purchases in V2 vs. V1.
# Conversion rate: We want to confirm that more users are interested in purchasing loot boxes.
# Purchases by item: Track purchases broken down by item and identify whether the new items increase the chance that a user buys.
# ARPU (Average Revenue Per User): Monitor whether adding the new items helps turn low value users into "high value users."
# Screen time: This helps identify whether users enjoy playing the game longer now that new items have been added.

In [20]:
# Orr and Wee
# * Total revenue
# * Total purchase loot box number
# * Total revenue of loot box per user
# * Purchase number of loot box per user
# * Repeat purchase rate per user

In [21]:
# Fan and Gun
# 1. Total Revenue from Users -> V.1 has 2 outcomes for users, but V.2 has more
# outcomes, which may persuade more users to buy loot boxes.
# 2. Total Boxes Purchased -> Reflects the greater variety of play in the game.
# 3. Total Number of Users -> This reflects the long-term retention rate.
# 4. Distribution of Purchases -> The distribution of purchases is expected to shift upward as a result
# of the enhanced user experience and expanded options.

In [22]:
# Options...
# We will likely want to measure the daily revenue for version 2 to compare to version 1. (Ideally, daily revenue in version 2 is higher!)
# We may want to look at the number of users, or number of users active every day (or every week). 
# We may want to see how many loot boxes are earned in a given day.
# We may want to compare the proportion of loot boxes that are purchased (instead of earned) in versions 1 and 2.
# We may want to look at the rate of growth of revenue or number of users.
# We may want to look at the average length of time a user spends on the game in versions 1 and 2.