# "Fun with Loot Boxes" Lab

> Authors: Caroline Schmitt, Matt Brems

### Scenario:

You're an analyst for [Zynga](https://en.wikipedia.org/wiki/Zynga), a gaming studio working on an event for an MMO (massively multiplayer online) game. This event is going to include **loot boxes**.

<img src="https://vignette.wikia.nocookie.net/2007scape/images/0/06/Culinaromancer%27s_chest.png/revision/latest?cb=20180403231423" alt="drawing" width="150"/> 

A loot box is basically a treasure chest in a game. This loot box can be opened to reveal a variety of items: some items are very rare and valuable, other items are common and less valuable. (You may consult [the esteemed Wikipedia](https://en.wikipedia.org/wiki/Loot_box) for a more extensive definition.)

In our specific game, suppose that loot boxes can be obtained in one of two ways: 
- After every three hours of playing the game, a user will earn one loot box.
- If the user wishes to purchase a loot box, they may pay $1 (in real money!) for a loot box.

These loot boxes are very good for our business!
- If a player earns a loot box, it means they are spending lots of time on the game. This often leads to advertisement revenue, and they may tell their friends to join the game.
- If the player purchases a loot box, it means we've earned $1 from our customer.

Suppose each loot box is opened to reveal either:
- magical elixir (super rare, very valuable), or
- nothing.

Whether each loot box contains the elixir or nothing is **random**. Our boss wants some guidance on what sort of randomness to use on these loot boxes! 
- If the magical elixir is too rare, then users may not be motivated to try to get them, because they believe they'll never find the magical elixir.
- If the magical elixir is too common, then users may not be motivated to try to get them, because the game has so much of the magical elixir that it isn't worthwhile to try to get it.

However, our boss isn't a math-y type person! When explaining things to our boss, we need to explain the impact of our choices on the game as concretely as possible.

### Version 1
In our first version of the game, we'll say that loot boxes contain magical elixir 15% of the time and nothing 85% of the time.

#### 1. Our boss asks, "If a user buys 100 loot boxes, how many elixirs will they get?" How would you respond?

**NOTE**: 


- In this case, the number of trials is n = 100. The probability of success (finding an elixir), labeled p, is .15, and the probability of failure is .85. With these values known, we can use a binomial distribution to describe the number of elixirs (x, the successes) found in n trials. On average a user gets n × p = 100 × .15 = 15 elixirs, although any one user may get a few more or a few fewer. `size` is 1 because we simulate 1 user who opens 100 loot boxes.

- We can run a binomial function in Python to simulate one such user:

- `np.random.binomial(n=100, p=.15, size=1)`

In [2]:
import numpy as np
import scipy.stats as stats

np.random.binomial(n=100, p = .15, size = 1)


## One simulated user opening 100 boxes; the count changes on every run (12 elixirs in the run shown below).

array([12])

In [3]:
stats.binom.pmf(n = 100, p = .15, k = 1)

## This is the probability of getting exactly 1 elixir (k = 1) in 100 loot boxes, which is tiny.

1.5437071111956072e-06
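One way to turn this into a concrete answer for the boss is to report the long-run average together with a typical range. The sketch below assumes the binomial model described above; the variable names and the 5th-to-95th percentile range are illustrative choices, not part of the lab.

```python
from scipy import stats

n_boxes = 100      # loot boxes opened by one user
p_elixir = 0.15    # Version 1 chance that a box holds an elixir

# Long-run average number of elixirs per 100 boxes: n * p.
expected_elixirs = n_boxes * p_elixir

# Range that covers the middle 90% of users (5th to 95th percentile).
low, high = stats.binom.ppf([0.05, 0.95], n=n_boxes, p=p_elixir)

print(f"On average: {expected_elixirs:.0f} elixirs per {n_boxes} boxes")
print(f"About 90% of users land between {low:.0f} and {high:.0f} elixirs")
```

Reporting a range alongside the average makes it clear that a single simulated draw (such as 12) is not "the" answer.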

#### 2. Our boss asks, "How many loot boxes does someone have to purchase in order to definitely get elixir?" How would you respond?

**NOTE**: 

When approaching a stats problem, I like to define my known variables and the variables I am looking for. In this case, 1 person is purchasing n loot boxes, and the question asks how large n must be to definitely get an elixir. Because each box is random, no number of boxes can guarantee an elixir: the chance of opening n boxes and finding nothing in all of them is .85^n, which gets very small as n grows but never reaches zero. The best we can tell our boss is how many boxes it takes to be, say, 95% or 99% sure of getting at least one elixir.
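Since "definitely" is not achievable, a practical alternative is to ask how many boxes give a chosen level of confidence. The sketch below assumes independent boxes with a 15% elixir chance; `boxes_for_confidence` is a hypothetical helper name introduced only for illustration.

```python
import math

def boxes_for_confidence(target, p_elixir=0.15):
    """Smallest n with P(at least one elixir in n boxes) >= target."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    # P(no elixir in n boxes) = (1 - p)^n, so solve (1 - p)^n <= 1 - target for n.
    return math.ceil(math.log(1 - target) / math.log(1 - p_elixir))

for target in (0.50, 0.95, 0.99):
    n = boxes_for_confidence(target)
    print(f"{target:.0%} sure of at least one elixir: {n} boxes")
```

Asking for a target of exactly 1 (100% certainty) raises an error on purpose, because no finite number of boxes satisfies it.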




#### 3. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" This is a bit more complicated, so let's break it down before answering.


Let's break down this question and see if we can arrive at something logical. First, ONE user earns 100 loot boxes, so the number of trials is n = 100. With that information, I want to compute the probability of more than 20 successes (that is, 21 or more elixirs) in 100 loot boxes. We can use a binomial distribution with n = 100 and p = .15. The CDF evaluated at k = 20 gives the cumulative probability of 20 or fewer elixirs, so subtracting it from 1 gives the probability of more than 20.

#### 3a. Let's suppose my random variable $X$ counts up how many elixirs I observe out of my 100 loot boxes. Why is $X$ a discrete random variable?

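The random variable X is a count of elixirs, so it can only take the whole-number values 0, 1, 2, ..., 100. Its possible values are countable, which makes it discrete.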

#### 3b. Recall our discrete distributions: discrete uniform, Bernoulli, binomial, Poisson. Let's suppose my random variable $X$ counts up how many elixirs I observe out of my 100 loot boxes. What distribution is best suited for $X$? Why?
- Hint: It may help to consider getting the magical elixir a "success" and getting nothing a "failure." 


Each individual loot box is a Bernoulli trial: it is either a success (elixir, with probability .15) or a failure (nothing, with probability .85). Because X counts the successes across 100 independent loot boxes that all share the same probability of success, the binomial distribution with n = 100 and p = .15 is the best fit. A single Bernoulli distribution only describes one box.
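The link between Bernoulli trials and the binomial distribution can be checked with a quick simulation. This is a minimal sketch that assumes independent boxes; the seed and the number of simulated users are arbitrary choices made for reproducibility and speed.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed chosen only for reproducibility

n_users = 50_000
# Each row is one user, each column one loot box; True means "elixir".
boxes = rng.random((n_users, 100)) < 0.15   # 100 Bernoulli(0.15) trials per user
elixirs_per_user = boxes.sum(axis=1)        # X for each simulated user

# For a binomial(100, 0.15) variable, theory gives mean n*p = 15
# and variance n*p*(1-p) = 12.75; the simulation should land close to both.
print("simulated mean of X:", elixirs_per_user.mean())
print("simulated variance of X:", elixirs_per_user.var())
```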


#### 3c. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" Use the probability mass function to answer the boss' question.

In [4]:
# Show your work; leave your answer in a comment.

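# "More than 20" means 21, 22, ..., 100 elixirs, so add up the PMF over all of those values.
# (The PMF at k = 21 alone is only P(X = 21), about 0.027.)
# Answer: about 0.066, roughly a 6.6% chance.
p_more_than_20 = stats.binom.pmf(n=100, p=.15, k=np.arange(21, 101)).sum()
round(float(p_more_than_20), 4)

0.0663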

#### 3d. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" Use the cumulative distribution function to answer the boss' question.

In [5]:
# Show your work; leave your answer in a comment.

# P(X > 20) = 1 - P(X <= 20). Answer: about 0.066.
1 - stats.binom.cdf(n=100, p=.15, k=20)

0.06631976581888166
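The PMF route and the CDF route should give the same probability, which is a useful sanity check. The sketch below compares both against SciPy's survival function `sf`, which returns P(X > k) directly; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

n, p = 100, 0.15

via_pmf = stats.binom.pmf(np.arange(21, n + 1), n=n, p=p).sum()  # P(X=21) + ... + P(X=100)
via_cdf = 1 - stats.binom.cdf(20, n=n, p=p)                      # 1 - P(X <= 20)
via_sf = stats.binom.sf(20, n=n, p=p)                            # survival function, P(X > 20)

print(via_pmf, via_cdf, via_sf)
```

All three values agree up to floating-point rounding, so any of them can be used to answer the boss.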

#### 3e. Our boss asks, "If a user earns 100 loot boxes, what is the chance that a user gets more than 20 elixirs?" Answer your boss' question. *Remember that your boss is not a math-y person!*

In [8]:
1 - stats.binom.cdf(n=100, p =.15, k=20)

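0.06631976581888166

About 6.6% of users who earn 100 loot boxes will get more than 20 elixirs. Put another way, roughly 1 out of every 15 such users would be that lucky.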

#### 4. Your boss wants to know how many people purchased how many loot boxes last month. 
> For example, last month, 70% of users did not purchase any loot boxes. 10% of people purchased one loot box. 5% of people purchased two loot boxes... and so on.

#### 4a. Recall our discrete distributions: discrete uniform, Bernoulli, binomial, Poisson. Let's suppose my random variable $Y$ counts up how many loot boxes each person purchased through the game last month. What distribution is best suited for $Y$? Why?

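The Poisson distribution is best suited for this. It models the number of events (here, loot box purchases by one user) that occur over a fixed period of time (one month), and the count has no fixed upper limit.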


#### 4b. Suppose that, on average, your customers purchased 2.7 loot boxes last month. In order for your revenue to be at least $500,000, at least how many users would you need on your platform? (Round your answer up to the nearest thousand.) 

In [12]:
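customers = 2.7    # average loot boxes purchased per customer last month ($1 each)

revenue = 500_000  # revenue target in dollars

average_users = revenue / customers  # about 185,185.19 users

# Round up to the nearest thousand, as the question asks.
int(np.ceil(average_users / 1000) * 1000)

186000

At $1 per loot box we need at least 500,000 / 2.7 ≈ 185,185.19 users, which rounds up to 186,000 users.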

#### 4c. Assume that your platform has the number of users you mentioned in your last answer. Suppose that your platform calls anyone who purchases 5 or more loot boxes in a month a "high value user." How much money do you expect to have earned from "high value users?" How about "low value users?"

In [70]:
# Show your work; leave your answer in a comment.

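users = 186_000  # answer to 4b, rounded up to the nearest thousand
purchases = stats.poisson(customers)  # Y ~ Poisson(2.7) loot boxes per user per month

# Expected dollars per user from low value users ($1 per box):
# sum of k * P(Y = k) for k = 0, 1, 2, 3, 4.
low_per_user = sum(k * purchases.pmf(k) for k in range(0, 5))

sum_low = users * low_per_user
high_value = users * customers - sum_low  # total expected revenue minus the low value part

# Round to the nearest thousand dollars for reporting.
format_sum = '${:,.0f}'.format(round(sum_low, -3))
format_high = '${:,.0f}'.format(round(high_value, -3))

# Answer: roughly $359,000 from low value users and $144,000 from high value users.
print("The expected revenue from our low value customers is about", format_sum)
print("The expected revenue from our high value customers is about", format_high)

The expected revenue from our low value customers is about $359,000
The expected revenue from our high value customers is about $144,000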


#### 4d. Suppose that you want to summarize how many people purchased how many loot boxes last month for your boss. Since your boss isn't math-y, what are 2-4 summary numbers you might use to summarize this for your boss? (Your answers will vary here - use your judgment!)

To summarize how many people purchased how many loot boxes last month, I would first report the average number of loot boxes purchased per user. Next, I would report the total revenue from loot boxes. Dividing revenue by the average number of loot boxes per user gives the number of users last month, assuming each loot box costs one dollar. Finally, the probabilities of purchasing 0, 1, 2, 3, 4, etc. loot boxes (for example, the share of users who bought none and the share who bought 5 or more) help us break down our customer base.
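A handful of summary numbers like these can be pulled from the fitted distribution. The sketch below assumes the Poisson model with the average of 2.7 loot boxes per user from 4b; with real purchase data the same summaries would be computed from the observed counts instead.

```python
from scipy import stats

purchases = stats.poisson(2.7)  # assumed model: average of 2.7 loot boxes per user

summary = {
    "average boxes per user": purchases.mean(),
    "median boxes per user": purchases.median(),
    "share buying nothing": purchases.pmf(0),
    "share buying 5 or more": purchases.sf(4),   # P(Y >= 5) = P(Y > 4)
}

for label, value in summary.items():
    print(f"{label}: {value:.3f}")
```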

#### 5. Your boss asks "How many loot boxes does it take before someone gets their first elixir?" Using `np.random.choice`, simulate how many loot boxes it takes someone to get their first elixir. 
- Start an empty list.
- Use control flow to have someone open loot boxes repeatedly.
- Once they open a loot box containing an elixir, record the number of loot boxes it took in the empty list.
- Repeat this process 100,000 times. 

This simulates how long it takes for someone to open a loot box containing elixir. Share the 5th, 25th, 50th, 75th, and 95th percentiles.

> You may find [this documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html)  and [this documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html) helpful.

In [None]:
greeting = ""
while len(greeting) < 10:
    greeting += "HO!"
greeting

In [156]:


loot = []

for i in range(100_000):

    # Start at 1 so the box that finally contains the elixir is counted too.
    count = 1

    # Keep opening boxes until one contains an elixir (1 = elixir, 0 = nothing).
    while np.random.choice([0, 1], p=[.85, .15]) == 0:
        count += 1

    loot.append(count)

print(np.percentile(loot,5))
print(np.percentile(loot,25))
print(np.percentile(loot,50))
print(np.percentile(loot,75))
print(np.percentile(loot,95))

1.0
2.0
5.0
9.0
19.0


Half of all users find their first elixir within 5 loot boxes, three quarters within 9, and 95% within 19. Fewer than 1 user in 20 needs more than 19 loot boxes.
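The simulation can be cross-checked against the geometric distribution, which describes the number of trials up to and including the first success. This is a different technique from the `np.random.choice` loop the lab asks for, shown here only as a sanity check. Note that `stats.geom` counts the box that contains the elixir, so its values start at 1.

```python
import numpy as np
from scipy import stats

p_elixir = 0.15
percentiles = [5, 25, 50, 75, 95]

# Exact percentiles of the number of boxes opened up to and including the first elixir.
exact = stats.geom.ppf(np.array(percentiles) / 100, p_elixir)

# Vectorized simulation of the same quantity (seed chosen only for reproducibility).
rng = np.random.default_rng(seed=1)
simulated_boxes = rng.geometric(p_elixir, size=100_000)
simulated = np.percentile(simulated_boxes, percentiles)

for pct, e, s in zip(percentiles, exact, simulated):
    print(f"{pct}th percentile: exact {e:.0f}, simulated {s:.0f}")
print("average boxes needed:", simulated_boxes.mean())  # theory: 1 / p, about 6.67
```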

### Version 2

After a substantial update to the game, suppose every loot box can be opened to reveal *one of four different* items:
- magical elixir (occurs 1% of the time, most valuable)
- golden pendant (occurs 9% of the time, valuable)
- steel armor (occurs 30% of the time, semi-valuable)
- bronze coin (occurs 60% of the time, least valuable)

#### 6. Suppose you want to repeat problem 5 above, but do that for the version 2 loot boxes so you can track how many loot boxes are needed to get each item. (e.g. You'd like to be able to say that on average it takes 10 trials to get a golden pendant, 3 trials to get steel armor, and so on.) What Python datatype is the best way to store this data? Why?

A dictionary is a good fit for this data. Each key would be an item name (magical elixir, golden pendant, steel armor, bronze coin), and each value would be a list holding the number of loot boxes needed in each simulation, so every item's results can be looked up by name and summarized separately. To fill it, we could write a function that takes an item's probability and repeats the simulation from problem 5, e.g. with `np.random.binomial(n=1, p=p)`, where `p` is a parameter. Putting the simulation in a function lets us reuse it if the probabilities of the items change.
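A dictionary of lists for the Version 2 items might look like the sketch below. The helper name `boxes_until`, the safety cap, the seed, and the small number of simulations are all illustrative assumptions, chosen so the example runs quickly.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded only so the sketch is reproducible

items = ["magical elixir", "golden pendant", "steel armor", "bronze coin"]
probs = [0.01, 0.09, 0.30, 0.60]  # Version 2 probabilities from the lab

def boxes_until(item, max_boxes=100_000):
    """Open Version 2 loot boxes until `item` appears; return how many were opened."""
    for opened in range(1, max_boxes + 1):
        if rng.choice(items, p=probs) == item:
            return opened
    return max_boxes  # safety cap; practically never reached

n_sims = 200  # kept small so the sketch runs quickly

# Keys are item names; values are lists of "boxes needed", one entry per simulation.
results = {item: [boxes_until(item) for _ in range(n_sims)] for item in items}

for item, counts in results.items():
    print(f"{item}: average of {np.mean(counts):.1f} boxes")
```

Keeping the raw counts in lists (rather than only the averages) means percentiles or other summaries can be computed later without rerunning the simulation.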

#### 7. Suppose you and your boss want to measure whether "Version 2" is better than "Version 1." What metrics do you think are important to measure? (Your answers will vary here - use your judgment!)

One way to judge whether Version 2 is better than Version 1 is to compare the average number of users in each version. The experience of getting an item might also increase the number of loot boxes opened per user each month, so that number is worth tracking too. I predict that Version 2 will have more users because of the higher probability of getting an item: every Version 2 loot box contains something.