# Mobile App for Lottery Addiction

### Introduction

Many people start playing the lottery for fun, but for some this activity turns into a habit which eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending from their savings and loans, they start to accumulate debts, and eventually engage in desperate behaviors like theft.

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

For the first version of the app, they want us to focus on the [6/49](https://en.wikipedia.org/wiki/Lotto_6/49) lottery and build functions that enable users to answer questions like:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. [The data set](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018 (we'll come back to this).

The scenario we're following throughout this project is fictional — the main purpose is to practice applying the concepts we learned in a setting that simulates a real-world scenario.

### Core Functions

<b>Our goal is to write code that enables users to answer probability questions about playing the lottery.</b> Throughout the project, we'll need to calculate probabilities and combinations repeatedly, so we'll start by writing two functions that we'll use often:

- A function that calculates factorials; and
- A function that calculates combinations.

To calculate factorials, this is the formula we learned we need to use:

\begin{equation}
n! = n \times (n - 1) \times (n - 2) \times ... \times 2 \times 1
\end{equation}

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. The drawing is done without replacement, which means once a number is drawn, it's not put back in the set.

To find the number of combinations when we're sampling without replacement and taking only k objects from a group of n objects, we can use the formula:

\begin{equation}
_nC_k = {n \choose k} =  \frac{n!}{k!(n-k)!}
\end{equation}

In [1]:
# factorial(n): multiply n * (n-1) * ... * 2 * 1
def factorial(n):
    final_product = 1
    for i in range(n,0,-1):
        final_product *= i
    return final_product

# combinations(n, k): number of ways to choose k objects from n without replacement
# note: true division (/) returns a float
def combinations(n,k):
    return factorial(n)/(factorial(n-k)*factorial(k))
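
As a quick sanity check on the two helpers, the same count can be obtained from Python's standard library. The sketch below assumes Python 3.8 or later, where `math.comb` is available. `combinations_int` is a hypothetical name that is not part of the original notebook; it uses integer division so the result stays an exact integer rather than a float.

```python
import math

def combinations_int(n, k):
    """Number of ways to choose k items from n, as an exact integer."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

total_tickets = combinations_int(49, 6)
print(total_tickets)                      # 13983816
print(total_tickets == math.comb(49, 6))  # True
```

Keeping the count as an integer avoids a trailing `.0` when the number of possible tickets is printed.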


### One-ticket Probability

In [2]:
def one_ticket_probability(user_number):
    """
    user_number: a list of six different numbers from 1 to 49
    prints the chance (in percent) of winning the big prize with this ticket
    """
    n_combinations = combinations(49,6)
    n_successful = 1
    p_successful = (n_successful / n_combinations) * 100
    print("The chance of winning with {} is {:.7f}%".format(user_number, p_successful))

In [3]:
one_ticket_probability([1,2,3,4,5,6])

The chance of winning with [1, 2, 3, 4, 5, 6] is 0.0000072%


In [4]:
one_ticket_probability([34,49,17,11,8,24])

The chance of winning with [34, 49, 17, 11, 8, 24] is 0.0000072%


### Historical Data Check for Canada Lottery

The data set contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 2018. For each drawing, we can find the six numbers drawn in the following six columns:

- NUMBER DRAWN 1
- NUMBER DRAWN 2
- NUMBER DRAWN 3
- NUMBER DRAWN 4
- NUMBER DRAWN 5
- NUMBER DRAWN 6

In [5]:
# import the library we need
import pandas as pd

# read 649.csv into a dataframe
lottery_canada = pd.read_csv("649.csv")
# check the number of rows and columns
print(lottery_canada.shape)
# preview the first 3 rows
lottery_canada.head(3)

(3665, 11)


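| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |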


In [6]:
# overview the last 3 rows
lottery_canada.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


In [16]:
lottery_canada.iloc[0][4:10]

NUMBER DRAWN 1     3
NUMBER DRAWN 2    11
NUMBER DRAWN 3    12
NUMBER DRAWN 4    14
NUMBER DRAWN 5    41
NUMBER DRAWN 6    43
Name: 0, dtype: object

### Functions for Historical Data Check

The engineering team told us that we need to be aware of the following details:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list and serve as an input to our function.
- The engineering team wants us to write a function that prints:
  - the number of times the combination selected occurred in the Canada data set; and
  - the probability of winning the big prize in the next drawing with that combination.

We'll now start working on writing this function. Note there's more than one way to solve this problem, so take the instructions below as suggestions.

In [18]:
def extract_numbers(row):
    """
    row is a single row (one drawing) of lottery_canada
    columns at positions 4-9 hold NUMBER DRAWN 1-6
    returns the six numbers as a set
    """
    numbers = row.iloc[4:10]
    numbers = set(numbers.values)
    return numbers

# extract drawn numbers from dataset and assign to variable 'winning_numbers'
winning_numbers = lottery_canada.apply(extract_numbers,axis=1)
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [21]:
def check_historical_occurence(user_number, historical_numbers):
    """
    user_number: a Python list of six numbers
    historical_numbers: a pandas Series of sets, one set per past drawing
    """
    user_number_set = set(user_number) # convert the list to a set
    occurrence = (user_number_set == historical_numbers) # boolean Series, True where a drawing matches
    n_occurrence = occurrence.sum() # count the True values to get the number of matches
    
    if n_occurrence == 0:
        print("""The combination {} has never occurred in the historical data. This doesn't mean it is more or less likely to be drawn next time.
        Your chance of winning the big prize in the next drawing using {} is 0.0000072%.
        """.format(user_number,user_number))
    else:
        print("""The combination {} has occurred {} time(s) in the historical data. This doesn't mean it is more or less likely to be drawn next time.
        Your chance of winning the big prize in the next drawing using {} is 0.0000072%.
        """.format(user_number,n_occurrence,user_number))
    

In [22]:
# check the function
check_historical_occurence([3,9,10,43,13,20],winning_numbers)

The combination [3, 9, 10, 43, 13, 20] has occurred 1 time(s) in the historical data. This doesn't mean it is more or less likely to be drawn next time.
        Your chance of winning the big prize in the next drawing using [3, 9, 10, 43, 13, 20] is 0.0000072%.
        


We built a function that lets users check whether their combination has occurred before.
- First, we convert user_number from a list to a set.
- Second, we compare user_number_set with each entry in historical_numbers, which returns one boolean value per drawing indicating whether it matched.
- Third, we count the matches. A count of 0 means the combination has never occurred; any other count means it has.
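
The three steps above can also be sketched without pandas. This is only an illustration, not the notebook's implementation: `sample_draws`, `draw_counts`, and `count_occurrences` are hypothetical names, and the sample holds just the first three drawings shown in the `head(3)` output rather than the full data set.

```python
from collections import Counter

# First three drawings from the head(3) output (illustrative sample only)
sample_draws = [
    (3, 11, 12, 14, 41, 43),
    (8, 33, 36, 37, 39, 41),
    (1, 6, 23, 24, 27, 39),
]

# A frozenset ignores order and is hashable, so identical combinations can be counted
draw_counts = Counter(frozenset(draw) for draw in sample_draws)

def count_occurrences(user_numbers, counts):
    """Return how many past drawings match the user's six numbers."""
    return counts[frozenset(user_numbers)]

print(count_occurrences([41, 3, 43, 11, 12, 14], draw_counts))  # 1
print(count_occurrences([1, 2, 3, 4, 5, 6], draw_counts))       # 0
```

Because sets ignore order, the user can type the six numbers in any sequence and still match a past drawing.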

### Multi-Ticket Probability

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning, so we're going to write a function that lets users calculate the chance of winning for any number of different tickets.

We've talked with the engineering team and they gave us the following information:

- The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
- Our function will see an integer between 1 and 13,983,816 (the maximum number of different tickets).
- The function should print information about the probability of winning the big prize depending on the number of different tickets played.

We want to write a function that lets users calculate the probability of winning the big prize when they buy multiple lottery tickets.

- First, let users enter the number of tickets they bought.
- Second, calculate the probability by dividing n by the total number of combinations.
- Third, print the result in a personalized way.
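
As a rough sketch of the second step, the probability for n different tickets is n divided by the 13,983,816 possible combinations. The example below assumes Python 3.8+ for `math.comb`; `multi_ticket_fraction` and `TOTAL_COMBINATIONS` are hypothetical names, and the function returns an exact fraction instead of printing.

```python
from fractions import Fraction
from math import comb

TOTAL_COMBINATIONS = comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_fraction(n_tickets):
    """Exact probability of winning the big prize with n different tickets."""
    if not 1 <= n_tickets <= TOTAL_COMBINATIONS:
        raise ValueError("n_tickets must be between 1 and 13,983,816")
    return Fraction(n_tickets, TOTAL_COMBINATIONS)

print(multi_ticket_fraction(6991908))      # 1/2
print(multi_ticket_fraction(13983816))     # 1
print(float(multi_ticket_fraction(1)) * 100)  # percentage for a single ticket
```

Returning a value rather than printing makes the calculation easy to test; the formatted message can then be built separately.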

In [25]:
def multi_ticket_probability(n):
    """
    print the probability of winning the big prize with n different tickets
    n must be an integer between 1 and 13,983,816
    """
    n_possibilities = combinations(49,6)
    probability = n / n_possibilities * 100
    ticket_word = "ticket" if n == 1 else "tickets"
    
    print("""The chance to win a big prize with {:,} {} is {:.6f}%.
In other words, your odds of winning are {:,} in {:,}.
""".format(n, ticket_word, probability, n, int(n_possibilities)))

In [27]:
# try several ticket counts to check the function
input_list = [1,10,55,10000,600000,6991908,13983816]

for i in input_list:
    multi_ticket_probability(i)
    print("-----------------------------") # output delimiter

The chance to win a big prize with 1 ticket is 0.000007%.
In other words, your odds of winning are 1 in 13,983,816.

-----------------------------
The chance to win a big prize with 10 tickets is 0.000072%.
In other words, your odds of winning are 10 in 13,983,816.

-----------------------------
The chance to win a big prize with 55 tickets is 0.000393%.
In other words, your odds of winning are 55 in 13,983,816.

-----------------------------
The chance to win a big prize with 10,000 tickets is 0.071511%.
In other words, your odds of winning are 10,000 in 13,983,816.

-----------------------------
The chance to win a big prize with 600,000 tickets is 4.290674%.
In other words, your odds of winning are 600,000 in 13,983,816.

-----------------------------
The chance to win a big prize with 6,991,908 tickets is 50.000000%.
In other words, your odds of winning are 6,991,908 in 13,983,816.

-----------------------------
The chance to win a big prize with 13,983,816 tickets is 100.000000%.
In other words, your odds of winning are 13,983,816 in 13,983,816.

-----------------------------

### Fewer Winning Numbers - Function
Next, we're going to write one more function that lets users calculate probabilities for two, three, four, or five winning numbers.

For extra context, in most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having two, three, four, or five winning numbers.

These are the engineering details we'll need to be aware of:

- Inside the app, the user inputs:
  - six different numbers from 1 to 49; and
  - an integer between 2 and 5 that represents the number of winning numbers expected
- Our function prints information about the probability of having the inputted number of winning numbers.

For example, the number of ways to choose five of the six numbers on a ticket is:
\begin{equation}
_6C_5 = {6 \choose 5} =  \frac{6!}{5!(6-5)!} =  6
\end{equation}
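
To turn that count into a probability, the ways of picking k of the ticket's six numbers are multiplied by the ways of picking the remaining 6 - k numbers from the 43 numbers not on the ticket, and the product is divided by all 13,983,816 possible outcomes. The sketch below is one way to express this, assuming Python 3.8+ for `math.comb`; `exactly_k` and `at_least_k` are hypothetical helper names, and `at_least_k` is included only because the introduction asks about "at least" a given number of matches.

```python
from fractions import Fraction
from math import comb

def exactly_k(k):
    """Probability that exactly k of the ticket's six numbers are drawn."""
    return Fraction(comb(6, k) * comb(43, 6 - k), comb(49, 6))

def at_least_k(k):
    """Probability that k or more of the ticket's six numbers are drawn."""
    return sum(exactly_k(j) for j in range(k, 7))

print(comb(6, 5) * comb(43, 1))  # 258 outcomes match exactly five numbers
print(float(exactly_k(5)))       # probability of exactly five matches
print(float(at_least_k(5)))      # probability of five or six matches
```

The probabilities for k = 0 through 6 add up to exactly 1, which is a handy check that the counting is right.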

In [28]:
def probability_less_6(n):
    """
    n is the number of winning numbers, an integer from 2 to 5
    prints the probability of having exactly n winning numbers on one ticket
    """
    n_combination_ticket = combinations(6,n)
    n_combination_remaining = combinations(43,6-n)
    n_possibilities = n_combination_ticket * n_combination_remaining
    
    total_possible_outcomes = combinations(49,6)
    
    # divide n_possibilities by total_possible_outcomes
    probability = n_possibilities / total_possible_outcomes
    # multiply by 100 to express the probability as a percentage
    probability_percentage = probability*100
    
    combinations_simplified = total_possible_outcomes / n_possibilities
    
    
    print("""Your chances of having {} winning numbers with this ticket are {:.7f}%.
In other words, you have a 1 in {:,} chance to win.""".format(n,probability_percentage,round(combinations_simplified)))
    

In [29]:
for i in [2,3,4,5]:
    probability_less_6(i)
    print("---------------------") # output delimiter

Your chances of having 2 winning numbers with this ticket are 13.2378029%.
In other words, you have a 1 in 8 chance to win.
---------------------
Your chances of having 3 winning numbers with this ticket are 1.7650404%.
In other words, you have a 1 in 57 chance to win.
---------------------
Your chances of having 4 winning numbers with this ticket are 0.0968620%.
In other words, you have a 1 in 1,032 chance to win.
---------------------
Your chances of having 5 winning numbers with this ticket are 0.0018450%.
In other words, you have a 1 in 54,201 chance to win.
---------------------


### Making the output easier to understand with fun analogies

We'd like to compare the chance of winning with the chance of being bitten by a shark: the chance of being attacked by a shark is 1 in 11.5 million, and the chance of being killed by one is less than 1 in 264.1 million.

Sources: [Shark Attack - Wikipedia](https://en.wikipedia.org/wiki/Shark_attack)
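
Before wiring the analogy into the functions, the comparison can be worked out directly from the two odds. This is a minimal sketch that takes the 1 in 11.5 million figure quoted above at face value; `LOTTERY_ODDS`, `SHARK_ATTACK_ODDS`, and the other names are hypothetical.

```python
from math import ceil, comb

LOTTERY_ODDS = comb(49, 6)      # 1 in 13,983,816 per ticket
SHARK_ATTACK_ODDS = 11_500_000  # 1 in 11.5 million, the figure quoted above

# How many times more likely a shark attack is than winning with one ticket
shark_vs_one_ticket = LOTTERY_ODDS / SHARK_ATTACK_ODDS

# Smallest number of different tickets that makes winning the more likely event
break_even_tickets = ceil(LOTTERY_ODDS / SHARK_ATTACK_ODDS)

print(round(shark_vs_one_ticket, 3))  # 1.216
print(break_even_tickets)             # 2
```

Under this figure, a single ticket is slightly less likely to win than its holder is to be attacked by a shark, and the comparison flips from two tickets onward.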

In [60]:
# improve function by adding fun analogies

def one_ticket_probability(user_number):
    """
    user_number: a list of six different numbers from 1 to 49
    """
    n_combinations = combinations(49,6)
    n_successful = 1
    p_successful = (n_successful / n_combinations) * 100
    
    shark_bite_possibilities = 1/11500000 * 100 # 1 in 11.5 million, as a percentage
    
    print("The chance of winning with {} is {:.7f}%.".format(user_number, p_successful),
          "You're {:.3f} times more likely to be bitten by a shark than to win the lottery.".format(shark_bite_possibilities/p_successful))

In [40]:
one_ticket_probability([1,2,3,4,5,6])

The chance of winning with [1, 2, 3, 4, 5, 6] is 0.0000072%. You're 1.216 times more likely to be bitten by a shark than to win the lottery.


In [76]:
# improve function by adding fun analogies

def multi_ticket_probability(n):
    """
    print the probability of winning the big prize with n different tickets
    n must be an integer between 1 and 13,983,816
    """
    n_possibilities = combinations(49,6)
    probability = n / n_possibilities * 100
    
    shark_bite_possibilities = 1/11500000 * 100 # 1 in 11.5 million, as a percentage
    ticket_word = "ticket" if n == 1 else "tickets"
    
    # compare the two probabilities directly instead of hard-coding a ticket threshold
    if probability < shark_bite_possibilities:
        print("""The chance to win a big prize with {:,} {} is {:.6f}%.
In other words, your odds of winning are {:,} in {:,}, which means you're {:.6f} times more likely to be bitten by a shark than to win the lottery.
""".format(n, ticket_word, probability, n, int(n_possibilities), shark_bite_possibilities/probability))
        
    else:
        print("""The chance to win a big prize with {:,} {} is {:.6f}%.
In other words, your odds of winning are {:,} in {:,}, which means you're {:.6f} times more likely to win the lottery than to be bitten by a shark.
""".format(n, ticket_word, probability, n, int(n_possibilities), probability/shark_bite_possibilities))

In [77]:
# try several ticket counts to check the function
input_list = [1,13,55,10000,600000,6991908,13983816]

for i in input_list:
    multi_ticket_probability(i)
    print("-----------------------------") # output delimiter

The chance to win a big prize with 1 ticket is 0.000007%.
In other words, your odds of winning are 1 in 13,983,816, which means you're 1.215984 times more likely to be bitten by a shark than to win the lottery.

-----------------------------
The chance to win a big prize with 13 tickets is 0.000093%.
In other words, your odds of winning are 13 in 13,983,816, which means you're 10.690930 times more likely to win the lottery than to be bitten by a shark.

-----------------------------
The chance to win a big prize with 55 tickets is 0.000393%.
In other words, your odds of winning are 55 in 13,983,816, which means you're 45.230858 times more likely to win the lottery than to be bitten by a shark.

-----------------------------
The chance to win a big prize with 10,000 tickets is 0.071511%.
In other words, your odds of winning are 10,000 in 13,983,816, which means you're 8223.792418 times more likely to win the lottery than to be bitten by a shark.
        
-----------------------------
The chance to win a big prize