# Mobile App for Lottery Addiction

**Background:** In this hypothetical scenario, a medical institute is building a mobile app to help lottery addicts understand their extremely low chances of winning.  The goal is for addicted gamblers and *potential* addicted gamblers to use the app in hopes of treating and preventing their toxic tendencies.

**Objective:** Our task is to develop the logical core of the app and calculate probabilities.  The first version focuses on the following questions:
* What is the probability/chance of winning the big prize with a single ticket?
* What is the probability/chance of winning the big prize if 40 different tickets are played?  How about any other number of tickets?
* What is the probability/chance of having at least five, four, three, or two winning numbers on a single ticket?

**Data:** We will be utilizing fictional historical data from the national 6/49 lottery game in Canada.  The [dataset](https://www.kaggle.com/datasets/datascienceai/lottery-dataset) has 3,665 drawings from the years 1982-2018.

**Lottery:** In the 6/49 lottery, 6 numbers are drawn *without replacement* from a set of 49 numbers (ranging from 1 to 49).  To answer probability questions related to playing the lottery, we'll need to repeatedly calculate `factorials` and `combinations`.  They are defined as functions in the cell below.

In [1]:
# Define a function that calculates the factorial of positive integer n
def factorial(n):
    answer = 1
    for i in range(n, 0, -1):
        answer *= i
    return answer

# Define a function that calculates the number of possible combinations of k objects from a group of n objects
def combinations(n, k):
    # Integer division keeps the result exact; this quotient of factorials is always a whole number
    return factorial(n) // (factorial(k) * factorial(n - k))
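As a sanity check, the hand-written helpers can be compared against Python's standard library, which provides `math.factorial` and, since Python 3.8, `math.comb`. The sketch below is not part of the original notebook: it rebuilds the combination count with integer arithmetic under the assumed name `combinations_exact` and confirms that it agrees with `math.comb` for the 6/49 game.

```python
import math

def combinations_exact(n, k):
    """Number of ways to choose k objects from n, using integer division."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# The 6/49 lottery has C(49, 6) possible tickets
total_tickets = combinations_exact(49, 6)
assert total_tickets == math.comb(49, 6)
print(f"{total_tickets:,}")  # 13,983,816
```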

## Chance of winning with a single lottery ticket

Using the above `factorial` and `combinations` functions, we define a new function `one_ticket_probability` that calculates and displays the answer to our first question:

    What is the probability/chance of winning the big prize with a single ticket?

In [2]:
# Define a function that prints the probability/chance of winning the lottery with one ticket
def one_ticket_probability(user_numbers):
    # `user_numbers` avoids shadowing Python's built-in `list`
    total_outcomes = combinations(49, 6)
    probability_pct = 1 / total_outcomes
    print("The chance of you winning big with your one lottery ticket that has the numbers {} is {:.7%}.  This translates to a 1 in {:,} chance of winning.".format(user_numbers, probability_pct, int(total_outcomes)))

In [3]:
# Test the `one_ticket_probability` function
one_ticket_probability([1,2,3,4,5,6])

The chance of you winning big with your one lottery ticket that has the numbers [1, 2, 3, 4, 5, 6] is 0.0000072%.  This translates to a 1 in 13,983,816 chance of winning.


## Read, inspect, and slightly modify the dataset

Since we'll be using historical data to calculate probabilities that will inform the app users, we import the CSV file and take a quick look at it in the following cells. We also create a new column that holds all six winning numbers of each drawing together, rather than keeping them in six separate columns.

In [4]:
# Import pandas library, read csv dataset, and view number of rows and columns
import pandas as pd

data = pd.read_csv("649.csv")
data.shape

(3665, 11)

In [5]:
# Print first three rows of dataset
data.head(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34


In [6]:
# Print last three rows of dataset
data.tail(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8
3664,649,3591,0,6/20/2018,14,24,31,35,37,48,17


In [7]:
# Define a function that returns the six winning lottery numbers, as a set, for a given row of data
def extract_numbers(row):
    return {row["NUMBER DRAWN 1"], row["NUMBER DRAWN 2"], row["NUMBER DRAWN 3"]
           ,row["NUMBER DRAWN 4"], row["NUMBER DRAWN 5"], row["NUMBER DRAWN 6"]}

# Create a new column in our data with the above function applied
data["winning_nums"] = data.apply(extract_numbers, axis = 1)

Note that the winning numbers are placed in a set, rather than a list, in the function defined above.  This allows for easy comparison of the entire set as a whole (against another set), as opposed to comparing individual elements of a list one by one.  It will come in handy shortly when we want to compare any set of lottery numbers to the historical data.
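The difference between the two comparisons can be seen in a minimal sketch. The variable names are illustrative only; the numbers are those of the 6/13/2018 drawing shown in the dataset's last rows.

```python
# A user's ticket, entered in arbitrary order
ticket_as_list = [32, 34, 22, 24, 31, 6]
# The drawn numbers in the order they appear in the dataset (6/13/2018)
drawn_as_list = [6, 22, 24, 31, 32, 34]

# Lists compare element by element, so order matters
lists_equal = ticket_as_list == drawn_as_list
# Sets compare by membership only, so order is irrelevant
sets_equal = set(ticket_as_list) == set(drawn_as_list)

print(lists_equal)  # False
print(sets_equal)   # True
```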

In [8]:
# Check the newly created column of winning numbers
data.head(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER,winning_nums
0,649,1,0,6/12/1982,3,11,12,14,41,43,13,"{3, 41, 11, 12, 43, 14}"
1,649,2,0,6/19/1982,8,33,36,37,39,41,9,"{33, 36, 37, 39, 8, 41}"
2,649,3,0,6/26/1982,1,6,23,24,27,39,34,"{1, 6, 39, 23, 24, 27}"


## Check a given lottery ticket against historical drawings

Now we'll build the functionality that lets users compare their lottery ticket numbers against previous drawings and see whether they would ever have won (chances are they wouldn't have).  The output also restates the chance of winning the next drawing with a single lottery ticket, using the result calculated earlier in the project.

In [9]:
# Define a function that counts the number of times a list/set of lottery numbers won in the past
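def check_historical_occurrence(user_numbers, series):
    # `user_numbers` avoids shadowing Python's built-in `list`
    user_nums = set(user_numbers)
    occurrence = 0

    # Count every past drawing whose six numbers equal the user's numbers
    for num_set in series:
        if user_nums == num_set:
            occurrence += 1
    # The single-ticket probability is the constant computed earlier (1 / 13,983,816)
    print("The number of times your lottery ticket numbers {} occurred in the past is {}.  Your chances of winning the big prize in the next drawing is 0.0000072%, or 1 in 13,983,816.".format(user_numbers, occurrence))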

In [10]:
# Test above created function
check_historical_occurrence([32,34,22,24,31,6], data["winning_nums"])

The number of times your lottery ticket numbers [32, 34, 22, 24, 31, 6] occurred in the past is 1.  Your chances of winning the big prize in the next drawing is 0.0000072%, or 1 in 13,983,816.


In [11]:
# Test above created function
check_historical_occurrence([3, 2, 44, 22, 1, 44], data["winning_nums"])

The number of times your lottery ticket numbers [3, 2, 44, 22, 1, 44] occurred in the past is 0.  Your chances of winning the big prize in the next drawing is 0.0000072%, or 1 in 13,983,816.
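The second test above passes the number 44 twice, so the resulting set holds only five distinct numbers and could never equal a drawing. A minimal sketch of input validation the app might run before the lookup follows; the helper names `validate_ticket` and `count_past_wins` are hypothetical, and a short hand-copied list of drawings stands in for `data["winning_nums"]`.

```python
def validate_ticket(numbers, picks=6, lowest=1, highest=49):
    """Return the ticket as a set, or raise ValueError if it is not a valid 6/49 ticket."""
    ticket = set(numbers)
    if len(numbers) != picks or len(ticket) != picks:
        raise ValueError(f"A ticket needs exactly {picks} distinct numbers.")
    if any(not lowest <= n <= highest for n in ticket):
        raise ValueError(f"Numbers must be between {lowest} and {highest}.")
    return ticket

def count_past_wins(numbers, past_draws):
    """Count how many past drawings (an iterable of sets) match the ticket exactly."""
    ticket = validate_ticket(numbers)
    return sum(1 for draw in past_draws if draw == ticket)

# Three drawings copied from the rows shown earlier in the notebook
sample_draws = [{3, 11, 12, 14, 41, 43}, {8, 33, 36, 37, 39, 41}, {6, 22, 24, 31, 32, 34}]
wins = count_past_wins([32, 34, 22, 24, 31, 6], sample_draws)
print(wins)  # 1
```

Returning the count instead of printing it leaves the wording of the message to the app's presentation layer.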


## Chances of winning with more than one lottery ticket

Lottery addicts often play more than one ticket per drawing, believing that it significantly improves their chances of winning.  Each additional ticket does raise the probability, but the improvement is negligible at any realistic number of tickets, which is what this section demonstrates.  The function below takes any number of tickets and prints the corresponding chance of winning.

In [12]:
# Define a function that outputs the chance of winning given a number of lottery tickets purchased
def multi_ticket_probability(num_tickets):
    total_outcomes = combinations(49, 6)
    probability_pct = num_tickets / total_outcomes
    combinations_simplified = round(total_outcomes / num_tickets)
    print("The chance of you winning big with {:,} lottery ticket numbers is {:.7%}.  This translates to a 1 in {:,} chance of winning.".format(num_tickets, probability_pct, combinations_simplified))

In [13]:
# Test the above created function on different numbers of lottery tickets
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for test_input in test_inputs:
    multi_ticket_probability(test_input)
    print('------------------------') # output delimiter

The chance of you winning big with 1 lottery ticket numbers is 0.0000072%.  This translates to a 1 in 13,983,816 chance of winning.
------------------------
The chance of you winning big with 10 lottery ticket numbers is 0.0000715%.  This translates to a 1 in 1,398,382 chance of winning.
------------------------
The chance of you winning big with 100 lottery ticket numbers is 0.0007151%.  This translates to a 1 in 139,838 chance of winning.
------------------------
The chance of you winning big with 10,000 lottery ticket numbers is 0.0715112%.  This translates to a 1 in 1,398 chance of winning.
------------------------
The chance of you winning big with 1,000,000 lottery ticket numbers is 7.1511238%.  This translates to a 1 in 14 chance of winning.
------------------------
The chance of you winning big with 6,991,908 lottery ticket numbers is 50.0000000%.  This translates to a 1 in 2 chance of winning.
------------------------
The chance of you winning big with 13,983,816 lottery ticket numbers is 100.0000000%.  This translates to a 1 in 1 chance of winning.
------------------------
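The objective asks specifically about 40 tickets, which the test inputs above do not cover. Here is a minimal sketch of the same calculation that returns values instead of printing them. It assumes every ticket carries a different combination, and the function name `multi_ticket_odds` is hypothetical.

```python
import math

def multi_ticket_odds(num_tickets):
    """Return (probability, one_in) for num_tickets distinct 6/49 tickets."""
    total_outcomes = math.comb(49, 6)
    if not 1 <= num_tickets <= total_outcomes:
        raise ValueError("num_tickets must be between 1 and 13,983,816.")
    probability = num_tickets / total_outcomes
    return probability, total_outcomes / num_tickets

probability, one_in = multi_ticket_odds(40)
print(f"{probability:.7%}")   # 0.0002860%
print(f"1 in {one_in:,.0f}")  # 1 in 349,595
```

Even with 40 different tickets, the player would expect to win the big prize roughly once in every 350,000 drawings.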

## Matching fewer lottery ticket numbers

Up until now, we've defined a "winning" lottery ticket as one in which *all* six numbers match those drawn in the lottery.  However, there are smaller prizes for ticket holders who match fewer than six numbers but more than one.  For example, there's a small prize for a ticket on which two out of six numbers match.  The same goes for three, four, and five matches.

The last piece of functionality we'll build below calculates the chance of matching exactly 2, 3, 4, or 5 of the 6 numbers drawn in the lottery.  This should illustrate to the user how the probability of winning decreases dramatically as the number of required matches increases.  They will see that even matching three out of six numbers theoretically occurs at a rate of less than 2%.

In [14]:
# Define a function that takes the desired number of matches and prints its probability/chance
def probability_less_6(n):
    ticket_combos = combinations(6, n)
    remaining_combos = combinations(43, 6 - n)
    successful_outcomes = ticket_combos * remaining_combos
    
    total_combos = combinations(49, 6)
    probability_pct = successful_outcomes / total_combos * 100
    
    combos_simplified = round(total_combos / successful_outcomes)
    
    print('''Your chances of having {} winning numbers with this ticket are {:.7}%.
In other words, you have a 1 in {:,} chances to win.'''.format(n, probability_pct,
                                                               int(combos_simplified)))

In [15]:
# Test the above created function on different possible inputs
for test_input in [2, 3, 4, 5]:
    probability_less_6(test_input)
    print('--------------------------') # output delimiter

Your chances of having 2 winning numbers with this ticket are 13.2378%.
In other words, you have a 1 in 8 chances to win.
--------------------------
Your chances of having 3 winning numbers with this ticket are 1.76504%.
In other words, you have a 1 in 57 chances to win.
--------------------------
Your chances of having 4 winning numbers with this ticket are 0.09686197%.
In other words, you have a 1 in 1,032 chances to win.
--------------------------
Your chances of having 5 winning numbers with this ticket are 0.00184499%.
In other words, you have a 1 in 54,201 chances to win.
--------------------------
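The function above gives the chance of matching *exactly* n numbers, while the objective is phrased in terms of "at least" n. The two are related: a ticket matches exactly k numbers when k of its 6 numbers are among the 6 drawn and the other 6 - k come from the 43 numbers not drawn, and "at least k" sums those cases from k up to 6. The sketch below illustrates this with exact fractions; the names `prob_exactly` and `prob_at_least` are hypothetical and not part of the original notebook.

```python
from fractions import Fraction
from math import comb

def prob_exactly(k, picks=6, pool=49):
    """Probability that exactly k of the ticket's numbers are among the drawn numbers."""
    return Fraction(comb(picks, k) * comb(pool - picks, picks - k), comb(pool, picks))

def prob_at_least(k, picks=6, pool=49):
    """Probability of matching k or more numbers: sum the exact cases k..picks."""
    return sum(prob_exactly(j, picks, pool) for j in range(k, picks + 1))

for k in [2, 3, 4, 5]:
    exact = float(prob_exactly(k))
    at_least = float(prob_at_least(k))
    print(f"exactly {k}: {exact:.7%}   at least {k}: {at_least:.7%}")
```

Using `Fraction` keeps every intermediate value exact, so the seven cases from 0 to 6 matches sum to exactly 1.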
