# Developing the Logic for a Mobile App to Combat Lottery Addiction

As part of a team at a medical institute that aims to combat gambling addiction, we came up with the idea of a dedicated mobile app that helps lottery addicts better estimate their chances of winning. A team of software engineers will build the app, but they need us to provide the logic behind it, namely the calculation of the winning probabilities. The first version of the app focuses on the [6/49 lottery](https://www.wikiwand.com/en/Lotto_6/49), one of three national lottery games in Canada. The goal of this project is therefore to build functions that answer questions such as:
* What is the probability of winning the big prize with a single ticket?
* What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
* What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

Historical data from 1982 to 2018, covering a total of 3,665 drawings, is also considered when developing the logic. The data set can be found on [Kaggle](https://www.kaggle.com/datascienceai/lottery-dataset).

# Creating the Core Functions

Throughout this project, probabilities and combinations have to be calculated repeatedly. The lottery draws 6 numbers from a set of 49 without replacement, and the order in which the numbers are drawn does not matter. We therefore work with combinations rather than permutations, which take the order of arrangement into account. The first step is to create the core functions: one that calculates factorials, which form the basis of calculating combinations, and one that calculates the combinations themselves.

In [1]:
# Function to calculate the factorial of a non-negative integer n
def factorial(n):
    total = 1
    for i in range(n,0,-1):
        total *= i
    return total

# Function to calculate the number of combinations when selecting k objects from a group of n objects
def combinations(n,k):
    numerator = factorial(n)
    denominator = factorial(n-k) * factorial(k)
    # Integer division keeps the result an exact integer (the quotient is always whole)
    return numerator//denominator
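
As a quick sanity check, the two helpers can be compared against `math.factorial` and `math.comb` from Python's standard library (`math.comb` requires Python 3.8 or later). The sketch below repeats the helper definitions so that it runs on its own, and it uses integer division so that the results can be compared exactly as integers.

```python
import math

def factorial(n):
    total = 1
    for i in range(n, 0, -1):
        total *= i
    return total

def combinations(n, k):
    # Integer division: the quotient of these factorials is always a whole number
    return factorial(n) // (factorial(n - k) * factorial(k))

# Cross-check the hand-written helpers against the standard library
for n, k in [(49, 6), (6, 3), (43, 2)]:
    assert factorial(n) == math.factorial(n)
    assert combinations(n, k) == math.comb(n, k)

print(combinations(49, 6))  # 13983816
```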

# Function to Calculate One-ticket Win Probability

First, a function to calculate the probability of winning the big prize has to be created. In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn. For example, a player holding a ticket with the numbers {13, 22, 24, 27, 42, 44} wins the big prize only if the numbers drawn are {13, 22, 24, 27, 42, 44}. If even one number differs, the player does not win.

In [2]:
# Function to calculate the one-ticket probability, which takes as input a list of 6 unique numbers
def one_ticket_probability(user_numbers):
    total_possible_outcomes = combinations(49, 6)
    successful_outcomes = 1
    probability = successful_outcomes/total_possible_outcomes
    percentage = probability * 100
    print('The chance of you winning the big prize with your combination of numbers {} is {:.7f}%.'.format(user_numbers, percentage))

In [3]:
one_ticket_probability([1,2,3,4,5,6])

The chance of you winning the big prize with your combination of numbers [1, 2, 3, 4, 5, 6] is 0.0000072%.


In [4]:
one_ticket_probability([8,5,2,4,15,6])

The chance of you winning the big prize with your combination of numbers [8, 5, 2, 4, 15, 6] is 0.0000072%.


To give some context, the function first calculates the total number of unique combinations in which 6 numbers can be drawn from the 49 numbers without replacement. In every draw, there is exactly one winning combination, so there is a single successful outcome. The probability is the ratio of the two, and it is converted to a percentage, which is easier for the user to understand. Note that the result does not depend on which six numbers are chosen: every valid combination has the same chance of winning.
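
The same figure is often easier to grasp as "1 in N" odds. The following is a minimal sketch, assuming Python 3.8 or later for `math.comb`; the helper name `one_ticket_odds` and its parameters are hypothetical and not part of the notebook.

```python
import math

def one_ticket_odds(pool_size=49, numbers_drawn=6):
    """Return (total_outcomes, probability) for matching every drawn number."""
    total_outcomes = math.comb(pool_size, numbers_drawn)
    return total_outcomes, 1 / total_outcomes

total, probability = one_ticket_odds()
print('1 in {:,} ({:.7f}%)'.format(total, probability * 100))
# prints: 1 in 13,983,816 (0.0000072%)
```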

# Historical Data Check for Canada Lottery

For the first version of the app, users can also compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

## Getting Familiar with the Historical Data Set

In [5]:
# Importing pandas
import pandas as pd

# Loading in the historical data set into a pandas DataFrame
historical = pd.read_csv('649.csv')

In [6]:
# Checking the shape of the dataset
historical.shape

(3665, 11)

In [7]:
# Checking the first three rows of the data set
historical.head(3)

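| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |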


In [8]:
# Checking the last three rows of the data set
historical.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


## Function for Historical Data Check

In [9]:
# Function that takes a row of the data set and returns its 6 winning numbers as a Python set
def extract_numbers(row):
    winning_set = set(row.loc['NUMBER DRAWN 1':'NUMBER DRAWN 6'])
    return winning_set

In [10]:
# Creating a column in the dataframe which contains the sets of winning numbers
historical['winning_set'] = historical.apply(extract_numbers, axis = 1)
historical.head(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER | winning_set |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 | {3, 41, 11, 12, 43, 14} |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 | {33, 36, 37, 39, 8, 41} |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 | {1, 6, 39, 23, 24, 27} |


In [11]:
# Function that takes two inputs: a Python list with the user's combination of numbers and a pandas Series
# containing the past winning combinations as Python sets. It prints the number of times the combination
# has won in the past and the probability of winning the big prize in the next draw with that combination

def check_historical_occurence(combination, series):
    times_won = 0
    user_set = set(combination)
    for winning_set in series:
        if user_set == winning_set:
            times_won += 1
    print('The combination you inputted: {} won a total of {} time(s) in the past draws.'.format(combination, times_won))
    one_ticket_probability(combination)
        

In [12]:
check_historical_occurence([3,11,12,14,41,43], historical['winning_set'])

The combination you inputted: [3, 11, 12, 14, 41, 43] won a total of 1 time(s) in the past draws.
The chance of you winning the big prize with your combination of numbers [3, 11, 12, 14, 41, 43] is 0.0000072%.


To provide some context, the `extract_numbers` function gathers the 6 winning numbers of each draw, which are stored in separate columns of the data set, into a single column of Python sets. Python sets, like sets in mathematics, are unordered collections of unique elements, so two sets are equal whenever they contain the same numbers, regardless of order. The function `check_historical_occurence` compares the user's combination against this column. The user is then informed of the number of times the combination has won in past draws and of the probability of winning the big prize in the next draw with that same combination.
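
Because a set silently drops duplicates and ignores order, a list with a repeated or out-of-range number would still be accepted by the comparison. The sketch below shows that behaviour together with a possible input check. `validate_ticket` is a hypothetical helper that is not part of the notebook, and the rules it enforces (six unique integers from 1 to 49) are taken from the game description earlier in this document.

```python
def validate_ticket(numbers):
    """Return the ticket as a set, or raise ValueError if it is not a valid 6/49 pick."""
    if len(numbers) != 6:
        raise ValueError('A ticket needs exactly 6 numbers, got {}.'.format(len(numbers)))
    ticket = set(numbers)
    if len(ticket) != 6:
        raise ValueError('The 6 numbers must all be different.')
    if any(not isinstance(n, int) or n < 1 or n > 49 for n in ticket):
        raise ValueError('Every number must be an integer from 1 to 49.')
    return ticket

# Sets ignore order, so these two tickets compare equal
print(validate_ticket([43, 3, 11, 41, 12, 14]) == {3, 11, 12, 14, 41, 43})  # True

# A repeated number is rejected instead of being silently collapsed
try:
    validate_ticket([3, 11, 12, 14, 41, 12])
except ValueError as error:
    print(error)  # The 6 numbers must all be different.
```

A check of this kind could run before the historical comparison, so that the user is told about a mistyped ticket rather than shown a result for a different combination.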

# Function to Calculate Multi-ticket Win Probability

Lottery addicts rarely buy just one ticket per draw. They tend to buy several, believing that this increases their chances of winning significantly. The app therefore also needs to let users calculate the chance of winning for any number of different tickets bought.

In [13]:
# Function to calculate and print the probability of winning the big prize depending on the number of different tickets played
def multi_ticket_probability(num_tickets):
    total_possible_outcomes = combinations(49,6)
    successful_outcomes = num_tickets
    probability = successful_outcomes/total_possible_outcomes
    percentage = probability * 100
    print('Your {} ticket(s) played would mean that you have a {:.7f}% chance of winning the big prize.'.format(num_tickets, percentage))

In [14]:
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for n in test_inputs:
    multi_ticket_probability(n)

Your 1 ticket(s) played would mean that you have a 0.0000072% chance of winning the big prize.
Your 10 ticket(s) played would mean that you have a 0.0000715% chance of winning the big prize.
Your 100 ticket(s) played would mean that you have a 0.0007151% chance of winning the big prize.
Your 10000 ticket(s) played would mean that you have a 0.0715112% chance of winning the big prize.
Your 1000000 ticket(s) played would mean that you have a 7.1511238% chance of winning the big prize.
Your 6991908 ticket(s) played would mean that you have a 50.0000000% chance of winning the big prize.
Your 13983816 ticket(s) played would mean that you have a 100.0000000% chance of winning the big prize.


To give some context, the function lets the user input the number of tickets they intend to play. Since there is only one winning combination in every draw and every ticket played must carry a different combination, the number of tickets played equals the number of successful outcomes. The probability is the number of tickets divided by the total number of possible combinations, and it is printed as a percentage for easy interpretation.
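
The relationship can also be read in the other direction: how many different tickets are needed to reach a given chance of winning? The following is a minimal sketch, assuming Python 3.8 or later for `math.comb`; `tickets_needed` is a hypothetical helper that is not part of the notebook.

```python
import math
from fractions import Fraction

def tickets_needed(target_probability, pool_size=49, numbers_drawn=6):
    """Smallest number of different tickets whose win probability reaches the target."""
    if not 0 < target_probability <= 1:
        raise ValueError('target_probability must be in (0, 1].')
    total_outcomes = math.comb(pool_size, numbers_drawn)
    # Fraction keeps the arithmetic exact, so the ceiling is not thrown off by rounding
    return math.ceil(Fraction(target_probability) * total_outcomes)

print(tickets_needed(Fraction(1, 2)))  # 6991908
print(tickets_needed(1))               # 13983816
```

Seen this way, even a 50% chance of winning the big prize requires close to seven million different tickets for a single draw.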

# Function to Calculate Smaller Wins

In most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. Users might therefore be interested in knowing the probability of having two, three, four, or five winning numbers.

In [15]:
# Function to calculate and print the probability of having exactly n winning numbers (2 to 5) on a single ticket
def probability_less_6(n_winning_numbers):
    # Ways for the draw to include n of the 6 numbers on the ticket
    n_combinations_ticket = combinations(6, n_winning_numbers)
    # Ways for the rest of the draw to come from the 43 numbers not on the ticket
    n_combinations_remaining = combinations(43, 6 - n_winning_numbers)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining

    n_combinations_total = combinations(49, 6)
    probability = successful_outcomes / n_combinations_total

    probability_percentage = probability * 100
    print('Your chances of having {} winning numbers with this ticket are {:.6f}%.'.format(n_winning_numbers, probability_percentage))

In [16]:
test_inputs = [2, 3, 4, 5]
for n in test_inputs:
    probability_less_6(n)

Your chances of having 2 winning numbers with this ticket are 13.237803%.
Your chances of having 3 winning numbers with this ticket are 1.765040%.
Your chances of having 4 winning numbers with this ticket are 0.096862%.
Your chances of having 5 winning numbers with this ticket are 0.001845%.


The function `probability_less_6` takes the number of winning numbers the user is interested in and prints the probability that a single ticket has exactly that many winning numbers. The winning numbers must be chosen from the 6 numbers on the user's ticket, which is a combination of 6 taken n at a time. The remaining, non-winning numbers of the draw must be chosen from the 49 - 6 = 43 numbers that are not on the ticket. Multiplying the two counts gives the number of successful outcomes, and dividing by the total number of possible draws gives the probability. As before, the probability is printed as a percentage for easy interpretation.
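
The introduction also asks about having *at least* a given number of winning numbers. One way to answer that is to sum the exact-match probabilities from that number up to six, since those cases cannot occur together. The following is a minimal sketch, assuming Python 3.8 or later for `math.comb`; the function names are hypothetical, and using `Fraction` for exact arithmetic is a choice made here rather than something taken from the notebook.

```python
import math
from fractions import Fraction

def probability_exactly(n_winning_numbers):
    """Exact probability that a ticket matches exactly n of the 6 numbers drawn."""
    successful = math.comb(6, n_winning_numbers) * math.comb(43, 6 - n_winning_numbers)
    return Fraction(successful, math.comb(49, 6))

def probability_at_least(n_winning_numbers):
    """Exact probability of matching n or more numbers (sum of the disjoint cases)."""
    return sum(probability_exactly(k) for k in range(n_winning_numbers, 7))

for n in [2, 3, 4, 5]:
    print('At least {}: {:.6f}%'.format(n, float(probability_at_least(n)) * 100))
```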

# Conclusion

To wrap up, four functions are included in the first version of the app:

* `one_ticket_probability()`: calculates the probability of winning the big prize with a single ticket
* `check_historical_occurence()`: checks whether a certain combination has occurred in the Canada lottery data set
* `multi_ticket_probability()`: calculates the probability of winning the big prize for any number of different tickets between 1 and 13,983,816
* `probability_less_6()`: calculates the probability of having exactly two, three, four or five winning numbers