# Lottery Odds Analysis 

*This analysis was done as part of Dataquest practice.*

The lottery is relatively inexpensive. In the US, a lottery ticket usually costs between $2 and $5 and comes with the promise of a huge jackpot prize. Naturally, it is seen as a socially acceptable form of gambling. However, for a minority, this form of entertainment escalates into an addiction.

In this analysis, we will assume the role of a data analyst at a medical institute that aims to prevent and treat gambling addictions. The institute wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute needs us to create the logical core of the app and calculate the probabilities, to support the engineers who will build the app.

We will focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49) and build functions that enable users to answer questions like:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

We will also use historical data coming from the national 6/49 lottery game in Canada as a bonus. The [data set](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018.

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. The drawing is done without replacement, which means once a number is drawn, it's not put back in the set. The order of the numbers does not matter, so we will have to perform a lot of factorial and combination calculations.
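As a quick illustration of the counting involved, here is a minimal sketch that computes the number of possible 6/49 outcomes from factorials and cross-checks it against `math.comb` (available in Python 3.8 and later). The helper name `n_combinations` is only for illustration and is not used elsewhere in this analysis.

```python
import math

def n_combinations(n, k):
    """Number of ways to choose k items from n without replacement, ignoring order."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

total_outcomes = n_combinations(49, 6)

# Cross-check the factorial formula against the standard library
assert total_outcomes == math.comb(49, 6)

print(f"{total_outcomes:,}")  # 13,983,816
```

Integer division (`//`) keeps the result an exact integer, which is safe here because the factorial ratio always divides evenly.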

In [1]:
import math
import scipy.special
import numpy as np
import pandas as pd

# Short alias for scipy.special.comb (returns a float by default)
combinations = scipy.special.comb

## One-ticket Probability
We need to build a function that calculates the probability of winning the big prize for any given ticket. For each drawing, six numbers are drawn from a set of 49, and a player wins the big prize if the six numbers on their ticket match all six numbers drawn.

The engineering team told us that we need to be aware of the following details when we write the function:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list and serve as an input to our function.
- The engineering team wants the function to print the probability value in a friendly way — in a way that people without any probability training are able to understand.

In [2]:
def one_ticket_probability(ticket_list):
    '''
    Print the probability of winning the big prize with a single ticket
    
    Arguments:
    ticket_list (list): list of 6 numbers on the ticket
    '''
    # Every 6-number combination is equally likely, so the numbers on the
    # ticket do not change the result; they are only echoed in the message
    possible_outcomes = combinations(49,6)
    probability = 1/possible_outcomes
    percentage_form = probability * 100
    
    message = ("The probability of winning with the numbers {} is"
               " {:.7f}%.\n"
               "In other words, you have a 1 in {:,} chance to win.".format(
                   ticket_list, percentage_form, int(possible_outcomes)))
    
    print(message)
    
one_ticket_probability([1,2,3,4,5,6])
one_ticket_probability([9, 26, 41, 7, 15, 6])

The probability of winning with the numbers [1, 2, 3, 4, 5, 6] is 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win.
The probability of winning with the numbers [9, 26, 41, 7, 15, 6] is 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win.
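The details from the engineering team imply some assumptions about the input: six different numbers, each from 1 to 49. The function above does not check them, so here is a minimal sketch of such a check. `is_valid_ticket` is a hypothetical helper name, not part of the app's actual interface.

```python
def is_valid_ticket(ticket_list):
    """Return True if the ticket holds six different integers from 1 to 49."""
    return (
        len(ticket_list) == 6
        and len(set(ticket_list)) == 6  # no repeated numbers
        and all(isinstance(n, int) and 1 <= n <= 49 for n in ticket_list)
    )

print(is_valid_ticket([9, 26, 41, 7, 15, 6]))  # True
print(is_valid_ticket([3, 2, 44, 22, 1, 44]))  # False: 44 appears twice
print(is_valid_ticket([0, 2, 3, 4, 5, 50]))    # False: 0 and 50 are out of range
```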


## Historical Data Check for Canada Lottery
*This section was done solely to satisfy the Dataquest instructions. Comparing historical draws with a current ticket doesn't make much sense to me. However, I am not a gambler, so I am not so sure about that.*

Next we will examine data from Kaggle to check the empirical probability.

In [3]:
# This will download data into /data/ folder, with the name '649.csv'
import kaggle

kaggle.api.authenticate()

kaggle.api.dataset_download_files('datascienceai/lottery-dataset', 
                                  path='./data/', unzip=True)

In [4]:
lottery_data = pd.read_csv('./data/649.csv')
print(lottery_data.shape)
print(lottery_data.head(3))

(3665, 11)
   PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  \
0      649            1                0  6/12/1982               3   
1      649            2                0  6/19/1982               8   
2      649            3                0  6/26/1982               1   

   NUMBER DRAWN 2  NUMBER DRAWN 3  NUMBER DRAWN 4  NUMBER DRAWN 5  \
0              11              12              14              41   
1              33              36              37              39   
2               6              23              24              27   

   NUMBER DRAWN 6  BONUS NUMBER  
0              43            13  
1              41             9  
2              39            34  


We are going to write a function that will enable users to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

In [5]:
def extract_numbers(row):
    # Positions 4 to 9 hold NUMBER DRAWN 1 to 6; the bonus number is excluded
    winning_row = row.iloc[4:10]
    winning_set = set(winning_row.values)
    return winning_set

winning_numbers = lottery_data.apply(func=extract_numbers, axis=1)

In [6]:
print(winning_numbers.head(3))

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
dtype: object


In [7]:
def check_historical_occurrence(user_ticket,winning_numbers):
    ticket_set = set(user_ticket)
    # Compare the ticket with every past draw and count the exact matches
    check = winning_numbers.apply(lambda draw: draw == ticket_set).sum()
    
    if check == 0:
        print("The combination has never occurred")
    else:
        print("The combination has occurred {} time(s) in the past.".
             format(int(check)))

Let's test some tickets to see if they have won before.

In [8]:
test_input_3 = [33, 36, 37, 39, 8, 41]
check_historical_occurrence(test_input_3, winning_numbers)
# 44 appears twice below, so this ticket has only five distinct numbers
# and can never match a draw
test_input_4 = [3, 2, 44, 22, 1, 44]
check_historical_occurrence(test_input_4, winning_numbers)

The combination has occurred 1 time(s) in the past.
The combination has never occurred
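An alternative way to do the same lookup, sketched here under the assumption that many tickets will be checked against the same history: store each draw as a `frozenset` (which is hashable) and count the draws once with `collections.Counter`. The three sample rows are copied from the head of the data set printed earlier; `count_historical_matches` and `draw_counts` are illustrative names.

```python
from collections import Counter

import pandas as pd

# First three draws, copied from the head of the data set shown earlier
sample = pd.DataFrame({
    "NUMBER DRAWN 1": [3, 8, 1],
    "NUMBER DRAWN 2": [11, 33, 6],
    "NUMBER DRAWN 3": [12, 36, 23],
    "NUMBER DRAWN 4": [14, 37, 24],
    "NUMBER DRAWN 5": [41, 39, 27],
    "NUMBER DRAWN 6": [43, 41, 39],
})

number_columns = [f"NUMBER DRAWN {i}" for i in range(1, 7)]

# frozenset is hashable, so each draw can serve as a dictionary key
draw_counts = Counter(
    frozenset(row) for row in sample[number_columns].to_numpy().tolist()
)

def count_historical_matches(user_ticket, draw_counts):
    """Return how many past draws contain exactly the numbers on the ticket."""
    return draw_counts[frozenset(user_ticket)]

print(count_historical_matches([33, 36, 37, 39, 8, 41], draw_counts))  # 1
print(count_historical_matches([1, 2, 3, 4, 5, 6], draw_counts))       # 0
```

Building the counter costs one pass over the history; each later lookup is a single dictionary access instead of a scan over all draws.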


## Multi-ticket Probability
Users should also be able to find the probability of winning if they play multiple different tickets. For instance, someone might intend to play 5 different tickets and want to know the probability of winning the big prize.

We will write a function, multi_ticket_probability(), that takes in the number of tickets and prints the probability of winning with that many tickets.

In [9]:
def multi_ticket_probability(number_of_ticket):
    # Assumes all tickets are different, so each one covers a distinct outcome
    possible_outcomes = combinations(49,6)
    probability = number_of_ticket/possible_outcomes*100
    probability_simplified = round(possible_outcomes / number_of_ticket)
    
    message = ("The probability of winning with {} ticket(s) is"
               " {:.7f}%.\n"
               "In other words, you have a 1 in {:,} chance to win.".format(
                   number_of_ticket, probability, int(probability_simplified)))
    
    print(message)
    
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for test_input in test_inputs:
    multi_ticket_probability(test_input)
    print('------------------------') # Separate output to improve visibility


The probability of winning with 1 ticket(s) is 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win.
------------------------
The probability of winning with 10 ticket(s) is 0.0000715%.
In other words, you have a 1 in 1,398,382 chance to win.
------------------------
The probability of winning with 100 ticket(s) is 0.0007151%.
In other words, you have a 1 in 139,838 chance to win.
------------------------
The probability of winning with 10000 ticket(s) is 0.0715112%.
In other words, you have a 1 in 1,398 chance to win.
------------------------
The probability of winning with 1000000 ticket(s) is 7.1511238%.
In other words, you have a 1 in 14 chance to win.
------------------------
The probability of winning with 6991908 ticket(s) is 50.0000000%.
In other words, you have a 1 in 2 chance to win.
------------------------
The probability of winning with 13983816 ticket(s) is 100.0000000%.
In other words, you have a 1 in 1 chance to win.
------------------------
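Because the multi-ticket probability is just the number of distinct tickets divided by the number of possible outcomes, it can also be kept as an exact fraction. The sketch below uses `fractions.Fraction` and `math.comb` from the standard library; `multi_ticket_fraction` is a hypothetical name, and the range check is an added assumption about what counts as valid input.

```python
import math
from fractions import Fraction

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible draws

def multi_ticket_fraction(number_of_tickets):
    """Exact probability of winning the big prize with that many different tickets."""
    if not 1 <= number_of_tickets <= TOTAL_OUTCOMES:
        raise ValueError("number_of_tickets must be between 1 and 13,983,816")
    return Fraction(number_of_tickets, TOTAL_OUTCOMES)

print(multi_ticket_fraction(6991908))   # 1/2
print(multi_ticket_fraction(13983816))  # 1
print(float(multi_ticket_fraction(100)) * 100)  # percentage, as a float
```

The fraction form avoids rounding in the "1 in N" wording, since `Fraction` reduces to lowest terms automatically.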


## Odds of winning smaller prizes

In the 6/49 lottery, there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. Thus, some users might want to know the probability of having two, three, four, or five winning numbers.

Let's say a player chose these six numbers on a ticket: (1, 2, 3, 4, 5, 6). Out of these six numbers, we can form six five-number combinations:

- (1, 2, 3, 4, 5)
- (1, 2, 3, 4, 6)
- (1, 2, 3, 5, 6)
- (1, 2, 4, 5, 6)
- (1, 3, 4, 5, 6)
- (2, 3, 4, 5, 6)

Thus, the total number of five-number combinations is
\begin{equation}
_6C_5 = {6 \choose 5} =  \frac{6!}{5!(6-5)!} =  6
\end{equation}
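The same count can be confirmed by enumeration. This short sketch lists the five-number combinations with `itertools.combinations`, imported under the name `iter_combinations` so that it does not clash with the `combinations` alias for `scipy.special.comb` used in this notebook.

```python
import math
from itertools import combinations as iter_combinations

ticket = (1, 2, 3, 4, 5, 6)

# Every way to pick 5 of the 6 ticket numbers, order ignored
five_number_combinations = list(iter_combinations(ticket, 5))

for combo in five_number_combinations:
    print(combo)

# The enumeration agrees with the formula: 6 choose 5 = 6
assert len(five_number_combinations) == math.comb(6, 5) == 6
```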

For each one of the six five-number combinations above, there are 44 lottery outcomes that contain all five numbers. For the combination (1, 2, 3, 4, 5), for instance:

- (1, 2, 3, 4, 5, 6)
- ...
- (1, 2, 3, 4, 5, 15)
- ...
- (1, 2, 3, 4, 5, 30)
- ...
- (1, 2, 3, 4, 5, 49)

However, the first of these, (1, 2, 3, 4, 5, 6), matches all six numbers on the ticket. It wins the big prize rather than the five-number prize, and it would otherwise be counted once for each of the six combinations. That leaves 43 successful outcomes per combination: the sixth number drawn must be one of the 49 - 6 = 43 numbers that are not on the ticket.

So the probability of having exactly five winning numbers is
\begin{equation}
P(\text{5 numbers}) = \frac{6 \times 43}{{49 \choose 6}} \approx 0.0000184
\end{equation}

We can generalize this to any number of winning numbers k between 2 and 5: choose which k of the six ticket numbers are drawn, then choose the remaining 6 - k drawn numbers from the 43 numbers that are not on the ticket:

\begin{equation}
P(k \text{ numbers}) = \frac{{6 \choose k}\times {43 \choose 6-k}}{{49 \choose 6}}
\end{equation}

Now we can construct a function to calculate the probability of winning smaller prizes, with between 2 and 5 of the 6 numbers correct. We assume that a single correct number does not qualify for any prize.

In [10]:
def smaller_prizes_probability(n_correct):
    if n_correct == 1:
        print("You won't win any prize with only one right number.")
    elif n_correct < 1 or n_correct > 5:
        print("Please retry with a number between 2 and 5")
    else:
        # Ways to choose which ticket numbers are drawn
        n_correct_combination = combinations(6, n_correct)
        # Ways to fill the rest of the draw from the 43 numbers
        # that are not on the ticket
        n_remain_combination = combinations(43, 6 - n_correct)
        total_success = n_correct_combination * n_remain_combination
        total_outcomes =  combinations(49,6)
        # Calculate probability in percentage
        probability = total_success/total_outcomes * 100
        # Calculate odds
        odds = round(total_outcomes/total_success)
        print("The probability of winning with {} correct numbers is"
             " {:.7f}%. You have a 1 in {:,} chance to win".
             format(n_correct,probability,int(odds)))

for test_input in [2, 3, 4, 5]:
    smaller_prizes_probability(test_input)
    print('--------------------------') # output delimiter

The probability of winning with 2 correct numbers is 13.2378029%. You have a 1 in 8 chance to win
--------------------------
The probability of winning with 3 correct numbers is 1.7650404%. You have a 1 in 57 chance to win
--------------------------
The probability of winning with 4 correct numbers is 0.0968620%. You have a 1 in 1,032 chance to win
--------------------------
The probability of winning with 5 correct numbers is 0.0018450%. You have a 1 in 54,201 chance to win
--------------------------
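One of the questions listed at the start asks for the probability of *at least* k winning numbers, while a prize tier corresponds to *exactly* k. Here is a minimal sketch of both, assuming that exactly k matches means k of the six ticket numbers and 6 - k of the 43 other numbers are drawn. The function names are illustrative, and `math.comb` is used so that all counts stay exact integers.

```python
import math

TOTAL_OUTCOMES = math.comb(49, 6)

def exactly_k_outcomes(k):
    """Number of draws that share exactly k numbers with a given ticket."""
    return math.comb(6, k) * math.comb(43, 6 - k)

def at_least_k_probability(k):
    """Probability that a draw shares k or more numbers with a given ticket."""
    favourable = sum(exactly_k_outcomes(j) for j in range(k, 7))
    return favourable / TOTAL_OUTCOMES

# Sanity check: the cases k = 0..6 together cover every possible draw
assert sum(exactly_k_outcomes(j) for j in range(7)) == TOTAL_OUTCOMES

print(exactly_k_outcomes(5))  # 258 draws match exactly five numbers
print(exactly_k_outcomes(6))  # 1 draw matches all six
for k in (5, 4, 3, 2):
    print(k, at_least_k_probability(k))
```

Summing the exact cases avoids counting the jackpot outcome more than once, which is what would happen if each k-number subset were allowed to pair with any of the remaining numbers.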
