# Lotto 6/49

### Background

Lotto 6/49 is one of three national lottery games in Canada. Launched on June 12, 1982, Lotto 6/49 was the first nationwide Canadian lottery game to allow players to choose their own numbers. Previous national games, such as the Olympic Lottery, Loto Canada and Superloto used pre-printed numbers on tickets. Lotto 6/49 led to the gradual phase-out of that type of lottery game in Canada.


### Problems

Many people start playing the lottery for fun, but for some this activity turns into a habit that eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending their savings and taking out loans, accumulate debts, and eventually engage in desperate behaviors like theft.

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

For the first version of the app, they want us to focus on the 6/49 lottery and build functions that enable users to answer questions like:

- What is the probability of winning the big prize with a single ticket?

- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?

- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

In [42]:
import pandas as pd
import numpy as np

In [10]:
# Let's start by writing two functions that will be used frequently: factorials and combinations
# Factorial: n! = n ⋅ (n-1) ⋅ (n-2) ⋅ ... ⋅ 3 ⋅ 2 ⋅ 1
def factorial(n):
    product = 1
    for i in range(n, 0, -1):
        product *= i
    return product

# Permutations: nPk = n! / (n-k)!
# Integer division keeps the result an exact int (the quotient is always whole)
def permutation(n, k):
    return factorial(n) // factorial(n - k)

# Combinations: nCk = nPk / k!
def combination(n, k):
    return permutation(n, k) // factorial(k)
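As a quick sanity check, hand-written helpers like these can be compared against Python's standard library. The sketch below assumes Python 3.8 or later, where `math.comb` is available. `factorial_exact` and `combination_exact` are illustrative names that restate the helpers with integer division so the comparison is exact.

```python
import math

def factorial_exact(n):
    """n! computed as an exact integer product."""
    product = 1
    for i in range(2, n + 1):
        product *= i
    return product

def combination_exact(n, k):
    """nCk = n! / (k! * (n-k)!), using integer division to stay exact."""
    return factorial_exact(n) // (factorial_exact(k) * factorial_exact(n - k))

# Compare with the standard library on a few inputs, including the 6/49 case
for n, k in [(6, 5), (43, 1), (49, 6)]:
    assert combination_exact(n, k) == math.comb(n, k)

print(f"{combination_exact(49, 6):,}")  # 13,983,816
```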

In [119]:
def one_ticket_probability():
    # Number of possible six-number outcomes in a 6/49 draw
    comb = int(combination(49,6))
    pro = 1/comb
    print('You have 1 in {0:,} chance to win the next lotto, the probability is {1:.10%}'.format(comb,pro))

In [120]:
one_ticket_probability()

You have 1 in 13,983,816 chance to win the next lotto, the probability is 0.0000071511%


Above, we wrote a function that tells users the probability of winning the big prize with a single ticket. For the first version of the app, however, users should also be able to compare their ticket against the historical lottery data in Canada and determine whether they would ever have won by now.

Next, we'll explore the historical data from the Canada 6/49 lottery. The data set can be downloaded from [Kaggle](https://www.kaggle.com/datascienceai/lottery-dataset) and has the following structure:

In [43]:
file_loc = 'D:/Dataquest/Dataquest 2022 Learning/Datasets/'
df = pd.read_csv(file_loc + '649.csv')

In [46]:
print(df.shape)

df.head()

(3665, 11)


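| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |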


In [63]:
# Let's write a function that will allow users to compare their ticket against the historical data
# Note: the comparison is positional, so the ticket must list its numbers in the same order as the draw columns
def historical_probability(a):
    match_1 = df['NUMBER DRAWN 1'] == a[0]
    match_2 = df['NUMBER DRAWN 2'] == a[1]
    match_3 = df['NUMBER DRAWN 3'] == a[2]
    match_4 = df['NUMBER DRAWN 4'] == a[3]
    match_5 = df['NUMBER DRAWN 5'] == a[4]
    match_6 = df['NUMBER DRAWN 6'] == a[5]
    num_of_match = df[match_1 & match_2 & match_3 & match_4 & match_5 & match_6]
    print('Your selection occurred {0} times in the Canada data set.'.format(len(num_of_match)))
    one_ticket_probability()

In [66]:
a = [1,2,3,4,5,6]

# match_1 = df['NUMBER DRAWN 1'] == a[0]
# match_2 = df['NUMBER DRAWN 2'] == a[1]
# match_3 = df['NUMBER DRAWN 3'] == a[2]
# match_4 = df['NUMBER DRAWN 4'] == a[3]
# match_5 = df['NUMBER DRAWN 5'] == a[4]
# match_6 = df['NUMBER DRAWN 6'] == a[5]

# df[match_1 & match_2 & match_3 & match_4 & match_5 & match_6]

In [121]:
historical_probability(a)

Your selection occurred 0 times in the Canada data set.
You have 1 in 13,983,816 chance to win the next lotto, the probability is 0.0000071511%

In [68]:
def extract_number(row):
    row_extracted = {row['NUMBER DRAWN 1'],row['NUMBER DRAWN 2'],row['NUMBER DRAWN 3'],row['NUMBER DRAWN 4'],row['NUMBER DRAWN 5'],row['NUMBER DRAWN 6']}
    return row_extracted

In [73]:
all_winning_sets = df.apply(extract_number,axis=1)
all_winning_sets

0        {3, 41, 11, 12, 43, 14}
1        {33, 36, 37, 39, 8, 41}
2         {1, 6, 39, 23, 24, 27}
3         {3, 9, 10, 43, 13, 20}
4        {34, 5, 14, 47, 21, 31}
                  ...           
3660    {38, 40, 41, 10, 15, 23}
3661    {36, 46, 47, 19, 25, 31}
3662     {32, 34, 6, 22, 24, 31}
3663     {2, 38, 15, 49, 21, 31}
3664    {35, 37, 14, 48, 24, 31}
Length: 3665, dtype: object

In [89]:
def check_historical_occurrence(a, historical_sets):
    # Compare as sets so the order of the user's numbers does not matter
    user_ticket = set(a)
    occurrence = len(historical_sets[historical_sets==user_ticket])
    print('Your selection appeared {0} time(s) in the Canada Lotto data set'.format(occurrence))
    one_ticket_probability()

In [122]:
a = [38, 40, 41, 10, 15, 23]
print('Your selection is {0}'.format(set(a)))
check_historical_occurrence(a, all_winning_sets)

Your selection is {38, 40, 41, 10, 15, 23}
Your selection appeared 1 time(s) in the Canada Lotto data set
You have 1 in 13,983,816 chance to win the next lotto, the probability is 0.0000071511%

In fact, when we check the historical data, each winning set only appeared once.
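One way to check a claim like this is to count how often each winning set occurs. The sketch below uses a small hypothetical list of draws (the first three copied from the data preview, plus a deliberately repeated one) rather than the full data set, and `count_repeated_draws` is an illustrative helper name, not part of the notebook.

```python
from collections import Counter

def count_repeated_draws(draws):
    """Return {winning set: count} for every set drawn more than once."""
    # frozenset is hashable and ignores the order of the numbers
    counts = Counter(frozenset(draw) for draw in draws)
    return {numbers: c for numbers, c in counts.items() if c > 1}

sample_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
    [43, 41, 14, 12, 11, 3],  # hypothetical repeat of the first draw, reordered
]

repeats = count_repeated_draws(sample_draws)
print(len(repeats))  # 1
```

On the real data, an empty result would mean that no winning set was drawn twice.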

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning. Next, we're going to write a function that lets users calculate the chances of winning for any number of different tickets.

We've talked with the engineering team and they gave us the following information:

- The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
- Our function will receive an integer between 1 and 13,983,816 (the maximum number of different tickets).
- The function should print information about the probability of winning the big prize depending on the number of different tickets played.

In [104]:
# def multi_ticket_probability(a):
#     user_combination = combination(len(a),6)
#     all_combination = combination(49,6)
#     prob = user_combination / all_combination
#     return print('You select {0} numbers, which have {1:,} combinations, and the probability of winning the next lotto is {2:.10%}'.format(a,int(user_combination),prob*100))

In [117]:
def multi_ticket_probability(a):
    user_tickets = a
    all_combination = combination(49,6)
    if user_tickets < 1:
        print('You need to enter at least one ticket')
    elif user_tickets > all_combination:
        print('You have entered too many combinations')
    else:
        # Each different ticket covers exactly one of the equally likely outcomes
        prob = user_tickets / all_combination
        print('If you buy {0} tickets, the probability of winning the next lotto is {1:.10%}'.format(user_tickets,prob))

In [124]:
multi_ticket_probability(13983816)

If you buy 13983816 tickets, the probability of winning the next lotto is 100.0000000000%
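To put the numbers in perspective, here is a minimal sketch that returns the probability instead of printing it, so several ticket counts can be compared side by side. It relies on `math.comb` (Python 3.8+) rather than the notebook's `combination` helper, and `multi_ticket_chance` is a hypothetical name used only for this illustration.

```python
from math import comb

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816 possible six-number draws

def multi_ticket_chance(n_tickets):
    """Probability of the big prize when playing n_tickets different tickets."""
    if not 1 <= n_tickets <= TOTAL_OUTCOMES:
        raise ValueError("n_tickets must be between 1 and 13,983,816")
    return n_tickets / TOTAL_OUTCOMES

# Print the probability and the equivalent "1 in N" odds for a few ticket counts
for n in (1, 40, 10_000, 1_000_000, TOTAL_OUTCOMES // 2):
    p = multi_ticket_chance(n)
    print(f"{n:>10,} tickets -> {p:.6%} (about 1 in {1 / p:,.0f})")
```

With 40 different tickets, the chance is still only about 1 in 350,000.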


In [116]:
'percentage {0:.3%}'.format(0.5)

'percentage 50.000%'

Next, we're going to write one more function that lets users calculate probabilities for two, three, four, or five winning numbers.

For extra context, most 6/49 lotteries award smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having two, three, four, or five winning numbers.

These are the engineering details we'll need to be aware of:

- Inside the app, the user inputs:
  - six different numbers from 1 to 49; and
  - an integer between 2 and 5 that represents the number of winning numbers expected.

Our function prints information about the probability of having the inputted number of winning numbers.

Before coding this function, let's work through the probability of having exactly five winning numbers. First, we need to differentiate between these two probability questions:


1. What is the probability of having exactly five winning numbers?


2. What is the probability of having at least five winning numbers?

---------------------------------------------------------------------------------------------------------------------------
Let's answer the first question: what is the probability of having exactly five winning numbers?

For any given ticket with 6 numbers, there are 6 possible 5-number combinations: ${}_6C_5 = \frac{6!}{5!\,(6-5)!} = 6$

Let's assume we choose {1,2,3,4,5,6}. We can then form six five-number combinations:

- (1,2,3,4,5)
- (1,2,3,4,6)
- (1,2,3,5,6)
- (1,2,4,5,6)
- (1,3,4,5,6)
- (2,3,4,5,6)

For each of the five-number combinations, taking (1,2,3,4,5) as an example, there are **44** possible six-number outcomes:

- (1,2,3,4,5,**6**)
- (1,2,3,4,5,**7**)
- (1,2,3,4,5,**8**)
- ...
- ...
- (1,2,3,4,5,**48**)
- (1,2,3,4,5,**49**)


However, we want the probability of having exactly FIVE winning numbers, not SIX, so we need to subtract the one outcome that matches the full ticket. That leaves **43** possible six-number outcomes that win exactly the five-number prize.


- Therefore, for each of the 6 combinations, we have 43 possible winning outcomes, and the total number of winning outcomes is

  **6 * 43 = 258 possible winning outcomes**
  
  
- The total number of possible six-number outcomes is ${}_{49}C_6$ = **13,983,816**


- **P(5 winning numbers) = 258 / 13,983,816 ≈ 0.00001845 = 0.001845%**
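The count of 258 can be double-checked by brute force. The sketch below builds every six-number draw that shares exactly five numbers with the example ticket {1,2,3,4,5,6}. The variable names are illustrative, and `math.comb` assumes Python 3.8+.

```python
from itertools import combinations
from math import comb

ticket = {1, 2, 3, 4, 5, 6}
# The 43 numbers that are not on the ticket
others = [n for n in range(1, 50) if n not in ticket]

# Each draw = five ticket numbers + one number from outside the ticket
exact_five_draws = {
    frozenset(five) | {extra}
    for five in combinations(sorted(ticket), 5)
    for extra in others
}

# Every such draw shares exactly five numbers with the ticket
assert all(len(draw & ticket) == 5 for draw in exact_five_draws)

n_draws = len(exact_five_draws)
print(n_draws)                         # 258
print(f"{n_draws / comb(49, 6):.6%}")  # 0.001845%
```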



In [146]:
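def probability_less_6(n):
    # Probability of matching exactly n of the six drawn numbers (intended for n from 2 to 5)
    # Ways to choose which n of the ticket's 6 numbers are drawn
    possible_combination = combination(6,n)
    # Ways to fill the other 6-n drawn numbers from the 43 numbers not on the ticket
    possible_outcomes_per_combination = combination(43,6-n)
    possible_outcomes = possible_combination * possible_outcomes_per_combination

    total_outcomes = combination(49,6)

    probability = possible_outcomes / total_outcomes

    print('The {0}-winner number probability is {1:.6%}'.format(n,probability))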
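The function above follows the same counting argument as the five-number example, generalized to any number of matches $n$. Written out as a formula (a restatement of the code, using the same ${}_nC_k$ notation):

$$
P(\text{exactly } n \text{ winning numbers}) = \frac{{}_6C_n \cdot {}_{43}C_{6-n}}{{}_{49}C_6}
$$

For $n = 5$ this gives $\frac{6 \cdot 43}{13{,}983{,}816} = \frac{258}{13{,}983{,}816}$, which matches the worked example.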



----------------------------------------------------------------------------------------
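2. What is the probability of having at least five winning numbers?

- Having at least five winning numbers means having either exactly five or all six. These two events are mutually exclusive, so their probabilities add up.
- P(at least 5 winning numbers) = P(5 winning numbers) + P(6 winning numbers) = (258 + 1) / 13,983,816 ≈ 0.00001852 = 0.001852%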
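The difference between the "exactly" and "at least" questions can be sketched in code. Assuming Python 3.8+ for `math.comb`, the hypothetical helpers `p_exactly` and `p_at_least` below return probabilities rather than printing them. "At least k" is computed as the sum of the "exactly" probabilities for k through 6, since those cases cannot overlap.

```python
from math import comb

def p_exactly(k):
    """Probability that a 6/49 ticket matches exactly k of the six drawn numbers."""
    return comb(6, k) * comb(43, 6 - k) / comb(49, 6)

def p_at_least(k):
    """Probability of matching k or more numbers: sum over the disjoint cases k..6."""
    return sum(p_exactly(j) for j in range(k, 7))

for k in (5, 4, 3, 2):
    print(f"at least {k}: {p_at_least(k):.6%}")
```

Summing from 0 through 6 covers every possible outcome, so `p_at_least(0)` should come out as 1 (up to floating-point rounding).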

In [151]:
probability_less_6(4)

The 4-winner number probability is 0.096862%


In [152]:
probability_less_6(3)

The 3-winner number probability is 1.765040%
