In [16]:
import pandas as pd

### Mobile App for Gambling Addiction
The following scenario is fictional:
A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers who will build the app, but they need us to create the logical core of the app and calculate the probabilities.

The app should be able to tell the user:
- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

We are going to use the dataset https://www.kaggle.com/datascienceai/lottery-dataset. It contains 3,665 drawings, dating from 1982 to 2018, from the national 6/49 lottery game in Canada.

**How does the 6/49 lottery work?**
- 6 numbers are drawn
- from a set of 49 numbers (1 to 49)
- the drawing is done *without replacement*
- the order of the numbers does not matter, so we count combinations and NOT permutations
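
Because the order does not matter and no number can be drawn twice, the number of possible outcomes is a binomial coefficient. Written out with the standard formula, it gives the figure we will meet again throughout this notebook:

```latex
\[
\binom{49}{6}
= \frac{49!}{6!\,(49-6)!}
= \frac{49 \cdot 48 \cdot 47 \cdot 46 \cdot 45 \cdot 44}{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}
= 13{,}983{,}816
\]
```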

#### Core functions
Because we are going to use them frequently in this project, let's write one function that computes factorials and another that computes the number of combinations.

In [17]:
#factorial
def factorial(n):
    total=1
    for i in range(1,n+1):
        total = total * i
    return total

def combinations(n,k):
    numerator= factorial(n)
    denominator=factorial(k)*factorial(n-k)
    return numerator/denominator

In [18]:
#test  functions
print(factorial(3))
print(combinations(49,6))

6
13983816.0
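
Note that `combinations` returns a float (13983816.0) because it uses `/`. As a cross-check, here is a minimal sketch of an integer variant compared against `math.comb` from the standard library (Python 3.8+). The name `combinations_int` is hypothetical and not used elsewhere in this notebook.

```python
import math

def factorial(n):
    """Product of the integers from 1 to n (returns 1 for n = 0)."""
    total = 1
    for i in range(1, n + 1):
        total *= i
    return total

def combinations_int(n, k):
    """Hypothetical integer variant: // keeps the result an int."""
    return factorial(n) // (factorial(k) * factorial(n - k))

print(combinations_int(49, 6))                      # 13983816
print(combinations_int(49, 6) == math.comb(49, 6))  # True
```

The division is always exact here, so `//` loses nothing and avoids the later `int(...)` conversions.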


### One ticket probability
We discussed with the engineering team of the medical institute, and they told us we need to be aware of the following details when we write the function:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
- The engineering team wants the function to print the probability value in a friendly way, so that people without any probability training are able to understand it.

The probability of winning with 6 numbers is 1 in 13,983,816, according to https://en.wikipedia.org/wiki/Lotto_6/49


In [19]:
#function that computes the probability of winning the big prize
def one_ticket_probability(array):
    length = len(array)
    #a 6/49 ticket must contain exactly 6 numbers
    if length != 6:
        return 'I do not know what game you play but surely not 6/49 lottery! You need 6 numbers'
    #no number may exceed 49
    for i in array:
        if i > 49:
            return 'One of the numbers is too high'
    total_combinations=int(combinations(49,6))
    #a single ticket covers exactly one of all the possible combinations
    p_win= 1 / total_combinations *100
    return 'Your chance to win with {0} numbers is {1:1.7f}% which corresponds to 1 chance out of {2:,} with the numbers {3} '.format(length,p_win,total_combinations,array)

In [20]:
#test function
one_ticket_probability([49,3,4,5,6,7])

'Your chance to win with 6 numbers is 0.0000072% which corresponds to 1 chance out of 13,983,816 with the numbers [49, 3, 4, 5, 6, 7] '

In [21]:
#test when not enough numbers
one_ticket_probability([1,1,1,1,1])

'I do not know what game you play but surely not 6/49 lottery! You need 6 numbers'

In [22]:
#test with a number above 49
one_ticket_probability([1,1,1,1,1,56])

'One of the numbers is too high'

- We wrote a function that tells the player their chance of winning the big prize.
- We made sure that exactly 6 numbers are provided and that none of them is above 49.
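
The checks above cover the ticket length and the upper bound. The engineering team promised six *different* numbers from 1 to 49, so if we ever wanted to verify that as well, a stricter check could look like the sketch below. The helper `validate_ticket` and its messages are hypothetical, not part of the functions above.

```python
def validate_ticket(numbers, pool_size=49, ticket_size=6):
    """Return an error message, or None when the ticket looks valid."""
    if len(numbers) != ticket_size:
        return 'A ticket needs exactly {} numbers'.format(ticket_size)
    # a set drops duplicates, so a shorter set means repeated numbers
    if len(set(numbers)) != ticket_size:
        return 'The numbers on a ticket must all be different'
    if any(n < 1 or n > pool_size for n in numbers):
        return 'Every number must be between 1 and {}'.format(pool_size)
    return None

print(validate_ticket([49, 3, 4, 5, 6, 7]))   # None
print(validate_ticket([1, 1, 1, 1, 1, 56]))   # The numbers on a ticket must all be different
```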


### Historical Data Check for Canada Lottery

In the next cells, we are going to open the dataset and inspect the format of the data (numbers, dates, ...).

In [23]:
drawns_loto = pd.read_csv('649.csv',parse_dates=True)

In [24]:
#fix the date #optional for now
#drawns_loto['DRAW DATE']=drawns_loto['DRAW DATE'].astype('datetime64[ns]')

In [25]:
drawns_loto.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
PRODUCT            3665 non-null int64
DRAW NUMBER        3665 non-null int64
SEQUENCE NUMBER    3665 non-null int64
DRAW DATE          3665 non-null object
NUMBER DRAWN 1     3665 non-null int64
NUMBER DRAWN 2     3665 non-null int64
NUMBER DRAWN 3     3665 non-null int64
NUMBER DRAWN 4     3665 non-null int64
NUMBER DRAWN 5     3665 non-null int64
NUMBER DRAWN 6     3665 non-null int64
BONUS NUMBER       3665 non-null int64
dtypes: int64(10), object(1)
memory usage: 315.0+ KB


In [26]:
drawns_loto.head()

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34
3,649,4,0,7/3/1982,3,9,10,13,20,43,34
4,649,5,0,7/10/1982,5,14,21,31,34,47,45
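
The `DRAW DATE` column is still stored as `object` (plain strings). If we need real dates later, one option is `pd.to_datetime` with an explicit format. The sketch below uses two rows copied from the preview above instead of the full CSV, and assumes the dates are written month/day/year.

```python
import pandas as pd

# Two rows copied from the dataset preview, standing in for the full CSV.
sample = pd.DataFrame({
    'DRAW NUMBER': [1, 2],
    'DRAW DATE': ['6/12/1982', '6/19/1982'],
})

# Assumption: the dates are month/day/year (19 cannot be a month).
sample['DRAW DATE'] = pd.to_datetime(sample['DRAW DATE'], format='%m/%d/%Y')
print(sample.dtypes)
print(sample['DRAW DATE'].dt.year.tolist())  # [1982, 1982]
```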


### Function for Historical Data Check
We are going to create a function that lets the player check whether their ticket would have won in the past.
Once again, the engineering team told us that:
- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list and serve as an input to our function.
- They want a function that prints:
    - the number of times the selected combination occurred in the Canada data set
    - the probability of winning the big prize in the next drawing with that combination.

In [27]:
drawns_loto[['NUMBER DRAWN 1', 'NUMBER DRAWN 2', 'NUMBER DRAWN 3', 'NUMBER DRAWN 4',
       'NUMBER DRAWN 5', 'NUMBER DRAWN 6']].values

array([[ 3, 11, 12, 14, 41, 43],
       [ 8, 33, 36, 37, 39, 41],
       [ 1,  6, 23, 24, 27, 39],
       ...,
       [ 6, 22, 24, 31, 32, 34],
       [ 2, 15, 21, 31, 38, 49],
       [14, 24, 31, 35, 37, 48]])

In [28]:
def extract_numbers(row):
    #with apply(..., axis=1) each row of the dataframe arrives as a Series,
    #so we can extract the six winning-number columns by position:
    row = row[4:10] 
    #extract the values of the selected columns
    values = row.values
    #convert the values to a set
    values = set(values)
    return values

winning_numbers = drawns_loto.apply(extract_numbers, axis=1)
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [29]:
def check_historical_occurence(input_user,database=winning_numbers):
    length = len(input_user)
    if length != 6:
        return 'I do not know what game you play but surely not 6/49 lottery! You need 6 numbers'
    for i in input_user:
        if i > 49:
            return 'One of the numbers is too high'
    input_user = set(input_user)
    #compare the ticket with every past draw of the database passed as argument
    times_drawn = input_user == database
    return 'The combination {0} has been a winner {1} time(s) in the past. You have 1 chance out of 13,983,816 to win with this combination'.format(input_user,times_drawn.sum()) 

In [30]:
check_historical_occurence([33,37,36, 39, 8, 41],winning_numbers)

'The combination {33, 36, 37, 39, 8, 41} has been a winner 1 time(s) in the past. You have 1 chance out of 13,983,816 to win with this combination'

The function compares the user's combination with every past winning combination, which gives one boolean per drawing. Since True counts as 1 and False as 0 in Python, a simple sum gives the number of times this specific combination was drawn in the past.
- In addition, the same checks as before are still in place:
    - 6 numbers were entered
    - no number is higher than 49
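
To make the "sum of booleans" idea concrete, here is a small pure-Python sketch without pandas. The three draws are copied from the dataset preview, and the variable names are only for illustration.

```python
# Three past draws copied from the dataset preview, stored as sets.
past_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]

# The order of the user's numbers is irrelevant once they are in a set.
ticket = set([33, 37, 36, 39, 8, 41])

# Each comparison yields True or False; sum() counts the True values.
matches = [draw == ticket for draw in past_draws]
print(matches)       # [False, True, False]
print(sum(matches))  # 1
```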

### Multi-ticket Probability 

In [64]:
def multi_ticket_probability(n_tickets):
    all_combinations=combinations(49,6)
    p_winning = n_tickets / all_combinations *100
    return 'With {0} tickets, you have {1:1.7f}% winning chance which corresponds to 1 chance out of {2:,}'.format(
        n_tickets,
        p_winning,
        int(all_combinations/n_tickets),
    )

In [66]:
#test
for i in [1, 10, 100, 10000, 1000000, 6991908, 13983816]:
    print(multi_ticket_probability(i))

With 1 tickets, you have 0.0000072% winning chance which corresponds to 1 chance out of 13,983,816
With 10 tickets, you have 0.0000715% winning chance which corresponds to 1 chance out of 1,398,381
With 100 tickets, you have 0.0007151% winning chance which corresponds to 1 chance out of 139,838
With 10000 tickets, you have 0.0715112% winning chance which corresponds to 1 chance out of 1,398
With 1000000 tickets, you have 7.1511238% winning chance which corresponds to 1 chance out of 13
With 6991908 tickets, you have 50.0000000% winning chance which corresponds to 1 chance out of 2
With 13983816 tickets, you have 100.0000000% winning chance which corresponds to 1 chance out of 1
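
One detail in these results: `int(...)` truncates, so 1,000,000 tickets are reported as "1 chance out of 13" although the ratio is closer to 13.98. If we wanted exact odds, a possible sketch with `fractions.Fraction` is shown below. The function `multi_ticket_odds` is hypothetical and assumes all tickets are different.

```python
from fractions import Fraction
from math import comb

def multi_ticket_odds(n_tickets, pool=49, picks=6):
    """Exact winning probability for n_tickets distinct tickets."""
    return Fraction(n_tickets, comb(pool, picks))

p = multi_ticket_odds(1000000)
print(p)                                                     # 125000/1747977
print('about 1 chance out of {:.2f}'.format(float(1 / p)))   # about 1 chance out of 13.98
```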


### Fewer Winning Numbers - Function
After discussing with the engineering team, we need to write a function that tells the user their chances of having fewer than 6 numbers (2, 3, 4 or 5) matching the winning set.
- In 6/49 there are indeed smaller prizes for partial matches :)

The engineering team would like:
- Inside the app, the user inputs:
    - six different numbers from 1 to 49; and
    - an integer between 2 and 5 that represents the number of winning numbers expected
- Our function prints information about the probability of having the inputted number of winning numbers.


To understand the question: we want the probability of matching exactly the number of winning numbers the user gives.
- This means that the numbers the user played **partially** match the winning ticket, which is why we also need to count the combinations available for the non-matching numbers.

For a partial win, it is important to split the problem in two. Take a ticket with 4 winning numbers, for instance (**a, b, c, d**, e, f):
- a, b, c, d form a combination taken from the 6 numbers of the winning combination: combinations(6, 4)
- e, f are the numbers that **DO NOT MATCH** the winning combination, so they form a combination taken from the 43 other numbers (the draw is without replacement, so 49 - 6 numbers remain once the winning numbers are set aside): combinations(43, 2)
- the number of successful outcomes is the product of these two counts
- the probability is then the number of successful outcomes divided by the total number of outcomes, combinations(49, 6)
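
As a sanity check on this reasoning, the probabilities of matching exactly 0, 1, ..., 6 numbers should add up to 1. The sketch below verifies it with exact fractions. It relies on `math.comb` (Python 3.8+) rather than our own helper, and `p_exact_matches` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

def p_exact_matches(k, pool=49, picks=6):
    """Probability that exactly k of the ticket's numbers are drawn."""
    matching = comb(picks, k)                      # k numbers among the 6 winning ones
    non_matching = comb(pool - picks, picks - k)   # the rest among the 43 other numbers
    return Fraction(matching * non_matching, comb(pool, picks))

print(float(p_exact_matches(4)))
print(sum(p_exact_matches(k) for k in range(7)) == 1)  # True
```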

In [126]:
def probability_less_6(n_numbers):
    #ways to choose the matching numbers among the 6 winning numbers
    n_combinations_ticket = combinations(6, n_numbers)
    #ways to choose the non-matching numbers among the 43 other numbers
    n_combinations_remaining = combinations(43, 6 - n_numbers)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    total_outcomes= combinations(49,6)
    p_winning = successful_outcomes / total_outcomes
    print('''You have {0:.7f}% chances of winning with {1} numbers,
which means around 1 chance out of {2:,}'''.format(p_winning*100,
                                             n_numbers,
                                             round(total_outcomes / successful_outcomes)))

In [129]:
for n in range(1,7):
    probability_less_6(n)

You have 41.3019450% chances of winning with 1 numbers,
which means around 1 chance out of 2
You have 13.2378029% chances of winning with 2 numbers,
which means around 1 chance out of 8
You have 1.7650404% chances of winning with 3 numbers,
which means around 1 chance out of 57
You have 0.0968620% chances of winning with 4 numbers,
which means around 1 chance out of 1,032
You have 0.0018450% chances of winning with 5 numbers,
which means around 1 chance out of 54,201
You have 0.0000072% chances of winning with 6 numbers,
which means around 1 chance out of 13,983,816
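
The values above are for *exactly* n matching numbers, while the original question asks about *at least* five (or four, or three, or two). A possible sketch for that variant simply sums the exact cases from k up to 6. The function `p_at_least` is hypothetical and uses `math.comb` (Python 3.8+).

```python
from math import comb

def p_at_least(k, pool=49, picks=6):
    """Probability of k or more matching numbers on a single ticket."""
    favourable = sum(comb(picks, j) * comb(pool - picks, picks - j)
                     for j in range(k, picks + 1))
    return favourable / comb(pool, picks)

for k in (5, 4, 3, 2):
    print('at least {}: {:.7f}%'.format(k, p_at_least(k) * 100))
```

For k = 5, for example, the favourable outcomes are the 6 * 43 = 258 tickets with exactly five matches plus the single jackpot ticket.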


In [88]:
#from another exercise, for practice:
def probability_less_3(n_winning_numbers):
    
    #Replace 6 with 3 (the numbers that are drawn)
    n_combinations_ticket = combinations(3, n_winning_numbers)
    #Replace 43 with 6-3 (the remaining incorrect numbers)
    n_combinations_remaining = combinations(3, 3 - n_winning_numbers)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    #Replace 49 with 6 (the pool) and 6 with 3 (the drawn numbers)
    n_combinations_total = combinations(6, 3)    
    probability = successful_outcomes / n_combinations_total
    
    probability_percentage = probability * 100    
    combinations_simplified = round(n_combinations_total/successful_outcomes)
    print('''Your chances of having {} winning numbers with this ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(n_winning_numbers, probability_percentage, int(combinations_simplified)))

In [94]:
probability_less_3(1)

Your chances of having 1 winning numbers with this ticket are 45.000000%.
In other words, you have a 1 in 2 chances to win.
