## Permutations, Combinations, Conditional Probability and Partitioning Complex Events



## What's the difference between a permutation and a combination?

A: In a permutation, order matters. (Really, your padlock should be called a permutation lock, not a combination lock, because the order in which you enter your 3-number code matters!) For combinations, you only care about which items are members of the set. For example, if you were asking how many 3-topping pizzas you could make from a selection of 8 toppings, that would be a problem involving combinations.

## How many permutations are there for a standard padlock?
> Hint: there are 40 numbers on a padlock, and a code uses 3 of them (a number may be repeated).

In [5]:
40**3

64000

For the first number there are 40 choices; for the second number, still 40 choices ($40\cdot40=40^2$). (Visualization: a 40x40 square.) Then for the third number there are again 40 choices: $40\cdot40\cdot40=40^3$.
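As a quick sanity check, the same count can be reached by brute force. This is only a sketch: it assumes the dial is numbered 0 through 39 and that a code may reuse a number, which is what $40^3$ implies.

```python
import itertools

# Assumption: the dial shows the numbers 0-39 and a code may reuse a number.
dial = range(40)

# Every ordered triple of dial numbers is one possible code,
# so (1, 2, 3) and (3, 2, 1) are counted as different codes.
codes = list(itertools.product(dial, repeat=3))

print(len(codes))           # number of ordered triples
print(len(codes) == 40**3)  # matches the multiplication argument
```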

## How many unique 3 topping pizzas can you make from the following ingredients:
    * Mushrooms
    * Pepperoni
    * Onion
    * Peppers
    * Ham
    * Pineapple
    * Sausage
    * Olives
    
> Fun question: which is the worst?

In [6]:
import itertools

In [7]:
toppings = ["Mushrooms","Pepperoni","Onion","Peppers","Ham","Pineapple","Sausage","Olives"]
three_topping_pizzas = list(itertools.combinations(toppings, 3))
len(three_topping_pizzas)

56

## Explaining the Intuition Behind Combinations


How would you compute the number of 3-topping combinations from the ingredients above by hand? Explain the rationale behind your computations.

## Teacher Notes


Potential Intuitive Explanation:

> This explanation tries to build students' understanding of combinations. Try to move the students' discussion along this path. Some students may explain the problem above by simply referencing the formula $C(n,r) = \frac{n!}{r!(n-r)!}$. If this happens, praise the student's understanding and then ask if anyone can explain the reasoning behind the formula.


8 choices for first topping * 7 remaining choices for second topping * 6 choices for 3rd topping / repeats

For Example:

Mushrooms, Pepperoni, Onion is the same as  Pepperoni, Mushrooms, Onion and Onion, Mushrooms, Pepperoni.

From this permutation-esque conceptualization, we need to divide by the number of different arrangements that can be made from any three toppings, as these arrangements are the repeats under our current formulation.

Q: How many repeats will there be for each of our 3 topping pizzas following the current procedure of 8 choices for first topping * 7 remaining choices for second topping * 6 choices for 3rd topping?

> Hint: you can use another factorial to calculate the repeats.   

A: $3!=3\cdot2\cdot1=6$
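The six repeats can be listed explicitly. The short sketch below takes one pizza from the example above and enumerates every order in which its three toppings could have been picked.

```python
import itertools

# One pizza, thought of as an unordered set of three toppings.
pizza = ["Mushrooms", "Pepperoni", "Onion"]

# Every order in which those same three toppings could have been chosen.
orderings = list(itertools.permutations(pizza))
for ordering in orderings:
    print(ordering)

print(len(orderings))  # 3! = 6 orderings, all describing the same pizza
```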

Now, building from this intuition, we can derive the general formula for combinations:

8 choices for first topping * 7 remaining choices for second topping * 6 choices for 3rd topping / repeats

Abstracting to a formula, we have:

$\frac{8!}{5!\cdot number\_of\_repeats}$


Note how we have managed to rewrite $8\cdot7\cdot6$ as $\frac{8!}{5!}$ since $\frac{8!}{5!} = \frac{8\cdot7\cdot6\cdot5\cdot4\cdot3\cdot2\cdot1}{5\cdot4\cdot3\cdot2\cdot1}$  and many terms *'cancel'*, namely 1 through 5.

> *'cancel'* is common terminology and so is used here, but it is potentially misleading and vague. If you wish to be more precise, albeit verbose, you can say that matching factors in the numerator and denominator *'cancel'* because their quotient is the multiplicative identity. For example, $\frac{5}{5}=1$, and since $1$ is the identity element for multiplication ($1\cdot x=x$), these factors can be *'canceled out'* when reducing the fraction to simplest form.


Even more abstractly, in general:
n items chosen r at a time

$\frac{n!}{(n-r)!\cdot number\_of\_repeats}$

The number of repeats is simply $r!$, so altogether we have:

$\frac{n!}{r!(n-r)!}$
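To check the formula against the pizza problem, the sketch below compares three routes to the same count: the factorial formula, direct enumeration, and the standard library's `math.comb` (available in Python 3.8 and later). The helper name `n_choose_r` is only illustrative.

```python
import itertools
import math

def n_choose_r(n, r):
    """Number of ways to choose r items from n when order does not matter."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

toppings = ["Mushrooms", "Pepperoni", "Onion", "Peppers",
            "Ham", "Pineapple", "Sausage", "Olives"]

by_formula = n_choose_r(len(toppings), 3)                        # 8! / (3! * 5!)
by_enumeration = len(list(itertools.combinations(toppings, 3)))  # list every pizza
by_library = math.comb(len(toppings), 3)                         # built-in equivalent

print(by_formula, by_enumeration, by_library)
```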

## Conditional Probability

### When do we compute conditional probabilities? 

A: We need to compute conditional probabilities when the outcome of an event depends on the outcome of previous events (dependent events). A conditional probability of an event is the probability of the event given another event has occurred.


### Mushroom dataset

To discuss conditional probability, let's look at a modified version of the Mushroom dataset from UCI [here](https://archive.ics.uci.edu/ml/datasets/Mushroom). Each row in this dataset corresponds to one observation (one mushroom). 

The modified dataset includes 7 variables:

* **edible-poisonous**
    * This categorical variable can have one of two values: if the mushroom is edible, "edible". If not, "poisonous"

* **bruised**
    * This is a Boolean variable that can assume either one of two values, True or False.

* **gill-spacing**
    * This categorical variable can have one of three values: "close", "crowded", or "distant"
    
* **stalk-shape**
    * This categorical variable can have one of two values: "enlarging" or "tapering"
* **stalk-color-above-ring**
    * This categorical variable can have one of nine values:  "brown","buff","cinnamon","gray","orange", "pink","red","white" or "yellow"

* **stalk-color-below-ring**
    * This categorical variable can have one of nine values:  "brown","buff","cinnamon","gray","orange", "pink","red","white" or "yellow"

* **gill-color**
    * This categorical variable can have one of twelve values: "black","brown","buff","chocolate","gray", "green","orange","pink","purple","red", "white" or "yellow" 



In [8]:
import pandas as pd

df = pd.read_csv('Mushrooms_cleaned.csv')
df.head()

Unnamed: 0,edible-poisonous,gill-spacing,stalk-shape,stalk-color-above-ring,stalk-color-below-ring,gill-color,bruised
0,poisonous,close,enlarging,white,white,black,True
1,edible,close,enlarging,white,white,black,True
2,edible,close,enlarging,white,white,brown,True
3,poisonous,close,enlarging,white,white,brown,True
4,edible,crowded,tapering,white,white,black,False


#### If you picked a row from this dataset at random, what is the probability it corresponds to a bruised mushroom? 

In [9]:
probability_bruised = df[df['bruised'] == True].shape[0]/df.shape[0]
probability_bruised

0.4155588380108321

#### What is the probability you pick a row corresponding to a mushroom that is bruised _AND_ edible? 

In [10]:
probability_bruised_and_edible = df[(df['bruised'] == True) & 
                                    (df['edible-poisonous'] == 'edible')].shape[0]/df.shape[0]

In [11]:
probability_bruised_and_edible

0.33874938453963566

#### What is the probability of picking an edible mushroom given it is bruised, $P(edible|bruised)$? 

In [12]:
prob_edible_given_bruised = probability_bruised_and_edible/probability_bruised
prob_edible_given_bruised

0.8151658767772512

In [13]:
bruised = df[df['bruised'] == True]
bruised['edible-poisonous'].value_counts(dropna=False, normalize=True)

edible       0.815166
poisonous    0.184834
Name: edible-poisonous, dtype: float64

#### What is the probability of picking a bruised mushroom given it is edible, $P(\text{bruised | edible})$? 

* For this, it is important that students recognize that, even though computing the probability that a mushroom is edible and bruised is the same as the probability that a mushroom is bruised and edible, the conditional probability is **not the same** because the condition that needs to be met to compute the probability is different (i.e. the sample space is different)

In [14]:
probability_edible = df[df['edible-poisonous'] == 'edible'].shape[0]/df.shape[0]
probability_edible

0.517971442639094

In [15]:
probability_bruised_given_edible = probability_bruised_and_edible/probability_edible

In [16]:
edible = df[df['edible-poisonous'] == 'edible']
edible['bruised'].value_counts(dropna=False, normalize=True)

True     0.653992
False    0.346008
Name: bruised, dtype: float64

### Intuition behind conditional probability: 

How do you compute the probability that mushrooms are edible given they are bruised? 

When you ask the question "what is the probability that the mushrooms are edible and bruised?", the sample space originally contains all 8124 rows of mushrooms. 

<img src="images/Image_72_Cond4.png" width="300">

However, to compute the probability that the mushrooms are edible given they are bruised, you need to consider the reduced size of the sample space. 

In the image above, S is the universe of all mushrooms in the dataset, A is the set of mushrooms that are edible, and B is the set of mushrooms that are bruised.

* When you ask the question "what is the probability that the mushrooms are edible given the mushrooms are bruised?", you have effectively reduced the size of the sample space to include only those mushrooms that are bruised. 

* Given that mushrooms are bruised, the only way for the mushrooms to be edible is for these mushrooms to fall in the intersection of the set of mushrooms that are edible _and_ the set of mushrooms that are bruised, $edible \cap bruised$, which has probability $P(edible \cap bruised)$.

* To account for the smaller sample space, you divide the probability mushrooms are edible and bruised by the probability the mushrooms are bruised: $$\large P(edible|bruised) = \frac{P(edible \cap bruised)}{P(bruised)}$$
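The reduced-sample-space idea can be sketched with plain Python sets. The ten mushroom ids below are made up for illustration and are not taken from the dataset; `S`, `A`, and `B` mirror the labels in the diagram, and `prob` is a hypothetical helper.

```python
from fractions import Fraction

# Hypothetical toy data: ten mushroom ids, NOT the real dataset.
S = set(range(10))      # universe of all mushrooms
A = {0, 1, 2, 3, 4, 5}  # edible mushrooms
B = {3, 4, 5, 6}        # bruised mushrooms

def prob(event, sample_space):
    """Probability of an event when every member of the sample space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

# Route 1: the formula P(A|B) = P(A and B) / P(B), both measured against S.
p_formula = prob(A & B, S) / prob(B, S)

# Route 2: shrink the sample space to B and ask how much of it is edible.
p_reduced = prob(A, B)

# Swapping the condition changes the sample space, and so the answer.
p_bruised_given_edible = prob(B, A)

print(p_formula, p_reduced, p_bruised_given_edible)
```

Both routes agree because dividing by $P(bruised)$ is exactly the rescaling from the full universe to the bruised mushrooms only.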




## Partitioning Complex Events

You're not really a mushroom expert, but you can see a bunch of orange spots all over the mushroom in your hand. Given the data at your disposal, what's the probability that the mushroom is edible?



$$\large P(edible|orange) = \frac{P(edible \cap orange)}{P(orange)}$$

Furthermore, we can decompose $P(orange)$ into all of the possibilities:

$P(orange) = P(\text{orange_gill}\cup\text{orange_stalk_below_ring}\cup\text{orange_stalk_above_ring})$

But be careful here! 

$P(\text{orange_gill}\cup\text{orange_stalk_below_ring}\cup \text{orange_stalk_above_ring}) \neq P(\text{orange_gill}) + P(\text{orange_stalk_below_ring}) + P(\text{orange_stalk_above_ring})$

While the sum may look correct, adding the individual probabilities double counts mushrooms that are orange in more than one place, for example a mushroom with orange gills and an orange stalk, or one whose stalk is orange both above and below the ring.
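A small made-up example shows the double counting. The sets below are hypothetical and are not drawn from the dataset; they are chosen only so that some mushrooms are orange in more than one place. The last calculation applies inclusion-exclusion, the standard correction for overlapping events.

```python
from fractions import Fraction

# Hypothetical toy data: 20 mushroom ids, NOT the real dataset.
S = set(range(20))
orange_gill = {0, 1, 2}
orange_stalk_above = {2, 3, 4}
orange_stalk_below = {3, 4, 5}

def prob(event):
    """Probability of an event when every mushroom in S is equally likely."""
    return Fraction(len(event), len(S))

# Correct: probability of the union, each mushroom counted once.
p_union = prob(orange_gill | orange_stalk_above | orange_stalk_below)

# Naive: adding the three probabilities counts overlapping mushrooms more than once.
naive_sum = prob(orange_gill) + prob(orange_stalk_above) + prob(orange_stalk_below)

# Inclusion-exclusion removes the overlaps again.
inclusion_exclusion = (
    naive_sum
    - prob(orange_gill & orange_stalk_above)
    - prob(orange_gill & orange_stalk_below)
    - prob(orange_stalk_above & orange_stalk_below)
    + prob(orange_gill & orange_stalk_above & orange_stalk_below)
)

print(p_union, naive_sum, inclusion_exclusion)
```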

In [17]:
p_orange = df[(df['gill-color'] == 'orange') 
               | (df['stalk-color-above-ring'] == 'orange')
              | (df['stalk-color-below-ring'] == 'orange')
              ].shape[0]/df.shape[0]
p_edible_and_orange = df[((df['gill-color'] == 'orange') 
                           | (df['stalk-color-above-ring'] == 'orange')
                          | (df['stalk-color-below-ring'] == 'orange')
                         )
                         & (df['edible-poisonous']=='edible')
                          ].shape[0]/df.shape[0]
p_edible_given_orange = p_edible_and_orange / p_orange
p_edible_given_orange

#Apparently orange mushrooms seem fairly safe....(Disclaimer: don't take this as definitive foraging advice!)

1.0

## Summary


In this lesson, you reviewed four foundational concepts in probability: permutations, combinations, conditional probability and partitioning complex events. Remember that your standard padlock should more accurately be called a permutation lock! Order matters for permutations, whereas only the members of the set are important for combinations. Conditional probability is the probability of an event occurring given other information. In these instances, the sample space shrinks to reflect the given information. In the mushroom example, the probability of a mushroom being edible given that it is bruised can be computed by dividing the probability that a mushroom is both edible AND bruised by the probability that it is bruised. Mathematically: $$\large P(edible|bruised) = \frac{P(edible \cap bruised)}{P(bruised)}$$ Finally, you investigated partitioning complex events. Often, complex events can be broken into constituent parts, and the total probability can be calculated by combining these smaller events, taking care not to double count any overlap between them.

## Additional Resources

## Challenge Problem

Let's take some time and review questions like those from the [dsc-law-of-total-probability-lab](https://github.com/learn-co-curriculum/dsc-law-of-total-probability-lab).  

According to the CDC, [14% of Americans currently smoke, 15.8% of males and 12.2% of females](https://www.cdc.gov/tobacco/data_statistics/fact_sheets/adult_data/cig_smoking/index.htm). According to the American Lung Association, [men who smoke are 23 times more likely to develop lung cancer than never-smokers, and women who smoke are 13 times as likely](https://www.lung.org/lung-health-and-diseases/lung-disease-lookup/lung-cancer/resource-library/lung-cancer-fact-sheet.html). The American Cancer Society estimates that [the lifetime risk of developing lung cancer is 6.85% for males and 5.95% for females](https://www.cancer.org/cancer/cancer-basics/lifetime-probability-of-developing-or-dying-from-cancer.html). Currently, the census estimates that [women are 50.8% of the population](https://www.census.gov/quickfacts/fact/table/US/PST045218).

What is the risk of lung cancer for non-smokers? Non-smoker males? Non-smoker females?

> To learn more about lung-cancer risks for non-smokers, see https://www.cancer.org/latest-news/why-lung-cancer-strikes-nonsmokers.html.

A:

$P(\text{Lung_Cancer | Male}) = P(\text{Lung_Cancer | NonSmoking_Male}) \cdot P(\text{NonSmoking_Male}) + P(\text{Lung_Cancer | Smoking_Male}) \cdot P(\text{Smoking_Male})$

We also know that male smokers are 23 times more likely to develop lung cancer than their non-smoking counterparts:

$P(\text{Lung_Cancer | Smoking_Male})= 23 \cdot P(\text{Lung_Cancer | NonSmoking_Male})$  

Substituting we have:  

$.0685 = P(\text{Lung_Cancer | NonSmoking_Male}) \cdot (1-.158) + 23 \cdot P(\text{Lung_Cancer | NonSmoking_Male}) \cdot (.158)$

$.0685 = P(\text{Lung_Cancer | NonSmoking_Male}) \cdot (.842) + 23 \cdot P(\text{Lung_Cancer | NonSmoking_Male}) \cdot (.158)$

$.0685 = 4.476 \cdot P(\text{Lung_Cancer | NonSmoking_Male})$

$0.015303842716711352 = P(\text{Lung_Cancer | NonSmoking_Male})$

Multiplying this risk by 23 gives the risk for smoking males:

$0.3519883824843611 = P(\text{Lung_Cancer | Smoking_Male})$


Following a similar procedure for females results in a ~2.4% chance that a non-smoking female will develop lung cancer.



In [6]:
#Nonsmoking Males
(.0685 / (23*.158 + .842))

0.015303842716711352

In [9]:
#Nonsmoking Females
(.0595 / (13*.122 + .878))

0.024147727272727272
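The first part of the question, the risk for non-smokers overall, is not worked out above. One possible approach, sketched here on the assumption that the quoted figures can be treated as exact probabilities, is to weight each sex-specific risk by that sex's share of all non-smokers. The variable names are illustrative.

```python
# Figures quoted in the problem statement.
p_female = 0.508
p_male = 1 - p_female
p_smoke_given_male = 0.158
p_smoke_given_female = 0.122

# Non-smoker risks, computed as in the cells above.
p_lc_ns_male = 0.0685 / (23 * p_smoke_given_male + (1 - p_smoke_given_male))
p_lc_ns_female = 0.0595 / (13 * p_smoke_given_female + (1 - p_smoke_given_female))

# Joint probabilities of being a non-smoker of each sex.
p_ns_and_male = (1 - p_smoke_given_male) * p_male
p_ns_and_female = (1 - p_smoke_given_female) * p_female
p_ns = p_ns_and_male + p_ns_and_female  # overall share of non-smokers

# Law of total probability, conditioned on being a non-smoker:
# P(LC | NS) = sum over sex of P(LC | NS, sex) * P(sex | NS)
p_lc_given_ns = (
    p_lc_ns_male * p_ns_and_male + p_lc_ns_female * p_ns_and_female
) / p_ns

print(round(p_ns, 3))           # share of non-smokers implied by the inputs
print(round(p_lc_given_ns, 4))  # roughly 0.02
```

The implied share of non-smokers comes out close to 86%, which is consistent with the CDC's overall 14% smoking rate quoted in the problem.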

## Additional Conditional Probability Practice

What's the probability that a mushroom is edible if it has close gill spacing and a tapering stalk? (The probability that it is poisonous is one minus this value.)

$$\large P(edible|close \cap tapering) = \frac{P(edible \cap close \cap tapering)}{P(close \cap tapering)}$$

In [17]:
probability_close_tapering = df[(df['gill-spacing'] == 'close') 
                                   & (df['stalk-shape'] == 'tapering')
                                  ].shape[0]/df.shape[0]
probability_close_tapering_and_edible = df[(df['gill-spacing'] == 'close') 
                                           & (df['edible-poisonous'] == 'edible')
                                           & (df['stalk-shape'] == 'tapering')
                                          ].shape[0]/df.shape[0]
probability_edible_given_close_tapering = probability_close_tapering_and_edible/probability_close_tapering
probability_edible_given_close_tapering

0.46153846153846156