# Bayes's Theorem


Taken from the book [Think Bayes](https://www.amazon.com.br/Think-Bayes-Allen-B-Downey/dp/1449370780/ref=asc_df_1449370780/?tag=googleshopp00-20&linkCode=df0&hvadid=379787788238&hvpos=&hvnetw=g&hvrand=12762391381986177857&hvpone=&hvptwo=&hvqmt=&hvdev=c&hvdvcmdl=&hvlocint=&hvlocphy=1001652&hvtargid=pla-525198131441&psc=1) by Allen B. Downey.

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('data/gss_bayes.csv')

In [109]:
df[:3]

Unnamed: 0,caseid,year,age,sex,polviews,partyid,indus10
0,1,1974,21.0,1,4.0,2.0,4970.0
1,2,1974,41.0,1,5.0,0.0,9160.0
2,5,1974,58.0,2,6.0,1.0,2670.0


In [112]:
# df['indus10'].value_counts()

In [108]:
# df

In [113]:
# Select respondents who work in banking (industry code 6870)
banker = df['indus10'] == 6870
df[banker]

Unnamed: 0,caseid,year,age,sex,polviews,partyid,indus10
3,6,1974,30.0,1,5.0,4.0,6870.0
33,44,1974,54.0,2,4.0,1.0,6870.0
45,56,1974,59.0,1,5.0,0.0,6870.0
91,118,1974,28.0,2,4.0,1.0,6870.0
106,135,1974,30.0,2,4.0,2.0,6870.0
...,...,...,...,...,...,...,...
48922,2472,2016,26.0,1,5.0,7.0,6870.0
49077,2641,2016,71.0,2,6.0,4.0,6870.0
49192,2765,2016,24.0,2,4.0,5.0,6870.0
49252,2830,2016,62.0,1,5.0,1.0,6870.0


In [4]:
prob_bankers = df[banker].shape[0]/df.shape[0]*100

print(f"Probability of being a banker: {round(prob_bankers,2)} %")

Probability of being a banker: 1.48 %


In [5]:
df['sex'].value_counts(1)

2    0.537858
1    0.462142
Name: sex, dtype: float64

In [121]:
df['polviews'].value_counts(1)

4.0    0.384317
5.0    0.161087
6.0    0.148489
3.0    0.126659
2.0    0.117833
7.0    0.032360
1.0    0.029255
Name: polviews, dtype: float64

In [116]:
# Percentage of respondents who are liberal, defining liberal as polviews <= 3

liberal = df['polviews'] <= 3
round(df[liberal].shape[0]/df.shape[0],2 )*100

27.0

In [8]:
df['partyid']

0        2.0
1        0.0
2        1.0
3        4.0
4        4.0
        ... 
49285    0.0
49286    7.0
49287    5.0
49288    5.0
49289    3.0
Name: partyid, Length: 49290, dtype: float64

In [9]:
# Fraction of respondents who are Democrats, defining Democrat as partyid <= 1

df[df['partyid'] <= 1].shape[0]/df.shape[0]

0.3662609048488537

### Conjunction

"Conjunction" is another name for the logical `and` operation. If you have two propositions, `A` and `B`, the conjunction `A and B` is `True` if both `A` and `B` are `True`, and `False` otherwise.
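
As a small illustration, here is a sketch on a made-up six-row table (the values are invented for this example and are not taken from the GSS data). With Boolean `Series`, the `&` operator computes the conjunction element by element, and the mean of the result is the fraction of rows where both propositions hold.

```python
import pandas as pd

# Invented toy data, NOT the GSS dataset
toy = pd.DataFrame({
    "indus10": [6870, 6870, 4970, 9160, 6870, 2670],
    "partyid": [2, 0, 5, 1, 0, 3],
})

banker = toy["indus10"] == 6870   # True for rows 0, 1, 4
democrat = toy["partyid"] <= 1    # True for rows 1, 3, 4

# Element-wise logical and: True only where both are True (rows 1 and 4)
both = banker & democrat

# The mean of a Boolean Series is the fraction of True values
p_both = both.mean()
print(both.tolist(), p_both)
```

Swapping the operands (`democrat & banker`) gives the same `Series`, which is the commutativity used later in this section.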


In [10]:
# Probability that a respondent is both a banker and a Democrat
democrat = df['partyid'] <= 1
df[(banker) & (democrat)].shape[0]/df.shape[0]

0.004686548995739501

Conjunction is commutative: `prob(A & B)` is equal to `prob(B & A)`.

In [11]:
df[(democrat) & (banker)].shape[0]/df.shape[0]

0.004686548995739501

In [126]:
banker = df['indus10'] == 6870
df_banker = df[banker]

In [128]:
df_banker.shape[0]

728

In [129]:
df_banker_dem = df_banker[df_banker['partyid'] <= 1]

In [132]:
df_banker_dem.shape[0]/df.shape[0]

0.004686548995739501

In [152]:
%time df[(democrat) & (banker)].shape[0]/df.shape[0]

CPU times: user 676 µs, sys: 17 µs, total: 693 µs
Wall time: 698 µs


0.004686548995739501

### Conditional Probability

Conditional probability is a probability that depends on a condition, but that might not be the most helpful definition. 

Here are some examples:

* What is the probability that a respondent is a Democrat, given that they are liberal?

* What is the probability that a respondent is female, given that they are a banker?

* What is the probability that a respondent is liberal, given that they are female?

Let's start with the first one, which we can interpret like this: "Of all the respondents who are liberal, what fraction are Democrats?"

We can compute this probability in two steps:

1. Select all respondents who are liberal.

2. Compute the fraction of the selected respondents who are Democrats.

To select liberal respondents, we can use the bracket operator, `[]`, like this:



In [163]:
democrat = df['partyid'] <= 1
liberal = df['polviews'] <= 3

In [164]:
df_lib = df[liberal]

In [57]:
df_lib[df_lib['partyid'] <= 1].shape[0]/df_lib.shape[0]

0.5206403320240125

A little more than half of liberals are Democrats. If that result is lower than you expected, keep in mind:

1. We used a somewhat strict definition of “Democrat”, excluding independents who “lean” Democratic.

2. The dataset includes respondents as far back as 1974; in the early part of this interval, there was less alignment between political views and party affiliation, compared to the present.


Let's try the second example, "What is the probability that a respondent is female, given that they are a banker?". 

We can interpret that to mean, "Of all respondents who are bankers, what fraction are female?"

In [63]:
banker = df['indus10'] == 6870
df_banker = df[banker]

In [64]:
df_banker[df_banker['sex'] == 2].shape[0]/df_banker.shape[0]

0.7706043956043956

In [166]:
df_banker[df_banker['sex'] == 1].shape[0]/df_banker.shape[0]

0.22939560439560439

In [178]:
# Wrap the computation in reusable functions:

def prob(A):
    """Computes the probability of a proposition, A."""
    return A.mean()


def conditional(proposition, given):
    """Computes the probability of a proposition, given that `given` is True."""
    return prob(proposition[given])
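
To see what these two helpers do, here is a minimal sketch that applies them to an invented eight-row table (toy values, not the GSS data). `prob` takes the mean of a Boolean `Series`; `conditional` first keeps only the rows where `given` is `True` and then takes the mean.

```python
import pandas as pd

def prob(A):
    """Fraction of True values in a Boolean Series."""
    return A.mean()

def conditional(proposition, given):
    """Probability of `proposition` among the rows where `given` is True."""
    return prob(proposition[given])

# Invented toy data, NOT the GSS dataset
toy = pd.DataFrame({
    "sex":      [1, 2, 2, 1, 2, 2, 1, 2],
    "polviews": [4, 2, 5, 3, 6, 1, 4, 3],
})
female = toy["sex"] == 2          # 5 of the 8 rows
liberal = toy["polviews"] <= 3    # 4 of the 8 rows

p_female = prob(female)                                # 5/8
p_liberal_given_female = conditional(liberal, female)  # 3 of the 5 women
print(p_female, p_liberal_given_female)
```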


In [179]:
female = (df['sex'] == 2)
liberal = (df['polviews'] <= 3)
conditional(liberal, given=female)

0.27581004111500884

In [74]:
conditional(female, given=liberal)

0.5419106203216483

### Conditional Probability Is Not Commutative

We have seen that conjunction is commutative; that is, `prob(A & B)` is always equal to `prob(B & A)`.

But conditional probability is not commutative; that is, `conditional(A, B)` is not, in general, the same as `conditional(B, A)`.
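
The asymmetry is easy to see on a tiny invented table (toy values, not the GSS data). The two conditionals share the same numerator, the count of female bankers, but divide by different totals: the number of bankers in one case and the number of women in the other.

```python
import pandas as pd

def prob(A):
    return A.mean()

def conditional(proposition, given):
    return prob(proposition[given])

# Invented toy data, NOT the GSS dataset
toy = pd.DataFrame({
    "sex":     [1, 2, 2, 1, 2, 2, 1, 2],
    "indus10": [6870, 6870, 4970, 9160, 6870, 2670, 4970, 6870],
})
female = toy["sex"] == 2          # 5 rows
banker = toy["indus10"] == 6870   # 4 rows, 3 of them female

p_female_given_banker = conditional(female, given=banker)  # 3 / 4
p_banker_given_female = conditional(banker, given=female)  # 3 / 5
print(p_female_given_banker, p_banker_given_female)
```

The two values coincide only when `prob(A)` equals `prob(B)` or when the conjunction has probability zero.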


In [75]:
conditional(female, given=banker)

0.7706043956043956

In [76]:
conditional(banker, given=female)

0.02116102749801969

In [180]:
male = (df['sex'] == 1)
conditional(banker, given=male)

0.007331313929496466

In [None]:
# 1% of men are bank tellers
# 2% of women are bank tellers

In [181]:
conditional(female, given=banker)

0.7706043956043956

In [182]:
conditional(male, given=banker)

0.22939560439560439

### Condition and Conjunction


We can combine conditional probability and conjunction. 

For example, here’s the probability a respondent is female, given that they are a liberal Democrat:

In [187]:
liberal = (df['polviews'] <= 3)
democrat = (df['partyid'] <= 1)

conditional(female, given=liberal & democrat)

0.576085409252669

About 58% of liberal Democrats are female.

In [188]:
conditional(liberal & female, given=banker)

0.17307692307692307

In [189]:
conditional(liberal, given=banker & female)

0.22459893048128343

About 17% of bankers are liberal women, while about 22% of female bankers are liberal.
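
The two quantities above answer different questions, and they are linked by a product rule. The sketch below checks that link on an invented eight-row table (toy values, not the GSS data): the fraction of bankers who are liberal women equals the fraction of bankers who are women times the fraction of female bankers who are liberal.

```python
import math
import pandas as pd

def prob(A):
    return A.mean()

def conditional(proposition, given):
    return prob(proposition[given])

# Invented toy data, NOT the GSS dataset
toy = pd.DataFrame({
    "sex":      [1, 2, 2, 1, 2, 2, 1, 2],
    "indus10":  [6870, 6870, 4970, 9160, 6870, 2670, 4970, 6870],
    "polviews": [4, 2, 5, 3, 6, 1, 4, 3],
})
female = toy["sex"] == 2
banker = toy["indus10"] == 6870
liberal = toy["polviews"] <= 3

# Conjunction on the left of "given": liberal women among bankers
lhs = conditional(liberal & female, given=banker)

# Conjunction on the right of "given": liberals among female bankers,
# scaled by the share of bankers who are female
rhs = conditional(female, given=banker) * conditional(liberal, given=banker & female)

print(lhs, rhs)
```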

## Laws of Probability


Let's derive three relationships between conjunction and conditional probability:


* Theorem 1: Using a conjunction to compute a conditional probability.
* Theorem 2: Using a conditional probability to compute a conjunction.
* Theorem 3: Using `conditional(A, B)` to compute `conditional(B, A)`.


Theorem 3 is also known as Bayes’s theorem.

Let's write these theorems using mathematical notation for probability:

* $P(A)$ is the probability of proposition $A$;
* $P(A \ \textrm{and} \ B)$ is the probability of the conjunction of $A$ and $B$, that is, the probability that both are true.
* $P(A|B)$ is the conditional probability of $A$ given that $B$ is true. The vertical line between $A$ and $B$ is pronounced "given".


With that, we are ready for Theorem 1.

### Theorem 1

What fraction of bankers are female? We have already seen one way to compute the answer:

1. Use the bracket operator to select the bankers, then
2. Use `mean` to compute the fraction of bankers who are female.

We can write these steps like this:

In [80]:
female[banker].mean()

0.7706043956043956

Or we can use the `conditional` function, which does the same thing:

In [81]:
conditional(female, given=banker)

0.7706043956043956

But there is another way to compute this conditional probability, by computing the ratio of two probabilities:

1. The fraction of respondents who are female bankers, and
2. The fraction of respondents who are bankers.

In other words: of all the bankers, what fraction are female bankers? Here’s how we compute this ratio:

In [191]:
prob(female & banker)

0.011381618989653074

In [192]:
prob(banker)

0.014769730168391155

In [82]:
prob(female & banker) / prob(banker)

0.7706043956043956

The result is the same. This example demonstrates a general rule that relates conditional probability and conjunction. Here's what it looks like in math notation:

$$
P(A|B) = \frac{P(A \ \textrm{and} \ B)}{P(B)}
$$

And that's Theorem 1.
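
Theorem 1 is a counting identity, so it should hold on any table, not just this dataset. Here is a hedged check on invented toy data (not the GSS data), comparing the direct row selection with the ratio of two probabilities.

```python
import math
import pandas as pd

def prob(A):
    return A.mean()

def conditional(proposition, given):
    return prob(proposition[given])

# Invented toy data, NOT the GSS dataset
toy = pd.DataFrame({
    "sex":     [1, 2, 2, 1, 2, 2, 1, 2],
    "indus10": [6870, 6870, 4970, 9160, 6870, 2670, 4970, 6870],
})
female = toy["sex"] == 2
banker = toy["indus10"] == 6870

direct = conditional(female, given=banker)         # select bankers, then average
ratio = prob(female & banker) / prob(banker)       # (3/8) / (4/8)
print(direct, ratio)
```

Both routes count the same three female bankers out of the same four bankers; the total number of rows cancels in the ratio.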

### Theorem 2

If we start with Theorem 1 and multiply both sides by $P(B)$ we get Theorem 2: 

$$
P(A \ \textrm{and} \ B) = P(B)P(A|B)
$$


This formula suggests a second way to compute a conjunction: instead of using the `&` operator, we can compute the product of two probabilities.

Let's see if it works for `liberal` and `democrat`. Here's the result using `&`:


In [83]:
prob(liberal & democrat)

0.1425238385067965

And here's the result using Theorem 2:

In [84]:
prob(democrat) * conditional(liberal, democrat)

0.1425238385067965
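
The same check can be run on invented toy data (not the GSS data). This sketch also computes the product in the other order, conditioning on `liberal` instead of `democrat`, which gives the same conjunction.

```python
import math
import pandas as pd

def prob(A):
    return A.mean()

def conditional(proposition, given):
    return prob(proposition[given])

# Invented toy data, NOT the GSS dataset
toy = pd.DataFrame({
    "polviews": [4, 2, 5, 3, 6, 1, 4, 3],
    "partyid":  [2, 0, 5, 1, 0, 3, 0, 1],
})
liberal = toy["polviews"] <= 3    # 4 of 8 rows
democrat = toy["partyid"] <= 1    # 5 of 8 rows

joint = prob(liberal & democrat)                                   # 3/8
via_democrat = prob(democrat) * conditional(liberal, democrat)     # 5/8 * 3/5
via_liberal = prob(liberal) * conditional(democrat, liberal)       # 4/8 * 3/4
print(joint, via_democrat, via_liberal)
```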

### Theorem 3

We have established that conjunction is commutative. In math notation, that means:

$$
P(A \ \textrm{and} \ B) = P(B \ \textrm{and} \ A)
$$

If we apply Theorem 2 to both sides, we have:

$$
P(B)P(A|B) = P(A)P(B|A)
$$

Here's one way to interpret that: if you want to check $A$ and $B$, you can do it in either order:

1. You can check $B$ first, then $A$ conditioned on $B$, or
2. You can check $A$ first, then $B$ conditioned on $A$.

If we divide through by $P(B)$, we get Theorem 3:

$$
P(A|B) = \frac{P(A)P(B|A)}{P(B)}
$$


<p style="color:red;">And that, my friends, is Bayes's theorem.</p>

To see how it works, let's compute the fraction of bankers who are liberal, first using `conditional`:

In [85]:
liberal = df['polviews'] <= 3
banker = df['indus10'] == 6870
conditional(liberal, given=banker)

0.2239010989010989

Now using Bayes's theorem:

In [87]:
prob(liberal) * conditional(banker, liberal) / prob(banker)

0.2239010989010989
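
Bayes's theorem needs only three numbers, so it can be wrapped in a small function that never touches the `DataFrame`. The sketch below uses a hypothetical helper named `bayes` (not part of the original notebook) and checks it against a direct computation on invented toy data (not the GSS data).

```python
import math
import pandas as pd

def prob(A):
    return A.mean()

def conditional(proposition, given):
    return prob(proposition[given])

def bayes(prior, likelihood, evidence):
    """Hypothetical helper: P(A|B) from P(A), P(B|A) and P(B)."""
    return prior * likelihood / evidence

# Invented toy data, NOT the GSS dataset
toy = pd.DataFrame({
    "sex":     [1, 2, 2, 1, 2, 2, 1, 2],
    "indus10": [6870, 6870, 4970, 9160, 6870, 2670, 4970, 6870],
})
female = toy["sex"] == 2
banker = toy["indus10"] == 6870

posterior = bayes(
    prior=prob(female),                      # P(A)   = 5/8
    likelihood=conditional(banker, female),  # P(B|A) = 3/5
    evidence=prob(banker),                   # P(B)   = 4/8
)
direct = conditional(female, given=banker)   # P(A|B) by selecting rows
print(posterior, direct)
```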

In [195]:
df.drop(['sex'], axis=1)

Unnamed: 0,caseid,year,age,polviews,partyid,indus10
0,1,1974,21.0,4.0,2.0,4970.0
1,2,1974,41.0,5.0,0.0,9160.0
2,5,1974,58.0,6.0,1.0,2670.0
3,6,1974,30.0,5.0,4.0,6870.0
4,7,1974,48.0,5.0,4.0,7860.0
...,...,...,...,...,...,...
49285,2863,2016,57.0,1.0,0.0,7490.0
49286,2864,2016,77.0,6.0,7.0,3590.0
49287,2865,2016,87.0,4.0,5.0,770.0
49288,2866,2016,55.0,5.0,5.0,8680.0


In [196]:
conditional(female, given=liberal)

0.5419106203216483

## The Law of Total Probability


In addition to these three theorems, there’s one more thing we’ll need to do Bayesian statistics: the law of total probability. Here’s one form of the law, expressed in mathematical notation:

$$
P(A) = P(B_1 \ \textrm{and} \ A) + P(B_2 \ \textrm{and} \ A)
$$

In words, the total probability of $A$ is the sum of two possibilities: either $B_1$ and $A$ are true or $B_2$ and $A$ are true. 

But this law applies only if $B_1$ and $B_2$ are:

* Mutually exclusive, which means that only one of them can be true, and
* Collectively exhaustive, which means that one of them must be true.
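
Both conditions can be checked directly on Boolean `Series`. The sketch below uses invented toy data (not the GSS data). It also shows what goes wrong when the partition is not exhaustive; the definition `conservative = polviews >= 5` is an assumption made only for this illustration.

```python
import math
import pandas as pd

def prob(A):
    return A.mean()

# Invented toy data, NOT the GSS dataset
toy = pd.DataFrame({
    "sex":      [1, 2, 2, 1, 2, 2, 1, 2],
    "indus10":  [6870, 6870, 4970, 9160, 6870, 2670, 4970, 6870],
    "polviews": [4, 2, 5, 3, 6, 1, 4, 3],
})
male = toy["sex"] == 1
female = toy["sex"] == 2
banker = toy["indus10"] == 6870

# Mutually exclusive: no row is True in both Series
exclusive = not (male & female).any()
# Collectively exhaustive: every row is True in at least one Series
exhaustive = bool((male | female).all())

total = prob(male & banker) + prob(female & banker)

# Counter-example: liberals and conservatives leave out moderates (polviews == 4),
# so the pieces no longer add up to prob(banker)
liberal = toy["polviews"] <= 3
conservative = toy["polviews"] >= 5   # assumed definition, for illustration only
partial = prob(liberal & banker) + prob(conservative & banker)

print(exclusive, exhaustive, total, partial, prob(banker))
```

In the toy table the moderate banker in the first row is counted by neither piece, so `partial` falls short of `prob(banker)`.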


As an example, let's use this law to compute the probability that a respondent is a banker. 
We can compute it directly like this:

In [88]:
prob(banker)

0.014769730168391155

So let's confirm that we get the same thing if we compute male and female bankers separately.

We already have a Boolean `Series` that is `True` for female respondents. Here's the complementary `Series` for male respondents:



In [89]:
male = df['sex'] == 1

Now we can compute the total probability of `banker` like this:

In [91]:
prob(male & banker) + prob(female & banker)

0.014769730168391155

Because `male` and `female` are mutually exclusive and collectively exhaustive (MECE), we get the same result we got by computing the probability of `banker` directly.

Applying Theorem 2, we can also write the law of total probability like this:


$$
P(A) = P(B_1)P(A|B_1) + P(B_2)P(A|B_2)
$$




In [95]:
prob(male) * conditional(banker, given=male) + prob(female) * conditional(banker, given=female)

0.014769730168391153

When there are more than two conditions, it is more concise to write the law of total probability as a summation:

$$
P(A) = \sum_i P(B_i)P(A|B_i)
$$

Again, this holds as long as the conditions $B_i$ are mutually exclusive and collectively exhaustive. As an example, let's consider `polviews`, which has seven different values:

In [202]:
B = df['polviews']
B.value_counts(1).sort_index()

1.0    0.029255
2.0    0.117833
3.0    0.126659
4.0    0.384317
5.0    0.161087
6.0    0.148489
7.0    0.032360
Name: polviews, dtype: float64

On this scale, 4.0 represents "Moderate". So we can compute the probability of a moderate banker like this:

In [203]:
i = 4
prob(B==i) * conditional(banker, B==i)

0.005822682085615744

Summing this product over all seven values of `polviews`, using `sum` and a generator expression, gives the total probability of `banker`:

In [204]:
sum(prob(B==i) * conditional(banker, B==i) for i in range(1, 8))

0.014769730168391157
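
The summation can be generalized so that the values of the conditioning variable are not hard-coded. The sketch below defines a hypothetical helper, `total_probability` (not part of the original notebook), and runs it on invented toy data (not the GSS data). It assumes `B` has no missing values; rows where `B` is missing belong to no group, so the partition would not be exhaustive and the sum could fall short.

```python
import math
import pandas as pd

def prob(A):
    return A.mean()

def conditional(proposition, given):
    return prob(proposition[given])

def total_probability(A, B):
    """Hypothetical helper: sum of P(B == v) * P(A | B == v) over distinct values v of B."""
    return sum(prob(B == v) * conditional(A, B == v) for v in B.dropna().unique())

# Invented toy data, NOT the GSS dataset
toy = pd.DataFrame({
    "indus10":  [6870, 6870, 4970, 9160, 6870, 2670, 4970, 6870],
    "polviews": [4, 2, 5, 3, 6, 1, 4, 3],
})
banker = toy["indus10"] == 6870
B = toy["polviews"]

total = total_probability(banker, B)
print(total, prob(banker))
```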

In this example, using the law of total probability is a lot more work than computing the probability directly, but it will turn out to be useful.


## Summary


Here’s what we have so far:

**Theorem 1** gives us a way to compute a conditional probability using a conjunction:


$$
P(A|B) = \frac{P(A \ \textrm{and} \ B)}{P(B)}
$$



**Theorem 2** gives us a way to compute a conjunction using a conditional probability:

$$
P(A \ \textrm{and} \ B) = P(B)P(A|B)
$$

**Theorem 3**, also known as Bayes's theorem, gives us a way to get from $P(A|B)$  to $P(B|A)$, or the other way around:

$$
P(A|B) = \frac{P(A)P(B|A)}{P(B)}
$$


**The Law of Total Probability** provides a way to compute probabilities by adding up the pieces:


$$
P(A) = \sum_i P(B_i)P(A|B_i)
$$



At this point you might ask, “So what?” If we have all of the data, we can compute any probability we want, any conjunction, or any conditional probability, just by counting. We don’t have to use these formulas.

And you are right, if we have all of the data. But often we don’t, and in that case, these formulas can be pretty useful—especially Bayes’s theorem. In the next chapter, we’ll see how.