probability: updates to chapter
Co-authored-by: Dan Nicolae <nicolae@galton.uchicago.edu>
jesteria and dnicolae committed Nov 16, 2022
1 parent 77647f1 commit 97bdd0f
Showing 6 changed files with 432 additions and 479 deletions.
113 changes: 79 additions & 34 deletions textbook/11/1/Rules_Definitions.ipynb
@@ -4,32 +4,54 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Definitions and Rules (DRAFT)\n",
"# Probability: Definitions and Rules \n",
"\n",
"The intention here is not to have a comprehensive introduction to probability, but just to provide a reminder of the basic definitions and rules. Every statistics textbook has a chapter on probability that is more complete than this section. We encourage the readers who have not encounter the concept of probability to find a good introductory chapter. We start with some basic definitions:\n",
"The intention here is not to have a comprehensive introduction to probability, but just to provide a reminder of the basic definitions and rules. Every statistics textbook has a chapter on probability that is more complete than this section. We encourage the readers who have not encounter the concept of probability to find a good introductory chapter, and we offer a suggestion/reference at the end of this section.\n",
"\n",
"**Random phenomenon**: where individual outcomes are uncertain; for example, the number of boys in 100 births in a Chicago hospital (outcome is uncertain as we do not know if the number of boys will be 50, or 40 or something else). \n",
"We start with some basic definitions illustrated on three examples:\n",
"\n",
"**Sample space, S**: the set of all possible outcomes of a phenomenon; in the above example, S is the set of integers from 0 to 100 (possible outcomes for the number of boys are 0, 1, 2 , ..., 100). \n",
"**Random phenomenon**: where individual outcomes are uncertain; for example:\n",
"1. Roll a die and record the outcome. We do not know before rolling the die what the outcome will be.\n",
"2. The number of boys in 100 births in a Chicago hospital; outcome is uncertain as we do not know if the number of boys will be 50, or 40 or something else. \n",
"3. The set of birthdays in a group of 30 people.\n",
"\n",
"**Event, A**: An outcome or a set of outcomes of a random phenomenon; for example, A is the event that less than half of the babies are boys. A is the set of integers from 0 to 49.\n",
"**Sample space (denote by S)**: the set of all possible outcomes of a phenomenon; in the above examples:\n",
"1. S is the set of integers from 1 to 6: S=$\\{1,2,3,4,5,6\\}$\n",
"2. S is the set of integers from 0 to 100 (possible outcomes for the number of boys are 0, 1, 2 , ..., 100). \n",
"3. S is the set of all possible combinations of 30 birthdates.\n",
"\n",
"**Mutually exclusive events**: Events $A$ and $B$ are mutually exclusive (or disjoint) if they have no outcomes in common. An example of that is B the event that the number of boys is between 60 and 70 and A is as above. \n",
"**Event (denoted by A or B here)**: An outcome or a set of outcomes of a random phenomenon; for example:\n",
"1. Rolling an even number: A=$\\{2,4,6\\}$.\n",
"2. A is the event that less than half of the babies are boys. A is the set of integers from 0 to 49.\n",
"3. Having at least two people sharing birthdays.\n",
"\n",
"**Complement of an event**: The complement of an event $A$ is the event that $A$ does not occur, denoted by $A^C$. For the event $A$ defined above, $A^C$ is the event that more than half of the babies are boys, or the set of integers from 50 to 100.\n",
"\n",
"<img align=\"center\" src=\"./img/complement.png\" width=\"400\"/> \n",
"**Mutually exclusive events**: Events $A$ and $B$ are mutually exclusive (or disjoint) if they have no outcomes in common. Examples:\n",
"1. A is as above and B is rolling a 3.\n",
"2. A is as above and B is the event that the number of boys is between 60 and 70. \n",
"3. A is as above and B is the event that there is a birthday to celebrate for every day in March."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Complement of an event**: The complement of an event $A$ is the event that $A$ does not occur, denoted by $A^C$. For the events $A$ defined above:\n",
"1. $A^C$ is rolling an odd number: $A^C=\\{1,3,5\\}$ \n",
"2. $A^C$ is the event that more than half of the babies are boys, or the set of integers from 50 to 100.\n",
"3. $A^C$ is the event when there are no shared birthdays.\n",
"\n",
"**Compound events**: Events built from combinations of other events."
"<img align=\"center\" src=\"./img/complement.png\" width=\"200\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Compound events**: Events built from combinations of other events.\n",
"\n",
"**Union:** ($A$ or $B$) = ($A\\cup B$): set of all outcomes in $A$, or in $B$, or in both.\n",
"\n",
"<img align=\"center\" src=\"./img/union.png\" width=\"400\"/>"
"<img align=\"center\" src=\"./img/union.png\" width=\"200\"/>"
]
},
{
@@ -38,7 +60,7 @@
"source": [
"**Intersection:** ($A$ and $B$) = ($A\\cap B$): set of all outcomes that are in $A$ and in $B$.\n",
"\n",
"<img align=\"center\" src=\"./img/intersection.png\" width=\"400\"/>\n"
"<img align=\"center\" src=\"./img/intersection.png\" width=\"200\"/>"
]
},
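The set definitions above (sample space, event, complement, union, intersection, mutually exclusive events) map directly onto Python's built-in `set` type. The following is a minimal sketch for the die-roll example only; the variable names are chosen here for illustration and do not appear in the notebook.

```python
# Die-roll example from the definitions above, modeled with Python sets.
S = {1, 2, 3, 4, 5, 6}      # sample space: all possible outcomes
A = {2, 4, 6}               # event A: rolling an even number
B = {3}                     # event B: rolling a 3

A_complement = S - A        # outcomes in S that are not in A
union = A | B               # outcomes in A, or in B, or in both
intersection = A & B        # outcomes that are in A and in B

# A and B are mutually exclusive exactly when they share no outcomes.
mutually_exclusive = A.isdisjoint(B)

print("A^C      :", sorted(A_complement))
print("A or B   :", sorted(union))
print("A and B  :", sorted(intersection))
print("disjoint :", mutually_exclusive)
```

Because A and B have no outcomes in common, their intersection is the empty set, which is the set-level statement of "mutually exclusive".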
{
@@ -51,7 +73,7 @@
"- A list of possible outcomes (sample space)\n",
"- An assignment of probabilities $P$\n",
"\n",
"The **frequentist interpretation of the probability** of an event $A$, $\\mbox{P}(A)$, is the long run relative frequency of the event $A$. Suppose you are interested in the probability of \"Heads\" when tossing a coin. In this frequentist interpretation, probability is given by the limit of the relative frequency of \"Heads\" when tossing the coin repeatedly. Note that while you can imagine repeating the coin toss for a large number of times (and some people have done it!), there are other events where the intutition behind frequentists probabilities are not as evident. For example, what is the probability that it will rain next Sunday? This where the **Bayesian interpretation** of probability - based on a subjective degree of belief - is more natural. In the Bayesian workd, two people could have different viewpoints and assign different probabilities. \n",
"The **frequentist interpretation of the probability** of an event $A$, $\\mbox{P}(A)$, is the long run relative frequency of the event $A$. Suppose you are interested in the probability of \"Heads\" when tossing a coin. In this frequentist interpretation, probability is given by the limit of the relative frequency of \"Heads\" when tossing the coin repeatedly. Note that while you can imagine repeating the coin toss for a large number of times (and some people have done it!), there are other events where the intutition behind frequentists probabilities are not as evident. For example, what is the probability that it will rain next Sunday? This where the **Bayesian interpretation** of probability - based on a subjective degree of belief - is more natural. In the Bayesian world, two people could have different viewpoints and assign different probabilities. \n",
"\n",
"Note that the rules below are universal."
]
@@ -61,7 +83,6 @@
"metadata": {},
"source": [
"## Basic Probability Rules\n",
"\n",
"- $0 \\le \\mbox{P}(A) \\le 1$, for any event $A$\n",
"\n",
"- $\\mbox{P}(S) = 1$\n",
@@ -74,44 +95,54 @@
"- $\\mbox{P}(A \\cup B) = \\mbox{P}(A) +\n",
"\\mbox{P}(B) - \\mbox{P}(A \\cap B)$\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
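The basic rules can be checked numerically on the die-roll example. The sketch below assumes a fair die, so every outcome is equally likely; the helper `prob` and the event `B` (rolling more than 3) are introduced here for illustration and are not part of the notebook.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}          # sample space for one roll of a die

def prob(event, sample_space=S):
    """P(event), assuming all outcomes in sample_space are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}                   # rolling an even number
B = {4, 5, 6}                   # hypothetical event: rolling more than 3

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)

print("P(S)      =", prob(S))
print("P(A or B) =", lhs)
print("P(A) + P(B) - P(A and B) =", rhs)
```

Using `Fraction` keeps the arithmetic exact, so the two sides of the addition rule can be compared with `==` rather than with a floating-point tolerance.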
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conditional Probability\n",
"\n",
"If $\\mbox{P}(A) \\ne 0$, the conditional probability of event $B$\n",
"given $A$ has occurred, denoted by $\\mbox{P}(B|A)$, is defined by,\n",
"$ \\mbox{P}(B|A) = \\frac{\\mbox{P}(A \\mbox{ and } B)}{\\mbox{P}(A)}$\n",
"\n",
"![](./img/conditionalprobability.png)\n",
"<img align=\"center\" src=\"./img/conditionalprobability.png\" width=\"600\"/>\n",
"\n",
"Example:\n",
"- Select one subject at random in US;\n",
"- B is the event that the subject read a book last week;\n",
"- A is the event that the subject is a college student;\n",
"- Consider P(B|A) versus P(B): the fraction of college students who read a book last week is likely different than the fraction of US population who did that.\n",
"\n",
"- Select one subject at random in US\n",
"- B is the event that the subject spent more than 2 hours on zoom last week\n",
"- A is the event that the subject is a college student\n",
"- P(B|A) versus P(B)"
"**Multiplication rule**: $\\mbox{P}(A \\mbox{ and } B) = \\mbox{P}(A|B) \\mbox{P}(B)$. Note that this follows directly from the definition of conditional probability."
]
},
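The definition of conditional probability and the multiplication rule can be illustrated on a fair die. This is a sketch under the equally-likely-outcomes assumption; the helpers `prob` and `cond_prob` and the event `B` (rolling more than 3) are hypothetical names introduced here, not taken from the notebook.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}          # sample space for one roll of a fair die

def prob(event):
    """P(event), assuming all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

def cond_prob(b, a):
    """P(b | a) = P(a and b) / P(a); only defined when P(a) != 0."""
    if prob(a) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(a)

A = {2, 4, 6}                   # rolling an even number
B = {4, 5, 6}                   # hypothetical event: rolling more than 3

p_b = prob(B)                   # unconditional probability of B
p_b_given_a = cond_prob(B, A)   # probability of B once we know A occurred

# Multiplication rule: P(A and B) = P(B | A) * P(A)
product = p_b_given_a * prob(A)

print("P(B)       =", p_b)
print("P(B | A)   =", p_b_given_a)
print("P(A and B) =", prob(A & B), "=", product)
```

Knowing that the roll is even changes the probability of rolling more than 3, which mirrors the P(B|A) versus P(B) comparison in the example above.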
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## More Probability Rules\n",
"\n",
"**Conditional probability**: If $\\mbox{P}(B) \\ne 0$, the conditional probability of event $A$ given $B$ has occurred, denoted by $\\mbox{P}(A|B)$, is defined by,\n",
"$ \\mbox{P}(A|B) = \\frac{\\mbox{P}(A \\mbox{ and } B)}{\\mbox{P}(B)}$\n",
"## Independence\n",
"\n",
"**Multiplication rule**: $\\mbox{P}(A \\mbox{ and } B) = \\mbox{P}(A|B) \\mbox{P}(B)$\n",
"\n",
"**Independence**: Events $A$ and $B$ are independent if $\\mbox{P}(A|B) =\n",
"Events $A$ and $B$ are called independent if $\\mbox{P}(A|B) =\n",
"\\mbox{P}(A)$ (or equivalently, $\\mbox{P}(B|A) = \\mbox{P}(B)$)\n",
"\n",
"Equivalent condition for **independence**: \n",
"$\\mbox{P}(A \\mbox{ and } B) = \\mbox{P}(A) \\mbox{P}(B)$\n",
"\n",
"**The Bayes Rule**:\n",
"$\n",
"### The Bayes Theorem\n",
"\n",
"The following property follows directly from the definition of conditional independence and the multiplication rule:\n",
"\n",
"\\begin{eqnarray}\n",
"\\mbox{P}(A|B) & = & \\frac{\\mbox{P}(B|A) \\mbox{P}(A)}{\\mbox{P}(B)} \\nonumber\n",
"\\end{eqnarray}\n",
"$\n"
"\n",
"This is one of the most important rules in statistics and data science because it describes statistical learning, and provides a way to update a belief (probability) given additional evidence (data)."
]
},
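Independence and Bayes' theorem can both be verified on a fair die. The sketch below assumes equally likely outcomes; the events `B` and `C` and the helper functions are hypothetical choices made here for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}          # sample space for one roll of a fair die

def prob(event):
    """P(event), assuming all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

def cond_prob(x, given):
    """P(x | given) = P(x and given) / P(given)."""
    return prob(x & given) / prob(given)

A = {2, 4, 6}                   # rolling an even number
B = {1, 2, 3, 4}                # hypothetical event: rolling at most 4
C = {4, 5, 6}                   # hypothetical event: rolling more than 3

# Independence check: P(A and B) == P(A) * P(B)
a_b_independent = prob(A & B) == prob(A) * prob(B)
a_c_independent = prob(A & C) == prob(A) * prob(C)

# Bayes' theorem: P(A | C) = P(C | A) * P(A) / P(C)
bayes = cond_prob(C, A) * prob(A) / prob(C)
direct = cond_prob(A, C)

print("A, B independent:", a_b_independent)
print("A, C independent:", a_c_independent)
print("P(A | C) via Bayes:", bayes, " directly:", direct)
```

Here A and B turn out to be independent while A and C are not, and the Bayes' theorem value agrees with the conditional probability computed directly from the definition.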
{
Expand All @@ -120,21 +151,35 @@
"source": [
"## The solution to the birthday problem\n",
"\n",
"We will use the **equally likely outcomes** formula from above. Note that, for $n$ random subjects, the total number of outcomes (number of possible combination of birthdays) is \n",
"$365^n.$\n",
"We will use the **equally likely outcomes** formula from the Basic Probability Rules above. Note that, for $n$ random subjects, the total number of outcomes (number of possible combination of birthdays) is \n",
"\n",
"$$365^n.$$\n",
"\n",
"The number of outcomes that lead to a set of distinct birthdays is\n",
"$365\\times364\\times ...\\times (365-n+1)$\n",
"\n",
"$$365\\times364\\times ...\\times (365-n+1)$$\n",
"\n",
"and the intuition comes from the way we can count the total number of distinct birthdays as follows:\n",
"- suppose you look at people sequentially;\n",
"- first person can have any of the 365 birthdays without leading to mathched birthdays;\n",
"- the second can have any of birthdays except the one of the first person: so 364 possibilities;\n",
"- the $n$-th person can have any of birthdays except any of the (n-1) different birthdays of the other people: so (365-n+1) possibilities.\n",
"\n",
"So the probability of having $n$ distinct birtdays is:\n",
"$\\frac{365\\times364\\times ...\\times (365-n+1)}{365^n}$\n",
"\n",
"$$\n",
"\\frac{365\\times364\\times ...\\times (365-n+1)}{365^n}\n",
"$$\n",
"\n",
"The complement of this event is the event of interest (at least two people share birthdays) and so the probability of interest is:\n",
"$P_n ~=~ 1-\\frac{365\\times364\\times ...\\times (365-n+1)}{365^n}$\n"
"\n",
"$$\n",
"P_n ~=~ 1-\\frac{365\\times364\\times ...\\times (365-n+1)}{365^n}\n",
"$$\n",
"\n",
"**Reference.**\n",
"\n",
"1. OpenIntro Statistics (Chapter 3 on Probability). Available for download at https://www.openintro.org/book/os/."
]
}
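The formula for $P_n$ can be evaluated directly. The sketch below multiplies the factors $(365-k)/365$ one at a time rather than forming $365^n$, which avoids very large intermediate numbers; the function name `prob_shared_birthday` and its `days` parameter are introduced here for illustration, and the same assumption of 365 equally likely birthdays applies.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday.

    Implements P_n = 1 - (days * (days-1) * ... * (days-n+1)) / days**n,
    assuming every birthday is equally likely.
    """
    p_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_distinct *= (days - k) / days
    return 1.0 - p_distinct

for n in (10, 23, 30, 50):
    print(n, round(prob_shared_birthday(n), 4))
```

For a group of 30 people the result is roughly 0.71, and the probability first exceeds one half at $n = 23$.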
],