In [1]:
# Example 5.1.1.
# Define a random variable X to represent the outcome of flipping a
# fair coin, where this variable can take on two possible values
# (h or t) representing heads or tails. What is the distribution
# over X?

# "A RANDOM VARIABLE (or RV) is a variable that represents all
# the possible events in some partition of the sample space."
# So P(X) => P(h) = 0.5 and P(t) = 0.5.
#
# The important point is that P(X) isn't a single probability
# like P(h) or P(t), but a distribution, so the values of P(X)
# must add up to 1.

In [2]:
# Example 5.1.2.
# Suppose I roll a fair six-sided die, with one face colored red,
# two colored blue, and three colored green. Let X be the color
# I get when I roll. What is P(X)?

# P(X) => P(r) = 1/6, P(b) = 2/6 and P(g) = 3/6
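
# A quick sanity check of the answer above, as a minimal sketch
# using the standard-library Fraction type. The names `faces` and
# `p_color` are my own and do not come from the text.

```python
from fractions import Fraction

# Faces of the die listed by color: 1 red, 2 blue, 3 green.
faces = ["r"] + ["b"] * 2 + ["g"] * 3

# The die is fair, so P(X = color) = (faces of that color) / (all faces).
p_color = {c: Fraction(faces.count(c), len(faces)) for c in set(faces)}

for color in "rbg":
    print(color, p_color[color])

# P(X) is a distribution, so its values must sum to 1.
print(sum(p_color.values()))
```

# Fraction reduces automatically, so 2/6 and 3/6 print as 1/3 and 1/2.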

In [3]:
# Example 5.2.1.
# Suppose we flip a fair coin several times in a row until we get
# a head. Let X be the total number of flips. What is the
# distribution over X?

# Notice the "until", from which I understand that only the last
# flip should be heads. In the case of X = 1, the chance of
# getting a head is 1/2. In the case of X = 2, we need tails
# and then heads, so 1/2 * 1/2. For X = 3, 1/2 * 1/2 * 1/2.
# More generally, P(X) = (1/2) ** X.
#
# The text actually expresses it like this:
# P(X = n) = (1/2) ** n
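
# To convince myself that P(X = n) = (1/2) ** n really is a
# distribution, here is a small sketch that sums the first 20
# terms. The function name `p_first_head` and the cutoff of 20
# are my own choices, not something from the text.

```python
from fractions import Fraction

def p_first_head(n):
    """P(X = n): n - 1 tails followed by one head, with a fair coin."""
    return Fraction(1, 2) ** n

# The full sum runs over n = 1, 2, 3, ...; a partial sum gets
# close to 1 but stays below it.
partial = sum(p_first_head(n) for n in range(1, 21))
print(partial)
print(float(partial))
```

# The partial sum is 1 - (1/2) ** 20, so the leftover mass shrinks
# by half with every extra term.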

In [4]:
# Example 5.2.2.
# Suppose we perform the same type of experiment as in the
# previous example, but our coin isn’t fair: the probability of
# getting a head is some value p. Now what is the distribution
# over X, the number of flips required to get a head?

# Let's give p a value and see what happens. P(H) = 1/3, therefore
# P(T) = 2/3.
# P(X = 1) = 1/3
# P(X = 2) = 2/3 * 1/3
# P(X = 3) = 2/3 * 2/3 * 1/3 ... etc
# P(X = n) = (2/3) ** (n - 1) * 1/3
#
# Even more generally, as the text points out:
# P(X = n) = (1 - p) ** (n - 1) * p
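
# The general formula can be sketched as a function. The name
# `p_flips` is my own. Note that in Python `**` binds tighter
# than `-`, so the exponent needs its own parentheses:
# (1 - p) ** n-1 would compute ((1 - p) ** n) - 1.

```python
from fractions import Fraction

def p_flips(n, p):
    """P(X = n) = (1 - p) ** (n - 1) * p: n - 1 tails, then one head."""
    return (1 - p) ** (n - 1) * p

# Same value of p as in the worked case above.
p = Fraction(1, 3)
for n in range(1, 4):
    print(n, p_flips(n, p))
```

# With p = 1/2 the function reduces to the fair-coin result
# (1/2) ** n from Example 5.2.1.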

In [5]:
# Example 5.4.1.
# Rewrite the JPT from Example 4.4.2 using random variable
# notation, with X representing the color of the marble,
# and Y representing the pattern (solid or patchy).

# First, let's recap Example 4.4.2, then we'll modify it.
#
# 4.4.2 JPT:
#
#    R     G     B
# S  1/3   1/5   2/15
# P  2/15  2/15  1/15
#
# Rewrite using RV notation:
#
#       X=R    X=G    X=B
# Y=S   1/3    1/5    2/15
# Y=P   2/15   2/15   1/15

In [6]:
# Example 5.4.2.
# Are the RVs X and Y from Example 5.4.1 independent?

# First, let's write down the rule that must hold for
# independent RVs. This must hold for all values of X
# and Y.
#            P(X=x, Y=y) = P(X=x)P(Y=y)
#
# The next step is to find the marginal probabilities
# for the values of X and Y. Let's rewrite the JPT
# from 5.4.1 with the marginal values.
#
#       X=R    X=G    X=B
# Y=S   5/15   3/15   2/15  |  10/15
# Y=P   2/15   2/15   1/15  |   5/15
#       --------------------
#       7/15   5/15   3/15
#
# Now, let's see cell by cell if the rule holds.
#
# P(Y=S,X=R) = 5/15 = 15/45 != 7/15 * 10/15 = 14/45
#
# That's it. One failing cell is enough, so we need to test
# no further. They are not independent RVs.
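
# The marginals and the cell-by-cell check can be sketched in
# code. The dictionary layout and the names `joint`, `p_x`, `p_y`
# are my own; the numbers are the JPT from Example 5.4.1.

```python
from fractions import Fraction
from itertools import product

F = Fraction
colors = ["R", "G", "B"]
patterns = ["S", "P"]

# Joint distribution P(X=x, Y=y), keyed by (color, pattern).
joint = {
    ("R", "S"): F(5, 15), ("G", "S"): F(3, 15), ("B", "S"): F(2, 15),
    ("R", "P"): F(2, 15), ("G", "P"): F(2, 15), ("B", "P"): F(1, 15),
}

# Marginals: sum the joint over the other variable.
p_x = {x: sum(joint[x, y] for y in patterns) for x in colors}
p_y = {y: sum(joint[x, y] for x in colors) for y in patterns}

# Independence requires P(x, y) == P(x) * P(y) in every cell.
independent = all(
    joint[x, y] == p_x[x] * p_y[y] for x, y in product(colors, patterns)
)
print(p_x)
print(p_y)
print(independent)
```

# `all` stops at the first failing cell, which mirrors the
# "test no further" reasoning above.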

In [7]:
# Example 5.5.1.
# Returning to our marble example, assume we start with
# the same jar of marbles as in 5.4.1, but we add some
# new marbles to the jar. These new marbles have the
# same joint distribution over X (color) and Y (pattern)
# as the existing ones, but there are twice as many new
# marbles as old ones, and the new ones are a larger size.
# Let Z be the size of a marble (small or large). If we
# pull a marble out uniformly at random, what are
# P(X, Y | Z=small) and P(X, Y | Z=large)? What is
# P(X, Y, Z)?

# Z=small are the marbles we had before, so P(X,Y|Z=small)
# is the same JPT as we have in Example 5.4.1.
#
# The problem also states that "the new marbles" have the
# same joint distribution over X (color) and Y (pattern),
# so P(X, Y | Z=large) should have the same JPT as
# P(X, Y | Z=small).
#
# P(X, Y, Z) is a bit trickier because I'd need a 3D JPT to
# represent it. I'll just write two JPTs, one for Z=small
# and another for Z=large. It's worth pointing out that
# since there are twice as many large marbles as small
# marbles, P(Z=small) = 1/3 and P(Z=large) = 2/3. So the
# cells in the JPTs should add up to 1/3 and 2/3
# respectively.
#
# JPT for P(X, Y, Z=small):
#
#       X=R        X=G        X=B
# Y=S   5/15*1/3   3/15*1/3   2/15*1/3 
# Y=P   2/15*1/3   2/15*1/3   1/15*1/3 
#
#       X=R    X=G    X=B
# Y=S   1/9    1/15   2/45
# Y=P   2/45   2/45   1/45 
#
#
# JPT for P(X, Y, Z=large):
#
#       X=R        X=G        X=B
# Y=S   5/15*2/3   3/15*2/3   2/15*2/3 
# Y=P   2/15*2/3   2/15*2/3   1/15*2/3 
#
#       X=R    X=G    X=B
# Y=S   2/9    2/15   4/45
# Y=P   4/45   4/45   2/45
#
# The text mentions that one can use the product rule
# to calculate these values, although it hadn't
# mentioned the case when there are more than two
# variables. It states that the formula is the
# following:
#
# P(X=x, Y=y, Z=z) = P(X=x,Y=y | Z=z) * P(Z=z)
#
# It doesn't show the actual calculation, but the two
# tables above are exactly this product: each cell of
# P(X, Y | Z=z) multiplied by P(Z=z).
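
# The three-variable product rule can be sketched as follows.
# The names `p_xy_given_z`, `p_z` and `p_xyz` are my own. I am
# assuming, as the problem states, that the conditional JPT is
# the same for both sizes.

```python
from fractions import Fraction

F = Fraction

# P(X=x, Y=y | Z=z): identical for z = small and z = large.
p_xy_given_z = {
    ("R", "S"): F(5, 15), ("G", "S"): F(3, 15), ("B", "S"): F(2, 15),
    ("R", "P"): F(2, 15), ("G", "P"): F(2, 15), ("B", "P"): F(1, 15),
}

# Twice as many large marbles as small ones.
p_z = {"small": F(1, 3), "large": F(2, 3)}

# Product rule: P(x, y, z) = P(x, y | z) * P(z).
p_xyz = {
    (x, y, z): p_xy_given_z[x, y] * p_z[z]
    for (x, y) in p_xy_given_z
    for z in p_z
}

print(p_xyz["R", "S", "small"])  # 1/9
print(p_xyz["R", "S", "large"])  # 2/9
print(sum(p_xyz.values()))       # 1
```

# Keying the dictionary by (x, y, z) is one way to hold the
# "3D JPT" without drawing it.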

In [8]:
# Example 5.5.2.
# Let X and Y be variables representing whether I stay up
# late and whether I show up on time for my 9 a.m. class.
# Let Z represent whether I left my house on time.
# Intuitively, are X and Y independent?
# Are they conditionally independent given Z?

# No numbers are given, so assumptions are in order.
# X and Y are probably dependent: if I stay up late,
# I'm less likely to arrive on time for class the next
# morning. But the question is whether X and Y are
# conditionally independent given Z. Whether I stayed up
# late doesn't really matter once we know whether I left
# the house on time, so I imagine that X and Y are
# conditionally independent given Z.

In [9]:
# Example 5.5.3.
# Suppose we are studying two closely related species of
# birds, the azure-breasted nuthatch (species a) and the
# blue-throated nuthatch (species b). Unfortunately, it
# isn’t possible to determine with complete accuracy
# which species a particular individual belongs to just
# by looking at it. However, certain features are more
# common among one or the other species. In particular,
# 70% of individuals in species a have red eyes, while
# only 20% of those in species b do. On the other hand,
# 20% of a individuals have yellow feet, while 40% of b
# individuals do. While bird-watching in an area in
# which the azure-breasted and blue-throated nuthatches
# are equally common, we see a bird with red eyes and
# yellow feet. Assuming that the eye color and foot
# color are conditionally independent given the species
# of bird, what is the probability that this bird is
# from species a?

# Ok, so we have three variables: species, which we'll
# call S (a or b); eye color, which we define as
# E (r or ¬r); and foot color, F (y or ¬y). We also have
# the following information:
#
#     P(r|a) = 7/10, which means that P(¬r|a) = 3/10
#     P(r|b) = 2/10,                  P(¬r|b) = 8/10
#     P(y|a) = 2/10,                  P(¬y|a) = 8/10
#     P(y|b) = 4/10,                  P(¬y|b) = 6/10
#     P(a) = P(b) = 0.5, so P(S) is uniform
#
# Note that they ask us to assume that E and F are
# conditionally independent given S, so they are probably
# pointing us to Naive Bayes. The question is: what is the
# probability of the bird being from species a
# if it has red eyes and yellow feet? In other
# words P(a|r,y).
#
# Bayes' rule for three variables is:
#
#     P(x|y,z) = ( P(y|x,z) * P(x|z) ) / P(y|z)
#
# So we say:
#
#     P(a|r,y) = ( P(r|a,y) * P(a|y) ) / P(r|y).
#
# But that doesn't look useful because we haven't
# decomposed the equation into values we know, so
# let's try the alternative formulation of Bayes'
# rule, which treats all variables joined by ","
# (the ones expressed as an intersection) as a
# single random variable:
#
#     P(x|y,z) = ( P(y,z|x) * P(x) ) / P(y,z)
#
# So
#
#     P(a|r,y) = ( P(r,y|a) * P(a) ) / P(r,y)
#
# Remember we can treat intersected variables as a
# single random variable, so using the law of total
# probability we can decompose the denominator as
# follows:
#
#     P(a|r,y) = ( P(r,y|a) * P(a) )
#         / ( P(r,y|a) * P(a) + P(r,y|b) * P(b) )
#
# Here I'm at a loss. The text says we should
# apply the definition of conditional independence,
# P(r,y|s) = P(r|s) * P(y|s). Applying it to the
# numerator, we get:
#
#     P(a|r,y) = ( P(r|a) * P(y|a) * P(a) )
#         / ( P(r,y|a) * P(a) + P(r,y|b) * P(b) )
#
# And then to the denominator:
#
#     P(a|r,y) = ( P(r|a) * P(y|a) * P(a) )
#         / ( P(r|a) * P(y|a) * P(a) + P(r|b) * P(y|b) * P(b) )
#
# Now we can substitute actual values for the notation:
#
#     P(a|r,y) = ( 7/10 * 2/10 * 5/10 )
#         / ( 7/10 * 2/10 * 5/10 + 2/10 * 4/10 * 5/10 )
#
#     P(a|r,y) = ( 7/100 )
#         / ( 7/100 + 4/100 )
#
#     P(a|r,y) = (7/100) / (11/100) = 7/11 ≈ 0.64
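
# The derivation above can be checked numerically. This is a
# minimal Naive Bayes sketch; the names `prior`, `p_red`,
# `p_yellow`, `unnorm` and `posterior` are my own, and the
# numbers are the ones given in the problem.

```python
from fractions import Fraction

F = Fraction

prior = {"a": F(1, 2), "b": F(1, 2)}        # P(S)
p_red = {"a": F(7, 10), "b": F(2, 10)}      # P(r | S)
p_yellow = {"a": F(2, 10), "b": F(4, 10)}   # P(y | S)

# Numerator of Bayes' rule for each species, using conditional
# independence: P(r, y | s) * P(s) = P(r | s) * P(y | s) * P(s).
unnorm = {s: p_red[s] * p_yellow[s] * prior[s] for s in prior}

# Denominator P(r, y) by the law of total probability.
evidence = sum(unnorm.values())

posterior = {s: unnorm[s] / evidence for s in prior}
print(posterior["a"], float(posterior["a"]))
```

# The posterior for species b comes out of the same dictionary,
# and the two posteriors sum to 1.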