# Probability
## Generating independent and dependent random variables
In this assignment we will generate independent and dependent random variables in Python to understand these notions more deeply. The assignment is graded partly automatically (PMF calculations) and partly by peer review. Please submit it for automatic grading using the "Submit" button, then download the ipynb file and submit it for peer review (in the corresponding course element).

### Preliminaries
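We need the following helper to test our generators. It counts how often each element occurs in a sample and, with `relative=True`, divides the counts by the sample size.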

In [71]:
def count_frequencies(data, relative=False):
    counter = {}
    for element in data:
        if element not in counter:
            counter[element] = 1
        else:
            counter[element] += 1
    if relative:
        for element in counter:
            counter[element] /= len(data)
    return counter
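
For reference, the standard library's `collections.Counter` does the same tallying. The sketch below is an equivalent alternative, not part of the assignment; the name `count_frequencies_counter` is hypothetical.

```python
from collections import Counter

def count_frequencies_counter(data, relative=False):
    # Counter tallies how many times each hashable element occurs.
    counts = Counter(data)
    if relative:
        total = len(data)
        # Divide each count by the sample size to get relative frequencies.
        return {element: count / total for element, count in counts.items()}
    return dict(counts)

print(count_frequencies_counter(["a", "b", "a", "a"]))                 # {'a': 3, 'b': 1}
print(count_frequencies_counter(["a", "b", "a", "a"], relative=True))  # {'a': 0.75, 'b': 0.25}
```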

### Independent random variables: PMF calculation
Consider random variables $X$ and $Y$. Assume that $X$ takes values $x_1, \ldots, x_n$ with probabilities $p_1, \ldots, p_n$ and $Y$ takes values $y_1, \ldots, y_m$ with probabilities $q_1, \ldots, q_m$. Assume that $X$ and $Y$ are independent. Implement a function `joint_pmf(xvalues, xprobs, yvalues, yprobs)` that takes an array of values $x_1, \ldots, x_n$ as `xvalues`, an array of probabilities $p_1, \ldots, p_n$ as `xprobs`, and likewise `yvalues` and `yprobs` for $Y$. The function should return a dictionary whose keys are tuples `(x, y)`, where `x` is some value $x_i$ and `y` is some value $y_j$, and whose values are the corresponding values of the joint probability mass function $pmf_{X, Y}(x_i, y_j)$.
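
Independence means that the joint PMF factorizes into the product of the marginal probabilities, which is the formula the function has to evaluate for every pair:

$$
pmf_{X, Y}(x_i, y_j) = P(X = x_i, Y = y_j) = P(X = x_i)\,P(Y = y_j) = p_i\,q_j, \qquad i = 1, \ldots, n, \quad j = 1, \ldots, m.
$$

For example, if $p_1 = 0.5$ and $q_3 = 0.4$, then the pair $(x_1, y_3)$ has probability $0.5 \cdot 0.4 = 0.2$.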

In [72]:
def joint_pmf(xvalues, xprobs, yvalues, yprobs):
    # your code here

    # X and Y are independent, so the joint probability of (x, y)
    # is the product of the marginal probabilities of x and y.
    pmf_dict = {}
    for x, p in zip(xvalues, xprobs):
        for y, q in zip(yvalues, yprobs):
            pmf_dict[(x, y)] = p * q

    return pmf_dict
    

In [73]:
testdata = [([1], [1], [2, 3], [0.2, 0.8]),
            ([1, 2], [0.5, 0.5], [3, 4, 5], [0.3, 0.3, 0.4])]
answers = [{(1, 2): 0.2, (1, 3): 0.8},
           {(1, 3): 0.15,
            (1, 4): 0.15,
            (1, 5): 0.2,
            (2, 3): 0.15,
            (2, 4): 0.15,
            (2, 5): 0.2}]
for data, answer in zip(testdata, answers):
    assert joint_pmf(*data) == answer
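
Comparing dictionaries of floats with `==` works for the test data above, but products of probabilities are not always exactly representable (note the value `0.08000000000000002` in the PMF output further below). A more tolerant comparison can be sketched with `math.isclose`; the helper name `pmf_close` and the tolerance are assumptions for illustration, not part of the grader.

```python
import math

def pmf_close(pmf_a, pmf_b, abs_tol=1e-12):
    # The two PMFs must be defined on exactly the same outcomes...
    if pmf_a.keys() != pmf_b.keys():
        return False
    # ...and agree on every probability up to a small tolerance.
    return all(math.isclose(pmf_a[k], pmf_b[k], abs_tol=abs_tol) for k in pmf_a)

print(pmf_close({(0, 5): 0.2 * 0.4}, {(0, 5): 0.08}))  # True
print({(0, 5): 0.2 * 0.4} == {(0, 5): 0.08})           # False
```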

In [74]:
# self check: compare the PMFs computed by joint_pmf with the expected answers
a = tuple(joint_pmf(*data) for data in testdata)

b = {(1, 2): 0.2, (1, 3): 0.8}, {(1, 3): 0.15, (1, 4): 0.15, (1, 5): 0.2, (2, 3): 0.15, (2, 4): 0.15, (2, 5): 0.2}

print(a == b)

True


### Independent random variables: generation
Implement a function `indep_choice(xvalues, xprobs, yvalues, yprobs)` that samples a value `x` from random variable $X$ (here `xvalues` is an array of values $x_1, \ldots, x_n$ and `xprobs` is an array of probabilities $p_1, \ldots, p_n$) and a value `y` from random variable $Y$ (here `yvalues` is an array of values $y_1, \ldots, y_m$ and `yprobs` is an array of probabilities $q_1, \ldots, q_m$) and returns the tuple `(x, y)`. Use `numpy.random.choice` in each case.

In [75]:
from numpy.random import choice

def indep_choice(xvalues, xprobs, yvalues, yprobs):
    # your code here
    
    # sample x and y separately, so the two draws are independent
    x = choice(xvalues, p=xprobs)
    y = choice(yvalues, p=yprobs)

    # return the sampled pair
    return (x, y)
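
Calling `choice` once per pair is fine for this assignment. For large samples, NumPy can also draw all values in one call through the `size` argument. The sketch below uses the `numpy.random.default_rng` generator and a hypothetical helper name `indep_sample`; it is an alternative, not the expected solution.

```python
import numpy as np

def indep_sample(xvalues, xprobs, yvalues, yprobs, size, seed=None):
    rng = np.random.default_rng(seed)
    # Two separate draws, so the x column and the y column are independent.
    xs = rng.choice(xvalues, size=size, p=xprobs)
    ys = rng.choice(yvalues, size=size, p=yprobs)
    # Convert NumPy scalars to plain Python values and pair them up.
    return list(zip(xs.tolist(), ys.tolist()))

pairs = indep_sample([0, 1, 2], [0.2, 0.5, 0.3], [5, 6], [0.4, 0.6], size=5, seed=0)
print(pairs)
```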
    

Now let us generate a large sample of these pairs and compare the relative frequency of each combination with the corresponding value of the PMF.

In [76]:
xvalues = [0, 1, 2]
xprobs = [0.2, 0.5, 0.3]

yvalues = [5, 6]
yprobs = [0.4, 0.6]

size = 10000

sample = [indep_choice(xvalues, xprobs, yvalues, yprobs) 
          for _ in range(size)] 

def print_sorted_keys(dictionary):
    for k in sorted(dictionary):
        print(f"{k}: {dictionary[k]}")

print("Obtained relative frequencies")
print_sorted_keys(count_frequencies(sample, relative=True))

print("\nValues of probability mass function")
print_sorted_keys(joint_pmf(xvalues, xprobs, yvalues, yprobs))

Obtained relative frequencies
(0, 5): 0.0798
(0, 6): 0.1197
(1, 5): 0.1984
(1, 6): 0.3085
(2, 5): 0.1184
(2, 6): 0.1752

Values of probability mass function
(0, 5): 0.08000000000000002
(0, 6): 0.12
(1, 5): 0.2
(1, 6): 0.3
(2, 5): 0.12
(2, 6): 0.18


**Peer review grading:** The obtained relative frequencies should be close to the values of the PMF.
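
One way to make "close" concrete: for a sample of size $n$, the relative frequency of an outcome with probability $\pi$ has standard deviation $\sqrt{\pi(1-\pi)/n}$, which is at most $0.005$ for $n = 10000$. The sketch below reports the largest gap between frequencies and PMF. It is self-contained, so it rebuilds the PMF and the sample using only the standard library; the name `max_deviation` and the fixed seed are assumptions made for illustration.

```python
import random
from collections import Counter

def max_deviation(sample, pmf):
    # Largest absolute gap between relative frequency and probability,
    # taken over every outcome of the PMF (unseen outcomes count as 0).
    counts = Counter(sample)
    n = len(sample)
    return max(abs(counts[outcome] / n - prob) for outcome, prob in pmf.items())

xvalues, xprobs = [0, 1, 2], [0.2, 0.5, 0.3]
yvalues, yprobs = [5, 6], [0.4, 0.6]
pmf = {(x, y): p * q
       for x, p in zip(xvalues, xprobs)
       for y, q in zip(yvalues, yprobs)}

rng = random.Random(0)  # fixed seed only to make the run repeatable
size = 10000
xs = rng.choices(xvalues, weights=xprobs, k=size)
ys = rng.choices(yvalues, weights=yprobs, k=size)
sample = list(zip(xs, ys))

print(max_deviation(sample, pmf))
```

A gap of a few standard deviations (roughly 0.01 to 0.02 here) is unremarkable; a much larger one suggests a bug in the generator or in the PMF.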

### Dependent random variables: probability mass function
Consider a system $(X, Y)$ of random variables defined as follows. Let $X$ be a Bernoulli random variable with parameter $p$, i.e. a random variable that takes value $1$ with probability $p$ and value $0$ with probability $1-p$. Assume also that $Y$ takes values $0$ and $1$, with $P(Y=1 \mid X=0) = q_0$ and $P(Y=1 \mid X=1) = q_1$. Implement a function `dependent_bernoulli_pmf(p, q0, q1)` that returns a dictionary with the joint probability mass function (as in the first problem).
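
The joint PMF follows from the multiplication rule $P(X = x, Y = y) = P(X = x)\,P(Y = y \mid X = x)$, using $P(Y = 0 \mid X = x) = 1 - P(Y = 1 \mid X = x)$:

$$
\begin{aligned}
P(X=0, Y=0) &= (1-p)(1-q_0), & P(X=0, Y=1) &= (1-p)\,q_0,\\
P(X=1, Y=0) &= p\,(1-q_1), & P(X=1, Y=1) &= p\,q_1.
\end{aligned}
$$

The four values sum to $(1-p) + p = 1$. For example, with $p = 0.25$ and $q_0 = 0.125$ we get $P(X=0, Y=0) = 0.75 \cdot 0.875 = 0.65625$.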

In [77]:
# your code here

def dependent_bernoulli_pmf(p, q0, q1):
    
    # Multiplication rule: P(X=x, Y=y) = P(X=x) * P(Y=y | X=x),
    # with P(Y=0 | X=x) = 1 - P(Y=1 | X=x).
    return {
        (0, 0): (1 - p) * (1 - q0),
        (0, 1): (1 - p) * q0,
        (1, 0): p * (1 - q1),
        (1, 1): p * q1,
    }


In [78]:
# self check

dependent_bernoulli_pmf(0.25, 0.125, 0.25)

{(0, 0): 0.65625, (0, 1): 0.09375, (1, 0): 0.1875, (1, 1): 0.0625}

In [79]:
assert dependent_bernoulli_pmf(0.25, 0.125, 0.25) == {(0, 0): 0.65625, 
                                                      (0, 1): 0.09375, 
                                                      (1, 0): 0.1875, 
                                                      (1, 1): 0.0625}

### Dependent random variables: generation

Implement a function `dependent_bernoulli(p, q0, q1)` that generates a pair `(x, y)` sampled from the system $(X, Y)$ of random variables described above.

In [80]:
# your code here
import random


def dependent_bernoulli(p, q0, q1):
    
    # joint PMF as a dictionary {(x, y): probability}
    pmf_dict = dependent_bernoulli_pmf(p, q0, q1)

    # split into outcomes and their probabilities (same order)
    outcomes = list(pmf_dict.keys())
    probs = list(pmf_dict.values())

    # random.choices returns a list with one sample; return that sample
    return random.choices(outcomes, weights=probs)[0]
    

In [82]:
# self check
dependent_bernoulli(0.25, 0.125, 0.25)

(0, 0)
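
An alternative that mirrors the definition more directly is to sample $X$ first and then sample $Y$ from the conditional distribution given the observed `x`. This is a different technique from sampling the joint PMF in one step, sketched here with the standard library only; the name `dependent_bernoulli_sequential` and the optional `rng` parameter are hypothetical.

```python
import random

def dependent_bernoulli_sequential(p, q0, q1, rng=random):
    # Step 1: X ~ Bernoulli(p).
    x = 1 if rng.random() < p else 0
    # Step 2: Y ~ Bernoulli(q1) if x == 1, otherwise Bernoulli(q0).
    q = q1 if x == 1 else q0
    y = 1 if rng.random() < q else 0
    return (x, y)

print(dependent_bernoulli_sequential(0.25, 0.125, 0.25))
```

Both approaches produce the same distribution; the sequential one does not need the joint PMF at all.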

In [83]:
def test_dependent(p, q0, q1, size):
    sample = [dependent_bernoulli(p, q0, q1) for _ in range(size)]

    print("Obtained relative frequencies")
    print_sorted_keys(count_frequencies(sample, relative=True))

    print("\nValues of probability mass function")
    print_sorted_keys(dependent_bernoulli_pmf(p, q0, q1))
    
test_dependent(0.25, 0.125, 0.25, 10000)

Obtained relative frequencies
(0, 0): 0.6493
(0, 1): 0.0983
(1, 0): 0.1896
(1, 1): 0.0628

Values of probability mass function
(0, 0): 0.65625
(0, 1): 0.09375
(1, 0): 0.1875
(1, 1): 0.0625


**Peer review grading:** The obtained relative frequencies should be close to the values of the PMF.

In [84]:
test_dependent(0.5, 0.125, 0.75, 10000)

Obtained relative frequencies
(0, 0): 0.4357
(0, 1): 0.068
(1, 0): 0.1177
(1, 1): 0.3786

Values of probability mass function
(0, 0): 0.4375
(0, 1): 0.0625
(1, 0): 0.125
(1, 1): 0.375


**Peer review grading:** The obtained relative frequencies should be close to the values of the PMF.
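
To see the dependence in the PMF itself, one can compare each joint probability with the product of the marginals. The two agree for all outcomes exactly when $X$ and $Y$ are independent, which for this system happens when $q_0 = q_1$ (or when $p$ is $0$ or $1$). The sketch below is self-contained; `bernoulli_joint` and `is_independent` are hypothetical helper names, and the check assumes the dictionary lists every `(x, y)` combination.

```python
import math

def bernoulli_joint(p, q0, q1):
    # Joint PMF of (X, Y) via P(X=x, Y=y) = P(X=x) * P(Y=y | X=x).
    return {(0, 0): (1 - p) * (1 - q0), (0, 1): (1 - p) * q0,
            (1, 0): p * (1 - q1), (1, 1): p * q1}

def is_independent(pmf, abs_tol=1e-12):
    # Marginals: sum the joint PMF over the other variable.
    px, py = {}, {}
    for (x, y), prob in pmf.items():
        px[x] = px.get(x, 0.0) + prob
        py[y] = py.get(y, 0.0) + prob
    # Independent iff every joint probability equals the product of marginals.
    return all(math.isclose(prob, px[x] * py[y], abs_tol=abs_tol)
               for (x, y), prob in pmf.items())

print(is_independent(bernoulli_joint(0.5, 0.125, 0.75)))  # False
print(is_independent(bernoulli_joint(0.5, 0.3, 0.3)))     # True
```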