# Introduction to probability: exercises

This notebook contains exercises that build on the first lecture on probability.

## Exercise: A bag with *n* balls

A bag has 20 balls: 15 blue and 5 red. A person draws two balls from the bag at random without looking (one ball in the left hand and one in the right) and writes down the colors.

1) What is one possible outcome?
2) What is the sample space?
3) If the event is that the ball in the right hand is blue, what possible outcomes correspond to that?
4) Write a function that simulates this experiment and estimates the probability that the ball in the right hand is blue.

In [11]:
# Outcomes are written as (left hand, right hand).
# 1. Possible outcome: (blue, blue) - a blue ball in each hand
# 2. Sample space: {(blue, blue), (blue, red), (red, blue), (red, red)} - all possible outcomes
# 3. Outcomes in the event "right-hand ball is blue": {(blue, blue), (red, blue)}

# 4. Simulation to estimate the event probability:

import random

def get_two_balls():
    bag = ['blue'] * 15 + ['red'] * 5
    return random.sample(bag, 2)

def estimate_probability_of_event3(num_experiments: int = 1000):
    event_count = 0
    for _ in range(num_experiments):
        outcome = get_two_balls()
        if outcome[1] == 'blue':  # index 1 is treated as the right-hand ball
            event_count += 1

    return event_count / num_experiments

estimate_probability_of_event3()

0.761
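
As a sanity check on the estimate above, the exact probability can be computed by enumerating every ordered draw of two distinct balls. This is a minimal sketch, not part of the original solution; the names `draws`, `favorable`, and `exact_right_blue` are introduced here for illustration, and index 1 of each pair is again taken to be the right hand.

```python
from fractions import Fraction
from itertools import permutations

# Same bag as in the exercise: 15 blue and 5 red balls
bag = ['blue'] * 15 + ['red'] * 5

# Every ordered (left, right) draw of two distinct balls is equally likely
draws = list(permutations(range(len(bag)), 2))

# Count the draws in which the right-hand ball is blue
favorable = sum(1 for left, right in draws if bag[right] == 'blue')

exact_right_blue = Fraction(favorable, len(draws))
print(exact_right_blue, float(exact_right_blue))
```

The enumeration gives 285 favorable draws out of 380, i.e. 3/4, which agrees with the simulated value up to sampling noise.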

## Exercise: DNA example

A DNA sequence is generated from four nucleotides (A, C, T, G), each with equal probability. What is the probability of observing the word GATAAG or a variant of it with a single nucleotide substitution at any one position (e.g., GAGAAG)?

Write a function that simulates this experiment and estimates the probability of observing the given word or a single-substitution variant of it.

In [9]:
import random

def get_nucleotide():
    return random.choice(['A', 'C', 'G', 'T'])

def compute_sequence_probability(sequence: str, num_experiments: int):
    successes = 0

    for _ in range(num_experiments):
        simulated_sequence = [get_nucleotide() for _ in range(len(sequence))]
        mismatches = sum(letter1 != letter2 
                         for letter1, letter2 in zip(simulated_sequence, sequence))
        successes += mismatches <= 1

    return successes / num_experiments

compute_sequence_probability("GATAAG", 10000)

0.0049
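
The estimate can be compared with an exact count, assuming (as the simulation does) that each position is drawn independently and uniformly and that the generated sequence has the same length as the word. The sketch below enumerates all sequences of that length; `exact_sequence_probability` and `max_mismatches` are hypothetical names introduced for this check.

```python
from fractions import Fraction
from itertools import product

def exact_sequence_probability(sequence: str, max_mismatches: int = 1) -> Fraction:
    """Exact probability that a uniform random sequence of the same length
    differs from `sequence` in at most `max_mismatches` positions."""
    hits = 0
    total = 0
    for candidate in product('ACGT', repeat=len(sequence)):
        total += 1
        mismatches = sum(a != b for a, b in zip(candidate, sequence))
        hits += mismatches <= max_mismatches
    return Fraction(hits, total)

exact = exact_sequence_probability("GATAAG")
print(exact, float(exact))
```

For a six-letter word there is 1 exact match plus 6 × 3 = 18 single substitutions, so the result is 19/4096, roughly 0.0046.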

## Exercise: Conditional probability

In the experiment with two rolls of a die, the sum of the two rolls is 9. How likely is it that the first roll was 6?

Write a function that simulates this experiment and estimates the desired conditional probability. The target sum of the two rolls and the value of the first roll should be input arguments of the function.

In [1]:
import random

def roll_two_dice():
    # Each roll is an independent, uniform draw from 1 to 6
    return [random.randint(1, 6) for _ in range(2)]

def simulate_experiment(num_experiments: int = 1000, roll_sum: int = 9, first_roll: int = 6):
    outcome_of_interest = 0
    first_roll_count = 0
    
    for _ in range(num_experiments):
        rolls = roll_two_dice()
        if sum(rolls) == roll_sum:
            outcome_of_interest += 1
            if rolls[0] == first_roll:
                first_roll_count += 1

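    # The conditional probability is undefined if the target sum never occurred
    if outcome_of_interest == 0:
        return float('nan')

    return first_roll_count / outcome_of_interest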

simulate_experiment()

0.24770642201834864
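
Because two dice have only 36 equally likely outcomes, the conditional probability can also be computed exactly by enumeration. The following is a minimal sketch; `exact_conditional_probability` is a hypothetical helper, not part of the original solution.

```python
from fractions import Fraction
from itertools import product

def exact_conditional_probability(roll_sum: int = 9, first_roll: int = 6) -> Fraction:
    """P(first roll == first_roll | sum of two rolls == roll_sum)."""
    outcomes = list(product(range(1, 7), repeat=2))
    # Outcomes consistent with the condition (the given sum)
    given = [rolls for rolls in outcomes if sum(rolls) == roll_sum]
    if not given:
        raise ValueError("the given sum cannot occur with two dice")
    # Among those, outcomes where the first roll has the requested value
    both = [rolls for rolls in given if rolls[0] == first_roll]
    return Fraction(len(both), len(given))

print(exact_conditional_probability())
```

A sum of 9 arises from (3, 6), (4, 5), (5, 4), and (6, 3), and only one of these starts with 6, so the exact answer is 1/4.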

## Exercise: Disease test 

A test for a disease is correct 95% of the time: if a person has the disease, the test result is positive with probability 0.95, and if the person doesn't have the disease, the test result is negative with probability 0.95. A randomly chosen person who is tested has the disease with probability 0.001. If a person tests positive, what is the probability that they have the disease?

Write a function that simulates this problem and estimates the probability of having the disease given a positive test.

In [28]:
import random

def estimate_disease_probability(num_experiments: int = 10000, disease_prob: float = 0.001,
                                test_correct_prob: float = 0.95):
    positive_test_count = 0
    positive_test_with_disease = 0

    for _ in range(num_experiments):
        has_disease = random.random() < disease_prob

        if has_disease:
            test_positive = random.random() < test_correct_prob
        else:
            test_positive = random.random() < (1 - test_correct_prob)

        if test_positive:
            positive_test_count += 1
            if has_disease:
                positive_test_with_disease += 1

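    # The estimate is undefined if no simulated test came back positive
    if positive_test_count == 0:
        return float('nan')

    return positive_test_with_disease / positive_test_count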

estimate_disease_probability()

0.01953125
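
The simulated value can be checked against Bayes' theorem: the probability of disease given a positive test is the true-positive mass divided by the total positive mass. The sketch below uses the same parameters as the simulation; `posterior_disease_probability` is a hypothetical name introduced for this check.

```python
def posterior_disease_probability(disease_prob: float = 0.001,
                                  test_correct_prob: float = 0.95) -> float:
    """P(disease | positive test) by Bayes' theorem."""
    # P(positive and disease)
    true_positive = test_correct_prob * disease_prob
    # P(positive and no disease)
    false_positive = (1 - test_correct_prob) * (1 - disease_prob)
    return true_positive / (true_positive + false_positive)

print(posterior_disease_probability())
```

With the default values this is 0.00095 / (0.00095 + 0.04995), or about 0.0187. Since only a few of the 10,000 simulated people have the disease, the simulated estimate fluctuates noticeably around this value from run to run.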

## Exercise: The Monty Hall problem

A contestant on a game show is told that a prize is equally likely to be found behind any one of the three closed doors in front of them. The contestant picks a door. The host of the show then opens one of the remaining two doors after making sure that the prize is not behind it. The contestant can now either stay with the initial choice or switch to the other unopened door. The contestant wins the prize if it is behind the door they finally choose. What is the probability of winning if the contestant sticks with the initial choice? What if they always switch to the other unopened door?

Write a function that will simulate this show and compare the probabilities of winning. What is the winning strategy?

In [22]:
import random

def simulate_monty_hall(num_experiments: int = 100):
    stay_wins = 0
    switch_wins = 0

    doors = [0, 1, 2]

    for _ in range(num_experiments):
        prize = random.choice(doors)
        choice = random.choice(doors)

        # The host opens a door that is neither the contestant's choice nor the prize
        remaining_doors = [door for door in doors if door not in [choice, prize]]
        opened = random.choice(remaining_doors)

        # Switching means taking the only door that is neither chosen nor opened
        switch_choice = [d for d in doors if d not in [choice, opened]][0]

        if choice == prize:
            stay_wins += 1
        if switch_choice == prize:
            switch_wins += 1

    return stay_wins / num_experiments, switch_wins / num_experiments

simulate_monty_hall()

(0.39, 0.61)
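
With only 100 experiments the estimates are noisy, so it helps to compare them with the exact probabilities. The sketch below enumerates every (prize, first choice, opened door) combination with its weight, assuming, like the simulation, that the host picks uniformly at random when two doors could be opened. The names `stay` and `switch` are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

doors = [0, 1, 2]
stay = Fraction(0)
switch = Fraction(0)

for prize, choice in product(doors, repeat=2):
    # Prize location and first choice are independent and uniform
    weight = Fraction(1, 9)
    # Doors the host may open: neither the contestant's choice nor the prize
    openable = [d for d in doors if d not in (choice, prize)]
    for opened in openable:
        w = weight / len(openable)
        # The only door left after removing the choice and the opened door
        switch_choice = next(d for d in doors if d not in (choice, opened))
        stay += w * (choice == prize)
        switch += w * (switch_choice == prize)

print(stay, switch)
```

Staying wins with probability 1/3 and switching with probability 2/3, so always switching is the better strategy.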