
In [1]:
import numpy as np
import random as rnd
import collections
import matplotlib.pyplot as plt
import time
import scipy.stats as st

from scipy.stats import bernoulli, binom, poisson, chi2
from IPython.display import clear_output
from operator import itemgetter
from statsmodels.stats import proportion

from numpy import matlib


**Exercise 1:** Assume that there are 10 quanta available in a nerve terminal, and for a given release event each is released with a probability of 0.2. For one such event, what is the probability that 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 quanta will be released?

In [10]:
#Parameters
p=0.2
n=10

probabilities = binom.pmf((np.arange(0,11,1)), n, p)

for k, prob in enumerate(probabilities):
  print(f'Probability of {k} quanta released: {prob:.4f}')


Probability of 0 quanta released: 0.1074
Probability of 1 quanta released: 0.2684
Probability of 2 quanta released: 0.3020
Probability of 3 quanta released: 0.2013
Probability of 4 quanta released: 0.0881
Probability of 5 quanta released: 0.0264
Probability of 6 quanta released: 0.0055
Probability of 7 quanta released: 0.0008
Probability of 8 quanta released: 0.0001
Probability of 9 quanta released: 0.0000
Probability of 10 quanta released: 0.0000
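To cross-check `binom.pmf`, the same probabilities can be computed directly from the binomial formula $P(k) = \binom{n}{k} p^k (1-p)^{n-k}$. This is a minimal sketch that uses only the standard library (`math.comb`). The helper name `binomial_pmf` is hypothetical and not part of the original notebook.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k of n quanta released), each released independently with probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.2
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(probs):
    print(f"P({k} quanta released) = {prob:.4f}")

# The probabilities over all possible outcomes k = 0..n must sum to 1.
print(f"Sum over k: {sum(probs):.6f}")
```

Because every outcome from 0 to 10 is covered, the probabilities must sum to 1. This is what distinguishes this probability distribution from the likelihood function in the next exercise.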


**Exercise 2:** Let's say you know that a given nerve terminal contains exactly 14 quanta available for release. You have read in the literature that the release probability of these quanta is low, say 0.1. To assess whether this value is reasonable, you run a simple experiment: activate the nerve and measure the number of quanta that are released. The result is 8 quanta. What is the probability that you would get this result (8 quanta) if the true probability of release really was 0.1? What about if the true release probability was much higher; say, 0.7? What about for each decile of release probability (0.1, 0.2, ... 1.0)? Which value of release probability did you determine to be the most probable, given your measurement?

Note: here you are computing a likelihood function: a function describing how the conditional probability p(data | parameters) changes when you hold the data fixed at the value(s) you measured and vary the value(s) of the parameter(s) of, in this case, the binomial distribution. Because you are varying the parameters and not the data, the values of the function are not expected to sum to one (e.g., several different parameter values can each have a high probability of producing the given data), and thus this function is not a probability distribution (see here for an extended discussion). The maximum value of this function is called the maximum likelihood, and the parameter value at which it occurs is the maximum-likelihood estimate.
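For a single measurement of $k$ released quanta out of $n$ available, the binomial likelihood can also be maximized analytically. Take the log, differentiate with respect to $p$, and set the derivative to zero:

$$
\begin{aligned}
\mathcal{L}(p \mid k, n) &= \binom{n}{k}\, p^{k} (1-p)^{n-k} \\
\log \mathcal{L}(p) &= \log\binom{n}{k} + k \log p + (n-k) \log(1-p) \\
\frac{d}{dp} \log \mathcal{L}(p) &= \frac{k}{p} - \frac{n-k}{1-p} = 0
\;\;\Longrightarrow\;\; \hat{p} = \frac{k}{n}
\end{aligned}
$$

For $0 < k < n$, the second derivative $-k/p^2 - (n-k)/(1-p)^2$ is negative on $0 < p < 1$, so this stationary point is a maximum. For the measurement in Exercise 2 ($k = 8$, $n = 14$), $\hat{p} = 8/14 \approx 0.571$. A search restricted to deciles can therefore only return a grid point near this value.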

In [20]:
#Probability of getting results if true probability of release was 0.1
#Parameters
n=14
k=8
p=0.1

likelihood = binom.pmf(k, n, p)
print(f'Probability of getting {k} quanta released of {n} with a release probability of {p}: {likelihood:.4e}')

Probability of getting 8 quanta released of 14 with a release probability of 0.1: 1.5959e-05


In [21]:
#Probability of getting results if true probability of release was 0.7
#Parameters
n=14
k=8
p=0.7

likelihood = binom.pmf(k, n, p)
print(f'Probability of getting {k} quanta released of {n} with a release probability of {p}: {likelihood:.4f}')

Probability of getting 8 quanta released of 14 with a release probability of 0.7: 0.1262


In [37]:
#Probability for each decile of release probability
#Parameters
n=14
k=8
p_values = np.linspace(0.1, 1.0, 10)  # 0.1, 0.2, ..., 1.0 (avoids float-step surprises of np.arange)

likelihoods = binom.pmf(k, n, p_values)
print(f'Probability of getting {k} quanta released of {n} for each decile of release probability:')
for p, likelihood in zip(p_values, likelihoods):
  print(f'p = {p:.1f}, likelihood = {likelihood:.4f}')

#Most probable release probability
most_probable_p = p_values[np.argmax(likelihoods)]
print(f'Most probable release probability given measurement = {most_probable_p:.1f}')

Probability of getting 8 quanta released of 14 for each decile of release probability:
p = 0.1, likelihood = 0.0000
p = 0.2, likelihood = 0.0020
p = 0.3, likelihood = 0.0232
p = 0.4, likelihood = 0.0918
p = 0.5, likelihood = 0.1833
p = 0.6, likelihood = 0.2066
p = 0.7, likelihood = 0.1262
p = 0.8, likelihood = 0.0322
p = 0.9, likelihood = 0.0013
p = 1.0, likelihood = 0.0000
Most probable release probability given measurement = 0.6
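The decile search above can be refined without changing the method. Evaluate the same likelihood on a finer grid and take the argmax. This stdlib-only sketch (the helper `likelihood` is a hypothetical name) uses a resolution of 0.001 and compares the result with the closed-form estimate $k/n$.

```python
import math

n, k = 14, 8  # quanta available, quanta released (Exercise 2)

def likelihood(p: float) -> float:
    """Binomial likelihood of observing k of n quanta released, as a function of p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Candidate release probabilities at 0.001 resolution (0.000 .. 1.000).
grid = [i / 1000 for i in range(1001)]
values = [likelihood(p) for p in grid]
best_p = grid[values.index(max(values))]

print(f"Grid maximum-likelihood estimate: {best_p:.3f}")
print(f"Closed-form estimate k/n:         {k / n:.3f}")
```

A finer grid only moves the estimate toward $k/n$. It does not reduce the uncertainty that comes from having a single measurement.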


**Exercise 3**: Not feeling convinced by your single experiment (good scientist!), you repeat it under identical conditions. This time you measure 5 quanta that were released. Your sample size has now doubled, to two measurements. You now want to take into account both measurements when you assess the likelihoods of different possible values of the underlying release probability. To do so, assume that the two measurements in this sample are independent of one another; that is, the value of each result had no bearing on the other. In this case, the total likelihood is simply the product of the likelihoods associated with each separate measurement. It is also typical to compute the logarithm of each likelihood and take their sum, which is often more convenient. What are the values of the total likelihood and total log-likelihood in this example, if we assume that the true release probability is 0.1?

Of course, knowing those values of the likelihood and log-likelihood is not particularly useful until you can compare them to the values computed for other possible values for the release probability, so you can determine which value of release probability is most likely, given the data. Therefore, compute the full likelihood and log-likelihood functions using deciles of release probability between 0 and 1. What is the maximum value? Can you improve your estimate by computing the functions at a higher resolution? How does the estimate improve as you increase the sample size?
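Summing logs is more convenient than multiplying likelihoods because each per-measurement likelihood is well below 1. With many measurements, the product underflows double-precision floats to exactly 0.0, while the sum of log-likelihoods stays finite and comparable across parameter values. This sketch illustrates the effect with 1000 hypothetical repeats of the 8-quanta measurement at $p = 0.1$. The repeat count is an arbitrary illustration, not data from this notebook.

```python
import math

n, k, p = 14, 8, 0.1
single = math.comb(n, k) * p**k * (1 - p) ** (n - k)  # likelihood of one measurement

num_measurements = 1000  # hypothetical number of identical, independent measurements

product = 1.0
log_sum = 0.0
for _ in range(num_measurements):
    product *= single            # total likelihood: product of per-measurement likelihoods
    log_sum += math.log(single)  # total log-likelihood: sum of per-measurement logs

print(f"Product of likelihoods: {product}")      # underflows to 0.0
print(f"Sum of log-likelihoods: {log_sum:.2f}")  # stays finite
```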

In [36]:
#Parameters
n = 14
p = 0.1

#Quanta Released Measurements
k1 = 8
k2 = 5

#Likelihood for each measurement
likelihood1 = binom.pmf(k1, n, p)
likelihood2 = binom.pmf(k2, n, p)

#Total likelihood
total_likelihood = likelihood1 * likelihood2

# Calculate the total log-likelihood
log_likelihood1 = np.log(likelihood1)
log_likelihood2 = np.log(likelihood2)

total_log_likelihood = log_likelihood1 + log_likelihood2

#Results
print(f'Likelihood for k = {k1}: {likelihood1:.4e}')
print(f'Likelihood for k = {k2}: {likelihood2:.4e}')
print(f'Total likelihood: {total_likelihood:.4e}')
print(f'Total log-likelihood: {total_log_likelihood:.4f}')


Likelihood for k = 8: 1.5959e-05
Likelihood for k = 5: 7.7562e-03
Total likelihood: 1.2378e-07
Total log-likelihood: -15.9047


In [42]:
#Parameters
n = 14
k1 = 8
k2 = 5
p_values = np.linspace(0, 1, 11)  # 0.0, 0.1, ..., 1.0

#Likelihood for each measurement
likelihood1 = binom.pmf(k1, n, p_values)
likelihood2 = binom.pmf(k2, n, p_values)
total_likelihood = likelihood1 * likelihood2

#Total log-likelihood (log(0) = -inf at p = 0 and p = 1; silence the divide-by-zero warning)
with np.errstate(divide='ignore'):
    log_likelihood1 = np.log(likelihood1)
    log_likelihood2 = np.log(likelihood2)
total_log_likelihood = log_likelihood1 + log_likelihood2

#Max Values
max_likelihood = np.max(total_likelihood)
max_log_likelihood = np.max(total_log_likelihood)

#Probabilities for Max Values
max_likelihood_prob = p_values[np.argmax(total_likelihood)]
max_log_likelihood_prob = p_values[np.argmax(total_log_likelihood)]

# Print the results
print('Probability, Likelihood, Log-Likelihood')
for p, likelihood, log_likelihood in zip(p_values, total_likelihood, total_log_likelihood):
    print(f'probability = {p:0.1f}, likelihood = {likelihood:.4e}, log-likelihood = {log_likelihood:.6f}')

print(f"Maximum Likelihood: {max_likelihood:.4f} at probability {max_likelihood_prob:.1f}")
print(f"Maximum Log-Likelihood: {max_log_likelihood:.4f} at probability {max_log_likelihood_prob:.1f}")


Probability, Likelihood, Log-Likelihood
probability = 0.0, likelihood = 0.0000e+00, log-likelihood = -inf
probability = 0.1, likelihood = 1.2378e-07, log-likelihood = -15.904745
probability = 0.2, likelihood = 1.7328e-04, log-likelihood = -8.660577
probability = 0.3, likelihood = 4.5506e-03, log-likelihood = -5.392502
probability = 0.4, likelihood = 1.8970e-02, log-likelihood = -3.964895
probability = 0.5, likelihood = 2.2396e-02, log-likelihood = -3.798852
probability = 0.6, likelihood = 8.4311e-03, log-likelihood = -4.775825
probability = 0.7, likelihood = 8.3582e-04, log-likelihood = -7.087097
probability = 0.8, likelihood = 1.0830e-05, log-likelihood = -11.433166
probability = 0.9, likelihood = 1.5282e-09, log-likelihood = -20.299194
probability = 1.0, likelihood = 0.0000e+00, log-likelihood = -inf
Maximum Likelihood: 0.0224 at probability 0.5
Maximum Log-Likelihood: -3.7989 at probability 0.5




In [45]:
#At Higher Resolution
n = 14
k1 = 8
k2 = 5
p_values = np.linspace(0, 1, 101)  # 0.00, 0.01, ..., 1.00 (p must stay within [0, 1])

#Likelihood for each measurement
likelihood1 = binom.pmf(k1, n, p_values)
likelihood2 = binom.pmf(k2, n, p_values)
total_likelihood = likelihood1 * likelihood2

#Total log-likelihood (log(0) = -inf at p = 0 and p = 1; silence the divide-by-zero warning)
with np.errstate(divide='ignore'):
    log_likelihood1 = np.log(likelihood1)
    log_likelihood2 = np.log(likelihood2)
total_log_likelihood = log_likelihood1 + log_likelihood2

#Max Values
max_likelihood = np.max(total_likelihood)
max_log_likelihood = np.max(total_log_likelihood)

#Probabilities for Max Values
max_likelihood_prob = p_values[np.argmax(total_likelihood)]
max_log_likelihood_prob = p_values[np.argmax(total_log_likelihood)]

# Print only the maxima (the full 101-row table is too long to read usefully)
print(f"Maximum Likelihood: {max_likelihood:.4f} at probability {max_likelihood_prob:.2f}")
print(f"Maximum Log-Likelihood: {max_log_likelihood:.4f} at probability {max_log_likelihood_prob:.2f}")

Maximum Likelihood: 0.0240 at probability 0.46
Maximum Log-Likelihood: -3.7284 at probability 0.46

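One way to see how the estimate improves as sample size increases is by simulation. Draw many synthetic samples from a known release probability, compute $\hat{p}$ for each, and measure how widely the estimates scatter. This sketch assumes a "true" $p = 0.5$ and 14 quanta per experiment. Both are arbitrary illustrative choices, and the numbers it prints are simulation results, not data from this notebook. It uses only the standard library's `random` and `statistics` modules.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is reproducible

n = 14             # quanta available per experiment
true_p = 0.5       # assumed true release probability (illustrative)
num_samples = 500  # simulated samples per sample size

def simulate_p_hat(num_experiments: int) -> float:
    """Simulate num_experiments release events and return the pooled MLE of p."""
    released = sum(
        sum(random.random() < true_p for _ in range(n))  # quanta released in one experiment
        for _ in range(num_experiments)
    )
    return released / (n * num_experiments)

spread = {}
for num_experiments in (1, 2, 10, 100):
    estimates = [simulate_p_hat(num_experiments) for _ in range(num_samples)]
    spread[num_experiments] = statistics.stdev(estimates)
    print(f"{num_experiments:>4} experiments: SD of p-hat = {spread[num_experiments]:.4f}")
```

The spread should track the binomial standard error $\sqrt{p(1-p)/(n\,m)}$ for $m$ experiments. Quadrupling the number of experiments roughly halves the uncertainty in $\hat{p}$.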


**Exercise 4:** You keep going and conduct 100 separate experiments and end up with these results:

What is the most likely value of p (which we typically refer to as $\hat{p}$, pronounced "p-hat"; it represents the maximum-likelihood estimate of a parameter in the population given our sample), using a resolution of 0.01?

In [57]:
#Parameters
num_experiments = 100
n = 14

#Results of Experiments
experiments = [
    {'trials':0, 'successes':0},
    {'trials':0, 'successes':1},
    {'trials':3, 'successes':2},
    {'trials':7, 'successes':3},
    {'trials':10, 'successes':4},
    {'trials':19, 'successes':5},
    {'trials':26, 'successes':6},
    {'trials':16, 'successes':7},
    {'trials':16, 'successes':8},
    {'trials':5, 'successes':9},
    {'trials':5, 'successes':10},
    {'trials':0, 'successes':11},
    {'trials':0, 'successes':12},
    {'trials':0, 'successes':13},
    {'trials':0, 'successes':14}
]

total_successes = sum(exp['successes'] for exp in experiments)
total_trials = sum(exp['trials'] for exp in experiments)


#Maximum-likelihood estimate of a parameter in the population
p_mle = total_successes / total_trials
p_mle_rounded = round(p_mle, 2)

print(f'Maximum Likelihood Estimate (MLE) for p: {p_mle:.2f}')


Maximum Likelihood Estimate (MLE) for p: 0.98
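The exercise asks for $\hat{p}$ at a resolution of 0.01. Assuming independent experiments, the pooled log-likelihood is the sum of $\log\binom{n}{k} + k\log p + (n-k)\log(1-p)$ over all experiments, weighted by how many experiments produced each $k$. This is a stdlib-only grid-search sketch over that sum. It excludes $p = 0$ and $p = 1$, where the log-likelihood is $-\infty$. It also prints the closed-form pooled estimate (total quanta released divided by total quanta available) for comparison. The counts are copied from the `experiments` list in the cell above.

```python
import math

n = 14  # quanta available per experiment
# Number of experiments in which k quanta were released, for k = 0..14.
counts = [0, 0, 3, 7, 10, 19, 26, 16, 16, 5, 5, 0, 0, 0, 0]

def pooled_log_likelihood(p: float) -> float:
    """Sum of binomial log-likelihoods over all experiments for release probability p."""
    return sum(
        c * (math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p))
        for k, c in enumerate(counts)
        if c > 0
    )

grid = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99
log_liks = [pooled_log_likelihood(p) for p in grid]
p_hat = grid[log_liks.index(max(log_liks))]

total_released = sum(k * c for k, c in enumerate(counts))
closed_form = total_released / (n * sum(counts))

print(f"Grid search p-hat (0.01 resolution): {p_hat:.2f}")
print(f"Closed-form p-hat:                   {closed_form:.4f}")
```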


**Exercise 5:** Let's say that you have run an exhaustive set of experiments on this synapse and have determined that the true release probability is 0.3 (within some very small tolerance). Now you want to test whether changing the temperature of the preparation affects the release probability. So you change the temperature, perform the experiment, and measure 7 quantal events for the same 14 available quanta. Compute p hat. Standard statistical inference now asks the question, what is the probability that you would have obtained that measurement given a Null Hypothesis of no effect? In this case, no effect corresponds to an unchanged value of the true release probability (i.e., its value remained at 0.3 even with the temperature change). What is the probability that you would have gotten that measurement if your Null Hypothesis were true? Can you conclude that temperature had an effect?

In [59]:
# Experiment Results
successes = 7
trials = 14

# Calculate the MLE for p
p_hat = successes / trials

# Round to the nearest resolution of 0.01
p_hat_rounded = round(p_hat, 2)

# Output the result
print(f"Maximum Likelihood Estimate for p: {p_hat_rounded:.2f}")

Maximum Likelihood Estimate for p: 0.50


In [60]:
# Define the parameters
n = 14  # Number of trials
k = 7   # Number of successes
p_null = 0.3  # Null hypothesis probability

# One-sided p-value: P(X >= k) under the null, i.e. 1 - P(X <= k - 1)
p_value = 1 - binom.cdf(k - 1, n, p_null)

# Output the p-value
print(f"P-value: {p_value:.4f}")

P-value: 0.0933


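Because the one-sided p-value (0.0933) is greater than the conventional 0.05 threshold, you cannot conclude that the temperature had an effect. Strictly speaking, the null hypothesis is not "accepted." We fail to reject it, and that is not evidence that the release probability is unchanged.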
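The p-value above is one-sided. It is $P(X \ge 7)$ under the null, so it only counts outcomes at or above the observed count, which is already above the expected $14 \times 0.3 = 4.2$ quanta. If the question is whether temperature changed the release probability in either direction, a two-sided test is more appropriate.

This minimal stdlib sketch computes both. The two-sided version sums the probabilities of all outcomes no more likely than the observed one. That convention is believed to match the default two-sided method of `scipy.stats.binomtest`, but other conventions (e.g., doubling the one-sided value) exist and give different numbers.

```python
import math

n, k, p_null = 14, 7, 0.3

def pmf(i: int) -> float:
    """Binomial probability of i quanta released under the null hypothesis."""
    return math.comb(n, i) * p_null**i * (1 - p_null) ** (n - i)

# One-sided: probability of k or more quanta released if p = p_null.
p_one_sided = sum(pmf(i) for i in range(k, n + 1))

# Two-sided: total probability of all outcomes no more likely than the observed one.
# A small relative tolerance guards against floating-point ties.
observed = pmf(k)
p_two_sided = sum(pmf(i) for i in range(n + 1) if pmf(i) <= observed * (1 + 1e-7))

print(f"One-sided p-value: {p_one_sided:.4f}")
print(f"Two-sided p-value: {p_two_sided:.4f}")
```

Under either convention the p-value exceeds 0.05 here, so the conclusion stands.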