# Statistics Exercises 01

By Yi-Lun (Alan) Chung $^\dagger$, Carter Vu$^*$ and Haoran Zhao$^*$

$^\dagger$ National Tsing Hua University, Taiwan

$^*$ University of Washington, Seattle

Winter 2021

## Intro
In this set of exercises, we will review three distributions: the Gaussian (normal), Poisson, and binomial distributions. Check them out below!

https://en.wikipedia.org/wiki/Normal_distribution

https://en.wikipedia.org/wiki/Poisson_distribution

https://en.wikipedia.org/wiki/Binomial_distribution

# Installing Packages

Install conda: https://docs.conda.io/projects/conda/en/latest/user-guide/install/



First, create an environment that contains only the packages for this tutorial, so that other installs on your system won’t interfere with it. Then activate the environment so that you can access those packages.

conda create --name EPE_Stats_Packages python

conda activate EPE_Stats_Packages


Now that you are in the appropriate environment, install the following packages:

- conda install -c conda-forge jupyterlab (this gives you access to the full JupyterLab integrated development environment, or IDE; if you just want the base functionality, you can use conda install -c conda-forge notebook instead)
- conda install -c anaconda scipy
- conda install -c conda-forge matplotlib
- conda install -c anaconda numpy (if you don’t have it already or by default)

You can use: 
- conda list to see the packages you’ve installed in this environment
- conda info --envs to see which environments are available/you’ve created
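
As a quick sanity check after installing, the short sketch below reports which of the packages listed above can be imported from the active environment. The helper name `check_packages` is hypothetical and only used for illustration; it relies on the standard library alone, so it runs even if an install failed.

```python
# Minimal sketch: report which tutorial packages are importable in this environment.
import importlib.util

# Package names taken from the install list above.
PACKAGES = ["numpy", "scipy", "matplotlib"]

def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

status = check_packages(PACKAGES)
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If a package is reported as missing, check that the EPE_Stats_Packages environment is activated before re-running the install command.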

Open the notebook:

jupyter-notebook PATH/TO/NOTEBOOK.ipynb or jupyter-notebook PATH/TO/DIRECTORY if you like jupyter's GUI.

Heads up: eventually, we want to move to reproducing the plots in the paper here (https://arxiv.org/pdf/1007.1727.pdf), where we’ll want to have some familiarity with numpy, matplotlib, and scipy. Hold on to your hats!


# Hint: Use these packages! Feel free to use others if you like, but these should contain all you need.
import random
import math
import matplotlib.pyplot as plt
import numpy as np

## Exercise 1: Plot the Gaussian, Poisson, and binomial distributions from their PDF/PMFs

Below are the PDF of the normal distribution and the PMFs of the Poisson and binomial distributions. Note that the PMF is the discrete counterpart of the continuous PDF. Create a function for each distribution, and plot them using matplotlib!

Probability density function (PDF) for the normal distribution:

## $$g(x|\mu,\sigma) = \frac{1}{\sqrt{2\pi\,\sigma^2}}\exp\left[-\frac{(x-\mu)^2}{2\,\sigma^2}\right], \, \,  x \in (-\infty, \infty)$$
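
One possible starting point for the normal PDF is sketched below. It is a direct transcription of the formula above using only the `math` module; the function name `gaussian_pdf`, the default parameters, and the grid from -5 to 5 are illustrative choices, not part of the exercise statement.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Normal PDF g(x | mu, sigma), transcribed from the formula above."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma**2)
    return norm * math.exp(-((x - mu) ** 2) / (2.0 * sigma**2))

# Evaluate on an evenly spaced grid from -5 to 5 (step 0.1, an arbitrary choice).
step = 0.1
xs = [-5.0 + step * i for i in range(101)]
ys = [gaussian_pdf(x) for x in xs]

# Riemann-sum check: the area under the curve should be close to 1.
area = sum(ys) * step
print(f"approximate area: {area:.6f}")
```

The lists `xs` and `ys` can be passed straight to `plt.plot(xs, ys)` to draw the curve.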


Probability mass function (PMF) for the Poisson distribution:

## $$p(x|\mu) = \exp[-\mu]\frac{\mu^x}{x!}, \, \,  x \in \{0, 1, 2, \dots\}$$
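
A minimal sketch of the Poisson PMF follows, again transcribing the formula directly. The name `poisson_pmf`, the example mean of 4, and the cutoff at x = 20 are assumptions made for illustration.

```python
import math

def poisson_pmf(x, mu):
    """Poisson PMF: exp(-mu) * mu**x / x! for integer x >= 0, else 0."""
    if x < 0:
        return 0.0
    return math.exp(-mu) * mu**x / math.factorial(x)

mu = 4.0                       # example mean (arbitrary choice)
ks = list(range(0, 21))        # truncate at x = 20; the tail beyond is negligible for mu = 4
probs = [poisson_pmf(k, mu) for k in ks]

# Checks: probabilities should sum to ~1 and the mean should be ~mu.
total = sum(probs)
mean = sum(k * p for k, p in zip(ks, probs))
print(f"sum = {total:.6f}, mean = {mean:.6f}")
```

Because x is discrete, `plt.bar(ks, probs)` is probably a more honest picture than a connected line.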


Probability mass function (PMF) for the binomial distribution (here the special case with success probability $p = 1/2$):

## $$p(x| N) = \frac{N!}{x!(N-x)!}\frac{1}{2^N}, \, \,  x \in \{0, 1, \dots, N\}$$
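
The binomial PMF above can be sketched in the same way. This version assumes Python 3.8 or newer for `math.comb`; the name `binomial_pmf` and the example value N = 10 are illustrative only.

```python
import math

def binomial_pmf(x, n):
    """Binomial PMF for success probability 1/2: C(n, x) / 2**n for 0 <= x <= n."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) / 2**n

n = 10                               # example number of trials (arbitrary choice)
xs = list(range(n + 1))
probs = [binomial_pmf(x, n) for x in xs]

# Check: the probabilities over x = 0..n should sum to 1.
total = sum(probs)
print(f"sum = {total:.6f}, most likely x = {xs[probs.index(max(probs))]}")
```

As with the Poisson case, `plt.bar(xs, probs)` suits the discrete support.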

 

## Exercise 2: Plot the PMFs as binned histograms instead of continuous distributions
Note: it can help if you think about the binned histogram value as the expectation value for that bin! You can use numpy.histogram.
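
To illustrate the "expectation value per bin" idea for the Gaussian case, the sketch below computes the expected number of events in each bin as the total number of events times the probability of landing in that bin. The bin probability is taken from the normal CDF, written with `math.erf`. The helper names, the bin edges, and the 10,000-event total are assumptions for illustration.

```python
import math

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """Normal cumulative distribution function, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def expected_counts(edges, n_events, mu=0.0, sigma=1.0):
    """Expected events per bin: n_events * P(lo <= x < hi) for each pair of edges."""
    return [
        n_events * (gaussian_cdf(hi, mu, sigma) - gaussian_cdf(lo, mu, sigma))
        for lo, hi in zip(edges[:-1], edges[1:])
    ]

# 16 bins of width 0.5 covering -4 to 4 (arbitrary binning).
edges = [-4.0 + 0.5 * i for i in range(17)]
counts = expected_counts(edges, n_events=10_000)
print(f"bins: {len(counts)}, expected total inside range: {sum(counts):.2f}")
```

One way to draw the result is `plt.bar(edges[:-1], counts, width=0.5, align="edge")`. For the discrete distributions, the expected count in a bin is simply the total number of events times the sum of the PMF over the integers in that bin.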

## Exercise 3: Randomly sample the PMFs

Generate 10,000 values such that the histogram of the values follows a Gaussian/Poisson/binomial distribution. Randomly picking one such value according to a distribution is known as "sampling" that distribution. This is effectively what Monte Carlo simulations are doing behind the scenes, except they sample a *lot* of complex distributions to approximate the overall behavior of some variable. Try using random.uniform(0,1) as one of the inputs when sampling your PMF, or alternatively, try the hit-or-miss method on page 17 of these slides! https://hep1.phys.ntu.edu.tw/~kfjack/lecture/hepstat/in2/inter-2.pdf
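
The sketch below shows one reading of the two hints, and is not necessarily the approach intended by the slides. `sample_hit_or_miss` is a basic accept-reject sampler for a bounded PDF, and `sample_discrete` uses a uniform number in [0, 1) to pick a value from a tabulated PMF via its cumulative sum (inverse-transform sampling, which is our interpretation of the random.uniform hint). All function names, the fixed seed, the range [-5, 5], and the Poisson cutoff at x = 30 are assumptions.

```python
import bisect
import math
import random

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma**2)) / math.sqrt(2.0 * math.pi * sigma**2)

def poisson_pmf(x, mu):
    return math.exp(-mu) * mu**x / math.factorial(x)

def sample_hit_or_miss(pdf, x_min, x_max, pdf_max, n, rng):
    """Accept-reject: draw (x, y) uniformly in a box and keep x when y falls under the curve."""
    samples = []
    while len(samples) < n:
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(0.0, pdf_max)
        if y < pdf(x):
            samples.append(x)
    return samples

def sample_discrete(pmf_values, n, rng):
    """Inverse transform: return the first index whose cumulative probability exceeds u."""
    cumulative, total = [], 0.0
    for p in pmf_values:
        total += p
        cumulative.append(total)
    samples = []
    for _ in range(n):
        u = rng.uniform(0.0, 1.0) * total  # rescale so a truncated PMF still works
        k = bisect.bisect_right(cumulative, u)
        samples.append(min(k, len(cumulative) - 1))
    return samples

rng = random.Random(42)  # fixed seed so the example is reproducible
gauss_samples = sample_hit_or_miss(gaussian_pdf, -5.0, 5.0, gaussian_pdf(0.0), 10_000, rng)
pois_samples = sample_discrete([poisson_pmf(k, 4.0) for k in range(31)], 10_000, rng)

print(f"Gaussian sample mean: {sum(gauss_samples) / len(gauss_samples):.3f}")
print(f"Poisson sample mean:  {sum(pois_samples) / len(pois_samples):.3f}")
```

Hit-or-miss is simple but wasteful: with this box, only about a quarter of the trial points are accepted, since the area under the curve is about 1 and the box area is about 4.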

## Exercise 4: Plot the same plots as in exercises 2 and 3, but normalize all histogram areas to one
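
Normalizing to unit area means dividing each bin count by the total count times the bin width. The sketch below does this by hand and compares it with the `density=True` option of `numpy.histogram`. The values drawn with `rng.normal` are only a stand-in for your own samples from Exercise 3, and the binning is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data: replace with the samples you generated in Exercise 3.
values = rng.normal(loc=0.0, scale=1.0, size=10_000)

counts, edges = np.histogram(values, bins=40, range=(-5.0, 5.0))
widths = np.diff(edges)

# Manual normalization: density = count / (total count in range * bin width).
density_manual = counts / (counts.sum() * widths)

# numpy applies the same normalization when density=True.
density_numpy, _ = np.histogram(values, bins=40, range=(-5.0, 5.0), density=True)

area = float(np.sum(density_manual * widths))
print(f"area under normalized histogram: {area:.6f}")
```

Note that both versions normalize over the entries that fall inside the histogram range, so any values outside it are ignored.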

## Exercise 5: Compare the plots from exercises 2, 3, and 4. How does the relationship change if you run it again with 100 events instead of 10,000? What about 50,000 events?
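
One way to make the comparison quantitative is sketched below. It samples a standard normal with each event count, normalizes the histogram, and records the largest deviation from the bin-averaged true density. Since the statistical fluctuation of a bin count scales roughly as the square root of the count, the relative scatter should shrink roughly like $1/\sqrt{N}$. The use of `rng.normal`, the seed, and the binning are illustrative assumptions.

```python
import math
import numpy as np

def expected_density(edges):
    """Bin-averaged standard normal density: (CDF(hi) - CDF(lo)) / bin width."""
    cdf = [0.5 * (1.0 + math.erf(e / math.sqrt(2.0))) for e in edges]
    return np.diff(cdf) / np.diff(edges)

rng = np.random.default_rng(1)          # fixed seed for reproducibility
edges = np.linspace(-4.0, 4.0, 21)      # 20 bins of width 0.4
truth = expected_density(edges)

max_dev = {}
for n in (100, 10_000, 50_000):
    values = rng.normal(size=n)
    density, _ = np.histogram(values, bins=edges, density=True)
    # Largest absolute difference between the sampled and expected densities.
    max_dev[n] = float(np.max(np.abs(density - truth)))
    print(f"N = {n:>6}: max deviation = {max_dev[n]:.4f}")
```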