## **Statistical Thinking in Python (Part 1)**

**Course Description**

After all of the hard work of acquiring data and getting them into a form you can work with, you ultimately want to make clear, succinct conclusions from them. This crucial last step of a data analysis pipeline hinges on the principles of statistical inference. In this course, you will start building the foundation you need to think statistically, to speak the language of your data, to understand what they are telling you. The foundations of statistical thinking took decades upon decades to build, but they can be grasped much faster today with the help of computers. With the power of Python-based tools, you will rapidly get up to speed and begin thinking statistically by the end of this course.

**Imports**

```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from pprint import pprint as pp
import csv
from pathlib import Path
```

**Pandas Configuration Options**

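```python
# Use fully qualified option names: in recent pandas, the short names
# 'max_columns' and 'max_rows' match more than one option and raise an error
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_rows', 300)
pd.set_option('display.expand_frame_repr', True)
```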

**Data Files Location**

* Most data files for the exercises can be found on the [course site](https://www.datacamp.com/courses/statistical-thinking-in-python-part-1)
    * [2008 election results (all states)](https://assets.datacamp.com/production/repositories/469/datasets/8fb59b9a99957c3b9b1c82b623aea54d8ccbcd9f/2008_all_states.csv)
    * [2008 election results (swing states)](https://assets.datacamp.com/production/repositories/469/datasets/e079fddb581197780e1a7b7af2aeeff7242535f0/2008_swing_states.csv)
    * [Belmont Stakes](https://assets.datacamp.com/production/repositories/469/datasets/7507bfed990379f246b4f166ea8a57ecf31c6c9d/belmont.csv)
    * [Speed of light](https://assets.datacamp.com/production/repositories/469/datasets/df23780d215774ff90be0ea93e53f4fb5ebbade8/michelson_speed_of_light.csv)

**Data File Objects**

```python
data = Path.cwd() / 'data' / 'statistical_thinking_1'
elections_all_file = data / '2008_all_states.csv'
elections_swing_file = data / '2008_swing_states.csv'
belmont_file = data / 'belmont.csv'
sol_file = data / 'michelson_speed_of_light.csv'
```

# Graphical exploratory data analysis

Look before you leap! A very important proverb, indeed. Before diving headlong into sophisticated statistical inference techniques, you should first explore your data by plotting them and computing simple summary statistics. This process, called exploratory data analysis (EDA), is a crucial first step in the statistical analysis of data, so it is a fitting subject for the first chapter of Statistical Thinking in Python.

## Introduction to exploratory data analysis

### Tukey's comments on EDA

### Advantages of graphical EDA

## Plotting a histogram

### Plotting a histogram of iris data

### Axis labels!

### Adjusting the number of bins in a histogram
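
A minimal sketch of a labeled histogram, assuming synthetic normally distributed data as a stand-in for a real measurement such as iris petal lengths (the values, units, and seed are illustrative assumptions). It uses the square-root rule of thumb, which sets the number of bins to roughly the square root of the number of data points; `plt.hist` otherwise defaults to 10 bins.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt

# Synthetic stand-in for a measured quantity (e.g., petal lengths); not real data
rng = np.random.default_rng(42)
data = rng.normal(loc=4.3, scale=0.5, size=50)

# Square-root rule: number of bins ~ sqrt(number of data points)
n_bins = int(np.sqrt(len(data)))

# plt.hist returns the bin counts, the bin edges, and the drawn patches
counts, bin_edges, _ = plt.hist(data, bins=n_bins)

# Always label your axes
plt.xlabel('petal length (cm)')
plt.ylabel('count')
```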

## Plotting all of your data: Bee swarm plots

### Bee swarm plot

### Interpreting a bee swarm plot

## Plotting all of your data: Empirical cumulative distribution functions

### Computing the ECDF
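
The ECDF's x-values are the sorted data, and its y-values are evenly spaced from 1/n to 1, so each point gives the fraction of measurements less than or equal to that x. A minimal sketch of a helper (the name `ecdf` is our own choice):

```python
import numpy as np

def ecdf(data):
    """Return x, y values for the empirical CDF of a 1-D array."""
    n = len(data)
    # x-data: the measurements, sorted
    x = np.sort(data)
    # y-data: fraction of points <= each x, evenly spaced from 1/n to 1
    y = np.arange(1, n + 1) / n
    return x, y

x, y = ecdf(np.array([3.0, 1.0, 2.0, 5.0]))
```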

### Plotting the ECDF
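
To plot an ECDF, draw the points with markers only, with no connecting line. A sketch with synthetic data (the distribution and seed are illustrative assumptions); the `Agg` backend line is only there so it runs outside a notebook.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend; omit in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=100)  # synthetic placeholder data

# ECDF coordinates: sorted data vs. evenly spaced fractions
x = np.sort(data)
y = np.arange(1, len(x) + 1) / len(x)

# Plot as points only: an ECDF is a set of dots, not a connected line
lines = plt.plot(x, y, marker='.', linestyle='none')
plt.xlabel('measurement')
plt.ylabel('ECDF')
plt.margins(0.02)  # keep the extreme points from touching the axes
```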

### Comparison of ECDFs

## Onward toward the whole story

# Quantitative exploratory data analysis

In the last chapter, you learned how to graphically explore data. In this chapter, you will compute useful summary statistics, which serve to concisely describe salient features of a data set with a few numbers.

## Introduction to summary statistics: The sample mean and median

### Means and medians

### Computing means
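
The mean is sensitive to extreme values, while the median (the middle value of the sorted data) is not. A small illustration with made-up numbers:

```python
import numpy as np

# Hypothetical measurements with one extreme value
values = np.array([2.0, 3.0, 3.0, 4.0, 100.0])

mean = np.mean(values)      # (2 + 3 + 3 + 4 + 100) / 5
median = np.median(values)  # middle value of the sorted array
```

A single outlier drags the mean to 22.4, while the median stays at 3.0.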

## Percentiles, outliers and box plots

### Computing percentiles
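
`np.percentile` takes an array of percentiles and returns the corresponding data values; the 50th percentile is the median. A sketch on the integers 1 to 100 as an illustrative sample, using a set of percentiles that summarizes both the center and the spread:

```python
import numpy as np

data = np.arange(1, 101)  # the integers 1..100, a simple illustrative sample

# Percentiles of interest, including the median (50)
percentiles = np.array([2.5, 25, 50, 75, 97.5])
ptiles = np.percentile(data, percentiles)
```

By default `np.percentile` interpolates linearly between data points, so the 50th percentile of 1..100 is 50.5.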

### Comparing percentiles to ECDF

### Box-and-whisker plot
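
A box plot draws a box from the 25th to the 75th percentile (the interquartile range, IQR) with a line at the median. Under a common convention, which is matplotlib's default (`whis=1.5`), whiskers reach at most 1.5 IQR beyond the box, and points outside are drawn individually as outliers. A sketch of that rule on hypothetical data; it computes only the fences, not the exact whisker ends.

```python
import numpy as np

data = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 20.0])  # hypothetical

# Box edges: 25th and 75th percentiles
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Points more than 1.5 IQR beyond the box are flagged as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]
```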

## Variance and standard deviation

### Computing the variance

### The standard deviation and the variance
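
The variance is the mean of the squared distances of the data from their mean, and the standard deviation is its square root, which has the same units as the data. A sketch checking the explicit computation against `np.var` and `np.std` (both divide by n by default) on illustrative values:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # illustrative values

# Variance: the mean squared distance from the mean
differences = data - np.mean(data)
variance_explicit = np.mean(differences ** 2)
variance_np = np.var(data)

# Standard deviation: square root of the variance, in the data's own units
std_explicit = np.sqrt(variance_explicit)
std_np = np.std(data)
```

Here the mean is 5 and the squared deviations sum to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2.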

## Covariance and Pearson correlation coefficient

### Scatter plots

### Variance and covariance by looking

### Computing the covariance
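
The covariance averages the products of the two variables' deviations from their respective means; it is positive when x and y tend to sit above (or below) their means together. `np.cov` returns the 2x2 covariance matrix. Note that it divides by n - 1 by default, unlike `np.var`, which divides by n. A sketch with hypothetical paired data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical paired measurements
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Diagonal entries are the variances of x and y; off-diagonal entries are cov(x, y)
covariance_matrix = np.cov(x, y)
cov_xy = covariance_matrix[0, 1]
```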

### Computing the Pearson correlation coefficient
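
The Pearson correlation coefficient divides the covariance by the product of the two standard deviations, giving a dimensionless number between -1 and 1. A minimal helper (the name `pearson_r` is our own) that reads it off `np.corrcoef`:

```python
import numpy as np

def pearson_r(x, y):
    """Compute the Pearson correlation coefficient between two arrays."""
    # np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is r(x, y)
    corr_mat = np.corrcoef(x, y)
    return corr_mat[0, 1]

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical paired data
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
r = pearson_r(x, y)
```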

# Thinking probabilistically: Discrete variables

Statistical inference rests upon probability. Because we can very rarely say anything meaningful with absolute certainty from data, we use probabilistic language to make quantitative statements about data. In this chapter, you will learn how to think probabilistically about discrete quantities, those that can only take certain values, like integers. It is an important first step in building the probabilistic language necessary to think statistically.

## Probabilistic logic and statistical inference

### What is the goal of statistical inference?

### Why do we use the language of probability?

## Random number generators and hacker statistics

### Generating random numbers using the np.random module

### The np.random module and Bernoulli trials
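
A Bernoulli trial is a single experiment with two outcomes: success with probability p, failure otherwise. Since `np.random.random` draws uniformly from [0, 1), a draw below p counts as a success. A vectorized sketch (the function name is hypothetical) that counts the successes in n trials:

```python
import numpy as np

def perform_bernoulli_trials(n, p):
    """Perform n Bernoulli trials with success probability p
    and return the number of successes."""
    # Each uniform draw on [0, 1) is a success when it falls below p
    random_numbers = np.random.random(size=n)
    return int(np.sum(random_numbers < p))

np.random.seed(42)  # seed so the result is reproducible
n_success = perform_bernoulli_trials(100, 0.05)
```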

### How many defaults might we expect?

### Will the bank fail?
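
Repeating the Bernoulli experiment many times gives a distribution for the number of defaults. A sketch under assumed numbers: 100 loans, each defaulting independently with probability 0.05, and the bank losing money if 10 or more default. These parameters are illustrative, not taken from data.

```python
import numpy as np

np.random.seed(42)

# Assumed scenario (illustrative): 100 loans, each defaulting with probability 0.05
n_loans = 100
p_default = 0.05
n_simulations = 1000

# Each simulation counts the defaults among n_loans Bernoulli trials
n_defaults = np.array([
    np.sum(np.random.random(size=n_loans) < p_default)
    for _ in range(n_simulations)
])

# Assumed failure threshold: the bank loses money with 10 or more defaults
p_lose_money = np.sum(n_defaults >= 10) / n_simulations
```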

## Probability distributions and stories: The Binomial distribution

### Sampling out of the Binomial distribution

### Plotting the Binomial PMF
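
The number of successes in n Bernoulli trials with success probability p is Binomially distributed, so `np.random.binomial(n, p, size)` can replace the explicit simulation. For integer-valued data, offsetting the bin edges by 0.5 centers each bin on an integer, and normalizing the histogram gives an empirical PMF. A sketch with assumed parameters n=100, p=0.05; the same `bins` and `density=True` arguments work with `plt.hist` to draw it.

```python
import numpy as np

np.random.seed(42)

# Draw 10,000 samples from a Binomial distribution (n=100 trials, p=0.05)
n_defaults = np.random.binomial(n=100, p=0.05, size=10000)

# Bin edges at -0.5, 0.5, 1.5, ... so each bin is centered on an integer
bins = np.arange(0, n_defaults.max() + 1.5) - 0.5

# density=True normalizes the bar heights into an empirical PMF
pmf, edges = np.histogram(n_defaults, bins=bins, density=True)
```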

## Poisson processes and the Poisson distribution

### Relationship between Binomial and Poisson distribution
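
The Poisson distribution is the limit of the Binomial distribution for many trials with a low probability of success, holding the mean n*p fixed. A sketch comparing sample means and standard deviations of Binomial draws with n*p = 10 against a Poisson with mean 10 (the specific n, p pairs are illustrative choices):

```python
import numpy as np

np.random.seed(42)

# Poisson samples with mean 10
samples_poisson = np.random.poisson(10, size=10000)

# Binomial samples with n * p = 10, moving toward many trials and small p
n_values = [20, 100, 1000]
p_values = [0.5, 0.1, 0.01]
results = {}
for n, p in zip(n_values, p_values):
    samples_binomial = np.random.binomial(n, p, size=10000)
    results[(n, p)] = (np.mean(samples_binomial), np.std(samples_binomial))
```

The Binomial standard deviation is sqrt(n*p*(1 - p)), so as p shrinks it approaches sqrt(10), the Poisson standard deviation.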

### How many no-hitters in a season?

### Was 2015 anomalous?
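
If no-hitters occur as a Poisson process, the number per season is Poisson distributed, and simulating many seasons estimates how unusual a given count is. A sketch assuming the figures used in the course exercise (251 no-hitters over 115 seasons, and 7 in the 2015 season); verify them against the source data before relying on the result.

```python
import numpy as np

np.random.seed(398)

# Assumed historical rate (course exercise figures): 251 no-hitters over 115 seasons
mean_no_hitters = 251 / 115

# Simulate many seasons under a Poisson model
n_nohitters = np.random.poisson(mean_no_hitters, size=10000)

# Fraction of simulated seasons with 7 or more no-hitters (assumed 2015 count)
p_large = np.sum(n_nohitters >= 7) / 10000
```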

# Thinking probabilistically: Continuous variables

In the last chapter, you learned about probability distributions of discrete variables. Now it is time to move on to continuous variables, which can take on any value within a range, including fractional values. Many of the principles are the same, but there are some subtleties. By the end of this final chapter of the course, you will be speaking the probabilistic language you need to launch into the inference techniques covered in the sequel to this course.

## Probability density functions

### Interpreting PDFs

### Interpreting CDFs

## Introduction to the Normal distribution

### The Normal PDF

### The Normal CDF
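
The Normal CDF has a closed form in terms of the error function, CDF(x) = (1 + erf((x - mu) / (sigma * sqrt(2)))) / 2. A sketch comparing it with the empirical CDF of simulated samples at mu - sigma, mu, and mu + sigma (the parameters are illustrative assumptions):

```python
import math
import numpy as np

np.random.seed(42)

mu, sigma = 20.0, 1.0  # illustrative parameters
samples = np.random.normal(mu, sigma, size=100000)

def normal_cdf(x, mu, sigma):
    """Theoretical Normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def empirical_cdf(samples, x):
    """Fraction of samples less than or equal to x."""
    return np.mean(samples <= x)

for x in [mu - sigma, mu, mu + sigma]:
    print(x, empirical_cdf(samples, x), normal_cdf(x, mu, sigma))
```

The theoretical values are about 0.159, 0.500, and 0.841; the empirical fractions should land close to them.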

## The Normal distribution: Properties and warnings

### Gauss and the 10 Deutschmark banknote

### Are the Belmont Stakes results Normally distributed?

### What are the chances of a horse matching or beating Secretariat's record?
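
Once a Normal model is parametrized by the sample mean and standard deviation, the chance of a time at least as fast as Secretariat's record of 144 seconds (2:24.00 in 1973) can be estimated by sampling. A sketch in which the winning times are hypothetical placeholders and the helper name is our own; with the real data, compute `mu` and `sigma` from the Belmont winning times instead.

```python
import numpy as np

def prob_at_most(threshold, mu, sigma, size=1_000_000, seed=42):
    """Estimate P(X <= threshold) for X ~ Normal(mu, sigma) by sampling."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mu, sigma, size=size)
    return np.sum(samples <= threshold) / size

# Hypothetical stand-in for winning times in seconds; replace with the real data
times = np.array([148.5, 149.0, 150.2, 149.8, 147.9, 151.0, 148.7, 150.5])
mu, sigma = np.mean(times), np.std(times)

# Secretariat's record: 2:24.00, i.e. 144 seconds
prob = prob_at_most(144.0, mu, sigma)
```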

## The Exponential distribution

### Matching a story and a distribution

### Waiting for the next Secretariat

### If you have a story, you can simulate it!
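
The waiting time until the next event of a Poisson process is Exponentially distributed, so the time to observe one event and then an event of a second process is the sum of two Exponential draws. A sketch (the function name is our own) using assumed mean waiting times of 764 games for a no-hitter and 715 games for hitting the cycle, as given in the course exercise:

```python
import numpy as np

def successive_poisson(tau1, tau2, size=1):
    """Total waiting time for an event of a Poisson process with mean
    waiting time tau1, followed by one with mean waiting time tau2."""
    # Each waiting time is Exponentially distributed with the given mean (scale)
    t1 = np.random.exponential(tau1, size)
    t2 = np.random.exponential(tau2, size)
    return t1 + t2

np.random.seed(42)
# Assumed mean waiting times in games: 764 (no-hitter) and 715 (cycle)
waiting_times = successive_poisson(764, 715, size=100000)
```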

### Distribution of no-hitters and cycles

## Final thoughts and encouragement toward Statistical Thinking II