## Summary Statistics Using Pandas

In [1]:
import numpy as np
import pandas as pd

Let's generate 30 random numbers drawn from a normal distribution with mean 5 and standard deviation 15:

In [2]:
random_seed = 175
np.random.seed(random_seed)

data = np.random.normal(5, 15, 30)
data_df = pd.DataFrame({"data":data})

Pandas' `describe()` method prints some summary statistics, but it leaves out some of my favorites.
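For reference, here is a minimal sketch of what `describe()` reports for a numeric column. It rebuilds the same `data_df` so that it runs on its own; the variable name `described` is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data with the same seed used in this notebook
np.random.seed(175)
data_df = pd.DataFrame({"data": np.random.normal(5, 15, 30)})

# describe() returns a DataFrame whose index lists the statistics it computes
described = data_df.describe()
print(described.index.tolist())
```

The index holds the count, mean, standard deviation, minimum, the three quartiles, and the maximum. Range and IQR are not reported directly.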

So we'll write a few lines of code to generate the following stats:
* Minimum Value
* Maximum Value
* Range
* Mean
* Median
* Standard Deviation
* Interquartile Range (IQR)

Pandas doesn't have built-in aggregate functions for IQR and range, so we write our own.

In [23]:
# Interquartile range: 75th percentile minus 25th percentile
def IQR(column):
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25

# Range: maximum value minus minimum value
def range_f(column):
    return column.max() - column.min()
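As a quick sanity check, the two helpers can be tried on a small series whose answers are easy to work out by hand. This is only a sketch: the `toy` series and the comparison against `np.percentile` are additions for illustration, and the helpers are repeated so the snippet is self-contained.

```python
import numpy as np
import pandas as pd

def IQR(column):
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25

def range_f(column):
    return column.max() - column.min()

# Hypothetical toy data: quartiles are 2 and 4, extremes are 1 and 5
toy = pd.Series([1, 2, 3, 4, 5])

iqr_toy = IQR(toy)          # 4 - 2
range_toy = range_f(toy)    # 5 - 1

# Cross-check with NumPy, which also uses linear interpolation by default
numpy_iqr = np.percentile(toy, 75) - np.percentile(toy, 25)

print(iqr_toy, range_toy, numpy_iqr)
```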

In [24]:
stats_list = [
    'min', 'max', range_f, 'mean', 'median', 
    'std', # standard deviation 
    IQR   
]

In [27]:
summary_stats = data_df.agg(stats_list).round(2)

# Labels must be listed in the same order as the entries in stats_list
pretty_names = ['Minimum', 'Maximum', 'Range', 'Mean', 'Median', 'Standard Deviation', 'IQR']
summary_stats = pd.DataFrame(summary_stats.values, index=pretty_names, columns=['Value'])
summary_stats

| | Value |
|---|---|
| Minimum | -23.72 |
| Maximum | 37.45 |
| Range | 61.17 |
| Mean | 5.92 |
| Median | 4.62 |
| Standard Deviation | 15.47 |
| IQR | 17.23 |
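The relabeling step above depends on `pretty_names` and `stats_list` staying in the same order. One alternative, sketched here as an assumption rather than as the approach used in this notebook, is to keep each label next to its computation in a single dictionary. The names `series`, `labeled_stats`, and `summary` are hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data with the same seed used in this notebook
np.random.seed(175)
series = pd.Series(np.random.normal(5, 15, 30), name="data")

# Each label sits next to the value it describes, so the order cannot drift
labeled_stats = {
    "Minimum": series.min(),
    "Maximum": series.max(),
    "Range": series.max() - series.min(),
    "Mean": series.mean(),
    "Median": series.median(),
    "Standard Deviation": series.std(),  # sample standard deviation (ddof=1)
    "IQR": series.quantile(0.75) - series.quantile(0.25),
}

# A dict keeps insertion order, so the rows appear in the order written above
summary = pd.DataFrame({"Value": pd.Series(labeled_stats)}).round(2)
print(summary)
```

This trades the flexibility of `agg` (which applies to every column of a DataFrame at once) for a layout where adding or removing a statistic is a one-line change.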
