## Custom Summary Statistics Using Pandas

```python
import numpy as np
import pandas as pd
```

Let's generate 30 random numbers drawn from a normal distribution with mean 5 and standard deviation 15:

```python
# 30 samples from a normal distribution (mean=5, standard deviation=15)
data = np.random.normal(5, 15, 30)
data_df = pd.DataFrame({"data": data})
```
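`np.random.normal` draws fresh numbers on every run, so your values, and every table below, will differ from the ones shown here. If you want reproducible numbers, here is a minimal sketch using NumPy's `Generator` API. The seed value `42` is an arbitrary assumption and was not used in the original notebook, so a seeded run will print different stats from the tables in this post.

```python
import numpy as np
import pandas as pd

# Seeded generator: rerunning this cell yields the same 30 numbers.
# The seed 42 is arbitrary (an assumption, not from the original notebook).
rng = np.random.default_rng(seed=42)

# 30 draws from a normal distribution with mean 5 and standard deviation 15
data = rng.normal(loc=5, scale=15, size=30)
data_df = pd.DataFrame({"data": data})

print(data_df.shape)  # (30, 1)
```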

Pandas' `describe()` method returns a table of common summary stats:

```python
data_df.describe()
```

| Statistic | data |
|---|---|
| count | 30.0 |
| mean | 2.455036 |
| std | 17.356238 |
| min | -28.232676 |
| 25% | -11.093863 |
| 50% | 2.903202 |
| 75% | 16.05909 |
| max | 34.131654 |


But `describe()` doesn't report some of my favorite stats, such as the range and the interquartile range.

We'll write a few lines of code to compute the following stats:
* Minimum Value
* Maximum Value
* Range
* Mean
* Median
* Standard Deviation
* Interquartile Range (IQR)
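For reference, the two stats pandas lacks are simple differences. Writing $Q_1$ and $Q_3$ for the 25th and 75th percentiles:

$$
\text{Range} = x_{\max} - x_{\min}, \qquad \text{IQR} = Q_3 - Q_1
$$

Plugging in the `describe()` output above as a worked check:

$$
\begin{aligned}
\text{Range} &= 34.131654 - (-28.232676) = 62.364330 \\
\text{IQR} &= 16.05909 - (-11.093863) = 27.152953
\end{aligned}
$$

These round to the 62.36 and 27.15 reported in the final summary table.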

Pandas doesn't have built-in aggregation functions for the range or the IQR, so we write our own:

```python
# Interquartile range: 75th percentile minus 25th percentile
def IQR(column):
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25

# Range: largest value minus smallest value
def range_f(column):
    return column.max() - column.min()
```
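Before plugging the helpers into `agg()`, it can be worth sanity-checking them on a tiny Series whose answers are easy to work out by hand. The sketch below redefines `IQR` and `range_f` so that it runs on its own. `Series.quantile` uses linear interpolation by default. The `iqr_midpoint` helper is a hypothetical variant, not part of the original post, that shows how the `interpolation` argument changes the IQR on a small sample.

```python
import pandas as pd

def IQR(column):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25

def range_f(column):
    """Range: largest value minus smallest value."""
    return column.max() - column.min()

# Hypothetical variant: same IQR idea, but with "midpoint" interpolation
def iqr_midpoint(column):
    q25, q75 = column.quantile([0.25, 0.75], interpolation="midpoint")
    return q75 - q25

s = pd.Series([1, 2, 3, 4])

print(range_f(s))       # 3
print(IQR(s))           # 1.5  (linear: 3.25 - 1.75)
print(iqr_midpoint(s))  # 2.0  (midpoint: 3.5 - 1.5)
```

On 30 samples the choice of interpolation matters less, but it is one reason an IQR computed in pandas can differ slightly from one computed in other tools.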

```python
# Strings name pandas' built-in aggregations; range_f and IQR are our own functions
stats_list = [
    'min', 'max', range_f, 'mean', 'median',
    'std',  # standard deviation
    IQR,
]
```

Let's calculate the stats using pandas' `agg()` method, then replace the default row labels with readable names:

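```python
# Apply every aggregation in stats_list to the column, rounding to 2 decimal places
summary_stats = data_df.agg(stats_list).round(2)

# Swap the default row labels (the aggregation/function names) for readable ones
pretty_names = ['Minimum', 'Maximum', 'Range', 'Mean', 'Median', 'Standard Deviation', 'IQR']
summary_stats = pd.DataFrame(summary_stats.values, index=pretty_names, columns=['Value'])
summary_stats
```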
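The cell above relies on `pretty_names` listing the labels in exactly the same order as `stats_list`. If you reorder one list and forget the other, the labels silently end up on the wrong numbers. One alternative keeps every label next to its function in a dict and calls each function directly instead of going through `agg()`. This is a sketch, not the original author's method, and the names `labelled_stats` and `summarize` are hypothetical.

```python
import numpy as np
import pandas as pd

def IQR(column):
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25

def range_f(column):
    return column.max() - column.min()

# Hypothetical mapping: each readable label sits next to the function
# that computes it, so labels and values can never drift out of order.
labelled_stats = {
    "Minimum": lambda s: s.min(),
    "Maximum": lambda s: s.max(),
    "Range": range_f,
    "Mean": lambda s: s.mean(),
    "Median": lambda s: s.median(),
    "Standard Deviation": lambda s: s.std(),  # sample std (ddof=1), same as 'std' in agg
    "IQR": IQR,
}

def summarize(column):
    """Return a one-column DataFrame of labelled stats, rounded to 2 places."""
    values = {label: func(column) for label, func in labelled_stats.items()}
    return pd.Series(values, name="Value").round(2).to_frame()

data_df = pd.DataFrame({"data": np.random.normal(5, 15, 30)})
print(summarize(data_df["data"]))
```

Python dicts preserve insertion order, so the rows come out in the order the labels are written.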

| Statistic | Value |
|---|---|
| Minimum | -28.23 |
| Maximum | 34.13 |
| Range | 62.36 |
| Mean | 2.46 |
| Median | 2.9 |
| Standard Deviation | 17.36 |
| IQR | 27.15 |


---------------

Check out my Machine Learning blog at https://YashmeetSingh.com