# Custom Summary Statistics Using Pandas

In [1]:
import numpy as np
import pandas as pd

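Let's generate 30 random numbers drawn from a normal distribution with mean 5 and standard deviation 15. No seed is set, so the values will differ on each run: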

In [2]:
data = np.random.normal(5, 15, 30)
data_df = pd.DataFrame({"data":data})
data_df.head()

| | data |
|---|---|
| 0 | 19.111743 |
| 1 | 6.9414 |
| 2 | -0.579221 |
| 3 | 17.938935 |
| 4 | 9.253374 |


You can use the pandas `describe()` method to print a few summary stats. But it reports the median only as the `50%` row, and it doesn't report stats such as the range or the Inter Quartile Range (IQR) directly.
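To make the gap concrete, here is a minimal sketch of what `describe()` returns for a numeric column and how the missing stats could be derived from it by hand. The seed and the names `sample` and `desc` are assumptions made only for this illustration; they are not part of the notebook above.

```python
import numpy as np
import pandas as pd

# Seeded generator so this illustration is repeatable (the seed is arbitrary)
rng = np.random.default_rng(0)
sample = pd.Series(rng.normal(5, 15, 30), name="data")

desc = sample.describe()
print(desc.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# The median is present, but under the label '50%'
median_from_describe = desc["50%"]

# Range and IQR are not reported, so they have to be computed from other rows
range_from_describe = desc["max"] - desc["min"]
iqr_from_describe = desc["75%"] - desc["25%"]
```

Deriving values this way works, but the result still carries `describe()`'s own labels and row order.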

The pandas `agg()` method can help us get such custom stats. 

Let's use it to print the following stats:
* Minimum Value
* Maximum Value
* Range
* Mean
* Median
* Standard Deviation
* Inter Quartile Range (IQR)

Pandas doesn't have built-in aggregation names for Range and IQR, so we write our own functions:

In [3]:
# Inter quartile range (IQR): 75th percentile minus 25th percentile
def IQR(column):
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25

# Range: maximum minus minimum
def range_f(column):
    return column.max() - column.min()
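
Before plugging the helpers into `agg()`, it can be worth checking them on a small series whose answers are easy to verify by hand. The sketch below repeats the two functions so it runs on its own; the `toy` series is a made-up example, and the comparison against `np.percentile` assumes both libraries use their default (linear) interpolation.

```python
import numpy as np
import pandas as pd

def IQR(column):
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25

def range_f(column):
    return column.max() - column.min()

# Nine evenly spaced values: the quartiles fall exactly on 3 and 7
toy = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

print(IQR(toy))      # 7 - 3 -> 4.0
print(range_f(toy))  # 9 - 1 -> 8

# Cross-check the IQR against NumPy's percentile function
np_q25, np_q75 = np.percentile(toy, [25, 75])
print(np.isclose(IQR(toy), np_q75 - np_q25))  # True
```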

In [4]:
# All the stats we'll print
stats_list = [
    'min', 'max', range_f, 'mean', 'median', 
    'std', # standard deviation 
    IQR   
]
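
A list passed to `agg()` can mix string names and Python functions. As a small sketch on made-up data (the `toy_df` frame is an assumption for illustration), the rows of the result are labelled with the string names and, for custom functions, with the function's own name:

```python
import pandas as pd

def range_f(column):
    return column.max() - column.min()

toy_df = pd.DataFrame({"data": [1, 2, 3, 4, 5]})

# Strings refer to pandas' built-in aggregations; range_f is our own function
result = toy_df.agg(["min", "max", range_f])

print(result.index.tolist())  # ['min', 'max', 'range_f']
print(result.loc["range_f", "data"])  # 5 - 1 -> 4
```

Labels such as `range_f` are accurate but not very readable, which is why the rows are worth relabelling for display.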

And here are the custom stats:  

In [5]:
summary_stats = data_df.agg(stats_list).round(2)

# Display labels; these must be listed in the same order as stats_list
pretty_names = ['Minimum', 'Maximum', 'Range', 'Mean', 'Median', 'Standard Deviation', 'IQR']
summary_stats = pd.DataFrame(summary_stats.values, index=pretty_names, columns=['Value'])
summary_stats

| | Value |
|---|---|
| Minimum | -17.84 |
| Maximum | 25.44 |
| Range | 43.28 |
| Mean | 5.3 |
| Median | 6.7 |
| Standard Deviation | 11.45 |
| IQR | 15.37 |
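
The relabelling step above relies on the list of display names being in the same order as the list of stats. As an alternative sketch (not the approach used in this notebook), each value can be attached to its label at the moment it is computed, so the two cannot drift apart. The seed and the names `values` and `labelled_stats` are assumptions for illustration, and the numbers will not match the table above because the data is generated separately.

```python
import numpy as np
import pandas as pd

# Seeded generator so this illustration is repeatable (the seed is arbitrary)
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(5, 15, 30))

q25, q75 = values.quantile([0.25, 0.75])

# Each label sits next to the expression that produces it
labelled_stats = pd.Series(
    {
        "Minimum": values.min(),
        "Maximum": values.max(),
        "Range": values.max() - values.min(),
        "Mean": values.mean(),
        "Median": values.median(),
        "Standard Deviation": values.std(),
        "IQR": q75 - q25,
    },
    name="Value",
).round(2)

print(labelled_stats.to_frame())
```

The trade-off is that this version handles one column at a time, whereas `agg()` with a list of stats applies the same set to every column of a DataFrame in one call.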


---------------

Check out my Machine Learning blog at https://YashmeetSingh.com