# Summary Subpackage

### Description
This document describes the modules and methods in the summary subpackage and shows examples of their use.

The first module:
**summary_classes.py**

This module has the class:
- Df_Info(df, type="columns")
    - Creates a Df_Info class object from a Pandas DataFrame.
    - This class is the building block for the Missing and Stats classes.

Df_Info contains the methods:
- Df_Info.total_max()
    - Returns the maximum value in the DataFrame.
- Df_Info.total_min()
    - Returns the minimum value in the DataFrame.
- Df_Info.total_mean()
    - Returns the average value of the DataFrame.
- Df_Info.total_missing()
    - Returns the total number of missing values.
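
To make the four totals concrete, here is a minimal sketch of equivalent calculations in plain pandas. It is not the Df_Info implementation: the sample data is made up, and restricting the max, min, and mean to numeric columns is an assumption.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame standing in for a real dataset.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c"],
        "weight": [10.0, np.nan, 30.0],
        "price": [100.0, 200.0, 600.0],
    }
)

# Assumption: only numeric columns take part in max/min/mean.
numeric = df.select_dtypes(include="number")

total_max = numeric.max().max()                          # largest numeric value
total_min = numeric.min().min()                          # smallest numeric value
total_mean = np.nanmean(numeric.to_numpy(dtype=float))   # mean of non-missing numeric cells
total_missing = int(df.isna().sum().sum())               # missing cells across all columns

print(total_max, total_min, total_mean, total_missing)
```

Each total collapses the whole DataFrame to a single number, which is what distinguishes these methods from the per-column and per-row summaries described next.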
    
This module also has the class:
- Missing(df, type="columns")
    - Creates a Missing class object from a Pandas DataFrame.
    - Inherits methods from the Df_Info class.
    - Returns the total missing values and the percentage of missing values for each column (or each row if type is "rows") upon initialization.
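
The per-column and per-row missing counts can be sketched with plain pandas as follows. The helper name `missing_table` and its `by` parameter are hypothetical, and expressing the percentage on a 0 to 100 scale is an assumption; the Missing class may compute these differently.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a few missing cells.
df = pd.DataFrame(
    {
        "name": ["a", None, "c", "d"],
        "price": [1.0, np.nan, np.nan, 4.0],
    }
)


def missing_table(frame, by="columns"):
    """Hypothetical helper: count and percentage of missing values."""
    # axis=0 gives one result per column, axis=1 gives one result per row.
    axis = 0 if by == "columns" else 1
    is_missing = frame.isna()
    return pd.DataFrame(
        {
            "count_missing": is_missing.sum(axis=axis),
            # Assumption: percentage is on a 0-100 scale.
            "percent_missing": is_missing.mean(axis=axis) * 100,
        }
    )


by_col = missing_table(df)
by_row = missing_table(df, by="rows")
print(by_col)
print(by_row)
```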

The final class in this module is:
- Stats(df, type="columns")
    - Creates a Stats class object from a Pandas DataFrame.
    - Inherits methods from the Df_Info class.
    - Returns the maximum, minimum, and average value for each column (or each row if type is "rows") upon initialization.
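
A rough pandas equivalent of the per-column and per-row statistics is sketched below. The helper name `stats_table` and its `by` parameter are hypothetical, and dropping non-numeric columns before computing the statistics is an assumption.

```python
import pandas as pd

# Hypothetical sample data with one text column and two numeric columns.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c"],
        "weight": [10.0, 20.0, 30.0],
        "price": [100.0, 200.0, 600.0],
    }
)


def stats_table(frame, by="columns"):
    """Hypothetical helper: max, min, and mean per column or per row."""
    # Assumption: text columns are excluded from the statistics.
    numeric = frame.select_dtypes(include="number")
    # axis=0 gives one result per column, axis=1 gives one result per row.
    axis = 0 if by == "columns" else 1
    return pd.DataFrame(
        {
            "max": numeric.max(axis=axis),
            "min": numeric.min(axis=axis),
            "mean": numeric.mean(axis=axis),
        }
    )


col_stats = stats_table(df)
row_stats = stats_table(df, by="rows")
print(col_stats)
print(row_stats)
```

Note that row-wise statistics mix values from columns with different units, so they are mainly useful when the numeric columns are on comparable scales.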
    
The second module:
**summary_stats.py**

This module has the functions:
- missing_summary(df, type="columns")
    - Takes a Pandas DataFrame and generates a Missing class object from the summary_classes module.
    - Returns the total missing values and the percentage of missing values for each column (or each row if type is "rows").
- stats_summary(df, type="columns")
    - Takes a Pandas DataFrame and generates a Stats class object from the summary_classes module.
    - Returns the maximum, minimum, and average value for each column (or each row if type is "rows").
- all_summary(df, type="columns")
    - Takes a Pandas DataFrame and calls the missing_summary() and stats_summary() functions.
    - Returns their results combined into a single table.
- simple_summary(df, type="columns")
    - Takes a Pandas DataFrame and generates a Df_Info class object from the summary_classes module.
    - Returns the minimum, maximum, average, number of rows, number of columns, and number of missing values.
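
One way to combine a statistics table with a missing-value table is to align them on their shared index. The sketch below uses `pd.concat` for this; it is an illustration under the assumption that statistics cover numeric columns only, not a description of how all_summary() is implemented.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one text column, two numeric columns.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c"],
        "weight": [10.0, np.nan, 30.0],
        "price": [100.0, 200.0, 600.0],
    }
)

# Statistics for numeric columns only (assumption).
numeric = df.select_dtypes(include="number")
stats = pd.DataFrame(
    {"max": numeric.max(), "min": numeric.min(), "mean": numeric.mean()}
)

# Missing-value counts for every column, including text columns.
missing = pd.DataFrame(
    {
        "count_missing": df.isna().sum(),
        "percent_missing": df.isna().mean() * 100,
    }
)

# Align both tables on the column names; columns without statistics get NaN.
combined = pd.concat([stats, missing], axis=1)
print(combined)
```

Text columns appear in the combined table with empty statistics and populated missing-value counts.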
    
### Examples
The examples use the CarPrice.csv dataset saved in the data folder of this repository.

```python
# import packages
import pandas as pd
import quickscreen.summary.summary_stats as ss
```

```python
# load data
df = pd.read_csv("./data/Carprice.csv")
```

```python
# missing summary
ss.missing_summary(df)
```

| | count_missing | percent_missing |
|---|---|---|
| CarName | 0 | 0.0 |
| curbweight | 0 | 0.0 |
| enginesize | 0 | 0.0 |
| horsepower | 0 | 0.0 |
| peakrpm | 0 | 0.0 |
| citympg | 0 | 0.0 |
| highwaympg | 0 | 0.0 |
| price | 0 | 0.0 |


```python
# stats summary
ss.stats_summary(df)
```

| | max | min | mean |
|---|---|---|---|
| curbweight | 4066.0 | 1488.0 | 2555.565854 |
| enginesize | 326.0 | 61.0 | 126.907317 |
| horsepower | 288.0 | 48.0 | 104.117073 |
| peakrpm | 6600.0 | 4150.0 | 5125.121951 |
| citympg | 49.0 | 13.0 | 25.219512 |
| highwaympg | 54.0 | 16.0 | 30.75122 |
| price | 45400.0 | 5118.0 | 13276.710571 |


```python
# stats summary by row
ss.stats_summary(df, "rows").head(5)
```

| | max | min | mean |
|---|---|---|---|
| 0 | 13495.0 | 21.0 | 3047.428571 |
| 1 | 16500.0 | 21.0 | 3476.714286 |
| 2 | 16500.0 | 19.0 | 3524.857143 |
| 3 | 13950.0 | 24.0 | 3150.285714 |
| 4 | 17450.0 | 18.0 | 3723.571429 |


```python
# all summary
ss.all_summary(df)
```

| | max | min | mean | count_missing | percent_missing |
|---|---|---|---|---|---|
| curbweight | 4066.0 | 1488.0 | 2555.565854 | 0 | 0.0 |
| enginesize | 326.0 | 61.0 | 126.907317 | 0 | 0.0 |
| horsepower | 288.0 | 48.0 | 104.117073 | 0 | 0.0 |
| peakrpm | 6600.0 | 4150.0 | 5125.121951 | 0 | 0.0 |
| citympg | 49.0 | 13.0 | 25.219512 | 0 | 0.0 |
| highwaympg | 54.0 | 16.0 | 30.75122 | 0 | 0.0 |
| price | 45400.0 | 5118.0 | 13276.710571 | 0 | 0.0 |
| CarName | | | | 0 | 0.0 |


```python
# simple summary
ss.simple_summary(df)
```

| | values |
|---|---|
| df_max | |
| df_min | |
| df_mean | |
| row_count | 205.0 |
| column_count | 205.0 |
| total_missing | 0.0 |
