In [1]:
# Import essential packages
import os
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

## 1 Which kind of Data is plotable?

At this point (23/09/2021), I identify 3 kinds of Data that are plotable:
- Data type Number (`int` or `float`, or in general, not string `object`) with many unique values $\rightarrow$ Use Histogram.
- Data type Number with few unique values (I choose the threshold is `<= 10`) $\rightarrow$ Use Bar Plot.
- Data type String with few unique values $\rightarrow$ Use Bar Plot.

### 1.1 Mathematics Logic Definition & Proof for Our Condition
If we define the following statements:
- `p` : Data type String.
- `q` : Data with less than or equal to 10 unique values.

Then we have following statements consequently:
- `~p`: Data type different than String (which is Number).
- `~q`: Data with more than 10 unique values.

Now we have the condition for Data to be plotable: Data type Number, or Data type String with less than or equal to 10 unique values, or systematically, `Data type different than String, or Data type String and Data with less than or equal to 10 unique values.`. Symbolically:
- $\neg p \vee (p \wedge q)$

And with some basic logic:

$\neg p \vee (p \wedge q)$

$= (\neg p \vee p) \wedge (\neg p \vee q)$ ***`(Distributive Laws)`***

$= t \wedge (\neg p \vee q)$ ***`(Negation Laws)`***

$= \neg p \vee r$ ***`(Identity Laws)`***

So finally, we have the plotable condition: `Data type different than string and less than or equal to 10 unique values`, much shorter and easier to write.

In [5]:
def distribution_plot(df):
    plotable_col = [] # Contain all columns name whose data is plotable
    for col in df.columns:
        # Pass Id Col (if there is), we don't want to plot such a thing
        if len(df[col].unique()) == df[col].shape[0]:
            continue
        # Identify Plotable Data (columns)
        if df[col].dtype != 'object' and len(df[col].unique()):
            plotable_col.append(col)
            
    print(plotable_col)
    
    fig = plt.figure(figsize=(16, 16))
    fig.add_gridspec()

In [6]:
df = pd.read_csv('./datasets/titanic_train.csv')

In [7]:
distribution_plot(df)

['Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare']
