
In [9]:
# Handle Missing data
import pandas as pd
import numpy as np
df = pd.Series([1, np.nan, None, 3, 'Hello'])  # Note: both None and np.nan are treated as null (NaN) in pandas

In [10]:
df

0        1
1      NaN
2     None
3        3
4    Hello
dtype: object

In [15]:
df_2 = pd.DataFrame([[1, np.nan, 3], [2, None, 2], ['Hello', 3, 0]])
df_2

Unnamed: 0,0,1,2
0,1,,3
1,2,,2
2,Hello,3.0,0


## Handle Missing data in Pandas
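- `df.isnull()`: Boolean mask, `True` where a value is missing
- `df.notnull()`: Boolean mask, `True` where a value is present
- `df.dropna()`: Filtered copy with the missing values removed
- `df.fillna()`: Modified copy with the missing values replaced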
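
As a quick orientation before the individual sections, here is a minimal sketch of the four methods side by side. The Series `s` and the fill value `0.0` are illustrative choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with one missing entry
s = pd.Series([1.0, np.nan, 3.0])

null_mask = s.isnull()     # True where the value is missing
valid_mask = s.notnull()   # True where the value is present
dropped = s.dropna()       # new Series without the missing entry
filled = s.fillna(0.0)     # new Series with the missing entry replaced

print(null_mask.tolist())   # [False, True, False]
print(valid_mask.tolist())  # [True, False, True]
print(dropped.tolist())     # [1.0, 3.0]
print(filled.tolist())      # [1.0, 0.0, 3.0]
```

None of these calls changes `s` itself; each one returns a new object.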

### df.isnull()

In [11]:
# Return a boolean Series that is True where the value is null
df.isnull()

0    False
1     True
2     True
3    False
4    False
dtype: bool

In [12]:
# Boolean indexing: keep only the entries that are null
df[df.isnull()]

1     NaN
2    None
dtype: object
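
Because `True` counts as 1 when summed, the mask from `isnull()` can also be used to count missing values. The sketch below rebuilds the notebook's `df_2`; the names `per_column`, `per_row`, and `total` are illustrative.

```python
import numpy as np
import pandas as pd

# Same data as df_2 in the notebook
df_2 = pd.DataFrame([[1, np.nan, 3], [2, None, 2], ['Hello', 3, 0]])

per_column = df_2.isnull().sum()        # missing values in each column
per_row = df_2.isnull().sum(axis=1)     # missing values in each row
total = int(df_2.isnull().sum().sum())  # missing values in the whole frame

print(per_column.tolist())  # [0, 2, 0]
print(per_row.tolist())     # [1, 1, 0]
print(total)                # 2
```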

### df.notnull()

In [13]:
# The opposite of `df.isnull()`
df.notnull()  # equivalent to ~df.isnull()

0     True
1    False
2    False
3     True
4     True
dtype: bool

In [14]:
# Boolean indexing: keep only the non-null entries
df[df.notnull()]

0        1
3        3
4    Hello
dtype: object
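
The equivalence between `notnull()` and `~isnull()` can be checked directly. This small sketch also uses `notna()`, which pandas provides as an alias of `notnull()`; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the Series at the top of the notebook
s = pd.Series([1, np.nan, None, 3, 'Hello'])

same_mask = s.notnull().equals(~s.isnull())  # negated isnull() mask
alias_mask = s.notna().equals(s.notnull())   # notna() is an alias of notnull()

print(same_mask, alias_mask)  # True True
```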

### df.dropna()

In [16]:
# Series: drops the entries whose value is null
# DataFrame: drops every row (the default) or column that contains at least one null value


df.dropna()

0        1
3        3
4    Hello
dtype: object

In [20]:
df_2

Unnamed: 0,0,1,2
0,1,,3
1,2,,2
2,Hello,3.0,0


In [18]:
# DataFrame: by default, every row that contains at least one null value is dropped
df_2.dropna()

Unnamed: 0,0,1,2
2,Hello,3.0,0


In [19]:
# dropna(): `axis` selects whether rows (axis=0, the default) or columns (axis=1) are dropped
df_2.dropna(axis=1)

Unnamed: 0,0,2
0,1,3
1,2,2
2,Hello,0


In [22]:
# dropna(): `how` {'any', 'all'} -> drop a row / column if any ('any') or all ('all') of its values are null
df_2.iloc[2, 1] = np.nan
df_2.dropna(axis=1, how='all')

# NOTE: By default, how='any'

Unnamed: 0,0,2
0,1,3
1,2,2
2,Hello,0


In [23]:
# dropna(): `thresh` is the minimum number of non-null values a row / column needs in order to be kept
df_2.dropna(axis=1, thresh=2)

Unnamed: 0,0,2
0,1,3
1,2,2
2,Hello,0
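
To make the difference between `how` and `thresh` easier to see, here is a small sketch on a hypothetical frame whose columns hold three, one, and zero non-null values. The frame and its column names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'a' has 3 non-null values, 'b' has 1, 'c' has 0
frame = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [np.nan, np.nan, 3.0],
    "c": [np.nan, np.nan, np.nan],
})

any_cols = list(frame.dropna(axis=1, how="any").columns)     # drop columns with any null
all_cols = list(frame.dropna(axis=1, how="all").columns)     # drop only all-null columns
thresh_2 = list(frame.dropna(axis=1, thresh=2).columns)      # keep columns with >= 2 non-null
thresh_1 = list(frame.dropna(axis=1, thresh=1).columns)      # keep columns with >= 1 non-null

print(any_cols)  # ['a']
print(all_cols)  # ['a', 'b']
print(thresh_2)  # ['a']
print(thresh_1)  # ['a', 'b']
```

Note that `thresh` counts the values that are present, not the values that are missing.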


### df.fillna()

In [24]:
# Replace the missing values with the given value
df.fillna(3.0)

0        1
1      3.0
2      3.0
3        3
4    Hello
dtype: object
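
Beyond a single constant, `fillna()` also accepts a dict that maps column names to fill values, and `ffill()` carries the last valid value forward. The following is a minimal sketch on a hypothetical numeric frame; the data and the chosen fill values are assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame with gaps in both columns
frame = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, 5.0, np.nan]})

constant = frame.fillna(0.0)                                    # one value everywhere
per_column = frame.fillna({"x": -1.0, "y": frame["y"].mean()})  # one value per column
forward = frame.ffill()                                         # propagate last valid value down

print(constant["x"].tolist())    # [1.0, 0.0, 3.0]
print(per_column["y"].tolist())  # [5.0, 5.0, 5.0]
print(forward["x"].tolist())     # [1.0, 1.0, 3.0]
```

The first entry of `forward["y"]` stays NaN because there is no earlier value to carry forward, and `frame` itself is left unchanged by all three calls.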