<h1>Handling Null Values in Pandas</h1>
<p>Null values, often represented as NaN in pandas, are missing or undefined data points in a dataset. Managing them well is crucial for accurate analysis and modeling: how null values are handled directly affects the decisions analysts and data scientists make and the reliability of the machine learning models they build.</p>
<p>There are two types of datasets:</p>
<p><b>Numerical Dataset:</b> Numerical datasets consist of quantitative data that can be measured and expressed as numerical values. Examples include age, height, temperature, and income.</p>
<p><b>Categorical Dataset:</b> Categorical datasets contain qualitative data that represents categories or groups. Examples include gender, color, country, etc.</p>

In [None]:
%pip install pandas
%pip install scipy

In [2]:
import pandas as pd
df = pd.read_csv("Loan 1.csv")
df.head()


Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.0,799.14,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.0,833.31,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
2,C-34590,Keva Godfrey,F,52,1266.27,Low,Working,,Semi-Urban,176684.91,...,3.0,627.44,0,Unpossessed,1,1266.27,1,Urban,1,300991.24
3,C-16668,Elva Sackett,M,65,1369.72,High,Pensioner,,Rural,97009.18,...,2.0,833.2,0,Inactive,730,1369.72,1,Semi-Urban,0,125612.1
4,C-12196,Sade Constable,F,60,1939.23,High,Pensioner,,Urban,109980.0,...,,,0,,356,1939.23,4,Semi-Urban,1,180908.0


## Identifying null values
Several methods in pandas facilitate the identification of null values in a DataFrame:
<ol>
<li><b>isnull()</b>: Returns a DataFrame of the same shape as the input, where each element is True if the value is null and False otherwise. It is useful for pinpointing where null values occur in the dataset.</li>
<li><b>info()</b>: Provides a concise summary of the DataFrame, including the count of non-null values for each column. It is a quick way to assess the presence of null values and the overall data structure.</li>
</ol>

In [7]:
# Code implementation
df.isnull()  # returns a DataFrame of booleans: True where the value is null, False otherwise

Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,True,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,True,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,True,False,False,...,True,True,False,True,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,False,False,False,False,False,False,False,True,False,False,...,False,False,False,False,False,False,False,False,False,False
19996,False,False,False,False,False,False,False,False,False,False,...,True,False,False,False,False,False,False,False,False,False
19997,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
19998,False,False,False,False,False,False,False,True,False,False,...,False,False,False,False,False,False,False,False,False,False


In [14]:
df.isnull().sum()  # number of null values in each column

Customer ID                       0
Name                              0
Gender                           31
Age                               0
Income (USD)                    750
Income Stability                813
Profession                        0
Type of Employment             4689
Location                          0
Loan Amount Request (USD)         0
Current Loan Expenses (USD)      83
Expense Type 1                    0
Expense Type 2                    0
Dependents                     1142
Credit Score                    743
No. of Defaults                   0
Has Active Credit Card         1076
Property ID                       0
Property Age                    892
Property Type                     0
Property Location               160
Co-Applicant                      0
Property Price                    0
dtype: int64
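
Raw counts are easier to compare across columns when expressed as a share of all rows. The sketch below shows one way to do that; the `toy` DataFrame is a hypothetical stand-in for the loan data, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the loan dataset
toy = pd.DataFrame({
    "Gender": ["F", "M", None, "F"],
    "Dependents": [2.0, np.nan, np.nan, 1.0],
    "Age": [47, 57, 52, 65],
})

# isnull() yields booleans; their mean is the fraction of nulls per column
missing_pct = toy.isnull().mean() * 100

# Show the columns with the most missing data first
print(missing_pct.sort_values(ascending=False))
```

Columns with a high share of nulls are usually the ones where the choice between imputing and dropping matters most.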

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000 entries, 0 to 19999
Data columns (total 23 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Customer ID                  20000 non-null  object 
 1   Name                         20000 non-null  object 
 2   Gender                       19969 non-null  object 
 3   Age                          20000 non-null  int64  
 4   Income (USD)                 19250 non-null  float64
 5   Income Stability             19187 non-null  object 
 6   Profession                   20000 non-null  object 
 7   Type of Employment           15311 non-null  object 
 8   Location                     20000 non-null  object 
 9   Loan Amount Request (USD)    20000 non-null  float64
 10  Current Loan Expenses (USD)  19917 non-null  float64
 11  Expense Type 1               20000 non-null  object 
 12  Expense Type 2               20000 non-null  object 
 13  Dependents      

## Imputing null values
Imputation replaces null values with meaningful estimates.
Strategies include the mean, median, or mode, as well as more complex methods such as regression or machine learning models.<br>
<b>fillna():</b> Fills null values with a specified value, such as a single constant or a per-column mapping.

In [13]:
# Let's see how to use this method
df2 = df.fillna(value=0)  # replaces every null value with 0
df2.head()

Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.0,799.14,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.0,833.31,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
2,C-34590,Keva Godfrey,F,52,1266.27,Low,Working,0,Semi-Urban,176684.91,...,3.0,627.44,0,Unpossessed,1,1266.27,1,Urban,1,300991.24
3,C-16668,Elva Sackett,M,65,1369.72,High,Pensioner,0,Rural,97009.18,...,2.0,833.2,0,Inactive,730,1369.72,1,Semi-Urban,0,125612.1
4,C-12196,Sade Constable,F,60,1939.23,High,Pensioner,0,Urban,109980.0,...,0.0,0.0,0,0,356,1939.23,4,Semi-Urban,1,180908.0
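
Note that `fillna()` returns a new object by default and leaves the original untouched, which is why the result above is assigned to a new variable. A minimal sketch on a hypothetical one-column frame (`toy` is illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame with a single null
toy = pd.DataFrame({"Dependents": [2.0, np.nan]})

# fillna() returns a filled copy
filled = toy.fillna(0)

# The original still contains its null; only the copy was changed
print(toy["Dependents"].isnull().sum())
print(filled["Dependents"].isnull().sum())
```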


In [17]:
# Now fill each null with the value from the previous row (forward fill)
df3 = df.ffill()  # fillna(method='pad') is deprecated in recent pandas versions
df3  # each NaN is replaced with the value from the row above it


Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.0,799.14,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.0,833.31,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
2,C-34590,Keva Godfrey,F,52,1266.27,Low,Working,Sales staff,Semi-Urban,176684.91,...,3.0,627.44,0,Unpossessed,1,1266.27,1,Urban,1,300991.24
3,C-16668,Elva Sackett,M,65,1369.72,High,Pensioner,Sales staff,Rural,97009.18,...,2.0,833.20,0,Inactive,730,1369.72,1,Semi-Urban,0,125612.1
4,C-12196,Sade Constable,F,60,1939.23,High,Pensioner,Sales staff,Urban,109980.00,...,2.0,833.20,0,Inactive,356,1939.23,4,Semi-Urban,1,180908
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,C-9076,Tobias Davilla,F,19,1349.60,Low,Commercial associate,Laborers,Semi-Urban,156766.97,...,4.0,684.32,0,Inactive,681,1349.60,4,Semi-Urban,1,212778
19996,C-17587,Evelina Hodges,M,22,2019.78,Low,Working,Core staff,Urban,47924.80,...,4.0,706.34,0,Inactive,213,2019.78,4,Urban,1,90816.95
19997,C-46479,Karlyn Mckinzie,M,19,2252.03,Low,Working,Core staff,Semi-Urban,18629.88,...,1.0,656.46,0,Inactive,270,2252.03,2,Rural,0,21566.27
19998,C-3099,Mariana Pulver,F,21,1845.35,Low,Working,Core staff,Semi-Urban,95430.73,...,2.0,865.46,0,Unpossessed,489,1845.35,1,Semi-Urban,1,120281.17


In [18]:
# Fill each null with the value from the next row (backward fill)
df4 = df.bfill()  # fillna(method='bfill') is deprecated in recent pandas versions
df4


Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.0,799.14,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.0,833.31,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
2,C-34590,Keva Godfrey,F,52,1266.27,Low,Working,Sales staff,Semi-Urban,176684.91,...,3.0,627.44,0,Unpossessed,1,1266.27,1,Urban,1,300991.24
3,C-16668,Elva Sackett,M,65,1369.72,High,Pensioner,Sales staff,Rural,97009.18,...,2.0,833.20,0,Inactive,730,1369.72,1,Semi-Urban,0,125612.1
4,C-12196,Sade Constable,F,60,1939.23,High,Pensioner,Sales staff,Urban,109980.00,...,2.0,620.58,0,Inactive,356,1939.23,4,Semi-Urban,1,180908
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,C-9076,Tobias Davilla,F,19,1349.60,Low,Commercial associate,Core staff,Semi-Urban,156766.97,...,4.0,684.32,0,Inactive,681,1349.60,4,Semi-Urban,1,212778
19996,C-17587,Evelina Hodges,M,22,2019.78,Low,Working,Core staff,Urban,47924.80,...,1.0,706.34,0,Inactive,213,2019.78,4,Urban,1,90816.95
19997,C-46479,Karlyn Mckinzie,M,19,2252.03,Low,Working,Core staff,Semi-Urban,18629.88,...,1.0,656.46,0,Inactive,270,2252.03,2,Rural,0,21566.27
19998,C-3099,Mariana Pulver,F,21,1845.35,Low,Working,Laborers,Semi-Urban,95430.73,...,2.0,865.46,0,Unpossessed,489,1845.35,1,Semi-Urban,1,120281.17
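
Forward and backward fills have edge cases that are easy to miss in a 20,000-row output. The sketch below uses a small hypothetical Series (`s` is not from the loan data) to show them, along with the `limit` parameter.

```python
import numpy as np
import pandas as pd

# Hypothetical series with nulls at the start, in the middle, and at the end
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

forward = s.ffill()         # the leading NaN has no earlier value, so it stays NaN
backward = s.bfill()        # the trailing NaN has no later value, so it stays NaN
limited = s.ffill(limit=1)  # fill at most one consecutive NaN in each gap

print(pd.DataFrame({"original": s, "ffill": forward,
                    "bfill": backward, "ffill_limit_1": limited}))
```

Because both fills copy a neighbouring row, they are most defensible when row order carries meaning (for example, time series); for unordered records such as loan applicants, the copied value comes from an unrelated row.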


In [3]:
# Fill each null with the value from the previous column in the same row
df5 = df.ffill(axis=1)  # fillna(method='pad', axis=1) is deprecated in recent pandas versions
df5


Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.0,799.14,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.0,833.31,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
2,C-34590,Keva Godfrey,F,52,1266.27,Low,Working,Working,Semi-Urban,176684.91,...,3.0,627.44,0,Unpossessed,1,1266.27,1,Urban,1,300991.24
3,C-16668,Elva Sackett,M,65,1369.72,High,Pensioner,Pensioner,Rural,97009.18,...,2.0,833.2,0,Inactive,730,1369.72,1,Semi-Urban,0,125612.1
4,C-12196,Sade Constable,F,60,1939.23,High,Pensioner,Pensioner,Urban,109980.0,...,N,N,0,0,356,1939.23,4,Semi-Urban,1,180908
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,C-9076,Tobias Davilla,F,19,1349.6,Low,Commercial associate,Commercial associate,Semi-Urban,156766.97,...,4.0,684.32,0,Inactive,681,1349.6,4,Semi-Urban,1,212778
19996,C-17587,Evelina Hodges,M,22,2019.78,Low,Working,Core staff,Urban,47924.8,...,Y,706.34,0,Inactive,213,2019.78,4,Urban,1,90816.95
19997,C-46479,Karlyn Mckinzie,M,19,2252.03,Low,Working,Core staff,Semi-Urban,18629.88,...,1.0,656.46,0,Inactive,270,2252.03,2,Rural,0,21566.27
19998,C-3099,Mariana Pulver,F,21,1845.35,Low,Working,Working,Semi-Urban,95430.73,...,2.0,865.46,0,Unpossessed,489,1845.35,1,Semi-Urban,1,120281.17


In [8]:
# Now fill nulls with a different default value for each column
# (2.5 is passed as a number so that 'Dependents' stays numeric)
df6 = df.fillna({'Type of Employment': 'Yet to ask', 'Dependents': 2.5})
df6

Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.0,799.14,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.0,833.31,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
2,C-34590,Keva Godfrey,F,52,1266.27,Low,Working,Yet to ask,Semi-Urban,176684.91,...,3.0,627.44,0,Unpossessed,1,1266.27,1,Urban,1,300991.24
3,C-16668,Elva Sackett,M,65,1369.72,High,Pensioner,Yet to ask,Rural,97009.18,...,2.0,833.20,0,Inactive,730,1369.72,1,Semi-Urban,0,125612.1
4,C-12196,Sade Constable,F,60,1939.23,High,Pensioner,Yet to ask,Urban,109980.00,...,2.5,,0,,356,1939.23,4,Semi-Urban,1,180908
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,C-9076,Tobias Davilla,F,19,1349.60,Low,Commercial associate,Yet to ask,Semi-Urban,156766.97,...,4.0,684.32,0,Inactive,681,1349.60,4,Semi-Urban,1,212778
19996,C-17587,Evelina Hodges,M,22,2019.78,Low,Working,Core staff,Urban,47924.80,...,2.5,706.34,0,Inactive,213,2019.78,4,Urban,1,90816.95
19997,C-46479,Karlyn Mckinzie,M,19,2252.03,Low,Working,Core staff,Semi-Urban,18629.88,...,1.0,656.46,0,Inactive,270,2252.03,2,Rural,0,21566.27
19998,C-3099,Mariana Pulver,F,21,1845.35,Low,Working,Yet to ask,Semi-Urban,95430.73,...,2.0,865.46,0,Unpossessed,489,1845.35,1,Semi-Urban,1,120281.17


## Filling null values with the mean, median, mode, or max/min
The choice of value (mean, median, mode, max, or min) depends on the distribution of your data and the nature of the variable. Mean and median apply only to numerical columns; mode also works for categorical ones:
1. <p style='color: yellow;'><b>Mean:</b></p>Use the mean when the data distribution is roughly symmetric and there are no outliers. It is suitable for continuous data with a normal distribution.
2. <p style='color: yellow;'><b>Median:</b></p>The median is robust to outliers and is a good choice when the data distribution is skewed. It is also suitable for ordinal data.
3. <p style='color: yellow;'><b>Mode:</b></p>The mode is the most frequent value. It is useful for categorical data or for filling in missing values in a discrete distribution.
4. <p style='color: yellow;'><b>Max/Min:</b></p>These are less common for imputation but can be used if you have a reason to replace missing values with the maximum or minimum observed value, such as setting a boundary or limit.
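
As a rough illustration of matching the statistic to the column type, the sketch below fills a skewed numeric column with its median and a categorical column with its mode. The `toy` DataFrame and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one income value is an outlier that would distort the mean
toy = pd.DataFrame({
    "Income": [1200.0, 1500.0, np.nan, 90000.0],
    "Profession": ["Working", None, "Working", "Pensioner"],
})

filled = toy.copy()

# Median for the skewed numeric column
filled["Income"] = filled["Income"].fillna(filled["Income"].median())

# Mode for the categorical column; mode() returns a Series, so take its first entry
filled["Profession"] = filled["Profession"].fillna(filled["Profession"].mode()[0])

print(filled)
```

Here the median of the observed incomes is 1500, whereas the mean would be pulled above 30,000 by the single outlier.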

In [10]:
# Now let's implement it
# Caution: this fills every null in the DataFrame, including those in
# non-numeric columns, with the mean of the 'Dependents' column
df7 = df.fillna(value=df['Dependents'].mean())
df7

Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.000000,799.140000,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.000000,833.310000,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
2,C-34590,Keva Godfrey,F,52,1266.27,Low,Working,2.251246,Semi-Urban,176684.91,...,3.000000,627.440000,0,Unpossessed,1,1266.27,1,Urban,1,300991.24
3,C-16668,Elva Sackett,M,65,1369.72,High,Pensioner,2.251246,Rural,97009.18,...,2.000000,833.200000,0,Inactive,730,1369.72,1,Semi-Urban,0,125612.1
4,C-12196,Sade Constable,F,60,1939.23,High,Pensioner,2.251246,Urban,109980.00,...,2.251246,2.251246,0,2.251246,356,1939.23,4,Semi-Urban,1,180908
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,C-9076,Tobias Davilla,F,19,1349.60,Low,Commercial associate,2.251246,Semi-Urban,156766.97,...,4.000000,684.320000,0,Inactive,681,1349.60,4,Semi-Urban,1,212778
19996,C-17587,Evelina Hodges,M,22,2019.78,Low,Working,Core staff,Urban,47924.80,...,2.251246,706.340000,0,Inactive,213,2019.78,4,Urban,1,90816.95
19997,C-46479,Karlyn Mckinzie,M,19,2252.03,Low,Working,Core staff,Semi-Urban,18629.88,...,1.000000,656.460000,0,Inactive,270,2252.03,2,Rural,0,21566.27
19998,C-3099,Mariana Pulver,F,21,1845.35,Low,Working,2.251246,Semi-Urban,95430.73,...,2.000000,865.460000,0,Unpossessed,489,1845.35,1,Semi-Urban,1,120281.17
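
In the output above, the mean of 'Dependents' (2.251246) also appears in text columns such as 'Type of Employment', which is rarely what is wanted. One way to restrict the fill to a single column is to pass a dict to `fillna()`. This is a sketch on a hypothetical `toy` frame that borrows two column names from the loan data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one numeric and one text column, each with a null
toy = pd.DataFrame({
    "Dependents": [2.0, np.nan, 4.0],
    "Type of Employment": ["Managers", None, "Laborers"],
})

# The dict key limits the fill to 'Dependents'; other columns are left alone
filled = toy.fillna({"Dependents": toy["Dependents"].mean()})

print(filled)
```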


In [11]:
#Now let's fill the dependents column with the median
df8 = df['Dependents'].fillna(value=df['Dependents'].median())
df8

0        2.0
1        2.0
2        3.0
3        2.0
4        2.0
        ... 
19995    4.0
19996    2.0
19997    1.0
19998    2.0
19999    3.0
Name: Dependents, Length: 20000, dtype: float64

In [13]:
# Fill the 'Dependents' column with its maximum observed value
df9 = df['Dependents'].fillna(value=df['Dependents'].max())
df9

0         2.0
1         2.0
2         3.0
3         2.0
4        13.0
         ... 
19995     4.0
19996    13.0
19997     1.0
19998     2.0
19999     3.0
Name: Dependents, Length: 20000, dtype: float64

## dropna()
The dropna() method in pandas removes rows or columns that contain null values from a DataFrame. The axis, how, thresh, and subset parameters control which rows or columns are dropped.
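
Before running these options on the full dataset, it can help to see them side by side on a frame small enough to check by eye. The `toy` DataFrame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column 'c' is complete, 'a' and 'b' contain nulls
toy = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": ["x", None, "z", "w"],
    "c": [10, 20, 30, 40],
})

rows_any = toy.dropna()               # drop rows containing any null (default how='any')
rows_all = toy.dropna(how="all")      # drop only rows where every value is null
cols_any = toy.dropna(axis=1)         # drop columns containing any null
subset_b = toy.dropna(subset=["b"])   # only nulls in column 'b' cause a row to be dropped

print(len(rows_any), len(rows_all), list(cols_any.columns), len(subset_b))
```

No row of `toy` is entirely null, so `how="all"` removes nothing here; the same happens in the loan data, where columns such as 'Customer ID' are always populated.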


In [14]:
# Drop every row that contains at least one null value
df10 = df.dropna()
df10

Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.0,799.14,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.0,833.31,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
5,C-2600,Gina Weir,F,59,2944.81,Low,Working,Sales staff,Semi-Urban,31465.78,...,2.0,620.58,0,Inactive,497,2944.81,1,Semi-Urban,0,51075.31
6,C-9047,Lacey Cybulski,M,43,1957.31,Low,Working,Sales staff,Rural,150334.11,...,2.0,731.37,0,Unpossessed,206,1957.31,4,Semi-Urban,1,232535.55
11,C-43027,Karlyn Cree,M,29,2183.59,Low,Commercial associate,Laborers,Semi-Urban,53651.03,...,3.0,653.00,0,Unpossessed,307,2183.59,1,Semi-Urban,0,79681.39
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19992,C-27000,Jackson Wheeless,M,18,2373.38,Low,Working,Security staff,Semi-Urban,160915.05,...,4.0,671.42,1,Active,236,2373.38,4,Rural,1,177353.14
19993,C-39756,Alethia Dively,M,25,2061.80,Low,Commercial associate,Cleaning staff,Semi-Urban,27815.85,...,2.0,861.95,0,Active,689,2061.80,3,Rural,1,43343.99
19994,C-32138,Cuc Verrett,M,19,3262.08,Low,Commercial associate,Laborers,Semi-Urban,61534.75,...,3.0,741.39,0,Inactive,893,3262.08,3,Semi-Urban,1,114653.08
19997,C-46479,Karlyn Mckinzie,M,19,2252.03,Low,Working,Core staff,Semi-Urban,18629.88,...,1.0,656.46,0,Inactive,270,2252.03,2,Rural,0,21566.27


In [15]:
# Now let's see how the 'how' parameter works
# how='any' (the default) removes a row if it contains any null value
df11 = df.dropna(how='any')
df11

Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.0,799.14,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.0,833.31,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
5,C-2600,Gina Weir,F,59,2944.81,Low,Working,Sales staff,Semi-Urban,31465.78,...,2.0,620.58,0,Inactive,497,2944.81,1,Semi-Urban,0,51075.31
6,C-9047,Lacey Cybulski,M,43,1957.31,Low,Working,Sales staff,Rural,150334.11,...,2.0,731.37,0,Unpossessed,206,1957.31,4,Semi-Urban,1,232535.55
11,C-43027,Karlyn Cree,M,29,2183.59,Low,Commercial associate,Laborers,Semi-Urban,53651.03,...,3.0,653.00,0,Unpossessed,307,2183.59,1,Semi-Urban,0,79681.39
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19992,C-27000,Jackson Wheeless,M,18,2373.38,Low,Working,Security staff,Semi-Urban,160915.05,...,4.0,671.42,1,Active,236,2373.38,4,Rural,1,177353.14
19993,C-39756,Alethia Dively,M,25,2061.80,Low,Commercial associate,Cleaning staff,Semi-Urban,27815.85,...,2.0,861.95,0,Active,689,2061.80,3,Rural,1,43343.99
19994,C-32138,Cuc Verrett,M,19,3262.08,Low,Commercial associate,Laborers,Semi-Urban,61534.75,...,3.0,741.39,0,Inactive,893,3262.08,3,Semi-Urban,1,114653.08
19997,C-46479,Karlyn Mckinzie,M,19,2252.03,Low,Working,Core staff,Semi-Urban,18629.88,...,1.0,656.46,0,Inactive,270,2252.03,2,Rural,0,21566.27


In [16]:
# how='all' drops only the rows in which every value is null
df12 = df.dropna(how='all')
df12

Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.0,799.14,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.0,833.31,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
2,C-34590,Keva Godfrey,F,52,1266.27,Low,Working,,Semi-Urban,176684.91,...,3.0,627.44,0,Unpossessed,1,1266.27,1,Urban,1,300991.24
3,C-16668,Elva Sackett,M,65,1369.72,High,Pensioner,,Rural,97009.18,...,2.0,833.20,0,Inactive,730,1369.72,1,Semi-Urban,0,125612.1
4,C-12196,Sade Constable,F,60,1939.23,High,Pensioner,,Urban,109980.00,...,,,0,,356,1939.23,4,Semi-Urban,1,180908
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,C-9076,Tobias Davilla,F,19,1349.60,Low,Commercial associate,,Semi-Urban,156766.97,...,4.0,684.32,0,Inactive,681,1349.60,4,Semi-Urban,1,212778
19996,C-17587,Evelina Hodges,M,22,2019.78,Low,Working,Core staff,Urban,47924.80,...,,706.34,0,Inactive,213,2019.78,4,Urban,1,90816.95
19997,C-46479,Karlyn Mckinzie,M,19,2252.03,Low,Working,Core staff,Semi-Urban,18629.88,...,1.0,656.46,0,Inactive,270,2252.03,2,Rural,0,21566.27
19998,C-3099,Mariana Pulver,F,21,1845.35,Low,Working,,Semi-Urban,95430.73,...,2.0,865.46,0,Unpossessed,489,1845.35,1,Semi-Urban,1,120281.17


In [17]:
# To drop columns that contain null values instead of rows, set axis=1
df13 = df.dropna(axis=1)
df13

Unnamed: 0,Customer ID,Name,Age,Profession,Location,Loan Amount Request (USD),Expense Type 1,Expense Type 2,No. of Defaults,Property ID,Property Type,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,47,Commercial associate,Semi-Urban,137088.98,N,N,0,843,2,1,236644.5
1,C-35067,Jeannette Cha,57,Working,Rural,104771.59,Y,Y,0,22,1,1,142357.3
2,C-34590,Keva Godfrey,52,Working,Semi-Urban,176684.91,N,Y,0,1,1,1,300991.24
3,C-16668,Elva Sackett,65,Pensioner,Rural,97009.18,N,Y,0,730,1,0,125612.1
4,C-12196,Sade Constable,60,Pensioner,Urban,109980.00,N,N,0,356,4,1,180908
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,C-9076,Tobias Davilla,19,Commercial associate,Semi-Urban,156766.97,Y,Y,0,681,4,1,212778
19996,C-17587,Evelina Hodges,22,Working,Urban,47924.80,Y,Y,0,213,4,1,90816.95
19997,C-46479,Karlyn Mckinzie,19,Working,Semi-Urban,18629.88,Y,N,0,270,2,0,21566.27
19998,C-3099,Mariana Pulver,21,Working,Semi-Urban,95430.73,N,Y,0,489,1,1,120281.17


In [21]:
# Drop rows where specific columns have null values
df14 = df.dropna(subset=['Type of Employment', 'Dependents'])
df14

Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.0,799.14,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.0,833.31,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
5,C-2600,Gina Weir,F,59,2944.81,Low,Working,Sales staff,Semi-Urban,31465.78,...,2.0,620.58,0,Inactive,497,2944.81,1,Semi-Urban,0,51075.31
6,C-9047,Lacey Cybulski,M,43,1957.31,Low,Working,Sales staff,Rural,150334.11,...,2.0,731.37,0,Unpossessed,206,1957.31,4,Semi-Urban,1,232535.55
9,C-11606,Hellen Alexis,F,27,949.17,,Working,Laborers,Rural,24703.89,...,2.0,749.22,0,Active,260,949.17,2,Semi-Urban,1,41387.23
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19992,C-27000,Jackson Wheeless,M,18,2373.38,Low,Working,Security staff,Semi-Urban,160915.05,...,4.0,671.42,1,Active,236,2373.38,4,Rural,1,177353.14
19993,C-39756,Alethia Dively,M,25,2061.80,Low,Commercial associate,Cleaning staff,Semi-Urban,27815.85,...,2.0,861.95,0,Active,689,2061.80,3,Rural,1,43343.99
19994,C-32138,Cuc Verrett,M,19,3262.08,Low,Commercial associate,Laborers,Semi-Urban,61534.75,...,3.0,741.39,0,Inactive,893,3262.08,3,Semi-Urban,1,114653.08
19997,C-46479,Karlyn Mckinzie,M,19,2252.03,Low,Working,Core staff,Semi-Urban,18629.88,...,1.0,656.46,0,Inactive,270,2252.03,2,Rural,0,21566.27


In [31]:
# Keep only rows with at least 5 non-null values; rows with fewer are dropped
df15 = df.dropna(thresh=5)
df15

Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.0,799.14,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.0,833.31,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
2,C-34590,Keva Godfrey,F,52,1266.27,Low,Working,,Semi-Urban,176684.91,...,3.0,627.44,0,Unpossessed,1,1266.27,1,Urban,1,300991.24
3,C-16668,Elva Sackett,M,65,1369.72,High,Pensioner,,Rural,97009.18,...,2.0,833.20,0,Inactive,730,1369.72,1,Semi-Urban,0,125612.1
4,C-12196,Sade Constable,F,60,1939.23,High,Pensioner,,Urban,109980.00,...,,,0,,356,1939.23,4,Semi-Urban,1,180908
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,C-9076,Tobias Davilla,F,19,1349.60,Low,Commercial associate,,Semi-Urban,156766.97,...,4.0,684.32,0,Inactive,681,1349.60,4,Semi-Urban,1,212778
19996,C-17587,Evelina Hodges,M,22,2019.78,Low,Working,Core staff,Urban,47924.80,...,,706.34,0,Inactive,213,2019.78,4,Urban,1,90816.95
19997,C-46479,Karlyn Mckinzie,M,19,2252.03,Low,Working,Core staff,Semi-Urban,18629.88,...,1.0,656.46,0,Inactive,270,2252.03,2,Rural,0,21566.27
19998,C-3099,Mariana Pulver,F,21,1845.35,Low,Working,,Semi-Urban,95430.73,...,2.0,865.46,0,Unpossessed,489,1845.35,1,Semi-Urban,1,120281.17
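
The `thresh` parameter is easy to read backwards: it is the minimum number of non-null values a row needs in order to be kept. A small sketch with a hypothetical `toy` frame makes the direction explicit.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has 3 non-null values, row 1 has 2, row 2 has 0
toy = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, 6.0, np.nan],
})

kept_two = toy.dropna(thresh=2)    # rows 0 and 1 meet the threshold
kept_three = toy.dropna(thresh=3)  # only row 0 has three non-null values

print(list(kept_two.index), list(kept_three.index))
```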


## interpolate()
The interpolate() method in pandas fills missing values (NaN) in a DataFrame or Series by estimating them from the surrounding values.
Interpolation in pandas is primarily designed for numerical values, so non-numeric columns are generally left unfilled.
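
Since interpolation only makes sense for numbers, one option is to apply it to the numeric columns explicitly and leave the rest untouched. The following is a sketch under that assumption; the `toy` frame is hypothetical and reuses two column names from the loan data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one numeric and one text column, each with a null
toy = pd.DataFrame({
    "Credit Score": [800.0, np.nan, 700.0],
    "Type of Employment": ["Managers", None, "Laborers"],
})

# Pick out the numeric columns and interpolate only those
numeric_cols = toy.select_dtypes(include="number").columns
result = toy.copy()
result[numeric_cols] = result[numeric_cols].interpolate(method="linear")

print(result)
```

Linear interpolation places the missing score halfway between its neighbours (750), while the missing employment type stays null and needs a different strategy, such as the mode.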

In [34]:
# Interpolate missing values using linear interpolation
df16 = df.interpolate(method='linear')
df16



Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.0,799.14,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.0,833.31,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
2,C-34590,Keva Godfrey,F,52,1266.27,Low,Working,,Semi-Urban,176684.91,...,3.0,627.44,0,Unpossessed,1,1266.27,1,Urban,1,300991.24
3,C-16668,Elva Sackett,M,65,1369.72,High,Pensioner,,Rural,97009.18,...,2.0,833.20,0,Inactive,730,1369.72,1,Semi-Urban,0,125612.1
4,C-12196,Sade Constable,F,60,1939.23,High,Pensioner,,Urban,109980.00,...,2.0,726.89,0,,356,1939.23,4,Semi-Urban,1,180908
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,C-9076,Tobias Davilla,F,19,1349.60,Low,Commercial associate,,Semi-Urban,156766.97,...,4.0,684.32,0,Inactive,681,1349.60,4,Semi-Urban,1,212778
19996,C-17587,Evelina Hodges,M,22,2019.78,Low,Working,Core staff,Urban,47924.80,...,2.5,706.34,0,Inactive,213,2019.78,4,Urban,1,90816.95
19997,C-46479,Karlyn Mckinzie,M,19,2252.03,Low,Working,Core staff,Semi-Urban,18629.88,...,1.0,656.46,0,Inactive,270,2252.03,2,Rural,0,21566.27
19998,C-3099,Mariana Pulver,F,21,1845.35,Low,Working,,Semi-Urban,95430.73,...,2.0,865.46,0,Unpossessed,489,1845.35,1,Semi-Urban,1,120281.17


In [37]:
# Interpolate missing values using polynomial interpolation of degree 2
# Polynomial interpolation is useful when the relationship between the data points is non-linear
df17 = df.interpolate(method='polynomial', order=2)
df17



Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.000000,799.140000,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.000000,833.310000,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
2,C-34590,Keva Godfrey,F,52,1266.27,Low,Working,,Semi-Urban,176684.91,...,3.000000,627.440000,0,Unpossessed,1,1266.27,1,Urban,1,300991.24
3,C-16668,Elva Sackett,M,65,1369.72,High,Pensioner,,Rural,97009.18,...,2.000000,833.200000,0,Inactive,730,1369.72,1,Semi-Urban,0,125612.1
4,C-12196,Sade Constable,F,60,1939.23,High,Pensioner,,Urban,109980.00,...,1.711594,760.711195,0,,356,1939.23,4,Semi-Urban,1,180908
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,C-9076,Tobias Davilla,F,19,1349.60,Low,Commercial associate,,Semi-Urban,156766.97,...,4.000000,684.320000,0,Inactive,681,1349.60,4,Semi-Urban,1,212778
19996,C-17587,Evelina Hodges,M,22,2019.78,Low,Working,Core staff,Urban,47924.80,...,2.480533,706.340000,0,Inactive,213,2019.78,4,Urban,1,90816.95
19997,C-46479,Karlyn Mckinzie,M,19,2252.03,Low,Working,Core staff,Semi-Urban,18629.88,...,1.000000,656.460000,0,Inactive,270,2252.03,2,Rural,0,21566.27
19998,C-3099,Mariana Pulver,F,21,1845.35,Low,Working,,Semi-Urban,95430.73,...,2.000000,865.460000,0,Unpossessed,489,1845.35,1,Semi-Urban,1,120281.17
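
To see why the choice of method matters, here is a minimal sketch on a toy Series (hypothetical data, not taken from the loan dataset) whose known values lie on the curve y = x². It assumes SciPy is installed, since pandas delegates polynomial interpolation to it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the known values lie on y = x**2; index 3 is missing.
s = pd.Series([0.0, 1.0, 4.0, np.nan, 16.0, 25.0])

# Linear interpolation takes the midpoint of the two neighbours (4 and 16).
linear = s.interpolate(method="linear")

# Second-order polynomial interpolation follows the curvature (needs SciPy).
quadratic = s.interpolate(method="polynomial", order=2)

print("linear   :", linear[3])
print("quadratic:", quadratic[3])
```

The linear estimate is 10.0, while the quadratic estimate lands very close to 9, the value the curve would actually have at that point. On real data the underlying relationship is rarely known, so it is worth comparing both results before settling on one.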


In [38]:
# Fill missing values (numeric and categorical) with the most recent non-null value above them.
# ffill() is the current spelling of the deprecated interpolate(method='pad').
df19 = df.ffill()
df19



Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.0,799.14,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.0,833.31,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
2,C-34590,Keva Godfrey,F,52,1266.27,Low,Working,Sales staff,Semi-Urban,176684.91,...,3.0,627.44,0,Unpossessed,1,1266.27,1,Urban,1,300991.24
3,C-16668,Elva Sackett,M,65,1369.72,High,Pensioner,Sales staff,Rural,97009.18,...,2.0,833.20,0,Inactive,730,1369.72,1,Semi-Urban,0,125612.1
4,C-12196,Sade Constable,F,60,1939.23,High,Pensioner,Sales staff,Urban,109980.00,...,2.0,833.20,0,Inactive,356,1939.23,4,Semi-Urban,1,180908
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,C-9076,Tobias Davilla,F,19,1349.60,Low,Commercial associate,Laborers,Semi-Urban,156766.97,...,4.0,684.32,0,Inactive,681,1349.60,4,Semi-Urban,1,212778
19996,C-17587,Evelina Hodges,M,22,2019.78,Low,Working,Core staff,Urban,47924.80,...,4.0,706.34,0,Inactive,213,2019.78,4,Urban,1,90816.95
19997,C-46479,Karlyn Mckinzie,M,19,2252.03,Low,Working,Core staff,Semi-Urban,18629.88,...,1.0,656.46,0,Inactive,270,2252.03,2,Rural,0,21566.27
19998,C-3099,Mariana Pulver,F,21,1845.35,Low,Working,Core staff,Semi-Urban,95430.73,...,2.0,865.46,0,Unpossessed,489,1845.35,1,Semi-Urban,1,120281.17
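
Forward filling has two behaviours that are easy to miss: a missing value in the first row has nothing above it to copy, and long runs of missing values are all filled with the same stale value. The sketch below uses a small hypothetical DataFrame (the column names `job` and `score` are illustrative only) to show both, along with the `limit` parameter that caps how many consecutive gaps are filled.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in a text column and a numeric column.
toy = pd.DataFrame({
    "job": ["Managers", None, None, "Laborers"],
    "score": [np.nan, 700.0, np.nan, 650.0],
})

# Propagate the last seen value downwards in every column.
filled = toy.ffill()

# Fill at most one consecutive missing value per gap.
limited = toy.ffill(limit=1)

print(filled)
print(limited)
```

In `filled`, the first `score` stays missing because there is no earlier value to carry forward. In `limited`, only the first of the two consecutive missing `job` entries is filled.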


<h3>The replace() method</h3>
<p>The <b>replace()</b> method in pandas replaces values in a DataFrame or Series. It can replace specific values with other values either across the whole DataFrame or only within specific columns.</p>

In [33]:
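import numpy as np

# Replace every NaN in the DataFrame with 0.
# Note: this also writes 0 into text columns such as "Type of Employment".
df20 = df.replace(to_replace=np.nan, value=0)
df20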

Unnamed: 0,Customer ID,Name,Gender,Age,Income (USD),Income Stability,Profession,Type of Employment,Location,Loan Amount Request (USD),...,Dependents,Credit Score,No. of Defaults,Has Active Credit Card,Property ID,Property Age,Property Type,Property Location,Co-Applicant,Property Price
0,C-26247,Tandra Olszewski,F,47,3472.69,Low,Commercial associate,Managers,Semi-Urban,137088.98,...,2.0,799.14,0,Unpossessed,843,3472.69,2,Urban,1,236644.5
1,C-35067,Jeannette Cha,F,57,1184.84,Low,Working,Sales staff,Rural,104771.59,...,2.0,833.31,0,Unpossessed,22,1184.84,1,Rural,1,142357.3
2,C-34590,Keva Godfrey,F,52,1266.27,Low,Working,0,Semi-Urban,176684.91,...,3.0,627.44,0,Unpossessed,1,1266.27,1,Urban,1,300991.24
3,C-16668,Elva Sackett,M,65,1369.72,High,Pensioner,0,Rural,97009.18,...,2.0,833.20,0,Inactive,730,1369.72,1,Semi-Urban,0,125612.1
4,C-12196,Sade Constable,F,60,1939.23,High,Pensioner,0,Urban,109980.00,...,0.0,0.00,0,0,356,1939.23,4,Semi-Urban,1,180908
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19995,C-9076,Tobias Davilla,F,19,1349.60,Low,Commercial associate,0,Semi-Urban,156766.97,...,4.0,684.32,0,Inactive,681,1349.60,4,Semi-Urban,1,212778
19996,C-17587,Evelina Hodges,M,22,2019.78,Low,Working,Core staff,Urban,47924.80,...,0.0,706.34,0,Inactive,213,2019.78,4,Urban,1,90816.95
19997,C-46479,Karlyn Mckinzie,M,19,2252.03,Low,Working,Core staff,Semi-Urban,18629.88,...,1.0,656.46,0,Inactive,270,2252.03,2,Rural,0,21566.27
19998,C-3099,Mariana Pulver,F,21,1845.35,Low,Working,0,Semi-Urban,95430.73,...,2.0,865.46,0,Unpossessed,489,1845.35,1,Semi-Urban,1,120281.17
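
replace() also works in the opposite direction: datasets often encode missing data with placeholder values, and these can be converted to NaN so that the other null-handling methods recognise them. The sketch below is a minimal illustration on a hypothetical DataFrame; the placeholders `"?"` and `-999.0` and the column names are assumptions made for the example, not values from the loan dataset. It contrasts a global replacement with a column-specific one, which uses a nested dictionary of the form `{column: {old: new}}`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame that uses "?" and -999.0 as missing-value placeholders.
toy = pd.DataFrame({
    "job": ["Managers", "?", "Laborers"],
    "score": [700.0, -999.0, 650.0],
})

# Global replacement: every "?" in any column becomes NaN.
everywhere = toy.replace("?", np.nan)

# Column-specific replacement: each column gets its own old -> new mapping.
per_column = toy.replace({"job": {"?": np.nan}, "score": {-999.0: np.nan}})

print(everywhere)
print(per_column)
```

The global call leaves `-999.0` in place because only `"?"` was requested, whereas the nested dictionary handles each column's placeholder separately.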
