### For complete notes and videos, follow: https://bhaarathi-ai.blogspot.com/

### Deleting Missing values

<h2 style="color:blue; font-weight:bold;">Importing Required Libraries and Loading the Dataset</h2>

In [1]:
# Importing required libraries and loading the dataset
import pandas as pd
import numpy as np

data = {
    'Name': ['Venky', 'Ram', 'Sandeep', 'Syam', 'Hema', 'Priya'],
    'Math': [90, 85, np.nan, 92, np.nan, 88],
    'Science': [78, np.nan, 88, 92, 80, 85],
    'English': [85, 90, 92, np.nan, 88, 82]
}

df = pd.DataFrame(data)
df

      Name  Math  Science  English
0    Venky  90.0     78.0     85.0
1      Ram  85.0      NaN     90.0
2  Sandeep   NaN     88.0     92.0
3     Syam  92.0     92.0      NaN
4     Hema   NaN     80.0     88.0
5    Priya  88.0     85.0     82.0


<h2 style="color:blue; font-weight:bold;">1. Removing columns with any missing values</h2>

In [2]:
# Removing columns with any missing values
df_removed_cols = df.dropna(axis=1, how='any')
print("After Removing Columns with Any Missing Values:")
print(df_removed_cols)

After Removing Columns with Any Missing Values:
      Name
0    Venky
1      Ram
2  Sandeep
3     Syam
4     Hema
5    Priya
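To see why only `Name` survives, it helps to count the missing values in each column first. A minimal sketch that rebuilds the sample data from cell [1]; the names `missing_per_column` and `kept` are illustrative:

```python
import numpy as np
import pandas as pd

# Same sample data as in cell [1]
df = pd.DataFrame({
    'Name': ['Venky', 'Ram', 'Sandeep', 'Syam', 'Hema', 'Priya'],
    'Math': [90, 85, np.nan, 92, np.nan, 88],
    'Science': [78, np.nan, 88, 92, 80, 85],
    'English': [85, 90, 92, np.nan, 88, 82],
})

# Number of NaN values in each column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# dropna(axis=1, how='any') keeps only the columns whose count is zero
kept = missing_per_column[missing_per_column == 0].index.tolist()
print(kept)  # ['Name']
```

Every score column has at least one NaN, so all three are removed.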


<h2 style="color:blue; font-weight:bold;">2. Removing rows with missing values</h2>

In [3]:
# Removing rows with missing values
df_removed_rows = df.dropna()
print("\nAfter Removing Rows with Missing Values:")
print(df_removed_rows)


After Removing Rows with Missing Values:
    Name  Math  Science  English
0  Venky  90.0     78.0     85.0
5  Priya  88.0     85.0     82.0
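`dropna()` with no arguments keeps only the rows in which every value is present. The same result can be built by hand with a boolean mask, which makes the rule explicit. A sketch on a fresh copy of the sample data; `complete_rows` and `df_masked` are illustrative names:

```python
import numpy as np
import pandas as pd

# Same sample data as in cell [1]
df = pd.DataFrame({
    'Name': ['Venky', 'Ram', 'Sandeep', 'Syam', 'Hema', 'Priya'],
    'Math': [90, 85, np.nan, 92, np.nan, 88],
    'Science': [78, np.nan, 88, 92, 80, 85],
    'English': [85, 90, 92, np.nan, 88, 82],
})

# Boolean mask: True for rows in which every column is non-missing
complete_rows = df.notna().all(axis=1)
df_masked = df[complete_rows]

print(df_masked.index.tolist())  # [0, 5]
print(df_masked.equals(df.dropna()))  # True
```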


<h2 style="color:blue; font-weight:bold;">3. Removing a column if even a single missing value is present in it</h2>

In [4]:
# Removing a column if even a single missing value is present in it
df_removed_col_any_missing = df.dropna(axis=1, how='any')
print("\nAfter Removing Columns with Any Single Missing Value:")
print(df_removed_col_any_missing)


After Removing Columns with Any Single Missing Value:
      Name
0    Venky
1      Ram
2  Sandeep
3     Syam
4     Hema
5    Priya
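Since this cell repeats cell [2], a quick way to confirm exactly what was removed is to compare the column labels before and after. This sketch, on a fresh copy of the sample data, also uses the string alias `axis='columns'`, which pandas accepts in place of `axis=1`:

```python
import numpy as np
import pandas as pd

# Same sample data as in cell [1]
df = pd.DataFrame({
    'Name': ['Venky', 'Ram', 'Sandeep', 'Syam', 'Hema', 'Priya'],
    'Math': [90, 85, np.nan, 92, np.nan, 88],
    'Science': [78, np.nan, 88, 92, 80, 85],
    'English': [85, 90, 92, np.nan, 88, 82],
})

df_removed_col_any_missing = df.dropna(axis=1, how='any')

# Columns present before dropna() but not after it
dropped = [col for col in df.columns if col not in df_removed_col_any_missing.columns]
print(dropped)  # ['Math', 'Science', 'English']

# axis='columns' is an equivalent spelling of axis=1
print(df.dropna(axis='columns', how='any').equals(df_removed_col_any_missing))  # True
```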


<h2 style="color:blue; font-weight:bold;">4. Removing a row if even a single missing value is present in it</h2>

In [5]:
# Removing a row if even a single missing value is present in it
df_removed_row_any_missing = df.dropna(axis=0, how='any')
print("\nAfter Removing Rows with Any Single Missing Value:")
print(df_removed_row_any_missing)


After Removing Rows with Any Single Missing Value:
    Name  Math  Science  English
0  Venky  90.0     78.0     85.0
5  Priya  88.0     85.0     82.0
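"Any single missing value" is literal: in this data each of rows 1 to 4 has exactly one NaN, and that one value is enough to remove the whole row. Counting NaNs per row on a fresh copy of the sample data shows this; `nan_per_row` and `kept_rows` are illustrative names:

```python
import numpy as np
import pandas as pd

# Same sample data as in cell [1]
df = pd.DataFrame({
    'Name': ['Venky', 'Ram', 'Sandeep', 'Syam', 'Hema', 'Priya'],
    'Math': [90, 85, np.nan, 92, np.nan, 88],
    'Science': [78, np.nan, 88, 92, 80, 85],
    'English': [85, 90, 92, np.nan, 88, 82],
})

# Count missing values in each row
nan_per_row = df.isnull().sum(axis=1)
print(nan_per_row.tolist())  # [0, 1, 1, 1, 1, 0]

# Rows with a count of zero are the ones dropna(axis=0, how='any') keeps
kept_rows = nan_per_row[nan_per_row == 0].index.tolist()
print(kept_rows)  # [0, 5]
```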


<h2 style="color:blue; font-weight:bold;">5. Removing a column only when all of its values are missing</h2>

In [6]:
# Removing a column only when all of its values are missing
df_removed_col_all_missing = df.dropna(axis=1, how='all')
print("\nAfter Removing Columns with All Missing Values:")
print(df_removed_col_all_missing)


After Removing Columns with All Missing Values:
      Name  Math  Science  English
0    Venky  90.0     78.0     85.0
1      Ram  85.0      NaN     90.0
2  Sandeep   NaN     88.0     92.0
3     Syam  92.0     92.0      NaN
4     Hema   NaN     80.0     88.0
5    Priya  88.0     85.0     82.0
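Nothing is removed here because no column in the sample data is entirely empty. To watch `how='all'` actually drop something, this sketch adds a hypothetical `Art` column in which no scores were recorded (the column is an assumption, not part of the original data):

```python
import numpy as np
import pandas as pd

# Same sample data as in cell [1]
df = pd.DataFrame({
    'Name': ['Venky', 'Ram', 'Sandeep', 'Syam', 'Hema', 'Priya'],
    'Math': [90, 85, np.nan, 92, np.nan, 88],
    'Science': [78, np.nan, 88, 92, 80, 85],
    'English': [85, 90, 92, np.nan, 88, 82],
})

df_with_empty = df.copy()
df_with_empty['Art'] = np.nan  # hypothetical column: every value missing

# Only 'Art' is entirely NaN, so it is the only column removed
result = df_with_empty.dropna(axis=1, how='all')
print(result.columns.tolist())  # ['Name', 'Math', 'Science', 'English']
```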


<h2 style="color:blue; font-weight:bold;">6. Removing a row only when all of its values are missing</h2>

In [7]:
# Removing a row only when all of its values are missing
df_removed_row_all_missing = df.dropna(axis=0, how='all')
print("\nAfter Removing Rows with All Missing Values:")
print(df_removed_row_all_missing)


After Removing Rows with All Missing Values:
      Name  Math  Science  English
0    Venky  90.0     78.0     85.0
1      Ram  85.0      NaN     90.0
2  Sandeep   NaN     88.0     92.0
3     Syam  92.0     92.0      NaN
4     Hema   NaN     80.0     88.0
5    Priya  88.0     85.0     82.0
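Again nothing changes, and with this data it never could: `Name` is filled in for every student, so `how='all'` across all columns never sees a completely empty row. The sketch below adds a hypothetical student, `Kiran`, who has no scores, and uses the `subset` parameter of `dropna()` to restrict the check to the score columns:

```python
import numpy as np
import pandas as pd

# Same sample data as in cell [1]
df = pd.DataFrame({
    'Name': ['Venky', 'Ram', 'Sandeep', 'Syam', 'Hema', 'Priya'],
    'Math': [90, 85, np.nan, 92, np.nan, 88],
    'Science': [78, np.nan, 88, 92, 80, 85],
    'English': [85, 90, 92, np.nan, 88, 82],
})

df_with_blank = df.copy()
# Hypothetical extra row: a name but no scores at all
df_with_blank.loc[len(df_with_blank)] = ['Kiran', np.nan, np.nan, np.nan]

# 'Name' is never missing, so how='all' across every column keeps all 7 rows
print(len(df_with_blank.dropna(axis=0, how='all')))  # 7

# Checking only the score columns removes Kiran's empty row
score_cols = ['Math', 'Science', 'English']
scores_only = df_with_blank.dropna(axis=0, how='all', subset=score_cols)
print(len(scores_only))  # 6
```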


<h2 style="color:blue; font-weight:bold;">7. Drop the 'Math' column (which contains missing values) by name</h2>

In [8]:
# Drop the 'Math' column by name; drop() removes it without checking for missing values
df_without_math = df.drop('Math', axis=1)
print("After Removing 'Math' Column:")
print(df_without_math)

After Removing 'Math' Column:
      Name  Science  English
0    Venky     78.0     85.0
1      Ram      NaN     90.0
2  Sandeep     88.0     92.0
3     Syam     92.0      NaN
4     Hema     80.0     88.0
5    Priya     85.0     82.0
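`drop()` removes the named column whether or not it contains missing values. The same call can be written with the `columns=` keyword, and when a label might not exist, `errors='ignore'` avoids the `KeyError` that `drop()` raises by default. `History` below is a hypothetical column name that is not in the data:

```python
import numpy as np
import pandas as pd

# Same sample data as in cell [1]
df = pd.DataFrame({
    'Name': ['Venky', 'Ram', 'Sandeep', 'Syam', 'Hema', 'Priya'],
    'Math': [90, 85, np.nan, 92, np.nan, 88],
    'Science': [78, np.nan, 88, 92, 80, 85],
    'English': [85, 90, 92, np.nan, 88, 82],
})

# Two equivalent ways to drop a column by name
by_axis = df.drop('Math', axis=1)
by_keyword = df.drop(columns='Math')
print(by_axis.equals(by_keyword))  # True

# Unknown labels raise KeyError by default; errors='ignore' skips them instead
unchanged = df.drop(columns='History', errors='ignore')
print(unchanged.columns.tolist())  # ['Name', 'Math', 'Science', 'English']
```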


<h2 style="color:blue; font-weight:bold;">8. Drop multiple columns, 'Science' and 'Math', which contain missing values</h2>

In [9]:
# Drop the 'Science' and 'Math' columns by name (both contain missing values);
# drop() itself does not check for NaN
df_without_missing_cols = df.drop(['Science', 'Math'], axis=1)
print("After Removing 'Science' and 'Math' Columns with Missing Values:")
print(df_without_missing_cols)

After Removing 'Science' and 'Math' Columns with Missing Values:
      Name  English
0    Venky     85.0
1      Ram     90.0
2  Sandeep     92.0
3     Syam      NaN
4     Hema     88.0
5    Priya     82.0
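Because `drop()` does not inspect the data, any check for missing values has to happen before calling it. One way to make that check explicit is to keep only the candidate columns that actually contain a missing value. A sketch on a fresh copy of the sample data; `candidates` and `to_drop` are illustrative names:

```python
import numpy as np
import pandas as pd

# Same sample data as in cell [1]
df = pd.DataFrame({
    'Name': ['Venky', 'Ram', 'Sandeep', 'Syam', 'Hema', 'Priya'],
    'Math': [90, 85, np.nan, 92, np.nan, 88],
    'Science': [78, np.nan, 88, 92, 80, 85],
    'English': [85, 90, 92, np.nan, 88, 82],
})

# Candidate columns; only those that really contain NaN are dropped
candidates = ['Science', 'Math']
to_drop = [col for col in candidates if df[col].isnull().any()]
df_checked = df.drop(columns=to_drop)

print(to_drop)  # ['Science', 'Math']
print(df_checked.columns.tolist())  # ['Name', 'English']
```

Unlike an all-or-nothing check, this drops each candidate independently, so a column without NaN would survive.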


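<h2 style="color:blue; font-weight:bold;">9. Using a for-loop to drop rows with missing values</h2>

In [10]:
# Iterate over rows and drop every row that has at least one missing value.
# iterrows() keeps walking the original DataFrame while the name df is rebound
# to each new, smaller copy, so the loop is safe; note that df itself is
# overwritten, so the cells below work on the reduced two-row DataFrame.
for index, row in df.iterrows():
    if row.isnull().any():
        df = df.drop(index)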

# Display the modified dataset after dropping rows with missing values
print("\nDataset After Dropping Rows with Missing Values:")
print(df)


Dataset After Dropping Rows with Missing Values:
    Name  Math  Science  English
0  Venky  90.0     78.0     85.0
5  Priya  88.0     85.0     82.0
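The loop rebinds `df`, so the original six rows are gone for the rest of the notebook. A variant that collects the row labels first and leaves the input untouched is sketched below on a fresh copy of the sample data; its result should match `dropna()`:

```python
import numpy as np
import pandas as pd

# Same sample data as in cell [1]
df = pd.DataFrame({
    'Name': ['Venky', 'Ram', 'Sandeep', 'Syam', 'Hema', 'Priya'],
    'Math': [90, 85, np.nan, 92, np.nan, 88],
    'Science': [78, np.nan, 88, 92, 80, 85],
    'English': [85, 90, 92, np.nan, 88, 82],
})

# Collect the labels of rows that contain at least one NaN
rows_to_drop = [index for index, row in df.iterrows() if row.isnull().any()]
df_clean = df.drop(index=rows_to_drop)  # returns a new DataFrame; df is unchanged

print(rows_to_drop)  # [1, 2, 3, 4]
print(len(df), len(df_clean))  # 6 2
print(df_clean.equals(df.dropna()))  # True
```

Dropping all labels in one call also avoids building a new DataFrame on every loop iteration.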


<h2 style="color:blue; font-weight:bold;">10. Using a lambda expression to drop rows with missing values</h2>

In [11]:
# Use a lambda to build a row mask (True where no value is missing), then keep only those rows
df_without_missing_rows = df[df.apply(lambda row: row.notnull().all(), axis=1)]
print("After Dropping Rows with Missing Values:")
print(df_without_missing_rows)

After Dropping Rows with Missing Values:
    Name  Math  Science  English
0  Venky  90.0     78.0     85.0
5  Priya  88.0     85.0     82.0
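Because `df` was already reduced to complete rows in cell [10], this cell has nothing left to remove. Running a lambda-based row filter on a fresh copy of the original data shows it doing the work: `apply(..., axis=1)` returns one boolean per row, and that mask selects the complete rows. A minimal sketch; `has_no_missing` is an illustrative name:

```python
import numpy as np
import pandas as pd

# Same sample data as in cell [1]
df = pd.DataFrame({
    'Name': ['Venky', 'Ram', 'Sandeep', 'Syam', 'Hema', 'Priya'],
    'Math': [90, 85, np.nan, 92, np.nan, 88],
    'Science': [78, np.nan, 88, 92, 80, 85],
    'English': [85, 90, 92, np.nan, 88, 82],
})

# One boolean per row: True when none of the row's values is missing
has_no_missing = df.apply(lambda row: row.notnull().all(), axis=1)
df_without_missing_rows = df[has_no_missing]

print(has_no_missing.tolist())  # [True, False, False, False, False, True]
print(df_without_missing_rows.index.tolist())  # [0, 5]
```

Note that a lambda which merely calls `dropna()` on each row removes NaN cells from that row's Series, not the row itself; pandas then realigns the results and the NaNs reappear.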


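<h2 style="color:blue; font-weight:bold;">11. Drop a single column with an "if" condition if it contains any missing values</h2>

In [12]:
# Drop 'Math' only if it has at least one missing value.
# df was reduced to complete rows in cell [10], so 'Math' has no NaN here and is kept.
if df['Math'].isnull().any():
    df.drop('Math', axis=1, inplace=True)
print(df)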

    Name  Math  Science  English
0  Venky  90.0     78.0     85.0
5  Priya  88.0     85.0     82.0
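The output still shows `Math` because, after cell [10], that column has no missing values left. Applied to a fresh copy of the original data, where `Math` has two NaNs, the same condition does drop it:

```python
import numpy as np
import pandas as pd

# Same sample data as in cell [1]
df = pd.DataFrame({
    'Name': ['Venky', 'Ram', 'Sandeep', 'Syam', 'Hema', 'Priya'],
    'Math': [90, 85, np.nan, 92, np.nan, 88],
    'Science': [78, np.nan, 88, 92, 80, 85],
    'English': [85, 90, 92, np.nan, 88, 82],
})

# 'Math' has NaN in rows 2 and 4, so the condition is True
if df['Math'].isnull().any():
    df.drop('Math', axis=1, inplace=True)

print(df.columns.tolist())  # ['Name', 'Science', 'English']
```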


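<h2 style="color:blue; font-weight:bold;">12. Drop selected columns with an "if" condition if they contain any missing values</h2>

In [13]:
columns_to_check = ['Math', 'Science']

# isnull().any() gives one flag per column; the second .any() collapses them into one.
# Both columns are dropped if either has a missing value (here neither does,
# since df holds only the two complete rows).
if df[columns_to_check].isnull().any().any():
    df.drop(columns_to_check, axis=1, inplace=True)

print(df)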

    Name  Math  Science  English
0  Venky  90.0     78.0     85.0
5  Priya  88.0     85.0     82.0
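On a fresh copy of the original data the condition is met, and both columns go. Splitting the double `.any()` into two steps makes the per-column flags visible; `per_column` is an illustrative name:

```python
import numpy as np
import pandas as pd

# Same sample data as in cell [1]
df = pd.DataFrame({
    'Name': ['Venky', 'Ram', 'Sandeep', 'Syam', 'Hema', 'Priya'],
    'Math': [90, 85, np.nan, 92, np.nan, 88],
    'Science': [78, np.nan, 88, 92, 80, 85],
    'English': [85, 90, 92, np.nan, 88, 82],
})

columns_to_check = ['Math', 'Science']

# One flag per column, then a single flag for the whole selection
per_column = df[columns_to_check].isnull().any()
print(per_column.tolist())  # [True, True]

if per_column.any():
    df = df.drop(columns=columns_to_check)

print(df.columns.tolist())  # ['Name', 'English']
```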


<h2 style="color:blue; font-weight:bold;">Conclusion:</h2>

Managing missing values is a critical step in data preprocessing. Whether to delete or impute depends on the nature and extent of the missing data, the underlying data distribution, and the goals of the analysis or modeling task. The examples above show how quickly deletion can discard information: with the default `how='any'`, `dropna()` kept only 2 of the 6 rows, and `dropna(axis=1)` kept only the `Name` column. The aim is to preserve as much valuable information as possible while limiting the impact of missing values on downstream tasks. Understanding why values are missing, combined with domain knowledge, leads to a more context-specific approach to handling them. Finally, documenting the chosen strategy transparently makes data analyses and machine learning models more reproducible and reliable.
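Measuring the extent of missing data is a reasonable first input to the delete-or-impute decision. A minimal sketch on the sample data that reports the share of missing values per column and how many rows a plain `dropna()` would discard; what threshold should trigger deletion versus imputation is left to the analyst and is not prescribed here:

```python
import numpy as np
import pandas as pd

# Same sample data as in cell [1]
df = pd.DataFrame({
    'Name': ['Venky', 'Ram', 'Sandeep', 'Syam', 'Hema', 'Priya'],
    'Math': [90, 85, np.nan, 92, np.nan, 88],
    'Science': [78, np.nan, 88, 92, 80, 85],
    'English': [85, 90, 92, np.nan, 88, 82],
})

# Fraction of missing values in each column (mean of the True/False mask)
missing_share = df.isnull().mean()
print(missing_share.round(3))

# Rows that a row-wise dropna() would discard
rows_lost = len(df) - len(df.dropna())
print(f"dropna() would remove {rows_lost} of {len(df)} rows")  # dropna() would remove 4 of 6 rows
```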