<a href="https://colab.research.google.com/github/umiSirya/General-Data-analysis/blob/main/MNAR.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

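### Missing Not At Random (MNAR)
When data is **missing not at random** (MNAR), the probability that a value is missing depends on that unobserved value itself. MNAR is hard to detect because the values that would reveal the pattern are exactly the ones that are missing. Analyses that ignore the mechanism, such as dropping incomplete rows or imputing from the observed values alone, can therefore produce biased estimates.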
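For reference, the standard (Rubin) taxonomy can be written formally. Let $R$ indicate which entries are missing, and split the data $Y$ into an observed part $Y_{\text{obs}}$ and a missing part $Y_{\text{mis}}$:

$$
\begin{aligned}
\text{MCAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R) \\
\text{MAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R \mid Y_{\text{obs}}) \\
\text{MNAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) \text{ depends on } Y_{\text{mis}}
\end{aligned}
$$

Only the MNAR case involves $Y_{\text{mis}}$, which is why it cannot be confirmed or ruled out from the observed data alone.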

For example:

1. On a health survey, illicit drug users are less likely to answer a question about illicit drug use.
2. When asked their age, older respondents are more likely to leave the question blank.
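A small simulation of the second example can show how MNAR biases a naive summary. The age range, the 10,000-respondent sample, and the linear skip probability (about 5% at age 18 rising to 60% at age 80) are assumptions chosen for illustration, not real survey data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical population: 10,000 respondents with integer ages from 18 to 80
true_age = rng.integers(18, 81, size=10_000)

# Assumed MNAR mechanism: the chance of skipping the question rises
# linearly with the respondent's own (unobserved) age
p_missing = 0.05 + 0.55 * (true_age - 18) / (80 - 18)
missing = rng.random(true_age.size) < p_missing

# What the analyst actually sees: NaN wherever the question was skipped
observed_age = np.where(missing, np.nan, true_age)

print(f"True mean age:     {true_age.mean():.1f}")
print(f"Observed mean age: {np.nanmean(observed_age):.1f}")
print(f"Share missing:     {missing.mean():.1%}")
```

Because older respondents drop out more often, the mean of the observed ages understates the true mean, and nothing in `observed_age` alone reveals that.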

```python
import pandas as pd
import numpy as np

# Sample DataFrame with missing values that may be MNAR
data = {
    'Age': [25, 30, np.nan, 45, 50],
    'Income': [50000, 60000, np.nan, np.nan, 75000],
    'Marital_Status': ['Single', 'Married', 'Single', np.nan, 'Married']
}
df = pd.DataFrame(data)
print("Original DataFrame with missing values:")
print(df)

# Strategy 1: Flag missing values with an indicator column.
# This keeps the missingness pattern available to later analysis or models,
# which matters under MNAR because the fact of missingness carries information.
df['Income_Missing'] = df['Income'].isnull().astype(int)
print("\nDataFrame after adding an indicator column for missing Income values:")
print(df)

# Strategy 2: Impute under an explicit assumption.
# Median imputation treats missing incomes as if they resembled the observed ones,
# which is NOT an MNAR-aware assumption; it is shown here only as a simple baseline.
# Assign the result back rather than calling fillna(..., inplace=True) on a column selection.
df['Income'] = df['Income'].fillna(df['Income'].median())
print("\nDataFrame after imputing missing Income values with median:")
print(df)
```

```
Original DataFrame with missing values:
    Age   Income Marital_Status
0  25.0  50000.0         Single
1  30.0  60000.0        Married
2   NaN      NaN         Single
3  45.0      NaN            NaN
4  50.0  75000.0        Married

DataFrame after adding an indicator column for missing Income values:
    Age   Income Marital_Status  Income_Missing
0  25.0  50000.0         Single               0
1  30.0  60000.0        Married               0
2   NaN      NaN         Single               1
3  45.0      NaN            NaN               1
4  50.0  75000.0        Married               0

DataFrame after imputing missing Income values with median:
    Age   Income Marital_Status  Income_Missing
0  25.0  50000.0         Single               0
1  30.0  60000.0        Married               0
2   NaN  60000.0         Single               1
3  45.0  60000.0            NaN               1
4  50.0  75000.0        Married               0
```
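One thing the indicator column allows is a quick diagnostic: compare an observed variable across the missing and non-missing groups. A minimal sketch, rebuilding the sample DataFrame so it runs on its own. A difference suggests the missingness is related to observed data (so it is not MCAR), but no such check can prove MNAR, because the deciding values are unobserved; with only five rows this is purely illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample DataFrame and the indicator column from the cell above
df = pd.DataFrame({
    'Age': [25, 30, np.nan, 45, 50],
    'Income': [50000, 60000, np.nan, np.nan, 75000],
    'Marital_Status': ['Single', 'Married', 'Single', np.nan, 'Married'],
})
df['Income_Missing'] = df['Income'].isnull().astype(int)

# Mean Age in each missingness group; groupby().mean() skips NaN ages,
# so each group's mean uses only the ages that are known
age_by_group = df.groupby('Income_Missing')['Age'].mean()
print(age_by_group)
```

Here the rows with missing income have a higher known age on average (45.0 versus 35.0).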


Note: recent pandas 2.x versions emit a `FutureWarning` for `df['Income'].fillna(df['Income'].median(), inplace=True)`. This is chained assignment: `df['Income']` is an intermediate object, and from pandas 3.0 (Copy-on-Write) it always behaves as a copy, so the in-place fill will never reach `df`. To modify the original DataFrame, use `df.fillna({'Income': value}, inplace=True)` or `df['Income'] = df['Income'].fillna(value)` instead.
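Since median imputation implicitly assumes the missing incomes look like the observed ones, one common way to probe an MNAR assumption is a delta-adjustment (pattern-mixture style) sensitivity analysis: impute, then shift only the imputed values by a range of offsets and watch how the estimate moves. A minimal sketch, assuming single median imputation for simplicity (in practice this is usually combined with multiple imputation); the `deltas` values are hypothetical offsets, not estimates.

```python
import numpy as np
import pandas as pd

income = pd.Series([50000, 60000, np.nan, np.nan, 75000], name='Income')
is_missing = income.isna()
median = income.median()  # computed from the observed values only: 60000.0

# Hypothetical offsets: how much higher we suppose the unobserved incomes
# are than the median imputation (0 reproduces plain median imputation)
deltas = [0, 5000, 10000, 20000]

results = {}
for delta in deltas:
    # Impute with the median, then shift only the imputed entries by delta
    adjusted = income.fillna(median)
    adjusted.loc[is_missing] += delta
    results[delta] = adjusted.mean()
    print(f"delta={delta:>6}: mean income = {results[delta]:,.0f}")
```

With two of five values imputed, each extra unit of `delta` raises the mean by 0.4 units. If conclusions change over plausible values of `delta`, they depend heavily on the unverifiable MNAR assumption.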


https://www.youtube.com/watch?v=rDZWtgOH124