**Data Processing Funnel**

Raw Data -> Handle Missing Values -> Encode Categorical Values -> Feature Scaling -> Split Data -> Prepared Data
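
As a rough illustration of the funnel, the steps can be sketched end to end with pandas alone. This is only a minimal sketch under assumptions: the `City` column, the min-max scaling, and the 80/20 random split are illustrative choices that do not come from this notebook, and the steps follow the order shown above. In practice, many workflows compute the fill and scaling statistics on the training split only.

```python
import pandas as pd

# Hypothetical toy data: the 'City' column is invented here so that the
# encoding step has something to act on.
raw = pd.DataFrame({
    'Age': [25, None, 44, 23, None, 31],
    'Salary': [50000, 60000, 70000, None, None, 52000],
    'City': ['Pune', 'Delhi', 'Pune', 'Mumbai', 'Delhi', 'Mumbai'],
})

# 1. Handle missing values: fill numeric gaps with the column mean.
num_cols = ['Age', 'Salary']
filled = raw.copy()
filled[num_cols] = filled[num_cols].fillna(filled[num_cols].mean())

# 2. Encode categorical values: one-hot encode 'City'.
encoded = pd.get_dummies(filled, columns=['City'])

# 3. Feature scaling: min-max scale the numeric columns to [0, 1].
col_min = encoded[num_cols].min()
col_max = encoded[num_cols].max()
encoded[num_cols] = (encoded[num_cols] - col_min) / (col_max - col_min)

# 4. Split data: a simple random 80/20 split by row.
train = encoded.sample(frac=0.8, random_state=0)
test = encoded.drop(train.index)

print(train.shape, test.shape)
```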

In [14]:
import pandas as pd

data = {
    'Name' : ['Pavan','Kapil','Lalit','Ishan','Om'],
    'Age' : [25,None,44,23,None],
    'Salary' : [50000,60000,70000,None,None]
}

df = pd.DataFrame(data)
print("Original Dataframe")
print(df)

Original Dataframe
    Name   Age   Salary
0  Pavan  25.0  50000.0
1  Kapil   NaN  60000.0
2  Lalit  44.0  70000.0
3  Ishan  23.0      NaN
4     Om   NaN      NaN


In [15]:
print(df.isnull().sum())
df_drop = df.dropna()
print(df_drop)

Name      0
Age       2
Salary    2
dtype: int64
    Name   Age   Salary
0  Pavan  25.0  50000.0
2  Lalit  44.0  70000.0


In [10]:
df['Age'].fillna(df['Age'].mean(), inplace=True)
df['Salary'].fillna(df['Salary'].mean(), inplace=True)

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df['Age'].fillna(df['Age'].mean(), inplace=True)

  df['Salary'].fillna(df['Salary'].mean(), inplace=True)


When you select a column:

df['Age']


pandas may return either a view or a copy.
If it is a copy, then:

df['Age'].fillna(..., inplace=True)


updates the copy, not the original DataFrame.

That is unsafe and unpredictable, so pandas is changing this behavior: as the warning says, from pandas 3.0 the selected column always behaves as a copy, and this pattern never modifies the original.
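
To see what "updating the copy" means, the sketch below makes the copy explicit with `.copy()`, so the result does not depend on the pandas version. The names `df_demo` and `age_copy` are made up for this illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({'Age': [25, None, 44, 23, None]})

# Make the copy explicit so the outcome does not depend on the pandas version.
age_copy = df_demo['Age'].copy()
age_copy.fillna(age_copy.mean(), inplace=True)

print(age_copy.isnull().sum())        # missing values left in the copy
print(df_demo['Age'].isnull().sum())  # missing values left in the DataFrame
```

The copy ends up fully filled, while the DataFrame still holds its two missing ages.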

Correct Way (Recommended)

Instead of calling fillna(..., inplace=True) on a selected column, assign the result back to the column:

In [None]:
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())
print(df)

# OR fill both columns in one call on the DataFrame itself:
# df.fillna({
#     'Age': df['Age'].mean(),
#     'Salary': df['Salary'].mean()
# }, inplace=True)


    Name        Age   Salary
0  Pavan  25.000000  50000.0
1  Kapil  30.666667  60000.0
2  Lalit  44.000000  70000.0
3  Ishan  23.000000  60000.0
4     Om  30.666667  60000.0


To check what percentage of each column is missing (0% everywhere here, because the gaps were already filled):

In [17]:
nullamount = df.isnull().mean()*100
print(nullamount)

Name      0.0
Age       0.0
Salary    0.0
dtype: float64
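
For comparison, here is a sketch of the same check on the data before any filling. It rebuilds the original frame under the assumed name `df_raw`. Because `isnull()` yields booleans, their mean is the fraction of missing cells per column, so `Age` and `Salary` should each come out at about 40% (2 of 5 rows).

```python
import pandas as pd

df_raw = pd.DataFrame({
    'Name': ['Pavan', 'Kapil', 'Lalit', 'Ishan', 'Om'],
    'Age': [25, None, 44, 23, None],
    'Salary': [50000, 60000, 70000, None, None],
})

# isnull() gives True/False per cell; the mean of booleans is the fraction of True
missing_pct = df_raw.isnull().mean() * 100
print(missing_pct)
```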
