## <font color="maroon">Handling Missing Data - replace method</font>

In [30]:
import pandas as pd
import numpy as np
df = pd.read_csv("weather_data.csv")
df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,-99999,7,Sunny
2,1/3/2017,28,-99999,Snow
3,1/4/2017,-99999,7,0
4,1/5/2017,32,-99999,Rain
5,1/6/2017,31,2,Sunny
6,1/6/2017,34,5,0


## **Replacing a single value**

**Replace a single value in a specific column**

df['column_name'] = df['column_name'].replace(-99999, 0)

**Replace a single value in the entire DataFrame**

df = df.replace(-99999, 0)

(NaN is the standard missing-value marker in pandas: methods such as `isna`, `fillna`, and `dropna` recognize it, and aggregations such as `mean` skip it by default.)

In [31]:
new_df = df.replace(-99999, value=np.nan)
new_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/2/2017,,7.0,Sunny
2,1/3/2017,28.0,,Snow
3,1/4/2017,,7.0,0
4,1/5/2017,32.0,,Rain
5,1/6/2017,31.0,2.0,Sunny
6,1/6/2017,34.0,5.0,0
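
To illustrate why NaN is more convenient than a sentinel number, here is a minimal sketch. It assumes a small hypothetical frame (`readings`) instead of `weather_data.csv`: the sentinel distorts the mean, while NaN is simply skipped.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; -99999 marks a missing temperature
readings = pd.DataFrame({"temperature": [32, -99999, 28]})

# With the sentinel left in place, the mean is badly skewed
raw_mean = readings["temperature"].mean()

# After replacing the sentinel with NaN, mean() skips the missing entry
cleaned = readings.replace(-99999, np.nan)
clean_mean = cleaned["temperature"].mean()
missing_count = int(cleaned["temperature"].isna().sum())

print(raw_mean, clean_mean, missing_count)
```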


## **Replacing multiple values**

<font color='green'>Replace multiple values in a specific column using a list</font>

df['column_name'] = df['column_name'].replace([-99999, -88888], 0)

<font color='green'>Replace multiple values in the entire DataFrame using a list</font>

df = df.replace([-99999, -88888], 0)



<font color='green'>Replace multiple values in a specific column using dict</font>

df['column_name'] = df['column_name'].replace({-99999: 0, -88888: 1})

<font color='green'>Replace multiple values in the entire DataFrame using dict</font>

df = df.replace({-99999: 0, -88888: 1})


In [32]:
new_df = df.replace(to_replace=[-99999, -88888], value=0)  # the to_replace and value keyword names are optional
new_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,0,7,Sunny
2,1/3/2017,28,0,Snow
3,1/4/2017,0,7,0
4,1/5/2017,32,0,Rain
5,1/6/2017,31,2,Sunny
6,1/6/2017,34,5,0
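
The list and dict forms differ in one respect: a list sends every listed value to the same replacement, while a dict gives each value its own. A small sketch on a hypothetical Series (`wind`, not part of the weather data) makes the difference visible.

```python
import pandas as pd

# Hypothetical windspeed readings with two different sentinel codes
wind = pd.Series([6, -99999, -88888, 7])

# List form: both sentinels become the same value
same_value = wind.replace([-99999, -88888], 0)

# Dict form: each sentinel gets its own replacement
per_value = wind.replace({-99999: 0, -88888: 1})

print(same_value.tolist())
print(per_value.tolist())
```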


## **Replacing values in multiple columns**

In [33]:
new_df = df.replace({
        'temperature': [-99999,32],
        'windspeed': -99999,
        'event': '0'
    }, np.nan)
new_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,,6.0,Rain
1,1/2/2017,,7.0,Sunny
2,1/3/2017,28.0,,Snow
3,1/4/2017,,7.0,
4,1/5/2017,,,Rain
5,1/6/2017,31.0,2.0,Sunny
6,1/6/2017,34.0,5.0,
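
The call above sends every listed value to the same replacement (NaN). When each column needs its own replacement, `replace` also accepts a nested dict of the form `{column: {old: new}}`. The sketch below assumes a small hypothetical frame (`sample`) and an illustrative label `'No Event'`, neither of which comes from the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a numeric sentinel and a string placeholder
sample = pd.DataFrame({
    "temperature": [32, -99999, 28],
    "event": ["Rain", "0", "Snow"],
})

# Nested dict: outer keys are column names, inner dicts map old -> new
cleaned = sample.replace({
    "temperature": {-99999: np.nan},
    "event": {"0": "No Event"},
})

print(cleaned)
```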


**Replacing using a mapping**

In [39]:
new_df = df.replace({
        -99999: np.nan,
        'no event': 'Sunny',  # no cell in this data contains 'no event', so the event column is unchanged
    })
new_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32.0,6.0,Rain
1,1/2/2017,,7.0,Sunny
2,1/3/2017,28.0,,Snow
3,1/4/2017,,7.0,0
4,1/5/2017,32.0,,Rain
5,1/6/2017,31.0,2.0,Sunny
6,1/6/2017,34.0,5.0,0
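
A flat mapping like this one is applied across every column, and keys that match nothing are ignored without an error, which is why the event column above still shows `0`. A minimal sketch of both behaviors follows. It assumes a small hypothetical frame (`obs`) in which a numeric sentinel and a string placeholder are each matched by one key, plus a third key that matches nothing.

```python
import numpy as np
import pandas as pd

# Hypothetical observations, not taken from weather_data.csv
obs = pd.DataFrame({
    "temperature": [32, -99999],
    "windspeed": [-99999, 7],
    "event": ["Rain", "0"],
})

# One flat dict: each key is looked up in every column
mapped = obs.replace({
    -99999: np.nan,    # matches cells in temperature and windspeed
    "0": "Sunny",      # matches a cell in event
    "no event": "Fog", # matches nothing, so it has no effect
})

print(mapped)
```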


**Regex**

In [40]:
# If readings carry units (windspeed "6 mph", temperature "32 F"), a regex replace strips the letters and keeps the digits.
# The data loaded above has no units, so the output here is unchanged.
new_df = df.replace({'temperature': '[A-Za-z]', 'windspeed': '[a-z]'}, '', regex=True)
new_df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,-99999,7,Sunny
2,1/3/2017,28,-99999,Snow
3,1/4/2017,-99999,7,0
4,1/5/2017,32,-99999,Rain
5,1/6/2017,31,2,Sunny
6,1/6/2017,34,5,0
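
To show the regex form having a visible effect, here is a sketch on a hypothetical frame (`with_units`) whose values do carry units. The patterns are an assumption of this example: they also consume the space before the unit so the remaining strings convert cleanly to integers.

```python
import pandas as pd

# Hypothetical readings stored as strings with units attached
with_units = pd.DataFrame({
    "temperature": ["32 F", "28 F"],
    "windspeed": ["6 mph", "7 mph"],
})

# Per-column patterns; each match is replaced with an empty string
stripped = with_units.replace(
    {"temperature": r"\s*[A-Za-z]+", "windspeed": r"\s*[a-z]+"},
    "",
    regex=True,
)

# The cells are still strings after the replace, so convert explicitly
numeric = stripped.astype(int)

print(numeric)
```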


**Replacing a list of values with another list**

In [36]:
df_1 = pd.DataFrame({
    'score': ['exceptional','average', 'good', 'poor', 'average', 'exceptional'],
    'student': ['rob', 'maya', 'parthiv', 'tom', 'julian', 'erica']
})
df_1

Unnamed: 0,score,student
0,exceptional,rob
1,average,maya
2,good,parthiv
3,poor,tom
4,average,julian
5,exceptional,erica


In [37]:
df_1.replace(['poor', 'average', 'good', 'exceptional'], [1,2,3,4])

Unnamed: 0,score,student
0,4,rob
1,2,maya
2,3,parthiv
3,1,tom
4,2,julian
5,4,erica
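
The list-to-list call above searches every column, so a student whose name matched one of the labels would be replaced too. One possible alternative, sketched here with `Series.map` rather than `replace`, restricts the encoding to the `score` column. The frame `grades`, the dict `rank`, and the new column `score_rank` are names assumed for this example. Unlike `replace`, `map` turns any label missing from the dict into NaN.

```python
import pandas as pd

# Hypothetical grades frame in the same shape as df_1
grades = pd.DataFrame({
    "score": ["exceptional", "average", "good", "poor"],
    "student": ["rob", "maya", "parthiv", "tom"],
})

# Explicit label -> rank mapping, applied to one column only
rank = {"poor": 1, "average": 2, "good": 3, "exceptional": 4}
grades["score_rank"] = grades["score"].map(rank)

print(grades)
```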
