The exercise folder (at the same level as this notebook on GitHub) contains hiring.csv. This file holds hiring statistics for a firm: each candidate's experience, written test score, and personal interview score. HR decides the salary based on these three factors. Using this data, build a machine learning model that helps the HR department decide salaries for future candidates, then use it to predict salaries for the following candidates:

2 yr experience, 9 test score, 6 interview score

12 yr experience, 10 test score, 10 interview score

Answer
53713.86 and 93747.79

In [6]:
import pandas as pd
import numpy as np
from sklearn import linear_model
from word2number import w2n

In [7]:
df = pd.read_csv('hiring.csv')
df

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,,8.0,9,50000
1,,8.0,6,45000
2,five,6.0,7,60000
3,two,10.0,10,65000
4,seven,9.0,6,70000
5,three,7.0,10,62000
6,ten,,7,72000
7,eleven,7.0,8,80000


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   experience                  6 non-null      object 
 1   test_score(out of 10)       7 non-null      float64
 2   interview_score(out of 10)  8 non-null      int64  
 3   salary($)                   8 non-null      int64  
dtypes: float64(1), int64(2), object(1)
memory usage: 384.0+ bytes


In [9]:
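import math
# Treat missing experience as zero years
df.experience = df.experience.fillna('zero')
# Note: despite the name, this is the floor of the column mean, not the median
median_test_score = math.floor(df['test_score(out of 10)'].mean())
median_test_score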

7
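
The name `median_test_score` is slightly misleading, because the cell above takes the floor of the column mean. For the seven scores present in this table the two statistics differ, as the following minimal sketch illustrates (the `scores` list is copied by hand from the output above and is not read from hiring.csv):

```python
import math
import statistics

# The seven non-missing test scores from the table above
scores = [8.0, 8.0, 6.0, 10.0, 9.0, 7.0, 7.0]

floor_of_mean = math.floor(statistics.mean(scores))  # what the notebook uses
median = statistics.median(scores)                   # what the name suggests

print(floor_of_mean)  # 7
print(median)         # 8.0
```

The answers quoted at the top of this notebook correspond to filling the gap with 7.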

In [10]:
df['test_score(out of 10)'] = df['test_score(out of 10)'].fillna(median_test_score)
df

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,zero,8.0,9,50000
1,zero,8.0,6,45000
2,five,6.0,7,60000
3,two,10.0,10,65000
4,seven,9.0,6,70000
5,three,7.0,10,62000
6,ten,7.0,7,72000
7,eleven,7.0,8,80000


In [11]:
df.experience = df.experience.apply(w2n.word_to_num)
df

Unnamed: 0,experience,test_score(out of 10),interview_score(out of 10),salary($)
0,0,8.0,9,50000
1,0,8.0,6,45000
2,5,6.0,7,60000
3,2,10.0,10,65000
4,7,9.0,6,70000
5,3,7.0,10,62000
6,10,7.0,7,72000
7,11,7.0,8,80000
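
If installing `word2number` is inconvenient, the few number words in this dataset can also be mapped by hand. This is only a sketch: `WORD_TO_NUMBER` and `word_to_number` are hypothetical names, not part of `word2number`, and the table covers only the words that appear in this file.

```python
# Hypothetical lookup table covering only the words that occur in hiring.csv
WORD_TO_NUMBER = {
    "zero": 0, "two": 2, "three": 3, "five": 5,
    "seven": 7, "ten": 10, "eleven": 11,
}


def word_to_number(word: str) -> int:
    """Map a number word to an int; raises KeyError for unknown words."""
    return WORD_TO_NUMBER[word.strip().lower()]


# The experience column after missing values were filled with 'zero'
experience = ["zero", "zero", "five", "two", "seven", "three", "ten", "eleven"]
converted = [word_to_number(w) for w in experience]
print(converted)  # [0, 0, 5, 2, 7, 3, 10, 11]
```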


In [12]:
reg = linear_model.LinearRegression()
reg.fit(df[['experience', 'test_score(out of 10)', 'interview_score(out of 10)']], df['salary($)'])


In [13]:
reg.coef_

array([2922.26901502, 2221.30909959, 2147.48256637])

In [14]:
reg.intercept_

14992.65144669314

In [15]:
reg.predict([[2,9,6]])



array([53713.86677124])

In [16]:
reg.predict([[12,10,10]])



array([93747.79628651])
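
A linear regression prediction is the intercept plus the dot product of the coefficients with the feature values. As a sanity check, the sketch below recomputes both predictions by hand from the `coef_` and `intercept_` values printed above (`predict_salary` is a helper name introduced here for illustration):

```python
import numpy as np

# Values copied from reg.coef_ and reg.intercept_ above
coef = np.array([2922.26901502, 2221.30909959, 2147.48256637])
intercept = 14992.65144669314


def predict_salary(experience, test_score, interview_score):
    """salary = intercept + coef . [experience, test_score, interview_score]"""
    features = np.array([experience, test_score, interview_score], dtype=float)
    return intercept + coef @ features


first = predict_salary(2, 9, 6)
second = predict_salary(12, 10, 10)
print(first, second)
```

Both values should agree with the `reg.predict` outputs above, up to the rounding of the printed coefficients.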

# Pickle: save the trained model to avoid repeated training and save time
`pickle` is a module in Python's standard library.

In [17]:
import pickle

In [18]:
model = reg

## Save model to a File Using Python Pickle

In [21]:
with open('model_pickle', 'wb') as file:
    pickle.dump(model,file)

### Load Saved Model

In [22]:
with open('model_pickle', 'rb') as file:
    mp = pickle.load(file) 

In [23]:
mp.coef_

array([2922.26901502, 2221.30909959, 2147.48256637])

In [24]:
mp.intercept_

14992.65144669314
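
Comparing `coef_` and `intercept_` is one way to confirm the round trip; comparing predictions is another. The sketch below refits the model on the cleaned table shown earlier (typed in by hand rather than read from hiring.csv) and pickles it in memory with `pickle.dumps` and `pickle.loads`, so no file is written:

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Cleaned training data copied from the table above:
# experience, test score, interview score
X = np.array([
    [0, 8, 9], [0, 8, 6], [5, 6, 7], [2, 10, 10],
    [7, 9, 6], [3, 7, 10], [10, 7, 7], [11, 7, 8],
], dtype=float)
y = np.array([50000, 45000, 60000, 65000, 70000, 62000, 72000, 80000], dtype=float)

model = LinearRegression().fit(X, y)

# Serialize to bytes and back without touching the file system
restored = pickle.loads(pickle.dumps(model))

candidates = np.array([[2, 9, 6], [12, 10, 10]], dtype=float)
same = bool(np.allclose(model.predict(candidates), restored.predict(candidates)))
print(same)
```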

#### Pickle saves the file in a binary format

## Save Trained Model Using Joblib

In [26]:
import joblib

In [27]:
joblib.dump(model, 'model_joblib')

['model_joblib']

### Load Saved Model

In [28]:
mj = joblib.load('model_joblib')

In [29]:
mj.coef_

array([2922.26901502, 2221.30909959, 2147.48256637])

In [30]:
mj.intercept_

14992.65144669314
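
When experimenting, it can be convenient to write the joblib file into a temporary directory so the working folder stays clean. The following is a minimal sketch under that assumption; to stay small it stores only the coefficient array printed above rather than the full model, and `path` is an illustrative name.

```python
import os
import tempfile

import joblib
import numpy as np

# Coefficients copied from the output above; a stand-in for the full model
coef = np.array([2922.26901502, 2221.30909959, 2147.48256637])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_joblib")
    written = joblib.dump(coef, path)  # returns the list of files written
    loaded = joblib.load(path)

round_trip_ok = bool(np.array_equal(coef, loaded))
print(round_trip_ok)
```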