# Purpose of this model
This model is trained on data from a survey conducted in the Bulgarian IT community. It predicts salary from two features: years of experience in the current tech stack (`curexp`) and years of experience overall (`exp`).

<b>To use the model, call 'regression.predict([[\<current_experience\>, \<overall_experience\>]])', passing both values in years.</b>

In [1]:
import numpy as np
import pandas as pd
from sklearn import linear_model

In [2]:
salaries = pd.read_csv("salaryrangesbg.csv")
salaries

Unnamed: 0,salary,curexp,exp
0,5800,2.0,6.0
1,550,10.0,0.0
2,9600,5.0,8.0
3,9000,4.0,7.0
4,5000,2.0,7.0
...,...,...,...
168,2000,1.0,10.0
169,7000,1.0,9.0
170,16000,2.0,8.0
171,6000,5.0,10.0


In [3]:
# We filter the faulty rows (current experience cannot be higher than overall experience, salary cannot be 0)
salaries = salaries[(salaries.curexp <= salaries.exp) & (salaries.salary != 0)]
salaries

Unnamed: 0,salary,curexp,exp
0,5800,2.0,6.0
2,9600,5.0,8.0
3,9000,4.0,7.0
4,5000,2.0,7.0
5,10000,2.0,8.0
...,...,...,...
168,2000,1.0,10.0
169,7000,1.0,9.0
170,16000,2.0,8.0
171,6000,5.0,10.0
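A side effect of this filter is worth spelling out: in pandas, a comparison against NaN evaluates to False, so rows with a missing `curexp` or `exp` are dropped by the filter as well, while rows with a missing `salary` pass it (NaN is not equal to 0). The sketch below illustrates this on a small made-up frame; the `toy` data is hypothetical and is not taken from `salaryrangesbg.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, chosen to hit each branch of the filter.
toy = pd.DataFrame({
    "salary": [5800, 550, 0, 4000, np.nan],
    "curexp": [2.0, 10.0, 1.0, np.nan, 1.0],
    "exp":    [6.0, 0.0, 3.0, 5.0, 2.0],
})

# Same condition as in the cell above.
mask = (toy.curexp <= toy.exp) & (toy.salary != 0)

# .copy() makes the result an independent DataFrame rather than a slice.
filtered = toy[mask].copy()
kept = filtered.index.tolist()
print(kept)
```

Row 1 fails because `curexp` exceeds `exp`, row 2 because the salary is 0, and row 3 because its `curexp` is NaN. Row 4 survives despite the missing salary, so only the `salary` column can still contain NaN after this step.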


In [4]:
# We fill NaN column values with the column medians (computed on the filtered data)
salary_median = round(salaries.salary.median(), 0)
curexp_median = round(salaries.curexp.median(), 0)
exp_median = round(salaries.exp.median(), 0)

# fillna with a dict returns a new DataFrame, so nothing is assigned into the
# filtered slice and pandas does not emit SettingWithCopyWarning.
salaries = salaries.fillna({"salary": salary_median, "curexp": curexp_median, "exp": exp_median})


In [9]:
salary_median

5700.0

In [10]:
curexp_median

2.0

In [11]:
exp_median

8.0

In [5]:
regression = linear_model.LinearRegression()
regression.fit(salaries[['curexp', 'exp']].values, salaries.salary.values)
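`LinearRegression` fits an ordinary least-squares model: an intercept plus one coefficient per feature. As a rough illustration of what that fit computes, the sketch below solves the same kind of problem with `np.linalg.lstsq` instead of scikit-learn. The data is hypothetical and noise-free, generated from a known rule, so the recovered numbers can be checked by eye; it is not the survey data.

```python
import numpy as np

# Hypothetical, noise-free data generated from a known linear rule:
# salary = 4000 + 100 * curexp + 200 * exp
X = np.array([[1.0, 2.0], [2.0, 5.0], [0.0, 3.0], [4.0, 4.0], [3.0, 9.0]])
y = 4000 + 100 * X[:, 0] + 200 * X[:, 1]

# Prepend a column of ones so the intercept is estimated together with the slopes.
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of A @ solution ~= y.
solution, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coef = solution[0], solution[1:]
print(intercept, coef)
```

On real, noisy data the fit does not reproduce any exact rule; it returns the coefficients that minimise the sum of squared residuals.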

In [6]:
regression.coef_

array([ 61.95013477, 217.33760562])

In [7]:
regression.intercept_

4535.613794160036

In [8]:
regression.predict([[0, 4]])

array([5404.96421664])
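The prediction above can be reproduced by hand from the fitted parameters, since the model is just salary ≈ 4535.61 + 61.95 · curexp + 217.34 · exp. The sketch below hard-codes the printed `coef_` and `intercept_` values; `predict_by_hand` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

coef = np.array([61.95013477, 217.33760562])  # regression.coef_ as printed above
intercept = 4535.613794160036                 # regression.intercept_ as printed above

def predict_by_hand(curexp, exp):
    """Intercept plus the dot product of the coefficients and the features."""
    return intercept + coef @ np.array([curexp, exp])

# Same input as regression.predict([[0, 4]])
manual = predict_by_hand(0, 4)
print(manual)
```

Read this way, each additional year of overall experience adds roughly 217 to the predicted salary, and each year in the current stack adds roughly 62 on top of that.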