Skip to content

vanle2000/Data-Science-Salary-prediction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 

Repository files navigation

Data Science Salary Prediction

In the dynamic and ever-evolving realm of data science, one of the most pressing questions for both job seekers and employers alike is the determination of fair and competitive salaries. The Data Scientist Salary Prediction project embarks on a journey to harness the power of advanced machine learning techniques to provide accurate salary estimates based on a multitude of influential factors.

This project is dedicated to creating a robust and accurate method for predicting data scientist salaries. I base our predictions on essential attributes such as work year, experience level, employment type, job title, employee residence, remote work ratio, company location, and company size. I'll explore these attributes to create a predictive tool that assists job seekers in evaluating potential compensation and helps employers offer competitive salary packages.

Throughout this project, I will focus on data processing, feature engineering, and rigorous testing of various machine learning algorithms. By the end, I aim to deliver a practical and dependable solution that simplifies the salary estimation process for data scientists, promoting fairness and competitiveness in the job market.

Table of Contents

Introduction

Predicting data science salaries is a common problem in the job market. This project explores the use of machine learning algorithms to forecast salaries based on factors that influence compensation.

Dataset

The dataset used for this project includes information about data science job positions and their corresponding salaries. It contains features such as work year, experience level, employment type, job title, employee residence, remote work ratio, company location, and company size.

Installation

  1. Clone this repository to your local machine.

git clone https://github.com/your-username/data-science-salary-prediction.git cd data-science-salary-prediction

  1. Install the required Python packages using pip.

pip install -r requirements.txt

Usage

  1. Ensure you have the dataset (ds_salaries.csv) in the project directory.

  2. Run the main prediction script.

python predict_salary.py

  1. The script will train and evaluate both Decision Tree Regressor and Random Forest Regressor models, providing insights into their predictive capabilities.

Models

Multilinear Regression

Decision Tree Regressor

The Decision Tree Regressor is a single decision tree-based model designed to predict salaries. It constructs a tree structure where each internal node represents a decision based on a specific feature, leading to a final salary prediction in the leaf nodes.

Random Forest Regressor

The Random Forest Regressor is an ensemble model consisting of multiple decision trees. It aggregates the predictions from individual trees to arrive at a final prediction. This approach often results in improved accuracy and generalization.

Gradient Boosting Regressor

Evaluation

The performance of the models is evaluated using the Mean Squared Error (MSE) metric on an independent test dataset. Lower MSE values indicate better predictive accuracy.

Contributing

Contributions to this project are highly encouraged! If you have suggestions for enhancements or new features, please feel free to open an issue or submit a pull request. Your insights and contributions can help advance the project.


About

Predict Salary for Data Science related Jobs

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published