# INFO7390 - Advance Data Science and Architecture

## Project Title: Job Recommendation System
### Teammates:
1. Aniruddha Tambe
2. Shubhankar Salvi
3. Sangram Vuppula

## Importing packages:

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import missingno as msno
from icecream import ic
import os
import re
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import plotly.figure_factory as ff
from kaleido.scopes.plotly import PlotlyScope
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_selection import SelectKBest
import warnings
warnings.filterwarnings(action='ignore')
pd.options.display.max_columns = 100

# Plotly settings: a kaleido scope for static image export and a default chart template
scope = PlotlyScope(plotlyjs="https://cdn.plot.ly/plotly-latest.min.js")
pio.templates.default = 'plotly_white'

## Dataset:

1. Stack Overflow 2018 Developer Survey - Individual responses on the 2018 Developer Survey fielded by Stack Overflow

https://www.kaggle.com/stackoverflow/stack-overflow-2018-developer-survey#survey_results_public.csv

2. U.S. Technology Jobs on Dice.com - 22,000 US-based Technology Job Listings

https://www.kaggle.com/PromptCloudHQ/us-technology-jobs-on-dicecom

In [2]:
survey = pd.read_csv("./dataset/survey_results_public.csv")

## Preliminary data summary

In [3]:
# List all column names
all_col_names = survey.columns.values.tolist()
print('Number of columns: ',len(all_col_names))
#print(all_col_names)

Number of columns:  129


In [4]:
# Group column names by dtype: object (categorical), float64 and int64
cat_cols = [col for col in survey.columns if survey[col].dtype.name=="object"]
float_cols = [col for col in survey.columns if survey[col].dtype.name=="float64"]
int_cols = [col for col in survey.columns if survey[col].dtype.name=="int64"]
print('Number of categorical columns: ',len(cat_cols))
print('Number of float columns: ',len(float_cols))
print('Number of int columns: ',len(int_cols))

Number of categorical columns:  87
Number of float columns:  41
Number of int columns:  1

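`DataFrame.select_dtypes` offers a more compact way to build the same three groups. The sketch below runs it on a three-row toy frame instead of the real `survey`. The helper name `split_columns_by_dtype` is ours, not part of the notebook. One caveat: recent pandas versions may read text columns as a dedicated string dtype instead of `object`. In that case, both this helper and the `dtype.name=="object"` test above would need adjusting. The toy frame forces `object` to avoid that issue.

```python
import pandas as pd

def split_columns_by_dtype(df: pd.DataFrame) -> dict:
    """Return column names grouped by dtype, like the three list comprehensions above."""
    return {
        "categorical": df.select_dtypes(include="object").columns.tolist(),
        "float": df.select_dtypes(include="float64").columns.tolist(),
        "int": df.select_dtypes(include="int64").columns.tolist(),
    }

# Toy stand-in for `survey`; values copied from the survey preview further down
toy = pd.DataFrame({
    "Respondent": [1, 3, 4],
    "Country": pd.Series(["Kenya", "United Kingdom", "United States"], dtype=object),
    "ConvertedSalary": [float("nan"), 70841.0, float("nan")],
})

groups = split_columns_by_dtype(toy)
for name, cols in groups.items():
    print(f"Number of {name} columns: ", len(cols))
```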

## Findings

1. Number of categorical columns:  87
2. Number of float columns:  41
3. Number of int columns:  1
4. Datatypes found: float64(41), int64(1), object(87)
5. Rows x Columns: 98855 x 129
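The "Datatypes found" line follows the format of the `dtypes:` line that `DataFrame.info()` prints. A small helper can build the same summary from `df.dtypes`. The helper is hypothetical and is called `dtype_summary` here. It runs on a toy frame, so its counts differ from the survey's.

```python
import pandas as pd

def dtype_summary(df: pd.DataFrame) -> str:
    """Format dtype counts as 'dtype(count)' pairs, sorted by dtype name."""
    counts = df.dtypes.astype(str).value_counts().sort_index()
    return ", ".join(f"{dtype}({n})" for dtype, n in counts.items())

# Toy frame: one int column, two text columns, one float column with missing values
toy = pd.DataFrame({
    "Respondent": [1, 3, 4],
    "Age": pd.Series(["25 - 34 years old", "35 - 44 years old", None], dtype=object),
    "Country": pd.Series(["Kenya", "United Kingdom", "United States"], dtype=object),
    "ConvertedSalary": [float("nan"), 70841.0, float("nan")],
})

print("Rows x Columns:", " x ".join(map(str, toy.shape)))  # Rows x Columns: 3 x 4
print("Datatypes found:", dtype_summary(toy))  # Datatypes found: float64(1), int64(1), object(2)
```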

## Dropping irrelevant columns

In [5]:
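# Names of the columns judged irrelevant
attrToDrop=[]
# Keep every column not listed in attrToDrop; difference() silently skips names
# absent from survey and returns the remaining columns sorted alphabetically
survey=survey[survey.columns.difference(attrToDrop)]
# len(attrToDrop) counts list entries (including duplicates or names absent from survey),
# so it can exceed the columns actually removed: 129 - 73 = 56 per the outputs in this notebook
print("Number of columns dropped: ",len(attrToDrop))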

Number of columns dropped:  102
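The count of 102 comes from the length of the drop list, not from the frame itself. The column totals reported before and after this step (129 and 73) imply that 56 columns were removed. `Index.difference` ignores names that are not present, and repeated names count only once, which presumably explains the gap. The toy sketch below reproduces the effect. It uses a hypothetical drop list with a repeated name and a misspelled one (`"Contry"`). It measures the real change from `shape`, and shows that `DataFrame.drop` keeps the original column order and raises an error on unknown names.

```python
import pandas as pd

toy = pd.DataFrame({
    "Respondent": [1, 2],
    "Hobby": ["Yes", "No"],
    "Country": ["Kenya", "India"],
    "Age": [None, None],
})

# Hypothetical drop list: "Hobby" is repeated and "Contry" is misspelled
to_drop = ["Hobby", "Hobby", "Contry"]

n_before = toy.shape[1]
# difference() keeps the columns NOT in to_drop, skips unknown names silently,
# and returns the survivors sorted alphabetically
kept = toy[toy.columns.difference(to_drop)]
print("Entries in drop list:", len(to_drop))                  # 3
print("Columns actually dropped:", n_before - kept.shape[1])  # 1
print("Column order after difference():", list(kept.columns))

# drop() preserves the original order and fails on names that do not exist
try:
    toy.drop(columns=to_drop)
except KeyError as err:
    print("drop() rejected unknown column(s):", err)
```

Measuring `n_before - df.shape[1]` reports what actually changed, whatever the contents of the list. Using `drop(columns=...)` without `errors="ignore"` surfaces typos early.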


In [6]:
# Group column names by dtype: object (categorical), float64 and int64
cat_cols = [col for col in survey.columns if survey[col].dtype.name=="object"]
float_cols = [col for col in survey.columns if survey[col].dtype.name=="float64"]
int_cols = [col for col in survey.columns if survey[col].dtype.name=="int64"]
print('Number of categorical columns: ',len(cat_cols))
print('Number of float columns: ',len(float_cols))
print('Number of int columns: ',len(int_cols))
print('Total number of columns: ',len(survey.columns))

Number of categorical columns:  43
Number of float columns:  29
Number of int columns:  1
Total number of columns:  73


## Export survey to csv

In [13]:
survey.to_csv('./dataset/survey_dropped_columns.csv',index=False)
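Passing `index=False` matters here. With the default `index=True`, `to_csv` also writes the row index as an unnamed first column, and `read_csv` later surfaces it as `Unnamed: 0`. A quick in-memory round trip shows the difference. It uses a toy frame and `io.StringIO` in place of the real file path.

```python
import io
import pandas as pd

toy = pd.DataFrame({"Respondent": [1, 3], "Country": ["Kenya", "United Kingdom"]})

def round_trip(df: pd.DataFrame, index: bool) -> pd.DataFrame:
    """Write df to an in-memory CSV buffer and read it back."""
    buf = io.StringIO()
    df.to_csv(buf, index=index)
    buf.seek(0)
    return pd.read_csv(buf)

print(list(round_trip(toy, index=False).columns))  # ['Respondent', 'Country']
print(list(round_trip(toy, index=True).columns))   # ['Unnamed: 0', 'Respondent', 'Country']
```

If a file was already saved with its index, `pd.read_csv(path, index_col=0)` reads that column back as the index instead of as data.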

## Survey preview

In [14]:
survey.head(10)

,Age,AgreeDisagree1,AgreeDisagree2,AgreeDisagree3,AssessBenefits1,AssessBenefits10,AssessBenefits11,AssessBenefits2,AssessBenefits3,AssessBenefits4,AssessBenefits5,AssessBenefits6,AssessBenefits7,AssessBenefits8,AssessBenefits9,AssessJob1,AssessJob10,AssessJob2,AssessJob3,AssessJob4,AssessJob5,AssessJob6,AssessJob7,AssessJob8,AssessJob9,CareerSatisfaction,CommunicationTools,CompanySize,ConvertedSalary,Country,Currency,CurrencySymbol,DatabaseDesireNextYear,DatabaseWorkedWith,Dependents,DevType,EducationTypes,Employment,FormalEducation,FrameworkDesireNextYear,FrameworkWorkedWith,Gender,HopeFiveYears,HoursComputer,IDE,JobEmailPriorities1,JobEmailPriorities2,JobEmailPriorities3,JobEmailPriorities4,JobEmailPriorities5,JobEmailPriorities6,JobEmailPriorities7,JobSatisfaction,JobSearchStatus,LanguageDesireNextYear,LanguageWorkedWith,LastNewJob,Methodology,MilitaryUS,OpenSource,OperatingSystem,PlatformDesireNextYear,PlatformWorkedWith,RaceEthnicity,Respondent,SalaryType,SelfTaughtTypes,Student,UndergradMajor,UpdateCV,VersionControl,YearsCoding,YearsCodingProf
0,25 - 34 years old,Strongly agree,Strongly agree,Neither Agree nor Disagree,,,,,,,,,,,,10.0,6.0,7.0,8.0,1.0,2.0,5.0,3.0,4.0,9.0,Extremely satisfied,Slack,20 to 99 employees,,Kenya,,KES,Redis;SQL Server;MySQL;PostgreSQL;Amazon RDS/A...,Redis;SQL Server;MySQL;PostgreSQL;Amazon RDS/A...,Yes,Full-stack developer,"Taught yourself a new language, framework, or ...",Employed part-time,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Django;React,Django;React,Male,Working as a founder or co-founder of my own c...,9 - 12 hours,Komodo;Vim;Visual Studio Code,5.0,6.0,7.0,2.0,1.0,4.0,3.0,Extremely satisfied,"I’m not actively looking, but I am open to new...",JavaScript;Python;HTML;CSS,JavaScript;Python;HTML;CSS,Less than a year ago,Agile;Scrum,,No,Linux-based,AWS;Azure;Linux;Firebase,AWS;Azure;Linux;Firebase,Black or of African descent,1,Monthly,The official documentation and/or standards fo...,No,Mathematics or statistics,My job status or other personal status changed,Git,3-5 years,3-5 years
1,35 - 44 years old,Agree,Agree,Neither Agree nor Disagree,1.0,2.0,8.0,5.0,3.0,7.0,10.0,4.0,11.0,9.0,6.0,1.0,9.0,7.0,10.0,8.0,2.0,5.0,4.0,3.0,6.0,Neither satisfied nor dissatisfied,Confluence;Office / productivity suite (Micros...,"10,000 or more employees",70841.0,United Kingdom,British pounds sterling (£),GBP,PostgreSQL,Redis;PostgreSQL;Memcached,Yes,Database administrator;DevOps specialist;Full-...,"Taught yourself a new language, framework, or ...",Employed full-time,"Bachelor’s degree (BA, BS, B.Eng., etc.)",React,Django,Male,Working in a different or more specialized tec...,5 - 8 hours,IPython / Jupyter;Sublime Text;Vim,1.0,3.0,4.0,5.0,2.0,6.0,7.0,Moderately dissatisfied,I am actively looking for a job,Go;Python,JavaScript;Python;Bash/Shell,More than 4 years ago,,,Yes,Linux-based,Linux,Linux,White or of European descent,3,Yearly,The official documentation and/or standards fo...,No,"A natural science (ex. biology, chemistry, phy...",I saw an employer’s advertisement,Git;Subversion,30 or more years,18-20 years
2,,,,,,,,,,,,,,,,,,,,,,,,,,Moderately satisfied,,20 to 99 employees,,United States,,,,,,Engineering manager;Full-stack developer,,Employed full-time,Associate degree,,,,Working as a founder or co-founder of my own c...,,,,,,,,,,Moderately satisfied,"I’m not actively looking, but I am open to new...",,,Less than a year ago,,,Yes,,,,,4,,,No,"Computer science, computer engineering, or sof...",,,24-26 years,6-8 years
3,35 - 44 years old,Disagree,Disagree,Strongly disagree,,,,,,,,,,,,,,,,,,,,,,Slightly dissatisfied,,100 to 499 employees,,United States,U.S. dollars ($),,"SQL Server;Microsoft Azure (Tables, CosmosDB, ...","SQL Server;Microsoft Azure (Tables, CosmosDB, ...",No,Full-stack developer,Completed an industry certification program (e...,Employed full-time,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Angular;.NET Core;React,,Male,Working as a founder or co-founder of my own c...,9 - 12 hours,Visual Studio;Visual Studio Code,,,,,,,,Neither satisfied nor dissatisfied,"I’m not actively looking, but I am open to new...",C#;JavaScript;SQL;TypeScript;HTML;CSS;Bash/Shell,C#;JavaScript;SQL;TypeScript;HTML;CSS;Bash/Shell,Less than a year ago,Agile;Kanban;Scrum,No,No,Windows,Azure,Azure,White or of European descent,5,,The official documentation and/or standards fo...,No,"Computer science, computer engineering, or sof...",A recruiter contacted me,Git,18-20 years,12-14 years
4,18 - 24 years old,Strongly agree,Agree,Strongly disagree,1.0,9.0,6.0,10.0,2.0,4.0,8.0,3.0,11.0,7.0,5.0,8.0,9.0,5.0,7.0,1.0,2.0,6.0,4.0,3.0,10.0,Moderately satisfied,"Office / productivity suite (Microsoft Office,...","10,000 or more employees",21426.0,South Africa,South African rands (R),ZAR,PostgreSQL;Oracle;IBM Db2,SQL Server;PostgreSQL;Oracle;IBM Db2,Yes,Data or business analyst;Desktop or enterprise...,Taken a part-time in-person course in programm...,Employed full-time,Some college/university study without earning ...,,,Male,Working in a different or more specialized tec...,Over 12 hours,Notepad++;Visual Studio;Visual Studio Code,7.0,3.0,6.0,2.0,1.0,4.0,5.0,Slightly satisfied,"I’m not actively looking, but I am open to new...",Assembly;C;C++;Matlab;SQL;Bash/Shell,C;C++;Java;Matlab;R;SQL;Bash/Shell,Between 1 and 2 years ago,Evidence-based software engineering;Formal sta...,,No,Windows,Arduino;Windows Desktop or Server,Arduino;Windows Desktop or Server,White or of European descent,7,Yearly,The official documentation and/or standards fo...,"Yes, part-time","Computer science, computer engineering, or sof...",My job status or other personal status changed,Zip file back-ups,6-8 years,0-2 years
5,18 - 24 years old,Disagree,Neither Agree nor Disagree,Strongly disagree,1.0,7.0,8.0,3.0,4.0,10.0,9.0,2.0,6.0,5.0,11.0,8.0,7.0,5.0,4.0,9.0,1.0,3.0,6.0,2.0,10.0,Slightly satisfied,Confluence;Jira;Office / productivity suite (M...,10 to 19 employees,41671.0,United Kingdom,British pounds sterling (£),GBP,PostgreSQL,MongoDB,No,Back-end developer;Database administrator;Fron...,Received on-the-job training in software devel...,Employed full-time,"Bachelor’s degree (BA, BS, B.Eng., etc.)",Node.js,Angular;Node.js,Male,Working in a different or more specialized tec...,9 - 12 hours,IntelliJ;PyCharm;Visual Studio Code,2.0,6.0,7.0,3.0,1.0,5.0,4.0,Moderately satisfied,I am actively looking for a job,C#;Go;Java;JavaScript;Python;SQL;TypeScript;HT...,Java;JavaScript;Python;TypeScript;HTML;CSS,Between 2 and 4 years ago,Agile,,No,Linux-based,Linux,Linux,White or of European descent,8,,The official documentation and/or standards fo...,No,"Computer science, computer engineering, or sof...",I did not receive an expected change in compen...,Git,6-8 years,3-5 years
6,18 - 24 years old,Disagree,Agree,Strongly disagree,1.0,10.0,5.0,3.0,2.0,9.0,11.0,4.0,8.0,6.0,7.0,5.0,6.0,3.0,9.0,4.0,1.0,8.0,2.0,7.0,10.0,Moderately satisfied,Confluence;Office / productivity suite (Micros...,"10,000 or more employees",120000.0,United States,U.S. dollars ($),USD,,MongoDB,No,Back-end developer;Front-end developer;Full-st...,Received on-the-job training in software devel...,Employed full-time,Some college/university study without earning ...,React;TensorFlow,Node.js;React,Male,Working as a founder or co-founder of my own c...,Over 12 hours,Atom;Visual Studio Code,1.0,5.0,3.0,4.0,2.0,6.0,7.0,Slightly satisfied,"I’m not actively looking, but I am open to new...",C;Go;JavaScript;Python;HTML;CSS,JavaScript;HTML;CSS,Less than a year ago,Agile;Scrum,No,Yes,MacOS,Linux,Linux,White or of European descent,9,Yearly,The official documentation and/or standards fo...,No,"Computer science, computer engineering, or sof...",My job status or other personal status changed,Git,9-11 years,0-2 years
7,25 - 34 years old,Strongly agree,Strongly disagree,Neither Agree nor Disagree,1.0,10.0,8.0,3.0,5.0,7.0,6.0,2.0,11.0,9.0,4.0,6.0,3.0,5.0,4.0,2.0,7.0,8.0,10.0,1.0,9.0,Moderately satisfied,Facebook;Google Hangouts/Chat;Office / product...,10 to 19 employees,,Nigeria,,,,"MongoDB;MySQL;Microsoft Azure (Tables, CosmosD...",No,Designer;Front-end developer;QA or test developer,Taken an online course in programming or softw...,Employed full-time,"Bachelor’s degree (BA, BS, B.Eng., etc.)",.NET Core;Django,Angular;Node.js,Female,Working as a founder or co-founder of my own c...,Over 12 hours,Atom;Notepad++;Sublime Text;Visual Studio Code,2.0,6.0,1.0,3.0,7.0,5.0,4.0,Slightly satisfied,"I’m not actively looking, but I am open to new...",Matlab;SQL;Kotlin;Bash/Shell,JavaScript;TypeScript;HTML;CSS,Less than a year ago,Agile;Extreme programming (XP);Scrum,,Yes,Windows,Amazon Echo;Android;Apple Watch or Apple TV;AW...,Azure;Heroku,Black or of African descent,10,,,No,"Computer science, computer engineering, or sof...",I saw an employer’s advertisement,Git,0-2 years,3-5 years
8,35 - 44 years old,Strongly agree,Strongly disagree,Strongly disagree,1.0,7.0,6.0,3.0,2.0,9.0,11.0,5.0,8.0,4.0,10.0,6.0,2.0,3.0,7.0,4.0,1.0,5.0,10.0,8.0,9.0,Moderately satisfied,Confluence;HipChat;Jira;Office / productivity ...,100 to 499 employees,250000.0,United States,U.S. dollars ($),USD,Redis;PostgreSQL;Amazon DynamoDB;Apache Hive;A...,Redis;PostgreSQL;Amazon DynamoDB;Apache HBase;...,Yes,"Back-end developer;C-suite executive (CEO, CTO...",Taken an online course in programming or softw...,Employed full-time,Some college/university study without earning ...,,Hadoop;Node.js;React;Spark,Male,Doing the same work,9 - 12 hours,IntelliJ;PyCharm;Sublime Text;Vim,3.0,7.0,2.0,4.0,1.0,6.0,5.0,Moderately satisfied,"I’m not actively looking, but I am open to new...",Erlang;Go;Python;Rust;SQL,Assembly;CoffeeScript;Erlang;Go;JavaScript;Lua...,Between 2 and 4 years ago,Agile;Evidence-based software engineering;Extr...,No,Yes,MacOS,AWS;Linux;Mac OS;Serverless,Amazon Echo;AWS;iOS;Linux;Mac OS;Serverless,White or of European descent,11,Yearly,The official documentation and/or standards fo...,No,Fine arts or performing arts (ex. graphic desi...,My job status or other personal status changed,Git,30 or more years,21-23 years
9,,,,,,,,,,,,,,,,,,,,,,,,,,,,500 to 999 employees,,India,,,,,,Designer,,Employed full-time,"Bachelor’s degree (BA, BS, B.Eng., etc.)",,,,,,,,,,,,,,,,,,,,,Yes,,,,,16,,,No,"Computer science, computer engineering, or sof...",,,0-2 years,
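Several answers in the preview, such as `LanguageWorkedWith`, pack multiple choices into one string separated by `;`. The notebook imports `OneHotEncoder`, but this section does not show how it is applied. One possible way to expand such a column is `Series.str.get_dummies(sep=";")`, which yields one 0/1 column per distinct option. The sketch below illustrates this on two answers copied from the preview plus a missing value. It is not the notebook's own preprocessing.

```python
import pandas as pd

# Two LanguageWorkedWith answers from the preview (Respondents 1 and 3) plus a missing one
langs = pd.Series(
    ["JavaScript;Python;HTML;CSS", "JavaScript;Python;Bash/Shell", None],
    name="LanguageWorkedWith",
)

# One indicator column per distinct language, sorted by name;
# a missing answer becomes an all-zero row
onehot = langs.str.get_dummies(sep=";")
print(onehot)
```

The resulting indicator columns can sit alongside the other features. A missing answer comes out as an all-zero row, so it may be worth keeping a separate "answered" flag if missingness itself is informative.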
