William Rodemoyer
https://www.kaggle.com/datasets/wenruliu/adult-income-dataset
An individual’s annual income results from various factors. Intuitively, it is influenced by the individual’s education level, age, gender, occupation, and so on.
Individuals who make more than 50k are, on average, older than those who make 50k or less.
In both genders, individuals who make more than 50k work more hours per week than those who make 50k or less.
On average, males work more hours per week than females within the same income category.
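
Averages like these can be checked with a grouped mean. The following is a minimal sketch, assuming pandas and the column names `age`, `gender`, `hours-per-week`, and `income` (assumed to match the Kaggle CSV). The six rows are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Tiny made-up sample; column names are an assumption about the Kaggle CSV.
df = pd.DataFrame({
    "gender": ["Male", "Male", "Female", "Female", "Male", "Female"],
    "income": [">50K", "<=50K", ">50K", "<=50K", ">50K", "<=50K"],
    "age": [48, 30, 45, 28, 52, 33],
    "hours-per-week": [50, 40, 44, 36, 46, 38],
})

# Average age per income category.
mean_age = df.groupby("income")["age"].mean()

# Average weekly hours per (gender, income) pair.
mean_hours = df.groupby(["gender", "income"])["hours-per-week"].mean()

print(mean_age)
print(mean_hours)
```

With the real data, `df` would be loaded from the downloaded CSV instead of being built by hand.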
I used two types of models, each with multiple variants, to predict income:
- Logistic Regression Model
  - Base
  - Tuned
  - Over Sample
  - Under Sample
  - PCA
- KNN Model
  - Base
  - Tuned
  - Over Sample
  - Under Sample
  - PCA
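
The "Over Sample" and "Under Sample" variants rebalance the income classes before a model is fit. The sketch below shows one way to do random over- and under-sampling using only the Python standard library. The function name `resample`, its parameters, and the toy rows are hypothetical and are not the code used in this project.

```python
import random
from collections import defaultdict

def resample(rows, labels, mode="over", seed=0):
    """Balance classes by random over- or under-sampling (fixed seed)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    sizes = [len(group) for group in by_class.values()]
    # Over-sampling grows every class to the largest class size;
    # under-sampling shrinks every class to the smallest class size.
    target = max(sizes) if mode == "over" else min(sizes)

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        if len(group) >= target:
            # Already at or above target: draw without replacement.
            picked = rng.sample(group, target)
        else:
            # Below target: keep every original row, then add random duplicates.
            picked = group + rng.choices(group, k=target - len(group))
        out_rows.extend(picked)
        out_labels.extend([label] * target)
    return out_rows, out_labels

# Made-up (age, hours-per-week) rows: four "<=50K" and two ">50K".
rows = [[25, 40], [38, 45], [52, 50], [29, 35], [47, 60], [33, 38]]
labels = ["<=50K", "<=50K", ">50K", "<=50K", ">50K", "<=50K"]

over_rows, over_labels = resample(rows, labels, mode="over")
under_rows, under_labels = resample(rows, labels, mode="under")
print(len(over_rows), len(under_rows))
```

Resampling is normally applied to the training split only, so that test metrics still reflect the original class balance.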

| Model Name | Precision | Recall | F1 Score | Accuracy |
| --- | --- | --- | --- | --- |
| LR Tuned Train | 0.737003 | 0.602072 | 0.662740 | 0.852826 |
| LR Tuned Test | 0.732665 | 0.605663 | 0.663138 | 0.853815 |
| KNN Tuned Train | 0.773205 | 0.586132 | 0.666796 | 0.859307 |
| KNN Tuned Test | 0.725936 | 0.562500 | 0.633852 | 0.845611 |
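
For reference, the four reported metrics can be derived from a confusion matrix. The sketch below is a plain-Python illustration, assuming `>50K` is treated as the positive class. The function name and the toy labels are hypothetical, and a library such as scikit-learn would normally be used for this.

```python
def classification_metrics(y_true, y_pred, positive=">50K"):
    """Compute precision, recall, F1 and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Made-up labels: 2 true positives, 1 false negative,
# 1 false positive, 4 true negatives.
y_true = [">50K", ">50K", ">50K", "<=50K", "<=50K", "<=50K", "<=50K", "<=50K"]
y_pred = [">50K", ">50K", "<=50K", ">50K", "<=50K", "<=50K", "<=50K", "<=50K"]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

The same F1 formula applied to the LR Tuned Train precision and recall, 2 × 0.737003 × 0.602072 / (0.737003 + 0.602072), gives about 0.6627, which matches the reported F1 Score.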
After comparing the models, my recommendation is:
- Logistic Regression Model with oversampling
  - This model gave us an 81% chance of correctly predicting an individual's income.
  - Other models scored higher than 81%, but this model was the most balanced.
For any additional questions, please contact wrodemoyer@gmail.com

