This is one of my first machine learning team projects. In it, we use ML to predict whether a person survived or not.
- LogisticRegression
- KNN
- Gradient Boosting
- Random Forest
- SVC

All algorithms come from the open-source scikit-learn library.
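
As a rough illustration of how the five classifiers listed above might be compared, here is a minimal sketch using 5-fold cross-validation. The data is synthetic (generated with `make_classification`) and stands in for the project's real dataset, and the hyperparameters are scikit-learn defaults rather than the values the team used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): not the project's real dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# The five algorithms named in the list, with default settings (assumption).
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVC": SVC(),
}

# Mean 5-fold cross-validated accuracy for each model.
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_scores[name] = scores.mean()
    print(f"{name}: {cv_scores[name]:.3f}")
```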
- Data Imputation: MeanMedianImputer, ArbitraryNumberImputer, CategoricalImputer
- Feature Scaling: MinMaxScaler, StandardScaler, RobustScaler
- Encoding: RareLabelEncoder, OrdinalEncoder
- Seaborn
- Matplotlib
- Accuracy
- Recall
- ROC AUC
- F1
- Precision
- Confusion Matrix
- Pipeline
- Pandas
- ColumnTransformer
- GridSearch
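
The pieces listed above (imputation, scaling, encoding, `Pipeline`, `ColumnTransformer`, grid search, and the evaluation metrics) can be wired together roughly as follows. This is only a sketch under several assumptions: the DataFrame is synthetic, the `age` and `fare` columns are hypothetical, scikit-learn's `SimpleImputer` and `OrdinalEncoder` stand in for the imputers and encoders named in the list, and the `C` grid is illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Synthetic stand-in data (assumption); 'age' and 'fare' are hypothetical columns.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "sex": rng.choice(["male", "female"], size=n),
    "pclass": rng.integers(1, 4, size=n),
    "age": rng.normal(30, 12, size=n).clip(1, 80),
    "fare": rng.exponential(30, size=n),
})
df.loc[rng.choice(n, size=40, replace=False), "age"] = np.nan  # simulate missing values
signal = 1.5 * (df["sex"] == "female") - 0.6 * (df["pclass"] - 2) + rng.normal(0, 1, n)
y = (signal > 0.5).astype(int)  # synthetic binary target

numeric_cols = ["pclass", "age", "fare"]
categorical_cols = ["sex"]

# Impute and scale numeric columns; ordinal-encode the categorical column.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", OrdinalEncoder(), categorical_cols),
])
pipe = Pipeline([("preprocess", preprocess),
                 ("clf", LogisticRegression(max_iter=1000))])

# Tune the regularization strength with 5-fold cross-validation.
param_grid = {"clf__C": [0.01, 0.1, 1, 10]}
grid = GridSearchCV(pipe, param_grid, cv=5, scoring="roc_auc")

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, stratify=y, random_state=0)
grid.fit(X_train, y_train)

y_pred = grid.predict(X_test)
y_proba = grid.predict_proba(X_test)[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_proba),
    "f1": f1_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
}
cm = confusion_matrix(y_test, y_pred)
print(grid.best_params_, metrics, cm, sep="\n")
```

Keeping the preprocessing inside the pipeline means it is refit on each training fold during the grid search, so no statistics from the validation folds leak into imputation or scaling.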
The Gradient Boosting classifier performed slightly better than the other algorithms, but it overfit heavily.
So we chose Logistic Regression, which performs well without overfitting.
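
One common way to spot this kind of overfitting is to compare accuracy on the training data with accuracy on held-out data; a large gap suggests the model has memorized the training set. The sketch below does this on synthetic data with default hyperparameters (both assumptions), so its numbers say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, slightly noisy stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

candidates = {
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Train/test accuracy gap per model: the larger the gap, the stronger the overfit.
gaps = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    gaps[name] = train_acc - test_acc
    print(f"{name}: train={train_acc:.3f} test={test_acc:.3f} gap={gaps[name]:.3f}")
```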
We also derived feature importances from the model coefficients and found that 'sex' and 'pclass' had the greatest impact on the prediction.
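
A minimal sketch of reading feature importances from logistic regression coefficients is shown below. It assumes the features are standardized first, so that coefficient magnitudes are comparable across columns. The data is synthetic, the `age` and `fare` columns and the 0/1 encoding of `sex` are hypothetical, and the ranking it prints reflects only how the synthetic target was built.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); column names beyond 'sex' and
# 'pclass' are hypothetical.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "sex": rng.integers(0, 2, size=n),      # hypothetical encoding: 1 = female
    "pclass": rng.integers(1, 4, size=n),
    "age": rng.normal(30, 12, size=n),
    "fare": rng.exponential(30, size=n),
})
signal = 2.0 * X["sex"] - 0.8 * X["pclass"] + rng.normal(0, 1, n)
y = (signal > -0.5).astype(int)  # synthetic binary target

# Standardize so each coefficient is measured per standard deviation of its feature.
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression(max_iter=1000).fit(X_scaled, y)

# Rank features by absolute coefficient size.
importance = pd.Series(np.abs(model.coef_[0]), index=X.columns)
importance = importance.sort_values(ascending=False)
print(importance)
```

The sign of each raw coefficient (before taking the absolute value) indicates whether the feature pushes the prediction toward the positive or the negative class.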