This repository contains a concise, hands-on study plan to get comfortable applying Scikit-Learn for practical machine learning tasks. The focus is on understanding the core API, building end-to-end workflows, and preparing for code interviews with minimal but effective coverage.
- Understand the Scikit-Learn API and the fit/transform/predict pattern.
- Build reproducible pipelines for preprocessing and modeling.
- Evaluate models with proper train/test splits and metrics.
- Tune models quickly with cross-validation and simple search.
- Apply to small tabular datasets end-to-end.
- Estimators, Transformers, Pipelines
- Train/test split, cross-validation
- Feature preprocessing: scaling, encoding, imputation
- Model evaluation: classification vs regression metrics
- Model selection: GridSearchCV, RandomizedSearchCV
- Saving/loading models (joblib)
Pipeline
,ColumnTransformer
StandardScaler
,OneHotEncoder
,SimpleImputer
train_test_split
,cross_val_score
GridSearchCV
,RandomizedSearchCV
- Baselines:
DummyClassifier
,DummyRegressor
- Models:
LogisticRegression
,RandomForestClassifier
,RandomForestRegressor
,GradientBoostingRegressor
- Quick API tour with a toy dataset (
iris
,boston
alternative:fetch_california_housing
). - Preprocessing with
ColumnTransformer
(numeric vs categorical). - Pipelines: preprocessing + model in one object.
- Evaluation:
train_test_split
,cross_val_score
, metrics (accuracy, f1, roc_auc, rmse/mae). - Hyperparameter tuning with
GridSearchCV
/RandomizedSearchCV
. - Export model with
joblib
and reload for inference.