This repository contains a concise, hands-on study plan for getting comfortable with applying Scikit-Learn to practical machine learning tasks. The focus is on understanding the core API, building end-to-end workflows, and preparing for coding interviews with minimal but effective coverage.
- Understand the Scikit-Learn API and the fit/transform/predict pattern.
- Build reproducible pipelines for preprocessing and modeling.
- Evaluate models with proper train/test splits and metrics.
- Tune models quickly with cross-validation and simple search.
- Apply to small tabular datasets end-to-end.
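
The fit/transform/predict pattern named in the goals can be sketched roughly as follows. This is a minimal illustration rather than a prescribed exercise: the choice of `load_iris`, `StandardScaler`, and `LogisticRegression`, along with the split size and `random_state`, are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumed toy setup: iris features/labels with a stratified 75/25 split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Transformer: fit() learns statistics (mean, std) from the training data only;
# transform() then applies those same statistics to any split.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Estimator: fit() learns model parameters; predict() returns class labels.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_scaled, y_train)
y_pred = clf.predict(X_test_scaled)

# score() on a classifier reports mean accuracy on the given data.
test_accuracy = clf.score(X_test_scaled, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

Fitting the scaler on the training split alone keeps test-set statistics out of preprocessing, which is the same separation a pipeline enforces automatically.
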
- Estimators, Transformers, Pipelines
- Train/test split, cross-validation
- Feature preprocessing: scaling, encoding, imputation
- Model evaluation: classification vs regression metrics
- Model selection: GridSearchCV, RandomizedSearchCV
- Saving/loading models (joblib)
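
To make the classification vs regression metrics topic concrete, here is a small sketch on hand-made labels and targets. The arrays are hypothetical values invented for illustration, and RMSE is computed as the square root of `mean_squared_error` so the snippet does not depend on a particular scikit-learn version.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_absolute_error,
    mean_squared_error,
)

# Classification: compare predicted labels with true labels (hypothetical values).
y_true_cls = [0, 1, 1, 0, 1, 0]
y_pred_cls = [0, 1, 0, 0, 1, 1]
accuracy = accuracy_score(y_true_cls, y_pred_cls)  # fraction of exact matches
f1 = f1_score(y_true_cls, y_pred_cls)              # harmonic mean of precision and recall

# Regression: compare predicted values with true values (hypothetical values).
y_true_reg = [3.0, 5.0, 2.0, 7.0]
y_pred_reg = [2.0, 5.0, 4.0, 6.0]
mae = mean_absolute_error(y_true_reg, y_pred_reg)
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))

print(f"accuracy={accuracy:.3f} f1={f1:.3f} mae={mae:.3f} rmse={rmse:.3f}")
```
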
- `Pipeline`, `ColumnTransformer`
- `StandardScaler`, `OneHotEncoder`, `SimpleImputer`
- `train_test_split`, `cross_val_score`
- `GridSearchCV`, `RandomizedSearchCV`
- Baselines: `DummyClassifier`, `DummyRegressor`
- Models: `LogisticRegression`, `RandomForestClassifier`, `RandomForestRegressor`, `GradientBoostingRegressor`
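
Several of the classes listed above can be combined into a single object, as in the sketch below. It assumes pandas is installed; the DataFrame, its column names (`age`, `income`, `city`), and the labels are hypothetical data invented for the example, and the imputation strategies are one reasonable choice among several.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.dummy import DummyClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy table: two numeric columns (with gaps) and one categorical column.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 51, 46, 38, 29, 60, 41, 35, 27, 55],
    "income": [30.0, 52.0, 41.0, np.nan, 88.0, 61.0, 35.0, 95.0, 70.0, 48.0, 33.0, 90.0],
    "city": ["A", "B", "A", "C", "B", "B", "A", "C", "B", "A", "A", "C"],
})
y = np.array([0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1])

numeric_features = ["age", "income"]
categorical_features = ["city"]

# Numeric branch: fill gaps with the median, then standardize.
numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical branch: fill gaps with the most frequent value, then one-hot encode.
categorical_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric_pipeline, numeric_features),
    ("cat", categorical_pipeline, categorical_features),
])

# Preprocessing and model live in one estimator, so cross-validation
# refits the preprocessing inside every fold.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Compare against a trivial baseline before trusting the model's score.
baseline = DummyClassifier(strategy="most_frequent")
baseline_scores = cross_val_score(baseline, df, y, cv=3)
model_scores = cross_val_score(model, df, y, cv=3)
print("baseline mean accuracy:", baseline_scores.mean())
print("model mean accuracy:", model_scores.mean())

# Fit on all rows and inspect the preprocessed shape: 2 numeric + 3 one-hot columns.
model.fit(df, y)
n_output_columns = model.named_steps["preprocess"].transform(df).shape[1]
predictions = model.predict(df)
```
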
- Quick API tour with a toy dataset (`iris`, `boston` alternative: `fetch_california_housing`).
- Preprocessing with `ColumnTransformer` (numeric vs categorical).
- Pipelines: preprocessing + model in one object.
- Evaluation: `train_test_split`, `cross_val_score`, metrics (accuracy, f1, roc_auc, rmse/mae).
- Hyperparameter tuning with `GridSearchCV` / `RandomizedSearchCV`.
- Export model with `joblib` and reload for inference.
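
The last two steps above, tuning and export, might look something like the following sketch. The parameter grid, the step names (`scale`, `clf`), and the `model.joblib` file name are assumptions chosen for illustration; a temporary directory is used so the example leaves no files behind.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Pipeline parameters are addressed as <step name>__<parameter name>.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)  # cross-validates each candidate, then refits the best one

held_out_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"held-out accuracy: {held_out_accuracy:.3f}")

# Persist the refit best pipeline and reload it for inference.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.joblib"  # hypothetical file name
    joblib.dump(search.best_estimator_, model_path)
    reloaded = joblib.load(model_path)

# The reloaded pipeline should reproduce the original predictions.
same_predictions = np.array_equal(search.predict(X_test), reloaded.predict(X_test))
print("reloaded model matches:", same_predictions)
```

`RandomizedSearchCV` can be swapped in with the same pipeline; it takes `param_distributions` and an `n_iter` budget instead of an exhaustive `param_grid`.
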