# Machine Learning with scikit-learn

## What Is Machine Learning?

* Difference between "Deep Learning" and other ML techniques
* Overview of techniques used in Machine Learning
* Classification vs. Regression vs. Clustering
* Dimensionality Reduction
* Feature Engineering
* Feature Selection
* Categorical vs. Ordinal vs. Continuous variables
* One-hot encoding
* Hyperparameters
* Grid Search
* Metrics

<div><a href="WhatIsML.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>
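
As a small taste of one topic above, one-hot encoding can be sketched with scikit-learn's `OneHotEncoder`. The `colors` column is made-up illustration data, not taken from the notebook.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical feature: one column, four observations.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

encoder = OneHotEncoder()
# The encoder returns a sparse matrix by default; densify it for display.
encoded = encoder.fit_transform(colors).toarray()

# Categories are stored in sorted order, one output column per category.
print(encoder.categories_[0])
print(encoded)
```

Each row contains exactly one 1, marking the category of that observation.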


## Exploring a Data Set

* Looking for anomalies and data integrity problems
* Cleaning data
* Massaging data format to be model-ready
* Choosing features and a target
* Train/test split

<div><a href="Exploring.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>
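
The steps above might look roughly like the following sketch. The DataFrame, its column names, and the plausibility range for heights are all invented for illustration; the notebook works with its own data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data with one missing value and one implausible height.
df = pd.DataFrame({
    "height_cm": [170.0, 165.0, np.nan, 180.0, 175.0, 160.0, 1720.0, 168.0],
    "weight_kg": [70.0, 60.0, 65.0, 85.0, 77.0, 55.0, 72.0, 63.0],
    "label": [1, 0, 0, 1, 1, 0, 1, 0],
})

# Look for integrity problems: missing values and suspicious ranges.
print(df.isna().sum())
print(df.describe())

# Clean: drop rows with missing values, then rows outside an assumed range.
clean = df.dropna()
clean = clean[clean["height_cm"].between(100, 250)]

# Choose features and a target, then hold out part of the data for testing.
X = clean[["height_cm", "weight_kg"]]
y = clean["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
print(len(X_train), len(X_test))
```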

## Classification

* Choosing a model
* Feature importances
* Cut points in a decision tree
* Comparing multiple classifiers

<div><a href="Classification.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>
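
A minimal sketch of these ideas, assuming the iris data set bundled with scikit-learn (the notebook may use different data and models). It prints feature importances and the cut points of a shallow decision tree, then compares a few classifiers on the same split.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0, stratify=iris.target
)

# Fit a shallow tree so the learned cut points stay readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Feature importances sum to 1 across all features.
for name, importance in zip(iris.feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.3f}")

# Text rendering of the tree, including the threshold used at each split.
print(export_text(tree, feature_names=iris.feature_names))

# Compare several classifiers using the same train/test split.
classifiers = {
    "decision tree": tree,
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}
scores = {
    name: clf.fit(X_train, y_train).score(X_test, y_test)
    for name, clf in classifiers.items()
}
print(scores)
```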

## Regression

* Sample data sets in scikit-learn
* Linear regressors
* Probabilistic regressors
* Other regressors

<div><a href="Regression.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>
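
As an illustration, the sketch below loads one of scikit-learn's sample data sets and fits a linear regressor and a probabilistic one. The choice of the diabetes data set and of `BayesianRidge` as the probabilistic example is an assumption, not necessarily what the notebook uses.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.model_selection import train_test_split

# A sample regression data set that ships with scikit-learn.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ordinary least squares; score() reports R^2 on the held-out data.
linear = LinearRegression().fit(X_train, y_train)
r2 = linear.score(X_test, y_test)
print(f"LinearRegression R^2: {r2:.3f}")

# A probabilistic regressor can also return an uncertainty per prediction.
bayes = BayesianRidge().fit(X_train, y_train)
mean, std = bayes.predict(X_test, return_std=True)
print(mean[:3], std[:3])
```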

## Clustering

* Overview of (some) clustering algorithms
* KMeans clustering
* Agglomerative clustering
* Density-based clustering: DBSCAN and HDBSCAN
* n_clusters, labels, and predictions
* Visualizing results

<div><a href="Clustering.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>
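
A rough sketch of three of these algorithms on synthetic blobs. The blob centers, `eps`, and `min_samples` are arbitrary illustrative values. HDBSCAN is left out here because where it lives (recent scikit-learn releases or a separate package) depends on the installed environment.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Three well-separated synthetic clusters (illustrative values only).
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=0,
)

# KMeans needs n_clusters up front and can predict labels for new points.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
new_label = kmeans.predict([[0.0, 5.0]])

# Agglomerative clustering also takes n_clusters, but has no predict();
# fit_predict() returns labels for the training data only.
agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# DBSCAN infers the number of clusters from density; -1 marks noise points.
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
n_found = len(set(dbscan_labels) - {-1})

print(kmeans.labels_[:10], new_label, agglo_labels[:10], n_found)
```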

## Hyperparameters

* Understanding hyperparameters
* Manual search of parameter space
* GridSearchCV
* Attributes of grid search and wrapped model

<div><a href="Hyperparameters.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>
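
The contrast between a manual search and `GridSearchCV` can be sketched as follows. The model (`KNeighborsClassifier`), the iris data, and the parameter values are assumptions chosen for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Manual search: loop over one hyperparameter and cross-validate each value.
manual = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in [1, 3, 5, 7]
}
print(manual)

# Grid search: try every combination in param_grid with 5-fold CV.
param_grid = {"n_neighbors": [1, 3, 5, 7], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5).fit(X, y)

# Attributes of the search itself...
print(search.best_params_, search.best_score_)
print(len(search.cv_results_["params"]))  # one entry per combination

# ...and of the wrapped model, refit on all the data with the best settings.
print(search.best_estimator_)
```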

## Feature Engineering and Feature Selection

* Principal Component Analysis (PCA)
* Non-Negative Matrix Factorization (NMF)
* Latent Dirichlet Allocation (LDA)
* Independent Component Analysis (ICA)
* SelectKBest
* Dimensionality expansion
* Polynomial Features
* One-Hot Encoding
* Scaling with StandardScaler, RobustScaler, MinMaxScaler, Normalizer, and others
* Binning values with quantiles or binarization

<div><a href="Features.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>
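
A few of these transformers are sketched below on the iris data. The data set, the number of components, `k`, and the binarization threshold are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import Binarizer, PolynomialFeatures, StandardScaler

X, y = load_iris(return_X_y=True)

# Scaling: each column ends up with zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Dimensionality reduction: project 4 features onto 2 principal components.
X_pca = PCA(n_components=2).fit_transform(X_scaled)

# Feature selection: keep the 2 features with the highest ANOVA F-score.
X_best = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Dimensionality expansion: bias, linear, squared, and pairwise product terms.
X_poly = PolynomialFeatures(degree=2).fit_transform(X)

# Binarizing: 1.0 where the value exceeds the threshold, else 0.0.
X_bin = Binarizer(threshold=3.0).fit_transform(X[:, [1]])

print(X_scaled.shape, X_pca.shape, X_best.shape, X_poly.shape, X_bin.shape)
```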

## Pipelines

* Feature Selection and Engineering
* Grid search
* Model

<div><a href="Pipelines.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>
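
The three pieces above can be combined as in this sketch. The step names (`scale`, `select`, `model`), the estimators, and the grid values are illustrative choices, not taken from the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Chain feature engineering, feature selection, and the model into one estimator.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),
    ("model", LogisticRegression(max_iter=1000)),
])

# Parameters of a step are addressed as "<step name>__<parameter name>".
param_grid = {
    "select__k": [1, 2, 3, 4],
    "model__C": [0.1, 1.0, 10.0],
}

# The whole pipeline is refit inside each CV fold, so the scaler and selector
# never see that fold's held-out data.
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```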

## Robust Train/Test Splits

* cross_val_score
* ShuffleSplit
* KFold, RepeatedKFold, LeaveOneOut, LeavePOut, StratifiedKFold

<div><a href="TrainTest.ipynb"><img src="img/open-notebook.png" align="left"/></a></div>
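
One possible way to compare these splitters is to pass each as the `cv` argument of `cross_val_score`. The model, the data set, and the number of splits below are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import (
    KFold,
    LeaveOneOut,
    ShuffleSplit,
    StratifiedKFold,
    cross_val_score,
)
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# Each splitter defines a different way of carving out train/test folds.
splitters = {
    "KFold": KFold(n_splits=5, shuffle=True, random_state=0),
    "StratifiedKFold": StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    "ShuffleSplit": ShuffleSplit(n_splits=5, test_size=0.2, random_state=0),
}

# cross_val_score returns one score per split.
results = {
    name: cross_val_score(model, X, y, cv=cv) for name, cv in splitters.items()
}
for name, scores in results.items():
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")

# LeaveOneOut produces as many splits as there are samples.
n_loo_splits = LeaveOneOut().get_n_splits(X)
print(n_loo_splits)
```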