```
1. Supervised learning
1.1. Generalized Linear Models
1.2. Linear and Quadratic Discriminant Analysis
1.3. Kernel ridge regression
1.4. Support Vector Machines
1.5. Stochastic Gradient Descent
1.6. Nearest Neighbors
1.7. Gaussian Processes
1.8. Cross decomposition
1.9. Naive Bayes
1.10. Decision Trees
1.11. Ensemble methods
1.12. Multiclass and multilabel algorithms
1.13. Feature selection
1.14. Semi-Supervised
1.15. Isotonic regression
1.16. Probability calibration
1.17. Neural network models (supervised)
2. Unsupervised learning
2.1. Gaussian mixture models
2.2. Manifold learning
2.3. Clustering
2.4. Biclustering
2.5. Decomposing signals in components (matrix factorization problems)
2.6. Covariance estimation
2.7. Novelty and Outlier Detection
2.8. Density Estimation
2.9. Neural network models (unsupervised)
3. Model selection and evaluation
3.1. Cross-validation: evaluating estimator performance
3.2. Tuning the hyper-parameters of an estimator
3.3. Model evaluation: quantifying the quality of predictions
3.4. Model persistence
3.5. Validation curves: plotting scores to evaluate models
4. Inspection
4.1. Partial dependence plots
5. Dataset transformations
5.1. Pipelines and composite estimators
5.2. Feature extraction
5.3. Preprocessing data
5.4. Imputation of missing values
5.5. Unsupervised dimensionality reduction
5.6. Random Projection
5.7. Kernel Approximation
5.8. Pairwise metrics, Affinities and Kernels
5.9. Transforming the prediction target (y)
6. Dataset loading utilities
6.1. General dataset API
6.2. Toy datasets
6.3. Real world datasets
6.4. Generated datasets
6.5. Loading other datasets
7. Computing with scikit-learn
7.1. Strategies to scale computationally: bigger data
7.2. Computational Performance
7.3. Parallelism, resource management, and configuration
```

# Supervised learning

## Linear regression

### Ordinary least squares

Loss function:

$$
\min_w \Vert Xw - y \Vert_2^2
$$

Optimization methods:
1. Gradient descent
2. Least squares (closed-form solution)
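
As a rough illustration of the two approaches listed above, the following NumPy sketch solves one least-squares problem both in closed form (via `np.linalg.lstsq`) and with plain gradient descent. The synthetic data, step size, and iteration count are assumptions chosen for the example, not values from these notes.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.5, -2.0, 0.5])  # hypothetical ground-truth weights
y = X @ w_true + 0.01 * rng.normal(size=100)

# Closed-form least squares solution
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient descent on ||Xw - y||_2^2; the gradient is 2 X^T (Xw - y).
# The step size is set from the largest singular value of X so the iteration is stable.
step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
w_gd = np.zeros(3)
for _ in range(500):
    w_gd -= step * 2.0 * X.T @ (X @ w_gd - y)

print(w_lstsq, w_gd)
```

Both estimates should agree to within numerical tolerance on a well-conditioned problem like this one.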

Validation methods:



### Ridge regression
Loss function:
$$
\min_w \Vert Xw - y \Vert_2^2 + \alpha \Vert w \Vert_2^2
$$
Optimization methods:
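
One possible option, offered as a sketch rather than the answer intended here: the ridge objective has the closed-form minimizer $w = (X^TX + \alpha I)^{-1}X^Ty$. The example below computes it with NumPy and compares it with `sklearn.linear_model.Ridge` fitted without an intercept. The data and the value of `alpha` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, 0.0, -3.0, 2.0]) + 0.1 * rng.normal(size=50)
alpha = 1.0  # illustrative regularization strength

# Closed form: w = (X^T X + alpha * I)^{-1} X^T y
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Ridge minimizes the same objective when no intercept is fitted
w_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_

print(w_closed, w_sklearn)
```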

### Lasso
Loss function:
$$
\min_w \frac{1}{2n_{\text{samples}}} \Vert Xw - y \Vert_2^2 + \alpha \Vert w \Vert_1
$$
Optimization methods:
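
scikit-learn's `Lasso` is fitted with coordinate descent. The sketch below is a minimal hand-written version of that idea for the objective above (cyclic updates with soft-thresholding, no intercept), compared against `Lasso`. The function names `soft_threshold` and `lasso_cd`, the data, and the sweep count are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

def soft_threshold(a, t):
    """Shrink the scalar a toward zero by t."""
    return np.sign(a) * max(abs(a) - t, 0.0)

def lasso_cd(X, y, alpha, n_sweeps=200):
    """Cyclic coordinate descent for (1/2n)||Xw - y||^2 + alpha * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual with feature j's current contribution removed
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.0]) + 0.1 * rng.normal(size=200)

w_cd = lasso_cd(X, y, alpha=0.1)
w_sk = Lasso(alpha=0.1, fit_intercept=False, tol=1e-10, max_iter=10000).fit(X, y).coef_
print(w_cd, w_sk)
```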

### Elastic Net
Loss function:
$$
\min_w \frac{1}{2n_{\text{samples}}} \Vert Xw - y \Vert_2^2 + \alpha\rho \Vert w \Vert_1 + \frac{\alpha(1-\rho)}{2} \Vert w \Vert_2^2
$$
Optimization methods:
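
In scikit-learn's `ElasticNet`, the mixing weight $\rho$ in the objective above corresponds to the `l1_ratio` parameter. The sketch below fits a mixed penalty and also checks the limiting case `l1_ratio=1.0`, where the L2 term vanishes and the model reduces to the Lasso. The data and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 6))
y = X @ np.array([1.0, 0.0, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.normal(size=150)

# rho = 0.5: equal mix of L1 and L2 penalties
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

# rho = 1: pure L1 penalty, which should match Lasso with the same alpha
enet_l1 = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print(enet.coef_)
print(enet_l1.coef_, lasso.coef_)
```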

### Multi-task Lasso
Loss function:
$$
\min_W \frac{1}{2n_{\text{samples}}} \Vert XW - Y \Vert_{Fro}^2 + \alpha \Vert W \Vert_{21}
$$
where
$$
\Vert A \Vert_{Fro} = \sqrt{\sum_{ij} a_{ij}^2}
$$
and
$$
\Vert A \Vert_{21} = \sum_i \sqrt{\sum_j a_{ij}^2}
$$
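
To make the two matrix norms concrete, here is a small NumPy sketch that evaluates both on a hand-picked matrix. The matrix `A` is an arbitrary example, not data from these notes.

```python
import numpy as np

A = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])

# Frobenius norm: square root of the sum of all squared entries
fro = np.sqrt((A ** 2).sum())

# L21 norm: sum over rows of each row's Euclidean norm
l21 = np.sqrt((A ** 2).sum(axis=1)).sum()

print(fro, l21)  # sqrt(26) and 5 + 0 + 1 = 6
```

Because the L21 norm sums unsquared row norms, penalizing it tends to push entire rows to zero together.
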
Optimization methods:

### Multi-task Elastic-Net
Loss function:
$$
\min_W \frac{1}{2n_{\text{samples}}} \Vert XW - Y \Vert_{Fro}^2 + \alpha\rho \Vert W \Vert_{21} + \frac{\alpha(1-\rho)}{2} \Vert W \Vert_{Fro}^2
$$
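
A hedged sketch of how this objective behaves in practice, using `sklearn.linear_model.MultiTaskElasticNet`: because of the L21 term, a feature is typically either used by all tasks or dropped for all tasks. The data, `alpha`, and `l1_ratio` values below are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import MultiTaskElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))

# Hypothetical ground truth: only features 0 and 2 matter, for all 3 tasks
W_true = np.zeros((6, 3))
W_true[0] = [1.0, -2.0, 0.5]
W_true[2] = [3.0, 1.0, -1.0]
Y = X @ W_true + 0.1 * rng.normal(size=(100, 3))

model = MultiTaskElasticNet(alpha=0.2, l1_ratio=0.7).fit(X, Y)

# coef_ has shape (n_tasks, n_features); each column is one feature
nonzero = model.coef_ != 0
print(nonzero)
```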

### Orthogonal Matching Pursuit
Loss function:
$$
\underset{\gamma}{\arg\min}\, \Vert y - X\gamma \Vert_2^2 \quad \text{subject to} \quad \Vert \gamma \Vert_0 \le n_{\text{nonzero\_coefs}}
$$
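
As an illustration of the sparsity constraint above, the following sketch fits `sklearn.linear_model.OrthogonalMatchingPursuit` with `n_nonzero_coefs=3` and counts the nonzero coefficients. The synthetic data and the choice of 3 are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))

# Hypothetical sparse signal: only 3 of the 20 coefficients are nonzero
gamma_true = np.zeros(20)
gamma_true[[2, 7, 15]] = [1.5, -2.0, 3.0]
y = X @ gamma_true + 0.01 * rng.normal(size=100)

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3).fit(X, y)
n_selected = np.count_nonzero(omp.coef_)
print(n_selected, np.flatnonzero(omp.coef_))
```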

### Bayesian regression
Loss function:
??

Model:
$$
p(y \vert X, w, \alpha) = \mathcal{N}(y \vert Xw, \alpha)
$$

#### Bayesian Ridge regression
Model:
$$
p(w \vert \lambda) = \mathcal{N}(w \vert 0, \lambda^{-1} I)
$$
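
A minimal sketch, on made-up data, of fitting `sklearn.linear_model.BayesianRidge`. After fitting, the estimated noise precision and weight precision are exposed as `alpha_` and `lambda_`, and `predict(..., return_std=True)` returns a predictive standard deviation alongside the mean.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = X @ np.array([1.0, -1.0, 2.0]) + 0.2 * rng.normal(size=80)

model = BayesianRidge().fit(X, y)

# Estimated precisions of the noise (alpha_) and of the weights (lambda_)
print(model.alpha_, model.lambda_)

# Predictive mean and standard deviation for the first 5 samples
mean, std = model.predict(X[:5], return_std=True)
print(mean, std)
```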

#### Automatic Relevance Determination
Model:
$$
p(w \vert \lambda) = \mathcal{N}(w \vert 0, A^{-1})
$$

where:
$$
\operatorname{diag}(A) = \lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_p\}
$$
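
The per-weight precisions $\lambda_i$ can be inspected after fitting `sklearn.linear_model.ARDRegression`, which stores one value per feature in `lambda_`. The sketch below uses synthetic data in which only the first two features are relevant; that setup is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Hypothetical ground truth: features 2, 3 and 4 are irrelevant
y = X @ np.array([2.0, -3.0, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=100)

ard = ARDRegression().fit(X, y)

# One precision per weight: a large lambda_i means weight i is shrunk toward 0
print(ard.lambda_)
print(ard.coef_)
```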

### Logistic regression

## Probability Calibration

Probability calibration is assessed with the Brier score.
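
For reference, the Brier score of a binary classifier is the mean squared difference between the predicted probability and the 0/1 outcome, so lower is better. The sketch below computes it by hand and with `sklearn.metrics.brier_score_loss`. The labels and probabilities are made-up values.

```python
import numpy as np
from sklearn.metrics import brier_score_loss

y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.1, 0.9, 0.8, 0.3])  # predicted probability of class 1

# Mean squared difference between predicted probability and actual outcome
brier_manual = np.mean((y_prob - y_true) ** 2)
brier_sklearn = brier_score_loss(y_true, y_prob)

print(brier_manual, brier_sklearn)
```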

## Feature selection

| Name | Description |
| --- | --- |
| `GenericUnivariateSelect([…])` | Univariate feature selector with configurable strategy. |
| `SelectPercentile([…])` | Select features according to a percentile of the highest scores. |
| `SelectKBest([score_func, k])` | Select features according to the k highest scores. |
| `SelectFpr([score_func, alpha])` | Filter: select the p-values below alpha based on a FPR test. |
| `SelectFdr([score_func, alpha])` | Filter: select the p-values for an estimated false discovery rate. |
| `SelectFromModel(estimator)` | Meta-transformer for selecting features based on importance weights. |
| `SelectFwe([score_func, alpha])` | Filter: select the p-values corresponding to family-wise error rate. |
| `RFE(estimator[, …])` | Feature ranking with recursive feature elimination. |
| `RFECV(estimator[, step, …])` | Feature ranking with recursive feature elimination and cross-validated selection of the best number of features. |
| `VarianceThreshold([threshold])` | Feature selector that removes all low-variance features. |
| `chi2(X, y)` | Compute chi-squared stats between each non-negative feature and class. |
| `f_classif(X, y)` | Compute the ANOVA F-value for the provided sample. |
| `f_regression(X, y[, center])` | Univariate linear regression tests. |
| `mutual_info_classif(X, y)` | Estimate mutual information for a discrete target variable. |
| `mutual_info_regression(X, y)` | Estimate mutual information for a continuous target variable. |

**chi2**
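
A short sketch of how `chi2` is typically combined with `SelectKBest`, using the bundled iris dataset (whose features are all non-negative, as `chi2` requires). The choice of `k=2` is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# chi2 returns one score and one p-value per feature
scores, pvalues = chi2(X, y)

# Keep the 2 features with the highest chi-squared scores
selector = SelectKBest(chi2, k=2)
X_new = selector.fit_transform(X, y)

print(scores)
print(selector.get_support())
```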

## Cross decomposition

### PLS: Partial Least Squares


### CCA: Canonical Correlation Analysis

| Name | Description |
| --- | --- |
| `cross_decomposition.CCA([n_components, …])` | CCA Canonical Correlation Analysis. |
| `cross_decomposition.PLSCanonical([…])` | PLSCanonical implements the 2 blocks canonical PLS of the original Wold algorithm [Tenenhaus 1998] p.204, referred as PLS-C2A in [Wegelin 2000]. |
| `cross_decomposition.PLSRegression([…])` | PLS regression. |
| `cross_decomposition.PLSSVD([n_components, …])` | Partial Least Square SVD. |

# Unsupervised learning

## Gaussian mixture models

## Manifold learning

## Clustering

## Biclustering

## Decomposing signals in components (matrix factorization problems)

## Covariance estimation

## Novelty and Outlier Detection

## Density Estimation

## Neural network models (unsupervised)

# Inspection
## Partial dependence plots
```
inspection.partial_dependence(estimator, X, …)	Partial dependence of features.
inspection.plot_partial_dependence(…[, …])	Partial dependence plots.
```
Mainly used to inspect the dependence between features and the target.
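
A minimal sketch of computing a partial dependence curve without plotting. It assumes a scikit-learn release in which `inspection.partial_dependence` accepts `kind="average"` and returns a dict-like result with an `"average"` entry; older releases return a tuple instead. The data and model are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 2))

# Hypothetical target: depends linearly on feature 0, not at all on feature 1
y = 3.0 * X[:, 0] + 0.05 * rng.normal(size=300)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Average prediction as feature 0 is swept over a 20-point grid
result = partial_dependence(model, X, features=[0],
                            grid_resolution=20, kind="average")
avg = result["average"][0]

print(avg)
```

Since the target increases with feature 0 in this setup, the curve should rise from the start of the grid to the end.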