Revert "FEA Neighborhood Components Analysis (scikit-learn#10058)"
This reverts commit 5e8a4ea.
Xing committed Apr 28, 2019
1 parent 9871826 commit 2daec5a
Showing 16 changed files with 21 additions and 1,662 deletions.
2 changes: 0 additions & 2 deletions doc/modules/classes.rst
@@ -1169,7 +1169,6 @@ Model validation
neighbors.RadiusNeighborsRegressor
neighbors.NearestCentroid
neighbors.NearestNeighbors
neighbors.NeighborhoodComponentsAnalysis

.. autosummary::
:toctree: generated/
@@ -1432,7 +1431,6 @@ Low-level methods
utils.assert_all_finite
utils.check_X_y
utils.check_array
utils.check_scalar
utils.check_consistent_length
utils.check_random_state
utils.class_weight.compute_class_weight
4 changes: 0 additions & 4 deletions doc/modules/decomposition.rst
@@ -957,7 +957,3 @@ when data can be fetched sequentially.
* `"Stochastic Variational Inference"
<http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf>`_
M. Hoffman, D. Blei, C. Wang, J. Paisley, 2013


See also :ref:`nca_dim_reduction` for dimensionality reduction with
Neighborhood Components Analysis.
214 changes: 0 additions & 214 deletions doc/modules/neighbors.rst
@@ -510,217 +510,3 @@ the model from 0.81 to 0.82.

* :ref:`sphx_glr_auto_examples_neighbors_plot_nearest_centroid.py`: an example of
classification using nearest centroid with different shrink thresholds.


.. _nca:

Neighborhood Components Analysis
================================

.. sectionauthor:: William de Vazelhes <william.de-vazelhes@inria.fr>

Neighborhood Components Analysis (NCA, :class:`NeighborhoodComponentsAnalysis`)
is a distance metric learning algorithm which aims to improve the accuracy of
nearest neighbors classification compared to the standard Euclidean distance.
The algorithm directly maximizes a stochastic variant of the leave-one-out
k-nearest neighbors (KNN) score on the training set. It can also learn a
low-dimensional linear projection of data that can be used for data
visualization and fast classification.

.. |nca_illustration_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_001.png
:target: ../auto_examples/neighbors/plot_nca_illustration.html
:scale: 50

.. |nca_illustration_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_002.png
:target: ../auto_examples/neighbors/plot_nca_illustration.html
:scale: 50

.. centered:: |nca_illustration_1| |nca_illustration_2|

In the illustration above, we consider some points from a randomly
generated dataset. We focus on the stochastic KNN classification of point no.
3. The thickness of a link between sample 3 and another point is proportional
to their distance, and can be seen as the relative weight (or probability) that
a stochastic nearest neighbor prediction rule would assign to this point. In
the original space, sample 3 has many stochastic neighbors from various
classes, so the right class is not very likely. However, in the projected space
learned by NCA, the only stochastic neighbors with non-negligible weight are
from the same class as sample 3, guaranteeing that the latter will be well
classified. See the :ref:`mathematical formulation <nca_mathematical_formulation>`
for more details.


Classification
--------------

Combined with a nearest neighbors classifier (:class:`KNeighborsClassifier`),
NCA is attractive for classification because it can naturally handle
multi-class problems without any increase in the model size, and does not
introduce additional parameters that require fine-tuning by the user.

NCA classification has been shown to work well in practice for data sets of
varying size and difficulty. In contrast to related methods such as Linear
Discriminant Analysis, NCA does not make any assumptions about the class
distributions. The nearest neighbor classification can naturally produce highly
irregular decision boundaries.

To use this model for classification, one needs to combine a
:class:`NeighborhoodComponentsAnalysis` instance that learns the optimal
transformation with a :class:`KNeighborsClassifier` instance that performs the
classification in the projected space. Here is an example using the two
classes:

>>> from sklearn.neighbors import (NeighborhoodComponentsAnalysis,
... KNeighborsClassifier)
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.pipeline import Pipeline
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
... stratify=y, test_size=0.7, random_state=42)
>>> nca = NeighborhoodComponentsAnalysis(random_state=42)
>>> knn = KNeighborsClassifier(n_neighbors=3)
>>> nca_pipe = Pipeline([('nca', nca), ('knn', knn)])
>>> nca_pipe.fit(X_train, y_train) # doctest: +ELLIPSIS
Pipeline(...)
>>> print(nca_pipe.score(X_test, y_test)) # doctest: +ELLIPSIS
0.96190476...

.. |nca_classification_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_001.png
:target: ../auto_examples/neighbors/plot_nca_classification.html
:scale: 50

.. |nca_classification_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_002.png
:target: ../auto_examples/neighbors/plot_nca_classification.html
:scale: 50

.. centered:: |nca_classification_1| |nca_classification_2|

The plot shows decision boundaries for Nearest Neighbor Classification and
Neighborhood Components Analysis classification on the iris dataset, when
training and scoring on only two features, for visualisation purposes.

.. _nca_dim_reduction:

Dimensionality reduction
------------------------

NCA can be used to perform supervised dimensionality reduction. The input data
are projected onto a linear subspace consisting of the directions which
minimize the NCA objective. The desired dimensionality can be set using the
parameter ``n_components``. For instance, the following figure shows a
comparison of dimensionality reduction with Principal Component Analysis
(:class:`sklearn.decomposition.PCA`), Linear Discriminant Analysis
(:class:`sklearn.discriminant_analysis.LinearDiscriminantAnalysis`) and
Neighborhood Component Analysis (:class:`NeighborhoodComponentsAnalysis`) on
the Digits dataset, a dataset with size :math:`n_{samples} = 1797` and
:math:`n_{features} = 64`. The data set is split into a training and a test set
of equal size, then standardized. For evaluation the 3-nearest neighbor
classification accuracy is computed on the 2-dimensional projected points found
by each method. Each data sample belongs to one of 10 classes.
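
For a rough sketch of such a projection (using the ``n_components`` parameter
described above; the choice of two components and the fixed ``random_state``
are only for illustration):

>>> from sklearn.neighbors import NeighborhoodComponentsAnalysis
>>> from sklearn.datasets import load_digits
>>> X, y = load_digits(return_X_y=True)
>>> nca = NeighborhoodComponentsAnalysis(n_components=2, random_state=42)
>>> X_embedded = nca.fit_transform(X, y)
>>> X_embedded.shape
(1797, 2)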

.. |nca_dim_reduction_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_001.png
:target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
:width: 32%

.. |nca_dim_reduction_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_002.png
:target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
:width: 32%

.. |nca_dim_reduction_3| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_003.png
:target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
:width: 32%

.. centered:: |nca_dim_reduction_1| |nca_dim_reduction_2| |nca_dim_reduction_3|


.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_neighbors_plot_nca_classification.py`
* :ref:`sphx_glr_auto_examples_neighbors_plot_nca_dim_reduction.py`
* :ref:`sphx_glr_auto_examples_manifold_plot_lle_digits.py`

.. _nca_mathematical_formulation:

Mathematical formulation
------------------------

The goal of NCA is to learn an optimal linear transformation matrix of size
``(n_components, n_features)``, which maximises the sum over all samples
:math:`i` of the probability :math:`p_i` that :math:`i` is correctly
classified, i.e.:

.. math::

    \underset{L}{\arg\max} \sum\limits_{i=0}^{N - 1} p_{i}

with :math:`N` = ``n_samples`` and :math:`p_i` the probability of sample
:math:`i` being correctly classified according to a stochastic nearest
neighbors rule in the learned embedded space:

.. math::

    p_{i}=\sum\limits_{j \in C_i}{p_{i j}}

where :math:`C_i` is the set of points in the same class as sample :math:`i`,
and :math:`p_{i j}` is the softmax over Euclidean distances in the embedded
space:

.. math::

    p_{i j} = \frac{\exp(-||L x_i - L x_j||^2)}{\sum\limits_{k \ne i} \exp(-||L x_i - L x_k||^2)} , \quad p_{i i} = 0

Mahalanobis distance
^^^^^^^^^^^^^^^^^^^^

NCA can be seen as learning a (squared) Mahalanobis distance metric:

.. math::

    ||L(x_i - x_j)||^2 = (x_i - x_j)^T M (x_i - x_j),

where :math:`M = L^T L` is a symmetric positive semi-definite matrix of size
``(n_features, n_features)``.
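
For intuition (this is not the scikit-learn implementation, just a small NumPy
illustration of the formulas above), one can compute the stochastic neighbor
probabilities :math:`p_{ij}` for a given transformation ``L`` and check the
Mahalanobis identity with :math:`M = L^T L`::

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(5, 3)    # 5 samples, 3 features
    L = rng.randn(2, 3)    # transformation, shape (n_components, n_features)

    # squared Euclidean distances between samples in the embedded space L X^T
    emb = X @ L.T
    d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)

    # softmax over distances, with p_ii = 0
    exp_d = np.exp(-d2)
    np.fill_diagonal(exp_d, 0.0)
    p = exp_d / exp_d.sum(axis=1, keepdims=True)

    # Mahalanobis identity: ||L (x_i - x_j)||^2 == (x_i - x_j)^T M (x_i - x_j)
    M = L.T @ L
    diff = X[0] - X[1]
    assert np.isclose(d2[0, 1], diff @ M @ diff)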


Implementation
--------------

This implementation follows what is explained in the original paper [1]_. For
the optimisation method, it currently uses scipy's L-BFGS-B with a full
gradient computation at each iteration, to avoid having to tune a learning
rate and to provide stable learning.
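
To make the role of the optimiser concrete, here is a deliberately naive,
self-contained sketch that minimises the negated objective :math:`-\sum_i p_i`
with scipy's L-BFGS-B. Unlike the actual estimator it relies on numerical
gradients and uses only a simple softmax shift for numerical stability, so it
is an illustration of the idea rather than a faithful reimplementation::

    import numpy as np
    from scipy.optimize import minimize
    from sklearn.datasets import load_iris

    X, y = load_iris(return_X_y=True)
    n_components, n_features = 2, X.shape[1]

    def neg_nca_objective(L_flat):
        # negated sum_i p_i for the flattened transformation L
        L = L_flat.reshape(n_components, n_features)
        emb = X @ L.T
        d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d2, np.inf)          # enforces p_ii = 0
        d2 -= d2.min(axis=1, keepdims=True)   # shift for numerical stability
        exp_d = np.exp(-d2)
        p_ij = exp_d / exp_d.sum(axis=1, keepdims=True)
        same_class = y[:, None] == y[None, :]
        return -p_ij[same_class].sum()

    rng = np.random.RandomState(42)
    res = minimize(neg_nca_objective, rng.randn(n_components * n_features),
                   method='L-BFGS-B', options={'maxiter': 30})
    L_learned = res.x.reshape(n_components, n_features)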

See the examples below and the docstring of
:meth:`NeighborhoodComponentsAnalysis.fit` for further information.

Complexity
----------

Training
^^^^^^^^
NCA stores a matrix of pairwise distances, taking ``n_samples ** 2`` memory.
Time complexity depends on the number of iterations done by the optimisation
algorithm. However, one can set the maximum number of iterations with the
argument ``max_iter``. For each iteration, time complexity is
``O(n_components x n_samples x min(n_samples, n_features))``.
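
As a rough, back-of-the-envelope illustration: for the Digits dataset used
above (``n_samples = 1797``), the pairwise distance matrix alone holds
:math:`1797^2 \approx 3.2` million entries, i.e. about 26 MB in 64-bit floats,
which is why the quadratic memory cost tends to dominate for larger datasets.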


Transform
^^^^^^^^^
Here the ``transform`` operation returns :math:`LX^T`, so its time complexity
equals ``n_components * n_features * n_samples_test``. There is no added space
complexity in the operation.
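
Put differently, transforming data amounts to a single matrix product. A
minimal NumPy sketch (with ``L`` standing for the learned transformation;
shapes here are arbitrary)::

    import numpy as np

    rng = np.random.RandomState(0)
    X_test = rng.randn(10, 64)    # (n_samples_test, n_features)
    L = rng.randn(2, 64)          # (n_components, n_features)
    X_embedded = X_test @ L.T     # row-wise (L X^T)^T, shape (10, 2)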


.. topic:: References:

.. [1] `"Neighbourhood Components Analysis"
       <http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf>`_,
       J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov, Advances in
       Neural Information Processing Systems, Vol. 17, May 2005, pp. 513-520.
.. [2] `Wikipedia entry on Neighborhood Components Analysis
<https://en.wikipedia.org/wiki/Neighbourhood_components_analysis>`_
2 changes: 1 addition & 1 deletion doc/modules/neural_networks_supervised.rst
@@ -152,7 +152,7 @@ indices where the value is `1` represents the assigned classes of that sample::
>>> clf.predict([[0., 0.]])
array([[0, 1]])

See the examples below and the docstring of
See the examples below and the doc string of
:meth:`MLPClassifier.fit` for further information.

.. topic:: Examples:
2 changes: 1 addition & 1 deletion doc/modules/sgd.rst
@@ -154,7 +154,7 @@ one-vs-all classification.

:class:`SGDClassifier` supports both weighted classes and weighted
instances via the fit parameters ``class_weight`` and ``sample_weight``. See
the examples below and the docstring of :meth:`SGDClassifier.fit` for
the examples below and the doc string of :meth:`SGDClassifier.fit` for
further information.

.. topic:: Examples:
6 changes: 0 additions & 6 deletions doc/whats_new/v0.21.rst
@@ -287,12 +287,6 @@ Support for Python 3.4 and below has been officially dropped.
:mod:`sklearn.neighbors`
........................

- |MajorFeature| A metric learning algorithm:
:class:`neighbors.NeighborhoodComponentsAnalysis`, which implements the
Neighborhood Components Analysis algorithm described in Goldberger et al.
(2005). :issue:`10058` by :user:`William de Vazelhes
<wdevazelhes>` and :user:`John Chiotellis <johny-c>`.

- |API| Methods in :class:`neighbors.NearestNeighbors` :
:func:`~neighbors.NearestNeighbors.kneighbors`,
:func:`~neighbors.NearestNeighbors.radius_neighbors`,
