Scalable Manifold Learning Final Report

Project Overview

Manifold learning (ML), also known as non-linear dimension reduction, finds a non-linear representation of high-dimensional data with a small number of parameters. ML is data intensive: it has been shown statistically that the estimation accuracy depends asymptotically on the sample size N like N^{1/(α d + β)}, so it requires large amounts of data when the intrinsic dimension d is larger than a few. On the other hand, manifold learning fully realizes its potential in scientific discovery from very large multi-dimensional data sets representing partially known physical systems (e.g. spectra of galaxies), where there is reason to believe that the data can be modeled by a small set of parameters.
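To make the dependence on the intrinsic dimension d concrete, the following is a minimal sketch, not part of the project's software. It assumes the rate above can be read as an estimation error that shrinks like N^{-1/(α d + β)}, and it uses the placeholder values α = 1 and β = 2, which are illustrative assumptions rather than constants taken from the report. The function name `samples_needed` is likewise hypothetical.

```python
import math


def samples_needed(target_error: float, d: int,
                   alpha: float = 1.0, beta: float = 2.0) -> float:
    """Sample size N at which N**(-1 / (alpha * d + beta)) equals target_error.

    alpha and beta are placeholder constants chosen for illustration only.
    """
    if not 0.0 < target_error < 1.0:
        raise ValueError("target_error must lie strictly between 0 and 1")
    # Invert error = N**(-1 / (alpha * d + beta)) for N.
    return target_error ** (-(alpha * d + beta))


if __name__ == "__main__":
    # How the required sample size grows with the intrinsic dimension d
    # when the target error is held fixed.
    for d in (1, 2, 5, 10):
        n = samples_needed(0.1, d)
        print(f"d = {d:2d}: N is about 10^{math.log10(n):.0f}")
```

Under these placeholder values, holding the target error at 0.1 means each additional intrinsic dimension multiplies the required sample size by ten, which is the sense in which ML becomes data hungry once d is larger than a few.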

Therefore, we implemented a software suite that enables scientists and methodologists alike to scale a broad class of manifold learning methods to very large data sets. In particular, the software can be used to analyze spectroscopic data from the SDSS, as well as data from other astronomical surveys. The software is written in Python and builds on the existing scikit-learn library for scientific computing and machine learning. Our project demonstrates, contrary to commonly held belief, that with careful implementation ML can be made tractable on large data.
