# What Is Machine Learning?

### Categories of Machine Learning

- __Supervised__ learning models the relationship between the measured features of some data and a label associated with that data. Once the model is built, it can be used to apply labels to new, unknown data.
- This is further subdivided into __classification__ and __regression__ tasks: classification labels are __discrete__ categories; regression labels are __continuous__ quantities.

- __Unsupervised__ learning models the features of a dataset without reference to any label. It is often described as "letting the dataset speak for itself."
- These models include tasks such as __clustering__ and __dimensionality reduction.__
- Clustering algorithms identify __distinct__ groups of data, while dimensionality reduction algorithms search for __more succinct representations__ of the data.
- In addition, there are so-called __semi-supervised__ learning methods, which fall between supervised and unsupervised learning. They are often useful when only incomplete labels are available.

### Classification: Predicting discrete labels

- We will first take a look at a simple __classification__ task, in which you are given a set of labeled points and want to use these to classify some unlabeled points.

![](figures/05.01-classification-1.png)

- Here we have two *features* for each point, represented by the *(x,y)* position of the point on the plane. Each point also has one of two *class labels*, represented by its color.
- We would like to create a model that will let us decide whether a new point should be labeled "blue" or "red."
- Let's assume the two groups can be separated by drawing a straight line through the plane between them, such that points on each side of the line fall in the same group.
- Here the *model* is a quantitative version of the statement "a straight line separates the classes," while the *model parameters* are the particular numbers describing the location and orientation of that line.
- The optimal values for these model parameters are learned from the data. This is often called *training the model*.
- Here's what the trained model looks like:

![](figures/05.01-classification-2.png)

- This trained model can now be applied to new, unlabeled data. This is called *prediction*. See the following figure:

![](figures/05.01-classification-3.png)
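
The train-then-predict workflow described above can be sketched with scikit-learn. This is only an illustration under stated assumptions: the data is a synthetic stand-in generated with `make_blobs` (not the data behind the figures), and a linear support vector classifier is one possible choice of "straight line" model among several.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Assumption: synthetic two-class data standing in for the labeled points.
X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.60)

# The model: "a straight line separates the classes".
clf = SVC(kernel="linear")

# Training: learn the line's parameters from the labeled data.
clf.fit(X, y)

# Model parameters: the line is w[0]*x + w[1]*y + b = 0.
w = clf.coef_[0]
b = clf.intercept_[0]
print("line parameters:", w, b)

# Prediction: apply the trained model to new, unlabeled points.
rng = np.random.RandomState(1)
X_new = rng.uniform(X.min(axis=0), X.max(axis=0), size=(80, 2))
y_new = clf.predict(X_new)
print("predicted labels for the first five new points:", y_new[:5])
```

Each entry of `y_new` is 0 or 1, which plays the role of the "blue" or "red" label in the figures.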

### Regression: Predicting continuous labels

- Consider the data shown in the following figure, which consists of a set of points with continuous labels:

![](figures/05.01-regression-1.png)

- The color of each point represents the continuous label for that point. We will use a simple linear regression to predict the label of each point.
- This model assumes that, if we treat the label as a third spatial dimension, we can fit a plane to the data.

![](figures/05.01-regression-2.png)

Notice that the *feature 1-feature 2* plane here is the same as in the two-dimensional plot from before; in this case, however, we have represented the labels by both color and three-dimensional axis position.
From this view, it seems reasonable that fitting a plane through this three-dimensional data would allow us to predict the expected label for any set of input parameters.
Returning to the two-dimensional projection, when we fit such a plane we get the result shown in the following figure:

![](figures/05.01-regression-3.png)
[figure source in Appendix](06.00-Figure-Code.ipynb#Regression-Example-Figure-3)

This plane of fit gives us what we need to predict labels for new points.
Visually, we find the results shown in the following figure:

![](figures/05.01-regression-4.png)
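
As a rough sketch of the plane fit described above, the following uses scikit-learn's `LinearRegression`. The data is synthetic and generated from an assumed underlying plane (coefficients -2 and 1, chosen purely for illustration), so it is not the data shown in the figures.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumption: synthetic data, two features per point, with a continuous
# label that lies near the plane  label = -2 * f1 + 1 * f2  plus noise.
rng = np.random.RandomState(1)
X = rng.randn(200, 2)
y = X @ np.array([-2.0, 1.0]) + 0.1 * rng.randn(200)

# Fit a plane: label ~ coef_[0] * f1 + coef_[1] * f2 + intercept_
model = LinearRegression()
model.fit(X, y)
print("fitted plane:", model.coef_, model.intercept_)

# Predict continuous labels for new, unlabeled points.
X_new = rng.randn(100, 2)
y_new = model.predict(X_new)
print("first five predicted labels:", y_new[:5])
```

The fitted coefficients and intercept are the model parameters here, in the same way that the location and orientation of the line were in the classification example.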

As with the classification example, this may seem rather trivial in a low number of dimensions.
But the power of these methods is that they can be straightforwardly applied and evaluated in the case of data with many, many features.

For example, this is similar to the task of computing the distance to galaxies observed through a telescope—in this case, we might use the following features and labels:

- *feature 1*, *feature 2*, etc. $\to$ brightness of each galaxy at one of several wavelengths or colors
- *label* $\to$ distance or redshift of the galaxy

The distances for a small number of these galaxies might be determined through an independent set of (typically more expensive) observations.
Distances to remaining galaxies could then be estimated using a suitable regression model, without the need to employ the more expensive observation across the entire set.
In astronomy circles, this is known as the "photometric redshift" problem.

### Clustering: Inferring labels on unlabeled data

The classification and regression illustrations we just looked at are examples of supervised learning algorithms, in which we are trying to build a model that will predict labels for new data.
Unsupervised learning involves models that describe data without reference to any known labels.

One common case of unsupervised learning is "clustering," in which data is automatically assigned to some number of discrete groups.
For example, we might have some two-dimensional data like that shown in the following figure:

![](figures/05.01-clustering-1.png)

By eye, it is clear that each of these points is part of a distinct group.
Given this input, a clustering model will use the intrinsic structure of the data to determine which points are related.
The *k*-means algorithm yields the clusters shown in the following figure:

![](figures/05.01-clustering-2.png)
[figure source in Appendix](06.00-Figure-Code.ipynb#Clustering-Example-Figure-2)

*k*-means fits a model consisting of *k* cluster centers; the optimal centers are assumed to be those that minimize the total squared distance of each point from its assigned center.
Again, this might seem like a trivial exercise in two dimensions, but as our data becomes larger and more complex, such clustering algorithms can be employed to extract useful information from the dataset.
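
A minimal sketch of this kind of clustering with scikit-learn's `KMeans` follows. The data is synthetic (an assumption, generated with `make_blobs`), and the number of clusters is set to 4 by hand; the algorithm never sees the generating labels.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Assumption: synthetic unlabeled data; the true blob labels are discarded.
X, _ = make_blobs(n_samples=100, centers=4, random_state=42, cluster_std=1.5)

# Fit k cluster centers and assign each point to one of them.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
centers = kmeans.cluster_centers_

# The quantity being minimized: the sum of squared distances from
# each point to its assigned center (scikit-learn calls this inertia_).
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
inertia = (dists[np.arange(len(X)), labels] ** 2).sum()
print("sum of squared distances:", inertia)
print("reported by the model:   ", kmeans.inertia_)
```

The cluster labels are arbitrary integers: they identify groups, but carry no meaning beyond "these points belong together."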

### Dimensionality reduction: Inferring structure of unlabeled data

Dimensionality reduction is another example of an unsupervised algorithm, in which labels or other information are inferred from the structure of the dataset itself.
Dimensionality reduction is a bit more abstract than the examples we looked at before, but generally it seeks to pull out some low-dimensional representation of data that in some way preserves relevant qualities of the full dataset.

As an example of this, consider the data shown in the following figure:

![](figures/05.01-dimesionality-1.png)

Visually, it is clear that there is some structure in this data: it is drawn from a one-dimensional line that is arranged in a spiral within this two-dimensional space.
In a sense, you could say that this data is "intrinsically" only one dimensional, though this one-dimensional data is embedded in higher-dimensional space.
A suitable dimensionality reduction model in this case would be sensitive to this nonlinear embedded structure, and be able to pull out this lower-dimensionality representation.

The following figure shows a visualization of the results of the Isomap algorithm, a manifold learning algorithm that does exactly this:

![](figures/05.01-dimesionality-2.png)

Notice that the colors (which represent the extracted one-dimensional latent variable) change uniformly along the spiral, which indicates that the algorithm did in fact detect the structure we saw by eye.
As with the previous examples, the power of dimensionality reduction algorithms becomes clearer in higher-dimensional cases.
For example, we might wish to visualize important relationships within a dataset that has 100 or 1,000 features.
Visualizing 1,000-dimensional data is a challenge, and one way we can make this more manageable is to use a dimensionality reduction technique to reduce the data to two or three dimensions.
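
To make the spiral example concrete, here is a small sketch using scikit-learn's `Isomap`. The spiral is generated synthetically (an assumption, not the data in the figures), and `n_neighbors=8` is an illustrative setting that happens to suit this sampling density rather than a recommended default.

```python
import numpy as np
from sklearn.manifold import Isomap

# Assumption: a synthetic spiral. The single parameter t is the
# "intrinsic" coordinate; X embeds it in two dimensions.
t = np.linspace(1.5 * np.pi, 4.5 * np.pi, 300)
X = np.column_stack([t * np.cos(t), t * np.sin(t)])

# Reduce the two-dimensional data to one latent dimension.
iso = Isomap(n_neighbors=8, n_components=1)
z = iso.fit_transform(X).ravel()

# Compare the latent variable with the distance traveled along the spiral.
steps = np.linalg.norm(np.diff(X, axis=0), axis=1)
arc_length = np.concatenate([[0.0], np.cumsum(steps)])
corr = np.corrcoef(z, arc_length)[0, 1]
print("correlation with arc length:", abs(corr))
```

The sign of the recovered variable is arbitrary, which is why the absolute value of the correlation is reported. If `n_neighbors` is set too large for the data, the neighbor graph can jump between adjacent arms of the spiral and the one-dimensional structure is lost.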

### Summary

Here we have seen a few simple examples of some of the basic types of machine learning approaches.
Needless to say, there are a number of important practical details that we have glossed over, but I hope this section was enough to give you a basic idea of what types of problems machine learning approaches can solve.

In short, we saw the following:

- *Supervised learning*: Models that can predict labels based on labeled training data

  - *Classification*: Models that predict labels as two or more discrete categories
  - *Regression*: Models that predict continuous labels
  
- *Unsupervised learning*: Models that identify structure in unlabeled data

  - *Clustering*: Models that detect and identify distinct groups in the data
  - *Dimensionality reduction*: Models that detect and identify lower-dimensional structure in higher-dimensional data
  
In the following sections we will go into much greater depth within these categories, and see some more interesting examples of where these concepts can be useful.