<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 1. Introduction to the three classes of algorithms in Machine Learning — Supervised, Unsupervised & Reinforcement Learning
*Artificial Intelligence: A Modern Approach, Third Edition*

----
An agent is *learning* if it improves its performance on future tasks after making observations about the world. As humans, we learn things in many different ways. The way you learned calculus, for example, is probably not the same way you learned to stack blocks. The way you learned the alphabet is probably wildly different from the way you learned to tell whether objects are approaching you or moving away from you. Why would we want an agent to learn? If the design of the agent can be improved, why wouldn’t the designers just program in that improvement to begin with? There are three main reasons:
1. First, the designers cannot anticipate all possible situations that the agent might find itself in. For example, a robot designed to navigate mazes must learn the layout of each new maze it encounters.
2. Second, the designers cannot anticipate all changes over time; a program designed to predict tomorrow’s stock market prices must learn to adapt when conditions change from boom to bust.
3. Third, sometimes human programmers have no idea how to program a solution themselves. For example, most people are good at recognizing the faces of family members, but even the best programmers are unable to program a computer to accomplish that task, except by using learning algorithms.

<br/>Two main ways that we can approach machine learning are *Supervised Learning* and *Unsupervised Learning*. Each suits different situations and different kinds of available data. A third form is *Reinforcement Learning*, in which the agent learns from a series of reinforcements: rewards or punishments. For example, the lack of a tip at the end of the journey gives the taxi agent an indication that it did something wrong. The two points for a win at the end of a chess game tell the agent it did something right. It is up to the agent to decide which of the actions prior to the reinforcement were most responsible for it. Reinforcement learning will not be covered in this tutorial.

<br/>In practice, these distinctions are not always so crisp. In *Semi-supervised Learning* we are given a few labeled examples and must make what we can of a large collection of unlabeled examples. Even the labels themselves may not be the oracular truths that we hope for.

<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 2. Supervised Learning
*Artificial Intelligence: A Modern Approach, Third Edition*

----
A. The task of supervised learning is this:
> Given a training set of $N$ example input–output pairs $(x_{1}, y_{1}), (x_{2}, y_{2}), \ldots, (x_{N}, y_{N})$, where each $y_{j}$ was generated by an unknown function $y = f(x)$, discover a function $h$ that approximates the true function $f$.

<br/>Here $x$ and $y$ can be any value; they need not be numbers. The function $h$ is a **hypothesis**. Learning is a search through the space of possible hypotheses for one that will perform well, even on new examples beyond the training set. To measure the accuracy of a hypothesis, we give it a **test set** of examples that are distinct from the training set. We say a hypothesis **generalizes** well if it correctly predicts the value of $y$ for novel examples. Sometimes the function $f$ is *stochastic*: it is not strictly a function of $x$, and what we have to learn is a conditional probability distribution, $P(Y | x)$.
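
<br/>The following is a minimal sketch of this setup, not a definitive implementation. It assumes a hypothetical true function $f(x) = 2x + 1$ with a little Gaussian noise, restricts the hypothesis space to straight lines, and uses an arbitrary 150/50 split between training and test examples. All names (`true_f`, `make_examples`, `fit_line`, `mean_squared_error`) are illustrative.

```python
import random

def true_f(x):
    # Hypothetical "unknown" function f; in a real problem we could not inspect it.
    return 2.0 * x + 1.0

def make_examples(n, rng):
    # Each example is an input-output pair (x, y), with y = f(x) plus small noise.
    examples = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = true_f(x) + rng.gauss(0.0, 0.5)
        examples.append((x, y))
    return examples

def fit_line(examples):
    # Search the space of lines h(x) = a*x + b using closed-form least squares.
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def mean_squared_error(h, examples):
    return sum((h(x) - y) ** 2 for x, y in examples) / len(examples)

rng = random.Random(0)  # fixed seed so the run is repeatable
examples = make_examples(200, rng)
training_set, test_set = examples[:150], examples[150:]

h = fit_line(training_set)
train_mse = mean_squared_error(h, training_set)
test_mse = mean_squared_error(h, test_set)
print(f"training MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```

If the test error is close to the training error, the hypothesis generalizes well on this data; a test error far above the training error would suggest that it does not.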

<br/>We say that a learning problem is **realizable** if the hypothesis space $H$ contains the true function. Unfortunately, we cannot always tell whether a given learning problem is realizable, because the true function is not known. In general, there is a tradeoff between complex hypotheses that fit the training data well and simpler hypotheses that may generalize better.

<br/>B. Two types of supervised learning:
1. **Classification:** when the output *y* is one of a finite set of values (such as *sunny, cloudy* or *rainy*), the learning problem is called **classification**, and is called Boolean or binary classification if there are only two values.
2. **Regression:** when *y* is a number (such as tomorrow’s temperature), the learning problem is called **regression**. (Technically, solving a regression problem is finding a conditional expectation or average value of $y$, because the probability that we have found exactly the right real-valued number for $y$ is 0.) 
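
<br/>To make the distinction concrete, here is a small sketch that applies the same nearest-neighbor idea to both kinds of problem. Nearest neighbors is only one possible technique, chosen here for brevity; the data sets (`weather`, `temperature`) and the choice of $k = 3$ are made up for illustration.

```python
from collections import Counter

def k_nearest(examples, x, k):
    # Return the k training examples whose input is closest to x.
    return sorted(examples, key=lambda pair: abs(pair[0] - x))[:k]

def classify(examples, x, k=3):
    # Classification: the output is one of a finite set of labels (majority vote).
    labels = [y for _, y in k_nearest(examples, x, k)]
    return Counter(labels).most_common(1)[0][0]

def regress(examples, x, k=3):
    # Regression: the output is a number, here the average y of the neighbors,
    # which is a rough local estimate of the expected value of y given x.
    values = [y for _, y in k_nearest(examples, x, k)]
    return sum(values) / len(values)

# Hypothetical data: humidity (%) -> weather label
weather = [(20, "sunny"), (25, "sunny"), (30, "sunny"),
           (55, "cloudy"), (60, "cloudy"), (65, "cloudy"),
           (85, "rainy"), (90, "rainy"), (95, "rainy")]

# Hypothetical data: today's temperature -> tomorrow's temperature (°C)
temperature = [(10.0, 11.0), (12.0, 12.5), (14.0, 13.5),
               (20.0, 19.0), (22.0, 22.5), (24.0, 23.0)]

print(classify(weather, 58))        # cloudy
print(regress(temperature, 21.0))   # 21.5
```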

<br/>C. The search for simplicity:
<br/>A fundamental problem in inductive learning is how to choose from among multiple consistent hypotheses. One answer is to prefer the simplest hypothesis consistent with the data. This principle is called **Ockham’s razor**, after the 14th-century English philosopher William of Ockham, who used it to argue sharply against all sorts of complications. Defining simplicity is not easy, but it seems clear, for instance, that a degree-1 polynomial is simpler than a degree-7 polynomial.
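
<br/>The degree-1 versus degree-7 contrast can be tried out on a toy problem. The sketch below is one illustration under stated assumptions, not a proof of the principle: the true function is assumed to be $f(x) = 2x + 1$, the eight training outputs carry a fixed alternating perturbation of $\pm 0.5$ standing in for noise, and the held-out points lie halfway between the training inputs. The degree-7 hypothesis is built by Lagrange interpolation, so it passes through every training point exactly.

```python
def fit_line(points):
    # Degree-1 hypothesis: least-squares line h(x) = a*x + b.
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_interpolant(points):
    # Degree-7 hypothesis for 8 points: the Lagrange polynomial through all of them.
    def h(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return h

def mse(h, points):
    return sum((h(x) - y) ** 2 for x, y in points) / len(points)

def f(x):
    # Hypothetical true function, assumed for this illustration.
    return 2.0 * x + 1.0

# Training outputs are perturbed by +0.5 / -0.5 in alternation (stand-in for noise).
train = [(float(i), f(i) + (0.5 if i % 2 == 0 else -0.5)) for i in range(8)]
# Held-out inputs halfway between the training inputs, with unperturbed targets.
test = [(i + 0.5, f(i + 0.5)) for i in range(7)]

line = fit_line(train)
poly = fit_interpolant(train)

print(f"degree 1: train MSE {mse(line, train):.4f}, test MSE {mse(line, test):.4f}")
print(f"degree 7: train MSE {mse(poly, train):.4f}, test MSE {mse(poly, test):.4f}")
```

On this data the degree-7 polynomial has zero training error because it also fits the perturbations, and it oscillates between the training points, so its error on the held-out points is much larger than that of the line.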

<br/>There is a tradeoff between the expressiveness of a hypothesis space and the complexity of finding a good hypothesis within that space. For example, fitting a straight line to data is an easy computation; fitting high-degree polynomials is somewhat harder; and fitting Turing machines is in general undecidable.

<br/>A second reason to prefer simple hypothesis spaces is that presumably we will want to use $h$ after we have learned it, and computing $h(x)$ when $h$ is a linear function is guaranteed to be fast, while computing an arbitrary Turing machine program is not even guaranteed to terminate. For these reasons, most work on learning has focused on simple representations.

<br/>The expressiveness–complexity tradeoff is not as simple as it first seems: it is often the case that an expressive language makes it possible for a *simple* hypothesis to fit the data, whereas restricting the expressiveness of the language means that any consistent hypothesis must be very complex. For example, the rules of chess can be written in a page or two of first-order logic, but require thousands of pages when written in propositional logic.

<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 3. Unsupervised Learning
*in Python*

----
Let’s say you are an alien who has been observing the meals people eat. You’ve embedded yourself into the body of an employee at a typical tech startup, and you see people eating breakfasts, lunches, and snacks. Over the course of a couple weeks, you surmise that for breakfast people mostly eat foods like:
- Cereals
- Bagels
- Granola bars

<br/>Lunch is usually a combination of:
- Some sort of vegetable
- Some sort of protein
- Some sort of grain

<br/>Snacks are usually a piece of fruit or a handful of nuts. No one explicitly *told* you what kinds of foods go with each meal, but you learned from natural observation and put the patterns together. In unsupervised learning, we don’t tell the program anything about what we expect the output to be. The program itself analyzes the data it encounters and tries to pick out patterns and group the data in meaningful ways.

<br/>One example is **clustering**, used for instance to create segments in a business’s user population. In this case, an unsupervised learning algorithm would probably create groups (or clusters) based on parameters that a human may not even consider.
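
<br/>As a rough sketch of what clustering looks like in code, the example below runs a bare-bones version of k-means, one common clustering algorithm, on a made-up set of users described by two numbers (visits per week, average spend). The data, the choice of $k = 2$, and the naive initialization from the first $k$ points are all assumptions made for brevity; no labels are given to the program.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    # Coordinate-wise average of a non-empty list of points.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def assign(points, centers):
    # Put each point into the cluster of its nearest center.
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)),
                      key=lambda c: squared_distance(p, centers[c]))
        clusters[nearest].append(p)
    return clusters

def k_means(points, k, iterations=10):
    # Naive initialization (an assumption for brevity): start from the first k points.
    centers = list(points[:k])
    for _ in range(iterations):
        clusters = assign(points, centers)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [mean_point(cl) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, assign(points, centers)

# Hypothetical users: (visits per week, average spend)
users = [(1, 5), (9, 60), (2, 8), (10, 55), (1, 7), (8, 65), (3, 6), (9, 58)]

centers, clusters = k_means(users, k=2)
print(centers)   # [(1.75, 6.5), (9.0, 59.5)]
for cluster in clusters:
    print(cluster)
```

The two groups that come out (occasional low spenders and frequent high spenders) were never named in the input; they emerge from the structure of the data alone.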

<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 4. Summary
*in Python*

----
We have gone over the difference between supervised and unsupervised learning:
- **Supervised Learning:** the data is labeled (each input comes with its correct output), and the program learns to predict the output from the input data
- **Unsupervised Learning:** the data is unlabeled (no outputs are given), and the program learns to recognize the inherent structure in the input data