# Let's Dive In  
When most people hear **“Machine Learning,”** they picture a robot: a dependable butler 
or a deadly Terminator depending on who you ask. But Machine Learning is not
just a futuristic fantasy; it’s already here. In fact, it has been around for decades in
some specialized applications, such as Optical Character Recognition (OCR). But the
first ML application that really became mainstream, improving the lives of hundreds
of millions of people, took over the world back in the 1990s: it was the **spam filter**.  
  
Where does Machine Learning start and where does it end? What exactly does it
mean for a machine to learn something? If I download a copy of Wikipedia, has my
computer really “learned” something? Is it suddenly smarter? In this tutorial we will
start by clarifying what Machine Learning is and why you may want to use it.  
  
# Machine Learning
Machine Learning is the science (and art) of programming computers so they can
learn from data. (Basic Definition)  
  
Here is a slightly more general definition:
> <span style="color:red"><strong>Machine Learning is the field of study that gives computers the ability to learn
    without being explicitly programmed.</strong>
</span>
<span style="text-align:right">—Arthur Samuel, 1959</span>

* Samuel wrote a checkers-playing program
    * He had the program play tens of thousands of games against itself
    * The program worked out which board positions were good and bad depending on wins/losses  
  
  
Samuel's claim to fame was that back in the 1950s he wrote a checkers-playing program. The amazing thing about this program was that Arthur Samuel himself wasn't a very good checkers player. What he did was have the program play tens of thousands of games against itself. By watching which board positions tended to lead to wins and which tended to lead to losses, the program learned over time what good and bad board positions look like, and it eventually learned to play checkers better than Samuel himself could. This was a remarkable result. A computer has the patience to play tens of thousands of games against itself, which no human has, so it gained enough checkers-playing experience to become a better player than Samuel. This is a somewhat informal definition, and an older one. 
  
And a more engineering-oriented one:  
> <span style="color:red"><strong>A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P, improves
with experience E.</strong>
</span>
<span style="text-align:right">—Tom Mitchell, 1997</span>  

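* In the checkers example:
    * E is the experience of playing tens of thousands of games
    * T is the task of playing checkers
    * P is whether the program wins or not (for example, the fraction of games it wins)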
  
For example, your spam filter is a Machine Learning program that can learn to flag
spam given examples of spam emails (e.g., flagged by users) and examples of regular
(nonspam, also called “ham”) emails. The examples that the system uses to learn are
called the training set. Each training example is called a training instance (or sample).
In this case, the task T is to flag spam for new emails, the experience E is the training
data, and the performance measure P needs to be defined; for example, you can use
the ratio of correctly classified emails. This particular performance measure is called
accuracy and it is often used in classification tasks.  
If you just download a copy of Wikipedia, your computer has a lot more data, but it is
not suddenly better at any task. Thus, it is not Machine Learning.  
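
To make the performance measure P concrete, here is a minimal sketch of how accuracy could be computed for a spam filter. The function name `accuracy` and the two label lists are made up for illustration; they are not part of any particular library.

```python
def accuracy(predicted_labels, true_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("both label lists must have the same length")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Hypothetical filter output for five emails, next to the true labels.
predicted = ["spam", "ham", "ham", "spam", "ham"]
actual = ["spam", "ham", "spam", "spam", "ham"]
print(accuracy(predicted, actual))  # 4 of the 5 emails are classified correctly
```

Here the task T is labeling each email, the experience E would be whatever data produced `predicted`, and P is the number returned by `accuracy`.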
  
**Question**  
Let's say your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting?  
<pre>
a. Classifying email as spam or not.  
b. Watching you label emails as spam or not.  
c. The number (or fraction) of emails correctly classified as spam/not spam.
d. None of the above.
</pre>  
**Answer**: option (a).  
Option (b) is E, and option (c) is P.  
  
# Why Use Machine Learning?  
Consider how you would write a spam filter using traditional programming techniques:  
1. First you would look at what spam typically looks like. You might notice that
some words or phrases (such as “4U,” “credit card,” “free,” and “amazing”) tend to
come up a lot in the subject. Perhaps you would also notice a few other patterns
in the sender’s name, the email’s body, and so on.   
  <br>
2. You would write a detection algorithm for each of the patterns that you noticed,
and your program would flag emails as spam if a number of these patterns are
detected.   
  <br>
3. You would test your program, and repeat steps 1 and 2 until it is good enough.  
![](img/Traditional.PNG)  
Since the problem is not trivial, your program will likely become a long list of complex rules—pretty hard to maintain.  
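
The traditional approach described in steps 1 to 3 might look roughly like the sketch below. The phrase list comes from step 1; the function names and the threshold of two matches are assumptions chosen for illustration only.

```python
# Hand-written rules: each phrase encodes a pattern a human noticed in spam.
SUSPICIOUS_PHRASES = ["4u", "credit card", "free", "amazing"]


def count_matches(subject):
    """Count how many suspicious phrases appear in the subject line."""
    text = subject.lower()
    return sum(phrase in text for phrase in SUSPICIOUS_PHRASES)


def is_spam(subject, threshold=2):
    """Flag the email when at least `threshold` patterns are detected."""
    return count_matches(subject) >= threshold


print(is_spam("Amazing FREE credit card 4U"))  # all four phrases match
print(is_spam("Meeting notes for Monday"))     # no phrase matches
```

Every new spam pattern means another hand-written rule or another tweak to the threshold, which is how such a program grows into a long list of rules.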
  <br>
In contrast, a spam filter based on Machine Learning techniques automatically learns
which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples. The program is much shorter, easier to maintain, and most likely more
accurate.  

![](img/ml.PNG)  
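
As a rough illustration of the learning approach, the sketch below scores each word by how much more often it appears in spam subjects than in ham subjects. It is a deliberately crude heuristic with made-up example data and hypothetical function names, not the method of any real spam filter.

```python
from collections import Counter


def train_word_scores(spam_subjects, ham_subjects):
    """Score each word by how much more often it shows up in spam than in ham."""
    spam_counts = Counter(w for s in spam_subjects for w in s.lower().split())
    ham_counts = Counter(w for s in ham_subjects for w in s.lower().split())
    vocabulary = set(spam_counts) | set(ham_counts)
    # Add-one smoothing keeps the ratio finite for words seen in one class only.
    return {w: (spam_counts[w] + 1) / (ham_counts[w] + 1) for w in vocabulary}


def is_spam(subject, scores, threshold=1.0):
    """Flag the email when its known words lean toward spam on average."""
    known = [scores[w] for w in subject.lower().split() if w in scores]
    if not known:
        return False  # nothing learned about these words, so do not flag
    return sum(known) / len(known) > threshold


spam = ["free credit card", "amazing free offer", "free prize 4u"]
ham = ["meeting notes attached", "lunch on friday", "project notes"]
scores = train_word_scores(spam, ham)
print(scores["free"], scores["notes"])
print(is_spam("free offer inside", scores), is_spam("notes from lunch", scores))
```

Nobody typed in the list of suspicious words here; it falls out of the examples, so adapting to new spam only requires new examples.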
  
# Types of Machine Learning Systems
There are so many different types of Machine Learning systems that it is useful to
classify them in broad categories based on:  
1. Methods based on the amount of human supervision in the learning process  
a. Supervised learning  
b. Unsupervised learning  
c. Semi-supervised learning  
d. Reinforcement learning  
2. Methods based on the ability to learn from incremental data samples  
a. Batch learning  
b. Online learning  
3. Methods based on their approach to generalization from data samples  
a. Instance-based learning  
b. Model-based learning  
  
<br>
These criteria are not exclusive; you can combine them in any way you like. For
example, a state-of-the-art spam filter may learn on the fly using a deep neural network model trained using examples of spam and ham; this makes it an online, model-based, supervised learning system.  
  
Let’s look at each of these criteria a bit more closely.  
# 1. Supervised / Unsupervised / Semisupervised / Reinforcement Learning
Machine Learning systems can be classified according to the amount and type of
supervision they get during training. There are four major categories: supervised
learning, unsupervised learning, semisupervised learning, and Reinforcement Learning.  

### Supervised learning
The majority of practical machine learning uses supervised learning.

Supervised learning is where you have input variables (X) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output.

$$Y = f(X)$$
 
The goal is to approximate the mapping function so well that when you have new input data (X), you can predict the output variable (Y) for that data.

It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process. We know the correct answers; the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.   
  
![](img/sl.png)   
  
**Continuous variables**: Variables that can take an infinite number of values (it would take forever to count them)  
**Discrete/Categorical variables**: Variables that can take only a finite number of values   
  
Here are some of the most important supervised learning algorithms:  
• k-Nearest Neighbors  
• Linear Regression  
• Logistic Regression  
• Support Vector Machines (SVMs)  
• Decision Trees and Random Forests  
• Neural networks  

  
### Regression (28.5% used in industry)
A regression problem is when the output variable is a real value, such as “price” or “weight”.  
  
![](img/lr.png)  
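
A minimal sketch of a regression task is fitting a straight line to a handful of points with ordinary least squares. The data below is made up so that it lies exactly on a line, and `fit_line` is a hypothetical helper written for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: a size-like input and a price-like output.
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(sizes, prices)
print(slope, intercept)         # the data lie exactly on y = 2x + 1
print(slope * 5.0 + intercept)  # predicted output for an unseen input
```

The learned pair `(slope, intercept)` plays the role of the mapping function f: it turns a new input into a real-valued prediction.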
  
### Classification (66.5% used in industry)
A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” or “no disease”. 
   
![](img/classification.png)  
  
  
### Example:  
  
![](img/basket.png)  
  
![](img/apple.png)  
  
<center>APPLE</center>  
  
![](img/bana.PNG)  
  
<center>BANANA</center>    
  
![](img/grapes.png)   
  
<center>GRAPES</center>   
  
![](img/CHERRIES.png)    
  
<center>CHERRIES</center>     
  
**Supervised Learning:**
* You have already learned the physical characteristics of the fruits from earlier work
* So arranging fruits of the same type in one place is now easy
* In data mining terminology, this earlier work is called training on the data
* You have learned from your training data because it includes a response variable
* A response variable is simply the decision (output) variable
* You can see the response variable in the table below (FRUIT NAME)   
   
   
| No. | SIZE | COLOR | SHAPE | FRUIT NAME |
|---|---|---|---|---|
| 1 | Big | Red | Rounded shape with depression at the top | Apple |
| 2 | Small | Red | Heart-shaped to nearly globular | Cherry |
| 3 | Big | Green | Long curving cylinder | Banana |
| 4 | Small | Green | Round to oval, bunch shape, cylindrical | Grape |
  
* Suppose you take a new fruit from the basket; you will look at its size, color, and shape.
* If the size is Big, the color is Red, and the shape is rounded with a depression at the top, you will identify the fruit as an apple and put it in the apple group.
* The same goes for the other fruits.
* The job of grouping the fruits is done, with a happy ending.
* Notice that one column in the table is labeled “FRUIT NAME”. This is called the response variable.
* If you learn from training data first and then apply that knowledge to test data (the new fruit), this type of learning is called Supervised Learning.  
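
The fruit example can be sketched in a few lines. This is only a toy under simplifying assumptions: it uses just the SIZE and COLOR columns, it memorizes exact feature combinations rather than generalizing, and the function names `train` and `predict` are made up for illustration.

```python
# Labeled training data from the table: features plus the response variable.
training_data = [
    ({"size": "Big", "color": "Red"}, "Apple"),
    ({"size": "Small", "color": "Red"}, "Cherry"),
    ({"size": "Big", "color": "Green"}, "Banana"),
    ({"size": "Small", "color": "Green"}, "Grape"),
]


def train(examples):
    """Memorize which fruit name goes with each (size, color) combination."""
    return {(f["size"], f["color"]): name for f, name in examples}


def predict(model, fruit):
    """Look up the name learned for this fruit's features, if any."""
    return model.get((fruit["size"], fruit["color"]), "unknown")


model = train(training_data)
print(predict(model, {"size": "Big", "color": "Red"}))
```

The key point is that `train` only works because every training example carries its FRUIT NAME label.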
  
### Unsupervised learning
Unsupervised learning is where you only have input data (X) and no corresponding output variables.  
  
The goal for unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.  
  
This is called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.   
  
  
Here are some of the most important unsupervised learning algorithms:  
- Clustering  
    - k-Means
    - Hierarchical Cluster Analysis (HCA)
    - Expectation Maximization
- Visualization and dimensionality reduction
    - Principal Component Analysis (PCA)
    - Kernel PCA
    - Locally-Linear Embedding (LLE)
    - t-distributed Stochastic Neighbor Embedding (t-SNE)
- Association rule learning
    - Apriori
    - Eclat


**Unsupervised Learning:**
* Suppose you have a basket filled with different types of fruits, and your task is to arrange them into groups.
* This time you don’t know anything about the fruits; this is the first time you have seen them, and you have no clue what they are.
* So, how will you arrange them?
* What will you do first?
* You will take each fruit and arrange it by considering one of its physical characteristics.
* Suppose you consider color first.
    * Then you will arrange the fruits using color as the base condition.
    * The groups will look something like this:
        * RED COLOR GROUP: apples & cherries.
        * GREEN COLOR GROUP: bananas & grapes.
* Now you take another physical characteristic, such as size.
    * RED COLOR AND BIG SIZE: apples.
    * RED COLOR AND SMALL SIZE: cherries.
    * GREEN COLOR AND BIG SIZE: bananas.
    * GREEN COLOR AND SMALL SIZE: grapes.
* The job is done, with a happy ending.
* Here you did not learn anything beforehand, meaning there is no training data and no response variable.
* In data mining or machine learning, this kind of learning is known as unsupervised learning.   
  
![](img/cluster.png)  
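
The grouping steps above can be sketched as follows. Note the assumption: this toy groups fruits by exact matches on categorical features, whereas clustering algorithms such as k-Means work with distances between numeric features. The data and the `group_by` helper are made up for illustration.

```python
from collections import defaultdict

# Fruits described only by their features; there is no FRUIT NAME column.
unlabeled_fruits = [
    {"size": "Big", "color": "Red"},
    {"size": "Small", "color": "Red"},
    {"size": "Big", "color": "Green"},
    {"size": "Small", "color": "Green"},
    {"size": "Big", "color": "Red"},
]


def group_by(fruits, features):
    """Put fruits that share the same values for `features` into one group."""
    groups = defaultdict(list)
    for fruit in fruits:
        key = tuple(fruit[name] for name in features)
        groups[key].append(fruit)
    return dict(groups)


by_color = group_by(unlabeled_fruits, ["color"])
by_color_and_size = group_by(unlabeled_fruits, ["color", "size"])
print(len(by_color), len(by_color_and_size))
```

The groups have no names: the program can tell that two fruits belong together, but not that the group is called "apple".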
  
### Semisupervised learning
Some algorithms can deal with partially labeled training data, usually a lot of unlabeled 
data and a little bit of labeled data. This is called semisupervised learning.  
  
**Example:**  
Some photo-hosting services, such as Google Photos, are good examples of this. Once
you upload all your family photos to the service, it automatically recognizes that the
same person A shows up in photos 1, 5, and 11, while another person B shows up in
photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all
the system needs is for you to tell it who these people are. Just one label per person,
and it is able to name everyone in every photo, which is useful for searching photos.  
  
Most semisupervised learning algorithms are combinations of unsupervised and
supervised algorithms. For example, deep belief networks (DBNs) are based on unsupervised
components called **restricted Boltzmann machines** (RBMs) stacked on top of
one another. RBMs are trained sequentially in an unsupervised manner, and then the
whole system is fine-tuned using supervised learning techniques.
  
  
### Reinforcement Learning
  
Let us start with a simple analogy. If you have a pet at home, you may have used this technique with your pet.   
  
![](img/dog.png)
  
  
A clicker (or whistle) is a technique to let your pet know some treat is just about to get served! This is essentially “reinforcing” your pet to practice good behavior. You click the “clicker” and follow up with a treat. And with time, your pet gets accustomed to this sound and responds every time he/she hears the click sound. With this technique, you can train your pet to do “good” deeds when required.   
   
Now let’s make these replacements in the example:  
   
* The pet becomes the artificial agent
* The treat becomes the reward function
* The good behavior is the resultant action  
  
The above example explains what reinforcement learning looks like. This is actually a classic example of reinforcement learning.    
  
  
Reinforcement Learning is a very different beast. The learning system, called an agent
in this context, can observe the environment, select and perform actions, and get
rewards in return (or penalties in the form of negative rewards). It
must then learn by itself what is the best strategy, called a policy, to get the most
reward over time. A policy defines what action the agent should choose when it is in a
given situation.  
  
![](img/rl.png)  
  
**For example:**  
Many robots implement Reinforcement Learning algorithms to learn
how to walk. DeepMind’s AlphaGo program is also a good example of Reinforcement
Learning: it made the headlines in March 2016 when it beat the world champion Lee
Sedol at the game of Go. It learned its winning policy by analyzing millions of games,
and then playing many games against itself. Note that learning was turned off during
the games against the champion; AlphaGo was just applying the policy it had learned.  
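
To make the agent, reward, and policy vocabulary concrete, here is a minimal sketch of tabular Q-learning, one common Reinforcement Learning algorithm (it is not how AlphaGo works). The corridor environment, the constants, and all names are assumptions made up for this example.

```python
import random

N_STATES = 5              # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = [-1, +1]        # step left or step right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (assumed values)


def step(state, action):
    """Environment: move within the corridor, reward 1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, seed=0):
    """Tabular Q-learning with a purely random exploration policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            a = rng.randrange(len(ACTIONS))
            next_state, reward = step(state, ACTIONS[a])
            target = reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# The policy picks, in each non-goal state, the action with the highest value.
policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(policy)
```

The agent is never told that moving right is correct; it only sees rewards, and the learned policy ends up choosing `+1` in every non-goal state.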
  
# 2. Batch and Online Learning
Another criterion used to classify Machine Learning systems is whether or not the
system can learn incrementally from a stream of incoming data.  
### Batch learning
In batch learning, the system is incapable of learning incrementally: it must be trained
using all the available data. This will generally take a lot of time and computing
resources, so it is typically done offline. First the system is trained, and then it is
launched into production and runs without learning anymore; it just applies what it
has learned. This is called **offline learning**.   
  
If you want a batch learning system to know about new data (such as a new type of
spam), you need to train a new version of the system from scratch on the full dataset
(not just the new data, but also the old data), then stop the old system and replace it
with the new one.  
  
Fortunately, the whole process of training, evaluating, and launching a Machine
Learning system can be automated fairly easily, so even a batch learning system can adapt to change. Simply update the data and train a new
version of the system from scratch as often as needed.  
  
This solution is simple and often works fine, but training using the full set of data can
take many hours, so you would typically train a new system only every 24 hours or
even just weekly. If your system needs to adapt to rapidly changing data (e.g., to predict
stock prices), then you need a more reactive solution.  
  
Also, training on the full set of data requires a lot of computing resources (CPU,
memory space, disk space, disk I/O, network I/O, etc.). If you have a lot of data and
you automate your system to train from scratch every day, it will end up costing you a
lot of money. If the amount of data is huge, it may even be impossible to use a batch
learning algorithm.  
  
Finally, if your system needs to be able to learn autonomously and it has limited
resources (e.g., a smartphone application or a rover on Mars), then carrying around
large amounts of training data and taking up a lot of resources to train for hours
every day is a showstopper.   
  
Fortunately, a better option in all these cases is to use algorithms that are capable of
learning incrementally.  
### Online learning
Online learning methods work differently from batch learning methods. The training
data is usually fed to the algorithm in multiple incremental batches. These data batches are also known as
mini-batches in ML terminology. However, unlike batch learning methods, the training process does not end there.
The system keeps learning over time from the new data samples that are sent to it for
prediction. Basically, it predicts and learns on the fly from new data, without having to re-run the
whole model on previous data samples.  
  
There are several advantages to online learning. It is suitable in real-world scenarios where the model
might need to keep learning and re-training on new data samples as they arrive. Device failure
or anomaly prediction and stock market forecasting are two relevant scenarios. In addition, since the data
is fed to the model in incremental mini-batches, you can build these models on commodity hardware
without worrying about memory or disk constraints: unlike batch learning methods, you do not need
to load the full dataset in memory before training the model. Finally, once the model has trained on a batch of data,
you can discard that batch, since the model learns incrementally and
remembers what it has learned in the past.  
  
One of the major caveats in online learning methods is the fact that bad data samples can affect the
model performance adversely. All ML methods work on the principle of “Garbage In Garbage Out”. Hence
if you supply bad data samples to a well-trained model, it can start learning relationships and patterns that
have no real significance and this ends up affecting the overall model performance. Since online learning
methods keep learning based on new data samples, you should ensure proper checks are in place to
notify you if the model performance suddenly drops. Also, hyperparameters such as the learning
rate should be selected with care to ensure the model doesn’t overfit or get biased based on specific data
samples.  
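
A minimal sketch of online learning is a linear model updated one mini-batch at a time with gradient descent. Everything here is an assumption for illustration: the simulated stream follows y = 2x + 1 exactly, and `partial_fit` is a hypothetical helper defined in this snippet, not a library call.

```python
def data_stream(n_samples):
    """Simulated stream: yields (x, y) pairs that follow y = 2x + 1."""
    for i in range(n_samples):
        x = (i % 10) / 10
        yield x, 2 * x + 1


def mini_batches(stream, batch_size):
    """Group a stream into lists of `batch_size` samples."""
    batch = []
    for sample in stream:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def partial_fit(w, b, batch, learning_rate=0.5):
    """One gradient step on the mean squared error of a single mini-batch."""
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b


w, b = 0.0, 0.0
for batch in mini_batches(data_stream(2000), batch_size=10):
    w, b = partial_fit(w, b, batch)  # the batch can be discarded afterwards
print(round(w, 3), round(b, 3))      # approaches the true slope 2 and intercept 1
```

Only one mini-batch is held in memory at any time, and the learning rate controls how strongly each new batch can pull the model, which is why a bad batch combined with a high learning rate can damage it.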
  
# 3. Instance-Based Versus Model-Based Learning
One more way to categorize Machine Learning systems is by how they generalize.
Most Machine Learning tasks are about making predictions. This means that given a
number of training examples, the system needs to be able to generalize to examples it
has never seen before. Having a good performance measure on the training data is
good, but insufficient; the true goal is to perform well on new instances.  
  
There are two main approaches to generalization: instance-based learning and
model-based learning.  
  
### Instance-based learning
There are various ways to build Machine Learning models using methods that try to generalize based
on input data. Instance-based learning involves ML systems and methods that use the raw data points
themselves to figure out outcomes for newer, previously unseen data samples instead of building an explicit
model on training data and then testing it out.  
  
A simple example would be the k-nearest neighbors algorithm. Assuming k = 3, we have our initial training
data. The ML method knows the representation of the data from the features, including its dimensions,
the position of each data point, and so on. For any new data point, it will use a similarity measure (like cosine or
Euclidean distance) and find the three nearest input data points to this new data point. Once that is decided,
we simply take a majority vote over the outcomes of those three training points and assign the winner as the
outcome label/response for this new data point. Thus, instance-based learning works by looking at the input
data points and using a similarity metric to generalize and predict for new data points.  
  
![](img/ib.png)  
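
The k = 3 procedure described above can be sketched as follows, using Euclidean distance and a majority vote. The training points and the function name `knn_predict` are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(training_points, new_point, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    by_distance = sorted(
        training_points,
        key=lambda item: math.dist(item[0], new_point),  # Euclidean distance
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up 2-D training points with their labels.
training_points = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]
print(knn_predict(training_points, (2.0, 2.0)))
print(knn_predict(training_points, (6.0, 7.0)))
```

There is no training step at all: the "model" is the stored data, and all the work happens when a prediction is requested.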
  
  
### Model-based learning
Model-based learning methods are a more traditional ML approach toward generalizing based on
training data. Typically an iterative process takes place where the input data is used to extract features and
models are built under various settings of the learning algorithm (known as hyperparameters). These hyperparameters
are tuned using model validation techniques to select the model that generalizes best from the
training data to held-out validation data, with test data (also split from the initial dataset) kept for a final check. Finally, the best
model is used to make predictions or decisions as and when needed.  
  
![](img/mb.png)  
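
The iterative process described above can be sketched with a one-parameter model, y = w * x, whose regularization strength is chosen on a validation set. The data, the candidate values, and the function names are all made up for illustration; a real project would also keep a separate test set.

```python
def fit_slope(points, regularization):
    """Fit y = w * x by least squares with an L2 penalty on w."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / (sxx + regularization)


def mean_squared_error(w, points):
    """Average squared prediction error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in points) / len(points)


# Made-up data, split into a training part and a validation part.
train_set = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
validation_set = [(4.0, 7.7), (5.0, 9.6)]

candidates = [0.0, 1.0, 10.0]        # hyperparameter values to try
validation_errors = {}
for lam in candidates:
    w = fit_slope(train_set, lam)    # model parameter learned from the data
    validation_errors[lam] = mean_squared_error(w, validation_set)

best_lam = min(validation_errors, key=validation_errors.get)
best_w = fit_slope(train_set, best_lam)
print(best_lam, round(best_w, 3))
```

The slope `w` is a model parameter found by the learning algorithm, while `lam` is a hyperparameter picked from outside by comparing validation errors.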

# Questions:
1. How would you define Machine Learning?  
> Machine Learning is about building systems that can learn from data. Learning
means getting better at some task, given some performance measure.
2. Can you name four types of problems where it shines?  
> (By User)
3. What is a labeled training set?  
> A labeled training set is a training set that contains the desired solution (a.k.a. a
label) for each instance.
4. What are the two most common supervised tasks?
> Regression and classification.
5. Can you name four common unsupervised tasks?
> Common unsupervised tasks include clustering, visualization, dimensionality
reduction, and association rule learning.
6. What type of Machine Learning algorithm would you use to allow a robot to
walk in various unknown terrains?  
> Reinforcement Learning 
7. What type of algorithm would you use to segment your customers into multiple
groups?  
> If you don’t know how to define the groups, then you can use a clustering algorithm
(unsupervised learning) to segment your customers into clusters of similar
customers. However, if you know what groups you would like to have, then you can feed many examples of each group to a classification algorithm (supervised
learning), and it will classify all your customers into these groups.  
8. Would you frame the problem of spam detection as a supervised learning problem
or an unsupervised learning problem?  
> A supervised learning problem.
9. What is an online learning system?  
> An online learning system can learn incrementally, as opposed to a batch learning
system. This makes it capable of adapting rapidly to both changing data and
autonomous systems, and of training on very large quantities of data.
10. What is out-of-core learning?
> Out-of-core algorithms can handle vast quantities of data that cannot fit in a
computer’s main memory. An out-of-core learning algorithm chops the data into
mini-batches and uses online learning techniques to learn from these mini-batches.
11. What type of learning algorithm relies on a similarity measure to make predictions?  
> An instance-based learning system learns the training data by heart; then, when
given a new instance, it uses a similarity measure to find the most similar learned
instances and uses them to make predictions.
12. What is the difference between a model parameter and a learning algorithm’s
hyperparameter?  
> A model has one or more model parameters that determine what it will predict
given a new instance (e.g., the slope of a linear model). A learning algorithm tries
to find optimal values for these parameters such that the model generalizes well
to new instances. A hyperparameter is a parameter of the learning algorithm
itself, not of the model (e.g., the amount of regularization to apply).

13. What do model-based learning algorithms search for? What is the most common
strategy they use to succeed? How do they make predictions?   
> Model-based learning algorithms search for an optimal value for the model
parameters such that the model will generalize well to new instances. We usually
train such systems by minimizing a cost function that measures how bad the system
is at making predictions on the training data, plus a penalty for model complexity
if the model is regularized. To make predictions, we feed the new
instance’s features into the model’s prediction function, using the parameter values found by the learning algorithm.
14. Can you name four of the main challenges in Machine Learning?
> Some of the main challenges in Machine Learning are the lack of data, poor data
quality, nonrepresentative data, uninformative features, excessively simple models
that underfit the training data, and excessively complex models that overfit
the data.
15. If your model performs great on the training data but generalizes poorly to new
instances, what is happening? Can you name three possible solutions?   
> If a model performs great on the training data but generalizes poorly to new
instances, the model is likely overfitting the training data (or we got extremely
lucky on the training data). Possible solutions to overfitting are getting more
data, simplifying the model (selecting a simpler algorithm, reducing the number
of parameters or features used, or regularizing the model), or reducing the noise
in the training data.
16. What is a test set and why would you want to use it?
> A test set is used to estimate the generalization error that a model will make on
new instances, before the model is launched in production.

17. What is the purpose of a validation set?
> A validation set is used to compare models. It makes it possible to select the best
model and tune the hyperparameters.
18. What can go wrong if you tune hyperparameters using the test set?
> If you tune hyperparameters using the test set, you risk overfitting the test set,
and the generalization error you measure will be optimistic (you may launch a
model that performs worse than you expect).
19. What is cross-validation and why would you prefer it to a validation set?
> Cross-validation is a technique that makes it possible to compare models (for
model selection and hyperparameter tuning) without the need for a separate validation
set. This saves precious training data.
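
As an illustration of the cross-validation answer, the sketch below splits sample indices into k folds so that each sample is used for validation exactly once and for training in the other rounds. The helper name `k_fold_indices` is made up for this example, and it does not shuffle the data, which a practical version usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for training, validation in folds:
    print(validation, len(training))
```

A model would be trained on `training` and scored on `validation` in each round, and the k scores averaged to compare models or hyperparameter values.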