# INTRO TO MACHINE LEARNING

> "A brief introduction to machine learning"

- toc: false
- branch: master
- badges: true
- comments: true
- categories: [MachineLearning, ML, DataScience]
- image: images/IntroToML/image12.png
- hide: false
- search_exclude: true

## Data Science from Different Perspectives

![image.png](IntroToML/image0.png)

## What is Data Science

Data science is an interdisciplinary field that uses __scientific__ _methods_, _processes_ and _systems_ to extract __knowledge or insights__ from data in various forms, either __structured or unstructured__.
![image.png](images/image1.png)

## Data Science Process

The three components involved in data science are __organising__, __packaging__ and __delivering__ data. Together they make up the three-step OPD data science process:

### Step 1. Organise Data.
Organising data involves the __physical storage and format of data__ and incorporates best practices in data management.

### Step 2. Package Data. 
Packaging data involves __logically manipulating__ and __joining the underlying raw data__ into a _new representation_ and package.

### Step 3. Deliver Data.
Delivering data involves ensuring that the __message__ the __data__ carries reaches those who need to hear it.

## Intro to Machine Learning


![image.png](IntroToML/image00.png)

#### Machine learning is the idea that there are generic algorithms that can tell you something interesting about a set of data without you having to write any custom code specific to the problem. 

Instead of writing code, you feed data to the generic algorithm and it builds its own logic based on the data.
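
As a rough illustration of a generic algorithm, here is a minimal sketch in which the same training function, unchanged, learns a rule for two unrelated problems. The function name `train_stump` and both toy datasets are made up for this example, and a one-threshold rule (a decision stump) only stands in for the much richer algorithms used in practice.

```python
def train_stump(values, labels):
    """Learn a single threshold that best separates two labels."""
    low_label, high_label = sorted(set(labels))  # assumes exactly two labels
    ordered = sorted(set(values))
    # Candidate thresholds sit halfway between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

    best = None  # (number correct, threshold, label below, label above)
    for threshold in candidates:
        for below, above in ((low_label, high_label), (high_label, low_label)):
            correct = sum(
                (above if v > threshold else below) == y
                for v, y in zip(values, labels)
            )
            if best is None or correct > best[0]:
                best = (correct, threshold, below, above)

    _, threshold, below, above = best
    return lambda v: above if v > threshold else below


# Problem 1 (hypothetical data): exclamation marks in an email -> spam or ham.
spam_rule = train_stump([0, 1, 0, 5, 7, 6],
                        ["ham", "ham", "ham", "spam", "spam", "spam"])

# Problem 2 (hypothetical data): weight in grams -> apple or melon.
fruit_rule = train_stump([120, 130, 110, 300, 320, 280],
                         ["apple", "apple", "apple", "melon", "melon", "melon"])

print(spam_rule(4), fruit_rule(150))
```

No line of `train_stump` mentions email or fruit; the logic that differs between the two rules comes entirely from the data.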

![image.png](IntroToML/image2.png)

# Types of Machine Learning Systems

There are so many different types of Machine Learning systems that it is useful to classify them in broad categories based on:
- Whether or not they are trained with human supervision (__supervised__, __unsupervised__, __semisupervised__, and __Reinforcement Learning__)
- Whether or not they can learn incrementally on the fly (__online versus batch learning__)
- Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like scientists do (__instance-based versus model-based learning__)


## Supervised Learning

![image.png](IntroToML/image3.png)

Supervised learning is where you have input variables (x) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output. The __goal__ is to _approximate the mapping function so well that, when you have new input data (x), you can predict the output variable (Y) for that data_.
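
To make the mapping-function idea concrete, here is a minimal sketch that fits a straight line to a handful of labelled examples with ordinary least squares, then predicts the output for an input it has not seen. The data values and the names `fit_line` and `predict` are illustrative assumptions, and a straight line is only one of many possible mapping functions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training set: inputs x with their known outputs Y.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)


def predict(x):
    """The learned approximation of the mapping function."""
    return slope * x + intercept


print(predict(6))  # prediction for a new input that was not in the training set
```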

![image.png](IntroToML/image4.png)

## Supervised learning problems can be further grouped into regression and classification problems.

#### Classification:
A __classification problem__ is when the output variable is a __category__, such as “red” or “blue”, or “disease” or “no disease”.
#### Regression:
A __regression problem__ is when the output variable is a __real value__, such as a price in rupees or a weight.

#### Some popular examples of supervised machine learning algorithms are:

- __Linear regression__ for _regression_ problems,
- __Random forest__ for _classification_ and _regression_ problems,
- __Support vector machines (SVM)__ for _classification_ problems.

In Machine Learning an **attribute** is a data type (e.g., “Mileage”), while a **feature** has several meanings depending on the context, but generally means an attribute plus its value (e.g., “Mileage = 15,000”). Many people use the words attribute and feature interchangeably, though.

![image.png](IntroToML/image5.png)

## Unsupervised Machine Learning

![image.png](IntroToML/image6.png)

Unsupervised learning is where you only have __input data (X)__ and __no corresponding output variables__.

The __goal__ for unsupervised learning is to _model the underlying structure or distribution in the data in order to learn more about the data_.

![image.png](IntroToML/image7.png)

### Unsupervised learning problems can be further grouped into clustering and association problems.

#### Clustering:
A __clustering problem__ is where you want to discover the __inherent groupings__ in the data, such as _grouping customers by purchasing behavior_.
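
One common way to look for such groupings is k-means clustering. Below is a minimal sketch, assuming two made-up measurements per customer (visits per month, items per visit) and a naive deterministic start. The function `k_means` is written for this example and is not a library call.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, cluster index per point)."""
    # Naive start (an assumption of this sketch): the first k points are the centroids.
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    assignments = [
        min(range(k), key=lambda i: dist(p, centroids[i])) for p in points
    ]
    return centroids, assignments


# Hypothetical customers: (visits per month, items per visit). No labels are given.
customers = [(1, 2), (9, 8), (2, 1), (8, 9), (1, 1), (9, 9)]

centroids, assignments = k_means(customers, k=2)
print(assignments)
```

The algorithm is never told which customers belong together; the two groups emerge from the distances alone.
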
#### Association:
An __association rule__ learning problem is where you want to discover __rules that describe large portions__ of your data, such as _people who buy X also tend to buy Y_.
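
Rules of this kind are usually scored with two numbers: support (how often the items appear together) and confidence (how often Y appears given X). A minimal sketch, assuming a handful of invented shopping baskets; the helper names are hypothetical.

```python
# Hypothetical shopping baskets, one set of items per customer visit.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets)


def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Among baskets containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)


rule_support = support({"bread", "butter"}, baskets)
rule_confidence = confidence({"bread"}, {"butter"}, baskets)
print(rule_support, rule_confidence)
```

Here the rule "people who buy bread also tend to buy butter" covers three of the five baskets and holds for three of the four baskets that contain bread.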

## Supervised vs. Unsupervised

* __Supervised learning__
    - Trying to predict a specific quantity
    - Have training examples with labels
    - Can measure accuracy directly

* __Unsupervised learning__
    - Trying to understand the data
    - Looking for structures or unusual patterns
    - Not looking for something specific (unlike supervised learning)
    - Does not require labelled data
    - Evaluation is usually indirect or qualitative

* __Semisupervised learning__
    - Using unsupervised methods to improve supervised algorithms
    - Usually a few labelled examples plus a lot of unlabelled examples

## Semisupervised learning

Some algorithms can deal with __partially labeled training data__, usually a _lot of unlabeled data and a little bit of labeled data_. 

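Some __photo-hosting services__, such as _Google Photos_, are good examples of this. Once you upload your photos, the service groups the pictures that show the same person (the unsupervised part), and a single label per person from you is then enough to name that person in every photo.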
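
A minimal sketch of the idea, assuming each photo has already been reduced to a made-up 2-D point and only two photos carry a name. Labels spread step by step from labelled points to their closest unlabelled neighbours; this simple propagation scheme and the name `propagate_labels` are assumptions of this example, not a description of how any real service works.

```python
from math import dist


def propagate_labels(labeled, unlabeled):
    """Spread labels outward: repeatedly label the unlabeled point closest to a labeled one."""
    labeled = dict(labeled)      # work on copies so the inputs stay untouched
    remaining = list(unlabeled)
    while remaining:
        point, source = min(
            ((u, l) for u in remaining for l in labeled),
            key=lambda pair: dist(pair[0], pair[1]),
        )
        labeled[point] = labeled[source]
        remaining.remove(point)
    return labeled


# Hypothetical data: two named photos and many unnamed ones.
named = {(0.0, 0.0): "Alice", (10.0, 10.0): "Bob"}
unnamed = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0),
           (5.0, 5.0), (6.0, 6.0), (9.5, 9.5)]

result = propagate_labels(named, unnamed)
print(result[(6.0, 6.0)])
```

The point `(6.0, 6.0)` lies closer to Bob's single named photo than to Alice's, yet it ends up labelled Alice because a chain of unnamed photos connects it to her. That is the sense in which unlabelled data can improve on the few labels available.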

![image.png](images/IntroToML.png)

# Why Use Machine Learning?

![image.png](IntroToML/image9.png)

_______________________________________________________________________________________

![image.png](IntroToML/image10.png)

____

![image.png](IntroToML/image11.png)

### To summarize, Machine Learning is great for:
- Problems for which __existing solutions require a lot of hand-tuning__ or long lists of rules: one Machine Learning algorithm can often simplify code and perform better.
- Complex problems for which __there is no good solution at all using a traditional approach__: the best Machine Learning techniques may be able to find a solution.
- __Fluctuating environments__: a Machine Learning system can adapt to new data.
- Getting __insights__ about _complex problems_ and _large amounts of data_.

![image.png](IntroToML/image12.png)

# Reinforcement Learning

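Reinforcement Learning is a very different __beast__. The learning system, called an __agent__ in this context, can observe the __environment__, select and __perform actions__, and __get rewards__ in return (or penalties, in the form of negative rewards). It must then learn by itself the best strategy, called a __policy__, to get the most reward over time.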
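
The observe, act, get-reward loop can be sketched with tabular Q-learning, one classic Reinforcement Learning algorithm. Everything about the environment here is an assumption of this example: a corridor of five cells, a reward of 1 for reaching the last cell, and an agent that explores with purely random moves (which Q-learning tolerates because it is an off-policy method).

```python
import random

N_STATES = 5            # corridor cells 0..4; cell 4 is the goal
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # move one cell left or one cell right


def step(state, action):
    """The environment: returns the next state and the reward for the move."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_learning(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice(ACTIONS)               # explore at random
            next_state, reward = step(state, action)   # act, then observe
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
    return q


q = q_learning()
# The learned strategy: in each non-goal cell, pick the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)
```

Nobody tells the agent that moving right is the correct answer. It works that out from the rewards alone, which is what separates this setting from supervised learning.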

![image.png](IntroToML/image13.png)

# Batch and Online Learning

Another criterion used to classify Machine Learning systems is whether or not the system can __learn incrementally__ from a __stream of incoming data__.


## Batch learning

In batch learning, the system is __incapable__ of learning incrementally: it must be _trained using all the available data_. 

## Online learning

In online learning, you __train the system incrementally__ by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is __fast and cheap__, so the system can learn about __new data on the fly__, as it arrives.
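
A minimal sketch of incremental training, assuming a simulated stream whose hidden relationship is y = 2x + 1 and a linear model updated by gradient descent one mini-batch at a time. The stream, the learning rate and the function names are all invented for this example.

```python
def data_stream(n):
    """Yield (x, y) pairs one at a time, as if they were arriving live."""
    for i in range(n):
        x = (i % 10) / 10
        yield x, 2.0 * x + 1.0   # hypothetical ground truth: y = 2x + 1


def mini_batches(stream, size):
    """Group a stream into small lists of at most `size` instances."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def online_fit(stream, learning_rate=0.5, batch_size=5):
    """Update a linear model y = w*x + b after every mini-batch."""
    w, b = 0.0, 0.0
    for batch in mini_batches(stream, batch_size):
        # Gradient of half the mean squared error on this mini-batch only.
        grad_w = sum((w * x + b - y) * x for x, y in batch) / len(batch)
        grad_b = sum((w * x + b - y) for x, y in batch) / len(batch)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        # The batch can now be discarded; only w and b are kept.
    return w, b


w, b = online_fit(data_stream(4000))
print(round(w, 3), round(b, 3))
```

At no point does the model hold more than five instances in memory, which is why this style of training suits data that arrives continuously or does not fit on one machine.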

![image.png](IntroToML/image14.png)

# Instance-Based Versus Model-Based Learning

One more way to categorize Machine Learning systems is by __how they generalize__.

## Instance-based learning

The system learns the examples by __heart__, then generalizes to new cases using a _similarity measure_.
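
A k-nearest-neighbours classifier is a typical instance-based learner. In the minimal sketch below, "training" is nothing more than keeping the examples, and Euclidean distance plays the role of the similarity measure (a smaller distance means more similar). The data points and labels are invented for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(examples, query, k=3):
    """Label a query by majority vote among its k most similar stored examples."""
    nearest = sorted(examples, key=lambda example: dist(example[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical stored examples: (features, label). These ARE the "model".
examples = [
    ((1.0, 1.0), "cat"), ((1.5, 2.0), "cat"), ((2.0, 1.0), "cat"),
    ((6.0, 6.0), "dog"), ((7.0, 6.5), "dog"), ((6.5, 7.0), "dog"),
]

print(knn_predict(examples, (2.0, 2.0)), knn_predict(examples, (6.0, 5.0)))
```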

![image.png](IntroToML/image15.png)

## Model-based learning

Another way to generalize from a __set of examples__ is to _build a model of these examples_, then use that model to make __predictions__. This is called model-based learning.
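
As a minimal sketch of building a model, the code below reduces the training examples to one average point per class (a nearest-centroid classifier), after which the examples themselves are no longer needed for prediction. The choice of model, the data and the function names are assumptions made for this illustration.

```python
from math import dist


def fit_centroids(examples):
    """Build the model: one mean point (centroid) per label."""
    grouped = {}
    for point, label in examples:
        grouped.setdefault(label, []).append(point)
    return {
        label: tuple(sum(coord) / len(points) for coord in zip(*points))
        for label, points in grouped.items()
    }


def predict(model, query):
    """Use only the model parameters: pick the label whose centroid is closest."""
    return min(model, key=lambda label: dist(model[label], query))


# Hypothetical training examples: (features, label).
training = [
    ((1.0, 1.0), "small"), ((1.5, 2.0), "small"), ((2.0, 1.0), "small"),
    ((6.0, 6.0), "large"), ((7.0, 6.5), "large"), ((6.5, 7.0), "large"),
]

model = fit_centroids(training)
del training  # predictions below rely on the two centroids alone

print(predict(model, (3.0, 3.0)), predict(model, (5.0, 5.0)))
```

Six examples have been summarised by two parameters. A model-based learner trades the memory of individual cases for a compact description that it assumes holds in general.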

![image.png](IntroToML/image16.png)

# Technologies

![image.png](IntroToML/image17.png)