## Recognizing Machine Learning Applications

#### Course includes

- Classifying data into pre-defined categories

- Predicting relationships between variables with regression

- Recommending products to a user

- Clustering large datasets into meaningful groups




----------

## Making Inferences from Data

### Rule Based Approach

A rule-based system has two main components:

- A set of facts, also known as the knowledge base. Each fact combines a piece of data, such as income, with a condition, such as "is zero" or "is greater than $10k."


- A set of rules, also known as the rules engine. The rules describe the relationship between the IF and the THEN statements, and they are written manually.

Individual rules are easy to write, but they need to account for special cases, so rule systems can become unwieldy. They are also static.
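
A minimal sketch of the two components, assuming a hypothetical loan-screening scenario. The fact names, thresholds, and outcomes are illustrative and not from the course.

```python
# Hypothetical facts (the knowledge base) about one applicant.
facts = {"income": 8_000, "existing_debt": 0}

# Hand-written rules: each pairs an IF condition with a THEN outcome.
# Thresholds and outcomes are illustrative assumptions.
rules = [
    (lambda f: f["income"] == 0, "reject"),
    (lambda f: f["income"] > 10_000 and f["existing_debt"] == 0, "approve"),
]


def apply_rules(facts, rules, default="manual review"):
    """Return the outcome of the first rule whose condition matches."""
    for condition, outcome in rules:
        if condition(facts):
            return outcome
    return default


decision = apply_rules(facts, rules)
print(decision)  # manual review
```

The applicant here matches neither rule and falls through to the default. Covering that special case means writing another rule by hand.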


### Machine Learning Based Approach

The machine learning approach is based on models. It is probabilistic and uses statistical models rather than deterministic rules.

Machine learning models need to be trained on data. The learned rules are updated based on historical data, so they are dynamic.

This is similar to how humans learn. Humans learn to identify patterns when they are exposed to a phenomenon for a prolonged period of time; they learn from 'experience.'
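
To contrast with a hand-written rule, here is a minimal sketch that learns an income cutoff from historical data. The data is made up, and the learner is a simple one-feature threshold search (a decision stump), used here only to show a rule being derived from data rather than typed in.

```python
# Hypothetical historical data: (income, repaid_loan) pairs.
history = [
    (2_000, False), (5_000, False), (9_000, False),
    (12_000, True), (15_000, True), (30_000, True),
]


def learn_threshold(examples):
    """Pick the income cutoff that misclassifies the fewest examples."""
    best_cutoff, best_errors = None, len(examples) + 1
    for cutoff in sorted(income for income, _ in examples):
        # The candidate rule predicts "repaid" when income >= cutoff.
        errors = sum((income >= cutoff) != repaid for income, repaid in examples)
        if errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff


cutoff = learn_threshold(history)
print(cutoff)  # 12000

# Retraining on updated history can move the cutoff: the rule is dynamic.
updated = history + [(7_000, True), (8_000, True)]
new_cutoff = learn_threshold(updated)
```

Nobody wrote the cutoff by hand. It comes from the data, and it changes when the data changes.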

--------

### When to Use Machine Learning

- The rules are difficult for humans to express

- A large amount of historical data is available

- The patterns or relationships in the data are dynamic and keep changing with time




-----

### Knowing When to Use Machine Learning



![workflow](Images/01_01.jpg)

#### Pick Your Problem

Most machine learning problems fall under one of the following categories:

- Classification (Naive Bayes, SVMs)

- Regression
    - Sales forecasting
    - Predicting the value of a stock market index

- Clustering (K-Means, Hierarchical Clustering)
    - Customer recommendation

- Recommendations
    - Product recommendations


Each type of problem has its own basic workflow:

- How to set up the problem statement
- How to represent data



#### Represent Data

You need historical data that you can use as initial input. It can be in the form of:

- Unstructured text
- Images
- Videos

It is important to use meaningful numeric attributes to represent the data.
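
As one illustration of turning unstructured text into numeric attributes, here is a minimal word-count sketch. The vocabulary is a hypothetical choice, and real pipelines typically use richer features.

```python
from collections import Counter

# Hypothetical vocabulary chosen for illustration.
vocabulary = ["free", "offer", "meeting", "report"]


def to_features(text, vocabulary):
    """Count how often each vocabulary word appears in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


features = to_features("Free offer inside free gift", vocabulary)
print(features)  # [2, 1, 0, 0]
```

Each document becomes a fixed-length list of numbers, which is the form most algorithms expect.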



#### Apply an Algorithm

The choice of algorithm depends mainly on the type of problem. Use an algorithm to find patterns in the historical data, and use those patterns to create rules. The rules are a quantitative representation of the relationships between variables.

The rules together form a model. A model is represented as a mathematical equation or as a set of rules, which might be if-then-else statements.

----------

### Types of ML Problems

![ML Problems](Images/01_02.jpg)

### Classification

You are given a problem instance that you need to classify, and a set of categories into which it can be classified.

A classifier uses a set of instances for which the correct category membership is known.


- Spam detection
    - Spam or ham?

- Sentiment Analysis
    - Is a tweet positive or negative?

- Identifying a Trading Strategy
    - Is a trading day an up-day or a down-day?
   
   
#### Requires

- A classifier (uses a set of instances for which the correct category membership is known).

- Training data for which the category membership is already known.
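
A minimal sketch of these two requirements, assuming made-up spam features. The classifier is a 1-nearest-neighbor rule, picked for brevity; it is not one of the algorithms named in these notes.

```python
# Hypothetical training data: feature vectors with known categories.
# Features: [number of links, number of exclamation marks]
training = [
    ([5, 4], "spam"),
    ([7, 6], "spam"),
    ([0, 1], "ham"),
    ([1, 0], "ham"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(instance, training):
    """Return the label of the closest training instance (1-nearest neighbor)."""
    _, label = min(training, key=lambda pair: squared_distance(pair[0], instance))
    return label


print(classify([6, 5], training))  # spam
print(classify([0, 0], training))  # ham
```

The new instance receives the category of whichever labeled example it most resembles.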



### Regression

Regression computes a continuous value, in contrast to classification, where the output is categorical. You know that the value depends on certain inputs.


- What will be the price of a stock on a given date

- How long will it take to commute from point A to point B

- What will be the sales of a product in a given week



#### Requires

- Training data

- An algorithm that uses the training data to identify the function that relates the input and the output.
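
A minimal sketch of identifying that function, assuming the relationship is a straight line and using ordinary least squares. The weekly sales figures are made up.

```python
# Hypothetical training data: week number and units sold.
weeks = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(weeks, sales)
forecast = slope * 6 + intercept  # predicted sales for week 6
print(slope, intercept, forecast)  # 2.0 10.0 22.0
```

The fitted equation is the model: it maps any input week to a continuous predicted value.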

### Clustering

Clustering uses an algorithm to identify what groups might exist in the data. Once the groups exist, you can use a classification algorithm to assign a new user to a group.


- Dividing users of a social network into groups with common attributes
    - The groups to be divided into are unknown beforehand
    - The groups represent meaningful divisions, such as shared likes and dislikes or demographics
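
A minimal K-Means sketch on one-dimensional data, assuming made-up user ages and hand-picked starting centroids. Practical implementations handle multiple dimensions and choose starting points more carefully.

```python
def k_means_1d(values, centroids, iterations=10):
    """Run a fixed number of K-Means iterations on 1-D data."""
    groups = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        groups = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(value - centroids[i]),
            )
            groups[nearest].append(value)
        # Update step: move each centroid to the mean of its group.
        centroids = [
            sum(group) / len(group) if group else centroid
            for group, centroid in zip(groups, centroids)
        ]
    return centroids, groups


# Hypothetical user ages; no group labels are supplied.
ages = [18, 19, 21, 22, 45, 47, 50, 52]
centroids, groups = k_means_1d(ages, centroids=[18, 52])
print(groups)     # [[18, 19, 21, 22], [45, 47, 50, 52]]
print(centroids)  # [20.0, 48.5]
```

The algorithm is told only how many groups to look for. Which users belong together is discovered from the data.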





### Recommendation

Use the user's past behavior to determine what else they might like or need.


Collaborative filtering is a method of making automatic predictions about the interests of a user by collecting preference or taste information from many users.



- What kind of artists a user would like

- What are the top 10 book picks for a user

- If a user buys an item, what else would they buy
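
A minimal sketch of user-based collaborative filtering, assuming a tiny made-up ratings table. Similarity is cosine similarity over the items two users have both rated, which is a simplification chosen for brevity.

```python
# Hypothetical ratings (1-5) collected from several users.
ratings = {
    "ana":   {"book_a": 5, "book_b": 4, "book_c": 1},
    "ben":   {"book_a": 5, "book_b": 5, "book_d": 4},
    "chloe": {"book_a": 1, "book_c": 5, "book_e": 4},
}


def similarity(u, v):
    """Cosine similarity over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sum(u[i] ** 2 for i in shared) ** 0.5
    norm_v = sum(v[i] ** 2 for i in shared) ** 0.5
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=10):
    """Rank unseen items by similarity-weighted ratings from other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        weight = similarity(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + weight * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


picks = recommend("ana", ratings)
print(picks)  # ['book_d', 'book_e']
```

Ana's tastes are closer to Ben's than to Chloe's, so the item Ben rated highly ranks first.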


