# Machine Learning
___
## Notes
In this notebook, we will discuss the features needed for our machine learning model. 

Specifically, we'll be talking about the following:
> [Overview](#Overview:-Feature-Engineering)
>
> [Feature Creation](#Feature-Creation)
>
> [Feature Evaluation](#Feature-Evaluation)
>
> [Feature Transformations](#Feature-Transformations)

We finish the notebook off with a [review](#Review) of everything discussed. 

___

> ### Overview: Feature Engineering
> ___
> **Machine Learning** refers to algorithms that use data to make predictions. 
>
> Two broad types of machine learning are relevant here: 
> - **Supervised Learning**: Inferring a function from labeled training data to make predictions on unseen data.
> 
> - **Unsupervised Learning**: Deriving structure from unlabeled data, where we don't know the effect of the variables in advance. 
> 
> In the last section, we created a vectorization of the text that will be used as one feature for our model. However, other features may also prove useful, such as:
> - Length of the text
> - Percentage of characters that are punctuation
> - Percentage of characters that are capitalized
>
> We can also perform **transformations** on current data to create new features, such as:
> - Power transformations (square, square root, log, etc.)
> - Data standardization
>
> Of course, we should only transform features that actually need it, and we should check that the transformed feature is more informative than the untransformed one. 
> 
> Let's jump in and see all of these concepts in action.

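The transformations listed in the overview can be sketched roughly as follows. This is a minimal illustration on made-up numbers, not the notebook's own implementation: the `lengths` array and the `skewness` helper are hypothetical, and skewness is used here only as one possible way to check whether a transformation changed the shape of a feature.

```python
import numpy as np

# Hypothetical, right-skewed feature values (for example, message lengths).
# These numbers are made up for illustration and do not come from the dataset.
lengths = np.array([4.0, 9.0, 16.0, 25.0, 400.0])

# Power-style transformations compress large values, which can reduce skew.
sqrt_lengths = np.sqrt(lengths)
log_lengths = np.log1p(lengths)  # log(1 + x), which stays defined when x == 0

# Standardization rescales a feature to zero mean and unit variance.
standardized = (lengths - lengths.mean()) / lengths.std()


def skewness(values):
    """Hypothetical helper: population skewness (mean of cubed z-scores)."""
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 3))


# Compare the skew before and after each transformation.
for name, values in [('raw', lengths), ('sqrt', sqrt_lengths), ('log1p', log_lengths)]:
    print(name, round(skewness(values), 3))
```

A transformation that leaves the skew essentially unchanged is probably not worth keeping, which is one way to act on the caution in the overview about transforming only where it helps.
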
___

## Feature Creation

First, we'll read in our data as usual. 

In [1]:
import pandas as pd

# The file is tab-separated and has no header row, so we name the columns ourselves.
data = pd.read_csv('SMSSpamCollection.tsv', sep='\t', header=None)
data.columns = ['label', 'text']
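
With the data loaded, the extra features mentioned in the overview (text length, percentage of punctuation, percentage of capital letters) could be computed along these lines. This is a sketch under stated assumptions: the `sample` rows are made up so the block runs without the data file, and the helper and column names (`punct_pct`, `caps_pct`, `text_len`) are hypothetical. Percentages are taken over all characters, including spaces.

```python
import string

import pandas as pd

# Hypothetical sample rows standing in for the SMS data; not taken from the file.
sample = pd.DataFrame({
    'label': ['ham', 'spam'],
    'text': ['Ok, see you soon.', 'WIN a FREE prize!!! Call NOW'],
})


def punct_pct(text):
    """Percentage of characters in `text` that are punctuation."""
    if len(text) == 0:
        return 0.0  # avoid dividing by zero on empty messages
    count = sum(1 for ch in text if ch in string.punctuation)
    return 100 * count / len(text)


def caps_pct(text):
    """Percentage of characters in `text` that are uppercase letters."""
    if len(text) == 0:
        return 0.0
    count = sum(1 for ch in text if ch.isupper())
    return 100 * count / len(text)


# Build one new column per candidate feature.
features = sample.copy()
features['text_len'] = features['text'].apply(len)
features['punct_pct'] = features['text'].apply(punct_pct)
features['caps_pct'] = features['text'].apply(caps_pct)

print(features)
```

The same `apply` calls should work on the `text` column of the full `data` frame, since each helper only needs a single string.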