# Three major areas
* Feature extraction and engineering
* Feature scaling
* Feature Selection

# Feature extraction and engineering
"Coming up with features is difficult, time-consuming, requires expert knowledge. 'Applied Machine Learning' is basically feature engineering." - Prof. Andrew Ng

# Why Feature Engineering?
* **Better representation of data**: Features are representations of the underlying raw data that Machine Learning algorithms can work with more readily. They are also often easier to visualize. For example, plotting the most frequent words in a newspaper article is far more informative than staring at the raw text.
* **Better performing models**: The right features often matter more than the complexity of the algorithm. With a good feature set, even a simple model can perform well and give the desired results. In short, better features make better models.
* **Essential for model building and evaluation**: Raw data can rarely be used directly to build Machine Learning models. Get your data, extract features, and then start building models. When evaluating and tuning models, you can also iterate over your feature set to find the features that give the best model.
* **More flexibility on data types**: While it is definitely easier to use numeric data directly with Machine Learning algorithms, with little or no transformation, the real challenge is to build models on more complex data types like text, images, and even videos. Feature engineering applies the necessary transformations so that we can build models on diverse data types, including complex unstructured data.
* **Emphasis on the business and domain**: Data scientists and analysts are usually busy processing and cleaning data and building models as part of their day-to-day tasks, which often creates a gap between business stakeholders and the technical/analytics team. Feature engineering encourages data scientists to step back and understand the domain and the business better by taking input from business and subject matter experts. This is necessary to create and select features that are useful for building the right model for the problem. Pure statistical and mathematical knowledge is rarely sufficient to solve a complex real-world problem, so feature engineering puts the focus on the business and the domain when building features.

# How Do You Engineer Features?
* Numeric data
* Categorical data
* Text data
* Temporal data
* Image data

# Numeric data
* Values
* Counts
* Binarization
* Rounding
* Interactions
    * PolynomialFeatures
* Binning
    * Fixed-Width Binning
        * pd.cut(.., bins=.., labels=..)
    * Adaptive Binning
        * Quantile-Based
            * pd.qcut()
* Statistical Transformations
    * Log Transform
    * Box-Cox Transform
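
As a rough illustration of several items in the list above, the sketch below applies binarization, an interaction term, fixed-width and quantile binning, and a log transform to a small table. The data, column names, thresholds, and bin labels are all hypothetical and chosen only for the example. Box-Cox is left out to keep the dependencies to pandas, NumPy, and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "age": [5, 17, 25, 42, 68, 90],
    "income": [1_000, 12_000, 30_000, 45_000, 80_000, 250_000],
})

# Binarization: 1 if age is at least 18, else 0 (threshold is an assumption).
df["is_adult"] = (df["age"] >= 18).astype(int)

# Interactions: keep the original columns and add the age * income product.
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
interactions = poly.fit_transform(df[["age", "income"]].to_numpy())

# Fixed-width binning: every bin spans 30 years, right edge inclusive.
df["age_bin"] = pd.cut(df["age"], bins=[0, 30, 60, 90],
                       labels=["0-30", "31-60", "61-90"])

# Adaptive (quantile-based) binning: three bins with equal row counts.
df["income_q"] = pd.qcut(df["income"], q=3, labels=["low", "mid", "high"])

# Log transform: log(1 + x) compresses the long right tail of income.
df["income_log"] = np.log1p(df["income"])

print(df)
```

Quantile bins adapt to the data distribution, so a skewed column such as `income` still ends up with evenly populated bins, whereas fixed-width bins can leave some bins nearly empty.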


# Categorical data
* Transforming Nominal Features
    * Label encoder
* Transforming Ordinal Features
    * Map function
* One Hot Encoding scheme
* Dummy Coding scheme (n-1)
* Effect Coding scheme (like dummy coding, but the all-zero row of the baseline category is replaced with -1 in every column)
* Bin-counting scheme e.g. IP address
* Feature Hashing scheme e.g. FeatureHasher
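
The encoding schemes above can be sketched with pandas alone. This is a minimal example under stated assumptions: the `genre` and `size` columns and their values are hypothetical, and pandas category codes stand in for scikit-learn's `LabelEncoder` (both assign integers in sorted order of the labels). Bin-counting and feature hashing are not shown.

```python
import pandas as pd

# Hypothetical toy data: one nominal column and one ordinal column.
df = pd.DataFrame({
    "genre": ["rock", "jazz", "pop", "jazz"],
    "size": ["S", "L", "M", "S"],
})

# Nominal feature: integer labels in sorted order (jazz=0, pop=1, rock=2).
df["genre_label"] = df["genre"].astype("category").cat.codes

# Ordinal feature: an explicit map preserves the intended order S < M < L.
size_order = {"S": 0, "M": 1, "L": 2}
df["size_ord"] = df["size"].map(size_order)

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["genre"], prefix="genre", dtype=int)

# Dummy coding: drop the first category (jazz) to get n-1 columns.
dummy = pd.get_dummies(df["genre"], prefix="genre", drop_first=True, dtype=int)

# Effect coding: rows of the dropped baseline category become all -1.
effect = dummy.copy()
effect.loc[df["genre"] == "jazz", :] = -1

print(one_hot, dummy, effect, sep="\n\n")
```

Note that only the baseline rows change between dummy and effect coding; the other rows keep their 0/1 values.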

# Text data
* Pre-processing and normalizing text
    * Text Tokenization and lower casing
    * Removing special characters
    * Contraction expansion
    * Removing stopwords
    * Correcting spellings
    * Stemming
    * Lemmatization
* Feature extraction and engineering
* Bag of words model
* Bag of n-grams model
* TF-IDF Model (Term Frequency Inverse Document Frequency)
* Document Similarity e.g. cosine distance, BM25 distance, Hellinger-Bhattacharyya distance, Jaccard distance
* Topic Model
* Word Embedding
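
The bag of words, bag of n-grams, TF-IDF, and document similarity items above can be sketched with scikit-learn as follows. The three sentences are a hypothetical corpus, and the built-in English stop word list and default tokenizer are assumptions made for brevity. Stemming, lemmatization, topic models, and word embeddings are not covered here.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical three-document corpus, invented for illustration.
docs = [
    "The sky is blue and beautiful",
    "Love this blue and beautiful sky",
    "The quick brown fox jumps over the lazy dog",
]

# Bag of words: lowercase, drop English stop words, count each term.
bow = CountVectorizer(lowercase=True, stop_words="english")
bow_matrix = bow.fit_transform(docs)

# Bag of n-grams: count pairs of adjacent words (bigrams) instead.
bigram = CountVectorizer(ngram_range=(2, 2))
bigram_matrix = bigram.fit_transform(docs)

# TF-IDF: down-weight terms that appear in many documents.
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(docs)

# Document similarity: pairwise cosine similarity of the TF-IDF vectors.
sim = cosine_similarity(tfidf_matrix)

print(sorted(bow.vocabulary_))
print(sim.round(2))
```

The first two documents share several terms and so score higher with each other than with the third, which shares no non-stop-word terms with the first.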

# Temporal data
* Date-based features
    * Year, Month, Day, DayOfWeek, DayName, DayOfYear, WeekOfYear, Quarter
* Time-based features
    * Hour, Minute, Second, MicroSecond, UTC_offset
    * TimeOfDayBin (hour of day binned into named periods):
        * bin edges: [-1, 5, 11, 16, 21, 23]
        * labels: ['Late Night', 'Morning', 'Afternoon', 'Evening', 'Night']
* Elapsed time difference
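
A minimal pandas sketch of the date-based, time-based, and elapsed-time features listed above. The timestamps are hypothetical, and measuring elapsed time from the earliest timestamp in the column is just one possible reference point. The time-of-day bins reuse the edges and labels from the list.

```python
import pandas as pd

# Hypothetical timestamps, invented for illustration.
ts = pd.to_datetime(pd.Series([
    "2015-03-08 10:30:00",
    "2017-07-13 17:45:10",
    "2012-01-20 23:05:00",
    "2016-12-25 02:15:59",
]))
feat = pd.DataFrame({"ts": ts})

# Date-based features via the .dt accessor.
feat["year"] = ts.dt.year
feat["month"] = ts.dt.month
feat["day"] = ts.dt.day
feat["day_of_week"] = ts.dt.dayofweek   # Monday = 0
feat["day_name"] = ts.dt.day_name()
feat["day_of_year"] = ts.dt.dayofyear
feat["quarter"] = ts.dt.quarter

# Time-based features.
feat["hour"] = ts.dt.hour
feat["minute"] = ts.dt.minute
feat["second"] = ts.dt.second

# Time-of-day bin: the -1 lower edge makes hour 0 fall in the first bin.
feat["time_of_day"] = pd.cut(
    feat["hour"],
    bins=[-1, 5, 11, 16, 21, 23],
    labels=["Late Night", "Morning", "Afternoon", "Evening", "Night"],
)

# Elapsed time: whole days since the earliest timestamp in the column.
feat["days_since_first"] = (ts - ts.min()).dt.days

print(feat)
```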

# Image data
* Image metadata features
    * image create date and time
    * image dimensions
    * image compression format
    * device make and model
    * image resolution and aspect ratio
    * image artist
    * flash, aperture, focal length, and exposure
* Raw Image and Channel Pixels
    * RGB
* Grayscale image pixels
    * Y = 0.2125*R + 0.7154*G + 0.0721*B, with Y in [0, 1] (0 = black, 1 = white)
* Binning Image Intensity Distribution
* Image aggregation statistics
    * describe
* Edge Detection
    * skimage (scikit-image) - canny
* Object detection
    * skimage (scikit-image) - hog
* Localized feature extraction
    * mahotas.feature.surf
* Visual Bag of Words Model
* Automated Feature Engineering with Deep Learning
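
The simpler pixel-level features above (channel pixels, grayscale conversion, intensity binning, and aggregation statistics) can be sketched with NumPy alone. The image here is a hypothetical random array with values in [0, 1], standing in for a real photo, and the choice of five intensity bins is arbitrary. Edge detection, HOG, SURF, and deep learning features need dedicated libraries and are not shown.

```python
import numpy as np

# Hypothetical 4x6 RGB image with float values in [0, 1].
rng = np.random.default_rng(0)
rgb = rng.random((4, 6, 3))  # height, width, channels

# Raw channel pixels: each channel is a 2-D array.
red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]

# Grayscale: weighted sum of channels; the weights sum to 1, so the
# result stays in [0, 1] (0 = black, 1 = white).
weights = np.array([0.2125, 0.7154, 0.0721])
gray = rgb @ weights

# Binning the intensity distribution: pixel counts in 5 equal-width bins.
hist, edges = np.histogram(gray, bins=5, range=(0.0, 1.0))

# Aggregation statistics over the grayscale pixels.
stats = {
    "mean": float(gray.mean()),
    "std": float(gray.std()),
    "min": float(gray.min()),
    "max": float(gray.max()),
}

print(gray.shape, hist, stats)
```

The histogram and the summary statistics are fixed-length vectors regardless of image size, which makes them usable as model inputs for images of different dimensions.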

# Feature scaling
- from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
- Standardized Scaling
    - (X - mean) / std
    - result has mean: 0, variance: 1
- Min-Max Scaling
    - (X - min) / (max - min)
- Robust Scaling
    - (X - median) / IQR
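
To make the difference between the three scalers concrete, the sketch below applies each of them to one hypothetical column that contains an outlier. All three are used with their scikit-learn defaults.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Hypothetical single feature with one large outlier (100).
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Standardized scaling: subtract the mean, divide by the standard deviation.
standard = StandardScaler().fit_transform(X)

# Min-max scaling: map the minimum to 0 and the maximum to 1.
minmax = MinMaxScaler().fit_transform(X)

# Robust scaling: subtract the median (3), divide by the IQR (4 - 2 = 2).
robust = RobustScaler().fit_transform(X)

print(standard.ravel().round(3))
print(minmax.ravel().round(3))
print(robust.ravel().round(3))
```

With min-max scaling the outlier squeezes the four ordinary values close to 0, while robust scaling keeps them spread between -1 and 0.5 because the median and IQR are barely affected by the outlier.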

# Feature Selection
* Filter methods
    * Threshold-based Methods e.g. word frequency
    * Statistical methods
        * mutual information
        * ANOVA (analysis of variance)
        * chi-square test
        * SelectKBest with a score function, e.g. chi2
        * For regression: f_regression, mutual_info_regression
        * For classification: chi2, f_classif, mutual_info_classif
* Wrapper methods
    * Recursive Feature Elimination
* Embedded methods
    * Model-Based Selection e.g. feature importance of random forests, decision trees
* Dimensionality Reduction
    * PCA
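
One example of each family above, sketched on scikit-learn's bundled iris dataset. The dataset, the choice of keeping two features, and the specific estimators (logistic regression for RFE, a random forest for model-based importance) are assumptions made for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # 150 samples, 4 non-negative features

# Filter method: keep the 2 features with the highest chi-square score.
kbest = SelectKBest(score_func=chi2, k=2).fit(X, y)
X_kbest = kbest.transform(X)

# Wrapper method: recursively drop the weakest feature until 2 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=2).fit(X, y)

# Embedded method: importances learned while fitting a random forest.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_

# Dimensionality reduction: project onto the top 2 principal components.
X_pca = PCA(n_components=2).fit_transform(X)

print("chi2 keeps:", kbest.get_support())
print("RFE keeps: ", rfe.support_)
print("RF importances:", importances.round(3))
print("PCA shape:", X_pca.shape)
```

Unlike the three selection methods, PCA does not pick a subset of the original columns. Each principal component is a linear combination of all of them, so the reduced features are harder to interpret.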
