This repository contains a compendium of useful feature extraction techniques I have learned about over the years. If you have a favorite that I have missed, let me know.
One-hot encoding
Hashed one-hot encoding
Unique ID
Binary encoding after sorting
Count encoding
Rank encoding
Rank-change
Naive Bayes Rate Encoding
Semantic embedding
tf.idf
Luduan terms *
Binning *
Rounding
Log
Day of week, Hour of day, Weekend/holiday indicators
Quadrature encodings
Distance to event
Lagged features
Pre-clustering
S2 Geo Points
Proximity to cities
MSA
Zip3
tf.idf
Luduan terms
Semantic embeddings
GloVe https://nlp.stanford.edu/projects/glove/?source=post_page
Indicator detection
Reverse resolution
CIDR
CIDR prefix
As a special value (unknown word)
Means
Reverse model
Unknown word
Stemming
User agent
IP domains
Email address
Headers
Referrer
5P energy models
Q scaling
Z scaling
Min-max scaling
Log
Other models
Modeled structure
Word2vec
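
As a rough illustration of a few of the entries above (count encoding, quadrature encodings, Zip3, CIDR prefix, and log scaling), here is a minimal Python sketch using only the standard library. The function names are hypothetical and chosen for this example. The sketch assumes "quadrature encoding" means representing a cyclic quantity as a sine/cosine pair and "Zip3" means the first three digits of a US ZIP code. Treat these as one plausible reading of each technique, not a reference implementation.

```python
import ipaddress
import math
from collections import Counter


def count_encode(values):
    """Replace each categorical value with how often it occurs in `values`."""
    counts = Counter(values)
    return [counts[v] for v in values]


def quadrature_encode(value, period):
    """Map a cyclic quantity (e.g. hour of day, period=24) to a (sin, cos) pair.

    Assumption: this is what the list means by "quadrature encodings".
    Values near the wrap-around point (23h and 0h) end up close together.
    """
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def zip3(zipcode):
    """Return the first three digits of a US ZIP code.

    Accepts strings or ints; restores leading zeros lost by integer storage
    and ignores a ZIP+4 suffix.
    """
    five_digit = str(zipcode).strip().split("-")[0].zfill(5)
    return five_digit[:3]


def cidr_prefix(ip, prefix_len=24):
    """Collapse an IP address to the network that contains it.

    prefix_len=24 is an arbitrary default picked for this sketch.
    """
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def log_scale(x):
    """log(1 + x), which stays defined at x = 0 for count-like features."""
    return math.log1p(x)


if __name__ == "__main__":
    print(count_encode(["a", "b", "a", "c", "a"]))  # [3, 1, 3, 1, 3]
    print(quadrature_encode(6, 24))
    print(zip3(2139))                               # 021
    print(cidr_prefix("192.168.17.42"))             # 192.168.17.0/24
    print(log_scale(0))                             # 0.0
```

Each helper turns a raw value into something a model can consume directly. In practice the counts for count encoding would be computed on training data only and then reused at prediction time.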