Sample techniques for a variety of feature extraction methods

tdunning/feature-extraction

Practical Feature Extraction

This repository contains a compendium of useful feature extraction techniques I have learned about over the years. If you have a favorite that I have missed, let me know.

Techniques covered (aspirationally)

Categorical

One-hot encoding

Hashed one-hot encoding

Unique ID

Binary encoding after sorting

Count encoding

Rank encoding

Rank-change

Naive Bayes Rate Encoding

Semantic embedding

tf.idf

Luduan terms *
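
As a rough illustration of three of the categorical encodings listed above (hashed one-hot, count, and rank), here is a minimal standard-library sketch. The function names and the choice of MD5 as the hash are assumptions made for this example, and "rank encoding" is read here as frequency rank, which is only one possible interpretation.

```python
import hashlib
from collections import Counter


def hashed_one_hot(value, n_buckets=16):
    """Map a categorical value to a fixed-width indicator vector via hashing."""
    # A stable hash keeps the bucket the same across runs; Python's built-in
    # hash() is salted per process for strings, so it is not used here.
    digest = hashlib.md5(value.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % n_buckets
    vec = [0] * n_buckets
    vec[bucket] = 1  # distinct values may collide in the same bucket
    return vec


def count_encode(values):
    """Replace each value with the number of times it occurs in the data."""
    counts = Counter(values)
    return [counts[v] for v in values]


def rank_encode(values):
    """Replace each value with its frequency rank (1 = most common)."""
    counts = Counter(values)
    # Ties are broken alphabetically so the result is deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    rank = {v: i + 1 for i, v in enumerate(ordered)}
    return [rank[v] for v in values]
```

Hashing trades a small collision rate for a vector width that does not grow with the vocabulary, which is why it is often preferred for high-cardinality fields.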

Numerical

Binning *

Rounding

Log
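
The three numerical transforms above are small enough to sketch directly. The helper names and the use of log(1 + x) rather than a plain logarithm are assumptions for illustration.

```python
import bisect
import math


def bin_index(x, edges):
    """Return the index of the bin that x falls into, given sorted bin edges."""
    # 0 means x is below the first edge; len(edges) means x is at or above the last.
    return bisect.bisect_right(edges, x)


def round_to(x, step):
    """Round x to the nearest multiple of step, coarsening the value."""
    return round(x / step) * step


def log_feature(x):
    """log(1 + x), which is defined at zero and compresses long tails."""
    return math.log1p(x)
```

For example, `bin_index(25, [18, 30, 50])` places an age of 25 in bin 1, the range from 18 up to but not including 30.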

Temporal

Day of week, Hour of day, Weekend/holiday indicators

Quadrature encodings

Distance to event

Lagged features
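
A minimal sketch of some of the temporal features above. It assumes "quadrature encoding" means representing a cyclic quantity such as hour of day as a sine/cosine pair, so that 23:00 and 01:00 end up close together; the function names and dictionary keys are made up for this example.

```python
import math
from datetime import datetime


def temporal_features(ts):
    """Derive calendar and cyclic features from a datetime."""
    hour = ts.hour + ts.minute / 60.0
    angle = 2.0 * math.pi * hour / 24.0
    return {
        "day_of_week": ts.weekday(),      # Monday = 0 ... Sunday = 6
        "is_weekend": ts.weekday() >= 5,
        "hour_sin": math.sin(angle),      # quadrature pair for hour of day
        "hour_cos": math.cos(angle),
    }


def seconds_to_event(ts, event):
    """Signed distance to an event: positive before it, negative after."""
    return (event - ts).total_seconds()


def lagged(values, lag=1):
    """Shift a series by `lag` steps; positions without history become None."""
    values = list(values)
    return ([None] * lag + values)[: len(values)]
```

Holiday indicators are left out because they need a calendar for a specific locale.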

Geographical

Pre-clustering

S2 Geo Points

Proximity to cities

MSA

Zip3
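
Two of the geographical features above can be sketched without external libraries: proximity to cities (here, great-circle distance to the nearest of a few reference cities) and Zip3 (the first three digits of a US ZIP code). The city table and function names are hypothetical and exist only for illustration.

```python
import math

# Hypothetical reference cities: name -> (latitude, longitude) in degrees.
CITIES = {
    "New York": (40.7128, -74.0060),
    "Chicago": (41.8781, -87.6298),
    "San Francisco": (37.7749, -122.4194),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers on a spherical Earth."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_city(lat, lon, cities=CITIES):
    """Return (name, distance_km) for the closest reference city."""
    name = min(cities, key=lambda c: haversine_km(lat, lon, *cities[c]))
    return name, haversine_km(lat, lon, *cities[name])


def zip3(zip_code):
    """First three digits of a ZIP code, restoring leading zeros lost to ints."""
    base = str(zip_code).strip().split("-")[0]
    return base.zfill(5)[:3]
```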

Word-like and Text

tf.idf

Luduan terms

Semantic embeddings

GloVe: https://nlp.stanford.edu/projects/glove/?source=post_page

Indicator detection
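
For the tf.idf entry above, a minimal sketch of one common variant (raw term count times log(N / document frequency)) might look like the following. Other weightings exist, and the function name is an assumption for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weighted
```

A term that appears in every document gets weight zero under this variant, which is the intended effect: it carries no information for telling documents apart.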

IP Address

Reverse resolution

CIDR

CIDR prefix
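
A CIDR prefix feature can be derived with Python's standard `ipaddress` module, as in this short sketch. The function name and the default prefix length of 24 are assumptions for illustration.

```python
import ipaddress


def cidr_prefix(ip, prefix_len=24):
    """Return the network containing `ip` at the given prefix length."""
    # strict=False masks off the host bits instead of raising an error.
    net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(net)
```

Shorter prefixes group more addresses together, so emitting several prefix lengths for the same address gives a model both coarse and fine views of where traffic comes from.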

Missing Data

As a special value (unknown word)

Means

Reverse model
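
The first two missing-data strategies above can be sketched as follows. The `<unk>` token and the companion "was missing" indicator are assumptions added for this example; the reverse-model approach is not shown.

```python
from statistics import mean

UNKNOWN = "<unk>"  # hypothetical special token for missing categorical values


def fill_categorical(values, unknown=UNKNOWN):
    """Treat missing categorical values as their own special value."""
    return [unknown if v is None else v for v in values]


def fill_with_mean(values):
    """Replace missing numbers with the mean of the observed ones.

    Also returns a 0/1 indicator so a model can still see which
    values were originally missing.
    """
    observed = [v for v in values if v is not None]
    m = mean(observed)
    filled = [m if v is None else v for v in values]
    was_missing = [int(v is None) for v in values]
    return filled, was_missing
```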

Consolidation

Unknown word

Stemming
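
One possible reading of "unknown word" consolidation is to collapse rare values into a single shared token, so that the model sees one reasonably frequent category instead of many near-unique ones. The sketch below assumes that reading; the threshold, token, and function name are hypothetical.

```python
from collections import Counter


def consolidate_rare(values, min_count=2, unknown="<unk>"):
    """Replace values seen fewer than `min_count` times with `unknown`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else unknown for v in values]
```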

Parsing and Modeling

User agent

IP domains

Email address

Headers

Referrer

5P energy models
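
Parsing structured strings into their parts is often the first step for fields like email addresses and referrers. The sketch below uses only the standard library; the function names and the choice of output fields are assumptions for illustration.

```python
from urllib.parse import urlsplit


def email_features(address):
    """Split an email address into local part, domain, and top-level domain."""
    local, _, domain = address.strip().lower().rpartition("@")
    return {"local": local, "domain": domain, "tld": domain.rsplit(".", 1)[-1]}


def referrer_features(url):
    """Split a referrer URL into scheme, host, and path."""
    parts = urlsplit(url)
    return {"scheme": parts.scheme, "host": parts.hostname or "", "path": parts.path}
```

Each extracted part is itself a categorical value, so it can be fed to any of the categorical encodings listed earlier.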

Scaling

Q scaling

Z scaling

Min-max scaling

Log
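
The scaling methods above can be sketched in a few lines each. "Q scaling" is assumed here to mean replacing each value with its empirical quantile; that interpretation, like the function names, is an assumption rather than something stated in this list.

```python
import bisect
from statistics import mean, pstdev


def z_scale(xs):
    """Subtract the mean and divide by the (population) standard deviation."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]  # fails if all values are equal (s == 0)


def min_max_scale(xs):
    """Map the smallest value to 0 and the largest to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]  # fails if all values are equal


def q_scale(xs):
    """Replace each value with the fraction of values less than or equal to it."""
    ordered = sorted(xs)
    n = len(ordered)
    return [bisect.bisect_right(ordered, x) / n for x in xs]
```

Quantile-based scaling depends only on the ordering of the values, so it is much less sensitive to outliers than the other two.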

Cross modeling

Other models

Modeled structure

Word2vec
