Practical Feature Extraction

This repository contains a compendium of useful feature extraction techniques I have learned about over the years. If you have a favorite that I have missed, let me know.

Techniques covered (aspirationally)

Categorical

One-hot encoding

Hashed one-hot encoding

Unique ID

Binary encoding after sorting

Count encoding

Rank encoding

Rank-change

Naive Bayes Rate Encoding

Semantic embedding

tf.idf

Luduan terms *
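
As a rough illustration of two of the categorical encodings listed above, count encoding and hashed one-hot encoding might be sketched as follows. The function names (`count_encode`, `hashed_one_hot`), the bucket count, and the use of CRC32 as the hash are assumptions made for this example; they are not taken from the repository's code.

```python
import zlib
from collections import Counter


def count_encode(values):
    """Replace each category with the number of times it occurs in `values`."""
    counts = Counter(values)
    return [counts[v] for v in values]


def hashed_one_hot(value, n_buckets=16):
    """One-hot vector whose active index is a stable hash of the category.

    Assumption: zlib.crc32 is used because it gives the same result on every
    run, unlike the built-in hash() for strings.
    """
    index = zlib.crc32(value.encode("utf-8")) % n_buckets
    vector = [0] * n_buckets
    vector[index] = 1
    return vector


colors = ["red", "blue", "red", "green", "red", "blue"]
print(count_encode(colors))  # [3, 2, 3, 1, 3, 2]
print(hashed_one_hot("red", n_buckets=8))
```

Hashing keeps the vector width fixed no matter how many distinct categories appear, at the cost of occasional collisions between unrelated categories.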

Numerical

Binning *

Rounding

Log
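
The three numerical transforms listed above are simple enough to sketch with the standard library. The helper names and the example bin edges below are hypothetical, and `log1p` is one assumed way to handle zero-valued inputs.

```python
import math
from bisect import bisect_right


def bin_index(x, edges):
    """Index of the bin that x falls into, given sorted bin edges.

    Values below edges[0] get 0; values at or above edges[-1] get len(edges).
    """
    return bisect_right(edges, x)


def round_to(x, step):
    """Round x to the nearest multiple of step."""
    return round(x / step) * step


def log_feature(x):
    """log(1 + x), which is defined at x = 0 and compresses large values."""
    return math.log1p(x)


edges = [10, 100, 1000]  # illustrative, roughly logarithmic bin edges
print([bin_index(x, edges) for x in (5, 10, 250, 5000)])  # [0, 1, 2, 3]
print(round_to(1234, 100))  # 1200
print(log_feature(0))       # 0.0
```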

Temporal

Day of week, Hour of day, Weekend/holiday indicators

Quadrature encodings

Distance to event

Lagged features
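
A minimal sketch of the calendar-style temporal features follows. It assumes that "quadrature encoding" means representing a cyclic quantity, such as hour of day, as a sine/cosine pair so that 23:00 and 01:00 end up close together; the function name and the returned keys are hypothetical.

```python
import math
from datetime import datetime


def temporal_features(ts):
    """Basic calendar features plus a sine/cosine encoding of hour of day."""
    hour = ts.hour + ts.minute / 60.0
    angle = 2 * math.pi * hour / 24.0
    return {
        "day_of_week": ts.weekday(),     # Monday = 0 ... Sunday = 6
        "hour_of_day": ts.hour,
        "is_weekend": ts.weekday() >= 5,
        "hour_sin": math.sin(angle),     # quadrature pair (assumed meaning)
        "hour_cos": math.cos(angle),
    }


features = temporal_features(datetime(2024, 1, 6, 18, 0))  # a Saturday
print(features["day_of_week"], features["is_weekend"])  # 5 True
```

Holiday indicators would need a calendar for the relevant region and are left out of this sketch.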

Geographical

Pre-clustering

S2 Geo Points

Proximity to cities

MSA

Zip3
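
Two of the geographical features above, Zip3 and proximity to cities, can be sketched without external libraries. The city table is a small illustrative sample with approximate coordinates, and the helper names are hypothetical; a real feature would use a fuller gazetteer.

```python
import math


def zip3(zip_code):
    """First three digits of a US ZIP code (ZIP+4 suffix dropped, zero-padded)."""
    digits = str(zip_code).strip().split("-")[0].zfill(5)
    return digits[:3]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


# Illustrative sample only; coordinates are approximate.
CITIES = {
    "New York": (40.7128, -74.0060),
    "Los Angeles": (34.0522, -118.2437),
    "Chicago": (41.8781, -87.6298),
}


def nearest_city(lat, lon, cities=CITIES):
    """Return (city name, distance in km) for the closest city in the table."""
    return min(
        ((name, haversine_km(lat, lon, clat, clon)) for name, (clat, clon) in cities.items()),
        key=lambda pair: pair[1],
    )


print(zip3("02139-4307"))  # 021
print(nearest_city(40.0, -75.0)[0])  # New York
```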

Word-like and Text

tf.idf

Luduan terms

Semantic embeddings

GloVe https://nlp.stanford.edu/projects/glove/?source=post_page

Indicator detection
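
For reference, one common form of the tf.idf weighting listed above can be sketched in a few lines. This uses relative term frequency and an unsmoothed `log(N / df)` inverse document frequency, which is only one of several variants in use; the function name and the toy documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs is a list of token lists; returns one {term: weight} dict per doc."""
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        term_counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return weights


docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

A term that appears in every document, such as "the" here, gets a weight of zero, while terms unique to a single document get the largest weights.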

IP Address

Reverse resolution

CIDR

CIDR prefix
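
A CIDR prefix feature can be derived with Python's standard `ipaddress` module, as in the sketch below. The default prefix length of 24 and the helper names are assumptions for illustration. Reverse resolution is omitted because it requires a network lookup.

```python
import ipaddress


def cidr_prefix(ip, prefix_len=24):
    """Network block containing `ip`, e.g. for use as a categorical feature."""
    # strict=False lets us pass a host address and have the host bits masked off.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network)


def in_block(ip, cidr):
    """True if `ip` falls inside the given CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


print(cidr_prefix("192.168.17.42"))      # 192.168.17.0/24
print(cidr_prefix("192.168.17.42", 16))  # 192.168.0.0/16
print(in_block("10.1.2.3", "10.0.0.0/8"))  # True
```

Shorter prefixes group more addresses together, so several prefix lengths can be emitted as separate features of increasing coarseness.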

Missing Data

As a special value (unknown word)

Means

Reverse model
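
The first two missing-data strategies above might look like the following sketch. It assumes missing values are represented as `None`; the `<unknown>` token, the helper names, and the extra missing-value indicator are illustrative choices. The reverse model approach, which would predict the missing value from the other features, is not shown.

```python
from statistics import mean

UNKNOWN = "<unknown>"  # hypothetical token standing in for a missing category


def fill_categorical(values, unknown=UNKNOWN):
    """Treat a missing category as its own special value."""
    return [unknown if v is None else v for v in values]


def fill_with_mean(values):
    """Replace missing numbers with the mean of the observed ones.

    Also returns a 0/1 indicator so a model can still see what was missing.
    Assumes at least one value is present.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    filled = [fill if v is None else v for v in values]
    was_missing = [int(v is None) for v in values]
    return filled, was_missing


print(fill_categorical(["chrome", None, "firefox"]))
print(fill_with_mean([1.0, None, 3.0]))  # ([1.0, 2.0, 3.0], [0, 1, 0])
```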

Consolidation

Unknown word

Stemming
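
One plausible reading of the "unknown word" consolidation above is to collapse rare values into a single shared token, so that a model sees one reasonably frequent category instead of many nearly unique ones. The sketch below follows that assumption; the threshold, the token, and the function name are hypothetical. Stemming is not shown because it normally relies on a language-specific library.

```python
from collections import Counter


def consolidate_rare(values, min_count=2, unknown="<unknown>"):
    """Map every value seen fewer than `min_count` times to a shared token."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else unknown for v in values]


words = ["spam", "ham", "spam", "eggs", "ham", "xylophone"]
print(consolidate_rare(words))
# ['spam', 'ham', 'spam', '<unknown>', 'ham', '<unknown>']
```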

Parsing and Modeling

User agent

IP domains

Email address

Headers

Referrer

5P energy models
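
As a small illustration of parsing structured strings into features, the sketch below splits an email address and a referrer URL into parts. The particular features chosen and the function names are assumptions made for this example. User-agent and header parsing usually depend on dedicated libraries and are not shown.

```python
from urllib.parse import urlsplit


def email_features(address):
    """Split an email address into a few simple features.

    If the address has no "@", the whole string ends up in "domain".
    """
    local, _, domain = address.strip().lower().rpartition("@")
    return {
        "local_length": len(local),
        "domain": domain,
        "tld": domain.rsplit(".", 1)[-1],
        "has_plus_tag": "+" in local,
    }


def referrer_features(url):
    """Scheme, host, and path depth of a referrer URL."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    return {
        "scheme": parts.scheme,
        "host": parts.hostname or "",
        "path_depth": len(segments),
    }


print(email_features("Jane.Doe+news@Example.COM"))
print(referrer_features("https://www.example.com/a/b?q=1"))
```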

Scaling

Q scaling

Z scaling

Min-max scaling

Log
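
The scaling methods above can be sketched as plain functions over a list of numbers. "Q scaling" is assumed here to mean quantile scaling, where each value is replaced by the fraction of the data at or below it; that interpretation and all function names are assumptions. The Z and min-max versions assume the input is not constant, since they would otherwise divide by zero.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def z_scale(xs):
    """Subtract the mean and divide by the (population) standard deviation."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def min_max_scale(xs):
    """Linearly map the smallest value to 0 and the largest to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def quantile_scale(xs):
    """Replace each value with the fraction of values less than or equal to it."""
    ordered = sorted(xs)
    return [bisect_right(ordered, x) / len(ordered) for x in xs]


print(min_max_scale([2, 4, 6]))            # [0.0, 0.5, 1.0]
print(quantile_scale([10, 1000, 100, 1]))  # [0.5, 1.0, 0.75, 0.25]
```

Quantile scaling depends only on rank, so it is unaffected by outliers, whereas Z and min-max scaling preserve relative spacing and are sensitive to extreme values.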

Cross modeling

Other models

Modeled structure

Word2vec

tdunning/feature-extraction

Folders and files

Latest commit

History

Repository files navigation

Practical Feature Extraction

Techniques covered (aspirationally)

Categorical

Numerical

Temporal

Geographical

Word-like and Text

IP Address

Missing Data

Consolidation

Parsing and Modeling

Scaling

Cross modeling

About

Resources

License

Stars

Watchers

Forks

Languages