Simple implementation of the "hashing trick" for featurization #154

Merged: johnynek merged 3 commits into twitter:develop on Apr 9, 2013

avibryant
Contributor

This produces fixed-size feature vectors from any number of key/value pairs, for any value type V:Group and any key type that can be converted to a byte array. It is most directly inspired by Jeremy Hoon's recent post on hash kernels, but is better known from its use in Vowpal Wabbit and similar tools; see http://en.wikipedia.org/wiki/Feature_hashing
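For readers who have not seen the technique, here is a minimal sketch of the idea, not the code in this pull request. It assumes `String` keys and `Double` values (standing in for the more general `V:Group`), uses the standard library's `MurmurHash3` as the hash function, and the names `HashingTrickSketch`, `featurize`, and `numBuckets` are made up for illustration.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical illustration of the hashing trick; not the API added by this PR.
object HashingTrickSketch {
  // Fold any number of (key, value) pairs into a vector of exactly numBuckets entries.
  def featurize(pairs: Seq[(String, Double)], numBuckets: Int): Vector[Double] = {
    require(numBuckets > 0, "numBuckets must be positive")
    val buckets = Array.fill(numBuckets)(0.0)
    pairs.foreach { case (key, value) =>
      // Hash the key's byte representation to pick a bucket.
      val hash = MurmurHash3.bytesHash(key.getBytes("UTF-8"))
      // floorMod keeps the index in [0, numBuckets) even when the hash is negative.
      val index = java.lang.Math.floorMod(hash, numBuckets)
      // Colliding keys are combined with the value type's addition (here, Double +).
      buckets(index) += value
    }
    buckets.toVector
  }
}
```

Calling `HashingTrickSketch.featurize(Seq("a" -> 1.0, "b" -> 4.0), 8)` always returns a vector of length 8, however many distinct keys are passed in. Distinct keys may collide in the same bucket, which is the accepted trade-off of the technique.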

@@ -0,0 +1,20 @@
package com.twitter.algebird
copyright header

@azymnis
Contributor

azymnis commented Apr 8, 2013

Cool stuff. We have been doing exactly this internally, but not within algebird.

@@ -7,6 +7,6 @@ addSbtPlugin("com.typesafe.sbt" % "sbt-ghpages" % "0.5.0")

addSbtPlugin("com.twitter" % "sbt-gitflow" % "0.1.0")

-addSbtPlugin("com.jsuereth" % "xsbt-gpg-plugin" % "0.6")
+//addSbtPlugin("com.jsuereth" % "xsbt-gpg-plugin" % "0.6")
Can you remove this comment? We use this to publish.

johnynek added a commit that referenced this pull request Apr 9, 2013
Simple implementation of the "hashing trick" for featurization
@johnynek johnynek merged commit 7378e5e into twitter:develop Apr 9, 2013
4 participants