Commits on Dec 17, 2013
  1. updated tokenization udfs (now does ngrams, uses newer version of lucene), modern tfidf macro that works with pig 0.12

    thedatachef committed Dec 17, 2013
Commits on Jan 30, 2013
  1. Merge pull request #2 from rjurney/master

    thedatachef committed Jan 30, 2013
    Stanford Tokenizer works, filtered tokens < 3 chars
Commits on Jan 29, 2013
Commits on Mar 13, 2012
Commits on May 3, 2011
  1. major fixes all around, turns out everything is parallelizable and there's no need for UDFs to compute similarities or centroids

    thedatachef committed May 3, 2011
Commits on Apr 29, 2011
  1. fixme

    thedatachef committed Apr 29, 2011
  2. cluster_documents is now much more scalable (can't really do centroids in one process for humungous vectors), little script to check convergence of k-means, and updated the readme

    thedatachef committed Apr 29, 2011
Commits on Apr 27, 2011
Commits on Apr 26, 2011
  1. crufty shell scripts are not the way forward, use pig for now, figure out how to register the jars yourself

    thedatachef committed Apr 26, 2011
  2. updated readme to reflect runner, not convinced having a separate runner is a great idea yet, probably a better way to autoregister jars

    thedatachef committed Apr 26, 2011
  3. licenses and all that jazz

    thedatachef committed Apr 26, 2011
Commits on Apr 25, 2011