Word2Vec in Julia
Word2Vec


- Create an instance of `WordEmbedding`: `embed = WordEmbedding(100, Word2Vec.random_inited, Word2Vec.huffman_tree, subsampling = subsampling)`
- To train sequentially: `train(embed, inputfile)`
- Alternatively, to train in parallel:
  - Add worker processes: `addprocs(N)`
  - Chunk the input file using Blocks: `b = Block(File(inputfile), nworkers())`
  - Start training on the chunks, providing a filename that will be used to exchange data between the workers and the master node: `train(embed, b, "/tmp/emb")`
- After successful training, query for similar words: `find_nearest_words(embed, "query words")`
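The steps above can be put together into a minimal end-to-end sketch. This assumes the Word2Vec.jl package (and, for the parallel path, Blocks.jl) is installed; the corpus path `corpus.txt`, the worker count, and the subsampling rate are placeholders, not values prescribed by the package:

```julia
using Word2Vec

# Build a 100-dimensional embedding with random initialization,
# a Huffman tree (hierarchical softmax), and subsampling of
# frequent words. The rate below is a placeholder.
subsampling = 1e-4
embed = WordEmbedding(100, Word2Vec.random_inited,
                      Word2Vec.huffman_tree, subsampling = subsampling)

inputfile = "corpus.txt"   # hypothetical corpus path

# Sequential training:
train(embed, inputfile)

# -- or, parallel training --
# addprocs(4)                              # add 4 worker processes
# @everywhere using Word2Vec, Blocks
# b = Block(File(inputfile), nworkers())   # one chunk per worker
# train(embed, b, "/tmp/emb")              # "/tmp/emb" is used to exchange
#                                          # data between workers and master

# Query the trained model for words similar to a query word:
println(find_nearest_words(embed, "king"))
```

The sequential and parallel paths share the same `train` entry point; the parallel variant differs only in taking a `Block` of file chunks plus a scratch filename instead of a single input file.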

This is still a work in progress. Parallel training with weight averaging does not yield very good results; it may be necessary to implement the asynchronous stochastic gradient descent approach used by Mikolov et al. (2013).

Datasets

Credits

This is based on the original code by Zhixuan Yang (https://github.com/yangzhixuan/embed).