
Word2vec on GPU slower than CPU #13048

Closed
manneshiva opened this issue Sep 14, 2017 · 4 comments

Comments

@manneshiva

System information

  • OS Platform and Distribution: Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): 1.3.0
  • Python version: 2.7.12
  • Bazel version (if compiling from source): 0.5.0
  • CUDA/cuDNN version: 8.0/6.0
  • GPU model and memory: NVIDIA GTX 1060 / 3GB
  • Docker used: yes
  • I picked up the code from the word2vec example in your official repo and made a few changes. The core code that trains word2vec remains the same.

Describe the problem

I have been benchmarking commonly used frameworks/libraries for unsupervised learning of word embeddings (word2vec). I am currently comparing TensorFlow (CPU/GPU), Gensim, DeepLearning4j, and the original C code on standard metrics: training time, peak memory usage, and quality of the learned vectors. Link to my github repo (still a work in progress). I ran the benchmark on the text8 corpus (I plan to run it on a much larger corpus later for the true picture), which gave me strange results:

  • TensorFlow on GPU is much slower than on CPU
  • TensorFlow is much slower than the other frameworks

Is this behavior expected? Would appreciate any inputs.
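For context, the benchmark's two headline metrics (training time and peak memory) can be measured with just the standard library. This is a generic sketch, not code from the linked repo; `train_stub` is a hypothetical stand-in for any framework's training call:

```python
import time
import resource  # Unix-only

def train_stub():
    # Hypothetical stand-in for a framework's word2vec training call.
    return sum(i * 0.5 for i in range(200_000))

start = time.perf_counter()
train_stub()
elapsed = time.perf_counter() - start

# ru_maxrss is the peak resident set size: kilobytes on Linux, bytes on macOS.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print(f"training time: {elapsed:.3f}s, peak RSS: {peak}")
```

Wall-clock time plus peak RSS is a coarse but framework-agnostic way to compare implementations on the same corpus.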

Source code / logs

Link to tensorflow code
Link to results of sample benchmark on text8 corpus

@manneshiva manneshiva changed the title Word2vec GPU slower than Word2vec CPU Word2vec on GPU slower than CPU Sep 14, 2017

cy89 commented Sep 15, 2017

A few thoughts:

  • Our tutorial code tends to be written to maximize clarity rather than performance. It's not surprising that tutorial code wouldn't necessarily run very efficiently, on either CPU or GPU.
  • I don't see anything in the word2vec code that suggests it has been optimized to work on a GPU.
  • Embeddings, by their nature, tend to emphasize fine-grained, random memory lookups. That plays much less to the strengths of the GPU.
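The random-lookup point can be sketched with NumPy (illustrative only, not the issue's code; sizes are arbitrary): a word2vec step gathers a handful of rows at arbitrary indices from a large embedding matrix, a latency-bound access pattern, rather than the large dense matrix multiplies GPUs are built for.

```python
import numpy as np

vocab_size, dim, batch = 50_000, 128, 256
embeddings = np.random.rand(vocab_size, dim).astype(np.float32)

# A training step gathers `batch` rows at random, fine-grained indices --
# many scattered small reads, with little arithmetic per byte fetched.
idx = np.random.randint(0, vocab_size, size=batch)
batch_vecs = embeddings[idx]  # shape (256, 128)

# Contrast: the dense, contiguous matmul shape GPUs are optimized for.
dense = embeddings.T @ embeddings  # shape (128, 128)
```

The gather does almost no math per memory access, so GPU kernel-launch and transfer overhead can easily dominate; the matmul amortizes memory traffic over far more arithmetic.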

This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there. Thanks!

@cy89 cy89 closed this as completed Sep 15, 2017

piskvorky commented Sep 18, 2017

@cy89 do you know of a more optimized implementation of word2vec in TensorFlow (less tutorial-ish)?

@GuoleiSun

Same problem: word2vec on CPU is 10 times faster than on GPU. Yes, it is very surprising, but that is what I got. Both CPU and GPU are slow. I wasted a lot of time studying and modifying the code.

@ticlazau

Hello,

I have the same problem on TensorFlow 1.8 running word2vec_optimized.py on a system with Volta GPUs.

Rgds,
FM
