Benchmark eager vs. graph? #2
Ok, I guess my intuition is not that great. I just ran an actual benchmark, training the full yolov3-tiny network for one epoch of the VOC2012 dataset on a Tesla M60 (comparable to a GTX 1060).

keras.fit eager: train: 157s

Eager is slower than graph mode under keras.fit, but GradientTape by itself is not slower. Worth noting that the GradientTape loop didn't have any of the Keras metrics or callbacks, which might contribute to the speed-up. I tried compiling the GradientTape step with @tf.function, but did not see any performance change.

Now for the maintainability point. If you want batteries-included callbacks and metrics, use keras.fit and adjust the eager flag as needed. For more customized training control, use GradientTape. Either way, there is never a case where you have to keep both keras.fit and GradientTape; I did it here only to showcase the different ways to train models in TF 2.0.

However, for inference there is a pretty big difference between eager and graph mode: YOLOv3 on 608x608 input takes about 200ms in eager mode but only 120ms in graph mode.
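The two training paths being compared above can be sketched roughly as follows. This is a minimal toy example, not the repo's actual code: the model, dataset, and loss here are placeholder stand-ins for yolov3-tiny and VOC2012.

```python
import tensorflow as tf

# Toy stand-ins for the real YOLOv3 model and VOC dataset.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(4,))])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam()
x = tf.random.normal((32, 4))
y = tf.random.normal((32, 10))
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(8)

# Path 1: keras.fit -- the run_eagerly flag toggles eager vs. graph
# execution of the built-in training step.
model.compile(optimizer=optimizer, loss=loss_fn, run_eagerly=False)
model.fit(dataset, epochs=1, verbose=0)

# Path 2: a manual GradientTape loop, with no Keras metrics or
# callbacks. Wrapping the step in @tf.function traces it into a graph.
@tf.function
def train_step(xb, yb):
    with tf.GradientTape() as tape:
        pred = model(xb, training=True)
        loss = loss_fn(yb, pred)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for xb, yb in dataset:
    loss = train_step(xb, yb)
```

Timing these two loops on the real model is essentially what the numbers above measure; the GradientTape path skips the metric/callback bookkeeping that keras.fit does every batch.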
Thanks for the details!
@zzh8829 what changes did you make to switch from eager mode to graph mode? I tried to do this by adding
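For reference, the usual TF 2.0 way to move a model call from eager to graph execution is to wrap it in `tf.function` (a general pattern, not necessarily the exact change made in this repo):

```python
import tensorflow as tf

# Placeholder model standing in for YOLOv3.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(4,))])

# Eager: each call runs op-by-op in Python.
eager_out = model(tf.random.normal((1, 4)))

# Graph: tf.function traces the call into a graph on first use,
# then reuses the compiled graph for subsequent calls.
graph_model = tf.function(model)
graph_out = graph_model(tf.random.normal((1, 4)))
```

The first call to the wrapped model pays a one-time tracing cost; the speed-up shows up on repeated calls with the same input shape.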
Thanks for sharing this work! It's definitely interesting to see an example of what TF 2.0 is going to be like to work with.
I have one question: in your readme you mention:
Could you expand on this? Maybe share some numbers?
I'm interested because the promise of eager execution has always been to have imperative programming (for an easier workflow) without losing too much performance. If it turns out, however, that for practical purposes it's not feasible to train in eager mode, one would have to maintain separate training loops, as you've done in train.py. It seems to me this would be detrimental to the maintainability of TF2 repositories. Do you have a view on this?