[Feature request] Adding support for "iter_size"-like hyperparameters (Caffe) #54
Hi, thanks a lot for sharing this awesome project.

I wonder whether the code currently supports a hyperparameter like Caffe's "iter_size", i.e. accumulating gradients over "iter_size" batches and then applying them as a single update. With this hyperparameter one can emulate training with a larger batch size without distributed training: if batch_size is set to, say, 64 and iter_size to ITER_SIZE, the effective batch size becomes 64*ITER_SIZE, since the gradients of ITER_SIZE batches are accumulated.

Is this doable with the current code? Is there any plan to support this feature?

Thank you.
Comments

That is a cool trick. I am not aware of a plan right now, but as we start working on converging FP16 this might be something that is done to test larger batch sizes. I will leave this open so everyone on the perf team can see it. If we do something, I will try to update the ticket, and if not, close it in 30-60 days.

This can be done by writing a new …

@ppwwyyxx TensorPack has everything. :-) I like looking through your code and examples.

@tfboyd thanks for your reply. If supporting this case is easy with what @ppwwyyxx has now, I hope this feature will be supported soon.

Merge internal changes into public repository (change 184035298)
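For reference, the accumulation scheme described in the original post can be expressed with plain TensorFlow 1.x ops, independent of this repository. Below is a minimal sketch: the toy linear-regression model, the synthetic data, and the values of ITER_SIZE, BATCH_SIZE, and the learning rate are placeholders for illustration only, and this is not tf_cnn_benchmarks' or TensorPack's implementation.

```python
import numpy as np
import tensorflow as tf

ITER_SIZE = 4     # number of batches whose gradients are accumulated (hypothetical value)
BATCH_SIZE = 64   # per-pass batch size; effective batch size is 64 * ITER_SIZE

# Tiny stand-in model (linear regression) so the sketch is self-contained.
x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([10, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

optimizer = tf.train.GradientDescentOptimizer(0.1)
tvars = tf.trainable_variables()
grads = tf.gradients(loss, tvars)

# One non-trainable buffer per variable holding the running gradient sum.
accum_vars = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
              for v in tvars]
zero_ops = [a.assign(tf.zeros_like(a)) for a in accum_vars]
accum_ops = [a.assign_add(g) for a, g in zip(accum_vars, grads)]

# Apply the averaged accumulated gradients as a single weight update.
apply_op = optimizer.apply_gradients(
    [(a / ITER_SIZE, v) for a, v in zip(accum_vars, tvars)])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        sess.run(zero_ops)                    # reset the accumulators
        for _ in range(ITER_SIZE):            # ITER_SIZE forward/backward passes
            xb = np.random.rand(BATCH_SIZE, 10).astype(np.float32)
            yb = xb.sum(axis=1, keepdims=True)
            sess.run(accum_ops, feed_dict={x: xb, y: yb})
        sess.run(apply_op)                    # one update with the summed gradients
```

Since the per-batch loss is already a mean, summing ITER_SIZE per-batch gradients and dividing by ITER_SIZE reproduces the gradient of the mean loss over 64*ITER_SIZE examples, i.e. the effective batch size discussed above, while peak memory still corresponds to batch_size=64. One caveat: anything computed per batch, such as batch-normalization statistics, still sees only 64 examples, so this is not exactly identical to training with a truly larger batch.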