
Batch Size Ablations #91

Closed
fan23j opened this issue Jun 17, 2024 · 1 comment
Labels
question Further information is requested

Comments

@fan23j

fan23j commented Jun 17, 2024

Hi,

Thank you for your work and the well-organized repo. Reading through the paper, I was unable to locate any ablations on the effect of batch size (or effective batch size) on generation performance. Could you provide any insight into how batch size affects the quality of video generation? In particular, if the effective batch size is raised via gradient accumulation steps, would you increase the total training iterations to compensate?

Intuitively, a higher batch size should correlate with better performance (as suggested by the efficacy of image-video joint training), but I was curious whether the benefits taper off at this model size, since the whole pipeline is relatively expensive to train, especially if training has to be scaled up for gradient accumulation steps.

Thanks.
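For concreteness, the "effective batch size" and the iteration compensation I'm asking about can be written out as follows (all numbers here are hypothetical, not from the paper):

```python
def effective_batch_size(micro_batch: int, accum_steps: int, world_size: int) -> int:
    """Effective batch = per-GPU micro-batch x gradient-accumulation steps x #GPUs."""
    return micro_batch * accum_steps * world_size

def compensated_iters(base_iters: int, base_batch: int, new_batch: int) -> int:
    """Scale iteration count so the total number of samples seen stays constant."""
    return base_iters * base_batch // new_batch

eff = effective_batch_size(micro_batch=2, accum_steps=4, world_size=8)      # 64
iters = compensated_iters(base_iters=100_000, base_batch=32, new_batch=eff)  # 50_000
```

That is, doubling the effective batch size halves the iterations needed to see the same number of samples, which is the compensation scheme I had in mind.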

@maxin-cn
Collaborator


Thanks for your interest. I also think a larger batch size leads to better performance, but in my experience so far, using gradient accumulation does not provide significant gains for text-to-video tasks.

@maxin-cn maxin-cn added the question Further information is requested label Jun 21, 2024
@fan23j fan23j closed this as completed Jun 24, 2024