Hi, thanks for this project. Just opening this for future investigation: training with more than one GPU using the basic PyTorch DDP demo on CIFAR-10 results in NaN outputs after a few epochs. Training on a single GPU within the DDP framework works fine.
Just in case someone ever hits a similar issue with DDP: make sure you are using DDP's `no_sync` correctly. In my case, it was necessary for adversarial training.
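For anyone landing here later, a minimal sketch of the pattern being described: the backward pass used only to craft the adversarial perturbation runs inside `no_sync()`, so DDP does not all-reduce those intermediate gradients across ranks; only the final backward on the adversarial batch synchronizes. The model, loss, and FGSM-style step here are illustrative stand-ins, not the original training code, and the single-process `gloo` group is just so the sketch runs on CPU.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn

def adversarial_step(model, loss_fn, x, y, eps=0.03):
    """One adversarial training step under DDP (illustrative).

    The perturbation backward runs under no_sync() so DDP skips the
    gradient all-reduce there; only the final backward syncs ranks.
    """
    with model.no_sync():
        loss = loss_fn(model(x), y)
        loss.backward()  # populates x.grad without cross-rank sync
    x_adv = (x + eps * x.grad.sign()).detach()

    model.zero_grad()  # discard the perturbation-pass gradients
    loss = loss_fn(model(x_adv), y)
    loss.backward()  # this backward all-reduces gradients as usual
    return loss

# Single-process "gloo" group so the sketch runs without multiple GPUs.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.parallel.DistributedDataParallel(nn.Linear(10, 2))
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(4, 10, requires_grad=True)
y = torch.randint(0, 2, (4,))
loss = adversarial_step(model, loss_fn, x, y)
```

Without `no_sync`, each backward call inside the attack loop triggers its own all-reduce, and DDP's bucketing assumptions about one sync per iteration break down, which is one way mismatched or garbage gradients (and eventually NaNs) can appear only in the multi-GPU case.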
The implementation from [meliketoy](https://github.com/meliketoy/wide-resnet.pytorch) works fine, but uses more GPU memory.