Learning rate schedulers #56
@dfalbel I'd like to help track down what's causing the output differences on Windows (I'm seeing them too and hadn't yet isolated whether the cause was CUDA or Windows, but this looks like it's Windows). @topepo We're using {luz} in {tidybert} (relatively big changes are actively in progress, so don't look too closely at that before ~tomorrow), so I'm likely going to re-implement this idea with slight changes for the {luz} version. Any caveats to watch for (other than the OS differences)?
PyTorch (and LibTorch) doesn't really guarantee strong reproducibility across platforms and hardware.
See e.g. https://pytorch.org/docs/stable/notes/randomness.html We don't have a wrapper for
It's really odd where/when the differences occur. The same snapshots run fine but then later have differences. Since all I have is a Mac, I'm going to isolate snapshots to that OS. I guess when we add GPU support here I'll have to figure out a way to test and develop with GPU capabilities.
@jonthegeek As mentioned above, it is very difficult to predict when/where the differences occur. It doesn't seem random but does change over time, even when the code does not.
Yeah, we're running into that much already, although the cases we have are machine-stable so far. We use
Closes #12
The schedulers were implemented in R rather than with the torch functions. For constant rates,
rate_schedule = "none" is used (and is the default).
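
To illustrate what a scheduler written in plain R (with no torch dependency) can look like, here is a minimal sketch. The helper names `lr_none()`, `lr_step()`, and `lr_at()`, and their arguments and defaults, are hypothetical and are not taken from this PR; only the `rate_schedule = "none"` value comes from the description above.

```r
# Hypothetical helpers for illustration only; not the functions added in this PR.

# Constant learning rate: the epoch only determines the output length.
lr_none <- function(epoch, initial = 0.01) {
  rep(initial, length(epoch))
}

# Step decay: multiply the rate by `reduction` once every `steps` epochs.
lr_step <- function(epoch, initial = 0.01, reduction = 0.5, steps = 5) {
  initial * reduction^floor(epoch / steps)
}

# Dispatch on the schedule name; "none" is the default, as in the description.
lr_at <- function(epoch, rate_schedule = "none", ...) {
  switch(
    rate_schedule,
    none = lr_none(epoch, ...),
    step = lr_step(epoch, ...),
    stop("Unknown rate_schedule: ", rate_schedule)
  )
}

# Learning rate for epochs 0 through 10 under the step schedule.
lr_at(0:10, rate_schedule = "step")
```

Because these are ordinary vectorised R functions, a schedule can be inspected or plotted over a range of epochs before any training runs.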