Learning rate schedulers #56

Merged — 18 commits merged into main from schedulers on Sep 15, 2022
Conversation

@topepo (Member) commented Sep 14, 2022

Closes #12

The schedulers were implemented in R rather than by wrapping the torch scheduler functions. For constant rates, rate_schedule = "none" is used (and is the default).
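For illustration, a minimal sketch of what an R-side schedule can look like (the function name, arguments, and defaults here are hypothetical, not necessarily what this PR adds):

```r
# Hypothetical exponential-decay schedule written as a plain R function.
# It maps the current epoch to a learning rate, so it can be called once
# per epoch from the training loop and the optimizer's rate updated.
schedule_decay_expo <- function(epoch, initial = 0.1, decay = 1) {
  # the rate shrinks multiplicatively as training progresses
  initial * exp(-decay * epoch)
}

schedule_decay_expo(0:3)
# roughly 0.100, 0.037, 0.014, 0.005
```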

@jonthegeek commented

@dfalbel I'd like to help track down what's causing the output differences on Windows (I'm seeing it too and hadn't isolated yet whether it was CUDA or Windows, but this looks like it's Windows).

@topepo We're using {luz} in {tidybert} (relatively big changes are actively in progress, so don't look too closely at it before ~tomorrow), so I'm likely going to re-implement this idea, with slight changes, for the {luz} version. Any caveats to watch for (other than the OS differences)?

R/mlp-fit.R — review comment (resolved)
@dfalbel (Collaborator) commented Sep 15, 2022

PyTorch (and LibTorch) doesn't really ensure strong reproducibility across platforms and hardware.

> Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds.

See, e.g., https://pytorch.org/docs/stable/notes/randomness.html

We don't yet have an R wrapper for torch.use_deterministic_algorithms(), though, which could help isolate the issue.
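In the meantime, the closest thing available from R is to fix the seeds explicitly; a minimal sketch using only base R's set.seed() and the torch package's torch_manual_seed():

```r
library(torch)

# Fix both the R RNG and torch's RNG before any stochastic work
# (weight initialization, shuffling, dropout, ...).
set.seed(1)
torch_manual_seed(1)

x <- torch_randn(3)

# This makes runs repeatable on a given machine/backend, but, per the
# note above, it does not guarantee identical results across platforms
# or between CPU and GPU.
```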

@topepo (Member, Author) commented Sep 15, 2022

It's really odd where/when the differences occur. The same snapshots pass at first and then show differences later. Since all I have is a Mac, I'm going to restrict the snapshot tests to that OS.

I guess when we add GPU support here I'll have to figure out a way to test and develop with GPU capabilities.
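With testthat, restricting snapshots to one OS can be done by skipping them elsewhere (an illustrative sketch; the fitting call is a placeholder, not the package's actual code):

```r
library(testthat)

test_that("learning rate schedule snapshots (macOS only)", {
  # Snapshot values differ across platforms, so only record and compare
  # them on macOS; the test is skipped everywhere else.
  skip_on_os(c("windows", "linux", "solaris"))

  set.seed(1)
  torch::torch_manual_seed(1)

  # `fit_model()` stands in for whatever call produces the snapshot.
  expect_snapshot(print(fit_model()))
})
```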

@topepo (Member, Author) commented Sep 15, 2022

@jonthegeek As mentioned above, it is very difficult to predict when/where the differences occur. It doesn't seem random but does change over time, even when the code does not.

@jonthegeek commented

> @jonthegeek As mentioned above, it is very difficult to predict when/where the differences occur. It doesn't seem random but does change over time, even when the code does not.

Yeah, we're already running into that, although the cases we have are machine-stable so far. We use torch::torch_manual_seed() fairly liberally, though. I have a note in a test that things seemed more stable when I called it once at the top of the test and then again when getting results. Even with that, I still get different results on my Windows PC + CUDA versus the same machine running torch in a Docker container... BUT it seems to be one set of results for my PC and a second set for non-Windows (I haven't gotten torch + CUDA working in Docker yet to finish isolating it past there, though).
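A sketch of that seeding pattern (the fit and predict calls are placeholders, not {tidybert}'s actual API):

```r
library(testthat)

test_that("model outputs are stable", {
  # Seed once at the top of the test ...
  set.seed(1)
  torch::torch_manual_seed(1)
  model <- fit_model(train_data)                # placeholder fitting call

  # ... and again right before generating the compared results.
  torch::torch_manual_seed(1)
  expect_snapshot(predict(model, test_data))    # placeholder predict call
})
```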

@topepo merged commit fe9ebff into main on Sep 15, 2022
@topepo deleted the schedulers branch on September 15, 2022, 18:32
@github-actions (bot) commented

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions locked and limited the conversation to collaborators on Jan 11, 2023