
Conversation

@saeta
Contributor

@saeta saeta commented Feb 24, 2019


Previously, to instantiate the Optimizer you had to call `type(of: model)` and
pass it into the Optimizer constructor so that type inference could pick the
right type for `Model`. This could be a little confusing for new users.

This commit proposes an alternate way to write this:

```swift
let optimizer = SGD(learningRate: 0.01, for: model)
```

The above formulation is clear and readable. It avoids any unnecessary typing
of types, and should be generalizable to the other optimizers. (This is left as
an exercise for the reader. :-D)

We rely on the Swift optimizer to pass only a reference to the model to the
`SGD.init` call, so we don't pay the cost of a full model copy (which could
eventually be very expensive). This is deemed a reasonably safe assumption,
especially given the ABI standardization in Swift 5.
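
For contrast, here is a rough, self-contained sketch of the before/after call shapes. The `Classifier`, `modelType:`, and `scalarType:` names below are illustrative stand-ins, not the actual swift-apis declarations.

```swift
// Illustrative stand-ins only; not the real swift-apis signatures.
struct Classifier { var hiddenSize: Int }

struct SGD<Model, Scalar: BinaryFloatingPoint> {
    var learningRate: Scalar

    // Hypothetical old-style initializer: metatypes drive type inference.
    init(learningRate: Scalar, modelType: Model.Type, scalarType: Scalar.Type) {
        self.learningRate = learningRate
    }

    // New-style initializer proposed here: the model instance drives inference
    // and is not stored.
    init(learningRate: Scalar, for model: Model) {
        self.learningRate = learningRate
    }
}

let model = Classifier(hiddenSize: 4)
let before = SGD(learningRate: 0.01, modelType: type(of: model), scalarType: Float.self)
let after = SGD(learningRate: 0.01, for: model)  // note: Scalar infers as Double here (see review below)
```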
@saeta saeta requested review from dan-zheng, lattner and rxwei February 24, 2019 01:41
@lattner

lattner commented Feb 24, 2019

Awesome, so much nicer and less foreign, thanks!

Random question for @rxwei and @dan-zheng - would marking the argument explicitly as __shared avoid the possibility of a copy?

@rxwei
Contributor

rxwei commented Feb 24, 2019

would marking the argument explicitly as __shared avoid the possibility of a copy?

Yep, that is exactly what I was going to suggest!
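
As a sketch of what that could look like, assuming a much-simplified SGD rather than the real swift-apis class:

```swift
// Simplified sketch; the real optimizer has more state and constraints.
// `__shared` asks the compiler to pass `model` as a borrowed (guaranteed)
// argument, so the initializer never takes ownership of, or copies, the value.
public final class SGD<Model, Scalar: BinaryFloatingPoint> {
    public var learningRate: Scalar

    public init(learningRate: Scalar, for model: __shared Model) {
        // `model` is used only to drive inference of `Model`;
        // it is neither stored nor mutated here.
        self.learningRate = learningRate
    }
}
```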

Contributor

@rxwei rxwei left a comment


Thanks! This looks like a pretty idiomatic use. My other suggestion is to apply this to all other optimizers at once -- they are still using metatypes right now. It would be quite a hassle to update broken models (and toolchains) in different patches.

rxwei added a commit that referenced this pull request Feb 24, 2019
Should go in after #28 gets merged.
@jekbradbury
Contributor

Any reason not to put the model argument first? I’m OK either way, but having it first would have been my initial intuition.

@rxwei
Contributor

rxwei commented Feb 24, 2019

Any reason not to put the model argument first? I’m OK either way, but having it first would have been my initial intuition.

I think that intuition is definitely plausible, but here are my reasons not to make `for: model` the first argument.

  1. Per the Swift API Design Guidelines, the first argument to an initializer call should not form a phrase starting with the base name, so `SGD(for: model, learningRate: ...)` would be somewhat unidiomatic. If we made it look like `SGD(model: model, learningRate: ...)`, the argument label `model:` would imply to me that `SGD` holds a copy of the model, which is not the case. We want to make it look like `SGD` is constructed for a model type rather than owning a model.
  2. Making `for: model` the last argument also reduces the salience of passing a model instance and conveys that it is only there for type inference. We still want an initializer that doesn't need this model instance argument, but that can be added in a separate patch.

```swift
}
let optimizer = SGD<Classifier, Float>(learningRate: 0.02)
var classifier = Classifier(hiddenSize: 4)
let optimizer = SGD(learningRate: 0.02, for: classifier)
```
Contributor

@rxwei rxwei Feb 24, 2019


There's actually a much more serious issue: have you tested this to make sure it really works?

I don't think this would train at all. The floating-point literal 0.02 defaults to `Double`, so the `Scalar` associated type in `SGD` would be `Double`. It would then try to get key paths to `Tensor<Double>`, which do not exist at all in a model whose parameters are all `Tensor<Float>`. So my hypothesis is that this will compile but definitely fail to train. This also exposes how confusing it would be to neither make optimizers take a currency scalar type explicitly nor make the model provide one.

Potential solutions:

  1. Make the `Layer` protocol actually require a `Scalar` associated type as the currency type, but that would be really unusable because it would trigger a lot of protocol conformance errors whenever the user didn't specify a `Scalar` type alias.
  2. Still require a scalar type explicitly in the optimizer. The way to do that is to require a `scalarType:` metatype argument as before: when it's specified as an argument, every generic parameter gets inferred; when it's not specified, the compiler expects it to be provided as a generic type parameter. The crucial thing here is to make sure `Scalar` is not inferred from the `learningRate:` argument -- I thought about this earlier, and this is why I added an explicit `scalarType:` argument in the first place.

My conclusion is that our optimizers (without undergoing a massive change in all building blocks) really need to be explicit about the parameter type they are optimizing. Given all the trade-offs, making the scalar type explicit on the optimizer side seems to be the most flexible approach, and it does not come at the cost of making mixed-precision models hard to express.
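
To make the inference pitfall concrete, here is a small standalone sketch with hypothetical stand-in types (not the real `Tensor`, `Layer`, or `SGD`):

```swift
// Hypothetical stand-ins, just to illustrate the inference problem.
struct Tensor<Scalar: BinaryFloatingPoint> { var value: Scalar }

struct Classifier {
    var weight = Tensor<Float>(value: 0)   // all parameters are Tensor<Float>
}

struct SGD<Model, Scalar: BinaryFloatingPoint> {
    var learningRate: Scalar
    init(learningRate: Scalar, for model: Model) {
        self.learningRate = learningRate
    }
}

let model = Classifier()

// The literal 0.02 defaults to Double, so Scalar is inferred as Double.
// An update step that asks for key paths to Tensor<Double> finds none in
// Classifier, so training would silently do nothing.
let inferred = SGD(learningRate: 0.02, for: model)    // SGD<Classifier, Double>

// Pinning the scalar explicitly (via generic parameters here, or via a
// scalarType: metatype argument as suggested above) avoids the mismatch.
let explicit = SGD<Classifier, Float>(learningRate: 0.02, for: model)
```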

Member


I agree with solution 2: with the current design, the scalar type should be made explicit for optimizers.

Contributor Author


Yup, you're right: what happens is that it doesn't train. (I didn't notice because it compiled, and there was a previously failing test.)

I've filed TF-302 to follow up on the larger usability issue, and I've fixed up the API and tests so they're all passing now.

 - Adding back in the scalarType
 - Fixing the tests
 - Updating the other optimizers
Thanks for the suggestion @jekbradbury. Definitely much better.
@saeta
Contributor Author

saeta commented Feb 25, 2019

Whoops, just saw @rxwei's response to @jekbradbury's comment about ordering (after making the change to the code). Sorry! I'd like to understand this a bit further. Maybe we can all discuss this later today?

@rxwei
Contributor

rxwei commented Feb 25, 2019

The argument ordering is not a big deal as it's mainly about style -- sure, we can chat about it today. But I think it's critical to get the parameter type issue resolved before discussing other minor issues.

@saeta saeta merged commit 3e8d86e into master Feb 26, 2019
@saeta saeta deleted the alt-optimizer-constructor branch February 26, 2019 17:10
@rxwei
Contributor

rxwei commented Feb 26, 2019

This is breaking a lot of existing code that uses `Optimizer<Model, Float>(...)`. I'd suggest sending another PR that adds the original initializers back, treats those as the canonical initializers, and makes the `init(for:...)` version call those canonical ones.
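
A rough sketch of that delegation shape, with a simplified class standing in for the real optimizer:

```swift
// Simplified stand-in; signatures are illustrative, not the swift-apis API.
public final class SGD<Model, Scalar: BinaryFloatingPoint> {
    public var learningRate: Scalar

    // Canonical initializer, restored so existing call sites such as
    // SGD<Model, Float>(learningRate: 0.01) keep working.
    public init(learningRate: Scalar) {
        self.learningRate = learningRate
    }

    // The init(for:) form from this PR forwards to the canonical initializer;
    // the model instance exists only to drive type inference.
    public convenience init(learningRate: Scalar, for model: __shared Model) {
        self.init(learningRate: learningRate)
    }
}
```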

@marcrasi
Contributor

This is breaking a lot of existing code that uses `Optimizer<Model, Float>(...)`. I'd suggest sending another PR that adds the original initializers back, treats those as the canonical initializers, and makes the `init(for:...)` version call those canonical ones.

I'll make a PR that does this!
