Add a helper function for sequential models. #20
Conversation
Many deep learning models are composed of sequential layers stacked one on
top of each other. It can be relatively tedious to write out the explicit
`applied(to:)` function because it's fairly repetitive and the underlying
intent is relatively obscured. (It can be especially bothersome because
it's the 2nd (or 3rd) time you're writing out all the layers: the first time
is to declare all the instance variables, and the second time (if necessary)
is in the initializer.)
Fortunately, with a single helper function, we can make everything type
safe as well as convenient, expressible, and readable!
This commit adds a family of `Sequential` functions that take in a context, an
input, and a variable number of layers. It chains the output of each
layer into the input of the next.
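For illustration, here is the shape of the two-layer variant (its signature appears verbatim in the diff below); the body is a sketch that assumes `Layer` exposes an `applied(to:in:)` method:

```swift
@differentiable(wrt: (input, l1, l2))
public func Sequential<L1: Layer, L2: Layer>(
    in context: Context, from input: L1.Input, _ l1: L1, _ l2: L2
) -> L2.Output where L1.Output == L2.Input {
    // Feed the input through each layer in order.
    return l2.applied(to: l1.applied(to: input, in: context), in: context)
}
```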
This API approach has a number of advantages:
1. It avoids introducing new symbolic operators, which can be very confusing
to new users.
2. It works with today's AutoDiff implementation. (Yay!)
3. It is very readable and clean.
4. It avoids users "getting stuck". Concretely, if someone implemented a model
using my previously proposed `>>>` operator, if they wanted to add a
residual (or skip) connection, they would have to basically rewrite their
whole model using a struct, etc. With this API structure, only "local"
changes are required. (E.g., if only one skip connection is needed, they
can split the sequential chain into two pieces, as sketched below.)
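A hypothetical sketch of that split; the layer names, types, and matching shapes here are illustrative, not from this PR:

```swift
struct ResidualBlock: Layer {
    // Hypothetical layers; names and shapes are illustrative only.
    var conv1, conv2, conv3, conv4: Conv2D<Float>

    func applied(to input: Tensor<Float>, in context: Context) -> Tensor<Float> {
        // First half of the chain, up to the skip connection.
        let branch = Sequential(in: context, from: input, conv1, conv2)
        // Add the skip connection (shapes assumed compatible), then continue.
        return Sequential(in: context, from: branch + input, conv3, conv4)
    }
}
```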
Downsides of this approach:
1. It doesn't DRY-out the types required to define a model. (I have some
thoughts here, but there isn't enough room in this
margin^H^H^H^H^H^Hcommit message.)
2. We should think hard about how things should look when we have loops.
3. I'm sure there's a better way to code-gen out all the different Sequential
arities; a rough sketch follows below. (I got bored hand-writing them out
after 4...) Suggestions welcome!
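On point 3, a reviewer alludes below to gyb (the template generator used by the Swift standard library). Purely as a hypothetical sketch, not part of this PR, a template along these lines could stamp out the arities:

```
% for n in range(2, 7):
%   generics = ', '.join('L%d: Layer' % i for i in range(1, n + 1))
%   params = ', '.join('_ l%d: L%d' % (i, i) for i in range(1, n + 1))
%   wrt = ', '.join('l%d' % i for i in range(1, n + 1))
%   constraints = ', '.join('L%d.Output == L%d.Input' % (i, i + 1) for i in range(1, n))
%   body = 'input'
%   for i in range(1, n + 1):
%     body = 'l%d.applied(to: %s, in: context)' % (i, body)
%   end
@differentiable(wrt: (input, ${wrt}))
public func Sequential<${generics}>(
    in context: Context, from input: L1.Input, ${params}
) -> L${n}.Output where ${constraints} {
    return ${body}
}

% end
```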
I like this proposed direction!
(Feedback below is copied from internal chat, but I'll continue that in the open.)
There are always tradeoffs between sugar operators and methods with clear names. Swift standard library APIs (and Swift libraries in general) tend to discourage functional sugar. That guideline isn't necessarily the right one for every domain-specific library, but we should start with clearly named APIs and carefully consider the consequences of each proposed sugar.
I have the following concerns:
- Naming. Swift function names should use camel case. Moreover, function names should be clear about the operation: `sequential(in:from:...)` does not do a good job of indicating that this function performs a chain of function applications, because it reads as though it returns a sequence.
  - Usually a gerund is better in Swift. However, the argument label `from:` is still unclear.

    ```swift
    func sequencing<...>(in context: Context, from input: Input, _ layer1: Layer1, ...) -> LayerN.Output
    ```

  - We can use `through:` to make it read better, but it's overly verbose. So, tradeoffs.

    ```swift
    func sequencing<...>(in context: Context, from input: Input, through layer1: Layer1, ...) -> LayerN.Output
    ```

- Free functions should be introduced with extra care. They are generally discouraged because they 1) mess up code completion and 2) are not quite idiomatic unless they have lots of common precedents, e.g. `sin(_:)` and `matmul(_:_:)`. In this case, I do not think it should be a top-level function. Instead, I'd very much prefer to define this as a protocol extension method in `Differentiable`:

  ```swift
  extension Differentiable {
      func sequenced<...>(in context: Context, through layer1: Layer1, _ layer2: Layer2, ...) -> LayerN.Output
  }
  ```

  This approach would make the API easy to discover without messing up top-level code completion, and it conforms to the Swift API design guidelines. It also keeps the input from being juxtaposed with the layers, emphasizing that `input` is the source of what's being sequenced. Of course, it would look much, much better without the context argument, but we don't support implicit arguments yet.

  ```swift
  func applied(to input: Input) -> Output {
      return input.sequenced(in: context, through: layer1, layer2, layer3, ...)
  }
  ```
Sources/DeepLearning/Layer.swift (Outdated)

```swift
@differentiable(wrt: (input, l1, l2))
public func Sequential<L1: Layer, L2: Layer>(in context: Context, from input: L1.Input, _ l1: L1, _ l2: L2) -> L2.Output where L1.Output == L2.Input {
```
Function names should always use camel case.
…ntax and avoiding polluting the global function namespace. Also switch to camelCase.
rxwei left a comment
Love it!
Sources/DeepLearning/Layer.swift (Outdated)

```swift
extension Differentiable {
```
Use `public extension` so you can drop `public` from the individual methods.
Thanks; fixed!
```swift
@differentiable(wrt: (self, l1, l2))
public func sequenced<L1: Layer, L2: Layer>(
    in context: Context, through l1: L1, _ l2: L2)
    -> L2.Output
```
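(The declaration is truncated in the diff; presumably it continues with constraints analogous to the free function's. A sketch of the full form, again assuming `Layer` exposes `applied(to:in:)`:)

```swift
@differentiable(wrt: (self, l1, l2))
public func sequenced<L1: Layer, L2: Layer>(
    in context: Context, through l1: L1, _ l2: L2)
    -> L2.Output
    where L1.Input == Self, L1.Output == L2.Input {
    // Feed `self` through each layer in order.
    return l2.applied(to: l1.applied(to: self, in: context), in: context)
}
```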
I was about to leave comments about indentation, but I thought "it's gonna be gyb'd anyway" :)
Yeah, I don't have my editor set up to do the indentation automatically for me. :-( Sorry!