Introduce new layer initialization APIs with automatic shape computation #584
Conversation
A couple of questions:

- Why do you separate the definition of the model from the `buildModel` call that provides the input shape? Could you describe the advantages of this API design vs. taking the input right at the start?
- Why do you use a `.then(...)` construction instead of just chaining the calls directly? (Some advantages I can think of for direct chaining: (a) IDE autocomplete can work a bit nicer, including offering only the layers that work on a given input shape; (b) it avoids extra syntactical noise. Disadvantages include: when defining a layer, you also want to define an extension method on some protocol to make the autocomplete go. A sketch of this style appears after this comment.)
Great work so far! I'd definitely make sure to try out a NN with skip connections, such as ResNet, sooner rather than later, to ensure your design appropriately takes those into account. (And, of course, as you noted previously, a network that uses weight sharing between layers, i.e. a network that reuses a layer multiple times.)
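To illustrate the second question, here is a rough, hypothetical sketch of what the direct-chaining style could look like. `AutoSequenced` and the `dense` extension method are invented names for illustration, not part of this PR's API:

```swift
// Hypothetical: each layer kind exposes itself as an extension method on the
// AutoLayer protocol, constrained so that IDE autocomplete only offers layers
// compatible with the current output shape (here, a rank-1 Int shape).
extension AutoLayer where OutputShape == Int {
    func dense(
        outputSize: Int,
        activation: @escaping Dense<Float>.Activation = identity
    ) -> AutoSequenced<Self, AutoDense<Float>> {  // AutoSequenced: assumed name for then's result
        self.then(AutoDense(outputSize: outputSize, activation: activation))
    }
}
```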
Responding to @saeta's questions:
If we require input shapes to be specified at the beginning, the definition looks like:

```swift
let myModelDefinition = InputNode(shape: ...).then(Dense(...)).then(Dense(...)).then(Dense(...)).then(Dense(...))
```

With this style, we have to manually compute the input shape for any sub-graph, which introduces more overhead for the developer and also makes it impossible to re-use the subgraph in another model where the input shape is different:

```swift
let mySubModel = InputNode(shape: ...).then(Dense(...)).then(Dense(...))
let myModelDefinition = InputNode(shape: ...).then(Dense(...)).then(mySubModel).then(Dense(...))
```

If we have users specify the input shape at the end, however, this is solved more naturally because the entire model is composed without being fixed to a specific input shape. We can plug in any input shape at the end and produce a model instance with the appropriate propagated shapes (a concrete sketch follows below):

```swift
let mySubModel = Dense(...).then(Dense(...))
let myModelDefinition = Dense(...).then(mySubModel).then(Dense(...))

// elsewhere
var myModelInstance = myModelDefinition.buildModel(inputShape: ...)
```
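To make the shape-propagation claim concrete, here is a minimal, hedged example using the `AutoDense` and `buildModel(inputShape:)` names that appear later in this PR's diff; the exact shapes and signatures are assumptions:

```swift
// One shared definition, reused under two different input shapes; the
// intermediate shapes (64 -> 32 -> 16 and 128 -> 32 -> 16) are computed
// automatically when buildModel is called.
let sharedBlock = AutoDense<Float>(outputSize: 32, activation: relu)
    .then(AutoDense(outputSize: 16))

var modelA = sharedBlock.buildModel(inputShape: 64)
var modelB = sharedBlock.buildModel(inputShape: 128)
```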
Not sure I understood the extension method issue, could you explain further?
Also, VGG16 works now, but before I proceed to implement more models with this API I'd like to discuss the type complexity being introduced. In the VGG16 implementation, for example, when breaking the model out into a function we have to explicitly state the return type. This type contains the full model structure, which results in some very long types. However, it seems that we need this type information in order to return the appropriate …
Some early benchmark results with x10: …
@shadaj Awesome work, just saw this. Thanks for the efforts. Just wanted to give my 2 cents: I know some of the stuff is marked as …
Re: the crazy types, I wonder if associated type inference of opaque result types might be able to help us out here.
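A sketch of what the opaque-result-type idea might look like, assuming the `Auto*` layer names from this PR (`AutoAvgPool2D` is an assumed name; whether associated type inference can see through `some AutoLayer` well enough for downstream shape checking is exactly the open question):

```swift
// Hypothetical: hide the long concrete chain type behind an opaque type.
func vggBlock(channels: Int) -> some AutoLayer {
    AutoConv2D<Float>(filterShape: (3, 3), outputChannels: channels,
                      padding: .same, activation: relu)
        .then(AutoConv2D(filterShape: (3, 3), outputChannels: channels,
                         padding: .same, activation: relu))
        .then(AutoAvgPool2D<Float>(poolSize: (2, 2), strides: (2, 2)))
}
```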
The last commit (4a9719a) sets up the initial infrastructure for a …
Examples/LeNet-MNIST/main.swift (Outdated)

```diff
@@ -33,7 +33,7 @@ let denseDef = AutoConv2D<Float>(filterShape: (5, 5), outputChannels: 6, padding
     .then(AutoDense(outputSize: 84, activation: relu))
     .then(KeyedAutoLayer(AutoDense(outputSize: 10), key: lastDenseKey))

-var (classifier, keys) = denseDef.buildModelWithKeys(inputShape: (28, 28, 1))
+var classifier = denseDef.buildModel(inputShape: (28, 28, 1))
```
Thanks for this, @shadaj, and the other instances of simpler code. `var classifier = ...` appears friendlier for folks who use Pythonic frameworks. Also +1 for the separate `BuiltAutoLayer` struct; I think the naming is in line with the other structs, such as `KeyedAutoLayer`, so it makes sense.
Interestingly, running with …
I was thinking about your approach and looking over this code again. I think you're right that being able to specify how to build a model, and have that be reusable for a variety of different shapes, is a good idea. But I feel like there's a lot of boilerplate required to make a new AutoLayer. While we should care very much about the API for composing pre-written layers, I think it'd also be a good idea to think carefully about the API for making new layers as well.

This led me to wonder: is there a way to have less ceremony without losing the separation between model specification and shape specification? I haven't fully thought this through, but I wonder if closures could actually be the answer.

Said another way, the difference between:

```swift
var model = Sequential<Float>(shape: (1, 2, 3))
    .conv(...)
    .flatten
    .dense(output: 10)
```

and

```swift
let modelBuilder = Sequential<Float>()
    .conv(...)
    .flatten
    .dense(output: 10)

var model = modelBuilder.build(shape: (1, 2, 3))
```

is delayed binding of the shape. But I think we can get the simplicity of the former with the decoupling of the latter by using a closure/function:

```swift
func makeModel(shape: Shape) -> Model {
    return ...
}
```

Maybe said another way: your design of separating the two essentially curries the model initialization, separating the other hyperparameters from the shape sizes. One commonly used point in this design space is to return a closure.
Does that make sense?
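For concreteness, a minimal sketch of the curried, closure-based approach described above, using only standard swift-apis types; `MLP` and `makeMLP` are hypothetical names:

```swift
import TensorFlow

// A small two-layer model; only standard swift-apis types are used.
struct MLP: Layer {
    var hidden: Dense<Float>
    var output: Dense<Float>

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        input.sequenced(through: hidden, output)
    }
}

// The non-shape hyperparameters are captured by the closure; the input size
// is bound later, much like buildModel(inputShape:) in this PR.
func makeMLP(hiddenSize: Int, classCount: Int) -> (Int) -> MLP {
    { inputSize in
        MLP(
            hidden: Dense<Float>(inputSize: inputSize, outputSize: hiddenSize,
                                 activation: relu),
            output: Dense<Float>(inputSize: hiddenSize, outputSize: classCount)
        )
    }
}

let builder = makeMLP(hiddenSize: 84, classCount: 10)
var model = builder(784)  // shape bound at the end
```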
```diff
@@ -0,0 +1,61 @@
+public func intTupleToArray(tuple: Any) -> [Int] {
```
One potential idea: consider making a dedicated type that is expressible by array literal (https://developer.apple.com/documentation/swift/expressiblebyarrayliteral)
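For example, a minimal sketch of such a dedicated type (`AutoShape` is a hypothetical name), which would replace the `Any`-based tuple reflection in `intTupleToArray`:

```swift
// A dedicated shape type that can be written as an array literal,
// avoiding reflection over tuples of unknown arity.
public struct AutoShape: ExpressibleByArrayLiteral {
    public let dimensions: [Int]

    public init(arrayLiteral elements: Int...) {
        self.dimensions = elements
    }
}

let imageShape: AutoShape = [28, 28, 1]  // instead of the tuple (28, 28, 1)
```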
@saeta hmm, not sure I completely understand how the shape-first approach simplifies layer definitions. Right now, the reason why we have … I do really like the chained API style to reduce …
Sorry for not explaining it well. When reading your code, it looked like there were a lot of `struct`s that had an `init` to store some values, and just a single other function to execute their behavior. I've highlighted a few of them below. Does that help?
```swift
let axis: Int
let momentum: Scalar
let epsilon: Scalar

public typealias InstanceType = BatchNorm<Scalar>
public typealias InputShape = Shape
public typealias OutputShape = Shape

public init(
    axis: Int = -1,
    momentum: Scalar = 0.99,
    epsilon: Scalar = 0.001
) {
    self.axis = axis
    self.momentum = momentum
    self.epsilon = epsilon
}
```
This to me looks just like a closure capture.
```swift
let filterShape: (Int, Int)
let outputChannels: Int
let strides: (Int, Int)
let padding: Padding
let dilations: (Int, Int)
let activation: Conv2D<Scalar>.Activation
let useBias: Bool
let filterInitializer: ParameterInitializer<Scalar>
let biasInitializer: ParameterInitializer<Scalar>

public typealias InstanceType = Conv2D<Scalar>
public typealias InputShape = (Int, Int, Int)
public typealias OutputShape = (Int, Int, Int)

public init(
    filterShape: (Int, Int),
    outputChannels: Int,
    strides: (Int, Int) = (1, 1),
    padding: Padding = .valid,
    dilations: (Int, Int) = (1, 1),
    activation: @escaping Conv2D<Scalar>.Activation = identity,
    useBias: Bool = true,
    filterInitializer: @escaping ParameterInitializer<Scalar> = glorotUniform(),
    biasInitializer: @escaping ParameterInitializer<Scalar> = zeros()
) {
    self.filterShape = filterShape
    self.outputChannels = outputChannels
    self.strides = strides
    self.padding = padding
    self.dilations = dilations
    self.activation = activation
    self.useBias = useBias
    self.filterInitializer = filterInitializer
    self.biasInitializer = biasInitializer
}
```
This also just looks like a closure capture.
```swift
let outputSize: Int
let activation: Dense<Scalar>.Activation

public typealias InstanceType = Dense<Scalar>
public typealias InputShape = Int
public typealias OutputShape = Int

public init(outputSize: Int, activation: @escaping Dense<Scalar>.Activation = identity) {
    self.outputSize = outputSize
    self.activation = activation
}
```
This also looks like a closure capture.
Models/LayerInit/AutoFunction.swift (Outdated)
```swift
public struct AutoFunction<Input: Differentiable, Output: Differentiable, InputShape, OutputShape>: AutoLayer {
    let fnShape: (InputShape) -> OutputShape
    let fn: @differentiable (Input) -> Output

    public typealias InstanceType = Function<Input, Output>
    public typealias InputShape = InputShape
    public typealias OutputShape = OutputShape

    public init(fnShape: @escaping (InputShape) -> OutputShape, fn: @escaping @differentiable (Input) -> Output) {
        self.fnShape = fnShape
        self.fn = fn
    }
```
This also appears to be a hand-written closure capture.
```swift
public struct KeyedAutoLayer<Underlying: AutoLayer>: AutoLayer {
    let underlying: Underlying
    let key: AutoLayerKey<InstanceType>

    public typealias InstanceType = Underlying.InstanceType
    public typealias InputShape = Underlying.InputShape
    public typealias OutputShape = Underlying.OutputShape

    public init(_ underlying: Underlying, key: AutoLayerKey<InstanceType>) {
        self.underlying = underlying
        self.key = key
    }
```
This also looks a bit like a closure capture.
```swift
let poolSize: (Int, Int)
let strides: (Int, Int)
let padding: Padding

public typealias InstanceType = AvgPool2D<Scalar>
public typealias InputShape = (Int, Int, Int)
public typealias OutputShape = (Int, Int, Int)

public init(
    poolSize: (Int, Int),
    strides: (Int, Int) = (1, 1),
    padding: Padding = .valid
) {
    self.poolSize = poolSize
    self.strides = strides
    self.padding = padding
}
```
This also looks like a closure capture.
(Also below.)
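To spell out the observation, a hedged sketch of what one of these structs looks like when rewritten as a closure capture; `buildDense` is a hypothetical name:

```swift
// outputSize and activation are captured by the returned closure instead of
// being stored properties initialized by a hand-written init.
func buildDense<Scalar: TensorFlowFloatingPoint>(
    outputSize: Int,
    activation: @escaping Dense<Scalar>.Activation = identity
) -> (Int) -> Dense<Scalar> {
    { inputSize in
        Dense<Scalar>(inputSize: inputSize, outputSize: outputSize,
                      activation: activation)
    }
}
```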
Ah, that makes a lot more sense now! I think I have an idea that can get the best of both worlds by implementing these layers as functions that return something similar to …
The lack of any sort of documentation comments makes it very difficult for me to say anything useful review-wise. I can't tell what these components are supposed to be doing.
I recommend we convert this PR to a draft, as we're not ready to actually merge the approach based on this direction, and instead continue to refine our thinking per tensorflow/swift#515.
Closing for now. Feel free to reopen if there is more progress!
To be addressed:

- `AutoBatchNorm` is tied to 2-dimension + channel inputs; we should handle inputs of any dimension (an `HList` equivalent (Automatic Requirement Satisfaction?) could help here)

Model Porting Progress: