
Introduce new layer initialization APIs with automatic shape computation #584

Closed
shadaj wants to merge 18 commits

Conversation

@shadaj (Contributor) commented Jun 4, 2020

To be addressed:

  • AutoBatchNorm is tied to 2-dimensional + channel inputs; we should handle inputs of any dimension (an HList equivalent (Automatic Requirement Satisfaction?) could help here)
  • throw an error if the same key is used twice in the model, and offer some alternative for attaching keys to layers in reused chunks

Model Porting Progress:

@saeta (Contributor) left a comment:

A couple of questions:

  1. Why do you separate the definition of the model from the buildModel call that provides the input shape? Could you describe the advantages of this API design vs taking the input right at the start?

  2. Why do you use a .then(...) construction, instead of just chaining the calls directly? (Some advantages I can think of for chaining the calls directly: (a) IDE autocomplete can work a bit more nicely, including offering only the layers that work on a given input shape; (b) it avoids extra syntactic noise. A disadvantage: when defining a layer, you also have to define an extension method on some protocol to make the autocomplete work.)

Great work so far! I'd definitely make sure to try out a network with skip connections, such as ResNet, sooner rather than later, to ensure your design appropriately takes those into account. (And, of course, as you noted previously, a network that uses weight sharing between layers, i.e. a network that reuses a layer multiple times.)

@saeta (Contributor) commented Jun 8, 2020

CC @dan-zheng @dabrahams

@shadaj force-pushed the layer-init-prototyping branch 2 times, most recently from 8e4da64 to adaaf94, on June 9, 2020 00:07
@shadaj (Contributor, Author) commented Jun 9, 2020

Responding to @saeta's questions:

  1. Passing the input at the beginning vs. at the end reduces to a decision about what the "root" node of a layer graph should be. If we pass the input at the beginning, the root node of any graph would have to be a node that specifies the input shape. This makes it harder to break down complex graphs into independent chunks. For example, consider a situation where we want to split out the middle two layers of a graph:
let myModelDefinition = InputNode(shape: ...).then(Dense(...)).then(Dense(...)).then(Dense(...)).then(Dense(...))

If we require input shapes to be specified at the beginning, we have to manually compute the input shape for the subgraph, which introduces more overhead for the developer and also makes it impossible to reuse the subgraph in another model where the input shape is different.

let mySubModel = InputNode(shape: ...).then(Dense(...)).then(Dense(...))
let myModelDefinition = InputNode(shape: ...).then(Dense(...)).then(mySubModel).then(Dense(...))

If we have users specify the input shape at the end, however, this is solved more naturally, because the entire model is composed without being fixed to a specific input shape. We can plug in any input shape at the end and produce a model instance with the appropriate propagated shapes:

let mySubModel = Dense(...).then(Dense(...))
let myModelDefinition = Dense(...).then(mySubModel).then(Dense(...))

// elsewhere

var myModelInstance = myModelDefinition.buildModel(inputShape: ...)
  2. A couple of reasons why I picked .then:
  • guides type inference to automatically pick the correct Scalar type by looking at the previous layer in the graph (in the example code, we only have to specify Float for the first layer)
  • as you mentioned, guides IDE completion
  • supports any length of sequential layers, since .then can build up a linked list of types (compared to Sequential, which AFAIK is limited in the number of layers since it is just code-generated); see the sketch after this list
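
As a self-contained sketch of that last point (the names below are simplified stand-ins for this PR's AutoLayer and AutoSequenced, not the actual implementation):

protocol ShapedDef {
    associatedtype InputShape
    associatedtype OutputShape
}

struct Sequenced<First: ShapedDef, Second: ShapedDef>: ShapedDef
where First.OutputShape == Second.InputShape {
    let first: First
    let second: Second
    typealias InputShape = First.InputShape
    typealias OutputShape = Second.OutputShape
}

extension ShapedDef {
    // Each call nests one more Sequenced, so two .then calls produce
    // Sequenced<Sequenced<A, B>, C> -- a linked list at the type level,
    // with no fixed upper bound on chain length.
    func then<Next: ShapedDef>(_ next: Next) -> Sequenced<Self, Next>
    where OutputShape == Next.InputShape {
        return Sequenced(first: self, second: next)
    }
}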

Not sure I understood the extension method issue, could you explain further?

@shadaj (Contributor, Author) commented Jun 9, 2020

Also, VGG16 works now, but before I proceed to implement more models with this API, I'd like to discuss the type complexity being introduced by the AutoLayer protocol.

In the VGG16 implementation, for example, when breaking the model out into a function, we have to explicitly state the return type. This type contains the full model structure, which results in some very long types. However, it seems we need this type information in order to return the appropriate InstanceType when buildModel is called. Any ideas on how we could simplify or eliminate these long types?
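
To make the problem concrete, here is a hypothetical helper in the style of this API (the exact generic parameters of AutoSequenced are an assumption on my part); even a two-layer chunk needs its full chained type spelled out, and every additional .then grows it:

func makeHead() -> AutoSequenced<AutoDense<Float>, AutoDense<Float>> {
    // The return type gains one AutoSequenced nesting per .then call.
    return AutoDense<Float>(outputSize: 84, activation: relu)
        .then(AutoDense(outputSize: 10))
}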

@shadaj (Contributor, Author) commented Jun 11, 2020

Some early benchmark results:
ResNetCIFAR10 (training) - old: 543.36 examples/sec, new: 435.76 examples/sec
ResNetCIFAR10 (inference) - old: 5286.49 examples/sec, new: 5527.64 examples/sec

With x10:
ResNetCIFAR10 (training) - old: 718.85 examples/sec, new: 753.17 examples/sec
ResNetCIFAR10 (inference) - old: 31586.44 examples/sec, new: 31124.57 examples/sec

@8bitmp3 (Contributor) commented Jun 11, 2020

@shadaj Awesome work, just saw this. Thanks for the efforts. Just wanted to give my 2 cents: I know some of the stuff is marked as TODO ... oh no 🙂 e.g. the ...<AutoSequenced<AutoSequenced<AutoSequenced... types in VGG16. So, hopefully the new stuff will be human-friendly and readable, in the spirit of Swift. 🙌

@saeta (Contributor) commented Jun 11, 2020

Re: the crazy types, I wonder if associated type inference of opaque result types might be able to help us out here.
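
For instance, the hypothetical makeHead helper sketched above could drop its long signature, assuming AutoLayer's requirements can be satisfied behind an opaque result type:

func makeHead() -> some AutoLayer {
    // The concrete AutoSequenced<...> chain is hidden. Caveat: the
    // associated InputShape/OutputShape also become opaque at the use
    // site, which may or may not be acceptable for further composition.
    AutoDense<Float>(outputSize: 84, activation: relu)
        .then(AutoDense(outputSize: 10))
}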

@shadaj (Contributor, Author) commented Jun 11, 2020

The last commit (4a9719a) sets up the initial infrastructure for a LayerModule that uses properties as a way to get automatic type inference for abstracted layer chunks. Right now, explicit types are still needed when there are multiple lines in the layer creation logic, but with swiftlang/swift#32223 we should have full type inference.

@@ -33,7 +33,7 @@ let denseDef = AutoConv2D<Float>(filterShape: (5, 5), outputChannels: 6, padding
.then(AutoDense(outputSize: 84, activation: relu))
.then(KeyedAutoLayer(AutoDense(outputSize: 10), key: lastDenseKey))

- var (classifier, keys) = denseDef.buildModelWithKeys(inputShape: (28, 28, 1))
+ var classifier = denseDef.buildModel(inputShape: (28, 28, 1))
A contributor commented on this diff:
Thanks for this, @shadaj, and for the other instances of simpler code. var classifier = ... appears friendlier for the folks who use Pythonic frameworks. Also +1 for the separate BuiltAutoLayer struct; I think the naming is in line with the other struct, KeyedAutoLayer, so it makes sense.

@shadaj changed the title from "Prototype new layer initialization APIs with automatic shape computation" to "Introduce new layer initialization APIs with automatic shape computation" on Jun 16, 2020
@shadaj force-pushed the layer-init-prototyping branch 4 times, most recently from 190b886 to 56b4cdb, on June 17, 2020 18:05
@shadaj marked this pull request as ready for review on June 17, 2020 18:21
@shadaj (Contributor, Author) commented Jun 18, 2020

Interestingly, running with -cross-module-optimization results in the linker crashing with SIGBUS (Misaligned address error), but this crash doesn't happen on master. Need to investigate further...

@saeta (Contributor) left a comment:

I was thinking about your approach and looking over this code again. I think you're right that being able to specify how to build a model and have that be reusable for a variety of different shapes is a good idea. But I feel like there's a lot of boilerplate required to make a new AutoLayer. While we should care very much about the API to compose pre-written layers, I think it'd also be a good idea to think carefully about the API for making new layers as well.

This led me to wonder: is there a way to have less ceremony without losing the separation between model specification and shape specification? I haven't fully thought this through, but I wonder if closures could actually be the answer.

Said another way, the difference between:

var model = Sequential<Float>(shape: (1, 2, 3))
  .conv(...)
  .flatten
  .dense(output: 10)

and

let modelBuilder = Sequential<Float>()
   .conv(...)
   .flatten
   .dense(output: 10)
var model = modelBuilder.build(shape: (1, 2, 3))

is delayed binding of the shape. But I think we can get the simplicity of the former with the decoupling of the latter by using a closure / function.

func makeModel(shape: Shape) -> Model {
  return ...
}

Maybe said another way: your design of separating the two essentially curries the model initialization, separating the other hyperparameters from the shape sizes. One commonly used point in this design space is to return a closure.

Does that make sense?
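
As a concrete (untested) illustration of the currying point, using the stock Dense layer from swift-apis; denseBuilder is a hypothetical name:

import TensorFlow

// Bind the hyperparameters first; bind the shape later via the closure.
func denseBuilder(outputSize: Int) -> (Int) -> Dense<Float> {
    return { inputSize in
        Dense<Float>(inputSize: inputSize, outputSize: outputSize)
    }
}

let buildHidden = denseBuilder(outputSize: 10)  // model specification
var hidden = buildHidden(784)                   // shape specification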

@@ -0,0 +1,61 @@
public func intTupleToArray(tuple: Any) -> [Int] {
A contributor commented on this diff:
One potential idea: consider making a dedicated type that is expressible by array literal (https://developer.apple.com/documentation/swift/expressiblebyarrayliteral)
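
A minimal sketch of that idea (AutoShape is a hypothetical name):

// A dedicated shape type, usable as e.g. `let s: AutoShape = [28, 28, 1]`.
public struct AutoShape: ExpressibleByArrayLiteral {
    public let dimensions: [Int]

    public init(arrayLiteral elements: Int...) {
        self.dimensions = elements
    }
}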

@shadaj (Contributor, Author) commented Jun 19, 2020

@saeta Hmm, I'm not sure I completely understand how the shape-first approach simplifies layer definitions. Right now, the reason we have AutoModule instead of just regular functions is type inference, but functions do work as well. The only issue with functions is that the return type (which is very long) would have to be explicitly included in the code.

I do really like the chained API style to reduce .then calls for simple sequential models. I'll work on getting that added and using that in the examples, which should give us a better idea of the complexity of the end-user experience.

@saeta (Contributor) left a comment:

Sorry for not explaining it well. When reading your code, it looked like there were a lot of structs that had an init to store some values and just a single other function to execute their behavior. I've highlighted a few of them below. Does that help?

Comment on lines +4 to +20
let axis: Int
let momentum: Scalar
let epsilon: Scalar

public typealias InstanceType = BatchNorm<Scalar>
public typealias InputShape = Shape
public typealias OutputShape = Shape

public init(
axis: Int = -1,
momentum: Scalar = 0.99,
epsilon: Scalar = 0.001
) {
self.axis = axis
self.momentum = momentum
self.epsilon = epsilon
}
This to me looks just like a closure capture.
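
For instance, roughly the same capture could be expressed as a function returning a builder closure (a sketch using the stock BatchNorm initializer from swift-apis; autoBatchNorm is a hypothetical name):

import TensorFlow

func autoBatchNorm<Scalar: TensorFlowFloatingPoint>(
    axis: Int = -1,
    momentum: Scalar = 0.99,
    epsilon: Scalar = 0.001
) -> (Int) -> BatchNorm<Scalar> {
    // The closure captures axis/momentum/epsilon exactly as the struct's
    // stored properties do; featureCount stands in for the inferred shape.
    return { featureCount in
        BatchNorm(
            featureCount: featureCount,
            axis: axis,
            momentum: momentum,
            epsilon: epsilon)
    }
}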

Comment on lines +4 to +38
let filterShape: (Int, Int)
let outputChannels: Int
let strides: (Int, Int)
let padding: Padding
let dilations: (Int, Int)
let activation: Conv2D<Scalar>.Activation
let useBias: Bool
let filterInitializer: ParameterInitializer<Scalar>
let biasInitializer: ParameterInitializer<Scalar>

public typealias InstanceType = Conv2D<Scalar>
public typealias InputShape = (Int, Int, Int)
public typealias OutputShape = (Int, Int, Int)

public init(
filterShape: (Int, Int),
outputChannels: Int,
strides: (Int, Int) = (1, 1),
padding: Padding = .valid,
dilations: (Int, Int) = (1, 1),
activation: @escaping Conv2D<Scalar>.Activation = identity,
useBias: Bool = true,
filterInitializer: @escaping ParameterInitializer<Scalar> = glorotUniform(),
biasInitializer: @escaping ParameterInitializer<Scalar> = zeros()
) {
self.filterShape = filterShape
self.outputChannels = outputChannels
self.strides = strides
self.padding = padding
self.dilations = dilations
self.activation = activation
self.useBias = useBias
self.filterInitializer = filterInitializer
self.biasInitializer = biasInitializer
}
This also just looks like a closure capture.

Comment on lines +4 to +14
let outputSize: Int
let activation: Dense<Scalar>.Activation

public typealias InstanceType = Dense<Scalar>
public typealias InputShape = Int
public typealias OutputShape = Int

public init(outputSize: Int, activation: @escaping Dense<Scalar>.Activation = identity) {
self.outputSize = outputSize
self.activation = activation
}
This also looks like a closure capture.

Comment on lines 3 to 15
public struct AutoFunction<Input: Differentiable, Output: Differentiable, InputShape, OutputShape>: AutoLayer {
let fnShape: (InputShape) -> OutputShape
let fn: @differentiable (Input) -> Output

public typealias InstanceType = Function<Input, Output>
public typealias InputShape = InputShape
public typealias OutputShape = OutputShape

public init(fnShape: @escaping (InputShape) -> OutputShape, fn: @escaping @differentiable (Input) -> Output) {
self.fnShape = fnShape
self.fn = fn
}

This also appears to be a hand-written closure capture.

Comment on lines +17 to +29
public struct KeyedAutoLayer<Underlying: AutoLayer>: AutoLayer {
let underlying: Underlying
let key: AutoLayerKey<InstanceType>

public typealias InstanceType = Underlying.InstanceType
public typealias InputShape = Underlying.InputShape
public typealias OutputShape = Underlying.OutputShape

public init(_ underlying: Underlying, key: AutoLayerKey<InstanceType>) {
self.underlying = underlying
self.key = key
}

This also looks a bit like a closure capture.

Comment on lines +4 to +21
let poolSize: (Int, Int)
let strides: (Int, Int)
let padding: Padding

public typealias InstanceType = AvgPool2D<Scalar>
public typealias InputShape = (Int, Int, Int)
public typealias OutputShape = (Int, Int, Int)

public init(
poolSize: (Int, Int),
strides: (Int, Int) = (1, 1),
padding: Padding = .valid
) {
self.poolSize = poolSize
self.strides = strides
self.padding = padding
}

This also looks like a closure capture.

(Also below.)

@shadaj (Contributor, Author) commented Jun 19, 2020

Ah, that makes a lot more sense now! I think I have an idea that can get the best of both worlds by implementing these layers as functions that return something similar to AutoFunction but with a function that returns a layer instance instead. Will work on implementing this!
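
A rough sketch of that direction (all names hypothetical; Dense is the stock swift-apis layer):

import TensorFlow

// One generic definition replaces each hand-written struct: the shape
// transition and the instance construction are both plain closures.
struct LayerDef<InputShape, OutputShape, Instance> {
    let shapeFn: (InputShape) -> OutputShape
    let buildFn: (InputShape) -> Instance
}

func autoDense<Scalar: TensorFlowFloatingPoint>(
    outputSize: Int
) -> LayerDef<Int, Int, Dense<Scalar>> {
    return LayerDef(
        shapeFn: { _ in outputSize },
        buildFn: { inputSize in
            Dense(inputSize: inputSize, outputSize: outputSize)
        })
}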

@dabrahams (Contributor) left a comment:
The lack of any sort of documentation comments makes it very difficult for me to say anything useful review-wise. I can't tell what these components are supposed to be doing.

@saeta (Contributor) commented Aug 20, 2020

I recommend we convert this PR to a draft, as we're not ready to actually merge an approach based on this direction; instead, we should continue to refine our thinking per tensorflow/swift#515.

@saeta saeta marked this pull request as draft August 20, 2020 02:19
@shabalind (Contributor) commented:
Closing for now. Feel free to reopen if there is more progress!

@shabalind closed this on Nov 18, 2020