
Introduce new layer initialization APIs with automatic shape computation #584

Closed
shadaj wants to merge 18 commits

Conversation

@shadaj (Contributor) commented Jun 4, 2020

To be addressed:

  • AutoBatchNorm is tied to 2-dimensional + channel inputs; we should handle inputs of any dimension (an HList equivalent (Automatic Requirement Satisfaction?) could help here)
  • throw an error if the same key is used twice in the model, and offer some alternative for attaching keys to layers in reused chunks

Model Porting Progress:

@saeta (Contributor) left a comment:

A couple of questions:

  1. Why do you separate the definition of the model from the buildModel call that provides the input shape? Could you describe the advantages of this API design vs taking the input right at the start?

  2. Why do you use a .then(...) construction, instead of just chaining the calls directly? (Some advantages I can think of for chaining the calls directly: (a) IDE autocomplete can work a bit more nicely, including offering only the layers that work on a given input shape; (b) it avoids extra syntactic noise. A disadvantage: when defining a layer, you also have to define an extension method on some protocol to make the autocomplete work.)

Great work so far! I'd definitely make sure to try out a network with skip connections, such as ResNet, sooner rather than later, to ensure your design appropriately takes those into account. (And, of course, as you noted previously, a network that uses weight sharing between layers, i.e. a network that reuses a layer multiple times.)

@saeta (Contributor) commented Jun 8, 2020

CC @dan-zheng @dabrahams

@shadaj force-pushed the layer-init-prototyping branch 2 times, most recently from 8e4da64 to adaaf94, on June 9, 2020 00:07
@shadaj (Contributor, Author) commented Jun 9, 2020

Responding to @saeta's questions:

  1. Passing the input at the beginning vs. at the end reduces to a decision about what the "root" node of a layer graph should be. If we pass the input at the beginning, the root node of any graph would have to be a node that specifies the input shape. This makes it harder to break down complex graphs into independent chunks. For example, consider a situation where we want to split out the middle two layers of a graph:
let myModelDefinition = InputNode(shape: ...).then(Dense(...)).then(Dense(...)).then(Dense(...)).then(Dense(...))

If we require input shapes to be specified at the beginning, we have to manually compute the input shape for the subgraph, which introduces more overhead for the developer and also makes it impossible to reuse the subgraph in another model where the input shape is different.

let mySubModel = InputNode(shape: ...).then(Dense(...)).then(Dense(...))
let myModelDefinition = InputNode(shape: ...).then(Dense(...)).then(mySubModel).then(Dense(...))

If we have users specify the input shape at the end, however, this is solved more naturally, because the entire model is composed without being fixed to a specific input shape. We can plug in any input shape at the end and produce a model instance with the appropriate propagated shapes:

let mySubModel = Dense(...).then(Dense(...))
let myModelDefinition = Dense(...).then(mySubModel).then(Dense(...))

// elsewhere

var myModelInstance = myModelDefinition.buildModel(inputShape: ...)
  2. A couple of reasons why I picked .then:
  • guides type inference to automatically pick the correct Scalar type by looking at the previous layer in the graph (in the example code, we only have to specify Float for the first layer)
  • as you mentioned, guides IDE completion
  • supports any length of sequential layers, since .then can build up a linked list of types (compared to Sequential, which AFAIK is limited in the number of layers since it is just code-generated); see the sketch after this list
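
As a self-contained sketch of that last point (the names below are simplified stand-ins for this PR's AutoLayer and AutoSequenced, not the actual implementation):

protocol ShapedDef {
    associatedtype InputShape
    associatedtype OutputShape
}

struct Sequenced<First: ShapedDef, Second: ShapedDef>: ShapedDef
where First.OutputShape == Second.InputShape {
    let first: First
    let second: Second
    typealias InputShape = First.InputShape
    typealias OutputShape = Second.OutputShape
}

extension ShapedDef {
    // Each call nests one more Sequenced, so two .then calls produce
    // Sequenced<Sequenced<A, B>, C> -- a linked list at the type level,
    // with no fixed upper bound on chain length.
    func then<Next: ShapedDef>(_ next: Next) -> Sequenced<Self, Next>
    where OutputShape == Next.InputShape {
        return Sequenced(first: self, second: next)
    }
}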

Not sure I understood the extension method issue, could you explain further?

@shadaj (Contributor, Author) commented Jun 9, 2020

Also, VGG16 works now, but before I proceed to implement more models with this API, I'd like to discuss the type complexity being introduced by the AutoLayer protocol.

In the VGG16 implementation, for example, when breaking the model out into a function, we have to explicitly state the return type. This type contains the full model structure, which results in some very long types. However, it seems we need this type information in order to return the appropriate InstanceType when buildModel is called. Any ideas on how we could simplify or eliminate these long types?
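
To make the problem concrete, here is a hypothetical helper in the style of this API (the exact generic parameters of AutoSequenced are an assumption on my part); even a two-layer chunk needs its full chained type spelled out, and every additional .then grows it:

func makeHead() -> AutoSequenced<AutoDense<Float>, AutoDense<Float>> {
    // The return type gains one AutoSequenced nesting per .then call.
    return AutoDense<Float>(outputSize: 84, activation: relu)
        .then(AutoDense(outputSize: 10))
}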

@shadaj (Contributor, Author) commented Jun 11, 2020

Some early benchmark results:
ResNetCIFAR10 (training) - old: 543.36 examples/sec, new: 435.76 examples/sec
ResNetCIFAR10 (inference) - old: 5286.49 examples/sec, new: 5527.64 examples/sec

With x10:
ResNetCIFAR10 (training) - old: 718.85 examples/sec, new: 753.17 examples/sec
ResNetCIFAR10 (inference) - old: 31586.44 examples/sec, new: 31124.57 examples/sec

@8bitmp3 (Contributor) commented Jun 11, 2020

@shadaj Awesome work, just saw this. Thanks for the efforts. Just wanted to give my 2 cents: I know some of the stuff is marked as TODO ... oh no 🙂 e.g. the ...<AutoSequenced<AutoSequenced<AutoSequenced... types in VGG16. So, hopefully the new stuff will be human-friendly and readable, in the spirit of Swift. 🙌

@saeta (Contributor) commented Jun 11, 2020

Re: the crazy types, I wonder if associated type inference of opaque result types might be able to help us out here.
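
For instance, the hypothetical makeHead helper sketched above could drop its long signature, assuming AutoLayer's requirements can be satisfied behind an opaque result type:

func makeHead() -> some AutoLayer {
    // The concrete AutoSequenced<...> chain is hidden. Caveat: the
    // associated InputShape/OutputShape also become opaque at the use
    // site, which may or may not be acceptable for further composition.
    AutoDense<Float>(outputSize: 84, activation: relu)
        .then(AutoDense(outputSize: 10))
}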

@shadaj (Contributor, Author) commented Jun 11, 2020

The last commit (4a9719a) sets up the initial infrastructure for a LayerModule that uses properties as a way to get automatic type inference for abstracted layer chunks. Right now, explicit types are still needed when there are multiple lines in the layer creation logic, but with swiftlang/swift#32223 we should have full type inference.

@@ -33,7 +33,7 @@ let denseDef = AutoConv2D<Float>(filterShape: (5, 5), outputChannels: 6, padding
.then(AutoDense(outputSize: 84, activation: relu))
.then(KeyedAutoLayer(AutoDense(outputSize: 10), key: lastDenseKey))

- var (classifier, keys) = denseDef.buildModelWithKeys(inputShape: (28, 28, 1))
+ var classifier = denseDef.buildModel(inputShape: (28, 28, 1))
A contributor commented on this diff:
Thanks for this, @shadaj, and for the other instances of simpler code. var classifier = ... appears friendlier for the folks who use Pythonic frameworks. Also +1 for the separate BuiltAutoLayer struct; I think the naming is in line with the other struct, KeyedAutoLayer, so it makes sense.

@shadaj changed the title from "Prototype new layer initialization APIs with automatic shape computation" to "Introduce new layer initialization APIs with automatic shape computation" on Jun 16, 2020
@shadaj force-pushed the layer-init-prototyping branch 4 times, most recently from 190b886 to 56b4cdb, on June 17, 2020 18:05
@shadaj marked this pull request as ready for review on June 17, 2020 18:21
@shadaj (Contributor, Author) commented Jun 18, 2020

Interestingly, running with -cross-module-optimization results in the linker crashing with SIGBUS (Misaligned address error), but this crash doesn't happen on master. Need to investigate further...

@saeta (Contributor) left a comment:

I was thinking about your approach and looking over this code again. I think you're right that being able to specify how to build a model and have that be reusable for a variety of different shapes is a good idea. But I feel like there's a lot of boilerplate required to make a new AutoLayer. While we should care very much about the API to compose pre-written layers, I think it'd also be a good idea to think carefully about the API for making new layers as well.

This led me to wonder: is there a way to have less ceremony without losing the separation between model specification and shape specification? I haven't fully thought this through, but I wonder if closures could actually be the answer.

Said another way, the difference between:

var model = Sequential<Float>(shape: (1, 2, 3))
  .conv(...)
  .flatten
  .dense(output: 10)

and

let modelBuilder = Sequential<Float>()
   .conv(...)
   .flatten
   .dense(output: 10)
var model = modelBuilder.build(shape: (1, 2, 3))

is delayed binding of the shape. But I think we can get the simplicity of the former with the decoupling of the latter by using a closure / function.

func makeModel(shape: Shape) -> Model {
  return ...
}

Maybe said another way: your design of separating the two essentially curries the model initialization, separating the other hyperparameters from the shape sizes. One commonly used point in this design space is to return a closure.

Does that make sense?
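
As a concrete (untested) illustration of the currying point, using the stock Dense layer from swift-apis; denseBuilder is a hypothetical name:

import TensorFlow

// Bind the hyperparameters first; bind the shape later via the closure.
func denseBuilder(outputSize: Int) -> (Int) -> Dense<Float> {
    return { inputSize in
        Dense<Float>(inputSize: inputSize, outputSize: outputSize)
    }
}

let buildHidden = denseBuilder(outputSize: 10)  // model specification
var hidden = buildHidden(784)                   // shape specification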

@@ -0,0 +1,61 @@
public func intTupleToArray(tuple: Any) -> [Int] {
A contributor commented on this diff:
One potential idea: consider making a dedicated type that is expressible by array literal (https://developer.apple.com/documentation/swift/expressiblebyarrayliteral)
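
A minimal sketch of that idea (AutoShape is a hypothetical name):

// A dedicated shape type, usable as e.g. `let s: AutoShape = [28, 28, 1]`.
public struct AutoShape: ExpressibleByArrayLiteral {
    public let dimensions: [Int]

    public init(arrayLiteral elements: Int...) {
        self.dimensions = elements
    }
}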

@shadaj (Contributor, Author) commented Jun 19, 2020

@saeta Hmm, I'm not sure I completely understand how the shape-first approach simplifies layer definitions. Right now, the reason we have AutoModule instead of just regular functions is type inference, but functions do work as well. The only issue with functions is that the return type (which is very long) would have to be explicitly included in the code.

I do really like the chained API style to reduce .then calls for simple sequential models. I'll work on getting that added and using that in the examples, which should give us a better idea of the complexity of the end-user experience.

@saeta (Contributor) left a comment:

Sorry for not explaining it well. When reading your code, it looked like there were a lot of structs that had an init to store some values and just a single other function to execute their behavior. I've highlighted a few of them below. Does that help?

Comment on lines +4 to +20
let axis: Int
let momentum: Scalar
let epsilon: Scalar

public typealias InstanceType = BatchNorm<Scalar>
public typealias InputShape = Shape
public typealias OutputShape = Shape

public init(
axis: Int = -1,
momentum: Scalar = 0.99,
epsilon: Scalar = 0.001
) {
self.axis = axis
self.momentum = momentum
self.epsilon = epsilon
}
This to me looks just like a closure capture.
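
For instance, roughly the same capture could be expressed as a function returning a builder closure (a sketch using the stock BatchNorm initializer from swift-apis; autoBatchNorm is a hypothetical name):

import TensorFlow

func autoBatchNorm<Scalar: TensorFlowFloatingPoint>(
    axis: Int = -1,
    momentum: Scalar = 0.99,
    epsilon: Scalar = 0.001
) -> (Int) -> BatchNorm<Scalar> {
    // The closure captures axis/momentum/epsilon exactly as the struct's
    // stored properties do; featureCount stands in for the inferred shape.
    return { featureCount in
        BatchNorm(
            featureCount: featureCount,
            axis: axis,
            momentum: momentum,
            epsilon: epsilon)
    }
}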

Comment on lines +4 to +38
let filterShape: (Int, Int)
let outputChannels: Int
let strides: (Int, Int)
let padding: Padding
let dilations: (Int, Int)
let activation: Conv2D<Scalar>.Activation
let useBias: Bool
let filterInitializer: ParameterInitializer<Scalar>
let biasInitializer: ParameterInitializer<Scalar>

public typealias InstanceType = Conv2D<Scalar>
public typealias InputShape = (Int, Int, Int)
public typealias OutputShape = (Int, Int, Int)

public init(
filterShape: (Int, Int),
outputChannels: Int,
strides: (Int, Int) = (1, 1),
padding: Padding = .valid,
dilations: (Int, Int) = (1, 1),
activation: @escaping Conv2D<Scalar>.Activation = identity,
useBias: Bool = true,
filterInitializer: @escaping ParameterInitializer<Scalar> = glorotUniform(),
biasInitializer: @escaping ParameterInitializer<Scalar> = zeros()
) {
self.filterShape = filterShape
self.outputChannels = outputChannels
self.strides = strides
self.padding = padding
self.dilations = dilations
self.activation = activation
self.useBias = useBias
self.filterInitializer = filterInitializer
self.biasInitializer = biasInitializer
}
This also just looks like a closure capture.

Comment on lines +4 to +14
let outputSize: Int
let activation: Dense<Scalar>.Activation

public typealias InstanceType = Dense<Scalar>
public typealias InputShape = Int
public typealias OutputShape = Int

public init(outputSize: Int, activation: @escaping Dense<Scalar>.Activation = identity) {
self.outputSize = outputSize
self.activation = activation
}
This also looks like a closure capture.

Comment on lines 3 to 15
public struct AutoFunction<Input: Differentiable, Output: Differentiable, InputShape, OutputShape>: AutoLayer {
let fnShape: (InputShape) -> OutputShape
let fn: @differentiable (Input) -> Output

public typealias InstanceType = Function<Input, Output>
public typealias InputShape = InputShape
public typealias OutputShape = OutputShape

public init(fnShape: @escaping (InputShape) -> OutputShape, fn: @escaping @differentiable (Input) -> Output) {
self.fnShape = fnShape
self.fn = fn
}

This also appears to be a hand-written closure capture.

Comment on lines +17 to +29
public struct KeyedAutoLayer<Underlying: AutoLayer>: AutoLayer {
let underlying: Underlying
let key: AutoLayerKey<InstanceType>

public typealias InstanceType = Underlying.InstanceType
public typealias InputShape = Underlying.InputShape
public typealias OutputShape = Underlying.OutputShape

public init(_ underlying: Underlying, key: AutoLayerKey<InstanceType>) {
self.underlying = underlying
self.key = key
}

This also looks a bit like a closure capture.

Comment on lines +4 to +21
let poolSize: (Int, Int)
let strides: (Int, Int)
let padding: Padding

public typealias InstanceType = AvgPool2D<Scalar>
public typealias InputShape = (Int, Int, Int)
public typealias OutputShape = (Int, Int, Int)

public init(
poolSize: (Int, Int),
strides: (Int, Int) = (1, 1),
padding: Padding = .valid
) {
self.poolSize = poolSize
self.strides = strides
self.padding = padding
}

This also looks like a closure capture.

(Also below.)

@shadaj (Contributor, Author) commented Jun 19, 2020

Ah, that makes a lot more sense now! I think I have an idea that can get the best of both worlds by implementing these layers as functions that return something similar to AutoFunction but with a function that returns a layer instance instead. Will work on implementing this!
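
A rough sketch of that direction (all names hypothetical; Dense is the stock swift-apis layer):

import TensorFlow

// One generic definition replaces each hand-written struct: the shape
// transition and the instance construction are both plain closures.
struct LayerDef<InputShape, OutputShape, Instance> {
    let shapeFn: (InputShape) -> OutputShape
    let buildFn: (InputShape) -> Instance
}

func autoDense<Scalar: TensorFlowFloatingPoint>(
    outputSize: Int
) -> LayerDef<Int, Int, Dense<Scalar>> {
    return LayerDef(
        shapeFn: { _ in outputSize },
        buildFn: { inputSize in
            Dense(inputSize: inputSize, outputSize: outputSize)
        })
}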

@dabrahams (Contributor) left a comment:
The lack of any sort of documentation comments makes it very difficult for me to say anything useful review-wise. I can't tell what these components are supposed to be doing.

@saeta (Contributor) commented Aug 20, 2020

I recommend we convert this PR to a draft, as we're not ready to actually merge an approach based on this direction; instead, we should continue to refine our thinking per tensorflow/swift#515.

@saeta saeta marked this pull request as draft August 20, 2020 02:19
@shabalind (Contributor) commented:
Closing for now. Feel free to reopen if there is more progress!

@shabalind closed this on Nov 18, 2020