This repository has been archived by the owner on Mar 30, 2022. It is now read-only.

Implicit Copy Warning Improvements #55

Closed · wants to merge 6 commits

proposals/ImplicitCopyWarnings.md (new file, 144 additions)

# Implicit Copy Warning Improvements

* Author: [Marc Rasi](https://github.com/marcrasi)

## Introduction

Currently, implicit copy warnings are very noisy. For example, [this simple
model] produces [these warnings]. (See the "Performance Predictability" section
of [Graph Program Extraction] for more information about implicit copy
warnings).

I propose that we clean up the warnings as follows:
1. Emit no warnings for data transferred while the program is starting (e.g.
Contributor:

I agree #1 is a good idea, but how do we implement it (how do we tell whether a Swift->TF data transfer happens before the TF graph runs)?

Currently, our use of "input args" and "result tensors" for the TF program serves to differentiate them from sends/recvs, and thus generates no warning.

If we convert the input args and result tensors into tensor sends/recvs, I'm not sure how to reliably tell them apart from the other sends/recvs that happen while the TF program is running.

Also, if Swift is too slow sending the first tensor to TF, it'd still increase the end-to-end running time, and it might be useful to warn in that case (if possible).

Contributor Author:

I wanted to capture the current use of "input args" and "result tensors" in this rule, because it seems like part of the big picture.

Also it seems like they currently might be bugged in some way because a lot of the warnings in the example look like they should be input args or result tensors. So a big part of cleaning up the warnings might be debugging those.

Are we planning to eventually remove input args and result tensors and replace them with send/recv? If we do that, could we keep the logic that currently creates input args and result tensors, and use it to decide whether to warn?

Contributor:

> Also it seems like they currently might be bugged in some way because a lot of the warnings in the example look like they should be input args or result tensors. So a big part of cleaning up the warnings might be debugging those.

SGTM.

> Are we planning to eventually remove input args and result tensors and replace them with send/recv? If we do that, could we keep the logic that currently creates input args and result tensors, and use it to decide whether to warn?

I think that direction is worth exploring, but it's too early to commit to that. If we do go there, I agree that it should probably not affect our warnings policy.

training data being copied to the GPU) or ending (e.g. final weights being
copied to the CPU).
2. Within the program, only warn when data makes a round trip between devices:
when a piece of data moves from device A to device B, then a computation
Contributor:

Does the notion of a "device" include the Swift host? Usually "device" refers only to a TF device, like the TF GPU.

Also, is the description here overly general -- we are not talking about warnings for data transfers between the TF CPU and GPU, correct?

Contributor Author:

Oh yes, I meant to just talk about host<->accelerator transfers. I will reword this to make it more clear.

using that data happens on device B, and then the result of that computation
moves back to device A.

Concretely, this proposal eliminates all warnings in [the example].

Since the round-trip-rule might be hard to implement, I also propose that we
initially implement a simple heuristic that approximates the round-trip-rule:
Warn for all transfers from the host to the accelerator, but do not warn for any
transfers from the accelerator to the host.
Contributor:

Arguably, warning only on Swift->TF transfers can stand on its own, rather than serving as an approximation adopted for implementation-level convenience. E.g. say Swift is running alongside TF and sends various tensors to TF as they are produced -- even without round trips, such sends can slow down TF if Swift is slow in producing those tensors.

Contributor Author:

I see this as similar to the tf->swift transfer case -- there are a lot of situations where the user does expect the transfers to happen and where the transfers don't slow anything down. For example, scalar transfers for control flow. Or some kind of CPU loading & preprocessing step that queues up a bunch of training data for the accelerator.

Thinking through scalars is what made me come up with the "round trip" idea. We can suppress all scalar warnings, like we do now, but this creates a possibly-bad situation in combination with suppressing tf->swift transfers: what if your program does a really slow tf->swift transfer and then sends it back as a scalar? Something like this:

for step in 0...1000 {
  let gradients = ... // on accelerator
  weights += gradients // on accelerator
  if cpuOnlyFunc(weights) == 0 { // long round-trip transfer that blocks the computation
    weights += 1
  }
}

If we hide all warnings for scalar transfers and we hide all warnings for tf->swift transfers, then that compiles without warnings. A "round trip" rule would catch it though.

Contributor (@mhong, Aug 14, 2018):

If a TF->Swift transfer is really slow, it might already stall TF compute (due to limited TF RAM buffering), without involving a round trip. :)

Overall, what the compiler can warn about based on static analysis has limited accuracy, so we might as well go with a fairly simple (explainable / predictable to end users) model. The round-trip rule involves somewhat more mental calculation than saying "we warn on all host->TF transfers, other than scalar ones, but that's still just an approximation". :)

But my position is based on some speculation and is just my opinion. I'll let you decide here.


Since all round trips involve a transfer from the host to the accelerator, the
heuristic catches all transfers that the round-trip-rule catches.
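
To make the comparison concrete, here is a minimal sketch of the two policies expressed as predicates over a hypothetical `Transfer` record. The type, its fields, and the function names are illustrative assumptions, not the compiler's actual representation:

```swift
// Hypothetical model of one implicit copy found during partitioning
// (illustrative only; not how the compiler represents transfers).
enum Device { case host, accelerator }

struct Transfer {
  var from: Device
  var to: Device
  // True if the copy happens while the program is starting or ending
  // (e.g. initial training data in, final weights out).
  var atProgramBoundary: Bool
  // True if this copy is part of a round trip: data crossed over, was used,
  // and a derived result crossed back while the graph was running.
  // Detecting this dependency is the hard part.
  var partOfRoundTrip: Bool
}

// Round-trip-rule: warn only on copies that participate in a round trip
// during execution.
func roundTripRuleWarns(_ t: Transfer) -> Bool {
  return !t.atProgramBoundary && t.partOfRoundTrip
}

// Interim heuristic: warn on every host -> accelerator copy during execution.
// Every round trip contains a host -> accelerator leg, so anything the
// round-trip-rule flags is also flagged (on at least one of its legs) here.
func heuristicWarns(_ t: Transfer) -> Bool {
  return !t.atProgramBoundary && t.from == .host && t.to == .accelerator
}
```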

[Graph Program Extraction]: https://github.com/tensorflow/swift/blob/master/docs/GraphProgramExtraction.md
[this simple model]: ./ImplicitCopyWarnings/LinearRegression.swift
[the example]: ./ImplicitCopyWarnings/LinearRegression.swift
[these warnings]: ./ImplicitCopyWarnings/LinearRegression-warnings.txt

## Justification

The main purpose of implicit copy warnings is to alert the user when the Swift
for TensorFlow programming model causes their program to have unexpectedly bad
performance.

Users expect their programs to start off by transferring data to an accelerator,
they expect their programs to occasionally send debugging or status information
(e.g. model loss) back to the CPU for display or logging purposes, and they
expect their programs to send results back to the CPU when they finish. So we
should not produce warnings for any of these things. For example, this code,
which does all of those things, should not produce any warnings:

```swift
public func train(inputs: Tensor<Float>, outputs: Tensor<Float>, initialWeights: Tensor<Float>) -> Tensor<Float> {
  var weights = initialWeights
  for step in 0...1000 {
    let predictions = inputs • weights
    let errors = predictions - outputs
    let dweights = (errors * inputs).sum(alongAxes: 0).transposed()
    weights -= 0.01 * dweights

    if step % 100 == 0 {
      print("Current weights: \(weights)") // Notice that `weights` gets copied to the CPU
    }
  }
  return weights
}
```

What we want to avoid is unexpectedly blocking users' computations. This can
happen when the user writes some code that (unbeknownst to them) forces some
computation to happen on the CPU before computation on the accelerator can
proceed. For example, suppose that `cpuOnlyComputation` runs a computation that
can only happen on the CPU. Then this training loop blocks on data transfer and
CPU computation every iteration:

```swift
public func train(inputs: Tensor<Float>, outputs: Tensor<Float>, initialWeights: Tensor<Float>) -> Tensor<Float> {
  var weights = initialWeights
  for step in 0...1000 {
    let predictions = cpuOnlyComputation(inputs • weights)
    let errors = predictions - outputs
    let dweights = (errors * inputs).sum(alongAxes: 0).transposed()
    weights -= 0.01 * dweights

    if step % 100 == 0 {
      print("Current weights: \(weights)")
    }
  }
  return weights
}
```

We should emit a warning in the above code so that the user is aware that their
training loop is blocking on data transfer and CPU computation.

The round-trip-rule achieves exactly what we want in these examples! So does the
heuristic of warning for all transfers from the host to the accelerator.

## False-positive / False-negative tradeoff

The round-trip-rule does not catch all slowness related to implicit copies. For
example, a program that frequently dumps large pieces of data from the GPU to
the CPU might soak up GPU memory with data waiting to be copied, and the
round-trip-rule will not catch that.
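
As a hedged illustration of this kind of false negative (reusing the tensor operations from the examples above, and assuming the same `import TensorFlow` context; the function name is made up), the loop below copies a large intermediate tensor to the host on every iteration purely for logging. Nothing derived from it returns to the accelerator, so the round-trip-rule stays silent even though the copies may be costly:

```swift
public func trainWithVerboseLogging(inputs: Tensor<Float>, outputs: Tensor<Float>,
                                    initialWeights: Tensor<Float>) -> Tensor<Float> {
  var weights = initialWeights
  for step in 0...1000 {
    let predictions = inputs • weights
    let errors = predictions - outputs
    let dweights = (errors * inputs).sum(alongAxes: 0).transposed()
    weights -= 0.01 * dweights
    // The full `errors` tensor is copied to the host on every iteration just to
    // be printed. No host-computed result feeds back into the graph, so there is
    // no round trip, but the accelerator-to-host copies can still pile up.
    print("step \(step) errors: \(errors)")
  }
  return weights
}
```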

However, the round-trip-rule does capture what I currently believe will be the
main source of performance unpredictability (accidentally writing code that
moves a piece of a computation onto a different device), so I propose that it
strikes a good initial balance between false positives and false negatives.

After we gain more experience with Swift models, we can revisit implicit copy
warnings and see whether the balance still appears to be good.

## Issues with the heuristic

The heuristic (warn for all transfers from the host to the accelerator) can
produce false positive warnings for harmless transfers from the host to the
accelerator. One common situation where this happens is when the host calculates
some simple control flow conditions and sends them to the accelerator. For
example:

```swift
public func example(steps: Int) -> Tensor<Float> {
  var result: Tensor<Float> = Tensor(zeros: [10, 10])
  var step: Int = 0
  while step < steps {
    if step % 2 == 0 {
      result += 1
    } else {
      result += 2
    }
    step += 1
  }
  return result
}
```

If `step < steps` and `step % 2 == 0` get evaluated on the host and the boolean
results get copied over to the accelerator, then the heuristic will warn about
implicit copies. But these implicit copies are harmless because the host can
quickly run through a bunch of iterations and queue up a bunch of booleans for
the accelerator to consume.

Without implementing the true round-trip-rule, we can denoise the warnings in
that situation by suppressing warnings for scalar transfers. But the
Contributor:

And we already implemented the disabling of warnings on scalar values -- should we still include that in our policy design here (instead of forcing us to choose between it and the new round-trip-rule)?

Contributor Author:

I think that this proposal interacts pretty interestingly with disabled scalar warnings, because of the example in my comment above with the slow scalar transfer.

round-trip-rule suppresses those warnings in a cleaner and more reliable way, so
I propose that we eventually do implement the round-trip-rule.
Contributor:

Another possible future direction that you mentioned in our earlier discussion is to move sends/recvs diagnostics to a separate tool, possibly using runtime info (e.g. profile data).

Do you want to also briefly cover it in this doc, to establish the "space" of our design exploration?

Contributor Author:

Will add!

proposals/ImplicitCopyWarnings/LinearRegression-warnings.txt (new file, 84 additions)

reduced.swift:14:23: warning: 'inputs' implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
public func predict(inputs: Tensor<Float>) -> Tensor<Float> {
Contributor:

Hmm, I don't understand why such warnings appear in the first place -- there should be no sends/recvs here. Maybe this is due to the use of mutating?

Contributor:

This is because predict(inputs:) is being partitioned as a deabstraction scope. It's a public function.

Contributor:

Before we have a better inlining model, should we recommend that all public APIs be marked as "inline always"? Otherwise if user code calls a sequence of public APIs, that could generate a lot of sends/recvs.

Contributor:

> public APIs be marked as "inline always"?

I think you meant @inlinable. @inline(__always) is just local inlining and won't make the SIL function serialized.

I would not say requiring @inlinable is a good idea, because

  1. Requiring excessive annotation from the user breaks the mental model of "tensor code just works".
  2. It would delay shape checking for public APIs if they are not called in the current module. If any shape error occurs in the caller module, the error location would be unknown.
  3. Partitioning in the first place guarantees the function will at least run under all circumstances, instead of printing "__tfop_xxx cannot be lowered to LLVM IR!!" at run-time. For example, if you use an opaque library function like expectEqual that makes a call to a tensor function through protocol conformance, the program fails because the function was never partitioned. This is unavoidable for generic functions because generic functions cannot be partitioned and must be @inlinable, but non-generic functions can at least be partitioned and have a binary representation to execute reliably no matter how it's being called.
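
For reference, here is a minimal sketch of the two annotations being discussed, applied to a `predict`-shaped method. The `Model` type, its `@usableFromInline` properties, and the method names are illustrative assumptions, not code from this PR:

```swift
import TensorFlow

public struct Model {
  // Stored properties must be visible to inlinable code, hence @usableFromInline.
  @usableFromInline var w: Tensor<Float>
  @usableFromInline var b: Tensor<Float>

  public init(w: Tensor<Float>, b: Tensor<Float>) {
    self.w = w
    self.b = b
  }

  // @inlinable serializes the body so callers in other modules can inline
  // (and partition) it; this is roughly what the suggestion above amounts to.
  @inlinable
  public func predict(inputs: Tensor<Float>) -> Tensor<Float> {
    return inputs • w + b
  }

  // @inline(__always) only requests local inlining and does not serialize the
  // SIL, so by itself it would not help cross-module partitioning.
  @inline(__always)
  public func predictLocalInline(inputs: Tensor<Float>) -> Tensor<Float> {
    return inputs • w + b
  }
}
```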

Contributor Author:

I think this discussion is mostly orthogonal to the warnings proposal. Regardless of what causes such a function to get partitioned on its own, we need to make the warnings good when it does get partitioned.

Contributor (@rxwei, Aug 13, 2018):

Yep, it is orthogonal! But at the same time I hope this is providing some context on how argument send/receive warnings are emitted in the first place.

One middle ground I can think of is to have the compiler still partition inlinable functions outside the TF module. This would resolve both Mingsheng’s concerns and mine.

Contributor:

Richard's proposal SGTM.

I agree with Marc that what's partitionable is itself orthogonal to the warnings proposal. One related aspect: when we are partitioning a public API that does not get called in user code, could we avoid generating sends/recvs warnings for it?

^~~~~~~~~~~~~~~~~~~~~
reduced.swift:15:19: note: value used here
return inputs • w + b
reduced.swift:15:23: warning: value implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
return inputs • w + b
reduced.swift:15:27: warning: value implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
return inputs • w + b
reduced.swift:18:20: warning: 'inputs' implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
public func loss(inputs: Tensor<Float>, outputs: Tensor<Float>) -> Float {
^~~~~~~~~~~~~~~~~~~~~
reduced.swift:15:19: note: value used here
return inputs • w + b
reduced.swift:15:23: warning: value implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
return inputs • w + b
reduced.swift:15:27: warning: value implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
return inputs • w + b
reduced.swift:18:43: warning: 'outputs' implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
public func loss(inputs: Tensor<Float>, outputs: Tensor<Float>) -> Float {
^~~~~~~~~~~~~~~~~~~~~~
reduced.swift:20:25: note: value used here
return (predictions - outputs).squared().mean()
~~~~~~~~~~~~^~~~~~~~~
reduced.swift:23:34: warning: 'inputs' implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
public mutating func trainStep(inputs: Tensor<Float>, outputs: Tensor<Float>,
^~~~~~~~~~~~~~~~~~~~~
reduced.swift:15:19: note: value used here
return inputs • w + b
reduced.swift:23:24: warning: 'self' implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
public mutating func trainStep(inputs: Tensor<Float>, outputs: Tensor<Float>,
^~~~~~~~~
reduced.swift:15:19: note: value used here
return inputs • w + b
reduced.swift:23:57: warning: 'outputs' implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
public mutating func trainStep(inputs: Tensor<Float>, outputs: Tensor<Float>,
^~~~~~~~~~~~~~~~~~~~~~
reduced.swift:27:30: note: value used here
let errors = predictions - outputs
~~~~~~~~~~~~^~~~~~~~~
reduced.swift:35:30: warning: 'inputs' implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
public mutating func train(inputs: Tensor<Float>, outputs: Tensor<Float>, learningRate: Float,
^~~~~~~~~~~~~~~~~~~~~
reduced.swift:15:19: note: value used here
return inputs • w + b
reduced.swift:35:24: warning: 'self' implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
public mutating func train(inputs: Tensor<Float>, outputs: Tensor<Float>, learningRate: Float,
^~~~~
reduced.swift:15:19: note: value used here
return inputs • w + b
reduced.swift:35:53: warning: 'outputs' implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
public mutating func train(inputs: Tensor<Float>, outputs: Tensor<Float>, learningRate: Float,
^~~~~~~~~~~~~~~~~~~~~~
reduced.swift:27:30: note: value used here
let errors = predictions - outputs
~~~~~~~~~~~~^~~~~~~~~
reduced.swift:31:7: warning: value implicitly copied to the host, use .toHost() to make transfer explicit
w -= learningRate * dw
~~^~~~~~~~~~~~~~~~~~~~
reduced.swift:35:24: note: value used here
public mutating func train(inputs: Tensor<Float>, outputs: Tensor<Float>, learningRate: Float,
~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
reduced.swift:32:7: warning: value implicitly copied to the host, use .toHost() to make transfer explicit
b -= learningRate * db
~~^~~~~~~~~~~~~~~~~~~~
reduced.swift:35:24: note: value used here
public mutating func train(inputs: Tensor<Float>, outputs: Tensor<Float>, learningRate: Float,
~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
reduced.swift:31:7: warning: value implicitly copied to the host, use .toHost() to make transfer explicit
w -= learningRate * dw
~~^~~~~~~~~~~~~~~~~~~~
reduced.swift:65:10: note: value used here
return model
^~~~~
reduced.swift:32:7: warning: value implicitly copied to the host, use .toHost() to make transfer explicit
b -= learningRate * db
~~^~~~~~~~~~~~~~~~~~~~
reduced.swift:65:10: note: value used here
return model
^~~~~
reduced.swift:15:23: warning: value implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
return inputs • w + b
reduced.swift:15:27: warning: value implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
return inputs • w + b

proposals/ImplicitCopyWarnings/LinearRegression.swift (new file, 92 additions)

import TensorFlow

public struct LinearModel {
  var w: Tensor<Float>
  var b: Tensor<Float>

  init(inputSize: Int32) {
    w = Tensor<Float>(randomUniform: [inputSize, 1])
    b = Tensor<Float>(randomUniform: [1])
  }
}

extension LinearModel {
  public func predict(inputs: Tensor<Float>) -> Tensor<Float> {
    return inputs • w + b
  }

  public func loss(inputs: Tensor<Float>, outputs: Tensor<Float>) -> Float {
    let predictions = predict(inputs: inputs)
    return (predictions - outputs).squared().mean()
  }

  public mutating func trainStep(inputs: Tensor<Float>, outputs: Tensor<Float>,
                                 learningRate: Float) {
    let predictions = predict(inputs: inputs)

    let errors = predictions - outputs
    let dw = (errors * inputs).sum(alongAxes: 0).transposed()
    let db = errors.sum(squeezingAxes: 0)

    w -= learningRate * dw
    b -= learningRate * db
  }

  public mutating func train(inputs: Tensor<Float>, outputs: Tensor<Float>, learningRate: Float,
                             steps: Int) {
    print("Training for \(steps) steps")
    for i in 0..<steps {
      trainStep(inputs: inputs, outputs: outputs, learningRate: learningRate)
      if i % (steps / 10) == 0 || i == steps - 1 {
        print("Current model: \(self), training loss: \(loss(inputs: inputs, outputs: outputs))")
      }
    }
  }
}

public func trainSampleModel() -> LinearModel {
  // The output is the sum of the two inputs. But the inputs are noisy.
  let inputSize: Int32 = 2
  let trainInputs = Tensor<Float>([
    [1, 1],
    [-1, 1],
    [5, 5],
    [2, 3]
  ]) + 0.1 * Tensor(randomNormal: [4, 2])
  let trainOutputs = Tensor<Float>([
    [2],
    [0],
    [10],
    [5]
  ])

  var model = LinearModel(inputSize: inputSize)
  model.train(inputs: trainInputs, outputs: trainOutputs, learningRate: 0.01, steps: 1000)
  return model
}

public func testSampleModel(model: LinearModel) {
  // The output is the sum of the two inputs. But the inputs are noisy.
  let testInputs = Tensor<Float>([
    [3, 10],
    [-5, 6],
    [4, 2],
    [0, 6]
  ]) + 0.1 * Tensor(randomNormal: [4, 2])
  let testOutputs = Tensor<Float>([
    [13],
    [1],
    [6],
    [6]
  ])

  let loss = model.loss(inputs: testInputs, outputs: testOutputs)
  let predictions = model.predict(inputs: testInputs)
  print("Testing loss: \(loss)")
  print("Testing predictions: \(predictions)")
  print("Correct outputs: \(testOutputs)")
}

let model = trainSampleModel()
print("")
testSampleModel(model: model)