Thanks for the great write-up! Left some comments/questions.
> training data being copied to the GPU) or ending (e.g. final weights being
> copied to the CPU).
> 2. Within the program, only warn when data makes a round trip between devices:
> when a piece of data moves from device A to device B, then a computation
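To make the round-trip rule concrete, here is a hedged sketch using the early Swift for TensorFlow API (`.toAccelerator()` appears in this proposal's diagnostics; `.toHost()` is assumed for the reverse direction):

```swift
import TensorFlow

// Hedged sketch of a "round trip"; API names are assumptions, not a spec.
let x = Tensor<Float>([1, 2, 3])   // produced on the host
let y = x.toAccelerator() + 1      // host -> accelerator; compute runs on device
let z = y.toHost()                 // accelerator -> host: no warning by itself
let w = z.toAccelerator() * 2      // host -> accelerator again: the data has now
                                   // made a round trip, so this transfer warns
```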
Does the notion of a "device" include the Swift host? Usually "device" refers only to a TF device, like the TF GPU.
Also, is the description here overly general -- we are not talking about warnings over data transfers between TF CPU and GPU, correct?
Oh yes, I meant to just talk about host<->accelerator transfers. I will reword this to make it more clear.
> Since the round-trip-rule might be hard to implement, I also propose that we
> initially implement a simple heuristic that approximates the round-trip-rule:
> Warn for all transfers from the host to the accelerator, but do not warn for any
> transfers from the accelerator to the host.
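A hedged illustration of what the heuristic would and would not flag (the `Tensor` API is taken from this proposal's diagnostics; the specific warning behavior is an assumption):

```swift
import TensorFlow

// Sketch of the proposed heuristic; not a definitive implementation.
let data = Tensor<Float>([1, 2, 3])  // produced on the host
let result = data + 1                // implicit host -> accelerator transfer: warn
print(result)                        // accelerator -> host transfer: no warning
```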
Arguably, warning only on swift->tf transfers can stand on its own, rather than serving as an approximation adopted for implementation convenience. E.g., say Swift is running alongside TF and sends various tensors from Swift to TF as they are produced -- even without round trips, such sends can slow down TF if Swift is slow in producing those tensors.
I see this as similar to the tf->swift transfer case -- there are a lot of situations where the user does expect the transfers to happen and where the transfers don't slow anything down. For example, scalar transfers for control flow. Or some kind of CPU loading & preprocessing step that queues up a bunch of training data for the accelerator.
Thinking through scalars is what made me come up with the "round trip" idea. We can suppress all scalar warnings, like we do now, but this creates a possibly-bad situation in combination with suppressing tf->swift transfers: what if your program does a really slow tf->swift transfer and then sends it back as a scalar? Something like this:
```swift
for step in 0...1000 {
    let gradients = ... // on accelerator
    weights += gradients // on accelerator
    if cpuOnlyFunc(weights) == 0 { // long round-trip transfer that blocks the computation
        weights += 1
    }
}
```
If we hide all warnings for scalar transfers and we hide all warnings for tf->swift transfers, then that compiles without warnings. A "round trip" rule would catch it though.
If a tf->swift transfer is really slow, it might already stall TF compute (due to limited TF RAM buffering), without involving a round trip. :)
Overall, what the compiler can warn about based on static analysis has limited accuracy, so we might as well go with a fairly simple (explainable / predictable to end users) model. The round-trip rule involves somewhat more mental calculation than saying "we warn on all host->tf transfers, other than scalar ones, but that's still just an approximation". :)
But my position is based on some speculation and is just my opinion. I'll let you decide here.
> reduced.swift:14:23: warning: 'inputs' implicitly copied to the accelerator, use .toAccelerator() to make transfer explicit
> public func predict(inputs: Tensor<Float>) -> Tensor<Float> {
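For context, the diagnostic above suggests making the transfer explicit. A hedged sketch of what that might look like (the function body and the exact silencing behavior are assumptions; only `.toAccelerator()` comes from the diagnostic itself):

```swift
import TensorFlow

public func predict(inputs: Tensor<Float>) -> Tensor<Float> {
  // Making the copy explicit acknowledges the host -> accelerator
  // transfer, which is presumed to suppress the implicit-copy warning.
  let deviceInputs = inputs.toAccelerator()
  return deviceInputs * 2  // hypothetical model computation
}
```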
Hmm, I don't understand why such warnings appear in the first place -- there should be no sends/recvs here. Maybe this is due to the use of `mutating`?
This is because `predict(inputs:)` is being partitioned as a deabstraction scope. It's a public function.
Before we have a better inlining model, should we recommend that all public APIs be marked as "inline always"? Otherwise if user code calls a sequence of public APIs, that could generate a lot of sends/recvs.
> public APIs be marked as "inline always"?

I think you meant `@inlinable`. `@inline(__always)` is just local inlining and won't make the SIL function serialized.
I would not say requiring `@inlinable` is a good idea, because
- Requiring excessive annotation from the user breaks the mental model of "tensor code just works".
- It would delay shape checking for public APIs if they are not called in the current module. If any shape error occurs in the caller module, the error location would be unknown.
- Partitioning in the first place guarantees the function will at least run under all circumstances, instead of printing "__tfop_xxx cannot be lowered to LLVM IR!!" at run time. For example, if you use an opaque library function like `expectEqual` that makes a call to a tensor function through protocol conformance, the program fails because the function was never partitioned. This is unavoidable for generic functions, because generic functions cannot be partitioned and must be `@inlinable`, but non-generic functions can at least be partitioned and have a binary representation to execute reliably no matter how they're called.
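A minimal sketch of the distinction being drawn here (illustrative only; the function names and bodies are hypothetical):

```swift
import TensorFlow

// `@inline(__always)` is only a local inlining hint; the SIL body is not
// serialized, so callers in other modules still see an opaque call and the
// partitioner cannot deabstract through it there.
@inline(__always)
public func doubledLocal(_ x: Tensor<Float>) -> Tensor<Float> {
  return x * 2
}

// `@inlinable` serializes the SIL body into the module, so caller modules
// can inline (and the partitioner can deabstract) it -- at the cost of the
// concerns listed above, such as delayed shape checking.
@inlinable
public func doubledCrossModule(_ x: Tensor<Float>) -> Tensor<Float> {
  return x * 2
}
```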
I think this discussion is mostly orthogonal to the warnings proposal. Regardless of what causes such a function to get partitioned on its own, we need to make the warnings good when it does get partitioned.
Yep, it is orthogonal! But at the same time I hope this is providing some context on how argument send/receive warnings are emitted in the first place.
One middle ground I can think of is to have the compiler still partition inlinable functions outside the TF module. This would resolve both Mingsheng’s concerns and mine.
Richard's proposal SGTM.
I agree with Mark that what's partitionable is itself orthogonal to the warnings proposal. One related aspect: when we partition a public API that does not get called in user code, it would be nice not to generate send/recv warnings for it.
> warnings).
>
> I propose that we clean up the warnings as follows:
> 1. Emit no warnings for data transferred while the program is starting (e.g.
agree #1 is a good idea, but how do we implement it (how do we tell whether a swift->tf data transfer happens before the TF graph runs)?
Currently our use of "input args" and "result tensors" of the TF program serve the purpose of differentiating themselves from the sends/recvs, and thus generate no warning.
If we convert the input args and result tensors into tensor sends/recvs, I'm not sure how to reliably tell them apart from the other sends/recvs that happen while the TF program is running.
Also, if swift is being too slow sending the first tensor to tf, it'd still increase the end-to-end running time, and it might be useful to warn in that case (if possible).
I wanted to capture the current use of "input args" and "result tensors" in this rule, because it seems like part of the big picture.
Also it seems like they currently might be bugged in some way because a lot of the warnings in the example look like they should be input args or result tensors. So a big part of cleaning up the warnings might be debugging those.
Are we planning to eventually remove input args and result tensors and replace them with send/recv? If we do that, could we keep the logic that currently creates input args and result tensors, and use it to decide whether to warn?
> Also it seems like they currently might be bugged in some way because a lot of the warnings in the example look like they should be input args or result tensors. So a big part of cleaning up the warnings might be debugging those.
SGTM.
> Are we planning to eventually remove input args and result tensors and replace them with send/recv? If we do that, could we keep the logic that currently creates input args and result tensors, and use it to decide whether to warn?
I think that direction is worth exploration, but it's too early to commit to that. If we do go there, I agree that should probably not affect our warnings policy.
> the accelerator to consume.
>
> Without implementing the true round-trip-rule, we can denoise the warnings in
> that situation by suppressing warnings for scalar transfers. But the
And we have already implemented disabling warnings on scalar values -- should we still include that in our policy design here (instead of forcing us to choose between it and the new round-trip rule)?
I think that this proposal interacts pretty interestingly with disabled scalar warnings, because of the example in my comment above with the slow scalar transfer.
> Without implementing the true round-trip-rule, we can denoise the warnings in
> that situation by suppressing warnings for scalar transfers. But the
> round-trip-rule suppresses those warnings in a cleaner and more reliable way, so
> I propose that we eventually do implement the round-trip-rule.
another possible future direction that you mentioned in our earlier discussion is:
> Move sends/recvs diagnostics to a separate tool, possibly using runtime info (e.g. profile data).
Do you want to also briefly cover it in this doc, to establish the "space" of our design exploration?
Will add!
Sorry about the delayed reply. The doc is good to land from my perspective!
Great doc!
@marcrasi Feel free to land this as is and await further feedback!
Thanks for putting together this document! I understand how such warnings can be annoying, but I am leaning more towards letting the user know when data is moved without them explicitly specifying it.

I also see how currently Swift for TF is a bit loose about devices; there is a notion of a single accelerator and host, but in many applications we use multiple GPUs, for example. Allowing for customized device placement is in the plan I imagine, and so assuming that eventually we will be able to place ops on devices as we choose, I believe that those warnings will be very useful. Currently, I feel I often lose control of when TensorFlow (using the Python API) will move stuff to the CPU when allowing soft placement and also when using "fake" int32 kernels. Therefore, I believe that the warnings are very useful even though they may be a bit verbose. And there is also a way to prevent them, which is good.

One option may be to add such heuristics to prevent some of the warnings, but be able to not use them through, say, a compiler flag that controls verbosity for device placement warnings.
Things have changed a lot since I wrote this. I think it's mostly obsolete, so I'll close it. Thanks for the comments everyone!
Here's a proposal on how to clean up implicit copy warnings, based on some discussions with @mhong.