scalars: multiplex data fetches within a tag #4050

wchargin · 2020-08-18T17:11:21Z

Summary:
As of this patch, a tf-scalar-card can make just one network request
to fetch its data, instead of one request per time series (i.e., one
request per run, since each scalar chart renders a single tag). This
reduces network overhead, improves throughput due to higher request
concurrency, and offers the opportunity for data providers to more
efficiently request the data in batch.

This is implemented with a new POST route /scalars_multirun, since the
list of run names may be long. The frontend is configured to batch
requests to at most 64 runs at once, so the multiplexing is bounded.

This only affects the scalars plugin. Other scalar chart sources, like
PR curves, custom scalars, and the hparams metrics views, are not
affected.

Supersedes #3835, with the same idea and same general backend approach,
but using the frontend APIs enabled by #4045.

Test Plan:
On the hparams demo with four charts showing, each fetching 50 runs, we
now make only four requests as opposed to 200. On a Google-internal
networked data provider, this improves end-to-end time (measured from
“first spinner appears” to “last spinner disappears”) by about 50%, from
22±1 seconds to 11±1 seconds. (Before this patch, the network requests
were being queued all the way to the end of the test period.)

Changing the batch size to 4 and then running on a dataset with 14 runs
shows that the requests are properly batched, including the last one
with just 2 runs.

Testing hparams, custom scalars, and PR curves shows that they continue
to work, even when multiple time series are requested.

wchargin-branch: scalars-mux-runs

Summary: The data loader behavior maintains a fine-grained cache of key-value pairs. This is useful because when the set of requested keys expands, only the new keys need to be fetched. But, until now, the behavior has been hard-coded to fire a separate request for each key-value pair. Clients can have either fine-grained cache invalidation or efficient batched requests, but not both at the same time. This patch enriches the behavior to support just that. In a future change, the scalars dashboard will take advantage of this to batch requests for multiple runs and a single tag. In doing so, we need to shuffle around the API a bit. Instead of asking clients to provide `getDataLoadUrl: (Item) => string` plus a separate function `requestData: (url: string) => Promise<Data>` (where “`Item`” and “`Data`” are the keys and values, respectively), clients now provide a single function `requestData` that takes raw `Item`s (now plural), performs the request(s), and returns the data. The function provides a stream of key-value pairs, which is represented in callback style for convenience. (We don’t want to drag Observable into this.) The purpose of this approach, as opposed to a perhaps more natural approach that simply adapts `getDataLoadUrl` to return some kind of request struct with a callback to map a response into key-value pairs, is to accommodate the variety of existing clients. The structures get pretty wild: e.g., `tf-line-chart-data-loader` mixes in the behavior but doesn’t actually provide the needed properties; they’re provided instead by `tf-scalar-card`, but then `tf-hparams-session-group-details` further overrides some of those properties of `tf-scalar-card` with an entirely different backend. It’s a bit wordier for clients, but at least there are fewer moving pieces to keep track of. Test Plan: The scalars, custom scalars, distributions, histograms, and hparams dashboards all work. The fine-grained invalidation on the scalars dashboard works: e.g., set the tag filter to `mnist` and then to `mnist|hparams`, and watch only the hparams demo data load; then, set it to `hparams` and watch the MNIST charts disappear without any repaints to the hparams demo charts. The post-load callback properly causes scalar charts’ domains to adjust. The refresh button in the dashboard UI properly invalidates and re-fetches data. (Make sure to run with `TB_POLYMER3=1` until that’s the default.) wchargin-branch: dlb-batch-finegrained wchargin-source: 14ec8abde36c4563a4922209d361fc5bd16b6061

Summary: DO NOT SUBMIT until corresponding internal change is submitted. Test Plan: Manually tested; to be documented. wchargin-branch: scalars-mux-runs wchargin-source: 9459c8cc6bc0b041016dfc689306639a04080217

wchargin · 2020-08-19T04:07:54Z

DO NOT SUBMIT until corresponding internal change is submitted.

No internal change actually needed. (The relevant internal ML dashboard
is isolated from these changes due to the Polymer 3 migration.) I still
do want to better document and profile this, though, so leaving as draft
for the moment.

wchargin · 2020-08-19T04:28:02Z

With a networked data provider (Google-internally) on a dataset with
four scalar charts each with 50 runs, triggering load by using “Toggle
All Runs” to turn all runs from off to on (on clean load cache),
timing from “first spinner appears” to “last spinner disappears”,
I measure an improvement from 22±1 seconds to 11±1 seconds: about 50%.

wchargin-branch: scalars-mux-runs wchargin-source: 2e1bd5c8cd3393ac0a79c64b0c87ea43bf346b8a # Conflicts: # tensorboard/plugins/scalar/polymer3/tf_scalar_dashboard/tf-scalar-card.ts

wchargin-branch: scalars-mux-runs wchargin-source: 2e1bd5c8cd3393ac0a79c64b0c87ea43bf346b8a

wchargin-branch: scalars-mux-runs wchargin-source: 0484e726315d46c282c3d61994ffb9e4dc07c20b

wchargin-branch: scalars-mux-runs wchargin-source: f183c2962c3d464b6ac903b63d833f0c926fe162

stephanwlee · 2020-08-19T20:53:02Z

tensorboard/plugins/scalar/http_api.md

+Accepts form-encoded POST data with a (required) singleton key `tag` and a
+repeated key `runs`. Returns a JSON object mapping run names to arrays of the


What is the contract if I violate this constraint? (0 or 2+ tag)? Unlike the zero length runs case, it looks like we are throwing 400.

Also, do you think it is a good idea to perhaps write an example of the request?

Zero-length runs is fine; it just returns an empty object. Missing
tag is a 400. Duplicate tags will silently pick one of them
(probably last?). I can make that an error, though; probably a good
idea. Thanks.

Also, do you think it is a good idea to perhaps write an example of
the request?

Yep. Done.

Ah, I expected to see FormData but this works too :)

stephanwlee · 2020-08-20T19:49:26Z

tensorboard/plugins/scalar/polymer3/tf_scalar_dashboard/tf-scalar-card.ts

+        return this.requestManager.request(url, {tag, runs}).then((allData) => {
+          for (const run of runs) {
+            const item = {tag, run};
+            if (Object.prototype.hasOwnProperty.call(allData, run)) {


Any reason why we have to use Object.prototype.hasOwnProperty? (i.e., why not allData.hasOwnProperty?)

If you have a run called hasOwnProperty, then that would be a type
error.

> ({train: 1}).hasOwnProperty("train") true > ({train: 1, hasOwnProperty: 2}).hasOwnProperty("train") Thrown: TypeError: {(intermediate value)(intermediate value)}.hasOwnProperty is not a function > Object.prototype.hasOwnProperty.call({train: 1, hasOwnProperty: 2}, "train") true

Yeah.

I prefer to write this kind of code defensively—even if it seems
unlikely, it sometimes happens (e.g.: #1283).

One day https://github.com/hasOwnProperty will show up in a JSON
object mapping GitHub login names to database IDs, and someone will have
a fun time…

Got it. Thanks.

wchargin-branch: scalars-mux-runs wchargin-source: a4ca83ea5bf6b665052c076df835592105e5eefb

wchargin

thanks!

wchargin · 2020-08-20T20:40:33Z

tensorboard/plugins/scalar/polymer3/tf_scalar_dashboard/tf-scalar-card.ts

+        return this.requestManager.request(url, {tag, runs}).then((allData) => {
+          for (const run of runs) {
+            const item = {tag, run};
+            if (Object.prototype.hasOwnProperty.call(allData, run)) {


If you have a run called hasOwnProperty, then that would be a type
error.

> ({train: 1}).hasOwnProperty("train") true > ({train: 1, hasOwnProperty: 2}).hasOwnProperty("train") Thrown: TypeError: {(intermediate value)(intermediate value)}.hasOwnProperty is not a function > Object.prototype.hasOwnProperty.call({train: 1, hasOwnProperty: 2}, "train") true

Yeah.

I prefer to write this kind of code defensively—even if it seems
unlikely, it sometimes happens (e.g.: #1283).

wchargin · 2020-08-20T20:58:47Z

tensorboard/plugins/scalar/http_api.md

+Accepts form-encoded POST data with a (required) singleton key `tag` and a
+repeated key `runs`. Returns a JSON object mapping run names to arrays of the


Zero-length runs is fine; it just returns an empty object. Missing
tag is a 400. Duplicate tags will silently pick one of them
(probably last?). I can make that an error, though; probably a good
idea. Thanks.

Also, do you think it is a good idea to perhaps write an example of
the request?

Yep. Done.

stephanwlee · 2020-08-20T21:17:55Z

tensorboard/plugins/scalar/polymer3/tf_scalar_dashboard/tf-scalar-card.ts

+        return this.requestManager.request(url, {tag, runs}).then((allData) => {
+          for (const run of runs) {
+            const item = {tag, run};
+            if (Object.prototype.hasOwnProperty.call(allData, run)) {


Got it. Thanks.

stephanwlee · 2020-08-20T21:18:46Z

tensorboard/plugins/scalar/http_api.md

+Accepts form-encoded POST data with a (required) singleton key `tag` and a
+repeated key `runs`. Returns a JSON object mapping run names to arrays of the


Ah, I expected to see FormData but this works too :)

Summary: In #4050, we added multiplexing to scalar chart fetches for performance. This works nicely in local TensorBoard and public Colab, but the POST request machinery isn’t yet supported in the Google-internal version of Colab, so #4050 caused a regression there. This patch conditionally rolls back the change for Colab only. This includes rolling it back for public Colab, where it worked fine. We hope that this isn’t too big of a problem, since we expect that most Colab users are inspecting datasets with few runs (generated from within Colab) rather than massive hyperparameter sweeps. We also hope that we can simply revert this patch once the Google-internal behavior is fixed. Jupyter environments are unaffected (and still work). We used to have a global `window.TENSORBOARD_ENV` that listed whether we were in Colab, but we removed that in #2798 because we thought that it was no longer necessary. Since in that PR we switched to create an iframe rather than manually linking and loading the HTML (much nicer, to be sure), we now plumb this information via a query parameter. This also has the advantage that it’s easy to test the Colab codepaths by simply adding that query parameter to a normal TensorBoard instance. Test Plan: First, test that normal TensorBoard (`bazel run //tensorboard`) still works with multiplexing, and that adding `?tensorboardColab=true` causes it to send multiple GET requests instead. Then, build the Pip package (`bazel run //tensorboard/pip_package:extract_pip_packages`), upload it to public Colab, and install it into the runtime. Verify that the scalar charts still work there, albeit without multiplexing. Finally, cherry-pick these changes into google3 via a test sync and follow the test plan at <http://cl/333398676> to verify the fix. wchargin-branch: scalars-nomux-colab wchargin-source: 453503f1dea196985e889be483c2a7675cc87aa1

Summary: In #4050, we added multiplexing to scalar chart fetches for performance. This works nicely in local TensorBoard and public Colab, but the POST request machinery isn’t yet supported in the Google-internal version of Colab, so #4050 caused a regression there. This patch conditionally rolls back the change for Colab only. This includes rolling it back for public Colab, where it worked fine. We hope that this isn’t too big of a problem, since we expect that most Colab users are inspecting datasets with few runs (generated from within Colab) rather than massive hyperparameter sweeps. We also hope that we can simply revert this patch once the Google-internal behavior is fixed. Jupyter environments are unaffected (and still work). We used to have a global `window.TENSORBOARD_ENV` that listed whether we were in Colab, but we removed that in #2798 because we thought that it was no longer necessary. Since in that PR we switched to create an iframe rather than manually linking and loading the HTML (much nicer, to be sure), we now plumb this information via a query parameter. This also has the advantage that it’s easy to test the Colab codepaths by simply adding that query parameter to a normal TensorBoard instance. Fixes #4174. Test Plan: First, test that normal TensorBoard (`bazel run //tensorboard`) still works with multiplexing, and that adding `?tensorboardColab=true` causes it to send multiple GET requests instead. Then, build the Pip package (`bazel run //tensorboard/pip_package:extract_pip_packages`), upload it to public Colab, and install it into the runtime. Verify that the scalar charts still work there, albeit without multiplexing. Finally, cherry-pick these changes into google3 via a test sync and follow the test plan at <http://cl/333398676> to verify the fix. wchargin-branch: scalars-nomux-colab

wchargin added 2 commits August 18, 2020 02:47

scalars: multiplex data fetches within a tag

beff777

Summary: DO NOT SUBMIT until corresponding internal change is submitted. Test Plan: Manually tested; to be documented. wchargin-branch: scalars-mux-runs wchargin-source: 9459c8cc6bc0b041016dfc689306639a04080217

wchargin added plugin:scalars theme:performance Performance, scalability, large data sizes, slowness, etc. labels Aug 18, 2020

googlebot added the cla: yes label Aug 18, 2020

wchargin mentioned this pull request Aug 18, 2020

data-loader-behavior: add fine-grained batching support #4045

Merged

Base automatically changed from wchargin-dlb-batch-finegrained to master August 19, 2020 17:18

wchargin added 3 commits August 19, 2020 10:18

[update diffbase]

f4ddbf7

wchargin-branch: scalars-mux-runs wchargin-source: 2e1bd5c8cd3393ac0a79c64b0c87ea43bf346b8a # Conflicts: # tensorboard/plugins/scalar/polymer3/tf_scalar_dashboard/tf-scalar-card.ts

[update patch]

50c7112

wchargin-branch: scalars-mux-runs wchargin-source: 2e1bd5c8cd3393ac0a79c64b0c87ea43bf346b8a

[update patch]

a80c564

wchargin-branch: scalars-mux-runs wchargin-source: 0484e726315d46c282c3d61994ffb9e4dc07c20b

wchargin closed this Aug 19, 2020

wchargin reopened this Aug 19, 2020

[bump ci]

aaa17f9

wchargin-branch: scalars-mux-runs wchargin-source: f183c2962c3d464b6ac903b63d833f0c926fe162

wchargin closed this Aug 19, 2020

wchargin reopened this Aug 19, 2020

wchargin requested a review from stephanwlee August 19, 2020 20:32

stephanwlee approved these changes Aug 20, 2020

View reviewed changes

[update patch]

534702f

wchargin-branch: scalars-mux-runs wchargin-source: a4ca83ea5bf6b665052c076df835592105e5eefb

wchargin commented Aug 20, 2020

View reviewed changes

stephanwlee approved these changes Aug 20, 2020

View reviewed changes

wchargin merged commit 0bf9c12 into master Aug 20, 2020

wchargin deleted the wchargin-scalars-mux-runs branch August 20, 2020 22:05

wchargin mentioned this pull request Aug 24, 2020

Scalars: Multiplex fetch (one tag, many runs). #3835

Closed

wchargin mentioned this pull request Sep 23, 2020

scalars: don’t multiplex fetches in Colab #4191

Merged

cliffwoolley mentioned this pull request Feb 13, 2021

TB2.4+ POST to scalar_multirun omits XSRF header, blocking use of TensorBoard under JupyterLab #4685

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scalars: multiplex data fetches within a tag #4050

scalars: multiplex data fetches within a tag #4050

wchargin commented Aug 18, 2020 •

edited

wchargin commented Aug 19, 2020

wchargin commented Aug 19, 2020

stephanwlee Aug 19, 2020

wchargin Aug 20, 2020

stephanwlee Aug 20, 2020

stephanwlee Aug 20, 2020

wchargin Aug 20, 2020

wchargin Aug 20, 2020

stephanwlee Aug 20, 2020

wchargin left a comment

wchargin Aug 20, 2020

wchargin Aug 20, 2020

stephanwlee Aug 20, 2020

stephanwlee Aug 20, 2020

		Accepts form-encoded POST data with a (required) singleton key `tag` and a
		repeated key `runs`. Returns a JSON object mapping run names to arrays of the

scalars: multiplex data fetches within a tag #4050

scalars: multiplex data fetches within a tag #4050

Conversation

wchargin commented Aug 18, 2020 • edited

wchargin commented Aug 19, 2020

wchargin commented Aug 19, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wchargin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wchargin commented Aug 18, 2020 •

edited