Skip to content

Conversation

@stephanwlee
Copy link
Contributor

This change has following commits:

Merge: rebase (not squash).

@google-cla
Copy link

google-cla bot commented Apr 9, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@stephanwlee
Copy link
Contributor Author

RE CLA: commits are made by Googlers.

wchargin and others added 2 commits April 9, 2021 10:21
Summary:
Issue tensorflow#4844 shows that there are circumstances when communication
between TensorBoard and the local data server process cannot be
established. As of this patch, we send a trivial RPC to the data server
and wait for its response before committing to use the new loading
paths. This adds about 1–5ms total time to the happy path on my machine,
with a worst case penalty of 5s due to the timeout. If the server is not
reachable, we print a warning and fall back to the legacy paths.

Test Plan:

  - Test with normal working TensorBoard.

  - Simulate failed data server connection by changing the definition of
    `addr` to `"localhost:%d" % (port + 777)` on line 175. This should
    print the `UNAVAILABLE`/“failed to connect” message from tensorflow#4844.

  - Simulate slow data server by adding the following to `cli.rs`:

    ```rust
    tokio::time::sleep(Duration::from_secs(3)).await;
    ```

    Add this right before the `Server::builder().(...).await?` call at
    the end of `main`: i.e., after we write the port file, but before we
    actually respond to requests. Note that TensorBoard still works with
    a 3-second delay, but that it actually delays printing the startup
    message for those 3 seconds as it determines which data provider to
    use.

  - Simulate extra-slow data provider as above but waiting 6 seconds,
    and note that TensorBoard prints a `DEADLINE_EXCEEDED` error, falls
    back to the legacy paths, and shows valid data.

wchargin-branch: data-liveness-check
wchargin-source: 7fd916ebcf426d92babaeb57c38789d47446df6e
@google-cla
Copy link

google-cla bot commented Apr 9, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added cla: no and removed cla: yes labels Apr 9, 2021
@stephanwlee stephanwlee closed this Apr 9, 2021
@stephanwlee stephanwlee deleted the 2.5 branch April 9, 2021 19:05
@stephanwlee stephanwlee restored the 2.5 branch April 9, 2021 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants