Skip to content

Conversation

@stephanwlee
Copy link
Contributor

This PR is created to see and diagnose potential CI quirks.

This change has following commits:

version bump to 2.5.0.
cherrypick of #4851.
Merge: rebase (not squash).

wchargin and others added 2 commits April 9, 2021 10:21
Summary:
Issue tensorflow#4844 shows that there are circumstances when communication
between TensorBoard and the local data server process cannot be
established. As of this patch, we send a trivial RPC to the data server
and wait for its response before committing to use the new loading
paths. This adds about 1–5ms total time to the happy path on my machine,
with a worst case penalty of 5s due to the timeout. If the server is not
reachable, we print a warning and fall back to the legacy paths.

Test Plan:

  - Test with normal working TensorBoard.

  - Simulate failed data server connection by changing the definition of
    `addr` to `"localhost:%d" % (port + 777)` on line 175. This should
    print the `UNAVAILABLE`/“failed to connect” message from tensorflow#4844.

  - Simulate slow data server by adding the following to `cli.rs`:

    ```rust
    tokio::time::sleep(Duration::from_secs(3)).await;
    ```

    Add this right before the `Server::builder().(...).await?` call at
    the end of `main`: i.e., after we write the port file, but before we
    actually respond to requests. Note that TensorBoard still works with
    a 3-second delay, but that it actually delays printing the startup
    message for those 3 seconds as it determines which data provider to
    use.

  - Simulate extra-slow data provider as above but waiting 6 seconds,
    and note that TensorBoard prints a `DEADLINE_EXCEEDED` error, falls
    back to the legacy paths, and shows valid data.

wchargin-branch: data-liveness-check
wchargin-source: 7fd916ebcf426d92babaeb57c38789d47446df6e
@google-cla
Copy link

google-cla bot commented Apr 9, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added the cla: no label Apr 9, 2021
@google-cla
Copy link

google-cla bot commented Apr 9, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@wchargin
Copy link
Contributor

wchargin commented Apr 9, 2021

@googlebot I consent.

@google-cla google-cla bot added cla: yes and removed cla: no labels Apr 9, 2021
@stephanwlee stephanwlee marked this pull request as ready for review April 9, 2021 20:11
@stephanwlee stephanwlee requested a review from bmd3k April 13, 2021 17:48
@stephanwlee stephanwlee merged commit b186ff1 into tensorflow:2.5 Apr 15, 2021
@stephanwlee stephanwlee deleted the 2.5 branch April 15, 2021 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants