Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Confirm that the connection to tensorboard works or change to localhost #2371

Merged
merged 7 commits into from
Mar 13, 2020

Conversation

emonigma
Copy link
Contributor

@emonigma emonigma commented Jun 20, 2019

  • Motivation for features / changes

The TensorBoard log told me to go to http://laptop-name.corp.domain.tld:6006 and that connection failed, but http://localhost:6006 worked.

  • Technical description of changes

I tested the connection to the host with socket(), and if it is closed, the host becomes localhost.

  • Screenshots of UI changes

No UI changes.

  • Detailed steps to verify changes work correctly (as executed by you)

I ran export PYTHONPATH="$PYTHONPATH:$HOME/code/tensorflow:$HOME/code:tensorboard"; python3 tensorboard/program.py, which threw this unrelated error (and both repos are in sync with upstream):

  File "/Users/mmorin/code/tensorboard/tensorboard/data_compat.py", line 24, in <module>
    from tensorboard.compat.proto import summary_pb2
ImportError: cannot import name 'summary_pb2'
  • Alternate designs / implementations considered

None.

Copy link
Collaborator

@nfelt nfelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on :)

# Confirm that the connection is open, otherwise change to `localhost`
socket.setdefaulttimeout(1)
try:
socket.socket().connect((display_host, self.server_port))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use socket.create_connection() instead - it's a higher level API that should handle, for example, IPv4 and IPv6 (rather than picking just one, at which point if the hostname is only available on the other protocol it will incorrectly fall back to localhost).

Also, I think it only makes sense to do this fallback when the user didn't specify an explicit host via the --host flag, since if --host was passed we should use it without modification. That would mean moving this fallback code up into the first case of the if statement above, where we do socket.gethostname(). And actually it's probably better to replace socket.gethostname() with socket.getfqdn() so we try to resolve the hostname to a reachable domain name first if we can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree and addressed this in 88e25, 6a5fc8, and 6c796 respectively.

@@ -526,6 +526,14 @@ def get_url(self):
host = self._flags.host
display_host = (
'[%s]' % host if ':' in host and not host.startswith('[') else host)

# Confirm that the connection is open, otherwise change to `localhost`
socket.setdefaulttimeout(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than changing the default for the entire socket module, let's use .settimeout() on the individual socket object - or better yet, pass it as an argument into create_connection() per the comment below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted, fixed in 1ae50.

socket.setdefaulttimeout(1)
try:
socket.socket().connect((display_host, self.server_port))
except socket.timeout as e:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's catch socket.error generally so that if there is some other error attempting the connection (not a timeout), we fall back to localhost in that case too. For example, it looks like if the DNS name doesn't resolve at all, this line fails with socket.gaierror rather than socket.timeout:

>>> socket.socket().connect(("foobartensorboard.com", 6006))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
socket.gaierror: [Errno -2] Name or service not known

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well spotted, fixed in 71307.


# Confirm that the connection is open, otherwise change to `localhost`
socket.setdefaulttimeout(1)
try:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that creating the URL involves an actual network request, let's avoid doing that on each call to get_url() and just cache the constructed URL in a property and return that if it's called another time.

Copy link
Contributor Author

@emonigma emonigma Jul 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I proposed a solution in bf05d. Another solution could be to move the code into __init__() of class WerkzeugServer and change get_url() to only the last two lines of the current function (return 'http:// ... .rstrip('/'))).. Also, why do we have the property _auto_wildcard if it's simply not host?

Copy link
Collaborator

@nfelt nfelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience, I was on vacation for a while.

FYI, we're planning on soon changing the default behavior of TensorBoard to only serve on localhost, in which case the URL printed out would use "localhost" always in that case. (Basically, it'll be like passing --host=localhost by default.) So after that change, the logic in this PR would only affect cases where users specifically opt-in to the wildcard behavior.

else:
host = self._flags.host
display_host = (
if not self.display_host:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make the variable self._url instead and just store the full URL.

It would probably read better if we factor out something like get_url_host() e.g.

def get_url_host(self):
  if not self._auto_wildcard:
    return self._flags.host
  host = socket.getfqdn()
  try:
    socket.create_connection((host, self.server_port), timeout=1)
    return host
  except socket.error as e:
    return 'localhost'

def get_url(self):
  if not self._url:
    host = self.get_url_host()
    display_host = (
        '[%s]' % host if ':' in host and not host.startswith('[') else host)
    self._url = 'http://%s:%d%s/' % (self.display_host, self.server_port,
        self._flags.path_prefix.rstrip('/'))
  return self._url

if not self.display_host:

if self._auto_wildcard:
display_host = socket.getfqdn()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just "host" is a better name here since this is still a real hostname; the name "display_host" was meant to be only for the host after adding brackets around an IPv6 address so it renders correctly when printed out in the URL. See also suggestion above for a get_url_host() helper.

@emonigma
Copy link
Contributor Author

emonigma commented Dec 7, 2019

@nfelt Thanks for your patience too, I was overwhelmed for a while. Given the move of tensorboard and the conflicts with the base branch, does this PR still make sense, or shall I open a new one from the new codebase?

@nfelt
Copy link
Collaborator

nfelt commented Dec 12, 2019

@miguelmorin Thanks for following up - I might be misunderstanding but I don't think anything actually moved? There was a behavior change in #2589 though that as mentioned in my last comment makes it so that now by default the URL would use localhost anyway in all cases (unless passing --bind_all or an explicit hostname).

This PR would still affect the default value shown when using --bind_all but it's more of an edge case now. If you're still interested in landing this and want to update the PR I think we're still open to it, but also fine if you'd prefer to drop it.

@emonigma
Copy link
Contributor Author

I misnamed the change of default behavior as a "move". Yes, I still want to update the PR for this edge case. What is simpler for you: that I start a new branch from the latest code or that I rebase master onto this branch?

Iin #2589 you mention "wildcard dual-binding behavior": is that when TensorBoard "defaults to serving on the entire local network"?

@nfelt
Copy link
Collaborator

nfelt commented Jan 2, 2020

What is simpler for you: that I start a new branch from the latest code or that I rebase master onto this branch?

@miguelmorin the easiest option is probably if you can rebase your local changes onto latest master and then force-push that branch to minguelmorin:localhost to update this PR. Or if you prefer you can close this and open a new PR, either way.

Iin #2589 you mention "wildcard dual-binding behavior": is that when TensorBoard "defaults to serving on the entire local network"?

Yes, this was the default behavior prior to 2.0.

@emonigma
Copy link
Contributor Author

Hi @nfelt, I've finally made time to finish this. The rebasing was straightforward and I force-pushed the branch. Your requested changes are still marked as unsolved and those commits are now gone. How would you like to proceed?

Copy link
Collaborator

@nfelt nfelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for resuming this! Mostly looks good, just a couple last comments.

@@ -597,6 +597,7 @@ def __init__(self, wsgi_app, flags):
host = "localhost"

self._host = host
self.display_host = None # Will be set by get_url() below
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just cache the entire URL for this, so self._url.

That way we can skip even the last formatting step in get_url() if we already have generated the URL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I cached the entire URL in 3f99276.

if not self.display_host:
if self._auto_wildcard:
self.display_host = socket.getfqdn()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we leave out the blank lines inside the function, since it's not that long?

If we want separation for clarity I'd recommend just factoring out a private helper function as described in the previous comment.

Copy link
Contributor Author

@emonigma emonigma Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean just removing line 734, which I did in f3bbb87? If not, I don't see you mean by "as described in the previous comment". And what do you mean by "nit: "?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I had in mind both blank lines (so also 742), but it's fine.

The second part of the comment was an alternative suggestion, that for the purpose of clarifying the sub-parts of the function, instead of setting them off with blank lines, to just refactor it into smaller functions. By "previous comment" I meant from earlier in the review thread (#2371 (comment)).

Anyway, sorry that was a bit confusing, this is fine as is.

if not self.display_host:
if self._auto_wildcard:
self.display_host = socket.getfqdn()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I had in mind both blank lines (so also 742), but it's fine.

The second part of the comment was an alternative suggestion, that for the purpose of clarifying the sub-parts of the function, instead of setting them off with blank lines, to just refactor it into smaller functions. By "previous comment" I meant from earlier in the review thread (#2371 (comment)).

Anyway, sorry that was a bit confusing, this is fine as is.

@emonigma
Copy link
Contributor Author

Sorry, I missed that second part of the 18 July comment. My changes already seem outdated, do you want me pull the master branch and do a manual rebase of my changes?

@nfelt
Copy link
Collaborator

nfelt commented Mar 13, 2020

I don't see any conflicts with the base branch? I can go ahead and merge now unless there were further changes you were going to make.

@emonigma
Copy link
Contributor Author

emonigma commented Mar 13, 2020

No, I don't see conflicts with the base branch. I see tensorboard/program.py Outdated above the code in your comment under commit 2b6067a. Is the notification of "outdated" itself outdated? If so, I'm fine with merging.

@nfelt
Copy link
Collaborator

nfelt commented Mar 13, 2020

Outdated just means the comment is on code that isn't at the most recent commit touching that file; it's fine in this case. Will merge.

@nfelt nfelt merged commit 9465b89 into tensorflow:master Mar 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants