tensorboard won't deploy #59

dylanbannon · 2018-11-11T06:53:20Z

The tensorboard code that we merged into master this past week isn't fully functional. When deploying the current master branch on GKE, I observed that the helmfile deployment would ultimately fail because the tensorboard deployment held things up long enough for helmfile to time out. Upon closer inspection, it looks like the tensorboard container gets stuck pulling the image for up to 20 minutes... If there's no problem pulling this image (tensorflow/tensorflow:latest) in other settings, then I suspect this is some sort of cluster resource issue, perhaps insufficient disk space on some node.


dylanbannon · 2018-11-11T19:53:16Z

So, I just tested GKE cluster creation and tensorboard deployed without a problem... I know you've observed the hanging during image pulling before too, @willgraf. Maybe it's an intermittent issue?

I looked here
https://hub.docker.com/r/tensorflow/tensorflow/tags/
and noticed that the latest image for TensorFlow is only 480MB. I remember it being 3GB, though, right?

In general, is there a best practice regarding whether or not to pull latest versions of Docker images?

Until I hear back from you, @willgraf, I'm just going to ignore this as long as it agrees to stay hidden.

osterman · 2018-11-12T22:16:47Z

You can set the timeout in the helmfile.

helmDefaults:
  timeout: 1200

osterman · 2018-11-12T22:17:17Z

Also, when calling helm you can pass --timeout=1200 (which is what the helmfile default does)

willgraf · 2018-11-16T01:30:59Z

The tensorboard instance seems to run well; however, it cannot get the data from the bucket. This may be due to a conflict with the NodeJS/Express server, since both use the same routing.

In the console of tensorboard I see several errors:

Failed to decode downloaded font: http://35.230.25.91/font-roboto/oMMgfZMQthOryQo9n22dcuvvDin1pK8aKteLpeZ5c0A.woff2

and

Uncaught SyntaxError: Unexpected token < in JSON at position 0
    at JSON.parse (<anonymous>)
    at XMLHttpRequest.req.onload (tensorboard:39466)

and

OTS parsing error: invalid version tag

Maybe these errors are causing tensorboard to stop loading data (or they are evidence that tensorboard is fetching the data but is unable to render it).
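For reference, "Unexpected token < in JSON at position 0" generally means the response body began with "<", i.e. an HTML page arrived where JSON was expected. Below is a minimal Python sketch of that distinction. The sample bodies and the `classify_body` helper are made up for illustration and are not part of tensorboard or Express.

```python
import json

# Hypothetical response bodies, for illustration only.
HTML_BODY = "<!DOCTYPE html><html><body>Some other page</body></html>"
JSON_BODY = '{"runs": ["train", "validation"]}'


def classify_body(body: str) -> str:
    """Return 'json' if the body parses as JSON, 'html' if it starts
    with '<' (the situation behind "Unexpected token <"), else 'other'."""
    try:
        json.loads(body)
        return "json"
    except json.JSONDecodeError:
        # A body that begins with '<' is almost certainly markup, not data.
        if body.lstrip().startswith("<"):
            return "html"
        return "other"


print(classify_body(HTML_BODY))
print(classify_body(JSON_BODY))
```

If the data requests from tensorboard come back classified as "html", a different server is probably answering them, which would fit the shared-routing suspicion above.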

This Stack Overflow post makes me think it may have to do with our express server and our tensorboard being on the same load balancer? I can't explain how that would cause failure, but the top answer describes a similar set up to our own.

UPDATE: This issue is due to an ingress problem. When the trailing "/" is left off the URL /tensorboard/, the Express server attempts to process the page as well as the tensorboard server. This is not fully understood yet, but a separate issue has been opened for it, #68.
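One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the tensorboard page requests its assets through relative URLs, a browser resolves them against the page URL, and a missing trailing "/" drops the /tensorboard prefix. That would match the font URL in the error above, which has no /tensorboard/ segment. A small Python sketch of the standard URL resolution rules, using a hypothetical asset name:

```python
from urllib.parse import urljoin

# Hypothetical relative asset path; the real tensorboard asset names differ.
RELATIVE_ASSET = "font-roboto/example.woff2"

# Page URL without and with the trailing slash.
without_slash = urljoin("http://35.230.25.91/tensorboard", RELATIVE_ASSET)
with_slash = urljoin("http://35.230.25.91/tensorboard/", RELATIVE_ASSET)

print(without_slash)  # http://35.230.25.91/font-roboto/example.woff2
print(with_slash)     # http://35.230.25.91/tensorboard/font-roboto/example.woff2
```

In the first case the request lands outside the /tensorboard/ path, so whatever serves the root of the load balancer would answer it instead of tensorboard.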

This ingress issue is unrelated to the current issue of tensorboard not deploying.

willgraf · 2018-11-17T02:10:21Z

I think this issue will be resolved by #70 as it tackled many of the tensorboard issues.

UPDATE: I believe this issue has been resolved by #70. I'll wait for a few days but if there is not any further activity, this issue will be closed.

willgraf · 2019-04-15T17:34:22Z

Resolved by #70

dylanbannon assigned willgraf and dylanbannon Nov 11, 2018

dylanbannon added the bug (Something isn't working) label Nov 11, 2018

dylanbannon mentioned this issue Nov 11, 2018

Implement automated testing #60

Closed

dylanbannon added the question (Further information is requested) label Nov 11, 2018

willgraf closed this as completed Apr 15, 2019
