Hangs on startup, then throws error #6265

Closed
TomiOhl opened this issue Nov 28, 2023 · 6 comments · Fixed by #6274
Labels
bug (Something isn't working), needs repro case

Comments

TomiOhl commented Nov 28, 2023

Expected Behavior

When I run tilt up, it should start normally.

Current Behavior

When I run tilt up, it first hangs for around 30 seconds, then throws an error:

[controller-runtime] log.SetLogger(...) was never called; logs will not be displayed.
Detected at:
	>  goroutine 1 [running]:
	>  runtime/debug.Stack()
	>  	/usr/local/go/src/runtime/debug/stack.go:24 +0x65
	>  sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
	>  	/root/project/vendor/sigs.k8s.io/controller-runtime/pkg/log/log.go:60 +0xcd
	>  sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithName(0xc0003114c0, {0x477c332, 0x14})
	>  	/root/project/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:147 +0x4c
	>  github.com/go-logr/logr.Logger.WithName({{0x52b8c40, 0xc0003114c0}, 0x0}, {0x477c332?, 0x0?})
	>  	/root/project/vendor/github.com/go-logr/logr/logr.go:336 +0x46
	>  sigs.k8s.io/controller-runtime/pkg/client.newClient(0x6dcb7e8?, {0xc001aed890, 0xc0004bbab0, {0x52bb840, 0xc001ae45c0}, 0xc001aed830, {0x0, 0x0}, 0x0})
	>  	/root/project/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:122 +0xff
	>  sigs.k8s.io/controller-runtime/pkg/client.New(0xc001aed890?, {0xc001aed890, 0xc0004bbab0, {0x52bb840, 0xc001ae45c0}, 0xc001aed830, {0x0, 0x0}, 0x0})
	>  	/root/project/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:103 +0x85
	>  sigs.k8s.io/controller-runtime/pkg/cluster.New(0xc000d1f8c0, {0xc000dc7d90, 0x1, 0x0?})
	>  	/root/project/vendor/sigs.k8s.io/controller-runtime/pkg/cluster/cluster.go:227 +0x59b
	>  sigs.k8s.io/controller-runtime/pkg/manager.New(_, {0xc0004bbab0, 0x0, {0x0, 0x0, {0x0, 0x0}, 0x0, 0x0, 0x0, ...}, ...})
	>  	/root/project/vendor/sigs.k8s.io/controller-runtime/pkg/manager/manager.go:322 +0xf4
	>  github.com/tilt-dev/tilt/internal/controllers.(*TiltServerControllerManager).SetUp(0xc000301f40, {0x52aebf0?, 0xc000e4ef30?}, {0x52af9b8?, 0xc0006231a0?})
	>  	/root/project/internal/controllers/manager.go:72 +0x435
	>  github.com/tilt-dev/tilt/internal/store.(*subscriberEntry).maybeSetUp(0xc000e4eb40, {0x52aebf0, 0xc000e4ef30}, {0x52af9b8, 0xc0006231a0})
	>  	/root/project/internal/store/subscriber.go:235 +0x111
	>  github.com/tilt-dev/tilt/internal/store.(*subscriberList).SetUp(0xc000c12030, {0x52aebf0, 0xc000e4ef30}, {0x52af9b8, 0xc0006231a0})
	>  	/root/project/internal/store/subscriber.go:104 +0x152
	>  github.com/tilt-dev/tilt/internal/store.(*Store).Loop(0xc0006231a0, {0x52aebf0?, 0xc000e4ef30?})
	>  	/root/project/internal/store/store.go:138 +0x7e
	>  github.com/tilt-dev/tilt/internal/engine.Upper.Init({0x432dd80?}, {0x52aebf0, 0xc000e4ef30}, {{0xc0009c9e80, 0x3d}, {0xc00043b910, 0x1, 0x1}, {0x6f2e928, 0x0, ...}, ...})
	>  	/root/project/internal/engine/upper.go:106 +0x68
	>  github.com/tilt-dev/tilt/internal/engine.Upper.Start({0xc0007c3f00?}, {0x52aebf0, 0xc000e4ef30}, {0x6f2e928, 0x0, 0x0}, {{0x52657dc, 0x6}, {0x0, 0x0}, ...}, ...)
	>  	/root/project/internal/engine/upper.go:91 +0x290
	>  github.com/tilt-dev/tilt/internal/cli.(*upCmd).run(0xc000a69bf0, {0x52aeb48?, 0xc000345270?}, {0x6f2e928, 0x0, 0x0})
	>  	/root/project/internal/cli/up.go:168 +0x85a
	>  github.com/tilt-dev/tilt/internal/cli.addCommand.func1(0xc000958600?, {0x6f2e928, 0x0, 0x0})
	>  	/root/project/internal/cli/cli.go:156 +0xb1
	>  github.com/spf13/cobra.(*Command).execute(0xc000958600, {0x6f2e928, 0x0, 0x0})
	>  	/root/project/vendor/github.com/spf13/cobra/command.go:944 +0x847
	>  github.com/spf13/cobra.(*Command).ExecuteC(0xc000958000)
	>  	/root/project/vendor/github.com/spf13/cobra/command.go:1068 +0x3bd
	>  github.com/spf13/cobra.(*Command).Execute(...)
	>  	/root/project/vendor/github.com/spf13/cobra/command.go:992
	>  github.com/spf13/cobra.(*Command).ExecuteContext(...)
	>  	/root/project/vendor/github.com/spf13/cobra/command.go:985
	>  github.com/tilt-dev/tilt/internal/cli.Execute()
	>  	/root/project/internal/cli/cli.go:96 +0xf36
	>  main.main()
	>  	/root/project/cmd/tilt/main.go:19 +0xde

After the error, though, it starts and works correctly.

Steps to Reproduce

  1. Go to a directory containing a Tilt project
  2. Run tilt up

Context

tilt doctor Output

$ tilt doctor
Tilt: v0.33.6, built 2023-09-29
System: linux-amd64
---
Docker
- Host: [default]
- Server Version: 
- API Version: 
- Builder: 
- Compose Version: v2.21.0
---
Kubernetes
- Env: gke
- Context: gke_[redacted]-cluster
- Cluster Name: gke_[redacted]-cluster
- Namespace: default
- Container Runtime: read-failure
- Version: Error: Get "https://172.23.0.2/version": dial tcp 172.23.0.2:443: i/o timeout
- Cluster Local Registry: none
...

TomiOhl added the bug (Something isn't working) label Nov 28, 2023
nicks (Member) commented Nov 28, 2023

hmmm...i think the warning stack trace is downstream of the problem. i think that might just be the underlying libraries reporting a slow boot sequence.

I'm more concerned about the tilt doctor output, which reports that you don't have a Docker daemon or a Kubernetes cluster available. so i suspect it's waiting on that?

nicks (Member) commented Nov 28, 2023

does the problem still happen if you have a working docker daemon and kubernetes cluster?

TomiOhl (Author) commented Nov 29, 2023

Hmm, yes, that might be the issue; however, I'm not sure how to check this.
If I run systemctl status docker, this is the output:

● docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2023-11-27 13:05:02 CET; 2 days ago
TriggeredBy: ● docker.socket
       Docs: https://docs.docker.com
   Main PID: 2013 (dockerd)
      Tasks: 26
     Memory: 926.9M
        CPU: 2min 18.457s
     CGroup: /system.slice/docker.service
             └─2013 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

nov 29 15:04:00 ThinkPad-T16 dockerd[2013]: time="2023-11-29T15:04:00.850027047+01:00" level=error msg="Error running exec 8fe95732cdf13128e74486a5c2f6396531f9f58147b0ab6cf4f21784fcb528f7 in container: Error: Exec command 8fe95732cdf13128e74486a5c2f6396531f9f58147b0ab6cf4f21784fcb528f7 is already running"
nov 29 15:04:01 ThinkPad-T16 dockerd[2013]: time="2023-11-29T15:04:01.871361112+01:00" level=info msg="ignoring event" container=a45053c45558d526a57ce45f46b8c05877623a67a86e4727184a81a0e330aa1c module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
nov 29 15:04:02 ThinkPad-T16 dockerd[2013]: time="2023-11-29T15:04:02.413869979+01:00" level=info msg="ignoring event" container=52826d1d42b659eb3abcd80082a840e2e6d747ecc797fea342cda37b75c5097e module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
nov 29 15:06:47 ThinkPad-T16 dockerd[2013]: time="2023-11-29T15:06:47.543594261+01:00" level=warning msg="no trace recorder found, skipping"
nov 29 15:07:10 ThinkPad-T16 dockerd[2013]: time="2023-11-29T15:07:10.431103182+01:00" level=warning msg="no trace recorder found, skipping"
nov 29 15:07:10 ThinkPad-T16 dockerd[2013]: time="2023-11-29T15:07:10.953142240+01:00" level=info msg="ignoring event" container=9dab649c0928b2e202531a9b75803372290931f402317e3ef69aa402edc31838 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
nov 29 15:07:10 ThinkPad-T16 dockerd[2013]: time="2023-11-29T15:07:10.962103433+01:00" level=warning msg="ShouldRestart failed, container will not be restarted" container=9dab649c0928b2e202531a9b75803372290931f402317e3ef69aa402edc31838 daemonShuttingDown=false error="restart canceled" execDuration=3m15.056318102s exitStatus="{137 2023-11-29 14:07:10.942968647 +0000 UTC}" hasBeenManuallyStopped=true restartCount=0
nov 29 15:07:11 ThinkPad-T16 dockerd[2013]: time="2023-11-29T15:07:11.714700661+01:00" level=info msg="ignoring event" container=99e3ae6ecaa7d2da0b86e546939756611d6ab7c7e0ecdfff698a340ae7f38906 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
nov 29 15:07:11 ThinkPad-T16 dockerd[2013]: time="2023-11-29T15:07:11.723010997+01:00" level=warning msg="ShouldRestart failed, container will not be restarted" container=99e3ae6ecaa7d2da0b86e546939756611d6ab7c7e0ecdfff698a340ae7f38906 daemonShuttingDown=false error="restart canceled" execDuration=3m10.881484738s exitStatus="{137 2023-11-29 14:07:11.704780356 +0000 UTC}" hasBeenManuallyStopped=true restartCount=0
nov 29 15:07:12 ThinkPad-T16 dockerd[2013]: time="2023-11-29T15:07:12.772556222+01:00" level=info msg="ignoring event" container=35575ec3e216527fd210848b003b4e2c0f5eba34218f542217430b6e53cc6fa9 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

The logs are from the time when I tried the demo project.
What's weird is that everything seems to work after the error message.
In fact, I've just tried the tilt demo command, which should create its own Kubernetes cluster, and it also has the slow boot + error message problem.

nicks (Member) commented Nov 29, 2023

hmmm i played around with different combos of dead docker daemons and dead clusters, and wasn't able to repro.

could you send a SIGABRT to tilt while it's hanging and send us the goroutine trace? e.g., run tilt up, then kill -abrt $(pgrep -f "tilt up"). That will tell us where it's getting stuck.

(NOTE: the full goroutine trace may be very large)

TomiOhl (Author) commented Nov 30, 2023

Sure! Here is the output (ran tilt up, waited a few seconds and then aborted it): out.txt

nicks (Member) commented Nov 30, 2023

Thanks! The stack trace shows that the kubectl client libraries are hanging forever waiting on a TCP connection from gcp. You probably have a firewalled or non-functional control plane, as described in this doc: https://cloud.google.com/kubernetes-engine/docs/troubleshooting#kubectl-times-out

We should add some checks for this case in tilt. But for now you can work around this by changing your kubectl context to something else.
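
As an illustration of the fail-fast idea only (not necessarily what the eventual fix does), here is a minimal client-go sketch that caps the request timeout so an unreachable control plane errors out quickly instead of hanging; the 5-second value is an arbitrary assumption:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the active kubeconfig context the same way kubectl does.
	cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
		clientcmd.NewDefaultClientConfigLoadingRules(),
		&clientcmd.ConfigOverrides{},
	).ClientConfig()
	if err != nil {
		panic(err)
	}

	// Cap every API request; an unreachable control plane then fails
	// after 5 seconds instead of blocking startup indefinitely.
	cfg.Timeout = 5 * time.Second

	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// A cheap reachability probe against the control plane.
	if _, err := client.Discovery().ServerVersion(); err != nil {
		fmt.Println("cluster unreachable:", err)
	}
}
```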

nicks added a commit to nicks/tilt that referenced this issue Dec 4, 2023
this should make tilt faster to load when connecting to a remote
or non-responsive k8s cluster

fixes tilt-dev#6265

Signed-off-by: Nick Santos <nick.santos@docker.com>
nicks added a commit to nicks/tilt that referenced this issue Dec 4, 2023
this has been noticed in a few places, including
tilt-dev#6265

all users of controller-runtime now need to set a default logger

Signed-off-by: Nick Santos <nick.santos@docker.com>
nicks added a commit that referenced this issue Dec 4, 2023
this should make tilt faster to load when connecting to a remote
or non-responsive k8s cluster

fixes #6265

Signed-off-by: Nick Santos <nick.santos@docker.com>
nicks added a commit that referenced this issue Dec 4, 2023
this has been noticed in a few places, including
#6265

all users of controller-runtime now need to set a default logger

Signed-off-by: Nick Santos <nick.santos@docker.com>
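
For context on the second commit ("all users of controller-runtime now need to set a default logger"), here is a minimal sketch of what that means; the exact logger Tilt wires in may differ:

```go
package main

import (
	"github.com/go-logr/logr"
	ctrllog "sigs.k8s.io/controller-runtime/pkg/log"
)

func main() {
	// Registering any logr.Logger early in startup suppresses the
	// "log.SetLogger(...) was never called" warning shown above.
	// logr.Discard() drops all output; a real setup would pass a
	// logger backed by zap, logrus, etc.
	ctrllog.SetLogger(logr.Discard())

	// ... construct controller-runtime managers/clients as usual ...
}
```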