Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bass sometimes fails to start on first run after computer boot #220

Closed
axvr opened this issue Jul 13, 2022 · 6 comments
Closed

Bass sometimes fails to start on first run after computer boot #220

axvr opened this issue Jul 13, 2022 · 6 comments
Labels
bug Something isn't working

Comments

@axvr
Copy link

axvr commented Jul 13, 2022

I'm absolutely loving Bass so far (it easily makes it into my top 5 open source projects! 🏆). Unfortunately I've been encountering this issue (minor inconvenience) with the new automatic starting of the Buildkit daemon.

Describe the bug

Often after a computer reboot, running bass to start the REPL, exits with status 1 and prints the following message.

$ bass
init buildkit runtime for platform os=linux, arch=any: list buildkit workers: failed to list workers: Unavailable: connection closed before server preface received

Tip: if this error is too cryptic, please open an issue:

  https://github.com/vito/bass/issues/new?labels=cryptic&template=cryptic-error-message.md
$ bass
=>

This only happens on the first run of bass, subsequent runs work every time.

To Reproduce

Sometimes after a system reboot, the first time running bass to start the REPL, it will fail.

Expected behavior

Bass should start without error after computer boot.

Additional context

My setup: Fedora Linux 36, with Docker (sudo dnf install docker).

Bass version (downloaded from the GitHub release page):

$ bass --version
bass	v0.8.1
commit	94f1fc06725b8e4936c058cb49872ae159d4c44a
date	2022-07-11T01:49:39Z
@axvr axvr added the bug Something isn't working label Jul 13, 2022
@axvr axvr changed the title Bass sometimes fails to start the Buildkit daemon (or wait for it to start?) Bass sometimes fails to start on first run after computer boot Jul 13, 2022
@vito
Copy link
Owner

vito commented Jul 13, 2022

Thanks :)

I wonder if this is from trying to use buildkit too quickly after spawning it. Does the same thing happen if you docker rm -f bass-buildkitd?

@axvr
Copy link
Author

axvr commented Jul 13, 2022

I get some warnings, but no errors.

$ docker rm -f bass-buildkitd
bass-buildkitd
$ bass
WARN[0002] commandConn.CloseWrite: commandconn: failed to wait: signal: terminated 
WARN[0002] commandConn.CloseWrite: commandconn: failed to wait: signal: terminated 
=> 
$ docker rm -f bass-buildkitd
bass-buildkitd
$ bass
WARN[0003] commandConn.CloseWrite: commandconn: failed to wait: signal: terminated 
WARN[0003] commandConn.CloseRead: commandconn: failed to wait: signal: terminated 
WARN[0003] commandConn.CloseWrite: commandconn: failed to wait: signal: terminated 
=> 
$ docker rm -f bass-buildkitd
bass-buildkitd
$ bass
WARN[0002] commandConn.CloseWrite: commandconn: failed to wait: signal: terminated 
WARN[0002] commandConn.CloseRead: commandconn: failed to wait: signal: terminated 
WARN[0002] commandConn.CloseWrite: commandconn: failed to wait: signal: terminated 
=> 

@axvr
Copy link
Author

axvr commented Jul 13, 2022

This is the log from bass --debug after a computer reboot:

$ bass --debug
00:52:55.583	debug	acquiring buildkitd lock	{"lockFilePath": "/run/user/1000/bass/buildkitd.lock"}
00:52:55.583	debug	acquired buildkitd lock
00:52:57.171	debug	detected buildkit config	{"version": "v0.10.3", "isActive": true, "haveHostNetwork": true}
init buildkit runtime for platform os=linux, arch=any: list buildkit workers: failed to list workers: Unavailable: connection closed before server preface received

Tip: if this error is too cryptic, please open an issue:

  https://github.com/vito/bass/issues/new?labels=cryptic&template=cryptic-error-message.md
$ bass --debug
00:53:01.642	debug	acquiring buildkitd lock	{"lockFilePath": "/run/user/1000/bass/buildkitd.lock"}
00:53:01.642	debug	acquired buildkitd lock
00:53:01.660	debug	detected buildkit config	{"version": "v0.10.3", "isActive": true, "haveHostNetwork": true}
=> 

@vito
Copy link
Owner

vito commented Jul 15, 2022

Very weird. Those warnings are a red herring; I see those too and need to clean them up.

I'm having trouble reproducing the failed start though. I'm using WSL on Windows so a wsl --shutdown should let me simulate a reboot, but here's what I see after starting it back up:

~
❯ docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
~
❯ ps auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   1744  1084 ?        Sl   20:55   0:00 /init
root         7  0.0  0.0   1752    72 ?        Ss   20:55   0:00 /init
root         8  0.0  0.0   1752    80 ?        S    20:55   0:00  \_ /init
vito         9  0.2  0.0 236628  8696 pts/0    Ssl  20:55   0:00      \_ -fish
vito       611  0.0  0.0  10620  3356 pts/0    R+   20:56   0:00      |   \_ ps auxf
vito        31  0.0  0.0   6036   456 ?        Ss   20:55   0:00      \_ ssh-agent -c
~
❯ sudo /etc/init.d/docker start
 * Starting Docker: docker                                                                                                                                                                                                                                                                                           [ OK ]
~
❯ ps auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   1744  1084 ?        Sl   20:55   0:00 /init
root         7  0.0  0.0   1752    72 ?        Ss   20:55   0:00 /init
root         8  0.0  0.0   1752    80 ?        S    20:55   0:00  \_ /init
vito         9  0.2  0.0 236628  8728 pts/0    Ssl  20:55   0:00      \_ -fish
vito      1445  0.0  0.0  10620  3260 pts/0    R+   20:56   0:00      |   \_ ps auxf
vito        31  0.0  0.0   6036   456 ?        Ss   20:55   0:00      \_ ssh-agent -c
root       865 32.0  0.7 7281324 118892 ?      Sl   20:56   0:00      \_ /usr/bin/dockerd -p /var/run/docker.pid
root      1076  3.0  0.3 2157880 53468 ?       Ssl  20:56   0:00      |   \_ containerd --config /var/run/docker/containerd/containerd.toml --log-level info
root      1357  0.0  0.0 710388  8044 ?        Sl   20:56   0:00      \_ /usr/bin/containerd-shim-runc-v2 -namespace moby -id 731a985cba7b4f6c3e4a16809d906c315efc20d9c6b10e660bd8223925f164d9 -address /var/run/docker/containerd/containerd.sock
root      1376 14.0  0.2 776796 45752 ?        Ssl  20:56   0:00          \_ buildkitd --debug --allow-insecure-entitlement security.insecure
~
❯ docker ps
CONTAINER ID   IMAGE                   COMMAND                  CREATED         STATUS          PORTS     NAMES
731a985cba7b   moby/buildkit:v0.10.3   "buildkitd --debug -…"   6 minutes ago   Up 10 seconds             bass-buildkitd
~
❯ bass --debug
20:56:34.500    debug   acquiring buildkitd lock        {"lockFilePath": "/mnt/wslg/runtime-dir/bass/buildkitd.lock"}
20:56:34.500    debug   acquired buildkitd lock
20:56:34.519    debug   detected buildkit config        {"version": "v0.10.3", "isActive": true, "haveHostNetwork": true}
=>

This seems like something that could be fixed by adding retry logic to the part that lists workers but it'd be nice to understand what's going on here before potentially adding something that makes actually-broken configurations slower to diagnose.

When you start Docker after the reboot can you see buildkitd actually running in ps auxf?

Is there anything enlightening in docker logs bass-buildkitd?

@axvr
Copy link
Author

axvr commented Jul 15, 2022

Even weirder, I'm not able to reproduce it anymore... Only thing that changed was that I did a system update, so I assume that must have (somehow) fixed it, even though I don't remember it updating Docker.

(Perhaps it was a Systemd bug?)

I'll close this issue then as it doesn't appear to be an issue with Bass. Thanks for taking the time to look into it anyway.

@axvr axvr closed this as completed Jul 15, 2022
@axvr
Copy link
Author

axvr commented Aug 10, 2022

Started encountering this issue again and did some more investigation. Here are my findings:

(TL;DR: run systemctl enable docker to fix this issue.)

  1. On system boot, Docker is "inactive (dead)" on Fedora. (I haven't tested it on a fresh install.)
  2. Running systemctl kill docker then bass will reproduce the issue 100% of the time.
  3. When I run docker after system boot, it takes a while to start up. Subsequent runs are quick.
  4. Running systemctl enable docker fixes the issue, by starting Docker on boot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants