etcd crashes on startup when wal-dir is bind-mounted to host directory, fails with unlinkat /var/lib/etcd/wal: device or resource busy #20218

Open
@RampantDespair

What happened?

When configuring etcd with a wal-dir that is mapped to a bind-mounted host directory, the server crashes at startup with the following error:

panic: failed to create WAL
error: unlinkat /var/lib/etcd/wal: device or resource busy

This occurs consistently when the following configuration is used:

data-dir: "/var/lib/etcd/data"
wal-dir: "/var/lib/etcd/wal"

And mounted via Docker with:

-v /mnt/user/appdata/etcd/data:/var/lib/etcd/data
-v /mnt/user/appdata/etcd/wal:/var/lib/etcd/wal

etcd logs show:

failed to rename the temporary WAL directory
tmp-dir-path: /var/lib/etcd/wal.tmp
dir-path: /var/lib/etcd/wal
error: unlinkat /var/lib/etcd/wal: device or resource busy

What did you expect to happen?

etcd should start normally and initialize its WAL, even if wal-dir is a separate volume. The rename of the temporary WAL directory should succeed instead of crashing the server.

How can we reproduce it (as minimally and precisely as possible)?

  1. Create two directories on the host to bind-mount into the etcd container:

    mkdir -p /mnt/user/appdata/etcd/data
    mkdir -p /mnt/user/appdata/etcd/wal
  2. Run etcd in Docker with this config:

    data-dir: "/var/lib/etcd/data"
    wal-dir: "/var/lib/etcd/wal"

    And mount:

    -v /mnt/user/appdata/etcd/data:/var/lib/etcd/data
    -v /mnt/user/appdata/etcd/wal:/var/lib/etcd/wal
  3. Observe the panic caused by the unlinkat failure on /var/lib/etcd/wal.
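
For anyone who can get a shell in the container, a quick way to confirm the precondition in the steps above is to check whether wal-dir is itself a mount point. The Go sketch below reads `/proc/self/mountinfo` (Linux only) and compares the mount-point field of each line against the two paths from this report. `isMountPoint` is a hypothetical helper, and the sketch does not decode octal escapes such as `\040` in paths.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// isMountPoint reports whether path appears as a mount point in text that
// follows the /proc/self/mountinfo format, where the fifth
// whitespace-separated field of each line is the mount point.
// Hypothetical helper; escaped characters in paths are not decoded.
func isMountPoint(mountinfo, path string) bool {
	for _, line := range strings.Split(mountinfo, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 5 && fields[4] == path {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/self/mountinfo")
	if err != nil {
		fmt.Println("cannot read mountinfo:", err)
		return
	}
	// Paths taken from the configuration in this report.
	for _, p := range []string{"/var/lib/etcd/data", "/var/lib/etcd/wal"} {
		fmt.Printf("%s is a mount point: %v\n", p, isMountPoint(string(data), p))
	}
}
```

If the check reports `true` for `/var/lib/etcd/wal`, the directory cannot be unlinked from inside the container, which matches the error in the panic.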

Anything else we need to know?

Workaround

  1. First run etcd with wal-dir pointing at a subdirectory of the mount:

    wal-dir: "/var/lib/etcd/wal/wal"
  2. Shut down etcd, then move the WAL files up one level on the host:

    mv -v /mnt/user/appdata/etcd/wal/wal/* /mnt/user/appdata/etcd/wal/
  3. Revert config to:

    wal-dir: "/var/lib/etcd/wal"

etcd then starts successfully. This workaround is clunky and non-obvious.

Etcd version (please run commands below)

I'm unable to open a console on the container, but I am running the image gcr.io/etcd-development/etcd:v3.6.1 (the startup log below reports etcd-version 3.6.1, git-sha a4708be).

Etcd configuration (command line flags or environment variables)

# https://github.com/etcd-io/etcd/blob/main/etcd.conf.yml.sample

# --- Member ---

# Human-readable name for this member.
name: "default"

# Path to the data directory.
data-dir: "/var/lib/etcd/data"

# Path to the dedicated wal directory.
wal-dir: "/var/lib/etcd/wal"

# Number of committed transactions to trigger a snapshot to disk.
snapshot-count: 10000

# Time (in milliseconds) of a heartbeat interval.
heartbeat-interval: 100

# Time (in milliseconds) for an election to timeout. See tuning documentation for details.
election-timeout: 1000

# Whether to fast-forward initial election ticks on boot for faster election.
initial-election-tick-advance: true

# List of URLs to listen on for peer traffic.
listen-peer-urls: "http://localhost:2380"

# List of URLs to listen on for client grpc traffic and http as long as --listen-client-http-urls is not specified.
listen-client-urls: "http://localhost:2379"

# List of URLs to listen on for http only client traffic. Enabling this flag removes http services from --listen-client-urls.
listen-client-http-urls: ""

# Maximum number of snapshot files to retain (0 is unlimited).
max-snapshots: 5

# Maximum number of wal files to retain (0 is unlimited).
max-wals: 5

# Raise alarms when backend size exceeds the given quota (0 defaults to low space quota).
quota-backend-bytes: 0

# BackendFreelistType specifies the type of freelist that boltdb backend uses(array and map are supported types).
backend-bbolt-freelist-type: "map"

# BackendBatchInterval is the maximum time before commit the backend transaction.
backend-batch-interval:

# BackendBatchLimit is the maximum operations before commit the backend transaction.
backend-batch-limit: 0

# Maximum number of operations permitted in a transaction.
max-txn-ops: 128

# Maximum client request size in bytes the server will accept.
max-request-bytes: 1572864

# Minimum duration interval that a client should wait before pinging server.
#grpc-keepalive-min-time: 5s

# Frequency duration of server-to-client ping to check if a connection is alive (0 to disable).
#grpc-keepalive-interval: 2h

# Additional duration of wait before closing a non-responsive connection (0 to disable).
#grpc-keepalive-timeout: 20s

# Enable to set socket option SO_REUSEPORT on listeners allowing rebinding of a port already in use.
socket-reuse-port: false

# Enable to set socket option SO_REUSEADDR on listeners allowing binding to an address in TIME_WAIT state.
socket-reuse-address: false

# --- Clustering ---

# List of this member's peer URLs to advertise to the rest of the cluster.
initial-advertise-peer-urls: "http://localhost:2380"

# Initial cluster configuration for bootstrapping.
initial-cluster: "default=http://localhost:2380"

# Initial cluster state ('new' or 'existing').
initial-cluster-state: "new"

# Initial cluster token for the etcd cluster during bootstrap.
# Specifying this can protect you from unintended cross-cluster interaction when running multiple clusters.
initial-cluster-token: "etcd-cluster"

# List of this member's client URLs to advertise to the public.
# The client URLs advertised should be accessible to machines that talk to etcd cluster.
# etcd client libraries parse these URLs to connect to the cluster.
advertise-client-urls: "http://localhost:2379"

# Discovery URL used to bootstrap the cluster.
discovery: ""

# Expected behavior ('exit' or 'proxy') when discovery services fails.
# "proxy" supports v2 API only.
discovery-fallback: "proxy"

# HTTP proxy to use for traffic to discovery service.
discovery-proxy: ""

# DNS srv domain used to bootstrap the cluster.
discovery-srv: ""

# Suffix to the dns srv name queried when bootstrapping.
discovery-srv-name: ""

# Reject reconfiguration requests that would cause quorum loss.
strict-reconfig-check: true

# Enable the raft Pre-Vote algorithm to prevent disruption when a node that has been partitioned away rejoins the cluster.
pre-vote: true

# Auto compaction retention length. 0 means disable auto compaction.
auto-compaction-retention: "0"

# Interpret 'auto-compaction-retention' as one of: periodic|revision.
# 'periodic' for duration-based retention, defaulting to hours if no time unit is provided (e.g. '5m').
# 'revision' for revision number-based retention.
auto-compaction-mode: "periodic"

# Accept etcd V2 client requests. Deprecated and to be decommissioned in v3.6.
enable-v2: false

# Phase of v2store deprecation. Allows to opt-in for higher compatibility mode.
# Supported values:
#   'not-yet'              – Issues a warning if v2store has meaningful content (default in v3.5)
#   'write-only'           – Custom v2 state is not allowed (planned default in v3.6)
#   'write-only-drop-data' – Custom v2 state will get DELETED!
#   'gone'                 – v2store is not maintained any longer (planned default in v3.7)
v2-deprecation: "not-yet"

# --- Security ---

# Path to the client server TLS cert file.
cert-file: "/etc/certs/etcd_crt.pem"

# Path to the client server TLS key file.
key-file: "/etc/certs/etcd_prv.pem"

# Enable client cert authentication.
# It's recommended to enable client cert authentication to prevent attacks from unauthenticated clients (e.g. CVE-2023-44487), especially when running etcd as a public service.
client-cert-auth: true

# Path to the client certificate revocation list file.
client-crl-file: ""

# Allowed TLS hostname for client cert authentication.
client-cert-allowed-hostname: ""

# Path to the client server TLS trusted CA cert file.
# Note: setting this parameter will also automatically enable client cert authentication no matter what value is set for `--client-cert-auth`.
trusted-ca-file: "/etc/ssl/certs/home-ca.crt"

# Client TLS using generated certificates.
auto-tls: false

# Path to the peer server TLS cert file.
peer-cert-file: ""

# Path to the peer server TLS key file.
peer-key-file: ""

# Enable peer client cert authentication.
# It's recommended to enable peer client cert authentication to prevent attacks from unauthenticated forged peers (e.g. CVE-2023-44487).
peer-client-cert-auth: false

# Path to the peer server TLS trusted CA file.
peer-trusted-ca-file: ""

# Required CN for client certs connecting to the peer endpoint.
peer-cert-allowed-cn: ""

# Allowed TLS hostname for inter peer authentication.
peer-cert-allowed-hostname: ""

# Peer TLS using self-generated certificates if --peer-key-file and --peer-cert-file are not provided.
peer-auto-tls: false

# The validity period of the client and peer certificates that are automatically generated by etcd when you specify ClientAutoTLS and PeerAutoTLS, the unit is year, and the default is 1.
self-signed-cert-validity: 1

# Path to the peer certificate revocation list file.
peer-crl-file: ""

# Comma-separated list of supported TLS cipher suites between client/server and peers (empty will be auto-populated by Go).
cipher-suites:

# Comma-separated whitelist of origins for CORS, or cross-origin resource sharing (empty or * means allow all).
cors: "*"

# Acceptable hostnames from HTTP client requests, if server is not secure (empty or * means allow all).
host-whitelist: "*"

# Minimum TLS version supported by etcd.
tls-min-version: "TLS1.2"

# Maximum TLS version supported by etcd (empty will be auto-populated by Go).
tls-max-version: ""

# --- Auth ---

# Specify a v3 authentication token type and its options ('simple' or 'jwt').
auth-token: "simple"

# Specify the cost / strength of the bcrypt algorithm for hashing auth passwords. Valid values are between 4 and 31.
bcrypt-cost: 10

# Time (in seconds) of the auth-token-ttl.
auth-token-ttl: 300

# --- Profiling and Monitoring ---

# Enable runtime profiling data via HTTP server. Address is at client URL + "/debug/pprof/"
enable-pprof: false

# Set level of detail for exported metrics, specify 'extensive' to include server side grpc histogram metrics.
metrics: "basic"

# List of URLs to listen on for the metrics and health endpoints.
listen-metrics-urls: ""

# --- Logging ---

# Currently only supports 'zap' for structured logging.
logger: "zap"

# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd,
# or list of comma separated output targets.
log-outputs:
  - "default"

# Configures log level. Only supports debug, info, warn, error, panic, or fatal.
log-level: "info"

# Enable log rotation of a single log-outputs file target.
enable-log-rotation: false

# Configures log rotation if enabled with a JSON logger config.
# MaxSize(MB), MaxAge(days, 0=no limit), MaxBackups(0=no limit),
# LocalTime (use computer's local time), Compress (gzip)
log-rotation-config-json: '{"maxsize": 100, "maxage": 0, "maxbackups": 0, "localtime": false, "compress": false}'

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

I'm unable to open a console on the container.

Relevant log output

{"level":"warn","ts":"2025-06-24T18:47:36.053464-0400","caller":"embed/config.go:1209","msg":"Running http and grpc server on single port. This is not recommended for production."}
{"level":"warn","ts":"2025-06-24T18:47:36.053538-0400","caller":"embed/config.go:1320","msg":"it isn't recommended to use default name, please set a value for --name. Note that etcd might run into issue when multiple members have the same default name","name":"default"}
{"level":"info","ts":"2025-06-24T18:47:36.053566-0400","caller":"etcdmain/config.go:176","msg":"loaded server configuration, other configuration command line flags and environment variables will be ignored if provided","path":"/etc/etcd/config.yaml"}
{"level":"warn","ts":"2025-06-24T18:47:36.053588-0400","caller":"etcdmain/config.go:270","msg":"--snapshot-count is deprecated in 3.6 and will be decommissioned in 3.7."}
{"level":"warn","ts":"2025-06-24T18:47:36.053602-0400","caller":"etcdmain/config.go:270","msg":"--max-snapshots is deprecated in 3.6 and will be decommissioned in 3.7."}
{"level":"warn","ts":"2025-06-24T18:47:36.053610-0400","caller":"etcdmain/config.go:270","msg":"--v2-deprecation is deprecated and scheduled for removal in v3.8. The default value is enforced, ignoring user input."}
{"level":"info","ts":"2025-06-24T18:47:36.053631-0400","caller":"etcdmain/etcd.go:64","msg":"Running: ","args":["/usr/local/bin/etcd"]}
{"level":"info","ts":"2025-06-24T18:47:36.053690-0400","caller":"etcdmain/etcd.go:124","msg":"Initialize and start etcd server","data-dir":"/var/lib/etcd/data","dir-type":"empty"}
{"level":"warn","ts":"2025-06-24T18:47:36.053703-0400","caller":"embed/config.go:1209","msg":"Running http and grpc server on single port. This is not recommended for production."}
{"level":"warn","ts":"2025-06-24T18:47:36.053710-0400","caller":"embed/config.go:1320","msg":"it isn't recommended to use default name, please set a value for --name. Note that etcd might run into issue when multiple members have the same default name","name":"default"}
{"level":"info","ts":"2025-06-24T18:47:36.053719-0400","caller":"embed/etcd.go:138","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:2380"]}
{"level":"info","ts":"2025-06-24T18:47:36.054060-0400","caller":"embed/etcd.go:146","msg":"configuring client listeners","listen-client-urls":["http://localhost:2379"]}
{"level":"info","ts":"2025-06-24T18:47:36.054227-0400","caller":"embed/etcd.go:323","msg":"starting an etcd server","etcd-version":"3.6.1","git-sha":"a4708be","go-version":"go1.23.10","go-os":"linux","go-arch":"amd64","max-cpu-set":16,"max-cpu-available":16,"member-initialized":false,"name":"default","data-dir":"/var/lib/etcd/data","wal-dir":"/var/lib/etcd/wal","wal-dir-dedicated":"/var/lib/etcd/wal","member-dir":"/var/lib/etcd/data/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":10000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://localhost:2380"],"listen-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"],"listen-client-urls":["http://localhost:2379"],"listen-metrics-urls":[],"experimental-local-address":"","cors":["*"],"host-whitelist":["*"],"initial-cluster":"default=http://localhost:2380","initial-cluster-state":"new","initial-cluster-token":"etcd-cluster","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"feature-gates":"","initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1,"v2-deprecation":"write-only"}
{"level":"warn","ts":"2025-06-24T18:47:36.054270-0400","caller":"fileutil/fileutil.go:56","msg":"check file permission","error":"directory \"/var/lib/etcd/data\" exist, but the permission is \"drwxr-xr-x\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
{"level":"info","ts":"2025-06-24T18:47:36.054774-0400","logger":"bbolt","caller":"backend/backend.go:203","msg":"Opening db file (/var/lib/etcd/data/member/snap/db) with mode -rw------- and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: hashmap, ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc000088c98}"}
{"level":"info","ts":"2025-06-24T18:47:36.058782-0400","logger":"bbolt","caller":"bbolt@v1.4.0/db.go:321","msg":"Opening bbolt db (/var/lib/etcd/data/member/snap/db) successfully"}
{"level":"info","ts":"2025-06-24T18:47:36.058815-0400","caller":"storage/backend.go:80","msg":"opened backend db","path":"/var/lib/etcd/data/member/snap/db","took":"4.10216ms"}
{"level":"info","ts":"2025-06-24T18:47:36.058860-0400","caller":"etcdserver/bootstrap.go:220","msg":"restore consistentIndex","index":0}
{"level":"info","ts":"2025-06-24T18:47:36.058871-0400","caller":"etcdserver/bootstrap.go:94","msg":"bootstrapping cluster"}
{"level":"info","ts":"2025-06-24T18:47:36.058974-0400","caller":"etcdserver/bootstrap.go:101","msg":"bootstrapping storage"}
{"level":"warn","ts":"2025-06-24T18:47:36.171892-0400","caller":"wal/wal.go:178","msg":"failed to rename the temporary WAL directory","tmp-dir-path":"/var/lib/etcd/wal.tmp","dir-path":"/var/lib/etcd/wal","error":"unlinkat /var/lib/etcd/wal: device or resource busy"}
{"level":"panic","ts":"2025-06-24T18:47:36.172085-0400","caller":"etcdserver/bootstrap.go:663","msg":"failed to create WAL","error":"unlinkat /var/lib/etcd/wal: device or resource busy","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.bootstrapNewWAL\n\tgo.etcd.io/etcd/server/v3/etcdserver/bootstrap.go:663\ngo.etcd.io/etcd/server/v3/etcdserver.bootstrapStorage\n\tgo.etcd.io/etcd/server/v3/etcdserver/bootstrap.go:171\ngo.etcd.io/etcd/server/v3/etcdserver.bootstrap\n\tgo.etcd.io/etcd/server/v3/etcdserver/bootstrap.go:102\ngo.etcd.io/etcd/server/v3/etcdserver.NewServer\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:307\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:262\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:207\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:129\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:272"}
panic: failed to create WAL

goroutine 1 [running]:
go.uber.org/zap/zapcore.CheckWriteAction.OnWrite(0x1?, 0x0?, {0x0?, 0x0?, 0xc0000b20c0?})
        go.uber.org/zap@v1.27.0/zapcore/entry.go:196 +0x54
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc000266270, {0xc000358840, 0x1, 0x1})
        go.uber.org/zap@v1.27.0/zapcore/entry.go:262 +0x24e
go.uber.org/zap.(*Logger).Panic(0xc00023e180?, {0x11f503b?, 0x11?}, {0xc000358840, 0x1, 0x1})
        go.uber.org/zap@v1.27.0/logger.go:285 +0x51
go.etcd.io/etcd/server/v3/etcdserver.bootstrapNewWAL({{0xc00012a700, 0x7}, {0x0, 0x0}, {0x0, 0x0}, {{{0x0, 0x0, 0x0}, 0x12a05f200, ...}, ...}, ...}, ...)
        go.etcd.io/etcd/server/v3/etcdserver/bootstrap.go:663 +0x1f9
go.etcd.io/etcd/server/v3/etcdserver.bootstrapStorage(...)
        go.etcd.io/etcd/server/v3/etcdserver/bootstrap.go:171
go.etcd.io/etcd/server/v3/etcdserver.bootstrap({{0xc00012a700, 0x7}, {0x0, 0x0}, {0x0, 0x0}, {{{0x0, 0x0, 0x0}, 0x12a05f200, ...}, ...}, ...})
        go.etcd.io/etcd/server/v3/etcdserver/bootstrap.go:102 +0x945
go.etcd.io/etcd/server/v3/etcdserver.NewServer({{0xc00012a700, 0x7}, {0x0, 0x0}, {0x0, 0x0}, {{{0x0, 0x0, 0x0}, 0x12a05f200, ...}, ...}, ...})
        go.etcd.io/etcd/server/v3/etcdserver/server.go:307 +0x78
go.etcd.io/etcd/server/v3/embed.StartEtcd(0xc000248808)
        go.etcd.io/etcd/server/v3/embed/etcd.go:262 +0x1158
go.etcd.io/etcd/server/v3/etcdmain.startEtcd(0xc00023e180?)
        go.etcd.io/etcd/server/v3/etcdmain/etcd.go:207 +0x17
go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2({0xc00003e0a0, 0x1, 0x1})
        go.etcd.io/etcd/server/v3/etcdmain/etcd.go:129 +0x103b
go.etcd.io/etcd/server/v3/etcdmain.Main({0xc00003e0a0, 0x1, 0x1})
        go.etcd.io/etcd/server/v3/etcdmain/main.go:40 +0xf3
main.main()
        go.etcd.io/etcd/server/v3/main.go:31 +0x28
