v3.1.0+ tfenv install fails with "Failed to acquire install lock" when TFENV_CONFIG_DIR is read-only (regression from #471) #524

@lumaks-redox

Summary

Since v3.1.0 (PR #471, "Add per-version locking to prevent concurrent install races"), tfenv install fails on a fresh, single-process invocation whenever TFENV_CONFIG_DIR points at a directory the invoking user cannot write to. The error is misleading: it reports lock contention, when in reality mkdir is failing with EACCES and that error is silenced.

This reproduces reliably in two common real-world setups:

  1. Homebrew-installed tfenv on GitHub-hosted Actions runners: TFENV_ROOT resolves to /home/linuxbrew/.linuxbrew/Cellar/tfenv/3.2.0, which is owned by a different user than the workflow runner; the runner can read+execute but not write.
  2. Any system-wide install where the binary lives under /opt, /usr/local, etc., owned by a different user than the one running tfenv install.

Pre-3.1.0 (v3.0.0 and earlier) this worked because there was no lock step.

Observed failure

tfenv: Another process is installing Terraform v1.13.3. Waiting...
tfenv: Lock wait timeout after 60s. Removing stale lock and retrying.
tfenv: [ERROR] Failed to acquire install lock for Terraform v1.13.3

The "Another process is installing…" line fires ~200 ms after tfenv is on PATH, with no other tfenv process running anywhere on the system.

Reproduction

Verified on Docker + Ubuntu 24.04 (takes ~70s due to the 60s lock wait):

docker run --rm ubuntu:24.04 bash -c '
  set -e
  apt-get update -qq >/dev/null && apt-get install -y -qq git curl unzip ca-certificates sudo >/dev/null
  git clone --depth 1 -b v3.2.0 https://github.com/tfutils/tfenv.git /opt/tfenv >/dev/null 2>&1
  useradd -m tester
  chmod -R a-w /opt/tfenv       # simulate read-only install (e.g. Homebrew Cellar for a different user)
  chmod a+rx /opt/tfenv /opt/tfenv/bin /opt/tfenv/libexec /opt/tfenv/lib
  echo 1.13.3 > /home/tester/.terraform-version
  su tester -c "cd /home/tester && /opt/tfenv/bin/tfenv install"
'

Result — exact error from above, exit 1.

Control — pointing TFENV_CONFIG_DIR at a user-writable, pre-created directory succeeds:

# Same setup, but:
su tester -c "mkdir -p /home/tester/.tfenv && cd /home/tester && \
              TFENV_CONFIG_DIR=/home/tester/.tfenv /opt/tfenv/bin/tfenv install"
# -> installs successfully

This isolates the problem to the lock-directory location, not to any other piece of the install flow.
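To check for the same condition on an affected machine without waiting out the 60s timeout, one can test whether the invoking user can write to the directory that would hold the lock. This is a minimal sketch: check_lock_parent is a hypothetical helper name (not part of tfenv), and it assumes TFENV_CONFIG_DIR falls back to TFENV_ROOT when unset, with /opt/tfenv (the path from the reproduction above) as a last resort.

```shell
# Hypothetical helper (not part of tfenv): report whether the directory
# that would hold .install-lock-<version> exists and is writable by the
# current user.
check_lock_parent() {
  local dir="$1"
  if [ -d "${dir}" ] && [ -w "${dir}" ]; then
    echo "writable: ${dir}"
  else
    echo "NOT writable (or missing): ${dir}"
    return 1
  fi
}

# Assumption: TFENV_CONFIG_DIR defaults to TFENV_ROOT when unset.
check_lock_parent "${TFENV_CONFIG_DIR:-${TFENV_ROOT:-/opt/tfenv}}" || true
```

Run it as the same user that runs tfenv install; a "NOT writable" result means the lock mkdir cannot succeed in that location.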

Root cause

In libexec/tfenv-install (line 105 in v3.2.0):

declare lockdir="${TFENV_CONFIG_DIR}/.install-lock-${version}"
# ...
while ! mkdir "${lockdir}" 2>/dev/null; do
  if [ "${lock_retries}" -eq 0 ]; then
    log 'info' "Another process is installing Terraform v${version}. Waiting..."
  fi
  # ...

When mkdir fails with EACCES (the parent directory is read-only for this user), 2>/dev/null swallows the actual error and the code treats it identically to EEXIST. The loop waits 60s, then the "stale lock recovery" path (rmdir "${lockdir}" followed by another mkdir) also fails silently for the same permission reason, and the script exits via log 'error' "Failed to acquire install lock".
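The ambiguity can be seen outside tfenv with a few lines of shell. The sketch below is a demonstration only, not code from tfenv. It uses a missing parent directory (ENOENT) as a stand-in for EACCES, so that it behaves the same when run as root, where permission bits would not block the write. All variable names are illustrative.

```shell
# Demonstration only: two different mkdir failures that a lock loop cannot
# tell apart once stderr is discarded.
demo_root=$(mktemp -d)

# Case 1: the lock directory already exists (EEXIST), a genuinely held lock.
mkdir "${demo_root}/held-lock"
status_exists=0
mkdir "${demo_root}/held-lock" 2>/dev/null || status_exists=$?

# Case 2: the lock cannot be created at all. A missing parent (ENOENT)
# stands in for EACCES here.
status_other=0
mkdir "${demo_root}/no-such-parent/lock" 2>/dev/null || status_other=$?

# Both are simply "non-zero" to a `while ! mkdir ...` loop.
echo "EEXIST status: ${status_exists}, other failure status: ${status_other}"

# Capturing stderr instead of discarding it preserves the real reason.
err_other=$(mkdir "${demo_root}/no-such-parent/lock" 2>&1) || true
echo "captured: ${err_other}"

rm -rf "${demo_root}"
```

With only the exit status to go on, the loop has no way to know that waiting will never help in the second case.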

Relatedly: after log 'error' (which calls exit 1 via the lightweight log shim in lib/helpers.sh), the break on line 116 is unreachable — minor nit, but suggests the path wasn't exercised on a permission-denied case during development.

Suggested fixes (non-exclusive)

  1. Place the lock in a writable path regardless of TFENV_CONFIG_DIR permissions:

    declare lockdir="${TMPDIR:-/tmp}/tfenv-install-lock-$(id -u)-${version}"

    (Per-UID to avoid cross-user collisions on shared hosts.)

  2. Don't swallow mkdir errors; distinguish EEXIST from other failures:

    err=$(mkdir "${lockdir}" 2>&1); rc=$?
    if [ "${rc}" -ne 0 ]; then
      # Note: matching the message text assumes an English (C) locale.
      if [[ "$err" == *"File exists"* ]]; then
        : # lock held by another process: wait and retry
      else
        log 'error' "Cannot create lock at ${lockdir}: ${err}"
      fi
    fi

  3. Fall back when TFENV_CONFIG_DIR isn't writable:

    if [ ! -w "${TFENV_CONFIG_DIR}" ]; then
      TFENV_CONFIG_DIR="${HOME}/.tfenv"
      mkdir -p "${TFENV_CONFIG_DIR}"
    fi
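
Fixes 1 and 2 combine naturally. The following is a rough sketch under stated assumptions rather than a proposed patch: acquire_install_lock, its arguments, and its return codes are hypothetical and not part of tfenv. It tests for the lock directory's existence instead of matching mkdir's message, which avoids depending on the locale, and it retries mkdir once to narrow (not eliminate) the window where the holder releases the lock between the failed mkdir and the existence test.

```shell
# Sketch only: acquire_install_lock and its parameters are hypothetical,
# not tfenv's API.
# Returns 0 when the lock is acquired, 1 on timeout, 2 when the lock
# cannot be created for a reason other than "already held".
acquire_install_lock() {
  local lockdir="$1"
  local max_wait="${2:-60}"   # seconds
  local waited=0
  local err
  while :; do
    if err=$(mkdir "${lockdir}" 2>&1); then
      return 0
    fi
    if [ ! -d "${lockdir}" ]; then
      # Either the holder just released the lock, or it cannot be created
      # at all. One immediate retry tells the two apart in most cases.
      if err=$(mkdir "${lockdir}" 2>&1); then
        return 0
      fi
      if [ ! -d "${lockdir}" ]; then
        echo "Cannot create lock at ${lockdir}: ${err}" >&2
        return 2
      fi
    fi
    # The directory exists, so another process holds the lock.
    if [ "${waited}" -ge "${max_wait}" ]; then
      echo "Timed out waiting for lock at ${lockdir}" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Fix 1: per-UID lock path under a directory that is normally writable.
version="1.13.3"
lockdir="${TMPDIR:-/tmp}/tfenv-install-lock-$(id -u)-${version}"

if acquire_install_lock "${lockdir}" 5; then
  echo "lock acquired: ${lockdir}"
  rmdir "${lockdir}"   # release the lock
fi
```

The separate return code for "cannot create" lets the caller fail fast with the real mkdir error instead of entering the wait-and-recover path.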

Impact

Any user running tfenv 3.1.0 or 3.2.0 via Homebrew in a CI context where the runner user differs from the Homebrew install user (i.e. essentially all GitHub-hosted ubuntu-latest jobs that brew install tfenv) is hitting this. Since the Homebrew bottle of tfenv bumped to 3.2.0 on 2026-04-25, pipelines that were green for years have started failing with the misleading lock-contention error.

Versions tested

Happy to test any candidate fix.
