Question: How to properly cache node-gyp builds? #785

Open
arminrosu opened this issue Jun 19, 2023 · 6 comments
Labels
feature request New feature or request to improve the current logic

Comments

@arminrosu

arminrosu commented Jun 19, 2023

Hello,

Description:

I'm struggling to figure out how to properly cache node-gyp builds. They are rebuilt every time yarn install runs, which adds about 30s to each install.

After manually caching and restoring the build paths, it still triggers a new gyp build every yarn install. The paths I cached:

# Cache location of node headers
~/.cache/node-gyp
# Explicit caching of the affected packages
node_modules/cpu-features/build
node_modules/unix-dgram/build

My previous question in the node-gyp repo led me here. The suggestion there was to look into node-gyp-build.

Any advice would be appreciated.

Justification:

Save myself time and the planet by not wasting electricity.

Are you willing to submit a PR?

Of course.

@arminrosu arminrosu added feature request New feature or request to improve the current logic needs triage labels Jun 19, 2023
@dmitry-shibanov
Contributor

Hello @arminrosu. I think in that case it is better to use actions/cache, because setup-node caches only the package manager's global cache. With actions/cache you can specify the primary key, restore keys, and paths to cache.
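
For readers following along, the suggestion above might look roughly like the step below. This is only a sketch: the step name, the key prefix, and the use of yarn.lock as the hash input are assumptions, and the paths are the ones listed in the original question. Restoring these paths may not be enough on its own to stop node-gyp from rebuilding, and compiled addons also depend on the Node version, so a real key may need to include it.

```yaml
# Hypothetical workflow steps; names and key layout are illustrative only.
- name: Cache node-gyp headers and native build output
  uses: actions/cache@v4
  with:
    # Paths taken from the question above.
    path: |
      ~/.cache/node-gyp
      node_modules/cpu-features/build
      node_modules/unix-dgram/build
    # Primary key: an exact match on OS and lockfile contents.
    key: node-gyp-${{ runner.os }}-${{ hashFiles('yarn.lock') }}
    # Fallback: the most recent cache for the same OS.
    restore-keys: |
      node-gyp-${{ runner.os }}-

- name: Install dependencies
  run: yarn install
```

The cache step has to run before the install step so that the restored directories are already in place when yarn decides what to build.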

@arminrosu
Author

@dmitry-shibanov thanks, that's how I did it. I asked the question here because this repo is concerned with node, whereas actions/cache is a generic action. I was hoping the team behind the actions might share some knowledge.

@aspiers

aspiers commented Jan 18, 2024

It's quite disappointing and surprising to find that https://github.com/marketplace/actions/yarn-install-cache was deprecated in favour of this (actions/setup-node) and yet this basic functionality of caching node_modules is not covered out of the box. Surely every single developer using yarn with CI needs this?!

So what's the recommended best practice for a GitHub Actions setup that caches everything that might need to be cached? https://yarnpkg.com/features/caching#github-actions says:

We're still investigating the exact set of defaults that make GH Action caching more efficient. It's likely that we'll provide an official yarn-cache action mid-term for this purpose.

My best guess is that currently a combination of actions/setup-node and actions/cache is required, but this really should be documented at the very least, if not automated, so that literally millions of developers don't have to reinvent the same wheel.

@aspiers

aspiers commented Jan 18, 2024

I found yarnpkg/berry#5924 which says that caching .yarn/install-state.gz turns YN0007 errors into YN0008. I guess our work is still not done... 😞

@aspiers

aspiers commented Jan 18, 2024

Also there seems to be no way to configure this action to cache .yarn/install-state.gz, short of forking it and extending the functionality 😞
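
A separate actions/cache step can cover this file without forking setup-node. A minimal sketch, assuming a Yarn Berry project with yarn.lock at the repository root; the step name and key prefix are made up for illustration. Going by the berry issue mentioned above, caching this file is not known to prevent rebuilds by itself.

```yaml
# Hypothetical step; place it before the step that runs `yarn install`.
- name: Cache Yarn install state
  uses: actions/cache@v4
  with:
    path: .yarn/install-state.gz
    # Invalidate whenever the lockfile changes.
    key: yarn-install-state-${{ runner.os }}-${{ hashFiles('yarn.lock') }}
```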

@dbalatero

I went down the rabbit hole of caching node_modules with the GitHub Actions cache. The issue is that node_modules was massive in my repo (around 900 MB cached?) and GitHub Actions has a cache limit of 10 GB per repo. This meant that after 10-12 CI runs, older entries would be LRU-evicted from the cache, which broke other crucial caches that CI relied on. It's not really worth it.

The best I ever did was to pre-build a Docker image with a fairly recent yarn install:

#  ╭──────────────────────────────────────────────────────────╮
#  │ Stage 1: Create a base image from node18 + alpine        │
#  │  with the packages we need.                              │
#  ╰──────────────────────────────────────────────────────────╯
# The --platform flag is required when building on Apple Silicon for the GitHub
# Actions runners.
FROM --platform=linux/amd64 node:18.17.0-alpine3.18 AS base

# Set up some core ENV variables for yarn install + playwright
ENV PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1 \
    YARN_ENABLE_GLOBAL_CACHE=false \
    YARN_NM_MODE=hardlinks-local \
    YARN_CACHE_FOLDER=.yarn/cache

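# Assumption: the snippet as posted omits the package install step. The later
# stages call `fd` and `tar --zstd`, which need fd, GNU tar, and zstd.
RUN apk add --no-cache fd tar zstd

# Create a blank directory to store artifacts in.
RUN mkdir -p /build-artifacts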

#  ╭──────────────────────────────────────────────────────────╮
#  │ Stage 2: Build node_modules + yarn in an initial         │
#  │  builder container                                       │
#  ╰──────────────────────────────────────────────────────────╯
FROM base AS builder
RUN mkdir -p /__w/my-repo/my-repo
WORKDIR /__w/my-repo/my-repo

# Copy the whole git repo over and yarn install, it's the easiest option.
COPY . ./
RUN yarn install --immutable

# To reduce yarn install times in CI, we need to copy over the following:
#   - all the node_modules folders
#   - .yarn/install-state.gz
#   - the NPM global cache folder (whatever `npm config get cache` is)
#
# We _don't_ need to copy over `.yarn/cache`, because we already commit that to
# the Git repo as part of yarn zero-installs.

# Recursively tar up the node_modules directory
RUN fd -0 -t d node_modules | tar --zstd -cf /build-artifacts/node_modules_archive.tar.zst --null -T -

# Copy over the yarn install state
RUN cp .yarn/install-state.gz /build-artifacts/yarn-install-state.gz

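# Copy over the NPM global cache folder. Archive `.` rather than `*` so that
# dotfiles are included and an empty cache directory does not fail the build.
RUN cd "$(npm config get cache)" && tar --zstd -cf /build-artifacts/npm_global_cache.tar.zst .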

#  ╭──────────────────────────────────────────────────────────╮
#  │ Stage 3: Redo the image with just the build              │
#  │  artifacts, to keep the size down                        │
#  ╰──────────────────────────────────────────────────────────╯
FROM base

RUN mkdir -p /build-artifacts
COPY --from=builder /build-artifacts/ /build-artifacts/

Then I created a GitHub action that I could reuse across jobs, in .github/actions/fast-yarn-install/action.yml:

name: "fast monorepo yarn install"
description: |
  Our base CI image contains prebaked npm + yarn + node_modules caches inside
  an artifacts directory.

  This shared action will:

    - Set up all the caches from the artifacts directory
    - Run `yarn install --immutable` to resolve any drift

  This action _will_ get slower over time as we add more packages to yarn, so
  rebuilding the base CI image every so often to resolve package drift is
  advised.

runs:
  using: composite
  steps:
    - name: Find the NPM global cache directory
      id: npm-config
      shell: bash
      run: |
        echo "NPM_GLOBAL_CACHE_FOLDER=$(npm config get cache)" >> "$GITHUB_OUTPUT"

    - name: Move yarn install state into place
      shell: bash
      run: |
        mv /build-artifacts/yarn-install-state.gz .yarn/install-state.gz

    - name: Unpack npm global cache
      shell: bash
      run: |
        mkdir -p "${{ steps.npm-config.outputs.NPM_GLOBAL_CACHE_FOLDER }}"
        tar xf /build-artifacts/npm_global_cache.tar.zst -C "${{ steps.npm-config.outputs.NPM_GLOBAL_CACHE_FOLDER }}"

    - name: Unpack recursive node_modules cache directly into the monorepo
      shell: bash
      run: |
        tar xf /build-artifacts/node_modules_archive.tar.zst -C .

    - name: Run yarn install
      shell: bash
      run: |
        yarn install --immutable --inline-builds
      env:
        # Use local cache folder to keep downloaded archives
        YARN_ENABLE_GLOBAL_CACHE: "false"

        # Reduce node_modules size
        YARN_NM_MODE: "hardlinks-local"

        # Ensure we're using the local monorepo cache
        YARN_CACHE_FOLDER: ".yarn/cache"

and referenced it in jobs like:

      - name: yarn install
        uses: ./.github/actions/fast-yarn-install
