Skip to content

[ci] Use pixi and pre-commit for all linting jobs #6901

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

borchero
Copy link
Collaborator

@borchero borchero commented Apr 27, 2025

Motivation

Running linters in this repository is always a little involved (one needs to execute a number of commands to actually run everything), leading to many (unexpected) CI failures. This has already improved by using pre-commit which bundles a bunch of linters.

This PR proposes to bundle all linters in pre-commit and use pixi to track linting dependencies (more on this below).

Changes

  • Use pixi to manage all linting dependencies
  • Use local hooks for pre-commit, using the dependencies installed via pixi
  • Remove the lint task from the .ci/test.sh file
  • Adjust the CI job executing the linters
  • Adjust code as per the versions of the linters installed now (changes performed by ruff and typos)

What is pixi and why?

Pixi is a modern and fast alternative to conda, mamba, etc. Hence, at a high level, it is a cross-platform package manager which installs conda packages (from conda-forge). Some differences that make it so nice to use:

  • It has a "project-based" approach. Instead of global environments with a name, environments are project-bound. All dependencies are installed into the .pixi directory in the repository root. One can have arbitrarily many environments. The default environment typically captures the local development environment.
  • Pixi maintains a lockfile (pixi.lock) which has two benefits: on the one hand, CI doesn't suddenly fail if a new version of a tool is available. On the other hand, the computationally intensive "solve" step does not need to run in the CI. The solve is only performed once locally (and is very fast compared to conda -- it runs in seconds even in projects that are considerably more complex than LightGBM).
  • Pixi allows to add tasks which can be executed via pixi run. These tasks run in the deno task shell and, thus, can be executed in a platform-agnostic manner. When running a task, all required dependencies are automatically installed, making it trivial for the user to get started.
  • Not directly related to pixi, but: setup-pixi automatically caches environments in the CI, further improving download times.

What's up with the pre-commit setup?

pre-commit is the most ergonomic way to bundle all linting jobs together. One can now simply run

pre-commit run --all-files

to execute all linters in this repository. Due to the custom pixi task, you can also run

pixi run lint

This automatically installs all required dependencies, including linters and pre-commit itself (except for pwsh which is not currently available via conda-forge for Linux & OSX).

The pre-commit setup itself does NOT use other repositories anymore but, instead, uses pixi to execute the linting jobs using the linters installed via the pixi. This has the benefit that (1) the behavior of pre-commit matches the behavior of running linters manually (e.g. when auto-formatting in the editor) and (2) there is only one update process for all dependencies.

One additional benefit of using pre-commit is also that linters are executed "lazily", i.e. if one linter fails, the other linters are being run as well. Before, we eagerly exited on the first linter failure.

Why remove the "lint" job from .ci/test.sh?

The linting job has a trivial entrypoint now and keeping the task in this lengthy script is rather confusing than helpful IMO.

Where to go from here?

If we can agree that the use of pixi makes sense, I'll propose to use it to manage the Python dependencies in a separate PR. This will allow us to cut down CI runtimes significantly and, much more importantly, allow contributors to get started with all necessary dependencies much quicker. No worries: I will not propose to manage compilers in CI jobs via conda ;)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: this exists to make it easy to activate the environment in the active shell.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for proposing this, working through it, and so thoroughly documenting it in the PR description! I've never used pixi so this would be quite a disruptive change to my development workflow, but I am absolutely willing to try it as part of reviewing this. I agree, it would be great to consolidate all of these linting tasks and make it easier to get the development environment set up (to help with #6350), and the combination of pre-commit + pixi looks like a powerful way to do that while avoiding the need to manually set up virtualenvs / conda environments.

I'm not ready to approve this yet. However, there are other parts of this PR I really like and support immediately:

  • removing linting from test.sh
    • (sticking it in a new .ci/lint-all.sh, that'd get deleted in this pixi PR, would be fine for now)
  • upgrading all the linters to their latest versions and addressing the new warnings
  • splitting up lint and check-docs into separate jobs in the static_analysis.yml file, and renaming check-docs to python-check-docs for consistency with the other job called r-check-docs
  • adding .pixi stuff to .gitignore and the ignores in biome.json
    • people could be using pixi locally even without the changes from this MR, so those added exclusions are helpful to do immediately

Would you consider splitting those parts off together into a single, separate PR? That could be merged more quickly, and would also cut down the size of the diff here (and yes I understand that a large part of that is the pixi.lock file, which I'm ignoring because I know it was automatically generated).


If we can agree that the use of pixi makes sense, I'll propose to use it to manage the Python dependencies in a separate PR.

I want to make an early comment about this before you spend too much time on it... if "use pixi" necessarily means "lock to specific versions via a lock file", then I'd be very opposed to expanding this to installing runtime / test-time dependencies in CI.

I really want CI to continue using >= pins wherever possible for dependencies, so that we find out about incompatibilities automatically when a project LightGBM depends on puts up a new release. In my experience, that spreads the work of reacting to changes in dependencies more evenly over time and reduces the total amount of "make LightGBM compatible with the new version of <dependency>" work.

@@ -14,6 +14,7 @@ FILES_TO_LINT <- list.files(
, recursive = TRUE
, include.dirs = FALSE
)
FILES_TO_LINT <- FILES_TO_LINT[!grepl(paste0("^", file.path(".", ".pixi", "")), FILES_TO_LINT)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this one. The line above already is saying "only look at files ending in .r or .rmd" (ignoring casing).... what files like that are in ./.pixi/?


# pixi environments
.pixi
*.egg-info
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already this line in .gitignore:

*.egg-info/

Is it not working? Why was this *.egg-info necessary?

@StrikerRUS
Copy link
Collaborator

StrikerRUS commented May 6, 2025

I personally found pixi about a month ago from this comment. Looks really cool! Have to try it...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants