Description
Currently, the CodeQL documentation contains this example for use with Bazel-based projects:
# Navigate to the Bazel workspace.
# Before building, remove cached objects
# and stop all running Bazel server processes.
bazel clean --expunge
# Build using the following Bazel flags, to help CodeQL detect the build:
# `--spawn_strategy=local`: build locally, instead of using a distributed build
# `--nouse_action_cache`: turn off build caching, which might prevent recompilation of source code
# `--noremote_accept_cached`, `--noremote_upload_local_results`: avoid using a remote cache
# `--disk_cache=`: avoid using a disk cache. Note that a disk cache is no longer considered a remote cache as of Bazel 6.
codeql database create new-database --language=<language> \
--command='bazel build --spawn_strategy=local --nouse_action_cache --noremote_accept_cached --noremote_upload_local_results --disk_cache= //path/to/package:target'
# After building, stop all running Bazel server processes.
# This ensures future build commands start in a clean Bazel server process
# without CodeQL attached.
bazel shutdown
The gist is that you must start from scratch and build the whole project on the local machine without any caching. For many projects, moving away from this model was one of the core motivations for adopting Bazel. For large builds that rely on caching or remote execution, it may no longer be feasible to run the entire build on one machine, even if only on a scheduled cadence. Moreover, the user might no longer host the infrastructure to do this, since all of their actual developer and CI builds go through remote execution.
It would be great if there were another way to communicate the necessary information to CodeQL, one that could work alongside Bazel's model.
I can't tell from the documentation what the CodeQL CLI actually pulls from the build, and I imagine it's quite in-depth, but I assume other models could provide the same information for certain types of builds.
For example, for C++, some natural alternatives (from the user's perspective) would be to:
- produce a compile_commands.json, which is also used in many other developer workflows, such as C++ language servers
- produce a clang indexstore, which can be used as a queryable database for the project
- produce a Kythe index
- produce a scip-clang index
Each of these options would fit the Bazel model better, since indexes could be produced remotely and cached, and compile_commands.json generation should already be supported for developer workflows.
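
To illustrate what a compile_commands.json already carries, here is a minimal Python sketch that reads one and recovers the compiler invocation for each translation unit. The database contents (paths, flags, file names) and the `compile_invocations` helper are made up for illustration. This is not an existing CodeQL feature, and whether this information would be enough for the CodeQL extractor is an open question.

```python
import json
import shlex

# Hypothetical compilation database; every path and flag here is invented.
EXAMPLE_DB = """
[
  {
    "directory": "/workspace",
    "file": "src/main.cc",
    "arguments": ["clang++", "-std=c++17", "-Iinclude", "-c", "src/main.cc", "-o", "main.o"]
  },
  {
    "directory": "/workspace",
    "file": "src/util.cc",
    "command": "clang++ -std=c++17 -Iinclude -c src/util.cc -o util.o"
  }
]
"""


def compile_invocations(db_text):
    """Return a (directory, file, argv) tuple for each translation unit."""
    invocations = []
    for entry in json.loads(db_text):
        # The format allows either an "arguments" list or a single
        # shell-quoted "command" string, so handle both.
        if "arguments" in entry:
            argv = list(entry["arguments"])
        else:
            argv = shlex.split(entry["command"])
        invocations.append((entry["directory"], entry["file"], argv))
    return invocations


if __name__ == "__main__":
    for directory, source, argv in compile_invocations(EXAMPLE_DB):
        print(directory, source, argv)
```

The point is that each entry is a self-contained record of how one source file was compiled, so in principle a tool could replay or inspect those invocations without observing the Bazel build itself.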