A small, dependency-light Python library and CLI that builds a complete, transitive C++
include graph from a compile_commands.json and answers dependency queries on
it.
The core job is to find, for every C++ file, which other files it includes, directly and transitively. From that, one can derive:
- File inclusion dependencies: direct, indirect, and transitive.
- A file's complete changeset: all the files that include it in any way and are therefore affected by its changes.
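Deriving transitive dependencies from direct ones is a graph reachability problem. Here is a minimal sketch using breadth-first search. It assumes the forward graph is a plain dict that maps each file to the set of files it directly includes. reachable_includes is a hypothetical standalone helper, not the library's transitive_includes method.

```python
from collections import deque
from typing import Dict, Set

def reachable_includes(forward: Dict[str, Set[str]], start: str) -> Set[str]:
    """Every file reachable from `start` via include edges.

    `start` itself is only in the result if it sits on an include cycle.
    """
    seen: Set[str] = set()
    queue = deque(forward.get(start, ()))
    while queue:
        f = queue.popleft()
        if f in seen:
            continue  # already visited through another include path
        seen.add(f)
        queue.extend(forward.get(f, ()))
    return seen
```

For example, with main.cpp including a.h and b.h, both of which include c.h, the result for main.cpp is {a.h, b.h, c.h}. The header c.h is counted once even though two paths reach it.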
The primary uses are:
- Discovering the module dependencies of your project, for instance how different libraries depend on each other. This can also verify that the module dependency graph has no loops.
- Discovering the full changeset that expands from directly changed source and header files (for instance, the files in a commit). Static analysis tools can use it to run their checks only on the affected files. This can be very beneficial in large codebases, where running static checks on all files could easily take an hour.
- Collecting statistics: the files with the most inclusions, the total number of inclusions in the project, and so on.
The tool is not limited to these uses: it builds the graph and exposes a simple API for querying it.
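As an example of the statistics use case, ranking headers by how often they are included directly takes only a few lines once the forward graph is available as a dict of file -> set of directly included files. This is a sketch under that assumption. most_included and total_direct_inclusions are hypothetical helpers, and "total inclusions" is defined here as the number of direct include edges. That definition is not necessarily what the library's collect_stats reports.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

def most_included(forward: Dict[str, Set[str]], top: int = 10) -> List[Tuple[str, int]]:
    """Files ranked by how many files include them directly."""
    counts = Counter(f for included in forward.values() for f in included)
    return counts.most_common(top)

def total_direct_inclusions(forward: Dict[str, Set[str]]) -> int:
    """Number of direct include edges in the graph."""
    return sum(len(included) for included in forward.values())
```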
Inclusions are detected by the real compiler, not by parsing #include
lines or running third-party tools. This means conditional includes,
macros, and include paths are all resolved exactly as in a real build,
with the same compiler flags.
The tool is primarily meant to run on a compile_commands.json file. For each
entry in it, the tool runs the clang preprocessor with
-E -H -fshow-skipped-includes and reads the include tree that clang prints.
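Reading the entries is straightforward because compile_commands.json follows the JSON Compilation Database format. Each entry has "directory" and "file", plus either "arguments" (a list) or "command" (a shell string). The sketch shows one way to turn an entry into a preprocessor-only invocation. read_compile_commands and preprocess_argv are hypothetical names, not this library's API. The sketch assumes POSIX shell quoting for "command" and only handles a separate "-o <file>" pair. Other output-producing flags such as -MF are left untouched, so a real implementation likely needs more care.

```python
import json
import os
import shlex
from typing import Dict, Iterator, List, Tuple

def read_compile_commands(path: str) -> Iterator[Tuple[List[str], str, str]]:
    """Yield (argv, absolute source path, working directory) per entry."""
    with open(path, encoding="utf-8") as fh:
        entries: List[Dict] = json.load(fh)
    for e in entries:
        workdir = e["directory"]
        # Entries carry either "arguments" (already split) or "command" (a string).
        argv = list(e["arguments"]) if "arguments" in e else shlex.split(e["command"])
        # "file" may be relative to "directory"; join() keeps it if already absolute.
        src = os.path.normpath(os.path.join(workdir, e["file"]))
        yield argv, src, workdir

def preprocess_argv(argv: List[str]) -> List[str]:
    """Hypothetical rewrite of a compile command into an include-tree dump."""
    out: List[str] = []
    skip_next = False
    for arg in argv:
        if skip_next:
            skip_next = False  # this was the object file named after "-o"
            continue
        if arg == "-o":
            skip_next = True  # do not overwrite the real object file
            continue
        if arg == "-c":
            continue  # -E selects preprocessing only; -c is not needed
        out.append(arg)
    return out + ["-E", "-H", "-fshow-skipped-includes", "-o", os.devnull]
```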
- Python 3.8+.
- clang and clang++ that support -fshow-skipped-includes. As far as we know at the time of writing, other compilers do not support that flag. The flag is essential for a truly correct graph. Without it, -H omits headers that are skipped on re-inclusion (because of include guards or #pragma once), so some include edges would be missing.
- A compile_commands.json for your project. CMake produces one with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON; other build systems have equivalents.
- PyYAML (the only third-party dependency, used to save and load the graph).
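To see why the flag matters, consider a minimal sketch that turns the -H tree into edges. It assumes each relevant stderr line is a run of dots (one per nesting level), a single space, and a path, and that skipped includes are printed in the same shape. All other lines are ignored. edges_from_h_output is a hypothetical name, and this is not the library's scan_file. It does no path normalization or filter_dirs filtering.

```python
import re
from typing import Dict, List, Set

# Assumed line shape: one dot per nesting level, one space, then the header path.
_H_LINE = re.compile(r"^(\.+) (.+)$")

def edges_from_h_output(source: str, stderr_text: str) -> Dict[str, Set[str]]:
    """Turn a clang -H include tree into includer -> directly included files."""
    forward: Dict[str, Set[str]] = {source: set()}
    stack: List[str] = [source]  # stack[d] is the file open at nesting depth d
    for line in stderr_text.splitlines():
        m = _H_LINE.match(line)
        if m is None:
            continue  # diagnostics, "Multiple include guards may be useful for:", etc.
        depth, header = len(m.group(1)), m.group(2)
        del stack[depth:]  # close files at this depth or deeper
        forward.setdefault(stack[-1], set()).add(header)
        forward.setdefault(header, set())
        stack.append(header)
    return forward
```

In the sample input c.h is included by both a.h and b.h. Assuming c.h has an include guard, the second ".. /src/c.h" line only appears with -fshow-skipped-includes; without it, the b.h -> c.h edge would be lost.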
The project is not published to any package index. Use it from a checkout:
```shell
pip install -e .
```
or simply place the cpp_dependency_graph/ package on your PYTHONPATH.
- Forward graph: file -> set of files it directly includes. This is the fundamental data; everything else is derived from it.
- filter_dirs: directories whose files are kept in the graph. Any included file outside all of them (typically system or third-party headers) is dropped. Paths are resolved to absolute form, and graph nodes are absolute paths.
- changeset: when a file changes, the files that must be re-checked are the file itself plus everything that transitively includes it.
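The changeset definition amounts to reachability over the reverse of the forward graph. Here is a minimal sketch, assuming the forward graph is held as a plain dict of file -> set of directly included files. reverse_graph and affected_files are hypothetical helpers, not the library's changeset_of.

```python
from collections import deque
from typing import Dict, Iterable, Set

def reverse_graph(forward: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert file -> includes into file -> files that include it directly."""
    rev: Dict[str, Set[str]] = {f: set() for f in forward}
    for includer, included in forward.items():
        for f in included:
            rev.setdefault(f, set()).add(includer)
    return rev

def affected_files(forward: Dict[str, Set[str]], changed: Iterable[str]) -> Set[str]:
    """The changed files plus every file that transitively includes any of them."""
    rev = reverse_graph(forward)
    result: Set[str] = set()
    queue = deque(changed)
    while queue:
        f = queue.popleft()
        if f in result:
            continue
        result.add(f)
        queue.extend(rev.get(f, ()))  # walk "is included by" edges upward
    return result
```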
```shell
# 1. Build the graph once and save it.
python -m cpp_dependency_graph build \
    -b path/to/build_dir \
    -f path/to/src1 \
    -f path/to/src2 \
    -j 8 \
    -o deps.yaml

# 2. Query it.
python -m cpp_dependency_graph changeset -g deps.yaml \
    path/to/src/foo.hpp \
    path/to/src/bar.cpp
python -m cpp_dependency_graph includes -g deps.yaml path/to/src/foo.cpp

# 3. Optionally validate module dependencies.
python -m cpp_dependency_graph check-modules -g deps.yaml \
    -m path/to/src/moduleA \
    -m path/to/src/moduleB \
    --root path/to/src
```
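One of the things check-modules rejects is a loop between modules. Finding such a loop is a standard depth-first search. This self-contained sketch mirrors the contract this README gives for find_cycle (a list of nodes with the closing node repeated, or None), but it is not the library's implementation. find_any_cycle is a hypothetical name.

```python
from typing import Dict, Iterable, List, Optional

def find_any_cycle(adj: Dict[str, Iterable[str]]) -> Optional[List[str]]:
    """Return one cycle as a node list with the closing node repeated, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: Dict[str, int] = {}
    for root in adj:
        if color.get(root, WHITE) != WHITE:
            continue
        color[root] = GREY
        path: List[str] = [root]
        # Iterative DFS: each stack entry is (node, iterator over its successors).
        stack = [(root, iter(adj.get(root, ())))]
        while stack:
            node, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                color[node] = BLACK  # all successors explored
                stack.pop()
                path.pop()
                continue
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is on the current path: the slice from nxt closes a cycle.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                color[nxt] = GREY
                path.append(nxt)
                stack.append((nxt, iter(adj.get(nxt, ()))))
    return None
```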
build takes either -b/--build-dir (a directory containing
compile_commands.json) or -c/--compile-commands (the file itself). Both
-f/--filter-dir and -j/--jobs are required, so their values are always chosen
deliberately; -f may be repeated.
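The -f/filter_dirs rule ("any included file outside all of them is dropped") reduces to a path-containment test. Here is a minimal sketch, assuming POSIX paths and plain os.path.abspath normalization. The library may resolve paths differently, for example by following symlinks. make_filter is a hypothetical helper.

```python
import os
from typing import Callable, Iterable, List

def make_filter(filter_dirs: Iterable[str]) -> Callable[[str], bool]:
    """Return a predicate that keeps a path only if it lies under a filter dir."""
    roots: List[str] = [os.path.abspath(d) for d in filter_dirs]

    def keep(path: str) -> bool:
        p = os.path.abspath(path)
        # commonpath avoids the /src vs /src2 false match that startswith() would give.
        return any(os.path.commonpath([p, r]) == r for r in roots)

    return keep
```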
```python
from cpp_dependency_graph import build_graph, save_graph, load_graph

graph = build_graph('build/compile_commands.json',
                    filter_dirs=['/abs/path/src'], jobs=8)
save_graph(graph, 'deps.yaml')

graph = load_graph('deps.yaml')
graph.transitive_includes('/abs/path/src/foo.cpp')  # what foo.cpp depends on
graph.changeset_of('/abs/path/src/foo.hpp')         # what foo.hpp affects
```
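A changeset contains headers as well as sources. A tool that runs once per translation unit, such as the static-analysis use case mentioned earlier, would typically keep only the source files. sources_to_check is a hypothetical helper, and the extension list is an assumption to adjust per project. It accepts any iterable of paths, such as whatever changeset_of returns.

```python
from typing import Iterable, List

# Extensions treated as translation units here; an assumption, not a library setting.
SOURCE_EXTS = (".c", ".cc", ".cpp", ".cxx")

def sources_to_check(changeset: Iterable[str]) -> List[str]:
    """Keep only source files from a changeset, sorted for stable output."""
    return sorted(p for p in changeset if p.endswith(SOURCE_EXTS))
```

For example: sources_to_check(graph.changeset_of(['/abs/path/src/foo.hpp'])).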
- build_graph(compile_commands_path, filter_dirs, jobs) - build a DependencyGraph from a compile_commands.json. Same as DependencyGraph.from_compile_commands(...).
- DependencyGraph:
  - .files - every file in the graph.
  - .transitive_includes(file) - every file the given file depends on, directly or transitively.
  - .changeset_of(files) - the given file(s) plus everything that transitively includes them. Accepts a single path or an iterable.
  - .to_dict() / .to_reverse_dict() - the raw forward and reverse graphs.
- save_graph(graph, path) / load_graph(path) - persist and restore a graph as YAML.
- check_modules(graph, modules, project_root=None) - given a list of module directories, return None if every file in a module includes only files in some module and the module dependency graph is acyclic; otherwise return a message describing the first violation. Modules may be nested: a file belongs to its nearest enclosing module.
- find_cycle(graph) - a generic cycle finder over any adjacency dict; returns the cycle as a list of nodes (with the closing node repeated) or None.
- collect_stats(graph) - {'total_included_headers': N} summed over the compiled files.
- scan_file(command, file, workdir, filter_dirs) - the low-level building block: extract the forward graph of one compile command.
- ScanError - raised when the compiler invocation or its output is invalid.
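The "nearest enclosing module" rule of check_modules can be expressed as a longest-containing-directory lookup. After that lookup, file edges collapse into module edges. Here is a minimal sketch, assuming absolute, normalized POSIX paths. owning_module and module_graph are hypothetical helpers. Unlike check_modules, the sketch silently skips edges to files outside every module instead of reporting them.

```python
import os
from typing import Dict, Iterable, List, Optional, Set

def owning_module(path: str, modules: Iterable[str]) -> Optional[str]:
    """Nearest enclosing module directory of `path`, or None if in no module."""
    best: Optional[str] = None
    for m in modules:
        # The longest containing directory is the nearest one when modules nest.
        if os.path.commonpath([path, m]) == m and (best is None or len(m) > len(best)):
            best = m
    return best

def module_graph(forward: Dict[str, Set[str]], modules: Iterable[str]) -> Dict[str, Set[str]]:
    """Collapse file edges into module -> modules it includes from (no self-edges)."""
    mods: List[str] = list(modules)
    graph: Dict[str, Set[str]] = {m: set() for m in mods}
    for src, targets in forward.items():
        a = owning_module(src, mods)
        if a is None:
            continue
        for t in targets:
            b = owning_module(t, mods)
            if b is not None and b != a:
                graph[a].add(b)
    return graph
```

Because the result is an ordinary adjacency dict, it can be passed to a generic cycle finder such as find_cycle to check that the module graph is acyclic.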
See ai/skills/dev.md for the development, review, and contribution rules.
Tests use pytest:
```shell
pip install -e '.[test]'
python -m pytest
```
Tests that need real clang/cmake are skipped automatically when those tools
are not installed; the rest of the suite runs anywhere.