Add cross-file call linker for C/C++/Go directory parsing #34

Merged
tob-scott-a merged 1 commit into trailofbits:main from Tomer-PL:fix/cross-file-call-linker
Apr 29, 2026

@Tomer-PL

Summary

  • Adds a post-merge `_link_cross_file_calls()` pass in `parse_directory()` that rewrites dangling `CALLS` edges to point at definitions in other modules
  • Per-file parsers resolve bare calls like `foo()` as `current_module:foo`. After merging, if `foo` is defined in another module, that edge target doesn't exist and the edge is silently dropped by `GraphStore._build_index`
  • The linker builds a name→node-id index of all functions/methods, then rewrites unresolved call edges by matching on the bare function name
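
To make the failure mode concrete, here is a minimal sketch of a dangling edge and of a name→node-id index. It is not the code from this PR: the list-of-tuples graph and the helper names `dangling_calls` and `build_name_index` are hypothetical, and it assumes node ids take the `module:name` form described above.

```python
from collections import defaultdict

# Hypothetical merged graph: node ids take the form "module:name".
nodes = ["a:main", "b:foo"]
# The parser for module "a" saw foo() and guessed it lives in "a".
calls = [("a:main", "a:foo")]


def dangling_calls(calls, nodes):
    """Return call edges whose target id is not a known node."""
    known = set(nodes)
    return [(src, dst) for src, dst in calls if dst not in known]


def build_name_index(nodes):
    """Map each bare function name to the node ids that define it."""
    index = defaultdict(list)
    for node_id in nodes:
        bare_name = node_id.rsplit(":", 1)[-1]
        index[bare_name].append(node_id)
    return dict(index)


print(dangling_calls(calls, nodes))      # [('a:main', 'a:foo')]
print(build_name_index(nodes)["foo"])    # ['b:foo']
```

The edge to `a:foo` has no matching node, so an index that only keeps edges with known endpoints would discard it, while the name index shows the one place `foo` is actually defined.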

How it works

  1. After all per-file graphs are merged, scan for `CALLS` edges whose `target_id` doesn't exist in the node set
  2. Extract the bare function name (after the last `:` separator)
  3. Skip qualified calls (`::`, `->`, or `.` in the bare name); these are method calls that shouldn't be rewritten
  4. If exactly one definition exists across all modules, rewrite with `CERTAIN` confidence
  5. If multiple definitions exist, prefer the cross-module candidate (not in the caller's own module); if still ambiguous, pick the first with `UNCERTAIN` confidence
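
The five steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the `_link_cross_file_calls()` implementation: the `Edge` dataclass, `split_id`, and the `"UNSET"` placeholder are hypothetical. Two choices are mine rather than the PR's. First, the sketch splits ids at the first `:` (assuming module names contain no colon) so that a `::`-qualified callee stays intact for the step 3 check. Second, it marks every multi-definition match `UNCERTAIN`, because the description does not say what confidence a single surviving cross-module candidate receives.

```python
from collections import defaultdict
from dataclasses import dataclass

QUALIFIERS = ("::", "->", ".")


@dataclass
class Edge:
    """Hypothetical edge record; field names are assumptions."""
    source_id: str
    target_id: str
    kind: str = "CALLS"
    confidence: str = "UNSET"  # placeholder until a linker assigns one


def split_id(node_id):
    """Split "module:name" at the first colon (assumed id layout)."""
    module, _, name = node_id.partition(":")
    return module, name


def link_cross_file_calls(node_ids, edges):
    """Rewrite dangling CALLS edges in place; return the number rewritten."""
    known = set(node_ids)
    by_name = defaultdict(list)
    for node_id in node_ids:
        by_name[split_id(node_id)[1]].append(node_id)

    rewritten = 0
    for edge in edges:
        # Step 1: only CALLS edges whose target is missing are candidates.
        if edge.kind != "CALLS" or edge.target_id in known:
            continue
        # Step 2: drop the module prefix to get the callee name.
        callee = split_id(edge.target_id)[1]
        # Step 3: leave qualified (method-style) calls alone.
        if any(q in callee for q in QUALIFIERS):
            continue
        candidates = by_name.get(callee, [])
        if not candidates:
            continue  # likely a system or library call
        if len(candidates) == 1:
            # Step 4: a unique definition anywhere in the merged graph.
            edge.target_id, edge.confidence = candidates[0], "CERTAIN"
        else:
            # Step 5: prefer definitions outside the caller's module,
            # then fall back to the first candidate.
            caller_module = split_id(edge.source_id)[0]
            foreign = [c for c in candidates if split_id(c)[0] != caller_module]
            edge.target_id = (foreign or candidates)[0]
            edge.confidence = "UNCERTAIN"
        rewritten += 1
    return rewritten


node_ids = ["a:main", "b:foo", "c:dup", "d:dup"]
edges = [
    Edge("a:main", "a:foo"),       # unique definition in module b
    Edge("a:main", "a:dup"),       # defined in both c and d
    Edge("a:main", "a:obj->run"),  # qualified call, skipped
    Edge("a:main", "a:printf"),    # no definition in the graph
]
print(link_cross_file_calls(node_ids, edges))  # 2
```

Picking the first candidate makes the result depend on the order of `node_ids`, so a caller that wants reproducible output would need to pass nodes in a stable order.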

Impact

Tested on libavif v1.4.1 (24K lines of C, 818 nodes, 6,339 call edges):

| Metric | Before | After |
| --- | --- | --- |
| Resolved edges | 1,099 | 2,780 |
| Cross-module edges | 0 | 1,681 |
| Decode-path reachability | 125 functions (1 module) | 240 functions (15 modules) |
| Unresolved call targets | 1,015 | ~380 (system/library calls) |

Previously, `paths_between("avifDecoderRead", "avifImageRGBToYUV")` returned nothing because the call crosses from `read.c` to `reformat.c`. After this fix, the full decode→reformat→stream call chain resolves.
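
The effect on reachability can be shown with a toy graph. This is a sketch only: `find_path` is a hypothetical breadth-first search, not the project's `paths_between`, and the node ids are invented. It assumes, as described in the summary, that edges with an unknown target are dropped before any traversal.

```python
from collections import deque


def find_path(nodes, edges, start, goal):
    """Breadth-first search that ignores edges with an unknown endpoint."""
    adjacency = {node: [] for node in nodes}
    for src, dst in edges:
        if src in adjacency and dst in adjacency:  # dangling edges are dropped
            adjacency[src].append(dst)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


nodes = {"read:decode", "reformat:convert"}
before = [("read:decode", "read:convert")]     # target id does not exist
after = [("read:decode", "reformat:convert")]  # same edge after relinking

print(find_path(nodes, before, "read:decode", "reformat:convert"))  # None
print(find_path(nodes, after, "read:decode", "reformat:convert"))
# ['read:decode', 'reformat:convert']
```

A single dropped edge is enough to disconnect everything downstream of it, which is consistent with the jump from 1 reachable module to 15 reported in the table.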

Test plan

  • 3 new tests in test_common_parser.py covering unique resolution, same-module preservation, and unresolvable calls
  • All existing tests pass (1,041 total)
  • Validated on real-world C codebase (libavif)

Closes #10

🤖 Generated with Claude Code

…parse

Per-file parsers resolve bare calls like `foo()` as `current_module:foo`.
After merging files, if `foo` is defined in another module that edge target
doesn't exist and is silently dropped by the graph index. This adds a
post-merge linker pass in `parse_directory` that rewrites dangling CALLS
edges to point at the actual definition site by matching on function name.

Tested on libavif (24K lines of C): resolved edges increased from 1,099
to 2,780 and decode-path reachability expanded from 125 functions in 1
module to 240 functions across 15 modules.

Closes trailofbits#10

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@CLAassistant

CLAassistant commented Apr 27, 2026

CLA assistant check
All committers have signed the CLA.

@tob-scott-a tob-scott-a merged commit d46c3e7 into trailofbits:main Apr 29, 2026
8 checks passed

Development

Successfully merging this pull request may close these issues.

Does this resolve calls across file boundaries?