Recover C/C++ functions inside preprocessor blocks and error nodes #35

Merged
tob-scott-a merged 1 commit into trailofbits:main from Tomer-PL:fix/preproc-recovery
Apr 29, 2026

@Tomer-PL

Summary

  • Tree-sitter's C grammar places function definitions inside preproc_if, preproc_ifdef, preproc_else, preproc_elif, and ERROR container nodes rather than at the translation unit's top level
  • The C and C++ parsers only walked root.children, silently missing all functions nested inside these containers
  • The fix recurses into these container nodes in _visit_top_level_node for both parsers
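
The recursion described in the last bullet can be sketched as follows. This is a minimal illustration, not the code from this PR: `Node` is a hypothetical stand-in that exposes only the `type` and `children` attributes of a tree-sitter node, `iter_functions` is an invented name (the PR's change lives in `_visit_top_level_node`), and the `name` field exists purely to make the result readable.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

# Container node types named in the summary above.
CONTAINER_TYPES = frozenset(
    {"preproc_if", "preproc_ifdef", "preproc_else", "preproc_elif", "ERROR"}
)


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node (`type`, `children`)."""
    type: str
    children: List["Node"] = field(default_factory=list)
    name: str = ""  # illustration only; real nodes carry no such field


def iter_functions(node: Node) -> Iterator[Node]:
    """Yield function definitions at the top level or nested in containers."""
    for child in node.children:
        if child.type == "function_definition":
            yield child
        elif child.type in CONTAINER_TYPES:
            # Recurse so definitions hidden in preprocessor/ERROR nodes are found.
            yield from iter_functions(child)


# Hand-built tree: one visible function and three nested in containers.
root = Node("translation_unit", [
    Node("function_definition", name="visible"),
    Node("preproc_ifdef", [
        Node("function_definition", name="guarded"),
        Node("preproc_else", [Node("function_definition", name="fallback")]),
    ]),
    Node("ERROR", [Node("function_definition", name="swallowed")]),
])

# Old behavior: only direct children of the root are inspected.
top_level_only = [c.name for c in root.children if c.type == "function_definition"]
# New behavior: containers are traversed as well.
recovered = [f.name for f in iter_functions(root)]
```

Here `top_level_only` holds just `visible`, while `recovered` holds all four names. The walk descends only into the listed container types, so it does not enter function bodies.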

Why this happens

When tree-sitter encounters #if/#else blocks with alternative code paths (common in cross-platform C code), it nests all subsequent content inside the preprocessor node. For example:

#if defined(_WIN32)
unsigned int __stdcall worker(void * arg)
#else
void * worker(void * arg)
#endif
{
    return 0;
}

// This function ends up inside the preproc_else node:
void important_api(int x) { ... }

Similarly, #if/#elif/#else chains inside static initializer arrays produce ERROR nodes that swallow all subsequent function definitions.
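
To see where a function actually lands, it can help to print the chain of ancestor node types for each definition. The helper below is a hedged sketch: `Node` and `function_paths` are hypothetical names, and the tree is a hand-built, simplified approximation of the `worker`/`important_api` example above, not real tree-sitter output. The helper reads only `type` and `children`, the same attributes a tree-sitter node exposes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node (`type`, `children`)."""
    type: str
    children: List["Node"] = field(default_factory=list)


def function_paths(node, ancestors=()):
    """Return the chain of ancestor node types for every function_definition."""
    if node.type == "function_definition":
        return [list(ancestors)]
    here = ancestors + (node.type,)
    paths = []
    for child in node.children:
        paths.extend(function_paths(child, here))
    return paths


# Assumed, simplified shape for the example above; the real tree has more
# nodes and may differ in detail.
tree = Node("translation_unit", [
    Node("preproc_if", [
        Node("preproc_else", [
            Node("function_definition"),  # important_api, per the comment above
        ]),
    ]),
])

paths = function_paths(tree)
for path in paths:
    print(" > ".join(path))  # prints: translation_unit > preproc_if > preproc_else
```

Any path longer than the single entry `translation_unit` marks a function that a walk over `root.children` alone would miss.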

Impact

Tested on libavif v1.4.1 (24K lines of C):

| Metric | Before | After |
|---|---|---|
| Nodes | 818 | 1,015 (+197) |
| Functions | 680 | 798 (+118) |
| Call edges | 6,339 | 7,558 (+1,219) |

118 functions across 14 files were previously invisible. Key recoveries:

| File | Recovered | Notable functions |
|---|---|---|
| src/reformat.c | 12 | avifImageYUVToRGB (public YUV→RGB API, 9 cross-module callers) |
| src/read.c | 11 | avifParseMinimizedImageBox (cc=100, highest complexity on decode path) |
| apps/shared/avifjpeg.c | 24 | JPEG I/O with platform-specific code |
| src/reformat_libyuv.c | 18 | Entire libyuv integration (all functions behind #ifdef) |
| src/codec_aom.c | 14 | AOM codec interface |
| src/obu.c | 8 | AV1 bitstream parsing |

avifParseMinimizedImageBox (cc=100) is now the highest-complexity function on the decode path. It was entirely invisible before this fix because it sits inside #if defined(AVIF_ENABLE_EXPERIMENTAL_MINI).

Test plan

  • 2 new tests in test_c_parser.py: #ifdef branch recovery and #if/#else split-signature recovery
  • All 1,043 existing tests pass
  • Validated on real-world C codebase (libavif)

🤖 Generated with Claude Code

… parsers

Tree-sitter's C grammar places function definitions inside `preproc_if`,
`preproc_ifdef`, `preproc_else`, `preproc_elif`, and `ERROR` container
nodes rather than at the translation unit's top level. The parser only
walked top-level children, silently missing these functions.

The fix recurses into these container nodes in `_visit_top_level_node`
for both the C and C++ parsers.

Tested on libavif (24K lines of C): recovered 118 functions across 14
files, including `avifImageYUVToRGB` (the public YUV-to-RGB API) and
`avifParseMinimizedImageBox` (cc=100, the highest-complexity function
on the decode path, inside an `#ifdef AVIF_ENABLE_EXPERIMENTAL_MINI`).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tob-scott-a merged commit 4198276 into trailofbits:main on Apr 29, 2026
8 checks passed