Recover C/C++ functions inside preprocessor blocks and error nodes #35
Merged
tob-scott-a merged 1 commit into trailofbits:main (Apr 29, 2026)
… parsers

Tree-sitter's C grammar places function definitions inside `preproc_if`, `preproc_ifdef`, `preproc_else`, `preproc_elif`, and `ERROR` container nodes rather than at the translation unit's top level. The parser only walked top-level children, silently missing these functions.

The fix recurses into these container nodes in `_visit_top_level_node` for both the C and C++ parsers.

Tested on libavif (24K lines of C): recovered 118 functions across 14 files, including `avifImageYUVToRGB` (the public YUV-to-RGB API) and `avifParseMinimizedImageBox` (cc=100, the highest-complexity function on the decode path, inside an `#ifdef AVIF_ENABLE_EXPERIMENTAL_MINI`).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
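The recursion described in this commit message can be illustrated with a minimal sketch. This is not the PR's actual `_visit_top_level_node` code: `Node`, `collect_functions`, `CONTAINER_TYPES`, and the mock tree are hypothetical names made up for illustration, and `Node` only imitates the `type` and `children` attributes of a tree-sitter node so the example runs without the tree-sitter bindings.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Container node types named in the commit message above.
CONTAINER_TYPES = frozenset(
    {"preproc_if", "preproc_ifdef", "preproc_else", "preproc_elif", "ERROR"}
)


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node (only `type` and `children`)."""
    type: str
    children: List["Node"] = field(default_factory=list)
    label: str = ""  # illustration only: the function name, for easy inspection


def collect_functions(node: Node, found: Optional[List[str]] = None) -> List[str]:
    """Collect function definitions, descending only into container nodes."""
    if found is None:
        found = []
    for child in node.children:
        if child.type == "function_definition":
            found.append(child.label)
        elif child.type in CONTAINER_TYPES:
            # Assumed shape of the fix: recurse instead of skipping the container.
            collect_functions(child, found)
    return found


# Mock tree: one top-level function, plus functions nested in #ifdef,
# #if/#else, and ERROR containers.
root = Node("translation_unit", [
    Node("function_definition", label="top_level_fn"),
    Node("preproc_ifdef", [Node("function_definition", label="ifdef_fn")]),
    Node("preproc_if", [
        Node("function_definition", label="if_fn"),
        Node("preproc_else", [Node("function_definition", label="else_fn")]),
    ]),
    Node("ERROR", [Node("function_definition", label="error_fn")]),
])

# Old behavior (as described): only direct children of the root are inspected.
top_level_only = [c.label for c in root.children if c.type == "function_definition"]

# Sketched new behavior: container nodes are walked recursively.
recovered = collect_functions(root)

print(top_level_only)
print(recovered)
```

In this sketch the top-level-only walk sees a single function, while the recursive walk also reaches the four nested ones. Restricting recursion to the listed container types, rather than walking every node, keeps the traversal from descending into function bodies.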
Summary
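- Tree-sitter's C grammar places function definitions inside `preproc_if`, `preproc_ifdef`, `preproc_else`, `preproc_elif`, and `ERROR` container nodes rather than at the translation unit's top level
- The parser only walked `root.children`, silently missing all functions nested inside these containers
- The fix recurses into these container nodes in `_visit_top_level_node` for both parsers

Why this happens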
When tree-sitter encounters `#if`/`#else` blocks with alternative code paths (common in cross-platform C code), it nests all subsequent content inside the preprocessor node. Similarly, `#if`/`#elif`/`#else` chains inside static initializer arrays produce `ERROR` nodes that swallow all subsequent function definitions.

Impact
Tested on libavif v1.4.1 (24K lines of C):
118 functions across 14 files were previously invisible. Key recoveries:
- `src/reformat.c`: `avifImageYUVToRGB` (public YUV→RGB API, 9 cross-module callers)
- `src/read.c`: `avifParseMinimizedImageBox` (cc=100, highest complexity on decode path)
- `apps/shared/avifjpeg.c`
- `src/reformat_libyuv.c` (… `#ifdef`)
- `src/codec_aom.c`
- `src/obu.c`

`avifParseMinimizedImageBox` at cc=100 is the new highest-complexity function on the decode path. It was entirely invisible before this fix because it sits inside `#if defined(AVIF_ENABLE_EXPERIMENTAL_MINI)`.

Test plan
- `test_c_parser.py`: `#ifdef` branch recovery and `#if`/`#else` split-signature recovery

🤖 Generated with Claude Code