fix(clp-s): Use smart pointers in the SchemaMatch pass to prevent use-after-free (fixes #1986).#1990
Conversation
… can be invalidated during constant propagation.
WalkthroughReplaces raw Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/core/src/clp_s/search/SchemaMatch.hpp (1)
72-78: 🧹 Nitpick | 🔵 TrivialRemaining raw-pointer maps could become stale for the same reason.
m_column_to_descriptornow correctly holdsshared_ptrto keep descriptors alive across AST mutations. However,m_descriptor_to_schema(line 77) andm_unresolved_descriptor_to_descriptor(line 78) are still keyed by rawColumnDescriptor*. These are safe today because:
m_unresolved_descriptor_to_descriptoris cleared inrun()before the second pass.m_descriptor_to_schemais populated inpopulate_schema_mapping()which runs after constant propagation, so the AST is stable.But this is fragile — if the pass ordering changes, these maps would suffer the same use-after-free. Consider documenting this invariant or, for consistency, switching these to
shared_ptr-based keys as well.
🤖 Fix all issues with AI agents
In `@components/core/src/clp_s/search/SchemaMatch.cpp`:
- Around line 350-354: The loop is copying a std::shared_ptr on each iteration
causing unnecessary atomic ref-count ops; change the loop to take the element by
reference (use auto const& descriptor) so you don't copy the shared_ptr, then
continue to call descriptor->is_pure_wildcard() and descriptor.get() when
inserting into m_descriptor_to_schema via try_emplace (symbols:
m_column_to_descriptor, descriptor, is_pure_wildcard(), m_descriptor_to_schema,
try_emplace, descriptor.get()).
LinZhihao-723
left a comment
There was a problem hiding this comment.
Two nit catch otherwise lgtm.
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/core/src/clp_s/search/SchemaMatch.cpp (1)
361-389: 🧹 Nitpick | 🔵 Trivial
m_descriptor_to_schemastill keyed by rawColumnDescriptor*— consider whether this should also migrate.
intersect_schemasandintersect_and_sub_exprindexm_descriptor_to_schemawith rawColumnDescriptor*(lines 382, 447, 453, 460, 469). These pointers are obtained fromcolumn.get()on live AST nodes, so they're valid during the intersection pass (which runs after constant propagation is complete). However, this leaves a mixed pointer-ownership model:m_column_to_descriptorusesshared_ptrvalues whilem_descriptor_to_schemauses raw-pointer keys.This is safe today because
m_descriptor_to_schemais only populated and consumed within a singlerun()invocation after constant propagation. If future changes introduce another constant-propagation pass betweenpopulate_schema_mappingandintersect_schemas, the same use-after-free category could resurface. No action required for this PR, but worth documenting the invariant.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/core/src/clp_s/search/SchemaMatch.cpp` around lines 361 - 389, intersect_schemas and intersect_and_sub_expr currently index m_descriptor_to_schema with raw ColumnDescriptor* while m_column_to_descriptor uses shared_ptr, which mixes ownership and risks use-after-free; either migrate m_descriptor_to_schema to use shared_ptr<ColumnDescriptor> keys (update all access sites in intersect_schemas, intersect_and_sub_expr, populate_schema_mapping, and any places that call m_descriptor_to_schema to use the shared_ptr from m_column_to_descriptor) or, if you choose not to change types now, add a clear invariant comment and a runtime check in run()/populate_schema_mapping guaranteeing that m_descriptor_to_schema is populated and consumed within a single pass (and assert that pointers in m_descriptor_to_schema match those held by m_column_to_descriptor) so future refactors don't introduce lifetime bugs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@components/core/src/clp_s/search/SchemaMatch.cpp`:
- Around line 361-389: intersect_schemas and intersect_and_sub_expr currently
index m_descriptor_to_schema with raw ColumnDescriptor* while
m_column_to_descriptor uses shared_ptr, which mixes ownership and risks
use-after-free; either migrate m_descriptor_to_schema to use
shared_ptr<ColumnDescriptor> keys (update all access sites in intersect_schemas,
intersect_and_sub_expr, populate_schema_mapping, and any places that call
m_descriptor_to_schema to use the shared_ptr from m_column_to_descriptor) or, if
you choose not to change types now, add a clear invariant comment and a runtime
check in run()/populate_schema_mapping guaranteeing that m_descriptor_to_schema
is populated and consumed within a single pass (and assert that pointers in
m_descriptor_to_schema match those held by m_column_to_descriptor) so future
refactors don't introduce lifetime bugs.
---
Duplicate comments:
In `@components/core/src/clp_s/search/SchemaMatch.cpp`:
- Around line 350-355: Replace the redundant count+at lookup around
m_column_to_descriptor with a single find: use
m_column_to_descriptor.find(column_id) and check the iterator against end(),
then iterate over the found iterator->second (using the existing auto const&
descriptor) and update m_descriptor_to_schema (as currently done with
try_emplace and descriptor.get()) — this removes the extra map lookup while
keeping the descriptor->is_pure_wildcard() check and current insertion logic for
schema_id/column_id.
Description
This PR fixes a use-after-free bug in the
SchemaMatchpass where a cached mapping ofcolumn Id -> set<ColumnDescriptor*>could end up referencing invalid pointers after some parts of the AST were constant propagated away. The bug and the address sanitizer output that helped us catch it is detailed in #1986.The fix is simply to change the mapping to use a
set<shared_ptr<ColumnDescriptor>>, since the AST itself already uses smart pointers.Besides this change, we also perform a small amount of cleanup in the surrounding code.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit