feat: add Query::fingerprint() for shape-only query hashing#859
feat: add Query::fingerprint() for shape-only query hashing#859
Conversation
Compute a deterministic hash of query structure (method + attribute) with values excluded. Useful for grouping queries by pattern — e.g. slow-query analytics where two queries with the same shape but different parameter values should count as the same pattern. Accepts raw query strings or parsed Query objects. Order-independent.
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 9 minutes and 47 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdded Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR adds Confidence Score: 5/5Safe to merge — implementation is correct, previously flagged issues are resolved, and test coverage is comprehensive. All prior concerns about nested logical queries losing their inner shape have been addressed by the iterative No files require special attention. Important Files Changed
Reviews (5): Last reviewed commit: "refactor: expose shape() as an iterative..." | Re-trigger Greptile |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Database/Query.php`:
- Around line 436-455: The fingerprint function (Query::fingerprint) only
records outer getMethod():getAttribute(), which collapses distinct nested
queries; update fingerprint to build a recursive canonical shape for each Query
instance by traversing child/value structures returned by
getAttribute/getValue/getChildren (or analogous accessors) so that nested calls
like or(equal(name)) vs or(greaterThan(age)) produce different entries; include
structural markers for helpers (exists, notExists, select) by serializing their
inner attribute/value structure into the shape, sort the resulting shapes and
then md5 the joined canonical strings—modify the loop in fingerprint to call a
new recursive serializer function that handles Query instances and arrays/values
to produce an unambiguous string representation.
In `@tests/unit/QueryTest.php`:
- Around line 472-505: Add tests to QueryTest::testFingerprint that cover nested
structural shapes and helpers whose shape is encoded in values: create queries
using Query::or, Query::and, and Query::elemMatch with nested child queries
(e.g., or([equal(...), greaterThan(...)]) and elemMatch('arr', equal(...))) and
also helpers that carry structural info inside values (e.g., in/contains-like
helpers), then assert that same structural shapes with different terminal values
produce identical fingerprints via Query::fingerprint while structural
differences produce different fingerprints; also include order-independence
checks for nested children and a deterministic hash for empty inputs as already
done.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0ea7c924-91a7-4942-97ee-731e6fb1d4e1
📒 Files selected for processing (2)
src/Database/Query.phptests/unit/QueryTest.php
2f1b8f0 to
94048d2
Compare
Previously all `and(...)`, `or(...)`, and `elemMatch(...)` queries hashed as `and:` / `or:` / `elemMatch:` regardless of their child shapes, defeating the purpose of fingerprinting for slow-query pattern grouping. The helper now recurses into logical query children, producing canonical shapes like `and(equal:name|greaterThan:age)`. Invalid array elements (non-string, non-Query) throw a QueryException instead of a fatal PHP error. Added tests for nested AND/OR differentiation, child-order independence, and rejection of invalid elements.
94048d2 to
7661f0d
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/Database/Query.php (1)
453-482:⚠️ Potential issue | 🟠 MajorPreserve structural attributes in the recursive shape.
elemMatch()currently drops its attribute, andexists(),notExists(), andselect()all collapse because their field names live invalues. That makes distinct query shapes share the same fingerprint.🐛 Proposed fix
- $shapes[] = self::queryShape($query); + $shapes[] = self::queryShape($query); } - \sort($shapes); + \usort($shapes, static fn (array $left, array $right): int => \serialize($left) <=> \serialize($right)); - return \md5(\implode('|', $shapes)); + return $shapes === [] ? \md5('') : \md5(\serialize($shapes)); } /** * Canonical shape string for a single Query — recursive for logical types. * * `@param` Query $query - * `@return` string + * `@return` array<string, mixed> */ - private static function queryShape(self $query): string + private static function queryShape(self $query): array { $method = $query->getMethod(); + $shape = [ + 'method' => $method, + 'attribute' => $query->getAttribute(), + ]; if (\in_array($method, self::LOGICAL_TYPES, true)) { $childShapes = []; foreach ($query->getValues() as $child) { - if ($child instanceof self) { - $childShapes[] = self::queryShape($child); + if (!$child instanceof self) { + throw new QueryException('Invalid nested query element for fingerprint: expected Query instance'); } + + $childShapes[] = self::queryShape($child); } - \sort($childShapes); - return $method . '(' . \implode('|', $childShapes) . ')'; + \usort($childShapes, static fn (array $left, array $right): int => \serialize($left) <=> \serialize($right)); + $shape['children'] = $childShapes; + } elseif (\in_array($method, [self::TYPE_EXISTS, self::TYPE_NOT_EXISTS, self::TYPE_SELECT], true)) { + $attributes = $query->getValues(); + \usort($attributes, static fn (mixed $left, mixed $right): int => \serialize($left) <=> \serialize($right)); + $shape['attributes'] = $attributes; } - return $method . ':' . $query->getAttribute(); + return $shape; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Database/Query.php` around lines 453 - 482, queryShape currently drops the parent attribute for logical types so different shapes (e.g., elemMatch, exists, notExists, select) collapse; update queryShape(self $query) to include the parent's attribute for LOGICAL_TYPES (e.g., produce "METHOD:attribute(childShapes...)" or "METHOD(attribute)|...") while still recursing into children via getValues() and collecting child shapes from nested Query instances (using getAttribute() on children), then sort and implode childShapes as before so fingerprints remain distinct; refer to queryShape, LOGICAL_TYPES, getValues(), getAttribute(), and methods like elemMatch/exists/notExists/select when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/Database/Query.php`:
- Around line 453-482: queryShape currently drops the parent attribute for
logical types so different shapes (e.g., elemMatch, exists, notExists, select)
collapse; update queryShape(self $query) to include the parent's attribute for
LOGICAL_TYPES (e.g., produce "METHOD:attribute(childShapes...)" or
"METHOD(attribute)|...") while still recursing into children via getValues() and
collecting child shapes from nested Query instances (using getAttribute() on
children), then sort and implode childShapes as before so fingerprints remain
distinct; refer to queryShape, LOGICAL_TYPES, getValues(), getAttribute(), and
methods like elemMatch/exists/notExists/select when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 448e8bd6-6f0a-4c2e-a4f6-091f54dc08a7
📒 Files selected for processing (2)
src/Database/Query.phptests/unit/QueryTest.php
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/QueryTest.php
Logical queries and/or have an empty attribute, but elemMatch carries the field name being matched. Without including the attribute, elemMatch('tags', [...]) and elemMatch('categories', [...]) with the same inner shape would hash identically.
Canonical shape is now `method:attribute(child1|child2)` for logical types — and/or are unaffected (attribute empty), elemMatch preserves the field.
| * @param Query $query | ||
| * @return string | ||
| */ | ||
| private static function queryShape(self $query): string |
There was a problem hiding this comment.
If we need an instance anyway, lets make it an instance method shape, with a loop instead of recursion
Per review: make the canonical-shape helper a public instance method on Query (`shape()`) rather than a private static inside `fingerprint()`, and replace the recursive walk with an iterative stack-based post-order traversal. Added test covering leaf, logical, elemMatch, and 4-level-deep nested shapes to verify iterative equivalence with the previous recursive version.
Summary
Adds
Query::fingerprint(array $queries): string— a static helper that computes a deterministic hash of query structure (method + attribute) with values excluded.Why
Useful for grouping queries by pattern — for example, slow-query analytics where two queries with the same shape but different parameter values should count as the same pattern. Two queries like
equal("name", "Alice")andequal("name", "Bob")produce the same fingerprint.Behavior
Queryobjectsmethod:attribute|method:attribute|...stringmd5(''))Test plan
Added
testFingerprint()intests/unit/QueryTest.phpcovering:QueryobjectsAll 5 tests pass locally (257 assertions).
Summary by CodeRabbit