Skip to content

feat: add Query::fingerprint() for shape-only query hashing#859

Merged
abnegate merged 4 commits intomainfrom
CLO-4204-query-fingerprint
Apr 20, 2026
Merged

feat: add Query::fingerprint() for shape-only query hashing#859
abnegate merged 4 commits intomainfrom
CLO-4204-query-fingerprint

Conversation

@lohanidamodar
Copy link
Copy Markdown
Contributor

@lohanidamodar lohanidamodar commented Apr 20, 2026

Summary

Adds Query::fingerprint(array $queries): string — a static helper that computes a deterministic hash of query structure (method + attribute) with values excluded.

Why

Useful for grouping queries by pattern — for example, slow-query analytics where two queries with the same shape but different parameter values should count as the same pattern. Two queries like equal("name", "Alice") and equal("name", "Bob") produce the same fingerprint.

Behavior

  • Accepts raw query strings or already-parsed Query objects
  • Order-independent (sorts shapes before hashing)
  • Returns md5 hash of canonical method:attribute|method:attribute|... string
  • Empty array returns a deterministic hash (md5(''))

Test plan

Added testFingerprint() in tests/unit/QueryTest.php covering:

  • Same shape different values → same hash
  • Different attribute → different hash
  • Different method → different hash
  • Order-independence
  • Accepts parsed Query objects
  • Empty array behavior

All 5 tests pass locally (257 assertions).

Summary by CodeRabbit

  • New Features
    • Added query fingerprinting to generate consistent identifiers for queries with the same structure, independent of literal values and array order; accepts serialized or structured queries and returns a deterministic hash (empty input yields a stable result).
  • Tests
    • Added unit tests verifying shape-only equality, order-independence, logical grouping behavior, input formats, and invalid-input error handling.

Compute a deterministic hash of query structure (method + attribute)
with values excluded. Useful for grouping queries by pattern — e.g.
slow-query analytics where two queries with the same shape but
different parameter values should count as the same pattern.

Accepts raw query strings or parsed Query objects. Order-independent.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 20, 2026

Warning

Rate limit exceeded

@lohanidamodar has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 47 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 9 minutes and 47 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d71f547a-b842-4d1f-b8cf-65998a4b09d0

📥 Commits

Reviewing files that changed from the base of the PR and between 7661f0d and f16c7ee.

📒 Files selected for processing (2)
  • src/Database/Query.php
  • tests/unit/QueryTest.php
📝 Walkthrough

Walkthrough

Added Query::fingerprint(array $queries): string, which normalizes query inputs (JSON strings or Query objects), computes a canonical shape string based on method and attribute (recursing into logical children), sorts shapes, concatenates them, and returns an MD5 hash representing the collection's structural shape.

Changes

Cohort / File(s) Summary
Fingerprinting Method
src/Database/Query.php
Added public static function fingerprint(array $queries): string and a private recursive helper to compute a canonical, order-independent shape string per query (based on method and attribute), sort shapes, and return their MD5. Throws QueryException for invalid element types.
Fingerprinting Tests
tests/unit/QueryTest.php
Added unit tests covering JSON string and Query object inputs, empty-array result, order-independence (top-level and within logical operators), shape-equality despite differing literal values, and error handling for invalid types.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰
I sniffed the queries, one by one,
Collected shapes beneath the sun,
Sorted, hashed, a tidy trail,
No noisy values left to flail,
A rabbit's hop — a fingerprint done!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main change: a new Query::fingerprint() method for generating shape-only query hashes, which is the primary addition across both modified files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch CLO-4204-query-fingerprint

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Apr 20, 2026

Greptile Summary

This PR adds Query::fingerprint(array $queries): string and the supporting Query::shape(): string method for structure-only query hashing (method + attribute, values excluded), useful for slow-query analytics and pattern grouping. The iterative tree traversal in shape() correctly recurses into LOGICAL_TYPES nodes, resolving the previously flagged issue where nested logical queries collapsed to and: / or: — two AND queries with different child shapes now produce different fingerprints. Test coverage is thorough, including the newly required nested-query cases.

Confidence Score: 5/5

Safe to merge — implementation is correct, previously flagged issues are resolved, and test coverage is comprehensive.

All prior concerns about nested logical queries losing their inner shape have been addressed by the iterative shape() implementation. No new P0/P1 issues were found.

No files require special attention.

Important Files Changed

Filename Overview
src/Database/Query.php Adds fingerprint() and shape() methods; iterative tree traversal correctly recurses into nested logical queries, resolving the previously flagged gap.
tests/unit/QueryTest.php Adds four new test methods covering leaf queries, nested logical queries, invalid input rejection, and direct shape() output — including the previously missing nested-query case.

Reviews (5): Last reviewed commit: "refactor: expose shape() as an iterative..." | Re-trigger Greptile

Comment thread src/Database/Query.php Outdated
Comment thread tests/unit/QueryTest.php
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Query.php`:
- Around line 436-455: The fingerprint function (Query::fingerprint) only
records outer getMethod():getAttribute(), which collapses distinct nested
queries; update fingerprint to build a recursive canonical shape for each Query
instance by traversing child/value structures returned by
getAttribute/getValue/getChildren (or analogous accessors) so that nested calls
like or(equal(name)) vs or(greaterThan(age)) produce different entries; include
structural markers for helpers (exists, notExists, select) by serializing their
inner attribute/value structure into the shape, sort the resulting shapes and
then md5 the joined canonical strings—modify the loop in fingerprint to call a
new recursive serializer function that handles Query instances and arrays/values
to produce an unambiguous string representation.

In `@tests/unit/QueryTest.php`:
- Around line 472-505: Add tests to QueryTest::testFingerprint that cover nested
structural shapes and helpers whose shape is encoded in values: create queries
using Query::or, Query::and, and Query::elemMatch with nested child queries
(e.g., or([equal(...), greaterThan(...)]) and elemMatch('arr', equal(...))) and
also helpers that carry structural info inside values (e.g., in/contains-like
helpers), then assert that same structural shapes with different terminal values
produce identical fingerprints via Query::fingerprint while structural
differences produce different fingerprints; also include order-independence
checks for nested children and a deterministic hash for empty inputs as already
done.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0ea7c924-91a7-4942-97ee-731e6fb1d4e1

📥 Commits

Reviewing files that changed from the base of the PR and between ee2d7d4 and d8c6185.

📒 Files selected for processing (2)
  • src/Database/Query.php
  • tests/unit/QueryTest.php

Comment thread src/Database/Query.php
Comment thread tests/unit/QueryTest.php
Previously all `and(...)`, `or(...)`, and `elemMatch(...)` queries hashed as `and:` / `or:` / `elemMatch:` regardless of their child shapes, defeating the purpose of fingerprinting for slow-query pattern grouping.

The helper now recurses into logical query children, producing canonical shapes like `and(equal:name|greaterThan:age)`. Invalid array elements (non-string, non-Query) throw a QueryException instead of a fatal PHP error.

Added tests for nested AND/OR differentiation, child-order independence, and rejection of invalid elements.
@lohanidamodar lohanidamodar force-pushed the CLO-4204-query-fingerprint branch from 94048d2 to 7661f0d Compare April 20, 2026 05:07
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/Database/Query.php (1)

453-482: ⚠️ Potential issue | 🟠 Major

Preserve structural attributes in the recursive shape.

elemMatch() currently drops its attribute, and exists(), notExists(), and select() all collapse because their field names live in values. That makes distinct query shapes share the same fingerprint.

🐛 Proposed fix
-            $shapes[] = self::queryShape($query);
+            $shapes[] = self::queryShape($query);
         }
 
-        \sort($shapes);
+        \usort($shapes, static fn (array $left, array $right): int => \serialize($left) <=> \serialize($right));
 
-        return \md5(\implode('|', $shapes));
+        return $shapes === [] ? \md5('') : \md5(\serialize($shapes));
     }
 
     /**
      * Canonical shape string for a single Query — recursive for logical types.
      *
      * `@param` Query $query
-     * `@return` string
+     * `@return` array<string, mixed>
      */
-    private static function queryShape(self $query): string
+    private static function queryShape(self $query): array
     {
         $method = $query->getMethod();
+        $shape = [
+            'method' => $method,
+            'attribute' => $query->getAttribute(),
+        ];
 
         if (\in_array($method, self::LOGICAL_TYPES, true)) {
             $childShapes = [];
             foreach ($query->getValues() as $child) {
-                if ($child instanceof self) {
-                    $childShapes[] = self::queryShape($child);
+                if (!$child instanceof self) {
+                    throw new QueryException('Invalid nested query element for fingerprint: expected Query instance');
                 }
+
+                $childShapes[] = self::queryShape($child);
             }
-            \sort($childShapes);
-            return $method . '(' . \implode('|', $childShapes) . ')';
+            \usort($childShapes, static fn (array $left, array $right): int => \serialize($left) <=> \serialize($right));
+            $shape['children'] = $childShapes;
+        } elseif (\in_array($method, [self::TYPE_EXISTS, self::TYPE_NOT_EXISTS, self::TYPE_SELECT], true)) {
+            $attributes = $query->getValues();
+            \usort($attributes, static fn (mixed $left, mixed $right): int => \serialize($left) <=> \serialize($right));
+            $shape['attributes'] = $attributes;
         }
 
-        return $method . ':' . $query->getAttribute();
+        return $shape;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Query.php` around lines 453 - 482, queryShape currently drops
the parent attribute for logical types so different shapes (e.g., elemMatch,
exists, notExists, select) collapse; update queryShape(self $query) to include
the parent's attribute for LOGICAL_TYPES (e.g., produce
"METHOD:attribute(childShapes...)" or "METHOD(attribute)|...") while still
recursing into children via getValues() and collecting child shapes from nested
Query instances (using getAttribute() on children), then sort and implode
childShapes as before so fingerprints remain distinct; refer to queryShape,
LOGICAL_TYPES, getValues(), getAttribute(), and methods like
elemMatch/exists/notExists/select when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/Database/Query.php`:
- Around line 453-482: queryShape currently drops the parent attribute for
logical types so different shapes (e.g., elemMatch, exists, notExists, select)
collapse; update queryShape(self $query) to include the parent's attribute for
LOGICAL_TYPES (e.g., produce "METHOD:attribute(childShapes...)" or
"METHOD(attribute)|...") while still recursing into children via getValues() and
collecting child shapes from nested Query instances (using getAttribute() on
children), then sort and implode childShapes as before so fingerprints remain
distinct; refer to queryShape, LOGICAL_TYPES, getValues(), getAttribute(), and
methods like elemMatch/exists/notExists/select when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 448e8bd6-6f0a-4c2e-a4f6-091f54dc08a7

📥 Commits

Reviewing files that changed from the base of the PR and between d8c6185 and 7661f0d.

📒 Files selected for processing (2)
  • src/Database/Query.php
  • tests/unit/QueryTest.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/QueryTest.php

Logical queries and/or have an empty attribute, but elemMatch carries the field name being matched. Without including the attribute, elemMatch('tags', [...]) and elemMatch('categories', [...]) with the same inner shape would hash identically.

Canonical shape is now `method:attribute(child1|child2)` for logical types — and/or are unaffected (attribute empty), elemMatch preserves the field.
Comment thread src/Database/Query.php Outdated
* @param Query $query
* @return string
*/
private static function queryShape(self $query): string
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need an instance anyway, lets make it an instance method shape, with a loop instead of recursion

Per review: make the canonical-shape helper a public instance method on Query (`shape()`) rather than a private static inside `fingerprint()`, and replace the recursive walk with an iterative stack-based post-order traversal.

Added test covering leaf, logical, elemMatch, and 4-level-deep nested shapes to verify iterative equivalence with the previous recursive version.
@abnegate abnegate merged commit dd009a5 into main Apr 20, 2026
47 of 49 checks passed
@abnegate abnegate deleted the CLO-4204-query-fingerprint branch April 20, 2026 06:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants