
feat(clickhouse): Query::select + notStartsWith / notEndsWith / regex / orderRandom#116

Merged: lohanidamodar merged 4 commits into main from feat-clickhouse-query-projection-and-more on May 14, 2026


@lohanidamodar (Contributor)

Summary

Fills out the supported Query method set on the ClickHouse adapter so callers can opt into slim projections and the rest of the filter / order menu. Previously, these methods were silently ignored when passed to find() (for example, the projection always returned every column).

Added:

  • Query::select(['col', ...]): column projection. Multiple select() calls are merged and duplicate columns are dropped. id is always projected (plus tenant when sharedTables is enabled) so the Log model keeps its identifier without callers having to list it. Each requested column is validated against the schema (validateAttributeName) and identifier-escaped at SQL build time. Without select(), the previous full-column behaviour is unchanged.
  • Query::notStartsWith(...) / Query::notEndsWith(...): symmetric with the existing startsWith / endsWith, compiled as NOT startsWith(col, val) / NOT endsWith(col, val).
  • Query::regex(col, pattern): compiled to ClickHouse's match(haystack, pattern). The pattern is parameter-bound, never inlined.
  • Query::orderRandom(): compiled to ORDER BY rand(). Mutually exclusive with cursor pagination, because a cursor needs a stable order to anchor the next page on, and with orderAsc / orderDesc; either combination throws.
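A minimal sketch of how the three new filters could compile to parameter-bound ClickHouse SQL. The function `compileFilter`, the `p0`/`p1` parameter names, and the use of ClickHouse's `{name:Type}` query-parameter placeholders are assumptions for illustration, not the adapter's actual code; the column is assumed to have been schema-validated before this point.

```php
<?php

/**
 * Hypothetical sketch: compile one of the new filter methods into a
 * ClickHouse WHERE fragment whose value is bound, never inlined.
 * compileFilter, the pN parameter names and the {name:Type}
 * placeholder style are assumptions, not the adapter's real code.
 *
 * @param array<string, string> $params Bound parameters, collected by reference.
 */
function compileFilter(string $method, string $column, string $value, array &$params): string
{
    // Pick the SQL template first so an unsupported method binds nothing.
    $template = match ($method) {
        'notStartsWith' => 'NOT startsWith(%s, %s)',
        'notEndsWith' => 'NOT endsWith(%s, %s)',
        'regex' => 'match(%s, %s)', // ClickHouse match(haystack, pattern)
        default => throw new InvalidArgumentException("Unsupported filter method: $method"),
    };

    // Bind the value under a generated name.
    $name = 'p' . count($params);
    $params[$name] = $value;

    // $column is assumed to be schema-validated already, so backticks suffice.
    return sprintf($template, '`' . $column . '`', '{' . $name . ':String}');
}

$params = [];
$where = [
    compileFilter('notStartsWith', 'resource', 'temp/', $params),
    compileFilter('regex', 'resource', '^database/document/\d+$', $params),
];
echo implode(' AND ', $where), PHP_EOL;
// NOT startsWith(`resource`, {p0:String}) AND match(`resource`, {p1:String})
```

Keeping the value out of the SQL string means a pattern such as `^database/document/\d+$` reaches ClickHouse as data, so the pattern itself never needs quoting or escaping.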

select, regex, notStartsWith, and notEndsWith were added to VALUE_REQUIRED_METHODS so that an empty values array fails loudly (for example, "Select queries require at least one value."), matching the existing contract.
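A hypothetical sketch of that contract. The constant name and the quoted message come from the PR; the `assertHasValues` function, the list contents (only the four methods added here, with the constant's existing entries omitted), and deriving the message from the method name are assumptions.

```php
<?php

// Hypothetical sketch: only the four methods added in this PR are listed;
// the existing entries of the real constant are omitted.
const VALUE_REQUIRED_METHODS = ['select', 'regex', 'notStartsWith', 'notEndsWith'];

/**
 * Throw when a value-required method receives an empty values array.
 *
 * @param list<mixed> $values
 */
function assertHasValues(string $method, array $values): void
{
    if (in_array($method, VALUE_REQUIRED_METHODS, true) && $values === []) {
        // e.g. "Select queries require at least one value."
        throw new InvalidArgumentException(ucfirst($method) . ' queries require at least one value.');
    }
}

assertHasValues('select', ['event', 'time']); // passes
assertHasValues('orderRandom', []);           // not value-required, passes

try {
    assertHasValues('select', []);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // Select queries require at least one value.
}
```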

Why

The most direct motivation is the cloud audit-events list endpoint, which currently runs SELECT * on rows that carry a sizeable data JSON payload. UI listing pages only need id, time, event, userId, and resource, so projecting just those columns avoids the dominant I/O cost. Query::select was the missing piece; the remaining methods are added while we're in here so the supported set matches utopia-php/query's contract.

Not in this PR

  • search / notSearch — no fulltext index on the audit table
  • exists / notExists — doc-store concept, doesn't map cleanly
  • containsAny / containsAll / elemMatch — array-column methods, audit columns are scalar
  • vector / spatial types — not applicable
  • and / or logical combinations — would need a recursive filter compiler that doesn't exist today; filed as a follow-up

API

use Utopia\Audit\Query;

// Slim projection — read only the columns the UI needs
$logs = $audit->find([
    Query::select(['event', 'resource', 'userId', 'time']),
    Query::greaterThanEqual('time', '2026-05-07T00:00:00Z'),
    Query::orderDesc('time'),
    Query::limit(150),
]);

// notStartsWith / notEndsWith
$audit->find([Query::notStartsWith('resource', 'temp/')]);
$audit->find([Query::notEndsWith('resource', '.bak')]);

// Regex via ClickHouse match()
$audit->find([Query::regex('resource', '^database/document/\\d+$')]);

// Random ordering
$audit->find([Query::orderRandom(), Query::limit(10)]);

Test plan

  • composer lint passes
  • composer check (PHPStan max) passes
  • 8 new tests cover the happy paths plus unknown-column rejection, empty-values rejection, and the cursor + random incompatibility
  • Full audit ClickHouse test suite: 61/61 pass

🤖 Generated with Claude Code

…pter

Adds the missing Query types so audit callers can opt in to slim
projections and the full filter/order menu rather than being limited to a
subset:

- `Query::select(['col', ...])` — column projection. Multiple `select()`
  calls combine; `id` is always projected so the Log model still has its
  identifier. Each requested column is validated against the schema and
  identifier-escaped at SQL build time. Without `select()`, the existing
  full-column behaviour is unchanged.
- `Query::notStartsWith(...)` / `Query::notEndsWith(...)` — symmetric
  with the existing `startsWith` / `endsWith`, emitted as
  `NOT startsWith(col, val)` / `NOT endsWith(col, val)`.
- `Query::regex(col, pattern)` — compiled to ClickHouse's
  `match(haystack, pattern)`. Pattern is parameter-bound, never inlined.
- `Query::orderRandom()` — `ORDER BY rand()`. Mutually exclusive with
  cursor pagination — combining the two throws, since cursor needs a
  stable order to anchor the next page on.

`select`, `regex`, `notStartsWith`, `notEndsWith` are added to
`VALUE_REQUIRED_METHODS` so they fail loudly when given an empty values
array, matching the existing contract.

Skipped: full-text `search`/`notSearch`, `exists`/`notExists`,
`containsAny`/`containsAll`/`elemMatch`, vector / spatial types — none
map cleanly to the audit table's scalar columns. `and` / `or` logical
combinations are filed as a separate follow-up because they need a
recursive filter compiler that we don't have today.

Eight new tests cover happy paths, unknown-column rejection,
empty-values rejection, and the cursor + random incompatibility.

When `sharedTables` is enabled, the full-projection path already appends
the `tenant` column to every SELECT, so the slim `Query::select(...)`
path should match — `tenant` is metadata callers expect on every row
regardless of which columns they explicitly listed. Force-include it
alongside `id` (which was already always-projected so the Log model
keeps its identifier).
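The projection rules from these commits can be sketched as below. This is a hypothetical, simplified version: `buildProjection` is named after the method the review mentions, but its parameters are invented here, and it validates every column in one place, whereas the adapter validates user-supplied columns in parseQueries and only re-checks the forced ones.

```php
<?php

/**
 * Hypothetical sketch of the projection rules; names and parameters are
 * illustrative, not the adapter's actual signatures.
 *
 * @param list<list<string>> $selects      Column lists from each select() call.
 * @param list<string>       $schema       Known attribute columns, including `id`.
 * @param bool               $sharedTables Whether the `tenant` column exists.
 */
function buildProjection(array $selects, array $schema, bool $sharedTables): string
{
    // Columns the table actually has; `tenant` only exists with shared tables.
    $known = $sharedTables ? [...$schema, 'tenant'] : $schema;

    if ($selects === []) {
        // No select(): keep the previous full-column behaviour.
        $columns = $known;
    } else {
        // Forced columns first, then every requested column; duplicates dropped.
        $forced = $sharedTables ? ['id', 'tenant'] : ['id'];
        $columns = array_values(array_unique(array_merge($forced, ...$selects)));
    }

    // Whitelist check: reject anything that is not a known column.
    foreach ($columns as $column) {
        if (!in_array($column, $known, true)) {
            throw new InvalidArgumentException("Unknown column: $column");
        }
    }

    // Names are whitelisted above, so simple backtick quoting is enough here.
    return implode(', ', array_map(fn (string $c): string => '`' . $c . '`', $columns));
}

$schema = ['id', 'event', 'resource', 'userId', 'time', 'data'];
echo buildProjection([['event', 'time'], ['time', 'userId']], $schema, true), PHP_EOL;
// `id`, `tenant`, `event`, `time`, `userId`
```

Putting the forced columns first and de-duplicating afterwards means a caller who also lists `id` explicitly does not get it twice.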
greptile-apps Bot commented May 14, 2026

Greptile Summary

This PR fills out the Query method set on the ClickHouse audit adapter, adding column projection (Query::select), three new filter types (notStartsWith, notEndsWith, regex), and Query::orderRandom. All new SQL is parameterised; identifiers are escaped via escapeIdentifier; columns are validated against the schema whitelist before use.

  • Query::select: buildProjection injects forced columns (id, plus tenant when shared tables are on), deduplicates, and falls back to the full column list when no select is present; eight integration tests cover projection, forced-column injection, unknown-column rejection, and empty-values rejection.
  • notStartsWith / notEndsWith / regex: compiled with NOT startsWith(), NOT endsWith(), and ClickHouse's match() respectively; all patterns are parameter-bound.
  • orderRandom: emits ORDER BY rand(); combining it with a cursor or an explicit column order throws immediately rather than silently discarding the column order.

Confidence Score: 5/5

Safe to merge; all SQL is parameterised, identifier-escaped, and schema-validated. The one edge case (select projection missing a cursor order column) throws a clear exception rather than silently corrupting results.

The change is well-contained: new query types are added to an isolated switch, forced columns are always injected, and incompatible combinations (orderRandom+cursor, orderRandom+orderBy) are explicitly rejected. The single usability gap — a caller omitting a cursor order column from their projection — surfaces as a named exception on the second page call, not as wrong data.

No files require special attention; the select+cursor interaction in ClickHouse.php is the only area worth a second look.

Important Files Changed

src/Audit/Adapter/ClickHouse.php: Adds select projection, notStartsWith/notEndsWith/regex filters, and orderRandom ordering. Logic is sound and SQL is parameterised throughout; one usability gap: select + cursor without the order column in the projection fails on the second page call rather than at build time.
tests/Audit/Adapter/ClickHouseTest.php: Eight new tests cover happy paths, unknown-column rejection, empty-values rejection, and cursor+random incompatibility. The select+cursor interaction (order column omitted from projection) is not covered.

Reviews (3): Last reviewed commit: "Merge branch 'main' into feat-clickhouse..."

lohanidamodar requested a review from abnegate on May 14, 2026 03:30
lohanidamodar and others added 2 commits on May 14, 2026 03:32
Two follow-ups from greptile on c1b85ad:

- buildProjection no longer re-validates user-supplied select columns.
  parseQueries already calls validateAttributeName on each column inside
  the TYPE_SELECT branch, so the second walk through getAttributes() in
  buildProjection was wasted work. The forced columns (id, tenant) still
  get the defensive check since they're injected here, not user input.

- orderRandom combined with orderAsc/orderDesc now throws. Previously
  rand() silently took precedence over the requested column order, which
  is inconsistent with how the cursor + random combination is rejected.
  The new guard mirrors that pattern so callers see the conflict
  explicitly rather than getting unexpected results.
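A minimal sketch of the resulting ORDER BY guard, assuming a hypothetical `buildOrderBy` helper that receives the already-parsed order state. The two rules (reject random + cursor, reject random + explicit column order) come from the PR; the helper, its parameters, and the exception messages are illustrative.

```php
<?php

/**
 * Hypothetical sketch of the ORDER BY guard; only the rules come from the PR.
 *
 * @param list<array{0: string, 1: 'ASC'|'DESC'}> $columnOrders From orderAsc()/orderDesc().
 */
function buildOrderBy(bool $random, array $columnOrders, bool $hasCursor): string
{
    if ($random) {
        if ($hasCursor) {
            // A cursor needs a stable order to anchor the next page on.
            throw new InvalidArgumentException('orderRandom cannot be combined with cursor pagination');
        }
        if ($columnOrders !== []) {
            // Reject instead of letting rand() silently override the column order.
            throw new InvalidArgumentException('orderRandom cannot be combined with orderAsc/orderDesc');
        }
        return 'ORDER BY rand()';
    }

    if ($columnOrders === []) {
        return '';
    }

    // Column names are assumed to be schema-validated before this point.
    $parts = array_map(fn (array $o): string => '`' . $o[0] . '` ' . $o[1], $columnOrders);
    return 'ORDER BY ' . implode(', ', $parts);
}

echo buildOrderBy(false, [['time', 'DESC']], true), PHP_EOL; // ORDER BY `time` DESC

try {
    buildOrderBy(true, [['time', 'DESC']], false);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // orderRandom cannot be combined with orderAsc/orderDesc
}
```

Throwing early mirrors the cursor check: both conflicts surface as an exception at query-build time rather than as rows returned in an unexpected order.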
lohanidamodar requested a review from ChiragAgg5k on May 14, 2026 03:57
lohanidamodar merged commit e7b4049 into main on May 14, 2026
4 checks passed
lohanidamodar deleted the feat-clickhouse-query-projection-and-more branch on May 14, 2026 04:00