Merged
17 commits
18a77b0
feat(schema): ClickHouse data-skipping indexes and engine SETTINGS
lohanidamodar Apr 28, 2026
1f811df
fix(schema): address review feedback on ClickHouse skip indexes
lohanidamodar Apr 30, 2026
f03abae
refactor(schema): collapse skip-index API into Table::index()
lohanidamodar Apr 30, 2026
fb4f783
refactor(schema): rename SkipIndexAlgorithm to IndexAlgorithm
lohanidamodar Apr 30, 2026
f3e411d
fix(schema): scope index-name regex to CH; format settings floats
lohanidamodar Apr 30, 2026
34b2353
Merge branch 'main' into feat-clickhouse-skip-index-and-settings
Copilot Apr 30, 2026
a3fcc2b
fix(schema): restore static return type on Table::index() for fluent …
Copilot Apr 30, 2026
b1df44e
fix(schema): guard ClickHouse index loops against non-skip index types
Copilot Apr 30, 2026
1cdeee3
Update src/Query/Schema/Table.php
abnegate Apr 30, 2026
f045969
Update src/Query/Schema/Table.php
abnegate Apr 30, 2026
df22caf
Update src/Query/Schema/ClickHouse.php
abnegate Apr 30, 2026
d60af4c
docs(readme): use builder style for skip index and SETTINGS examples
Copilot Apr 30, 2026
16ddd87
fix(tests): rewrite skip-index and SETTINGS tests to use builder style
Copilot Apr 30, 2026
a548ffe
fix(schema): expose Column forwarders for skip-index and SETTINGS
abnegate Apr 30, 2026
852542f
Update src/Query/Schema/Table.php
abnegate Apr 30, 2026
bb529d8
fix(schema): make index granularity nullable to preserve user intent
abnegate Apr 30, 2026
bf3a8f9
fix(schema): restore missing brace in settings() string validation
abnegate Apr 30, 2026
44 changes: 44 additions & 0 deletions README.md
@@ -2086,6 +2086,50 @@ $schema->table('events')

TTL expressions are emitted verbatim; they must not be empty or contain semicolons. Dialects other than ClickHouse throw `UnsupportedException`.

**Skip-index algorithms** — every ClickHouse index is a data-skipping index that accelerates WHERE pruning by letting the engine skip whole granules. Pick the algorithm that matches the column shape via the `algorithm` argument on `Table::index()`:

```php
use Utopia\Query\Schema\ClickHouse\IndexAlgorithm;

$schema->table('events')
->bigInteger('id')->primary()
->string('user_id')
->string('country')
->string('text')
// BloomFilter — high-cardinality strings with `=` / `IN` predicates
->index(['user_id'], algorithm: IndexAlgorithm::BloomFilter)
// Set(N) — small fixed value sets, custom granularity
->index(['country'], algorithm: IndexAlgorithm::Set, algorithmArgs: [100], granularity: 4)
// NgramBloomFilter(n, size_bytes, hashes, seed) — text search on `LIKE` / `match`
->index(['text'], algorithm: IndexAlgorithm::NgramBloomFilter, algorithmArgs: [4, 1024, 3, 0])
// No algorithm specified → defaults to `TYPE minmax GRANULARITY 3`
->index(['id'])
->create();

// CREATE TABLE `events` (..., INDEX `idx_user_id` `user_id` TYPE bloom_filter GRANULARITY 1, ...)
```

The six algorithms are `MinMax`, `Set`, `BloomFilter`, `NgramBloomFilter`, `TokenBloomFilter`, and `Inverted`. Algorithm-specific arguments are passed via `algorithmArgs` and rendered directly into the DDL (strings are single-quoted, numbers are emitted as literals), so supply them from a trusted (developer-controlled) source. Other dialects ignore the ClickHouse-only `algorithm` / `algorithmArgs` / `granularity` arguments.
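
As a rough illustration of how `algorithmArgs` end up in the `TYPE` clause, here is a minimal standalone sketch. `renderIndexType()` is a hypothetical helper written for this example, not part of the library's API; it only mirrors the quoting and float-formatting rules visible in this change.

```php
<?php

/**
 * Hypothetical helper (not the library's API): build a skip-index TYPE
 * clause such as `set(100)` from an algorithm name and its arguments.
 *
 * @param list<string|int|float> $args
 */
function renderIndexType(string $algorithm, array $args = []): string
{
    if ($args === []) {
        return $algorithm;
    }

    $rendered = array_map(
        static fn (string|int|float $arg): string => match (true) {
            // Strings are single-quoted, with embedded quotes doubled.
            is_string($arg) => "'" . str_replace("'", "''", $arg) . "'",
            // %F avoids scientific notation; trailing zeros are trimmed.
            is_float($arg) => rtrim(rtrim(sprintf('%F', $arg), '0'), '.'),
            default => (string) $arg,
        },
        $args,
    );

    return $algorithm . '(' . implode(', ', $rendered) . ')';
}

echo renderIndexType('set', [100]), PHP_EOL;
echo renderIndexType('bloom_filter', [0.01]), PHP_EOL;
echo renderIndexType('ngrambf_v1', [4, 1024, 3, 0]), PHP_EOL;
```

Note that `%F` formats with six decimal places, so in this sketch a float smaller than `0.000001` would collapse to `0`.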

`MinMax` and `Inverted` take no parenthesised arguments in ClickHouse DDL — passing `algorithmArgs` for them throws `ValidationException`. Skip indexes can also be added via `ALTER TABLE … ADD INDEX` by calling `alter()` on the builder.
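
To make the shape of the emitted fragment concrete, the sketch below rebuilds it outside the library. `skipIndexFragment()` is a hypothetical helper, and the `ALTER TABLE` prefix is an assumption about the final statement layout; only the `ADD INDEX … TYPE … GRANULARITY …` part and the granularity defaults (3 without an algorithm, 1 with one) are taken from this change.

```php
<?php

/**
 * Hypothetical helper mirroring the fragment layout used by this change:
 * INDEX <name> <column> TYPE <type> GRANULARITY <n>.
 */
function skipIndexFragment(
    string $name,
    string $column,
    ?string $type = null,
    ?int $granularity = null
): string {
    // No algorithm: minmax with granularity 3; otherwise granularity defaults to 1.
    $defaultGranularity = $type === null ? 3 : 1;

    return sprintf(
        'INDEX `%s` `%s` TYPE %s GRANULARITY %d',
        $name,
        $column,
        $type ?? 'minmax',
        $granularity ?? $defaultGranularity
    );
}

// Assumed statement shape: the fragment is prefixed with ADD inside ALTER TABLE.
echo 'ALTER TABLE `events` ADD ' . skipIndexFragment('idx_country', 'country', 'set(100)', 4), PHP_EOL;
```

An explicit `granularity` always wins over either default, which is why the field is nullable rather than defaulting to a number.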

**Engine SETTINGS** — emit `SETTINGS k=v` after the TTL clause:

```php
$schema->table('events')
->bigInteger('id')->primary()
->settings([
'index_granularity' => 8192,
'allow_nullable_key' => true, // booleans become 1/0
])
->create();

// CREATE TABLE `events` (...) ENGINE = MergeTree() ORDER BY (`id`)
// SETTINGS index_granularity = 8192, allow_nullable_key = 1
```

Setting names must match `[A-Za-z_][A-Za-z0-9_]*`; string values are restricted to `[A-Za-z0-9_.\-+/]*`. Use ints / floats / booleans for everything else. Other dialects ignore the call.
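
The validation and rendering rules above can be sketched roughly as follows. `renderSettings()` is a hypothetical standalone function, not the library's implementation; it uses `InvalidArgumentException` as a stand-in for the library's exception, and the single-quoting of string values is an assumption, since this change does not show how strings are emitted.

```php
<?php

/**
 * Hypothetical sketch (not the library's implementation) of the SETTINGS
 * rules: validated names, restricted string values, booleans as 1/0.
 *
 * @param array<string, string|int|float|bool> $settings
 */
function renderSettings(array $settings): string
{
    $pairs = [];
    foreach ($settings as $name => $value) {
        $name = (string) $name;
        if (! preg_match('/^[A-Za-z_][A-Za-z0-9_]*\z/', $name)) {
            throw new InvalidArgumentException('Invalid setting name: ' . $name);
        }
        if (is_string($value) && ! preg_match('#^[A-Za-z0-9_.\-+/]*\z#', $value)) {
            throw new InvalidArgumentException('Invalid setting value: ' . $value);
        }

        $rendered = match (true) {
            is_bool($value) => $value ? '1' : '0',
            // %F avoids scientific notation; trailing zeros are trimmed.
            is_float($value) => rtrim(rtrim(sprintf('%F', $value), '0'), '.'),
            // Assumption: string values are single-quoted in the output.
            is_string($value) => "'" . $value . "'",
            default => (string) $value,
        };
        $pairs[] = $name . ' = ' . $rendered;
    }

    return 'SETTINGS ' . implode(', ', $pairs);
}

echo renderSettings(['index_granularity' => 8192, 'allow_nullable_key' => true]), PHP_EOL;
```

Because the string whitelist excludes quotes, spaces, and semicolons, a validated string cannot break out of the clause in this sketch.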

### SQLite Schema

```php
74 changes: 69 additions & 5 deletions src/Query/Schema/ClickHouse.php
@@ -124,6 +124,15 @@ public function compileAlter(Table $table): Statement
$alterations[] = 'DROP INDEX ' . $this->quote($name);
}

foreach ($table->indexes as $index) {
if ($index->type !== IndexType::Index) {
throw new UnsupportedException(
'Only data-skipping indexes (index()) are supported in ClickHouse ALTER TABLE.'
);
}
$alterations[] = 'ADD ' . $this->compileSkipIndex($index);
}

if (! empty($table->foreignKeys)) {
throw new UnsupportedException('Foreign keys are not supported in ClickHouse.');
}
@@ -132,6 +141,12 @@ public function compileAlter(Table $table): Statement
throw new UnsupportedException('Foreign keys are not supported in ClickHouse.');
}

if (! empty($table->settings)) {
throw new UnsupportedException(
'Table SETTINGS can only be set on CREATE TABLE; emit `ALTER TABLE ... MODIFY SETTING` directly to change them.'
);
}

if (empty($alterations)) {
throw new ValidationException('ALTER TABLE requires at least one alteration.');
}
@@ -165,12 +180,13 @@ public function compileCreate(Table $table, bool $ifNotExists = false): Statemen
$primaryKeys = \array_map(fn (string $c): string => $this->quote($c), $table->compositePrimaryKey);
}

- // Indexes (ClickHouse uses INDEX ... TYPE ... GRANULARITY ...)
foreach ($table->indexes as $index) {
- $cols = \array_map(fn (string $c): string => $this->quote($c), $index->columns);
- $expr = \count($cols) === 1 ? $cols[0] : '(' . \implode(', ', $cols) . ')';
- $columnDefs[] = 'INDEX ' . $this->quote($index->name)
- . ' ' . $expr . ' TYPE minmax GRANULARITY 3';
+ if ($index->type !== IndexType::Index) {
+ throw new UnsupportedException(
+ 'Only data-skipping indexes (index()) are supported in ClickHouse CREATE TABLE.'
+ );
+ }
+ $columnDefs[] = $this->compileSkipIndex($index);
}

if (! empty($table->foreignKeys)) {
@@ -205,9 +221,57 @@ public function compileCreate(Table $table, bool $ifNotExists = false): Statemen
$sql .= ' TTL ' . $table->ttl;
}

if (! empty($table->settings)) {
$kv = [];
foreach ($table->settings as $k => $v) {
$kv[] = $k . ' = ' . $v;
}
$sql .= ' SETTINGS ' . \implode(', ', $kv);
}

return new Statement($sql, [], executor: $this->executor);
}

/**
* Render a full `INDEX <name> <columns> TYPE <algorithm>[(args)] GRANULARITY <n>`
* fragment, used by both CREATE TABLE and ALTER TABLE ADD INDEX.
*
* Defaults to `TYPE minmax GRANULARITY 3` when no algorithm is set on the
* index, preserving the output this compiler previously hard-coded for
* callers using the generic `Table::index()` without picking an algorithm.
*/
private function compileSkipIndex(Index $index): string
{
$cols = \array_map(fn (string $c): string => $this->quote($c), $index->columns);
$expr = \count($cols) === 1 ? $cols[0] : '(' . \implode(', ', $cols) . ')';

if ($index->algorithm === null) {
return 'INDEX ' . $this->quote($index->name) . ' ' . $expr
. ' TYPE minmax GRANULARITY ' . ($index->granularity ?? 3);
}

$type = $index->algorithm->value;

if ($index->algorithmArgs !== []) {
$args = \array_map(
fn (string|int|float $arg): string => match (true) {
\is_string($arg) => "'" . \str_replace("'", "''", $arg) . "'",
// sprintf('%F', ...) avoids scientific notation (e.g. 1.0E-5),
// which ClickHouse rejects in index type arguments. Trailing zeros
// are then trimmed, so 0.01 renders as "0.01" rather than "0.010000".
\is_float($arg) => \rtrim(\rtrim(\sprintf('%F', $arg), '0'), '.'),
default => (string) $arg,
},
$index->algorithmArgs,
);

$type .= '(' . \implode(', ', $args) . ')';
}

return 'INDEX ' . $this->quote($index->name) . ' ' . $expr
. ' TYPE ' . $type . ' GRANULARITY ' . ($index->granularity ?? 1);
}

/**
* Compile an engine declaration: `<Name>` or `<Name>(<args...>)`.
*
13 changes: 13 additions & 0 deletions src/Query/Schema/ClickHouse/IndexAlgorithm.php
@@ -0,0 +1,13 @@
<?php

namespace Utopia\Query\Schema\ClickHouse;

enum IndexAlgorithm: string
{
case MinMax = 'minmax';
case Set = 'set';
case BloomFilter = 'bloom_filter';
case NgramBloomFilter = 'ngrambf_v1';
case TokenBloomFilter = 'tokenbf_v1';
case Inverted = 'inverted';
}
26 changes: 25 additions & 1 deletion src/Query/Schema/Column.php
@@ -5,6 +5,7 @@
use Utopia\Query\Builder\Statement;
use Utopia\Query\Exception\ValidationException;
use Utopia\Query\Schema\ClickHouse\Engine;
use Utopia\Query\Schema\ClickHouse\IndexAlgorithm;

class Column
{
@@ -392,6 +393,7 @@ public function dropColumn(string $name): Table
* @param array<string, int> $lengths
* @param array<string, string> $orders
* @param array<string, string> $collations
* @param list<string|int|float> $algorithmArgs ClickHouse skip-index algorithm args
*/
public function index(
array $columns,
@@ -401,8 +403,22 @@ public function index(
array $lengths = [],
array $orders = [],
array $collations = [],
+ ?IndexAlgorithm $algorithm = null,
+ array $algorithmArgs = [],
+ ?int $granularity = null,
): Table {
- return $this->table->index($columns, $name, $method, $operatorClass, $lengths, $orders, $collations);
+ return $this->table->index(
+ $columns,
+ $name,
+ $method,
+ $operatorClass,
+ $lengths,
+ $orders,
+ $collations,
+ $algorithm,
+ $algorithmArgs,
+ $granularity,
+ );
}

/**
@@ -508,6 +524,14 @@ public function engine(Engine $engine, string ...$args): Table
return $this->table->engine($engine, ...$args);
}

/**
* @param array<string, string|int|float|bool> $settings
*/
public function settings(array $settings): Table
{
return $this->table->settings($settings);
}

/**
* @param list<string> $columns
*/
38 changes: 38 additions & 0 deletions src/Query/Schema/Index.php
@@ -3,6 +3,7 @@
namespace Utopia\Query\Schema;

use Utopia\Query\Exception\ValidationException;
use Utopia\Query\Schema\ClickHouse\IndexAlgorithm;

readonly class Index
{
@@ -12,6 +13,10 @@
* @param array<string, string> $orders
* @param array<string, string> $collations Column-specific collations (column name => collation)
* @param list<string> $rawColumns Raw SQL expressions appended to the column list (bypass quoting)
* @param list<string|int|float> $algorithmArgs ClickHouse skip-index algorithm args
* (e.g. [3] for set(3),
* [0.01] for bloom_filter(0.01),
* [4, 1024, 3, 0] for ngrambf_v1(n, size_bytes, hashes, seed))
*/
public function __construct(
public string $name,
@@ -23,7 +28,19 @@ public function __construct(
public string $operatorClass = '',
public array $collations = [],
public array $rawColumns = [],
public ?IndexAlgorithm $algorithm = null,
public array $algorithmArgs = [],
public ?int $granularity = null,
) {
// Only ClickHouse data-skipping indexes require an unquoted identifier
// for the name; other dialects emit the name backtick-quoted, so
// hyphens, dots, and other characters are valid there.
if ($algorithm !== null && ! \preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $name)) {
throw new ValidationException('Invalid index name: ' . $name);
}
if ($columns === [] && $rawColumns === []) {
throw new ValidationException('Index requires at least one column.');
}
if ($method !== '' && ! \preg_match('/^[A-Za-z0-9_]+$/', $method)) {
throw new ValidationException('Invalid index method: ' . $method);
}
@@ -35,5 +52,26 @@ public function __construct(
throw new ValidationException('Invalid collation: ' . $collation);
}
}
if ($granularity !== null && $granularity < 1) {
throw new ValidationException('Index granularity must be >= 1.');
}
if ($algorithm !== null && $algorithmArgs !== [] && ! self::algorithmAcceptsArgs($algorithm)) {
throw new ValidationException(
$algorithm->value . ' does not accept algorithm arguments.'
);
}
}

/**
* MinMax and Inverted are emitted without parentheses in ClickHouse DDL;
* passing args to them would produce invalid SQL.
*/
private static function algorithmAcceptsArgs(IndexAlgorithm $algorithm): bool
{
return match ($algorithm) {
IndexAlgorithm::MinMax,
IndexAlgorithm::Inverted => false,
default => true,
};
}
}