A Ruby driver for ClickHouse that speaks the native TCP binary protocol via a C++ extension wrapping the official clickhouse-cpp client.
Compared to HTTP-based gems, it's faster (binary blocks and columnar decode instead of HTTP + JSON parsing), and it preserves ClickHouse types end-to-end instead of round-tripping through JSON strings.
Add to your Gemfile:

```ruby
gem "clickhouse-native"
```

Precompiled native gems are published for:

- `x86_64-linux-gnu`, `aarch64-linux-gnu`
- `x86_64-darwin`, `arm64-darwin`

for Ruby ABIs 3.3, 3.4, and 4.0. On those platforms `gem install` drops in a prebuilt `.bundle` / `.so`, with no compiler toolchain required.
On anything else, `gem install` falls back to compiling the vendored clickhouse-cpp from source. You'll need:
- CMake 3.15+
- A C++17 compiler
- Ruby ≥ 3.3
```sh
# macOS
brew install cmake

# Debian/Ubuntu
sudo apt-get install -y cmake build-essential

# Alpine
apk add --no-cache cmake build-base linux-headers
```
require "clickhouse_native"
client = ClickhouseNative::Client.new(
host: "localhost",
port: 9000,
database: "default",
user: "default",
password: "",
)
client.ping # => true
client.server_version # => "25.3.1"
client.query_value("SELECT 1 + 1") # => 2
client.query("SELECT number AS n FROM numbers(3)")
# => [{ n: 0 }, { n: 1 }, { n: 2 }]
client.execute(<<~SQL)
CREATE TABLE events (
id UInt64,
user LowCardinality(String),
tags Array(String),
happened DateTime64(6, 'UTC')
) ENGINE = MergeTree ORDER BY id
SQL
client.insert("events", [
{ id: 1, user: "alice", tags: %w[login web], happened: Time.now.utc },
{ id: 2, user: "bob", tags: %w[signup], happened: Time.now.utc },
])
client.query_each("SELECT * FROM events ORDER BY id") do |row|
puts row.inspect
end
client.closeClickhouseNative::Client.new accepts keyword arguments only:
| kwarg | default | notes |
|---|---|---|
| `host:` | `"localhost"` | |
| `port:` | `9000` | native TCP port (not 8123) |
| `database:` | `"default"` | |
| `user:` | `"default"` | |
| `password:` | `""` | |
| `compression:` | `:none` | `:none`, `:lz4`, or `:zstd` |
| `logger:` | `nil` | any Logger-compatible object (see Logging) |
`Pool.new` additionally accepts:

| kwarg | default | notes |
|---|---|---|
| `pool_size:` | `5` | |
| `pool_timeout:` | `5` | seconds |
| `settings:` | `{}` | session settings applied to every client in the pool (see Session settings) |
`#execute` runs DDL/DML and discards any result. It returns `nil` and releases the GVL for the duration of the server round-trip.
client.execute("CREATE TABLE t (id UInt64) ENGINE = Memory")
client.execute("INSERT INTO t VALUES (1), (2)")Buffers the full result into an Array<Hash> with symbol keys:
client.query("SELECT 1 AS a, 'x' AS b")
# => [{ a: 1, b: "x" }]Returns [] for empty results. Use #query_each for large results or when you need to release the GVL mid-iteration.
`#query_value` returns the first cell of the first row, or `nil` if there are no rows.
client.query_value("SELECT count() FROM events") # => 1337
client.query_value("SELECT 1 WHERE 0") # => nilStreams rows one block at a time, yielding each row hash to the block. Does not materialise the full result in memory. The GVL is released while ClickHouse is delivering bytes; it's reacquired only to run your block.
client.query_each("SELECT number FROM numbers(10_000_000)") do |row|
process(row[:number])
endRaising from inside the block (or throw-ing past it) aborts the query cleanly — the socket is reset and the client stays usable for the next call.
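For example, `throw`-ing past the block is a convenient way to stop streaming early (a sketch; the cutoff of 100 rows is arbitrary):

```ruby
count = 0
catch(:enough) do
  client.query_each("SELECT number FROM numbers(10_000_000)") do |row|
    count += 1
    throw :enough if count >= 100 # unwinds past the block and aborts the query
  end
end
client.query_value("SELECT 1") # => 1 (socket was reset; client still usable)
```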
`#insert` performs a bulk block insert. `rows` can be either:

- `Array<Hash>` — keys are column names (symbols or strings). Defaults `columns:` to `rows.first.keys`.
- `Array<Array>` — positional. Defaults `columns:` to every column in the table's DDL order.
client.insert("events", [
{ id: 1, user: "a", tags: [], happened: Time.now.utc },
{ id: 2, user: "b", tags: %w[x], happened: Time.now.utc },
])
# Subset of columns (hash keys not in the schema would raise ArgumentError):
client.insert(
"events",
[{ id: 1, user: "a" }],
columns: %w[id user],
)
# Positional rows, cross-database:
client.insert("events", [[1, "a", [], Time.now.utc]], db_name: "analytics")Without types:, #insert issues a DESCRIBE TABLE to learn the column types. The result is memoized per (table, db_name) on the client, so repeated inserts into the same table don't re-run the lookup. After ALTER TABLE, call #clear_schema_cache (with no args to drop everything, or table: / db_name: to drop a single entry).
If you already know the types (e.g. you're driving from a schema you control), pass them to skip the round-trip entirely:
```ruby
client.insert(
  "events",
  rows,
  columns: %w[id user tags happened],
  types: %w[UInt64 LowCardinality(String) Array(String) DateTime64(6,'UTC')],
)
```

`#insert` returns the number of rows inserted. An empty `rows` array is a no-op and returns `0` without touching the server.
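For instance (illustrative, reusing the `events` table):

```ruby
client.insert("events", [{ id: 4, user: "dan" }], columns: %w[id user]) # => 1
client.insert("events", [])                                             # => 0, no server round-trip
```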
`#describe_table` is a shortcut for `DESCRIBE TABLE`. It returns the decoded rows (symbol-keyed).
```ruby
client.describe_table("events").map { |c| [c[:name], c[:type]] }
# => [["id", "UInt64"], ["user", "LowCardinality(String)"], ...]
```

```ruby
client.ping           # => true (raises ConnectionError on failure)
client.server_version # => "25.3.1"
```

`#reset_connection` tears down the TCP socket; clickhouse-cpp re-establishes it on the next operation. `#close` releases the underlying `Client` permanently — further calls raise `ClickhouseNative::ConnectionError`.
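A quick sketch of the difference:

```ruby
client.reset_connection        # socket dropped; reconnects lazily
client.query_value("SELECT 1") # => 1 (on a fresh socket)

client.close
client.ping                    # raises ClickhouseNative::ConnectionError
```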
`ClickhouseNative::Pool` wraps `connection_pool` and exposes the same surface as `Client` (minus `close` / `reset_connection`), plus `#host`, `#port`, `#database` attr readers so callers can introspect a pool's endpoint without checking out a connection. Because the extension releases the GVL during I/O, N threads on a pool of size N scale roughly linearly on I/O-bound work.
```ruby
pool = ClickhouseNative::Pool.new(
  host: "localhost", port: 9000,
  pool_size: 8, pool_timeout: 5,
)

pool.query_value("SELECT 1") # checks out a client, runs, checks it back in

pool.with do |client| # for multi-statement work on one connection
  client.execute("CREATE TEMPORARY TABLE t (n UInt64) ENGINE = Memory")
  client.insert("t", [{ n: 1 }, { n: 2 }])
  client.query("SELECT sum(n) FROM t")
end
```

Pass `settings:` to `Pool.new` to apply ClickHouse session settings to every client the pool creates. Each checked-out connection starts from the same session state — equivalent to sending `?key=value` URL params on every HTTP request.
```ruby
pool = ClickhouseNative::Pool.new(
  host: "localhost", port: 9000,
  settings: {
    allow_experimental_analyzer: 1,
    do_not_merge_across_partitions_select_final: 1,
    max_execution_time: 60,
  },
)
```

Integer and Float values render bare (`SET allow_experimental_analyzer = 1`); anything else is quoted as a SQL string literal.
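For example (a sketch; `join_algorithm` stands in for any string-valued setting):

```ruby
ClickhouseNative::Pool.new(
  host: "localhost", port: 9000,
  settings: {
    max_execution_time: 60,  # Integer -> SET max_execution_time = 60
    join_algorithm: "hash",  # String  -> SET join_algorithm = 'hash'
  },
)
```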
| ClickHouse | Ruby |
|---|---|
| `Int8`…`Int64`, `UInt8`…`UInt64` | `Integer` |
| `Float32`, `Float64` | `Float` |
| `Decimal`, `Decimal32/64/128` | `BigDecimal` (scale preserved) |
| `String` | `String` (UTF-8) |
| `FixedString(N)` | `String` (UTF-8, NUL-padded to N) |
| `Date`, `Date32` | `Date` |
| `DateTime` | `Time` (UTC) |
| `DateTime64(P, 'UTC')` | `Time` (UTC, sub-second precision preserved up to µs) |
| `Array(T)` | `Array` |
| `Nullable(T)` | `T` or `nil` |
| `LowCardinality(String)` | `String` |
| `LowCardinality(Nullable(String))` | `String` or `nil` |
| `Map(K, V)` | `Hash` |
| `Tuple(T1, T2, ...)` | `Array` |
| `Enum8`, `Enum16` | `Symbol` (the enum name) |
| `Bool`, `Nullable(Bool)` | `true` / `false` (or `nil`) |
`Dynamic`, `Variant`, typed `JSON`, and other experimental CH 24.x+ types raise `ClickhouseNative::UnsupportedTypeError` on decode.
`Bool` is returned as Ruby `true` / `false` at the top level of a column. clickhouse-cpp normalises `Bool` to `UInt8` internally, so the gem relies on the declared wire type preserved by the in-tree patch under `ext/clickhouse_native/patches/`. Nested occurrences (`Array(Bool)`, `Map(_, Bool)`, `Tuple(..., Bool, ...)`) are decoded as their underlying `UInt8`.
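Concretely (illustrative):

```ruby
client.query_value("SELECT true")          # => true   (top-level Bool)
client.query_value("SELECT [true, false]") # => [1, 0] (nested Bool decodes as UInt8)
```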
On insert, Ruby values are coerced per column type (a worked example follows this list):

- `Integer` / `Float` / `BigDecimal` / `String` are accepted for the matching numeric and string columns.
- `Symbol` is coerced to `String` (useful for `LowCardinality(String)` dictionaries like `:eur`, `:gbp`).
- `true` / `false` coerce to `1` / `0` for `Bool` (stored as `UInt8`).
- `Time`, `DateTime`, numeric epoch seconds, and ISO-8601 strings are all accepted for `DateTime` / `DateTime64`. Naked timestamp strings with no trailing timezone (e.g. `"2026-04-22 10:30:00"`) are interpreted as UTC.
- `Date` / `Time` / `String` / `Integer` epoch are accepted for `Date` / `Date32`; only the calendar day is stored.
- `nil` on a non-`Nullable` column is silently coerced to the column's default (zero / empty string / empty array) — mirrors the HTTP gem's `JSONEachRow` behaviour. For strict semantics, use a `Nullable(T)` column.
- `LowCardinality(String)` and `LowCardinality(Nullable(String))` inserts are supported. Numeric `LowCardinality` dictionaries are not.
- `Map`, arbitrary `Tuple`, and other structural types are not yet supported for `#insert` — they decode fine, but inserting raises `EncoderError`.
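A sketch pulling several of these rules together; the `prices` table here is hypothetical:

```ruby
# Assumes: CREATE TABLE prices (currency LowCardinality(String), active Bool,
#                               at DateTime64(6, 'UTC'), day Date) ENGINE = Memory
client.insert("prices", [{
  currency: :eur,                  # Symbol       -> String
  active:   true,                  # true/false   -> 1/0 (UInt8 under the hood)
  at:       "2026-04-22 10:30:00", # naked string -> interpreted as UTC
  day:      Time.now.utc,          # Time         -> Date (calendar day only)
}])
```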
All errors inherit from `ClickhouseNative::Error`.

| class | raised when |
|---|---|
| `ConnectionError` | socket errors, host unreachable, calling on a closed client |
| `TimeoutError` | a network-level timeout (subclass of `ConnectionError`) |
| `ProtocolError` | malformed server frames |
| `ServerError` | server-side `DB::Exception` — carries `#server_code`, `#server_name`, `#server_stacktrace` |
| `EncoderError` | a Ruby value can't be encoded for a given column type |
| `DecoderError` | malformed data from the server |
| `UnsupportedTypeError` | decoding a CH type we don't yet map (subclass of `DecoderError`) |
After a `ServerError` (or any decoder/encoder error), the client auto-resets its socket before re-raising, so you can keep using it:
```ruby
begin
  client.query("SELECT no_such_function()")
rescue ClickhouseNative::ServerError => e
  e.server_code # => 46
  e.server_name # => "DB::Exception"
end

client.query_value("SELECT 1") # => 1
```

Pass any Logger-compatible object as `logger:` to `Client.new` or `Pool.new`. Every SQL statement logs a Sequel-style line at `:debug` with the elapsed time; errors log at `:error`.
```ruby
client = ClickhouseNative::Client.new(**opts, logger: Rails.logger)
client.query("SELECT 1")
# DEBUG -- : (0.421ms) SELECT 1
```

The extension releases the GVL around every blocking clickhouse-cpp call (`execute`, `query`, `query_value`, `query_each`, `insert_block`, `ping`), so a `Pool` of size N genuinely runs N concurrent ClickHouse queries from N Ruby threads.
`benchmark/threaded.rb` demonstrates this with a `SELECT sleep(0.1)` workload. Example output:

```
serial (16 jobs): 1.612s
parallel ( 2 threads, 16 jobs): 0.815s (1.98x vs serial)
parallel ( 4 threads, 16 jobs): 0.410s (3.93x vs serial)
parallel ( 8 threads, 16 jobs): 0.208s (7.75x vs serial)
parallel (16 threads, 16 jobs): 0.110s (14.64x vs serial)
```
A single `Client` is not thread-safe — always go through a `Pool` for concurrent work.
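A minimal sketch of that pattern, in the spirit of `benchmark/threaded.rb`:

```ruby
pool = ClickhouseNative::Pool.new(host: "localhost", port: 9000, pool_size: 8)

threads = Array.new(8) do
  Thread.new { pool.query_value("SELECT sleep(0.1)") }
end
threads.each(&:join) # ~0.1s wall time: all eight queries run concurrently
```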
```sh
git clone --recursive https://github.com/tycooon/clickhouse-native
cd clickhouse-native
bundle install

docker compose up -d clickhouse
bundle exec rake compile
bundle exec rspec
```
Benchmarks (require the `clickhouse` and `click_house` gems from `Gemfile.release`):

```sh
bundle exec ruby benchmark/compare.rb  # vs HTTP gem
bundle exec ruby benchmark/threaded.rb # GVL release / scaling
```
Apache-2.0. The vendored clickhouse-cpp and its transitive contribs (absl, cityhash, lz4, zstd) ship under their own upstream licenses.
Created by Claude AI with some help from Yuri Smirnov.