
yaroslav/kino


Kino

Kino is a high-performance Ractor web server for Ruby 4.0+.


Ruby threads cannot run Ruby code in parallel, so production setups fork a process per core and pay for each copy in memory. Kino runs your code on every core in one small process. A Rust (tokio + hyper) front-end owns the network, parallel Ractors run your Rack 3 app, and a threaded fallback mode runs everything else, Rails included.

  • Fast. On a real 8-core server, every Kino mode is 1.5-2.1× ahead of a Puma fork cluster on I/O-light endpoints. Ractor mode also wins on pure CPU, by 30%+. Benchmarks below.
  • A fraction of the memory. One process instead of a fork per core: about 15× less memory than the Puma cluster under the same load, and 8× less when serving the Rails hello-world.
  • Parallel without forking. Ractor mode runs CPU work more than 5× faster than Kino's own GVL-bound threaded mode, in the same small process.
  • Production plumbing included. Graceful drain, crash supervision and respawn, bounded queues with 503 backpressure, request timeouts, TLS (rustls), live stats, async access and app logging.
  • Tells you why. kino --check lists exactly what blocks your app from ractor mode, finding by finding, so you do not have to decode Ractor::IsolationError yourself.
  • Puma-shaped. The same workers × threads topology, a familiar config DSL, a kino CLI. If you can run Puma, you can run Kino.

N.B.: Ractors are officially experimental in Ruby 4.0, and so is this server. The threaded mode is solid. Still, Kino aims to be the best way to experiment with Ractors today—and the best Ractor server when they become stable.



Why

The GVL allows only one Ruby thread to run at a time. To use all cores, Ruby servers fork processes, and every fork costs a full copy of the app. Ractors do not have this limit: each one has its own lock, so one process can run Ruby in parallel. What was missing is a server that dispatches requests to them. Ruby 4.0 reworked Ractors (Ractor::Port, shareable_proc, less lock contention) and made this worth building.

Why a Ractor server has to be built this way, and which Rust parts make Ractors fast here: doc/why-kino.md. The full design notes live in doc/architecture.md.

Benchmarks

Measured on a real server: AWS c7a.2xlarge (8-core AMD EPYC 9R14, 16 GB, Amazon Linux 2023). This is a realistic app-server size. The same Ractor-shareable app runs on every server, Ruby 4.0.5 with YJIT, every server at its defaults: Puma forks 8 workers × 3 threads, Kino stays in one process (8 workers; 1 thread each in ractor modes, 3 in threaded). Numbers are requests per second measured with wrk (8-second windows, 64 connections, same host). Methodology and the analysis behind every column: doc/benchmarks.md.

| endpoint (req/s) | Kino :ractor | + lanes | :ractor, workers 32² | Kino :threaded | Puma (cluster) |
|---|---:|---:|---:|---:|---:|
| /plaintext | 229,565 | 244,340 | 156,118 | 217,619 | 118,190 |
| /10k | 179,119 | 188,258 | 134,457 | 157,147 | 105,588 |
| /cpu (fib) | 76,922¹ | 73,136 | 62,406 | 13,499 | 58,337 |
| /io (5 ms) | 1,548 | 1,548 | 5,935 | 4,715 | 4,687 |
| /io_native | 1,570 | 1,571 | 6,289 | 4,717 | 4,695 |

Memory on the same box, RSS after sustained load:

| serving | Kino (one process) | Puma cluster (8 workers) |
|---|---:|---:|
| bench app, :ractor | 80 MB | 1,256 MB |
| bench app, :threaded | 151 MB³ | 1,256 MB |
| Rails hello-world | 97 MB | 797 MB |

"+ lanes" is the experimental per-worker-queue dispatcher (lanes true). It posts the fastest plaintext/10k of any configuration here. Details: doc/benchmarks.md.

¹ Stock settings, no tuning. Ractor mode beats the fork cluster on pure CPU by 32% (25% with lanes). Threaded mode shows the GVL ceiling that every single-process Ruby server hits. The old CPU-tuning recipe (threads 1 plus tokio_threads 1) is retired: threads 1 is now the default, and tokio_threads 1 costs 12% of throughput on real hardware; see doc/benchmarks.md.

² Wait-bound throughput is slots ÷ wait, and the default columns bring 8 single-thread workers against the cluster's 24 threads. Kino's slots are threads, not processes, so when your app waits a lot, raise workers. The workers 32 column is that tuning: +27% over the cluster on /io (+34% via Kino.sleep, the /io_native row) while still ahead of it on pure CPU, all in one small process. The cost is the CPU-light rows (32 ractors oversubscribe 8 cores); pick the topology your app's wait profile needs. See doc/benchmarks.md.

³ With MALLOC_ARENA_MAX=2 (the standard Ruby deployment setting; Heroku's default). Without it, 24 threads churning 10 KB responses through one glibc heap balloon to ~600 MB—an arena-fragmentation footgun, not a leak, and ractor mode sidesteps it. See doc/benchmarks.md.
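Footnote ² boils down to one formula: a slot that spends 5 ms waiting can finish at most 200 requests per second, so the ceiling is slots divided by wait. A quick Ruby check of that arithmetic against the /io (5 ms) row; the slot counts (8, 24, 32) follow from the topologies described above, and the measured figures are copied from the main table.

```ruby
# Upper bound on wait-bound throughput: each slot (a thread that holds one
# request at a time) completes at most 1 / wait requests per second.
def wait_bound_ceiling(slots, wait_seconds)
  (slots / wait_seconds).round
end

WAIT = 0.005 # the /io endpoint waits 5 ms per request

# [slots, measured req/s on /io (5 ms)], copied from the benchmark table
ROWS = {
  "Kino :ractor (8 workers x 1 thread)" => [8, 1_548],
  "Puma cluster (8 workers x 3 threads)" => [24, 4_687],
  "Kino :ractor, workers 32" => [32, 5_935]
}.freeze

ROWS.each do |label, (slots, measured)|
  ceiling = wait_bound_ceiling(slots, WAIT)
  share = 100.0 * measured / ceiling
  puts format("%-38s ceiling %5d req/s, measured %5d (%.1f%%)", label, ceiling, measured, share)
end
```

All three measured rows land at roughly 93-98% of their ceilings, which is why adding slots, not CPU, is what moves the /io numbers.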

A common first idea is to keep your current server and wrap the app in a ractor pool. We measured that too (same box; the analysis is in doc/benchmarks.md):

| endpoint (req/s) | Kino :ractor (8×3) | Puma + ractor wrapper | Falcon + ractor wrapper |
|---|---:|---:|---:|
| /plaintext | 199,032 | 19,532 | 100,342 |
| /cpu (fib) | 68,238 | 17,323 | 48,561 |
| /io (5 ms) | 4,531 | 1,452 | 1,544 |

In short: ractor mode beats fork-level CPU parallelism (5.7× Kino's own GVL-bound threaded mode, +32% over the cluster) in one process, at about 1/16th of the cluster's memory. Every Kino mode is 1.5-2.1× ahead of the cluster on I/O-light endpoints. The macOS numbers (secondary; everything there hits the loopback ceiling) and the YJIT × Ractors gotcha are in doc/benchmarks.md.

Reproduce: bench/run.sh [seconds] [concurrency] for the main table, bench/studies.sh for the follow-ups (CPU recipe, topology, scaling, logging, memory).

Install

You need Ruby >= 4.0. Add Kino to your application's bundle:

bundle add kino      # or: gem install kino (outside a bundle)

or put it in the Gemfile yourself:

gem "kino", "~> 0.1"

Then generate a config and serve:

bundle exec kino --init    # writes kino.rb; every directive documented in place
bundle exec kino           # picks up config.ru + kino.rb, serves on :9292

(After a standalone gem install, the kino command works without bundle exec.)

No Rust compiler needed: released versions ship precompiled native gems for Linux (x86_64/aarch64, glibc and musl) and macOS (arm64). On other platforms the gem compiles at install time; that needs a Rust toolchain, plus clang/libclang on Linux.

Usage

require "kino"

# Ractor mode needs a Ractor-shareable app: capture nothing, freeze config.
app = Ractor.shareable_proc do |env|
  [200, { "content-type" => "text/plain" }, ["Hello from #{Ractor.current}"]]
end

Kino::Server.run(app, port: 9292)   # traps INT/TERM; Ctrl-C drains gracefully

Or embedded, with everything spelled out:

require "etc"                 # for Etc.nprocessors

server = Kino::Server.new(app,
  bind: "127.0.0.1",
  port: 9292,                 # 0 = ephemeral; read back via server.port
  workers: Etc.nprocessors,   # ractors (parallelism)
  threads: 1,                 # per worker; ractor default 1, threaded default 3
  mode: :auto,                # :auto | :ractor | :threaded
  queue_depth: 1024,          # bounded queue; overflow → 503
  queue_timeout: 5.0,         # seconds before 503 on a full queue
  request_timeout: nil,       # seconds before a slow response becomes a 504 (nil = off)
  shutdown_timeout: 30,       # drain deadline
  tls: { cert: "cert.pem", key: "key.pem" },  # file paths or inline PEM
)
server.start
server.shutdown               # graceful: drain → deadline → abort stragglers
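The queue_depth and queue_timeout settings describe admission control: a request waits for room in a bounded queue, and if none opens up before the timeout it gets a 503 instead of piling up. Kino does this in its native layer; the plain-Ruby model below only illustrates the behavior with a SizedQueue (push with timeout: needs Ruby 3.2+). BoundedDispatcher and its methods are hypothetical names, not Kino API.

```ruby
# Hypothetical model of bounded admission with 503 backpressure; not Kino's code.
class BoundedDispatcher
  OVERLOADED = [503, { "content-type" => "text/plain" }, ["Service Unavailable\n"]].freeze

  def initialize(depth:, timeout:)
    @queue = SizedQueue.new(depth) # holds at most `depth` pending requests
    @timeout = timeout             # seconds to wait for room before giving up
  end

  # Enqueue a request for a worker. SizedQueue#push returns nil when no room
  # frees up within the timeout; that becomes an immediate 503.
  def submit(request)
    @queue.push(request, timeout: @timeout) ? :accepted : OVERLOADED
  end

  # What a worker would call to take the next request (blocks while empty).
  def next_request
    @queue.pop
  end
end

dispatcher = BoundedDispatcher.new(depth: 2, timeout: 0.05)
results = 3.times.map { |i| dispatcher.submit("request #{i}") }
puts results.map { |r| r == :accepted ? "accepted" : "503" }.join(", ")
```

With nothing draining the queue, the first two requests fit and the third waits 50 ms before being turned away.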

Modes

  • :ractor: workers Ractors, each running threads Threads. The app must be Ractor.shareable? (frozen middleware, shareable_proc endpoints). Forcing :ractor with an unshareable app raises Kino::UnshareableAppError. If a ractor crashes, its in-flight requests get a 500 right away and the ractor is respawned.
  • :threaded: the same machinery on workers × threads plain Threads. Runs any Rack app, including Rails, today. Parallel for I/O, serialized by the GVL for CPU.
  • :auto (default): :ractor when the app is shareable, otherwise a warning and :threaded. One caveat: a class used as a Rack app always counts as "shareable" (classes are), even if calling it touches unshareable state. Force :threaded for those.
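The :auto caveat is easy to reproduce in plain Ruby, without Kino. In this sketch HelloApp is a made-up class-style app whose class-level instance variable holds a mutable Hash: Ractor.shareable? says yes, yet the first call from a non-main ractor fails. It uses Ractor#value on Ruby 4.0 and falls back to Ractor#take on older Rubies.

```ruby
# A made-up class-style Rack app: the class itself responds to call.
class HelloApp
  @cache = {} # class-level instance variable holding a mutable, unshareable Hash

  def self.call(_env)
    @cache[:hits] = (@cache[:hits] || 0) + 1
    [200, { "content-type" => "text/plain" }, ["hits: #{@cache[:hits]}"]]
  end
end

# Classes are always shareable, so this check passes...
shareable = Ractor.shareable?(HelloApp)

# ...but the first call from a non-main ractor fails while reading @cache.
worker = Ractor.new { HelloApp.call({}) }
error_class =
  begin
    worker.respond_to?(:value) ? worker.value : worker.take
    nil
  rescue Ractor::RemoteError => e
    e.cause.class
  end

puts "Ractor.shareable?(HelloApp): #{shareable}"
puts "error in the worker ractor: #{error_class.inspect}"
```

This is the same trap kino --check reports as a class-level ivar finding.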

Config file and CLI

Settings can live in a Puma-style Ruby DSL file. Precedence: explicit kwargs and CLI flags > config file > defaults.

# kino.rb
port 9292
workers 8
threads 1
mode :ractor

On the command line:

kino --init                   # write a fully commented sample kino.rb
kino                          # config.ru + kino.rb, port 9292
kino --check                  # explain whether the app can run in :ractor mode
kino -C config/kino.rb -p 3000 -w 4 -m ractor my_app.ru
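The precedence rule can be pictured as three hash merges, later ones winning. The sketch below is hypothetical and not Kino's loader: ConfigDSL, resolve and the default values are made up for illustration, and the overrides play the role of -p 3000 -w 4 from the CLI line above.

```ruby
require "etc"

# Hypothetical, not Kino's loader: each directive in the file becomes a method
# call that records one setting.
class ConfigDSL
  DIRECTIVES = %i[port workers threads mode].freeze

  attr_reader :settings

  def initialize
    @settings = {}
  end

  DIRECTIVES.each do |name|
    define_method(name) { |value| @settings[name] = value }
  end

  def self.evaluate(source)
    dsl = new
    dsl.instance_eval(source) # run the file's text with `dsl` as self
    dsl.settings
  end
end

# Illustrative defaults only; the generated kino.rb documents the real ones.
DEFAULTS = { port: 9292, workers: Etc.nprocessors, threads: 1, mode: :auto }.freeze

# Later merges win: defaults < config file < kwargs/CLI flags.
def resolve(file_source, overrides)
  DEFAULTS.merge(ConfigDSL.evaluate(file_source)).merge(overrides)
end

kino_rb = <<~RUBY
  port 9292
  workers 8
  threads 1
  mode :ractor
RUBY

config = resolve(kino_rb, { port: 3000, workers: 4 }) # like `-p 3000 -w 4`
p config
```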

The generated sample documents every directive, including the Rails settings and the performance notes.

kino --check

When an app cannot run in :ractor mode, Kino can tell you why, instead of leaving you with a bare Ractor::IsolationError. The check changes nothing (it does not freeze your objects) and names each blocker: captured variables with the place they were defined, instance variables by path, and the class-level instance variable trap that catches class-style apps:

$ kino --check
check: app is NOT Ractor-shareable
  - app (Proc at app.rb:12)—captures `cache` = {} (Hash) (unshareable)
  - app (HelloApp).@instance—class-level ivar holds #<HelloApp…>—classes
    pass Ractor.shareable?, but reading this from a worker ractor raises
    Ractor::IsolationError on the first request
  hints: freeze config at boot; build endpoints with Ractor.shareable_proc;
  keep per-worker resources in Ractor.store_if_absent; or run mode :threaded.

Exit status is 0/1, so it works in CI. The programmatic form is Kino::Check.report(app).
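Two of the hints, freezing config at boot and keeping per-worker state in Ractor.store_if_absent, look like this in plain Ruby (store_if_absent needs Ruby 3.4 or later). CONFIG, HitCounterApp and the :hit_counter key are illustrative names; each ractor gets its own counter, so nothing mutable is shared.

```ruby
# Deep-frozen at boot, so worker ractors may read it.
CONFIG = Ractor.make_shareable({ greeting: "hello", tags: ["a", "b"] })

# A module-style app: no instance variables, only shareable constants.
module HitCounterApp
  def self.call(_env)
    # Per-ractor mutable state, created on first use in each ractor.
    counter = Ractor.store_if_absent(:hit_counter) { { hits: 0 } }
    counter[:hits] += 1
    [200, { "content-type" => "text/plain" }, ["#{CONFIG[:greeting]} ##{counter[:hits]}"]]
  end
end

# Serve two requests in a fresh worker ractor and return the response bodies.
def run_in_ractor
  worker = Ractor.new { 2.times.map { HitCounterApp.call({})[2].first } }
  worker.respond_to?(:value) ? worker.value : worker.take
end

first = run_in_ractor
second = run_in_ractor
p Ractor.shareable?(CONFIG), first, second
```

Both ractors count from 1, which shows the counter is per worker rather than shared.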

Request timeouts

request_timeout: seconds (or request_timeout 30 in kino.rb) limits how long the app may take to produce a response. Past the deadline the client gets an immediate 504 while the handler keeps running; its late response is dropped without harm. Off by default. The handler is deliberately not killed, because interrupting arbitrary Ruby mid-flight is unsafe. A stuck handler still occupies its worker slot until it returns, so set the deadline above your slowest legitimate endpoint and watch stats[:timeouts].
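The timeout behavior can be modeled in a few lines of Ruby: wait for the handler's result with a deadline, answer 504 when the deadline passes, and let the handler finish on its own. This is an illustrative sketch with a hypothetical call_with_deadline helper, not how Kino's native layer implements it; Queue#pop(timeout:) needs Ruby 3.2+.

```ruby
GATEWAY_TIMEOUT = [504, { "content-type" => "text/plain" }, ["Gateway Timeout\n"]].freeze

# Hypothetical helper: run the handler on its own thread and wait at most
# `timeout` seconds. The handler is never killed; if it finishes late, its
# response lands in a queue nobody reads and is simply dropped.
def call_with_deadline(app, env, timeout)
  result = Queue.new
  Thread.new { result << app.call(env) }
  result.pop(timeout: timeout) || GATEWAY_TIMEOUT
end

fast = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok\n"]] }
slow = lambda do |_env|
  sleep 0.2
  [200, { "content-type" => "text/plain" }, ["late\n"]]
end

puts "fast handler: #{call_with_deadline(fast, {}, 0.5).first}"
puts "slow handler: #{call_with_deadline(slow, {}, 0.05).first}"
```

The slow handler's thread keeps running after the 504 is returned, which mirrors the slot-occupancy caveat above.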

Stats

server.stats returns a live snapshot: the configuration plus counters from the native layer (one relaxed atomic per request, no measurable cost):

server.stats
# => {mode: :ractor, lanes: false, workers: 8, threads: 1, batch: 1,
#     respawns: 0, queued: 0, in_flight: 2, served: 1041, rejected: 0,
#     timeouts: 0}
# plus lane_depths: [...] when lane dispatch is on

From the outside, kill -USR1 <pid> prints the same snapshot as one line (pair it with pidfile to find the pid):

Kino stats: mode=:ractor lanes=false workers=8 threads=1 batch=1 respawns=0 queued=0 in_flight=2 served=1041 rejected=0 timeouts=0
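Assuming served, rejected, timeouts and respawns only ever grow while the server runs, the useful signal is their change between two snapshots. A hypothetical helper for that, fed with hashes shaped like the server.stats output above; the second snapshot's numbers are invented for the example.

```ruby
COUNTERS = %i[served rejected timeouts respawns].freeze

# Hypothetical helper: change in each counter between two snapshots, assuming
# the counters are cumulative.
def stats_delta(before, after)
  COUNTERS.to_h { |key| [key, after.fetch(key) - before.fetch(key)] }
end

# Counters other than served that moved in the interval and deserve a look.
def warning_counters(delta)
  (COUNTERS - [:served]).select { |key| delta[key].positive? }
end

# `before` copies the snapshot above; `after` is made up.
before = { mode: :ractor, served: 1041, rejected: 0, timeouts: 0, respawns: 0 }
after  = { mode: :ractor, served: 5230, rejected: 12, timeouts: 0, respawns: 0 }

delta = stats_delta(before, after)
p delta
p warning_counters(delta)
```

Against a live server, the two snapshots would come from calling server.stats some interval apart.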

Logging

With one log line per request, Kino::Logger sustained 2.4× the throughput of a shared ::Logger (149k vs 63k req/s on the benchmark box). There are two native pieces. Both write through a lock-free channel to a Rust flusher thread, so request threads never take a log mutex and never make a write syscall:

  • Access log (log_requests true): one line per request to stdout, including the 503s that never reach your app. Recommended in development; cheap enough for production. On color terminals the lines are tinted by status class: 2xx green, 3xx yellow, 4xx maroon, 5xx bright red:

    127.0.0.1 [Tue, 10 Jun 2026 13:39:56 GMT] "GET / HTTP/1.1" 200 0.1ms
    
  • Kino::Logger: a ::Logger over the same async sink, for your app's own logging (Kino::Logger.new("log/production.log"), or no argument for stdout). The raw IO-like device is Kino::Logger::Device, for integrations that want bytes without ::Logger formatting. The device is frozen and Ractor-shareable, so one device serves every worker.
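Both pieces share one shape: request threads hand a line to a buffer and return, a single background thread does the writes, and a full buffer drops lines instead of blocking. Below is a plain-Ruby model of that shape with a hypothetical AsyncSink class. Ruby's SizedQueue takes an internal lock, so unlike Kino's Rust channel this is not lock-free; it only shows the flow.

```ruby
require "stringio"

# Hypothetical model of an async log sink; not Kino's implementation.
class AsyncSink
  attr_reader :dropped

  def initialize(io, capacity: 10_000)
    @io = io
    @queue = SizedQueue.new(capacity) # bounded buffer between callers and the writer
    @dropped = 0
    @dropped_lock = Mutex.new         # only taken on the drop path
    @flusher = Thread.new do
      # pop returns nil once the queue is closed and empty, ending the loop.
      while (line = @queue.pop)
        @io.write(line)
      end
    end
  end

  # Called from request threads: enqueue and return, never wait for the writer.
  def write(line)
    @queue.push(line, true) # non-blocking push raises ThreadError when full
    line.bytesize
  rescue ThreadError
    @dropped_lock.synchronize { @dropped += 1 } # buffer full: drop, don't block
    0
  end

  # Graceful shutdown: stop accepting lines, then let the flusher drain the rest.
  def close
    @queue.close
    @flusher.join
  end
end

out = StringIO.new
sink = AsyncSink.new(out, capacity: 100)
3.times { |i| sink.write("line #{i}\n") }
sink.close
print out.string
```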

Kino::Logger in a Rails app: it is a real ::Logger subclass, so it fits anywhere Rails expects a logger:

# config/environments/production.rb, simplest forms:
config.logger = Kino::Logger.new                          # stdout
config.logger = Kino::Logger.new("log/production.log")    # file
# both file and stdout:
config.logger = ActiveSupport::BroadcastLogger.new(
  Kino::Logger.new("log/production.log"), Kino::Logger.new
)
# tagged logging wraps it like any ::Logger:
config.logger = ActiveSupport::TaggedLogging.new(Kino::Logger.new)

From a plain Rack app, give middleware the logger, or hand Rack::CommonLogger the raw device (it just calls write):

# config.ru
use Rack::CommonLogger, Kino::Logger::Device.new   # access-style app log
run MyApp

(If you only want request lines, prefer Kino's own log_requests true. It is free for your Ruby threads, and it also sees the 503s that never reach Rack.)

Graceful shutdown drains both logs fully. A hard crash can lose the tail of the buffer, and when you log faster than the disk can take (over 100k lines/s), the sink drops lines instead of blocking request threads. These trade-offs are measured in doc/benchmarks.md.

Timer waits

Kino.sleep(seconds) is a high-resolution sleep on the OS clock with the GVL released. MRI's own sleep wakes up late inside non-main ractors (details and numbers in doc/benchmarks.md). Use Kino.sleep for explicit timer waits in handlers. Ordinary blocking I/O does not need it.
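To see the late-wakeup effect on your own machine, a small probe can time plain sleep in the main ractor and in a worker ractor with the monotonic clock. This is only a measuring sketch (mean_oversleep_ms is a made-up helper); it does not call Kino.sleep, and the numbers depend on your OS and Ruby build.

```ruby
# Average amount, in milliseconds, by which sleep(seconds) overshoots.
def mean_oversleep_ms(seconds, samples)
  total = samples.times.sum {
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep(seconds)
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started - seconds
  }
  total * 1000.0 / samples
end

main_ms = mean_oversleep_ms(0.005, 20)

worker = Ractor.new(0.005, 20) { |seconds, samples| mean_oversleep_ms(seconds, samples) }
worker_ms = worker.respond_to?(:value) ? worker.value : worker.take

puts format("main ractor:   +%.3f ms per 5 ms sleep", main_ms)
puts format("worker ractor: +%.3f ms per 5 ms sleep", worker_ms)
```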

Rack 3 compliance

The spec suite runs every test app under Rack::Lint over real sockets: streaming request bodies (forward-only rack.input), enumerable and callable (full-duplex stream) response bodies, lowercase and multi-value headers, HEAD/204 semantics. Full hijack is left out on purpose; it is optional in Rack 3.
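For reference, the two Rack 3 response body styles named above look like this in a bare Rack app: an array (enumerable) body and a callable body that receives a stream. The app and its paths are illustrative, and the StringIO only stands in for the stream a server would pass.

```ruby
require "stringio"

APP = lambda do |env|
  headers = { "content-type" => "text/plain" } # Rack 3: lowercase header names
  if env["PATH_INFO"] == "/stream"
    # Callable (streaming) body: the server calls it with a stream object.
    body = proc do |stream|
      3.times { |i| stream.write("chunk #{i}\n") }
    ensure
      stream.close
    end
    [200, headers, body]
  else
    # Enumerable body: anything that responds to each and yields strings.
    [200, headers, ["hello\n"]]
  end
end

# Stand-in for the server: a StringIO plays the stream.
stream = StringIO.new
_status, _headers, body = APP.call({ "PATH_INFO" => "/stream" })
body.call(stream)
puts stream.string
```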

Rails

Rails (edge) runs on Kino today in :threaded mode; see examples/rails-hello. Ractor-mode Rails is blocked upstream. The exact blockers, the Ruby::Box findings, and what would unlock it are written up in doc/rails-on-ractors.md. The example ships a probe script that re-tests against whatever Rails you bundle.

Development

bin/setup
bundle exec rake                       # compile, Rust tests, specs, RBS, lint
RB_SYS_CARGO_PROFILE=dev bundle exec rake compile   # fast dev rebuilds

Assisted by

Claude Code (Mythos, Opus).

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/yaroslav/kino.

License

The gem is available as open source under the terms of the MIT License.
