`fibrio` is a small Ruby gem for reading large JSON array, NDJSON, and CSV inputs one record at a time. It keeps `Fiber` usage inside `Fibrio::Stream`, so callers can use the normal `Enumerable` API.
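As a rough illustration of hiding a `Fiber` behind an `Enumerable` interface, here is a self-contained sketch that streams the elements of a top-level JSON array from any IO while buffering only the current element's text. It is not Fibrio's implementation. `JsonArraySketch` is a hypothetical class, it assumes the input is a single top-level array, and it hands each element's text to the standard `json` library.

```ruby
require "json"
require "stringio"

# Hypothetical sketch, NOT Fibrio's code. Assumes the input is one top-level
# JSON array. A Fiber scans the IO; callers only ever see #each / Enumerable.
class JsonArraySketch
  include Enumerable

  def initialize(io)
    @io = io
  end

  def each
    return enum_for(:each) unless block_given?

    # :done is a safe sentinel because JSON.parse never returns a Symbol.
    producer = Fiber.new { scan_elements; :done }
    loop do
      element = producer.resume
      break if element == :done
      yield element
    end
    self
  end

  private

  # Reads one character at a time, tracking nesting depth and string state,
  # so only the text of the element currently being read is buffered.
  def scan_elements
    depth = 0
    in_string = false
    escaped = false
    buffer = +""

    @io.each_char do |char|
      if in_string
        buffer << char
        if escaped
          escaped = false
        elsif char == "\\"
          escaped = true
        elsif char == '"'
          in_string = false
        end
        next
      end

      case char
      when "[", "{"
        depth += 1
        buffer << char unless depth == 1 # skip the outer "["
      when "]", "}"
        if depth == 1 # closing the outer array: emit the last element, if any
          Fiber.yield(JSON.parse(buffer)) unless buffer.strip.empty?
          buffer = +""
        else
          buffer << char
        end
        depth -= 1
      when ","
        if depth == 1 # a top-level comma ends the current element
          Fiber.yield(JSON.parse(buffer))
          buffer = +""
        else
          buffer << char
        end
      when '"'
        in_string = true
        buffer << char
      else
        buffer << char if depth >= 1
      end
    end
  end
end

io = StringIO.new(%([{"id": 1}, {"id": 2, "tags": ["a,b", "]"]}]))
JsonArraySketch.new(io).each { |record| p record["id"] }
```

Because `each` returns an `Enumerator` without a block, `JsonArraySketch.new(io).each.lazy.first(1)` stops scanning after the first element. Reading character by character keeps the sketch short but is slow; a real reader would scan larger chunks.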
Add the gem to your Gemfile:

```ruby
gem "fibrio"
```

Then require it:
```ruby
require "fibrio"

Fibrio.open("data.json", format: :json) do |stream|
  stream.each do |record|
    process(record) # process is your own record handler
  end
end
```

`each` returns an `Enumerator` when no block is given, so lazy chains work as expected:
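```ruby
stream = Fibrio.open("data.json", format: :json)
begin
  top10 = stream.each.lazy.select { |record| record["active"] }.first(10)
ensure
  stream.close # release the input even if iteration raises
end
```

For CSV input without a header row, pass `headers: false` to get each row as an array: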
```ruby
Fibrio.open("data.csv", format: :csv, headers: false) do |stream|
  stream.each { |row| p row }
end
```

String input is accepted as data when it is not an existing file path:
```ruby
Fibrio.open("[1,2,3]", format: :json) do |stream|
  stream.each { |number| p number }
end
```

When the top-level JSON value is an object, pass `path:` to stream an array nested at a known key path:
```ruby
json = '{"payload":{"records":[{"id":1},{"id":2}]}}'

Fibrio.open(json, format: :json, path: %w[payload records]) do |stream|
  stream.each { |record| p record["id"] }
end
```

NDJSON uses one JSON value per non-empty line:
```ruby
Fibrio.open(%({"id":1}\n{"id":2}\n), format: :ndjson) do |stream|
  stream.each { |record| p record["id"] }
end
```

Format details:

- JSON: top-level arrays, or arrays inside a top-level object selected with `path:`.
- NDJSON: blank lines are skipped. Each non-empty line is parsed with Ruby's standard `json` library.
- CSV: `headers: true` (the default) yields hashes; `headers: false` yields arrays. Quoted newlines are supported.
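For comparison, the NDJSON and CSV rules above can be approximated with Ruby's standard `json` and `csv` libraries alone. This is a stdlib sketch under that assumption, not Fibrio's code; `NdjsonSketch.each_record` is a hypothetical name.

```ruby
require "csv"
require "json"
require "stringio"

# Hypothetical helper, not part of Fibrio: yields one parsed value per
# non-empty NDJSON line, reading the IO line by line.
module NdjsonSketch
  def self.each_record(io)
    return enum_for(:each_record, io) unless block_given?

    io.each_line do |line|
      next if line.strip.empty? # blank lines are skipped
      yield JSON.parse(line)
    end
  end
end

ids = NdjsonSketch.each_record(StringIO.new(%({"id":1}\n\n{"id":2}\n))).map { |r| r["id"] }

# A quoted field that contains a newline is still a single CSV field.
csv_text = %(name,note\nada,"line one\nline two"\n)
hashes = CSV.new(csv_text, headers: true).map(&:to_h) # rows as hashes keyed by header
arrays = CSV.new(csv_text, headers: false).to_a       # rows as arrays; the first line is plain data here
```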
From a source checkout, run the benchmark with:
```sh
ruby benchmark/memory.rb 250000
```

The benchmark generates temporary files, reads each one in a child process, and polls that process's peak RSS from the parent. The Fibrio runs iterate through records without retaining them; the eager runs keep the whole parsed collection in memory. Peak RSS includes the Ruby VM baseline, so absolute numbers vary by Ruby version and platform.
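Parent-side RSS polling of this kind can be sketched as below. This is an assumption about one way to do it, not a copy of `benchmark/memory.rb`: the helper names `rss_kib` and `peak_rss_kib` are hypothetical, and the sketch reads `VmRSS` from `/proc/<pid>/status` on Linux and falls back to `ps -o rss=` elsewhere.

```ruby
require "rbconfig"

# Hypothetical helper: current resident set size of +pid+ in KiB (0 if unknown).
def rss_kib(pid)
  status_path = "/proc/#{pid}/status"
  if File.readable?(status_path)
    # Linux: the line looks like "VmRSS:   12345 kB"; a zombie has no such line.
    line = File.foreach(status_path).find { |l| l.start_with?("VmRSS:") }
    line ? line.split[1].to_i : 0
  else
    # macOS and other Unix systems: ps reports RSS in KiB.
    `ps -o rss= -p #{pid}`.strip.to_i
  end
end

# Hypothetical helper: runs +cmd+ in a child process and samples its RSS from
# the parent every +interval+ seconds until the child exits.
def peak_rss_kib(*cmd, interval: 0.01)
  pid = Process.spawn(*cmd)
  peak = 0
  loop do
    peak = [peak, rss_kib(pid)].max
    break if Process.waitpid(pid, Process::WNOHANG) # non-nil once the child has exited
    sleep interval
  end
  peak
end

peak = peak_rss_kib(RbConfig.ruby, "-e", "s = 'x' * 50_000_000; sleep 0.3")
puts "peak RSS: #{peak} KiB"
```

Sampling can miss a spike shorter than the polling interval, so a polled peak is a lower bound on the true maximum.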
Example results on Ruby 4.0.0 (arm64-darwin24) with 250,000 records:
| Format | Reader | Input (MiB) | Records | Time (s) | Peak RSS (MiB) |
|---|---|---:|---:|---:|---:|
| JSON | Fibrio | 20.07 | 250,000 | 14.710 | 39.4 |
| JSON | `JSON.parse(File.read)` | 20.07 | 250,000 | 0.069 | 105.4 |
| NDJSON | Fibrio | 20.07 | 250,000 | 0.220 | 25.6 |
| NDJSON | `File.readlines` + `JSON.parse` | 20.07 | 250,000 | 0.182 | 127.6 |
| CSV | Fibrio | 9.10 | 250,000 | 2.640 | 33.3 |
| CSV | `CSV.read(headers: true)` | 9.10 | 250,000 | 0.826 | 192.8 |
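The tradeoff is intentional: Fibrio prioritizes bounded memory use for large inputs over raw speed. In the run above it is slower than every eager reader, most markedly for JSON (14.710 s versus 0.069 s), while its peak RSS stays well below theirs.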
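To put the tradeoff in numbers, this snippet derives per-format ratios from the example results table: how many times longer the Fibrio run took than the eager reader, and how many times lower its peak RSS was. `RESULTS` and `RATIOS` are illustrative names, and the figures come from that single example run, so ratios on other machines will differ.

```ruby
# [seconds, peak RSS MiB] copied from the example results table.
RESULTS = {
  "JSON"   => { fibrio: [14.710, 39.4], eager: [0.069, 105.4] },
  "NDJSON" => { fibrio: [0.220, 25.6],  eager: [0.182, 127.6] },
  "CSV"    => { fibrio: [2.640, 33.3],  eager: [0.826, 192.8] }
}.freeze

# slowdown: Fibrio time / eager time; memory_saving: eager RSS / Fibrio RSS.
RATIOS = RESULTS.to_h do |name, r|
  slowdown = (r[:fibrio][0] / r[:eager][0]).round(1)
  memory_saving = (r[:eager][1] / r[:fibrio][1]).round(1)
  [name, { slowdown: slowdown, memory_saving: memory_saving }]
end

RATIOS.each do |name, r|
  puts format("%-6s %6.1fx slower, %4.1fx less peak RSS", name, r[:slowdown], r[:memory_saving])
end
```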
Limitations:

- Each individual record must fit in memory.