ydah/fibrio

Fibrio

fibrio is a small Ruby gem for reading large JSON array, NDJSON, and CSV inputs one record at a time. It keeps Fiber usage inside Fibrio::Stream, so callers can use the normal Enumerable API.
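The idea of hiding a Fiber behind Enumerable can be sketched with the standard library alone. The class below is a hypothetical illustration (SketchStream and its internals are invented names, not Fibrio's actual implementation): a producer Fiber hands over one record per resume, and each drives it, so callers only ever see ordinary Enumerable methods.

```ruby
# Hypothetical sketch: SketchStream is NOT part of Fibrio. It only shows
# how a Fiber-backed producer can sit behind the Enumerable API.
class SketchStream
  include Enumerable

  def initialize(source)
    @source = source # any object that responds to #each
  end

  def each
    # Without a block, hand back an Enumerator so lazy chains work.
    return enum_for(:each) unless block_given?

    # The producer suspends after every record, so only one record is
    # handed across at a time.
    producer = Fiber.new do
      @source.each { |record| Fiber.yield(record) }
      nil
    end

    while producer.alive?
      record = producer.resume
      # The final resume finishes the Fiber and returns nil; skip it.
      yield record if producer.alive?
    end
    self
  end
end

upcased = SketchStream.new(%w[a b c]).map(&:upcase)
```

Because each is the only method that touches the Fiber, everything Enumerable offers (map, select, lazy, first) comes for free.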

Installation

Add the gem to your Gemfile:

gem "fibrio"

Then require it:

require "fibrio"

Usage

Fibrio.open("data.json", format: :json) do |stream|
  stream.each do |record|
    process(record)
  end
end

each returns an Enumerator when no block is given, so lazy chains work as expected:

stream = Fibrio.open("data.json", format: :json)
top10 = stream.each.lazy.select { |record| record["active"] }.first(10)
stream.close
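Why the lazy chain matters can be shown without Fibrio at all. The sketch below uses a plain Enumerator as a stand-in for a stream (the records and the pulled counter are made up for illustration) and counts how many records are actually produced before first stops the chain.

```ruby
# Stand-in for a record stream: a plain Enumerator, not a Fibrio object.
pulled = 0
records = Enumerator.new do |yielder|
  1.upto(1_000) do |i|
    pulled += 1 # count every record the source actually produces
    yielder << { "id" => i, "active" => i.even? }
  end
end

# lazy defers the select until first asks for results, and first stops
# the whole chain once it has collected three matches.
top3 = records.lazy.select { |record| record["active"] }.first(3)

puts top3.map { |record| record["id"] }.inspect
puts "records pulled: #{pulled}"
```

Dropping lazy from the chain would make select walk all 1,000 records before first ever runs, which is exactly the cost a record-at-a-time reader is meant to avoid.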

CSV with no header row returns arrays:

Fibrio.open("data.csv", format: :csv, headers: false) do |stream|
  stream.each { |row| p row }
end

A string that does not name an existing file is treated as the data itself:

Fibrio.open("[1,2,3]", format: :json) do |stream|
  stream.each { |number| p number }
end
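One way such a path-or-data decision could be made is sketched below. The helper open_input is a hypothetical name, and the File.file? check is an assumption about the rule, not a description of Fibrio's internals.

```ruby
require "stringio"

# Hypothetical helper, not Fibrio's API: return an IO-like object for
# either a file path or an inline data string.
def open_input(source)
  if File.file?(source)
    File.open(source, "r")  # existing path: read from disk
  else
    StringIO.new(source)    # anything else: treat the string as data
  end
end

inline = open_input("[1,2,3]")
puts inline.read
```

A consequence of any rule like this is that a data string which happens to match a file name on disk would be read as a path, so inline data is best reserved for tests and small inputs.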

Top-level JSON objects can stream an array nested at a known path:

Fibrio.open('{"payload":{"records":[{"id":1},{"id":2}]}}', format: :json, path: %w[payload records]) do |stream|
  stream.each { |record| p record["id"] }
end
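To make the meaning of path: concrete, the snippet below selects the same array eagerly with Ruby's standard json library and Hash#dig. It loads the whole document, which is what streaming avoids, so it only illustrates which array the path points at.

```ruby
require "json"

doc = '{"payload":{"records":[{"id":1},{"id":2}]}}'

# Each path element is a key to descend into, from the top-level object
# down to the array whose elements become the records.
records = JSON.parse(doc).dig("payload", "records")
ids = records.map { |record| record["id"] }

puts ids.inspect
```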

NDJSON uses one JSON value per non-empty line:

Fibrio.open(%({"id":1}\n{"id":2}\n), format: :ndjson) do |stream|
  stream.each { |record| p record["id"] }
end
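The line-by-line rule can be sketched with the standard library as follows. The method each_ndjson_record is a hypothetical name used for illustration; it shows the technique (skip blank lines, parse each remaining line on its own) and not Fibrio's source.

```ruby
require "json"
require "stringio"

# Hypothetical sketch of NDJSON reading: one JSON value per non-empty line.
def each_ndjson_record(io)
  return enum_for(:each_ndjson_record, io) unless block_given?

  io.each_line do |line|
    next if line.strip.empty? # blank lines carry no record
    yield JSON.parse(line)    # each line is a complete JSON value
  end
end

input = StringIO.new(%({"id":1}\n\n{"id":2}\n))
ids = each_ndjson_record(input).map { |record| record["id"] }
puts ids.inspect
```

Since every line is parsed independently, memory use is bounded by the longest line instead of the whole input.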

Supported Formats

  • JSON: top-level arrays, or object-contained arrays selected with path:.
  • NDJSON: blank lines are skipped. Each non-empty line is parsed with Ruby's standard json library.
  • CSV: headers: true (the default) yields hashes; headers: false yields arrays. Quoted newlines are supported.
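The two CSV shapes, and what a quoted newline looks like, can be illustrated with Ruby's csv library. This is a sketch under the assumption that Fibrio's records resemble what csv produces; whether Fibrio uses that library or converts field values is not stated here.

```ruby
require "csv"
require "stringio"

# The second line contains a quoted field that spans two physical lines.
data = "id,note\n1,\"line one\nline two\"\n2,plain\n"

# With headers, each row maps column names to values.
as_hashes = CSV.new(StringIO.new(data), headers: true).map(&:to_h)

# Without headers, each row is a plain array, header line included.
as_arrays = CSV.new(StringIO.new(data), headers: false).to_a

puts as_hashes.first.inspect
puts as_arrays.first.inspect
```

The quoted newline is why a CSV reader cannot simply split the input on line breaks: one record may span several physical lines.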

Memory Benchmark

From a source checkout, run the benchmark with:

ruby benchmark/memory.rb 250000

The benchmark generates temporary files, reads them in a child process, and polls the child's peak RSS from the parent process. In the rows labeled Fibrio, records are iterated without being retained; in the other rows, the eager reader keeps the entire parsed collection in memory. Peak RSS includes the Ruby VM baseline, so absolute numbers vary by Ruby version and platform.

Example result on Ruby 4.0.0 arm64-darwin24 with 250,000 records:

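| Format | Reader                       | Input (MiB) | Records | Time (s) | Peak RSS (MiB) |
|--------|------------------------------|-------------|---------|----------|----------------|
| JSON   | Fibrio                       | 20.07       | 250,000 | 14.710   | 39.4           |
| JSON   | JSON.parse(File.read)        | 20.07       | 250,000 | 0.069    | 105.4          |
| NDJSON | Fibrio                       | 20.07       | 250,000 | 0.220    | 25.6           |
| NDJSON | File.readlines + JSON.parse  | 20.07       | 250,000 | 0.182    | 127.6          |
| CSV    | Fibrio                       | 9.10        | 250,000 | 2.640    | 33.3           |
| CSV    | CSV.read(headers: true)      | 9.10        | 250,000 | 0.826    | 192.8          |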

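In this run Fibrio is slower than the eager reader for every format, most of all for JSON (14.710 s versus 0.069 s), but its peak RSS is a fraction of the eager reader's. The tradeoff is intentional: Fibrio prioritizes bounded memory use for large inputs over loading everything as fast as possible.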

Known Limitations

  • Each individual record must fit in memory.
