feature: Add streaming I/O methods to FileSystem by xerial · Pull Request #458 · wvlet/uni

xerial · 2026-03-31T17:38:41Z

Summary

- Add cross-platform streaming I/O APIs to FileSystem for processing large files without loading them entirely into memory
- New sync methods: readLinesLazy (lazy Iterator[String]), readChunks (lazy Iterator[Array[Byte]]), readStream (InputStream), writeStream (OutputStream)
- New Rx methods: readLinesRx and readChunksRx for reactive streaming
- JVM/Native iterators are truly lazy with AutoCloseable support; JS uses eager fallback since Node.js sync APIs are inherently eager

Test plan

- All 30 FileSystemTest tests pass on JVM (10 new streaming tests)
- CI passes on all platforms (JVM, JS, Native)
- Verify readLinesLazy doesn't load entire file on JVM
- Verify readChunks produces correct chunk sizes and counts
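
The last item of the test plan can be made concrete with a small sketch. The PR's own chunk iterator is not shown in this thread, so `ChunkIteratorSketch` and `chunkSizes` below are hypothetical names, and the use of `InputStream.readNBytes` (Java 11+) is an assumption about one way to fill fixed-size chunks on the JVM.

```scala
import java.io.{ByteArrayInputStream, InputStream}

// Hypothetical stand-in for the PR's chunk iterator; the real class is not shown here.
final class ChunkIteratorSketch(in: InputStream, chunkSize: Int)
    extends Iterator[Array[Byte]]
    with AutoCloseable:
  require(chunkSize > 0, "chunkSize must be positive")

  private var pending: Option[Array[Byte]] = None // next chunk, once it has been read
  private var done = false                        // true after EOF or close()

  // Reads the next chunk only when asked, and closes the stream at end of input.
  private def fill(): Unit =
    if pending.isEmpty && !done then
      val chunk = in.readNBytes(chunkSize) // Java 11+: reads until chunkSize bytes or EOF
      if chunk.isEmpty then close() else pending = Some(chunk)

  override def hasNext: Boolean =
    fill()
    pending.isDefined

  override def next(): Array[Byte] =
    fill()
    pending match
      case Some(chunk) =>
        pending = None
        chunk
      case None => throw java.util.NoSuchElementException("No more chunks")

  override def close(): Unit =
    if !done then
      done = true
      in.close()

object ChunkIteratorSketch:
  /** Sizes of the chunks produced for an in-memory payload of `total` zero bytes. */
  def chunkSizes(total: Int, chunkSize: Int): List[Int] =
    ChunkIteratorSketch(ByteArrayInputStream(new Array[Byte](total)), chunkSize)
      .map(_.length)
      .toList
```

With this sketch, a 20000-byte input and the default chunk size of 8192 should yield two full chunks followed by one shorter chunk holding the remainder.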

🤖 Generated with Claude Code

Add cross-platform streaming I/O APIs for processing large files without loading them entirely into memory:

- readLinesLazy: lazy line-by-line Iterator (truly lazy on JVM/Native)
- readChunks: fixed-size byte chunk Iterator
- readStream/writeStream: InputStream/OutputStream access
- readLinesRx/readChunksRx: Rx-based reactive streaming

JVM/Native iterators implement AutoCloseable for resource safety. JS falls back to eager reads since Node.js sync APIs are inherently eager.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request adds streaming and reactive I/O support to the FileSystem trait for JVM, JS, and Native platforms, including lazy line reading and chunked access. The review feedback highlights that the reactive methods currently load entire files into memory, defeating the purpose of streaming. Other identified issues include an inefficient append implementation in JavaScript, a lack of explicit UTF-8 encoding in the Native platform, and resource management concerns in the custom iterators regarding eager initialization and error handling.

gemini-code-assist · 2026-03-31T17:44:52Z

+    * Reads the file line by line as a reactive stream. Each line is emitted as an OnNext event,
+    * followed by OnCompletion.
+    */
+  def readLinesRx(path: IOPath): Rx[String] = Rx.fromSeq(readLinesLazy(path).toSeq)


The implementation of readLinesRx is not truly streaming because it calls .toSeq on the lazy iterator. This exhausts the iterator and loads the entire file content into memory before creating the reactive stream, which defeats the purpose of using a lazy iterator for large files. If Rx supports it, consider using a method that creates a stream directly from an Iterator or a producer.

gemini-code-assist · 2026-03-31T17:44:52Z

+  def readChunksRx(path: IOPath, chunkSize: Int = 8192): Rx[Array[Byte]] = Rx.fromSeq(
+    readChunks(path, chunkSize).toSeq
+  )


Similar to readLinesRx, readChunksRx calls .toSeq on the chunk iterator, causing the entire file to be loaded into memory as a sequence of byte arrays. This is inefficient for large files. Use a streaming producer for Rx if available.
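
Both comments above rest on the fact that `Iterator.toSeq` builds a strict collection, so every element is pulled before anything downstream runs. The sketch below illustrates this with a counter; `ToSeqDemo` and `CountingSource` are hypothetical names that use only the Scala standard library, not the wvlet Rx API.

```scala
object ToSeqDemo:
  /** A lazy source over 1..n that records how many elements were actually pulled. */
  final class CountingSource(n: Int):
    var pulled: Int = 0
    def iterator: Iterator[Int] =
      Iterator.range(1, n + 1).map { i =>
        pulled += 1
        i
      }

  /** Materializing with toSeq pulls every element up front. */
  def pulledAfterToSeq(n: Int): Int =
    val src = CountingSource(n)
    src.iterator.toSeq.size
    src.pulled

  /** Consuming only the first k elements leaves the rest of the source unread. */
  def pulledAfterTake(n: Int, k: Int): Int =
    val src = CountingSource(n)
    src.iterator.take(k).foreach(_ => ())
    src.pulled
```

For a file-backed iterator the same effect means the whole file is resident in memory before the first `OnNext` event is emitted.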

gemini-code-assist · 2026-03-31T17:44:52Z

+        case WriteMode.Append =>
+          val existing = FileSystemJS.readBytes(path)
+          FileSystemJS.writeBytes(path, existing ++ bytes, WriteMode.Create)


The Append mode implementation in FlushToFileOutputStream is highly inefficient as it reads the entire existing file into memory (readBytes), appends the new data, and then writes it all back. Since FileSystemJS.writeBytes already handles WriteMode.Append (which uses Node's appendFileSync or a browser-specific append), you should delegate to it directly.

case WriteMode.Append => FileSystemJS.writeBytes(path, bytes, WriteMode.Append)
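
The difference between the two append strategies can be sketched on the JVM with `java.nio.file.Files`. This is only an analogue of the Scala.js code under review: `AppendSketch`, `appendByRewrite`, and `appendDirect` are hypothetical names, and the Node.js path in the PR uses its own API.

```scala
import java.nio.file.{Files, Path, StandardOpenOption}

object AppendSketch:
  /** Read-concatenate-rewrite: memory use and I/O grow with the size of the existing file. */
  def appendByRewrite(path: Path, bytes: Array[Byte]): Unit =
    val existing =
      if Files.exists(path) then Files.readAllBytes(path) else Array.emptyByteArray
    Files.write(path, existing ++ bytes)

  /** Direct append: only the new bytes are held in memory and written. */
  def appendDirect(path: Path, bytes: Array[Byte]): Unit =
    Files.write(path, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND)
```

Both functions leave the same file contents; the direct form simply avoids touching the bytes that are already on disk.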

gemini-code-assist · 2026-03-31T17:44:52Z

  override def readLines(path: IOPath): Seq[String] = readString(path).split("\n").toSeq

+  override def readLinesLazy(path: IOPath): Iterator[String] =
+    val reader = BufferedReader(FileReader(toJavaFile(path)))


FileReader uses the default platform encoding, which may not be UTF-8. To ensure cross-platform consistency as specified in the FileSystemBase trait, you should explicitly use UTF-8.

val reader = BufferedReader(InputStreamReader(FileInputStream(toJavaFile(path)), StandardCharsets.UTF_8))
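
A minimal round-trip sketch of the suggested pattern, assuming a JVM with `java.nio.file` available. `Utf8ReaderSketch`, `openUtf8`, and `roundTrip` are hypothetical helper names introduced for illustration, not part of the PR.

```scala
import java.io.{BufferedReader, File, FileInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.nio.file.Files

object Utf8ReaderSketch:
  /** Opens a reader that always decodes as UTF-8, independent of the platform default. */
  def openUtf8(file: File): BufferedReader =
    BufferedReader(InputStreamReader(FileInputStream(file), StandardCharsets.UTF_8))

  /** Writes `text` as UTF-8 to a temporary file and reads its first line back. */
  def roundTrip(text: String): String =
    val tmp = Files.createTempFile("utf8-sketch", ".txt")
    try
      Files.write(tmp, text.getBytes(StandardCharsets.UTF_8))
      val reader = openUtf8(tmp.toFile)
      try reader.readLine()
      finally reader.close()
    finally Files.deleteIfExists(tmp)
```

Because the charset is fixed on both the write and the read side, non-ASCII text survives the round trip regardless of the default encoding of the host.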

gemini-code-assist · 2026-03-31T17:44:52Z

+private[io] class CloseableLineIterator(reader: BufferedReader)
+    extends Iterator[String]
+    with AutoCloseable:
+  private var nextLine: String | Null = reader.readLine()


The iterator performs an eager read in the constructor. If the iterator is created but never consumed (or hasNext is never called), the first line is still read and the file handle remains open until GC or an explicit close. It is better to defer the first read until hasNext or next is invoked.
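
One way to defer the first read is to fetch a line only when `hasNext` or `next` is called. The sketch below is a hypothetical variant, not the PR's `CloseableLineIterator`; `DeferredLineIterator` and `CountingReader` are names introduced here, and the counting reader exists only to make the deferral observable.

```scala
import java.io.{BufferedReader, StringReader}

// Hypothetical variant that performs no I/O in its constructor.
final class DeferredLineIterator(reader: BufferedReader)
    extends Iterator[String]
    with AutoCloseable:
  private var pending: Option[String] = None // next line, once it has been read
  private var done = false                   // true after EOF or close()

  // Reads at most one line ahead, and only on demand.
  private def fill(): Unit =
    if pending.isEmpty && !done then
      pending = Option(reader.readLine()) // readLine() returns null at end of input
      if pending.isEmpty then close()

  override def hasNext: Boolean =
    fill()
    pending.isDefined

  override def next(): String =
    fill()
    pending match
      case Some(line) =>
        pending = None
        line
      case None => throw java.util.NoSuchElementException("No more lines")

  override def close(): Unit =
    if !done then
      done = true
      reader.close()

// Test helper: counts calls to readLine() so the deferral can be checked.
final class CountingReader(text: String) extends BufferedReader(StringReader(text)):
  var reads: Int = 0
  override def readLine(): String =
    reads += 1
    super.readLine()
```

Note that an iterator created this way still holds an open file handle until it is exhausted or closed, so callers that abandon it early should call `close()` explicitly.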

gemini-code-assist · 2026-03-31T17:44:52Z

+    val line = nextLine
+    if line == null then
+      throw java.util.NoSuchElementException("No more lines")
+    nextLine = reader.readLine()


If reader.readLine() throws an IOException, the underlying reader will not be closed because the close() call is only triggered when hasNext returns false. Consider wrapping the read in a try-catch block that closes the resource on failure.
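
A sketch of the suggested guard, assuming the goal is to release the reader before the original exception propagates. `CloseOnFailureSketch` and `readLineOrClose` are hypothetical names, and attaching a secondary close failure via `addSuppressed` is a design choice made here, not something the review specifies.

```scala
import java.io.{BufferedReader, IOException}

object CloseOnFailureSketch:
  /** Reads one line; if the read fails, closes the reader before rethrowing the original error. */
  def readLineOrClose(reader: BufferedReader): Option[String] =
    try Option(reader.readLine()) // None signals end of input
    catch
      case e: IOException =>
        try reader.close()
        catch
          case closeError: IOException => e.addSuppressed(closeError)
        throw e
```

An iterator's `next()` could call a helper like this in place of a bare `reader.readLine()`, so that a failed read never leaves the file handle waiting for garbage collection.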

- Move CloseableLineIterator/CloseableChunkIterator to shared sources (uni-core/src/main/scala) to eliminate JVM/Native duplication
- Fix JS FlushToFileOutputStream append to use appendFileSync directly instead of read-concatenate-rewrite (avoids loading entire file)
- Fix Rx tests to use run() instead of toSeq (unsupported on Scala.js)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Use InputStreamReader with UTF-8 in Native readLinesLazy instead of FileReader which uses platform default encoding
- Simplify JS FlushToFileOutputStream to delegate directly to writeBytes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

readLinesRx/readChunksRx called .toSeq on the iterator, loading everything into memory before creating the Rx stream. This defeats the purpose of streaming, so remove them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions Bot added the feature New feature label Mar 31, 2026

gemini-code-assist Bot reviewed Mar 31, 2026

xerial and others added 4 commits March 31, 2026 11:08

chore: Remove readLinesLazy — composable via readStream + BufferedReader

310bcba

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
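
The commit title above says lazy line reading can be composed from a stream and a `BufferedReader`. A minimal sketch of that composition follows; since the exact `readStream` signature is not shown in this thread, the helper takes a plain `InputStream`, and `LinesFromStream` and `withLines` are hypothetical names.

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import scala.util.Using

object LinesFromStream:
  /** Applies `f` to a lazy line iterator over `in`, closing the stream when `f` returns or throws. */
  def withLines[A](in: => InputStream)(f: Iterator[String] => A): A =
    Using.resource(BufferedReader(InputStreamReader(in, StandardCharsets.UTF_8))) { reader =>
      // Lines are read one at a time; the iterator ends when readLine() returns null.
      f(Iterator.continually(reader.readLine()).takeWhile(_ != null))
    }
```

The iterator is only valid inside `f`, because the reader is closed as soon as `f` returns; callers should reduce it to a strict result (a count, a `List`, a fold) before leaving the block.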

xerial merged commit 1795662 into main Apr 3, 2026
14 checks passed

xerial deleted the feature/add-streaming-io branch April 3, 2026 23:13

xerial merged 5 commits into main from feature/add-streaming-io
