Since all Readers provided by the parser combinators framework use PagedSeq internally, using those parsers on large files seems impossible: PagedSeq never releases already-parsed elements.
For example, consider parsing a 1GB file of which you need only a small portion (you may want to skip headers, comments, etc.). PagedSeq will hold the whole 1GB in memory until parsing finishes, and only then can the GC step in.
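The retention can be seen with PagedSeq alone, before any parser is involved. A minimal sketch (charReader is a hypothetical helper, and the exact figures depend on the JVM settings and Scala version):

import java.io.Reader
import scala.collection.immutable.PagedSeq

// charReader is a made-up helper: a Reader that emits n copies of 't'
def charReader(n: Int): Reader = new Reader {
  private var remaining = n
  def read(cbuf: Array[Char], offset: Int, len: Int): Int =
    if (remaining <= 0) -1
    else {
      val k = math.min(len, remaining)
      java.util.Arrays.fill(cbuf, offset, offset + k, 't')
      remaining -= k
      k
    }
  def close(): Unit = ()
}

// page in ~100M chars (~200MB of heap), then keep only a 10-char tail
val tail: PagedSeq[Char] = {
  val whole = PagedSeq.fromReader(charReader(100000000))
  whole.slice(whole.length - 10) // `whole` becomes unreachable after this block
}
System.gc()
val used = Runtime.getRuntime.totalMemory - Runtime.getRuntime.freeMemory
// on affected versions this still reports roughly 200MB:
// the 10-char slice keeps the entire page chain reachable
println(s"retained after GC: ${used / 1024 / 1024} MB")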
Example code:
import collection.immutable.PagedSeq
import util.parsing.combinator._
import util.parsing.input._
// virtual file reader (simulates a ~400MB file)
def in = new java.io.Reader {
  var buffersRead = 0
  def read(cbuf: Array[Char], offset: Int, len: Int) =
    if (buffersRead < 100000) {
      // fill only the requested slice, honoring the Reader contract
      java.util.Arrays.fill(cbuf, offset, offset + len, 't')
      buffersRead += 1
      len
    } else -1
  def close() {}
}
def parser = new RegexParsers {
  var gcCountdown = 0
  // synthetic token: yields one char, then skips ahead 1024 positions;
  // triggers a GC every 10000 invocations so that the memory profile
  // reflects live data only, not uncollected garbage
  def tt = new Parser[Char] {
    def apply(in: Input) = {
      gcCountdown += 1
      if (gcCountdown > 10000) {
        System.gc()
        gcCountdown = 0
      }
      if (in.atEnd)
        Failure("", in)
      else
        Success(in.first, in.drop(1024))
    }
  }
  def go(in: java.io.Reader) = parseAll(tt.*, in).get.size
}
println(parser.go(in))
If you look at memory usage with a tool like jvisualvm, you will notice that this process consumes about 800MB of RAM just to parse ~400MB worth of characters (each Char occupies two bytes on the heap, and PagedSeq retains all of them).
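One possible mitigation until PagedSeq is fixed, sketched below: read the stream in bounded windows and hand each window to the parser as a plain String, so no PagedSeq is ever created. ChunkedParser and goChunked are illustrative names, and the approach assumes no token spans a window boundary (for this synthetic grammar the count may differ slightly at the seams):

import scala.util.parsing.combinator.RegexParsers

object ChunkedParser extends RegexParsers {
  // same synthetic token as above: yield one char, skip ahead 1024 positions
  def tt: Parser[Char] = new Parser[Char] {
    def apply(in: Input) =
      if (in.atEnd) Failure("", in)
      else Success(in.first, in.drop(1024))
  }

  // parse the stream one bounded window at a time; each window is a plain
  // String, so no PagedSeq is involved and every window becomes garbage
  // as soon as the next one is read
  def goChunked(in: java.io.Reader): Int = {
    val buf = new Array[Char](1 << 20) // 1MB window
    var total = 0
    var n = in.read(buf)
    while (n != -1) {
      total += parseAll(tt.*, new String(buf, 0, n)).get.size
      n = in.read(buf)
    }
    total
  }
}

println(ChunkedParser.goChunked(in)) // same ~400MB stream, bounded memory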