Support Enumerator as an input source #48

Closed
yorickpeterse opened this issue Sep 22, 2014 · 1 comment

@yorickpeterse

The Enumerator class can be used to stream data without having to actually write a dedicated class for it. For example:

enum = Enumerator.new do |yielder| 
  HTTPClient.get('http://some-website.com/some-big-file.xml') do |chunk| 
    yielder << chunk
  end
end

document = Oga.parse_xml(enum)

@yorickpeterse

Worth mentioning: the Enumerator class does not define a read method, nor does it have each_line. Thinking about it, the lexer should also be changed so that it reads a small fixed-size buffer instead of an entire line.
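
A quick way to check this is with respond_to?. The following is only a sketch: chunk_enumerator is a made-up helper that stands in for a real streaming source and is not part of Oga.

```ruby
# Hypothetical stand-in for a streaming source: yields two XML chunks.
def chunk_enumerator
  Enumerator.new do |yielder|
    yielder << '<root>'
    yielder << '</root>'
  end
end

enum = chunk_enumerator

# Enumerator mixes in Enumerable, so it has each, but none of the IO-style
# reading methods.
puts "read:      #{enum.respond_to?(:read)}"
puts "each_line: #{enum.respond_to?(:each_line)}"
puts "each:      #{enum.respond_to?(:each)}"
```

So each is the only entry point an Enumerator can be relied on to have.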

The different scenarios would be as follows (in this order):

  • Input is a String? Just yield the whole thing, memory is already allocated anyway
  • Input responds to read? Use this in combination with a fixed buffer size
  • Last option: input responds to each? Use that instead

The rationale for the 2nd and 3rd options is that reading an entire line can be inefficient. For example, if a large XML file crams everything onto a single line, reading line by line would pull the whole document into memory at once, which defeats the purpose of streaming.
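
To illustrate the difference, here is a rough comparison using StringIO. The helper names (line_chunks, buffered_chunks) and the 1024 byte buffer are assumptions picked for the example, not anything Oga defines.

```ruby
require 'stringio'

# Hypothetical helper: collects everything each_line produces.
def line_chunks(io)
  io.each_line.to_a
end

# Hypothetical helper: collects fixed-size chunks. IO#read(size) returns nil
# once the end of the input is reached, which ends the loop.
def buffered_chunks(io, size)
  chunks = []
  while (chunk = io.read(size))
    chunks << chunk
  end
  chunks
end

# A document without any newlines in it.
xml = '<root>' + ('<item/>' * 1000) + '</root>'

lines   = line_chunks(StringIO.new(xml))
buffers = buffered_chunks(StringIO.new(xml), 1024)

puts "each_line: #{lines.length} chunk(s), largest #{lines.map(&:bytesize).max} bytes"
puts "read:      #{buffers.length} chunk(s), largest #{buffers.map(&:bytesize).max} bytes"
```

With each_line the lexer gets the entire document as one "line", whereas read never hands over more than the buffer size at a time.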

Code-wise this would look something like the following:

if @data.is_a?(String)
  yield @data
# StringIO/IO instances
elsif @data.respond_to?(:read)
  yield @data.read(4) until @data.eof?
# Enumerator and basically everything else
elsif @data.respond_to?(:each)
  @data.each do |line|
    yield line
  end
end
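
For anyone who wants to try this outside of the lexer, below is a self-contained sketch of the same dispatch. The method name each_chunk, the BUFFER_SIZE constant with its value, and the ArgumentError fallback are assumptions made for the example; the actual lexer code may end up looking different. It loops on the return value of read instead of calling eof?, so inputs that define read but not eof? would work too.

```ruby
require 'stringio'

# Assumed default buffer size; the snippet above uses 4 only as a placeholder.
BUFFER_SIZE = 4096

# Yields the input to the given block in chunks, trying the three scenarios
# in the order described above.
def each_chunk(data, buffer_size = BUFFER_SIZE)
  if data.is_a?(String)
    # Already in memory, so hand over the whole thing.
    yield data
  elsif data.respond_to?(:read)
    # StringIO/IO instances: read(n) returns nil at the end of the input.
    while (chunk = data.read(buffer_size))
      yield chunk
    end
  elsif data.respond_to?(:each)
    # Enumerator and basically everything else.
    data.each { |chunk| yield chunk }
  else
    # Assumption: unsupported inputs should fail loudly.
    raise ArgumentError, "unsupported input: #{data.class}"
  end
end

enum = Enumerator.new do |yielder|
  yielder << '<a>'
  yielder << 'b</a>'
end

each_chunk('<a>b</a>')                  { |chunk| puts "String:     #{chunk.inspect}" }
each_chunk(StringIO.new('<a>b</a>'), 4) { |chunk| puts "StringIO:   #{chunk.inspect}" }
each_chunk(enum)                        { |chunk| puts "Enumerator: #{chunk.inspect}" }
```

Note that for the each case the chunk size is whatever the source decides to yield, so the buffer size only applies to inputs that respond to read.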
