add tests for streams #11

Open
tpolecat opened this issue Aug 20, 2014 · 9 comments

Comments

@tpolecat
Owner

there are a lot of corner cases depending on chunking of input, so it would be really nice to have fuzz tests for streams
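One possible shape for such a fuzz test, as a minimal sketch rather than anything from atto's actual test suite: split an input at random positions, feed the pieces to a parser incrementally, and check that the result matches parsing the whole input at once. The helper names `randomChunks` and `parseChunked` are hypothetical, and the fixed seed and example parser are arbitrary choices. Chunks are kept non-empty on the assumption that feeding an empty string signals end of input.

```scala
import atto._, Atto._
import scala.util.Random

// Hypothetical helper: split `s` into consecutive non-empty chunks at random positions.
def randomChunks(s: String, rnd: Random): List[String] =
  if (s.isEmpty) Nil
  else {
    val n = 1 + rnd.nextInt(s.length) // chunk length between 1 and s.length
    s.take(n) :: randomChunks(s.drop(n), rnd)
  }

// Hypothetical helper: feed the chunks one at a time, then signal end of input.
def parseChunked[A](p: Parser[A], chunks: List[String]): ParseResult[A] =
  chunks.foldLeft(p.parse(""))(_ feed _).done

val p        = int.sepBy(char('.'))
val input    = "128.42.32.12"
val expected = p.parse(input).done.option // reference: whole input in one go

val rnd = new Random(42) // fixed seed so a failure can be reproduced
val ok  = (1 to 100).forall { _ =>
  parseChunked(p, randomChunks(input, rnd)).option == expected
}
```

A property-testing library could replace the hand-rolled loop; the plain `Random` version is used here only to keep the sketch dependency-free apart from atto itself.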

@cvogt

cvogt commented Sep 16, 2015

Is there a way to parse a character stream? I have a use case where I want to parse a stream of lines that are not followed by a line break, but preceded by one. There can be a significant wait between the individual lines, so I need to parse and process a line before its terminating newline is sent. Most line-based streaming tools break on that, unfortunately, so I imagine a Stream[Char] would be the right thing here.

@cvogt

cvogt commented Sep 16, 2015

If I have a Stream[String] with strings of size 1, does atto apply its parser to each one, or to the beginning of the whole stream, across strings?

I am playing with writing a tool that parses scalac output, ignores bogus type errors based on heuristics, pretty-prints types, etc. Scalac apparently doesn't print \n after its type errors, but before them. Or maybe it's sbt.

@tpolecat
Owner Author

So, yeah if you use the existing process combinator it will feed each string to the parser and emit values as they are complete (saving any remaining input) and either discard errors or halt on error (depending on which combinator you use). It's straightforward to write a custom processor though ... the current approach handles two possible use cases but it may not match what you're doing. If you want to describe it in a bit more detail I can give you a more precise answer.

@cvogt

cvogt commented Sep 16, 2015

sbt prints

[error] .......
....
...
       ^

then waits, with no \n following the ^. At some later point the next

[error] .......
....
...
       ^

arrives. I need to parse the first [error]......^ section without waiting for a \n following the ^.

@cvogt

cvogt commented Sep 16, 2015

Does atto call the parser on each element of the stream individually, or does it effectively turn the Stream[String] into a Stream[Char] and run the parser on that?

@tpolecat
Owner Author

The parser consumes strings, which it treats logically as a sequence of characters; working on string chunks is just more efficient. On success there may be leftover input, which the stream processor uses as the initial input for parsing the next chunk.

For example, the result here includes the residual input:

scala> int.sepBy(char('.')).parse("128.42.32.12 woozle")
res2: atto.ParseResult[List[Int]] = Done( woozle,List(128, 42, 32, 12))
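To illustrate how leftover input can be carried from one chunk to the next, here is a rough sketch. It is not the stream processor that ships with atto; `parseChunks` is a hypothetical helper, and it assumes the parser consumes some input on every success and that no chunk is empty (an empty chunk would be read as end of input).

```scala
import atto._, Atto._
import ParseResult._

// Hypothetical helper: run `p` repeatedly over a list of string chunks,
// carrying the leftover input of each successful parse into the next one.
def parseChunks[A](p: Parser[A], chunks: List[String]): List[A] = {
  @annotation.tailrec
  def go(pr: ParseResult[A], rest: List[String], acc: List[A]): List[A] =
    pr match {
      // Success: emit the value and restart the parser on the leftover input.
      case Done(leftover, a) => go(p.parse(leftover), rest, a :: acc)
      // Failure: stop and return what has been parsed so far.
      case Fail(_, _, _)     => acc.reverse
      // The parser wants more input: feed the next chunk if there is one.
      case Partial(_)        =>
        rest match {
          case c :: cs => go(pr.feed(c), cs, acc)
          case Nil     =>
            // No chunks left: signal end of input and keep a final value, if any.
            pr.done match {
              case Done(_, a) => (a :: acc).reverse
              case _          => acc.reverse
            }
        }
    }
  go(p.parse(""), chunks, Nil)
}

// Values may straddle chunk boundaries, e.g. "12" + "3," and "3" + "3,".
val longs = parseChunks(long <~ (char(',') || endOfInput), List("12", "3,3", "3,1112"))
```

Stopping silently on failure is only one option; a real processor would probably want to surface the error instead.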

@cvogt

cvogt commented Sep 16, 2015

I would need something vaguely like this:

scala> val s: Stream[Char] = ...
scala> println( s.take(20).mkString )
123,33,111242346456
scala> (int ~ ',').parse(s)
ParseResult( Stream(33,111242346456....), "123," )

That is, parse a single value off the stream of characters and return it together with the remainder of the stream.

@cvogt

cvogt commented Sep 16, 2015

Doesn't look like atto does that right now.

@tpolecat
Owner Author

Easy enough to hack up. As always it will come down to details.

import atto._, Atto._
import ParseResult._

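// Feed characters to the parser one at a time until it finishes or fails.
// Returns the parse result together with the unconsumed rest of the stream.
def chunk[A](chars: Stream[Char], p: Parser[A]): (ParseResult[A], Stream[Char]) = {
  def go(s: Stream[Char], pr: ParseResult[A]): (ParseResult[A], Stream[Char]) =
    pr match {
      case Done(_, _)    => (pr, s)
      case Fail(_, _, _) => (pr, s)
      case Partial(_)    => // the parser wants more input
        s match {
          case c #:: cs => go(cs, pr.feed(c.toString))
          case _        => (pr.done, s) // stream exhausted: signal end of input
        }
    }
  go(chars, p.parse(""))
}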


scala> chunk("123,33,111242346456".toStream, long <~ (char(',') || endOfInput))
res16: (atto.ParseResult[Long], Stream[Char]) = (Done(,123),Stream(3, ?))

You can use this to define a Stream[Char] ~> Stream[A] transform:

def chunks[A](chars: Stream[Char], p: Parser[A]): Stream[A] =
  chunk(chars, p) match {
    case (Done(cs, a), s) => a #:: chunks(cs.toStream ++ s, p)
    case _ => Stream.Empty // or something
  }

scala> chunks("123,33,111242346456".toStream, long <~ (char(',') || endOfInput))
res17: Stream[Long] = Stream(123, ?)

scala> res17.toList
res18: List[Long] = List(123, 33, 111242346456)
