lookahead documentation #25

Open
wants to merge 1 commit into from

2 participants

Dave Paroulek Nate Young
Dave Paroulek

This is a really great project, I'm having a blast learning about parser combinators, thanks so much for your work.

I noticed there was no documentation about the lookahead parser so I tried to add something that will hopefully be useful to other newbies like me. Please take a look and if you think it'd be helpful please feel free to pull it in.

But also, if this isn't the intended use for lookahead, if you can think of a better example, or if my documentation is confusing in any way, just let me know and feel free to reject.

Thanks again,
Dave

Dave Paroulek

I see a few grammar mistakes. Please let me know first if you want to pull this in and then, if so, I can fix the mistakes along with any other suggestions/feedback.

Nate Young
Owner

This is good and I'm always willing to accept new documentation. I like where you put the section. lookahead almost certainly makes the most sense right after the section for choice. I thought the first two paragraphs were a very clear and concise overview.

The example, however, strikes me as far too long, and it gets muddled in the details of a headline parser. I do think we need an example, and I think the headline idea is in the right direction, since it should be fairly real-world and not super-contrived. But I would try to keep it short and to the point: maybe demonstrate an input for which choices overlap, and then show how to use lookahead to solve that.
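
A shorter example along those lines might look roughly like the sketch below. It is only an illustration, not code from the pull request: the parsers `word`, `assignment`, `reference`, and `statement` are hypothetical names made up for this sketch, and it assumes the `the.parsatron` namespace with `either` committing to its first alternative once that alternative has consumed input.

```clojure
(ns lookahead-sketch
  (:refer-clojure :exclude [char])
  (:use [the.parsatron]))

;; Hypothetical helper: one or more letters, returned as a string.
(defparser word []
  (let->> [cs (many1 (letter))]
    (always (apply str cs))))

;; Two alternatives that overlap: both begin with a word.
(defparser assignment []          ; input such as "x=1"
  (let->> [k (word)
           _ (char \=)
           v (digit)]
    (always [:assign k v])))

(defparser reference []           ; input such as "x"
  (let->> [k (word)]
    (always [:ref k])))

;; Peek past the word without consuming it, note whether an equals
;; sign follows, and only then commit to one of the two parsers.
(defparser statement []
  (let->> [assign? (lookahead
                     (>> (word)
                         (either (>> (char \=) (always true))
                                 (always false))))]
    (if assign?
      (assignment)
      (reference))))

(run (statement) "x=1")
(run (statement) "x")
```

The point of the sketch is that `(either (assignment) (reference))` would be expected to fail on an input like `"x"`, because `assignment` consumes the word before it discovers the missing equals sign. Peeking with `lookahead` first avoids that without wrapping the alternative in `attempt`.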

If the headline parser you wrote is part of a larger parser for markdown or some other markup format, please feel free to stash it in the src/parsatron/languages folder and submit a pull request. Sometimes longer-form, fully working code is better documentation than prose.

Dave Paroulek

Hi Nate, thanks much for the feedback. That makes good sense, I thought that example might be a bit too long and complicated.

I'm trying to build a markdown parser using parsatron just for fun to learn more about parsing and clojure. I have a good start and will keep you posted. I'd be happy to contribute it to languages if I have time to get it into better shape.

As I continue to tinker with the markdown parser, I'll try to come up with a better, more concise example for the lookahead doc section, and I'll update the pull request as soon as I have something.

Thanks!

Commits on Feb 6, 2013
  1. Dave Paroulek

    lookahead docs

    upgradingdave authored
Showing with 87 additions and 0 deletions.
  1. +87 −0 docs/guide.markdown
87 docs/guide.markdown
@@ -339,6 +339,93 @@ we had used `many` then this would always parse as a number because if there
were no digits it would successfully return an empty sequence.
+### lookahead
+
+`lookahead` accepts any existing parser as a parameter. It then runs
+the parser and returns the result just as if the parser had been run
+on its own. The interesting and useful feature of `lookahead` is that
+it consumes no input and leaves the parser state untouched, just as
+if the lookahead parser had never been run at all. This is very
+handy for peeking ahead in order to make decisions about what to
+parse next.
+
+For example, say we need to parse a chunk of text that contains
+paragraphs and headlines. A paragraph is one or more lines of text
+followed by two newlines:
+
+    This is an example paragraph
+    that starts with some words spaces and a newline
+
+A headline is a line of text followed by equal signs (`=`):
+
+    Headlines might also start with words, spaces, newline
+    ======================================================
+
+First, we'll need some simple helper parsers (for parsing whitespace
+and characters). A line is one or more digits and/or letters and/or
+spaces followed by a single newline.
+
+    (defparser ws []
+      (token #{\space \tab}))     ;; a single space or tab
+
+    (defparser char-or-ws []
+      (choice (letter) (digit) (ws)))
+
+    (defparser line []
+      (many1 (char-or-ws))        ;; one or more letters, digits, or spaces
+      (char \newline))            ;; followed by a newline
+
+Next, we'll need a `paragraph` parser. Our `paragraph` parser
+should look for one or more lines followed by 2 or more newlines.
+
+    (defparser paragraph []
+      (many1 (line))
+      (many1 (char \newline)))
+
+Next, we'll need a `headline` parser that looks for one line followed
+by one or more equal signs (`=`).
+
+    (defparser headline []
+      (line)
+      (many1 (char \=)))
+
+Now we need to build a parser capable of parsing both headlines and
+paragraphs. You might think at first to use `either` or `choice` like
+this:
+
+    (defparser paragraph-or-headline []
+      (choice (headline) (paragraph)))
+
+But the bad news is that this won't work. Headlines and paragraphs
+both begin with a line, so `choice` always commits to `headline`
+(since `headline` is first in the list). When the text is actually a
+paragraph, `headline` fails only after consuming that first line, and
+`choice` does not backtrack to try `paragraph`.
+
+The good news is that we can use `lookahead` to solve this issue:
+
+    (defparser paragraph-or-headline []
+      (let->>
+        [next-char (lookahead (>> (line) (any-char)))]
+        (if (= next-char \=)
+          (headline)
+          (paragraph))))
+
+The interesting and relevant part here is this line:
+
+    (lookahead (>> (line) (any-char)))
+
+This skips ahead over a single line and returns the value parsed by
+`(any-char)`. We then bind this value into `next-char` using `let->>`.
+If `next-char` is an equal sign (`=`), we know to use the `headline`
+parser. Otherwise, we know to use the `paragraph` parser.
+
+Now, parsing a chunk of text containing one or more headlines and/or
+paragraphs is easy!
+
+    (defparser paragraphs-and-headlines []
+      (many1 (paragraph-or-headline)))
+
### between
`between` is a function that takes three parsers, call them left, right, and