Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter, based on selective #101

Merged
merged 11 commits into from
Dec 8, 2020
Merged

filter, based on selective #101

merged 11 commits into from
Dec 8, 2020

Conversation

rossabaker
Copy link
Member

Riffing on the ideas of Staged Selective Parser Combinators.

Use case: in http4s, we need to parse various date headers into a 7-tuple of (weekday, year, month, day, hour, minute, second), and then verify that the date is valid (e.g., no 31st of February). Currently we have to flatMap it, and when parsing, we don't want to flatMap that shit.

Implemented on branch. We should have the full power of Selective. The primitive operation could also be select, which is the abstract function on cats-selective.

Should be able to make a version that fails with a particular message, swapping that in for "empty".

There may be optimizations to sprinkle throughout the code. Fusing to Impl.Map seems possible, for instance.

I'm super sleepy and just following types. This might all be nonsense.

Copy link
Collaborator

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool. I think we should add it.

One note: users can implement filter, with less performance, using flatMap in the mean time (I guess any time you have Monad + Alternative this is true).

I had a couple concerns that I noted.

core/shared/src/main/scala/cats/parse/Parser.scala Outdated Show resolved Hide resolved
val res = genP.fa.filter(_ => false).parse(str)
assert(res.isLeft)
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we write a law on branch directly (see the laws I have on how ~ composes). I don't think what we have now is exactly right but since filter is weaker is might not detect it.

val either = parser.parseMut(state)
if ((state.error eq null) && state.capture)
either.fold(a => l.ap(Parser.pure(a)), b => r.ap(Parser.pure(b))).parseMut(state)
else null.asInstanceOf[C]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this isn't right. I think if we are not capturing we still need to run the result because it may advance the offset in the parser.

@rossabaker
Copy link
Member Author

Ugh. I didn't think anybody had looked at it yet, and just force-pushed it based on select and fixing some 3am mistakes. But I think a lot of your feedback is still valid.

I'd also like to throw in an example and perhaps a bench to show how it compares to the monadic approach.

@codecov-io
Copy link

codecov-io commented Dec 6, 2020

Codecov Report

Merging #101 (3ae0d63) into main (a144809) will decrease coverage by 0.27%.
The diff coverage is 93.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #101      +/-   ##
==========================================
- Coverage   96.44%   96.17%   -0.28%     
==========================================
  Files           5        6       +1     
  Lines         676      732      +56     
  Branches       57       68      +11     
==========================================
+ Hits          652      704      +52     
- Misses         24       28       +4     
Impacted Files Coverage Δ
core/shared/src/main/scala/cats/parse/Parser.scala 96.94% <93.75%> (-0.35%) ⬇️
...shared/src/main/scala/cats/parse/LocationMap.scala 95.83% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a144809...3ae0d63. Read the comment docs.

@johnynek
Copy link
Collaborator

johnynek commented Dec 6, 2020

Dude I check my GitHub notifications religiously. Well really more than most people do religious things.

fn: Parser[A => B],
state: State
): B = {
val either = parser.parseMut(state)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, look at flatMap. I think you have to set capture to true here unconditionally since otherwise we don't know what to do next.

After we capture the first, we can set capture to false on the second part since we can just avoid applying the function.

@@ -911,6 +914,9 @@ object Parser extends ParserInstances {
case _ => Impl.Map1(p, fn)
}

def select[A, B](p: Parser[Either[A, B]])(fn: Parser[A => B]): Parser[B] =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For bonus points we can add select10 that takes a Parser1 then a Parser and returns a Parser1. Similarly a select01 that does the Parser1 second.

If we had that, we can override filter on Parser1 to return a Parser1

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually with this formulation I think we only need select and select1. In select1 the first argument is a Parser1.

assertEquals(res0, res1)
}
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another interesting pair of laws: if we pa.ap(pb) should be the same as select(pa.map(Left), pb, fail)

Similarly on the right hand side.

@@ -1331,6 +1337,9 @@ object Parser extends ParserInstances {
case Map(p, _) =>
// we discard any allocations done by fn
unmap(p)
case Select(p, _) =>
// we discard any allocations done by fn
unmap(p)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to do Select(a, b) => Select(a, unmap(b))

Since in select, after we optionally parse the right in a void we could not do anything more, but we do need the result on the left even in a void.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second parameter has to be a function, so I had to immediately and trivially map the unmapped value.

@johnynek johnynek mentioned this pull request Dec 6, 2020
state.capture = cap
if (state.error eq null)
either match {
case Left(a) => fn.map(_(a)).parseMut(state)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can do fun.parseMut(state) and in the capture case actually apply the function. In the non capture case you can just return null.

I think this also solves the unmap requirement of putting in const.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .asInstanceOf in the unmap makes me queasy, but it makes the existing tests pass.

property("select(pa.map(Left(_)))(pf) == pf.ap(pa)") {
forAll(ParserGen.gen, ParserGen.gen) { (genP, genRes) =>
val pa = genP.fa
val pf = null: Parser[genP.A => genRes.A]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling to produce this parser. The trick used in the generator didn't quite translate.

@johnynek
Copy link
Collaborator

johnynek commented Dec 7, 2020

I made a PR as a suggestion to this one:

rossabaker#1

It gets everything green, but changes somewhat the internal representation of Select (to work more cleanly with unmap).

Copy link
Collaborator

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about merging this and publishing 0.2.0?

I think we need to remove the "Draft" label.

@rossabaker
Copy link
Member Author

I'm good with that. I'd still like to figure out how to write that Left law, and I'm also interested in a benchmark that shows the advantage (or not) over flatMap. But none of that needs to hold back progress.

@rossabaker rossabaker marked this pull request as ready for review December 8, 2020 03:26
@mpilquist
Copy link
Member

Bump ThisBuild / baseVersion before doing so -- not sure that failing to do so will cause a problem but should do it anyway.

@johnynek
Copy link
Collaborator

johnynek commented Dec 8, 2020

Actually I think adding an override for ifM using select:

https://github.com/typelevel/cats/blob/8e8be51187385dde78321e7e0d9fd892801afacb/core/src/main/scala/cats/FlatMap.scala#L123

We should be able but I don't see it immediately. I assume we use select twice.

@rossabaker
Copy link
Member Author

baseVersion is already bumped.

Devil's advocacy: a risk in merging this before a benchmark is that the gains may be only theoretical and not justify the new primitive. I definitely want filter, but there's also a monadic way to express it.

That said, I think it is better, and even if it's not, it's still 0.x.

@johnynek
Copy link
Collaborator

johnynek commented Dec 8, 2020

I bet ifM will be a lot faster using select (or a similar internal node).

@rossabaker
Copy link
Member Author

rossabaker commented Dec 8, 2020

cats-selective and its Haskell forebears call it ifS, and we'd get it for free with a cats-selective integration.

If this proves successful, I'd be interested in renewing the push to get Selective into Cats proper. With the right encoding, I think it could even be accepted into 2.x.

@johnynek johnynek merged commit a7fb824 into typelevel:main Dec 8, 2020
@regadas
Copy link
Collaborator

regadas commented Dec 8, 2020

This is really cool! 🎉

@rossabaker rossabaker deleted the selective branch December 9, 2020 04:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants