Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate methods for flattening the results of many joins #1097

Merged
merged 6 commits into from
Dec 1, 2014

Conversation

isnotinvain
Copy link
Contributor

Here's what the code-gen based solution to #1096 looks like (I'm fine with using a macro instead, but I just wanted to see how hard this would be and it was pretty straightforward).

works for both the result of left/right joins:

scala> ((("hi", 12.8), Seq(1,2,3)), 17).flatten
res1: (String, Double, Seq[Int], Int) = (hi,12.8,List(1, 2, 3),17)

And for the result of outer joins:

scala> type crazy = (Option[(Option[(Option[(Option[Int], Option[String])], Option[Long])], Option[Double])], Option[Seq[Int]])
defined type alias crazy

scala> val x: crazy = (Some(Some(Some(Some(7), Some("hello")), Some(1000L)), Some(12.3)), Some(Seq(1,2,3)))
x: (Option[(Option[(Option[(Option[Int], Option[String])], Option[Long])], Option[Double])], Option[Seq[Int]]) = (Some((Some((Some((Some(7),Some(hello))),Some(1000))),Some(12.3))),Some(List(1, 2, 3)))

scala> x.flatten
res0: (Option[Int], Option[String], Option[Long], Option[Double], Option[Seq[Int]]) = (Some(7),Some(hello),Some(1000),Some(12.3),Some(List(1, 2, 3)))

scala> val y: crazy = (None, Some(Seq(1,2,3)))
y: (Option[(Option[(Option[(Option[Int], Option[String])], Option[Long])], Option[Double])], Option[Seq[Int]]) = (None,Some(List(1, 2, 3)))

scala> y.flatten
res1: (Option[Int], Option[String], Option[Long], Option[Double], Option[Seq[Int]]) = (None,None,None,None,Some(List(1, 2, 3)))

scala> val z: crazy = (Some(None, Some(12.3)), Some(Seq(1,2,3)))
z: (Option[(Option[(Option[(Option[Int], Option[String])], Option[Long])], Option[Double])], Option[Seq[Int]]) = (Some((None,Some(12.3))),Some(List(1, 2, 3)))

scala> z.flatten
res2: (Option[Int], Option[String], Option[Long], Option[Double], Option[Seq[Int]]) = (None,None,None,Some(12.3),Some(List(1, 2, 3)))

Doesn't have any tests yet, and the generator is a bit long.

@ianoc
Copy link
Collaborator

ianoc commented Nov 14, 2014

These look like bijections rather than something for scalding?

implicit class LeftNestedAlternatingTupleOptionFlatten22[A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V](nested: (Option[(Option[(Option[(Option[(Option[(Option[(Option[(Option[(Option[(Option[(Option[(Option[(Option[(Option[(Option[(Option[(Option[(Option[(Option[(Option[(Option[A], Option[B])], Option[C])], Option[D])], Option[E])], Option[F])], Option[G])], Option[H])], Option[I])], Option[J])], Option[K])], Option[L])], Option[M])], Option[N])], Option[O])], Option[P])], Option[Q])], Option[R])], Option[S])], Option[T])], Option[U])], Option[V]) ) {
def flatten: (Option[A], Option[B], Option[C], Option[D], Option[E], Option[F], Option[G], Option[H], Option[I], Option[J], Option[K], Option[L], Option[M], Option[N], Option[O], Option[P], Option[Q], Option[R], Option[S], Option[T], Option[U], Option[V]) = {
val (rest1, v) = nested
val (rest2, u) = rest1.getOrElse(pairOfNones)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. How does it compare to:

val (rest21, v) = nested
val t21 = LeftNestedAlternatingTupleOptionFlatten21(rest21)
(t21._1, t21._2 ....t21._21, v)

performance (kind of recursive calls vs getOrElse calls) might just the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that.
It would work, though now you're making nested function calls, so it seems like it would be slightly more expensive.

The other option would be to short circuit after the first .getOrElse is triggered, but to do that is tricky.

@rangadi
Copy link

rangadi commented Nov 14, 2014

'flatten' normally removes Nones in the context of collections. The context is quite different here and I can't think of a better name, but it could cause some confusion during casual read of the code.

@isnotinvain
Copy link
Contributor Author

@rangadi maybe, but this is a method on a tuple, not on a collection (don't know if that makes it better)
Also, flatten on an option just removes one layer:

scala> Some(Some(Some(7))).flatten
res0: Option[Some[Int]] = Some(Some(7))

The nice thing about generating these methods vs. a macro is that this 'flatten' method will only be available on a Tuple with exactly this structure, and if you look at a tuple of this structure and wonder "what would it mean to flatten this?" I think the answer is what I've got here. If you look at the outer join case, the result of (Option[A], Option[B]...) is about as flat as you can get unless you went with Option[(A, B...)] which isn't very useful and seems like it wouldn't be called flatten?

Any ideas for a better name?

@isnotinvain
Copy link
Contributor Author

@ianoc don't know if you saw the discussion in #1096 for context

These could be written as Bijections (though I don't think the inverse is useful even though it is possible), but I think it wouldn't be as easy to use, eg you would have to do:
LeftNestedAlternatingTupleOptionFlatten22(myCrazyTuple) instead of myCrazyTuple.flatten

As for whether it belongs in the Bijection codebase -- maybe? As far as I can tell you would only ever have tuples with this odd structure as a result of a scalding join, so the motivation for these is to help with the output of scalding joins.

I think long-term, the best would be to have this happen automatically whenever there's a join followed by a join? I guess the best API would be if:

val joined = as
    .outerJoin(bs)
    .outerJoin(cs)
    .outerJoin(ds)

just automatically returned you a Grouped[K, (A, Option[B], Option[C], Option[D])]

@rangadi
Copy link

rangadi commented Nov 14, 2014

I agree, flatten is a very good name. I can't think of good alternative.

@avibryant
Copy link
Contributor

What if you generate the methods for Grouped instead of for tuples, so that the API was

val joined = as
    .outerJoin(bs)
    .outerJoin(cs)
    .outerJoin(ds)
    .flattenJoin

Which just added the right mapValues.
I think that would be clearer and nicer than having it on tuple.

@isnotinvain
Copy link
Contributor Author

@avibryant that would work, but would it make more sense for flattenJoin to delegate to the tuple flattening methods, or just in-line those methods in flattenJoin?

putting it on Grouped is nice in that it keeps it join-specific and not 'generic' to some very strange tuples.

@avibryant
Copy link
Contributor

I was assuming you'd just inline it, since as you point out the methods are not really very useful in other contexts.

@isnotinvain
Copy link
Contributor Author

Yeah makes sense. I'll update with that + the overloaded join methods mentioned in #1096

@isnotinvain
Copy link
Contributor Author

OK, I've updated this to operate on CoGrouped instead of on tuples, and I added the convenience overloaded join methods discussed with @johnynek on #1096

Am I using CoGroupable vs CoGrouped in the right places? What about Grouped?

@isnotinvain
Copy link
Contributor Author

re: serialization -- we'll sidestep this one by only doing join1 through 6, and just inline them CoGroupable (though I'd like to get to the bottom of the fear of self: bounds someday)

re: fancy types -- done! good call.

re: "Left"

well, I didn't know what to name the difference between:
"KeyedListLikes nested in this fashion ((((A, B), C), D), E)"
vs
"KeyedListLikes nested in this fashion (Option[(Option[(Option[(Option[A], Option[B])], Option[C])], Option[D])], Option[E])])"

Except that the first is something you only get as a result of leftJoins and the second is something you only get as a result of outerJoins

I need to also add rightJoins..

@isnotinvain
Copy link
Contributor Author

OK, even with only join2-6 it's 83 extra lines in CoGroupable so I think we should leave it in a separate trait

@isnotinvain
Copy link
Contributor Author

Updated to use better types, rename to flattenValueTuple, removed some extra { }, removed use of implicit class (forgot we're cross compiling to 2.9 not 2.11), added implicit defs

}
}

implicit def toFlattenLeftJoin[KEY, KLL[KLL_K, +KLL_V] <: KeyedListLike[KLL_K, KLL_V, KLL], A, B, C](nested: KLL[KEY, ((A, B), C)]) = new FlattenLeftJoin3(nested)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you could not have two implicits with the same name. You sure these resolve? I think they shadow each other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that might be why it wasn't working :)
I haven't had a chance to look into it

@johnynek
Copy link
Collaborator

johnynek commented Dec 1, 2014

working on this so it can go out today in 0.12

@ianoc
Copy link
Collaborator

ianoc commented Dec 1, 2014

Seems sane, would love to see a simple test even just to act as a scaladoc on this if nothing else

@johnynek
Copy link
Collaborator

johnynek commented Dec 1, 2014

I added a test that calls MultiJoin. This code is so simple, the compiler is really doing most of the work for us. A large set of errors are ruled out due to the types.

ianoc added a commit that referenced this pull request Dec 1, 2014
Generate methods for flattening the results of many joins
@ianoc ianoc merged commit be891ab into develop Dec 1, 2014
@ianoc ianoc deleted the alexlevenson/tuple-flatten branch December 1, 2014 23:38
@isnotinvain
Copy link
Contributor Author

Thanks @johnynek !
Two questions:

  1. We don't want the F bounded polymorphism on MultiJoin?
  2. why not mix MultiJoin into CoGroupable? it blows up the docs? It might be less discoverable as a companion object.

@isnotinvain
Copy link
Contributor Author

...looks like @ianoc merged already :)

@ianoc
Copy link
Collaborator

ianoc commented Dec 1, 2014

Well discuss away... I haven't finished the release yet...

@johnynek
Copy link
Collaborator

johnynek commented Dec 1, 2014

I felt like we got consensus after @julienledem and @avibryant fine tuned the .multi proposal.

There was a feeling (from me, and strong +1 from Avi) that adding 66 methods to CoGroupable was too much clutter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants