Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Frameless to explode Map[A,B] #488

Merged
merged 3 commits into from
Nov 27, 2021

Conversation

ayoub-benali
Copy link
Contributor

Fixes #383

def prop[A: TypedEncoder: ClassTag](xs: List[X1[Map[A, A]]]): Prop = {
val tds = TypedDataset.create(xs)

val framelessResults = tds.explodeMap('a).collect().run().toVector
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OlivierBlanvillain @imarios I would need you help here if possible since I know nothing about shapeless :/
The compiler is complaining about
No column Symbol with shapeless.tag.Tagged[String("a")] of type scala.collection.immutable.Map[A,B] in frameless.X1[scala.collection.immutable.Map[A,A]]

Am I missing something in the definition of explodeMap ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ayoub-benali, sorry for the delay, I was trying to get the new version out and then help another older PR to get merged. Let me take a look and help you out

exploded
// map explode explodes it into [key, value] columns
// the only way to put it into a column is to create a struct
// TODO: handle org.apache.spark.sql.AnalysisException: Reference 'key / value' is ambiguous, could be: key / value, key / value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ayoub-benali I pushed into your branch; one minor thing here is that we don't handle duplicate column names;
i.e.

// for the case class
case class Test(key: Int, m: Map[String, Int])
// explode function produces the following schema
// key, key, value

Copy link
Member

@pomadchin pomadchin Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple ways to workaround it (i.e. rename all input columns first, perform explode, rename them back)
But I didn't have time to try that.

I also don't remember if there is a select by index / some elegant way to handle it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, mb aliasing would not work as well due to interpreter limitations. Can be just a feature than (: I don't think there are good ways to solve that behavior in the native spark API as well.

Copy link
Member

@pomadchin pomadchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a couple of changes, @ayoub-benali wating for your review / comments there!

check(forAll(prop[String, Int] _))
}

test("explode on maps preserving other columns") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an extra check that verifyies that explode does not drop other columns

Comment on lines +1214 to +1218
* case class X(i: Int, j: Map[Int, Int])
* case class Y(i: Int, j: (Int, Int))
*
* val f: TypedDataset[X] = ???
* val fNew: TypedDataset[Y] = f.explodeMap('j).as[Y]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a working example

@pomadchin pomadchin force-pushed the explode-map branch 2 times, most recently from c205dc6 to ecdd727 Compare November 8, 2021 19:36
@pomadchin
Copy link
Member

pomadchin commented Nov 8, 2021

@ayoub-benali don't know if you still around; the review of this PR took almost a year :D but be careful! I rebased your PR on the main branch.

@codecov
Copy link

codecov bot commented Nov 8, 2021

Codecov Report

Merging #488 (f487989) into master (4d0271c) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #488   +/-   ##
=======================================
  Coverage   95.14%   95.14%           
=======================================
  Files          65       65           
  Lines        1134     1134           
  Branches        5        6    +1     
=======================================
  Hits         1079     1079           
  Misses         55       55           
Flag Coverage Δ
2.12.15 95.14% <ø> (ø)
2.13.7 95.78% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ataset/src/main/scala/frameless/TypedDataset.scala 100.00% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4d0271c...f487989. Read the comment docs.

@ayoub-benali
Copy link
Contributor Author

Thanks for the help @pomadchin ! The changes looks good for me 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for explode map
4 participants