Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flatten operator for Optional columns. #307

Merged
merged 4 commits into from Jul 1, 2018

Conversation

imarios
Copy link
Contributor

@imarios imarios commented Jun 19, 2018

val ds: TypedDataset[(Int,Option[Int])] = TypedDataset.create(Seq((1,Option(1))))
ds.flatten('_2): TypedDataset[(Int,Int)]

@codecov-io
Copy link

codecov-io commented Jun 19, 2018

Codecov Report

Merging #307 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #307   +/-   ##
=======================================
  Coverage   94.79%   94.79%           
=======================================
  Files          52       52           
  Lines         961      961           
  Branches        9        9           
=======================================
  Hits          911      911           
  Misses         50       50
Impacted Files Coverage Δ
...ataset/src/main/scala/frameless/TypedDataset.scala 100% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c763176...a0e8c2e. Read the comment docs.

@imarios imarios mentioned this pull request Jun 24, 2018
Copy link
Contributor

@OlivierBlanvillain OlivierBlanvillain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really cool! I'm I right saying that this is not in vanilla? A few comments:

  • If not it should probably be better documented, at least with an example.

  • The name make it sound more general that what it is. .flatten should be .flatMap(identity).

@imarios
Copy link
Contributor Author

imarios commented Jun 27, 2018

@OlivierBlanvillain Yea, I think in Vanilla you can use flatMap(_.a) on a Dataset, with the problem of having to serialize all the columns. Another way to do that with DataFrames is filter($"a".isNotNull), which is essentially the dynamically typed version of it. Both examples worth documenting, so let me do that.

Do you have any alternative naming suggestions?

@OlivierBlanvillain
Copy link
Contributor

I seen, then maybe

  • flatMapOption
  • flatMapNotNull
  • flattenOption
  • flattenNotNull

?

@imarios
Copy link
Contributor Author

imarios commented Jun 29, 2018

@OlivierBlanvillain I think flattenOption feels more appropriate.

@OlivierBlanvillain
Copy link
Contributor

LGTM! Could you maybe add a minimal example to the scaladoc of flattenOption?

For discoverability, what do you think about adding a "new APIs" and "missing APIs" sections to the readme? This way someone familiar with Spark could pickup frameless in no time by looking at this diff.

@imarios
Copy link
Contributor Author

imarios commented Jul 1, 2018

@OlivierBlanvillain done! If there are no more suggestions, let me squash and merge.

@OlivierBlanvillain
Copy link
Contributor

LGTM!

@imarios imarios merged commit 3ad68b3 into typelevel:master Jul 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants