Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dotted syntax (via lambda) for TypedColumns #449

Merged
merged 9 commits into from
Sep 6, 2021
Merged

Dotted syntax (via lambda) for TypedColumns #449

merged 9 commits into from
Sep 6, 2021

Conversation

timvw
Copy link
Contributor

@timvw timvw commented Oct 9, 2020

Allow the user to created TypedColumns via lambda syntax, eg: ds.col(_.a) or ds.col(_.x.y).

@imarios
Copy link
Contributor

imarios commented Oct 9, 2020

This is really cool :). I am not an expert in macros so please bare with me as I try to do my best to peer review your PR.

@imarios
Copy link
Contributor

imarios commented Oct 9, 2020

Would it be easy to add few more tests.? I know shapeless has some nice primitives to test negative compilations cases. Let me find that command if you can’t spot it easily.

@codecov-io
Copy link

codecov-io commented Oct 9, 2020

Codecov Report

Merging #449 (044e7bc) into master (5ab4268) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #449   +/-   ##
=======================================
  Coverage   96.28%   96.28%           
=======================================
  Files          61       62    +1     
  Lines        1051     1051           
  Branches        5        5           
=======================================
  Hits         1012     1012           
  Misses         39       39           
Flag Coverage Δ
2.12.14 96.28% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
dataset/src/main/scala/frameless/TypedColumn.scala 100.00% <ø> (ø)
...ataset/src/main/scala/frameless/TypedDataset.scala 100.00% <ø> (ø)
dataset/src/main/scala/frameless/With.scala 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ab4268...044e7bc. Read the comment docs.

@timvw
Copy link
Contributor Author

timvw commented Oct 9, 2020

Sure.

Imho there are two things here which need verification:

  1. Is the column type correctly inferred? -> This is checked by the compiler (return type of the lambda)

  2. Is the code (macro) able to correctly calculate the column name?

Negative case: when the user provides a lambda which does not return a field, a compilation error should be raised
Eg: x => dosomethingfunky(x.id)

@timvw
Copy link
Contributor Author

timvw commented Oct 9, 2020

Added some extra typeChecks to verify that invalid/unsupported lambda's are allowd.

Currently the compiler does not throw an Exception on the invalid expression "ds.col(_.a.toInt)" (and not immediately clear how I can fix that)

@jeremyrsmith
Copy link
Contributor

See also https://github.com/typelevel/frameless/pull/110/files, in case there's anything useful in there @timvw

@imarios
Copy link
Contributor

imarios commented Oct 15, 2020

I want to spend more time on this PR this weekend. Apologies for for the delay @timvw! From a very quick glance, it looks really cool. I want to play a bit with it locally and see how it behaves in IntelliJ to see if there is anything I can do/say to help.

@ayoub-benali
Copy link
Contributor

looking forward for this syntax!
just a general question: @timvw would this syntax allow to work around the optional columns limitation ?
See: #168

@timvw
Copy link
Contributor Author

timvw commented Oct 16, 2020

@ayoub-benali Yes. This syntax will simply translate a lambda (_.x.y.z) to a spark "column" expression ("x.y.z")

@timvw
Copy link
Contributor Author

timvw commented Oct 16, 2020

The only thing I am still considering is to introduce a case class ParameterPath[T](path: String) type which is returned by the macro (instead of a TypedColumn). This way the creation of a TypedColumn (or anything) can be moved out of the macro..

@ayoub-benali
Copy link
Contributor

@timvw in your example (_.x.y.z) what would happen if "y" is an optional field and has value None (null) ?

@timvw
Copy link
Contributor Author

timvw commented Oct 28, 2020

Irrespective of the actual value (None,Some) there is no support for Option types (only primitives).

Added a test to verify what happens when you encounter an Option/List/Array/Seq in your lambda path..

Conclusion: It is not possible to further navigate. If you try to do so, the compiler will complain (A good thing imho).

--
If we were to support Option/List/Array/Seq/Map some additional syntax would be required...
For Option, we would have to make clear that all later part types T are actually Option[T]
Eg: In this test, ds.col(.g).opt(.h) => Option[Boolean] (even though h is defined as Boolean).

@imarios
Copy link
Contributor

imarios commented Dec 17, 2020

@timvw Just wanted to let you know that I have been warming up with Frameless in the past weeks. Your PR is next on my list. Sorry for this delay.

@imarios
Copy link
Contributor

imarios commented Jan 6, 2021

Hi @timvw, can you try to update your fork with the current master? I can try to help you rebase and test few things using the latest changes. Thank you!

Copy link
Collaborator

@cchantep cchantep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, looks interesting.

dataset/src/main/scala/frameless/TypedColumn.scala Outdated Show resolved Hide resolved
dataset/src/main/scala/frameless/TypedColumn.scala Outdated Show resolved Hide resolved
dataset/src/main/scala/frameless/TypedColumn.scala Outdated Show resolved Hide resolved
dataset/src/main/scala/frameless/TypedColumn.scala Outdated Show resolved Hide resolved
dataset/src/main/scala/frameless/TypedColumn.scala Outdated Show resolved Hide resolved
@cchantep cchantep changed the title Feature/lambdas Dotted syntax (via lambda) for TypedColumns Aug 25, 2021
Copy link
Collaborator

@cchantep cchantep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now

@cchantep cchantep merged commit 59caada into typelevel:master Sep 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants