Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure collection encoders to resolve encoding issues in the REPL #195

Merged
merged 1 commit into from
Oct 15, 2017

Conversation

iravid
Copy link
Contributor

@iravid iravid commented Oct 9, 2017

@imarios - I added tests to try and exercise the scenario you described in #193, but they seem to pass locally. Any chance that this is a REPL thing?

@codecov-io
Copy link

codecov-io commented Oct 9, 2017

Codecov Report

Merging #195 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #195      +/-   ##
==========================================
- Coverage    95.7%   95.69%   -0.01%     
==========================================
  Files          40       39       -1     
  Lines         814      790      -24     
  Branches       10       10              
==========================================
- Hits          779      756      -23     
+ Misses         35       34       -1
Impacted Files Coverage Δ
...ataset/src/main/scala/frameless/TypedEncoder.scala 100% <100%> (ø) ⬆️
...c/main/scala/frameless/TypedDatasetForwarded.scala 54.16% <0%> (+4.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f81be1d...b29d884. Read the comment docs.

@imarios
Copy link
Contributor

imarios commented Oct 10, 2017

Hmm ... strange indeed. Where you able to reproduce the bug in the REPL?

@iravid
Copy link
Contributor Author

iravid commented Oct 10, 2017

Yeah. Pretty weird. The offending code generation is at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection. The argValue is annotated as Object[] instead of String[].

I'll continue looking later today

@imarios
Copy link
Contributor

imarios commented Oct 10, 2017

@iravid thank you for looking into this! I really curious to see what is causing this ...

@imarios
Copy link
Contributor

imarios commented Oct 13, 2017

@iravid do you want me to wait until you investigate more on this? If this is an independent PR I can treat it separately to #193

@iravid
Copy link
Contributor Author

iravid commented Oct 13, 2017

@imarios Hey - this PR shouldn't be merged yet; I will update it soon(TM) with some restructuring of the typed encoders for collections.

If by "wait" you meant the 0.4 release, it'd be good if we could wait a few more days to see if we can come up with a solution.

@iravid iravid changed the title Add tests to exercise issue #193 Restructure collection encoders to resolve encoding issues in the REPL Oct 14, 2017
@iravid
Copy link
Contributor Author

iravid commented Oct 14, 2017

@imarios This now resolves the issue. Our encoders are now more in line with how Spark serializes/deserializes collections.

@imarios
Copy link
Contributor

imarios commented Oct 14, 2017

@iravid overall looks pretty good!

I tried this and works super smooth:

case class X(a: Array[Array[Array[Long]]], b: Map[Array[String], Array[Map[Int,Int]]])

val t = TypedDataset.create(Seq(X(Array(Array(Array(12323L))), Map(Array("a","aasdas")-
 >Array(Map(10->10), Map(1->10, 2->20))))))

t.collect().run.head.b.keys.head(1)

@iravid
Copy link
Contributor Author

iravid commented Oct 15, 2017

Awesome :-)

@iravid
Copy link
Contributor Author

iravid commented Oct 15, 2017

I’ll work on the coverage shortly

@imarios
Copy link
Contributor

imarios commented Oct 15, 2017

@iravid great! can you also take a look at the 2 comments I made? Thank you very much for taking the time to re-write this!!

@iravid
Copy link
Contributor Author

iravid commented Oct 15, 2017

@imarios did you submit the review? I don't see any comments


def toCatalyst(path: Expression): Expression =
T.jvmRepr match {
case IntegerType | LongType | DoubleType | FloatType | ShortType | BooleanType =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CharType does not exist? right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, no such type. I tested again if Array[Char] works, still no luck.

@@ -196,6 +196,116 @@ object TypedEncoder {
)
}

implicit def arrayEncoder[T: ClassTag](
implicit
T: TypedEncoder[T]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reuse of T looks kind of strange to me ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, habits from cats :-) I'll change it to underlying

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's a cute idea :) ... but it does look kind of funky

@imarios
Copy link
Contributor

imarios commented Oct 15, 2017

@iravid yup, didn't click submit :/

@imarios
Copy link
Contributor

imarios commented Oct 15, 2017

Looks great. if you have time to improve coverage it will be awesome. This will be the last PR before 0.4. I think @adelbertc is waiting for us to give him the go to publish master as 0.4.

@iravid iravid force-pushed the issue-193 branch 3 times, most recently from cf2f8fb to 1b2aa37 Compare October 15, 2017 04:45
@iravid
Copy link
Contributor Author

iravid commented Oct 15, 2017

Coverage looks good now @imarios

@iravid
Copy link
Contributor Author

iravid commented Oct 15, 2017

@GrafBlutwurst btw - this also solves the issue of encoding Array[Byte] as BinaryType.

@imarios
Copy link
Contributor

imarios commented Oct 15, 2017

LGTM!

@imarios imarios merged commit 17078b6 into typelevel:master Oct 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants