Feat: Support Spark 4.0.0 part1 #1830
base: main
Conversation
spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometTPCDSMicroBenchmark.scala
I ran into compilation issues locally when building for Spark 3.4. I think these can be resolved simply by moving the new … Also, some jobs fail with scalastyle errors, which can be fixed by running …
@@ -74,7 +74,7 @@ public static ColumnDescriptor convertToParquet(StructField field) {
       builder = Types.primitive(PrimitiveType.PrimitiveTypeName.INT64, repetition);
     } else if (type == DataTypes.BinaryType) {
       builder = Types.primitive(PrimitiveType.PrimitiveTypeName.BINARY, repetition);
-    } else if (type == DataTypes.StringType) {
+    } else if (type == DataTypes.StringType || type.sameType(DataTypes.StringType)) {
I'm not familiar with the sameType method. Why is it needed for strings but not for the other types?
This is to support string collation in Spark 4.0 (e.g. test("Check order by on table with collated string column")).
Without string collation, parsing goes through https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala#L89, which uses the singleton DataTypes.StringType, so type == DataTypes.StringType holds. But with string collation it goes through https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala#L94, so type == DataTypes.StringType fails, and I added type.sameType(DataTypes.StringType) to let collated strings pass. Actually, I think it should be:

    else if (
        type == DataTypes.StringType ||
        (type.sameType(DataTypes.StringType) && isSpark40Plus())
    )
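The distinction above can be sketched with a small self-contained toy model. These are not Spark's real classes; `StringType`, `sameType`, and `STRING_TYPE` here are hypothetical stand-ins used only to illustrate why an identity check against a singleton misses collated string types while a structural comparison still matches:

```java
// Toy model, NOT Spark's actual API: illustrates reference equality vs.
// a sameType-style structural check for collated string types.
public class SameTypeSketch {

    // Hypothetical stand-in for a string type carrying a collation name.
    public static final class StringType {
        public final String collation;
        public StringType(String collation) { this.collation = collation; }
        // sameType-style check: "is this also a string type?", collation ignored.
        public boolean sameType(StringType other) { return other != null; }
    }

    // Stand-in for the DataTypes.StringType singleton (default collation).
    public static final StringType STRING_TYPE = new StringType("UTF8_BINARY");

    public static void main(String[] args) {
        // A parser builds a fresh instance for a collated column, so the
        // singleton identity check fails...
        StringType collated = new StringType("UTF8_LCASE");
        System.out.println(collated == STRING_TYPE);        // false
        // ...but the structural comparison still recognizes a string type.
        System.out.println(collated.sameType(STRING_TYPE)); // true
    }
}
```

In the real patch the second condition only needs to run on Spark 4.0+, hence the proposed isSpark40Plus() guard.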
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

@@ Coverage Diff @@
##               main    #1830      +/-   ##
============================================
+ Coverage     56.12%   58.70%    +2.57%
- Complexity      976     1139      +163
============================================
  Files           119      130       +11
  Lines         11743    12795     +1052
  Branches       2251     2406      +155
============================================
+ Hits           6591     7511      +920
- Misses         4012     4060       +48
- Partials       1140     1224       +84
The Spark version will also need to be updated in …
Just some minor comments.
A few dependency alignments are needed as part of the Spark 4.0.0 release:
pom.xml
@@ -600,7 +600,7 @@ under the License.
   <!-- Use Scala 2.13 by default -->
   <scala.version>2.13.14</scala.version>
Scala was upgraded in the 4.0.0 release:

-  <scala.version>2.13.14</scala.version>
+  <scala.version>2.13.16</scala.version>
pom.xml
@@ -600,7 +600,7 @@ under the License.
   <!-- Use Scala 2.13 by default -->
   <scala.version>2.13.14</scala.version>
   <scala.binary.version>2.13</scala.binary.version>
-  <spark.version>4.0.0-preview1</spark.version>
+  <spark.version>4.0.0</spark.version>
   <spark.version.short>4.0</spark.version.short>
   <parquet.version>1.13.1</parquet.version>
Parquet was upgraded in the 4.0.0 release:

-  <parquet.version>1.13.1</parquet.version>
+  <parquet.version>1.15.2</parquet.version>
pom.xml
@@ -600,7 +600,7 @@ under the License.
   <!-- Use Scala 2.13 by default -->
   <scala.version>2.13.14</scala.version>
   <scala.binary.version>2.13</scala.binary.version>
-  <spark.version>4.0.0-preview1</spark.version>
+  <spark.version>4.0.0</spark.version>
   <spark.version.short>4.0</spark.version.short>
   <parquet.version>1.13.1</parquet.version>
   <semanticdb.version>4.9.5</semanticdb.version>
If Scala is upgraded, this probably needs to be upgraded to one of:
- at least 4.12.0: the first version to support Scala 2.13.16
- 4.12.7: the latest 4.12.x version
- 4.13.6: the latest overall

Note: this might also need to be aligned in the scala-2.13 profile!
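As a hedged sketch of what that alignment could look like (the 4.12.7 choice and the scala-2.13 profile id follow the comment above; the exact property names are assumed to match the existing pom.xml):

```xml
<!-- Default properties: semanticdb kept in step with Scala 2.13.16 -->
<properties>
  <scala.version>2.13.16</scala.version>
  <semanticdb.version>4.12.7</semanticdb.version>
</properties>

<!-- The scala-2.13 profile would need the same semanticdb version -->
<profile>
  <id>scala-2.13</id>
  <properties>
    <semanticdb.version>4.12.7</semanticdb.version>
  </properties>
</profile>
```

Maven resolves profile properties over the defaults, so leaving the profile at an older semanticdb version would silently undo the upgrade when that profile is active.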
And also, because I can't comment on the next line: slf4j was upgraded to version 2.0.16 in the 4.0.0 release.
Found some files that might have been left over from local operations; they should probably be deleted.
macOS local file
macOS local file
Looks like a temporary diff patch file (I know it sounds like "salsa sauce", but it probably shouldn't be here at all).
macOS local file
Which issue does this PR close?
Closes #.
Rationale for this change
Adding shim files for Spark 4.0.0 support.
What changes are included in this PR?
How are these changes tested?