Shuffle read does not read all data completely? #175
Comments
Can you share the Spark UI view of the stage I/O? |
I can't upload pictures at my company. |
Compared to native Spark, Shuffle Write shows the same amount of data, but Firestorm reads very little data during Shuffle Read. The label |
How about the result? Is it the same as the result with native Spark? |
I need to confirm this, because we modified the SQL and did not collect the results. |
Does Firestorm print partition lengths to MapStatus? |
We record the lengths; AQE needs the metrics. |
I find that when I use AQE, I get the wrong statistics.
|
Could you give me more detailed information? |
Doesn't Firestorm for Spark 2 support AQE? I saw that the stop() method in RssShuffleWriter (Spark 2) seems to fill partitionLengths with dummy values. |
However, Spark 2 does support this configuration |
Spark 2 doesn't support AQE. |
Open-source Spark 2 doesn't support AQE either. |
But if I set |
https://spark.apache.org/releases/spark-release-3-0-0.html |
As far as I know, Spark 2 can also use configuration |
Then the ExchangeCoordinator.doEstimationIfNecessary() method needs mapOutputStatistics to determine the number of post-shuffle partitions. |
@xunxunmimi5577 For RSS + Spark 2, AQE is not supported with the current implementation. This feature was announced in Spark 3, so there is no plan to support AQE with Spark 2. |
It's not an available feature in Spark 2. Maybe some configurations were added first, but the implementation isn't complete. |
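To illustrate why the recorded partition lengths matter here, below is a minimal sketch of a size-based estimate of post-shuffle partitions. It is not Spark's ExchangeCoordinator code: the class `PostShuffleEstimate`, both method names, the target size of 100, and the placeholder length of 1 are all assumptions made up for the example. It sums the per-map-task lengths into bytes per partition, then greedily packs neighbouring partitions until a target size would be exceeded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and numbers are not taken from Spark or Firestorm.
class PostShuffleEstimate {

    // Sum the lengths reported by every map task into total bytes per reduce partition.
    static long[] bytesByPartition(long[][] lengthsPerMapTask, int numPartitions) {
        long[] total = new long[numPartitions];
        for (long[] lengths : lengthsPerMapTask) {
            for (int p = 0; p < numPartitions; p++) {
                total[p] += lengths[p];
            }
        }
        return total;
    }

    // Greedily pack contiguous partitions; returns the start index of each packed group.
    static List<Integer> startIndices(long[] bytes, long targetSize) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        long current = 0L;
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0 && current + bytes[i] > targetSize) {
                starts.add(i);      // adding partition i would overflow: open a new group
                current = bytes[i];
            } else {
                current += bytes[i];
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Two map tasks, four partitions, with real lengths.
        long[][] real = {{40, 70, 10, 90}, {30, 20, 10, 60}};
        // The same shape, but every length replaced by a placeholder of 1.
        long[][] dummy = {{1, 1, 1, 1}, {1, 1, 1, 1}};

        System.out.println(startIndices(bytesByPartition(real, 4), 100));
        System.out.println(startIndices(bytesByPartition(dummy, 4), 100));
    }
}
```

With real lengths the estimate keeps several groups, while with placeholder lengths every partition looks tiny and everything is packed into a single group. That is one way wrong statistics could arise if the writer does not report real lengths.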
If I use spark2 + firestorm + |
OK, we can check whether ADAPTIVE_EXECUTION_ENABLED is enabled in RssShuffleManager. If it is, we can throw an illegal argument exception. Would you like to contribute it? |
@xunxunmimi5577 Thanks for reporting this. I think such unsupported cases should be described in the README. |
I would like to, or maybe you just want to describe it in the README? |
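A rough sketch of the check proposed above, for discussion only. It is not Firestorm's actual code: `AqeGuard` and `checkAqeDisabled` are hypothetical names, and a plain `Map` stands in for `SparkConf` so the example runs without Spark on the classpath. The key `spark.sql.adaptive.enabled` is the Spark SQL setting behind ADAPTIVE_EXECUTION_ENABLED; treating a missing key as `false` is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a check inside RssShuffleManager; not the real implementation.
class AqeGuard {
    // Spark SQL configuration key behind ADAPTIVE_EXECUTION_ENABLED.
    static final String ADAPTIVE_KEY = "spark.sql.adaptive.enabled";

    // `conf` is a plain map used in place of SparkConf (assumption for this sketch).
    static void checkAqeDisabled(Map<String, String> conf) {
        // A missing key is treated as "false" here.
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(ADAPTIVE_KEY, "false"));
        if (enabled) {
            throw new IllegalArgumentException(
                "AQE is not supported with RSS on Spark 2; set " + ADAPTIVE_KEY + "=false");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        checkAqeDisabled(conf);            // passes: the key is not set
        conf.put(ADAPTIVE_KEY, "true");
        try {
            checkAqeDisabled(conf);        // rejected: AQE is switched on
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast when the shuffle manager is created would surface the unsupported combination as a clear error instead of silently wrong statistics.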
Moreover, is it possible to record an array of partitionLengths, as in Spark 3? |
Actually, we want to do two things: add the parameter check in the code, and expand the documentation. |
It's not an available feature in Spark 2. We won't do it. |
OK |
Could I close this issue? Is it solved? |
I think it's solved. Let me close this issue. |
When running query64 of TPC-DS on 10T of data, I found a stage that shuffle-wrote 1.3T of data, but I never found a stage that reads the corresponding 1.3T of data.