
Shuffle read does not read all data completely? #175

Closed
xunxunmimi5577 opened this issue Jun 14, 2022 · 31 comments

Comments

@xunxunmimi5577
Contributor

When running query64 on the 10 TB TPC-DS dataset, I found a stage that wrote 1.3 TB of shuffle data, but I never found a stage that reads 1.3 TB accordingly.

@colinmjj
Collaborator

Can you share the Spark UI for stage IO?

@xunxunmimi5577
Contributor Author

xunxunmimi5577 commented Jun 14, 2022

I can't upload pictures from within my company network.

@xunxunmimi5577
Contributor Author

Compared to native Spark, shuffle write produces the same amount of data, but Firestorm reads very little data during shuffle read. The Tasks: Succeeded/Total label in the Spark UI shows only one task for Firestorm, while native Spark shows 5000 tasks executed successfully.

@colinmjj
Collaborator

How about the result? Is it the same as the result with native Spark?
We passed a result comparison based on 1 TB of data, but haven't done this with 10 TB.

@xunxunmimi5577
Contributor Author

Uploading IMG_20220615_092133.jpg…

@xunxunmimi5577
Contributor Author

Uploading IMG_20220615_092112.jpg…

@xunxunmimi5577
Contributor Author

How about the result? Is it the same as the result with native Spark? We passed a result comparison based on 1 TB of data, but haven't done this with 10 TB.

I need to confirm this, because we modified the SQL and did not collect the results.

@xunxunmimi5577
Contributor Author

Does Firestorm record the partition lengths in MapStatus?

@jerqi
Collaborator

jerqi commented Jun 21, 2022

We record the lengths; AQE needs these metrics.

@xunxunmimi5577
Contributor Author

xunxunmimi5577 commented Jun 21, 2022 via email

@jerqi
Collaborator

jerqi commented Jun 21, 2022

Could you give me more detailed information?

@xunxunmimi5577
Contributor Author

Does Firestorm for Spark 2 not support AQE? I saw that the implementation of the stop() method in RssShuffleWriter (Spark 2) seems to fill partitionLengths with a dummy value.
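A minimal sketch of the writer behavior described above, with hypothetical names (the real RssShuffleWriter is part of Firestorm and is not reproduced here): every partition gets the same placeholder length instead of the real bytes written, so any downstream size-based optimization sees meaningless statistics.

```java
import java.util.Arrays;

// Hypothetical illustration only: a Spark 2 writer that reports a
// fixed dummy length for every shuffle partition in its MapStatus,
// rather than the actual per-partition byte counts.
public class DummyLengthsSketch {
    static long[] dummyPartitionLengths(int numPartitions, long dummyValue) {
        long[] lengths = new long[numPartitions];
        Arrays.fill(lengths, dummyValue); // no relation to real data sizes
        return lengths;
    }
}
```

With such uniform placeholder values, two partitions holding wildly different amounts of data are indistinguishable to any consumer of the statistics.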

@xunxunmimi5577
Contributor Author

xunxunmimi5577 commented Jun 21, 2022

However, Spark 2 does support the configuration spark.sql.adaptive.enabled. If it is as mentioned above, then spark.sql.adaptive.enabled can't be set to true?

@jerqi
Collaborator

jerqi commented Jun 21, 2022

Does Firestorm for Spark 2 not support AQE? I saw that the implementation of the stop() method in RssShuffleWriter (Spark 2) seems to fill partitionLengths with a dummy value.

Spark 2 doesn't support AQE.

@jerqi
Collaborator

jerqi commented Jun 21, 2022

Open source Spark 2 doesn't support AQE either.

@xunxunmimi5577
Contributor Author

Does Firestorm for Spark 2 not support AQE? I saw that the implementation of the stop() method in RssShuffleWriter (Spark 2) seems to fill partitionLengths with a dummy value.

Spark 2 doesn't support AQE.

But if I set spark.sql.adaptive.enabled=true, I get wrong results.

@jerqi
Collaborator

jerqi commented Jun 21, 2022

https://spark.apache.org/releases/spark-release-3-0-0.html
AQE is a Spark 3.0 feature.

@xunxunmimi5577
Contributor Author

As far as I know, Spark 2 can also use the configuration spark.sql.adaptive.enabled.

@xunxunmimi5577
Contributor Author

Then the ExchangeCoordinator.doEstimationIfNecessary() method needs mapOutputStatistics to determine the number of post-shuffle partitions.

@colinmjj
Collaborator

@xunxunmimi5577 For RSS + Spark 2, AQE is not supported by the current implementation. This feature was introduced in Spark 3, so there is no plan to support AQE with Spark 2.

@jerqi
Collaborator

jerqi commented Jun 21, 2022

It's not an available feature in Spark 2. Some configurations may have been added first, but the implementation isn't complete.

@xunxunmimi5577
Contributor Author

If I use Spark 2 + Firestorm + spark.sql.adaptive.enabled=true, the partitionStartIndices from ExchangeCoordinator will be [0,200) instead of [0,1), [1,2), ..., so the shuffle reader only reads partition 0. This is the phenomenon described in my issue: there were supposed to be 200 tasks, but only one was executed.
I think users should at least be warned about this.
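A minimal sketch of why this can happen, under stated assumptions (the class and method names here are hypothetical, not Spark's): Spark 2's ExchangeCoordinator coalesces adjacent post-shuffle partitions until their accumulated reported size reaches a target. If the writer reports dummy (zero) lengths for every partition, the accumulated size never reaches the target, and all 200 partitions collapse into the single range [0,200).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of size-based post-shuffle partition coalescing.
public class CoalesceSketch {
    // Returns the start index of each post-shuffle partition range:
    // adjacent pre-shuffle partitions are merged until the accumulated
    // reported size would exceed targetSize.
    static int[] partitionStartIndices(long[] partitionSizes, long targetSize) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        long accumulated = 0;
        for (int i = 0; i < partitionSizes.length; i++) {
            if (accumulated + partitionSizes[i] > targetSize && accumulated > 0) {
                starts.add(i);      // close the current range, open a new one
                accumulated = 0;
            }
            accumulated += partitionSizes[i];
        }
        int[] out = new int[starts.size()];
        for (int i = 0; i < out.length; i++) out[i] = starts.get(i);
        return out;
    }
}
```

With 200 all-zero sizes this yields a single start index, i.e. one reader task covering every partition's (unreported) data, matching the one-task symptom above; with realistic nonzero sizes it yields multiple ranges.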

@jerqi
Collaborator

jerqi commented Jun 21, 2022

If I use Spark 2 + Firestorm + spark.sql.adaptive.enabled=true, the partitionStartIndices from ExchangeCoordinator will be [0,200) instead of [0,1), [1,2), ..., so the shuffle reader only reads partition 0. This is the phenomenon described in my issue: there were supposed to be 200 tasks, but only one was executed. I think users should at least be warned about this.

OK. We can check whether ADAPTIVE_EXECUTION_ENABLED is enabled in RssShuffleManager and, if it is, throw an IllegalArgumentException. Would you like to contribute this?
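The suggested fail-fast check could look something like the following sketch. It is an assumption of what the contribution might do, not Firestorm's actual code; the class name and the plain-Map conf lookup are stand-ins for the real RssShuffleManager and SparkConf.

```java
import java.util.Map;

// Hypothetical sketch: reject AQE up front in the Spark 2 shuffle
// manager instead of silently returning wrong results.
public class AdaptiveCheckSketch {
    static final String ADAPTIVE_EXECUTION_ENABLED = "spark.sql.adaptive.enabled";

    static void checkAdaptiveDisabled(Map<String, String> sparkConf) {
        String value = sparkConf.getOrDefault(ADAPTIVE_EXECUTION_ENABLED, "false");
        if ("true".equalsIgnoreCase(value)) {
            throw new IllegalArgumentException(
                ADAPTIVE_EXECUTION_ENABLED + " is not supported by the Spark 2 client");
        }
    }
}
```

Failing at shuffle-manager construction surfaces the misconfiguration immediately, before any stage runs with bogus partition statistics.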

@colinmjj
Collaborator

@xunxunmimi5577 Thanks for reporting this. I think such an unsupported case should be described in the README.

@xunxunmimi5577
Contributor Author

If I use Spark 2 + Firestorm + spark.sql.adaptive.enabled=true, the partitionStartIndices from ExchangeCoordinator will be [0,200) instead of [0,1), [1,2), ..., so the shuffle reader only reads partition 0. This is the phenomenon described in my issue: there were supposed to be 200 tasks, but only one was executed. I think users should at least be warned about this.

OK. We can check whether ADAPTIVE_EXECUTION_ENABLED is enabled in RssShuffleManager and, if it is, throw an IllegalArgumentException. Would you like to contribute this?

I would like to. Or maybe you just want to describe it in the README?

@xunxunmimi5577
Contributor Author

Moreover, is it possible to record an array of partitionLengths like in Spark 3?

@jerqi
Collaborator

jerqi commented Jun 21, 2022

If I use Spark 2 + Firestorm + spark.sql.adaptive.enabled=true, the partitionStartIndices from ExchangeCoordinator will be [0,200) instead of [0,1), [1,2), ..., so the shuffle reader only reads partition 0. This is the phenomenon described in my issue: there were supposed to be 200 tasks, but only one was executed. I think users should at least be warned about this.

OK. We can check whether ADAPTIVE_EXECUTION_ENABLED is enabled in RssShuffleManager and, if it is, throw an IllegalArgumentException. Would you like to contribute this?

I would like to. Or maybe you just want to describe it in the README?

Actually, we want to do two things: add the parameter check in the code, and expand the documentation.

@jerqi
Collaborator

jerqi commented Jun 21, 2022

Moreover, is it possible to record an array of partitionLengths like in Spark 3?

It's not an available feature in Spark 2, so we won't do it.

@xunxunmimi5577
Contributor Author

OK

@jerqi
Collaborator

jerqi commented Jun 30, 2022

Could I close this issue? Is it solved?

@xunxunmimi5577
Contributor Author

I think it's solved. Let me close this issue.
