[data] split test_all_to_all.py #53865

can-anyscale · 2025-06-16T22:43:34Z

test_all_to_all.py takes a really long time to finish (35 minutes). I'm breaking it into smaller test files so they run in parallel.

Test:

CI

Copilot

Pull Request Overview

This PR splits the monolithic test_all_to_all.py into several smaller end-to-end tests to speed up runtime by running tests in parallel. The changes include new test files for unique, repartition, random shuffle, map groups, and aggregation functionalities as well as updates in the BUILD file to reflect these changes.

- Splitting tests for aggregation, unique, repartition, random shuffle, and group-by operations.
- Updating the BUILD configuration to assign separate test targets.
- Refactoring test logic for improved parallelism and resource-specific behavior.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| python/ray/data/tests/test_unique_e2e.py | New tests for unique operation and null handling. |
| python/ray/data/tests/test_repartition_e2e.py | New tests for repartition logic with and without shuffling. |
| python/ray/data/tests/test_random_e2e.py | New tests for random shuffle operations and order verification. |
| python/ray/data/tests/test_map_groups_e2e.py | New tests for groupby map operations using GPUs and actors. |
| python/ray/data/tests/test_agg_e2e.py | New tests for aggregation logic and error validation. |
| python/ray/data/BUILD | Updated BUILD configuration to reflect split test targets. |

Copilot · 2025-06-16T22:48:51Z

python/ray/data/tests/test_random_e2e.py

+):
+    df = pd.DataFrame({"a": np.random.rand(10), "b": np.random.rand(10)})
+    ds = ray.data.from_pandas(df).randomize_block_order()
+    ds.schema().names == ["a", "b"]


[nitpick] Consider adding an explicit assert statement here (e.g., assert ds.schema().names == ["a", "b"]) to validate that the schema is as expected.

Suggested change

ds.schema().names == ["a", "b"]

assert ds.schema().names == ["a", "b"], "Schema does not match the expected structure ['a', 'b']"
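
For context on why the bare comparison matters: Python evaluates the expression and discards the result, so the test can never fail on that line. The sketch below shows this in plain Python. `FakeSchema`, `check_without_assert`, and `check_with_assert` are hypothetical stand-ins made up for illustration; they are not Ray Data APIs and are not part of this PR.

```python
class FakeSchema:
    """Hypothetical stand-in for a dataset schema; not a Ray Data class."""

    def __init__(self, names):
        self.names = names


def check_without_assert(schema):
    # The comparison is evaluated and its result thrown away,
    # so a mismatch goes unnoticed.
    schema.names == ["a", "b"]
    return "passed"


def check_with_assert(schema):
    # With assert, a mismatch raises AssertionError and fails the test.
    assert schema.names == ["a", "b"], f"unexpected column names: {schema.names}"
    return "passed"


wrong = FakeSchema(["x", "y"])

# The bare comparison reports success even though the names are wrong.
bare_result = check_without_assert(wrong)

# The asserting version catches the mismatch.
try:
    asserted_result = check_with_assert(wrong)
except AssertionError as exc:
    asserted_result = f"failed: {exc}"
```

With the wrong column names, the first check still reports success while the second one fails, which is the behavior the suggested change is after.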

Copilot · 2025-06-16T22:48:52Z

python/ray/data/tests/test_agg_e2e.py

+    xs = list(range(100))
+    ds = ray.data.from_items([{"A": (x % 3), "B": x, "C": (x % 2)} for x in xs])
+
+    def check_init(k):


[nitpick] When using 'len(keys)' inside check_init, consider explicitly checking the type of 'keys' (e.g. whether it is a list or a string) to make the intent clearer and avoid potential confusion.

Suggested change

def check_init(k):

    def check_init(k):
        if not isinstance(keys, (list, str)):
            raise TypeError(f"'keys' must be a list or a string, but got {type(keys).__name__}")
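
As a rough illustration of the concern behind this comment: `len()` on a string counts characters, while `len()` on a list counts elements, so code that accepts either form of key can silently miscount. Note that the suggestion refers to `keys` while the visible signature takes `k`; the sketch below uses its own names. `normalize_keys` is a hypothetical helper written for this example, not something from the PR or from Ray Data.

```python
from typing import List, Union


def normalize_keys(keys: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: return grouping keys as a list of column names."""
    if isinstance(keys, str):
        # A single column name becomes a one-element list.
        return [keys]
    if isinstance(keys, list):
        # Copy so the caller's list is not shared.
        return list(keys)
    raise TypeError(
        f"'keys' must be a list or a string, but got {type(keys).__name__}"
    )


# len() on the raw value is ambiguous: "AC" has length 2, but it is one key.
raw_length = len("AC")
normalized_length = len(normalize_keys("AC"))
```

Normalizing first means any later `len(...)` call counts keys rather than characters.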

bveeramani

ty

Signed-off-by: can <can@anyscale.com>

The test_all_to_all.py is taking a really long time to finish ([35 minutes](https://buildkite.com/ray-project/postmerge/builds/10883/steps/canvas?jid=0197797e-9c25-4030-a1a6-ff89dba44f8e#0197797e-9c25-4030-a1a6-ff89dba44f8e/176-1023)). I'm breaking this into smaller chunks so they run in parallel.

Test: CI

Signed-off-by: can <can@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>

The test_all_to_all.py is taking a really long time to finish ([35 minutes](https://buildkite.com/ray-project/postmerge/builds/10883/steps/canvas?jid=0197797e-9c25-4030-a1a6-ff89dba44f8e#0197797e-9c25-4030-a1a6-ff89dba44f8e/176-1023)). I'm breaking this into smaller chunks so they run in parallel. Test: CI Signed-off-by: can <can@anyscale.com> Signed-off-by: Scott Lee <scott.lee@rebellions.ai>


can-anyscale force-pushed the can-dsplit branch from 700d539 to 85527c8 Compare June 16, 2025 22:45

can-anyscale added the go label Jun 16, 2025

can-anyscale marked this pull request as ready for review June 16, 2025 22:48

Copilot AI review requested due to automatic review settings June 16, 2025 22:48

can-anyscale requested a review from a team as a code owner June 16, 2025 22:48

can-anyscale mentioned this pull request Jun 16, 2025

[data] split test_all_to_all.py #53863

Closed

Copilot AI reviewed Jun 16, 2025

bveeramani approved these changes Jun 16, 2025

can-anyscale force-pushed the can-dsplit branch from 85527c8 to 6e33a22 Compare June 16, 2025 23:07

can-anyscale enabled auto-merge (squash) June 16, 2025 23:10

can-anyscale force-pushed the can-dsplit branch from 6e33a22 to 3b98457 Compare June 17, 2025 17:35

github-actions bot disabled auto-merge June 17, 2025 17:35

can-anyscale force-pushed the can-dsplit branch from 3b98457 to 9382518 Compare June 17, 2025 19:54

can-anyscale changed the base branch from master to releases/2.47.1 June 17, 2025 19:56

can-anyscale requested review from a team, edoakes, zcin, GeneDer, akshay-anyscale, pcmoritz, kevin85421, aslonnie, richardliaw and thomasdesr as code owners June 17, 2025 19:56

can-anyscale removed request for a team, pcmoritz, thomasdesr, raulchen, richardliaw, GeneDer, edoakes, zcin, kevin85421, aslonnie and akshay-anyscale June 17, 2025 19:59

can-anyscale enabled auto-merge (squash) June 17, 2025 19:59

[data] split test_all_to_all.py

3f964d7

Signed-off-by: can <can@anyscale.com>

can-anyscale force-pushed the can-dsplit branch from 4577c66 to 3f964d7 Compare June 17, 2025 20:25

github-actions bot disabled auto-merge June 17, 2025 20:25

Merge branch 'master' into can-dsplit

4b8fdb5

can-anyscale enabled auto-merge (squash) June 17, 2025 22:17

github-actions bot disabled auto-merge June 17, 2025 22:17

can-anyscale merged commit efd1c15 into master Jun 17, 2025
6 checks passed

can-anyscale deleted the can-dsplit branch June 17, 2025 23:23
