-
Notifications
You must be signed in to change notification settings - Fork 6.6k
[data] split test_all_to_all.py #53865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR splits the monolithic test_all_to_all.py into several smaller end-to-end tests to speed up runtime by running tests in parallel. The changes include new test files for unique, repartition, random shuffle, map groups, and aggregation functionalities as well as updates in the BUILD file to reflect these changes.
- Splitting tests for aggregation, unique, repartition, random shuffle, and group-by operations.
- Updating the BUILD configuration to assign separate test targets.
- Refactoring test logic for improved parallelism and resource-specific behavior.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
python/ray/data/tests/test_unique_e2e.py | New tests for unique operation and null handling. |
python/ray/data/tests/test_repartition_e2e.py | New tests for repartition logic with and without shuffling. |
python/ray/data/tests/test_random_e2e.py | New tests for random shuffle operations and order verification. |
python/ray/data/tests/test_map_groups_e2e.py | New tests for groupby map operations using GPUs and actors. |
python/ray/data/tests/test_agg_e2e.py | New tests for aggregation logic and error validation. |
python/ray/data/BUILD | Updated BUILD configuration to reflect split test targets. |
): | ||
df = pd.DataFrame({"a": np.random.rand(10), "b": np.random.rand(10)}) | ||
ds = ray.data.from_pandas(df).randomize_block_order() | ||
ds.schema().names == ["a", "b"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider adding an explicit assert statement here (e.g., assert ds.schema().names == ["a", "b"]) to validate that the schema is as expected.
ds.schema().names == ["a", "b"] | |
assert ds.schema().names == ["a", "b"], "Schema does not match the expected structure ['a', 'b']" |
Copilot uses AI. Check for mistakes.
xs = list(range(100)) | ||
ds = ray.data.from_items([{"A": (x % 3), "B": x, "C": (x % 2)} for x in xs]) | ||
|
||
def check_init(k): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] When using 'len(keys)' inside check_init, consider explicitly checking the type of 'keys' (e.g. whether it is a list or a string) to make the intent clearer and avoid potential confusion.
def check_init(k): | |
def check_init(k): | |
if not isinstance(keys, (list, str)): | |
raise TypeError(f"'keys' must be a list or a string, but got {type(keys).__name__}") |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ty
Signed-off-by: can <can@anyscale.com>
The test_all_to_all.py is taking a really long time to finish ([35 minutes](https://buildkite.com/ray-project/postmerge/builds/10883/steps/canvas?jid=0197797e-9c25-4030-a1a6-ff89dba44f8e#0197797e-9c25-4030-a1a6-ff89dba44f8e/176-1023)). I'm breaking this into smaller chunks so they run in parallel. Test: CI Signed-off-by: can <can@anyscale.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
The test_all_to_all.py is taking a really long time to finish ([35 minutes](https://buildkite.com/ray-project/postmerge/builds/10883/steps/canvas?jid=0197797e-9c25-4030-a1a6-ff89dba44f8e#0197797e-9c25-4030-a1a6-ff89dba44f8e/176-1023)). I'm breaking this into smaller chunks so they run in parallel. Test: CI Signed-off-by: can <can@anyscale.com> Signed-off-by: Scott Lee <scott.lee@rebellions.ai>
The test_all_to_all.py is taking a really long time to finish ([35 minutes](https://buildkite.com/ray-project/postmerge/builds/10883/steps/canvas?jid=0197797e-9c25-4030-a1a6-ff89dba44f8e#0197797e-9c25-4030-a1a6-ff89dba44f8e/176-1023)). I'm breaking this into smaller chunks so they run in parallel. Test: CI Signed-off-by: can <can@anyscale.com>
The test_all_to_all.py is taking a really long time to finish ([35 minutes](https://buildkite.com/ray-project/postmerge/builds/10883/steps/canvas?jid=0197797e-9c25-4030-a1a6-ff89dba44f8e#0197797e-9c25-4030-a1a6-ff89dba44f8e/176-1023)). I'm breaking this into smaller chunks so they run in parallel. Test: CI Signed-off-by: can <can@anyscale.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
The test_all_to_all.py is taking a really long time to finish (35 minutes). I'm breaking this into smaller chunks so they run in parallel.
Test:
CI