Insights: ray-project/ray
Overview
2 Releases published by 2 people
-
ray-2.44.0 Ray-2.44.0
published
Mar 21, 2025 -
ray-2.44.1 Ray-2.44.1
published
Mar 27, 2025
90 Pull requests merged by 33 people
-
[CI] Add ability to configure release tests with a matrix
#51721 merged
Mar 27, 2025 -
[docker] Update latest Docker dependencies for 2.44.1 release
#51775 merged
Mar 27, 2025 -
[ci] Allow dependency failure for kuberay ci kickoff
#51760 merged
Mar 27, 2025 -
[core] disable asan tests on premerge
#51772 merged
Mar 27, 2025 -
[Chore][Autoscaler] Clarify disable_launch_config_check comment in StandardAutoscaler
#51751 merged
Mar 27, 2025 -
[data] refine backpressure info on progress bar
#51697 merged
Mar 27, 2025 -
[Data] Revisiting `make_async_gen` to address issues with concurrency control for sequences of varying lengths
#51661 merged
Mar 27, 2025 -
[core][state] Fix false alarm in `get_logs` when a server chunk splits into multiple client chunks
#51750 merged
Mar 27, 2025 -
Revert "[doc] Add hpu resource description in ray train related docs"
#51754 merged
Mar 27, 2025 -
Make @edoakes the czar of `_common/` dir for now
#51753 merged
Mar 27, 2025 -
[Feat][Core/Dashboard] Convert ReportHead to subprocess module
#51733 merged
Mar 27, 2025 -
[core] Fix windows build with no cython -Wno-shadow
#51730 merged
Mar 27, 2025 -
[data] add getdaft to compiled versions
#51723 merged
Mar 27, 2025 -
[serve] Remove RAY_SERVE_EAGERLY_START_REPLACEMENT_REPLICAS flag
#51722 merged
Mar 26, 2025 -
Revert "[serve] Log rejected requests at router side (#51346)"
#51698 merged
Mar 26, 2025 -
Run basic Python 3.13 tests
#51688 merged
Mar 26, 2025 -
[ci] add misc and untested files in skipping
#51715 merged
Mar 26, 2025 -
[doc] Add hpu resource description in ray train related docs
#47241 merged
Mar 26, 2025 -
[core] `test_job_isolation` passes even when exceptions are thrown
#51694 merged
Mar 26, 2025 -
[core][kuberay] Trigger kuberay release pipeline from rayci
#51539 merged
Mar 26, 2025 -
[data] Integrate Ray Dataset with Daft Dataframe
#51531 merged
Mar 26, 2025 -
[Feat][Core/Dashboard] Convert StateHead to subprocess module
#51676 merged
Mar 26, 2025 -
[core] Fix all gcs variable shadowing
#51704 merged
Mar 26, 2025 -
[Autoscaler] Update CoordinatorNodeProvider example
#51293 merged
Mar 26, 2025 -
Fix Ray Client when 'uv run' runtime environment is used
#51683 merged
Mar 26, 2025 -
[core] Fix all raylet variable shadowing
#51689 merged
Mar 26, 2025 -
[docs] Update usage_lib.py guide link
#51681 merged
Mar 26, 2025 -
[core][autoscaler][v2] do not removing nodes for upcoming resource requests
#51570 merged
Mar 26, 2025 -
[CI] Update LLM dependencies list and make the uv compile test job hard fail
#51693 merged
Mar 26, 2025 -
[core] Fix incorrect comment
#51575 merged
Mar 25, 2025 -
[tests] Reassign dashboard tests to core team
#51691 merged
Mar 25, 2025 -
[core] Introduce ConcurrentFlatMap and use for InMemoryStoreClient
#50375 merged
Mar 25, 2025 -
[Serve.llm] fix loading model from remote storage and add docs
#51617 merged
Mar 25, 2025 -
[core] Record dashboard metrics with oneshot
#51627 merged
Mar 25, 2025 -
[Feat][Core/Dashboard] Convert JobHead to subprocess module
#51553 merged
Mar 25, 2025 -
[core] Avoid resize in GetAndPinArgsForExecutor
#51543 merged
Mar 25, 2025 -
[release] Fix perf metrics compare
#51655 merged
Mar 25, 2025 -
[Data][LLM] trust remote code
#51680 merged
Mar 25, 2025 -
[core] Fix all variable shadowing for core worker
#51672 merged
Mar 25, 2025 -
[core] Threaded actors get stuck forever if they receive two exit signals
#51582 merged
Mar 25, 2025 -
[core] [easy] Mark cgroup tests exclusive
#51654 merged
Mar 25, 2025 -
[Data] Support async callable classes in flat_map()
#51180 merged
Mar 25, 2025 -
[Data] fix RandomAccessDataset.multiget returning unexpected values for missing keys
#44769 merged
Mar 25, 2025 -
[data] fix lance ut failed
#51421 merged
Mar 25, 2025 -
[core] Correct the wording in the OnNodeDead logs to avoid confusion
#51668 merged
Mar 25, 2025 -
[ci] add an always tag for cond testing
#51662 merged
Mar 25, 2025 -
[deps] Use UV to compile LLM dependencies
#51323 merged
Mar 25, 2025 -
[data] Update repartition on target_num_rows_per_block documentation
#51433 merged
Mar 25, 2025 -
[data] Fix Databricks host URL handling in Ray Data
#49926 merged
Mar 25, 2025 -
[serve] Remove RAY_SERVE_ENABLE_QUEUE_LENGTH_CACHE flag
#51649 merged
Mar 24, 2025 -
[ray.llm] Refactor model download utilities
#51604 merged
Mar 24, 2025 -
refactor replica _handle_errors_and_metrics
#51644 merged
Mar 24, 2025 -
[ci] Enable Cgroup support in CI for core
#51454 merged
Mar 24, 2025 -
[Test][KubeRay] Add doctest for RayCluster Quickstart doc
#51249 merged
Mar 24, 2025 -
Skip multiplex metrics and proxy status code is error tests on windows
#51645 merged
Mar 24, 2025 -
[serve] update deployment status docs
#51610 merged
Mar 24, 2025 -
[RLlib] Make min/max env steps per evaluation sample call configurable for duration="auto".
#51637 merged
Mar 24, 2025 -
Fix Ray Train release test
#51624 merged
Mar 24, 2025 -
[serve] don't stop retrying replicas when a deployment is scaling back up from zero
#51600 merged
Mar 24, 2025 -
[core] Fix test_threaded_actor flaky on mac
#51602 merged
Mar 24, 2025 -
Fix syntax errors in Ray Tune example pbt_ppo_example.ipynb
#51626 merged
Mar 23, 2025 -
[core] [easy] [noop] Add comments on client call
#51614 merged
Mar 22, 2025 -
[Doc][KubeRay] Add a doc to explain why some worker Pods are not ready in RayService
#51095 merged
Mar 22, 2025 -
[Feat][Core/Dashboard] Convert EventHead to subprocess module
#51587 merged
Mar 22, 2025 -
[Core] Cover cpplint for /src/ray/core_worker (excluding transport)
#51557 merged
Mar 22, 2025 -
[Docs][Core] Update system logs doc for dashboard subprocess module
#50984 merged
Mar 22, 2025 -
[Feat][Core/Dashboard] Convert DataHead to subprocess module
#51507 merged
Mar 22, 2025 -
Add TorchDataLoader to Train Benchmark
#51456 merged
Mar 22, 2025 -
[core] [easy] [no-op] Fix rotation comment
#51606 merged
Mar 21, 2025 -
[release-automation] Add option to add build tag when uploading wheels to pypi
#51517 merged
Mar 21, 2025 -
[serve][tests] Add a timeout for resnet app image request
#51569 merged
Mar 21, 2025 -
[serve][test] Change the response_time_s to response_time_ms
#51566 merged
Mar 21, 2025 -
Fix broken doctest build
#51594 merged
Mar 21, 2025 -
[Serve.llm] Add gen config related doc
#51572 merged
Mar 21, 2025 -
[CI] Upgrade pytest-aiohttp to 1.1.0
#51556 merged
Mar 21, 2025 -
[Data] Removing usages of the deprecated `use_legacy_format` param
#51563 merged
Mar 21, 2025 -
[llm] ray.llm support custom accelerators
#51359 merged
Mar 21, 2025 -
[Doc] Clarify the relation between 'uv run' and 'uv pip' support
#51599 merged
Mar 21, 2025 -
[Data] Adding more ops to `BlockColumnAccessor`
#51571 merged
Mar 21, 2025 -
[ray.data.llm] Propose log_input_column_names()
#51441 merged
Mar 21, 2025 -
Move experimental and OOM tests to core builds
#51525 merged
Mar 21, 2025 -
Add perf metrics for 2.44.0
#51427 merged
Mar 21, 2025 -
[core] Make testable stream redirection
#51191 merged
Mar 21, 2025 -
[Feat][Core/Dashboard] Remove ReportEventService and replace with HTTP API
#51555 merged
Mar 21, 2025 -
[docker] Update latest Docker dependencies for 2.44.0 release
#51581 merged
Mar 21, 2025 -
[docker] Update latest Docker dependencies for 2.44.0 release
#51580 merged
Mar 21, 2025 -
[Feat][Core/Dashboard] Redirect child process stdout and stderr to dashboard_[module_name].err
#51545 merged
Mar 21, 2025 -
Give better error message if 'uv run' is combined with incompatible plugins
#51565 merged
Mar 21, 2025
70 Pull requests opened by 35 people
-
[WIP] [core] Rotate log monitor
#51573 opened
Mar 21, 2025 -
Bump gradio from 3.50.2 to 5.22.0 in /python/requirements
#51577 opened
Mar 21, 2025 -
Bump transformers from 4.30.1 to 4.48.0 in /doc/source/templates/testing/docker/03_serving_stable_diffusion
#51597 opened
Mar 21, 2025 -
Bump vllm from 0.7.2 to 0.8.1 in /python
#51603 opened
Mar 21, 2025 -
[docs] Feature: adopt llms.txt convention
#51605 opened
Mar 21, 2025 -
Bump pytorch-lightning from 1.8.6 to 2.4.0 in /python
#51607 opened
Mar 21, 2025 -
Bump flask-cors from 4.0.0 to 4.0.2 in /python
#51609 opened
Mar 21, 2025 -
[serve] reorg
#51611 opened
Mar 21, 2025 -
Bump torch from 2.0.1 to 2.4.0 in /doc/source/templates/testing/docker/03_serving_stable_diffusion
#51612 opened
Mar 21, 2025 -
Bump mlflow from 2.9.2 to 2.20.3 in /python/requirements/ml
#51613 opened
Mar 22, 2025 -
Bump gunicorn from 20.1.0 to 23.0.0 in /python
#51615 opened
Mar 22, 2025 -
[core] grpc stub manager
#51616 opened
Mar 22, 2025 -
correct the error msg for invalid env registering
#51621 opened
Mar 22, 2025 -
[core] Large objects release test
#51625 opened
Mar 22, 2025 -
[py_modules] Don't install the wheel package if it's already installed
#51629 opened
Mar 23, 2025 -
[data] add hive catalog
#51638 opened
Mar 24, 2025 -
[RLlib] MetricsLogger + Stats overhaul
#51639 opened
Mar 24, 2025 -
[core] split scheduler into smaller targets to improve build performance
#51641 opened
Mar 24, 2025 -
[Core][Prototype] Prototype Code for Event Buffer
#51648 opened
Mar 24, 2025 -
[core] Fix actor reconstruction that depends on plasma object
#51653 opened
Mar 24, 2025 -
Add image datasets to ray train benchmark
#51657 opened
Mar 24, 2025 -
fix status codes on http proxy
#51658 opened
Mar 25, 2025 -
this commit adds use of specific python3.9 version for development se…
#51663 opened
Mar 25, 2025 -
[VMware][WCP provider][Part 3/3] Architecture documentation & uts for vsphere wcp provider
#51666 opened
Mar 25, 2025 -
[WIP] [core] Force compilation error on variable shadow
#51669 opened
Mar 25, 2025 -
update to protbuf-28.2, absl-20240722, grpc-1.67 and patch for windows
#51673 opened
Mar 25, 2025 -
Add uv to Docker image
#51675 opened
Mar 25, 2025 -
windows dev setup
#51678 opened
Mar 25, 2025 -
[Refactor][Core/Dashboard] Extract shared implementation from ServeHead and ServeAgent to a separate class
#51682 opened
Mar 25, 2025 -
[docs] Tune toc
#51684 opened
Mar 25, 2025 -
[core] add actor labels to export events
#51687 opened
Mar 25, 2025 -
[core] Avoid task spec copy for in order actor task submission
#51692 opened
Mar 25, 2025 -
Revert "[cg] Move default device logic into channel utils (#51305)"
#51699 opened
Mar 26, 2025 -
[WIP] [core] Add worker process into application cgroup
#51701 opened
Mar 26, 2025 -
[core][dashboard-agent] Fail fast if the dashboard agent fails to launch the HTTP server
#51705 opened
Mar 26, 2025 -
Update `--labels` and add `--labels-from-file` options for Label Selector API
#51706 opened
Mar 26, 2025 -
Add `label_selector` option to remote functions
#51707 opened
Mar 26, 2025 -
[Docs][KubeRay] Add guide for writing KubeRay doctests
#51708 opened
Mar 26, 2025 -
[Test][KubeRay] Add a deliberate failure test to ensure doctests fail on error
#51709 opened
Mar 26, 2025 -
[data] Make sql cursor buffered
#51712 opened
Mar 26, 2025 -
[ci] upgrade rayci version
#51713 opened
Mar 26, 2025 -
[core] Lazily subscribe to node changes from workers
#51718 opened
Mar 26, 2025 -
[Core] Native CPU affinity support for accelerators
#51719 opened
Mar 26, 2025 -
[Data][LLM] Bump vLLM version to support new models
#51726 opened
Mar 26, 2025 -
[WIP]
#51727 opened
Mar 26, 2025 -
[train] differentiate between train v1 and v2 export data
#51728 opened
Mar 26, 2025 -
[WIP] [core] Log rotation for dashboard
#51729 opened
Mar 27, 2025 -
[core] Log rotate monitor
#51731 opened
Mar 27, 2025 -
[core][cgraph] Fix illegal memory access of cgraph when used in PP
#51734 opened
Mar 27, 2025 -
[core] Set actor creation task's `num_returns` to 0 instead of 1
#51735 opened
Mar 27, 2025 -
[Chore][Core/Dashboard] Remove TrainHead's dependency on DataOrganizer
#51739 opened
Mar 27, 2025 -
[Core] Runtime env working_dir validation #51380
#51741 opened
Mar 27, 2025 -
[core] Fix GCS target compilation
#51742 opened
Mar 27, 2025 -
[WIP] [core] Fix root variable shadow
#51743 opened
Mar 27, 2025 -
[data] support new pyiceberg version
#51744 opened
Mar 27, 2025 -
[WIP] [core] Avoid ray_common as dependency
#51745 opened
Mar 27, 2025 -
[Test][Dashboard] Add API tests for MetricsHead module
#51752 opened
Mar 27, 2025 -
[Test][KubeRay] Add doctest for RayJob Quickstart doc
#51756 opened
Mar 27, 2025 -
[core] Remove unnecessary exporter depencencies
#51763 opened
Mar 27, 2025 -
[Data] Remove lazy fixture
#51764 opened
Mar 27, 2025 -
[core] debug
#51765 opened
Mar 27, 2025 -
[core] Remove object store runner
#51766 opened
Mar 27, 2025 -
export target details via applications rest api
#51767 opened
Mar 27, 2025 -
[Serve] Unify request cancellation errors
#51768 opened
Mar 27, 2025 -
[Data] Implement forceful releasing of actors upon shutdown of `StreamingExecutor`
#51769 opened
Mar 27, 2025 -
[ci] mark jailed rllib tests as manual
#51770 opened
Mar 27, 2025 -
[Core][Autoscaler] Update the Autoscaler Resource Requests Data Model for Scheduling for Label Selector
#51771 opened
Mar 27, 2025 -
[rllib] apply buildifier to build file
#51773 opened
Mar 27, 2025 -
[ci] add flaky tags for flaky test groups
#51774 opened
Mar 27, 2025
55 Issues closed by 18 people
-
Release test rllib_multi_gpu_with_attention_learning_tests.aws failed
#42605 closed
Mar 27, 2025 -
Release test rllib_learner_group_checkpointing_multinode.aws failed
#44209 closed
Mar 27, 2025 -
Release test rllib_learning_tests_marwil_old_api_stack_tf.aws failed
#44522 closed
Mar 27, 2025 -
Release test rllib_learning_tests_cql_old_api_stack_tf.aws failed
#44552 closed
Mar 27, 2025 -
CI test linux://rllib:learning_tests_multi_agent_cartpole_appo_multi_cpu is flaky
#46330 closed
Mar 27, 2025 -
CI test linux://rllib:learning_tests_cartpole_truncated_ppo is flaky
#47646 closed
Mar 27, 2025 -
CI test linux://rllib:learning_tests_cartpole_truncated_ppo is flaky
#47635 closed
Mar 27, 2025 -
CI test windows://python/ray/serve/tests:test_metrics is flaky
#45843 closed
Mar 27, 2025 -
CI test linux://python/ray/tests:test_state_api_2 is flaky
#51736 closed
Mar 27, 2025 -
CI test linux://python/ray/tests:test_asyncio_client_mode is flaky
#51659 closed
Mar 27, 2025 -
CI test linux://python/ray/tests:test_asyncio is flaky
#51660 closed
Mar 27, 2025 -
CI test linux://doc:doctest[train-gpu][gpu] is consistently_failing
#51740 closed
Mar 27, 2025 -
AssertionError: Session name mismatch after restarting Ray cluster with cleanup steps
#51737 closed
Mar 27, 2025 -
[Ray debugger] Ray dubbger does not work
#51670 closed
Mar 27, 2025 -
【bug】Ray.data.write_parquet will write twice when use fsspec local filesystem
#49741 closed
Mar 26, 2025 -
[data] iter_batches needs streaming operation
#49072 closed
Mar 26, 2025 -
[data] Documentation is formatted incorrectly
#48974 closed
Mar 26, 2025 -
CI test linux://python/ray/tests:test_array_asan is consistently_failing
#51714 closed
Mar 26, 2025 -
CI test linux://python/ray/serve/tests:test_multiplex_with_queue_len_cache_disabled is consistently_failing
#48379 closed
Mar 26, 2025 -
[<Ray component: Core|RLlib|etc...>] why ? I use 'curl -v http://127.0.0.1:8268/api/version' return @@@? ,
#51584 closed
Mar 26, 2025 -
[Ray Core] The node storing the actor will be kill unexpectedly when autoscaler is turned on
#46172 closed
Mar 26, 2025 -
[Core] Problems with uv run and remote cluster
#51368 closed
Mar 26, 2025 -
[core][autoscaler][v2] do not removing nodes for upcoming resource requests
#51321 closed
Mar 26, 2025 -
CI test linux://rllib:learning_tests_multi_agent_pendulum_sac_multi_cpu is flaky
#47264 closed
Mar 26, 2025 -
[data] async `flat_map`
#50329 closed
Mar 26, 2025 -
[data] Async map_batches return empty result when execution_options.preserve_order = True
#51188 closed
Mar 26, 2025 -
CI test linux://python/ray/dashboard:test_node is consistently_failing
#51618 closed
Mar 25, 2025 -
CI test linux://python/ray/dashboard:test_dashboard is consistently_failing
#44917 closed
Mar 25, 2025 -
CI test darwin://python/ray/tests:test_gcs_fault_tolerance is consistently_failing
#43777 closed
Mar 25, 2025 -
CI test darwin://python/ray/tests:test_scheduling_performance is flaky
#44238 closed
Mar 25, 2025 -
[Serve] Elastic Autoscaling Based on Cluster Resources with Customizable Scaling Logic
#49151 closed
Mar 25, 2025 -
[Data] RandomAccessDataset.multiget return unexpected values for missing keys.
#44768 closed
Mar 25, 2025 -
Release test training_ingest_benchmark-task=image_classification.skip_training failed
#51622 closed
Mar 25, 2025 -
Release test training_ingest_benchmark-task=image_classification.skip_training.fault_tolerance failed
#51623 closed
Mar 25, 2025 -
Release test training_ingest_benchmark-task=image_classification.skip_training_torch_dataloader failed
#51633 closed
Mar 25, 2025 -
CI test windows://python/ray/tests:test_actor_client_mode is flaky
#51651 closed
Mar 25, 2025 -
CI test linux://python/ray/data:test_huggingface is consistently_failing
#44516 closed
Mar 25, 2025 -
[<Ray component: Data] - inconsistent URL handling in Ray's Databricks integration
#49925 closed
Mar 25, 2025 -
[client] Documentation Python version behavior
#45339 closed
Mar 24, 2025 -
Release test aws_cluster_launcher_minimal failed
#51443 closed
Mar 24, 2025 -
Release test aws_cluster_launcher failed
#51437 closed
Mar 24, 2025 -
[Serve] Serve no longer retries deployments after 3 failures
#50710 closed
Mar 24, 2025 -
[Data/preprocessors] Allow preprocessors to be append operations
#48133 closed
Mar 24, 2025 -
CI test darwin://python/ray/tests:test_task_metrics is flaky
#48278 closed
Mar 22, 2025 -
[Core] Cover cpplint for `ray/core_worker` (excluding transport)
#51510 closed
Mar 22, 2025 -
CI test linux://python/ray/serve/tests/unit:test_pow_2_replica_scheduler is flaky
#48736 closed
Mar 22, 2025 -
[core] Place unit test alongside with the implementation
#51152 closed
Mar 21, 2025 -
CI test linux://python/ray/data:test_parquet is flaky
#48152 closed
Mar 21, 2025 -
CI test linux://python/ray/data:test_metadata_provider is flaky
#51436 closed
Mar 21, 2025 -
[Core] API Reference: uv
#51195 closed
Mar 21, 2025 -
CI test linux://python/ray/tests:test_object_spilling_2_debug_mode is consistently_failing
#49143 closed
Mar 21, 2025 -
[core] Compiled Graphs has a dependence on pyarrow
#51595 closed
Mar 21, 2025 -
Ray client connection timeout on ray.init
#51591 closed
Mar 21, 2025 -
[Core] Failed to use uv
#51196 closed
Mar 21, 2025
46 Issues opened by 29 people
-
[core][dashboard] don't use the first byte to determine whether a chunk succeeds or not in `get_log`
#51762 opened
Mar 27, 2025 -
[Ray Serve] request timeout sec for grpc
#51761 opened
Mar 27, 2025 -
Release test stress_test_state_api_scale.aws failed
#51759 opened
Mar 27, 2025 -
[Data] Found Bug `add_column`
#51758 opened
Mar 27, 2025 -
[Ray Data: Preprocessors] Support flattening vector features in concatenator
#51757 opened
Mar 27, 2025 -
[Core] `uv sync` fails when running fork of Ray
#51755 opened
Mar 27, 2025 -
CI test darwin://python/ray/tests:test_state_api_2 is consistently_failing
#51749 opened
Mar 27, 2025 -
[RLLIB] PPO Gradient Removed for Value Estimation
#51748 opened
Mar 27, 2025 -
[RLLIB] Offline Training
#51747 opened
Mar 27, 2025 -
CI test darwin://python/ray/tests:test_state_api_log is consistently_failing
#51746 opened
Mar 27, 2025 -
CI test linux://python/ray/tests:test_job is flaky
#51738 opened
Mar 27, 2025 -
CI test linux://python/ray/air:test_resource_manager_placement_group is consistently_failing
#51725 opened
Mar 26, 2025 -
[<Ray component: Core>,C++] Ray worker process number keep increasing if calling actor from workers
#51711 opened
Mar 26, 2025 -
CI test darwin://python/ray/dashboard:test_cli_integration is flaky
#51710 opened
Mar 26, 2025 -
Outdated docs
#51703 opened
Mar 26, 2025 -
DQN training with 2.44.0 with gymnasium.env and action mask giving error
#51700 opened
Mar 26, 2025 -
[core] Ray status doesn't work for all ostream-printable objects
#51695 opened
Mar 25, 2025 -
[core][gpu-objects] intra-process communication
#51685 opened
Mar 25, 2025 -
AttributeError observed in sample_func when executed from a different python source file
#51679 opened
Mar 25, 2025 -
[core/scheduler] replace :ray_common dep with sub-dependencies
#51677 opened
Mar 25, 2025 -
[core] Add more compilation options and link options
#51671 opened
Mar 25, 2025 -
[core] Ray status doesn't show source location where the error happens
#51667 opened
Mar 25, 2025 -
[build] Avoid ODR issues
#51647 opened
Mar 24, 2025 -
[Core] Expose `tags` parameter for tasks/actors to be propagated to metrics
#51646 opened
Mar 24, 2025 -
[core][gpu-objects] Support streaming to overlap computation / communication
#51643 opened
Mar 24, 2025 -
[core] Unify `CoreWorker::Exit` and `CoreWorker::Shutdown`
#51642 opened
Mar 24, 2025 -
[core/scheduler] Split giant ray core C++ target into small ones
#51634 opened
Mar 24, 2025 -
[Serve] Ray Serve Autoscaling supports the configuration of custom-metrics and policy
#51632 opened
Mar 24, 2025 -
RLlib new API stack false deprecation warning / MultiRLModuleSpec
#51630 opened
Mar 23, 2025 -
[Core] Unable to build Ray wheel on Windows using Docker due to private image access issues
#51628 opened
Mar 23, 2025 -
[RLlib] Incorrect error message for improper registering of custom env
#51620 opened
Mar 22, 2025 -
[core] Split giant ray core C++ targets into small ones(plasma store)
#51619 opened
Mar 22, 2025 -
[Cluster] Ray job submit/logs sporadically stops following logs
#51601 opened
Mar 21, 2025 -
[Ray serve] StopAsyncIteration error thrown by ray when the client cancels the request
#51598 opened
Mar 21, 2025 -
[CG, Core] Illegal memory access with Ray 2.44 and vLLM v1 pipeline parallelism
#51596 opened
Mar 21, 2025 -
[cgraph] Support function nodes
#51593 opened
Mar 21, 2025 -
[Cluster] Add uv to base images
#51592 opened
Mar 21, 2025 -
[core] Combine multiple grpc connections into one
#51590 opened
Mar 21, 2025 -
[core] Replace opencensus with opentelemetry (C++)
#51589 opened
Mar 21, 2025 -
[Cluster] Split up monitor.log
#51586 opened
Mar 21, 2025 -
[Cluster] Autoscaler frequently fails to scale down workers
#51585 opened
Mar 21, 2025 -
[CG, Core] Add Ascend NPU Support for RCCL and CG
#51574 opened
Mar 21, 2025
788 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[Core] Making Object Store Fallback Directory Configurable
#51189 commented on
Mar 27, 2025 • 13 new comments -
[Data] Update whether release tests use autoscaling
#51562 commented on
Mar 27, 2025 • 12 new comments -
[Compiled Graph] Enhance Compile Graph with Multi-Device Support
#51032 commented on
Mar 27, 2025 • 9 new comments -
[data] make random_sample() reproducible
#51401 commented on
Mar 26, 2025 • 8 new comments -
[LLM Batch] SGLang engine stage and processor
#51409 commented on
Mar 27, 2025 • 7 new comments -
[Core] Cover cpplint for ray/src/ray/common
#51551 commented on
Mar 26, 2025 • 5 new comments -
[core] support dir includes env for working dir
#50066 commented on
Mar 26, 2025 • 5 new comments -
[data] adding snowflake connectors
#51429 commented on
Mar 26, 2025 • 5 new comments -
[Grafana] Enable `includeAll` for Grafana cluster variable
#51396 commented on
Mar 27, 2025 • 5 new comments -
[Core][Bug fix] Trigger local task scheduling after deleting bundle.
#51125 commented on
Mar 26, 2025 • 5 new comments -
[doc] minor/patch version update
#48626 commented on
Mar 26, 2025 • 2 new comments -
[Autoscaler][V2] Check IM instance_status before terminating nodes
#50707 commented on
Mar 26, 2025 • 2 new comments -
[core][compiled graphs] Support reduce scatter and all gather collective for GPU communicator in compiled graph
#50624 commented on
Mar 26, 2025 • 2 new comments -
[core] Always create a default executor
#51058 commented on
Mar 26, 2025 • 1 new comment -
[core] Use cord for sending objects
#51397 commented on
Mar 26, 2025 • 1 new comment -
[data] Iceberg datasource read with pyiceberg 0.9 fix
#51453 commented on
Mar 26, 2025 • 1 new comment -
[core] Implement utils class to setup and cleanup cgroup folder
#49941 commented on
Mar 26, 2025 • 1 new comment -
Update rllib-env.rst
#46750 commented on
Mar 26, 2025 • 0 new comments -
[Core]Support Merge code search path from env variable
#46771 commented on
Mar 26, 2025 • 0 new comments -
Add docs link to Serve page of Ray Dashboard
#46812 commented on
Mar 26, 2025 • 0 new comments -
Introducing StaleTaskError
#46705 commented on
Mar 26, 2025 • 0 new comments -
[Core] If possible, force flush the trace when the worker ends.
#46654 commented on
Mar 26, 2025 • 0 new comments -
[dashboard] Place the submit job on a separate page
#46613 commented on
Mar 26, 2025 • 0 new comments -
[ADAG] Fix DAG input
#46604 commented on
Mar 26, 2025 • 0 new comments -
Fix mlflow artifact logging
#46570 commented on
Mar 26, 2025 • 0 new comments -
[Data] Make the seed take effect in Dataset.random_sample()
#46088 commented on
Mar 26, 2025 • 0 new comments -
Updated LogVirtualView component removed react window
#46835 commented on
Mar 26, 2025 • 0 new comments -
Verification to move PyG data to device
#46839 commented on
Mar 26, 2025 • 0 new comments -
Add generic item support for queue
#46849 commented on
Mar 26, 2025 • 0 new comments -
[POC] A Reactor style GCS. #1: GcsNodeManager
#46891 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Deprecate algo config (python) dicts; must be `AlgorithmConfig` objects.
#46896 commented on
Mar 26, 2025 • 0 new comments -
[Dashboard]. Map logic resource data to row node
#46914 commented on
Mar 26, 2025 • 0 new comments -
Added support for multiple callbacks for GcsSubscriber
#46958 commented on
Mar 26, 2025 • 0 new comments -
[ci][core] GCS FT Chaos test
#46996 commented on
Mar 26, 2025 • 0 new comments -
[core][dashboard] Change the StateDataSourceClient from using gRPC stub -> NewGcsClient.
#47056 commented on
Mar 26, 2025 • 0 new comments -
[core] GcsPublisher bindings
#47062 commented on
Mar 26, 2025 • 0 new comments -
Add Runhouse to Ecosystem
#47150 commented on
Mar 26, 2025 • 0 new comments -
Revert "[doc]Make vllm example works with latest vllm version"
#46094 commented on
Mar 26, 2025 • 0 new comments -
Add PyFlyt waypoints example to documentation
#46145 commented on
Mar 26, 2025 • 0 new comments -
[ADAG] Detect if ADAG is at capacity for execution
#46158 commented on
Mar 26, 2025 • 0 new comments -
[Dashboard] Add cleanup of `job_table` in `delete_job`
#46173 commented on
Mar 26, 2025 • 0 new comments -
[Dashboard] Add GPU component usage
#46188 commented on
Mar 26, 2025 • 0 new comments -
[core] add ray.util.concurrent.futures.RayExecutor
#46249 commented on
Mar 26, 2025 • 0 new comments -
[WIP] CI: jemalloc & mimalloc
#46271 commented on
Mar 26, 2025 • 0 new comments -
[Dashboard] Display accelerators info on demand, add Huawei Ascend NPU monitoring.
#46287 commented on
Mar 26, 2025 • 0 new comments -
[data] fix np.array crash the allocate mem error when souce include short an…
#46298 commented on
Mar 26, 2025 • 0 new comments -
Update py_modules.py AttributeError: module has no attribute '__path__'
#46302 commented on
Mar 26, 2025 • 0 new comments -
[Doc] Update directory path for installation
#46318 commented on
Mar 26, 2025 • 0 new comments -
Enable RAY_DATA_ENABLE_TENSOR_EXTENSION_CASTING environment variable
#46344 commented on
Mar 26, 2025 • 0 new comments -
fixed a typo in ValueError message for contains_tensor
#46348 commented on
Mar 27, 2025 • 0 new comments -
[Docker] Upgrade base deps docker python env to 3.9.7
#46353 commented on
Mar 26, 2025 • 0 new comments -
[test] cpp20
#46380 commented on
Mar 26, 2025 • 0 new comments -
[Core] Add ray-start option 'session-name'
#46404 commented on
Mar 26, 2025 • 0 new comments -
avoid merge errors when blocks contain different type in DelegatingBl…
#46407 commented on
Mar 26, 2025 • 0 new comments -
[Core] Use real CPU count available to a Ray process
#46424 commented on
Mar 26, 2025 • 0 new comments -
fix performance bug in arrow to numpy transform
#46433 commented on
Mar 26, 2025 • 0 new comments -
[Doc][KubeRay] Add KubeRay image resize example to Ray doc page
#46447 commented on
Mar 26, 2025 • 0 new comments -
python/ray/autoscaler/gcp/*.yaml: change scheduling from dict to list
#46500 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Optimize rnn_sequencing performance
#46502 commented on
Mar 26, 2025 • 0 new comments -
[autoscaler][aws] Fix replace cloudwatch alarm config
#46537 commented on
Mar 26, 2025 • 0 new comments -
[Core]Fix the issue of actor tasks hanging during resubmission
#46539 commented on
Mar 27, 2025 • 0 new comments -
[Data] Enable streaming json read
#46550 commented on
Mar 26, 2025 • 0 new comments -
[Jobs] Making sure `JobManager` retries `JobSupervisor.ping` before declaring job as failed
#47166 commented on
Mar 26, 2025 • 0 new comments -
[ADAG]Enable NPU (hccl) communication for CG
#47658 commented on
Mar 26, 2025 • 0 new comments -
[wip] revive zero copy torch tensor serialization
#47665 commented on
Mar 26, 2025 • 0 new comments -
[Data] Fix parallelism deriving heuristic to ensure parallelism stays w/in min/max bounds
#47695 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] PPO enhancement: Send samples as refs to n Learners (speedup for multi-node/multi-GPU learning).
#47707 commented on
Mar 26, 2025 • 0 new comments -
Convert export events to proto and flush from background thread
#47713 commented on
Mar 26, 2025 • 0 new comments -
[core]Make GCS InternalKV workload configurable to the Policy.
#47736 commented on
Mar 26, 2025 • 0 new comments -
[Docs][hotfix] Correct the desc of nums of blocks
#47741 commented on
Mar 26, 2025 • 0 new comments -
[core][aDAG] Fix cpu tensor is automatically converted to gpu tensor
#47742 commented on
Mar 26, 2025 • 0 new comments -
Getinternalconfig and ioctx
#47756 commented on
Mar 26, 2025 • 0 new comments -
[Docs] Update map_reduce.ipynb chunk_size
#47766 commented on
Mar 26, 2025 • 0 new comments -
[RayCluster] Introduce how to run ray remote job with ray client (#47…
#47771 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Fix MeanStdFilter: Do not accumulate `num_pushes` for RunningStats when merging connector states.
#47794 commented on
Mar 26, 2025 • 0 new comments -
Update vllm_openai_example.py for compatibility with latest vllm
#47835 commented on
Mar 26, 2025 • 0 new comments -
Add new serve autoscaling parameter `scaling_function`
#47837 commented on
Mar 26, 2025 • 0 new comments -
[observability][export-api] Write TrainRun events
#47888 commented on
Mar 26, 2025 • 0 new comments -
Adding image_uri to docstring of runtime_env
#47905 commented on
Mar 26, 2025 • 0 new comments -
[Docs] Update Volcano Integration with The New Flag
#47911 commented on
Mar 26, 2025 • 0 new comments -
[ADAG] Fix none output.
#47918 commented on
Mar 26, 2025 • 0 new comments -
[RLlib; Offline RL] Enable GPU and multi-GPU training for offline algorithms.
#47929 commented on
Mar 26, 2025 • 0 new comments -
[Azure][Cluster] Check tags when provided
#47941 commented on
Mar 26, 2025 • 0 new comments -
[autoscalerv2] use replicas in workerGroupSpecs as current workers number when initialize scale request to fix scale up target is wrong
#47967 commented on
Mar 26, 2025 • 0 new comments -
(WIP) [ADAG] Support dag.experimental_compile(_custom_nccl_group= nccl_group) in aDAG
#47987 commented on
Mar 26, 2025 • 0 new comments -
[data] add backpressure reason
#48009 commented on
Mar 26, 2025 • 0 new comments -
fix: WandbLogger crashing silently on a FileNotFoundError
#50308 commented on
Mar 26, 2025 • 0 new comments -
[core][dashboard] Make updates to DataSource.(node_workers|core_worker_stats) on delta.
#47186 commented on
Mar 26, 2025 • 0 new comments -
add stop and delete button for jobs that are of submission type
#47189 commented on
Mar 26, 2025 • 0 new comments -
[core] Make preloading Jemalloc configurable for worker
#47243 commented on
Mar 26, 2025 • 0 new comments -
Add tensorflow support to numpy_to_tensor connector
#47246 commented on
Mar 26, 2025 • 0 new comments -
[bazel] move python rules up
#47260 commented on
Mar 26, 2025 • 0 new comments -
[core] Decouple create worker vs pop worker request.
#47268 commented on
Mar 26, 2025 • 0 new comments -
[data] Fixed pyarrow error when the writer receives empty table
#47270 commented on
Mar 26, 2025 • 0 new comments -
[RLlib; docs] New API stack docs: Add `ConnectorV2` documentation
#47278 commented on
Mar 26, 2025 • 0 new comments -
idempotent replies by seq_no for sequential actors.
#47314 commented on
Mar 26, 2025 • 0 new comments -
[Core][aDAG] Remove busy waiting semaphore acquire in linux
#47322 commented on
Mar 26, 2025 • 0 new comments -
[todo] Migrate redis kv get sync
#47348 commented on
Mar 26, 2025 • 0 new comments -
Remove unnecessary string literal splits
#47360 commented on
Mar 26, 2025 • 0 new comments -
Return multiple best trials
#47381 commented on
Mar 26, 2025 • 0 new comments -
[PoC] Dashboard with Heads as Actors.
#47414 commented on
Mar 26, 2025 • 0 new comments -
[Core] Refine accelerator resource assessment for better node selection
#47443 commented on
Mar 26, 2025 • 0 new comments -
[core][dashboard] make a flamegraph on event loop lag.
#47491 commented on
Mar 26, 2025 • 0 new comments -
Improvements and Artificial Intelligence-based Improvements for Ray Cross-Language Functionality Testing
#47499 commented on
Mar 26, 2025 • 0 new comments -
Enhancements to Ray Cross-Language Testing Script: Automated Error Detection, Data Input Checking, and System Efficiency Enhancement using Artificial Intelligence
#47558 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] add `SingleAgentRLModuleSpec` alias to `RLModuleSpec`
#47560 commented on
Mar 26, 2025 • 0 new comments -
[Doc][KubeRay] Document Fields that Will Not Trigger Downtime in RayService
#47561 commented on
Mar 26, 2025 • 0 new comments -
uint8_t* data ptr not used.
#47565 commented on
Mar 26, 2025 • 0 new comments -
[Do not merge] Run release tests for export API
#47568 commented on
Mar 26, 2025 • 0 new comments -
[Core][StreamingGenerator] Fix ray.get streaming object hang after node dead.
#47583 commented on
Mar 26, 2025 • 0 new comments -
[RLlib|New API|Inconsistency] LSTM Encoder lacks the output Linear, but stated in the docstring (#47625)
#47626 commented on
Mar 26, 2025 • 0 new comments -
[RPC] Added appropriate keep-alive configuration for Ray's internal RPCs
#44612 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Fix invalid call of action_sampler_fn to support Keras 3
#44700 commented on
Mar 26, 2025 • 0 new comments -
Update prometheus-grafana.md and add grafana support allowed_origins
#44701 commented on
Mar 26, 2025 • 0 new comments -
Deflake test_threaded_actor
#44709 commented on
Mar 26, 2025 • 0 new comments -
WIP: Futex
#44724 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Make RLlib learner support custom resources
#44732 commented on
Mar 26, 2025 • 0 new comments -
[core] object store data transfer zstd
#44755 commented on
Mar 26, 2025 • 0 new comments -
[DO NOT SUBMIT] debug sleep macos
#44759 commented on
Mar 26, 2025 • 0 new comments -
[core] Add 5s timeout to the log and err subscriber polls.
#44761 commented on
Mar 26, 2025 • 0 new comments -
[Core] Profile Ray start
#44818 commented on
Mar 26, 2025 • 0 new comments -
Add Ray train dashboard head module with mock data
#44819 commented on
Mar 27, 2025 • 0 new comments -
[WIP] Fixes the streaming generator hang on conn break
#44838 commented on
Mar 26, 2025 • 0 new comments -
[DO NOT SUBMIT] Pr 44234
#44839 commented on
Mar 26, 2025 • 0 new comments -
upgrade node to v20, latest LTS
#44860 commented on
Mar 26, 2025 • 0 new comments -
[serve] update long running release tests
#44915 commented on
Mar 26, 2025 • 0 new comments -
[ci][core] Add -flto and -fwhole-program-vtables
#44919 commented on
Mar 26, 2025 • 0 new comments -
[core] add ray.util.concurrent.futures.RayExecutor
#44922 commented on
Mar 26, 2025 • 0 new comments -
Debug tune repro
#44936 commented on
Mar 26, 2025 • 0 new comments -
[ci] Adds promethesus latencies for ray_dashboard_api_requests_duration_seconds_bucket for the Ray Core Tests.
#44944 commented on
Mar 26, 2025 • 0 new comments -
[WIP] Experimenting with streaming
#44959 commented on
Mar 26, 2025 • 0 new comments -
[proof-of-concept][dashboard] One event loop per module
#44964 commented on
Mar 26, 2025 • 0 new comments -
[Data] Support task reassignment in actor_pool_map_operator to improv…
#44968 commented on
Mar 26, 2025 • 0 new comments -
add more execution and iteration metrics to prometheus
#44971 commented on
Mar 26, 2025 • 0 new comments -
blind try on ubuntu upgrade ..
#45427 commented on
Mar 26, 2025 • 0 new comments -
[Hack] Hack the pickle to make relpath in SCRIPT_MODE working dir
#43804 commented on
Mar 26, 2025 • 0 new comments -
[WIP] In Driver CoreWorkerProcess, shutdown with the Exit method
#43833 commented on
Mar 26, 2025 • 0 new comments -
[flakey] Deflakey `darwin://python/ray/tests:test_gcs_fault_tolerance`
#43922 commented on
Mar 26, 2025 • 0 new comments -
[Core] Remove external storage upon sigterm for ray start
#43941 commented on
Mar 26, 2025 • 0 new comments -
remove flaky marker from test
#44033 commented on
Mar 26, 2025 • 0 new comments -
Fix for issue #43411 (BaseException error)
#44038 commented on
Mar 26, 2025 • 0 new comments -
[misc] Reformat train/tune BUILD files
#44151 commented on
Mar 26, 2025 • 0 new comments -
[misc] Reformat RLLib BUILD files
#44153 commented on
Mar 26, 2025 • 0 new comments -
[Jobs] [Dashboard] Changing cluster address resolution in get_address_for_submission_client
#44186 commented on
Mar 26, 2025 • 0 new comments -
[gRPC] Adding retry policies for all gRPC clients
#44234 commented on
Mar 26, 2025 • 0 new comments -
Ray IPv6 support
#44252 commented on
Mar 26, 2025 • 0 new comments -
[CI] Update kind version if it doesn't match pinned version
#44268 commented on
Mar 26, 2025 • 0 new comments -
Debug reference_count
#44271 commented on
Mar 26, 2025 • 0 new comments -
Deflake test
#44333 commented on
Mar 26, 2025 • 0 new comments -
Retry on stream rpc lost
#44358 commented on
Mar 26, 2025 • 0 new comments -
test
#44377 commented on
Mar 26, 2025 • 0 new comments -
[WIP] Offload execution of sync methods to event-loop's default executor
#44406 commented on
Mar 26, 2025 • 0 new comments -
change naming to intel gaudi habana for ray train example
#44412 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Cleanup `examples` folder 02: Add shared value function example script for MultiAgentRLModule.
#44421 commented on
Mar 26, 2025 • 0 new comments -
Deflake test with long sleep
#44433 commented on
Mar 26, 2025 • 0 new comments -
Update bert.ipynb
#44455 commented on
Mar 26, 2025 • 0 new comments -
Remove SimpleImageViewer from EnvRunnerV2
#44466 commented on
Mar 26, 2025 • 0 new comments -
[data] add better support for list-typed fields when using `write_bigquery`
#44564 commented on
Mar 26, 2025 • 0 new comments -
[RLLib] Fix action masking example
#44565 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] DreamerV3 on PyTorch.
#45463 commented on
Mar 26, 2025 • 0 new comments -
[train] Update Torch default timeout_s to use Torch's default timeout
#45501 commented on
Mar 26, 2025 • 0 new comments -
Create a singleton io context and thread, and standalone gcs client on it.
#45524 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] fix VTrace in impala_tf_policy to support Keras 3
#45562 commented on
Mar 26, 2025 • 0 new comments -
[WIP] [Core] Support of reading Working dir from HUAWEI Object Storage Service (OBS)
#45577 commented on
Mar 26, 2025 • 0 new comments -
[core] Eagerly kill idle workers on job finish.
#45633 commented on
Mar 26, 2025 • 0 new comments -
[core][1/2] Add SubscribeAllActors to GcsClient.
#45637 commented on
Mar 26, 2025 • 0 new comments -
[core][2/2] Kill worker on root detached actor died.
#45638 commented on
Mar 26, 2025 • 0 new comments -
[WIP][Jobs] Revisit Job Agent to run Job Supervisors in-process
#45664 commented on
Mar 26, 2025 • 0 new comments -
[Core] Add warning when uploading large working dirs
#45818 commented on
Mar 26, 2025 • 0 new comments -
[WIP] Benchmark data shuffle
#45847 commented on
Mar 26, 2025 • 0 new comments -
Improve code snippet in docs to set up `ray[serve]` gRPC service
#45862 commented on
Mar 26, 2025 • 0 new comments -
MADDPG framework should be TensorFlow
#45863 commented on
Mar 26, 2025 • 0 new comments -
Enable setting OS disk size in Azure
#45867 commented on
Mar 26, 2025 • 0 new comments -
Adds new working dir upload protocol PLASMA, and use it in job submission.
#45880 commented on
Mar 26, 2025 • 0 new comments -
[spark] Fix nvidia-smi hanging issue
#45896 commented on
Mar 26, 2025 • 0 new comments -
Fix ax_client.create_experiment call
#45902 commented on
Mar 26, 2025 • 0 new comments -
Fix malformed `temp_dir` path when connecting Windows workers to cluster with Linux head
#45930 commented on
Mar 26, 2025 • 0 new comments -
[URL] Change the absolute path to a relative path to solve the ingres…
#45933 commented on
Mar 26, 2025 • 0 new comments -
[Data] Remove gaps between tasks in ray data.
#45935 commented on
Mar 26, 2025 • 0 new comments -
[Serve] Group `DeploymentHandle` autoscaling metrics pushes by process
#45957 commented on
Mar 26, 2025 • 0 new comments -
enable easy logging of images to tensorboard
#46068 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] - `"Synchronized"` sampling for multi-agent buffers.
#46083 commented on
Mar 26, 2025 • 0 new comments -
[WIP] poc / hack relpath
#45003 commented on
Mar 26, 2025 • 0 new comments -
[WIP] add env var to enable debug
#45009 commented on
Mar 26, 2025 • 0 new comments -
RuntimeContext support get actor namespace
#45025 commented on
Mar 26, 2025 • 0 new comments -
Add roundtrip (ping-pong) microbenchmarks for accelerated DAG channels
#45064 commented on
Mar 26, 2025 • 0 new comments -
[wip][train][tune] handle s3fs permissions
#45100 commented on
Mar 26, 2025 • 0 new comments -
[Dashboard] Revisited `JobManager` log fetching infra to avoid blocking the event-loop
#45117 commented on
Mar 26, 2025 • 0 new comments -
[Jobs] Revisit Ray Job execution and monitoring
#45120 commented on
Mar 26, 2025 • 0 new comments -
[RLlib; Tune] Fix default behavior of default tune `CLIReporter` (based on `Algorithm._progress_metrics`).
#45122 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Fix async (multiprocessing) gymnasium vector envs in `SingleAgentEnvRunner`.
#45144 commented on
Mar 26, 2025 • 0 new comments -
[RFC] Splitted Dashboard Heads.
#45175 commented on
Mar 26, 2025 • 0 new comments -
Add descriptive error message when deployment name not found
#45181 commented on
Mar 26, 2025 • 0 new comments -
[dashboard] Removes ray.rpc.ReportEventService and Dashboard head as gRPC server.
#45219 commented on
Mar 26, 2025 • 0 new comments -
[Core] Improve logging during accelerator auto-detection
#45240 commented on
Mar 26, 2025 • 0 new comments -
[core] Change all object_size to uint64_t and use 0 for unknown. Also adds a method `ray.experimental.get_local_object_locations`
#45247 commented on
Mar 26, 2025 • 0 new comments -
grid_search resolution code optimization
#45267 commented on
Mar 26, 2025 • 0 new comments -
[POC][core] GcsClient async binding, aka remove PythonGcsClient.
#45289 commented on
Mar 26, 2025 • 0 new comments -
[Data] add reset pandas index when merge sorted blocks
#45326 commented on
Mar 26, 2025 • 0 new comments -
add links to eks site for neuron examples
#45341 commented on
Mar 26, 2025 • 0 new comments -
Modify Spark on Ray to support Pex and other virtualenvs + direct scr…
#45354 commented on
Mar 26, 2025 • 0 new comments -
[WIP] Reopen cpp test on mac
#45374 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Enhance callbacks test case for EnvRunners; Add (optional) explicit `enable_multi_agent` setting to AlgorithmConfig.
#45385 commented on
Mar 26, 2025 • 0 new comments -
[serve] allow build_serve_application to happen in parallel
#45394 commented on
Mar 26, 2025 • 0 new comments -
add ray debugger references to ray docs
#45414 commented on
Mar 26, 2025 • 0 new comments -
[Data] Allow configuration of MAX_IMAGE_PIXELS in ImageDatasource
#45415 commented on
Mar 26, 2025 • 0 new comments -
[Dashboard] Hide GPU and GRAM columns from clusters and actors table if there are 0 rows with GPUs
#50338 commented on
Mar 26, 2025 • 0 new comments -
[core][1/N] Set gRPC deadline to ReportOCMetrics RPC
#50370 commented on
Mar 26, 2025 • 0 new comments -
[data] add ClickHouse sink
#50377 commented on
Mar 27, 2025 • 0 new comments -
[core] Move `overload remote` for actors
#50412 commented on
Mar 26, 2025 • 0 new comments -
[Autoscaler][V2] Use running node instances to rate-limit upscaling
#50414 commented on
Mar 26, 2025 • 0 new comments -
[tune] Remove loguniform's base
#50415 commented on
Mar 26, 2025 • 0 new comments -
[core][cgraph] Support individual submit_timeout
#50424 commented on
Mar 26, 2025 • 0 new comments -
[core] add RAY_IGNORE_VERSION_MISMATCH when ray start --address
#50513 commented on
Mar 26, 2025 • 0 new comments -
Revert "[core][cgraph] Rework DagRef Destruction (#49818)"
#50529 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Enable spliting and zero padding of Dict observation
#50589 commented on
Mar 26, 2025 • 0 new comments -
[Core] Split stats_metric into smaller targets to improve build performance
#50595 commented on
Mar 27, 2025 • 0 new comments -
[chore] Delete unused build.sh
#50649 commented on
Mar 26, 2025 • 0 new comments -
[doc][core] Fix ray generator code example
#50655 commented on
Mar 26, 2025 • 0 new comments -
[WIP / try out] Use UV for Python 3.13 tests
#50669 commented on
Mar 26, 2025 • 0 new comments -
[core] Cover cpplint for ray/src/ray/stats
#50678 commented on
Mar 26, 2025 • 0 new comments -
Move `pydantic_compat` from `_private` to `_common`
#50683 commented on
Mar 26, 2025 • 0 new comments -
[core] [wip attempt] StatusOr union construction sometimes breaks windows build
#50761 commented on
Mar 26, 2025 • 0 new comments -
[WIP] Ray Collective Communication Lib Support HCCL Backend
#50790 commented on
Mar 26, 2025 • 0 new comments -
[docs] add missing step to install KubeRay in gke-gcs-bucket.md
#50811 commented on
Mar 26, 2025 • 0 new comments -
[WIP] Rebasing materialized dataset on iterator back-pressure is active upon materialization
#50880 commented on
Mar 26, 2025 • 0 new comments -
[CI] Enable pretty-format-java pre-commit hook
#50957 commented on
Mar 26, 2025 • 0 new comments -
[wip] add object detection notebooks
#50965 commented on
Mar 27, 2025 • 0 new comments -
[doc][kuberay]: add `kubectl ray get node` example
#51271 commented on
Mar 26, 2025 • 0 new comments -
[WIP] Adding `transform` utility to `Operator`
#48620 commented on
Mar 26, 2025 • 0 new comments -
[RFC][dashboard] Use aiohttp client for inter dependencies.
#49932 commented on
Mar 26, 2025 • 0 new comments -
[core] minor optimization for JoinPaths
#49946 commented on
Mar 26, 2025 • 0 new comments -
adding distributional critic example
#49949 commented on
Mar 26, 2025 • 0 new comments -
[RLlib; Offline] - Add single learner gpu training with preloading in `OfflinePreLearner`.
#49960 commented on
Mar 26, 2025 • 0 new comments -
Explicit comm
#49979 commented on
Mar 26, 2025 • 0 new comments -
Add Semi-Random Weighting to AutoScaler Node Scheduler
#49983 commented on
Mar 26, 2025 • 0 new comments -
[kuberay] fix deserialisation of custom resources in autoscaler config
#49993 commented on
Mar 26, 2025 • 0 new comments -
[dashboard] Remove the dashboard grpc server.
#50021 commented on
Mar 26, 2025 • 0 new comments -
[core] Thread-safe gcs node manager
#50024 commented on
Mar 26, 2025 • 0 new comments -
[Core][Doc] Add support for Cambricon MLU
#50026 commented on
Mar 26, 2025 • 0 new comments -
[Train] Add Cambricon MLU support to Ray Train
#50028 commented on
Mar 26, 2025 • 0 new comments -
[WIP] Move execution loop to the same thread as the constructor of an actor
#50032 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] LearnerConnector pipeline speedup.
#50035 commented on
Mar 26, 2025 • 0 new comments -
Add Cloud Logging example for Ray on GKE
#50060 commented on
Mar 26, 2025 • 0 new comments -
Update multi-agent-envs.rst
#50075 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Make `config.episodes_to_numpy` False by default.
#50077 commented on
Mar 26, 2025 • 0 new comments -
tsan
#50105 commented on
Mar 26, 2025 • 0 new comments -
[dashboard] Remove DataSource.ndoes listeners in StateHead with get_all_node_info.
#50122 commented on
Mar 26, 2025 • 0 new comments -
[core][collective] Avoid creation of `gloo_queue` in race condition
#50132 commented on
Mar 26, 2025 • 0 new comments -
[dashboard] Move record_dashboard_metrics from MetricsHead to DashboardHead, remove .metrics property and convert MetricsHead.
#50133 commented on
Mar 26, 2025 • 0 new comments -
[dashboard] Use cloudpickle to pickle SubprocessModule classes, and convert ServeHead.
#50153 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Disable callbacks callable check for new api stack
#50157 commented on
Mar 26, 2025 • 0 new comments -
[dashboard] Actor and node head
#50159 commented on
Mar 26, 2025 • 0 new comments -
[core] unblocking macos tests by pinning aiohappyeyeballs to version 2.4.8
#51288 commented on
Mar 26, 2025 • 0 new comments -
[Dashboard] Fix timezones to deal with daylight savings
#51314 commented on
Mar 26, 2025 • 0 new comments -
[CI] Replace `black` with `ruff format`
#51332 commented on
Mar 26, 2025 • 0 new comments -
Deflake
#51338 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] - Add state syncing to EnvRunner sample call in APPO.
#51343 commented on
Mar 26, 2025 • 0 new comments -
[Debugger] Random pick ray debugger port from range
#51344 commented on
Mar 26, 2025 • 0 new comments -
[Dashboard] Support reporting AMD GPU usage
#51345 commented on
Mar 27, 2025 • 0 new comments -
[RLlib] Throw better error if catalog can not be created for Atari environments, help atari users
#51371 commented on
Mar 26, 2025 • 0 new comments -
[core] Create small objects release test
#51382 commented on
Mar 26, 2025 • 0 new comments -
[core] Remove the unnecessary key `ActorID` of `concurrency_groups_cache_` in TaskReceiver
#51403 commented on
Mar 26, 2025 • 0 new comments -
[data] Implement Spark-like accumulators for Ray Data
#51404 commented on
Mar 26, 2025 • 0 new comments -
Upgrading Arrow dependency to latest stable version
#51440 commented on
Mar 26, 2025 • 0 new comments -
expose ObjectRef from DeploymentResponse
#51444 commented on
Mar 26, 2025 • 0 new comments -
[Data] Add environment variable support for Ray Data execution callbacks.
#51449 commented on
Mar 26, 2025 • 0 new comments -
[serve] move serve image_uri tests to serve CI
#51451 commented on
Mar 26, 2025 • 0 new comments -
[WIP][core][gpu-objects] CollectiveGroupManager
#51460 commented on
Mar 26, 2025 • 0 new comments -
Unify `_private/log.py` and `_private/ray_logging`
#51461 commented on
Mar 26, 2025 • 0 new comments -
[core] upgrading macos CI python 3.9 -> 3.9.2 to enable numpy serialization warnings
#51462 commented on
Mar 26, 2025 • 0 new comments -
[ray.serve.llm] Support vLLM v1
#51490 commented on
Mar 26, 2025 • 0 new comments -
[RLlib|Tune|Train] ValueError: Could not recover from checkpoint as it does not exist anymore
#51515 commented on
Mar 26, 2025 • 0 new comments -
Avoid len(), which causes static batch sizes on export.
#51520 commented on
Mar 26, 2025 • 0 new comments -
Add perf metrics for 2.44.1
#51535 commented on
Mar 26, 2025 • 0 new comments -
Update observability.md
#51567 commented on
Mar 26, 2025 • 0 new comments -
[WIP] Try upgrade cython to 3.1
#50972 commented on
Mar 26, 2025 • 0 new comments -
fix restore BUG "RuntimeError: Expected scalars to be on CPU, got cud…
#50983 commented on
Mar 26, 2025 • 0 new comments -
Fix editorconfig option name
#50993 commented on
Mar 26, 2025 • 0 new comments -
Suppress type error
#50994 commented on
Mar 26, 2025 • 0 new comments -
Improvements to General Debugging guide
#51004 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Schedule `AggregatorActors` via `PlacementGroupSchedulingStrategy` into Learner bundles.
#51017 commented on
Mar 26, 2025 • 0 new comments -
[doc] add jax example
#51040 commented on
Mar 26, 2025 • 0 new comments -
[WIP][core][compiled graphs] Supporting allreduce on tuple of tensors
#51047 commented on
Mar 26, 2025 • 0 new comments -
[Misc]cupy.cuda.nccl.get_unique_id() generic modification.
#51052 commented on
Mar 26, 2025 • 0 new comments -
[Refactor]Rename NCCL-related items to comm_backend
#51061 commented on
Mar 26, 2025 • 0 new comments -
[Docs] Update docs to reflect CPU requests/limits change in KubeRay v1.3
#51072 commented on
Mar 26, 2025 • 0 new comments -
[data] Make Dataset.name/set_name public
#51076 commented on
Mar 26, 2025 • 0 new comments -
[DONOTMERGE] POC for Ray+torch.distributed
#51078 commented on
Mar 26, 2025 • 0 new comments -
Fix the grammar of the OOM killer error messages
#51081 commented on
Mar 26, 2025 • 0 new comments -
[do not merge] Add Daft to the Ray ecosystem page
#51133 commented on
Mar 26, 2025 • 0 new comments -
Reproducing MacOS x86_64 Test Failure w/ Custom Numpy Serializer for ndarrays
#51143 commented on
Mar 26, 2025 • 0 new comments -
[core] Implement a universal printer
#51151 commented on
Mar 26, 2025 • 0 new comments -
[Do Not Merge] Update the Test Script to Debug test_network_failure_e2e Flaky Test
#51153 commented on
Mar 26, 2025 • 0 new comments -
Bump axios from 0.21.4 to 1.8.2 in /python/ray/dashboard/client
#51162 commented on
Mar 26, 2025 • 0 new comments -
Bump jinja2 from 3.1.3 to 3.1.6 in /release
#51216 commented on
Mar 26, 2025 • 0 new comments -
Bump keras from 2.15.0 to 3.9.0 in /python
#51256 commented on
Mar 26, 2025 • 0 new comments -
[Train V2] Fold `v2.LightGBMTrainer` API into the public trainer class as an alternate constructor
#51265 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Add placement strategy to `EnvRunner` creation.
#51267 commented on
Mar 26, 2025 • 0 new comments -
Bump @babel/helpers from 7.19.4 to 7.26.10 in /python/ray/dashboard/client
#51268 commented on
Mar 26, 2025 • 0 new comments -
(WIP) [core][compiled graphs] Unify code paths for NCCL P2P and collectives scheduling
#48649 commented on
Mar 26, 2025 • 0 new comments -
[core][compiled graphs] Inter-execution overlap
#48659 commented on
Mar 26, 2025 • 0 new comments -
[Core]: Fix ConnectionError on Autoscaler CR lookups in K8s clusters …
#48675 commented on
Mar 26, 2025 • 0 new comments -
[Tune] Add OSS Vizier to Ray Tune
#48684 commented on
Mar 26, 2025 • 0 new comments -
[core] Fix building Ray against modern Protobuf versions
#48724 commented on
Mar 26, 2025 • 0 new comments -
[core] Enable STDERR custom formatting
#48742 commented on
Mar 26, 2025 • 0 new comments -
docs: update ray tune section
#48769 commented on
Mar 26, 2025 • 0 new comments -
docs: update ray serve section
#48770 commented on
Mar 26, 2025 • 0 new comments -
removed limit on log sizes via sockets
#48780 commented on
Mar 26, 2025 • 0 new comments -
[Fix][GCS] Implement reconnection for RedisContext
#48781 commented on
Mar 26, 2025 • 0 new comments -
[core][autoscaler]Reset the failure count to avoid RayCluster aborting unexpectedly
#48797 commented on
Mar 26, 2025 • 0 new comments -
[Data] Cleaned up & streamlined boundary sampling sequence to avoid conversion from Numpy to Python objects
#48825 commented on
Mar 26, 2025 • 0 new comments -
[Build][Deps] Add new `ray[azure]` extra package
#48847 commented on
Mar 26, 2025 • 0 new comments -
[core] cpp lint of object_manager
#48878 commented on
Mar 26, 2025 • 0 new comments -
[Autoscaler][Placement Group] Skip placed bundle when requesting resource
#48924 commented on
Mar 26, 2025 • 0 new comments -
[train] Make dataset argument covariant
#48999 commented on
Mar 26, 2025 • 0 new comments -
[core] Lint cpp files in common
#49002 commented on
Mar 26, 2025 • 0 new comments -
Slo track
#49007 commented on
Mar 26, 2025 • 0 new comments -
support to clean worker table with maximum_gcs_dead_worker_cached_count
#49030 commented on
Mar 26, 2025 • 0 new comments -
[Jobs] Add metric to track duration of jobs
#49035 commented on
Mar 26, 2025 • 0 new comments -
[WIP][compiled graphs] Avoid extra data I/O if CPU data is static
#49042 commented on
Mar 26, 2025 • 0 new comments -
[core][compiled-graphs] Very hacky Gloo channel PoC
#49103 commented on
Mar 26, 2025 • 0 new comments -
[Core] Persist the Driver Console Log When Job Execution Not Through Job API
#49452 commented on
Mar 26, 2025 • 0 new comments -
:bug: do not modify user-provided runtime_env
#48021 commented on
Mar 26, 2025 • 0 new comments -
[RLlib; Offline RL] - Enable gpu inference on data workers.
#48041 commented on
Mar 26, 2025 • 0 new comments -
[WIP][core] C++20 upgrade
#48044 commented on
Mar 26, 2025 • 0 new comments -
[core] Add metrics for Task RSS HWM.
#48052 commented on
Mar 26, 2025 • 0 new comments -
[gRPC] Fixing gRPC Server Call to be instantiated immediately for unbounded handlers
#48057 commented on
Mar 26, 2025 • 0 new comments -
[data] preprocessor: use map_batches in MaxAbsScaler, MinMaxScaler, UniformKBinsDiscretizer
#48097 commented on
Mar 26, 2025 • 0 new comments -
[Data] Fix a test that checks the "eliminate_build_output_blocks" optimization
#48119 commented on
Mar 26, 2025 • 0 new comments -
[Data] Fix a bug in the ReorderRandomizeBlocksRule optimization rule
#48258 commented on
Mar 26, 2025 • 0 new comments -
Add kuberay operator addon to cmd in gke-gcs-bucket.md
#48268 commented on
Mar 26, 2025 • 0 new comments -
[doc] Remove unused/unmaintained `doc/source/templates` folder
#48295 commented on
Mar 26, 2025 • 0 new comments -
[doc] fix: Typo and missing import in doc
#48311 commented on
Mar 26, 2025 • 0 new comments -
Fix invalid type for progress_reporter parameter of RunConfig
#48439 commented on
Mar 26, 2025 • 0 new comments -
[Data] Fix block accessors' combine handling of duplicate columns
#48495 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Fix remote env runner request spam
#48499 commented on
Mar 26, 2025 • 0 new comments -
[runtime env]: Integrating Omnitrace to Ray worker process
#48525 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Fix Algorithm with tune
#48529 commented on
Mar 26, 2025 • 0 new comments -
Add Adaptive Scaling Feature for Distributed Task Scheduling
#48537 commented on
Mar 26, 2025 • 0 new comments -
Overlap dynamic
#48545 commented on
Mar 26, 2025 • 0 new comments -
[data] add opensearch datasource
#48555 commented on
Mar 26, 2025 • 0 new comments -
[Docs][Collective] Fix examples to use init_collective_group and create_collective_group
#48570 commented on
Mar 26, 2025 • 0 new comments -
[core] Introduces Postable for InternalKVInterface.
#48584 commented on
Mar 26, 2025 • 0 new comments -
Dag bind order execution fix
#48603 commented on
Mar 26, 2025 • 0 new comments -
[Core][Compiled Graph] Execute DAG on Actor's Main Thread
#48608 commented on
Mar 26, 2025 • 0 new comments -
[Tune] Fix pbt restore in synch mode
#48616 commented on
Mar 26, 2025 • 0 new comments -
[core][cgraph] Use threadpool and one io_context for mutable object provider
#49500 commented on
Mar 26, 2025 • 0 new comments -
[train] add test for ScalingConfigV2 import
#49515 commented on
Mar 26, 2025 • 0 new comments -
[ci] Remove redundant ML doctests from running in unit test pipelines
#49516 commented on
Mar 26, 2025 • 0 new comments -
Overlap check deps
#49520 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Add NPU and HPU support to RLlib
#49535 commented on
Mar 26, 2025 • 0 new comments -
[core][cgraph] Use cv instead of busy wait for next version
#49542 commented on
Mar 26, 2025 • 0 new comments -
[core] Minor improvements to core worker get
#49567 commented on
Mar 26, 2025 • 0 new comments -
[core] Don't get dashboard address after each dashboard connection failure
#49584 commented on
Mar 26, 2025 • 0 new comments -
[Core] Streaming generator supports num_returns
#49586 commented on
Mar 26, 2025 • 0 new comments -
[RLlib; docs] Docs do-over (new API stack): New `debugging.rst` page.
#49592 commented on
Mar 26, 2025 • 0 new comments -
[Dashboard] Support multiple accelerator monitoring and flexible display
#49610 commented on
Mar 26, 2025 • 0 new comments -
[core][docs] Lint some top level core docs
#49703 commented on
Mar 26, 2025 • 0 new comments -
[Core] Add virtual cluster
#49717 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Fix broken stats accumulation for 'MeanStdFilter' connector.
#49718 commented on
Mar 26, 2025 • 0 new comments -
Update dyn-req-batch.md with style edits
#49725 commented on
Mar 26, 2025 • 0 new comments -
changes to get ray serve responding on REST API calls when distribute…
#49730 commented on
Mar 27, 2025 • 0 new comments -
[DATA]Add custom resources in data autoscaling
#49756 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Flatten dict-typed observations before comparing them.
#49758 commented on
Mar 26, 2025 • 0 new comments -
[KubeRay] support suspending worker groups in KubeRay autoscaler
#49768 commented on
Mar 26, 2025 • 0 new comments -
[ci] remove pins in runtime_env usage in train examples
#49772 commented on
Mar 26, 2025 • 0 new comments -
Pass checkpointable args through in tf_learner
#49861 commented on
Mar 26, 2025 • 0 new comments -
New vsphere provider supporting Supervisor (k8s) cluster.
#49881 commented on
Mar 26, 2025 • 0 new comments -
[autoscaler] Fix potential dead lock in local provider
#49909 commented on
Mar 26, 2025 • 0 new comments -
Update azure.md - Missing azure dependency
#49104 commented on
Mar 26, 2025 • 0 new comments -
Fix memory issues caused by pyarrow.Dataset.to_batches.
#49124 commented on
Mar 26, 2025 • 0 new comments -
[compiled grapn][doc] structure
#49134 commented on
Mar 26, 2025 • 0 new comments -
[core] change all dynamic_pointer_cast to static_pointer_cast.
#49135 commented on
Mar 26, 2025 • 0 new comments -
[core] Gcs asio minor improvements
#49169 commented on
Mar 26, 2025 • 0 new comments -
[core][compiled-graphs] Gloo group
#49187 commented on
Mar 26, 2025 • 0 new comments -
[data] fix nodeName When the network in KubeRay is set to hostnetwork
#49188 commented on
Mar 26, 2025 • 0 new comments -
[Dashboard] stop ray submmited job through ui
#49201 commented on
Mar 26, 2025 • 0 new comments -
Fix unpacking zip package treats "../" as the top_level_directory
#49204 commented on
Mar 26, 2025 • 0 new comments -
[WIP] Remove VM cluster autoscaler docker implementation
#49238 commented on
Mar 26, 2025 • 0 new comments -
[data] feat: Implement `ray.data.Dataset.offset`
#49274 commented on
Mar 26, 2025 • 0 new comments -
[Data] Fix numpy to arrow conversion.
#49293 commented on
Mar 27, 2025 • 0 new comments -
[wandb] Use wandb Run as a context manager
#49307 commented on
Mar 26, 2025 • 0 new comments -
[Core] fail to download s3 py modules
#49332 commented on
Mar 26, 2025 • 0 new comments -
[Serve] Improve serve deploy ignore behavior
#49336 commented on
Mar 26, 2025 • 0 new comments -
[Fix][Core] Periodically check log message queue cleared before shutdown
#49337 commented on
Mar 26, 2025 • 0 new comments -
[wip] Moving around
#49345 commented on
Mar 26, 2025 • 0 new comments -
Update tune-search-spaces.rst to correct outdated api use
#49386 commented on
Mar 26, 2025 • 0 new comments -
Adding input validation to ScalingConfig resources_per_worker
#49389 commented on
Mar 26, 2025 • 0 new comments -
[Draft] [spark] Set "HOST_IP" environmental variable for Ray worker nodes
#49403 commented on
Mar 26, 2025 • 0 new comments -
[core][compiled graphs] Support reduce scatter and all gather collective in compiled graph
#49404 commented on
Mar 26, 2025 • 0 new comments -
[core][dashboard] Dashboard head modules as Actors.
#49432 commented on
Mar 26, 2025 • 0 new comments -
[core][compiled-graphs] CachedChannel's inner channel must be provided
#49434 commented on
Mar 26, 2025 • 0 new comments -
[data] fix random_sample return different data in fixed seed
#49443 commented on
Mar 26, 2025 • 0 new comments -
[core] Improve Process management in Raylet
#35252 commented on
Mar 24, 2025 • 0 new comments -
[CI] `windows://python/ray/serve:test_metrics` is failing/flaky on master.
#35452 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] Windows CLI, cmd.exe, powershell parsing json arguments JSONDecodeError
#35492 commented on
Mar 24, 2025 • 0 new comments -
[Tune] Better force UTF8 encoding when calling Open method
#34679 commented on
Mar 24, 2025 • 0 new comments -
Ray can't use my resources correctly for parallelizing with OptunaSearch/Pandas/NumPy
#34834 commented on
Mar 24, 2025 • 0 new comments -
Build fails on ppc64le architecture
#4309 commented on
Mar 24, 2025 • 0 new comments -
[autoscaler] "Cannot perform an interactive login from a non TTY device" when trying to use a private docker registry
#7339 commented on
Mar 24, 2025 • 0 new comments -
Invalid memory access in RedisAsioClient/RedisAsyncContext on shutdown
#9074 commented on
Mar 24, 2025 • 0 new comments -
ray::IDLE processes persist if I disconnect and kill master process from IDE
#9528 commented on
Mar 24, 2025 • 0 new comments -
Unable to connect to ray head running on linux from ray worker node on windows
#10362 commented on
Mar 24, 2025 • 0 new comments -
Windows debugging on gdb does not work
#9827 commented on
Mar 24, 2025 • 0 new comments -
[RFC] Logging shutdown process to all Ray components.
#13241 commented on
Mar 24, 2025 • 0 new comments -
[CI] Upload Windows Status to flakey-tests.ray.io
#12168 commented on
Mar 24, 2025 • 0 new comments -
Cannot call remote instance method of a superclass from within a different instance method of the superclass
#10899 commented on
Mar 24, 2025 • 0 new comments -
[Bug] [Core] Unable to schedule fractional gpu jobs
#20933 commented on
Mar 24, 2025 • 0 new comments -
__del__ magic method can't access class properties
#14285 commented on
Mar 24, 2025 • 0 new comments -
[runtime env] raise exception for unsupported runtime_env features on Windows
#21435 commented on
Mar 24, 2025 • 0 new comments -
[Bug] [RLlib] Custom metrics are not reported to Tune
#20938 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] League based PolicyMap across workers impacting scalability via memory use - Question/[Bug]
#21459 commented on
Mar 24, 2025 • 0 new comments -
[Bug] MultiDiscrete very slow
#22507 commented on
Mar 24, 2025 • 0 new comments -
Assertion Error on Seq lens for PPO with Attention only in evaluation.
#22266 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] PPO - ray.rllib.agents.ppo "Put Error"
#24307 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] Metrics not reported with Client/Server and env=None
#24601 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] Dequeue check() returned False
#25783 commented on
Mar 24, 2025 • 0 new comments -
[core][gpu-objects] Support multiple tensors
#51550 commented on
Mar 25, 2025 • 0 new comments -
[Ray core] surface the node id/ip and other info (task name, etc.) in the stacktrace for a full object store
#50408 commented on
Mar 22, 2025 • 0 new comments -
Building an executable using Ray and Cx_freeze
#42101 commented on
Mar 24, 2025 • 0 new comments -
[dreamerv3] Get error when tuning custom env using dreamerv3
#42107 commented on
Mar 24, 2025 • 0 new comments -
SAC Checkpoint Loading Error
#42651 commented on
Mar 24, 2025 • 0 new comments -
[RLLib] custom TorchRLModule return action_dist but this results in an error
#42786 commented on
Mar 24, 2025 • 0 new comments -
[rllib] How to evaluate rollouts when using frame stacking RLModule?
#42931 commented on
Mar 24, 2025 • 0 new comments -
Core: ray.remote raises ValueError when used on torch IterableDataset
#42914 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] Calling AlgorithmConfig.build() for different algorithms inside the same execution context causes hard to debug issues.
#43087 commented on
Mar 24, 2025 • 0 new comments -
Ray with PyInstaller
#27421 commented on
Mar 24, 2025 • 0 new comments -
[Tune][Air] MLFlow Callback is incompatible with PB2
#27783 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] Resuming from checkpoint with DQN and epsilon greedy let timesteps start from 0 again
#28289 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] The “trajectory_view_api” does not support the DQN algorithm, and the program will run in error
#27609 commented on
Mar 24, 2025 • 0 new comments -
[Jobs] Run jobs tests on Windows
#28316 commented on
Mar 24, 2025 • 0 new comments -
[CI] A simple way to reproduce osx/linux/windows CI run failure locally
#29068 commented on
Mar 24, 2025 • 0 new comments -
[Core] inspect_serializability bug - parent object serializable but bound method not
#29423 commented on
Mar 24, 2025 • 0 new comments -
[Core] util.multiprocessing.pool scheduling inefficiencies, blocking behavior in imap and imap_unordered
#29453 commented on
Mar 24, 2025 • 0 new comments -
[Core] Reference leakage somewhere after ray.shutdown()
#30089 commented on
Mar 24, 2025 • 0 new comments -
[Core] util.multiprocessing.pool: imap and imap_unordered blocking on ray.wait even though processes are complete
#29466 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] Undesired memory growing when using convolutional neural network
#29699 commented on
Mar 24, 2025 • 0 new comments -
[Core] Access violation on windows 11 when running modin workload
#30493 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] Using (gym) discrete and box spaces inside dict observation space throws ValueError: Expected flattened obs shape ...
#31525 commented on
Mar 24, 2025 • 0 new comments -
Windows python can not open file default_worker.py path with space
#33047 commented on
Mar 24, 2025 • 0 new comments -
[Core] RFC: simplify CI testing
#34315 commented on
Mar 24, 2025 • 0 new comments -
[Core] Incorrect detection of cpus
#34846 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] `ExternalMultiAgentEnv` yields error on `log_returns` with `multiagent_done_dict`
#35189 commented on
Mar 24, 2025 • 0 new comments -
[core][gpu-objects] Object contains multiple tensors and/or mix of CPU data and GPU tensors
#51274 commented on
Mar 25, 2025 • 0 new comments -
[core][gpu-objects] Garbage collection for in-actor GPU objects
#51262 commented on
Mar 25, 2025 • 0 new comments -
[Core] Deserialization of generic pydantic models
#47840 commented on
Mar 25, 2025 • 0 new comments -
[Core] RayCheck failed: placement_group_resource_manager_->ReturnBundle(bundle_spec) Status not OK
#51124 commented on
Mar 25, 2025 • 0 new comments -
[Autoscaler][V2] Updating max replicas while Pods are pending causes v2 autoscaler to hang
#50868 commented on
Mar 25, 2025 • 0 new comments -
[data] importing ray.data closes logging handlers, breaking custom logging
#48846 commented on
Mar 26, 2025 • 0 new comments -
[Core|Dataset] Ray job stuck with idle actors with no tasks
#45822 commented on
Mar 26, 2025 • 0 new comments -
[Core] Default concurrency using concurrency groups
#46666 commented on
Mar 26, 2025 • 0 new comments -
[Data] Adding streaming capability for `ray.data.Dataset.unique`
#51207 commented on
Mar 26, 2025 • 0 new comments -
[Data] Filter operation changes schema of dataset
#51217 commented on
Mar 26, 2025 • 0 new comments -
Core: Ray cluster nodes underutilization during autoscaling
#47355 commented on
Mar 26, 2025 • 0 new comments -
[RFC] [Serve] Custom Scaling
#41135 commented on
Mar 26, 2025 • 0 new comments -
[Feedback] Feedback for ray + uv
#50961 commented on
Mar 26, 2025 • 0 new comments -
CI test linux://rllib:learning_tests_cartpole_dqn_gpu is flaky
#46683 commented on
Mar 26, 2025 • 0 new comments -
[Serve] Consider custom resources in best-fit node selection for DeploymentScheduler in Ray Serve
#51361 commented on
Mar 26, 2025 • 0 new comments -
[Core] Python 3.13 wheel
#49738 commented on
Mar 26, 2025 • 0 new comments -
[train v2][tune] Migration Guide
#49454 commented on
Mar 26, 2025 • 0 new comments -
CI test linux://rllib:learning_tests_multi_agent_pendulum_sac_multi_gpu is flaky
#47309 commented on
Mar 26, 2025 • 0 new comments -
[Data] - read_parquet raises AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 43
#35826 commented on
Mar 26, 2025 • 0 new comments -
[Data] Progress bars are sometimes half completed even after the task is finished
#36490 commented on
Mar 26, 2025 • 0 new comments -
[Data] Progress Bars are incomplete (notebooks + terminal)
#36181 commented on
Mar 26, 2025 • 0 new comments -
[Data] Ray 2.6 created a breaking change in the index of a Modin DataFrame
#37771 commented on
Mar 26, 2025 • 0 new comments -
[data] ray_tqdm does not work with numba
#45538 commented on
Mar 26, 2025 • 0 new comments -
[Data] Row outputted returns 0
#48484 commented on
Mar 26, 2025 • 0 new comments -
Ray component: Core: PoolActor processes hanging
#24784 commented on
Mar 24, 2025 • 0 new comments -
Ray on kubernetes with custom image_uri is broken
#51423 commented on
Mar 24, 2025 • 0 new comments -
[Autoscaler, data] Ray starts `AutoscalingRequester` even when using `enableInTreeAutoscaling`
#51559 commented on
Mar 24, 2025 • 0 new comments -
[Core] build from source code guide is out of date
#43093 commented on
Mar 25, 2025 • 0 new comments -
[cgraph] Support ray.wait() for CompiledDAGRef
#51391 commented on
Mar 25, 2025 • 0 new comments -
[Workflow] Add Azure as one of the storage backend options?
#34910 commented on
Mar 25, 2025 • 0 new comments -
[Train] Intermittent `UnpicklingError` when loading estimator/preprocessor from checkpoint
#33815 commented on
Mar 25, 2025 • 0 new comments -
[Core] std::bad_alloc error using ray.init()
#33525 commented on
Mar 25, 2025 • 0 new comments -
[RLlib] APPO gets extremely slow when run with >1 GPUs.
#50221 commented on
Mar 25, 2025 • 0 new comments -
[RLlib] Tuner.restore() Not Restoring Training
#43266 commented on
Mar 25, 2025 • 0 new comments -
[BUG] Ray dashboard client failed to build
#23548 commented on
Mar 25, 2025 • 0 new comments -
[Core][Streaming generator] Support num_returns.
#46934 commented on
Mar 25, 2025 • 0 new comments -
[llm] Roadmap for Data and Serve LLM APIs
#51313 commented on
Mar 25, 2025 • 0 new comments -
[Serve] Detailed Analysis of Errors Related to 'Ray does not allocate any GPUs on the driver node' && 'No CUDA GPUs are available'
#51242 commented on
Mar 25, 2025 • 0 new comments -
[Ray dashboard] Actors tab does not list actors under certain conditions
#47447 commented on
Mar 25, 2025 • 0 new comments -
[<Ray component: Core|RLlib|etc...>] Support for Gradio version 4 on Ray Serve
#49245 commented on
Mar 25, 2025 • 0 new comments -
[RFC] Async request support in Ray Serve
#32292 commented on
Mar 25, 2025 • 0 new comments -
[Serve] Observability for proxy
#48184 commented on
Mar 25, 2025 • 0 new comments -
[Serve] ingress decorator does not work with fastapi.APIRouter arg
#50372 commented on
Mar 25, 2025 • 0 new comments -
[Serve] FastAPI ingress does not work with composable routers
#50373 commented on
Mar 25, 2025 • 0 new comments -
[core][gpu-objects] IPC communication for processes on the same GPU
#51270 commented on
Mar 25, 2025 • 0 new comments -
[core][gpu-objects] CollectiveGroupManager
#51260 commented on
Mar 25, 2025 • 0 new comments -
[core][gpu-objects] CollectiveExecutor
#51261 commented on
Mar 25, 2025 • 0 new comments -
[core][gpu-objects] Driver should order all collective calls to avoid deadlock
#51264 commented on
Mar 25, 2025 • 0 new comments -
[RLlib] Silence external warnings
#24107 commented on
Mar 25, 2025 • 0 new comments -
[Core] Spot preemption related retries do not count towards the max retries
#50640 commented on
Mar 22, 2025 • 0 new comments -
[Core] Plugable storage backend besides Redis
#50656 commented on
Mar 22, 2025 • 0 new comments -
[nsys plugin] How about add an option `name` to nsys dumped file
#50711 commented on
Mar 22, 2025 • 0 new comments -
[Ray Core] Slow scheduling speed with IOError: Broken pipe
#50244 commented on
Mar 22, 2025 • 0 new comments -
[Core] ray.util.ActorPool can get stuck in failing state with one bad actor
#50313 commented on
Mar 22, 2025 • 0 new comments -
[compiled graph] Driver cannot participate in the NCCL group
#50423 commented on
Mar 22, 2025 • 0 new comments -
[Core] Negative available resources
#50739 commented on
Mar 22, 2025 • 0 new comments -
[core] Serve microbenchmarks occasionally crash with segfault or invalid memory access
#50802 commented on
Mar 22, 2025 • 0 new comments -
[Core] Ray Data job hanging with flooded Cancelling stale RPC with seqno 125 < 127 error
#50814 commented on
Mar 22, 2025 • 0 new comments -
[Core] calling remote function in `Future` callback breaks ray
#50980 commented on
Mar 22, 2025 • 0 new comments -
[core] question about ray issue: 51051
#51554 commented on
Mar 22, 2025 • 0 new comments -
[Ray component: Python|runtime_env]Pip install `whl` file faliure when a job reruns in the same cluster
#49059 commented on
Mar 22, 2025 • 0 new comments -
[Data]: Categorizer fails with non uniform distributions
#50792 commented on
Mar 22, 2025 • 0 new comments -
[core] ray.init does not work with local_mode on run_time envs.
#30273 commented on
Mar 23, 2025 • 0 new comments -
[RLlib] Basic PPO script throws obscure error when building RLModule
#51333 commented on
Mar 23, 2025 • 0 new comments -
[core] Split giant ray core C++ targets into small ones
#50586 commented on
Mar 24, 2025 • 0 new comments -
[Train] Deepspeed + Triton 3.2.0 + Torch 2.6.0 has issues with Ray Train
#50406 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] Attribute error when trying to compute action after training Multi Agent PPO with New API Stack
#44475 commented on
Mar 24, 2025 • 0 new comments -
[AIR] Sampling support for Ray Train/Ray Data
#31127 commented on
Mar 24, 2025 • 0 new comments -
[Serve/Core] Raylet crash encountered in Serve during Actor termination
#51408 commented on
Mar 24, 2025 • 0 new comments -
[Ray Data | Core ]
#51416 commented on
Mar 24, 2025 • 0 new comments -
Cannot Install ray[rllib] on Python 3.13
#50226 commented on
Mar 24, 2025 • 0 new comments -
for exporting r2d2+lstm to onnx, why is empty state being passed in?
#50166 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] Can't create multi-agent external env with new API stack
#46961 commented on
Mar 24, 2025 • 0 new comments -
[core] Ray session conflicts with PyArrow+HDFS
#36415 commented on
Mar 21, 2025 • 0 new comments -
[Core] Limited interoperability with JAX
#46760 commented on
Mar 21, 2025 • 0 new comments -
[Core] Too many threads in ray worker
#36936 commented on
Mar 21, 2025 • 0 new comments -
[Runtime Environment] Remove cached python libs, working dir etc
#47488 commented on
Mar 21, 2025 • 0 new comments -
[Data] Infinite recursion in ansitowin32.py (under tqdm_ray)
#51337 commented on
Mar 21, 2025 • 0 new comments -
[Data] `read_images` benchmark sometimes fails with `ArrowVariableShapedTensorArray` error
#49883 commented on
Mar 21, 2025 • 0 new comments -
[Core] ray raises a "Failed to unpickle serialized exception" error when an OpenAI Authentication Error is raised in task
#43428 commented on
Mar 21, 2025 • 0 new comments -
[core] Set OPENBLAS_NUM_THREADS to number of cpus automatically
#34724 commented on
Mar 21, 2025 • 0 new comments -
[Core] Getting node id for usage in NodeAffinitySchedulingStrategy
#28195 commented on
Mar 21, 2025 • 0 new comments -
[usability][Feature] Throw error message if resolved ip address doesn't match the localhost
#19052 commented on
Mar 21, 2025 • 0 new comments -
[StateAPI] StateAPI request truncates recent elements
#50378 commented on
Mar 21, 2025 • 0 new comments -
[core] ray.remote Decorator's Return Type Cannot Be Determined by Type Checkers
#50410 commented on
Mar 21, 2025 • 0 new comments -
[Feature] [Performance] [Docs] Disabling object spilling is not documented
#21998 commented on
Mar 21, 2025 • 0 new comments -
[Core] classmethod support for actors
#36986 commented on
Mar 21, 2025 • 0 new comments -
[data] Cannot convert dict to PyArrow blocks
#42075 commented on
Mar 21, 2025 • 0 new comments -
[Core] `ray.cancel` multiple ObjectRefs
#24559 commented on
Mar 21, 2025 • 0 new comments -
[core][cluster launcher] Cluster launcher should use `docker run --gpus` if GPUs are autodetected on the worker node
#43231 commented on
Mar 21, 2025 • 0 new comments -
[core][cluster launcher] Local clusters should stop Ray containers on `ray down`
#43232 commented on
Mar 21, 2025 • 0 new comments -
[Core] Enable huge pages for object store
#51352 commented on
Mar 21, 2025 • 0 new comments -
[Dashboard] Fix listing APIs to avoid truncating at 10k entities
#48251 commented on
Mar 21, 2025 • 0 new comments -
[core] Upgrade grpc (Mar.15, 2025)
#51395 commented on
Mar 21, 2025 • 0 new comments -
[core][gpu-objects] Driver tries to get the data from in-actor store
#51272 commented on
Mar 21, 2025 • 0 new comments -
[RLlib] Possible bug in TorchSquashedGaussian plus associated feature request
#51544 commented on
Mar 21, 2025 • 0 new comments -
[Core] get_user_temp_dir() Doesn't Honor the User Specified Temp Dir
#51218 commented on
Mar 21, 2025 • 0 new comments -
[Core] Please provide better message where 'RuntimeError: Failed to unpickle serialized exception'
#49885 commented on
Mar 21, 2025 • 0 new comments -
[Tune] Cannot run the QuickStart example code on windows after installing Ray in conda enviroment, reporting FileNotFoundError
#46827 commented on
Mar 24, 2025 • 0 new comments -
RLlib: dist_class is missed while I try to use Policy.learn_on_batch()
#47011 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] Flatten observations example doesn't work
#47127 commented on
Mar 24, 2025 • 0 new comments -
Issue after pip install of ray tune
#47266 commented on
Mar 24, 2025 • 0 new comments -
Install Ray version 1.5.2
#46776 commented on
Mar 24, 2025 • 0 new comments -
ray issue
#47177 commented on
Mar 24, 2025 • 0 new comments -
[RLLib] Expected scalars to be on CPU, got cuda:0 instead
#35640 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] PPO Memory Leak on Uneven CNN (conv) filters
#35866 commented on
Mar 24, 2025 • 0 new comments -
[Rllib][Tune] AttributeError: 'TorchCategorical' object has no attribute 'log_prob' with PB2
#35923 commented on
Mar 24, 2025 • 0 new comments -
[Tune] PB2 Checkpoint/Sync Path Compatibility with Windows
#36370 commented on
Mar 24, 2025 • 0 new comments -
[Release process] Validating and uploading wheels is a pain and a error prone
#36522 commented on
Mar 24, 2025 • 0 new comments -
[<Ray component: cluster>] Urllib3 warning messages cannot be blocked in Ray
#36577 commented on
Mar 24, 2025 • 0 new comments -
[JAVA] Ray.init() failed when JNI load
#36637 commented on
Mar 24, 2025 • 0 new comments -
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-68: character maps to <undefined>
#36767 commented on
Mar 24, 2025 • 0 new comments -
Unable to build Ray on Power (Error: key "3.9.16" not found in dictionary)
#37889 commented on
Mar 24, 2025 • 0 new comments -
[RLlib|Tune] Cannot restore tune checkpoints in algorithm
#39785 commented on
Mar 24, 2025 • 0 new comments -
[Core] Ray is slower than serial python
#40184 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] Torch Autoregressive Example does not work with gpu
#38645 commented on
Mar 24, 2025 • 0 new comments -
[Tune] Font color in dark theme Jupyter Notebooks in VS Code
#40317 commented on
Mar 24, 2025 • 0 new comments -
CheckpointConfig does not work on Windows
#37226 commented on
Mar 24, 2025 • 0 new comments -
[PPOConfig] Utilising new API/models without matching documentation
#40201 commented on
Mar 24, 2025 • 0 new comments -
[core] Recent windows test flakiness
#38413 commented on
Mar 24, 2025 • 0 new comments -
[tune] Full /tmp/ folder, ray does not clean up
#41202 commented on
Mar 24, 2025 • 0 new comments -
[Core] StreamingObjectRefGenerator not working over network
#41556 commented on
Mar 24, 2025 • 0 new comments -
Upgrade Windows CI docker image to use Windows 11 and more recent toolchains.
#49830 commented on
Mar 24, 2025 • 0 new comments -
Upgrade windows CI AMI to use Windows 11
#49829 commented on
Mar 24, 2025 • 0 new comments -
[RAY TRAIN] Force use of gloo in Windows
#49778 commented on
Mar 24, 2025 • 0 new comments -
[RLlib][Windows] Windows Invalid Directory Name Error in Ray RLlib
#49477 commented on
Mar 24, 2025 • 0 new comments -
[Data] Transient Parquet Fragment Serialization Error
#49082 commented on
Mar 24, 2025 • 0 new comments -
BUILD: patch zlib for macos and protobuf for windows
#48794 commented on
Mar 24, 2025 • 0 new comments -
[Distributed Debugger] Newly added breakpoint not works: Breakpoint in file that does not exist
#48778 commented on
Mar 24, 2025 • 0 new comments -
[Ray + YOLOv8] YOLOv8 model.tune
#47859 commented on
Mar 24, 2025 • 0 new comments -
[tune] Repeated runs don't get averaged by search algorithm
#47758 commented on
Mar 24, 2025 • 0 new comments -
|RLlib] New API Stack: "local_gpu_idx 0 is not a valid GPU id or is not available."
#47364 commented on
Mar 24, 2025 • 0 new comments -
[Core] Warning message output as error cannot be filtered/hidden; unexposed environmental variable
#43264 commented on
Mar 24, 2025 • 0 new comments -
[BUG] Ray crashes my python process when the connected kernel goes away.
#43280 commented on
Mar 24, 2025 • 0 new comments -
[RLlib + Tune] PermissionError: [WinError 5] Access is denied: '../.tmp_generator' -> '..basic-variant-state-..' while training with ``Tuner``
#43702 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] step() function called too early, could lead to inconsistencies
#44290 commented on
Mar 24, 2025 • 0 new comments -
CI test windows://python/ray/tests:test_actor_retry is flaky
#43845 commented on
Mar 24, 2025 • 0 new comments -
[<Ray component: syncer.py.>] Last sync command failed with the following error
#44320 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] make_multi_callbacks with new API stack error
#44386 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] EagerTFPolicyV2 wrongly calls overridden action_sampler_fn
#44671 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] New API Stack: Action masking not working with wrapper, default encoder config issue
#44780 commented on
Mar 24, 2025 • 0 new comments -
Ray Weekly Release
#44276 commented on
Mar 24, 2025 • 0 new comments -
[RLlib] TypeError converting batch (INFOS) to torch tensor with ConnectorV2
#44478 commented on
Mar 24, 2025 • 0 new comments -
tmp directory path issue between Windows client and Linux Ray cluster head
#45010 commented on
Mar 24, 2025 • 0 new comments -
Ray tune on Mac M2/M1 never stop
#45797 commented on
Mar 24, 2025 • 0 new comments -
[<Ray component: Serve>] Worker node is killed after starting with reason of missing too many heartbeat checks
#46548 commented on
Mar 24, 2025 • 0 new comments -
[tune] Can't find driver_artifacts file
#46607 commented on
Mar 24, 2025 • 0 new comments -
[WIP][Feature commit] Initial commit for supporting IPv6 stack in Ray Clus…
#40332 commented on
Mar 27, 2025 • 0 new comments -
[core] Tidying up mmap and munmap a bit.
#40334 commented on
Mar 26, 2025 • 0 new comments -
[Ray Train] Implement strict even-split of training workers for pretraining
#40442 commented on
Mar 26, 2025 • 0 new comments -
Add dolly v2 instruction tuning ray train
#40455 commented on
Mar 26, 2025 • 0 new comments -
[Doc] Add note in `ray submit` doc to recommend Ray Job API
#40500 commented on
Mar 26, 2025 • 0 new comments -
[dashboard] ignore reinit error when getting dashboard url
#40545 commented on
Mar 26, 2025 • 0 new comments -
[tune] link placement group doc
#40590 commented on
Mar 26, 2025 • 0 new comments -
[Train] Support rank_zero_only uploading for Lightning RayTrainReportCallback
#40639 commented on
Mar 26, 2025 • 0 new comments -
[Core] Add observability support to AcceleratorManager
#40749 commented on
Mar 26, 2025 • 0 new comments -
[core] Fix windows conda activate with conda.bat as executable in conda path
#40779 commented on
Mar 26, 2025 • 0 new comments -
[Serve] Fix Windows unit tests
#40812 commented on
Mar 27, 2025 • 0 new comments -
[Core] [Cluster Launcher] Rename min/max_workers to min/max_worker_nodes
#40835 commented on
Mar 26, 2025 • 0 new comments -
[Doc] Clarify that a recent version of nsight is needed
#40846 commented on
Mar 26, 2025 • 0 new comments -
[Core] Add logs upon abrupt failure code path
#40849 commented on
Mar 26, 2025 • 0 new comments -
[docs] Documentation fixes (logging and profiling)
#40915 commented on
Mar 26, 2025 • 0 new comments -
[RFC v3] Ray Client2
#40990 commented on
Mar 26, 2025 • 0 new comments -
[WIP] accelerated DAG
#40991 commented on
Mar 26, 2025 • 0 new comments -
WIP Do Not Merge
#41025 commented on
Mar 26, 2025 • 0 new comments -
Adapt the joblib backend for compatibility with `return_as=generator`
#41028 commented on
Mar 26, 2025 • 0 new comments -
TPU pod autoscaling based on the TpuCommandRunner
#41065 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Add SUPER algorithm.
#41079 commented on
Mar 26, 2025 • 0 new comments -
[docs][Serve] add text about pip-pack installation
#41088 commented on
Mar 26, 2025 • 0 new comments -
[Core] mig auto detection
#41103 commented on
Mar 26, 2025 • 0 new comments -
WIP recharts custom charting library
#41140 commented on
Mar 26, 2025 • 0 new comments -
Fix docker gpu 2
#42426 commented on
Mar 26, 2025 • 0 new comments -
Release test sort.regular failed
#50417 commented on
Mar 27, 2025 • 0 new comments -
[docs] try enabling nitpicky
#39448 commented on
Mar 26, 2025 • 0 new comments -
Fix for appo_torch_policy.py when used with attention_net
#39520 commented on
Mar 26, 2025 • 0 new comments -
[CherryPick][Serve] Ignore cancel request when receving websocket.accept message (#39413)
#39625 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Issue 38560: New API (Learner) stack does not properly count steps trained/sampled.
#39628 commented on
Mar 26, 2025 • 0 new comments -
[DEBUG] metric
#39659 commented on
Mar 26, 2025 • 0 new comments -
Upgrade default AWS DLAMI
#39721 commented on
Mar 26, 2025 • 0 new comments -
[RLLib-contrib] Implementation of RED-Q (Ensemble SAC) Algorithm in PyTorch
#39747 commented on
Mar 26, 2025 • 0 new comments -
[core] Use futures to synchornize `parallel_memcopy`
#39755 commented on
Mar 26, 2025 • 0 new comments -
fix: add cython async detection
#39762 commented on
Mar 26, 2025 • 0 new comments -
[core][RFC] http based pure external client
#39771 commented on
Mar 26, 2025 • 0 new comments -
[Data] Consolidate default fault tolerance options
#39797 commented on
Mar 26, 2025 • 0 new comments -
[core] Fix a corner case where the RPC never return
#39801 commented on
Mar 26, 2025 • 0 new comments -
[Cluster launcher] Make cluster Ray version match client Ray version by default
#39812 commented on
Mar 26, 2025 • 0 new comments -
[core] Default actor object's callable method to ActorMethod.remote
#39826 commented on
Mar 26, 2025 • 0 new comments -
[Logging] Fix Deduplication URL
#39830 commented on
Mar 26, 2025 • 0 new comments -
[data] Fix map_batches on datasets with nested lists
#39869 commented on
Mar 26, 2025 • 0 new comments -
[CI] [Doc] Add reminder to install setup hooks for linter
#39888 commented on
Mar 26, 2025 • 0 new comments -
[Cluster launcher] disable verbose logs by default
#39930 commented on
Mar 26, 2025 • 0 new comments -
static DAGs
#39956 commented on
Mar 26, 2025 • 0 new comments -
[data] Make exceptions consistent when falling back to pandas
#39969 commented on
Mar 26, 2025 • 0 new comments -
[core][RFC v2] HTTP based Ray Client
#40085 commented on
Mar 26, 2025 • 0 new comments -
RFC: Add Julia Language support
#40098 commented on
Mar 26, 2025 • 0 new comments -
[train] update some imports from ray.air to ray.train
#40171 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Adding action space scaling to Gaussian noise in exploration
#40281 commented on
Mar 26, 2025 • 0 new comments -
[WIP][Streaming] Cleaning up streaming sequence
#42443 commented on
Mar 26, 2025 • 0 new comments -
WIP [data] A streaming compatible implementation of repartition-by-column
#42477 commented on
Mar 26, 2025 • 0 new comments -
[tune] remove tensorboardX upper bound
#42581 commented on
Mar 26, 2025 • 0 new comments -
allow victoria metrics response message
#42620 commented on
Mar 26, 2025 • 0 new comments -
[WIP][Serve] Revisited cancellation handling in Proxy to make sure response generator is properly cancelled
#42665 commented on
Mar 26, 2025 • 0 new comments -
[DO NOT REVIEW, LONG TERM PR FOR CI] Pinterest main branch 2.9.1
#42672 commented on
Mar 26, 2025 • 0 new comments -
[WIP] Experimental batching support in Streaming Generators
#42825 commented on
Mar 26, 2025 • 0 new comments -
Fix None exception in evaluate.
#42858 commented on
Mar 26, 2025 • 0 new comments -
[Data] Block compression for ArrowBlock
#42859 commented on
Mar 27, 2025 • 0 new comments -
reduce lock mutex scope
#43067 commented on
Mar 26, 2025 • 0 new comments -
[Core/Accelerated DAG] Support Gloo-based backend using Ray collective group.
#43096 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] ConnectorV2 API: Add heuristic action logits mixin example script.
#43107 commented on
Mar 26, 2025 • 0 new comments -
[Core] ray.remote raises ValueError when used on torch IterableDataset
#43117 commented on
Mar 26, 2025 • 0 new comments -
[docs] Add antipattern for nested ray.get
#43184 commented on
Mar 26, 2025 • 0 new comments -
[docs][clusters] Improve instructions for GPU autodetection and manual cluster launching
#43219 commented on
Mar 26, 2025 • 0 new comments -
Release performance regression 2.9.2/2.9.3
#43235 commented on
Mar 26, 2025 • 0 new comments -
[Build] Add build for RH
#43335 commented on
Mar 26, 2025 • 0 new comments -
[WIP] Use 3.9 in macos.
#43351 commented on
Mar 26, 2025 • 0 new comments -
[WIP] prepend pickled function co_fileanme with "<ray remote>"
#43359 commented on
Mar 26, 2025 • 0 new comments -
verify windows wheels.
#43442 commented on
Mar 26, 2025 • 0 new comments -
[Doc] Add RAY_REDIS_CA_CERT description for GCS fault tolerance
#43478 commented on
Mar 26, 2025 • 0 new comments -
[Core][CLI] fix ray status long decimal numbers
#43480 commented on
Mar 26, 2025 • 0 new comments -
[core] Fix max_calls option when used on a worker that is part of a workflow
#43700 commented on
Mar 26, 2025 • 0 new comments -
[WIP] Core worker shutdown then disconnect
#43759 commented on
Mar 26, 2025 • 0 new comments -
[Core] gpu memory scheduling prototype
#41147 commented on
Mar 26, 2025 • 0 new comments -
[RFC] Ray Client2 with original Ray APIs
#41323 commented on
Mar 26, 2025 • 0 new comments -
[core] Refactors the delayed task resubmission.
#41351 commented on
Mar 26, 2025 • 0 new comments -
Relax check_version_info to check for bytecode compatibility
#41373 commented on
Mar 26, 2025 • 0 new comments -
Update metrics.py
#41385 commented on
Mar 27, 2025 • 0 new comments -
[Core] Return the correct task ID when get_runtime_context is used in a background thread
#41397 commented on
Mar 26, 2025 • 0 new comments -
[serve] Adjust the doc of the Serve Java API
#41398 commented on
Mar 26, 2025 • 0 new comments -
[core] Vendor aiohttp and aiosignal for Ray.
#41426 commented on
Mar 26, 2025 • 0 new comments -
[train] update XGBoost model format to UBJ
#41442 commented on
Mar 26, 2025 • 0 new comments -
Feat/metric validation
#41478 commented on
Mar 26, 2025 • 0 new comments -
[Cluster Launcher] Update head node commands to refer to which node they can be run from
#41490 commented on
Mar 26, 2025 • 0 new comments -
docs: add user guide on KubeRay webhooks
#41527 commented on
Mar 26, 2025 • 0 new comments -
[Core] Track resource per instance [1/n]
#41582 commented on
Mar 26, 2025 • 0 new comments -
[doc] Add documentation guide for MPI on Ray.
#41626 commented on
Mar 26, 2025 • 0 new comments -
[RFC v4] Ray Client2.
#41803 commented on
Mar 26, 2025 • 0 new comments -
[Dashboard] Should specify the time range in job detail page for load the cluster status and scale metrics
#41828 commented on
Mar 26, 2025 • 0 new comments -
add disk throughput test
#41882 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Significant performance improvement with curriculum learning when using a high number of rollout workers
#41910 commented on
Mar 26, 2025 • 0 new comments -
[WIP] Multinode dag
#42059 commented on
Mar 26, 2025 • 0 new comments -
Fix a bug where start_time could be None leading to a crash in TuneTerminalReporter
#42078 commented on
Mar 26, 2025 • 0 new comments -
[draft] 100tb shuffle experimental
#42086 commented on
Mar 26, 2025 • 0 new comments -
[WIP] Multinode DAG minus FT changes
#42173 commented on
Mar 26, 2025 • 0 new comments -
[Tune][Air] Fix MLflowLoggerCallback to enable its use with PBT (#27783)
#42182 commented on
Mar 26, 2025 • 0 new comments -
[rllib_contrib] Lagrangian PPO
#42365 commented on
Mar 26, 2025 • 0 new comments -
[WIP][Tracing] Fixing tracing context injection
#42384 commented on
Mar 26, 2025 • 0 new comments -
Release test sort.chaos failed
#49765 commented on
Mar 27, 2025 • 0 new comments -
Release test random_shuffle.chaos failed
#49395 commented on
Mar 27, 2025 • 0 new comments -
Release test random_shuffle.regular failed
#49383 commented on
Mar 27, 2025 • 0 new comments -
[Train] Crash at end of training
#51527 commented on
Mar 27, 2025 • 0 new comments -
[<Ray component: Core>] num_gpus not working with ROCM devices
#46563 commented on
Mar 27, 2025 • 0 new comments -
[Data] ray.data.from_torch fails on datasets with variable shaped images
#50229 commented on
Mar 27, 2025 • 0 new comments -
CI test linux://python/ray/tune:test_train_v2_integration is flaky
#49930 commented on
Mar 27, 2025 • 0 new comments -
[Core] Ray Label Selector API Implementation Tracker
#51564 commented on
Mar 27, 2025 • 0 new comments -
CI test windows://python/ray/tests:test_actor_retry2 is flaky
#47415 commented on
Mar 27, 2025 • 0 new comments -
CI test windows://python/ray/tests:test_reference_counting_2 is flaky
#45964 commented on
Mar 27, 2025 • 0 new comments -
[train] add model (pipeline) parallelism example
#22894 commented on
Mar 26, 2025 • 0 new comments -
[Air] add Jax trainer
#25385 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] Fix Issue #25316: unconfigurable `dist_dim` for custom multi-action distributions
#25490 commented on
Mar 26, 2025 • 0 new comments -
[Core] Allow task retry for `ray.cancel`
#26254 commented on
Mar 26, 2025 • 0 new comments -
[RLLib][Air] MLFlow parsing of RLLib evaluation and custom metrics
#26711 commented on
Mar 26, 2025 • 0 new comments -
[State API] Add input & output size to the task API
#31898 commented on
Mar 26, 2025 • 0 new comments -
[core][bugfix] catch BaseException
#32105 commented on
Mar 26, 2025 • 0 new comments -
[core] Add support for one log per worker pool worker
#33167 commented on
Mar 26, 2025 • 0 new comments -
[Core] Allow user to specify DASHBOARD_AGENT_LISTEN_PORT
#34886 commented on
Mar 26, 2025 • 0 new comments -
Refactor run_release_test.sh
#35065 commented on
Mar 26, 2025 • 0 new comments -
[Serve] Increase the GCS timeout
#35330 commented on
Mar 26, 2025 • 0 new comments -
UX: rework stdout error when ray fails to start
#35378 commented on
Mar 26, 2025 • 0 new comments -
[Dashboard] Provide a job dashboard URL link instead of the dashboard link when ray.init is called.
#35427 commented on
Mar 26, 2025 • 0 new comments -
[1/2] [UI] Event Observability, add a new event table
#38638 commented on
Mar 27, 2025 • 0 new comments -
[Data] A proper way to handle random seed in random_sample() and other sampling for reproducibility
#50638 commented on
Mar 26, 2025 • 0 new comments -
[Ray Data] Introduce "Key concepts" to Ray Data doc
#50018 commented on
Mar 26, 2025 • 0 new comments -
[Data] Refactor `ParquetDatasink._write_partition_files` to use `pyarrow.parquet.write_to_dataset`
#50502 commented on
Mar 26, 2025 • 0 new comments -
[Data]Extend Ray Data with read/write hive
#51094 commented on
Mar 26, 2025 • 0 new comments -
[Data] support passing `pyarrow.dataset.Expression`s to `Dataset.filter`'s `expr`
#50799 commented on
Mar 26, 2025 • 0 new comments -
[data] RefBundle doesn't always eagerly free data
#37910 commented on
Mar 26, 2025 • 0 new comments -
[Data] Add support for all Spark RDD transformations and actions
#10983 commented on
Mar 26, 2025 • 0 new comments -
[Data, Train] ray::SplitCoordinator is very slow at every epoch + takes up too much memory
#49190 commented on
Mar 26, 2025 • 0 new comments -
[Ray Data] Support S3 Tables
#49083 commented on
Mar 26, 2025 • 0 new comments -
CI test linux://rllib:examples/connectors/mean_std_filtering_ppo is flaky
#47435 commented on
Mar 26, 2025 • 0 new comments -
[Ray debugger] Unable to use debugger on slurm cluster
#51157 commented on
Mar 27, 2025 • 0 new comments -
[Core] Runtime env working_dir validation
#51380 commented on
Mar 27, 2025 • 0 new comments -
[<Ray component: java>] expose ObjectRef in DeploymentResponse class
#51445 commented on
Mar 27, 2025 • 0 new comments -
[Ray Serve]: "RuntimeError: No CUDA GPUs are available" when running vllm with ray
#51193 commented on
Mar 27, 2025 • 0 new comments -
[Serve] On kuberay, vLLM-0.7.2 reports "No CUDA GPUs are available" while vllm-0.6.6.post1 works fine when deploy rayservice
#51154 commented on
Mar 27, 2025 • 0 new comments -
[Ray Serve] Expose public interface for user to customize the router
#50465 commented on
Mar 27, 2025 • 0 new comments -
[Serve] Proxy actor not started on worker node when using kuberay
#50349 commented on
Mar 27, 2025 • 0 new comments -
[Serve] make various default values of `AutoscalingConfig.max_replicas` consistent and >1
#50222 commented on
Mar 27, 2025 • 0 new comments -
[Serve] exceptions raised by request timeout are inconsistent
#50992 commented on
Mar 27, 2025 • 0 new comments -
CI test darwin://python/ray/tests:test_threaded_actor is flaky
#44663 commented on
Mar 27, 2025 • 0 new comments -
[Core/Data] Name GPU Worker Processes
#40529 commented on
Mar 27, 2025 • 0 new comments -
how to solve this problem
#50721 commented on
Mar 27, 2025 • 0 new comments -
[distributed debugger] exception in regular remote worker function leading to access violation when debugger connects
#51010 commented on
Mar 27, 2025 • 0 new comments -
raylet exited immediately because dashboard agent failed
#49162 commented on
Mar 27, 2025 • 0 new comments -
RLlib: beta1 as a Tensor is not supported for capturable=False and foreach=True
#51560 commented on
Mar 27, 2025 • 0 new comments -
[2/2] [API] Event Observability, add a new event table
#38708 commented on
Mar 26, 2025 • 0 new comments -
[ci][release][core] rewrite RuntimeEnvAgentClient with reusable TCP connection, also test_many_runtime_envs.py with env vars
#38772 commented on
Mar 26, 2025 • 0 new comments -
[TEST] DEBUG
#38798 commented on
Mar 26, 2025 • 0 new comments -
[Data] Add `read_delta` API to read Delta format files
#38813 commented on
Mar 26, 2025 • 0 new comments -
[Core][Label scheduling 8/n]Add length and illegal letters validation to the node labels
#38824 commented on
Mar 26, 2025 • 0 new comments -
Another debug
#38842 commented on
Mar 26, 2025 • 0 new comments -
[Serve] Add more log in the router init step
#38933 commented on
Mar 27, 2025 • 0 new comments -
updates to setup-dev.py to work around the types.py import issues
#38948 commented on
Mar 27, 2025 • 0 new comments -
[WIP][Core] Unflake actor-cancel-test
#38975 commented on
Mar 27, 2025 • 0 new comments -
[WIP] Streaming Generator + actor task lineage reconstruction
#38982 commented on
Mar 27, 2025 • 0 new comments -
[Test] Fix torch dist nccl test
#38986 commented on
Mar 26, 2025 • 0 new comments -
[experiment] rewrite PythonGcsClient with GcsClient
#39010 commented on
Mar 26, 2025 • 0 new comments -
[spark] Improve Ray node memory config calculation logic
#39149 commented on
Mar 27, 2025 • 0 new comments -
[RLlib] Support terminated and truncated in ExternalMultiAgentEnv
#39175 commented on
Mar 27, 2025 • 0 new comments -
[RLlib] DreamerV3: Fix restore from checkpoint functionality
#39209 commented on
Mar 26, 2025 • 0 new comments -
chore: update stale link and comment in tracing_helper.py
#39239 commented on
Mar 27, 2025 • 0 new comments -
Yuming test
#39242 commented on
Mar 26, 2025 • 0 new comments -
[Core] Fix get_next_unordered and get_next
#39250 commented on
Mar 26, 2025 • 0 new comments -
[Core][Observability] Add the scheduling_strategy field to the ActorInfo for the "get actor info" API
#39256 commented on
Mar 26, 2025 • 0 new comments -
Test p
#39297 commented on
Mar 26, 2025 • 0 new comments -
fix typos in router.py
#39301 commented on
Mar 26, 2025 • 0 new comments -
[templates/04_finetuning_llms_with_deepspeed] pin transformers to 4.31.0
#39372 commented on
Mar 26, 2025 • 0 new comments -
[Serve][Debug] websocket test
#39389 commented on
Mar 26, 2025 • 0 new comments -
Update pettingzoo_env.py
#39431 commented on
Mar 26, 2025 • 0 new comments -
[WIP][Core]Add batch remote api for batch submit actor task
#35597 commented on
Mar 26, 2025 • 0 new comments -
[Tune] Add optimizer kwargs for `SkOptSearch`
#36041 commented on
Mar 26, 2025 • 0 new comments -
[autoscaler v2][6/n] introduce instance manager
#36066 commented on
Mar 26, 2025 • 0 new comments -
add setting s3 endpoint-url via env var
#36114 commented on
Mar 27, 2025 • 0 new comments -
Revert "Revert "[Core] Support Arrow zerocopy serialization in object…
#36153 commented on
Mar 27, 2025 • 0 new comments -
[RLlib] Update `check_env` in env.py
#36463 commented on
Mar 27, 2025 • 0 new comments -
[build_base] coroutine cpp
#36513 commented on
Mar 26, 2025 • 0 new comments -
[ci] remove is_automated_build in setup.py
#36547 commented on
Mar 26, 2025 • 0 new comments -
[RLlib] fix PPOConfig warning
#36595 commented on
Mar 27, 2025 • 0 new comments -
[RLlib] fix custom policy examples
#36600 commented on
Mar 27, 2025 • 0 new comments -
[RLlib] Fix A3C use_critic in `rllib_contrib`
#36613 commented on
Mar 27, 2025 • 0 new comments -
WIP python protobuf removal
#36856 commented on
Mar 26, 2025 • 0 new comments -
Add Bazel Steward for dependency management
#36863 commented on
Mar 26, 2025 • 0 new comments -
[Runtime Env] working dir refactor
#36953 commented on
Mar 26, 2025 • 0 new comments -
revert fix: pin libffi=3.3 for base-deps #33294
#37088 commented on
Mar 26, 2025 • 0 new comments -
[core] Remove grpc ClientCallTag
#37140 commented on
Mar 26, 2025 • 0 new comments -
[dag] Show both lib dependency installation instructions on import failure.
#37236 commented on
Mar 27, 2025 • 0 new comments -
obj scale down
#37687 commented on
Mar 26, 2025 • 0 new comments -
[Core] Network benchmark ip
#37810 commented on
Mar 27, 2025 • 0 new comments -
[Data] Add support for ORC format
#37891 commented on
Mar 27, 2025 • 0 new comments -
Enable mixed docker + non-docker clusters
#37968 commented on
Mar 26, 2025 • 0 new comments -
[autoscaler] Use `bash` instead of `/bin/bash`
#38105 commented on
Mar 26, 2025 • 0 new comments -
[tune] Fix error when move file with different disk types
#38403 commented on
Mar 27, 2025 • 0 new comments -
Add Apple silicon GPU(mps) support to ray
#38464 commented on
Mar 26, 2025 • 0 new comments -
fix
#38623 commented on
Mar 26, 2025 • 0 new comments