Pulse · ray-project/ray · GitHub

June 15, 2025 – June 22, 2025

Overview

138 Active pull requests

82 Active issues

1 Release published by 1 person

ray-2.47.1 Ray-2.47.1
published Jun 18, 2025

77 Pull requests merged by 38 people

[Serve.llm][P/D] Fix health check in prefill disagg
#53937 merged Jun 22, 2025
[Test][KubeRay] Update KubeRay version to v1.4.0 for autoscaler tests
#53974 merged Jun 22, 2025
[core] Fix ActorClass.remote return typing and expose Actor class methods to static analysis
#53986 merged Jun 21, 2025
[core] Use core worker client pool in GCS
#53654 merged Jun 21, 2025
[core] Revert container tests to medium size instance
#53966 merged Jun 21, 2025
Fix ray import error when both ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES are set
#53757 merged Jun 20, 2025
[core] Making NodeManager use ILocalTaskManager instead of TaskManager.
#53961 merged Jun 20, 2025
defer loading csat so gtag loads first
#53968 merged Jun 20, 2025
fix ga4 events
#53967 merged Jun 20, 2025
[train][template] Remove clock emoji which does not always render well
#53965 merged Jun 20, 2025
[core][gpu-objects] Support ray.get on the driver process for GPU objects
#53902 merged Jun 19, 2025
[kuberay] Update helm install command in prometheus doc to set serviceMonitor release=prometheus
#53952 merged Jun 19, 2025
[Docs] Fix async code in serving notebook
#53864 merged Jun 19, 2025
[core][rocm] Allow CUDA_VISIBLE_DEVICS and HIP_VISIBLE_DEVICES
#53531 merged Jun 19, 2025
[train][template] Pip install with python block instead
#53928 merged Jun 19, 2025
[Data] Refactor Planner to avoid storing plan-specific state
#53955 merged Jun 19, 2025
[core] Avoid unnecessary deserialization/serialization of CallerWorkerId
#53939 merged Jun 19, 2025
[serve] add ability to track child requests
#53941 merged Jun 19, 2025
[Doc][KubeRay] Add a doc for scheduler plugins
#53846 merged Jun 19, 2025
[core][telemetry/08] record counter metric e2e
#53449 merged Jun 19, 2025
[HashShuffle] - Add warnings for when there are insufficient resources for Aggregators
#53705 merged Jun 19, 2025
[Data] Join release tests
#53903 merged Jun 19, 2025
[docs][Serve] Add clarification for health check and FT of serve deployments
#53944 merged Jun 19, 2025
fastapi and streaming tests use get applications api
#53949 merged Jun 19, 2025
[RLlib; docs] Fix docstring example for custom MultiRLModule with shared encoder.
#53912 merged Jun 19, 2025
[Data] Prevent filename collisions on write
#53890 merged Jun 19, 2025
[data] fix flakey schema
#53901 merged Jun 19, 2025
[Data] Fixed BlockMetadata derivation for Read operator
#53908 merged Jun 19, 2025
[core] Fix flaky test_worker_exit_intended_user_exit
#53909 merged Jun 19, 2025
fix the bash code run error in notebook
#53900 merged Jun 19, 2025
[Docs] Fix issues with e2e audio tutorial
#53932 merged Jun 19, 2025
[train] Cleanups for training ingest benchmark
#53684 merged Jun 19, 2025
[train] add proper filtering to metrics
#53788 merged Jun 18, 2025
[cgraph] Avoid depending on torch CPU module for CPU-only actor
#53849 merged Jun 18, 2025
[train] expose training input/output in callbacks
#53869 merged Jun 18, 2025
Skip test_metrics_agent_with_open_telemetry on mac
#53917 merged Jun 18, 2025
[Docs] Add ServiceMonitor section and make some step optional in Grafana & Promethus page
#53474 merged Jun 18, 2025
[Docs][KubeRay] Update KubeRay operator installation references for all docs
#53885 merged Jun 18, 2025
[Core] Support AMD GPU MI3xx product line
#51802 merged Jun 18, 2025
[Doc][KubeRay] Update KubeRay operator installation reference
#53842 merged Jun 18, 2025
[Docs][KubeRay] Fix RayJob quickstart doc step 9 error
#53887 merged Jun 18, 2025
[Core] Use fd instead of handle for windows log redirection
#53852 merged Jun 18, 2025
Add dashboard visualizations for TPU metrics
#53898 merged Jun 18, 2025
[ObjectStore] Warn if object store is allocated < 50% of total memory for data workloads
#53857 merged Jun 18, 2025
[Data] Deprecate use_polars flag
#53867 merged Jun 17, 2025
[data] split test_all_to_all.py
#53865 merged Jun 17, 2025
add missing configs for object detection template
#53895 merged Jun 17, 2025
[core] Remove hardcoded flaky tests
#53888 merged Jun 17, 2025
[Serve][LLM] Simplify _prepare_engine_config()
#53704 merged Jun 17, 2025
[core][gpu-objects] Fix test_gpu_objects_nccl.py
#53874 merged Jun 17, 2025
[RLlib] MetricsLogger: Fix get/set_state to handle tensors in self.values.
#53514 merged Jun 17, 2025
[Data] Improve handling of mismatched columns
#53861 merged Jun 17, 2025
Fix pickle error with remote code models in vLLM Ray worker process
#53815 merged Jun 17, 2025
[train][template] Remove ineffective post build script and pip install instead
#53822 merged Jun 17, 2025
[core][gpu objects] Integrate single-controller collective APIs with GPU objects
#53720 merged Jun 16, 2025
[Data] Improve handling of pandas.NA
#53859 merged Jun 16, 2025
[devx] Fix 'uv run' command line parsing
#53838 merged Jun 16, 2025
[Data] Improve read_text trailing newline semantics
#53860 merged Jun 16, 2025
[Serve.llm][P/D] Support separate deployment config for PDProxy in Prefill disagg
#53821 merged Jun 16, 2025
[Doc][KubeRay] Remove vllm-rayservice.md and use Ray Serve LLM instead
#53844 merged Jun 16, 2025
add api to get application url
#53796 merged Jun 16, 2025
[Doc][KubeRay] Remove very old ResNet benchmark example
#53839 merged Jun 16, 2025
[release] Fix release tests
#53855 merged Jun 16, 2025
[Serve.llm] Disable TP=2 VLM batch test
#53825 merged Jun 16, 2025
[Doc][Fix] reveal the falsely hidden export command in the KubeRay GCS FT guide
#53832 merged Jun 16, 2025
[core][gpu-objects] Support intra-process communication
#53798 merged Jun 16, 2025
[Doc][KubeRay] Remove very old XGBoostTrainer example
#53837 merged Jun 16, 2025
[core] Release resources only after tasks have stopped executing
#53660 merged Jun 16, 2025
[core] Deflake test_multiprocessing.py
#53802 merged Jun 16, 2025
[core] Fix test_object_spilling.py on Windows
#53851 merged Jun 16, 2025
[KubeRay] Remove unused YAMLs
#53840 merged Jun 16, 2025
[chore] Change file mode of rayservice-no-ray-serve-replica.md from 755 to 644
#53843 merged Jun 16, 2025
fix AggregateFnV2 doc to state finalize instead of _finalize
#53835 merged Jun 16, 2025
[core] Fix GCS subscribers map race condition
#53781 merged Jun 16, 2025
[core] deleting unused code from plasma client
#53814 merged Jun 16, 2025
[core] Fix race condition in raylet graceful shutdown
#53762 merged Jun 16, 2025
[serve] Revert request timeout from serve instance fixtures
#53809 merged Jun 16, 2025

61 Pull requests opened by 37 people

[core] Ungracefully exit if the agent dies unexpectedly
#53847 opened Jun 16, 2025
[RLlib] Mixin Layer Design Sketch Up
#53850 opened Jun 16, 2025
[core] Fix comment
#53853 opened Jun 16, 2025
[Docs] Add image tag to `rayproject/ray-ml`
#53854 opened Jun 16, 2025
[core] adding additional stats to the dump object store usage api.
#53856 opened Jun 16, 2025
[core] Cleanup naming in core worker scheduling queues
#53858 opened Jun 16, 2025
[core] Sleep to debug container test
#53862 opened Jun 16, 2025
[core] Don't queue in flight submissions by attempt number
#53866 opened Jun 16, 2025
Feat/ray serve middleware support
#53868 opened Jun 17, 2025
Pass parameters to custom routers through LLMConfig
#53870 opened Jun 17, 2025
test: refactor `test_observability_helpers`
#53875 opened Jun 17, 2025
[dashboard] Support to overwrite the _client_max_size of http request entity
#53880 opened Jun 17, 2025
[doc][core] fix reStructuredText formatting on Resources page
#53882 opened Jun 17, 2025
[Docs][KubeRay] Update all KubeRay version references for KubeRay 1.4.0 release
#53884 opened Jun 17, 2025
[Docs][KubeRay] Update changes from KubeRay 1.3.2 to 1.4.0
#53886 opened Jun 17, 2025
[ci] add python 3.13 ray docker image build
#53894 opened Jun 17, 2025
Bump gradio from 3.50.2 to 5.31.0 in /python/requirements
#53899 opened Jun 17, 2025
python depsets tool
#53904 opened Jun 18, 2025
[core] Move inner_publisher logic into gcsPublisher
#53905 opened Jun 18, 2025
[Core] Remove Unnecessary Checks in GRPC Server Shutdown Process
#53910 opened Jun 18, 2025
[WIP][core][gpu-objects] GC
#53911 opened Jun 18, 2025
[RLlib] Add missing colon to CUBLAS_WORKSPACE_CONFIG
#53913 opened Jun 18, 2025
[kuberay] log actionable err msg when required TPU node selectors missing
#53914 opened Jun 18, 2025
[RLlib] Add missing documentation for SACConfig's training()
#53918 opened Jun 18, 2025
[core][telemetry/12] record histogram metric e2e
#53927 opened Jun 18, 2025
Update deletion policy for rayjob quick start
#53929 opened Jun 18, 2025
[Data] - write_parquet enable both partition by & min_rows_per_file, max_rows_per_file
#53930 opened Jun 18, 2025
[core][telemetry/13] performance tests
#53931 opened Jun 18, 2025
[serve] move test from test_grpc to test_proxy
#53933 opened Jun 18, 2025
[core] Fix race condition b/w object eviction & repinning for recovery.
#53934 opened Jun 18, 2025
[core][GPU objects] Attach tensor transport to task args protobuf
#53935 opened Jun 18, 2025
Bump urllib3 from 1.26.19 to 2.5.0 in /python
#53936 opened Jun 18, 2025
[V2][Autoscaler] Transition ALLOCATED instances to RAY_STOPPPED when enforcing max nodes
#53938 opened Jun 18, 2025
[Data] Replaced `get_object_locations` with `get_local_object_locations`
#53942 opened Jun 19, 2025
[doc][kuberay] state `rayStartParams` is optional starting with KubeRay 1.4.0
#53943 opened Jun 19, 2025
[doc][kuberay] add version skew warning for plugin and RayCluster
#53950 opened Jun 19, 2025
[core] Fix GCS crash on duplicate MarkJobFinished RPCs due to network failures
#53951 opened Jun 19, 2025
[data] remove schema from release tests
#53956 opened Jun 19, 2025
[serve.llm] Prefix aware router eviction thread improvements
#53957 opened Jun 19, 2025
[core] improve assertion check in test_task_metrics
#53958 opened Jun 19, 2025
Can wins01
#53959 opened Jun 19, 2025
[core] disable so_reuseaddr when creating grpc server
#53960 opened Jun 19, 2025
[data] remove operator_fusion_benchmark
#53962 opened Jun 19, 2025
[train] Fix release test missing data key
#53963 opened Jun 19, 2025
finishing commit for issue #52113
#53964 opened Jun 19, 2025
tune: make Tune status/progress tables readable in dark mode
#53969 opened Jun 20, 2025
[Data] Fixing PA overflow handling
#53971 opened Jun 20, 2025
docs(data): fix broken Parameters table
#53972 opened Jun 20, 2025
[core] Rename `GcsFunctionManager` and use fake in test
#53973 opened Jun 20, 2025
[core] Fix flaky `test_state_api`
#53975 opened Jun 20, 2025
Revert "[train] Cleanups for training ingest benchmark"
#53979 opened Jun 20, 2025
[Serve.llm] Remove ImageRetriever class and related tests from the LLM deployment module.
#53980 opened Jun 20, 2025
Feature/sac discrete
#53982 opened Jun 20, 2025
[Data] Fix ActorPool autoscaler to properly scale up
#53983 opened Jun 20, 2025
[CI][KubeRay] Update KubeRay CI Tests branch for KubeRay v1.4.0 release
#53984 opened Jun 21, 2025
[Core] Add AcceleratorManager implementation for Rebellions NPU
#53985 opened Jun 21, 2025
[Doc] Update Istio service mesh graph
#53988 opened Jun 21, 2025
[Serve] Make replica scheduler backoff configurable #52871
#53991 opened Jun 21, 2025
Fix autoscaler recovery docker config to use node-specific settings
#53992 opened Jun 21, 2025
[ci] Upgrade nightly test to run against KubeRay 1.4
#53993 opened Jun 21, 2025
[core] Remove actor task path in normal task submitter
#53996 opened Jun 22, 2025

46 Issues closed by 23 people

CI test linux://python/ray/data:test_arrow_block is flaky
#48859 closed Jun 22, 2025
Conflict between ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES environment variables causes Ray import error
#53737 closed Jun 21, 2025
CI test linux://rllib:learning_tests_cartpole_dqn_multi_cpu is flaky
#47214 closed Jun 21, 2025
CI test linux://rllib:examples/evaluation/evaluation_parallel_to_training_multi_agent_duration_auto is consistently_failing
#53255 closed Jun 21, 2025
CI test linux://python/ray/serve/tests:test_multiplex is flaky
#48378 closed Jun 21, 2025
CI test windows://python/ray/tests:test_object_store_metrics is flaky
#49514 closed Jun 20, 2025
[RLlib] MAML does not work with TF2 in Ray 2.3.1
#34620 closed Jun 20, 2025
[RayData|RayServe] Does RayData/RayServe support multi-node vllm inference
#53192 closed Jun 20, 2025
[Core] Core Worker crashing
#49088 closed Jun 20, 2025
[core][gpu-objects] Driver tries to get the data from in-actor store
#51272 closed Jun 19, 2025
[Core][ROCm] Setting CUDA_VISIBLE_DEVICES leads to an assertion
#52701 closed Jun 19, 2025
[Autoscaler][V2] Autoscaler fails to delete idle KubeRay Pod
#52264 closed Jun 19, 2025
CI test linux://python/ray/data:test_consumption is flaky
#48163 closed Jun 19, 2025
CI test windows://python/ray/tests:test_actor_state_metrics is consistently_failing
#46303 closed Jun 19, 2025
[data] ray.data.read_images is slower than reading images manually
#37499 closed Jun 19, 2025
[RFC] Q2 Ray Data Roadmap
#51808 closed Jun 19, 2025
[RFC] LLM APIs for Ray Data and Ray Serve
#50639 closed Jun 19, 2025
CI test linux://python/ray/data:test_json is flaky
#48150 closed Jun 19, 2025
[<Ray component: Core|RLlib|etc...>] ValueError: There was an error while reducing the Stats object under key=('actual_n_step',)!
#53947 closed Jun 19, 2025
CI test windows://python/ray/serve/tests:test_standalone_3 is flaky
#44003 closed Jun 19, 2025
Release test compiled_graphs failed
#53716 closed Jun 18, 2025
CI test darwin://python/ray/tests:test_metrics_agent_open_telemetry is consistently_failing
#53828 closed Jun 18, 2025
[RLlib] ActionMaskingTorchRLModule can't set up `conv_filters`
#53325 closed Jun 18, 2025
[Core] `ray.init()` and `ray start` fails on Windows 11 in ray 2.45+
#52739 closed Jun 18, 2025
CI test windows://python/ray/tests:test_object_spilling_debug_mode is flaky
#43796 closed Jun 18, 2025
[core] support S3 path style access in runtime_env download_and_unpack_package()
#53893 closed Jun 17, 2025
How to transfer tensors stored in GPU in actor with NCCL?
#53816 closed Jun 17, 2025
[Data] PyArrow 20.0.0 Backward Incompatability (`unexpected keyword argument 'maps_as_pydicts'`)
#52685 closed Jun 17, 2025
CI test linux://python/ray/tests:test_gpu_objects_nccl is consistently_failing
#53871 closed Jun 17, 2025
[RLlib] Headnode without GPU triggers torch/CUDA de-serialization error
#53467 closed Jun 17, 2025
[Core] Ray Autoscaler does not restart a worker node on setup failure
#29127 closed Jun 17, 2025
Release test llm_batch_vllm failed
#53827 closed Jun 17, 2025
[Serve] Add timeout parameter for `deploy`
#25433 closed Jun 17, 2025
CI test windows://python/ray/serve/tests:test_logging is flaky
#46043 closed Jun 17, 2025
[Core] Read-only buffer error in some scikit-learn models
#52571 closed Jun 17, 2025
[core] ray stop --force doesn't kill processes on worker node
#28038 closed Jun 17, 2025
[core][gpu-objects] Support TensorDict
#51550 closed Jun 17, 2025
[core][gpu-objects] Allocate placeholder tensor on corresponding devices
#53622 closed Jun 17, 2025
[core][gpu-objects] Driver should order all collective calls to avoid deadlock
#51264 closed Jun 17, 2025
CI test windows://python/ray/tests:test_object_spilling_asan is consistently_failing
#45962 closed Jun 17, 2025
CI test windows://python/ray/tests:test_object_spilling is consistently_failing
#45961 closed Jun 16, 2025
[RLlib] Add syntax checking to configuration string literals or migrate to enums.
#39384 closed Jun 16, 2025
[Ray Core] Ray error causes the Python interpreter to terminate without failing
#28211 closed Jun 16, 2025
[CI] Test GPU training tutorial with Ray Release tests
#28902 closed Jun 16, 2025
CI test linux://python/ray/train:accelerate_torch_trainer is consistently_failing
#44513 closed Jun 16, 2025
[core][gpu-objects] intra-process communication
#51685 closed Jun 16, 2025

36 Issues opened by 31 people

[Data] When writing on BigQuery, Google's "TooManyRequests" exceptions is not retried
#53997 opened Jun 22, 2025
[data] Slow fetching of metadata for large number of parquet files
#53995 opened Jun 22, 2025
[Rllib] Bug in TorchMultiDistribution logp prevents policy mapping from being used
#53994 opened Jun 22, 2025
Release test many_nodes_actor_test_on_v2.aws failed
#53990 opened Jun 21, 2025
[Doc][KubeRay] Run doctest `user-guides/configuring-autoscaling.ipynb` in CI
#53989 opened Jun 21, 2025
[Core] Autoscaler Node Recovery Ignores Node-Specific Docker Config
#53987 opened Jun 21, 2025
[core][gpu-objects] Allow sending ObjectRefs to other processes
#53978 opened Jun 20, 2025
[core][gpu-objects] Support ray.put
#53977 opened Jun 20, 2025
[core][gpu-objects] RDMA support for data transfer
#53976 opened Jun 20, 2025
[Dashboard] Support for List Tasks Filter Pushdown
#53970 opened Jun 20, 2025
[Data] Add support to turn off strict block-size enforcement
#53954 opened Jun 19, 2025
Release test training_ingest_benchmark-task=image_classification.full_training.jpeg failed
#53953 opened Jun 19, 2025
ValueError: There was an error while reducing the Stats object under key=('actual_n_step',)! [<Ray component: Core|RLlib|etc...>]
#53948 opened Jun 19, 2025
[Core] `InternalKVPut` retries incorrectly when encountering transient error
#53946 opened Jun 19, 2025
PolicyServer and PolicyClient Demo Issue
#53926 opened Jun 18, 2025
CI test linux://rllib:examples/algorithms/vpg_custom_algorithm is flaky
#53925 opened Jun 18, 2025
Windows VS WSL2
#53924 opened Jun 18, 2025
[Docker][CI] Add Python 3.13 Ray Image to CI
#53923 opened Jun 18, 2025
[serve.llm] Ray LLM serving not respecting max_completion_tokens parameter
#53922 opened Jun 18, 2025
[Ray V2 Tune + Train] Tuner is not aware of resources and oversubscribes leading to deadlocks
#53921 opened Jun 18, 2025
[Data/Preprocessors]: Preprocessors do not work with nested records
#53920 opened Jun 18, 2025
[Core] Ray Does Not Detect GPU
#53919 opened Jun 18, 2025
Multiple CVEs in Ray's compiled dependencies
#53915 opened Jun 18, 2025
Using ray for LLM inference got errors
#53907 opened Jun 18, 2025
[Core] Starting multiple local instances on one node may result in errors due to randomly selecting the same port.
#53906 opened Jun 18, 2025
[CI] `linux://python/ray/data:test_consumption` is failing/flaky on master.
#53897 opened Jun 17, 2025
[Ray Data]Pylint detection found some Python code defects in ray data
#53881 opened Jun 17, 2025
[dashboard] Support to overwrite the _client_max_size of http request entity
#53879 opened Jun 17, 2025
[RLlib] Significant drop in DQN training reward when resuming from checkpoint
#53878 opened Jun 17, 2025
[RLlib] Checkpoint metrics loading with Tune is broken in 2.47.0
#53877 opened Jun 17, 2025
Issue: Ray Dashboard Links to Grafana Return "Dashboard Not Found" (Windows)
#53876 opened Jun 17, 2025
[serve.llm] LLM serving seems not working with mistral tokenizer.
#53873 opened Jun 17, 2025
[Core] ray.ActorID.nil().job_id
#53872 opened Jun 17, 2025
[Core] Ray 2.47 regression: All tasks hang when using `uv`
#53848 opened Jun 16, 2025
[<Ray component: Core|RLlib|etc...>] Ray Timeout Error running VLLM Multi-Node(tp_size=2) Online Server with Acl_Graph when handling curl request
#53845 opened Jun 16, 2025
[RLlib] Typo in error message on line 37 of ray/rllib/env/utils/__init__.py
#53841 opened Jun 16, 2025

1,427 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[core][compiled graphs] Supporting allreduce on list of input nodes
#51047 commented on Jun 20, 2025 • 19 new comments
[Doc][KubeRay] Add doc for running KubeRay dashboard
#53830 commented on Jun 21, 2025 • 17 new comments
[Feat][Core] Implement Event Aggregator Agent
#53182 commented on Jun 19, 2025 • 14 new comments
[train] Driver SIGINT calls controller abort
#53600 commented on Jun 19, 2025 • 13 new comments
[train] TrainStateActor periodically checks controller status and sets aborted
#53818 commented on Jun 20, 2025 • 12 new comments
Add `pin_memory` to `iter_torch_batches`
#53792 commented on Jun 20, 2025 • 10 new comments
[core] fix detached actor being unexpectedly killed
#53562 commented on Jun 20, 2025 • 10 new comments
(serve.llm) Make _LLMServerBase.__init__ synchronous
#53719 commented on Jun 16, 2025 • 3 new comments
(serve.llm): Refactor/Consolidate LoRA downloading
#53714 commented on Jun 19, 2025 • 3 new comments
Add Apple silicon GPU(mps) support to ray
#38464 commented on Jun 20, 2025 • 2 new comments
Relax check_version_info to check for bytecode compatibility
#41373 commented on Jun 19, 2025 • 2 new comments
[Core] Add default Ray Node labels at Node init
#53360 commented on Jun 21, 2025 • 2 new comments
[core] Support broadcast and reduce collective for compiled graphs
#53625 commented on Jun 18, 2025 • 2 new comments
ray: fix handling large chunks
#53535 commented on Jun 20, 2025 • 1 new comment
[core]: Correct podman output parsing for image uri in runtime env
#53653 commented on Jun 18, 2025 • 1 new comment
[Core] The Idle worker killing feature slows down tasks
#27863 commented on Jun 17, 2025 • 0 new comments
[RLlib] Issue Regarding Future Warnings
#26424 commented on Jun 16, 2025 • 0 new comments
[Ray component: Core] Enable better progress bar
#26426 commented on Jun 16, 2025 • 0 new comments
[core][c++ worker] RayClusterModeTest.DefaultActorLifetimeTest timed out in macOS
#26435 commented on Jun 16, 2025 • 0 new comments
[RLlib] Use observations (input_dict) for exploration
#26437 commented on Jun 16, 2025 • 0 new comments
[Core] Observing Multiple Exceptions When Using Different Python Patch Versions
#26443 commented on Jun 16, 2025 • 0 new comments
[RLlib] Unable to call ray.remote functions inside env/action dist
#26468 commented on Jun 16, 2025 • 0 new comments
[Job] Job submission not following convention for quote
#26514 commented on Jun 16, 2025 • 0 new comments
[doc][Core | State Observability] Document usage of the rate limiting env variable in public doc
#26370 commented on Jun 16, 2025 • 0 new comments
[Tune] NevergradSearch Budget Exception
#26305 commented on Jun 16, 2025 • 0 new comments
How to color-code console output
#26226 commented on Jun 16, 2025 • 0 new comments
[Core] [Quality] Live handle raises unnecessary exception when script ends
#26198 commented on Jun 16, 2025 • 0 new comments
[RLlib]: SimpleQ TF2 is broken
#26192 commented on Jun 16, 2025 • 0 new comments
[State Observability] Raise an exception if the state schema contains predicates.
#26125 commented on Jun 16, 2025 • 0 new comments
[air] We should have a convenient method for user to interact with checkpoint file on driver when they checkpoint using other method in session
#26082 commented on Jun 16, 2025 • 0 new comments
[RLlib] server reports nan episodes and empty policy
#26048 commented on Jun 16, 2025 • 0 new comments
[test][autoscaler] ModuleNotFoundError: No module named 'ray.tests'
#26023 commented on Jun 16, 2025 • 0 new comments
[Tune] Ray Tune doesn't work inside Spark UDF
#26002 commented on Jun 16, 2025 • 0 new comments
Ray component: Core: PoolActor processes hanging
#24784 commented on Jun 16, 2025 • 0 new comments
[AIR] Support TorchRec trainer
#27794 commented on Jun 17, 2025 • 0 new comments
[Dashboard] Dashboard agent cannot be started because the port is still occupied
#27736 commented on Jun 17, 2025 • 0 new comments
[Tune/RLlib] log_to_file creates files, but doesn't write anything there
#27702 commented on Jun 17, 2025 • 0 new comments
[RLLib] global_timestep not monotonic when when running concurrent episodes with ExternalEnv
#27669 commented on Jun 17, 2025 • 0 new comments
[Dashboard] Ray Dashboard not showing the SpillWorker's actual memory usage
#27591 commented on Jun 17, 2025 • 0 new comments
[Core] The actors got distributed to just a few nodes even with spread scheduling
#27577 commented on Jun 17, 2025 • 0 new comments
[runtime_env] Add tests for all driver output (warnings, etc)
#27566 commented on Jun 17, 2025 • 0 new comments
[Tune] TuneReportCheckpointCallback causes two checkpoints to made every time it is called.
#27524 commented on Jun 17, 2025 • 0 new comments
[AIR] SettingWithCopyWarning for "A value is trying to be set on a copy of a slice from a DataFrame"
#27352 commented on Jun 17, 2025 • 0 new comments
[ray dashboard] profile button not working
#27211 commented on Jun 17, 2025 • 0 new comments
[Ray Train] Ray Train running slow when multiple workers executed
#27107 commented on Jun 17, 2025 • 0 new comments
[workflow] We should give the storage a default value if it's not set in some way.
#27046 commented on Jun 17, 2025 • 0 new comments
[State Observability][Log] Allow to ctrl + C when running logs API
#27008 commented on Jun 17, 2025 • 0 new comments
[runtime env] local `working_dir` doesn't work with strongly-typed `RuntimeEnv`
#26984 commented on Jun 17, 2025 • 0 new comments
[Core][State Observability] More fine-grained exceptions/error codes handling
#26974 commented on Jun 17, 2025 • 0 new comments
[Feature] Autoscaler should understand AWS availability and act accordingly
#20774 commented on Jun 16, 2025 • 0 new comments
[Core] Typing for .options for Ray Tasks
#26871 commented on Jun 16, 2025 • 0 new comments
[State Observability] Support filter None value
#26820 commented on Jun 16, 2025 • 0 new comments
[Core] Batch PinObjectIDs requests from Raylet client
#26796 commented on Jun 16, 2025 • 0 new comments
[Train] feature request for catboost_ray
#26687 commented on Jun 16, 2025 • 0 new comments
[AIR/Tune] Add a `ScalingConfig`-based API to `ResourceChangingScheduler`
#26538 commented on Jun 16, 2025 • 0 new comments
[RLlib] CRR and CQL consume more cpus than reported
#26533 commented on Jun 16, 2025 • 0 new comments
[RLlib] Duplicate custom metrics
#24731 commented on Jun 16, 2025 • 0 new comments
[Serve] Asynchronous inference best practices
#24627 commented on Jun 16, 2025 • 0 new comments
[tune] `progress_reporter.py` is messy and should be cleaned up
#24604 commented on Jun 16, 2025 • 0 new comments
[aws][autoscaler] AWS: When using spot instances, always single availability zone is selected
#24310 commented on Jun 16, 2025 • 0 new comments
[RLlib] PPO - ray.rllib.agents.ppo "Put Error"
#24307 commented on Jun 16, 2025 • 0 new comments
[Ray Collective] Remove Redis store and LocalFile store from gloo mode.
#24288 commented on Jun 16, 2025 • 0 new comments
[Autoscaler] upscaling_speed: 0 gets reset to 1
#24177 commented on Jun 16, 2025 • 0 new comments
[RLlib] Categorical action dist incorrectly uses tf.random.categorical
#24055 commented on Jun 16, 2025 • 0 new comments
[RLlib] [Bug] Inconsistent behavior between TFPolicy and TorchPolicy on `compute_actions_from_input_dict`
#24007 commented on Jun 16, 2025 • 0 new comments
[RLlib] Enable Training from Replay Buffer Larger than Memory
#23816 commented on Jun 16, 2025 • 0 new comments
[RLlib] [Bug] IMPALA causes an OOM after a long running.
#23769 commented on Jun 16, 2025 • 0 new comments
[BUG] Ray dashboard client failed to build
#23548 commented on Jun 16, 2025 • 0 new comments
[RFC][Feature][Autoscaler][Core]Graceful draining of nodes while scale-down
#23522 commented on Jun 16, 2025 • 0 new comments
[ml][Improvement] Improve messages to be “rank0, rank1” actors etc.
#23310 commented on Jun 16, 2025 • 0 new comments
[Feature] [tune] create a mlflow run name from config params
#23228 commented on Jun 16, 2025 • 0 new comments
[Feature][RLlib] Improve pytorch memory usage by disabling caching
#23077 commented on Jun 16, 2025 • 0 new comments
[tune][Bug] Worker doesn't sync the logs to HDFS at the given interval
#23055 commented on Jun 16, 2025 • 0 new comments
[Bug] AdaBelief optimizer crashes checkpoint restore
#22976 commented on Jun 16, 2025 • 0 new comments
[Bug] Custom model with R2D2
#22747 commented on Jun 16, 2025 • 0 new comments
[Bug] Resources displayed in Dashboard don't match cluster configuration
#22548 commented on Jun 16, 2025 • 0 new comments
[Bug] Deletion of Ray clusters hangs while Ray operator is still up
#22505 commented on Jun 16, 2025 • 0 new comments
Doing import ray breaks my logging [Bug]
#22312 commented on Jun 16, 2025 • 0 new comments
[Serve] A Deployment Graph with unfulfilled demands fails to scale Pods in Kubernetes
#25998 commented on Jun 16, 2025 • 0 new comments
API server internal error message not useful
#25986 commented on Jun 16, 2025 • 0 new comments
[runtime env] Bad runtime env specified in ray.init() with eager install only raises error on task/actor invocation
#25972 commented on Jun 16, 2025 • 0 new comments
[Core][State Observability] Use a separate thread to run spill/restore
#25960 commented on Jun 16, 2025 • 0 new comments
[RLlib] KeyError: simple_list_collector.py, line 950, in postprocess_episode
#25938 commented on Jun 16, 2025 • 0 new comments
[RLLib] SampleBatch.update() doesn't update `added_keys`
#25937 commented on Jun 16, 2025 • 0 new comments
[runtime env] Use namespace for internal KV storage
#25897 commented on Jun 16, 2025 • 0 new comments
[Core?] Federation + data perimeters
#25846 commented on Jun 16, 2025 • 0 new comments
[Core] RBAC + auditability
#25845 commented on Jun 16, 2025 • 0 new comments
[Core] Arrow Flight Server doesn't work with Ray Actors due to two GRPC versions
#25774 commented on Jun 16, 2025 • 0 new comments
[Core | State Observability ] Refactor summary/log SDK to use StateApiClient
#25746 commented on Jun 16, 2025 • 0 new comments
[Serve] Deployment fails if name contains slashes
#25714 commented on Jun 16, 2025 • 0 new comments
[RLlib] ModelCatagolg Selects Wrong Model for Nested Complex Observations
#25619 commented on Jun 16, 2025 • 0 new comments
[Train] [Tune] When using Train with Tune, a `logdir` is created that's not the one specified by the user
#25474 commented on Jun 16, 2025 • 0 new comments
[Core][Observability] Ray memory should show more objects
#25463 commented on Jun 16, 2025 • 0 new comments
[Dashboard] Error during render node with gpu and 4 hdds
#25437 commented on Jun 16, 2025 • 0 new comments
[Ray Collective Lib] Enable CI
#25396 commented on Jun 16, 2025 • 0 new comments
Core: deamonset feature request
#25334 commented on Jun 16, 2025 • 0 new comments
[DeviceMesh][Collective] Support multiple tensors API
#25129 commented on Jun 16, 2025 • 0 new comments
[Ray Air] nan in the tensorflow_linear_dataset_example.py
#25037 commented on Jun 16, 2025 • 0 new comments
ray docker images do not have uvloop installed
#25023 commented on Jun 16, 2025 • 0 new comments
Ray Tune: No console output is logged to Wandb.
#25011 commented on Jun 16, 2025 • 0 new comments
[Core][RLlib][Tune] CUDA PTX error when training with Tune
#25001 commented on Jun 16, 2025 • 0 new comments
[Build][Deps] Add new `ray[azure]` extra package
#48847 commented on Jun 22, 2025 • 0 new comments
[AIR][Tune] Provide user guide on how to build active learning on AIR
#30157 commented on Jun 17, 2025 • 0 new comments
[AIR/Docs] Mention/warn that running a Trainer inside a custom Tune trainable is an anti-pattern
#30153 commented on Jun 17, 2025 • 0 new comments
[RLlib] (PPO) algo parameter "lambda_" never gets passed because `AlgorithmConfig` refractored "lambda_" to "lambda"
#30143 commented on Jun 17, 2025 • 0 new comments
[Core] Reference leakage somewhere after ray.shutdown()
#30089 commented on Jun 17, 2025 • 0 new comments
[Tune] Can't access all metrics for all trials
#30004 commented on Jun 17, 2025 • 0 new comments
[core][dashboard] state api on worker nodes can not connect to dashboard url
#29959 commented on Jun 17, 2025 • 0 new comments
[Jobs] Include requested and available resources in JobInfo status message
#29921 commented on Jun 17, 2025 • 0 new comments
[RLlib] Add some metric for aync algos (e.g. APPO) that shows the total number of gradient updates
#29830 commented on Jun 17, 2025 • 0 new comments
[AIR] Update pytorch training and prediction benchmark with numpy with updated metrics
#29743 commented on Jun 17, 2025 • 0 new comments
[RLlib] Undesired memory growing when using convolutional neural network
#29699 commented on Jun 17, 2025 • 0 new comments
[AIR] `XGBoostTrainer` gives misleading error if column missing
#29695 commented on Jun 17, 2025 • 0 new comments
[RLLib Tests] : Included pytests in package as well as basic commands fail with ValueError
#29691 commented on Jun 17, 2025 • 0 new comments
[air] GPU memory leak when using AIR trainer with torch dataloader when the latter uses multi-processing
#29563 commented on Jun 17, 2025 • 0 new comments
[RLlib] Benchmark bandit methods vs plain Thompson Sampling for a non-contextual MAB
#29528 commented on Jun 17, 2025 • 0 new comments
[RLlib] "model": {"free_log_std": True} generates Tensorflow Lambda layers warning with TF2 framework
#29502 commented on Jun 17, 2025 • 0 new comments
[Autoscaler] Delete AWS resources created when launching Ray cluster upon cluster termination
#29499 commented on Jun 17, 2025 • 0 new comments
[Core] util.multiprocessing.pool: imap and imap_unordered blocking on ray.wait even though processes are complete
#29466 commented on Jun 17, 2025 • 0 new comments
[Ray Log_monitor]: close_all_files ProcessLookupError
#29452 commented on Jun 17, 2025 • 0 new comments
Ray core: incorrect account of GPUs on ec2 ubuntu instance: g4dn.2xlarge
#29420 commented on Jun 17, 2025 • 0 new comments
[core] GCS segfaults under OOM
#29336 commented on Jun 17, 2025 • 0 new comments
[AIR] Add progress bar for training
#29314 commented on Jun 17, 2025 • 0 new comments
[CI] A simple way to reproduce osx/linux/windows CI run failure locally
#29068 commented on Jun 17, 2025 • 0 new comments
[RLlib] Deprecate the RLlib spaces that are duplications of gym spaces.
#30800 commented on Jun 17, 2025 • 0 new comments
[Tune] Guard against users overriding internal `Trainable` methods
#30795 commented on Jun 17, 2025 • 0 new comments
Ray Cluster Resources Issue
#30780 commented on Jun 17, 2025 • 0 new comments
[Core] Worker leak
#30731 commented on Jun 17, 2025 • 0 new comments
[RLlib] Default policy error in two trainer work flow
#30676 commented on Jun 17, 2025 • 0 new comments
[core] Can't set working directory for runtime env in actor definition
#30666 commented on Jun 17, 2025 • 0 new comments
[Tune] HeboSearch reproducible deterministic results
#30661 commented on Jun 17, 2025 • 0 new comments
[core] Memory changes are not as expected when using ray.get()
#30615 commented on Jun 17, 2025 • 0 new comments
[Tune] `fail_fast` marks all runs as terminated, making the experiment impossible to restore
#30584 commented on Jun 17, 2025 • 0 new comments
[RLLib] Custom model with LSTM causes the auto wrapping to be partially executed
#30581 commented on Jun 17, 2025 • 0 new comments
[Core|RayTrain] RuntimeError: Some workers returned results while others didn't
#30545 commented on Jun 17, 2025 • 0 new comments
[Core] Overriding the default logging format for Worker logs
#30544 commented on Jun 17, 2025 • 0 new comments
[AIR] Canonical way to determine whether the code is running in a Train/Tune session
#30536 commented on Jun 17, 2025 • 0 new comments
[client][runtime_env] Inconsistent runs on ray client
#30518 commented on Jun 17, 2025 • 0 new comments
[Core] ray.exceptions.RaySystemError: System error: buffer source array is read-only
#30505 commented on Jun 17, 2025 • 0 new comments
[Core] Access violation on windows 11 when running modin workload
#30493 commented on Jun 17, 2025 • 0 new comments
Critic Regularized Regression (CRR) model is getting error with Custom Environment (Offline RL)
#30411 commented on Jun 17, 2025 • 0 new comments
[Docs] [Jobs] Add pros and cons of different ways of submitting a job
#30305 commented on Jun 17, 2025 • 0 new comments
[air/horovod] horovod distributed worker creation may hang
#30276 commented on Jun 17, 2025 • 0 new comments
[<Ray component: Workflow>] module 'ray.workflow' has no attribute 'HTTPListener'
#30248 commented on Jun 17, 2025 • 0 new comments
[RLLIB][Torch] numerically unstable + mkl issue in torch.sqrt normc_initializer
#30191 commented on Jun 17, 2025 • 0 new comments
[RLlib] RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)
#30164 commented on Jun 17, 2025 • 0 new comments
[Core] [RLlib] RLlib on Ray 2.0 not easily working on Colab
#28457 commented on Jun 17, 2025 • 0 new comments
[Job Submission] Support env file input in ray.runtime_env.RuntimeEnv
#28453 commented on Jun 17, 2025 • 0 new comments
[Tune] Adding DEHB
#28427 commented on Jun 17, 2025 • 0 new comments
[serve] Gradio integration does surface error messages, runs indefinitely
#28399 commented on Jun 17, 2025 • 0 new comments
[Ray: Core] Ray can hang when getting an ObjectRef from an unknown environment
#28341 commented on Jun 17, 2025 • 0 new comments
[Core][RuntimeEnv]Make `job_submission_id` to a new index of GCS::JobTableData
#28337 commented on Jun 17, 2025 • 0 new comments
[Jobs] Run jobs tests on Windows
#28316 commented on Jun 17, 2025 • 0 new comments
[Core] Job stop should terminate runtime_env setup
#28221 commented on Jun 17, 2025 • 0 new comments
[Core] log_to_driver=False does not suppress worker errors in ipython
#28216 commented on Jun 17, 2025 • 0 new comments
[core][runtime envs] Ray should respect CUDA_VISIBLE_DEVICES if set in runtime env
#28215 commented on Jun 17, 2025 • 0 new comments
[Core] ray dashboard <rayhost>:8265/nodes?view=details cpuPercent should contains actor's subprocess
#28100 commented on Jun 17, 2025 • 0 new comments
[Core] Ray may hang if workers fail to start due to limited ports
#28071 commented on Jun 17, 2025 • 0 new comments
[<Ray component: Core|Cluster>] Documentation instructions for mounting AWS EFS Fails for Ray Cluster
#28057 commented on Jun 17, 2025 • 0 new comments
[Core] Support retry_delay option in Ray tasks
#28015 commented on Jun 17, 2025 • 0 new comments
[Core] Ray object primary copy transfer
#27985 commented on Jun 17, 2025 • 0 new comments
[Observability] ray timeline errors with ray.rpc.GetAllProfileInfoReply exceeded maximum protobuf size of 2GB
#27952 commented on Jun 17, 2025 • 0 new comments
[Core] allow customized error message for WorkerCrashedError
#27947 commented on Jun 17, 2025 • 0 new comments
tensorflow.python.framework.errors_impl.NotFoundError: ./multi_worker_model/variables/variables_temp/part-00000-of-00001.index; No such file or directory [Op:MergeV2Checkpoints]
#27938 commented on Jun 17, 2025 • 0 new comments
[observability] JSON file generated by Ray timeline doesn't render correctly in the new version of chrome tracing (perfetto)
#27921 commented on Jun 17, 2025 • 0 new comments
[RLlib] make policy evaluation support Attention nets
#27909 commented on Jun 17, 2025 • 0 new comments
[tune] allow using (nested) data classes for search space definition
#27904 commented on Jun 17, 2025 • 0 new comments
[Autoscaler][GCP] Autofill GCP node type resources
#27888 commented on Jun 17, 2025 • 0 new comments
[Core] Is it possible to do asynchronous task submission?
#29039 commented on Jun 17, 2025 • 0 new comments
[doc][core] multiprocessing.Pool should document resource usage with ray_remote_args
#29004 commented on Jun 17, 2025 • 0 new comments
[Train] Allow passing in placement group to trainer
#28924 commented on Jun 17, 2025 • 0 new comments
[<Algorithm overview>]
#28915 commented on Jun 17, 2025 • 0 new comments
Runtime Environment Dependencies- container per task
#28875 commented on Jun 17, 2025 • 0 new comments
[Ray component: Core] Returning too much data from ray remote fails with no error
#28855 commented on Jun 17, 2025 • 0 new comments
Issue on page /ray-core/examples/plot_parameter_server.html
#28854 commented on Jun 17, 2025 • 0 new comments
[Datasets] Why does pydantic make training slower?
#28836 commented on Jun 17, 2025 • 0 new comments
[Infra] Improve Ray client usability
#28790 commented on Jun 17, 2025 • 0 new comments
[Core] Download Logs from Ray Dashboard
#28788 commented on Jun 17, 2025 • 0 new comments
Ray Core: AttributeError: 'NoneType' object has no attribute 'enum_types_by_name'
#28779 commented on Jun 17, 2025 • 0 new comments
[Tune] HyperOptSearch fails with nested config dicts and points_to_evaluate
#28753 commented on Jun 17, 2025 • 0 new comments
Ray Deployment crashes in docker [<Ray component: Serve>]
#28732 commented on Jun 17, 2025 • 0 new comments
[Ray Serve]: Testing out on local using Docker container
#28692 commented on Jun 17, 2025 • 0 new comments
[core] Generator task that returns more values than specified by num_returns should throw error instead
#28689 commented on Jun 17, 2025 • 0 new comments
[Core] CloudPickle explain tool
#28585 commented on Jun 17, 2025 • 0 new comments
[dashboard] Dashboard randomly not showing the status of worker nodes.
#28569 commented on Jun 17, 2025 • 0 new comments
[AIR] Status updates still prints even with breakpoint
#28554 commented on Jun 17, 2025 • 0 new comments
[AIR/Tune] Session report does not show the key for those not included in the first metrics report
#28549 commented on Jun 17, 2025 • 0 new comments
[Core] dump the info and analyze the data offline
#28496 commented on Jun 17, 2025 • 0 new comments
[Core] Document what are the generic python code that's easily scalable.
#28487 commented on Jun 17, 2025 • 0 new comments
[AIR] Refactor checkpoint encoding and decoding out of Backend to framework-specific Checkpoints
#28462 commented on Jun 17, 2025 • 0 new comments
[RLlib][Bug] RLLib Dreamer tuned example requesting unreasonable amount of GPU memory
#23479 commented on Jun 16, 2025 • 0 new comments
[Core] Add a warning message if options / arguments differ for Actor.options(get_if_exists=True)
#23455 commented on Jun 16, 2025 • 0 new comments
[RLlib][Feature] Feature Importance Plots
#23447 commented on Jun 16, 2025 • 0 new comments
[air] If you kill train via control C, a bunch of random error messages show up next time you run Train.
#23431 commented on Jun 16, 2025 • 0 new comments
[air] Logging message is not relevant to user
#23430 commented on Jun 16, 2025 • 0 new comments
[RLlib][docs] Adding more flow charts to RLlib components docs
#23393 commented on Jun 16, 2025 • 0 new comments
[runtime env] Warn user if pip check fails
#23335 commented on Jun 16, 2025 • 0 new comments
[runtime env] Refactor packaging code
#23257 commented on Jun 16, 2025 • 0 new comments
[Feature] Cleanup current use of `other_args_to_resolve` that passes deployment object into ClassNode
#23243 commented on Jun 16, 2025 • 0 new comments
[runtime env] Improve tracking of URI size
#23186 commented on Jun 16, 2025 • 0 new comments
[updater][Bug] update fails on preempted node and autoscaler stops scheduling
#23182 commented on Jun 16, 2025 • 0 new comments
Pipeline ingress requires trailing /
#23048 commented on Jun 16, 2025 • 0 new comments
Shouldn't require `PipelineInputNode` to build a pipeline DAG
#23037 commented on Jun 16, 2025 • 0 new comments
Pipeline DAG sanity check for model wrappers fields
#23019 commented on Jun 16, 2025 • 0 new comments
Pipeline doesn't accept importable class as arguments
#23016 commented on Jun 16, 2025 • 0 new comments
[tune][Bug] 'tune.report(mean_accuracy=sklearn.metrics.accuracy_score(test_y, pred_labels), done=True)': where can I get the mean_accuracy result?
#22992 commented on Jun 16, 2025 • 0 new comments
[Train] add logging to `finish_training` for existing `Callback`s
#22754 commented on Jun 16, 2025 • 0 new comments
[Bug] [serve] Accessing shared objects within a deployment
#22751 commented on Jun 16, 2025 • 0 new comments
[Feature] Client version check on commit
#22675 commented on Jun 16, 2025 • 0 new comments
[Jobs] run all doc examples in CI
#22487 commented on Jun 16, 2025 • 0 new comments
Some tests misusing assertTrue for comparisons
#22395 commented on Jun 16, 2025 • 0 new comments
[Enhancement][client] Move synchronous GetObject calls to datapath
#22357 commented on Jun 16, 2025 • 0 new comments
[datasets] `random_shuffle` overspills objects on random node
#17612 commented on Jun 16, 2025 • 0 new comments
[AIR] `Result` object doesn't work with Ray Client
#24396 commented on Jun 16, 2025 • 0 new comments
[RLlib] Current Implementation of Replay Buffer is not a True Circular Buffer
#24393 commented on Jun 16, 2025 • 0 new comments
[RLlib] wrong env step counting when train multi-agent with shared default policy
#24340 commented on Jun 16, 2025 • 0 new comments
[Core][observability] Enable observability features built in gRPC
#24327 commented on Jun 16, 2025 • 0 new comments
[Autoscaler][Docs] Add up-to-date docs on how the autoscaler works.
#24323 commented on Jun 16, 2025 • 0 new comments
[Ray Serve Autoscaling] Add release test that checks that nodes scale down when there are no requests
#24315 commented on Jun 16, 2025 • 0 new comments
Received message larger than max (105683136 vs. 104857600)
#24286 commented on Jun 16, 2025 • 0 new comments
[Serve] [Doc] HTTP Adapters Cookbooks
#24245 commented on Jun 16, 2025 • 0 new comments
[Serve] Default DAGDriver implementation cannot serve.run() or serve.build() twice
#24122 commented on Jun 16, 2025 • 0 new comments
[AIR] Support functionality to stitch Preprocessor with Keras model
#24023 commented on Jun 16, 2025 • 0 new comments
[Core] Log propagation between actor exit called and process terminated
#24020 commented on Jun 16, 2025 • 0 new comments
[<Ray component: Serve] Improve access by index/key on intermediate result in Serve deployment graph
#23987 commented on Jun 16, 2025 • 0 new comments
[Serve] [Docs] Improve architectural diagrams
#23956 commented on Jun 16, 2025 • 0 new comments
[Runtime Env] Dependency Installation private git repositories via ssh
#23768 commented on Jun 16, 2025 • 0 new comments
[ray client] ray.wait timeout is not respected when connection is interrupted
#23694 commented on Jun 16, 2025 • 0 new comments
[Feature] [Tune] Trial-wise dependencies
#23654 commented on Jun 16, 2025 • 0 new comments
[Bug] `policies_to_train` throws incorrect/confusing error message when passed an empty list.
#23646 commented on Jun 16, 2025 • 0 new comments
[Feature] support of complicated action space in QMix algorithm in Rllib.
#23634 commented on Jun 16, 2025 • 0 new comments
[runtime env] Deflake `test_runtime_env_working_dir_2`
#23569 commented on Jun 16, 2025 • 0 new comments
[runtime env] [Feature] Make Internal KV operations async
#23567 commented on Jun 16, 2025 • 0 new comments
[Feature] .bind() on function does not take pre-bind value from upstream DAGNode
#23511 commented on Jun 16, 2025 • 0 new comments
[Tune] [Bug] Ray checkpoint sync can sometimes fail to upload checkpoints to s3, plus log spew about sync client observed
#21469 commented on Jun 16, 2025 • 0 new comments
[runtime env] raise exception for unsupported runtime_env features on Windows
#21435 commented on Jun 16, 2025 • 0 new comments
[train] fix scalability of `JsonLoggerCallback`
#21416 commented on Jun 16, 2025 • 0 new comments
[Feature] [runtime env] [java] select jdk version
#21239 commented on Jun 16, 2025 • 0 new comments
[Feature][Tune] Trial status based Stopper
#21222 commented on Jun 16, 2025 • 0 new comments
[Train][Tune] Unify Train and Tune Callbacks
#21065 commented on Jun 16, 2025 • 0 new comments
[Bug] rsync_filter isn't used in hash_runtime_conf
#20878 commented on Jun 16, 2025 • 0 new comments
[autoscaler] Persistent problems encountered during autoscaling can lead to driver log spam
#20855 commented on Jun 16, 2025 • 0 new comments
[Bug] Test placement group chaos testing
#20716 commented on Jun 16, 2025 • 0 new comments
[GCP][autoscaler] Scale down is slow and Ray status doesn't show pending nodes
#20695 commented on Jun 16, 2025 • 0 new comments
Support snappy compression for spilled objects
#20575 commented on Jun 16, 2025 • 0 new comments
Sparse object reads - read part of an object, without downloading the entire object
#20500 commented on Jun 16, 2025 • 0 new comments
[core] Scale shuffle to 200+ nodes
#20499 commented on Jun 16, 2025 • 0 new comments
Memory-aware task scheduling to avoid OOMs under memory pressure
#20495 commented on Jun 16, 2025 • 0 new comments
[Feature] [Placement Group] Add timeout mechanism when scheduling placement group
#20477 commented on Jun 16, 2025 • 0 new comments
[job submission] Add RAY_ADDRESS or --address to suggested commands for logs/status
#20441 commented on Jun 16, 2025 • 0 new comments
[Bug] [Ray Autoscaler] [Core] Ray Worker Node Relaunching during 'ray up'
#20402 commented on Jun 16, 2025 • 0 new comments
[workflow] Fail to construct workflow within a workflow
#20381 commented on Jun 16, 2025 • 0 new comments
[Feature] [Serve] Threading for Ray Serve
#20169 commented on Jun 16, 2025 • 0 new comments
[Feature] [Serve] Support Sticky Sessions for Stateful Workflows Deployed via Ray Serve
#20107 commented on Jun 16, 2025 • 0 new comments
[runtime env] Remove filelock dependency
#20083 commented on Jun 16, 2025 • 0 new comments
[Client] Dataset write_csv AttributeError: 'Worker' object has no attribute 'core_worker'
#35537 commented on Jun 16, 2025 • 0 new comments
Enhance state notification pattern in Ray pubsub
#22340 commented on Jun 16, 2025 • 0 new comments
[Core] Avoiding subscribing to all logs by each log subscriber
#22274 commented on Jun 16, 2025 • 0 new comments
[Train] TPU support
#22251 commented on Jun 16, 2025 • 0 new comments
[train] support per epoch shuffling with `prepare_dataloader`
#22108 commented on Jun 16, 2025 • 0 new comments
[runtime env] Refactor `pip` protobuf to store a single str (`requirements.txt` contents) instead of list of "packages"
#22097 commented on Jun 16, 2025 • 0 new comments
[runtime_env] Remove `.lock` files after URI garbage collection
#22062 commented on Jun 16, 2025 • 0 new comments
[runtime env] Use LRU cache for URIs instead of random eviction
#22060 commented on Jun 16, 2025 • 0 new comments
[runtime env] Use single URI for `py_modules` field
#22059 commented on Jun 16, 2025 • 0 new comments
[Train] Add callback preprocessor that smoothly tracks values
#21989 commented on Jun 16, 2025 • 0 new comments
[Bug] Policy - ActionDistribution Type
#21973 commented on Jun 16, 2025 • 0 new comments
[runtime env] Use coroutine to create runtime envs in `runtime_env_agent`
#21950 commented on Jun 16, 2025 • 0 new comments
[Train] Add support for Bagua
#21934 commented on Jun 16, 2025 • 0 new comments
[Bug] "The kernel has died..." during Ray tune.run
#21917 commented on Jun 16, 2025 • 0 new comments
[Jobs] Backwards compatibility tests for REST API
#21915 commented on Jun 16, 2025 • 0 new comments
[Jobs] Make jobs work out-of-the-box with cluster YAML
#21911 commented on Jun 16, 2025 • 0 new comments
[Train] Support for averaging results
#21849 commented on Jun 16, 2025 • 0 new comments
AttributeError raised when using response_model in FastAPI route decorator
#21744 commented on Jun 16, 2025 • 0 new comments
[Feature] [runtime env] [C++] support a strong-typed API in C++
#21733 commented on Jun 16, 2025 • 0 new comments
[runtime env] Cross-language runtime env
#21731 commented on Jun 16, 2025 • 0 new comments
[Testing] multi fake node set up doesn't work under non ray client mode
#21653 commented on Jun 16, 2025 • 0 new comments
[Bug] "Sent message larger than max" error with dask
#21601 commented on Jun 16, 2025 • 0 new comments
[runtime env] Can we avoid merging two runtime envs?
#21494 commented on Jun 16, 2025 • 0 new comments
[Core] Ray Actor abnormal exit problem && Reproduction
#17198 commented on Jun 16, 2025 • 0 new comments
Support resizing placement groups
#16403 commented on Jun 16, 2025 • 0 new comments
[dashboard] Errors are not shown
#15238 commented on Jun 16, 2025 • 0 new comments
changing the docker image in consecutive `ray up` calls fails.
#14990 commented on Jun 16, 2025 • 0 new comments
[metrics] Add regression tests for Prometheus metrics
#14614 commented on Jun 16, 2025 • 0 new comments
[dashboard] Show more nodes at a time instead of paging through
#14537 commented on Jun 16, 2025 • 0 new comments
[autoscaler] Support Memory Aware Scheduling on a multi-node-type cluster.
#14104 commented on Jun 16, 2025 • 0 new comments
[RFC] k8s-native worker pool
#14077 commented on Jun 16, 2025 • 0 new comments
[dashboard] clicking on a column to sort makes the UI blank
#13525 commented on Jun 16, 2025 • 0 new comments
Autoscaler does not respect --num-cpus argument to `ray start`
#13270 commented on Jun 16, 2025 • 0 new comments
[core] Number of CPUs in ray.available_resources() does not match Dashboard's Machine View
#13100 commented on Jun 16, 2025 • 0 new comments
atexit handlers don't run when actor is terminated from going out of scope
#12806 commented on Jun 16, 2025 • 0 new comments
Task Cancellation is broken for queued tasks
#12080 commented on Jun 16, 2025 • 0 new comments
[logging] Use 'warnings.warn' appropriately
#12060 commented on Jun 16, 2025 • 0 new comments
[Dashboard] New dashboard port errors in a large cluster.
#11638 commented on Jun 16, 2025 • 0 new comments
ES Trainer does not support evaluation workers
#10999 commented on Jun 16, 2025 • 0 new comments
[Plasma] Improve plasma documentation on distributed storage
#10858 commented on Jun 16, 2025 • 0 new comments
Unable to connect to ray head running on linux from ray worker node on windows
#10362 commented on Jun 16, 2025 • 0 new comments
Ray log tracing
#9786 commented on Jun 16, 2025 • 0 new comments
[Core] Logging policy should be clearly defined and needs unit test coverage
#9692 commented on Jun 16, 2025 • 0 new comments
[dashboard] Error on Infinity values
#9103 commented on Jun 16, 2025 • 0 new comments
[Feature][Client] remove ray.disconnect() and ray.connect()
#22125 commented on Jun 16, 2025 • 0 new comments
[Bug] Detached actor exceptions are not logged.
#21810 commented on Jun 16, 2025 • 0 new comments
[Bug] Sometimes the worker node logs in the ray dashboard are empty
#21785 commented on Jun 16, 2025 • 0 new comments
[Core][Feature] Add checksum support for object store.
#21782 commented on Jun 16, 2025 • 0 new comments
Setting VF_SHARE_LAYERS to False and NO_FINAL_LINEAR to true leads to a bug
#21756 commented on Jun 16, 2025 • 0 new comments
[Feature] [runtime env] support using different python versions in Ray cluster
#21597 commented on Jun 16, 2025 • 0 new comments
[Feature] [Serve] Request Redistribution Among Replicas
#21578 commented on Jun 16, 2025 • 0 new comments
Put failed error occurred when shutdown and init again at client mode
#21573 commented on Jun 16, 2025 • 0 new comments
[Core] [Bug] No timeout or deadlock on scheduling job in remote cluster
#21419 commented on Jun 16, 2025 • 0 new comments
[Feature] [Autoscaler] Scaling Intelligently Based on Observed Resource Bottlenecks (related: task & actor profiling)
#21301 commented on Jun 16, 2025 • 0 new comments
[Core] [Bug] Failed to register worker to Raylet for single node, multi-GPU
#21226 commented on Jun 16, 2025 • 0 new comments
[Train] Port over `timm` example to Train
#21020 commented on Jun 16, 2025 • 0 new comments
[Bug] [RLlib] Custom metrics are not reported to Tune
#20938 commented on Jun 16, 2025 • 0 new comments
[Train] Deepspeed support
#20648 commented on Jun 16, 2025 • 0 new comments
[Bug] Cannot start cluster if other user is already running one
#20634 commented on Jun 16, 2025 • 0 new comments
[Bug] Excess memory usage when scheduling tasks in parallel?
#20618 commented on Jun 16, 2025 • 0 new comments
[Bug] Ray auto init interacts badly with allow_multiple=True and kills python shell
#20355 commented on Jun 16, 2025 • 0 new comments
[Bug] BasicVariantGenerator not compatible with Repeater
#19879 commented on Jun 16, 2025 • 0 new comments
[Feature] Support Sigopt for Tune standard space definitions
#19018 commented on Jun 16, 2025 • 0 new comments
[Bug] Re-enable Worker in Container Tests.
#18787 commented on Jun 16, 2025 • 0 new comments
[GCP][autoscaler] Rework ray TPU demos to create nothing but TPU VMs (no harddrives / `n2-standard-2` instances)
#18645 commented on Jun 16, 2025 • 0 new comments
Ray kill actor API is a GET request
#18411 commented on Jun 16, 2025 • 0 new comments
[AIR/Train] Torch: Automatically unpack model when checkpointing state dicts
#24975 commented on Jun 16, 2025 • 0 new comments
[AIR/Train] Automatically return the framework specific dataset in `train_loop_per_worker`
#24974 commented on Jun 16, 2025 • 0 new comments
[<Ray component: RLlib>] ppo error when not using critic
#24907 commented on Jun 16, 2025 • 0 new comments
[RLlib]: Add tabular models to ModelV2
#24882 commented on Jun 16, 2025 • 0 new comments
[RLlib] Error when converting GYM Robotics env to Multi-agent Env with the make_multi_agent wrapper
#24881 commented on Jun 16, 2025 • 0 new comments
[tune] SigOptSearch suggester is not serialisable
#24864 commented on Jun 16, 2025 • 0 new comments
[core] Add basic metrics for lineage reconstruction
#24855 commented on Jun 16, 2025 • 0 new comments
[Core] Enhance runtime env state when `ray list runtime-env` is used.
#24838 commented on Jun 16, 2025 • 0 new comments
[Core] Refactor Ray memory codepath to follow same pattern as `ray list tasks`.
#24836 commented on Jun 16, 2025 • 0 new comments
[Core] Reach parity of task status for `ray memory` and `ray list tasks`
#24835 commented on Jun 16, 2025 • 0 new comments
[Tune] `MedianStoppingRule` mishandles `nan`s
#24809 commented on Jun 16, 2025 • 0 new comments
[Serve] Simplify json_serde of deployment graph
#24620 commented on Jun 16, 2025 • 0 new comments
[RLlib] Metrics not reported with Client/Server and env=None
#24601 commented on Jun 16, 2025 • 0 new comments
[Core] In C++, there are D_GLIBCXX_USE_CXX11_ABI settings conflicts when both Ray and Arrow are used.
#24566 commented on Jun 16, 2025 • 0 new comments
[Serve] `Deployment.url` not updated after options changing name or prefix.
#24548 commented on Jun 16, 2025 • 0 new comments
[Rllib] Lack validation for "num_workers" parameter in DDPGTrainer.
#24536 commented on Jun 16, 2025 • 0 new comments
[doc] Update instructions for wheel installation
#24533 commented on Jun 16, 2025 • 0 new comments
[RLlib] Simplex action space shape
#24529 commented on Jun 16, 2025 • 0 new comments
[Tune] Make it easy to configure logger level
#24447 commented on Jun 16, 2025 • 0 new comments
[tune] improve documentation around "resource exhausted error"
#24439 commented on Jun 16, 2025 • 0 new comments
[Core] Unify RegisterClient and AnnounceWorkerPort
#24432 commented on Jun 16, 2025 • 0 new comments
[core] Annotation and docstring for ray.remote wrapped functions
#24411 commented on Jun 16, 2025 • 0 new comments
"ray timeline" command fails when RAY_ADDRESS is set
#8951 commented on Jun 16, 2025 • 0 new comments
[tune] [dashboard] Table formatting issues due to too many hparams
#8667 commented on Jun 16, 2025 • 0 new comments
[autoscaler] "Cannot perform an interactive login from a non TTY device" when trying to use a private docker registry
#7339 commented on Jun 16, 2025 • 0 new comments
Allowing multiple users to access a single ray cluster
#6800 commented on Jun 16, 2025 • 0 new comments
[Ray core & ray cluster] Add diagrams/architectures to explain how to run ray locally vs remotely
#25663 commented on Jun 16, 2025 • 0 new comments
[Ray Clusters] Remove nightly and latest images and wheels from all example configs.
#25606 commented on Jun 16, 2025 • 0 new comments
[air] Consider having a preprocessor for Feast integration
#25559 commented on Jun 16, 2025 • 0 new comments
[Core] Open telemetry Context pass from ray client to actors
#25538 commented on Jun 16, 2025 • 0 new comments
[dataset] Reduce tasks in push-based shuffle are not evenly distributed
#25468 commented on Jun 16, 2025 • 0 new comments
[Core] [State Observability] List all actor logs when actors are restarted.
#25443 commented on Jun 16, 2025 • 0 new comments
[air] Ordinal Encoder complains about None
#25442 commented on Jun 16, 2025 • 0 new comments
[Autoscaler] google-cloud-storage seems cannot read GOOGLE_APPLICATION_CREDENTIALS
#25308 commented on Jun 16, 2025 • 0 new comments
[Serve] Dynamically move models between CPUs and GPUs
#25295 commented on Jun 16, 2025 • 0 new comments
[RLlib][Doc] Add documentation for `ModelCatalog.get_model_v2()`
#25186 commented on Jun 16, 2025 • 0 new comments
[AIR] MLflow integration polish
#25156 commented on Jun 16, 2025 • 0 new comments
[AIR] TensorFlow warns to use `distribute.MultiWorkerMirroredStrategy` when I'm already using it
#25140 commented on Jun 16, 2025 • 0 new comments
[air] Have a default column for not frequent enough categories for OHE
#25096 commented on Jun 16, 2025 • 0 new comments
[Core] Make NodeManager unit testable
#25095 commented on Jun 16, 2025 • 0 new comments
[AIR] Improve logging for train
#25088 commented on Jun 16, 2025 • 0 new comments
[RLlib] Hope RLlib can support DQfD & POfD
#25058 commented on Jun 16, 2025 • 0 new comments
[AIR] Support postprocessing in Predictors
#24979 commented on Jun 16, 2025 • 0 new comments
[AIR] Add a `TorchVision` preprocessor
#24976 commented on Jun 16, 2025 • 0 new comments
[Fix][GCS] Implement reconnection for RedisContext
#48781 commented on Jun 22, 2025 • 0 new comments
[Core]: Fix ConnectionError on Autoscaler CR lookups in K8s clusters …
#48675 commented on Jun 22, 2025 • 0 new comments
[runtime env]: Integrating ROCm Systems Profiler to Ray worker process
#48525 commented on Jun 19, 2025 • 0 new comments
Fix invalid type for progress_reporter parameter of RunConfig
#48439 commented on Jun 19, 2025 • 0 new comments
[doc] fix: Typo and missing import in doc
#48311 commented on Jun 19, 2025 • 0 new comments
[WIP][core] C++20 upgrade
#48044 commented on Jun 19, 2025 • 0 new comments
:bug: do not modify user-provided runtime_env
#48021 commented on Jun 22, 2025 • 0 new comments
[Data] Fix parallelism deriving heuristic to ensure parallelism stays w/in min/max bounds
#47695 commented on Jun 20, 2025 • 0 new comments
Add generic item support for queue
#46849 commented on Jun 18, 2025 • 0 new comments
blind try on ubuntu upgrade ..
#45427 commented on Jun 16, 2025 • 0 new comments
[core] object store data transfer zstd
#44755 commented on Jun 17, 2025 • 0 new comments
[data] add better support for list-typed fields when using `write_bigquery`
#44564 commented on Jun 22, 2025 • 0 new comments
Ray IPv6 support
#44252 commented on Jun 19, 2025 • 0 new comments
Adapt the joblib backend for compatibility with `return_as=generator`
#41028 commented on Jun 16, 2025 • 0 new comments
[dashboard] ignore reinit error when getting dashboard url
#40545 commented on Jun 19, 2025 • 0 new comments
Update pettingzoo_env.py
#39431 commented on Jun 17, 2025 • 0 new comments
[ci] remove is_automated_build in setup.py
#36547 commented on Jun 16, 2025 • 0 new comments
[RLLib][Air] MLFlow parsing of RLLib evaluation and custom metrics
#26711 commented on Jun 17, 2025 • 0 new comments
[Serve] Make replica scheduler backoff configurable
#52871 commented on Jun 21, 2025 • 0 new comments
[core|serve] Migrate shared utilities from `ray._private` to `ray._common`
#53478 commented on Jun 21, 2025 • 0 new comments
CI test linux://rllib:examples/metrics/custom_metrics_in_algorithm_training_step is flaky
#51870 commented on Jun 21, 2025 • 0 new comments
[core] Cleanup gcs event listeners and gcs_storage env variable
#53566 commented on Jun 19, 2025 • 0 new comments
update to protobuf-28.2, absl-20240722, grpc-1.67 and patch for windows
#51673 commented on Jun 20, 2025 • 0 new comments
[Docs][wip] Feature: adopt llms.txt convention
#51605 commented on Jun 20, 2025 • 0 new comments
[Refactor]Rename NCCL-related items to comm_backend
#51061 commented on Jun 20, 2025 • 0 new comments
[doc] add jax example
#51040 commented on Jun 22, 2025 • 0 new comments
Suppress type error
#50994 commented on Jun 22, 2025 • 0 new comments
fix restore BUG "RuntimeError: Expected scalars to be on CPU, got cud…
#50983 commented on Jun 22, 2025 • 0 new comments
[CI] Enable pretty-format-java pre-commit hook
#50957 commented on Jun 22, 2025 • 0 new comments
[core] Cover cpplint for ray/src/ray/stats
#50678 commented on Jun 22, 2025 • 0 new comments
[Core] Split stats_metric into smaller targets to improve build performance
#50595 commented on Jun 22, 2025 • 0 new comments
[RLlib] Enable splitting and zero padding of Dict observation
#50589 commented on Jun 22, 2025 • 0 new comments
[Autoscaler][V2] Use running node instances to rate-limit upscaling
#50414 commented on Jun 22, 2025 • 0 new comments
[core][collective] Avoid creation of `gloo_queue` in race condition
#50132 commented on Jun 22, 2025 • 0 new comments
Update multi-agent-envs.rst
#50075 commented on Jun 22, 2025 • 0 new comments
[core] Thread-safe gcs node manager
#50024 commented on Jun 22, 2025 • 0 new comments
[core] Don't get dashboard address after each dashboard connection failure
#49584 commented on Jun 21, 2025 • 0 new comments
[core][cgraph] Use cv instead of busy wait for next version
#49542 commented on Jun 21, 2025 • 0 new comments
[RLlib] Add NPU and HPU support to RLlib
#49535 commented on Jun 17, 2025 • 0 new comments
[core][cgraph] Use threadpool and one io_context for mutable object provider
#49500 commented on Jun 21, 2025 • 0 new comments
[Fix][Core] Periodically check log message queue cleared before shutdown
#49337 commented on Jun 22, 2025 • 0 new comments
[WIP] Remove VM cluster autoscaler docker implementation
#49238 commented on Jun 17, 2025 • 0 new comments
[train] Make dataset argument covariant
#48999 commented on Jun 19, 2025 • 0 new comments
[flaky] test_scheduling_2.py::test_demand_report_when_scale_up
#53811 commented on Jun 19, 2025 • 0 new comments
[Serve] Allow HTTPs Options in Ray Serve
#26814 commented on Jun 19, 2025 • 0 new comments
[Core] Transient network failure on RPC `WaitForActorRefDeleted` causes actor registration fail
#53797 commented on Jun 18, 2025 • 0 new comments
[Core] Transient network failure on RPC `MarkJobFinished` causes node crash
#53645 commented on Jun 18, 2025 • 0 new comments
Core: Ray 2.45 causes Google's LIBTPU to be very spammy
#53756 commented on Jun 18, 2025 • 0 new comments
[Core] Make Ray Core tasks/actors metrics counters (accumulators)
#47522 commented on Jun 18, 2025 • 0 new comments
[RLlib]
#52683 commented on Jun 18, 2025 • 0 new comments
CI test linux://python/ray/tests:test_runtime_env_container is consistently_failing
#45223 commented on Jun 18, 2025 • 0 new comments
[Core] Ray causes a 25% slower GPU performance compared with manually written Multi-processing program on 8 Hopper GPUs
#53799 commented on Jun 18, 2025 • 0 new comments
[Core] ASSERTION FAILED: queue.num_items() == 0
#53510 commented on Jun 18, 2025 • 0 new comments
[Serve] Proxy issues: Request cancellation, intermittent 503 backpressure, and max_queued_requests configuration not applied
#53794 commented on Jun 18, 2025 • 0 new comments
[Dashboard] Discrepancy between Worker Process Memory Display on Dashboard and RSS Statistics
#53829 commented on Jun 18, 2025 • 0 new comments
[<Ray component: Core|RLlib|etc...>] Issue of port allocation
#53790 commented on Jun 18, 2025 • 0 new comments
[Dashboard] Support ncu
#53759 commented on Jun 18, 2025 • 0 new comments
Incorrect default value of CUBLAS_WORKSPACE_CONFIG
#47690 commented on Jun 18, 2025 • 0 new comments
[Serve] make various default values of `AutoscalingConfig.max_replicas` consistent and >1
#50222 commented on Jun 18, 2025 • 0 new comments
CI test linux://rllib:learning_tests_stateless_cartpole_appo_gpu is flaky
#47295 commented on Jun 18, 2025 • 0 new comments
[core][compiled graph] Support all-to-one collective ops (e.g. reduce)
#49324 commented on Jun 18, 2025 • 0 new comments
[autoscaler] SubnetId, a valid AWS field, is being ignored in cluster yaml
#14551 commented on Jun 18, 2025 • 0 new comments
[Core] The streaming generator will mark the inplasma object that is already ready as failed after the task fails.
#53772 commented on Jun 17, 2025 • 0 new comments
[core] control whether to construct a default concurrency group executor when max-concurrency=1 and there are other concurrency group for an actor
#53771 commented on Jun 17, 2025 • 0 new comments
[core] Race condition between raylet graceful shutdown and GCS health checks
#53739 commented on Jun 17, 2025 • 0 new comments
[Core] BUG: Cluster crashes when using temp_dir "could not connect to socket" raylet.x [since 2.7+]
#44431 commented on Jun 20, 2025 • 0 new comments
[Ray debugger] Unable to use debugger on slurm cluster
#51157 commented on Jun 20, 2025 • 0 new comments
[Core] Submitted containerized job is stuck in pending mode
#37293 commented on Jun 20, 2025 • 0 new comments
[Data] Custom Partitioner in Ray Data and Related Implementation Considerations
#53800 commented on Jun 20, 2025 • 0 new comments
[core][gpu-objects] Ability to register custom types for GPU data
#52340 commented on Jun 20, 2025 • 0 new comments
CI test linux://rllib:learning_tests_multi_agent_pendulum_sac_multi_cpu is consistently_failing
#47264 commented on Jun 19, 2025 • 0 new comments
[data] Bad error message when function outputs cannot be pickled
#46642 commented on Jun 19, 2025 • 0 new comments
[data] ObjectRefs passed to map UDF are not automatically deref'ed
#49207 commented on Jun 19, 2025 • 0 new comments
[Core] Ray hangs with vllm0.8.5 v1 api for tp8+pp4
#53758 commented on Jun 19, 2025 • 0 new comments
[data] Optimize Dataset.unique()
#38764 commented on Jun 19, 2025 • 0 new comments
Error Handling Large Pyarrow Chunk
#53536 commented on Jun 19, 2025 • 0 new comments
[<Ray component: Data>] lack of check for empty table produce lots of error messages
#53605 commented on Jun 19, 2025 • 0 new comments
Release test random_shuffle_fixed_size failed
#53806 commented on Jun 19, 2025 • 0 new comments
[Data] Support for SQL/DataFrame capability
#53693 commented on Jun 19, 2025 • 0 new comments
[RayData] The write operator supports the use of an actor pool
#53552 commented on Jun 19, 2025 • 0 new comments
[core][gpu-objects] Actor sends the same ObjectRef twice to another actor
#51273 commented on Jun 19, 2025 • 0 new comments
[Autoscaler] Improve NodeProvider interface, make it easier to extend it to cluster managers (e.g. Fargate)
#25134 commented on Jun 19, 2025 • 0 new comments
[RLlib] Observation space with 2 dimensions not working with the new API stack
#46631 commented on Jun 19, 2025 • 0 new comments
[Ray Client] - Client server failed with runtime_env container
#29852 commented on Jun 19, 2025 • 0 new comments
[Ray Train] XGBoostTrainer crashes with ActorDiedError when using num_workers > 1 and use_gpu=False
#53123 commented on Jun 19, 2025 • 0 new comments
CI test windows://python/ray/serve/tests:test_request_timeout is flaky
#48417 commented on Jun 19, 2025 • 0 new comments
CI test linux://rllib:examples/algorithms/appo_custom_algorithm_w_shared_data_actor is flaky
#53176 commented on Jun 19, 2025 • 0 new comments
Bump torch from 2.3.0 to 2.7.1 in /python
#53558 commented on Jun 19, 2025 • 0 new comments
Script to generate test coverage for doc files
#53556 commented on Jun 16, 2025 • 0 new comments
[core] Support pip_install_options for pip
#53551 commented on Jun 20, 2025 • 0 new comments
[RLlib] Upgrade RLlink protocol for external env/simulator training.
#53550 commented on Jun 17, 2025 • 0 new comments
[data] add Lance-based ordered data conversion that keeps row_id content unchanged
#53542 commented on Jun 19, 2025 • 0 new comments
[serve.llm] Update ray-llm docker
#53532 commented on Jun 16, 2025 • 0 new comments
[RLlib] Wrapper which allows EnvRunners to operate on environments with Repeated observation spaces
#53519 commented on Jun 19, 2025 • 0 new comments
[core][telemetry/09] record sum metric e2e
#53512 commented on Jun 20, 2025 • 0 new comments
[Dashboard] Fixing residual state leaks in Dashboard/Agent
#53508 commented on Jun 20, 2025 • 0 new comments
[Data] Add a data compaction function
#53489 commented on Jun 19, 2025 • 0 new comments
[WIP][Data] Batch query for block_ref_iter
#53485 commented on Jun 18, 2025 • 0 new comments
docs test coverage script
#53482 commented on Jun 18, 2025 • 0 new comments
[core][compiled graphs] Unify and simplify NCCL operation nodes
#53470 commented on Jun 21, 2025 • 0 new comments
[Data] Add dropna function
#53464 commented on Jun 18, 2025 • 0 new comments
[Serve] Set the docs path after app is initialized on the replica
#53463 commented on Jun 22, 2025 • 0 new comments
[core] Check if a task can be spilled before checking if args can be pinned
#53462 commented on Jun 19, 2025 • 0 new comments
[Data] Add fillna function
#53459 commented on Jun 18, 2025 • 0 new comments
[WIP][Data] Add support for Arrow native fixed-shape tensor type
#53450 commented on Jun 18, 2025 • 0 new comments
Bump torch from 2.0.1 to 2.7.0 in /doc/source/templates/testing/docker/03_serving_stable_diffusion
#53447 commented on Jun 19, 2025 • 0 new comments
[Data] add switch for optimizer rules
#53427 commented on Jun 20, 2025 • 0 new comments
[Data] Add support for ray.dataset.map_sql
#53417 commented on Jun 20, 2025 • 0 new comments
[Doc][KubeRay] remove head pod trailing hash and adjust volcano output
#53826 commented on Jun 16, 2025 • 0 new comments
[core] Add switch for the cache of runtime env
#53775 commented on Jun 21, 2025 • 0 new comments
[Core][Bug fix]Fix issue: the streaming generator will mark the inplasma object that is already ready as failed after the task fails.
#53773 commented on Jun 17, 2025 • 0 new comments
[core] Control whether to construct a default concurrency group executor when max-concurrency=1 and there are other concurrency groups for an actor
#53770 commented on Jun 17, 2025 • 0 new comments
[core] upgrade opentelemetry-sdk
#53745 commented on Jun 20, 2025 • 0 new comments
[core][telemetry/11] support histogram metric on worker side
#53740 commented on Jun 20, 2025 • 0 new comments
[core][telemetry/10] support custom gauge+counter+sum metrics
#53734 commented on Jun 20, 2025 • 0 new comments
(serve.llm) Remove test leakage from placement bundle logic
#53723 commented on Jun 20, 2025 • 0 new comments
[RLlib; Offline RL] Implement Offline Policy Evaluation (OPE) via Importance Sampling.
#53702 commented on Jun 20, 2025 • 0 new comments
[Data] Add reading from Delta Lake tables and from Unity Catalog
#53701 commented on Jun 21, 2025 • 0 new comments
[data] allow max_calls to be a static but not dynamic option
#53687 commented on Jun 19, 2025 • 0 new comments
[Serve] Check multiple FastAPI ingress deployments in a single application
#53647 commented on Jun 22, 2025 • 0 new comments
[core] Gcs actor manager cleanup
#53633 commented on Jun 22, 2025 • 0 new comments
[rllib] IMPALA fix no attribute '_minibatch_size'
#53620 commented on Jun 21, 2025 • 0 new comments
[serve.llm] Add useful logging in prefill_decode_disagg.py
#53604 commented on Jun 16, 2025 • 0 new comments
[core] Cleanup retryable grpc client
#53599 commented on Jun 21, 2025 • 0 new comments
[Do not merge] Run ray data release tests with export API
#53594 commented on Jun 21, 2025 • 0 new comments
BLD: Update ``.bazelrc`` file for Windows 11 build
#53586 commented on Jun 20, 2025 • 0 new comments
[CI] Re-enable isort for all remaining files
#53583 commented on Jun 22, 2025 • 0 new comments
[Not for Merge] Event Aggregator Perf
#53576 commented on Jun 19, 2025 • 0 new comments
[DON'T MERGE]
#53575 commented on Jun 20, 2025 • 0 new comments
[Data] [Draft] user guide for aggregations
#53568 commented on Jun 21, 2025 • 0 new comments
[data] New landing page with better examples that show key workloads
#53228 commented on Jun 17, 2025 • 0 new comments
[core] enable -Wshadow for all c++ targets
#53194 commented on Jun 17, 2025 • 0 new comments
[core] Returning a useful message when trying to get logs for a job that has not started yet
#53174 commented on Jun 17, 2025 • 0 new comments
[draft] Submit Ray release test as RayJob to Kuberay GKE
#53165 commented on Jun 20, 2025 • 0 new comments
[data] fix lance count_rows not support filter
#53162 commented on Jun 17, 2025 • 0 new comments
[core] Don't try to monitor zipped files
#53151 commented on Jun 19, 2025 • 0 new comments
Make vllm_engine a deployment
#53139 commented on Jun 17, 2025 • 0 new comments
Fix broken Ray Workflows documentation link in README.rst
#53136 commented on Jun 17, 2025 • 0 new comments
feat(runtime_env): add Azure Blob Storage support
#53135 commented on Jun 17, 2025 • 0 new comments
macos wheel build debug
#53119 commented on Jun 17, 2025 • 0 new comments
Bump flask-cors from 4.0.0 to 6.0.0 in /python
#53116 commented on Jun 17, 2025 • 0 new comments
WIP: Add iter_torch_batches Tensor cache
#53069 commented on Jun 19, 2025 • 0 new comments
[core] Remove tests that are permanently skipped with old decorator
#53046 commented on Jun 17, 2025 • 0 new comments
[core] Node manager related cpp cleanup
#52990 commented on Jun 17, 2025 • 0 new comments
[RLlib; Offline RL] - Use `iter_torch_batches` in learner
#52968 commented on Jun 20, 2025 • 0 new comments
[Data] fix write_iceberg error
#52956 commented on Jun 17, 2025 • 0 new comments
[Core] Ensure Ray vendored libraries only be visible and used by Ray internal
#52905 commented on Jun 17, 2025 • 0 new comments
[Data] added XML datasource
#52539 commented on Jun 17, 2025 • 0 new comments
[core] Minor task manager related improvements
#52294 commented on Jun 19, 2025 • 0 new comments
[Core] Deserialization of PyArrow Extension Arrays by registration of deserializers
#51972 commented on Jun 20, 2025 • 0 new comments
Add new autoscaling parameter `aggregation function`
#51905 commented on Jun 19, 2025 • 0 new comments
[core] Remove client call tag
#51817 commented on Jun 19, 2025 • 0 new comments
kuberay edits
#53411 commented on Jun 18, 2025 • 0 new comments
[WIP] Remove `_owner` arg for `ray.put`
#53410 commented on Jun 18, 2025 • 0 new comments
[Core] Add Logic to Emit Task Events to Event Aggregator
#53402 commented on Jun 19, 2025 • 0 new comments
[serve.llm] DO NOT REVIEW, IN DRAFT
#53391 commented on Jun 16, 2025 • 0 new comments
Deduplicate schema in BlockMetadata
#53384 commented on Jun 18, 2025 • 0 new comments
[serve.llm] [cleanup] Add LLMConfig.parse_from() api
#53382 commented on Jun 18, 2025 • 0 new comments
Bump vllm from 0.8.5 to 0.9.0 in /python
#53375 commented on Jun 18, 2025 • 0 new comments
[WIP] Remove global worker
#53372 commented on Jun 18, 2025 • 0 new comments
Filter out ANSI escape codes from logs when retrieving logs from the dashboard
#53370 commented on Jun 19, 2025 • 0 new comments
fix: Type of AlgorithmConfig.training(learner_connector
#53369 commented on Jun 19, 2025 • 0 new comments
[core] Cleanup plasma client and object manager
#53357 commented on Jun 19, 2025 • 0 new comments
[Docs] Clarify Train-side docs on Ray Data
#53349 commented on Jun 20, 2025 • 0 new comments
[RLlib] ConnectorV2 API polishings (stricter input-/output batch formats).
#53328 commented on Jun 18, 2025 • 0 new comments
[core] Avoid making rpc for local GetLocationFromOwner
#53322 commented on Jun 17, 2025 • 0 new comments
Omar/kuberay anyscale
#53318 commented on Jun 17, 2025 • 0 new comments
[data] Add GroupedData.random_sample() for group-wise sampling
#53313 commented on Jun 17, 2025 • 0 new comments
[core] Core worker get cv - notify after unlock
#53311 commented on Jun 17, 2025 • 0 new comments
Make core worker testable
#53299 commented on Jun 19, 2025 • 0 new comments
[core][autoscaler][v1] drop object_store_memory from ResourceDemandScheduler._update_node_resources_from_runtime
#53283 commented on Jun 18, 2025 • 0 new comments
Bump tornado from 6.1 to 6.5.1 in /python
#53274 commented on Jun 17, 2025 • 0 new comments
Override Autoscaler
#53245 commented on Jun 17, 2025 • 0 new comments
[data] add explain interface for dataset
#53235 commented on Jun 17, 2025 • 0 new comments
[Serve] Support for setting `working_dir` to a local directory in `RayService`
#33456 commented on Jun 17, 2025 • 0 new comments
RLLIB - RE3 Exploration Algorithm - No GPU support for Dynamic TF V2
#33425 commented on Jun 17, 2025 • 0 new comments
[client] kubernetes w ray client
#33367 commented on Jun 17, 2025 • 0 new comments
[Train] Reporting metrics/checkpoints from multiple workers
#33360 commented on Jun 17, 2025 • 0 new comments
[Data] `read_parquet` schema is incorrect (schema is a dict instead of a string)
#33279 commented on Jun 17, 2025 • 0 new comments
[Ray status] confusing output about gpus and accelerators
#33272 commented on Jun 17, 2025 • 0 new comments
[Tune] mlflow logger callback > log_trial_result fail (psycopg2.ProgrammingError) can't adapt type 'numpy.int64'
#33233 commented on Jun 17, 2025 • 0 new comments
[Serve] Enhance replica upgrade process.
#33192 commented on Jun 17, 2025 • 0 new comments
[air output] Isolate/refactor/improve rllib related progress reporting logic
#33150 commented on Jun 17, 2025 • 0 new comments
[Tune][wandb] Report tune experiments as a wandb `sweep`
#33142 commented on Jun 17, 2025 • 0 new comments
[AIR][wandb] Add option to track artifact references in wandb if using cloud storage
#33130 commented on Jun 17, 2025 • 0 new comments
[AIR][Tune] Add an option in `WandbLoggerCallback` to group wandb runs by config
#33084 commented on Jun 17, 2025 • 0 new comments
[Serve] Support external storage for state
#33059 commented on Jun 17, 2025 • 0 new comments
[Serve] Use the namespace of context instead of "serve" when the Controller gets all running Actors
#33057 commented on Jun 17, 2025 • 0 new comments
[Serve] Specify replicas when scaling down
#33056 commented on Jun 17, 2025 • 0 new comments
[Serve] Restart a batch of replicas by Actor names or replica tags
#33055 commented on Jun 17, 2025 • 0 new comments
[Serve] Specify a batch of replicas to update their user_config
#33054 commented on Jun 17, 2025 • 0 new comments
Ray Core Runtime Environments with tea.xyz
#33049 commented on Jun 17, 2025 • 0 new comments
[Ray Tune] Support for continuing training when metrics are only reported from some of the workers
#33042 commented on Jun 17, 2025 • 0 new comments
[Data] Cannot get the length of a tf dataset created from `ray_ds.to_tf`
#33004 commented on Jun 17, 2025 • 0 new comments
[Data] Include image class id in the returned datasets of `ray.data.read_images()`.
#32989 commented on Jun 17, 2025 • 0 new comments
[Datasets] Raise descriptive error if `iter_torch_batches` can't convert data
#32953 commented on Jun 17, 2025 • 0 new comments
[RLlib] Windows CLI, cmd.exe, powershell parsing json arguments JSONDecodeError
#35492 commented on Jun 17, 2025 • 0 new comments
[Core] Timeout for unschedulable task due to unavailable workers
#33954 commented on Jun 17, 2025 • 0 new comments
[Observability] Programmatically fetch prometheus metrics
#33940 commented on Jun 17, 2025 • 0 new comments
[Ray AIR] Add more documentation about checkpointing
#33932 commented on Jun 17, 2025 • 0 new comments
Ray Workflow
#33844 commented on Jun 17, 2025 • 0 new comments
[Train] Intermittent `UnpicklingError` when loading estimator/preprocessor from checkpoint
#33815 commented on Jun 17, 2025 • 0 new comments
[AIR output] Warnings for AIR_VERBOSITY is confusing
#33810 commented on Jun 17, 2025 • 0 new comments
[air output] Aggregation of feedback for air output v2
#33803 commented on Jun 17, 2025 • 0 new comments
[Datasets] `FileBasedDataSource`s do not pass `filesystem` to `_read_stream()` methods' `reader_args`
#33777 commented on Jun 17, 2025 • 0 new comments
[Core][Runtime Env] Document how to write custom runtime env plugin
#33746 commented on Jun 17, 2025 • 0 new comments
Core: Can the ray core's scheduling mechanism support customized extensions?
#33735 commented on Jun 17, 2025 • 0 new comments
[Ray init] Ray init method does not support pathlib.Path
#33672 commented on Jun 17, 2025 • 0 new comments
[docs] improve user experience of the API ref
#33645 commented on Jun 17, 2025 • 0 new comments
[RLLib] Collecting external experience
#33636 commented on Jun 17, 2025 • 0 new comments
[Workflow] get_metadata(workflow_id)["status"] and get_status(workflow_id) not returning the same status
#33633 commented on Jun 17, 2025 • 0 new comments
[runtime_env] Actors always depend global `pip` field for `runtime_env`
#33607 commented on Jun 17, 2025 • 0 new comments
[Core] Raylet process not respecting `--node-ip-address`
#33554 commented on Jun 17, 2025 • 0 new comments
[Tune] Support ExperimentAnalysis.dataframe(mode='mean')
#33540 commented on Jun 17, 2025 • 0 new comments
[Train] `RunConfig` doesn't get propagated from the Tuner to the Trainer
#33539 commented on Jun 17, 2025 • 0 new comments
[Core] std::bad_alloc error using ray.init()
#33525 commented on Jun 17, 2025 • 0 new comments
[Core] `test_memory_deadlock` times out
#33491 commented on Jun 17, 2025 • 0 new comments
[Core] Support binding worker processes to NUMA nodes
#33465 commented on Jun 17, 2025 • 0 new comments
Serve build usage of click CLI library conflicts python argparse
#32001 commented on Jun 17, 2025 • 0 new comments
[Serve] Version Support in 2.X API
#31928 commented on Jun 17, 2025 • 0 new comments
[Train] User exceptions not propagated from remote cluster
#31913 commented on Jun 17, 2025 • 0 new comments
[RLlib] AlgorithmConfig() defaults not used by build_sac_model when implementing custom model
#31783 commented on Jun 17, 2025 • 0 new comments
[kubernetes/cluster] More guides on deployment
#31623 commented on Jun 17, 2025 • 0 new comments
[core][state] ray log supporting regex searching
#31549 commented on Jun 17, 2025 • 0 new comments
[Tune] Support NLopt search algorithms
#31492 commented on Jun 17, 2025 • 0 new comments
[Rllib] Possible Redundant Code
#31463 commented on Jun 17, 2025 • 0 new comments
[aws] ray submit --stop fails on aws
#31380 commented on Jun 17, 2025 • 0 new comments
[Tune] Avoid insufficient resources warning if cluster is autoscaling
#31292 commented on Jun 17, 2025 • 0 new comments
No worker logs in the dashboard after recreating the K8S Ray pods
#31288 commented on Jun 17, 2025 • 0 new comments
[core] Please improve warning message for ip mismatch
#31264 commented on Jun 17, 2025 • 0 new comments
[Ray Collective] Ray Collective AllGather is Completely Broken
#31259 commented on Jun 17, 2025 • 0 new comments
[core][state] Refactor use of bounded LRU/FIFO buffer/map used in task backend
#31158 commented on Jun 17, 2025 • 0 new comments
[core] Ray resources should be case-insensitive
#31087 commented on Jun 17, 2025 • 0 new comments
[RayCluster]
#31041 commented on Jun 17, 2025 • 0 new comments
[Serve] gRPCis should not allow route_prefix set
#30891 commented on Jun 17, 2025 • 0 new comments
[General] Setup a "code walkthrough" meetup or tutorial
#30852 commented on Jun 17, 2025 • 0 new comments
[RFC][core] Option to avoid scheduling tasks to nodes with disk full
#30843 commented on Jun 17, 2025 • 0 new comments
[core] Enable greater control over log verbosity
#30832 commented on Jun 17, 2025 • 0 new comments
[Core] ux issues of ray state cli for tasks
#30805 commented on Jun 17, 2025 • 0 new comments
[Tune] ability to specify search algorithm when using tune.run_experiments()
#30802 commented on Jun 17, 2025 • 0 new comments
[Serve] Don't start Serve agent if Serve isn't installed
#32920 commented on Jun 17, 2025 • 0 new comments
[Data]: `ds.take()` and `ds.iter_batches()` have unexpected different behavior for pd.Series columns
#32913 commented on Jun 17, 2025 • 0 new comments
[Ray: Serve] Model Composition primitives should be part of Serve Core API docs.
#32837 commented on Jun 17, 2025 • 0 new comments
[core][state] Task backend : already submitted cancelled task showing up as finished
#32826 commented on Jun 17, 2025 • 0 new comments
[AIR][Tune] Make trial checkpoint + artifact upload happen atomically
#32823 commented on Jun 17, 2025 • 0 new comments
[Tune] During multi-GPU training (using mp.spawn), ray.tune.report does not take effect.
#32810 commented on Jun 17, 2025 • 0 new comments
[Tune] failure when using more than one GPU
#32760 commented on Jun 17, 2025 • 0 new comments
[Runtime Env] Add docstring for public class methods and attributes
#32704 commented on Jun 17, 2025 • 0 new comments
[tune] Add suggestions on when `reuse_actor` should be set to false.
#32698 commented on Jun 17, 2025 • 0 new comments
[serve] serve run doesn't restart app successfully in some environments
#32633 commented on Jun 17, 2025 • 0 new comments
[train] Big performance hit when TensorFlow trainer is not scheduled on head node
#32509 commented on Jun 17, 2025 • 0 new comments
[doc][tune] clarify `Stopper`, what is `training_iteration`
#32497 commented on Jun 17, 2025 • 0 new comments
[release] update our xgboost release test to catch issues like (see description)
#32491 commented on Jun 17, 2025 • 0 new comments
[Core] The remote function in the worker no longer runs after the head crashes
#32454 commented on Jun 17, 2025 • 0 new comments
[RLlib] Special __common__ key in MultiAgent batches is not documented
#32399 commented on Jun 17, 2025 • 0 new comments
[tune] update how trainable reports result/checkpoint to driver
#32380 commented on Jun 17, 2025 • 0 new comments
[Datasets] The projection pushdown cannot work with hive style partitioning file path
#32301 commented on Jun 17, 2025 • 0 new comments
[Core][utilization] some anti-pattern that not well supported by Ray core.
#32297 commented on Jun 17, 2025 • 0 new comments
[tune/train] Provide actionable error messages for common thirdparty errors
#32232 commented on Jun 17, 2025 • 0 new comments
[ci] Mirror external dependencies in CI
#32113 commented on Jun 17, 2025 • 0 new comments
[Serve] ValueError: Message ray.serve.ReplicaConfig exceeds maximum protobuf size of 2GB
#32049 commented on Jun 17, 2025 • 0 new comments
[CLI] make `ray get-head-ip` and `ray get-worker-ips` work for kuberay clusters when run outside the cluster
#32037 commented on Jun 17, 2025 • 0 new comments
[RayClient]large object transfer failure
#35448 commented on Jun 17, 2025 • 0 new comments
[train] Simplify `test_transformers_trainer_steps::test_e2e_steps`
#35424 commented on Jun 17, 2025 • 0 new comments
[Core] Reducing scheduling fragmentation
#35422 commented on Jun 17, 2025 • 0 new comments
[Core, RLlib] Multi GPU RLlib experiment is unable to be scheduled.
#35409 commented on Jun 17, 2025 • 0 new comments
[Job] Failed to schedule supervisor actor leads to job failure
#35387 commented on Jun 17, 2025 • 0 new comments
[Job] Show submitter of a Job on the dashboard
#35367 commented on Jun 17, 2025 • 0 new comments
[Serve] Support sync function for multiplexing
#35356 commented on Jun 17, 2025 • 0 new comments
[AIR] [Train] train multiple instances simultaneously on machines with specified tags
#35333 commented on Jun 17, 2025 • 0 new comments
<RLlib> What is the cause of the low CPU utilization in rllib PPO?
#35313 commented on Jun 17, 2025 • 0 new comments
[VM launcher] Document how to set up the cluster when there is UFW firewall
#35254 commented on Jun 17, 2025 • 0 new comments
[Data] Infer the data schema in Ray Datasets
#35230 commented on Jun 17, 2025 • 0 new comments
[dashboard] how to adjust ray dashboard refresh rate?
#35156 commented on Jun 17, 2025 • 0 new comments
[KubeRay, dashboard] Clarify that the users can use persistent volumes for log_dir and ray dashboard can read from it.
#35137 commented on Jun 17, 2025 • 0 new comments
[RLlib] Better error handling when return shape from step() mismatch in utils._flatten_multidiscrete
#35113 commented on Jun 17, 2025 • 0 new comments
The ray rsync-up cli reports no issue, but actually file is absent on remote side (Ray AWS cluster)
#35051 commented on Jun 17, 2025 • 0 new comments
[Core] - GPU Support - Explanation of Results
#35048 commented on Jun 17, 2025 • 0 new comments
[Data] Optimize `read_datasource` setup
#35029 commented on Jun 17, 2025 • 0 new comments
[EC2 VM Cluster launcher] Document EC2 ssh key limit and workaround
#35020 commented on Jun 17, 2025 • 0 new comments
[VM launcher] Ran `Ray status` after I sshed in to the head node and it printed "No cluster status"
#35017 commented on Jun 17, 2025 • 0 new comments
[air/tune][multi-tenancy] Parallel runs can use the same experiment directory
#35006 commented on Jun 17, 2025 • 0 new comments
Issue on page /cluster/vms/examples/ml-example.html
#34996 commented on Jun 17, 2025 • 0 new comments
[Core] Custom docker image not scaling out
#53696 commented on Jun 17, 2025 • 0 new comments
[Serve] `fastapi_app` is still mutable in the deployment constructor after being passed to `@serve.ingress`
#52775 commented on Jun 17, 2025 • 0 new comments
[tune] `URI has empty scheme` error when `storage_path` in `RunConfig` is relative
#42969 commented on Jun 17, 2025 • 0 new comments
[Serve] Specify different images for each deployment
#52994 commented on Jun 17, 2025 • 0 new comments
[Conda] Ray should raise exception when ray is not installed in conda environment
#52672 commented on Jun 17, 2025 • 0 new comments
[Serve] Autoscaling not working correctly when `max_replica_per_node` is set in Ray Serve
#53582 commented on Jun 17, 2025 • 0 new comments
[Serve] Allow --metrics-export-port argument in "serve run" CLI command
#44426 commented on Jun 17, 2025 • 0 new comments
[data] verbose_progress=True doesn't work in client mode
#43200 commented on Jun 17, 2025 • 0 new comments
[data] importing ray.data closes logging handlers, breaking custom logging
#48846 commented on Jun 17, 2025 • 0 new comments
Ray Serve Replica Initialization Timeout: STDOUT "Failed to load", RequestCancelledError, Likely Due to Slow/Crashing RLModule.from_checkpoint()
#53079 commented on Jun 17, 2025 • 0 new comments
[RLlib] TorchDistributionWrapper Typing Information Should Be Changed
#33997 commented on Jun 17, 2025 • 0 new comments
[Core] DecodeError when `ray.put` a large (2GB) object
#35976 commented on Jun 17, 2025 • 0 new comments
[core][gpu-objects] Support streaming to overlap computation / communication
#51643 commented on Jun 17, 2025 • 0 new comments
[core][gpu-objects] Allow tensor metadata to be specified ahead of time for improved performance
#51279 commented on Jun 17, 2025 • 0 new comments
[<Ray component: Core|RLlib|etc...>] SAC config error about framework
#53694 commented on Jun 17, 2025 • 0 new comments
[air/output] Jupyter notebook trial result table keeps swapping column order
#35838 commented on Jun 17, 2025 • 0 new comments
[RLlib] Make Learner more standalone with regards to LearnerHyperparameters
#35788 commented on Jun 17, 2025 • 0 new comments
[AIR] `on_trial_complete` callback hook happens before trial resources are freed
#35721 commented on Jun 17, 2025 • 0 new comments
[core] Failed to close sockets in CoreWorker when crash.
#35681 commented on Jun 17, 2025 • 0 new comments
[serve][dashboard] Show last line instead of first line in Serve app status message
#35600 commented on Jun 17, 2025 • 0 new comments
Ray Data - Glob/wildcard in file path
#35499 commented on Jun 17, 2025 • 0 new comments
[serve] Document how to silence access logs from GradioIngress
#35496 commented on Jun 17, 2025 • 0 new comments
[Serve] Production Guide: Add instruction for non-K8s on-premise clusters
#34437 commented on Jun 17, 2025 • 0 new comments
[Serve] Ray Serve hangs and becomes unresponsive when calling ffmpeg in deployment
#34414 commented on Jun 17, 2025 • 0 new comments
[Serve] Deployments page tasks history is full of system tasks. Not very useful
#34386 commented on Jun 17, 2025 • 0 new comments
[Core] serialisation of dataclass in separate module fails to recognise parameter change in child dataclass, but functions correctly if in the same module
#34366 commented on Jun 17, 2025 • 0 new comments
ImportError: cannot import name 'torch' from 'ray.rllib.train'
#34354 commented on Jun 17, 2025 • 0 new comments
[core][state] Include job info for placement group
#34333 commented on Jun 17, 2025 • 0 new comments
[Jobs] Use new API `is_head_node` to find head node
#34317 commented on Jun 17, 2025 • 0 new comments
[Core] RFC: simplify CI testing
#34315 commented on Jun 17, 2025 • 0 new comments
[air] Error while loading xgboost model in BatchPredictor
#34307 commented on Jun 17, 2025 • 0 new comments
[RLlib] Unity 3d env tests are broken
#34290 commented on Jun 17, 2025 • 0 new comments
[air/train] the logic to grab free ports for `tf_config` is potentially racy
#34271 commented on Jun 17, 2025 • 0 new comments
[Core][Object Store] Push Manager: round for object manager client and FIFO for object
#34270 commented on Jun 17, 2025 • 0 new comments
[air] xgboost/lightgbm trainer's validation result differ between online and offline
#34211 commented on Jun 17, 2025 • 0 new comments
[tune] support viewing partial experiment result as tuning goes on
#34207 commented on Jun 17, 2025 • 0 new comments
[Workflow] Improve efficiency of Ray Workflow by returning workflow metadata and completed task information in single API call
#34158 commented on Jun 17, 2025 • 0 new comments
Issue on page /rllib/package_ref/algorithm.html
#34157 commented on Jun 17, 2025 • 0 new comments
[Prometheus metrics util] Application level custom metrics aren't getting exported consistently
#34145 commented on Jun 17, 2025 • 0 new comments
[Core] Actors not cleaning up resources correct because `force_kill=true`.
#34124 commented on Jun 17, 2025 • 0 new comments
Ray Tune + ray xgboost running out of disk space
#34118 commented on Jun 17, 2025 • 0 new comments
[Core][Tune]Trials hang when using Pytorch
#34028 commented on Jun 17, 2025 • 0 new comments
[Data] `map_batches` hard to use and debug
#34007 commented on Jun 17, 2025 • 0 new comments
[Core] improve garbage collection after job go out of scope
#34001 commented on Jun 17, 2025 • 0 new comments
[AWS VM Cluster Launcher] AWS Cluster launcher installs nightly Ray by default
#34991 commented on Jun 17, 2025 • 0 new comments
[CI] Fix minimal-install python 3.11: build wheel with unsupported tags.
#34980 commented on Jun 17, 2025 • 0 new comments
[serve][docs] Add DAG building classes to the API reference
#34953 commented on Jun 17, 2025 • 0 new comments
[AIR output] Rich table gets truncated when the terminal height is smaller than it
#34925 commented on Jun 17, 2025 • 0 new comments
[AIR output] Format of trial table with Rich enabled.
#34923 commented on Jun 17, 2025 • 0 new comments
[AIR output] "iteration" is shown in the output for RL users
#34918 commented on Jun 17, 2025 • 0 new comments
[core] ray.kill doesn't guarantee resources are cleaned up
#34917 commented on Jun 17, 2025 • 0 new comments
[Data] Add `fn_kwargs` to `BatchMapper`
#34852 commented on Jun 17, 2025 • 0 new comments
Resource Allocation: Ray Core, Ray Client
#34816 commented on Jun 17, 2025 • 0 new comments
[Jobs] Job agent recovers all running jobs on restart, not just those monitored by that agent
#34794 commented on Jun 17, 2025 • 0 new comments
[Doc] Autogenerated "suggest an edit" link doesn't work
#34751 commented on Jun 17, 2025 • 0 new comments
[Tune] thread limit resulting in the job failure in multi-tenancy usage
#34745 commented on Jun 17, 2025 • 0 new comments
Ray Job
#34710 commented on Jun 17, 2025 • 0 new comments
[docs][infra] automate checks for common link errors
#34681 commented on Jun 17, 2025 • 0 new comments
[Ray Job] Auto-shutdown of the cluster when job finished
#34672 commented on Jun 17, 2025 • 0 new comments
[Core] Ray.wait should return if task throw exception
#34653 commented on Jun 17, 2025 • 0 new comments
[Core] ray2.3.1 gcs_server memory keeps increasing until OOM
#34619 commented on Jun 17, 2025 • 0 new comments
[Runtime Env/Ray Job] Job submission fails when specifying local zip file as working dir
#34605 commented on Jun 17, 2025 • 0 new comments
why ray.data.read_images cannot combine_chunks
#34563 commented on Jun 17, 2025 • 0 new comments
[Core] Add support for cancelling descendants of a completed task
#34545 commented on Jun 17, 2025 • 0 new comments
[Data] retrieve written paths from `Dataset.write_datasource`
#34444 commented on Jun 17, 2025 • 0 new comments
[Docs Infra] [RLLib] Remove "<<<" from code blocks
#34439 commented on Jun 17, 2025 • 0 new comments
[docs][Bug] Workflow docs have few typos and type issue
#23113 commented on Jun 16, 2025 • 0 new comments
[tune][Feature] add tune.choices to select multiple values from a search space
#23001 commented on Jun 16, 2025 • 0 new comments
[Bug] An exception in a task cannot be caught with ActorPool.map_unordered making restarting meaningless
#22978 commented on Jun 16, 2025 • 0 new comments
Ray Train / Tune - W&B logger documentation
#22881 commented on Jun 16, 2025 • 0 new comments
[Train] update `logdir` relative path
#22753 commented on Jun 16, 2025 • 0 new comments
[Bug][placement groups] Actor scheduling does not respect placement_group=None
#22742 commented on Jun 16, 2025 • 0 new comments
[Cluster snapshot] [Bug] `runtime_env` fields in cluster snapshot are converted to camelcase when they should not be
#22565 commented on Jun 16, 2025 • 0 new comments
[Train] Add flags to disable creating log directories
#22261 commented on Jun 16, 2025 • 0 new comments
[RLlib] [Feature] Support for having parametric action spaces/action masking for continuous action space models
#22259 commented on Jun 16, 2025 • 0 new comments
[RLLib] Workers died at the initialization stage when the observation space is a 3D shape
#22033 commented on Jun 16, 2025 • 0 new comments
[Train] Automatically choose number of workers
#21987 commented on Jun 16, 2025 • 0 new comments
[Serve] The adjustment about Ray Serve Java Proxy and Java Replica
#21694 commented on Jun 16, 2025 • 0 new comments
[C++] Cluster Mode Tests Should have 1 test per feature tested
#21454 commented on Jun 16, 2025 • 0 new comments
[Tune] [Bug] lazily expand directories for client compatibility
#21408 commented on Jun 16, 2025 • 0 new comments
[Tune] Issue on page /tune/tutorials/tune-pytorch-lightning.html
#21354 commented on Jun 16, 2025 • 0 new comments
[Bug] Got stuck when running python script from a shell script
#21298 commented on Jun 16, 2025 • 0 new comments
[Bug] [Tune] pbt run_experiments not stable, some trial will error.
#21259 commented on Jun 16, 2025 • 0 new comments
[Train] Document Callbacks
#21066 commented on Jun 16, 2025 • 0 new comments
[Feature] Single source of truth for Ray version in Java `pom.xml` and `pom_template.xml` files
#21059 commented on Jun 16, 2025 • 0 new comments
[Test Bug] Matching `psutil.Process.name()` doesn't work on macOS
#20982 commented on Jun 16, 2025 • 0 new comments
[Bug] Incorrect promise usage that causes infinite blocking calls
#20899 commented on Jun 16, 2025 • 0 new comments
We encountered the cast exception after we got result from ray actor task
#20369 commented on Jun 16, 2025 • 0 new comments
[Train/AIR] Ray Train actors still use up resources after Notebook cell is stopped
#24947 commented on Jun 16, 2025 • 0 new comments
[Core] Failed to delete named actor in client mode
#24906 commented on Jun 16, 2025 • 0 new comments
[AIR] Add a `reconfigure` option to `ModelWrapperDeployment`
#24869 commented on Jun 16, 2025 • 0 new comments
[core] Uninformative error for unserialisable objects
#24863 commented on Jun 16, 2025 • 0 new comments
[Serve] Prototype C++ Worker in Serve
#24738 commented on Jun 16, 2025 • 0 new comments
[Core] Spilling performance regression in large-scale shuffle
#24667 commented on Jun 16, 2025 • 0 new comments
[Core] Restore objects directly from S3
#24581 commented on Jun 16, 2025 • 0 new comments
[Ray component: Core] Dask on Ray - Worker processes go to idle state and not garbage collected when used with RayProgressBar()
#24556 commented on Jun 16, 2025 • 0 new comments
[runtime env] `serialized_env` used as ID, but identical envs can produce different `serialized_env`
#24515 commented on Jun 16, 2025 • 0 new comments
[Core] No overloads for "remote" match the provided arguments
#24371 commented on Jun 16, 2025 • 0 new comments
Workflows: Type stubs are incorrect: argument missing for parameter status_filter
#24367 commented on Jun 16, 2025 • 0 new comments
[Core] /api/cluster_status treats placement groups differently than ray status
#24309 commented on Jun 16, 2025 • 0 new comments
[Core] Restore worker silently fails and the program is stuck
#24248 commented on Jun 16, 2025 • 0 new comments
[RLlib][Bug] duplicate action unsquashing in DDPG / TD3 policy
#24213 commented on Jun 16, 2025 • 0 new comments
[Tune] support for FIRE PBT
#24137 commented on Jun 16, 2025 • 0 new comments
[Tune] Tune Job hangs out and can't finish the tune job
#23858 commented on Jun 16, 2025 • 0 new comments
[Workflows] Cant use custom storage backends
#23831 commented on Jun 16, 2025 • 0 new comments
[RLlib] Add Option for Custom Sample Preprocessing when Sampling from Replay Buffer
#23815 commented on Jun 16, 2025 • 0 new comments
[Core][Bug] global-scoped actor handles/Ray objects prevents Ray workers from being destructed.
#23677 commented on Jun 16, 2025 • 0 new comments
[runtime env] `zip_directory` `excludes` parameter doesn't work with absolute paths
#23473 commented on Jun 16, 2025 • 0 new comments
[Train] [Feature] Print useful traceback on SIGINT
#23148 commented on Jun 16, 2025 • 0 new comments
[Train] [Docs] Document how to change logging verbosity
#23147 commented on Jun 16, 2025 • 0 new comments
[Train] Refactor `TrainingIterator` result processing logic
#20330 commented on Jun 16, 2025 • 0 new comments
[serve] java api
#16393 commented on Jun 16, 2025 • 0 new comments
[serve] java serve handle
#16392 commented on Jun 16, 2025 • 0 new comments
[serve] java http proxy
#16391 commented on Jun 16, 2025 • 0 new comments
[Shuffle] non-streaming consumed bytes are too low compared to spilled / restored bytes.
#16149 commented on Jun 16, 2025 • 0 new comments
[ray] Multiple concurrent requests to create a named actor crash GCS
#15941 commented on Jun 16, 2025 • 0 new comments
Remove unused util functions for conda environments
#15912 commented on Jun 16, 2025 • 0 new comments
[core] Zero-gpu node shouldn't be marked with accelerator_type resource.
#15878 commented on Jun 16, 2025 • 0 new comments
Cannot using external model with cuda when using ray
#15869 commented on Jun 16, 2025 • 0 new comments
[wheel][doc] Make it easier to access Ray wheels for specific commits
#15765 commented on Jun 16, 2025 • 0 new comments
[rllib]Update the docs about Variable-length / Parametric Action Space
#15710 commented on Jun 16, 2025 • 0 new comments
Odd task scheduling behavior on same node
#15602 commented on Jun 16, 2025 • 0 new comments
Averaging learning curves over repetitions + plotting confidence intervals [Tune]
#15400 commented on Jun 16, 2025 • 0 new comments
AssertionError when using pyinstaller with ray
#15396 commented on Jun 16, 2025 • 0 new comments
[core] Memory leak when using local simulated cluster (long_running_tests/workloads/apex.py)
#15305 commented on Jun 16, 2025 • 0 new comments
[Core] Bad traceback on failure to reconnect to GCS server.
#15235 commented on Jun 16, 2025 • 0 new comments
[metrics] Custom sum metrics have type comment "gauge"
#15150 commented on Jun 16, 2025 • 0 new comments
[core] Actor restart does not work when owner dies and constructor task has dependencies
#15076 commented on Jun 16, 2025 • 0 new comments
[k8s] ray down command does not remove pods which are in evicted state
#14958 commented on Jun 16, 2025 • 0 new comments
[Tune] [Ray Client] tune_cifar10_gluon example fails with Ray Client
#14946 commented on Jun 16, 2025 • 0 new comments
[ray white paper] broken links
#14897 commented on Jun 16, 2025 • 0 new comments
Fix Asyncio Event Metrics on Java
#14715 commented on Jun 16, 2025 • 0 new comments
Add ray.__wheel__ with a link to the wheel to install the same version
#14623 commented on Jun 16, 2025 • 0 new comments
[tsan] Add TSAN CI build that runs basic Python tests
#20080 commented on Jun 16, 2025 • 0 new comments
[tsan] Race in census SetGlobalTags
#20079 commented on Jun 16, 2025 • 0 new comments
[tsan] Race accessing global stats objects
#20078 commented on Jun 16, 2025 • 0 new comments
[tsan] Several global config variables accessed unsafely
#20077 commented on Jun 16, 2025 • 0 new comments
Support working_dir=None for skipping packaging upload/download
#19962 commented on Jun 16, 2025 • 0 new comments
[Bug] Placement group removal refinement
#19937 commented on Jun 16, 2025 • 0 new comments
[Feature] Able to access objects put in cross language
#19873 commented on Jun 16, 2025 • 0 new comments
[Bug] Improve RuntimeEnvSetupError message
#19824 commented on Jun 16, 2025 • 0 new comments
[RLlib] Deprecate Internally Maintained Probability Distributions In Favor Of Native TFP And torch.distributions Solutions
#19725 commented on Jun 16, 2025 • 0 new comments
[Serve] Test KVStore early in constructor init.
#19714 commented on Jun 16, 2025 • 0 new comments
[Bug] [Workflow] ray.wait on workflow result doesn't work as expected
#19295 commented on Jun 16, 2025 • 0 new comments
[tune] MLFlowLogger doesn't save artifacts for remote mlflow tracking_uri
#19263 commented on Jun 16, 2025 • 0 new comments
[Bug] [XLang] Segfault when Java returns void
#18837 commented on Jun 16, 2025 • 0 new comments
[Bug] tensorboardX vs tensorboard?
#18727 commented on Jun 16, 2025 • 0 new comments
Dashboard exposes redis PW on the command line
#18491 commented on Jun 16, 2025 • 0 new comments
[core] ReferenceCountingAssertionError may be thrown if ObjectRef is passed through intermediate worker that dies
#18456 commented on Jun 16, 2025 • 0 new comments
Race condition of grpc backpressure
#18439 commented on Jun 16, 2025 • 0 new comments
[Core] Task spec including inlined objects can crash lease request RPCs.
#18194 commented on Jun 16, 2025 • 0 new comments
[Runtime Env] Setup process doesn't have CPU limit
#18137 commented on Jun 16, 2025 • 0 new comments
ray.init with address crashes process outside of cluster
#17769 commented on Jun 16, 2025 • 0 new comments
new dashboard agent port conflict issues
#17498 commented on Jun 16, 2025 • 0 new comments
[Core] Unable to get actor handle of global named actor created in java from python in Ray 1.4.0
#16436 commented on Jun 16, 2025 • 0 new comments
[Clusters] [KubeRay] problem with pending actors' pods in Kubernetes
#32651 commented on Jun 16, 2025 • 0 new comments
[core] Lock contention when submitting actor task on the client queue
#32595 commented on Jun 16, 2025 • 0 new comments
[Core] Install via `pip` fails, install with `conda` crashes worker and exits
#32423 commented on Jun 16, 2025 • 0 new comments
[Core] "ImportError: No module named ray" when using `ray submit`
#31924 commented on Jun 16, 2025 • 0 new comments
[workflow] memory leakage
#31819 commented on Jun 16, 2025 • 0 new comments
[Clusters] [RLlib] Trainer Object running on Worker node & RolloutWorker running on Head node
#31808 commented on Jun 16, 2025 • 0 new comments
[runtime envs] Ray Client Server failed when starting
#31622 commented on Jun 16, 2025 • 0 new comments
Huge numbers of "deleted" files with open processes left after Ray Tune run
#31556 commented on Jun 16, 2025 • 0 new comments
[Tune] Reenable `zoopt` searcher test after fixes for handling invalid results are included in its next release
#31439 commented on Jun 16, 2025 • 0 new comments
[RLlib] Pytorch multiple optimizers
#31428 commented on Jun 16, 2025 • 0 new comments
In the docker bridge mode, pulling the actor on a non head node fails.
#31308 commented on Jun 16, 2025 • 0 new comments
[CORE] Unable to run celery task containing ray tasks
#31157 commented on Jun 16, 2025 • 0 new comments
[core] Segfaults when restarting Ray multiple times in unit tests with background threads running
#31145 commented on Jun 16, 2025 • 0 new comments
[core] Error with Slurm: No available node types can fulfill resource request {'node:<ip>': 0.01}.
#31135 commented on Jun 16, 2025 • 0 new comments
[RLlib] Not able to save evaluation recording videos
#30949 commented on Jun 16, 2025 • 0 new comments
[Ray Job] SchedulingCancelled for JobSupervisor Actor
#30898 commented on Jun 16, 2025 • 0 new comments
[Ray client] Ray Zombie Process Issue
#30894 commented on Jun 16, 2025 • 0 new comments
[Devprod] Bazel reports an error when compiling as a non-root user
#30885 commented on Jun 16, 2025 • 0 new comments
[release tests] Prometheus metrics collection sometimes takes 15min to run for long_running_node_failures
#30859 commented on Jun 16, 2025 • 0 new comments
[core] Disk full error logging is verbose
#30833 commented on Jun 16, 2025 • 0 new comments
[RLlib] Error when running RLlib
#30412 commented on Jun 16, 2025 • 0 new comments
[Cluster Launcher] `ray dashboard` CLI command does not stop port-forwarding after Ctrl+C
#30385 commented on Jun 16, 2025 • 0 new comments
Ray: Data - Cannot read json its written to s3
#35501 commented on Jun 16, 2025 • 0 new comments
[Core] `OwnerDiedError` if dataset owner actor handle get out of scope
#35262 commented on Jun 16, 2025 • 0 new comments
[VM launcher] Automatically shut down the ec2 machine when I stop ray up in the middle
#35013 commented on Jun 16, 2025 • 0 new comments
[CI] Migrate from flake8 to ruff
#34889 commented on Jun 16, 2025 • 0 new comments
[Core] Incorrect detection of cpus
#34846 commented on Jun 16, 2025 • 0 new comments
[Clusters] - Cannot switch off rsync during Cluster Launch with `ray up`
#34390 commented on Jun 16, 2025 • 0 new comments
Azure autoscaler cannot create additional nodes
#34198 commented on Jun 16, 2025 • 0 new comments
[Core] Error in external storage writing for object spilling
#33913 commented on Jun 16, 2025 • 0 new comments
get_node_to_storage_syncer has an empty docstring
#33841 commented on Jun 16, 2025 • 0 new comments
[ Core ] Correct usage of min/max-worker-port arguments
#33749 commented on Jun 16, 2025 • 0 new comments
Core: nightly builds for macos only include an x86 _raylet.so even though they claim to be universal
#33720 commented on Jun 16, 2025 • 0 new comments
[Core] The resources have minus values in ray status output
#33569 commented on Jun 16, 2025 • 0 new comments
[tune] tqdm/Hyperopt-style TuneReporter for Databricks notebooks
#33519 commented on Jun 16, 2025 • 0 new comments
[Core] Ray client doesn't support `should_capture_child_tasks_in_placement_group` API
#33513 commented on Jun 16, 2025 • 0 new comments
[RLlib] DictFlatteningPreprocessor order is inconsistent leads to invalid mapping of OBS
#33327 commented on Jun 16, 2025 • 0 new comments
[Dashboard] Head node exited unexpectedly because of dashboard process exited
#31261 commented on Jun 16, 2025 • 0 new comments
[runtime env] Raise warning when using `runtime_env` with `local_mode=True`
#33260 commented on Jun 16, 2025 • 0 new comments
[Core] `get_runtime_context()` in task fails with unhelpful error "cannot pickle '_thread.lock' object"
#32987 commented on Jun 16, 2025 • 0 new comments
[Train] Benchmark testing on Mosaic Composer with Ray
#32946 commented on Jun 16, 2025 • 0 new comments
[Ray Core] Actor Handles not properly passed to Actors created by other Actors
#32848 commented on Jun 16, 2025 • 0 new comments
[RLlib] A3C has problems with the horizon option removed
#32812 commented on Jun 16, 2025 • 0 new comments
[Core][Object Store] Object Store to manage files in the cluster
#32694 commented on Jun 16, 2025 • 0 new comments
[autoscaler] AWS Single Sign-On support
#30064 commented on Jun 16, 2025 • 0 new comments
[Ray: Core] - Unable to enable TLS on the ray head node
#28534 commented on Jun 16, 2025 • 0 new comments
Dashboard / Jobs RegexMatcher ignores "includes".
#28502 commented on Jun 16, 2025 • 0 new comments
[Core, RLlib] RLlib uses Metal GPU even when told not to
#28385 commented on Jun 16, 2025 • 0 new comments
[Core] Actor methods will be modified for tracing even if tracing is not enabled.
#28293 commented on Jun 16, 2025 • 0 new comments
[Runtime] Improve runtime environment error message when virtualenv version is too old
#28232 commented on Jun 16, 2025 • 0 new comments
[Core] Multi-Threaded Actors are Un-Killable
#28086 commented on Jun 16, 2025 • 0 new comments
[Autoscaler] Assigning None to optional keys leads to failure
#28012 commented on Jun 16, 2025 • 0 new comments
[Core] Can't pickle objects defined in top-level environment
#28000 commented on Jun 16, 2025 • 0 new comments
[Doc] [Serve] Serve Loki monitoring tutorial screenshot has outdated API
#27453 commented on Jun 16, 2025 • 0 new comments
[core] Very slow task scheduling during Dataset.sort on 100TB
#27410 commented on Jun 16, 2025 • 0 new comments
Is Ray going to support Weighted Quantile Sketches or Quantile Sketches?
#27363 commented on Jun 16, 2025 • 0 new comments
[Core] Raylet continually exiting on worker in docker
#26576 commented on Jun 16, 2025 • 0 new comments
Tensorboard with Docker from Ray dashboard, tune tab cannot be accessed
#26325 commented on Jun 16, 2025 • 0 new comments
[RLlib] Eval episode runs forever if Env doesn't terminate properly
#26241 commented on Jun 16, 2025 • 0 new comments
[Ray Client] Using many concurrent client connections results in deadlock/hanging
#26144 commented on Jun 16, 2025 • 0 new comments
[Core] worker died randomly and unexpectedly under heavy workload (Check failed: inner_it->second.mutable_nested()->contained_in_borrowed_ids.erase(id))
#26128 commented on Jun 16, 2025 • 0 new comments
[Core][HA] Actor entries are not deleted from the storage permanently if GCS is crashed.
#26114 commented on Jun 16, 2025 • 0 new comments
Unclear error when using generator tasks
#25836 commented on Jun 16, 2025 • 0 new comments
[Core] SIGSEGV when I run experimental shuffle command.
#25650 commented on Jun 16, 2025 • 0 new comments
[Core][Metrics] Prometheus-client not working with the latest version.
#25523 commented on Jun 16, 2025 • 0 new comments
[core] Scheduler stalls during shuffle reduce stage with 100k concurrent tasks or more
#25412 commented on Jun 16, 2025 • 0 new comments
[AIR] Utilities to go from Predictor to `BatchPredictor` and `ModelWrapperDeployment`
#24977 commented on Jun 16, 2025 • 0 new comments
[Autoscaler][GCP] Autoscaler crashing on GCP with error 404.
#30050 commented on Jun 16, 2025 • 0 new comments
Setting some system configs causes Ray to fail to start
#29841 commented on Jun 16, 2025 • 0 new comments
[Ray Cluster] Assigning all host GPUs into head node without nvidia.com/gpu present
#29753 commented on Jun 16, 2025 • 0 new comments
[gcp] "No such container" error after ray up
#29671 commented on Jun 16, 2025 • 0 new comments
[Tune] Passing a handle to grid search cause trials to get stuck in running and pending mode
#29545 commented on Jun 16, 2025 • 0 new comments
[Serve] `ServeHandles` fail if GCS crashes before first request
#29539 commented on Jun 16, 2025 • 0 new comments
[Core] util.multiprocessing.pool scheduling inefficiencies, blocking behavior in imap and imap_unordered
#29453 commented on Jun 16, 2025 • 0 new comments
[Core] inspect_serializability bug - parent object serializable but bound method not
#29423 commented on Jun 16, 2025 • 0 new comments
[Core] Ray doesn't shutdown properly on KeyboardInterrupt
#29384 commented on Jun 16, 2025 • 0 new comments
[Serve] Unable to upload current working directory
#29354 commented on Jun 16, 2025 • 0 new comments
[core][observability] Improving reliability of memory_summary API call
#29329 commented on Jun 16, 2025 • 0 new comments
InvalidLocationConstraint Message: The specified location-constraint is not valid for storage option
#29309 commented on Jun 16, 2025 • 0 new comments
[Dashboard] A button to shut down the ray cluster from the dashboard UI
#29208 commented on Jun 16, 2025 • 0 new comments
[Core] Worker pool didn't prestart num_cpus workers
#29162 commented on Jun 16, 2025 • 0 new comments
[core] use proto for oom error / node died error in the frontend
#28907 commented on Jun 16, 2025 • 0 new comments
[ray client] surface ray client logs better
#28890 commented on Jun 16, 2025 • 0 new comments
[Backlog][Collective] Facilitate NCCL test in ray cluster
#28860 commented on Jun 16, 2025 • 0 new comments
[core/k8s/GKE] Ray schedules actors on pods/nodes that are shutting down
#28852 commented on Jun 16, 2025 • 0 new comments
[Core] Cannot 'ray list nodes' after setting the environmental variable 'export RAY_ADDRESS="http://127.0.0.1:8265" '
#28847 commented on Jun 16, 2025 • 0 new comments
[AIR] [Tune] Don't add random hash to trial id for single trial
#28830 commented on Jun 16, 2025 • 0 new comments
[core] Object returned by a generator with num_returns="dynamic" should throw an error if reconstruction fails
#28688 commented on Jun 16, 2025 • 0 new comments
[P0] test_submit_cpp_job failed in osx
#28592 commented on Jun 16, 2025 • 0 new comments
Unable to override ray's default logging format
#6965 commented on Jun 16, 2025 • 0 new comments
MADDPG used onto a MultiEnv does not show learning.
#6949 commented on Jun 16, 2025 • 0 new comments
How to throttle process to avoid "UnreconstructableError"
#6892 commented on Jun 16, 2025 • 0 new comments
pip install from source requires --editable/-e flag
#6845 commented on Jun 16, 2025 • 0 new comments
[scheduling] Default actor lifetime resources (0 CPUs) cause cluster not to be saturated
#6814 commented on Jun 16, 2025 • 0 new comments
How to Reduce Memory Usage for Creating Actor?
#6778 commented on Jun 16, 2025 • 0 new comments
Reconstruction semantics around failing actor constructor.
#6768 commented on Jun 16, 2025 • 0 new comments
[Deploy]Ray on Yarn Deployment
#6753 commented on Jun 16, 2025 • 0 new comments
failed on virtual environment
#6735 commented on Jun 16, 2025 • 0 new comments
Managing memory during long loops
#6717 commented on Jun 16, 2025 • 0 new comments
Not able to reproduce speed performance improvements using ray on my machine
#6716 commented on Jun 16, 2025 • 0 new comments
[tune] Logs don't sync up to workers on restore
#6702 commented on Jun 16, 2025 • 0 new comments
The remote_function.options is not documented.
#6699 commented on Jun 16, 2025 • 0 new comments
[tune] More robust checkpoint garbage collection
#6697 commented on Jun 16, 2025 • 0 new comments
Fault tolerance to dead actors
#6670 commented on Jun 16, 2025 • 0 new comments
ray.wait's num_returns should not fail if num_returns > len(results)
#6667 commented on Jun 16, 2025 • 0 new comments
Parallel execution of multiple dataframes by dividing them into sub-frames
#6640 commented on Jun 16, 2025 • 0 new comments
Batch Norm example failing under APEX
#6638 commented on Jun 16, 2025 • 0 new comments
limiting tensorflow memory failed in actor or function
#6633 commented on Jun 16, 2025 • 0 new comments
Remote function is executed in python `exec` with empty local/global will fails
#6620 commented on Jun 16, 2025 • 0 new comments
[tune] Estimate timing
#6618 commented on Jun 16, 2025 • 0 new comments
[streaming] Add micro batching feature
#6607 commented on Jun 16, 2025 • 0 new comments
Proper way of calling a class method in another method
#7450 commented on Jun 16, 2025 • 0 new comments
[autoscaler] Provide ability to provide elastic ip when launching cluster
#7446 commented on Jun 16, 2025 • 0 new comments
[core] Get IP Address of Actor
#7431 commented on Jun 16, 2025 • 0 new comments
What are system requirements for building on Mac OSX
#7430 commented on Jun 16, 2025 • 0 new comments
Ray dashboard integration
#7383 commented on Jun 16, 2025 • 0 new comments
Do not suggest calling __ray_terminate__ directly
#7382 commented on Jun 16, 2025 • 0 new comments
ray.services.get_node_ip_address doesn't work well if there is a local proxy
#7316 commented on Jun 16, 2025 • 0 new comments
Provide abstraction/interface to implement resource isolation for custom resources
#7204 commented on Jun 16, 2025 • 0 new comments
[cross-language]Problem about cross language data layout
#7191 commented on Jun 16, 2025 • 0 new comments
Documentation for connecting to ray cluster could be improved
#7186 commented on Jun 16, 2025 • 0 new comments
ray.experimental.queue is very slow
#7172 commented on Jun 16, 2025 • 0 new comments
Using asserts for argument checks is probably a bad idea
#7171 commented on Jun 16, 2025 • 0 new comments
Ray Issue: The class state is never hold by passing it to the remote function/actors if the class is defined in separate files
#7160 commented on Jun 16, 2025 • 0 new comments
Docs on Cython extensions and install requirements
#7094 commented on Jun 16, 2025 • 0 new comments
[core] Gets timeout on randomly generated ObjectIDs
#7074 commented on Jun 16, 2025 • 0 new comments
Allow remote functions to require running on a fresh worker
#7059 commented on Jun 16, 2025 • 0 new comments
How to use Ray with closures?
#7055 commented on Jun 16, 2025 • 0 new comments
The project `setup.py` script doesn't install tools needed by `ci/travis/format.sh`
#6999 commented on Jun 16, 2025 • 0 new comments
Don't run Java or sanitizer tests when only Python changes.
#6992 commented on Jun 16, 2025 • 0 new comments
ray plasma object store connection refused after 24hrs
#6988 commented on Jun 16, 2025 • 0 new comments
Sharing in memory
#6976 commented on Jun 16, 2025 • 0 new comments
[ray] ray on slurm not respecting memory limits
#6968 commented on Jun 16, 2025 • 0 new comments
Package reference should include task & actor APIs
#6566 commented on Jun 16, 2025 • 0 new comments
Python Worker class should have proper constructor and destructor.
#3961 commented on Jun 16, 2025 • 0 new comments
Should not ignore "AttributeError"
#3820 commented on Jun 16, 2025 • 0 new comments
Backend timing statements should be made type safe.
#3341 commented on Jun 16, 2025 • 0 new comments
Make it possible to limit memory usage of processes
#3055 commented on Jun 16, 2025 • 0 new comments
Task submission from local scheduler client is blocking
#2940 commented on Jun 16, 2025 • 0 new comments
Add test for numpy array alignment.
#2937 commented on Jun 16, 2025 • 0 new comments
[bug][serve.llm] AssertionError: failed to get the hash of the compiled graph (VLM, batch, TP=2)
#53824 commented on Jun 16, 2025 • 0 new comments
Allow ray.get and ray.wait to take in additional argument types
#2126 commented on Jun 16, 2025 • 0 new comments
Remove the import thread from the workers and driver.
#951 commented on Jun 16, 2025 • 0 new comments
Remote decorator fails on jitted function.
#593 commented on Jun 16, 2025 • 0 new comments
Actors do not work properly with subclasses that call super.
#449 commented on Jun 16, 2025 • 0 new comments
TypeError: Descriptors cannot not be created directly.
#36417 commented on Jun 16, 2025 • 0 new comments
[Tune|RLlib] PBT reward drop - not checkpointing or restoring properly
#53831 commented on Jun 16, 2025 • 0 new comments
[rllib] [bug] Official PPO Atari example fails with IndexError
#53836 commented on Jun 16, 2025 • 0 new comments
Methods on actors inherited from built-in classes are not visible
#278 commented on Jun 16, 2025 • 0 new comments
[Core][StreamingGenerator] `ray.get` will hang when the node on which the streaming task is running fails.
#47582 commented on Jun 16, 2025 • 0 new comments
[Serve] `serve.run` can bind the incorrect Application if Deployments have the same name
#53295 commented on Jun 16, 2025 • 0 new comments
[Serve] RayServe Pods Stuck in Unready State Causing API Outages
#53323 commented on Jun 16, 2025 • 0 new comments
[Serve] Support generics for DeploymentHandle type hints
#52654 commented on Jun 16, 2025 • 0 new comments
[Ray Compiled Graph] NCCL Internal Error
#49827 commented on Jun 16, 2025 • 0 new comments
[RLlib] Checkpointing fails with CUDA GPU learner using the new API stack
#53793 commented on Jun 16, 2025 • 0 new comments
How to enable tool calling in serve llm?
#53795 commented on Jun 16, 2025 • 0 new comments
Serialization is 20% slower from 0.7.6 -> 0.7.7
#6551 commented on Jun 16, 2025 • 0 new comments
[ray] How to write into numpy arrays in shared memory with Ray?
#6507 commented on Jun 16, 2025 • 0 new comments
Support for mxnet.ndarray?
#6494 commented on Jun 16, 2025 • 0 new comments
[ray] Handle memory pressure more gracefully
#6458 commented on Jun 16, 2025 • 0 new comments
Reloading module changes in workers
#6449 commented on Jun 16, 2025 • 0 new comments
[tune] [serve] Don't use daemon threads
#6421 commented on Jun 16, 2025 • 0 new comments
Terminal freezes after setting @ray.remote(num_gpu=2)
#6418 commented on Jun 16, 2025 • 0 new comments
Ray does not preserve requires_grad attribute
#6405 commented on Jun 16, 2025 • 0 new comments
Ray over mpi for supercomputers
#6344 commented on Jun 16, 2025 • 0 new comments
Support of Ray Decorator for Built in Functions
#6308 commented on Jun 16, 2025 • 0 new comments
[docs] Issue on `tune-schedulers.rst`
#6063 commented on Jun 16, 2025 • 0 new comments
Can I set priority for my tasks
#6057 commented on Jun 16, 2025 • 0 new comments
Avoid putting the redis password in plain text in processlist
#5872 commented on Jun 16, 2025 • 0 new comments
Handling `use_pickle=True` with pickle5 serializer and performance regression
#5856 commented on Jun 16, 2025 • 0 new comments
Install ray with conda but not pip
#5511 commented on Jun 16, 2025 • 0 new comments
[tune] saving mechanism and PBT
#5312 commented on Jun 16, 2025 • 0 new comments
Feature request: An API to wait until there are X resources available
#5243 commented on Jun 16, 2025 • 0 new comments
[Feature request] Also expose python function after decorating with ray.remote
#4981 commented on Jun 16, 2025 • 0 new comments
Creative action space support: contains method, action interpolation.
#4837 commented on Jun 16, 2025 • 0 new comments
__module__ can be None
#4758 commented on Jun 16, 2025 • 0 new comments
[autoscaler] Autoscaler UX Issues
#4656 commented on Jun 16, 2025 • 0 new comments
[autoscaler] Add tests that mock endpoints for AWS, GCE
#4303 commented on Jun 16, 2025 • 0 new comments
[metrics] Support filtering logs streamed to driver by actor/task
#12305 commented on Jun 16, 2025 • 0 new comments
[serve] Support more expressive policies for choosing replicas
#12296 commented on Jun 16, 2025 • 0 new comments
[Tune] [PBT] Automatic experiment restart for synch=True
#12122 commented on Jun 16, 2025 • 0 new comments
[tune] [wandb] Experiment checkpointing fails with `WandbTrainableMixin`
#11917 commented on Jun 16, 2025 • 0 new comments
[tune] quniform distribution
#11879 commented on Jun 16, 2025 • 0 new comments
[docs] improve tune distributed tuning guide
#11681 commented on Jun 16, 2025 • 0 new comments
[tune] doc should indicate print output
#11679 commented on Jun 16, 2025 • 0 new comments
[cli] attach `--tmux` should show parallel command output
#11678 commented on Jun 16, 2025 • 0 new comments
[tune] Client API improvements
#11676 commented on Jun 16, 2025 • 0 new comments
[cloudpickle] Too much override for cloudpickle, breaks scikit-learn usage
#11547 commented on Jun 16, 2025 • 0 new comments
[docs] search results don't link to correct tab
#11288 commented on Jun 16, 2025 • 0 new comments
[Autoscaler] Prioritize infeasible bundles and placement group rescheduling
#11259 commented on Jun 16, 2025 • 0 new comments
Remove the `remove_after_get` flag
#10977 commented on Jun 16, 2025 • 0 new comments
[placement groups] Feasibility Check
#10913 commented on Jun 16, 2025 • 0 new comments
[autoscaler] Add unit tests for sdk.py
#10903 commented on Jun 16, 2025 • 0 new comments
Cannot call remote instance method of a superclass from within a different instance method of the superclass
#10899 commented on Jun 16, 2025 • 0 new comments
Add testing to `commands.py`/`NodeUpdaterThread` level
#10846 commented on Jun 16, 2025 • 0 new comments
Treat CPUs as abstract resources
#10818 commented on Jun 16, 2025 • 0 new comments
Installing ray on powerpc
#10774 commented on Jun 16, 2025 • 0 new comments
[core] [docs] use-cases for Ray's async support
#10688 commented on Jun 16, 2025 • 0 new comments
Exceptions and ResourceWarnings on ray.init (Jupyter+offline)
#10279 commented on Jun 16, 2025 • 0 new comments
Can CPU resource scheduling be scheduled through Cgroup?
#10037 commented on Jun 16, 2025 • 0 new comments
Pre-push hooks allow code to be pushed that fails LINT
#14367 commented on Jun 16, 2025 • 0 new comments
__del__ magic method can't access class properties
#14285 commented on Jun 16, 2025 • 0 new comments
Failed to load actor due to dependencies not being pickled
#14284 commented on Jun 16, 2025 • 0 new comments
optimization: Client blocks on releasing references due to detached actor race condition
#14137 commented on Jun 16, 2025 • 0 new comments
[autoscaler] request resources doesn't work with multiple jobs
#13534 commented on Jun 16, 2025 • 0 new comments
[Metrics] Custom metrics don't work after calling `ray.shutdown()` followed by `ray.init()`
#13532 commented on Jun 16, 2025 • 0 new comments
Unify linting of clang-format and *.proto files
#13465 commented on Jun 16, 2025 • 0 new comments
Hang or Deadlock when calling ray.get() inside pytorch Dataset when DataLoader with num_workers >0
#13407 commented on Jun 16, 2025 • 0 new comments
[core] Unwanted pickling behaviour when starting remote actor with @property
#13365 commented on Jun 16, 2025 • 0 new comments
Explore Protos as the Ray Client pickle transport (instead of namedtuples)
#13280 commented on Jun 16, 2025 • 0 new comments
SIGKILL generates core dumps on some systems
#13221 commented on Jun 16, 2025 • 0 new comments
Object store thrashing if it runs ray.get in a non-main thread.
#12906 commented on Jun 16, 2025 • 0 new comments
Canonicalize the python lint options
#12801 commented on Jun 16, 2025 • 0 new comments
[autoscaler] refactor duplicate code for handling request_resources().
#12699 commented on Jun 16, 2025 • 0 new comments
[Dashboard]Profile Actor Button Not Working
#12668 commented on Jun 16, 2025 • 0 new comments
[core] bytearray is parsed as bytes in remote function
#12648 commented on Jun 16, 2025 • 0 new comments
ValueError: After taking into account object store and redis memory usage, the amount of memory on this node available for tasks and actors (-0.01 GB) is less than 0% of total. You can adjust these settings with ray.init(memory=<bytes>, object store memory=<bytes>
#12561 commented on Jun 16, 2025 • 0 new comments
Ray grinds to a halt if both PyTorch and TensorFlow are installed
#12467 commented on Jun 16, 2025 • 0 new comments
Ray does not handle MIG devices
#12413 commented on Jun 16, 2025 • 0 new comments
[tune] progress reporter should limit table to 80char
#12374 commented on Jun 16, 2025 • 0 new comments
[serve] Distributed Tracing Support in Serve
#12320 commented on Jun 16, 2025 • 0 new comments
[metrics] Replace ray timeline with distributed tracing
#12315 commented on Jun 16, 2025 • 0 new comments
Windows debugging on gdb does not work
#9827 commented on Jun 16, 2025 • 0 new comments
[tune] unify run() and run_experiments()
#8127 commented on Jun 16, 2025 • 0 new comments
[ui] More metadata for the task timeline
#8050 commented on Jun 16, 2025 • 0 new comments
[tune] Support for config to (optionally) be an argparse.Namespace?
#8006 commented on Jun 16, 2025 • 0 new comments
[tune] Resource Allocation UX
#7968 commented on Jun 16, 2025 • 0 new comments
`pandas has no attribute 'compat'` Deserialization bug when running tasks very rarely
#7879 commented on Jun 16, 2025 • 0 new comments
"Lost reference to actor" when returning actor handle from actor
#7815 commented on Jun 16, 2025 • 0 new comments
Ray has both ray.util and ray.utils, which is confusing.
#7787 commented on Jun 16, 2025 • 0 new comments
Provide more scheduling algorithms for actors/tasks
#7723 commented on Jun 16, 2025 • 0 new comments
[ray] Object store shared memory numpy leak in worker loop
#7653 commented on Jun 16, 2025 • 0 new comments
Ray processes on slave node become defunct when the head node is restarted/stopped
#7651 commented on Jun 16, 2025 • 0 new comments
Relax python version match requirement when joining a cluster
#7648 commented on Jun 16, 2025 • 0 new comments
Can Ray workers share the same tf.sess?
#7646 commented on Jun 16, 2025 • 0 new comments
About model configuration.
#7644 commented on Jun 16, 2025 • 0 new comments
Probable race condition
#7617 commented on Jun 16, 2025 • 0 new comments
Recursion with pickling in ray.init with py3.5
#7605 commented on Jun 16, 2025 • 0 new comments
Is it possible to create process inside ray Actor?
#7578 commented on Jun 16, 2025 • 0 new comments
Why seems getting from local object store not faster than getting from remote object store?
#7575 commented on Jun 16, 2025 • 0 new comments
[util.multiprocessing] Unable to pass Queue to pool.apply_async
#7561 commented on Jun 16, 2025 • 0 new comments
Keyword arguments should be keyword only arguments in the Ray API
#7548 commented on Jun 16, 2025 • 0 new comments
[Pool] About using ray.util.multiprocessing import Pool
#7542 commented on Jun 16, 2025 • 0 new comments
Reporting Reward Breakdowns
#7518 commented on Jun 16, 2025 • 0 new comments
[config] Introduce a configuration library for unified configuration code
#7485 commented on Jun 16, 2025 • 0 new comments
[util.multiprocessing] Support generators
#9712 commented on Jun 16, 2025 • 0 new comments
[Core] A ray.remote flag for nested object ID gathering in task arguments.
#9489 commented on Jun 16, 2025 • 0 new comments
[docs] ray up <config.xml> --help does not show help
#9455 commented on Jun 16, 2025 • 0 new comments
[docs] Document how to use conda environments with the autoscaler
#9199 commented on Jun 16, 2025 • 0 new comments
[ray] Visualize Ray dashboard locally/offline
#9095 commented on Jun 16, 2025 • 0 new comments
Confusing RedisError when many threads are used
#9083 commented on Jun 16, 2025 • 0 new comments
[autoscaler] Check failed: _s.ok() Heartbeat failed: NotImplemented
#8883 commented on Jun 16, 2025 • 0 new comments
Can't parallelize non-picklable function with initializer in Pool
#8876 commented on Jun 16, 2025 • 0 new comments
DQN Minibatch Option
#8870 commented on Jun 16, 2025 • 0 new comments
tune: module 'tensorflow' has no attribute __version__ in Ray Trainable since v0.7.7
#8729 commented on Jun 16, 2025 • 0 new comments
Blank redis-password gives wrong message to add node
#8629 commented on Jun 16, 2025 • 0 new comments
absl.logging inside remote tasks does not get printed
#8625 commented on Jun 16, 2025 • 0 new comments
Ability to select a disk for ray workers
#8607 commented on Jun 16, 2025 • 0 new comments
Invalid iterator dereference in TestReconstructionChain (fails in debug mode)
#8587 commented on Jun 16, 2025 • 0 new comments
Can't pickle CudnnModule objects
#8569 commented on Jun 16, 2025 • 0 new comments
Reducing unnecessary process overhead in practice
#8522 commented on Jun 16, 2025 • 0 new comments
[tune]Error in BOHB perhaps caused by different trainable instances running in the same Trial ???
#8455 commented on Jun 16, 2025 • 0 new comments
incompatible with 'msgpack_numpy.patch()' function
#8409 commented on Jun 16, 2025 • 0 new comments
Error connecting to Redis server at 127.0.0.1:35709
#8389 commented on Jun 16, 2025 • 0 new comments
Error while shutting down Ray
#8385 commented on Jun 16, 2025 • 0 new comments
[ray] Pyarmor compatibility
#8365 commented on Jun 16, 2025 • 0 new comments
[ray] Can RAY pause and continue tasks distributed to the cluster's nodes?
#8263 commented on Jun 16, 2025 • 0 new comments
[Bug] Potential deadlock in task scheduling algorithm for placement group resources.
#20051 commented on Jun 16, 2025 • 0 new comments
[Object Spilling] Plasma store probably doesn't respect the max shm size.
#14145 commented on Jun 16, 2025 • 0 new comments
Latent bugs in command_runner.py
#14139 commented on Jun 16, 2025 • 0 new comments
[rllib] undocumented behavior of timers/* in progress.csv
#14052 commented on Jun 16, 2025 • 0 new comments
Graceful Placement Group Removal
#14045 commented on Jun 16, 2025 • 0 new comments
Improve Docker manual setup document
#14030 commented on Jun 16, 2025 • 0 new comments
[UX] Allow passing CPU and GPU to actor and task resources.
#13996 commented on Jun 16, 2025 • 0 new comments
Remove cluster_synced_files and file_mounts_sync_continuously
#13967 commented on Jun 16, 2025 • 0 new comments
[Object Spilling] Allow to specify max_disk_usage for file system spilling.
#13960 commented on Jun 16, 2025 • 0 new comments
[Dashboard] add actor detail to experimental dashboard
#13875 commented on Jun 16, 2025 • 0 new comments
ray.put() slows down over time.
#13612 commented on Jun 16, 2025 • 0 new comments
[rllib]Action masking with tuple action space
#13592 commented on Jun 16, 2025 • 0 new comments
[dask-on-ray] Remove internal Dask API dependencies from the Dask-on-Ray scheduler.
#13560 commented on Jun 16, 2025 • 0 new comments
[core] GCS doesn't always cancel worker leases for killed actors
#13545 commented on Jun 16, 2025 • 0 new comments
test_autoscaling_policy.py prints out huge pile of JsonErrors
#13433 commented on Jun 16, 2025 • 0 new comments
Remove the RAY_CLIENT_MODE flag now that we don't need it
#13279 commented on Jun 16, 2025 • 0 new comments
[Core] Make CoreWorker more unit-testable
#13268 commented on Jun 16, 2025 • 0 new comments
Test S3 object spilling on multiple nodes with big data (streaming shuffle)
#13222 commented on Jun 16, 2025 • 0 new comments
[core] RAY_HOME path is hardcoded
#13168 commented on Jun 16, 2025 • 0 new comments
[Plasma Store]PlasmaClient::Get() return Status::OK() when timeout
#12995 commented on Jun 16, 2025 • 0 new comments
Add dashboard to bazel target to avoid running manual build commands
#12956 commented on Jun 16, 2025 • 0 new comments
Improve dashboard not found exception
#12955 commented on Jun 16, 2025 • 0 new comments
Cannot save training episodes: "TypeError: Object of type ndarray is not JSON serializable"
#12951 commented on Jun 16, 2025 • 0 new comments
num_cpus not handled correctly when function has a Queue argument
#14863 commented on Jun 16, 2025 • 0 new comments
Make rolling update batch size configurable
#14853 commented on Jun 16, 2025 • 0 new comments
Typed handle to deployments
#14810 commented on Jun 16, 2025 • 0 new comments
[Core] Docs - run data processing examples in CI
#14769 commented on Jun 16, 2025 • 0 new comments
[core] The remote function has been exported 100 times..
#14730 commented on Jun 16, 2025 • 0 new comments
Support `ray status CLUSTER.YAML`
#14549 commented on Jun 16, 2025 • 0 new comments
Support decoupling task/actor interfaces from implementation
#14529 commented on Jun 16, 2025 • 0 new comments
Support specifying container images in runtime_env
#14528 commented on Jun 16, 2025 • 0 new comments
[autoscaler] Error message not being cleared when autoscaler recovers
#14494 commented on Jun 16, 2025 • 0 new comments
[Docs] [tune] WanDB + Ray Integration a bit unclear from the docs
#14478 commented on Jun 16, 2025 • 0 new comments
[tune] TBXLoggerCallback not creating necessary directory
#14437 commented on Jun 16, 2025 • 0 new comments
[autoscaler][interface] Per-node-type docker configs
#14418 commented on Jun 16, 2025 • 0 new comments
[metrics] Add metrics for debugging Dask-on-Ray
#14372 commented on Jun 16, 2025 • 0 new comments
[metrics] Report metrics to be used for debugging load balancing issues
#14369 commented on Jun 16, 2025 • 0 new comments
[metrics] Remove unused or unnecessary metrics.
#14366 commented on Jun 16, 2025 • 0 new comments
When the node is crashed, logs are not accessible.
#14307 commented on Jun 16, 2025 • 0 new comments
[autoscaler] SSH command errors aren't written to monitor.out
#14298 commented on Jun 16, 2025 • 0 new comments
[dashboard] Add resource usage/availability to the dashboard
#14292 commented on Jun 16, 2025 • 0 new comments
[Core] Fix ray::Status <--> gRPC status interplay.
#14278 commented on Jun 16, 2025 • 0 new comments
updating worker nodes show as healthy
#14232 commented on Jun 16, 2025 • 0 new comments
[Object Spilling] Use subdirectories to avoid large top level inodes for file spilling
#14166 commented on Jun 16, 2025 • 0 new comments
[tune] Stack Traces with Function API are really hard to parse
#14162 commented on Jun 16, 2025 • 0 new comments
[Object Spilling] Improve Read throughput
#12950 commented on Jun 16, 2025 • 0 new comments
[tune] Restarted Trials Use Incorrect Command When Multiple Commands Run on Cluster/Runtime
#12048 commented on Jun 16, 2025 • 0 new comments
[Object spilling] Move LocalObjectManager into the plasma store
#12042 commented on Jun 16, 2025 • 0 new comments
[Object spilling] Improve OutOfMemory handling through better memory bookkeeping in plasma store
#12040 commented on Jun 16, 2025 • 0 new comments
[Object Spilling] Use compression to reduce IO cost.
#11992 commented on Jun 16, 2025 • 0 new comments
[Tune] Add more custom Error Types
#11871 commented on Jun 16, 2025 • 0 new comments
Tune report histograms
#11797 commented on Jun 16, 2025 • 0 new comments
[Feature Request] [Tune] Add a special 'evaluation-step' flag to avoid unnecessary lengthy evaluations
#11725 commented on Jun 16, 2025 • 0 new comments
Unable to create ActorHandle for already created inherited classes object list [java][ray]
#11715 commented on Jun 16, 2025 • 0 new comments
Socket connections from GCS stuck in TIME_WAIT after actor death
#11713 commented on Jun 16, 2025 • 0 new comments
[docs] tutorial for autoscaling (really basic version)
#11680 commented on Jun 16, 2025 • 0 new comments
[flaky] test_multi_node/2 is flaky
#11663 commented on Jun 16, 2025 • 0 new comments
[flaky] test_object_manager is flaky
#11661 commented on Jun 16, 2025 • 0 new comments
[Core] Reduce the Redis connection per worker.
#11655 commented on Jun 16, 2025 • 0 new comments
[flaky] gcs_server test is flaky
#11640 commented on Jun 16, 2025 • 0 new comments
Use Pathlib instead of strings in Autoscaler
#11633 commented on Jun 16, 2025 • 0 new comments
[tune] PopulationBasedTraining and Tensorboard HPARAMS
#11612 commented on Jun 16, 2025 • 0 new comments
AWS Security group rule issue
#11601 commented on Jun 16, 2025 • 0 new comments
[dask] Parquet write fails if directory does not exist in advance
#11566 commented on Jun 16, 2025 • 0 new comments
[dask] Object store fills up too quickly in simple processing script
#11565 commented on Jun 16, 2025 • 0 new comments
[dask/tune] Provide an example of using Dask on Ray with Tune
#11564 commented on Jun 16, 2025 • 0 new comments
[tune] tutorial should indicate specific library version that we've tested against.
#11540 commented on Jun 16, 2025 • 0 new comments
[Core] Raylet can schedule tasks from a dead driver.
#11520 commented on Jun 16, 2025 • 0 new comments
Startup log use autoscaler_log.out / err instead of monitor.log
#12884 commented on Jun 16, 2025 • 0 new comments
[New scheduler] Don't assume 1-CPU tasks are feasible
#12870 commented on Jun 16, 2025 • 0 new comments
Turn on Test_reference_counting
#12849 commented on Jun 16, 2025 • 0 new comments
[Core] Locality-aware leasing: Milestone 3 - Spillback
#12815 commented on Jun 16, 2025 • 0 new comments
[Autoscaler] Refactor bin packing routines in autoscaler for code clarity
#12723 commented on Jun 16, 2025 • 0 new comments
[Core] Ray.get(timeout=0) doesn't work
#12680 commented on Jun 16, 2025 • 0 new comments
[core] Is starvation possible for multi-driver on the same cluster?
#12667 commented on Jun 16, 2025 • 0 new comments
GCS server ip error
#12639 commented on Jun 16, 2025 • 0 new comments
[core] Support detached/GCS owned objects
#12635 commented on Jun 16, 2025 • 0 new comments
[autoscaler] respect max_workers per node type when terminating nodes
#12634 commented on Jun 16, 2025 • 0 new comments
[Cluster launcher] Command runner logs are improperly quoted when logged
#12631 commented on Jun 16, 2025 • 0 new comments
permissions on rsync'd files are incorrect on worker nodes, results in inability to update workers
#12630 commented on Jun 16, 2025 • 0 new comments
[tune] Full experiment checkpointing doesn't work with PBT
#12558 commented on Jun 16, 2025 • 0 new comments
New workers are started slowly on a node if running workers >= `num_cpus`
#12525 commented on Jun 16, 2025 • 0 new comments
[tune] get_checkpoint_paths fails due to glob command for .tune_metadata file
#12453 commented on Jun 16, 2025 • 0 new comments
[New scheduler] Implement dynamic resources
#12433 commented on Jun 16, 2025 • 0 new comments
[metrics] Investigate tracing visualization tools
#12314 commented on Jun 16, 2025 • 0 new comments
[metrics] Utility to easily configure logging for a Ray job/actor/task
#12306 commented on Jun 16, 2025 • 0 new comments
`ray dashboard` throws bad exception
#12246 commented on Jun 16, 2025 • 0 new comments
[Object Spilling] Tune S3 performance + Add unit tests with moto3
#12232 commented on Jun 16, 2025 • 0 new comments
Duplicated IDs are generated
#12197 commented on Jun 16, 2025 • 0 new comments
[tune/logging] Warning for Tune
#12140 commented on Jun 16, 2025 • 0 new comments
[C++ API] Support cross-lang API with Python/Java
#18149 commented on Jun 16, 2025 • 0 new comments
[helm][kubernetes][test] Add formatting tests for Helm chart
#18125 commented on Jun 16, 2025 • 0 new comments
[workflows] Better message when not init'ed
#18121 commented on Jun 16, 2025 • 0 new comments
resource config is not respected in head_start_ray_commands in cluster.yaml
#18097 commented on Jun 16, 2025 • 0 new comments
[Dask-on-Ray] Propagate Dask-on-Ray scheduler config to (rest of) cluster
#17943 commented on Jun 16, 2025 • 0 new comments
[core] PlacementGroup should be no op for local_mode=True
#17937 commented on Jun 16, 2025 • 0 new comments
Enhance document on Java API
#17820 commented on Jun 16, 2025 • 0 new comments
[Object Spilling] Remove the spilled directory upon Sigterm for ray start
#17790 commented on Jun 16, 2025 • 0 new comments
[C++ API] Support non-global named actor
#17734 commented on Jun 16, 2025 • 0 new comments
Cleanup stats/metrics.h
#17679 commented on Jun 16, 2025 • 0 new comments
workflow cli to manage all jobs
#17672 commented on Jun 16, 2025 • 0 new comments
[docs] Tutorial on Pytorch Lightning needs rearranging
#17611 commented on Jun 16, 2025 • 0 new comments
[Serve] Helper functions that are written below the actor class don't work
#17590 commented on Jun 16, 2025 • 0 new comments
Fix circular dependence in workflow's code
#17445 commented on Jun 16, 2025 • 0 new comments
[lineage] Support lineage reconstruction for borrowed ObjectRefs
#17380 commented on Jun 16, 2025 • 0 new comments
Errors during scaling cluster
#17292 commented on Jun 16, 2025 • 0 new comments
Trial is being repeated with the exact same results
#17257 commented on Jun 16, 2025 • 0 new comments
[RFC][Placement groups] Allow tasks to acquire resources in addition to placement group bundle
#17229 commented on Jun 16, 2025 • 0 new comments
runtime env in workflow
#16992 commented on Jun 16, 2025 • 0 new comments
[autoscaler][core] Safe node termination
#16975 commented on Jun 16, 2025 • 0 new comments
[Ray Client] [Usability] Help users spot bandwidth bounded workload
#16966 commented on Jun 16, 2025 • 0 new comments
[docker][Clusters][autoscaler][local] Can't connect to cluster when using docker with ray cluster launcher
#16961 commented on Jun 16, 2025 • 0 new comments
[Feature] rllib + tune metric logging selection
#19816 commented on Jun 16, 2025 • 0 new comments
[RLlib] [documentation] clarify postprocess_fn usage in our doc
#19648 commented on Jun 16, 2025 • 0 new comments
[Feature] [runtime env] Clean up the command arguments in raylet args
#19448 commented on Jun 16, 2025 • 0 new comments
[client] better error message when failing to connect with client
#19371 commented on Jun 16, 2025 • 0 new comments
[SGD] Document best practices for Pipeline epochs
#19323 commented on Jun 16, 2025 • 0 new comments
[workflow] scan_prefix with pages/as generator
#19234 commented on Jun 16, 2025 • 0 new comments
[Core][usability] Improve Ray cluster start up time
#19215 commented on Jun 16, 2025 • 0 new comments
[Serve] Don't use `ray.wait()` to drain tracking refs in handle
#19158 commented on Jun 16, 2025 • 0 new comments
Unify internal configs & common datastructures
#19152 commented on Jun 16, 2025 • 0 new comments
Clean up EndpointState
#19148 commented on Jun 16, 2025 • 0 new comments
[Core][Feature] use clang-tidy/format to block usage of std::getenv
#18894 commented on Jun 16, 2025 • 0 new comments
[Feature][workflow] Namespace for workflow
#18818 commented on Jun 16, 2025 • 0 new comments
[Feature][workflow] Resource limit for workflow job
#18780 commented on Jun 16, 2025 • 0 new comments
[Bug] Exception in task leads to truncated error message
#18699 commented on Jun 16, 2025 • 0 new comments
[Bug] Logging config is not propagated to driver
#18660 commented on Jun 16, 2025 • 0 new comments
Enable copy/paste to get correct command for connecting to Ray client
#18513 commented on Jun 16, 2025 • 0 new comments
Ray client suppresses error messages
#18512 commented on Jun 16, 2025 • 0 new comments
[serve] Feature request: timeout or max_retries to limit the time spent waiting for a deployment to complete
#18432 commented on Jun 16, 2025 • 0 new comments
Add workflow.current_step_uuid() function
#18356 commented on Jun 16, 2025 • 0 new comments
[Shuffle] non streaming shuffle 5000 partitions seem to reach the scalability limit
#18333 commented on Jun 16, 2025 • 0 new comments
[tune] atari-impala-large.yaml does not finish gracefully
#18325 commented on Jun 16, 2025 • 0 new comments
[runtime env] eagerly install for task/actor level
#18160 commented on Jun 16, 2025 • 0 new comments
[cli] Support redis password for all ray commands
#16921 commented on Jun 16, 2025 • 0 new comments
[client][core] Have Unified `register_serializer` interface
#15486 commented on Jun 16, 2025 • 0 new comments
[Job submission] Monitor driver
#15480 commented on Jun 16, 2025 • 0 new comments
[Job submission] Java support
#15479 commented on Jun 16, 2025 • 0 new comments
[Job submission] Basic drop job feature
#15478 commented on Jun 16, 2025 • 0 new comments
Ray memory size and object store size not correct on k8s
#15463 commented on Jun 16, 2025 • 0 new comments
Ray status not reported correctly after node crashed
#15459 commented on Jun 16, 2025 • 0 new comments
Async actor method hang
#15437 commented on Jun 16, 2025 • 0 new comments
[client] python packages version mismatch fail silently
#15407 commented on Jun 16, 2025 • 0 new comments
[cluster] Make node_ip_address work throughout
#15239 commented on Jun 16, 2025 • 0 new comments
[autoscaler][docs] Explain how the `ray_bootstrap_config` is generated
#15232 commented on Jun 16, 2025 • 0 new comments
[autoscaler] Don't autofill `setup_commands` if head/worker `setup_commands` are used
#15231 commented on Jun 16, 2025 • 0 new comments
[Core] Add gRPC streaming support.
#15219 commented on Jun 16, 2025 • 0 new comments
Optimise for num_workers gets stuck in an infinite loop
#15168 commented on Jun 16, 2025 • 0 new comments
Ray dies without a proper error message - "Killed", might have to do with pandas
#15165 commented on Jun 16, 2025 • 0 new comments
[autoscaler] Simplify Custom ObjectStore Size
#15147 commented on Jun 16, 2025 • 0 new comments
[Core] Periodical runner can cause heap-use-after-free
#15141 commented on Jun 16, 2025 • 0 new comments
Metric tag keys type inference (Tuple To String)
#15130 commented on Jun 16, 2025 • 0 new comments
Actor task hangs after actor crashes with max_task_retries=0
#15045 commented on Jun 16, 2025 • 0 new comments
AlphaZero torch model doesn't support cuda, only cpu
#14970 commented on Jun 16, 2025 • 0 new comments
[Autoscaler] AWS setup commands hardcodes pip
#14963 commented on Jun 16, 2025 • 0 new comments
Support "dry runs" for deploy() operations
#14936 commented on Jun 16, 2025 • 0 new comments
[Object Spilling] Failing objects that fail to restore many times.
#14921 commented on Jun 16, 2025 • 0 new comments
[Core] [runtime env] Use portable hash function for runtime_env_hash
#16821 commented on Jun 16, 2025 • 0 new comments
[runtime env] Support rescheduling tasks when runtime env creation failed.
#16800 commented on Jun 16, 2025 • 0 new comments
Priority scheduling of jobs
#16782 commented on Jun 16, 2025 • 0 new comments
[C++ API] Completed object reference counting support
#16702 commented on Jun 16, 2025 • 0 new comments
[Core] Programmatic way to access pending tasks for an actor?
#16641 commented on Jun 16, 2025 • 0 new comments
[Core] Erroneous check for size_t underflow
#16626 commented on Jun 16, 2025 • 0 new comments
[Core] Standardize Timestamps across codebase
#16510 commented on Jun 16, 2025 • 0 new comments
[test][MLDataset] Fix test_from_modin
#16357 commented on Jun 16, 2025 • 0 new comments
Example for tuning layer count, dropout probabilities with Transformers
#16340 commented on Jun 16, 2025 • 0 new comments
Ray started in local mode doesn't restore environment variables after shutdown
#16132 commented on Jun 16, 2025 • 0 new comments
[core] ray.remote hides the docstring of the decorated class
#15877 commented on Jun 16, 2025 • 0 new comments
[autoscaler] support rsync option `--include`
#15859 commented on Jun 16, 2025 • 0 new comments
[Placement Group] The bundle_reservation_check_func breaks load code from local
#15840 commented on Jun 16, 2025 • 0 new comments
Contributor docs don't mention running tests via bazel
#15833 commented on Jun 16, 2025 • 0 new comments
[docs] should actor methods always have num_returns value?
#15818 commented on Jun 16, 2025 • 0 new comments
[rfc] Support `ray[aws,gcp,azure]` as an install target
#15725 commented on Jun 16, 2025 • 0 new comments
[rllib] Error while using "count_steps_by": "agent_steps" and misleading documentation
#15708 commented on Jun 16, 2025 • 0 new comments
Ray duplicates data from GPU to CPU when placing an actor on GPU
#15692 commented on Jun 16, 2025 • 0 new comments
[kubernetes] ModuleNotFoundError when executing a task on a remote cluster
#15668 commented on Jun 16, 2025 • 0 new comments
[cross_language] Support Python dictionaries
#15569 commented on Jun 16, 2025 • 0 new comments
[core] detached actor logs are not streamed to successive clients
#15549 commented on Jun 16, 2025 • 0 new comments
Serve Deployment with Reload Option
#15505 commented on Jun 16, 2025 • 0 new comments
Building an executable using Ray and Cx_freeze
#42101 commented on Jun 16, 2025 • 0 new comments
RichProgressBar in PyTorch Lightning only show progress at the very end
#42091 commented on Jun 16, 2025 • 0 new comments
[<Ray component: Core|RLlib|etc...>] Channel errore
#42089 commented on Jun 16, 2025 • 0 new comments
[RLlib] When setting `config.environment(normalize_actions=False)` and using CQL, it raises an error: `AttributeError: 'TorchDiagGaussian' object has no attribute 'sample_logp'`.
#42064 commented on Jun 16, 2025 • 0 new comments
[Workflow] get_metadata() returns RUNNING instead of RESUMABLE status
#41980 commented on Jun 16, 2025 • 0 new comments
Ray IDs vs endianness?
#41961 commented on Jun 16, 2025 • 0 new comments
"RaySystemError: System error: Unknown error"
#41786 commented on Jun 16, 2025 • 0 new comments
[RLlib] Value error while running DQN
#41559 commented on Jun 16, 2025 • 0 new comments
Is there an error in the interface name " cdef cppclass CFunctionDescriptorInterface "ray::CFunctionDescriptorInterface "?
#41553 commented on Jun 16, 2025 • 0 new comments
Saving XGBoost model with json extension
#41374 commented on Jun 16, 2025 • 0 new comments
[data] date32 and datetime64 handling should be the same
#41358 commented on Jun 16, 2025 • 0 new comments
[RLlib] User guides are not ordered
#41340 commented on Jun 16, 2025 • 0 new comments
error installing library
#41223 commented on Jun 16, 2025 • 0 new comments
[core][state][dashboard] Better tasks info GC control at GCS
#41142 commented on Jun 16, 2025 • 0 new comments
[RLLib] External simulator: mean episode reward is NaN due to done not set
#40954 commented on Jun 16, 2025 • 0 new comments
[Core] - Cannot install in tiny core linux
#40832 commented on Jun 16, 2025 • 0 new comments
[Tune|RLlib] Add error-tolerant version of PB2
#40787 commented on Jun 16, 2025 • 0 new comments
NCCL Proxy Call to rank 1 failed - on Cloud VM Docker setup for huggingface distributed ray train script
#40758 commented on Jun 16, 2025 • 0 new comments
[Ray Train] - Add Options to Save Last checkpoint in Ray Train Checkpointing Config
#40503 commented on Jun 16, 2025 • 0 new comments
ray.init() can sometimes hang with a limited range specified for --worker-port-list
#40497 commented on Jun 16, 2025 • 0 new comments
[Core] Dead session not closed
#40482 commented on Jun 16, 2025 • 0 new comments
[RLlib][MBMPO] The algorithm does not learn as intended.
#40400 commented on Jun 16, 2025 • 0 new comments
[Data] Add `delete_dir_contents` parameter to `FileDatasink`
#44794 commented on Jun 16, 2025 • 0 new comments
[RLlib] PPO and framework=tf / issue with latest tensorflow 2.16.1
#44675 commented on Jun 16, 2025 • 0 new comments
[RLlib] PPO reset_config() AttributeError: 'dict' object has no attribute '_enable_new_api_stack'
#44506 commented on Jun 16, 2025 • 0 new comments
[RLlib] ReplayBuffer doesn't work with zero_init_states False when store rnn sequence
#44383 commented on Jun 16, 2025 • 0 new comments
[Cluster, YARN with Skein] Ray cluster keeps crashing when running on YARN via Skein
#44112 commented on Jun 16, 2025 • 0 new comments
[Ray Core] Ray nightly GPU docker image broken on NVIDIA V100 GPUs on AWS
#43565 commented on Jun 16, 2025 • 0 new comments
Using RNN for RL
#43420 commented on Jun 16, 2025 • 0 new comments
from ray.rllib.agents.registry import get_trainer_class ModuleNotFoundError: No module named 'ray.rllib.agents'
#43310 commented on Jun 16, 2025 • 0 new comments
WARNING deprecation.py:50 -- DeprecationWarning: `ray.rllib.execution.train_ops.multi_gpu_train_one_step` has been deprecated. This will raise an error in the future!
#43250 commented on Jun 16, 2025 • 0 new comments
[RLlib] Build_for_inference() in env_runner_v2.py created empty state_out_1 and lead to failure of initiation
#42978 commented on Jun 16, 2025 • 0 new comments
Core: ray.remote raises ValueError when used on torch IterableDataset
#42914 commented on Jun 16, 2025 • 0 new comments
Core: Join zombie subprocesses after task completion
#42913 commented on Jun 16, 2025 • 0 new comments
[Core] SIGSEGV when running Ray
#42868 commented on Jun 16, 2025 • 0 new comments
[Core] Serialisation does not work with classes with `__init_subclass__`
#42823 commented on Jun 16, 2025 • 0 new comments
Problem with YOLOv8 Hyperparameters tuning
#42770 commented on Jun 16, 2025 • 0 new comments
[RLLIB] Passing configuration to Custom Environment in rllib is giving an error
#42753 commented on Jun 16, 2025 • 0 new comments
[RLlib] Algorithms ES, A3C are deprecated and replacement does not exist in python package
#42579 commented on Jun 16, 2025 • 0 new comments
[<Ray component: Core|RLlib|etc...>] Inite state of attention_net.py is empty
#42569 commented on Jun 16, 2025 • 0 new comments
[<Ray component: Core|RLlib|etc...>] KeyError with RNN
#42501 commented on Jun 16, 2025 • 0 new comments
[RLlib] gpu cannot enable
#42388 commented on Jun 16, 2025 • 0 new comments
[<Ray component: Core|RLlib|etc...>] reslink in model
#42333 commented on Jun 16, 2025 • 0 new comments
[RLlib] shape [] in Box action space not supported.
#42199 commented on Jun 16, 2025 • 0 new comments
[Tune] Support for new algorithm: Cost-Aware Pareto Region Bayesian Search (CARBS).
#40356 commented on Jun 16, 2025 • 0 new comments
latest ray microbenchmark fails
#38758 commented on Jun 16, 2025 • 0 new comments
Ray Memory Usage Keeps Increasing even after Manual Garbage Collection
#38730 commented on Jun 16, 2025 • 0 new comments
[docs] Document Tune/Train placement group
#38706 commented on Jun 16, 2025 • 0 new comments
ray/RLlib/offline/estimators
#38357 commented on Jun 16, 2025 • 0 new comments
[Core] gcs_server Failed accept4: Too many open files
#38248 commented on Jun 16, 2025 • 0 new comments
[Core] Segfault from fibers when using streaming/dynamic generator (only happening from test_streaming_generator_exception)
#38167 commented on Jun 16, 2025 • 0 new comments
[RLlib] RL module and PPO implementation
#38012 commented on Jun 16, 2025 • 0 new comments
[RLlib] ray 2.6 relies on tf.bool which does not exist in tensorflow 2.13
#37895 commented on Jun 16, 2025 • 0 new comments
[RLlib] Sampler takes first step before next batch is requested
#37893 commented on Jun 16, 2025 • 0 new comments
[Data] Ray 2.6 created a breaking change in the index of a Modin DataFrame
#37771 commented on Jun 16, 2025 • 0 new comments
[Ray-Java client] Call actor report 'No module named' with py script
#37600 commented on Jun 16, 2025 • 0 new comments
[Core] Ray cpp example, if not call ray::Shutdown when exit, will cause segment fault.
#37596 commented on Jun 16, 2025 • 0 new comments
RLLib: Training Rllib-DDPG with custom environment leads error in Inference.
#37242 commented on Jun 16, 2025 • 0 new comments
[<Ray component: autoscaler>] _load_kubernetes_defaults_config function is not yet made
#37033 commented on Jun 16, 2025 • 0 new comments
[Core] Activate Ray tracing cause error when calling actor method with decorator with wraps. (i propose the possible solution)
#36891 commented on Jun 16, 2025 • 0 new comments
[Core] No dependency on setuptools results in broken build
#36742 commented on Jun 16, 2025 • 0 new comments
[data] Report actual task time and object sizes in Dataset.stats()
#36671 commented on Jun 16, 2025 • 0 new comments
[CI][Docs] Example in Train FAQ is flakey
#36399 commented on Jun 16, 2025 • 0 new comments
Be consistent on whether or not you include a dot at the end of a bullet list element.
#36308 commented on Jun 16, 2025 • 0 new comments
[Core] ray.put and ray.get extremely slow with polars frames
#36068 commented on Jun 16, 2025 • 0 new comments
[Ray Core] There is a Exception error message bug which convert byte array to String.
#35880 commented on Jun 16, 2025 • 0 new comments
System error: Ray has not been started yet. You can start Ray with 'ray.init()'
#35592 commented on Jun 16, 2025 • 0 new comments
[Workflow] Incorrectly set max_calls in options
#40252 commented on Jun 16, 2025 • 0 new comments
[PPOConfig] Utilising new API/models without matching documentation
#40201 commented on Jun 16, 2025 • 0 new comments
[Rllib] Tune locks up when attempting to create an rllib algorithm in a trainable
#40015 commented on Jun 16, 2025 • 0 new comments
[Tune/Air] Memory Leak when using WandbLoggerCallback with Population Based Tuning
#40014 commented on Jun 16, 2025 • 0 new comments
[RLlib] TD3/DDPG doesn't seem to respect action space bounds (at least initially)?
#40002 commented on Jun 16, 2025 • 0 new comments
[RLLIB] Issue with AlphaZero algorithm Stateless CartPole
#39937 commented on Jun 16, 2025 • 0 new comments
[RLLIB] Error in executing StatelessCartPole environment with AlphaZero
#39862 commented on Jun 16, 2025 • 0 new comments
Allow train_loop_config to be a dataclass / pydantic model
#39824 commented on Jun 16, 2025 • 0 new comments
[Core] ResolutionImpossible - Test requirements appear to not fit versions
#39782 commented on Jun 16, 2025 • 0 new comments
Job history is lost when Ray cluster is restarted (via kuberay)
#39764 commented on Jun 16, 2025 • 0 new comments
Ray::Tune::Logger::Tensorboardx
#39741 commented on Jun 16, 2025 • 0 new comments
[Core] Upgrading grpc to 1.57.0 causes perf regressions
#39679 commented on Jun 16, 2025 • 0 new comments
ray failed to register worker when I used vllm
#39618 commented on Jun 16, 2025 • 0 new comments
[rllib] Action space MultiDiscrete([11 5 1 2]) is not supported for DQN
#39571 commented on Jun 16, 2025 • 0 new comments
[RLlib] Support JAX-(numpy)-based envs.
#39528 commented on Jun 16, 2025 • 0 new comments
[RLlib] Ray RLLib Dependencies Version Information
#39405 commented on Jun 16, 2025 • 0 new comments
[RLlib] dreamerv3 causes debug code to be executed when running tune
#39302 commented on Jun 16, 2025 • 0 new comments
[Core] CPP Interface crashes on Ray.Init()
#39252 commented on Jun 16, 2025 • 0 new comments
ValueError: Must set agent_id on policy config
#39246 commented on Jun 16, 2025 • 0 new comments
[Core] Actor retry count is consumed because the task is retried when actor is still alive.
#39110 commented on Jun 16, 2025 • 0 new comments
[Core] Memory Leak
#38877 commented on Jun 16, 2025 • 0 new comments
[Tune] Leaky core concepts in Ray Tune documentation
#38781 commented on Jun 16, 2025 • 0 new comments
[GCS]Support different backend for GCS instead of Redis
#10356 commented on Jun 16, 2025 • 0 new comments
[metrics] Better way of grouping metric definitions
#10341 commented on Jun 16, 2025 • 0 new comments
[Core] WorkerThreadContext semantics are incorrect for async Python actors.
#10324 commented on Jun 16, 2025 • 0 new comments
[ray] Programmatically expose the amount of memory available in the object store
#10278 commented on Jun 16, 2025 • 0 new comments
[tune] Improve the serialization diagnoser by providing deeper introspection
#10263 commented on Jun 16, 2025 • 0 new comments
[tune] Usability issues
#10248 commented on Jun 16, 2025 • 0 new comments
[ray] Support mypy
#10244 commented on Jun 16, 2025 • 0 new comments
Removed the following hyperparameter values when logging to tensorboard: ... [tune]
#10166 commented on Jun 16, 2025 • 0 new comments
[dask-on-ray] ValueError on read-only memory
#10124 commented on Jun 16, 2025 • 0 new comments
[cli/docs] Provide example commands in the CLI docstrings.
#10079 commented on Jun 16, 2025 • 0 new comments
[Placement Group] Placement group dashboard
#9775 commented on Jun 16, 2025 • 0 new comments
Ray issue with serializing pytorch objects only when running on 40+ cores
#9752 commented on Jun 16, 2025 • 0 new comments
Ray typing IDE code completion support
#9623 commented on Jun 16, 2025 • 0 new comments
[Cluster][Task Schedule] Remote function is not executing without any errors
#9598 commented on Jun 16, 2025 • 0 new comments
[core] RayConfig does not get set properly after multiple `ray.init` calls
#9545 commented on Jun 16, 2025 • 0 new comments
[New scheduler] Performance optimization
#9487 commented on Jun 16, 2025 • 0 new comments
[New scheduler] Release testing
#9486 commented on Jun 16, 2025 • 0 new comments
Specify network interface to use / RuntimeError: Redis has started but no raylets have registered yet.
#9456 commented on Jun 16, 2025 • 0 new comments
[autoscaler] Autoscaler prints (harmless) errors every 30 mins and then kills the workers in GCP cluster
#9368 commented on Jun 16, 2025 • 0 new comments
[Core] Core Worker Actor Handle GC.
#9342 commented on Jun 16, 2025 • 0 new comments
Graph related applications
#9324 commented on Jun 16, 2025 • 0 new comments
Options Support for Actor Methods
#9296 commented on Jun 16, 2025 • 0 new comments
`ray stop` should not kill all redis-server processes
#11513 commented on Jun 16, 2025 • 0 new comments
[core] Track the number of connection and use shared pool whenever possible for grpc clients.
#11445 commented on Jun 16, 2025 • 0 new comments
ray commandline tools raise exceptions if you forget the YAML config file
#11396 commented on Jun 16, 2025 • 0 new comments
[Autoscaler] Placement group rescheduling over-allocates resources
#11372 commented on Jun 16, 2025 • 0 new comments
[autoscaler] request_resources with partial instance availability leads to workers never shutting down
#11367 commented on Jun 16, 2025 • 0 new comments
how to add two-timescales Learning rate schedule in custom policy?
#11328 commented on Jun 16, 2025 • 0 new comments
[Autoscaler] Add additional gpu types to util.accelerators
#11160 commented on Jun 16, 2025 • 0 new comments
[autoscaler] Worker node container is not removed after ray down?
#11098 commented on Jun 16, 2025 • 0 new comments
`ray stop` should wait for processes to exit
#10955 commented on Jun 16, 2025 • 0 new comments
[autoscaler] node type preferences
#10929 commented on Jun 16, 2025 • 0 new comments
Private/onprem clusters always need explicit ssh_private_key in docker
#10838 commented on Jun 16, 2025 • 0 new comments
[docs] Add examples for using custom resources
#10808 commented on Jun 16, 2025 • 0 new comments
Autoscaler should set RAY_ADDRESS environment variable
#10752 commented on Jun 16, 2025 • 0 new comments
Stop using `file_mounts` for ray_bootstrap_config & ray_bootstrap_key
#10743 commented on Jun 16, 2025 • 0 new comments
[Java] Remove Java 9/10/11 warnings
#10673 commented on Jun 16, 2025 • 0 new comments
[Documentation] need for default_resource_requests when using custom train function
#10572 commented on Jun 16, 2025 • 0 new comments
[rllib] action from policy with Tuple action space has wrong shape
#10516 commented on Jun 16, 2025 • 0 new comments
[tune] String summarization/representations for user objects
#10489 commented on Jun 16, 2025 • 0 new comments
[tune] Add regression test for avoiding extraneous output
#10485 commented on Jun 16, 2025 • 0 new comments
[GCS]Remove tightly coupled Redis code path from Python
#10359 commented on Jun 16, 2025 • 0 new comments
[GCS]Support Sharding GCS server
#10358 commented on Jun 16, 2025 • 0 new comments
[GCS]Support Multi-threaded GCS server.
#10357 commented on Jun 16, 2025 • 0 new comments
[ray] constant memory usage increase of actor using actor handle.
#9232 commented on Jun 16, 2025 • 0 new comments
Code coverage tracker
#5473 commented on Jun 16, 2025 • 0 new comments
[ray] ray misuse gpu in docker container
#5245 commented on Jun 16, 2025 • 0 new comments
On a background thread, `ray.wait` doesn't timeout until another method on the actor is called
#4934 commented on Jun 16, 2025 • 0 new comments
Ray is not propagating variable types correctly
#4463 commented on Jun 16, 2025 • 0 new comments
[tune] Support nesting grid_search in lambdas
#3466 commented on Jun 16, 2025 • 0 new comments
Retry policy when a worker crashes: a hook missing?
#2635 commented on Jun 16, 2025 • 0 new comments
Task introspection
#2617 commented on Jun 16, 2025 • 0 new comments
ray start does not restart failed processes
#2587 commented on Jun 16, 2025 • 0 new comments
CI test linux://rllib:learning_tests_cartpole_dqn_gpu is flaky
#46683 commented on Jun 16, 2025 • 0 new comments
[rllib] flattening error in gym.spaces.Sequence
#45563 commented on Jun 16, 2025 • 0 new comments
cannot import name 'EPISODE_RETURN_MEAN' from 'ray.rllib.utils.metrics'
#45453 commented on Jun 16, 2025 • 0 new comments
error: No such option: --torch
#45452 commented on Jun 16, 2025 • 0 new comments
[Core] Unable to run worker with virtual environment without installing dashboard
#45410 commented on Jun 16, 2025 • 0 new comments
[RLlib] How to support gymnasium graph obs space?
#45290 commented on Jun 16, 2025 • 0 new comments
Ray Cluster does not work across multiple docker containers
#45252 commented on Jun 16, 2025 • 0 new comments
[Core] Worker crashes unexpectedly due to frequent triggering of OOM
#45244 commented on Jun 16, 2025 • 0 new comments
Ray Cluster: Failed to create a ray cluster using running container
#45148 commented on Jun 16, 2025 • 0 new comments
[Rllib] Rllib provides wrong state batch size during "bug check" batches on torch custom model
#45131 commented on Jun 16, 2025 • 0 new comments
[RLlib] ValueError in initialization of ImpalaTF2Policy
#45050 commented on Jun 16, 2025 • 0 new comments
[core] GcsSubscriber hangs in shutdown if the connection broke on MacOS
#45044 commented on Jun 16, 2025 • 0 new comments
Workflow: Reading workflow status can lead to corrupted json reads.
#45027 commented on Jun 16, 2025 • 0 new comments
[Core] `ray.wait` not actually wait until ready when the task is longer than 12 days
#44909 commented on Jun 16, 2025 • 0 new comments
Invalid memory access in RedisAsioClient/RedisAsyncContext on shutdown
#9074 commented on Jun 16, 2025 • 0 new comments
Performance issue with many large tasks on 10 node cluster.
#8950 commented on Jun 16, 2025 • 0 new comments
Ray Dashboard Head-node CLI [autoscaler]
#8450 commented on Jun 16, 2025 • 0 new comments
Support TPUs across all of Ray
#8260 commented on Jun 16, 2025 • 0 new comments
[core] ray.init does not work if run in a node with external ip while the cluster is started internally
#8244 commented on Jun 16, 2025 • 0 new comments
[docs][autoscaler] additional dependencies needs to be mentioned to build your own autoscaler image
#8235 commented on Jun 16, 2025 • 0 new comments
[autoscaler] Api instead of CLI to interact with cluster.
#8036 commented on Jun 16, 2025 • 0 new comments
Incorrect unreconstructable error message and raise different exception.
#7804 commented on Jun 16, 2025 • 0 new comments
ray.wait hangs with no warning or error when local object store is too small to receive object
#7802 commented on Jun 16, 2025 • 0 new comments
Segmentation Fault when using multiprocessing.Queue
#7793 commented on Jun 16, 2025 • 0 new comments
ray.wait with local_mode=True blocks for a very long time
#7741 commented on Jun 16, 2025 • 0 new comments
Awesome: algorithm selection helper & diagrams
#7722 commented on Jun 16, 2025 • 0 new comments
Ray hangs when machine is disconnected from network
#7696 commented on Jun 16, 2025 • 0 new comments
[docs] Clarify that in K8s the jobs need to be launched from the workers
#7188 commented on Jun 16, 2025 • 0 new comments
[ray] tasks running in docker containers are not stopped on local cluster
#6898 commented on Jun 16, 2025 • 0 new comments
[dist] Release notes for Java And other Languages
#6608 commented on Jun 16, 2025 • 0 new comments
Ray.wait causes node to hang if there are too many object ids
#6403 commented on Jun 16, 2025 • 0 new comments
Performance issues with defining remote functions and actor classes from within tasks.
#6240 commented on Jun 16, 2025 • 0 new comments
TypeError: can't pickle CudnnModule objects
#5947 commented on Jun 16, 2025 • 0 new comments
Profiling ray tasks includes ray initialization time
#5832 commented on Jun 16, 2025 • 0 new comments
Make it possible to see resource deadlocks through web UI.
#5789 commented on Jun 16, 2025 • 0 new comments
[autoscaler] Raise better error message if `ssh_user` is not correct
#5772 commented on Jun 16, 2025 • 0 new comments