Insights: ray-project/ray
Overview
124 Pull requests merged by 46 people
-
[Data] Prefetch data for `PandasJSONDatasource`
#54667 merged
Jul 17, 2025 -
[cpp] explicitly add header files as bazel deps
#54686 merged
Jul 17, 2025 -
[java] move copy_pom_file rule up
#54692 merged
Jul 17, 2025 -
[doc] update links for KubeRay 1.4.2
#54687 merged
Jul 17, 2025 -
[train][tune] v1 CheckpointManager only stores scoring metric if env var set
#54642 merged
Jul 17, 2025 -
[Data] node_trackers init file
#54665 merged
Jul 17, 2025 -
[core][gpu-objects] Avoid triggering a KeyError by the GPU object GC callback for intra-actor communication
#54556 merged
Jul 17, 2025 -
fix controller recovery test on windows
#54645 merged
Jul 16, 2025 -
[ci] fix merge conflict on buildifier format
#54685 merged
Jul 16, 2025 -
[ci] updating pre commit config - updating buildifier paths
#54593 merged
Jul 16, 2025 -
[deps] upgrade polars to 1.31.0
#54653 merged
Jul 16, 2025 -
[Core] Add core as code owner for more files
#54669 merged
Jul 16, 2025 -
[serve.llm] Handle Mistral tekken tokenizer
#54666 merged
Jul 16, 2025 -
[core] adding additional stats to the dump object store usage api.
#53856 merged
Jul 16, 2025 -
fix closing tags for div
#54668 merged
Jul 16, 2025 -
[serve] move test from test_grpc to test_proxy
#53933 merged
Jul 16, 2025 -
[llm][serve] Update ray-llm docker to ucx-1.18.1 nixl-0.3.1
#54598 merged
Jul 16, 2025 -
Train Benchmark: Add preserve_order
#54474 merged
Jul 16, 2025 -
[RLlib; docs] Docs do-over (new API stack): `ConnectorV2` documentation (part II).
#54313 merged
Jul 16, 2025 -
[core][oneevent/01] gcs AddEvent: scaffolding
#54609 merged
Jul 16, 2025 -
[llm] add hf_transfer to dependencies by default
#54643 merged
Jul 16, 2025 -
Add perf metrics for 2.48.0
#54647 merged
Jul 16, 2025 -
[ci] raydepsets: moving uv binary
#54641 merged
Jul 16, 2025 -
[ci] verifying all llm lock files
#54640 merged
Jul 16, 2025 -
[Core] Fix missing brace and add pytest for AMDGPUAcceleratorManager.get_visible_accelerator_ids_env_var
#54270 merged
Jul 16, 2025 -
[Core] Remove duplicate code
#54637 merged
Jul 15, 2025 -
[release] Add autoscaling test for GKE kuberay pipeline
#54603 merged
Jul 15, 2025 -
[serve] skip autoscaling test
#54631 merged
Jul 15, 2025 -
[Data] Fixed chained inplace assignment to prevent FutureWarning from Pandas
#54486 merged
Jul 15, 2025 -
[doc] fix one comment for experiment-tracking user guide
#54605 merged
Jul 15, 2025 -
[Data] Actor location tracker
#54590 merged
Jul 15, 2025 -
Fix invalid type for progress_reporter parameter of RunConfig
#48439 merged
Jul 15, 2025 -
[Air] Add Video FPS Support for `WandbLoggerCallback` + more customization options for Image and Video
#53638 merged
Jul 15, 2025 -
[Data] Add streaming executor duration to dashboard
#54614 merged
Jul 15, 2025 -
MCP Ray Serve End to End Example
#54289 merged
Jul 15, 2025 -
Minor Documentation Fixes in Protobuf Files
#53731 merged
Jul 15, 2025 -
[RLlib] Enhance SAC (new API stack) with discrete action support.
#53982 merged
Jul 15, 2025 -
Feat/remove cpu profiler
#54569 merged
Jul 15, 2025 -
[ci] make daily postmerge schedule also no db
#54617 merged
Jul 15, 2025 -
[Core] use RunFnPeriodically for metrics report in GCS server
#54358 merged
Jul 15, 2025 -
[core] fix checking for uv existence during ray_runtime setup
#54141 merged
Jul 15, 2025 -
[RLlib; docs] Docs do-over (new API stack): `ConnectorV2` documentation (part I).
#53732 merged
Jul 15, 2025 -
[train] update `datasets` from `2.19.1` to `3.6.0`
#54338 merged
Jul 15, 2025 -
[ci] raydepsets: adding config dataclass and config loading
#54394 merged
Jul 15, 2025 -
cherrypick #54563
#54613 merged
Jul 15, 2025 -
cherrypick #54592
#54612 merged
Jul 15, 2025 -
[Serve] Update HEALTHY_MESSAGE constant location
#54606 merged
Jul 14, 2025 -
[air] fix `test_wandb_logging_actor_fault_tolerance`
#54572 merged
Jul 14, 2025 -
Address the case where min_rows_per_group exceeds arrow's default for write_dataset
#54592 merged
Jul 14, 2025 -
[serve] improve observability for flaky test
#54599 merged
Jul 14, 2025 -
[core] Normal task submitter cleanup
#54206 merged
Jul 14, 2025 -
[batch.llm] Fix mocks for unrun tests
#54588 merged
Jul 14, 2025 -
[serve] add timeout to test_cancel_on_http_timeout_during_execution
#54594 merged
Jul 14, 2025 -
migrate ray_option_utils from private to common
#54578 merged
Jul 14, 2025 -
[serve] update receive proxy in replica
#54585 merged
Jul 14, 2025 -
[data] Allocate GPU resources in ResourceManager
#54445 merged
Jul 14, 2025 -
[docs] updating broken links on rllib torch doc
#53161 merged
Jul 14, 2025 -
Fix backpressure gRPC error code
#54537 merged
Jul 14, 2025 -
[Doc] Update Istio service mesh graph and image tag to 2.46.0
#53988 merged
Jul 14, 2025 -
[Core] Minor fixes in GCS health check manager
#54473 merged
Jul 14, 2025 -
[core] Support pip_install_options for pip
#53551 merged
Jul 14, 2025 -
[RLlib, CI] Remove old API stack Unity3D test case.
#54582 merged
Jul 14, 2025 -
[RLlib] - Increased default timesteps on two experiments.
#54185 merged
Jul 14, 2025 -
[RLlib] Switch Offline Data iteration to `iter_torch_batches`.
#54277 merged
Jul 14, 2025 -
[ci] disable test db on release auto nightly run
#54563 merged
Jul 14, 2025 -
[Doc] Make the wording more accurate since we not only have Python workers but also C++, Java workers
#54546 merged
Jul 14, 2025 -
[Core] Minor fixes in gcs job manager
#54562 merged
Jul 14, 2025 -
Revert "[serve] reorganize how we handle the http receive task"
#54565 merged
Jul 14, 2025 -
[DOC][Core] fix typo in Anti-pattern.
#54547 merged
Jul 12, 2025 -
use wait_condition for verifying http response
#54522 merged
Jul 12, 2025 -
[serve] reorganize how we handle the http receive task
#54543 merged
Jul 12, 2025 -
[deps] upgrade python protobuf to 4
#54496 merged
Jul 12, 2025 -
cherrypick #54518
#54561 merged
Jul 12, 2025 -
cherrypick #54386
#54560 merged
Jul 12, 2025 -
cherrypick #54511
#54559 merged
Jul 12, 2025 -
cherrypick #54544
#54558 merged
Jul 12, 2025 -
[release] version change to 2.48.0
#54557 merged
Jul 12, 2025 -
[Core] Add file_mounts to azure example-minimal config
#54533 merged
Jul 12, 2025 -
[core][telemetry/10] support custom gauge+counter+sum metrics
#53734 merged
Jul 12, 2025 -
[Data] Add Expression Support & with_columns API
#54322 merged
Jul 12, 2025 -
[core][autoscaler] add the missing readonly/example.yaml to the build
#54535 merged
Jul 12, 2025 -
document unexpected queuing behavior in handle
#54542 merged
Jul 12, 2025 -
[train] TrainStateActor periodically checks controller status and sets aborted
#53818 merged
Jul 12, 2025 -
Revert "[core] Default state API address when in a connected worker"
#54549 merged
Jul 12, 2025 -
[Serve.llm] Make llm serve endpoints compatible with vLLM serve frontend (4/N): Refactor LLMServer
#54484 merged
Jul 12, 2025 -
[ci] kick forge refresh
#54544 merged
Jul 11, 2025 -
only print log line once during shutdown
#54534 merged
Jul 11, 2025 -
`concat`: Handle mixed Tensor types for structs
#54386 merged
Jul 11, 2025 -
[core][gpu-objects] garbage collection
#53911 merged
Jul 11, 2025 -
[wheel] limit build artifacts duplicated in example directory for ray_cpp
#54465 merged
Jul 11, 2025 -
[core] Default state API address when in a connected worker
#54468 merged
Jul 11, 2025 -
[core] enable the v2 autoscaler by default when the cluster is managed by KubeRay
#54518 merged
Jul 11, 2025 -
[Serve] Set the docs path after app is initialized on the replica
#53463 merged
Jul 11, 2025 -
[Doc][Cluster] Update Azure cluster docs
#54517 merged
Jul 11, 2025 -
[ci] bumping uv binary version
#54514 merged
Jul 11, 2025 -
[serve] Fix `test_deploy` on windows
#54511 merged
Jul 11, 2025 -
[core][telemetry/09] record sum metric e2e
#53512 merged
Jul 11, 2025 -
[core][telemetry/08-bis] api documentation + improvements
#54472 merged
Jul 11, 2025 -
[release] remove dask from byod 3.9 deps
#54521 merged
Jul 11, 2025 -
[serve] update `test_request_timeout`
#54519 merged
Jul 11, 2025 -
[uv] Fix uv run parser for handling extra arguments
#54488 merged
Jul 10, 2025 -
[core][autoscaler] fix: enable cloud_instance_id reusing in autoscaler v2
#54397 merged
Jul 10, 2025 -
[core] Don't order retries for in-order actors to prevent deadlock
#54034 merged
Jul 10, 2025 -
[serve] deflake test_e2e_preserve_prev_replicas
#54513 merged
Jul 10, 2025 -
[Serve] Update timeout to 20 for test_deploy_bad_pip_package_deployment
#54510 merged
Jul 10, 2025 -
increase timeout for wait condition
#54503 merged
Jul 10, 2025 -
[serve] deflake test_replica_metrics_fields
#54493 merged
Jul 10, 2025 -
Feat/add websocket support for di
#54490 merged
Jul 10, 2025 -
[serve] fix `test_standalone_2`
#54508 merged
Jul 10, 2025 -
Optimize get_live_deployments
#54454 merged
Jul 10, 2025 -
Feat/fix callback tests
#54507 merged
Jul 10, 2025 -
[Docs] Troubleshooting DeepSeek/multi-node GPU deployment on KubeRay
#54229 merged
Jul 10, 2025 -
split _wrap_user_method_call into _wrap_request and _start_request
#54485 merged
Jul 10, 2025 -
[core] Improve status messages and add comments about stale seq_no handling
#54470 merged
Jul 10, 2025 -
[deps] Allow to call individual functions within install-dependencies
#54502 merged
Jul 10, 2025 -
Updated stalebot to use unstale label instead of bounced.
#54506 merged
Jul 10, 2025 -
[core][telemetry/12] record histogram metric e2e
#53927 merged
Jul 10, 2025 -
[deps] core: drop opencensus-proto test dep
#54497 merged
Jul 10, 2025 -
[runtime env]: Integrating ROCm Systems Profiler to Ray worker process
#48525 merged
Jul 10, 2025
69 Pull requests opened by 47 people
-
[WIP] `TaskExecutionResult`
#54505 opened
Jul 10, 2025 -
[serve.llm] Remove upstreamed workarounds
#54512 opened
Jul 10, 2025 -
Fix bug in http_serve_head by using os.path.realpath instead of inval…
#54523 opened
Jul 11, 2025 -
Fix get actor timeout multiplier
#54525 opened
Jul 11, 2025 -
Handle missing 'chunks' key when Databricks UC query returns zero rows
#54526 opened
Jul 11, 2025 -
[core][raycheck/01] Fix "it != submissible_tasks_.end()"
#54527 opened
Jul 11, 2025 -
[Doc] Update deprecated `evaluation_strategy` parameter to `eval_strategy` in transformers examples
#54528 opened
Jul 11, 2025 -
[core] attempting streaming generator hanging fix
#54529 opened
Jul 11, 2025 -
[Serve] Fix windows test deploy apps flakiness
#54530 opened
Jul 11, 2025 -
[Data] [Draft] introduce per-op config options to disable operator fusion
#54539 opened
Jul 11, 2025 -
[dashboard] fix typos
#54550 opened
Jul 12, 2025 -
[train][checkpoint] CheckpointManager and Worker both count checkpoints
#54555 opened
Jul 12, 2025 -
Feat/fix request replica context
#54566 opened
Jul 12, 2025 -
[Core] Core Worker GetObjStatus GRPC Fault Tolerance
#54567 opened
Jul 12, 2025 -
[ci] use compiled list for install
#54568 opened
Jul 12, 2025 -
[llm.serve] Add unit test for `completions` endpoint
#54570 opened
Jul 12, 2025 -
[RLlib] Fixes Implementation of Shared Encoder
#54571 opened
Jul 13, 2025 -
[data.llm] Allow vLLM deployments to be shared by sequential processors
#54573 opened
Jul 13, 2025 -
test local context
#54574 opened
Jul 14, 2025 -
[Core] Add NodeAffinitySchedulingStrategy Attributes Validation in API Layer
#54577 opened
Jul 14, 2025 -
[Serve.llm] Add LMCacheConnectorV1 support for kv_transfer_config
#54579 opened
Jul 14, 2025 -
[tune][typing] type reset_config to return bool
#54581 opened
Jul 14, 2025 -
[core] Ensure Actor __del__ method invoked on Actor destruction
#54584 opened
Jul 14, 2025 -
[Serve] Fix windows test deploy apps flakiness scratch
#54591 opened
Jul 14, 2025 -
chore: if keypair doesn't exist create one automatically + doc typo fix
#54596 opened
Jul 14, 2025 -
run only one serve windows test
#54597 opened
Jul 14, 2025 -
[ci] raydepsets: subset operation
#54602 opened
Jul 14, 2025 -
[core][oneevent/01] task 01
#54607 opened
Jul 14, 2025 -
[ci] raydepsets: adding expand operation
#54608 opened
Jul 14, 2025 -
[ci] raydepsets: build graph execution
#54610 opened
Jul 14, 2025 -
[core][oneevent/02] gcs AddEvent: TaskDefinition support
#54616 opened
Jul 15, 2025 -
fix(rllib): Correct typo and consistency in pyspiel import error message (#53841)
#54618 opened
Jul 15, 2025 -
test commit
#54619 opened
Jul 15, 2025 -
[RLlib][typing] AlgorithmConfig return-type should be Self instead upper bound
#54620 opened
Jul 15, 2025 -
Bump aiohttp from 3.11.16 to 3.12.14 in /python
#54621 opened
Jul 15, 2025 -
Bump aiohttp from 3.9.5 to 3.12.14 in /release
#54622 opened
Jul 15, 2025 -
[WIP][Data] Update Export API metadata and refresh the dataset/operator state when there is a change
#54623 opened
Jul 15, 2025 -
[data]Add stratify parameter to train_test_split method
#54624 opened
Jul 15, 2025 -
[RLlib; docs] Docs do-over (new API stack): ConnectorV2 documentation (part III).
#54626 opened
Jul 15, 2025 -
[Data] schema warning change
#54630 opened
Jul 15, 2025 -
[Train] Split the TrainingFailedError into WorkerTrainingFailedError and SchedulingTrainingFailedError
#54633 opened
Jul 15, 2025 -
[Data] Add release test for JSONL
#54634 opened
Jul 15, 2025 -
Fix MCP example name + sidebar navigation
#54636 opened
Jul 15, 2025 -
[Serve] Handle autoscaling edge case
#54644 opened
Jul 16, 2025 -
Refactor `EC2InstanceTerminator`
#54646 opened
Jul 16, 2025 -
Train Tests: Remove limit pushdowns for local FS tests
#54649 opened
Jul 16, 2025 -
increase timeout for test standalone to fix windows failure
#54650 opened
Jul 16, 2025 -
[core] Core worker + Cython cleanup unnecessary paths
#54654 opened
Jul 16, 2025 -
[ci] Disable KubeRay release tests based on flag
#54656 opened
Jul 16, 2025 -
add request id in proxy logs in files
#54657 opened
Jul 16, 2025 -
WIP: Allow Dict and Tuple spaces when concatenating using `SingleAgentEpisode`
#54661 opened
Jul 16, 2025 -
[core] support ipv6 in host network mode
#54662 opened
Jul 16, 2025 -
[spark] Log Ray metrics to MLFlow run in Ray-on-Spark
#54663 opened
Jul 16, 2025 -
[Core] minor fixes in GCS actor manager
#54664 opened
Jul 16, 2025 -
[RLlib] Fix bug in `restore_from_path` such that connector states are also restored on remote `EnvRunner`s.
#54672 opened
Jul 16, 2025 -
[Docs][minor] Delete unused numpy import on Ray Data vLLM frontpage
#54683 opened
Jul 16, 2025 -
criteo full data run script
#54684 opened
Jul 16, 2025 -
[core][oneevent/03] gcs AddEvent: TaskExecution support
#54688 opened
Jul 17, 2025 -
[Do not merge] Auto editor demo
#54689 opened
Jul 17, 2025 -
[core] Remove temp ref increment on HandleGetObjectStatus
#54690 opened
Jul 17, 2025 -
[recsys] full run for criteo
#54691 opened
Jul 17, 2025 -
[Data] Adding row-based metrics
#54693 opened
Jul 17, 2025 -
[Data] Cleaning up `ExecutionResources`
#54694 opened
Jul 17, 2025 -
[Serve.llm] choose better default values for deployment configs so that they are not the bottleneck
#54696 opened
Jul 17, 2025 -
deflake test metrics 2
#54697 opened
Jul 17, 2025 -
[V2][Autoscaler] Fix `_compute_to_launch` rate limiting upscaling on cold start
#54699 opened
Jul 17, 2025 -
update hash tag to support redis cluster with multi shard
#54701 opened
Jul 17, 2025 -
[RLlib] Remove `--enable-new-api-stack` option from all scripts (it's the new default)
#54702 opened
Jul 17, 2025
77 Issues closed by 23 people
-
[RLlib] - `MultiDiscrete` action spaces with different category numbers do not work with `LSTM`.
#54409 closed
Jul 17, 2025 -
CI test windows://python/ray/serve/tests:test_controller_recovery is consistently_failing
#46022 closed
Jul 17, 2025 -
Release test lightgbm_train_batch_inference_benchmark_10G.aws failed
#54674 closed
Jul 17, 2025 -
Release test xgboost_train_batch_inference_benchmark_10G.aws failed
#54673 closed
Jul 17, 2025 -
Release test aggregate_groups_autoscaling_sort_shuffle_pull_based_column02 column14 failed
#54678 closed
Jul 17, 2025 -
Release test aggregate_groups_fixed_size_sort_shuffle_pull_based_column02 column14 failed
#54677 closed
Jul 17, 2025 -
Release test map_groups_autoscaling_sort_shuffle_pull_based_column08 column13 column14 failed
#54681 closed
Jul 17, 2025 -
Release test aggregate_groups_autoscaling_sort_shuffle_pull_based_column08 column13 column14 failed
#54676 closed
Jul 17, 2025 -
Release test map_groups_autoscaling_sort_shuffle_pull_based_column02 column14 failed
#54680 closed
Jul 17, 2025 -
Release test aggregate_groups_fixed_size_sort_shuffle_pull_based_column08 column13 column14 failed
#54675 closed
Jul 17, 2025 -
Release test joins_sf100_inner failed
#54679 closed
Jul 17, 2025 -
CI test linux://rllib:learning_tests_cartpole_dqn_multi_cpu is flaky
#47214 closed
Jul 17, 2025 -
CI test linux://rllib:learning_tests_multi_agent_cartpole_ppo_multi_cpu is consistently_failing
#47465 closed
Jul 16, 2025 -
Can't run python unit tests via compiled Ray wheel?
#54451 closed
Jul 16, 2025 -
[serve.llm] LLM serving seems not working with mistral tokenizer.
#53873 closed
Jul 16, 2025 -
ray ignores memory limits and allocated 30 % of memory according to dashboard
#41983 closed
Jul 16, 2025 -
[Core] Tensor Transport GPU Path Not Triggered Due to Missing Cython Constants
#54463 closed
Jul 16, 2025 -
[Serve] reason_content is null returned by llm serve
#53324 closed
Jul 16, 2025 -
Ray build_openai_app Vs Vllm Serve
#52934 closed
Jul 16, 2025 -
[ray|llm] ray lora DiskMultiplexConfig loss load from local path to disk_cache
#53315 closed
Jul 16, 2025 -
[Serve][llm] Make Serve LLM endpoint 100% compatible with the engine's native server.
#53533 closed
Jul 16, 2025 -
[Core] TypeError: RayGaugeWrapper.__init__() got an unexpected keyword argument
#54611 closed
Jul 15, 2025 -
[CI] `linux://python/ray/data:test_consumption` is failing/flaky on master.
#53897 closed
Jul 15, 2025 -
CI test windows://python/ray/tests:test_object_store_metrics is flaky
#49514 closed
Jul 15, 2025 -
ray.exceptions.RayTaskError(CompilationError)
#54399 closed
Jul 15, 2025 -
[Train] Allow customization of FPS for wandb logger; instead of slow 4 FPS
#50186 closed
Jul 15, 2025 -
[Serve] Serve-native CPU profiling in Replicas is broken
#53677 closed
Jul 15, 2025 -
[data][bug] repartition(target_num_rows_per_block) should not be fused with downstream op
#54448 closed
Jul 15, 2025 -
CI test linux://python/ray/air:test_integration_wandb is flaky
#54553 closed
Jul 15, 2025 -
[Core] bug in _install_uv_packages() method of uv runtime backend breaks installing packages in ray runtimes
#54134 closed
Jul 15, 2025 -
[RLLib] Dead links in "Using RLlib with torch 2.x compile" page
#52495 closed
Jul 15, 2025 -
CI test linux://rllib:examples/connectors/flatten_observations_dict_space_impala is flaky
#49754 closed
Jul 15, 2025 -
[serve.llm] vLLM engine became unhealthy under high incoming traffic
#54070 closed
Jul 14, 2025 -
CI test linux://python/ray/tests:test_state_api is flaky
#54541 closed
Jul 14, 2025 -
[Core] Support setting options to the pip install command
#52679 closed
Jul 14, 2025 -
Ray 2.47.1 segfaults on AMD platform during collective communication
#54580 closed
Jul 14, 2025 -
CI test linux://rllib:learning_tests_multi_agent_pendulum_sac_multi_cpu is consistently_failing
#47264 closed
Jul 14, 2025 -
[Azure] Ray up for Azure fails
#48976 closed
Jul 12, 2025 -
CI test linux://python/ray/tests:test_metrics_agent is consistently_failing
#48956 closed
Jul 12, 2025 -
[Core][autoscaler] autoscaler v2 tries to load default configs that do not exist on the image.
#54532 closed
Jul 12, 2025 -
[Ray Data] Filtering function is very slow
#53493 closed
Jul 11, 2025 -
[core][gpu-objects] Garbage collection for in-actor GPU objects
#51262 closed
Jul 11, 2025 -
[core][gpu-objects] Actor sends the same ObjectRef twice to another actor
#51273 closed
Jul 11, 2025 -
[Core] Ray Data job hanging with flooded Cancelling stale RPC with seqno 125 < 127 error
#50814 closed
Jul 11, 2025 -
[core][autoscaler] Enable autoscaler v2 by default when running on KubeRay
#54226 closed
Jul 11, 2025 -
[Serve] refactor serve code that sets `docs_path`
#53023 closed
Jul 11, 2025 -
ray azure does not work out of the box
#52511 closed
Jul 11, 2025 -
[<Ray component: Core|RLlib|etc...>] Anaconda free python in Docker images
#51991 closed
Jul 11, 2025 -
CI test windows://python/ray/serve/tests:test_logging is consistently_failing
#46043 closed
Jul 11, 2025 -
[Serve] FastAPI ingress does not work with composable routers
#50373 closed
Jul 11, 2025 -
[Serve] ingress decorator does not work with fastapi.APIRouter arg
#50372 closed
Jul 11, 2025 -
CI test windows://python/ray/tests:test_node_labels is consistently_failing
#52307 closed
Jul 10, 2025 -
[autoscaler][v2] Autoscaler stops working after the head node recovers with enabled FT
#54353 closed
Jul 10, 2025 -
CI test linux://rllib:examples/metrics/custom_metrics_in_algorithm_training_step is flaky
#51870 closed
Jul 10, 2025 -
[Serve] DeepSeek-R1 mode load stuck in H20
#50975 closed
Jul 10, 2025 -
[data][bug] Dataset execution can be implicitly triggered when passing a dataset to an Actor.
#52549 closed
Jul 10, 2025 -
[Data] Refactor `ParquetDatasink._write_partition_files` to use `pyarrow.parquet.write_to_dataset`
#50502 closed
Jul 10, 2025 -
[Core] ray distributed debugger, always connecting to cluster..
#50682 closed
Jul 10, 2025 -
[Data] __repr__ shouldn't trigger execution
#50361 closed
Jul 10, 2025 -
[data] RefBundle doesn't always eagerly free data
#37910 closed
Jul 10, 2025 -
[data -- read_iceberg] pickling error on UDF for dataset.groupby.map_batches
#54280 closed
Jul 10, 2025 -
[Data] `test_hudi` flakes in CI 25% of the time
#50463 closed
Jul 10, 2025 -
[core] default uv integration breaks runtime env pip packages (when running in existing environments?)
#54344 closed
Jul 10, 2025 -
CI test windows://python/ray/serve/tests:test_deploy is consistently_failing
#46033 closed
Jul 10, 2025 -
CI test linux://python/ray/serve/tests:test_standalone_2_with_compact_scheduling is flaky
#48338 closed
Jul 10, 2025 -
CI test linux://python/ray/serve/tests:test_standalone_2 is flaky
#48403 closed
Jul 10, 2025 -
[Serve] Specify different images for each deployment
#52994 closed
Jul 10, 2025 -
[Serve] Optimize the _get_live_deployments function
#45793 closed
Jul 10, 2025 -
CI test windows://python/ray/serve/tests:test_grpc is flaky
#46028 closed
Jul 10, 2025 -
[Ray serve] Unable to serve meta-llama/Llama-3.1-8B-Instruct
#53663 closed
Jul 10, 2025 -
[Serve] Unable to load meta-llama/Llama-3.3-70B-Instruct
#53571 closed
Jul 10, 2025 -
[Dashboard] Refactor job / node / actor updating code
#16243 closed
Jul 10, 2025 -
[Dashboard][event] Event API in Python.
#16250 closed
Jul 10, 2025 -
[Dashboard][event] Event API in Java.
#16251 closed
Jul 10, 2025
37 Issues opened by 29 people
-
CI test windows://python/ray/serve/tests:test_metrics_2 is flaky
#54698 opened
Jul 17, 2025 -
[Ray Compiled DAG] [ROCm] NOT support compiled dag on ROCm
#54695 opened
Jul 17, 2025 -
[Docs] Broken link to parametric actions
#54671 opened
Jul 16, 2025 -
[llm] Strict JSON validation is not compatible with LiteLLM / OpenAI structured outputs
#54670 opened
Jul 16, 2025 -
[Core] support ipv6 in host network mode
#54660 opened
Jul 16, 2025 -
[RLlib] SingleAgentEpisode is not designed to handle dict observations
#54659 opened
Jul 16, 2025 -
[Windows] Ray dashboard fails to start inside Conda environment — OSError: [WinError 6]
#54658 opened
Jul 16, 2025 -
[<Ray component: Core|RLlib|etc...>]
#54655 opened
Jul 16, 2025 -
[RFC] - [Serve] Support for Asynchronous inference
#54652 opened
Jul 16, 2025 -
Ray is running on an ARM system and an error is reported
#54651 opened
Jul 16, 2025 -
[Data] Support for setting an initial concurrency
#54648 opened
Jul 16, 2025 -
[core] Unify executor threads when enabling/disabling concurrency_groups
#54639 opened
Jul 15, 2025 -
[core][gpu-objects] Support overriding tensor transport in `ray.get`
#54638 opened
Jul 15, 2025 -
[core] Lineage eviction can evict some returned objects but still resubmit the task.
#54628 opened
Jul 15, 2025 -
[core][gpu-objects] Tensor transport "gloo" and collective group "torch_gloo" naming is inconsistent
#54627 opened
Jul 15, 2025 -
[Core] Transient error on RPC `InternalKVExists` causes child task cannot be accessed by parent task
#54625 opened
Jul 15, 2025 -
[core][gpu-objects] Allow users to specify a tensor buffer when materializing GPU object refs
#54615 opened
Jul 15, 2025 -
Specifying a separate GCS address for client connection
#54601 opened
Jul 14, 2025 -
precheck of environment samples invalid action and causes exception
#54600 opened
Jul 14, 2025 -
[CI] Add L4 machine type into machine pool for llm/batch tests
#54589 opened
Jul 14, 2025 -
[Core] update grpcio requirement for Darwin platforms
#54587 opened
Jul 14, 2025 -
[Core] Support for Worker Blacklist to Ignore Workers with Environmental Exceptions
#54576 opened
Jul 14, 2025 -
CI test linux://rllib:env/wrappers/tests/test_unity3d_env is consistently_failing
#54575 opened
Jul 14, 2025 -
[Tune] New Trial status
#54564 opened
Jul 12, 2025 -
CI test linux://python/ray/tests:test_gpu_objects_gloo is flaky
#54552 opened
Jul 12, 2025 -
[Ray-llm on Google Cloud] Ray cannot detect GPU device in ray-llm latest version
#54551 opened
Jul 12, 2025 -
[data] Possible bug / regression in nightly with autoscaling
#54548 opened
Jul 12, 2025 -
[Ray Cluster: Azure provider] Enable automatic keypair creation
#54545 opened
Jul 11, 2025 -
[data] Ray Autoscaling - Suboptimal Performance with Actors
#54540 opened
Jul 11, 2025 -
[Ray Metric Infra] improvement backlogs
#54538 opened
Jul 11, 2025 -
[RLlib] Add RLlibCallback on Checkpoint Creation
#54524 opened
Jul 11, 2025 -
[data] introduce per-op config options
#54520 opened
Jul 10, 2025 -
[Core][Draft] Followup Work on Task Events Buffer
#54515 opened
Jul 10, 2025 -
[Serve] non-blocking reconfigure design
#54509 opened
Jul 10, 2025
237 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[core] Introduce `ShutdownCoordinator` and unified core worker shutdown entry points
#54244 commented on
Jul 15, 2025 • 35 new comments -
[Core] Add Logic to Emit Task Events to Event Aggregator
#53402 commented on
Jul 17, 2025 • 28 new comments -
[core][gpu-objects] Move data transfers to a background thread
#54256 commented on
Jul 16, 2025 • 18 new comments -
Give the option to make `target_max_block_size` nullable
#54450 commented on
Jul 17, 2025 • 16 new comments -
[train] fail fast if pg cannot be met
#54402 commented on
Jul 15, 2025 • 15 new comments -
[core][GPU Objects] Support nixl as tensor transport backend
#54459 commented on
Jul 16, 2025 • 15 new comments -
[core] Add switch for the cache of runtime env
#53775 commented on
Jul 12, 2025 • 12 new comments -
[core]: Use a temporary file to share default worker path in runtime env
#53653 commented on
Jul 16, 2025 • 11 new comments -
[ci] raydepsets: adding compile operation
#54389 commented on
Jul 17, 2025 • 11 new comments -
[core][telemetry/11] record histogram metric e2e
#53740 commented on
Jul 14, 2025 • 9 new comments -
Update V2 Autoscaler to support scheduling using Node labels and LabelSelector API
#53578 commented on
Jul 17, 2025 • 8 new comments -
Relax check_version_info to check for bytecode compatibility
#41373 commented on
Jul 14, 2025 • 8 new comments -
[Core] Add default Ray Node labels at Node init
#53360 commented on
Jul 15, 2025 • 6 new comments -
[train] Use FailurePolicy to handle resize failure
#54257 commented on
Jul 15, 2025 • 6 new comments -
[Dashboard] Add GPU component usage
#52102 commented on
Jul 10, 2025 • 4 new comments -
[data] Update Hudi integration to support incremental query
#54301 commented on
Jul 17, 2025 • 4 new comments -
[data] Return schema of res in aggregations
#54489 commented on
Jul 16, 2025 • 4 new comments -
Show debug log for when aggregators are ready once.
#54483 commented on
Jul 17, 2025 • 4 new comments -
[core][autoscaler][v1] add heartbeat timeout logic to determine node activity status
#54030 commented on
Jul 17, 2025 • 4 new comments -
[Data] Limit operator push down
#54457 commented on
Jul 16, 2025 • 4 new comments -
check if ray is installed when using conda env
#52677 commented on
Jul 16, 2025 • 3 new comments -
[Data] User guide for aggregations
#53568 commented on
Jul 17, 2025 • 3 new comments -
[serve.llm] Refactor/Consolidate LoRA downloading
#53714 commented on
Jul 11, 2025 • 3 new comments -
[core] enable -Wshadow for all c++ targets
#53194 commented on
Jul 15, 2025 • 2 new comments -
[core] Returning a useful message when trying to get logs for a job that has not started yet
#53174 commented on
Jul 15, 2025 • 2 new comments -
[Core] Fixed the bug where the head was unable to submit tasks after redis is turned on.
#54267 commented on
Jul 16, 2025 • 2 new comments -
[Serve] Prioritize stopping most recently scaled-up replicas during downscaling
#52929 commented on
Jul 15, 2025 • 2 new comments -
[serve] refactor call_http_entrypoint
#54253 commented on
Jul 17, 2025 • 1 new comment -
Run Docker builds as non-root user with scoped root access via `sudo`
#54285 commented on
Jul 16, 2025 • 1 new comment -
[Data,Train] Add helpful errors when running forbidden methods on sharded datasets
#52079 commented on
Jul 10, 2025 • 1 new comment -
[core][autoscaler][v1] drop object_store_memory from ResourceDemandScheduler._update_node_resources_from_runtime
#53283 commented on
Jul 15, 2025 • 0 new comments -
[data] add explain interface for dataset
#53235 commented on
Jul 13, 2025 • 0 new comments -
[data] New landing page with better examples that show key workloads
#53228 commented on
Jul 13, 2025 • 0 new comments -
Add progress bars to hash operators
#53175 commented on
Jul 14, 2025 • 0 new comments -
[core] Node manager related cpp cleanup
#52990 commented on
Jul 16, 2025 • 0 new comments -
[core] Core worker get cv - notify after unlock
#53311 commented on
Jul 16, 2025 • 0 new comments -
Kuberay as one implementation of the operator model
#53318 commented on
Jul 16, 2025 • 0 new comments -
[serve.llm] DO NOT REVIEW, IN DRAFT
#53391 commented on
Jul 16, 2025 • 0 new comments -
kuberay edits
#53411 commented on
Jul 16, 2025 • 0 new comments -
[Data] Add support for ray.dataset.map_sql
#53417 commented on
Jul 13, 2025 • 0 new comments -
[Data] add switch for optimizer rules
#53427 commented on
Jul 13, 2025 • 0 new comments -
Bump torch from 2.0.1 to 2.7.0 in /doc/source/templates/testing/docker/03_serving_stable_diffusion
#53447 commented on
Jul 17, 2025 • 0 new comments -
[WIP][Data] Add support for Arrow native fixed-shape tensor type
#53450 commented on
Jul 13, 2025 • 0 new comments -
[Data] Add fillna function
#53459 commented on
Jul 12, 2025 • 0 new comments -
[core] Check if a task can be spilled before checking if args can be pinned
#53462 commented on
Jul 14, 2025 • 0 new comments -
Enable setting OS disk size in Azure
#45867 commented on
Jul 15, 2025 • 0 new comments -
[bazel] move python rules up
#47260 commented on
Jul 11, 2025 • 0 new comments -
[Data] Fix parallelism deriving heuristic to ensure parallelism stays w/in min/max bounds
#47695 commented on
Jul 12, 2025 • 0 new comments -
[DATA]Add custom resources in data autoscaling
#49756 commented on
Jul 17, 2025 • 0 new comments -
[Core] Split stats_metric into smaller targets to improve build performance
#50595 commented on
Jul 17, 2025 • 0 new comments -
[core] Cover cpplint for ray/src/ray/stats
#50678 commented on
Jul 17, 2025 • 0 new comments -
[doc] add jax example
#51040 commented on
Jul 17, 2025 • 0 new comments -
[CI] Replace `black` with `ruff format`
#51332 commented on
Jul 10, 2025 • 0 new comments -
[Core] Cover cpplint for ray/src/ray/common
#51551 commented on
Jul 17, 2025 • 0 new comments -
[Docs][wip] Feature: adopt llms.txt convention
#51605 commented on
Jul 11, 2025 • 0 new comments -
[core] Remove client call tag
#51817 commented on
Jul 14, 2025 • 0 new comments -
[Data] Fix bug where pandas blocks don't use tensor extension
#51868 commented on
Jul 12, 2025 • 0 new comments -
Add new autoscaling parameter `aggregation function`
#51905 commented on
Jul 10, 2025 • 0 new comments -
[core] add ray.util.concurrent.futures.RayExecutor
#51933 commented on
Jul 12, 2025 • 0 new comments -
[Chore][Dashboard] Move DataHead to python/ray/data/ folder
#52013 commented on
Jul 12, 2025 • 0 new comments -
[Data] Make `from_items` lineage serializable
#52026 commented on
Jul 12, 2025 • 0 new comments -
[WIP] Ray Data doc updates
#52062 commented on
Jul 12, 2025 • 0 new comments -
[core] Static Priority scheduling
#52489 commented on
Jul 14, 2025 • 0 new comments -
[Dashboard] Add Worker ID column to Worker table in Node detail page
#52581 commented on
Jul 11, 2025 • 0 new comments -
[ci] try running cicd unit tests in forge env
#52792 commented on
Jul 11, 2025 • 0 new comments -
[Data] remove empty lance read tasks
#52831 commented on
Jul 17, 2025 • 0 new comments -
[core] Add sync get node info to NodeInfoAccessor
#52928 commented on
Jul 10, 2025 • 0 new comments -
[Data] Replace `_MapWorker` name with operator names
#52949 commented on
Jul 15, 2025 • 0 new comments -
[core] Use GetResourceLoadRequest as a substitute liveness check
#52971 commented on
Jul 14, 2025 • 0 new comments -
[Data] Fixing null-safety when converting to `TensorArray`
#52977 commented on
Jul 12, 2025 • 0 new comments -
[Data] Add dropna function
#53464 commented on
Jul 13, 2025 • 0 new comments -
[Data] Fix examples in some Data user guides
#54158 commented on
Jul 13, 2025 • 0 new comments -
[Core] Use Factory method to create gcs KV Manager
#54178 commented on
Jul 12, 2025 • 0 new comments -
[Feat][Core] Don't count task retries due to node preemption
#54182 commented on
Jul 12, 2025 • 0 new comments -
Token-split prefix router
#54187 commented on
Jul 12, 2025 • 0 new comments -
[Serve.llm][Prototype][WIP] Simplify LLMServer and inherit OpenAIServingChat behavior
#54189 commented on
Jul 11, 2025 • 0 new comments -
[data] allow custom batcher for dataset iteration
#54193 commented on
Jul 13, 2025 • 0 new comments -
[data.llm] Add release test to capture memory leak
#54194 commented on
Jul 15, 2025 • 0 new comments -
[core] [wip] lazy sub
#54220 commented on
Jul 16, 2025 • 0 new comments -
another round of mac debug
#54232 commented on
Jul 17, 2025 • 0 new comments -
[DOC-127] MVP for OSS Ray labels
#54254 commented on
Jul 16, 2025 • 0 new comments -
[core] Refactoring LocalObjectManager to have a cleaner API for pinning objects.
#54255 commented on
Jul 15, 2025 • 0 new comments -
[Core] Fixed the bug where the child process turned into a zombie process.
#54266 commented on
Jul 16, 2025 • 0 new comments -
Enable field documentation with Pydantic
#54306 commented on
Jul 16, 2025 • 0 new comments -
[java] encapsulation + resource immutability for option classes
#54370 commented on
Jul 16, 2025 • 0 new comments -
[core] prevent sending SIGTERM after calling Worker::MarkDead
#54377 commented on
Jul 16, 2025 • 0 new comments -
[release][ci] First test for kuberay release test trigger path
#54415 commented on
Jul 16, 2025 • 0 new comments -
[Core] Avoid copy deque in cluster task manager
#54432 commented on
Jul 17, 2025 • 0 new comments -
Add ray.dataset.write_delta for supporting writes to Delta Lake
#54447 commented on
Jul 16, 2025 • 0 new comments -
[Core] Remove ineffectual TODO comment
#54464 commented on
Jul 12, 2025 • 0 new comments -
Add optional APIType filter to /api/serve/applications/ endpoint
#54478 commented on
Jul 10, 2025 • 0 new comments -
[train] add LightGBMTrainer user guide
#54492 commented on
Jul 12, 2025 • 0 new comments -
[Core] Fix the issue where multiple multithreaded calls to ray.get may cause hanging.
#54495 commented on
Jul 11, 2025 • 0 new comments -
[serve.llm] Pass dimensions of embedding request to vllm engine
#54499 commented on
Jul 15, 2025 • 0 new comments -
[Data] Add option for enabling out-of-order execution to optimize data processing performance
#54504 commented on
Jul 17, 2025 • 0 new comments -
[WIP][Data] Batch query for block_ref_iter
#53485 commented on
Jul 13, 2025 • 0 new comments -
[Data] Add a data compaction function
#53489 commented on
Jul 13, 2025 • 0 new comments -
[data] add Lance-based ordered data conversion that keeps row_id content unchanged
#53542 commented on
Jul 13, 2025 • 0 new comments -
[RLlib] Upgrade RLlink protocol for external env/simulator training.
#53550 commented on
Jul 14, 2025 • 0 new comments -
[Not for Merge] Event Aggregator Perf
#53576 commented on
Jul 17, 2025 • 0 new comments -
[core] Support broadcast and reduce collective for compiled graphs
#53625 commented on
Jul 12, 2025 • 0 new comments -
[data] allow max_calls to be a static but not dynamic option
#53687 commented on
Jul 13, 2025 • 0 new comments -
Bump requests from 2.32.3 to 2.32.4 in /python
#53691 commented on
Jul 12, 2025 • 0 new comments -
[RLlib] Examples folder do-over (vol 53): Learning 2-agent cartpole with global observation, 1 policy outputting all agents' actions, and individual rewards.
#53697 commented on
Jul 16, 2025 • 0 new comments -
[Data] Avoid failing when no `batch_size` is specified while using GPUs as it's perfectly legitimate use-case
#53810 commented on
Jul 13, 2025 • 0 new comments -
Bump tqdm from 4.64.1 to 4.66.3 in /python
#53820 commented on
Jul 17, 2025 • 0 new comments -
[RLlib] Mixin Layer Design Sketch Up
#53850 commented on
Jul 16, 2025 • 0 new comments -
[core] Sleep to debug container test
#53862 commented on
Jul 16, 2025 • 0 new comments -
[core] Move inner_publisher logic into gcsPublisher
#53905 commented on
Jul 17, 2025 • 0 new comments -
[RLlib] Add missing colon to CUBLAS_WORKSPACE_CONFIG
#53913 commented on
Jul 16, 2025 • 0 new comments -
[RLlib] Add missing documentation for SACConfig's training()
#53918 commented on
Jul 15, 2025 • 0 new comments -
[Data] Replaced `get_object_locations` with `get_local_object_locations`
#53942 commented on
Jul 12, 2025 • 0 new comments -
finishing commit for issue #52113
#53964 commented on
Jul 15, 2025 • 0 new comments -
docs(data): fix broken Parameters table
#53972 commented on
Jul 17, 2025 • 0 new comments -
[Serve] Make replica scheduler backoff configurable #52871
#53991 commented on
Jul 16, 2025 • 0 new comments -
[dashboard] Clean up naming for GPU profiling module
#54009 commented on
Jul 15, 2025 • 0 new comments -
update all 'Run on Anyscale' buttons to redirect to respective template preview pages
#54049 commented on
Jul 17, 2025 • 0 new comments -
Add Azure Files support to persistent storage documentation
#54055 commented on
Jul 16, 2025 • 0 new comments -
[data] Remove asserts that test internal `ds._block_num_rows()`
#54109 commented on
Jul 12, 2025 • 0 new comments -
[Serve] Calls to a Serve Deployment's .remote(), hang after some amount of time / requests.
#47870 commented on
Jul 10, 2025 • 0 new comments -
[Data]Fuse operator
#49587 commented on
Jul 10, 2025 • 0 new comments -
[Serve.llm] vLLMDeployment throughput doesn't scale well with `n_replicas`.
#53356 commented on
Jul 10, 2025 • 0 new comments -
[llm] Roadmap for Data and Serve LLM APIs
#51313 commented on
Jul 10, 2025 • 0 new comments -
[serve.llm][Feature request] Adding new models to a multi-gpu multi-model service would require the duplication of all the resources
#51720 commented on
Jul 10, 2025 • 0 new comments -
[LLM] In-place update for deployments when you have new models without having re-deploy the cluster
#51891 commented on
Jul 10, 2025 • 0 new comments -
[LLM/Data] lazy import for transformers
#52632 commented on
Jul 10, 2025 • 0 new comments -
[LLM] We need to create a more robust way of handling actor shutdown
#53179 commented on
Jul 10, 2025 • 0 new comments -
[Serve.llm] Clean up output logs and give option to opt out of different verbosity levels
#53492 commented on
Jul 10, 2025 • 0 new comments -
[serve.llm] Ray LLM serving not respecting max_completion_tokens parameter
#53922 commented on
Jul 10, 2025 • 0 new comments -
[Data] RayData driver process crashes when some worker(pod) been preempted
#52815 commented on
Jul 10, 2025 • 0 new comments -
[core] Improving Ray Typing annotation
#54149 commented on
Jul 10, 2025 • 0 new comments -
[Data] Allow disabling Task Fusion / Documenting how to avoid it
#54433 commented on
Jul 10, 2025 • 0 new comments -
[Core] [Dashboard] Support a way to stream data from the dashboard service to persist externally
#53073 commented on
Jul 11, 2025 • 0 new comments -
Windows VS WSL2
#53924 commented on
Jul 11, 2025 • 0 new comments -
[core] ray.util.state.api.get_actor with timeout = 1s does not work
#54153 commented on
Jul 11, 2025 • 0 new comments -
[Core] ray.init() hangs/fails after "Started a local Ray instance."
#31897 commented on
Jul 11, 2025 • 0 new comments -
[Core] runtime_env: can't update an application installed from gitlab
#44423 commented on
Jul 11, 2025 • 0 new comments -
[Runtime Environment] Remove cached python libs, working dir etc
#47488 commented on
Jul 11, 2025 • 0 new comments -
[Data] Support for SQL/DataFrame capability
#53693 commented on
Jul 11, 2025 • 0 new comments -
[Core] pip runtime env cache by filename instead of the actual file content
#41827 commented on
Jul 11, 2025 • 0 new comments -
[Serve] Different Downscale Delay for Scale to Zero
#52867 commented on
Jul 11, 2025 • 0 new comments -
resource leak in ray/pthon/ray/node.py
#9546 commented on
Jul 11, 2025 • 0 new comments -
[Core|Dataset] Ray job stuck with idle actors with no tasks
#45822 commented on
Jul 11, 2025 • 0 new comments -
[data] Autoscaling ignores disk pressure
#54442 commented on
Jul 11, 2025 • 0 new comments -
[Tune|RLlib] PBT reward drop - not checkpointing or restoring properly
#53831 commented on
Jul 11, 2025 • 0 new comments -
Using ray for LLM inference got errors
#53907 commented on
Jul 12, 2025 • 0 new comments -
CI test linux://python/ray/llm/tests:batch/gpu/processor/test_vllm_engine_proc is consistently_failing
#52074 commented on
Jul 12, 2025 • 0 new comments -
CI test linux://python/ray/llm/tests:batch/gpu/stages/test_vllm_engine_stage is consistently_failing
#52075 commented on
Jul 12, 2025 • 0 new comments -
CI test linux://doc/source/llm/examples/batch:vllm-with-lora is consistently_failing
#50881 commented on
Jul 12, 2025 • 0 new comments -
[Ray dashboard] Random character in ray log viewer
#52346 commented on
Jul 10, 2025 • 0 new comments -
[Dashboard] Hide GPU and GRAM columns from clusters and actors table if there are 0 rows with GPUs.
#49989 commented on
Jul 10, 2025 • 0 new comments -
[Dashboard] Make it easier to figure out the PID of running tasks.
#49988 commented on
Jul 10, 2025 • 0 new comments -
[core][dashboard] Use state API directly for actor name
#34479 commented on
Jul 10, 2025 • 0 new comments -
[Umbrella][core] Add context information for all C++ logs
#52314 commented on
Jul 10, 2025 • 0 new comments -
Raise helpful error message when `ImportError: cannot import name '_psutil_osx`
#28903 commented on
Jul 10, 2025 • 0 new comments -
[Train] [Good First Issue] Bug in sample code in documentation
#54401 commented on
Jul 10, 2025 • 0 new comments -
[Dashboard] Explain Disk usage for KubeRay
#36362 commented on
Jul 10, 2025 • 0 new comments -
[Core][State Observability] Grouping state APIs when `ray --help` is called.
#26376 commented on
Jul 10, 2025 • 0 new comments -
Give a better error message when starting ray on a machine with little memory
#6172 commented on
Jul 10, 2025 • 0 new comments -
Improve error messages for serializing/deserializing remote functions and actor classes.
#5618 commented on
Jul 10, 2025 • 0 new comments -
[Dashboard] Display worker utilization & task list instead of process name.
#14175 commented on
Jul 10, 2025 • 0 new comments -
Running Multiple Applications in Different Containers stuck in status=DEPLOYING
#49540 commented on
Jul 10, 2025 • 0 new comments -
Ray serve + core steaming is slow at high concurrency
#52745 commented on
Jul 10, 2025 • 0 new comments -
refactor serve constants to have a utils
#51036 commented on
Jul 10, 2025 • 0 new comments -
[ray.serve.llm] serve.llm with streaming has overhead compared to vllm-v0 for a single replica when concurrency > 32
#52746 commented on
Jul 10, 2025 • 0 new comments -
[Serve] change the metric tag for the proxy metrics to `route_prefix` for clarity
#52212 commented on
Jul 10, 2025 • 0 new comments -
[serve] Architecture docs mention round-robin and not pow-of-two scheduler
#49292 commented on
Jul 10, 2025 • 0 new comments -
[Serve] Ray Serve Autoscaling supports the configuration of custom-metrics and policy
#51632 commented on
Jul 10, 2025 • 0 new comments -
[Core] - providing `py_executable=uv run` causes failures with unloadable logs
#54275 commented on
Jul 10, 2025 • 0 new comments -
[RayLLM] error helper for TypeError: _extractNVMLErrorsAsClasses..gen_new..new() takes 1 positional argument but 2 were given
#53407 commented on
Jul 10, 2025 • 0 new comments -
[RFC] [Serve] Custom Scaling
#41135 commented on
Jul 10, 2025 • 0 new comments -
[Data]Pylint detection found some Python code defects in ray data
#53881 commented on
Jul 10, 2025 • 0 new comments -
[data] support streaming writes for `write_lance`
#54069 commented on
Jul 10, 2025 • 0 new comments -
[data.llm] Fix AttributeError for the shallow copy of data batch transfer
#54420 commented on
Jul 10, 2025 • 0 new comments -
[Ray Data] Zero Division Error when pyarrow block table nbytes is small and table.num_rows is large, integer casting leads to zero division
#54385 commented on
Jul 10, 2025 • 0 new comments -
[Core] Support general Arrow ExtensionTypes
#51959 commented on
Jul 10, 2025 • 0 new comments -
[Data] Filter operation changes schema of dataset
#51217 commented on
Jul 10, 2025 • 0 new comments -
[Data]Extend Ray Data with read/write hive
#51094 commented on
Jul 10, 2025 • 0 new comments -
CI test windows://python/ray/serve/tests:test_deploy_app is consistently_failing
#46448 commented on
Jul 12, 2025 • 0 new comments -
[RLlib][PPO new-API] Large discrepancy between Algorithm.evaluate() and manual inference via restored EnvToModule/ModuleToEnv pipelines on CarRacing-v3
#53588 commented on
Jul 15, 2025 • 0 new comments -
[<Ray component: Core|RLlib|etc...>] Ray Worker on WSL2 behind Corporate VPN Disconnects Consistently After ~30s Despite Verified Bidirectional gRPC Connectivity
#54365 commented on
Jul 15, 2025 • 0 new comments -
[Core|Autoscaler|Kuberay] Cannot parallelize tasks across multiple nodes that depend on a large input arrow table
#54372 commented on
Jul 15, 2025 • 0 new comments -
[Core] Deadlock when trying to use ray.remote(func).remote(args) in the callback of ObjectRef.future()
#54439 commented on
Jul 15, 2025 • 0 new comments -
[Serve] does not work with tracing
#46252 commented on
Jul 16, 2025 • 0 new comments -
[Core] `DeleteObjects` fails silently on transient network failure
#54412 commented on
Jul 16, 2025 • 0 new comments -
[Ray Data] Categorizer throws internal errors during doctest
#50285 commented on
Jul 16, 2025 • 0 new comments -
[data] OOM killer kicks in but vLLM gpu processes are not cleaned up
#54364 commented on
Jul 16, 2025 • 0 new comments -
[Data] Allow parameterized queries in `read_sql`
#54098 commented on
Jul 16, 2025 • 0 new comments -
[Core] Raylet heartbeat misses
#54321 commented on
Jul 16, 2025 • 0 new comments -
[serve.llm] Dimensions api of embedding req does not work for serve.llm
#54498 commented on
Jul 16, 2025 • 0 new comments -
Support Availability Zone Deployment in Azure
#39966 commented on
Jul 16, 2025 • 0 new comments -
[Ray Clusters] Azure subscription id is required in yaml to get-head-ip
#44254 commented on
Jul 16, 2025 • 0 new comments -
Ray Actor Timeout breaks cluster in that workers can no longer be ssh'd
#47953 commented on
Jul 16, 2025 • 0 new comments -
[Data][azure] RayTaskError(AttributeError) Can't get attribute 'CacheOptions._reconstruct' on <module 'pyarrow.lib'
#48592 commented on
Jul 16, 2025 • 0 new comments -
[Clusters][Azure] Custom ARM template for Azure Clusters
#50684 commented on
Jul 16, 2025 • 0 new comments -
[train] Add Azure Files support to persistent storage documentation
#54054 commented on
Jul 16, 2025 • 0 new comments -
[Core] Ray hangs with vllm0.8.5 v1 api for tp8+pp4
#53758 commented on
Jul 16, 2025 • 0 new comments -
[core] max_retry configuration in the task does not take effect.
#54342 commented on
Jul 17, 2025 • 0 new comments -
[RFC] Add resource limit for worker without container
#17596 commented on
Jul 17, 2025 • 0 new comments -
[Proxy] X-Request-ID not output to proxy log file
#54400 commented on
Jul 17, 2025 • 0 new comments -
[Core] Identify Mac M1/M2 GPUs as valid GPUs
#39136 commented on
Jul 17, 2025 • 0 new comments -
[RFC] Improving Ray for Post-Training / RL for LLM Projects
#54021 commented on
Jul 17, 2025 • 0 new comments -
CI test windows://python/ray/serve/tests:test_standalone is consistently_failing
#48420 commented on
Jul 17, 2025 • 0 new comments -
[ci] remove is_automated_build in setup.py
#36547 commented on
Jul 15, 2025 • 0 new comments -
Add Apple silicon GPU(mps) support to ray
#38464 commented on
Jul 14, 2025 • 0 new comments -
Ray IPv6 support
#44252 commented on
Jul 13, 2025 • 0 new comments -
blind try on ubuntu upgrade ..
#45427 commented on
Jul 15, 2025 • 0 new comments -
[RLlib] DreamerV3 on PyTorch.
#45463 commented on
Jul 15, 2025 • 0 new comments -
Running multiple instances (clusters) of ray on the same node with slurm is unstable
#36554 commented on
Jul 13, 2025 • 0 new comments -
[ Core] cannot serialize polars.LazyFrame
#46343 commented on
Jul 13, 2025 • 0 new comments -
[Data] Stratification in train_test_split
#53297 commented on
Jul 13, 2025 • 0 new comments -
[Data] [LLM] Allow vLLM deployments to be shared by sequential processors
#52277 commented on
Jul 13, 2025 • 0 new comments -
[RLlib] PPO algorithm can't be trained from checkpoint
#50136 commented on
Jul 14, 2025 • 0 new comments -
[core] Passing `_spill_on_unavailable=True` with `soft=False` crashes the raylet
#54246 commented on
Jul 14, 2025 • 0 new comments -
[Ray Core] Fallback to non-spot pools
#39861 commented on
Jul 14, 2025 • 0 new comments -
[Core] Ray Label Selector API Implementation Tracker
#51564 commented on
Jul 14, 2025 • 0 new comments -
[Ray Server: Deployment] Failed to update the deployments ['LLMRouter'].
#54500 commented on
Jul 14, 2025 • 0 new comments -
[data] Zero-sized blocks crashes write_bigquery
#51892 commented on
Jul 14, 2025 • 0 new comments -
[Data] Downstream Stages Run Sequentially After Fanout in Ray Data
#54430 commented on
Jul 14, 2025 • 0 new comments -
[Data] Inconsistent Serialization of pd.DataFrame Field in Dataclass When Using asdict with Ray Data
#54428 commented on
Jul 14, 2025 • 0 new comments -
[data] Empty DatabricksUCDatasource provides unhelpful error
#54369 commented on
Jul 14, 2025 • 0 new comments -
[data] Slow fetching of metadata for large number of parquet files
#53995 commented on
Jul 14, 2025 • 0 new comments -
[Data, Train] ray::SplitCoordinator is very slow at every epoch + takes up too much memory
#49190 commented on
Jul 14, 2025 • 0 new comments -
[Data] Aggregation is doing internal conversions that breaks on list-like AggType
#52257 commented on
Jul 14, 2025 • 0 new comments -
[Core] Ray causes a 25% slower GPU performance compared with manually written Multi-processing program on 8 Hopper GPUs
#53799 commented on
Jul 15, 2025 • 0 new comments -
[Core] Stale ray_cluster_<state>_nodes metrics
#50735 commented on
Jul 15, 2025 • 0 new comments -
[Core] [Observability] Add PID to structured logs
#52840 commented on
Jul 15, 2025 • 0 new comments -
[Core] Missing :authority header
#40575 commented on
Jul 15, 2025 • 0 new comments -
Ray Serve Replica Initialization Timeout: STDOUT "Failed to load", RequestCancelledError, Likely Due to Slow/Crashing RLModule.from_checkpoint()
#53079 commented on
Jul 15, 2025 • 0 new comments -
[Serve] `serve.run` can bind the incorrect Application if Deployments have the same name
#53295 commented on
Jul 15, 2025 • 0 new comments -
[Serve] Deadlock when awaiting DeploymentResponse
#54201 commented on
Jul 15, 2025 • 0 new comments -
[Core] Multi-threaded ray.get can hang in certain situations.
#54007 commented on
Jul 15, 2025 • 0 new comments -
[Serve] RayServe Pods Stuck in Unready State Causing API Outages
#53323 commented on
Jul 15, 2025 • 0 new comments -
[Serve] Make replica scheduler backoff configurable
#52871 commented on
Jul 15, 2025 • 0 new comments -
[Core] Make sure Actor's `__del__` method invoked on Actor's destruction
#53169 commented on
Jul 15, 2025 • 0 new comments -
[core][experimental] Accelerated DAG should execute work on actor's main thread
#46336 commented on
Jul 15, 2025 • 0 new comments -
[RLlib] Checkpointing fails with CUDA GPU learner using the new API stack
#53793 commented on
Jul 15, 2025 • 0 new comments