
Fix FlashRL compatibility with vllm 0.10.2 and torch 2.8 #36

Open
nightlessbaron wants to merge 3 commits into yaof20:main from nightlessbaron:main

Conversation

@nightlessbaron

This PR addresses critical compatibility issues with newer VLLM versions (0.10.2+) and PyTorch 2.8, ensuring FlashRL continues to work seamlessly with updated dependencies.

Changes Made

  1. VLLM Parameter Attribute Compatibility (flash_rl/vllm_patch.py)

    Issue: Parameter objects lost tp_rank and tp_size attributes after PyTorch/VLLM upgrades.

    Fix: Added tp_rank and tp_size to recorded_loader_keys list to ensure these critical distributed training attributes are preserved during model loading.
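
    A minimal sketch of the record/restore pattern this fix relies on. Only `tp_rank` and `tp_size` come from this PR; the other entries in the keys list and the helper names below are illustrative assumptions, not FlashRL's actual code.

    ```python
    from types import SimpleNamespace

    # Attributes to snapshot from each parameter before loading. Only
    # tp_rank/tp_size are taken from this PR; the rest are hypothetical.
    RECORDED_LOADER_KEYS = ["weight_loader", "output_dim", "tp_rank", "tp_size"]

    def record_param_attrs(param):
        """Snapshot the attributes that must survive model (re)loading."""
        return {k: getattr(param, k) for k in RECORDED_LOADER_KEYS if hasattr(param, k)}

    def restore_param_attrs(param, recorded):
        """Re-attach the recorded attributes to the reloaded parameter."""
        for k, v in recorded.items():
            setattr(param, k, v)
        return param
    ```

    Without `tp_rank`/`tp_size` in the recorded keys, the restored parameter silently drops its tensor-parallel placement after an upgrade that stops propagating those attributes.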

  2. Memory Pool Management (flash_rl/fp8loader.py)

    Issue: VLLM's memory pool APIs changed in newer versions.

    Fix: Updated memory pool context management to work with the new PyTorch memory allocation APIs.

    Changes:

    • Updated imports to use new MemPool class from torch.cuda.memory
    • Refactored disable_mem_pool context manager to use new thread-local allocation APIs (_cuda_beginAllocateCurrentThreadToPool, _cuda_endAllocateToPool)
    • Added robust pool fetching with fallback handling for VLLM's internal pool storage changes
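
    The refactor described above can be sketched roughly as follows. This is a hedged illustration, not the PR's actual diff: `torch._C._cuda_endAllocateToPool` and `torch._C._cuda_beginAllocateCurrentThreadToPool` are the private torch 2.8 hooks the PR names, `pool.id` is assumed to be a `torch.cuda.MemPool` id, and the attribute names probed in `fetch_vllm_mempool` are illustrative guesses at vLLM's internal pool storage.

    ```python
    import contextlib

    try:
        import torch
        _HAS_TORCH = True
    except ImportError:  # allow the sketch to degrade gracefully without torch
        _HAS_TORCH = False

    @contextlib.contextmanager
    def disable_mem_pool(pool, device=0, disable=True):
        """Temporarily stop routing this thread's CUDA allocations into `pool`.

        Sketch of the refactor described in this PR, using the thread-local
        allocation hooks it cites for torch 2.8.
        """
        if not (disable and _HAS_TORCH and torch.cuda.is_available() and pool is not None):
            # No CUDA, no torch, or no pool: nothing to detach from.
            yield
            return
        torch._C._cuda_endAllocateToPool(device, pool.id)
        try:
            yield
        finally:
            torch._C._cuda_beginAllocateCurrentThreadToPool(device, pool.id)

    def fetch_vllm_mempool(allocator):
        """Fallback pool lookup: vLLM has moved its pool between attributes
        across versions. Attribute names here are hypothetical examples."""
        for attr in ("_mem_pool", "mem_pool", "pool"):
            pool = getattr(allocator, attr, None)
            if pool is not None:
                return pool
        return None
    ```

    The fallback lookup is what keeps the patch working when vLLM renames its internal pool attribute; the context manager degrades to a no-op on CPU-only hosts so the rest of the loader path stays testable without a GPU.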

@nightlessbaron
Author

I believe this should work now. Can you review it, @LiyuanLucasLiu?
