
[Bug]: When tensor_parallel_size>1, RuntimeError: Cannot re-initialize CUDA in forked subprocess. #6152

Closed
excelsimon opened this issue Jul 5, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@excelsimon

Your current environment

vllm version: '0.5.0.post1'

🐛 Describe the bug

When I set tensor_parallel_size=1, it works well.
But if I set tensor_parallel_size>1, the following error occurs:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method.
After I add

import torch
import multiprocessing
torch.multiprocessing.set_start_method('spawn')

the same RuntimeError still occurs.

@excelsimon excelsimon added the bug Something isn't working label Jul 5, 2024
@youkaichao
Member

Please paste your full code. You might have initialized CUDA before using vLLM.
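For reference, a minimal sketch (hypothetical, not code from this issue) of how that usually happens: any CUDA call in the parent process before constructing the LLM initializes a CUDA context that the fork-started workers cannot re-initialize.

import torch
from vllm import LLM

# Hypothetical reproduction: CUDA is touched in the parent process
# before the LLM is created.
torch.zeros(1, device="cuda")  # CUDA context is now initialized here

# With the default fork-based worker start method, constructing a
# tensor-parallel LLM now fails with
# "RuntimeError: Cannot re-initialize CUDA in forked subprocess."
llm = LLM(model="google/gemma-2-27b-it", tensor_parallel_size=2)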

@yuchenlin

I'm also having the same issue with the latest version of vLLM + gemma-2-27b-it.

@yuchenlin

export VLLM_WORKER_MULTIPROC_METHOD=spawn may help
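If you prefer setting it from Python instead of the shell, a minimal sketch (my assumption is that it only needs to be set before the LLM is created):

import os
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"  # set before creating the LLM

from vllm import LLM
llm = LLM(model="google/gemma-2-27b-it", tensor_parallel_size=2)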

@youkaichao
Member

I can run the following code without any issues:

from vllm import LLM, SamplingParams
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="google/gemma-2-27b-it", tensor_parallel_size=2)
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

@excelsimon
Author

export VLLM_WORKER_MULTIPROC_METHOD=spawn may help

It works for me. Thank you~

@yuchenlin

yuchenlin commented Jul 8, 2024

I can run the following code without any issues:

from vllm import LLM, SamplingParams
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="google/gemma-2-27b-it", tensor_parallel_size=2)
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

I'm not sure; maybe it's a version issue with powerinfer? I also find that even when I use the above workaround to get vLLM to generate, the output quality is not as good as other Gemma-2-27B inference setups (both under greedy decoding).

@henry-y

henry-y commented Jul 24, 2024

export VLLM_WORKER_MULTIPROC_METHOD=spawn may help

It also works for me! Thank you!

@CharlesRiggins
Contributor

I have encountered the same issue and solved it by setting VLLM_WORKER_MULTIPROC_METHOD=spawn, as mentioned by @yuchenlin.
Now I'm wondering why this error occurs and why setting VLLM_WORKER_MULTIPROC_METHOD fixes it. Could someone clarify?
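As far as I understand (a rough explanation, not a description of vLLM internals): with the default fork start method, each tensor-parallel worker is forked from a parent process whose CUDA context is already initialized, and CUDA cannot be re-initialized in a forked child; spawn starts fresh interpreters, so each worker initializes CUDA from scratch. The standalone snippet below (hypothetical, no vLLM involved) reproduces the same behavior with plain multiprocessing:

import multiprocessing as mp
import torch

def use_cuda():
    # Any CUDA call in the child tries to (re-)initialize CUDA there.
    torch.zeros(1, device="cuda")

if __name__ == "__main__":
    torch.zeros(1, device="cuda")  # parent initializes CUDA first

    # fork: the child inherits the parent's CUDA state and fails with
    # "Cannot re-initialize CUDA in forked subprocess."
    p = mp.get_context("fork").Process(target=use_cuda)
    p.start()
    p.join()

    # spawn: the child is a fresh interpreter, so CUDA initializes cleanly.
    p = mp.get_context("spawn").Process(target=use_cuda)
    p.start()
    p.join()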

@lonngxiang

It doesn't work for me:

export VLLM_WORKER_MULTIPROC_METHOD=spawn;CUDA_VISIBLE_DEVICES=1 vllm serve  /ai/minicpmv --host 192.168.2.238 --port 10868 --max-model-len 10000 --trust-remote-code --api-key token-abc123 --gpu_memory_utilization 1 --trust-remote-code 


@rin2401

rin2401 commented Sep 19, 2024

from peft import PeftModel, PeftConfig
Importing peft before using vLLM caused this error for me too.

@youkaichao
Member

@rin2401 try using distributed_executor_backend="ray"?
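For reference, a minimal sketch of what I believe is being suggested (parameter name as in recent vLLM versions; the model name is reused from earlier in this thread as a placeholder):

from vllm import LLM

# Use the Ray executor for the tensor-parallel workers instead of the
# default multiprocessing (fork) executor.
llm = LLM(
    model="google/gemma-2-27b-it",
    tensor_parallel_size=2,
    distributed_executor_backend="ray",
)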

@IIDCII

IIDCII commented Oct 22, 2024

export VLLM_WORKER_MULTIPROC_METHOD=spawn may help

If you're using this in a Python notebook, run the following first on a restarted kernel:

import os
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
