[Llama-2-13b-chat-hf] IPv6 Network Address Retrieval Error on 4 V100s 16GB #570

@sksq96

Hello,

I'm encountering an issue while running the following code:

from vllm import LLM, SamplingParams
llm = LLM(model="meta-llama/Llama-2-13b-chat-hf", tensor_parallel_size=2)

I'm running on 4 V100 GPUs with 16 GB of memory each. The error I'm receiving is as follows:

(Worker pid=22514) [W socket.cpp:601] [c10d] The IPv6 network addresses of (__internal_head__, 16516) cannot be retrieved (gai error: -2 - Name or service not known).
(Worker pid=22513) [W socket.cpp:601] [c10d] The IPv6 network addresses of (__internal_head__, 16516) cannot be retrieved (gai error: -2 - Name or service not known). [repeated 10x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
(Worker pid=22513) [W socket.cpp:601] [c10d] The IPv6 network addresses of (__internal_head__, 16516) cannot be retrieved (gai error: -2 - Name or service not known). [repeated 10x across cluster]
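
The warning text suggests that c10d could not resolve the name `__internal_head__` to an IPv6 address (gai error -2 is a `getaddrinfo` failure). As a rough way to check name resolution outside vLLM, here is a minimal sketch using only the Python standard library. The helper names `can_resolve` and `describe_resolution` are hypothetical, and this only tests hostname lookup on the local machine. It does not reproduce the vLLM or Ray setup.

```python
import socket


def can_resolve(host, port, family):
    """Return True if getaddrinfo finds at least one address for host in this family."""
    try:
        results = socket.getaddrinfo(host, port, family=family, type=socket.SOCK_STREAM)
        return len(results) > 0
    except socket.gaierror:
        # Raised when the lookup fails, e.g. "Name or service not known".
        return False


def describe_resolution(host, port):
    """Report separately whether host resolves over IPv4 and over IPv6."""
    return {
        "ipv4": can_resolve(host, port, socket.AF_INET),
        "ipv6": can_resolve(host, port, socket.AF_INET6),
    }


# Example call (host and port taken from the log lines above):
# print(describe_resolution("__internal_head__", 16516))
```

If the IPv4 lookup succeeds and only the IPv6 lookup fails, that would be consistent with the lines above being warnings (`[W ...]`) about an IPv6 attempt rather than a complete connection failure, though I have not confirmed that this is what happens here.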

Any help or guidance on how to resolve this issue would be greatly appreciated.

Thank you
