[Bug]: Ray distributed backend does not support out-of-tree models via ModelRegistry APIs #5657
Comments
will take a look later

sorry i don't get it. the usage of oot model registration is that you register the architecture name appearing in the huggingface config file; see https://huggingface.co/facebook/opt-125m/blob/main/config.json#L6 for example.
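For reference, the `architectures` field in the linked opt-125m config looks like this (abridged to the relevant key; the registered name must match one of these strings):

```json
{
  "architectures": [
    "OPTForCausalLM"
  ]
}
```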
Yes, this is how I am using it:

```python
from vllm import ModelRegistry, LLM, SamplingParams
from vllm.model_executor.models.mixtral import MixtralForCausalLM

ModelRegistry.register_model("SomeModel", MixtralForCausalLM)

if __name__ == "__main__":
    llm = LLM(
        model="path_to_directory/",  # directory which has a config.json with architectures: ["SomeModel"]
        tensor_parallel_size=8,
        # distributed_executor_backend="ray",  # ray backend fails!
    )
```
then it makes sense to me. the registration code

```python
from vllm import ModelRegistry
from vllm.model_executor.models.mixtral import MixtralForCausalLM

ModelRegistry.register_model("SomeModel", MixtralForCausalLM)
```

is not executed in the ray workers.
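To make the root cause concrete, here is a minimal sketch (not vLLM code; `fake_registry` is a made-up stand-in for `ModelRegistry`). A registration made only in the driver process mutates that process's copy of a module-level table, so a separately started worker process, which re-imports the module fresh, never sees it:

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    # a tiny stand-in module with a module-level registry, like vllm's ModelRegistry
    with open(os.path.join(d, "fake_registry.py"), "w") as f:
        f.write("MODELS = {}\n")

    sys.path.insert(0, d)
    import fake_registry

    # "driver-side" registration: mutates only this process's copy of the module
    fake_registry.MODELS["SomeModel"] = "MixtralForCausalLM"

    # a separate process (like a Ray worker) re-imports the module and gets
    # a fresh MODELS dict, so the driver's registration is invisible there
    child = subprocess.run(
        [sys.executable, "-c",
         "import fake_registry; print('SomeModel' in fake_registry.MODELS)"],
        capture_output=True, text=True,
        env={**os.environ, "PYTHONPATH": d},
    )

print("driver sees it:", "SomeModel" in fake_registry.MODELS)  # True
print("worker sees it:", child.stdout.strip())                 # False
```

This is why re-running the registration function inside each worker process (as Ray's `worker_process_setup_hook` does) resolves the problem.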
thanks!

@SamKG so the default backend (multiprocessing) should work out-of-the-box, right?

@richardliaw Try the attached. Note that the default backend will also fail (but with an expected error), since I added a stub tensor to keep the model directory small. @youkaichao yes, the default backend works fine (as long as the OOT definition happens outside of main).

`ray.init(runtime_env={"worker_process_setup_hook": ...})` allows executing code on all workers. Would this suffice?

@rkooo567 this functionality seems related, but how can we expose it to users?
this seems to fix the issue!

```python
import ray
from vllm import ModelRegistry, LLM

def _init_worker():
    from vllm.model_executor.models.mixtral import MixtralForCausalLM
    ModelRegistry.register_model("SomeModel", MixtralForCausalLM)

_init_worker()

if __name__ == "__main__":
    ray.init(runtime_env={"worker_process_setup_hook": _init_worker})
    llm = LLM(
        model="model/",
        tensor_parallel_size=8,
        distributed_executor_backend="ray",
    )
    llm.generate("test")
```
very nice! @youkaichao maybe we can just print out a warning linking to the vllm docs about this? and in the vllm docs let's have an example snippet like the one above!
Your current environment
🐛 Describe the bug
The ray distributed backend does not support out-of-tree models (on a single node).
Repro: