
Don't try to load training_args.bin #373

Merged 1 commit into vllm-project:main on Jul 8, 2023

Conversation

@lpfhs (Contributor) commented on Jul 5, 2023

While trying to load a fine-tuned model, I was getting this exception:

```
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/api_server.py", line 82, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/opt/venv/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 212, in from_engine_args
    engine = cls(engine_args.worker_use_ray,
  File "/opt/venv/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 49, in __init__
    self.engine = engine_class(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 97, in __init__
    worker = worker_cls(
  File "/opt/venv/lib/python3.10/site-packages/vllm/worker/worker.py", line 45, in __init__
    self.model = get_model(model_config)
  File "/opt/venv/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 49, in get_model
    model.load_weights(
  File "/opt/venv/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 248, in load_weights
    for name, loaded_weight in hf_model_weights_iterator(
  File "/opt/venv/lib/python3.10/site-packages/vllm/model_executor/weight_utils.py", line 74, in hf_model_weights_iterator
    state = torch.load(bin_file, map_location="cpu")
  File "/opt/venv/lib/python3.10/site-packages/torch/serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/opt/venv/lib/python3.10/site-packages/torch/serialization.py", line 1172, in _load
    result = unpickler.load()
  File "/opt/venv/lib/python3.10/site-packages/torch/serialization.py", line 1165, in find_class
    return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'TrainingArguments' on <module 'vllm.entrypoints.api_server' from '/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/api_server.py'>
```
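The change matches the PR title: skip training_args.bin when iterating over the checkpoint's .bin files instead of trying to unpickle it as weights. A minimal sketch of that kind of filter, assuming the loader simply globs *.bin files (the helper below is illustrative, not the exact vLLM code):

```python
import glob
import os

import torch


def hf_model_weights_iterator(model_dir: str):
    """Yield (name, tensor) pairs from the weight checkpoints in model_dir.

    Sketch only: the real vLLM loader has more logic (downloads, caching,
    sharded files); the point here is the training_args.bin filter.
    """
    for bin_file in sorted(glob.glob(os.path.join(model_dir, "*.bin"))):
        # training_args.bin is a pickled TrainingArguments object written by
        # HuggingFace's Trainer, not a weight checkpoint -- skip it.
        if os.path.basename(bin_file) == "training_args.bin":
            continue
        state = torch.load(bin_file, map_location="cpu")
        for name, tensor in state.items():
            yield name, tensor
        del state
```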

@zhuohan123 (Member)

Hi @lpfhs! Thanks for your contribution! Could you provide the name of a model that triggers this error so we can test it out?

@lpfhs (Contributor, author) commented on Jul 6, 2023

@zhuohan123 The model is not public, so I can't share it. It's a fine-tuned Vicuna 7B model that was trained using the training code in FastChat. You can see that the TrainingArguments class is present in the FastChat training script: https://github.com/lm-sys/FastChat/blob/0a827abe0cc60a3733b4406a070beb1ac8d0e5e1/fastchat/train/train.py#L50

@zhuohan123 (Member)

Just want to make sure this is a common pattern rather than a quirk of one particular model. Is this introduced only by FastChat's training script, or will any fine-tuned HuggingFace model have this training_args.bin?

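For context, HuggingFace's Trainer writes a pickled TrainingArguments object as training_args.bin alongside the weight shards when it saves a checkpoint, so a checkpoint directory produced through the standard Trainer will typically look like the hypothetical listing below (the path and shard names are illustrative):

```python
# Hypothetical listing of a Trainer-saved checkpoint directory, showing why a
# naive "*.bin" glob picks up training_args.bin along with the weight shards.
import glob
import os

model_dir = "/path/to/finetuned-vicuna-7b"  # illustrative path
for path in sorted(glob.glob(os.path.join(model_dir, "*.bin"))):
    print(os.path.basename(path))

# Typical output for a 7B checkpoint saved by Trainer (names vary):
#   pytorch_model-00001-of-00002.bin
#   pytorch_model-00002-of-00002.bin
#   training_args.bin   <- pickled TrainingArguments, not weights
```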
@lpfhs (Contributor, author) commented on Jul 6, 2023

@zhuohan123 I think the issue is that FastChat has a custom TrainingArguments class, so when loading the pickle file we need the definition of that custom class. From some web searching, it seems the transformers package can load training_args.bin when it is not a custom class (i.e., it is transformers.TrainingArguments).
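A minimal reproduction of that mechanism (illustrative, not FastChat's actual class): pickle stores a class by reference (module name plus attribute name), so unpickling raises the same kind of AttributeError whenever the referenced class cannot be found in the loading process:

```python
import pickle


class TrainingArguments:  # stands in for FastChat's subclass; defined only here
    def __init__(self, lr: float):
        self.lr = lr


# Pickling records the class as a reference ("__main__.TrainingArguments"),
# not its definition, so the bytes alone cannot reconstruct the object.
payload = pickle.dumps(TrainingArguments(lr=2e-5))

# Simulate loading in a process where that class is not importable, which is
# what happens when vLLM torch.load()s training_args.bin.
del TrainingArguments
try:
    pickle.loads(payload)
except AttributeError as exc:
    print(exc)  # Can't get attribute 'TrainingArguments' on <module '__main__' ...>
```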

@bryanhpchiang

running into the same issue


@zhuohan123 (Member) left a comment


Seems like this is a common issue. Thanks for your contribution!

@zhuohan123 merged commit 75beba2 into vllm-project:main on Jul 8, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
sjchoi1 pushed a commit to casys-kaist-internal/vllm that referenced this pull request May 7, 2024
Xaenalt pushed a commit to opendatahub-io/vllm that referenced this pull request Oct 14, 2024 ("Updating docker links & version references")