
[QUESTION] I was able to download the LLM in llama.cpp, but I cannot download the LLM in Transformers. #1734

Closed
yamatohimi opened this issue Jun 27, 2024 · 5 comments
Labels: gpu, question (Further information is requested)
Milestone: v0.12.4

Comments

yamatohimi commented Jun 27, 2024

I want to run the 70B llama-3-instruct model on a GPU, so I am first trying to run the 8B model on a GPU.
I have confirmed that there is an already-reported bug that prevents the gguf model from using GPUs with llama.cpp.
So I wanted to use the PyTorch model with Transformers instead, but I cannot download it.

This may have already been reported, or it may be a configuration error on my part, but I would appreciate your help.

settings
[screenshot: Screenshot 2024-06-28 064241]

error

user@398d4e0508fb:/workspace# xinference launch --model-engine llama.cpp --model-name llama-3-instruct --size-in-billions 8 --model-format ggufv2 --quantization Q4_K_M --gpu-idx 0,1
Launch model name: llama-3-instruct with kwargs: {}
Model uid: llama-3-instruct-VxQZAWuj
user@398d4e0508fb:/workspace# xinference launch --model-engine Transformers --model-name llama-3-instruct --size-in-billions 8 --model-format pytorch --quantization 4-bit --gpu-idx 0,1
Launch model name: llama-3-instruct with kwargs: {}
Traceback (most recent call last):
  File "/opt/conda/bin/xinference", line 8, in <module>
    sys.exit(cli())
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/deploy/cmdline.py", line 898, in model_launch
    model_uid = client.launch_model(
  File "/opt/conda/lib/python3.10/site-packages/xinference/client/restful/restful_client.py", line 911, in launch_model
    raise RuntimeError(
RuntimeError: Failed to launch model, detail: [address=0.0.0.0:50729, pid=43] Failed to download model 'llama-3-instruct' (size: 8, format: pytorch) after multiple retries
yamatohimi added the question (Further information is requested) label Jun 27, 2024
XprobeBot added the gpu label Jun 27, 2024
XprobeBot added this to the v0.12.2 milestone Jun 27, 2024
yamatohimi (Author) commented

Python 3.10.13
xinference v0.12.2.post1

ChengjieLi28 (Contributor) commented Jun 28, 2024

Python 3.10.13 xinference v0.12.2.post1

Paste the error from the xinference backend, not just from the command line.

yamatohimi (Author) commented

OK!
Is this enough? If not, please let me know.

huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-667ea709-40b704097bad97526456dd7c;c8a0107e-2241-49ff-9f19-817625902912)

Cannot access gated repo for url https://huggingface.co/api/models/meta-llama/Meta-Llama-3-8B-Instruct/revision/main.
Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/xinference/api/restful_api.py", line 771, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/supervisor.py", line 837, in launch_builtin_model
    await _launch_model()
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/supervisor.py", line 801, in _launch_model
    await _launch_one_model(rep_model_uid)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/supervisor.py", line 782, in _launch_one_model
    await worker_ref.launch_builtin_model(
  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
    result = await result
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/core/worker.py", line 641, in launch_builtin_model
    model, model_description = await asyncio.to_thread(
  File "/opt/conda/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.10/site-packages/xinference/model/core.py", line 67, in create_model_instance
    return create_llm_model_instance(
  File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/core.py", line 223, in create_llm_model_instance
    save_path = cache(llm_family, llm_spec, quantization)
  File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/llm_family.py", line 297, in cache
    return cache_from_huggingface(llm_family, llm_spec, quantization)
  File "/opt/conda/lib/python3.10/site-packages/xinference/model/llm/llm_family.py", line 824, in cache_from_huggingface
    download_dir = retry_download(
  File "/opt/conda/lib/python3.10/site-packages/xinference/model/utils.py", line 127, in retry_download
    raise RuntimeError(
RuntimeError: [address=0.0.0.0:28136, pid=42] Failed to download model 'llama-3-instruct' (size: 8, format: pytorch) after multiple retries

XprobeBot modified the milestones: v0.12.2 → v0.12.4 Jun 28, 2024
ChengjieLi28 (Contributor) commented
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-667ea709-40b704097bad97526456dd7c;c8a0107e-2241-49ff-9f19-817625902912)

Cannot access gated repo for url https://huggingface.co/api/models/meta-llama/Meta-Llama-3-8B-Instruct/revision/main.
Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it.

Llama-3 is a gated model that requires authentication. Refer to https://inference.readthedocs.io/en/latest/getting_started/troubleshooting.html to set the environment variable.
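
For reference, a minimal sketch of the fix, assuming the HUGGING_FACE_HUB_TOKEN variable described on the linked troubleshooting page (hf_xxx below is a placeholder for your own token, and you must first request access to meta-llama/Meta-Llama-3-8B-Instruct on its Hugging Face model page):

# Export a Hugging Face access token before starting the xinference server,
# so the download of the gated repo is authenticated (hf_xxx is a placeholder).
export HUGGING_FACE_HUB_TOKEN=hf_xxx
xinference-local --host 0.0.0.0 --port 9997

# Then relaunch the model from another shell, exactly as before:
xinference launch --model-engine Transformers --model-name llama-3-instruct --size-in-billions 8 --model-format pytorch --quantization 4-bit --gpu-idx 0,1

Note that the variable must be set in the environment of the xinference server process, not just in the client shell, since the download happens on the server side; recent versions of huggingface_hub also accept HF_TOKEN.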

ChengjieLi28 closed this as not planned Jul 1, 2024
yamatohimi (Author) commented

Thank you for your answer.
