Issues with image loading and accelerate #45

Open
aamir-gmail opened this issue Apr 19, 2023 · 8 comments

@aamir-gmail
FYI, when starting the demo file I get the following message:
torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file:

The other one has to do with Hugging Face accelerate:

This model has some weights that should be kept in higher precision, you need to upgrade accelerate to properly deal with them (pip install --upgrade accelerate)

@yemingx
yemingx commented Apr 26, 2023

Same for the first issue.
Error message when running the demo:
"/mnt/software/anaconda3/envs/minigpt4/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory"

I have checked my CUDA and PyTorch installation. It looks fine; see the outputs below.

import torch
print(torch.version.cuda)
11.7
torch.cuda.is_available()
True
torch.cuda.device_count()
1
print(torch.__version__)
2.0.0+cu117

nvidia-smi
Wed Apr 26 15:02:29 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          On   | 00000000:00:08.0 Off |                    0 |
|  0%   32C    P8    21W / 150W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Need help with the error.
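
The checks above confirm that torch can see CUDA, but they do not show which torchvision build is installed next to it. A torchvision built for a different torch release is one common cause of this kind of "Failed to load image Python extension" warning, though that is an assumption here and not something the output above proves. A minimal sketch that reports the installed versions without importing the packages themselves (the helper name `installed_versions` is made up for illustration):

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = installed_versions(["torch", "torchvision", "accelerate"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

The reported torch and torchvision versions still have to be compared by hand against the pairing that torchvision documents for each release.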

@getpa
getpa commented Apr 26, 2023

The same issue for me. It seems that pytorch=1.12.1, installed with conda, was uninstalled and replaced by torch==2.0.0 through pip. Therefore, environment.yml should be updated.

Installing collected packages: webencodings, wcwidth, tokenizers, sentencepiece, Send2Trash, pytz, pydub, pure-eval, ptyprocess, pickleshare, pathtools, mpmath, mistune, lit, ipython-genutils, ffmpy, fastjsonschema, executing, cymem, cmake, cchardet, braceexpand, bitsandbytes, backcall, appdirs, antlr4-python3-runtime, zipp, websockets, websocket-client, webcolors, wasabi, uri-template, uc-micro-py, tzdata, traitlets, tqdm, tornado, toolz, tinycss2, threadpoolctl, tenacity, sympy, spacy-loggers, spacy-legacy, soupsieve, sniffio, smmap, smart-open, six, setproctitle, sentry-sdk, semantic-version, scipy, rfc3986-validator, regex, pyzmq, pyyaml, python-multipart, python-json-logger, pyrsistent, pyparsing, pygments, pydantic, psutil, protobuf, prompt-toolkit, prometheus-client, portalocker, platformdirs, pexpect, parso, pandocfilters, packaging, orjson, opencv-python, nvidia-nvtx-cu11, nvidia-nccl-cu11, nvidia-cusparse-cu11, nvidia-curand-cu11, nvidia-cufft-cu11, nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-cupti-cu11, nvidia-cublas-cu11, networkx, nest-asyncio, murmurhash, multidict, mdurl, markupsafe, llvmlite, langcodes, kiwisolver, jupyterlab-pygments, jsonpointer, joblib, h11, fsspec, frozenlist, fqdn, fonttools, filelock, entrypoints, defusedxml, decord, decorator, debugpy, cycler, contourpy, Click, chardet, catalogue, blis, attrs, async-timeout, aiofiles, yarl, webdataset, uvicorn, typer, terminado, srsly, scikit-learn, rfc3339-validator, python-dateutil, preshed, omegaconf, nvidia-cusolver-cu11, nvidia-cudnn-cu11, numba, nltk, matplotlib-inline, markdown-it-py, linkify-it-py, jupyter-core, jsonschema, jinja2, jedi, iopath, importlib-resources, importlib-metadata, huggingface-hub, gitdb, docker-pycreds, comm, bleach, beautifulsoup4, asttokens, argon2-cffi-bindings, anyio, aiosignal, transformers, starlette, stack-data, pynndescent, pathy, pandas, nbformat, mdit-py-plugins, matplotlib, jupyter-server-terminals, jupyter-client, httpcore, 
gradio-client, GitPython, confection, arrow, argon2-cffi, aiohttp, wandb, umap-learn, thinc, pycocotools, openai, nbclient, isoduration, ipython, httpx, fastapi, altair, spacy, pycocoevalcap, nbconvert, ipykernel, gradio, jupyter-events, jupyter-server, notebook-shim, nbclassic, notebook, triton, torch, accelerate, timm, sentence-transformers, peft
Attempting uninstall: torch
Found existing installation: torch 1.12.1
Uninstalling torch-1.12.1:
Successfully uninstalled torch-1.12.1
Successfully installed Click-8.1.3 GitPython-3.1.31 Send2Trash-1.8.0 accelerate-0.16.0 aiofiles-23.1.0 aiohttp-3.8.4 aiosignal-1.3.1 altair-4.2.2 antlr4-python3-runtime-4.9.3 anyio-3.6.2 appdirs-1.4.4 argon2-cffi-21.3.0 argon2-cffi-bindings-21.2.0 arrow-1.2.3 asttokens-2.2.1 async-timeout-4.0.2 attrs-22.2.0 backcall-0.2.0 beautifulsoup4-4.12.2 bitsandbytes-0.37.0 bleach-6.0.0 blis-0.7.9 braceexpand-0.1.7 catalogue-2.0.8 cchardet-2.1.7 chardet-5.1.0 cmake-3.26.3 comm-0.1.3 confection-0.0.4 contourpy-1.0.7 cycler-0.11.0 cymem-2.0.7 debugpy-1.6.7 decorator-5.1.1 decord-0.6.0 defusedxml-0.7.1 docker-pycreds-0.4.0 entrypoints-0.4 executing-1.2.0 fastapi-0.95.1 fastjsonschema-2.16.3 ffmpy-0.3.0 filelock-3.9.0 fonttools-4.38.0 fqdn-1.5.1 frozenlist-1.3.3 fsspec-2023.4.0 gitdb-4.0.10 gradio-3.24.1 gradio-client-0.0.8 h11-0.14.0 httpcore-0.17.0 httpx-0.24.0 huggingface-hub-0.13.4 importlib-metadata-6.6.0 importlib-resources-5.12.0 iopath-0.1.10 ipykernel-6.22.0 ipython-8.12.0 ipython-genutils-0.2.0 isoduration-20.11.0 jedi-0.18.2 jinja2-3.1.2 joblib-1.2.0 jsonpointer-2.3 jsonschema-4.17.3 jupyter-client-8.2.0 jupyter-core-5.3.0 jupyter-events-0.6.3 jupyter-server-2.5.0 jupyter-server-terminals-0.4.4 jupyterlab-pygments-0.2.2 kiwisolver-1.4.4 langcodes-3.3.0 linkify-it-py-2.0.0 lit-16.0.2 llvmlite-0.39.1 markdown-it-py-2.2.0 markupsafe-2.1.2 matplotlib-3.7.0 matplotlib-inline-0.1.6 mdit-py-plugins-0.3.3 mdurl-0.1.2 mistune-2.0.5 mpmath-1.3.0 multidict-6.0.4 murmurhash-1.0.9 nbclassic-0.5.5 nbclient-0.7.4 nbconvert-7.3.1 nbformat-5.8.0 nest-asyncio-1.5.6 networkx-3.1 nltk-3.8.1 notebook-6.5.4 notebook-shim-0.2.3 numba-0.56.4 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-cupti-cu11-11.7.101 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 nvidia-cufft-cu11-10.9.0.58 nvidia-curand-cu11-10.2.10.91 nvidia-cusolver-cu11-11.4.0.1 nvidia-cusparse-cu11-11.7.4.91 nvidia-nccl-cu11-2.14.3 nvidia-nvtx-cu11-11.7.91 omegaconf-2.3.0 openai-0.27.0 
opencv-python-4.7.0.72 orjson-3.8.10 packaging-23.0 pandas-2.0.1 pandocfilters-1.5.0 parso-0.8.3 pathtools-0.1.2 pathy-0.10.1 peft-0.2.0 pexpect-4.8.0 pickleshare-0.7.5 platformdirs-3.3.0 portalocker-2.7.0 preshed-3.0.8 prometheus-client-0.16.0 prompt-toolkit-3.0.38 protobuf-4.22.3 psutil-5.9.4 ptyprocess-0.7.0 pure-eval-0.2.2 pycocoevalcap-1.2 pycocotools-2.0.6 pydantic-1.10.7 pydub-0.25.1 pygments-2.15.1 pynndescent-0.5.10 pyparsing-3.0.9 pyrsistent-0.19.3 python-dateutil-2.8.2 python-json-logger-2.0.7 python-multipart-0.0.6 pytz-2023.3 pyyaml-6.0 pyzmq-25.0.2 regex-2022.10.31 rfc3339-validator-0.1.4 rfc3986-validator-0.1.1 scikit-learn-1.2.2 scipy-1.10.1 semantic-version-2.10.0 sentence-transformers-2.2.2 sentencepiece-0.1.98 sentry-sdk-1.21.0 setproctitle-1.3.2 six-1.16.0 smart-open-6.3.0 smmap-5.0.0 sniffio-1.3.0 soupsieve-2.4.1 spacy-3.5.1 spacy-legacy-3.0.12 spacy-loggers-1.0.4 srsly-2.4.6 stack-data-0.6.2 starlette-0.26.1 sympy-1.11.1 tenacity-8.2.2 terminado-0.17.1 thinc-8.1.9 threadpoolctl-3.1.0 timm-0.6.13 tinycss2-1.2.1 tokenizers-0.13.2 toolz-0.12.0 torch-2.0.0 tornado-6.3.1 tqdm-4.64.1 traitlets-5.9.0 transformers-4.28.0 triton-2.0.0 typer-0.7.0 tzdata-2023.3 uc-micro-py-1.0.1 umap-learn-0.5.3 uri-template-1.2.0 uvicorn-0.21.1 wandb-0.15.0 wasabi-1.1.1 wcwidth-0.2.6 webcolors-1.13 webdataset-0.2.48 webencodings-0.5.1 websocket-client-1.5.1 websockets-11.0.2 yarl-1.8.2 zipp-3.14.0

Then I tried to specify torch version in pip package list in environment.yml as follows:

...
  - pip:
    - --extra-index-url https://download.pytorch.org/whl/cu113
    - torch==1.12.0+cu113
...

But I got the following error.

INFO: pip is looking at multiple versions of torch to determine which version is compatible with other requirements. This could take a while.

The conflict is caused by:
The user requested torch==1.12.0+cu113
accelerate 0.16.0 depends on torch>=1.4.0
timm 0.6.13 depends on torch>=1.7
peft 0.2.0 depends on torch>=1.13.0
The user requested torch==1.12.0+cu113
accelerate 0.16.0 depends on torch>=1.4.0
timm 0.6.13 depends on torch>=1.7
peft 0.1.0 depends on torch>=1.13.0
The user requested torch==1.12.0+cu113
accelerate 0.16.0 depends on torch>=1.4.0
timm 0.6.13 depends on torch>=1.7
peft 0.0.2 depends on torch>=1.13.0
The user requested torch==1.12.0+cu113
accelerate 0.16.0 depends on torch>=1.4.0
timm 0.6.13 depends on torch>=1.7
peft 0.0.1 depends on torch>=1.13.0

I have no idea how to solve this, because no torch version that satisfies these requirements is available for cu113.
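
The conflict in the pip output above comes down to comparing the pinned torch version with each package's minimum. A rough sketch of that comparison follows; it is a simplified stand-in for pip's real resolver, handles only plain `>=` minimums on numeric versions, and uses made-up helper names (`release_tuple`, `satisfies_minimum`):

```python
def release_tuple(version, width=3):
    """Turn '1.12.0+cu113' into (1, 12, 0), ignoring the local '+cu113' tag."""
    public = version.split("+", 1)[0]
    parts = [int(part) for part in public.split(".")]
    # Pad short versions such as '1.7' so that they compare as (1, 7, 0).
    parts += [0] * (width - len(parts))
    return tuple(parts)


def satisfies_minimum(version, minimum):
    """True if `version` meets a simple 'torch>=minimum' requirement."""
    return release_tuple(version) >= release_tuple(minimum)


requested = "1.12.0+cu113"
# Minimum torch versions copied from the pip conflict report above.
minimums = {
    "accelerate 0.16.0": "1.4.0",
    "timm 0.6.13": "1.7",
    "peft 0.2.0": "1.13.0",
}
for package, minimum in minimums.items():
    verdict = "ok" if satisfies_minimum(requested, minimum) else "CONFLICT"
    print(f"{package} needs torch>={minimum}: {verdict}")
```

Only the peft requirement fails for the pinned version, which matches the report: every peft release that pip tried asks for torch>=1.13.0.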

@thiner
thiner commented Apr 26, 2023

My service ran into this issue as well. I solved it by:

cd MiniGPT-4
conda env update

You can check the library versions that are actually in effect with conda list. If there are still library conflicts, try:

conda deactivate
source ~/.bashrc
conda activate minigpt4
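
After re-activating the environment as described above, it can help to confirm from inside Python that the demo really runs in the intended environment. A small sketch, assuming conda sets the `CONDA_DEFAULT_ENV` variable on activation (the function name `active_environment` is made up for illustration):

```python
import os
import sys


def active_environment():
    """Report which conda env (if any) and which interpreter this process uses."""
    return {
        # Set by `conda activate`; None when no conda env is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Root directory of the running Python installation.
        "prefix": sys.prefix,
        # Full path of the interpreter binary.
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in active_environment().items():
        print(f"{key}: {value}")
```

If `conda_env` is not minigpt4, or `prefix` points outside that environment, the packages listed by conda list are not the ones the demo imports.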

@getpa
getpa commented Apr 26, 2023

I just solved the following error: torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file:

To solve it, you can rewrite environment.yml as follows.

name: minigpt4
channels:
  - pytorch
  - defaults
  - anaconda
  - nvidia
dependencies:
  - python=3.9
  - cudatoolkit=11.8.0
  - pip
  - pytorch
  - pytorch-cuda=11.8
  - torchaudio
  - torchvision
  - pip:
    - accelerate==0.16.0
    - aiohttp==3.8.4
    - aiosignal==1.3.1
    - async-timeout==4.0.2
    - attrs==22.2.0
    - bitsandbytes==0.38.0
    - cchardet==2.1.7
    - chardet==5.1.0
    - contourpy==1.0.7
    - cycler==0.11.0
    - filelock==3.9.0
    - fonttools==4.38.0
    - frozenlist==1.3.3
    - huggingface-hub==0.13.4
    - importlib-resources==5.12.0
    - kiwisolver==1.4.4
    - matplotlib==3.7.0
    - multidict==6.0.4
    - openai==0.27.0
    - packaging==23.0
    - psutil==5.9.4
    - pycocotools==2.0.6
    - pyparsing==3.0.9
    - python-dateutil==2.8.2
    - pyyaml==6.0
    - regex==2022.10.31
    - tokenizers==0.13.2
    - tqdm==4.64.1
    - transformers==4.28.0
    - timm==0.6.13
    - spacy==3.5.1
    - webdataset==0.2.48
    - scikit-learn==1.2.2
    - scipy==1.10.1
    - yarl==1.8.2
    - zipp==3.14.0
    - omegaconf==2.3.0
    - opencv-python==4.7.0.72
    - iopath==0.1.10
    - decord==0.6.0
    - tenacity==8.2.2
    - peft
    - pycocoevalcap
    - sentence-transformers
    - umap-learn
    - notebook
    - gradio==3.24.1
    - gradio-client==0.0.8
    - wandb

Unfortunately, I'm still unable to interact with MiniGPT-4 because of the following error, which appears when I upload an image.

Traceback (most recent call last):
  File "/work/s183313/.pyenv/versions/mambaforge/envs/minigpt4/lib/python3.9/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/work/s183313/.pyenv/versions/mambaforge/envs/minigpt4/lib/python3.9/site-packages/gradio/blocks.py", line 1108, in process_api
    result = await self.call_function(
  File "/work/s183313/.pyenv/versions/mambaforge/envs/minigpt4/lib/python3.9/site-packages/gradio/blocks.py", line 915, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/work/s183313/.pyenv/versions/mambaforge/envs/minigpt4/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/work/s183313/.pyenv/versions/mambaforge/envs/minigpt4/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/work/s183313/.pyenv/versions/mambaforge/envs/minigpt4/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/work/s183313/MiniGPT-4/demo.py", line 83, in upload_img
    llm_message = chat.upload_img(gr_img, chat_state, img_list)
  File "/work/s183313/MiniGPT-4/minigpt4/conversation/conversation.py", line 185, in upload_img
    image_emb, _ = self.model.encode_img(image)
  File "/work/s183313/MiniGPT-4/minigpt4/models/mini_gpt4.py", line 139, in encode_img
    query_output = self.Qformer.bert(
  File "/work/s183313/.pyenv/versions/mambaforge/envs/minigpt4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/work/s183313/MiniGPT-4/minigpt4/models/Qformer.py", line 937, in forward
    encoder_outputs = self.encoder(
  File "/work/s183313/.pyenv/versions/mambaforge/envs/minigpt4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/work/s183313/MiniGPT-4/minigpt4/models/Qformer.py", line 550, in forward
    layer_outputs = layer_module(
  File "/work/s183313/.pyenv/versions/mambaforge/envs/minigpt4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/work/s183313/MiniGPT-4/minigpt4/models/Qformer.py", line 417, in forward
    self_attention_outputs = self.attention(
  File "/work/s183313/.pyenv/versions/mambaforge/envs/minigpt4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/work/s183313/MiniGPT-4/minigpt4/models/Qformer.py", line 332, in forward
    self_outputs = self.self(
  File "/work/s183313/.pyenv/versions/mambaforge/envs/minigpt4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/work/s183313/MiniGPT-4/minigpt4/models/Qformer.py", line 195, in forward
    key_layer = self.transpose_for_scores(self.key(hidden_states))
  File "/work/s183313/.pyenv/versions/mambaforge/envs/minigpt4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/work/s183313/.pyenv/versions/mambaforge/envs/minigpt4/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

Does anyone have an idea?

@thiner
thiner commented Apr 26, 2023

I think you should not modify the requirement.txt file, because the libraries depend on each other. In my opinion the problem is caused by PyTorch: in my case it was upgraded to 2.0.0, which is not the version specified in the original requirement.txt file. I guess it was upgraded by another library. To bring PyTorch back to the specified version, update the conda env "minigpt4" by running conda env update.

@sushilkhadkaanon
Did anyone solve the issue?

@sushilkhadkaanon
I ran into the same issue. Did you solve it?

@sushilkhadkaanon
sushilkhadkaanon commented Sep 19, 2023

I solved this issue.
The image warning is just a warning; you can ignore it. It won't affect inference.

  1. pip install --upgrade accelerate
  2. In my case it was because of GPU memory capacity; I was able to run inference on the 7B model.
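
The GPU memory point in step 2 can be checked before loading a model. A minimal sketch, assuming PyTorch is installed and using `torch.cuda.mem_get_info`; the helper name `gpu_memory_gib` is made up for illustration, and how much memory a given model needs is not stated in this thread:

```python
def gpu_memory_gib(device=0):
    """Return (free, total) GPU memory in GiB, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return None
    if not torch.cuda.is_available():
        # No usable CUDA device is visible to PyTorch.
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    gib = 1024 ** 3
    return free_bytes / gib, total_bytes / gib


if __name__ == "__main__":
    memory = gpu_memory_gib()
    if memory is None:
        print("No CUDA device available to PyTorch")
    else:
        free, total = memory
        print(f"GPU 0: {free:.1f} GiB free of {total:.1f} GiB")
```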
