
ImportError: No module named 'fused' #35

Closed
HasnainKhanNiazi opened this issue Sep 10, 2021 · 24 comments

@HasnainKhanNiazi

Hi, I am trying to set up this repo on my own local machine, but I am getting the following error. I searched online but couldn't find a solution. Any help would be appreciated. Thanks.

ImportError: No module named 'fused'

@yuval-alaluf
Owner

Are you working on Linux? Have you tried running the code using the provided conda environment?

@HasnainKhanNiazi
Author

HasnainKhanNiazi commented Sep 10, 2021

Yes, I am working in Linux and I am using the provided conda environment.
Here are my system specs:
GPU: Tesla T4
CUDA Version: 11.2
Ubuntu: 18.04

@yuval-alaluf
Owner

Weird. I have Ubuntu 18.04.5 and CUDA 11.1 so the environment seems good. Can you send over the command you tried running?

@HasnainKhanNiazi
Author

I am using the Jupyter notebook in the notebooks folder (inference_playground), and I get the error on this import line:

from models.psp import pSp

@HasnainKhanNiazi
Author

HasnainKhanNiazi commented Sep 10, 2021

I am not sure what was wrong, but I am no longer getting that error. Instead, I am now getting an error on the line below:

Code Line: os.path.join(module_path, 'fused_bias_act_kernel.cu')

Error: ninja: build stopped: subcommand failed.

@yuval-alaluf
Owner

Hmmm. I just ran the notebook in Colab and it worked fine. Ninja can be a pain, and there are no really good references on how to fix its errors.

Any chance you can send me the full stack trace? Maybe there is something that can help us there.

@HasnainKhanNiazi
Author

@yuval-alaluf here is the full stack trace.


CalledProcessError Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _build_extension_module(name, build_directory, verbose)
1029 cwd=build_directory,
-> 1030 check=True)
1031 else:

~/anaconda3/envs/newEnv/lib/python3.6/subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
417 raise CalledProcessError(retcode, process.args,
--> 418 output=stdout, stderr=stderr)
419 return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last)
in
13 from datasets.augmentations import AgeTransformer
14 from utils.common import tensor2im
---> 15 from models.psp import pSp

/SAM/notebooks/SAM/notebooks/SAM/models/psp.py in
10
11 from configs.paths_config import model_paths
---> 12 from models.encoders import psp_encoders
13 from models.stylegan2.model import Generator
14

/SAM/notebooks/SAM/notebooks/SAM/models/encoders/psp_encoders.py in
6
7 from models.encoders.helpers import get_blocks, bottleneck_IR, bottleneck_IR_SE
----> 8 from models.stylegan2.model import EqualLinear
9
10

/SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/model.py in
5 from torch.nn import functional as F
6
----> 7 from models.stylegan2.op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
8
9

/SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/__init__.py in
----> 1 from .fused_act import FusedLeakyReLU, fused_leaky_relu
2 from .upfirdn2d import upfirdn2d

/SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_act.py in
11 sources=[
12 os.path.join(module_path, 'fused_bias_act.cpp'),
---> 13 os.path.join(module_path, 'fused_bias_act_kernel.cu'),
14 ],
15 )

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module)
659 verbose,
660 with_cuda,
--> 661 is_python_module)
662
663

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module)
828 build_directory=build_directory,
829 verbose=verbose,
--> 830 with_cuda=with_cuda)
831 finally:
832 baton.release()

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _write_ninja_file_and_build(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda)
881 if verbose:
882 print('Building extension module {}...'.format(name))
--> 883 _build_extension_module(name, build_directory, verbose)
884
885

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _build_extension_module(name, build_directory, verbose)
1041 if hasattr(error, 'output') and error.output:
1042 message += ": {}".format(error.output.decode())
-> 1043 raise RuntimeError(message)
1044
1045

RuntimeError: Error building extension 'fused': [1/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/TH -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/THC -isystem /root/anaconda3/envs/newEnv/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/TH -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/THC -isystem /root/anaconda3/envs/newEnv/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_75'
[2/3] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/TH -isystem /root/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/include/THC -isystem /root/anaconda3/envs/newEnv/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /SAM/notebooks/SAM/notebooks/SAM/models/stylegan2/op/fused_bias_act.cpp -o fused_bias_act.o
ninja: build stopped: subcommand failed.

@yuval-alaluf
Copy link
Owner

Seems like we're getting somewhere. I noticed the following line:
nvcc fatal : Unsupported gpu architecture 'compute_75'
It seems like there is a mismatch between the GPU and the CUDA version on your system. Were you able to previously use the GPU with CUDA?
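For reference, the failing flag -gencode=arch=compute_75 targets the T4's Turing architecture, which older nvcc releases simply don't know about. A minimal sketch of the relationship (the version cutoffs below are my assumptions based on NVIDIA's release notes, not something verified in this thread):

```python
# Rough sketch: earliest CUDA toolkit whose nvcc accepts each GPU arch.
# These cutoffs are assumptions drawn from NVIDIA release notes.
MIN_CUDA_FOR_ARCH = {
    "compute_70": (9, 0),   # Volta (e.g. V100)
    "compute_75": (10, 0),  # Turing (e.g. T4, RTX 20xx)
    "compute_80": (11, 0),  # Ampere (e.g. A100)
}

def nvcc_supports(arch: str, nvcc_release: tuple) -> bool:
    """True if an nvcc of the given (major, minor) release knows `arch`."""
    return nvcc_release >= MIN_CUDA_FOR_ARCH[arch]

# CUDA 9.1 predates Turing, hence "Unsupported gpu architecture 'compute_75'"
print(nvcc_supports("compute_75", (9, 1)))   # False
print(nvcc_supports("compute_75", (11, 2)))  # True
```

So even with CUDA 11.2 installed, if the nvcc on the PATH belongs to an older toolkit, the extension build will fail exactly like this.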

@yuval-alaluf
Owner

I found some other issues that may be of help:
facebookresearch/detectron2#149 (comment)
torch/torch7#1190 (comment)

@HasnainKhanNiazi
Author

This is a fresh system and this is the first GitHub repo I have run on this machine, so I can't say for sure.

@HasnainKhanNiazi
Author

> I found some other issues that may be of help:
> facebookresearch/detectron2#149 (comment)
> torch/torch7#1190 (comment)

Thanks @yuval-alaluf , let me have a look at these links and I will update you.

@yuval-alaluf
Owner

> Thanks @yuval-alaluf , let me have a look at these links and I will update you.

Cool. In order to isolate the issues with ninja and your machine, I would try to make sure you're able to get torch to run with a GPU and then try running the code in this repo (since this repo requires ninja which can be tricky on its own).

@HasnainKhanNiazi
Author

HasnainKhanNiazi commented Sep 10, 2021

> > Thanks @yuval-alaluf , let me have a look at these links and I will update you.
>
> Cool. In order to isolate the issues with ninja and your machine, I would try to make sure you're able to get torch to run with a GPU and then try running the code in this repo (since this repo requires ninja which can be tricky on its own).

I tested torch with CUDA and it is working fine:

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device(0)
<torch.cuda.device object at 0x7fa552331588>
>>> torch.cuda.current_device()
0
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name(0)
'Tesla T4'

@yuval-alaluf
Owner

Can you please check what version of nvcc you have? You can do this by running nvcc --version.

@HasnainKhanNiazi
Author

> Can you please check what version of nvcc you have? You can do this by running nvcc --version.

Here is the output I get from running nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

@yuval-alaluf
Owner

yuval-alaluf commented Sep 10, 2021

Yea, I see your problem. It appears that you have multiple CUDA versions installed. If you notice, the output of nvcc --version indicates that you are compiling with CUDA 9.1, and CUDA 9.1 is not compatible with your T4 GPU (which requires CUDA >= 10.0). You need to switch your CUDA to version 11.2, which you mentioned above.

facebookresearch/detectron2#149 (comment)
torch/torch7#1190 (comment)

Take a look at the first link here, which walks through the steps for correctly setting your environment to use CUDA 11.2. Just note that the example there uses 10.1, so make sure to make the necessary adjustments for your machine.
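If the 11.2 toolkit is already installed, the switch usually amounts to putting it first on your paths. A sketch, assuming the common /usr/local/cuda-11.2 install location (adjust to wherever the toolkit actually lives, and add the lines to ~/.bashrc to make them persist):

```shell
# Hypothetical install location -- adjust to your machine.
export PATH=/usr/local/cuda-11.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH
# After this, `nvcc --version` should report release 11.2.
```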

@HasnainKhanNiazi
Author

Thanks @yuval-alaluf , I tried those steps to set CUDA 11.2 in my shell startup file, but after setting it up I am still getting the same error.

@HasnainKhanNiazi
Author

@yuval-alaluf I have changed CUDA to 11.2 and that error is gone, but now I am getting an error on this line:

Code: ckpt = torch.load(model_path, map_location='cpu')

Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in nti(s)
188 s = nts(s, "ascii", "strict")
--> 189 n = int(s.strip() or "0", 8)
190 except ValueError:

ValueError: invalid literal for int() with base 8: 'ightq\x04ct'

During handling of the above exception, another exception occurred:

InvalidHeaderError Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in next(self)
2296 try:
-> 2297 tarinfo = self.tarinfo.fromtarfile(self)
2298 except EOFHeaderError as e:

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in fromtarfile(cls, tarfile)
1092 buf = tarfile.fileobj.read(BLOCKSIZE)
-> 1093 obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
1094 obj.offset = tarfile.fileobj.tell() - BLOCKSIZE

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in frombuf(cls, buf, encoding, errors)
1034
-> 1035 chksum = nti(buf[148:156])
1036 if chksum not in calc_chksums(buf):

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in nti(s)
190 except ValueError:
--> 191 raise InvalidHeaderError("invalid header")
192 return n

InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

ReadError Traceback (most recent call last)
~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
594 try:
--> 595 return legacy_load(f)
596 except tarfile.TarError:

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in legacy_load(f)
505
--> 506 with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar,
507 mkdtemp() as tmpdir:

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in open(cls, name, mode, fileobj, bufsize, **kwargs)
1588 raise CompressionError("unknown compression type %r" % comptype)
-> 1589 return func(name, filemode, fileobj, **kwargs)
1590

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in taropen(cls, name, mode, fileobj, **kwargs)
1618 raise ValueError("mode must be 'r', 'a', 'w' or 'x'")
-> 1619 return cls(name, mode, fileobj, **kwargs)
1620

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in __init__(self, name, mode, fileobj, format, tarinfo, dereference, ignore_zeros, encoding, errors, pax_headers, debug, errorlevel, copybufsize)
1481 self.firstmember = None
-> 1482 self.firstmember = self.next()
1483

~/anaconda3/envs/newEnv/lib/python3.6/tarfile.py in next(self)
2308 elif self.offset == 0:
-> 2309 raise ReadError(str(e))
2310 except EmptyHeaderError:

ReadError: invalid header

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last)
in
1 model_path = EXPERIMENT_ARGS['model_path']
----> 2 ckpt = torch.load(model_path, map_location='cpu')

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
424 if sys.version_info >= (3, 0) and 'encoding' not in pickle_load_args.keys():
425 pickle_load_args['encoding'] = 'utf-8'
--> 426 return _load(f, map_location, pickle_module, **pickle_load_args)
427 finally:
428 if new_fd:

~/anaconda3/envs/newEnv/lib/python3.6/site-packages/torch/serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
597 if _is_zipfile(f):
598 # .zip is used for torch.jit.save and will throw an un-pickling error here
--> 599 raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
600 # if not a tarfile, reset file offset and proceed
601 f.seek(0)

RuntimeError: ../pretrained_models/sam_ffhq_aging.pt is a zip archive (did you mean to use torch.jit.load()?)
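The last line is the relevant one: checkpoints saved by torch >= 1.6 use a zip-based serialization format that torch < 1.6 cannot read. A stdlib-only way to confirm a .pt file is in the new format (a sketch; the helper name is mine, not from the repo):

```python
import zipfile

def is_new_style_checkpoint(path: str) -> bool:
    """True if `path` is a zip archive, i.e. the format torch >= 1.6
    writes by default. torch < 1.6 writes a legacy tar/pickle format
    and raises the "is a zip archive" RuntimeError on zip files."""
    return zipfile.is_zipfile(path)

# e.g. is_new_style_checkpoint("../pretrained_models/sam_ffhq_aging.pt")
```

If this returns True while loading fails, the loading torch is too old for the checkpoint.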

@HasnainKhanNiazi
Author

I think this is because of the PyTorch version.

@yuval-alaluf
Owner

> I think this is because of the PyTorch version.

What torch version are you using?

@HasnainKhanNiazi
Author

I am using torch version 1.3.1+cu100.

@yuval-alaluf
Owner

Ah. You need to update your torch version to at least 1.6.0.
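One way to do the upgrade (the version tags below are examples, not from this thread; pick the exact command for your CUDA toolkit from the install selector on pytorch.org):

```shell
# Example: torch 1.9.0 built against CUDA 11.1, which runs fine on an
# 11.x toolkit. Tags are illustrative -- match them to your setup.
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 \
    -f https://download.pytorch.org/whl/torch_stable.html
```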

@HasnainKhanNiazi
Author

Yes, I am doing that. I will update you as soon as it's done. Thanks for your time, much appreciated.

@HasnainKhanNiazi
Author

@yuval-alaluf Thanks for your time. First the problem was related to CUDA, and then the PyTorch version also played a role in the errors. Now, after setting CUDA to 11.3 and PyTorch to 1.9, it is working fine.

Cheers
