
RuntimeError: CUDA error: no kernel image is available for execution on the device #2

Open
lsj1111 opened this issue Mar 12, 2024 · 11 comments


lsj1111 commented Mar 12, 2024

When I ran the code in the environment described above, I got this error. Has the author encountered it, and how was it solved?

@yuhongtian17 (Owner)

Check that your GPU model, the CUDA version reported by `nvidia-smi`, the CUDA version reported by `nvcc -V`, and your PyTorch version all match (for example, the RTX 4090 requires CUDA 11.6 or later). If you are not sure, you can also send us this information and we will help you check.
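For a quick local check, a minimal Python sketch like the following works (nothing here is specific to this repo; the printed values are yours to interpret):

```python
# Minimal sanity check that the GPU, driver, and PyTorch build agree.
import torch

print(torch.__version__)                    # installed PyTorch version
print(torch.version.cuda)                   # CUDA version this PyTorch build was compiled for
print(torch.cuda.is_available())            # False usually means a driver/build mismatch
print(torch.cuda.get_device_name(0))        # GPU model
print(torch.cuda.get_device_capability(0))  # compute capability, e.g. (8, 9) for an RTX 4090
```

If `is_available()` returns True but kernels still fail with "no kernel image is available", the installed PyTorch build does not include compiled kernels for your GPU's compute capability, and you need a build targeting a newer CUDA version.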


lsj1111 commented Mar 13, 2024

But once I upgrade both CUDA and torch, I can no longer install the MMCV version your source code depends on, which leads to many bugs at runtime, because MMCV 2.0 introduced too many breaking changes.

@yuhongtian17 (Owner)

Our code is written against mmrotate-0.3.3/0.3.4. In other words, if the mmrotate-0.3.3/0.3.4 example code (e.g. Rotated Faster R-CNN with an r50 backbone) runs, our code will run as well; if even the official code fails, please report an issue in the mmrotate repository. You should choose CUDA and PyTorch versions that suit your GPU model, not necessarily the latest ones.
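As a sanity check (a sketch; the exact versions that work for your GPU may differ), you can verify that the installed OpenMMLab stack is the one mmrotate-0.3.3/0.3.4 expects, i.e. mmcv-full 1.x and mmdet 2.x rather than the MMCV 2.0 line:

```python
# Verify the installed stack matches what mmrotate 0.3.3/0.3.4 targets:
# mmcv-full 1.x (not mmcv 2.x) and mmdet 2.x.
import mmcv
import mmdet
import mmrotate

print('mmcv:    ', mmcv.__version__)
print('mmdet:   ', mmdet.__version__)
print('mmrotate:', mmrotate.__version__)
assert mmcv.__version__.startswith('1.'), 'mmcv 2.x is incompatible with mmrotate 0.3.x'
```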


lsj1111 commented Mar 22, 2024

Everything works fine when I load the MAE-pretrained ViT weights, but when I load your pretrained weights they do not seem to load successfully, and the results are very poor. The code is shown below:
pretrained = r'D:\Tool\Datasets\pretrained_weight\Spatial-Transform-Decoupling-Models\pretrained\mae_pretrain_vit_base_full.pth'

angle_version = 'le90'
norm_cfg = dict(type='LN', requires_grad=True)
model = dict(
    type='RotatedimTED',
    # pretrained=pretrained,
    proposals_dim=6,
    backbone=dict(
        type='VisionTransformer',
        init_cfg=dict(type='Pretrained', checkpoint=pretrained),
        img_size=224,

@yuhongtian17 (Owner)

This description is not enough to locate the error. Could you provide the full config file and the error message?


lsj1111 commented Mar 28, 2024

Let me summarize the problem. The downloadable pretrained weights you provide include several files: the first is the MAE-pretrained ViT weights -> mae_pretrain_vit_base_full.pth, and the second is the weights trained for 12 epochs with your STD module -> epoch_12.pth. But something strange happens: when I train from the first weights, the mAP I get is far higher than with the second. In theory, shouldn't training from the second weights give better results? I am confused; please explain.


lsj1111 commented Mar 28, 2024

When I load your epoch_12 weights, I get the following error message:

missing keys in source state_dict: fc_cls.weight, fc_cls.bias, decoder_blocks.2.layer_reg.norms.0.weight, decoder_blocks.2.layer_reg.norms.0.bias, decoder_blocks.2.layer_reg.norms.1.weight, decoder_blocks.2.layer_reg.norms.1.bias, decoder_blocks.2.layer_reg.norms.2.weight, decoder_blocks.2.layer_reg.norms.2.bias, decoder_blocks.2.layer_reg.convs.0.weight, decoder_blocks.2.layer_reg.convs.0.bias, decoder_blocks.2.layer_reg.convs.1.weight, decoder_blocks.2.layer_reg.convs.1.bias, decoder_blocks.2.layer_reg.convs.2.weight, decoder_blocks.2.layer_reg.convs.2.bias, decoder_blocks.2.layer_reg.norm_reg.weight, decoder_blocks.2.layer_reg.norm_reg.bias, decoder_blocks.2.layer_reg.fc_reg.weight, decoder_blocks.2.layer_reg.fc_reg.bias, decoder_blocks.3.layer_reg.norms.0.weight, decoder_blocks.3.layer_reg.norms.0.bias, decoder_blocks.3.layer_reg.norms.1.weight, decoder_blocks.3.layer_reg.norms.1.bias, decoder_blocks.3.layer_reg.norms.2.weight, decoder_blocks.3.layer_reg.norms.2.bias, decoder_blocks.3.layer_reg.convs.0.weight, decoder_blocks.3.layer_reg.convs.0.bias, decoder_blocks.3.layer_reg.convs.1.weight, decoder_blocks.3.layer_reg.convs.1.bias, decoder_blocks.3.layer_reg.convs.2.weight, decoder_blocks.3.layer_reg.convs.2.bias, decoder_blocks.3.layer_reg.norm_reg.weight, decoder_blocks.3.layer_reg.norm_reg.bias, decoder_blocks.3.layer_reg.fc_reg.weight, decoder_blocks.3.layer_reg.fc_reg.bias, decoder_blocks.4.layer_reg.norms.0.weight, decoder_blocks.4.layer_reg.norms.0.bias, decoder_blocks.4.layer_reg.norms.1.weight, decoder_blocks.4.layer_reg.norms.1.bias, decoder_blocks.4.layer_reg.convs.0.weight, decoder_blocks.4.layer_reg.convs.0.bias, decoder_blocks.4.layer_reg.convs.1.weight, decoder_blocks.4.layer_reg.convs.1.bias, decoder_blocks.4.layer_reg.norm_reg.weight, decoder_blocks.4.layer_reg.norm_reg.bias, decoder_blocks.4.layer_reg.fc_reg.weight, decoder_blocks.4.layer_reg.fc_reg.bias, decoder_blocks.5.layer_reg.norms.0.weight, decoder_blocks.5.layer_reg.norms.0.bias, decoder_blocks.5.layer_reg.norms.1.weight, decoder_blocks.5.layer_reg.norms.1.bias, decoder_blocks.5.layer_reg.convs.0.weight, decoder_blocks.5.layer_reg.convs.0.bias, decoder_blocks.5.layer_reg.convs.1.weight, decoder_blocks.5.layer_reg.convs.1.bias, decoder_blocks.5.layer_reg.norm_reg.weight, decoder_blocks.5.layer_reg.norm_reg.bias, decoder_blocks.5.layer_reg.fc_reg.weight, decoder_blocks.5.layer_reg.fc_reg.bias, decoder_blocks.6.layer_reg.norms.0.weight, decoder_blocks.6.layer_reg.norms.0.bias, decoder_blocks.6.layer_reg.convs.0.weight, decoder_blocks.6.layer_reg.convs.0.bias, decoder_blocks.6.layer_reg.norm_reg.weight, decoder_blocks.6.layer_reg.norm_reg.bias, decoder_blocks.6.layer_reg.fc_reg.weight, decoder_blocks.6.layer_reg.fc_reg.bias, decoder_blocks.7.layer_reg.norms.0.weight, decoder_blocks.7.layer_reg.norms.0.bias, decoder_blocks.7.layer_reg.convs.0.weight, decoder_blocks.7.layer_reg.convs.0.bias, decoder_blocks.7.layer_reg.norm_reg.weight, decoder_blocks.7.layer_reg.norm_reg.bias, decoder_blocks.7.layer_reg.fc_reg.weight, decoder_blocks.7.layer_reg.fc_reg.bias, decoder_box_norm.weight, decoder_box_norm.bias

I do not know where the problem is.

@yuhongtian17 (Owner)

Here is what is happening: epoch_12.pth stores its weights under the variable names defined by the detector, so its state-dict keys do not match the keys in the MAE-pretrained checkpoint, while the detector's init_weight function looks up values and loads weights using the MAE-pretrained keys. Therefore epoch_12.pth cannot be used directly as the detector's pretrained model to start fully supervised training from scratch; if you do, the weights fail to load and the model is effectively randomly initialized. If you want to fine-tune from these weights, load them the way the whole model is loaded at test time, via load_state_dict, and then start training.
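A minimal sketch of that fine-tuning route, assuming the mmcv 1.x / mmrotate 0.3.x APIs and a placeholder config path:

```python
# Fine-tune from epoch_12.pth by loading the *whole* detector state_dict,
# as test-time loading does, instead of passing the file as the backbone's
# MAE-style init_cfg (whose keys will not match).
from mmcv import Config
from mmcv.runner import load_checkpoint
from mmrotate.models import build_detector

cfg = Config.fromfile('configs/my_std_config.py')  # placeholder: your STD config
model = build_detector(cfg.model)
# strict=False reports missing/unexpected keys instead of raising.
load_checkpoint(model, 'epoch_12.pth', map_location='cpu', strict=False)
```

Equivalently, setting `load_from = 'epoch_12.pth'` in the training config (rather than pointing the backbone's init_cfg at this file) loads the full-model checkpoint before training starts.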

@quxianjiuguo

Can I use a single 4090 with this repo to train on my own dataset?

@yuhongtian17 (Owner)

See issue #7

@WenLinLliu

> Can I use a single 4090 with this repo to train on my own dataset?

Yes, you can, no problem.
