KeyError: 'module_list.85.Conv2d.weight' #657

Closed
Samjith888 opened this issue Nov 25, 2019 · 17 comments
Labels
question Further information is requested

Comments

@Samjith888

Samjith888 commented Nov 25, 2019

Got the following error:

$ python train.py --data data/coco.data --cfg cfg/yolov3.cfg
Namespace(accumulate=2, adam=False, arc='default', batch_size=32, bucket='', cache_images=False, cfg='cfg/yolov3.cfg', data='data/coco.data', device='', epochs=273, evolve=False, img_size=416, img_weights=False, multi_scale=False, name='', nosave=False, notest=False, prebias=False, rect=False, resume=False, transfer=False, var=None, weights='weights/ultralytics49.pt')
Using CUDA device0 _CudaDeviceProperties(name='GeForce GTX 1070', total_memory=8116MB)

Traceback (most recent call last):
  File "train.py", line 444, in <module>
    train()  # train normally
  File "train.py", line 111, in train
    chkpt['model'] = {k: v for k, v in chkpt['model'].items() if model.state_dict()[k].numel() == v.numel()}
  File "train.py", line 111, in <dictcomp>
    chkpt['model'] = {k: v for k, v in chkpt['model'].items() if model.state_dict()[k].numel() == v.numel()}
KeyError: 'module_list.85.Conv2d.weight'
(base) 
@Samjith888 Samjith888 added the bug Something isn't working label Nov 25, 2019
@glenn-jocher
Member

glenn-jocher commented Nov 25, 2019

@Samjith888 your command automatically loads the ultralytics49.pt backbone, which requires yolov3-spp.cfg. You must remove the backbone by using --weights '', or specify a weights-cfg combination that is compatible.

This error is caused by a user supplying incompatible --weights and --cfg arguments. To solve this you must specify no weights (i.e. random initialization of the model) using --weights '' and any --cfg, or use a --cfg that is compatible with your --weights. If none are specified, the defaults are --weights ultralytics49.pt and --cfg cfg/yolov3-spp.cfg.
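For reference, the traceback above comes from indexing `model.state_dict()[k]` with a checkpoint key that the model built from `cfg/yolov3.cfg` does not contain. Below is a minimal sketch of a more tolerant filter. It uses plain shape tuples in place of tensors so that it runs without PyTorch; `filter_compatible` and the example shapes are hypothetical and not part of the repo.

```python
from math import prod  # available in Python 3.8+


def filter_compatible(checkpoint_shapes, model_shapes):
    """Split checkpoint entries into loadable ones and skipped ones.

    Both arguments map parameter names to shape tuples. An entry is kept
    only if the model has the same key with the same number of elements.
    """
    kept, skipped = {}, []
    for name, shape in checkpoint_shapes.items():
        if name in model_shapes and prod(model_shapes[name]) == prod(shape):
            kept[name] = shape
        else:
            skipped.append(name)
    return kept, skipped


# Hypothetical shapes, for illustration only.
checkpoint = {
    "module_list.0.Conv2d.weight": (32, 3, 3, 3),
    "module_list.85.Conv2d.weight": (512, 2048, 1, 1),
}
model = {"module_list.0.Conv2d.weight": (32, 3, 3, 3)}

kept, skipped = filter_compatible(checkpoint, model)
print("loadable:", sorted(kept))
print("skipped:", skipped)
```

With real tensors the comparison would use `.numel()`, as the original line does. Reporting the skipped keys matters, because silently dropping them could hide a mismatched cfg.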

Compatible --weights --cfg combinations:

python3 train.py --weights yolov3.pt --cfg cfg/yolov3.cfg
python3 train.py --weights yolov3.weights --cfg cfg/yolov3.cfg
python3 train.py --weights yolov3-spp.pt --cfg cfg/yolov3-spp.cfg
python3 train.py --weights ultralytics49.pt --cfg cfg/yolov3-spp.cfg
python3 train.py --weights ultralytics68.pt --cfg cfg/yolov3-spp.cfg

To train from scratch (randomly initialized weights), use:

python3 train.py --weights '' --cfg cfg/*.cfg  # any cfg will work here
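The combinations listed above could also be checked before training starts. This is only a sketch: `COMPATIBLE_CFG` and `check_weights_cfg` are hypothetical names, and the table holds only the pairs mentioned in this comment.

```python
import os

# Weights file name -> cfg file name, taken from the list in this comment.
COMPATIBLE_CFG = {
    "yolov3.pt": "yolov3.cfg",
    "yolov3.weights": "yolov3.cfg",
    "yolov3-spp.pt": "yolov3-spp.cfg",
    "ultralytics49.pt": "yolov3-spp.cfg",
    "ultralytics68.pt": "yolov3-spp.cfg",
}


def check_weights_cfg(weights, cfg):
    """Return True for a known-compatible pair or for training from scratch."""
    if weights == "":
        return True  # random initialization works with any cfg
    expected = COMPATIBLE_CFG.get(os.path.basename(weights))
    return expected is not None and expected == os.path.basename(cfg)


# The command from the original report pairs ultralytics49.pt with yolov3.cfg.
print(check_weights_cfg("weights/ultralytics49.pt", "cfg/yolov3.cfg"))
```

A weights file that is not in the table is reported as not known to be compatible, which may be too strict for custom checkpoints.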

ultralytics49.pt is currently the highest performing YOLOv3 model (trained from scratch using this repo) available at the default img-size of 416 (see #310), which is the reason it is used as the default backbone.

@glenn-jocher glenn-jocher added question Further information is requested and removed bug Something isn't working labels Nov 25, 2019
@hanrui15765510320

If I don't want pretrained weights, what should I do?

@okanlv

okanlv commented Nov 30, 2019

As @glenn-jocher said,

You must remove the backbone by using --weights ''

@hanrui15765510320

Thanks, bro.

@daddydrac

daddydrac commented Dec 2, 2019

I ran this: python3 train.py --data data/custom.data --cfg cfg/yolov3-spp-r.cfg

And got:

AssertionError: Target classes exceed model classes

What am I missing?
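One possible cause, offered as an assumption because the thread does not answer this question: the assertion message suggests that a label file contains a class index the model was not configured for. Below is a minimal sketch for finding the largest class index, assuming the common YOLO label layout of `class x_center y_center width height` per line. `max_class_index` is a hypothetical helper, not a function from the repo.

```python
def max_class_index(label_lines):
    """Return the largest class index in YOLO-format label lines, or -1 if none."""
    largest = -1
    for line in label_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        largest = max(largest, int(float(fields[0])))
    return largest


# Hypothetical label lines, for illustration only.
sample = ["0 0.5 0.5 0.2 0.3", "3 0.1 0.2 0.05 0.05", ""]
num_classes = 2  # assumed class count configured for the model

if max_class_index(sample) >= num_classes:
    print("labels reference a class index the model does not have")
```

If the largest index is greater than or equal to the configured class count, either the labels or the class count in the data and cfg files would need to change.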

@glenn-jocher
Member

I'll close this issue for now as the original issue appears to have been resolved, and/or no activity has been seen for some time. Feel free to comment if this is not the case.

@rohan-pradhan

rohan-pradhan commented Jan 22, 2020

Hi guys,

I'm trying to train on a custom CFG (therefore should be using a random initialization of weights). I understand that to do this we should set --weights ''

Unfortunately, even when I do that, it keeps trying to download the weights and I get this error:
Exception: '' missing, try downloading from https://drive.google.com/open?id=1LezFG5g3BCW6iYaV89B2i64cqEUZD7e0

This is the full command I am using to train:
python train.py --weights '' --cfg cfg/yolov3-custom.cfg --data data/coco1.data

Any help would be great - thanks!

@glenn-jocher
Member

glenn-jocher commented Jan 23, 2020

@rohan-pradhan no space: --weights ''

$ python3 train.py --weights '' --data coco16.data

Namespace(accumulate=4, adam=False, arc='default', batch_size=16, bucket='', cache_images=False, cfg='cfg/yolov3-spp.cfg', data='coco16.data', device='', epochs=273, evolve=False, img_size=[416], multi_scale=False, name='', nosave=False, notest=False, rect=False, resume=False, single_cls=False, var=None, weights='')
Using CPU

Caching labels (16 found, 0 missing, 0 empty, 0 duplicate, for 16 images): 100%|█████████████████████████████| 16/16 [00:00<00:00, 2515.70it/s]
Caching labels (16 found, 0 missing, 0 empty, 0 duplicate, for 16 images): 100%|█████████████████████████████| 16/16 [00:00<00:00, 5567.35it/s]
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
Using 8 dataloader workers
Starting training for 273 epochs...

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
     0/272        0G       7.7      13.3      7.87      28.9       211       416: 100%|██████████████████████████| 1/1 [01:05<00:00, 65.12s/it]
               Class    Images   Targets         P         R   mAP@0.5        F1:   0%|                                  | 0/1 [00:00<?, ?it/s]

@rohan-pradhan

Thanks for the quick response, Glenn. Unfortunately, even when I copy and paste your command it still gives the same error.

`>python train.py --weights '' --data coco1.data
Namespace(accumulate=4, adam=False, arc='default', batch_size=16, bucket='', cache_images=False, cfg='cfg/yolov3-spp.cfg', data='coco1.data', device='', epochs=273, evolve=False, img_size=416, img_weights=False, multi_scale=False, name='', nosave=False, notest=False, prebias=False, rect=False, resume=False, transfer=False, var=None, weights="''")
Using CUDA device0 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11264MB)

2020-01-23 11:02:59.119516: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Downloading https://pjreddie.com/media/files/''
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (22) The requested URL returned error: 404 Not Found
'rm' is not recognized as an internal or external command,
operable program or batch file.
Traceback (most recent call last):
  File "train.py", line 463, in <module>
    train()  # train normally
  File "train.py", line 108, in train
    attempt_download(weights)
  File "C:\Users\Rohan\Documents\Development\Thesis\yolov3\models.py", line 454, in attempt_download
    raise Exception(msg)
Exception: '' missing, try downloading from https://drive.google.com/open?id=1LezFG5g3BCW6iYaV89B2i64cqEUZD7e0`

Not sure why it is treating '' as a string.

@rohan-pradhan

Figured it out! Changed it to --weights "" and it seemed to work.

Thanks again!

@glenn-jocher
Member

@rohan-pradhan ah interesting. What's your OS?

@rohan-pradhan

@glenn-jocher I'm running Windows 10 in a Conda environment (Anaconda Prompt).

@glenn-jocher
Member

@rohan-pradhan hmm ok. Perhaps it's Windows.
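The Namespace output above shows `weights="''"`, which means the two quote characters reached Python as literal text. That is consistent with the Windows command prompt, which does not treat single quotes as quoting characters. Below is a minimal sketch of a defensive argument parser, assuming a hypothetical `strip_shell_quotes` helper that is not part of the repo:

```python
import argparse


def strip_shell_quotes(value: str) -> str:
    """Remove one pair of matching surrounding quotes that the shell left in place."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


parser = argparse.ArgumentParser()
parser.add_argument("--weights", type=strip_shell_quotes,
                    default="weights/ultralytics49.pt")

# Simulate what the Windows prompt passes for: --weights ''
opt = parser.parse_args(["--weights", "''"])
print(repr(opt.weights))
```

Passing `--weights ""` avoids the problem without any code change, as reported above, so this helper is only a convenience.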

@sunset326

sunset326 commented Oct 2, 2020

Hi guys,
when I run any of the following commands:
python train.py --data data/rbc.data --cfg cfg/yolov3.cfg --weights ""
python train.py --data data/rbc.data --cfg cfg/yolov3.cfg --weights ''
python train.py --data data/rbc.data --cfg cfg/yolov3.cfg --weights weights/yolov3.pt
python train.py --data data/rbc.data --cfg cfg/yolov3.cfg --weights weights/yolov3.weights

the same error occurs, as shown below.
My PyTorch is 1.5.1 with torchvision 0.6.0.

Traceback (most recent call last):
  File "train.py", line 431, in <module>
    train(hyp) # train normally
  File "train.py", line 164, in train
    model, optimizer = amp.initialize(model, optimizer, opt_level='O1', verbosity=0)
  File "/home/anaconda2/envs/Maskrcnn_Benchmark/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/frontend.py", line 339, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/home/anaconda2/envs/Maskrcnn_Benchmark/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/_initialize.py", line 228, in _initialize
    handle = amp_init(loss_scale=properties.loss_scale, verbose=(_amp_state.verbosity == 2))
  File "/home/anaconda2/envs/Maskrcnn_Benchmark/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/amp.py", line 101, in init
    try_caching, verbose)
  File "/home/anaconda2/envs/Maskrcnn_Benchmark/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/wrap.py", line 33, in cached_cast
    if not utils.has_func(mod, fn):
  File "/home/anaconda2/envs/Maskrcnn_Benchmark/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/utils.py", line 132, in has_func
    if isinstance(mod, torch.nn.backends.backend.FunctionBackend):
AttributeError: module 'torch.nn' has no attribute 'backends'

@glenn-jocher
Member

@sunset326 update torch to latest version.

@sunset326

@sunset326 update torch to latest version.

Thanks, brother.
I have solved the problem. The requirements.txt says Python >= 3.7, so I updated my Python and the problem no longer occurs.
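A version mismatch like this could be caught before training begins. Below is a minimal sketch, assuming the Python >= 3.7 minimum quoted in this comment; `python_is_supported` is a hypothetical helper and not part of the repo.

```python
import sys


def python_is_supported(version_info=sys.version_info, minimum=(3, 7)):
    """Return True if the interpreter's major.minor version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    print("Python %d.%d is older than the required 3.7" % tuple(sys.version_info[:2]))
```

The traceback above ran under a `python3.6` environment, which such a check would have flagged.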

@glenn-jocher
Member

@sunset326 Great to hear that updating Python resolved the issue! If you have any more questions or run into further issues, feel free to ask. Happy training! 🚀


7 participants