Problem during training #15

Closed
xzdong-2019 opened this issue Jun 8, 2023 · 6 comments

Comments

@xzdong-2019

[2023-06-08 07:37:55.523808 INFO ] trainer:train:310 - Test epoch: 99, time/epoch: 0:04:26.419727, best_mAP: 0.83875, mAP: 0.81693
[2023-06-08 07:37:55.524025 INFO ] trainer:train:312 - ======================================================================
[2023-06-08 07:37:56.038528 INFO ] trainer:__save_checkpoint:196 - Model saved: models/PPYOLOE_M/epoch_99
Traceback (most recent call last):
  File "train.py", line 44, in <module>
    trainer.train(num_epoch=args.num_epoch,
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/trainer.py", line 300, in train
    self.__train_epoch(max_epoch=num_epoch, epoch_id=epoch_id, log_interval=log_interval, local_rank=local_rank, writer=writer)
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/trainer.py", line 204, in __train_epoch
    output = self.model(data)
  File "/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/meta_arch.py", line 53, in forward
    out = self.get_loss()
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/yolo.py", line 46, in get_loss
    return self._forward()
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/yolo.py", line 36, in _forward
    yolo_losses = self.yolo_head(neck_feats, self.inputs)
  File "/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/ppyoloe_head.py", line 202, in forward
    return self.forward_train(feats, targets)
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/ppyoloe_head.py", line 142, in forward_train
    return self.get_loss([
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/ppyoloe_head.py", line 308, in get_loss
    self.assigner(
  File "/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/base.py", line 375, in _decorate_function
    return func(*args, **kwargs)
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/task_aligned_assigner.py", line 93, in forward
    ious = iou_similarity(gt_bboxes, pred_bboxes)
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/bbox_utils.py", line 42, in iou_similarity
    x2y2 = paddle.minimum(px2y2, gx2y2)
  File "/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/tensor/math.py", line 1008, in minimum
    return _C_ops.minimum(x, y)
ValueError: (InvalidArgument) The 3-th dimension of input tensor is expected to be equal with the 3-th dimension of output tensor 2 or 1, but received 0. (at /paddle/paddle/phi/kernels/funcs/broadcast_function.h:77)

@yeyupiaoling
Owner

yeyupiaoling commented Jun 8, 2023 via email

@xzdong-2019
Author

This occurred during training, after 100 epochs had already been trained.

@yeyupiaoling
Owner

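Judging from the error, this is a dimension problem. But if training ran normally before, then the data, the model, and everything else should be fine. Has the loss value changed in any way?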
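For reference, the error message reads like a standard broadcasting failure: on axis 3 one operand of `paddle.minimum` has size 0 while the other side expects 2 or 1. Below is a minimal pure-Python sketch of the usual broadcasting rule (two sizes are compatible if they are equal or one of them is 1). The function name `broadcast_shape` and the example shapes are hypothetical, and this is an assumption about how the check behaves, not Paddle's actual implementation.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    # Left-pad the shorter shape with 1s so both shapes have the same rank.
    rank = max(len(shape_a), len(shape_b))
    a = (1,) * (rank - len(shape_a)) + tuple(shape_a)
    b = (1,) * (rank - len(shape_b)) + tuple(shape_b)
    result = []
    for axis, (da, db) in enumerate(zip(a, b)):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"axis {axis}: sizes {da} and {db} cannot be broadcast")
    return tuple(result)


# Hypothetical shapes: 2 boxes on one side, 3 on the other, 2 coordinates each.
print(broadcast_shape((1, 2, 1, 2), (1, 1, 3, 2)))

# If a slice such as [..., 2:4] yields an empty last axis, the rule fails on axis 3.
try:
    broadcast_shape((1, 2, 1, 2), (1, 1, 3, 0))
except ValueError as err:
    print(err)
```

If this reading is right, it may be worth printing the shapes of `gt_bboxes` and `pred_bboxes` just before the `iou_similarity` call to see which one ends up with an empty last axis.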

@xzdong-2019
Author

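Yes, the error is a dimension problem, and the loss value did decrease. Since training was able to run, I would expect the data to be fine, so I don't know what is going wrong. Training terminates every time it reaches epoch 100.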
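One way to rule the data in or out is to scan the annotation file for images without boxes and for malformed boxes. The sketch below assumes the annotations are COCO-style JSON (an `images` list, an `annotations` list, and `bbox` stored as `[x, y, width, height]`); the function names are hypothetical, and nothing in this thread confirms that such entries are the cause of the error.

```python
import json
from collections import Counter


def find_suspect_annotations(coco):
    """Return (image ids without valid boxes, annotation ids with malformed boxes)."""
    boxes_per_image = Counter()
    bad_annotations = []
    for ann in coco.get("annotations", []):
        bbox = ann.get("bbox", [])
        # A COCO box is [x, y, width, height]; width and height must be positive.
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            bad_annotations.append(ann.get("id"))
        else:
            boxes_per_image[ann.get("image_id")] += 1
    # Images that ended up with no valid box at all.
    empty_images = [img["id"] for img in coco.get("images", [])
                    if boxes_per_image[img["id"]] == 0]
    return empty_images, bad_annotations


def check_file(path):
    """Load a COCO-style JSON file and print a summary of suspect entries."""
    with open(path, encoding="utf-8") as f:
        empty_images, bad_annotations = find_suspect_annotations(json.load(f))
    print(f"{path}: {len(empty_images)} images without valid boxes, "
          f"{len(bad_annotations)} malformed annotations")
    return empty_images, bad_annotations
```

It could be run as `check_file("dataset/train.json")`, using the path that the training configuration in this thread lists as `train_anno_path`.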

@xzdong-2019
Author

When I start training again, the same problem still appears:
declare_namespace(pkg)
----------- Configuration Arguments -----------
batch_size: 4
eval_anno_path: dataset/eval.json
image_dir: dataset/
learning_rate: 0.000125
log_interval: 500
model_type: M
num_classes: 4
num_epoch: 1000
num_workers: 8
pretrained_model: None
resume_model: None
save_model_path: models/
train_anno_path: dataset/train.json
use_gpu: True
use_random_crop: True
use_random_distort: True
use_random_expand: True
use_random_flip: True

True
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
W0609 16:01:00.625028 6134 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.1, Runtime API Version: 11.1
W0609 16:01:00.627727 6134 gpu_resources.cc:91] device: 0, cuDNN Version: 8.1.
W0609 16:01:00.627743 6134 gpu_resources.cc:117] WARNING: device: 0. The installed Paddle is compiled with CUDA 11.2, but CUDA runtime version in your machine is 11.1, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDA version.
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
[2023-06-09 16:01:02.739166 INFO ] trainer:train:285 - Training data: 720
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for backbone.stages.0.blocks.0.conv2.alpha. backbone.stages.0.blocks.0.conv2.alpha is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for backbone.stages.0.blocks.1.conv2.alpha. backbone.stages.0.blocks.1.conv2.alpha is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for backbone.stages.1.blocks.0.conv2.alpha. backbone.stages.1.blocks.0.conv2.alpha is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for backbone.stages.1.blocks.1.conv2.alpha. backbone.stages.1.blocks.1.conv2.alpha is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for backbone.stages.1.blocks.2.conv2.alpha. backbone.stages.1.blocks.2.conv2.alpha is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for backbone.stages.1.blocks.3.conv2.alpha. backbone.stages.1.blocks.3.conv2.alpha is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for backbone.stages.2.blocks.0.conv2.alpha. backbone.stages.2.blocks.0.conv2.alpha is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for backbone.stages.2.blocks.1.conv2.alpha. backbone.stages.2.blocks.1.conv2.alpha is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for backbone.stages.2.blocks.2.conv2.alpha. backbone.stages.2.blocks.2.conv2.alpha is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for backbone.stages.2.blocks.3.conv2.alpha. backbone.stages.2.blocks.3.conv2.alpha is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for backbone.stages.3.blocks.0.conv2.alpha. backbone.stages.3.blocks.0.conv2.alpha is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for backbone.stages.3.blocks.1.conv2.alpha. backbone.stages.3.blocks.1.conv2.alpha is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for yolo_head.pred_cls.0.weight. yolo_head.pred_cls.0.weight receives a shape [365, 576, 3, 3], but the expected shape is [4, 576, 3, 3].
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for yolo_head.pred_cls.0.bias. yolo_head.pred_cls.0.bias receives a shape [365], but the expected shape is [4].
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for yolo_head.pred_cls.1.weight. yolo_head.pred_cls.1.weight receives a shape [365, 288, 3, 3], but the expected shape is [4, 288, 3, 3].
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for yolo_head.pred_cls.1.bias. yolo_head.pred_cls.1.bias receives a shape [365], but the expected shape is [4].
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for yolo_head.pred_cls.2.weight. yolo_head.pred_cls.2.weight receives a shape [365, 144, 3, 3], but the expected shape is [4, 144, 3, 3].
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:1652: UserWarning: Skip loading for yolo_head.pred_cls.2.bias. yolo_head.pred_cls.2.bias receives a shape [365], but the expected shape is [4].
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
[2023-06-09 16:01:02.990542 INFO ] trainer:__load_pretrained:151 - Successfully loaded pretrained model: pretrained_models/ppyoloe_crn_m_obj365_pretrained.pdparams
[2023-06-09 16:01:03.230492 INFO ] trainer:__load_checkpoint:169 - Successfully restored model parameters and optimizer parameters: models/PPYOLOE_M/last_model
W0609 16:01:06.713155 6134 gpu_resources.cc:217] WARNING: device: . The installed Paddle is compiled with CUDNN 8.2, but CUDNN version in your machine is 8.1, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/nn/layer/norm.py:712: UserWarning: When training, we now always track global mean and variance.
warnings.warn(
Traceback (most recent call last):
  File "train.py", line 44, in <module>
    trainer.train(num_epoch=args.num_epoch,
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/trainer.py", line 300, in train
    self.__train_epoch(max_epoch=num_epoch, epoch_id=epoch_id, log_interval=log_interval, local_rank=local_rank, writer=writer)
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/trainer.py", line 204, in __train_epoch
    output = self.model(data)
  File "/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/meta_arch.py", line 53, in forward
    out = self.get_loss()
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/yolo.py", line 46, in get_loss
    return self._forward()
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/yolo.py", line 36, in _forward
    yolo_losses = self.yolo_head(neck_feats, self.inputs)
  File "/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/ppyoloe_head.py", line 202, in forward
    return self.forward_train(feats, targets)
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/ppyoloe_head.py", line 142, in forward_train
    return self.get_loss([
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/ppyoloe_head.py", line 308, in get_loss
    self.assigner(
  File "/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/fluid/dygraph/base.py", line 375, in _decorate_function
    return func(*args, **kwargs)
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/task_aligned_assigner.py", line 93, in forward
    ious = iou_similarity(gt_bboxes, pred_bboxes)
  File "/data/dongxz/competion/cv/PP-YOLOE/ppyoloe/model/bbox_utils.py", line 42, in iou_similarity
    x2y2 = paddle.minimum(px2y2, gx2y2)
  File "/data/anaconda3/envs/dongxz_paddlepaddle/lib/python3.8/site-packages/paddle/tensor/math.py", line 1008, in minimum
    return _C_ops.minimum(x, y)
ValueError: (InvalidArgument) The 3-th dimension of input tensor is expected to be equal with the 3-th dimension of output tensor 2 or 1, but received 0. (at /paddle/paddle/phi/kernels/funcs/broadcast_function.h:77)
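As a side note on the "Skip loading" warnings in the log above: they report parameters that are either missing from the pretrained file or stored with a different shape (for example, `[365]` in the file versus `[4]` expected by the model), and they look unrelated to the traceback. The sketch below shows that kind of name-and-shape filtering in plain Python, with shapes as tuples. The function name `split_loadable`, the parameter `backbone.conv.weight`, and the shape of `alpha` are hypothetical; this is an illustration, not Paddle's loading code.

```python
def split_loadable(pretrained_shapes, model_shapes):
    """Partition model parameter names into loadable ones and skipped ones."""
    loadable, skipped = [], {}
    for name, expected in model_shapes.items():
        if name not in pretrained_shapes:
            skipped[name] = "not found in the provided dict"
        elif tuple(pretrained_shapes[name]) != tuple(expected):
            skipped[name] = (f"receives a shape {list(pretrained_shapes[name])}, "
                             f"but the expected shape is {list(expected)}")
        else:
            loadable.append(name)
    return loadable, skipped


# Hypothetical parameter tables: name -> shape.
pretrained = {"yolo_head.pred_cls.0.bias": (365,),
              "backbone.conv.weight": (32, 3, 3, 3)}
model = {"yolo_head.pred_cls.0.bias": (4,),
         "backbone.conv.weight": (32, 3, 3, 3),
         "backbone.stages.0.blocks.0.conv2.alpha": (1,)}

loadable, skipped = split_loadable(pretrained, model)
for name, reason in skipped.items():
    print(f"Skip loading for {name}. {name} {reason}")
```

Under this reading, a classification head whose shape changes with `num_classes` is expected to be skipped and then trained from scratch.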

@yeyupiaoling
Owner

yeyupiaoling commented Jun 9, 2023 via email
