Training on my own dataset #78

Open
8umpk1n opened this issue Nov 1, 2022 · 7 comments
Comments

@8umpk1n

8umpk1n commented Nov 1, 2022

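Hello, I'm new to point clouds and haven't found much material on training. To train on my own dataset, I replaced the point cloud files and txt files in the ShapeNet-55 folder with my own data files, but training fails with the error below. How should I normalize my own point cloud data so that it lines up with the data in ShapeNet-55? qwq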
[DATASET] Open file data/ShapeNet55-34/ShapeNet-55/train.txt
[DATASET] 4 instances were loaded
[DATASET] Open file data/ShapeNet55-34/ShapeNet-55/test.txt
[DATASET] 10518 instances were loaded
2022-11-01 20:36:45,313 - MODEL - INFO - Transformer with knn_layer 1
2022-11-01 20:36:47,342 - PoinTr - INFO - Using Data parallel ...
/home/a/anaconda3/envs/10.1/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:129: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
/home/a/anaconda3/envs/10.1/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:154: UserWarning: The epoch parameter in scheduler.step() was not necessary and is being deprecated where possible. Please use scheduler.step() to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
Traceback (most recent call last):
File "main.py", line 68, in
main()
File "main.py", line 64, in main
run_net(args, config, train_writer, val_writer)
File "/home/a/PoinTr/tools/runner.py", line 142, in run_net
train_writer.add_scalar('Loss/Epoch/Sparse', losses.avg(0), epoch)
File "/home/a/PoinTr/utils/AverageMeter.py", line 42, in avg
return self._sum[idx] / self._count[idx]
ZeroDivisionError: division by zero

@yuxumin
Owner

yuxumin commented Nov 1, 2022

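This doesn't look like a data problem. It looks like your AverageMeter never counted anything, so count is 0, which is why you get the division-by-zero error.

  • Did you skip training yourself?
  • Or is bs > 4, so that drop_last caused the whole training phase to be skipped?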
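The drop_last explanation can be checked with a little arithmetic: with drop_last enabled, a loader yields floor(n / bs) batches, so the 4 training instances reported in the log give zero batches for any batch size above 4, and the meter is never updated. A minimal sketch in plain Python; `num_batches` and `safe_avg` are hypothetical helpers for illustration, not functions from PoinTr.

```python
def num_batches(n_samples: int, batch_size: int, drop_last: bool) -> int:
    """Number of batches a loader yields per epoch."""
    if drop_last:
        # Incomplete trailing batch is discarded.
        return n_samples // batch_size
    # Ceiling division: the incomplete trailing batch is kept.
    return -(-n_samples // batch_size)


def safe_avg(total: float, count: int) -> float:
    """Average that returns 0.0 instead of raising when nothing was counted."""
    return total / count if count else 0.0


print(num_batches(4, 8, drop_last=True))   # 0 batches: training loop body never runs
print(num_batches(4, 4, drop_last=True))   # 1 batch: matches "[Batch 1/1]" after the fix
print(safe_avg(0.0, 0))                    # 0.0 rather than ZeroDivisionError
```

Lowering the batch size to the dataset size, as was done later in this thread, is the simplest fix; a guard like `safe_avg` would only hide the fact that no training happened.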

@8umpk1n
Author

8umpk1n commented Nov 2, 2022

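Thank you so much! Yes, that was exactly it. Setting bs to 4 fixed that problem. Training now runs and produces a pth file, but the following error appears in a later epoch. What is going on here? How can I fix it, and does it affect my training results? qaq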

2022-11-02 19:13:39,163 - PoinTr - INFO - Using Data parallel ...
2022-11-02 19:13:40,184 - PoinTr - INFO - [Epoch 0/2][Batch 1/1] BatchTime = 1.013 (s) DataTime = 0.097 (s) Losses = ['106.8453', '200.6035'] lr = 0.000500
/home/a/anaconda3/envs/10.1/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:154: UserWarning: The epoch parameter in scheduler.step() was not necessary and is being deprecated where possible. Please use scheduler.step() to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
2022-11-02 19:13:40,214 - PoinTr - INFO - [Training] EPOCH: 0 EpochTime = 1.043 (s) Losses = ['106.8453', '200.6035']
2022-11-02 19:13:47,207 - PoinTr - INFO - Save checkpoint at ./experiments/PoinTr/ShapeNet55_models/e/ckpt-last.pth
2022-11-02 19:13:47,612 - PoinTr - INFO - Save checkpoint at ./experiments/PoinTr/ShapeNet55_models/e/ckpt-epoch-000.pth
2022-11-02 19:13:47,925 - PoinTr - INFO - [Epoch 1/2][Batch 1/1] BatchTime = 0.312 (s) DataTime = 0.100 (s) Losses = ['115.6831', '206.9640'] lr = 0.000500
2022-11-02 19:13:47,961 - PoinTr - INFO - [Training] EPOCH: 1 EpochTime = 0.347 (s) Losses = ['115.6831', '206.9640']
2022-11-02 19:13:47,961 - PoinTr - INFO - [VALIDATION] Start validating epoch 1
Traceback (most recent call last):
File "main.py", line 68, in
main()
File "main.py", line 64, in main
run_net(args, config, train_writer, val_writer)
File "/home/a/PoinTr/tools/runner.py", line 149, in run_net
metrics = validate(base_model, test_dataloader, epoch, ChamferDisL1, ChamferDisL2, val_writer, args, config, logger=logger)
File "/home/a/PoinTr/tools/runner.py", line 217, in validate
input_pc = misc.get_ptcloud_img(input_pc)
File "/home/a/PoinTr/utils/misc.py", line 190, in get_ptcloud_img
ax = fig.gca(projection=Axes3D.name, adjustable='box')
TypeError: gca() got an unexpected keyword argument 'projection'

@yuxumin
Owner

yuxumin commented Nov 2, 2022

I've updated misc.py; it should work now.
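For readers on an older checkout: the TypeError comes from Matplotlib, which no longer accepts keyword arguments in `Figure.gca()` (to my knowledge this was deprecated in 3.4 and removed in 3.6). The sketch below shows one way to create the 3D axes with `add_subplot` instead. It is not the exact code from the updated misc.py; `render_ptcloud` is a hypothetical name, and the figure size and marker size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np


def render_ptcloud(points: np.ndarray) -> np.ndarray:
    """Render an (N, 3) point cloud to an (H, W, 3) uint8 image array."""
    fig = plt.figure(figsize=(4, 4))
    # Replaces: fig.gca(projection=Axes3D.name, adjustable='box')
    ax = fig.add_subplot(projection="3d")
    ax.axis("off")
    ax.scatter(points[:, 0], points[:, 1], points[:, 2], s=1)
    fig.canvas.draw()
    # The Agg canvas exposes an RGBA buffer; drop alpha and copy before closing.
    img = np.asarray(fig.canvas.buffer_rgba())[..., :3].copy()
    plt.close(fig)
    return img


if __name__ == "__main__":
    cloud = np.random.default_rng(0).uniform(-1.0, 1.0, size=(256, 3))
    print(render_ptcloud(cloud).shape)
```

Pinning an older Matplotlib would also avoid the error, but changing the call keeps the code working on current releases.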

@8umpk1n
Author

8umpk1n commented Nov 3, 2022

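Maybe my dataset really is the problem. I converted obj files to txt and then to npy, and training fails with the error below, whereas the original data works fine. I don't know where my dataset preparation went wrong.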
2022-11-03 11:05:22,927 - PoinTr - INFO - [VALIDATION] Start validating epoch 1
2022-11-03 11:05:24,236 - PoinTr - INFO - [Validation] EPOCH: 1 Metrics = ['0.2723', '1687.6722', '14483.1594']
2022-11-03 11:05:24,236 - PoinTr - INFO - ============================ TEST RESULTS ============================
2022-11-03 11:05:24,236 - PoinTr - INFO - Taxonomy #Sample F-Score CDL1 CDL2 #ModelName
Traceback (most recent call last):
File "main.py", line 68, in
main()
File "main.py", line 64, in main
run_net(args, config, train_writer, val_writer)
File "/home/a/PoinTr/tools/runner.py", line 149, in run_net
metrics = validate(base_model, test_dataloader, epoch, ChamferDisL1, ChamferDisL2, val_writer, args, config, logger=logger)
File "/home/a/PoinTr/tools/runner.py", line 260, in validate
msg += shapenet_dict[taxonomy_id] + '\t'
KeyError: '1'
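A note on this traceback: the failing line looks up `shapenet_dict[taxonomy_id]`, so the taxonomy ID derived from the custom data ('1') is simply not a key in the ShapeNet name mapping. The sketch below assumes that entries in train.txt/test.txt are named `<taxonomy_id>-<model_id>.npy` and that the ID is taken from the part before the first '-'; that parsing is an assumption, not something confirmed in this thread. `parse_entry` and `taxonomy_name` are hypothetical helpers, and the one-entry dictionary is an illustrative subset.

```python
from typing import Dict, Tuple


def parse_entry(line: str) -> Tuple[str, str]:
    """Split '<taxonomy_id>-<model_id>.npy' into its two IDs (assumed naming)."""
    name = line.strip()
    taxonomy_id, rest = name.split("-", 1)
    model_id = rest.rsplit(".", 1)[0]
    return taxonomy_id, model_id


def taxonomy_name(mapping: Dict[str, str], taxonomy_id: str) -> str:
    """Look up a readable name, falling back to the raw ID for unknown classes."""
    return mapping.get(taxonomy_id, taxonomy_id)


shapenet_dict = {"02691156": "airplane"}  # illustrative subset of the mapping

print(parse_entry("02691156-abc123.npy"))            # ('02691156', 'abc123')
print(taxonomy_name(shapenet_dict, "02691156"))      # airplane
print(taxonomy_name(shapenet_dict, "1"))             # 1 (no KeyError)
```

Under that assumption there are two ways out: name the custom files with an ID that exists in the mapping, or make the lookup tolerant of unknown IDs as `taxonomy_name` does.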

@8umpk1n
Author

8umpk1n commented Nov 3, 2022

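I just want to train a simple model and then visualize the point cloud completion. I'll also give the PNC model a try.
Or do these point clouds perhaps not need any unified preprocessing when the dataset is built? Thanks!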

@Zhengchao97201

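I just want to train a simple model and then visualize the point cloud completion. I'll also give the PNC model a try. Or do these point clouds perhaps not need any unified preprocessing when the dataset is built? Thanks!

How did you unify the number of points when matching your own dataset to ShapeNet-55? Did you modify the network, or did you restrict the number of points in your own dataset?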

@Rogerlv51

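I just want to train a simple model and then visualize the point cloud completion. I'll also give the PNC model a try. Or do these point clouds perhaps not need any unified preprocessing when the dataset is built? Thanks!

How did you unify the number of points when matching your own dataset to ShapeNet-55? Did you modify the network, or did you restrict the number of points in your own dataset?

Just downsample to a consistent number of points. As I recall, the code already includes a ready-made FPS operation, so you can take a look at that.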
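For anyone preparing data offline, the downsampling step can be sketched with farthest point sampling (FPS) in NumPy. This is a plain CPU illustration of the technique, not the FPS operation shipped in the repository; `farthest_point_sample` is a hypothetical helper, and the target point count should be whatever your config expects.

```python
import numpy as np


def farthest_point_sample(points: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Pick n_samples points from an (N, 3) cloud so that they are spread out."""
    n = points.shape[0]
    if n_samples > n:
        raise ValueError("cannot sample more points than the cloud contains")
    rng = np.random.default_rng(seed)
    selected = np.empty(n_samples, dtype=np.int64)
    selected[0] = rng.integers(n)          # random starting point
    min_dist = np.full(n, np.inf)          # squared distance to nearest selected point
    for i in range(1, n_samples):
        diff = points - points[selected[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        selected[i] = int(np.argmax(min_dist))  # farthest from everything chosen so far
    return points[selected]


if __name__ == "__main__":
    cloud = np.random.default_rng(42).normal(size=(5000, 3))
    print(farthest_point_sample(cloud, 1024).shape)  # (1024, 3)
```

FPS costs O(N * n_samples) here, which is fine for one-off preprocessing; random subsampling is cheaper but covers the shape less evenly.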
