Error in save model during training #17

Open
godspeed1989 opened this issue Oct 22, 2018 · 6 comments

Comments

@godspeed1989

I am trying to train a model with the command

python pytorch/train.py train --config_path=./configs/car_test.config --model_dir=./predicts

but I encountered the following error message:

  File "pytorch/train.py", line 396, in train
    net.get_global_step())
  File "/mine/KITTI/second.pytorch.mine/torchplus/train/checkpoint.py", line 173, in save_models
    save(model_dir, model, name, global_step, max_to_keep, keep_latest)
  File "/mine/KITTI/second.pytorch.mine/torchplus/train/checkpoint.py", line 107, in save
    os.remove(str(Path(model_dir) / ckpt_to_delete))
FileNotFoundError: [Errno 2] No such file or directory: 'predicts/predicts/voxelnet-2487.tckpt'

The path seems incorrect (the directory name is repeated), which causes the file removal to fail.

@Benzlxs

Benzlxs commented Oct 23, 2018

Does the directory ./predicts already exist?
Make sure "/path/to/model_dir" doesn't exist if you want to train a new model. A new directory will be created if model_dir doesn't exist; otherwise the checkpoints already in it will be read.

@godspeed1989
Author

The directory ./predicts is a new directory created by train.py.
But the path predicts/predicts/voxelnet-2487.tckpt is incorrect. The correct path should be predicts/voxelnet-2487.tckpt.

@traveller59
Owner

Try using an absolute path for the model dir. I will attempt to fix the relative path problem later.
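
A minimal sketch of why an absolute path may sidestep the error. It assumes the stored checkpoint entry already contains the model directory, which is an inference from the traceback and not confirmed against checkpoint.py. The helper `joined` and the path /mine/predicts are hypothetical, used only for illustration. With pathlib, joining onto an absolute right-hand path discards the left-hand side, so the directory is not repeated:

```python
from pathlib import PurePosixPath


def joined(model_dir: str, stored_entry: str) -> str:
    """Join model_dir and a stored checkpoint entry the way `Path(a) / b` does.

    Hypothetical helper; PurePosixPath keeps the result platform independent.
    """
    return str(PurePosixPath(model_dir) / stored_entry)


# Relative model dir: the entry (assumed to include the directory) is appended,
# so the directory name shows up twice.
relative = joined("predicts", "predicts/voxelnet-2487.tckpt")

# Absolute model dir: the right-hand side is absolute, so pathlib drops the
# left-hand side and the path stays correct.
absolute = joined("/mine/predicts", "/mine/predicts/voxelnet-2487.tckpt")

print(relative)
print(absolute)
```

This only explains the workaround; it does not change how the entries are stored.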

@gujiaqivadin

Hello, did you solve the path problem? I have run into the same problem and have no idea how to solve it. The path turned out to be "model16/model16/voxelnet-2220.tckpt", but "model16/voxelnet-2220.tckpt" is the correct path.

@zmlll

zmlll commented Jan 2, 2021

Hi, I have run into the same problem. Have you solved it? How did you deal with it? Please let me know, thanks.

@huangbinz

I ran into the same problem and solved it. I am recording the fix here to help others.

Modify line 100 of second.pytorch/torchplus/train/checkpoint.py from ckpt_to_delete = all_ckpts.pop(0) to ckpt_to_delete = Path(all_ckpts.pop(0)).name, so that only the file name is joined onto model_dir.
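
For illustration, a small self-contained sketch of the same idea outside the repository. The function `remove_oldest` and the way the entries are stored (with the directory prefix included) are assumptions made for this example, not the actual code in checkpoint.py:

```python
import os
import tempfile
from pathlib import Path


def remove_oldest(model_dir, all_ckpts):
    """Delete the oldest checkpoint listed in all_ckpts (hypothetical helper)."""
    # Keep only the file name, so an entry stored as
    # "predicts/voxelnet-2487.tckpt" is not joined onto model_dir twice.
    ckpt_to_delete = Path(all_ckpts.pop(0)).name
    target = Path(model_dir) / ckpt_to_delete
    os.remove(str(target))
    return target


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "predicts"
    model_dir.mkdir()
    ckpt = model_dir / "voxelnet-2487.tckpt"
    ckpt.touch()

    # Assumption: entries were recorded with the directory prefix included.
    entries = ["predicts/voxelnet-2487.tckpt"]
    removed = remove_oldest(model_dir, entries)

    still_exists = ckpt.exists()  # False once the file has been removed
    remaining = list(entries)     # the oldest entry has been popped
```

Without the .name call, the join in this sketch would point at predicts/predicts/voxelnet-2487.tckpt inside the temporary directory and os.remove would raise FileNotFoundError, matching the traceback above.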
