Problem with load checkpoint #149

Closed
beebrain opened this issue Apr 8, 2017 · 22 comments

Comments

@beebrain

beebrain commented Apr 8, 2017

I have a small problem with checkpoints.
When I train a model, the program saves checkpoints in the path "./ckpt/cfg/". Loading works with "--load [numberstep]", but when I try to load the last checkpoint with "--load -1", the program reads the checkpoint file from "./ckpt/". There is no checkpoint file in that path; the checkpoint file is in "./ckpt/cfg/".
image
The checkpoint isn't in this folder
image

When I load with "--load -1":
image

@venuktan

venuktan commented Apr 8, 2017

The checkpoint file should be in the ckpt folder. Move it back and it should work.

@ghost

ghost commented Apr 8, 2017

Move the files in ckpt/cfg to ckpt/checkpoint and try again.
For example, move the files
yolo-bee-125.data-00000-00001
yolo-bee-125.index
yolo-bee-125.meta
yolo-bee-125-profile
to the folder ckpt/checkpoint,
then try again using the option --load 125 or --load -1.
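
A traceback later in this thread shows the loader opening `FLAGS.backup + 'checkpoint'`, so `--load -1` appears to depend on a small state file named `checkpoint` in the backup folder. The sketch below is not darkflow's code; it is a minimal illustration, assuming the standard TensorFlow state-file line `model_checkpoint_path: "<name>-<step>"`, of how the latest step could be recovered from that file. The function name `latest_step` is hypothetical.

```python
import os
import re


def latest_step(backup_dir):
    """Return the step recorded in <backup_dir>/checkpoint, or None.

    Hypothetical helper: assumes the TensorFlow state-file format, e.g.
    model_checkpoint_path: "yolo-bee-125"
    """
    state_file = os.path.join(backup_dir, "checkpoint")
    if not os.path.isfile(state_file):
        # No state file here, so there is nothing for "--load -1" to read.
        return None
    with open(state_file, "r") as f:
        for line in f:
            if line.startswith("model_checkpoint_path:"):
                # Keep everything after the first colon and drop the quotes.
                name = line.split(":", 1)[1].strip().strip('"')
                # The step is the trailing "-<digits>" of the checkpoint name.
                match = re.search(r"-(\d+)$", name)
                return int(match.group(1)) if match else None
    return None
```

If this returns None for the backup folder, the state file is missing or in a different folder, which matches the symptom described in the original post.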

@beebrain
Author

beebrain commented Apr 8, 2017

Thank you. It works.

@thtrieu thtrieu closed this as completed Apr 9, 2017
@geroge-gao

I ran into the same problem. Could you show me your entire command?

@anushabhura

@beebrain can you share the full command which worked for you?

@anushabhura

anushabhura commented Jan 24, 2020

step 124 - loss 60.73698425292969 - moving ave loss 64.46739123826117
Finish 124 epoch(es)
step 125 - loss 60.53150177001953 - moving ave loss 64.073802291437
Traceback (most recent call last):
  File "flow", line 6, in <module>
    cliHandler(sys.argv)
  File "C:\Users\intel\darkflow\cli.py", line 33, in cliHandler
    print('Enter training ...'); tfnet.train()
  File "C:\Users\intel\darkflow\net\flow.py", line 65, in train
    if not ckpt: _save_ckpt(self, *args)
  File "C:\Users\intel\darkflow\net\flow.py", line 21, in _save_ckpt
    with open(profile, 'wb') as profile_ckpt:
FileNotFoundError: [Errno 2] No such file or directory: './ckpt/cfg/tiny-yolo-voc-4c-125.profile'

For me, it works up to 125 epochs and then this error occurs.
How does the program create the checkpoint file?
I don't understand it.
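
One possible reading of the FileNotFoundError above is that the folder part of the profile path (`./ckpt/cfg/`) does not exist, since `open(..., 'wb')` creates the file but not its parent folders. The sketch below only illustrates that behaviour and a common workaround; `save_profile` is a hypothetical name, not a darkflow function.

```python
import os


def save_profile(profile_path, payload):
    """Write bytes to profile_path, creating missing parent folders first.

    Hypothetical helper for illustration only.
    """
    parent = os.path.dirname(profile_path)
    if parent:
        # open(..., "wb") fails with FileNotFoundError if this folder is missing.
        os.makedirs(parent, exist_ok=True)
    with open(profile_path, "wb") as profile_ckpt:
        profile_ckpt.write(payload)
    return profile_path
```

Under that assumption, creating the `cfg` folder inside `ckpt` by hand before training would have the same effect.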

@AsithaIndrajith

@anushabhura This is probably because your computer ran out of storage, so it cannot create a checkpoint at the 125th step. You can use Google Colab with your Google Drive.

@anushabhura

anushabhura commented Feb 6, 2020 via email

@anushabhura

anushabhura commented Feb 6, 2020 via email

@beebrain
Author

beebrain commented Feb 6, 2020

#869 I tried this too, but this command (python flow --model cfg/tiny-yolo-voc-1c.cfg --load -1 --savepb) gives me an error.
Traceback (most recent call last):
  File "flow", line 6, in <module>
    cliHandler(sys.argv)
  File "C:\Users\intel\darkflow\cli.py", line 26, in cliHandler
    tfnet = TFNet(FLAGS)
  File "C:\Users\intel\darkflow\net\build.py", line 88, in __init__
    self.setup_meta_ops()
  File "C:\Users\intel\darkflow\net\build.py", line 163, in setup_meta_ops
    if self.FLAGS.load != 0: self.load_from_ckpt()
  File "C:\Users\intel\darkflow\net\help.py", line 23, in load_from_ckpt
    with open(self.FLAGS.backup + 'checkpoint', 'r') as f:
PermissionError: [Errno 13] Permission denied: './ckpt/checkpoint'

In Anaconda Jupyter, the ckpt folder has been created by the program, but no files are being created inside it.

Could you show an example of the command you ran?
I think your script needs permission to write the checkpoint file.
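
A PermissionError on a read-mode `open` does not always mean missing access rights. On Windows, calling `open()` on a path that is a folder also raises `PermissionError: [Errno 13]`. The sketch below is a small diagnostic, not part of darkflow, for telling these cases apart; the function name and the default `./ckpt/` folder are assumptions taken from the paths shown in the traceback.

```python
import os


def classify_checkpoint_path(backup="./ckpt/"):
    """Describe what <backup>/checkpoint currently is on disk.

    Hypothetical diagnostic helper; 'backup' mirrors the ./ckpt/ path
    seen in the traceback and is an assumption.
    """
    path = os.path.join(backup, "checkpoint")
    if os.path.isdir(path):
        # A folder named "checkpoint" cannot be opened as a text file.
        return "directory"
    if os.path.isfile(path):
        return "file" if os.access(path, os.R_OK) else "unreadable file"
    return "missing"
```

A result of "directory" would suggest that a folder named `checkpoint` is sitting where the loader expects a text file.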

@anushabhura

anushabhura commented Feb 6, 2020 via email

@anushabhura

anushabhura commented Feb 6, 2020 via email

@beebrain
Author

beebrain commented Feb 6, 2020

I think the file tiny-yolo-voc-1c-100.profile should be created, but my ckpt folder is empty every time I train.

On Thu, Feb 6, 2020 at 10:21 AM Anusha Bhura wrote:
python flow --model cfg/tiny-yolo-voc-1c.cfg --load tiny-yolo-voc-.weights --train --annotation darkflow/annotation --dataset darkflow/image_files --epoch 250
I am using the above command and it is again giving me this checkpoint error.
Finish 97 epoch(es)
step 98 - loss 106.16637420654297 - moving ave loss 106.27569753284817
Finish 98 epoch(es)
step 99 - loss 106.19925689697266 - moving ave loss 106.26805346926062
Finish 99 epoch(es)
step 100 - loss 106.1893081665039 - moving ave loss 106.26017893898495
Traceback (most recent call last):
  File "flow", line 6, in <module>
    cliHandler(sys.argv)
  File "C:\Users\intel\darkflow\cli.py", line 33, in cliHandler
    print('Enter training ...'); tfnet.train()
  File "C:\Users\intel\darkflow\net\flow.py", line 66, in train
    if not ckpt: _save_ckpt(self, *args)
  File "C:\Users\intel\darkflow\net\flow.py", line 21, in _save_ckpt
    with open(profile, 'wb') as profile_ckpt:
FileNotFoundError: [Errno 2] No such file or directory: './ckpt/cfg/tiny-yolo-voc-1c-100.profile'

Yes, the checkpoint file should be created in the folder, but the program needs permission to create it. Please check the write permission on the folder.

@beebrain
Author

beebrain commented Feb 6, 2020

May I see the contents of your /ckpt/ folder?

@anushabhura

anushabhura commented Feb 6, 2020 via email

@beebrain
Author

beebrain commented Feb 6, 2020

@anushabhura Sorry, I can't see your attached file.
What is your operating system?

@anushabhura

anushabhura commented Feb 6, 2020 via email

@beebrain
Author

beebrain commented Feb 7, 2020

Check that Jupyter is working in the path you expect. Run pwd in a cell in Jupyter Notebook.
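
The same check can be done from Python inside a notebook cell. The sketch below is only an illustration: it reports the current working directory and whether a relative backup folder resolves to an existing, writable folder. `backup_dir_report` is a hypothetical name, and the default `./ckpt/` is taken from the paths in the tracebacks in this thread.

```python
import os


def backup_dir_report(backup="./ckpt/"):
    """Summarise where a relative backup path points and whether it is usable.

    Hypothetical helper; relative paths resolve against os.getcwd().
    """
    absolute = os.path.abspath(backup)
    return {
        "cwd": os.getcwd(),
        "backup_abs": absolute,
        "exists": os.path.isdir(absolute),
        # os.access returns False when the path does not exist.
        "writable": os.access(absolute, os.W_OK),
    }
```

If `cwd` is not the darkflow folder, `./ckpt/` resolves somewhere else, and checkpoints would be written to or read from that other location.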

@anushabhura

anushabhura commented Feb 7, 2020 via email

@anushabhura

anushabhura commented Feb 7, 2020 via email

@beebrain
Author

beebrain commented Feb 10, 2020

Do I have to change the working directory so that checkpoint files can be saved?

On Fri, Feb 7, 2020 at 10:54 AM Anusha Bhura wrote: It's "C:\users\intel".

Yes, you should.

@anushabhura

anushabhura commented Feb 11, 2020 via email
