Errors with Train.py #25
Hi @luke2997, it seems that you have not passed the GPU ids as a string. If using GPUs 0 and 1, use:
Please let us know if this fixes the issue.
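To illustrate why the GPU ids go in as one string, here is a minimal sketch of how a training script might consume such a flag. The flag name matches the `Argv: train.py --gpu=0,1` line in the log in this thread, but `parse_args` and `select_gpus` are hypothetical helpers, not hover_net's actual code, and setting `CUDA_VISIBLE_DEVICES` from the flag is an assumption about how the script uses it.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse a hypothetical --gpu flag, kept as a raw comma-separated string."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu", type=str, default="0",
                        help="comma-separated GPU ids, e.g. '0,1'")
    return parser.parse_args(argv)


def select_gpus(gpu_arg):
    """Expose only the requested GPUs to CUDA and return how many there are."""
    # Split "0,1" into ["0", "1"], ignoring stray spaces and empty entries.
    ids = [g.strip() for g in gpu_arg.split(",") if g.strip()]
    # CUDA reads this variable when the framework initialises the devices.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(ids)
    return len(ids)
```

With `python train.py --gpu=0,1` this exposes two devices, which is consistent with the "Training a model of 2 towers" message in the log.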
@simongraham, unfortunately this doesn't fix the problem.
Can you tell me which tensorpack version you are using, and paste the command you run in the terminal?
The output is telling you that there is already a checkpoint file where you plan to save your logs. You need to press k (keep), d (delete) or q (quit), depending on what you want to do. I'm still not sure exactly what you are asking here. Are you getting an error? If so, please supply the terminal output.
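The keep/delete/quit prompt is read with Python's `input()`, which raises `EOFError: EOF when reading a line` when standard input is not interactive (for example when stdin is redirected from `/dev/null`); that is the error in the traceback in this thread. One way around it is to choose the action up front. Recent tensorpack releases accept `logger.set_logger_dir(dirname, action="k")` (or `"d"`) to skip the prompt, but whether the copy at `../tensorpack/utils/logger.py` supports this is an assumption to check. The sketch below is a hypothetical stand-alone helper, `prepare_log_dir`, that mimics the keep/delete/quit logic and turns the `EOFError` into a clear message; it is not tensorpack's implementation.

```python
import os
import shutil


def prepare_log_dir(path, action=None, ask=input):
    """Create `path` for logging, resolving an existing directory via `action`.

    action: 'k' keep, 'd' delete, 'q' quit; None means ask interactively.
    """
    if os.path.isdir(path):
        if action is None:
            try:
                action = ask("Select Action: k (keep) / d (delete) / q (quit):")
            except EOFError:
                # No interactive stdin, so the prompt cannot be answered.
                raise SystemExit(
                    "{} already exists and no action was given; run "
                    "interactively or pass action='k'/'d'/'q'".format(path)) from None
            action = action.lower().strip()
        if action == "d":
            shutil.rmtree(path)  # delete old logs and checkpoints
        elif action == "q":
            raise SystemExit("{} already exists, quitting.".format(path))
        elif action != "k":
            raise ValueError("Unknown action: {!r}".format(action))
    os.makedirs(path, exist_ok=True)  # no-op when the directory was kept
    return path
```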
Yes, I did delete it. I think I now realise the source of the issue: a couple of modules are not importing properly because of libpng12. I'll update when I get this resolved.
I will close this issue for now. Please reopen it if necessary, with a specific question, so that we can be of more assistance.
@luke2997, for the record, you can reopen it yourself.
hover_net/src/extract_patches.py Lines 27 to 28 in 909ef03
Also, please provide as many details as possible about what you have changed in the code compared to the GitHub version.
Well, I only changed the paths, which is why I don't understand why the error occurred. So I have the same lines as those above. I tried running config.py and it seems I get an error related to #12 (comment). So it seems to be an error with my paths, although these are the same paths I used before, which worked then! One thing that may be causing it: when I first run the code I get this error:
And I export it using
which fixes that error but causes the error above.
That is very strange; maybe your new environment broke something. For now, you can try changing hover_net/src/misc/patch_extractor.py Line 82 in 909ef03
into
Also, the preferred library versions are listed here: https://github.com/vqdang/hover_net/blob/master/requirements.txt, so you may want to check whether your installed versions match. In case you need to reinstall, use the following as a guideline.
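One way to check whether the installed versions match is to compare `requirements.txt` against the installed package metadata. The sketch below uses the standard-library `importlib.metadata` (Python 3.8+) and falls back to the `importlib_metadata` backport on Python 3.6/3.7. The helper names are hypothetical; it only compares exact `name==version` pins, ignores other specifiers such as `>=`, and package-name normalisation (for example `-` versus `_`) may differ between Python versions.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.6/3.7: pip install importlib_metadata
    import importlib_metadata as metadata


def parse_pins(lines):
    """Return {package: version} for requirement lines of the form 'name==version'."""
    pins = {}
    for raw in lines:
        # Drop comments and environment markers, then surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # skip blank lines and unpinned requirements
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (pinned, installed)} for every pin that does not match."""
    mismatches = {}
    for name, wanted in pins.items():
        have = lookup(name)
        if have != wanted:
            mismatches[name] = (wanted, have)
    return mismatches


def report(path="requirements.txt"):
    """Print each pinned package whose installed version differs, and return them."""
    with open(path) as fh:
        mismatches = find_mismatches(parse_pins(fh))
    for name, (wanted, have) in sorted(mismatches.items()):
        print(f"{name}: pinned {wanted}, installed {have or 'not installed'}")
    return mismatches
```

Running `report()` from the repository root, inside the activated environment, lists only the packages that need attention.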
Perfect, thanks a lot! Changing that line of code fixed it and I was able to extract patches successfully. However, I get a few errors with train.py suggesting that TensorFlow GPU support is not working, so I will create a new virtual environment as you suggested and try again, since perhaps there was a fault with tensorflow-gpu. It is a bit of a pain getting these packages installed together, for some reason.
Hi @luke2997, as @vqdang suggested, please set up your environment from scratch to ensure there are no issues with library versions.
Installing the libraries should be simple enough if you follow @vqdang's instructions; it is only a few lines in the terminal. Of course, make sure you run the commands separately, line by line. Once this has been done, let us know and we can advise how to proceed. What CUDA version do you have installed?
I'm in mainland China, so some channels are blocked; for example, the command above doesn't work for tensorflow. Anyway, I believe I have all the requirements, and after restarting I got a little further, so I assume the GPU is working now! I do get output, but also the long error below. I have cudatoolkit 9.2 and cuDNN 7.6.5.
It looks like you are using Python 2.7. As stated in the requirements, you need to use Python 3.6.
I mean, the virtual environment is fully set up with Python 3.6... however, it seems to be initialising with 2.7, as you said. I'll try to see how I can fix this. Yes, it seems one of the packages I installed manually through conda simultaneously downgraded Python. Will update.
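Since the environment was created with Python 3.6 but the code still ran under 2.7, it can help to print which interpreter is actually executing the script and fail early on the wrong one. A minimal sketch, assuming the 3.6 minimum stated in this thread; `describe_interpreter` and `check_python` are hypothetical helpers, not part of hover_net. From the shell, `python -c "import sys; print(sys.executable)"` inside the activated environment gives the same path information.

```python
import sys

# Minimum version taken from this thread; adjust if requirements.txt says otherwise.
REQUIRED = (3, 6)


def describe_interpreter():
    """Summarise which interpreter is actually running (version and path)."""
    major, minor = sys.version_info[:2]
    # str.format rather than f-strings, so this file still parses under Python 2.7.
    return "Python {}.{} at {}".format(major, minor, sys.executable)


def check_python(version_info=None, required=REQUIRED):
    """Raise RuntimeError if the interpreter is older than `required`."""
    if version_info is None:
        version_info = sys.version_info
    if tuple(version_info[:2]) < tuple(required):
        raise RuntimeError(
            "Found Python {}.{}, but {}.{} or newer is required. "
            "Check which `python` the environment resolves to.".format(
                version_info[0], version_info[1], required[0], required[1]))


if __name__ == "__main__":
    print(describe_interpreter())
    check_python()
```

Calling `check_python()` at the top of a script turns a confusing failure deep inside the code into an explicit version message.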
Right, thanks a lot for the help, I appreciate it. I've successfully trained on the data after changing the Python version!
I've extracted image patches successfully; however, I get the following error when running train.py. Any ideas?
  File "train.py", line 184, in run_once
    logger.set_logger_dir(save_dir)
  File "../tensorpack/utils/logger.py", line 131, in set_logger_dir
    action = input("Select Action: k (keep) / d (delete) / q (quit):").lower().strip()
EOFError: EOF when reading a line
Any ideas? This is what my log file contains:
[0303 01:56:20 @logger.py:90] Argv: train.py --gpu=0,1
[0303 01:56:39 @training.py:50] [DataParallel] Training a model of 2 towers.
[0303 01:56:39 @interface.py:31] Automatically applying QueueInput on the DataFlow.
[0303 01:56:39 @interface.py:43] Automatically applying StagingInput on the DataFlow.