
metrics are not displayed during training #9136

Open
DanilKonon opened this issue Aug 21, 2020 · 8 comments

Comments

@DanilKonon

Hi,

I installed TensorFlow 2.2 and I am using efficientdet_d2_coco17_tpu-32.
I managed to start training this model with this command:

!python3 ./models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path=./pipeline.config \
    --model_dir=./efficient_det \
    --batch_size=4 \
    --num_train_steps=150_000 \
    --sample_1_of_n_eval_examples=4 \
    --alsologtostderr

But during training it only outputs the loss, without the metrics evaluated on the eval set that TensorFlow Object Detection 1 used to show:

I0821 09:23:17.842881 140282024908672 model_lib_v2.py:652] Step 16800 per-step time 1.223s loss=0.788
INFO:tensorflow:Step 16900 per-step time 1.076s loss=0.503
I0821 09:25:09.853582 140282024908672 model_lib_v2.py:652] Step 16900 per-step time 1.076s loss=0.503
INFO:tensorflow:Step 17000 per-step time 1.252s loss=1.163
I0821 09:27:01.702962 140282024908672 model_lib_v2.py:652] Step 17000 per-step time 1.252s loss=1.163
INFO:tensorflow:Step 17100 per-step time 1.072s loss=0.916
I0821 09:28:55.677012 140282024908672 model_lib_v2.py:652] Step 17100 per-step time 1.072s loss=0.916
INFO:tensorflow:Step 17200 per-step time 1.138s loss=0.819

How can I see my metrics?

Also, there are event files in the train folder. They are initialised at the beginning, and then nothing happens to them afterwards. How can I get the events updated so that I can see the model metrics and progress in TensorBoard?
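
(For reference: both the training and the evaluation summaries are written to subdirectories of --model_dir, so assuming the --model_dir=./efficient_det from the command above, TensorBoard only needs

tensorboard --logdir=./efficient_det

and the eval metrics will appear as soon as something actually writes them, which is the crux of this issue.)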

@saikumarchalla saikumarchalla self-assigned this Aug 22, 2020
@saikumarchalla saikumarchalla added the models:research models that come under research directory label Aug 22, 2020
@dinis-rodrigues

Yup, same issue here. In TF 1 this works properly, while in TF 2 it does not...

@TolgaBkm

I also have the same issue. I have to stop the training once in a while, run evaluation manually and resume the training process afterwards.
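
For reference, model_main_tf2.py switches into eval-only mode when --checkpoint_dir is passed; a minimal sketch of that manual evaluation step, assuming the paths from the original post:

python3 ./models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path=./pipeline.config \
    --model_dir=./efficient_det \
    --checkpoint_dir=./efficient_det \
    --alsologtostderr

It evaluates the latest checkpoint found in --checkpoint_dir, writes the COCO metrics to an eval events file under --model_dir, and then keeps waiting for new checkpoints until --eval_timeout expires.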

@ecatkins

ecatkins commented Sep 1, 2020

Also hitting the same issue while trying to port my code over from TF1 to TF2. I was previously using Weights & Biases to sync to TensorBoard so that I could monitor training progress... and now I'm not sure what to do.

@dinis-rodrigues

Just checked the related issues; it seems that evaluation while training, as we did it in TF 1, is not supported by TF2's model_main_tf2.py.

@cl886699

cl886699 commented Sep 3, 2020

Me too.

@qraleq

qraleq commented Sep 16, 2020

Hi, any update on this issue?

@LaraNeves

I found this tutorial really useful for getting evaluation on TensorBoard while training the model with TF2. Check it for more details, but basically you have to run two instances of the model_main_tf2.py script in parallel: one training with the training dataset, the other evaluating with the validation dataset. You can either use 2 GPUs or, if you have only one, use the GPU for training and the CPU for evaluating; the tutorial explains how, and a minimal sketch of that setup is below.
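
A sketch reusing the paths from the original post; CUDA_VISIBLE_DEVICES=-1 hides the GPU from the second process so evaluation falls back to the CPU:

# Process 1: training on the GPU
python3 ./models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path=./pipeline.config \
    --model_dir=./efficient_det \
    --alsologtostderr

# Process 2: continuous evaluation on the CPU
CUDA_VISIBLE_DEVICES=-1 python3 ./models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path=./pipeline.config \
    --model_dir=./efficient_det \
    --checkpoint_dir=./efficient_det \
    --alsologtostderr

The eval process polls --checkpoint_dir for new checkpoints and writes its summaries under --model_dir, next to the training ones, so both appear in the same TensorBoard logdir.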

@DanilKonon
Author

I understand that we can run the two scripts in parallel. But what should I do if I run everything in Colab? I thought almost everyone here uses Colab...
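
One possible workaround in Colab is to background the eval process from a shell cell before starting training; a sketch under the same path assumptions as above (the nohup/& backgrounding is a generic shell trick, not part of the API, and note that both processes then share the single Colab runtime's RAM):

# Start continuous CPU-only evaluation in the background, logging to eval.log
!CUDA_VISIBLE_DEVICES=-1 nohup python3 ./models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path=./pipeline.config \
    --model_dir=./efficient_det \
    --checkpoint_dir=./efficient_det > eval.log 2>&1 &

# Then train in the foreground as before
!python3 ./models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path=./pipeline.config \
    --model_dir=./efficient_det \
    --alsologtostderr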

@jaeyounkim jaeyounkim added models:research:odapi ODAPI and removed models:research models that come under research directory labels Jun 25, 2021