
Tensorboard does not correctly show steps after continuation of experiment #734

kocmitom opened this issue Jul 11, 2018 · 16 comments

@kocmitom
Contributor

As discussed earlier today:
After continuation of an experiment, TensorBoard starts counting from zero again. This is caused by the variable seen_instances:

https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/learning_utils.py#L523
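
A minimal sketch of the pattern behind the bug (hypothetical names; the real code in learning_utils.py differs in detail): because seen_instances is a plain Python integer rather than a checkpointed TensorFlow variable, every (re)start of training begins counting from zero.

```python
import tensorflow as tf

# Sketch of the problematic pattern (not the literal NeuralMonkey code):
# seen_instances is a plain Python int, so it is NOT part of the
# checkpoint and resets to 0 whenever training is (re)started.
seen_instances = 0

writer = tf.summary.FileWriter("logs/demo")
for batch in [["a"], ["b", "c"], ["d"]]:  # dummy batches
    seen_instances += len(batch)
    summary = tf.Summary(value=[tf.Summary.Value(tag="loss", simple_value=1.0)])
    # The TensorBoard step axis is whatever we pass here; after a
    # restart it begins at 0 again because seen_instances was reset:
    writer.add_summary(summary, global_step=seen_instances)
```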

kocmitom added the bug label Jul 11, 2018
@varisd
Member

varisd commented Jul 13, 2018

It seems the easiest fix would be replacing seen_instances with the global_step variable. This would change the interpretation of the logging_period/validation_period attributes.
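
For reference, a minimal sketch of how that could look (assuming TF 1.x; the loss here is a dummy): global_step is an ordinary TF variable, so tf.train.Saver includes it in checkpoints and a continued run resumes counting where it stopped.

```python
import tensorflow as tf

x = tf.get_variable("x", shape=[], initializer=tf.ones_initializer())
loss = tf.square(x)  # dummy loss just for the sketch

# global_step is a regular TF variable: tf.train.Saver checkpoints it,
# so a continued run resumes the step count instead of restarting at 0.
global_step = tf.train.get_or_create_global_step()
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
    loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        _, step = sess.run([train_op, global_step])
        # logging_period / validation_period would now be counted in
        # update steps (batches) instead of seen training instances.
```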

@jlibovicky
Contributor

Global step and seen instances are different quantities on purpose. Logging based on seen instances allows us to compare learning curves from training runs that use different batch sizes. On the other hand, the optimizer needs to use the global step, not the number of processed data items.

@jindrahelcl
Member

jindrahelcl commented Jul 13, 2018 via email

@varisd
Member

varisd commented Jul 13, 2018

I don't think that comparing learning curves with regard to the number of seen instances is relevant. We do not care after how many sentences we got a certain improvement, only after how many updates (batches) we got to a certain point.

There is a huge difference between doing 64 updates with batch_size 1 (many updates, probably in various directions) and doing one update with batch_size 64 (note that the size of that update may be similar to the one from batch_size 1). That is, the size of each update is usually normalized (we use a mean loss); only the direction should be (in theory) better estimated with a larger batch. See the toy illustration below.

So it is IMO more reasonable to compare convergence with regard to the number of updates than the number of seen instances. Alternatively, you can use relative time for another comparison.
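
A toy NumPy illustration of that point (made-up numbers, not training data): with a mean loss, averaging per-example gradients keeps the update magnitude comparable, while the noise shrinks and the direction improves.

```python
import numpy as np

rng = np.random.default_rng(0)
true_grad = rng.normal(size=10)               # shared "signal" direction
noise = rng.normal(scale=2.0, size=(64, 10))  # per-example noise
per_example_grads = true_grad + noise         # 64 noisy gradients

# One batch_size=64 update with a mean loss averages the per-example
# gradients: the noise shrinks roughly by 1/sqrt(64) while the shared
# signal is preserved, so the update stays a similar size but its
# direction estimates the true gradient much better.
batch_grad = per_example_grads.mean(axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(per_example_grads[0], true_grad))  # noisy batch_size=1 direction
print(cosine(batch_grad, true_grad))            # much closer to 1.0
```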

@varisd
Member

varisd commented Jul 13, 2018

Also, as far as model comparison goes:
If you're not measuring the effect of batch_size on your models, it is a bad idea to have different batch_sizes between runs you want to compare (which basically means that it does not matter whether steps are counted as global steps or as the number of seen instances).

@jindrahelcl
Member

jindrahelcl commented Jul 13, 2018 via email

@varisd
Member

varisd commented Jul 13, 2018

The original point was replacing seen_instances with global_step, which would considerably simplify the current implementation.

And my point is that sticking with seen_instances is not sufficiently justified.

@jlibovicky
Contributor

Indeed, what I wanted to say was: do not discard seen instances by replacing them with the global step.

And regarding the off-topic discussion: why do you think seen instances do not matter? It is actually the only measure of training progress that is comparable across training runs. If the number of seen instances really did not matter and the only important thing was the number of updates, then you could do batches of 1 or 2 and get many updates at the lowest cost possible, and this is simply not true. Relative time differs between machines. We can plot both, if you think it is important, but I would prefer to stick with the number of processed instances.
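
If we did want to plot both, one possible arrangement (a sketch only, assuming TF 1.x; not NeuralMonkey's actual API) is to write the same summaries into two sibling log directories, once keyed by the update step and once by seen instances; TensorBoard then shows them as two runs with different x-axes.

```python
import tensorflow as tf

writer_by_step = tf.summary.FileWriter("logs/by_step")
writer_by_instances = tf.summary.FileWriter("logs/by_instances")

def log_scalar(tag, value, global_step_value, seen_instances):
    summary = tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=value)])
    # The same data point under two different x-axes:
    writer_by_step.add_summary(summary, global_step=global_step_value)
    writer_by_instances.add_summary(summary, global_step=seen_instances)

log_scalar("loss", 0.5, global_step_value=10, seen_instances=640)
```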

@jindrahelcl
Member

jindrahelcl commented Jul 13, 2018 via email

@varisd
Member

varisd commented Jul 13, 2018

The number of seen instances is not comparable across runs with different batch_size, because you cannot tell how much of the difference between the runs is caused by the batching itself.

Simply put, try running two experiments with exactly the same parameters (except for the batch_size) and compare their performance at the exact same moment (after seeing N training instances). If they were comparable, the results would have to be the same. They will most likely not be. (The same applies to the global step.)

Bottom line: experiments with different batch_size parameters are not comparable, and it is not a good idea to try to compare them. The batch_size comparison argument for keeping seen_instances is therefore weak; moreover, there are arguments against it (and for global_step):

  1. Throwing away seen_instances (and using global_step) simplifies the code and logging in general. It also fixes the bug (since global_step is a saved variable).
  2. Other TF toolkits use global_step (at least t2t, as far as I know), which means easier comparison.

@jindrahelcl
Member

jindrahelcl commented Jul 13, 2018 via email

@varisd
Member

varisd commented Jul 13, 2018

Well, you just said that you can imagine both scenarios (without saying which one is better, as long as you stay consistent). In my post, I presented arguments for switching to global_step and banishing seen_instances, so I am getting a little bit lost.

We can keep this issue open and leave it to an offline discussion later.

@jindrahelcl
Member

jindrahelcl commented Jul 13, 2018 via email

@jlibovicky
Contributor

In case you don't remember, at the very beginning we logged in TB using the global step only, and we had a hard time even approximately comparing runs with different batch sizes. A model with double the batch size just appeared to converge twice as fast, but that was not true at all. To be able to at least somehow compare them, we introduced this number of processed instances. I am not claiming it is a very rigorous way of comparison, but it is at least some way of comparison.

@jindrahelcl
Member

jindrahelcl commented Jul 13, 2018 via email

@kocmitom
Contributor Author

kocmitom commented Jul 15, 2018

I would also say that having both metrics (seen examples as well as steps) would be the best option. But either way, after continuation of an experiment, the curve should continue at the correct step/seen_examples value instead of starting from 0; otherwise it makes a mess in TensorBoard.
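
One way to get that behaviour (a sketch only, not the actual NeuralMonkey implementation; assumes TF 1.x) is to keep seen_instances as a non-trainable TF variable, so that tf.train.Saver stores it in the checkpoint and a continued run resumes the count:

```python
import tensorflow as tf

# seen_instances as a non-trainable variable: it ends up in the
# checkpoint, so a continued run resumes at the correct value.
seen_instances = tf.get_variable(
    "seen_instances", shape=[], dtype=tf.int64,
    initializer=tf.zeros_initializer(), trainable=False)
batch_size = tf.placeholder(tf.int64, shape=[])
update_seen = tf.assign_add(seen_instances, batch_size)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    step = sess.run(update_seen, feed_dict={batch_size: 64})
    # Use `step` (the restored, cumulative count) as the TensorBoard
    # step, alongside tf.train.get_or_create_global_step() for updates.
```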
