Tensorboard does not correctly show steps after continuation of experiment #734
Seems like the easiest way would be replacing the seen_instances variable with global_step. This would change the interpretation of the logging_period/validation_period attributes. |
Global step and seen instances are different quantities on purpose. Logging based on seen instances allows us to compare learning curves from training runs that have different batch sizes. On the other hand, the optimizer needs to use the global step, not the number of processed data items. |
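To make the difference concrete, a small illustration (my sketch, not taken from the codebase): after consuming the same amount of training data, two runs with different batch sizes sit at very different global steps but at the same number of seen instances.

```python
# After the same 6400 training examples, the two counters diverge:
for batch_size in (1, 64):
    examples_consumed = 6400
    global_step = examples_consumed // batch_size   # number of parameter updates
    seen_instances = examples_consumed              # number of processed examples
    print(f"batch_size={batch_size}: step={global_step}, instances={seen_instances}")
# batch_size=1:  step=6400, instances=6400
# batch_size=64: step=100,  instances=6400
```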
Therefore, we need another variable for logging to TB to have nice continuous curves.
But I think seen_instances is suitable as well. The only thing you need is to be able to set up an initial value for this variable in the config (or somehow automatically).
|
I don't think that comparing learning curves with regard to the number of seen instances is relevant. We do not care after how many sentences we got a certain improvement, only after how many updates (batches) we got to a certain point. There is a huge difference between doing 64 updates with batch_size 1 (many updates, probably in various directions) and doing one update with batch_size 64 (note that the size of the update might be similar to the one from batch_size 1): the size of each update is usually normalized (we use mean loss); only the direction should be (in theory) better estimated with a larger batch. So it is IMO more reasonable to compare convergence with regard to the number of updates than to the number of seen instances. Alternatively, you can use relative time for another comparison. |
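As a toy illustration of this point (a sketch I am adding, not from the thread): with mean loss, a single update has roughly the same magnitude regardless of batch size, yet 64 batch-size-1 updates end up somewhere quite different from 1 batch-size-64 update.

```python
import numpy as np

# One-parameter model fitting targets x with squared loss (w - x)^2.
rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=64)
lr = 0.1

# 64 updates with batch_size=1: each step follows a single example.
w1 = 0.0
for x in data:
    w1 -= lr * 2 * (w1 - x)            # gradient of (w - x)^2

# 1 update with batch_size=64: gradient of the *mean* loss over all examples,
# so the update size is comparable to a single batch-size-1 step.
w64 = 0.0 - lr * 2 * np.mean(0.0 - data)

print(w1, w64)   # similar per-update magnitude, very different end points
```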
Also, as far as model comparison goes: if you're not measuring the effect of batch_size on your models, it is a bad idea to have different batch_sizes between runs you want to compare (which basically means that it does not matter whether steps are counted as global steps or as numbers of seen instances). |
It seems you are missing the original point of this issue.
|
The original point was replacing seen_instances by global_step, which would pretty much simplify the current implementation. And my point is that sticking with seen_instances is not justified well enough. |
Indeed, what I wanted to say was: do not discard seen instances by replacing them with the global step. And regarding the off-topic discussion: why do you think seen instances do not matter? It is actually the only measure of training progress that is comparable across training runs. If the number of seen instances really did not matter and the only important thing was the number of updates, then you could do batches of 1 or 2 and get many updates for the lowest cost possible, and this is simply not true. Relative time is different on different machines. We can plot both, if you think it is important, but I would prefer to stick with the number of processed instances. |
No. The original issue here is that after you run a continuation of an
experiment, the curves in tensorboard start at x=0. Instead, they should
start at x=(seen_instances from the previous run). Step is the number of
batches and has nothing to do with tensorboard's x-axis, which is
determined by the seen_instances variable.
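For context, a sketch of how the x-axis gets set when writing summaries (TF 1.x API; `loss_value` and `seen_instances` stand for values maintained by the training loop). The `FileWriter` argument happens to be named `global_step`, but it accepts any monotone counter, and it is the seen-instances count that is passed there; if that counter restarts at 0 on continuation, the curves restart at x=0 as well.

```python
import tensorflow as tf

writer = tf.summary.FileWriter("logs/experiment")
summary = tf.Summary(value=[tf.Summary.Value(tag="loss", simple_value=loss_value)])
# The x-axis position in TensorBoard is whatever value we pass here:
writer.add_summary(summary, global_step=seen_instances)
```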
|
The number of seen instances is not comparable across runs with different batch_size because you cannot tell how much the difference between runs is influenced by the batching itself. Simply put, try running two experiments with exactly the same parameters (except for the batch_size) and compare their performance at the exact same moment (after seeing N training instances). If they were comparable, the results would have to be the same. They will most likely not be. (The same applies to the global step.) Bottom line: experiments with different batch_size parameters are not comparable and it is not a good idea to try to compare them. The batch_size comparison argument for keeping seen_instances is therefore weak; however, there are arguments against it (and for global_step):
1. Throwing away seen_instances (and using global_step) simplifies the code and logging in general. It also solves the bug (since global_step is a saved variable).
2. Other TF toolkits use global_step (at least t2t, as far as I know), which means easier comparison.
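A sketch of point 1 above (TF 1.x; `loss` is an assumed, already-built training loss and the checkpoint path is illustrative): because global_step lives in the graph, `tf.train.Saver` checkpoints and restores it with everything else, so a continued run keeps counting from where the previous run stopped.

```python
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
train_op = tf.train.AdamOptimizer().minimize(loss, global_step=global_step)

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, "model.ckpt")   # restores global_step too
    print(sess.run(global_step))        # continues from the previous run
```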
|
Imagine you are doing a benchmark series of experiments that you train for
one epoch. You almost certainly want your learning curves to have the same
length, independently of any hyperparameter, including the batch size.
That said, I can imagine a scenario in which you want to have batches on
the x-axis instead of instances. However, you must agree that if such a
scenario exists, there is another scenario where you want exactly the
opposite behavior.
Also, in both scenarios you want your learning curves to continue from
the correct x-axis value when you are running a continuation of an
experiment. Right now, this means we need to allow tweaking the
seen_instances value, and not the global step.
|
Well, you just said that you can imagine both scenarios (without saying which one is better, as long as you stay consistent). In my post, I presented arguments for switching to global_step and banishing seen_instances, so I am getting a little bit lost. We can keep this issue open and leave it to an offline discussion later. |
This issue should be solved now, by allowing an initial value for seen
instances to be set.
Discussions may happen in the future.
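In a minimal sketch, the fix amounts to seeding the counter instead of always starting it at zero (`initial_seen_instances` and `run_training_step` are hypothetical names for illustration, not necessarily the actual neuralmonkey option):

```python
seen_instances = initial_seen_instances   # 0 for a fresh run, >0 on continuation

for batch in batches:
    run_training_step(batch)
    seen_instances += len(batch)          # the x-axis now continues across runs
```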
|
If you don't remember, at the very beginning we logged to TB using the global step only, and we had a hard time at least approximately comparing runs with different batch sizes. A model with double the batch size just appeared to converge twice as fast, but that was not true at all. To be able to at least somehow compare them, we introduced this number of processed instances. I am not claiming it is a very rigorous way of comparison, but it is at least some way of comparison. |
The best way would be to provide both of these values to tensorboard and
then switch between views, just as you switch between step, wall time, and
relative time.
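Since TensorBoard's built-in toggle only covers step, wall time, and relative time, one workaround sketch (names illustrative, as above) is to write each scalar twice, into sibling run directories keyed by the two counters, and compare the two runs side by side:

```python
import tensorflow as tf

by_step = tf.summary.FileWriter("logs/run1_by_step")
by_instances = tf.summary.FileWriter("logs/run1_by_instances")

summary = tf.Summary(value=[tf.Summary.Value(tag="loss", simple_value=loss_value)])
by_step.add_summary(summary, global_step=global_step)
by_instances.add_summary(summary, global_step=seen_instances)
```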
|
I would also say that having both metrics (seen examples as well as steps) would be the best option. But either way, after starting the continuation of an experiment, the curve should continue at the correct step/seen_examples value, not start from 0; otherwise it is a mess in tensorboard. |
As discussed earlier today:
After continuation, tensorboard starts counting from zero. It is due to the variable seen_instances:
https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/learning_utils.py#L523