Another ValidationMonitor with validation (+ early stopping) per epoch #133
From what I understand, the existing ValidationMonitor performs validation every `print_steps` steps and checks the stopping condition every `early_stopping_rounds` steps. I'd like to add another ValidationMonitor that performs validation and checks the stopping condition once every epoch. Is this the recommended practice in machine learning regarding validation and early stopping? I mean, I'd like to add a fit process something like the loop sketched in the first comment below.
Comments
@dansbecker I also noticed the inefficiency mentioned in #102 by @mheilman. I think the problem is in this loop: https://github.com/tensorflow/skflow/blob/master/skflow/trainer.py#L113. Can we consider moving `monitor.update()` to https://github.com/tensorflow/skflow/blob/master/skflow/estimators/base.py#L236?

```python
def fit(self, X, y, monitor=None, logdir=None):
    ...
    for epoch in range(monitor.n_epochs_max_tolerable):
        self._trainer.train()
        monitor.update()
        if monitor.monitor_inducing_stop():
            break
```

This way, the monitor is invoked once per epoch to check for over-fitting (or is it called over-training?) and stops the fit process when it occurs.
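For concreteness, here is a minimal sketch of what the `update()` / `monitor_inducing_stop()` pair above could do; the class, the `loss_fn` callable, and the `patience` parameter are hypothetical illustrations of the pseudocode, not skflow's actual API:

```python
# Hypothetical per-epoch monitor matching the pseudocode above.
# `loss_fn` is an assumed callable returning the current validation loss.
class EpochValidationMonitor:
    def __init__(self, loss_fn, patience=3, n_epochs_max_tolerable=100):
        self.loss_fn = loss_fn
        self.patience = patience
        self.n_epochs_max_tolerable = n_epochs_max_tolerable
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def update(self):
        # Called once per epoch: one validation pass, not one per step.
        val_loss = self.loss_fn()
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1

    def monitor_inducing_stop(self):
        # Over-training heuristic: no validation improvement for
        # `patience` consecutive epochs.
        return self.epochs_without_improvement >= self.patience
```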
Actually, maybe a better option is to have the monitor in a separate thread and just push some information into it from the main thread from time to time.
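A rough sketch of that idea using only the Python standard library; everything here (`validate`, `make_monitor_thread`, the queue protocol) is an assumed illustration rather than anything in skflow:

```python
import queue
import threading

# Hypothetical background monitor: the main loop stays unblocked while
# validation runs in a separate thread.
def make_monitor_thread(validate, early_stopping_rounds):
    updates = queue.Queue()
    stop_training = threading.Event()

    def worker():
        best, bad_rounds = float("inf"), 0
        while True:
            step = updates.get()          # blocks until the trainer pushes
            if step is None:              # sentinel: training finished
                return
            val_loss = validate()
            if val_loss < best:
                best, bad_rounds = val_loss, 0
            else:
                bad_rounds += 1
            if bad_rounds >= early_stopping_rounds:
                stop_training.set()       # ask the main loop to stop

    threading.Thread(target=worker, daemon=True).start()
    return updates, stop_training

# In the training loop (illustrative):
#   updates.put(step)                   # cheap, non-blocking
#   if stop_training.is_set(): break    # may lag if validation is slow
```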
I've struggled with the inefficiency mentioned here as well. My validation set is 25,000 records (30% of my data), and my mini-batch size is 20. When I use the ValidationMonitor, I end up training on 20 records and then calculating the validation error on 25,000 records, which slows my training by 100x or more. Putting the monitor in a separate thread, as @ilblackdragon suggested, is interesting but won't solve the problem in every case. For example, if training a mini-batch takes 1 second and calculating the validation error takes 100 seconds, the monitor thread will fall behind and won't be able to stop the training in time. I solved this locally by modifying ValidationMonitor so that it calculates the validation error only once every `print_steps` steps. To address the original issue of this thread (validation every epoch), the value of `print_steps` can be set to the number of steps in one epoch. If I get a thumbs up on this approach, I can create a PR for it.
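A back-of-envelope calculation with the numbers from this comment makes the overhead concrete:

```python
batch_size = 20
n_val = 25_000

# Validating after every step adds roughly n_val / batch_size
# batch-sized forward passes of extra work per training step:
extra_batches_per_step = n_val / batch_size
print(extra_batches_per_step)  # 1250.0

# Even if a training step (forward + backward) costs a few times more
# than a validation batch (forward only), the slowdown is still in the
# hundreds, consistent with the "100x or more" observed above.
```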
I think the problem you observe can be fixed by adding validation over …
@ilblackdragon That's a good solution. I remember seeing a discussion about supporting more early stopping options, and what you mentioned seems like it belongs as part of that. In the meantime, if someone needs an urgent fix, here are the two lines I changed to fix the performance issue for me: it simply calculates the validation error once every `print_steps` steps rather than on every step.
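The original diff isn't reproduced here, but the change amounts to a guard at the top of the monitor's per-step callback. A minimal sketch, with all names (`ThrottledValidationMonitor`, `compute_validation_error`) hypothetical rather than skflow's actual API:

```python
class ThrottledValidationMonitor:
    """Hypothetical monitor illustrating the two-line fix: validate
    only every `print_steps` steps instead of on every step."""

    def __init__(self, compute_validation_error, print_steps=100):
        self.compute_validation_error = compute_validation_error
        self.print_steps = print_steps

    def update(self, step):
        # The guard added by the fix: skip the expensive validation
        # pass unless this is a print step.
        if step % self.print_steps != 0:
            return
        self.compute_validation_error()
```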
Let me actually add this to master - I think it's an important fix.
I feel like this is addressed in the latest version. Please submit an issue/PR to TensorFlow if it's not. Thanks!