
Add support for validation sets #85

Closed
untom opened this issue Jan 19, 2016 · 6 comments

@untom commented Jan 19, 2016

It would be nice if skflow had some support for validation sets, to be used for early stopping and for monitoring the validation-set loss during training. This could be realized fairly easily by adding a fraction_validationset argument to TensorFlowEstimator. Within fit, the given training set could then be split into two parts.
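A minimal sketch of that split, assuming the hypothetical fraction_validationset parameter proposed above (the helper below is illustrative, not existing skflow API):

    import numpy as np

    def split_for_validation(X, y, fraction_validationset=0.1, seed=42):
        # Shuffle row indices, then hold out the first n_valid rows as
        # the validation set; the remainder stays the training set.
        rng = np.random.RandomState(seed)
        idx = rng.permutation(len(X))
        n_valid = int(len(X) * fraction_validationset)
        valid_idx, train_idx = idx[:n_valid], idx[n_valid:]
        return X[train_idx], y[train_idx], X[valid_idx], y[valid_idx]

Inside fit, the estimator would then train on the first pair and track loss on the second.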

@ilblackdragon (Contributor)

The validation set support is a good idea. One concern I have is how to fit this into the sklearn interface when a user wants to pass a specific validation set. Usually you split your dataset three ways - train, validation, and test - and use the validation set for hyperparameter search.
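For reference, the usual three-way split can be done with two calls to scikit-learn's train_test_split (shown with the modern sklearn.model_selection import path; the random data is a stand-in):

    import numpy as np
    from sklearn.model_selection import train_test_split

    X, y = np.random.rand(1000, 10), np.random.randint(0, 2, size=1000)

    # First carve off a test set, then split the remainder into train
    # and validation; the validation set drives hyperparameter search.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    X_train, X_valid, y_train, y_valid = train_test_split(
        X_rest, y_rest, test_size=0.25, random_state=0)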

@makseq (Contributor) commented Feb 3, 2016

I implemented it as a delegate validation function passed through fit(..., cross_valid_fn=my_func), where, for example, my_func() returns calc_rmse(test_data). It's very useful because the user can adjust my_func() however they want.
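A rough sketch of that delegate pattern (cross_valid_fn and calc_rmse are names from the comment above, not existing skflow API):

    import numpy as np

    def calc_rmse(estimator, X_valid, y_valid):
        # User-supplied validation metric: RMSE on a held-out set.
        preds = np.ravel(estimator.predict(X_valid))
        return np.sqrt(np.mean((preds - y_valid) ** 2))

    # Hypothetical usage: fit() would invoke the delegate periodically
    # during training, so the user decides what gets measured and how.
    # estimator.fit(X_train, y_train,
    #               cross_valid_fn=lambda: calc_rmse(estimator, X_valid, y_valid))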

@dansbecker (Contributor)

I was going to take a look at this (though I'm still learning my way around skflow).

I see an early_stopping_rounds argument in the estimators (TensorFlowEstimator and its derived classes), and the argument is passed to the TensorFlowTrainer, which appears to implement early stopping logic.

Is that existing early stopping logic different from what's suggested in this issue? I'll pursue this further once I understand how the two differ.

@ilblackdragon (Contributor)

@dansbecker Thanks for taking a look! Right now early stopping is done on the training loss - e.g. if training has converged, the model stops before the requested number of steps.

On the other hand, using a validation set is another option. So far, in the examples, we have been implementing it this way: https://github.com/tensorflow/skflow/blob/master/examples/resnet.py#L148
So one way this issue could be addressed is by adding something like
skflow.train(estimator, X_train, y_train, X_valid, y_valid, metric?) that runs this loop and also stops training if the validation metric stops improving.
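A rough sketch of what such a helper could look like (skflow.train does not exist yet; this assumes repeated fit() calls continue training from the current weights, as in the resnet example above):

    import numpy as np

    def train(estimator, X_train, y_train, X_valid, y_valid,
              metric, max_rounds=100, patience=5):
        # metric(y_true, y_pred) returns a score where higher is better.
        # Stop once the validation score has not improved for `patience`
        # consecutive rounds.
        best_score, stale_rounds = -np.inf, 0
        for _ in range(max_rounds):
            estimator.fit(X_train, y_train)
            score = metric(y_valid, estimator.predict(X_valid))
            if score > best_score:
                best_score, stale_rounds = score, 0
            else:
                stale_rounds += 1
                if stale_rounds >= patience:
                    break
        return estimator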

@dansbecker (Contributor)

Thanks @ilblackdragon. That makes sense.

On the issue of using the same data for early stopping and for hyperparameter search: Three options come to mind.

  1. Let the user pass in two sets of data to the fit method. One set for training the network, the second (which @untom calls validation in this issue) for determining when to stop training. As I think you mentioned, this raises the question of whether the same validation data can also be used for hyperparameter search.

    It feels reasonable if I think of the number of steps as a hyperparameter, in which case we'd expect the same data to determine the number of steps as is used to tune the other hyperparameters.

    However, I think it is the least consistent of the three with the sklearn interface.

    Incidentally, this is the approach that keras uses.

  2. Let the user specify an argument for what fraction of the training data is used only to determine early stopping (and not used to set weights). In this approach, the user doesn't specify a separate data set to be used for early stopping.

    This option may confuse some users, who expect all data in the training set to always be used to determine network weights. However, it's more consistent with the sklearn interface than the first option, and I think most users will find it the easiest to use.

  3. Let the user create a monitor object that tells the network when to stop. The user specifies the relevant data when creating that object, and the monitor is an optional argument to the fit method. This is how I interpret what @makseq described above (see the sketch after this list), and this post describes its use with sklearn's GradientBoostingClassifier.
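To make option 3 concrete, here is a rough sketch of such a monitor object (the class and the monitor= keyword are illustrative, not existing skflow API):

    class ValidationMonitor(object):
        # Holds the validation data the estimator should evaluate on.
        # The estimator would call update() with the validation loss
        # after each evaluation round and stop training once
        # should_stop() returns True.
        def __init__(self, X_valid, y_valid, patience=3):
            self.X_valid, self.y_valid = X_valid, y_valid
            self.patience = patience
            self.best_loss = float('inf')
            self.stale_rounds = 0

        def update(self, loss):
            if loss < self.best_loss:
                self.best_loss, self.stale_rounds = loss, 0
            else:
                self.stale_rounds += 1

        def should_stop(self):
            return self.stale_rounds >= self.patience

    # Hypothetical usage:
    # estimator.fit(X_train, y_train,
    #               monitor=ValidationMonitor(X_valid, y_valid))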

Thoughts?

@dansbecker (Contributor)

@ilblackdragon: The three options above are in addition to the one you mentioned:
skflow.train(estimator, X_train, y_train, X_valid, y_valid, metric?)

My inclination is towards either your suggestion, or option 2 or 3 from the note above.
