
Loss scores are different for a contiguous run of fit() for 200 steps and 4 runs of fit() for 50 steps each #88

Closed
olegarch opened this issue Jan 31, 2016 · 5 comments


@olegarch

I am doing regression with a DNN.

Final MSE for a contiguous run of 200 steps: 1.45781016655
Final MSE for 4 runs of 50 steps each: 1.44524233948
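
For context, a rough sketch of the two setups. The data here is synthetic, and the skflow parameter names (hidden_units, steps, continue_training) are assumptions about the early-2016 API rather than the original code:

```python
# Rough reconstruction of the comparison, not the original script.
# continue_training=True is an assumption about how the 4x50 runs
# continued training instead of restarting from scratch.
import numpy as np
import skflow

X = np.random.rand(1000, 5).astype(np.float32)  # stand-in data
y = X.sum(axis=1)

# One contiguous run of 200 steps.
reg_a = skflow.TensorFlowDNNRegressor(hidden_units=[10, 10], steps=200)
reg_a.fit(X, y)

# Four runs of 50 steps each on the same estimator.
reg_b = skflow.TensorFlowDNNRegressor(hidden_units=[10, 10], steps=50,
                                      continue_training=True)
for _ in range(4):
    reg_b.fit(X, y)
```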

Scores for the contiguous run:
Step #1, epoch #1, avg. loss: 27.95941
Step #21, epoch #21, avg. loss: 5.64051
Step #41, epoch #41, avg. loss: 1.78990
Step #61, epoch #61, avg. loss: 1.53639
Step #81, epoch #81, avg. loss: 1.49865
Step #101, epoch #101, avg. loss: 1.48255
Step #121, epoch #121, avg. loss: 1.47312
Step #141, epoch #141, avg. loss: 1.46747
Step #161, epoch #161, avg. loss: 1.46394
Step #181, epoch #181, avg. loss: 1.46122

Scores for 4 runs of 50 steps each:
Step #1, epoch #1, avg. loss: 27.95941
Step #6, epoch #6, avg. loss: 13.49244
Step #11, epoch #11, avg. loss: 4.11436
Step #16, epoch #16, avg. loss: 2.69326
Step #21, epoch #21, avg. loss: 2.26197
Step #26, epoch #26, avg. loss: 2.02976
Step #31, epoch #31, avg. loss: 1.79997
Step #36, epoch #36, avg. loss: 1.71287
Step #41, epoch #41, avg. loss: 1.61699
Step #46, epoch #46, avg. loss: 1.56702

Step #51, epoch #1, avg. loss: 1.52925
Step #56, epoch #6, avg. loss: 1.52344
Step #61, epoch #11, avg. loss: 1.51318
Step #66, epoch #16, avg. loss: 1.50661
Step #71, epoch #21, avg. loss: 1.50114
Step #76, epoch #26, avg. loss: 1.49584
Step #81, epoch #31, avg. loss: 1.49099
Step #86, epoch #36, avg. loss: 1.48698
Step #91, epoch #41, avg. loss: 1.48371
Step #96, epoch #46, avg. loss: 1.48097

Step #101, epoch #1, avg. loss: 1.47760
Step #106, epoch #6, avg. loss: 1.47609
Step #111, epoch #11, avg. loss: 1.47386
Step #116, epoch #16, avg. loss: 1.47201
Step #121, epoch #21, avg. loss: 1.47048
Step #126, epoch #26, avg. loss: 1.46914
Step #131, epoch #31, avg. loss: 1.46795
Step #136, epoch #36, avg. loss: 1.46686
Step #141, epoch #41, avg. loss: 1.46591
Step #146, epoch #46, avg. loss: 1.46506

Step #151, epoch #1, avg. loss: 1.46384
Step #156, epoch #6, avg. loss: 1.46348
Step #161, epoch #11, avg. loss: 1.46276
Step #166, epoch #16, avg. loss: 1.46212
Step #171, epoch #21, avg. loss: 1.46144
Step #176, epoch #26, avg. loss: 1.46086
Step #181, epoch #31, avg. loss: 1.46028
Step #186, epoch #36, avg. loss: 1.45976
Step #191, epoch #41, avg. loss: 1.45914
Step #196, epoch #46, avg. loss: 1.45857

@ilblackdragon (Contributor)

There is one reason why this could be happening: every time fit() restarts, data_feeder re-samples the data, so the model sees it in a different order.

I'll look into it more tomorrow to check that this is indeed the only reason.

Otherwise, a better comparison would be to let both cases train until convergence (i.e. until the loss stops going down); they should end up with very similar scores.
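
As a toy illustration of that re-sampling effect (plain numpy, not skflow's actual data_feeder): if every fit() call builds a fresh sampler, four 50-step calls see a different, partly repeated, batch order than a single 200-step call.

```python
# Toy sketch: a fresh sampler per fit() call means four short calls
# replay the sampler's first draws instead of continuing one stream.
import numpy as np

def batch_order(total_steps, n_calls, n_batches=10, seed=0):
    seen = []
    for _ in range(n_calls):
        rng = np.random.RandomState(seed)  # fresh sampler per call
        seen.extend(rng.randint(0, n_batches,
                                size=total_steps // n_calls).tolist())
    return seen

print(batch_order(8, 1))  # contiguous: eight draws from one stream
print(batch_order(8, 4))  # restarts: the first two draws, four times
```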

@olegarch (Author)

After each run of 50 steps, I was executing a session to get test-set results, and that was advancing the random generator state in the dropout operation. Dropout is still exercised for the non-training step, just with keep probability 1. So subsequent training was slightly different from the contiguous run. If I don't execute the session to get test results, or if I remove the dropout layer, the training results for the contiguous and non-contiguous runs match.
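
A minimal sketch of this mechanism (TF 1.x-style API): when keep_prob is fed through a placeholder, tf.nn.dropout cannot drop its random_uniform op at graph-build time, so even an evaluation run with keep_prob=1.0 consumes random numbers and shifts the op's RNG state.

```python
# Sketch: the same dropout op serves training and evaluation, so an
# eval pass with keep_prob=1.0 still advances the op's RNG stream.
import tensorflow as tf

tf.set_random_seed(42)
x = tf.ones([2, 3])
keep_prob = tf.placeholder(tf.float32)
dropped = tf.nn.dropout(x, keep_prob)  # random_uniform runs every time

with tf.Session() as sess:
    # Comment out this eval pass and the training draw below changes:
    sess.run(dropped, feed_dict={keep_prob: 1.0})
    print(sess.run(dropped, feed_dict={keep_prob: 0.5}))  # train pass
```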

@ilblackdragon (Contributor)

Yeah, one option is to skip dropout entirely in the non-training case (e.g. tf.cond(is_training, lambda: dropout(x, prob), lambda: x), since tf.cond takes callables), as sketched below. Do you feel this difference is a big deal, or does it converge to the same results after enough iterations?
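
A minimal sketch of that guard, with a hypothetical helper name and assuming a scalar boolean is_training tensor. Ops built inside a tf.cond branch only run when that branch is taken, so evaluation never executes dropout's random draw.

```python
# Hypothetical helper: dropout only on the training branch, identity
# otherwise, so the eval path leaves the dropout RNG untouched.
import tensorflow as tf

def dropout_if_training(x, keep_prob, is_training):
    return tf.cond(is_training,
                   lambda: tf.nn.dropout(x, keep_prob),
                   lambda: x)
```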

@olegarch (Author)

It's not a big deal. Convergence is similar in both cases.

@ilblackdragon (Contributor)

Closing as WAI (working as intended).
