
regression in skflow monitoring code #2487

Closed
laouer opened this issue May 24, 2016 · 2 comments

Comments

@laouer

laouer commented May 24, 2016

Using pip_gpu build #100, I noticed that an issue previously corrected by ilblackdragon (#2063) has re-occurred, and his PR (ref #2063) has disappeared from the master code.

Could the issue be corrected and the enhancement reintegrated?
Regards

@ilblackdragon
Contributor

Let me look into this - we have refactored a large chunk of code, including the monitoring code, so we may have missed some parts of the functionality.

@ilblackdragon
Contributor

FYI, I have a fix, but after adding some more tests to check that ValidationMonitor actually works correctly, I ran into a flaky test (the results change slightly every time I run it) - I'm trying to debug what is happening there.

We will cherry-pick that fix into the 0.9 release, though.

@vrv closed this as completed in 54cf160 on Jun 7, 2016
martinwicke pushed a commit to martinwicke/tensorflow that referenced this issue Jun 20, 2016
…n feed_dict or 1 epoch using readers.

* Improving the PrintTensor monitor to support tags for printed tensors (e.g. passing {'loss': loss_op} will now display loss = %f instead of the full name of the op).

* Improving ValidationMonitor to support various metrics, minimization/maximization, and naming so that multiple validations can be run.

* Make sure the early_stopping test actually stops early. Updated the example as well.
Note: the test is unstable, so the exact number of steps at which it stops is not reproducible. See the stability tests for more examples of such issues.

* Added GraphDump monitor for in-depth debugging.

* Added a stability test to make sure the same model trained on the same data gives exactly the same results. Note: it's all quite unstable; tolerance was increased just to make it pass. Possibly issues with numerical stability in TF.

* Changed max_steps in graph_actions.train into steps, which adds that many steps to the training (instead of just defining the maximum number of steps). For Estimator this restores the previous logic, where fit(..., steps=100) followed by fit(..., steps=100) results in 200 steps trained.
Change: 124197624
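
The sketches below illustrate some of the changes described in the commit message above. First, the tag support for the print monitor: the idea is that a dict mapping short tags to tensors lets the monitor log "loss = 0.42" instead of the op's full name. This is a hypothetical, standalone sketch of that mechanism, not the tf.contrib.learn API; the class and method names are made up for illustration.

```python
class TaggedPrintMonitor(object):
    """Prints selected values every `every_n` steps, keyed by a short tag."""

    def __init__(self, tensors_by_tag, every_n=100):
        # e.g. {'loss': loss_op}: the key is the tag shown in the log,
        # the value is the tensor/op whose evaluated value gets printed.
        self.tensors_by_tag = tensors_by_tag
        self.every_n = every_n

    def step_end(self, step, values_by_tag):
        # `values_by_tag` holds the evaluated values of the tracked tensors.
        if step % self.every_n == 0:
            parts = ['%s = %f' % (tag, values_by_tag[tag])
                     for tag in sorted(values_by_tag)]
            print('step %d: %s' % (step, ', '.join(parts)))


# Instead of "train/softmax_cross_entropy_loss:0 = 0.42" you now see "loss = 0.42".
monitor = TaggedPrintMonitor({'loss': 'loss_op'})  # 'loss_op' stands in for a real op
monitor.step_end(100, {'loss': 0.42})
```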
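
Second, the ValidationMonitor changes (arbitrary metrics plus minimization/maximization) come down to early-stopping bookkeeping like the following. This is a minimal sketch of that logic, assuming names such as early_stopping_rounds and minimize that are illustrative only, not the library's API.

```python
class EarlyStopper(object):
    """Tracks a validation metric and signals when training should stop."""

    def __init__(self, metric_name, early_stopping_rounds, minimize=True):
        self.metric_name = metric_name
        self.rounds = early_stopping_rounds
        self.minimize = minimize
        self.best_value = None
        self.best_step = None

    def update(self, step, metrics):
        """Record the latest validation metrics; return True to stop training."""
        value = metrics[self.metric_name]
        improved = (self.best_value is None or
                    (value < self.best_value if self.minimize
                     else value > self.best_value))
        if improved:
            self.best_value, self.best_step = value, step
        return step - self.best_step >= self.rounds


# Validation loss stops improving after step 200, so training halts at step 400.
stopper = EarlyStopper('validation_loss', early_stopping_rounds=200, minimize=True)
for step, loss in [(100, 0.9), (200, 0.7), (300, 0.71), (400, 0.72), (500, 0.73)]:
    if stopper.update(step, {'validation_loss': loss}):
        print('early stop at step %d (best %f at step %d)'
              % (step, stopper.best_value, stopper.best_step))
        break
```

This also makes the flakiness mentioned in the comments plausible: if the metric fluctuates slightly between runs, the step at which training stops can differ, which is exactly the non-reproducibility the commit message notes.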
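
Finally, the steps-versus-max_steps change: with steps, each fit() call trains for that many additional steps, so two fit(..., steps=100) calls give 200 steps in total, whereas with max_steps the second call would be a no-op once the global step already reached 100. The toy class below only illustrates the two semantics; it is not the real Estimator.

```python
class FakeEstimator(object):
    """Toy model of the two step-counting semantics; not the real Estimator."""

    def __init__(self):
        self.global_step = 0

    def fit_steps(self, steps):
        # `steps` semantics: each call trains for `steps` additional steps.
        self.global_step += steps

    def fit_max_steps(self, max_steps):
        # `max_steps` semantics: training stops once the global step reaches
        # max_steps, so a second call with the same value does nothing.
        self.global_step = max(self.global_step, max_steps)


a = FakeEstimator()
a.fit_steps(100)
a.fit_steps(100)
print(a.global_step)  # 200 - the two calls accumulate

b = FakeEstimator()
b.fit_max_steps(100)
b.fit_max_steps(100)
print(b.global_step)  # 100 - the second call is a no-op
```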