Eval gets stuck forever in the Trainer Component #113
Comments
Hi, are you running the taxi example or your own one?
I’m using my own one.
SchemaGen generates the schema from the train split, and ExampleValidator validates the eval split. Could you check the SchemaGen and ExampleValidator outputs to see if there is anything wrong in the data?
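Before digging into the component outputs, it may also be worth confirming that the eval split's file pattern matches files at all and that none of them are empty. Here is a minimal sketch using only the Python standard library; `summarize_split` and the pattern in the usage note are hypothetical names for illustration, not part of TFX:

```python
import glob
import os
from typing import Dict, List


def summarize_split(file_pattern: str) -> Dict[str, object]:
    """Report how many files a glob pattern matches and which are empty.

    Hypothetical helper for illustration; it only inspects the local
    filesystem and knows nothing about the record format.
    """
    paths: List[str] = sorted(glob.glob(file_pattern))
    sizes = {path: os.path.getsize(path) for path in paths}
    return {
        "num_files": len(paths),
        "empty_files": [path for path, size in sizes.items() if size == 0],
        "total_bytes": sum(sizes.values()),
    }
```

Calling it once per split, for example `summarize_split("/path/to/eval/*")` with your own eval path substituted, and comparing the result against the train split is a cheap way to rule out a missing or empty eval directory.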
@benjamintanweihao Can you attach the log for the trainer component? Assuming you're using Airflow (from your last reply), you can find the log under |
@ruoyu90 Yes, I'm using Airflow. Here you go:
I'm using |
I think I found the problem! I parsed the |
@benjamintanweihao Thanks for reporting this! This actually helped surface a bug in our trainer executor. The |
…r module to have same type (List[str]). PiperOrigin-RevId: 248586432
…r module to have same type (List[str]). Also removing unused output_dir in params passed to user module. PiperOrigin-RevId: 248586432
…r module to have same type (List[str]). Also removing unused output_dir in params passed to user module. PiperOrigin-RevId: 248637693
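The commit messages above are truncated, so exactly which values were changed is not shown here. Assuming they are file-pattern arguments handed to user code, a defensive way to tolerate either a single string or a list of strings is to normalize to `List[str]` before use. This is a minimal sketch; `as_file_list` is a hypothetical helper, not part of TFX:

```python
from typing import List, Sequence, Union


def as_file_list(files: Union[str, Sequence[str]]) -> List[str]:
    """Normalize a single pattern or a sequence of patterns to List[str].

    Hypothetical helper for illustration. A bare string is wrapped rather
    than iterated, because list("abc") would yield ["a", "b", "c"].
    """
    if isinstance(files, str):
        return [files]
    return list(files)
```

With this in place, code that expects a list behaves the same whether it receives `"data/eval/*"` or `["data/eval/*"]`.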
* RFC proposal on standartizing composite operations in tensorflow. * RFC: Standardizing Composite Operations In TensorFlow * typo fix * Update 20190610-standartizing-composite_ops.md typo fixes * Update rfcs/20190610-standartizing-composite_ops.md Co-Authored-By: Mehdi Amini <joker.eph@gmail.com> * Update 20190610-standartizing-composite_ops.md * Update 20190610-standartizing-composite_ops.md * Update 20190610-standartizing-composite_ops.md * Fixes type in file name
When I'm in the Trainer component, eval gets stuck forever:
Strangely, if I replace the eval example path with the training example path, it manages to progress to the model validator (though it then fails model validation).
Any pointers on how to debug this?