Cannot process dataset with 800+ features: Job graph is too large #57
Comments
Is it possible to use a few multi-dimensional features instead of many single-dimensional features? This may greatly improve performance. If your features are somehow grouped by categories, this may also be a logical way to arrange your features.
I pushed 2000 features by translating 2000 one-dimensional features into a single 2000-dimensional feature.
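For context, a minimal sketch of what that translation can look like at the feature-spec level (the feature names, count, and use of a dense float feature are assumptions for illustration, not the commenter's actual code):

```python
import tensorflow as tf

NUM_FEATURES = 2000  # hypothetical feature count

# Before: one FixedLenFeature per column -> 2000 separate tensors in the graph.
scalar_feature_spec = {
    'f_%d' % i: tf.io.FixedLenFeature([], tf.float32)
    for i in range(NUM_FEATURES)
}

# After: one 2000-dimensional feature -> a single tensor flows through
# tf.Transform, which keeps the serialized job graph small.
vector_feature_spec = {
    'features': tf.io.FixedLenFeature([NUM_FEATURES], tf.float32),
}
```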
Any update on this issue? I'm in a similar situation.
Are you able to use a similar workaround?
This workaround worked fine for us and actually helps make training faster too. The overall idea is to run the numeric/categorical features through tf-transform as one high-dimensional tensor, and at the end of the flow either unstack it or let the TF model fit on the high-dimensional tensor directly, like image training.
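In case it helps others hitting this, here is a minimal sketch of that pattern (a sketch only; the feature names and the choice of `tft.scale_to_z_score` are assumptions, not the actual pipeline):

```python
import tensorflow as tf
import tensorflow_transform as tft

# Hypothetical feature names for illustration.
NUMERIC_KEYS = ['f_%d' % i for i in range(2000)]

def preprocessing_fn(inputs):
    # Stack all scalar numeric columns into one [batch, 2000] tensor so that a
    # single analyzer node is added to the graph instead of one per feature.
    stacked = tf.stack([inputs[key] for key in NUMERIC_KEYS], axis=1)

    # elementwise=True scales each column independently, matching what 2000
    # separate scale_to_z_score calls would have computed.
    scaled = tft.scale_to_z_score(stacked, elementwise=True)

    # Option A: let the model consume the high-dimensional tensor directly,
    # as with image inputs.
    outputs = {'features': scaled}

    # Option B: unstack back into per-column outputs for models that expect
    # individual feature columns (e.g. boosted trees).
    for key, column in zip(NUMERIC_KEYS, tf.unstack(scaled, axis=1)):
        outputs[key + '_scaled'] = column
    return outputs
```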
I think tf-transform could fuse this automatically, but we haven't found the time to push the fix in. Also, providing an unstacked view is useful for tf.boosted_trees, but I don't think we can make it generic.
The team is working on this, and we encourage you to use the workaround provided above for now.
Any updates or workarounds for this feature request? I am now hitting the "graph too large" limit, but unfortunately the transformations I use cannot be concatenated together to be applied element-wise.
Yes, #110 would suffice for our needs. Just wondering if there are any workarounds for folks running into this issue with the current release.
No, there is no workaround for #110.
Any update on this? The workaround works for the case where the same transform applies to N features, so we can concat the N features first and then apply that transform. Another question: is there any rule-of-thumb guidance for avoiding "graph is too large", like "the number of features should be less than X" or "a certain transformation should be used fewer than M times"? Right now, as a user, I am in constant fear that the graph is too large; the only way to find out is to launch the job in Dataflow (which takes several minutes to wait) and hope for the best. This experience is not great.
There have been several improvements since the original issue. Regarding guidance on a rule of thumb for avoiding the "graph is too large" error, we can't provide one because (a) it depends on many factors that are specific to each user's inputs and preprocessing_fn, and (b) this is a Dataflow error that we can't really control.
Thanks @zoyahav |
Could you please confirm whether this issue can be closed? Thanks!
Apologies for the delay. This seems to be a duplicate of issue #223, and developers are reporting that using runner_v1 with upload_graph solves the problem, as per this comment. Could you please try that workaround and let us know whether it resolves your issue? Please feel free to close the issue if it is resolved. If the issue still persists, please share the error log if possible so we can investigate the root cause. Thank you!
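For anyone trying that suggestion, a rough sketch of the corresponding pipeline options (assumptions: a Beam Python pipeline launched on Dataflow, placeholder project/region/bucket values; check the Dataflow docs for the experiment flags supported by your SDK version):

```python
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner='DataflowRunner',
    project='my-gcp-project',            # placeholder
    region='us-central1',                # placeholder
    temp_location='gs://my-bucket/tmp',  # placeholder
    experiments=[
        'upload_graph',        # stage the job graph in GCS instead of
                               # embedding it in the job creation request
        'disable_runner_v2',   # keep the job on runner v1, per the comment above
    ],
)
```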
Hi @fabito, closing this issue due to lack of recent activity for a couple of weeks. Please feel free to reopen the issue or post a comment if you need any further assistance or updates. Thank you!
Hi,
I'm trying to submit a job to process a dataset (~850 features) in Cloud Dataflow.
The `preprocessing_fn` looks like this:
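(The original snippet is not preserved in this thread; the following is a minimal sketch of the typical per-feature pattern, with placeholder feature names and `tft.scale_to_z_score` as an example transform, not the author's actual code.)

```python
import tensorflow_transform as tft

# Hypothetical reconstruction for illustration only. One analyzer call per
# feature adds a separate subgraph per feature, which is roughly how ~850
# features inflate the serialized job graph.
NUMERIC_KEYS = ['feature_%d' % i for i in range(850)]  # placeholder names

def preprocessing_fn(inputs):
    outputs = {}
    for key in NUMERIC_KEYS:
        outputs[key + '_scaled'] = tft.scale_to_z_score(inputs[key])
    return outputs
```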
After a few minutes the job submission fails claiming that "The job graph is too large."
Has anyone seen this before? How can I work around it?
Detailed logs below: