Create a taxicab demo that runs in "Continuous Training" mode? #421

Closed
robertlugg opened this issue Aug 2, 2019 · 5 comments

Comments

@robertlugg
Contributor

In a recent paper, I see:

One key distinction is that between one-off and continuous pipelines. One-off pipelines are initiated by engineers to produce ML models “on demand”. In contrast, continuous pipelines are “always on”: they ingest new data and produce newly updated models continuously.

The Chicago taxicab example appears to be an "on-demand" pipeline. Any change to the data directory (changing a row in the *.csv or adding a new *.csv file) triggers a complete re-run of all the rows of all the *.csv files. From what I see, the current demo can't be run as a continuous pipeline.

Could you either correct me, explain how it might be done, or create a version of the taxicab demo that can run in continuous mode? I expect that it would watch for new *.csv files or changes to existing *.csv files and adjust the output TFRecord files of CsvExampleGen, processing only the changed rows or added files rather than every row again.
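To make the "watch for new or changed files" idea concrete, here is a minimal sketch using only the Python standard library. It is not part of TFX: the JSON manifest and the names `fingerprint` and `find_changed_csvs` are hypothetical, and it detects changes per file, not per row.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return a SHA-256 hex digest of the file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large CSV files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed_csvs(data_dir: Path, manifest_path: Path) -> list[Path]:
    """Return CSV files that are new or modified since the previous scan.

    The manifest is a hypothetical JSON file (not a TFX artifact) mapping
    file names to the digest recorded on the previous scan. It is rewritten
    on every call.
    """
    seen = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    changed = []
    for csv_path in sorted(data_dir.glob("*.csv")):
        digest = fingerprint(csv_path)
        if seen.get(csv_path.name) != digest:
            changed.append(csv_path)
            seen[csv_path.name] = digest
    manifest_path.write_text(json.dumps(seen, indent=2))
    return changed
```

Only the files returned by `find_changed_csvs` would then need to be handed to the ingestion step. How that step consumes a subset of files is exactly the open question in this issue.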

@1025KB
Collaborator

1025KB commented Aug 3, 2019

Currently, TFX OSS does not support continuous training yet. What you can do for now is trigger the pipeline periodically to mimic continuous mode.

There are similar feature requests [1][2]. Stay tuned, it's on our radar!
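As a rough illustration of "trigger the pipeline periodically", the loop below calls a pipeline launcher at a fixed interval. This is only a sketch: `run_pipeline` is a hypothetical zero-argument callable standing in for whatever starts one full pipeline run, and `run_periodically` is not a TFX function.

```python
import time
from typing import Callable


def run_periodically(
    run_pipeline: Callable[[], None],
    interval_seconds: float,
    max_runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run_pipeline up to max_runs times, waiting between calls.

    run_pipeline is a hypothetical callable that launches one complete
    pipeline run. A failed run is reported and the schedule continues.
    Returns the number of runs that finished without raising.
    """
    completed = 0
    for run_index in range(max_runs):
        try:
            run_pipeline()
            completed += 1
        except Exception as error:  # keep the schedule alive after a bad run
            print(f"run {run_index} failed: {error}")
        if run_index < max_runs - 1:
            sleep(interval_seconds)
    return completed
```

In practice a cron entry, or the schedule setting of whichever orchestrator runs the pipeline, would probably replace this loop. Note that every triggered run still reprocesses all input data, so this mimics continuous mode without making ingestion incremental.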

@robertlugg
Contributor Author

Hi @1025KB, I was curious whether you have made any progress, or whether you could give me a conceptual understanding of how this could be done. For instance, the taxicab demo passes a directory to CsvExampleGen. Would that be changed to take in individual files? Or individual lines?

On the idea of line-by-line processing: if each line operation takes a large amount of time, could I also data-distribute that? Would I just make up my own URI schema?

@singhniraj08
Contributor

@robertlugg,

Since a similar feature request, #210, is already in progress, please close this issue, then follow and +1 that thread for updates. Thanks.

@github-actions

This issue has been marked stale because it has had no activity for 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label May 13, 2023
@github-actions

This issue was closed due to lack of activity after being marked stale for the past 7 days.
