Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tf.data] move StructuredFunctionWrapper into a separate module #51947

Merged

Conversation

kvignesh1420
Copy link
Member

@kvignesh1420 kvignesh1420 commented Sep 11, 2021

This PR refactors dataset_ops.py by moving the StructuredFunctionWrapper class into a common module. This enables further refactoring of function-specific Datasets.

side note for future PR's: after further examination of the function-specific Dataset classes (for ex: MapDataset), I see that they inherit from UnaryDataset which itself inherits from DatasetV2. And since the DatasetV2 class itself needs MapDataset for the def map() functionality, this poses a circular dependency challenge even if we split the classes into different files. Thus, lazy loading would be imminent and the files might get a bit messy.

cc: @aaudiber What do you think?

@google-ml-butler google-ml-butler bot added the size:L CL Change Size: Large label Sep 11, 2021
@google-cla google-cla bot added the cla: yes label Sep 11, 2021
@kvignesh1420 kvignesh1420 marked this pull request as ready for review September 11, 2021 18:04
@gbaned gbaned self-assigned this Sep 13, 2021
@gbaned gbaned added the comp:data tf.data related issues label Sep 13, 2021
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Sep 13, 2021
@google-ml-butler google-ml-butler bot added the awaiting review Pull request awaiting review label Sep 13, 2021
Copy link
Contributor

@aaudiber aaudiber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kvignesh1420.

Regarding the circular dependency with DatasetV2, lazy-loading DatasetV2 at the start of each dataset transformation sgtm.

tensorflow/python/data/ops/common.py Outdated Show resolved Hide resolved
tensorflow/python/data/ops/dataset_ops.py Outdated Show resolved Hide resolved
PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Sep 13, 2021
@kvignesh1420 kvignesh1420 changed the title [tf.data] move StructuredFunctionWrapper into a common module [tf.data] move StructuredFunctionWrapper into a separate module Sep 15, 2021
@kvignesh1420
Copy link
Member Author

@aaudiber I have the changes to the current PR. Please take a look.

P.S: Thanks for the suggestions related to the future PRs.

Copy link
Contributor

@aaudiber aaudiber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kvignesh1420, one last comment and this LGTM

tensorflow/python/data/ops/dataset_ops.py Outdated Show resolved Hide resolved
@kvignesh1420 kvignesh1420 force-pushed the dataset-ops-refactor-1 branch 3 times, most recently from efc4dea to c6dd3d3 Compare September 15, 2021 17:35
refactor using structured_function module

use the new API location of structured_function

move DEBUG_MODE to dataset_ops
PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Sep 20, 2021
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Sep 20, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Sep 20, 2021
@gbaned gbaned removed the awaiting review Pull request awaiting review label Sep 21, 2021
@rthadur rthadur added ready to pull PR ready for merge process and removed ready to pull PR ready for merge process labels Sep 21, 2021
@kvignesh1420
Copy link
Member Author

@aaudiber any updates on this? Please let me know if any changes are required. Thanks!

@copybara-service copybara-service bot merged commit ec9b3e3 into tensorflow:master Oct 4, 2021
@google-ml-butler google-ml-butler bot removed the ready to pull PR ready for merge process label Oct 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes comp:data tf.data related issues size:L CL Change Size: Large
Projects
PR Queue
  
Approved by Reviewer
Development

Successfully merging this pull request may close these issues.

None yet

5 participants