[Request] Update to Apache Beam 2.52.0, enable Beam 2.46.0 compatibility #6416

Closed
masonkirchner opened this issue Nov 2, 2023 · 5 comments

Comments


masonkirchner commented Nov 2, 2023

Describe the feature and the current behavior/state.

The current compatibility matrix lists TFX 1.14.0 as compatible with Beam 2.47.0. However, Beam versions 2.47.0 through 2.51.0 have a known memory leak. The fix is expected in 2.52.0, so it would be appreciated if all TFX libraries updated their Beam dependency once 2.52.0 is available.

You may wish to update the Protobuf dependencies as well to exclude the versions that have the memory leak. See this Apache Beam PR for reference: https://github.com/apache/beam/pull/29255/files

Additionally, please loosen the requirements in 1.14.0 to allow Beam 2.46.0 as a workaround in the meantime.
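To make the affected range concrete, here is a minimal sketch that checks whether a given (or the installed) `apache-beam` version falls inside the 2.47.0 to 2.51.0 window described above. The function names and the range constants are illustrative assumptions taken from this issue's description, not anything published by Beam or TFX, and the version parsing is deliberately simplified: it looks only at the major and minor numbers, so a release candidate such as `2.47.0rc1` is treated like `2.47.0`.

```python
import re
from importlib import metadata

# Affected (major, minor) range as reported in this issue: 2.47 through 2.51.
# These bounds are an assumption copied from the issue text.
LEAK_FIRST = (2, 47)
LEAK_LAST = (2, 51)


def major_minor(version: str) -> tuple:
    """Return (major, minor) from a version string, e.g. '2.52.0rc2' -> (2, 52)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    # Pad so that a bare '2' still yields two components.
    parts += [0] * (2 - len(parts))
    return tuple(parts[:2])


def in_leak_range(version: str) -> bool:
    """True if the version falls in the range this issue reports as leaking."""
    return LEAK_FIRST <= major_minor(version) <= LEAK_LAST


def installed_beam_is_affected():
    """Return True/False for the installed apache-beam, or None if it is absent."""
    try:
        return in_leak_range(metadata.version("apache-beam"))
    except metadata.PackageNotFoundError:
        return None
```

A check like `installed_beam_is_affected()` could run at the start of a long-running job to warn before the leak has a chance to matter.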

Will this change the current API? How?

Potential for breaking changes in Beam / Protobuf, but hopefully none.

Who will benefit with this feature?

All users running long-running TFX jobs that require Beam.

Do you have a workaround or are completely blocked by this?

Using Beam 2.46.0 as a workaround is not possible because the current rule requires a version greater than or equal to 2.47:

'apache-beam[gcp]>=2.47,<3',

Loosening this requirement and cutting a new release would be appreciated in the meantime.
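As an illustration of why 2.46.0 is rejected, the sketch below evaluates a comma-separated version rule such as `>=2.47,<3` against a candidate version. It is a simplified, standard-library-only stand-in written for this thread: the `satisfies` helper is hypothetical, it handles only the plain comparison operators, and it ignores pre-release suffixes. Real installers follow the full Python version specifier rules, which the `packaging` library implements, so treat this as an approximation.

```python
import operator
import re

# Plain comparison operators only; wildcards and '~=' are not handled here.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def _release(version: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '2.47' -> (2, 47)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(p) for p in match.group(0).split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against every clause of a rule like '>=2.47,<3'."""
    candidate = _release(version)
    for clause in spec.split(","):
        m = re.fullmatch(r"\s*(>=|<=|==|!=|>|<)\s*(\S+)\s*", clause)
        if m is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        bound = _release(m.group(2))
        # Zero-pad both sides so '2.47' and '2.47.0' compare as equal.
        width = max(len(candidate), len(bound))
        lhs = candidate + (0,) * (width - len(candidate))
        rhs = bound + (0,) * (width - len(bound))
        if not _OPS[m.group(1)](lhs, rhs):
            return False
    return True


if __name__ == "__main__":
    for candidate in ("2.46.0", "2.47.0", "2.52.0"):
        print(candidate, satisfies(candidate, ">=2.47,<3"))
```

Under these assumptions, 2.46.0 fails the `>=2.47` clause while 2.47.0 and 2.52.0 pass both clauses, which is why only a loosened lower bound would admit 2.46.0.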

The current workaround requires Python < 3.10, TFX < 1.14.0, and Beam < 2.47.0.

Name of your Organization (Optional)

Any Other info.

@masonkirchner masonkirchner changed the title [Request] Update to Apache Beam 2.52.0 [Request] Update to Apache Beam 2.52.0, enable Beam 2.46.0 compatibility Nov 2, 2023
@singhniraj08 singhniraj08 self-assigned this Nov 6, 2023
singhniraj08 (Contributor) commented

@masonkirchner,

I tried loosening the apache-beam[gcp] dependency from 2.47.0 to 2.46.0 and installing TFX, but other packages also depend on apache-beam[gcp]>=2.47. This is the error I got while installing TFX with apache-beam 2.46:

error: apache-beam 2.46.0 is installed but apache-beam[gcp]<3,>=2.47 is required by {'tensorflow-transform', 'tensorflow-data-validation', 'tensorflow-model-analysis', 'tfx-bsl'}
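The error above shows that the Beam lower bound is declared by several sibling packages, not only by TFX. As a rough sketch, the declared requirements of everything installed in an environment can be scanned with the standard-library `importlib.metadata` module to see which distributions pin `apache-beam` and how. The helper names here are hypothetical, and the requirement-name parsing is a simplification that reads only the leading project name.

```python
import re
from importlib import metadata


def _normalize(name: str) -> str:
    """Normalize a project name so 'Apache_Beam' and 'apache-beam' match."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return _normalize(match.group(1)) if match else ""


def requirements_on(requires, target: str) -> list:
    """Filter requirement strings down to those that name the target project."""
    wanted = _normalize(target)
    return [r for r in (requires or []) if requirement_name(r) == wanted]


def installed_dependents(target: str = "apache-beam") -> dict:
    """Map each installed distribution to its declared requirements on target."""
    found = {}
    for dist in metadata.distributions():
        hits = requirements_on(dist.requires, target)
        if hits:
            found[dist.metadata["Name"] or "unknown"] = hits
    return found


if __name__ == "__main__":
    for name, reqs in sorted(installed_dependents().items()):
        print(f"{name}: {reqs}")
```

In an environment with TFX installed, this would be expected to list the packages named in the error, though the exact output depends on what is installed.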

Once Beam 2.52.0 is released, we will try to update the dependency in an upcoming release. Meanwhile, the workaround to avoid the memory leak is to use the TFX 1.13.0 release, which supports apache-beam[gcp]>=2.40,<3. Thank you!

masonkirchner (Author) commented Nov 8, 2023

@singhniraj08 Thanks for trying to loosen the requirements to allow 2.46.0. It's too bad that it wasn't compatible with other dependencies. I've been using TFX 1.13.0 as a workaround.

Good news: a Beam 2.52.0 release candidate has been published, if you want to get a head start on testing compatibility: https://pypi.org/project/apache-beam/2.52.0rc2/

singhniraj08 (Contributor) commented

@masonkirchner,

We have already loosened the apache-beam version requirements in the TFX 1.14.0 release, so apache-beam 2.52.0rc2 should work with TFX 1.14 without any issues. I tried this and was able to run an example notebook without problems. Ref: notebook with TFX and beam 2.52.0rc2

Please try using apache-beam 2.52.0rc2 with the TFX 1.14.0 release and let us know if you face any issues. Thank you.


This issue has been marked stale because it has had no activity for 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label Nov 23, 2023

This issue was closed due to lack of activity after being marked stale for the past 7 days.
