-
Notifications
You must be signed in to change notification settings - Fork 710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support fetching beam args at runtime from env vars. Fixes issue #3326 #3455
Support fetching beam args at runtime from env vars. Fixes issue #3326 #3455
Conversation
@kennethyang404 Build is now passing except for pre-existing lint error and one new lint error in the same file due to following the same convention. |
@kennethyang404 It would be awesome if there is enough time to get this into release 0.29.0! |
@kennethyang404 Do you have time to review this or can you pass it to someone else? Along with #3541 this eliminates all clear text password leaks. |
I think a problem here is that the additional_pipeline_args cannot be overridden on each component, therefore this would force all components to use the same beam pipeline args, which conflicts with the intent in be20d08 A better, but possibly much more involved way to do this is:
What do you think, @kennethyang404 and @ConverJens ? |
@kennethyang404 @zhitaoli I must admit that I don't understand how placeholders work in this context nor how the implementation would look like. Would another option be to add another argument to the new beam_base_component? Something like beam_pipeline_args_from_env or similar? The current implementation uses basically this logic but instead of introducing this new parameter, a specific keyword passed to additional_beam_args was choose. What do you think? |
@neuromage @1025KB @jiyongjung0 WDYT? Merge or reimplement after the 0.30.0 release? I'm open for both but this is a blocker for us taking TFX into production since it leaks credentials. |
IIUC, per-component beam_pipeline_arg of beam_base_component doesn't work with Kubeflow yet because base_beam_component only works with TFX IR and kubeflow support for TFX IR is WIP. |
@jiyongjung0 That seems reasonable. When is TFX IR expected to land? Next release, Q2 etc? |
TFX IR support of KubeflowDagRunner is in #3447 (It is hard to read due to lack of rebase.). So it won't take months I hope. Your suggestion sounds good to me. 😄 |
Sound promising and looking forward to it! |
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days |
@zhitaoli @jiyongjung0 @kennethyang404 |
FYI, although #3447 was merged, it is not included in TFX 1.0.0 release branch. It will be included in 1.1.0 (or whatever version after 1.0.0). Thank you so much for the contribution! |
@jiyongjung0 Thank you for the information! As far as I can see, in TFX 1.0.0rc1 there is an per-component-option of adding beam args but you're saying that this doesn't work/has no effect if run on KubeFlow, correct? Even if that's the case, hopefully it will be fine implementing placeholder support in beam args, even though they are on pipeline level. |
@jiyongjung0 @zhitaoli However, I'm unfamiliar with both the new TFX IR as well as placeholders. Do you have any ideas or pointer on how you would like to see this implemented? More specifically:
A discussion around this would be very appreciated! |
I opened a new PR, perhaps we can continue the discussion there? #4147 |
@kennethyang404 , could you take a look on this PR again? I'm not familiar with the placeholder stuff... |
@jiyongjung0 @kennethyang404 Could you please check the other PR again? It was approved but the build failed since it timed out on pips dependency resolving. To get your attention I requested another review and not it looks as it was never approved. Could you just kindly approve it again and give it more build time so it can finish and it can be merged? This has been going on for quite a while now. It would be really great if we can finally close it. |
@ConverJens curious about the value overwrite strategy upon conflicts: why do we choose to deprioritize the later-bound value (environment var)? Environment vars can be changed more flexibly and can serve as the same functionality as "flags". WDYT? |
@ruoyu90 Good point! I chose this strategy since I deemed it the least risk of impacting existing workflows and that probably caused the least amount of confusion. I agree that it might be more natural to allow env vars to take precedence. Note that this PR is closed but a link to the new PR is referred to above and I think we settled on rasing an error in case of conflicting arguments. |
Currently, TFX only support passing Beam args as static values defined at pipeline compile time. This is undesirable since one Beam needs to pass credentials to in order to use certain file systems, e.g. S3.
This PR allows Beam args to be populated from environment variables, which in turn can be populated by k8s secrets, and hence leaves no credentials visible in code nor pipeline definition yaml files.
This PR fixes the issues for beam args highlighted in: #3326