
Code cleanup #34

Closed


dimon222
Collaborator

dimon222 commented May 30, 2019

This PR addresses #33 and #19 and removes the now-unnecessary Kerberos variables. Any kind of auth (HTTPKerberosAuth, HTTPSPNEGOAuth, basic auth, etc.) can be passed directly via the auth argument of the relevant app context's constructor. Another benefit is that we now use a session instead of atomic calls, which eliminates unnecessary extra handshakes for Kerberos.

For example, with the typical HTTPKerberosAuth:

from yarn_api_client.history_server import HistoryServer
from requests_kerberos import HTTPKerberosAuth
history_server = HistoryServer('https://127.0.0.2:5678', auth=HTTPKerberosAuth())

The same logic applies to any auth module supported by requests. GSSAPI is one of the most flexible options.
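A minimal sketch of the session-based pattern described above, assuming a hypothetical SessionBackedClient class (this is not the actual yarn_api_client code): the auth object is attached once to a single requests.Session at construction, and every call reuses that session, so pooled connections and any cookies the server sets after a successful Kerberos/SPNEGO handshake are sent on later requests instead of renegotiating each time. The optional session parameter is also hypothetical; it exists only so the sketch can be exercised without a live cluster.

```python
from typing import Any, Optional


class SessionBackedClient:
    """Hypothetical sketch: one HTTP session per client, auth configured once."""

    def __init__(self, address: str, auth: Any = None, session: Optional[Any] = None):
        if session is None:
            # Imported lazily so the sketch can be exercised with a stand-in session.
            import requests

            session = requests.Session()
        # Any requests-compatible auth object works here: HTTPKerberosAuth(),
        # HTTPSPNEGOAuth(), HTTPBasicAuth(user, password), ...
        session.auth = auth
        self.address = address.rstrip("/")
        self.session = session

    def request(self, path: str, **params: Any) -> Any:
        # Every call goes through the same session, so pooled connections and
        # cookies set after the first successful handshake are reused.
        response = self.session.get(self.address + path, params=params or None)
        response.raise_for_status()
        return response.json()
```

With requests_kerberos installed, usage would look like SessionBackedClient('https://127.0.0.2:5678', auth=HTTPKerberosAuth()).request('/ws/v1/history/info'); the design choice is simply that auth lives on the session rather than being passed to each call.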

dimon222 changed the title from "Code cleanup" to "[WIP] Code cleanup" May 30, 2019
dimon222 force-pushed the feature/better_endpoints branch 3 times, most recently from dc52803 to aebb93e May 30, 2019 03:33
dimon222 changed the title from "[WIP] Code cleanup" to "Code cleanup" May 30, 2019
dimon222 force-pushed the feature/better_endpoints branch 2 times, most recently from 06868cb to e8f4f13 May 30, 2019 03:54
itests/integration_test_resource_manager.py (outdated, resolved)
yarn_api_client/application_master.py (outdated, resolved)
yarn_api_client/base.py (resolved)
setup.py (resolved)
lresende
Collaborator

@kevin-bates Any other thoughts here?

yarn_api_client/hadoop_conf.py (outdated, resolved)
yarn_api_client/hadoop_conf.py (outdated, resolved)
yarn_api_client/base.py (resolved)
dimon222 force-pushed the feature/better_endpoints branch 3 times, most recently from 87e9a4e to 4f5654a May 31, 2019 04:03
dimon222
Collaborator Author

@lresende @kevin-bates I've addressed the points you mentioned and updated the RM definition to accept a list of any number of endpoints, for "failover out of the box".
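One way "failover out of the box" over a list of ResourceManager endpoints could work, as a sketch rather than the PR's actual code: probe each configured endpoint in order and use the first one whose /ws/v1/cluster/info response reports haState as ACTIVE, skipping endpoints that are unreachable or in STANDBY. The function names pick_active_endpoint and http_probe are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Iterable


def http_probe(endpoint: str, timeout: float = 5.0) -> str:
    """Return the RM's HA state ("ACTIVE", "STANDBY", ...) from /ws/v1/cluster/info."""
    url = endpoint.rstrip("/") + "/ws/v1/cluster/info"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)["clusterInfo"]["haState"]


def pick_active_endpoint(endpoints: Iterable[str],
                         probe: Callable[[str], str] = http_probe) -> str:
    """Return the first endpoint whose probe reports ACTIVE, trying them in order."""
    problems = []
    for endpoint in endpoints:
        try:
            state = probe(endpoint)
        except OSError as exc:  # refused connection, timeout, HTTP error, DNS failure
            problems.append(f"{endpoint}: {exc}")
            continue
        if state == "ACTIVE":
            return endpoint
        problems.append(f"{endpoint}: haState={state}")
    raise ConnectionError("no active ResourceManager found; " + "; ".join(problems))
```

The probe is injectable so the selection logic can be tested without a cluster; collecting per-endpoint problems makes the final error explain why every candidate was rejected.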

Member

kevin-bates left a comment


@dimon222 - thank you for doing this. It's quite a change, but a good one!

I had a few comments and suggestions, primarily regarding encapsulation and some renaming.

yarn_api_client/base.py (outdated, resolved)
yarn_api_client/resource_manager.py (resolved)
yarn_api_client/base.py (outdated, resolved)
yarn_api_client/hadoop_conf.py (outdated, resolved)
yarn_api_client/hadoop_conf.py (outdated, resolved)
yarn_api_client/hadoop_conf.py (outdated, resolved)
yarn_api_client/hadoop_conf.py (outdated, resolved)
dimon222
Collaborator Author

dimon222 commented Jun 1, 2019

@kevin-bates I've applied the suggested corrections. Let me know when it looks good, and I'll rebase with fixup to consolidate the PR into one commit.

Member

kevin-bates left a comment


Dmitry - this is an outstanding PR - thank you!

The only comment I have concerns the deferred raise when no endpoint is available. I'd prefer the raise to occur at the point in the code where the issue needs to be addressed: during construction. But if there are reasons to defer it until first use, I'm okay with how it is.
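For illustration, a sketch of the construction-time check preferred here, assuming a hypothetical ResourceManagerClient whose constructor receives the endpoint list (the real constructor, and any fallback to Hadoop configuration files, may differ): the error surfaces where the client is misconfigured rather than on the first request.

```python
from typing import Iterable, Optional


class ResourceManagerClient:
    """Hypothetical client that validates its endpoints eagerly."""

    def __init__(self, service_endpoints: Optional[Iterable[str]] = None):
        # Normalize trailing slashes and drop empty entries before validating.
        endpoints = [e.rstrip("/") for e in (service_endpoints or []) if e]
        if not endpoints:
            # Fail at construction time, where the misconfiguration can be fixed,
            # instead of deferring the error until the first API call.
            raise ValueError("at least one ResourceManager endpoint is required")
        self.service_endpoints = endpoints
```

The trade-off: a deferred check lets callers construct the client before configuration is available, while the eager version makes a missing endpoint visible immediately.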

yarn_api_client/base.py (resolved)
Member

kevin-bates left a comment


LGTM Dmitry - thank you!

yarn_api_client/base.py (resolved)
dimon222
Collaborator Author

dimon222 commented Jun 2, 2019

@kevin-bates alright, thanks. I've rebased with fixup into a single commit containing all the changes. @lresende, please have a final look and let me know if I missed anything.

dimon222
Collaborator Author

dimon222 commented Jun 6, 2019

Incorporated hotfix from #37

dimon222
Collaborator Author

Any plans for this?

lresende
Collaborator

I am good with any direction/feedback @kevin-bates gives towards merging this, otherwise, I have some hard deadlines this week and should be able to get back to this next week.

kevin-bates
Member

Thanks for the ping @dimon222 (and @lresende). I think my biggest concern is that this PR breaks compatibility with existing clients. Enterprise Gateway 2.x caps yarn-api-client to < 0.4, so I think we're good there. However, EG 1.x uses yarn-api-client >= 0.3.3, so users who install or upgrade will break without an explicit pre-install or downgrade of yarn-api-client.

Here's a possible plan:

  1. Merge #39 (Changes in yarn api client for Enterprise gateway yarn checks) and cut release 0.3.6.
  2. Update EG 1.x to cap < 0.4.0 and cut minor release 1.3.0 (and probably 2.0.0rc3 after the merge of jupyter-server/enterprise_gateway#699, Kernel Startup using Yarn with resource and request check). (Sorry, this item is not necessarily germane, but it is an instance of a broken client.)
  3. Merge this PR and cut release 0.4.0.
  4. Update EG 2.x to use the new methods and add a dependency on yarn-api-client >= 0.4.0. (Again, not necessarily germane.)

Any comments, concerns?

cc: @dimon222, @IMAM9AIS, @lresende, @toidi, or anyone else watching this repo

dimon222
Collaborator Author

@kevin-bates sounds good; however, a decision on #8 (comment) would be useful in case we decide to introduce more breaking changes. Basically: to what extent do we plan to comply with Hadoop, and do we need to backport/support old APIs? Right now we literally have a mix of new and old APIs (not good).

ediskandarov
Collaborator

It's OK to introduce backward-incompatible changes, but if we do so, a major version bump (1.x.x) should be considered.

Usually, apps have pinned versions of their requirements, and the Hadoop YARN client is no exception here.

Or did I get something wrong?

kevin-bates
Member

Yeah, I agree with a major version bump. Since we have yet to release a major (1.0) version at all, I wasn't sure if this should be that "thing", but perhaps it should.

I'm not familiar with the REST API and its version history. @dimon222 - could you please elaborate on what parts of this repo are using different versions of the api? I would hope that the v1 in the URL represents the version of the REST API and that responses from a given version are stable.

If we wanted to support "legacy" versions (and given the glacial speed at which organizations move, perhaps we should), then we should come up with a 'contract' for how many back-releases we'd support. If only two, then perhaps a 'legacy' indicator in each constructor is appropriate (per the referenced comment), or we could actually use the version indicator in the URL (if it's even applicable).

kevin-bates
Member

Ok. @toidi @dimon222 @lresende - How about this?

  1. We build 0.3.7 from master (to get the latest http-policy fix).
  2. Merge this PR and take a day or two to check things out (test with Enterprise Gateway, etc.)
  3. Create 1.0.0 based on this PR

Applications that use the former API will need to pin < 1.0 if they haven't already (EG has pinned).

I'd be happy to work on these tomorrow (get 1 and 2 done, prepare for 3 - assuming 2 is successful).

dimon222
Collaborator Author

dimon222 commented Sep 17, 2019

@kevin-bates Wow, surprised you wrote just as I was looking at it 💯
This PR has been stuck for quite a long time. I just remembered it and came back to review the existing implementation's compliance with the endpoint specs.

I've picked the latest available Hadoop REST spec (for now, just ResourceManager), and my findings are below. I need opinions on how we want to maintain compatibility, as we have a couple of endpoints that are still at the "unstable/experimental" stage, which means their spec can change at any time. If we still want to polish things, we could probably make a beta release, run our tests, and polish the outstanding points I list below.

ResourceManager:

  1. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API - the Applications API gained several new arguments. I couldn't trace exactly when; I'd guess quite a long time ago (over 5 years).
  2. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Specific_Container_for_an_Application_Attempt_API - implementation missing
  3. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Nodes_API - spec doesn't match (it changed at some point in the past?)
  4. (ALPHA) https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Application_State_API - we have GET and a separate "KILL" implementation, but no support for a generic PUT that sets any other state. I assume this is an incomplete implementation of the spec: only KILLED is supported.
  5. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Reservation_API_List - implementation missing
  6. (ALPHA) https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Delegation_Tokens_API - implementation missing
  7. (ALPHA) https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Reservation_API_Create - implementation missing
  8. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Reservation_API_Submit - implementation missing
  9. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Reservation_API_Update - implementation missing
  10. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Reservation_API_Delete - implementation missing
  11. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Application_Timeouts_API - implementation missing
  12. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Application_Timeout_API - implementation missing
  13. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Application_Timeout_Update_API - implementation missing
  14. (ALPHA) https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Scheduler_Configuration_Mutation_API - implementation missing
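As a sketch of the generic state update discussed in the Cluster Application State item above: per the Hadoop docs, the call is a PUT to /ws/v1/cluster/apps/{appid}/state with a JSON body such as {"state": "KILLED"}. The helper below only builds the request with the standard library so it can be inspected; build_state_update is a hypothetical name, and sending the request (plus any Kerberos auth) is left to the caller.

```python
import json
import urllib.request


def build_state_update(address: str, app_id: str, state: str = "KILLED") -> urllib.request.Request:
    """Build (but do not send) a PUT request asking the RM to change an app's state."""
    url = f"{address.rstrip('/')}/ws/v1/cluster/apps/{app_id}/state"
    body = json.dumps({"state": state}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

urllib.request.urlopen(req) would send it; with requests, session.put(url, json={"state": "KILLED"}) is the equivalent. Taking state as a parameter, rather than hard-coding a kill, is what makes this the generic PUT rather than the existing KILL-only call.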

NodeManager, HistoryServer, ApplicationMaster: TBD, I still have to review their specs.

We have a variety of choices, as it's basically already somewhat "broken" in current master. One option would be to make a roadmap and/or add a TODO list and cover things over time. The main question is "which version of Hadoop are we actually trying to comply with here?"

kevin-bates
Member

@dimon222 - thanks for the detailed response - these are good points.

... as we have couple of endpoints that still have "unstable/experimental" stage. That means endpoint spec can change anytime.

Could you elaborate on which endpoints have "unstable/experimental" stage?

Regarding compatibility, I think we just need to be sure we follow the REST API docs and, right now, with the exception of the Timeline Service, there is just one version of the REST API - v1. I don't think we can get caught up in the underlying version of Hadoop/YARN, nor should we, since this is what a REST API is all about - acting as a thin veneer over more specific programmatic APIs. As such, we may want to consider adding a version indicator (e.g., v1, v2, etc.) as a parameter to the various constructors, with the implementation constructing the correct requests based on that value.
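A sketch of the version-indicator idea, assuming a hypothetical api_version constructor parameter: the client validates the value against the versions it knows and builds every request path from it, so a future version would only need to be added in one place. Only v1 is listed, matching the current ResourceManager REST API paths (/ws/v1/...).

```python
from typing import Tuple


class VersionedRestClient:
    """Hypothetical base class that derives request paths from a REST API version."""

    SUPPORTED_VERSIONS: Tuple[str, ...] = ("v1",)

    def __init__(self, address: str, api_version: str = "v1"):
        if api_version not in self.SUPPORTED_VERSIONS:
            raise ValueError(
                f"unsupported REST API version {api_version!r}; "
                f"expected one of {self.SUPPORTED_VERSIONS}"
            )
        self.address = address.rstrip("/")
        self.api_version = api_version

    def url(self, *parts: str) -> str:
        # e.g. url("cluster", "apps") -> <address>/ws/v1/cluster/apps
        return "/".join([self.address, "ws", self.api_version, *parts])
```

Subclasses for the ResourceManager, NodeManager, etc. would call self.url(...) instead of hard-coding "/ws/v1", which keeps the version choice in the constructor as suggested.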

Nearly all of the links you provide are to requests that are not implemented. I don't know the original history behind this project, but I think we should treat additional requests on an as-needed basis. The project I work on (Enterprise Gateway) requires very few methods from this project, but should we discover others are required, we would certainly contribute their implementations. If there are requests that you need, then by all means, let's be sure and get them in. The issue with adding requests that aren't necessarily needed is that they increase the surface area for test and support and, in the case of the ALPHA requests, are subject to change.

Here's what I see as next steps...

  1. Produce release 0.3.7 from master.
  2. Merge this PR.
  3. Produce release 1.0.0a (alpha) as you suggest. Existing applications should probably cap their requirements to <1.0.0.
  4. See what additional APIs are necessary, fine-tune existing ones (e.g., look at adding REST version indicator, etc.) and produce a beta or release-candidate (depending on our confidence).
  5. After some time on the beta/release candidate produce 1.0.0

Comments, thoughts?

dimon222
Collaborator Author

@kevin-bates
I'm referring to the ones marked ALPHA.

I'm good with the plan, though.

kevin-bates
Member

Thanks for the quick response. Cool - I was thinking it was the YARN ALPHAs you were referring to, but wanted to make sure it wasn't anything of ours.

Great. I'll produce 0.3.7 today. If there are objections to that from others, speak up now (or prepare to build 0.3.8 😄)

dimon222
Collaborator Author

Closed in favor of #43

@dimon222 dimon222 closed this Sep 17, 2019
@dimon222 dimon222 deleted the feature/better_endpoints branch September 19, 2019 00:40
@dimon222 dimon222 mentioned this pull request Sep 20, 2019

4 participants