-
Notifications
You must be signed in to change notification settings - Fork 1k
oidc: Refactor lookup strategies into single functions #18169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
oidc: Refactor lookup strategies into single functions #18169
Conversation
Signed-off-by: Facundo Tuesca <facundo.tuesca@trailofbits.com>
Signed-off-by: Facundo Tuesca <facundo.tuesca@trailofbits.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the move from using a list-in-subclass to a function-that-contains-the-complexity, as that simplifies the overall design and allows individual publishers to decide their own lookup method.
I'm also a little wary of how well-tested these publishers are. There's one functional test for provenance that creates a GitHub publisher, but everything else in theOIDC publishers stack seems heavily unit/mocked tests, so would it make sense to add some more functional tests around these interactions? The amount of mixins and abstractions make me a little cautions, being less familiar with the code behaviors.
warehouse/oidc/models/_core.py
Outdated
@@ -174,20 +174,10 @@ class OIDCPublisherMixin: | |||
# but there are a few problems: those claim sets don't map to their | |||
# "equivalent" column (only to an instantiated property), and may not | |||
# even have an "equivalent" column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is now invalid and probably should be written to express the new design of "implement the checks in a subclass' lookup_by_claims()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, fixed
warehouse/oidc/models/google.py
Outdated
specific_publishers = [p for p in publishers if p.sub == sub] | ||
if specific_publishers: | ||
return specific_publishers[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimize: This might be cleaner with more_itertools.first_true
, like so:
specific_publishers = [p for p in publishers if p.sub == sub] | |
if specific_publishers: | |
return specific_publishers[0] | |
if specific_publisher := first_true(publishers, pred=lambda p: p.sub == sub): | |
return specific_publisher |
And in other places that are using this iteration pattern. Definitely worth thinking about other selector patterns from more-itertools, since we have it available for use.
Not a sticking point, but wanted to call out the pattern while I noticed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
I have a question that came up while trying to write the functional tests: I started the tests by POSTing to warehouse/warehouse/oidc/services.py Lines 343 to 346 in 79691b4
since the
Do we have a way to set it from the tests, without mocking the whole |
This PR refactors the Trusted Publishing "lookup strategies" pattern into a single
lookup_by_claims()
method for each of the publishers.Context
A strategy is a way of, given a set of OIDC claims, query the database for a matching Trusted Publisher. Concretely, a strategy is a function that takes a set of claims and returns a
Query
object. Each publisher has a list of these strategies, ordered from specific to general. When trying to find a Trusted Publisher, the more specific strategies are tried first, and if they fail the more general ones are tried.For example, for the given set of claims for a GitHub OIDC token:
first we ran a strategy that tried to find publishers with exactly those values (in particular,
environment==my_environment
). If that strategy failed, we tried the second strategy, where we tried to find publishers withenvironment==None
(that is, allowing any environment).The "specific to general" order in this case meant going from "Publishers that only allow
my_environment
as an environment" to "Publishers that allow any environment".New implementation
This PR changes the above approach to a single function per provider called
lookup_by_claims()
which takes a set of claims and returns a Publisher.The multiple strategies are collapsed into a single query: we query for all publishers where the non-optional fields match the claims. In the example above, this means our single query looks for all publishers that match
repository
andjob_workflow_ref
, ignoring theenvironment
value.We then look at the resulting Python Publisher objects, and select the most specific one.
Rationale
The reasons for this change are:
cc @woodruffw @miketheman @di