-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
PEP 694: Abstract file upload mechanisms #4431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I spent some time to sketch this out as a Finite State Machine in pypi/warehouse#18174, and will be using what I learned to refine this a bit! |
I'm planning on taking a closer look later today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple comments, but overall the changes look good to me.
I just wanted to note a few things I came across when reading the entire PEP (with these changes incorporated-- and probably a number of these came from my original PEP so that's on me :D ).
- The PEP specifies that the endpoint for PyPI will be
https://upload.pypi.org/2.0
. I would probably remove that from the PEP and let PyPI decide what it's endpoint will be (in particular the/2.0
part is somewhat confusing given the PEP also uses conneg). - The PEP calls out the inability to parallelize or resume an upload as a problem to be solved, and then later states the PEP solves all the identified problems. The TUS based approach didn't really solve parallelization to begin with, and the removal of TUS means the PEP also doesn't solve resuming an uploading. I think that's fine, but we should update the wording.
- The content type handling is kind of wonky I think. Much like PEP 691 the client can use the
Accept
header to request a particular content type from the server, and the server includes the full version number in themeta.api-version
key in the response. However, the requests appear to only be using themeta.api-version
key. I think ideally we want to have requests using a correctContent-Type
for their request (and thelatest
wouldn't be supported here), and have the server use that for handling the request data. We probably want to also explicitly require that themeta.api-version
matches theContent-Type
for major version.
While this PEP would no longer directly implement resumable or parallel uploads, it does solve the problem of how to address them. Individual file-upload sessions may occur in parallel if a server chooses to implement a mechanism that can support it, and similarly resumable uploads can be implemented as a mechanism. I'll clarify it in the "The new upload API...", 05d7fc2 |
Realization while specifying |
See: 924a27d |
See: f8469cf |
Alternative to inline suggestions:
|
Could you add some sentences about forward compatibility? I'm thinking e.g. about declaring all value enums and dicts as non-exhaustive, so we can add new info both on the server and client side without it being a breaking change. This allows let's say adding variant information (from the WheelNext project) to the
Is it intentional that this means that we can't have an Upload API 2.1 that requires servers that say they're version 2.1 to implement let's say an S3 endpoint? This would be compatible with 2.0 clients since they can still use regular application/octet-stream POST.
As a client implementer, it would be very valuable to have a standard way to pass a token. This wouldn't be about how to obtain the token or its lifetime and semantic, and it would also leave room for non-token auth, but it would solve the current user confusion about different servers using different usernames for their API tokens (we regularly get questions about that in uv). For the session status, do I read it correctly that if the server says an upload is pending, the client has to poll until it is completed, there is no mechanism intended for subscribing to server updates? nit: Instead of |
I'm uncertain about this. I believe that the specific example provided is addressed by the fact that variants would already be encoded into the filename, and I assume (though I'm not fully up to date) in the METADATA file of the wheel. Additional statuses that implementors might need would generally be part of their File Upload Mechanism, which can already document and include its own methods for retrieving detailed status in its responses.
I consider this intentional. Requiring the addition of a mechanism is a large enough change that implementors will have real work to do. I don't forsee that we will ever require mechanisms that can't be "baked in" to an arbitrary server. Requiring S3-compatible protocols of arbitrary servers... no. While requiring implementation of a standard like tus is likely for a future major version.
Understandable, but ultimately I think that enforcing authentication methods through PEP would likely cause trouble . With PEP authoring hat on, I think it would hamstring this PEPs progress. With my PyPI maintainer hat on, I think it would unnecessarily impede future progress on evolving PyPI's authentication mechanisms, Macaroons are already showing some unforeseen consequences and we desperately need to address authentication for more general APIs.
I think that is reasonable, though I don't specifically prefer one option to the other. |
Ope, missed this. Correct there is no method for subscription in this model. |
To me, a lack of auth is a major hurdle to writing a client with Upload API 2.0: Without a way to pass any sorts of credentials, it's impossible to write a client that works universally the way |
Again, this is very understandable, but it is likely to mean that this PEP would bake in HTTP Basic Auth with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ewdurbin - apologies for the long delay, but I've finally completed my review of the updated PR text.
TL;DR: LGTM, thank you!
I do have a number of small suggestions for wording and such, and a question or two, but overall, this looks great and I approve. Next, I'll go through and resolve any outstanding conversations, and I'll update the branch via the GH web UI. Then I think it's just a matter of you going through the suggestions and question, and then we can merge. We can talk off-line about posting to DPO after the PR lands.
Co-authored-by: Barry Warsaw <barry@python.org>
Interesting use case 😄. Forward compatibility is different than allowing index-specific variations. I'd be more concerned about the latter introducing variations that aren't standardized, making evolution of the standard more difficult. Forward compatible evolution of the standard without a major version bump should be fine, although a minor version bump probably still makes sense, so clients know what they're interacting with. Regarding the admonition against adding new upload mechanisms without a major version bump, with the above caveat in place, it makes sense to only require a major version bump if a standard mechanism is removed, although frankly I don't expect that to happen in practice.
Like a webhook? That could make sense, e.g. add a URL to the (upload & file-upload) session request payload when they are created. I think we'd have to evaluate a) security implications (e.g. we wouldn't want a package uploader to trigger push notifications to some unrelated service); b) complexity for implementations such as PyPI which don't AFAIK currently have infra to do that. Maybe for 2.0 we can just leave it at polling and think about push notifications for a future rev? |
I thought the same thing when I read this section, but didn't comment, so I'm glad you did! If this change is made, I'd recommend ISO 8601 UTC date time for consistency. |
Co-authored-by: Ee Durbin <ewdurbin@gmail.com>
Um, I committed the suggestion about not allowing server specific mechanisms to bump the major or minor versions. I hope I didn't mess up the rest of the text (GH always surprises me here). Looks like it did resolve the thread though. |
This attempts to defer the implementation details of getting the bits of a given artifact from the client to the server.
The primary motivation here is to decouple this PEP from resumable/multi-part uploads, provide flexibility to implementers of PEP 694, and allow for new upload mechanisms without a PEP cycle.
📚 Documentation preview 📚: https://pep-previews--4431.org.readthedocs.build/