Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tarfile.open overloads #13441

Merged
merged 1 commit into from
Mar 6, 2025
Merged

Conversation

srittau
Copy link
Collaborator

@srittau srittau commented Jan 27, 2025

  • Allow compresslevel argument for modes w|gz and w|bz2.
  • Remove preset argument from modes where it's not allowed.

Closes: #13440

* Allow `compresslevel` argument for modes `w|gz` and `w|bz2`.
* Remove `preset` argument from modes where it's not allowed.

Closes: python#13440
Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

vision (https://github.com/pytorch/vision)
- torchvision/datasets/utils.py:215: note:     def open(name: str | bytes | PathLike[str] | PathLike[bytes] | Buffer | None = ..., *, mode: Literal['r|*', 'r|', 'r|gz', 'r|bz2', 'r|xz'], fileobj: IO[bytes] | None = ..., bufsize: int = ..., format: int | None = ..., tarinfo: type[TarInfo] | None = ..., dereference: bool | None = ..., ignore_zeros: bool | None = ..., encoding: str | None = ..., errors: str = ..., pax_headers: Mapping[str, str] | None = ..., debug: int | None = ..., errorlevel: int | None = ..., preset: int | None = ...) -> TarFile
+ torchvision/datasets/utils.py:215: note:     def open(name: str | bytes | PathLike[str] | PathLike[bytes] | Buffer | None = ..., *, mode: Literal['r|*', 'r|', 'r|gz', 'r|bz2', 'r|xz'], fileobj: IO[bytes] | None = ..., bufsize: int = ..., format: int | None = ..., tarinfo: type[TarInfo] | None = ..., dereference: bool | None = ..., ignore_zeros: bool | None = ..., encoding: str | None = ..., errors: str = ..., pax_headers: Mapping[str, str] | None = ..., debug: int | None = ..., errorlevel: int | None = ...) -> TarFile
- torchvision/datasets/utils.py:215: note:     def open(name: str | bytes | PathLike[str] | PathLike[bytes] | Buffer | None = ..., *, mode: Literal['w|', 'w|gz', 'w|bz2', 'w|xz'], fileobj: IO[bytes] | None = ..., bufsize: int = ..., format: int | None = ..., tarinfo: type[TarInfo] | None = ..., dereference: bool | None = ..., ignore_zeros: bool | None = ..., encoding: str | None = ..., errors: str = ..., pax_headers: Mapping[str, str] | None = ..., debug: int | None = ..., errorlevel: int | None = ..., preset: int | None = ...) -> TarFile
+ torchvision/datasets/utils.py:215: note:     def open(name: str | bytes | PathLike[str] | PathLike[bytes] | Buffer | None = ..., *, mode: Literal['w|', 'w|xz'], fileobj: IO[bytes] | None = ..., bufsize: int = ..., format: int | None = ..., tarinfo: type[TarInfo] | None = ..., dereference: bool | None = ..., ignore_zeros: bool | None = ..., encoding: str | None = ..., errors: str = ..., pax_headers: Mapping[str, str] | None = ..., debug: int | None = ..., errorlevel: int | None = ...) -> TarFile
+ torchvision/datasets/utils.py:215: note:     def open(name: str | bytes | PathLike[str] | PathLike[bytes] | Buffer | None = ..., *, mode: Literal['w|gz', 'w|bz2'], fileobj: IO[bytes] | None = ..., bufsize: int = ..., format: int | None = ..., tarinfo: type[TarInfo] | None = ..., dereference: bool | None = ..., ignore_zeros: bool | None = ..., encoding: str | None = ..., errors: str = ..., pax_headers: Mapping[str, str] | None = ..., debug: int | None = ..., errorlevel: int | None = ..., compresslevel: int = ...) -> TarFile

@Birnendampf
Copy link

Birnendampf commented Feb 16, 2025

There is still something to be considered here:
In the filemode|[compression] modes, tarfile places fewer restrictions on the fileobj’s abilities:
From the docs:

No random seeking will be done on the file. If given, fileobj may be any object that has a read() or write() method (depending on the mode) that works with bytes.

However, typeshed annotates these cases as IO[bytes] which is far too restrictive as it defines 20 abstract methods and properties, of which only one is really required.
As a result, when trying to pass in a custom Wrapper or Dummy classes with only a read() or write() method, a type checker will complain unless said class also defines all 19 other methods and properties or is explicitly cast to IO[bytes]

My proposal is to use SupportsRead[bytes] and SupportsWrite[bytes] here respectively

I’m happy to move this to a separate issue if necessary but I figured this counts as “fixing tarfile.open overloads“

@srittau
Copy link
Collaborator Author

srittau commented Feb 17, 2025

IO (as well as the derived TextIO and BinaryIO) are discouraged as argument types. That said, I consider changing those to protocols as outside the scope of this PR.

@srittau
Copy link
Collaborator Author

srittau commented Feb 25, 2025

I just noticed that I have a second tarfile.open PR open: #13177. Could someone review and merge this so I can rework the other PR, which now has merge conflicts?

@tsibley
Copy link

tsibley commented Mar 5, 2025

+1 for this PR and the related #13177

@@ -260,7 +259,24 @@ def open(
pax_headers: Mapping[str, str] | None = ...,
debug: int | None = ...,
errorlevel: int | None = ...,
preset: int | None = ...,
) -> TarFile: ...
Copy link
Collaborator

@Avasam Avasam Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why modes w| or w|xz couldn't pass a compresslevel kwarg (avoiding an extra overload), it just goes unused doesn't it ?
Or are we being so restrictive with this function as to not allow unused parameters with certain modes? If that's the case, then changes LGTM. Otherwise I'd avoid the extra overload and add compresslevel: int = 9, to mode: Literal["w|", "w|gz", "w|bz2", "w|xz"].
Source: https://github.com/python/cpython/blob/5e73ece95e8aa92d0695acb039ef54e2103ce66b/Lib/tarfile.py#L1889

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's intended to prevent a misunderstanding of how this works (I.e. a subtle if harmless bug).

@srittau srittau merged commit ba85e0c into python:main Mar 6, 2025
55 checks passed
@srittau srittau deleted the tarfile-compresslevel branch March 6, 2025 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tarfile.open in streaming mode is typed incorrectly
4 participants