Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default entry point is being updated incorrectly #478

Closed
vyasr opened this issue Oct 10, 2023 · 2 comments · Fixed by #475
Closed

Default entry point is being updated incorrectly #478

vyasr opened this issue Oct 10, 2023 · 2 comments · Fixed by #475

Comments

@vyasr
Copy link
Contributor

vyasr commented Oct 10, 2023

Minimal, reproducible code sample, a copy-pastable example if possible

I don't think it's worth trying to create an MRE for this situation, please see below.

Problem description

This has turned out to be a much more subtle issue than I first thought. The problem looks like it's due to a combination of two factors:

  1. The interface of the EntryPoint object changing over time, and
  2. The monkey-patching of importlib.metadata by importlib_metadata

To start with, we need to look at the evolution of importlib_metadata. Until importlib_metadata version 4.13.0, the EntryPoints class had an __iter__ method. This method was deprecated much earlier (sometime around 3.x) but was not removed until version 5.0.0. Meanwhile, version 4.8.1 introduced support for EntryPoints.__getitem__ via the DeprecatedTuple class, functionality which had been supported in previous versions but had been removed.

Now, when we roll around to version 5.0.0, the removal of EntryPoints.__iter__ meant that iteration over an EntryPoints object instead fell back to calling __getitem__ (this is how iterables are designed in Python). The problem is that the __getitem__ implementation would in turn call _key, which would return a tuple (self.name, self.value, self.group). This is a problem because the old __iter__ implementation instead returned iter((self.name, self)).

The relevant code in numcodecs for Python 3.9 is assuming that we'll get the second interface, such that given a tuple of EntryPoint objects we'll be able to process it as key-value pairs of the form (name, EntryPoint), which works with the old explicit __iter__ API but not the new __getitem__ implicit iteration API.

A major part of the complexity comes from the fact that importlib_metadata was provisionally added to the standard library as importlib.metadata in Python 3.8, then made a full part of the standard in 3.10. As of Python 3.9, the version of the code still had an EntryPoints object that defined __iter__, i.e. it was matching importlib_metadata<5.0.0. Therefore, the code in numcodecs seems like it should work. However, importlib_metadata actually monkey-patches parts of importlib.metadata:

>>> from importlib.metadata import entry_points
>>> type(entry_points()[next(iter(entry_points()))][0])
<class 'importlib.metadata.EntryPoint'>
>>> import importlib_metadata
>>> type(entry_points()[next(iter(entry_points()))][0])
<class 'importlib_metadata.EntryPoint'>

Note that on the final line we go from importlib.metadata to importlib_metadata. This is the root of the problem: if you have a newer version of importlib_metadata installed in Python 3.9, you start seeing EntryPoints objects with the new iteration protocol instead of the old one. Therefore, in the Python 3.9 scenario numcodecs needs to handle both possibilities.

The simplest solution is to avoid relying on the iteration protocol altogether and simply construct the appropriate update argument manually. This is what I now do in #475.

Version and installation information

Please provide the following:

  • Value of numcodecs.__version__: 0.12.0
  • Version of Python interpreter: 3.9.18
  • Operating system (Linux/Windows/Mac): Linux (Ubuntu 20.04
  • How NumCodecs was installed (e.g., "using pip into virtual environment", or "using conda"): Using conda or pip

Also, if you think it might be relevant, please provide the output from pip list or
conda list depending on which was used to install NumCodecs.

@martindurant
Copy link
Member

Kerchunk has pinned back numcodecs due to CI failing with current entryponts implementations (e.g., https://github.com/fsspec/kerchunk/actions/runs/6436907706/job/17481181465?pr=372#step:5:58 ). Please advise when fixed, or if there's anything I can do.

@jakirkham
Copy link
Member

Yeah we had to do the same in RAPIDS unfortunately

Reading over Vyas' nice writeup above would help

Reviewing the PR: #475

Suggesting a test we can use to ensure we don't reintroduce this accidentally in the future

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants