ENH: Support Plugin Accessors Via Entry Points #61499

PedroM4rques · 2025-05-26T19:12:20Z

TLDR: Allows external libraries to register DataFrame accessors using the 'pandas_dataframe_accessor' entry point group. This enables plugins to be automatically used without explicit import.

I'm working on this PR collaboratively with @afonso-antunes .

closes GraphQL support / accessor plugin system #29076
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Proposal

We propose implementing an entrypoint system similar to Vaex (#29076) to allow easy access to the functionalities of any installed plugin without requiring explicit imports. The idea is to make all installed packages available for use, only being "imported" when they are needed in the program, in a seamless manner.

Current Behavior

Currently, each plugin must be explicitly imported:

import pandas as pd
import vaex.graphql  # required to enable .graphql (.graphql is compatible with pd.DataFrames)

df = pd.DataFrame(...)
df.graphql.query(...)  # only works after the import

Proposed Behavior

With our feature implemented, the code would be simplified to:

import pandas as pd

df = pd.DataFrame(...)
df.graphql.query(...)  # works directly if the plugin is installed via pip

afonso-antunes · 2025-05-26T23:57:03Z

Most of the errors are from:
from importlib_metadata import entry_points

datapythonista · 2025-05-27T15:30:57Z

I'm personally happy to add this, but it's kind of a big change in terms of the code users can write. @pandas-dev/pandas-core, thoughts here?

If this moves forward, you'll want to fix the CI and add documentation for this.

Dr-Irv · 2025-05-27T15:49:47Z

I'm personally happy to add this, but it's kind of a big change in terms of the code users can write. @pandas-dev/pandas-core, thoughts here?

If this moves forward, you'll want to fix the CI and add documentation for this.

Hard to understand without documentation that illustrates a use case.

datapythonista · 2025-05-27T15:57:34Z

Hard to understand without documentation that illustrates a use case.

We allow creating pandas accessors with register_dataframe_accessor. cyberpandas, for example, calls it when you do import cyberpandas, and you can then use the accessor:

df.ip.is_ipv6

If we allow registration via Python entry points, the import cyberpandas won't be needed anymore. On import pandas we will check the packages the user has installed in their environment that provide an accessor, and we will register them automatically.

Same idea as what PDEP-9 proposed for the read_* and to_* functions/methods if you remember that.

Dr-Irv · 2025-05-27T16:08:50Z

On import pandas we will check the packages the user has installed in their environment that provide an accessor, and we will register them automatically.

Couldn't that be really expensive if lots of packages were installed in the environment? What if there were conflicts in naming?

TomAugspurger · 2025-05-27T16:13:50Z

Agreed this would need docs, but I'm generally +1 on using entry points rather than import-time side effects.

Couldn't that be really expensive if lots of packages were installed in the environment?

It's scoped to projects that declare an entrypoint, so it doesn't scale with every package installed. But it would be good to measure the performance impact on import pandas here, both for an environment with several and without any entrypoints installed.

What if there were conflicts in naming?

That's probably defined somewhere in importlib, but the same issue would affect plugins using import-time registration today.

afonso-antunes · 2025-05-28T09:10:22Z

Since the implementation now seems stable, would it make sense to start working on the documentation already, or would you prefer we wait for further maintainer input?

datapythonista · 2025-05-28T11:47:02Z

What if there were conflicts in naming?

I think by default, the last package found for the entry point with that name will overwrite the previous. As Tom says, this is the same as with imports now: the second import will overwrite the accessor of the first. But we have control over it when registering the entry points. We could keep the behaviour but show a warning, raise an exception and ask the user to remove one of the packages (probably not a great option), or let the user decide in the config which package has higher priority. Since this should be very rare, I would go for the simplest solution that doesn't "fail" silently, which would be to show a warning saying something like "Both packageA and packageB provide the accessor foo. packageA is being used, please uninstall the package you don't want to use to remove this warning."

would it make sense to start working on the documentation already, or would you prefer we wait for further maintainer input?

Up to you @PedroM4rques. The more complete this PR is, the easier it is for everybody to understand what is being proposed. But if in the end there is no agreement to add this, you'll have spent time on a PR that won't get merged.

PedroM4rques · 2025-05-29T22:10:06Z

I think by default, the last package found for the entry point with that name will overwrite the previous

From what I could test locally, this is true.

We could keep the behaviour but show a warning [or] raise an exception and ask the user to remove one of the packages

I think raising an exception is the better approach, as this is a critical error. It's likely a rare scenario, and if it does occur intentionally, the user can always handle it explicitly (try/except and pass, for example). I think the plugin system would be safer this way.
I can also imagine that it's possible to implement a system where the user chooses which plugins to discard in the except block. Would that be desirable?

datapythonista · 2025-05-30T08:22:47Z

I don't think we should raise an exception. Imagine a case where someone working with DNA has two packages installed that provide a dna accessor. The user doesn't even care about the accessors; they're using the packages independently of pandas. Raising means that the user needs to uninstall one of the packages they need in order to use pandas. It doesn't make any sense in my opinion. Ideally we would just inform the user which accessor pandas will use, in case they care, and how to change it if needed. That should probably be done with an option, but since at present this is an extremely rare scenario, I wouldn't make things complex to implement it.

PedroM4rques · 2025-05-30T10:38:12Z

Raising means that the user needs to uninstall one of the packages they need in order to use pandas

I agree, that wouldn't make any sense.

Unless there are any objections, we'll implement the warning system.

rhshadrach · 2025-05-30T18:51:18Z

Are there other mainstream packages using entrypoints?

TomAugspurger · 2025-05-30T19:43:47Z

Yeah, xarray, fsspec, pytest are a few.

Entry points can be a good option anytime you have some sort of plugin system that requires coordinating how a "framework" (pandas in this case) loads code provided by a plugin.

datapythonista · 2025-05-30T19:48:42Z

We already use entrypoints in pandas for the plotting backends. Besides what Tom said, if I'm not wrong many projects using commands (e.g. jupyter <command>, black <command>, flake8 <command>) are implemented with entrypoints. Airflow plugins are also using them. I don't think they are super popular, but surely not experimental or rare.

rhshadrach · 2025-05-31T10:43:21Z

Thanks @TomAugspurger. Looking at prior art, all options to deal with collisions exist.

My personal preference would be to warn.

datapythonista

I guess there are no objections to this. Added some comments.

This will need proper documentation, so libraries can use this, and users can understand what's going on.

Also, this will need better tests.

Do you mind letting us know which third-party accessor you are planning to implement this on, or what the motivation for this work is?

Thanks!

datapythonista · 2025-06-01T09:02:29Z

pandas/__init__.py

+
+from pandas.core.accessor import DataFrameAccessorLoader
+
+DataFrameAccessorLoader.load()


Why a class with a single method?

And why only doing this for DataFrame, not for Series?

@PedroM4rques and I are currently planning to use this for a third-party accessor related to Vaex (see the related discussion in issue #29076). Our main motivation is to provide a structured and maintainable way to register external accessors without cluttering the core codebase.

We initially opted for a class with a single method (load) mostly as a pragmatic choice, since we're not yet deeply familiar with all the internals of Pandas. It seemed like a clean and extensible way to isolate the registration logic. That said, if a standalone function would be preferred, we’re absolutely open to changing it.

Regarding the focus on DataFrame only: we started with that use case since it was our immediate need, but extending this to Series or other objects makes sense and could certainly be part of the plan going forward. Would you recommend covering that already in this PR?

Would you recommend covering that already in this PR?

Yes, I'd say so. It won't be much more complicated, and any libraries that provide both Series and DataFrame accessors will want these at the same time.

We should probably add the same for Index too.

Better to use a function than a class with a single method. I doubt it'll ever have more methods, but if it does, we can always change later, as this is private.

Dr-Irv · 2025-06-03T20:28:39Z

Above, I raised 3 issues:

1. Documentation is needed.
2. Concern about performance when people don't have packages using entry points, and import pandas as pd just takes longer because there are a lot of packages installed. This probably has more to do with the performance of importlib.metadata.entry_points() than anything else.
3. Concern about duplicates and conflicts.

I don't think (1) or (2) have been discussed.

For (3), the suggestion of warning if a conflict occurs is fine with me.

datapythonista · 2025-06-03T20:36:22Z

Tom commented on (2). The entry points are a registry. I think the cost is just a lookup of the entry point name in a hash table. It shouldn't depend on the number of packages installed. So it's just the loop over the packages that register an accessor that exist in the user environment. Even if this becomes popular, I'd be surprised if the number is more than around 5. I don't think there should be any impact in practice.

But worth benchmarking, better to be sure.

datapythonista

Thanks for the changes. Added some comments that hopefully can be helpful. I'll have another look when the accessors for all datastructures are implemented, which will change this PR significantly.

datapythonista · 2025-06-04T11:10:49Z

doc/source/user_guide/entry_points.rst

@@ -0,0 +1 @@
+TODO


This is the main document you want to update: https://pandas.pydata.org/docs/development/extending.html

Probably a comment here too: https://pandas.pydata.org/docs/reference/series.html#accessors (and for dataframe and index maybe).

I think for most users, if they are using a package providing an accessor, they'll already get the idea on how this works from the package documentation.

datapythonista · 2025-06-04T11:13:20Z

pandas/__init__.py

+
+from pandas.core.accessor import DataFrameAccessorLoader
+
+DataFrameAccessorLoader.load()


We should probably add the same for Index too.

Better to use a function than a class with a single method. I doubt it'll ever have more methods, but if it does, we can always change later, as this is private.

datapythonista · 2025-06-04T11:15:24Z

pandas/core/accessor.py

+class DataFrameAccessorLoader:
+    """Loader class for registering DataFrame accessors via entry points."""
+
+    ENTRY_POINT_GROUP: str = "pandas_dataframe_accessor"


I don't think we want different entrypoints for each data type, so pandas_accessor should be better.

datapythonista · 2025-06-04T11:16:36Z

pandas/core/accessor.py

+
+
+class DataFrameAccessorLoader:
+    """Loader class for registering DataFrame accessors via entry points."""


Do you mind adding proper documentation here? Explaining how the entrypoints need to be implemented, when this is called, what happens with conflicts...

datapythonista · 2025-06-04T11:20:28Z

pandas/core/accessor.py

+
+            if name in names:  # Verifies duplicated package names
+                warnings.warn(
+                    f"Warning: you have two packages with the same name: '{name}'. "


Something like "The accessor 'foo' has already been registered by the package 'bar', so the accessor provided by the package 'foobar' is not being registered" seems more useful to users. If I remember correctly, it was a bit tricky but possible to get the pip name of the package registering the entry point.
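
A minimal sketch of how such a message could be produced, assuming a first-registration-wins policy as in the wording suggested here. The names load_accessors and register are hypothetical and not part of this PR. The package name is read from the entry point's dist attribute (available for entry points read from installed metadata on Python 3.10 or later), and the sketch falls back to the entry point target when that is missing, as it is for the hand-built entries in the demo.

```python
import warnings
from importlib.metadata import EntryPoint


def load_accessors(accessor_entry_points, register):
    """Register each accessor once; warn and skip on a name clash.

    `register(name, obj)` is a hypothetical callback standing in for
    whatever pandas would do to attach the accessor.
    """
    registered_by = {}  # accessor name -> package that registered it
    for entry_point in accessor_entry_points:
        # `dist` is only set for entry points read from installed metadata.
        dist = getattr(entry_point, "dist", None)
        package = dist.name if dist is not None else entry_point.value
        name = entry_point.name
        if name in registered_by:
            warnings.warn(
                f"The accessor '{name}' has already been registered by the "
                f"package '{registered_by[name]}', so the accessor provided "
                f"by the package '{package}' is not being registered.",
                UserWarning,
                stacklevel=2,
            )
            continue
        register(name, entry_point.load())
        registered_by[name] = package
    return registered_by


# Two hand-built entry points that clash on the accessor name "foo".
# The targets are stdlib classes used as stand-ins for accessor classes.
clashing = [
    EntryPoint(name="foo", value="collections:OrderedDict", group="pandas_accessor"),
    EntryPoint(name="foo", value="collections:Counter", group="pandas_accessor"),
]
registry = {}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    owners = load_accessors(clashing, registry.__setitem__)
print(registry["foo"].__name__, len(caught))
```

Skipping the second package without loading it also avoids importing a plugin whose accessor would never be used.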

datapythonista · 2025-06-04T11:28:04Z

pandas/core/accessor.py

+    def load(cls) -> None:
+        """loads and registers accessors defined by 'pandas_dataframe_accessor'."""
+        eps = entry_points(group=cls.ENTRY_POINT_GROUP)
+        names: set[str] = set()


Can you use more descriptive variable names please? names doesn't mean much. This is registered_accessor_names I guess?

Also eps, ep aren't great names in my opinion.

PedroM4rques and others added 2 commits May 26, 2025 20:07

add whatsnew entry && type annotations

ded0b0d

datapythonista added API Design Needs Discussion labels May 27, 2025

PedroM4rques and others added 2 commits May 27, 2025 17:53

fix typo in importlib.metadata

f2a036e

trying to fix the pipeline errors

fcfa155

Co-authored-by: Afonso Antunes <afonso.antunes@tecnico.ulisboa.pt>

datapythonista reviewed Jun 1, 2025

datapythonista removed the Needs Discussion label Jun 1, 2025

feat: Warning for duplicated packages

678b2dc

- Added tests - created doc .rst file Co-authored-by: Afonso Antunes <afonso.antunes@tecnico.ulisboa.pt>

PedroM4rques changed the title ~~ENH: Support plugin DataFrame accessor via entry points (#29076)~~ ENH: Support plugin DataFrame accessor via entry points Jun 3, 2025

Compliance with pre-commit

2793f91

- Added 1 test for no packages Co-authored-by: Afonso Antunes <afonso.antunes@tecnico.ulisboa.pt>

datapythonista reviewed Jun 4, 2025

Improve var and class names

01a1bfc

PedroM4rques and others added 3 commits June 4, 2025 14:52

Improve warning message

2807683

rmv entry_points.rst

d2066a1

Refactor: class to fn && Fix: warning

03885c5

- Added DocStrs - Fixed small typo in test file name Co-authored-by: Afonso Antunes <afonso.antunes@tecnico.ulisboa.pt>

PedroM4rques changed the title ~~ENH: Support plugin DataFrame accessor via entry points~~ ENH: Support Plugin Accessors Via Entry Points Jun 5, 2025

