Simplify dependency graph #1365

Closed
ulfaslakprecis opened this issue Oct 3, 2023 · 11 comments
Labels
enhancement New feature or request

Comments

@ulfaslakprecis

Is your feature request related to a problem? Please describe.
We want to use Pandera in our organization's codebase, but an evaluation deemed it unusable at the moment, due to the ENORMOUS (213 pages long) dependency graph.

Describe the solution you'd like
I think A LOT of dependencies can be removed.

Additional context
I used Pandera in one project and loved it. Tried to push it to the rest of the org as an alternative to Pydantic.

@ulfaslakprecis ulfaslakprecis added the enhancement New feature or request label Oct 3, 2023
@cosmicBboy
Collaborator

cosmicBboy commented Oct 3, 2023

Care to provide additional detail and a proposal? E.g. what can be removed and what part of the codebase would need to be refactored?

@ulfaslakprecis
Author

I apologize for not providing more detail, but the main problem is just that the dependency graph is huge. I'm raising this issue out of concern that this is a problem. It is in our case, and it may be for others who would like to use Pandera as a dependency.

Screenshot 2023-10-03 at 11 29 16

I can't go into how this can best be pruned, but I'm sure there is a way. Open-source libraries typically have up to tens of packages in their dependency graph, but this package has thousands. Surely there's something that can be done about this?

I understand if this is an annoying request to make, but I'm only here because I really like Pandera and wish I could use it at work.

@GOGKI

GOGKI commented Mar 14, 2024

I appreciate what the developers have created. I have exactly the same situation as @ulfaslakprecis in my org. Having all dev dependencies in the requirements means that if we want to productionize it, we will need to create enormous containers just to perform a simple validation. Not all of the functionality is needed for the core, and as far as other teams are concerned, that's a clear no-go. There are lots of dependency management tools (like poetry or pip-tools) that, among other features, address this issue. A native pip solution is possible as well:
https://peps.python.org/pep-0508/
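
For readers unfamiliar with the PEP 508 approach mentioned above, here is a minimal sketch of how optional dependencies can be expressed as requirement strings with an `extra` marker. The dependency lists and the helper name `to_requires_dist` are hypothetical and chosen for illustration; they are not taken from pandera's actual packaging files.

```python
# Hypothetical dependency lists, for illustration only. They do not reflect
# pandera's real setup.py.
INSTALL_REQUIRES = ["numpy", "pandas"]
EXTRAS_REQUIRE = {
    "pydantic": ["pydantic"],
    "typeguard": ["typeguard"],
}


def to_requires_dist(install_requires, extras_require):
    """Render dependency lists as PEP 508 requirement strings.

    Core requirements are emitted unchanged. Each optional requirement gets an
    `extra == "<name>"` environment marker, so an installer only pulls it in
    when that extra is requested, e.g. `pip install "package[pydantic]"`.

    Assumption: the input requirements carry no markers of their own; a real
    implementation would have to combine markers with `and`.
    """
    lines = list(install_requires)
    for extra, requirements in sorted(extras_require.items()):
        for requirement in requirements:
            lines.append(f'{requirement}; extra == "{extra}"')
    return lines


if __name__ == "__main__":
    for line in to_requires_dist(INSTALL_REQUIRES, EXTRAS_REQUIRE):
        print(line)
```

With a layout like this, a plain install resolves only the core list, and heavier packages are installed only when someone explicitly asks for the corresponding extra.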

@ulfaslakprecis
Author

@GOGKI I started building pandabear recently. It has a similar/near-identical API to pandera but it ONLY does pandas dataframe/series validation. Still very beta, but input is much appreciated.

@cosmicBboy
Collaborator

Happy to support work on making pandera more light-weight. @ulfaslakprecis any appetite for contributing to pandera as opposed to building + maintaining a brand new project?

@cosmicBboy
Collaborator

cosmicBboy commented Mar 14, 2024

Also wanted to better understand the issue here. The items listed in the dependency graph are not necessarily what you get when you pip install pandera. The dependencies listed there are an exhaustive list based on all the **/requirements* files and GitHub Actions: these are not installed by a plain pip install pandera.

Without installing all of the extras, the packages installed are listed here:
https://github.com/unionai-oss/pandera/blob/main/setup.py#L47-L57
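
One way to check this distinction locally is to read the metadata of an installed distribution and separate unconditional requirements from those gated behind an extra. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the helper names `split_requirements` and `declared_dependencies` are made up for this example, and the marker check is a simple string test rather than a full PEP 508 parser.

```python
from importlib.metadata import PackageNotFoundError, requires


def split_requirements(requirement_strings):
    """Split PEP 508 requirement strings into (core, optional).

    A requirement counts as optional when its environment marker mentions
    `extra == ...`. This is a plain substring check, not a real marker parser.
    """
    core, optional = [], []
    for requirement in requirement_strings:
        _, _, marker = requirement.partition(";")
        if "extra==" in marker.replace(" ", ""):
            optional.append(requirement)
        else:
            core.append(requirement)
    return core, optional


def declared_dependencies(dist_name):
    """Return (core, optional) for an installed distribution, or None if absent."""
    try:
        # requires() returns None when the distribution declares no requirements.
        declared = requires(dist_name) or []
    except PackageNotFoundError:
        return None
    return split_requirements(declared)


if __name__ == "__main__":
    result = declared_dependencies("pandera")
    if result is None:
        print("pandera is not installed in this environment")
    else:
        core, optional = result
        print("installed by a plain pip install:", core)
        print("only installed with extras:", optional)
```

Note that this lists direct requirements only; transitive dependencies would need the same check applied recursively.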

That said, I do think we could get rid of multimethod, wrapt, and packaging off the bat. pydantic and typeguard can potentially be cordoned off into their own extras.

Having all dev dependencies in the requirements means that if we want to productionize it, we will need to create enormous containers just to perform a simple validation

@GOGKI just so I understand this, do you only need to install core pandera when you need to productionize your code? What unexpected/unwanted dependencies do you get?

@cosmicBboy
Collaborator

@GOGKI just so I understand this, do you only need to install core pandera when you need to productionize your code? What unexpected/unwanted dependencies do you get?

Same question to you @ulfaslakprecis. What dependencies do you consider too heavyweight in your pandera installation (not the dependency graph reported by GitHub, but the ones that are actually installed when you pip install pandera)?

@z4m0

z4m0 commented Apr 10, 2024

@cosmicBboy in our case we are having issues with typeguard. Pandera uses typeguard>=3.0.2 and jaxtyping uses typeguard==2.13.3 which makes them incompatible. So having typeguard as an optional dependency would possibly fix the problem.

Allowing typeguard 2 would also fix our problem.
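
To make the conflict concrete, here is a small sketch that checks which candidate versions satisfy both constraints quoted above (`>=3.0.2` and `==2.13.3`). It is deliberately simplified: it assumes plain dotted numeric versions with the same number of components and single-clause specifiers, so it is not a substitute for a real PEP 440 implementation. The function names are hypothetical.

```python
import operator

# Two-character operators are listed first so ">=" is not mistaken for ">".
_OPERATORS = (
    ("==", operator.eq),
    (">=", operator.ge),
    ("<=", operator.le),
    ("!=", operator.ne),
    (">", operator.gt),
    ("<", operator.lt),
)


def parse_version(text):
    """Turn '3.0.2' into (3, 0, 2). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, specifier):
    """Check one version against one single-clause specifier such as '>=3.0.2'."""
    for symbol, compare in _OPERATORS:
        if specifier.startswith(symbol):
            bound = parse_version(specifier[len(symbol):])
            return compare(parse_version(version), bound)
    raise ValueError(f"unsupported specifier: {specifier!r}")


def compatible_versions(candidates, specifiers):
    """Return the candidates that satisfy every specifier."""
    return [v for v in candidates if all(satisfies(v, s) for s in specifiers)]


if __name__ == "__main__":
    candidates = ["2.13.3", "3.0.2"]
    # The two constraints from the comment above cannot be met simultaneously.
    print(compatible_versions(candidates, [">=3.0.2", "==2.13.3"]))
```

Because no single version can satisfy both constraints, a resolver has to fail unless one side relaxes its pin or the dependency becomes optional, which are the two fixes suggested in the comment.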

@cosmicBboy
Collaborator

@z4m0 see #1563

@cosmicBboy
Collaborator

@ulfaslakprecis any comments on #1365 (comment)?

If not, I'm going to close this issue in the next few days.

Created a separate issue to capture slimming down the dependencies of a bare pandera installation, but the initial claim in this issue

We want to use Pandera in our organization's codebase, but an evaluation deemed it unusable at the moment, due to the ENORMOUS (213 pages long) dependency graph.

is actually a non-issue, since the github-reported dependency graph naively reports dependencies in requirements files and not actually the dependencies entailed by pip install pandera.

@cosmicBboy
Collaborator

Closing now


4 participants