Update NumPy dependency to support NumPy 2.x #3951

Open
bmjjr opened this issue Mar 9, 2025 · 1 comment

bmjjr commented Mar 9, 2025

Hi Unstructured team,

I'm opening this issue to request an update to the NumPy dependency constraint in the unstructured library. Currently, the project's constraints.txt file (used during installation) restricts NumPy to versions less than 2 (numpy<2). This creates dependency conflicts with other libraries, particularly ONNX, that now support and recommend NumPy 2.x.

Problem:

The numpy<2 constraint in unstructured's constraints.txt forces a downgrade of NumPy when installing unstructured[all-docs]. This is problematic because:

  1. ONNX Compatibility: The comment in constraints.txt states that the pin exists for ONNX compatibility: # NOTE(robinson) - numpy pin is because ONNX model weights are only compatible with numpy 1.x.x. However, this is outdated. The ONNX project itself now supports NumPy 2.x, and its documentation and requirements reflect this. For example, the current ONNX requirements specify numpy>=1.20, with no upper bound. See the ONNX requirements on GitHub: https://github.com/onnx/onnx/blob/main/requirements.txt
  2. Dependency Conflicts: The forced downgrade creates conflicts with other packages in environments that need or benefit from NumPy 2.x features.
  3. Unnecessary Restriction: Many users can benefit from NumPy 2.x and will face installation issues without this update.

Reproduction Steps:

The following steps (using conda and pip) demonstrate the issue. This assumes a clean conda environment with Python 3.12 (or 3.13):

conda create -n test_env python=3.12 -y
conda activate test_env
pip install "unstructured[all-docs]" --no-cache-dir
pip install numpy==2.1.0 --no-cache-dir --force-reinstall  # Attempt to use a NumPy 2.x version
# Observe NumPy being downgraded to 1.x due to constraints.
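
To see what actually ended up in the environment after the steps above, a small check along these lines may help. It is only a sketch using the standard-library importlib.metadata module. The helper names installed_version and numpy_requirers are made up for illustration, and a pin that lives only in a constraints file may not appear in any package's published metadata, so an empty result does not prove the absence of a pin.

```python
import re
from importlib import metadata


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def numpy_requirers():
    """Map each installed distribution to the numpy requirement strings it declares."""
    # Match the requirement name "numpy" exactly, not e.g. "numpy-financial".
    pattern = re.compile(r"^numpy(?![A-Za-z0-9._-])", re.IGNORECASE)
    found = {}
    for dist in metadata.distributions():
        reqs = [r for r in (dist.requires or []) if pattern.match(r.strip())]
        if reqs:
            found[dist.metadata.get("Name", "<unknown>")] = reqs
    return found


print("numpy installed:", installed_version("numpy"))
for package, reqs in sorted(numpy_requirers().items()):
    print(f"{package}: {reqs}")
```

Running this before and after the second pip install shows which NumPy version is active and which installed packages declare a bound on it.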

Proposed Solution:

Update the NumPy dependency constraint in unstructured (and any related projects like unstructured-inference) to allow NumPy 2.x. A constraint like numpy>=1.26,<3 would be a good starting point: it admits current and future NumPy 2.x releases while still providing a lower bound for compatibility. If there are specific known incompatibilities with certain NumPy 2.x versions, a more precise range (e.g., numpy>=2.1.0,<2.3.0) could be used instead, but a narrow range like that is more restrictive than necessary. I believe the broad numpy>=1.26,<3 is the better default.
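
To make the difference between the broad and the narrow range concrete, here is a rough sketch that tests a few example version strings against both. It is deliberately simplified: it handles only plain numeric releases such as 2.1.0, not the full PEP 440 rules that pip applies. The helper names and the candidate versions are chosen purely for illustration.

```python
def parse(version, width=3):
    """Turn a plain release string such as '2.1' into a padded tuple (2, 1, 0)."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def in_range(version, lower, upper):
    """True if lower <= version < upper, comparing padded release tuples."""
    return parse(lower) <= parse(version) < parse(upper)


# The two ranges discussed above, as (inclusive lower, exclusive upper) bounds.
RANGES = {
    "numpy>=1.26,<3": ("1.26", "3"),
    "numpy>=2.1.0,<2.3.0": ("2.1.0", "2.3.0"),
}

# Example version strings only; not a list of versions known to work.
for candidate in ["1.25.2", "1.26.4", "2.0.2", "2.1.0", "2.3.1"]:
    allowed = [spec for spec, (lo, hi) in RANGES.items() if in_range(candidate, lo, hi)]
    print(candidate, "->", allowed or "rejected by both")
```

The broad range accepts everything from 1.26 up to, but not including, 3.0, whereas the narrow one rejects any environment still on 1.26.x or 2.0.x as well as anything from 2.3 onward.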

Environment:

  • Python version: 3.12 (and 3.13)
  • unstructured version: (Attempted with latest, and from git)
  • onnx version: 1.17.0 (and likely later versions)
  • protobuf: 5.29.1 (and likely later versions)
  • Operating System: Windows (but the issue is likely platform-independent)

Thank you for considering this request. Updating the NumPy dependency will significantly improve the compatibility of unstructured with modern Python data science ecosystems.

EdAyers commented Mar 10, 2025

duplicate of #3684
