I'm opening this issue to request an update to the NumPy dependency constraint in the unstructured library. Currently, the project's constraints.txt file (used during installation) restricts NumPy to versions less than 2 (numpy<2). This creates dependency conflicts with other libraries, particularly ONNX, that now support and recommend NumPy 2.x.
Problem:
The numpy<2 constraint in unstructured's constraints.txt forces a downgrade of NumPy when installing unstructured[all-docs]. This is problematic because:
ONNX Compatibility: The comment in constraints.txt says the pin exists for ONNX compatibility: # NOTE(robinson) - numpy pin is because ONNX model weights are only compatible with numpy 1.x.x. However, this is outdated. The ONNX project itself now supports NumPy 2.x, and its requirements reflect this: the current ONNX requirements specify numpy>=1.20. See the ONNX requirements on GitHub: https://github.com/onnx/onnx/blob/main/requirements.txt
Dependency Conflicts: The forced downgrade creates conflicts with other packages in environments that need or benefit from NumPy 2.x features.
Unnecessary Restriction: Many users can benefit from NumPy 2.x and will face installation issues without this update.
Reproduction Steps:
The following steps (using conda and pip) demonstrate the issue. This assumes a clean conda environment with Python 3.12 (or 3.13):
conda create -n test_env python=3.12 -y
conda activate test_env
pip install "unstructured[all-docs]" --no-cache-dir
pip install numpy==2.1.0 --no-cache-dir --force-reinstall  # Attempt to use a NumPy 2.x version
# Observe NumPy being downgraded to 1.x due to the constraint.
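To see which installed packages declare a NumPy requirement after running the steps above, and which NumPy version actually ended up installed, a small stdlib-only script can inspect package metadata. This is a hypothetical diagnostic helper, not part of unstructured. Note that it only reads each package's declared `Requires-Dist` entries; a pin that comes solely from a constraints file used at install time will not show up here.

```python
import re
from importlib import metadata

# Leading project name of a PEP 508 requirement string, e.g. "numpy" in "numpy<2; python_version>='3.9'".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_name(requirement: str) -> str:
    """Return the normalized (PEP 503 style) project name of a requirement string."""
    match = _NAME.match(requirement)
    if match is None:
        return ""
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def numpy_requirements() -> dict[str, list[str]]:
    """Map each installed distribution to the numpy requirements it declares."""
    found: dict[str, list[str]] = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Keep only requirements whose project name is exactly "numpy" (not e.g. "numpydoc").
        numpy_reqs = [r for r in (dist.requires or []) if requirement_name(r) == "numpy"]
        if name and numpy_reqs:
            found[name] = numpy_reqs
    return found


if __name__ == "__main__":
    try:
        print("installed numpy:", metadata.version("numpy"))
    except metadata.PackageNotFoundError:
        print("numpy is not installed")
    for name, reqs in sorted(numpy_requirements().items()):
        print(f"{name}: {reqs}")
```

Any entry listing something like numpy<2 points at a package that would pull NumPy back to 1.x.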
Proposed Solution:
Update the NumPy dependency constraint in unstructured (and in related projects such as unstructured-inference) to allow NumPy 2.x. A constraint like numpy>=1.26,<3 would be a good starting point: it admits future NumPy 2.x releases while still providing a lower bound for compatibility. If specific NumPy 2.x versions turn out to be incompatible, a narrower range (e.g., numpy>=2.1.0,<2.3.0) could be used instead; either way, the current numpy<2 pin is overly restrictive. I believe the correct range should be numpy>=2.1.0,<2.2.0.
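To compare how the candidate ranges treat a few sample NumPy releases, here is a minimal hand-rolled sketch. It only understands plain numeric versions such as 2.1.0 and comma-separated clauses using comparison operators; it is not a full PEP 440 implementation (pre-releases, wildcards and `~=` are unsupported), so for real checks the `packaging` library's `SpecifierSet` is the usual tool. The sample versions are illustrative.

```python
import operator

# Two-character operators are listed first so ">=" is not mistaken for ">".
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_release(text: str) -> tuple[int, ...]:
    """Parse a plain release string like '2.1.0' into (2, 1, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def compare_padded(op, a: tuple[int, ...], b: tuple[int, ...]) -> bool:
    """Compare releases after zero-padding, so '2' and '2.0.0' are treated as equal."""
    width = max(len(a), len(b))
    return op(a + (0,) * (width - len(a)), b + (0,) * (width - len(b)))


def satisfies(version: str, spec: str) -> bool:
    """Return True if `version` meets every comma-separated clause in `spec`."""
    v = parse_release(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, op in _OPS.items():
            if clause.startswith(symbol):
                if not compare_padded(op, v, parse_release(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


if __name__ == "__main__":
    samples = ("1.26.4", "2.1.0", "2.2.0")
    for spec in ("<2", ">=1.26,<3", ">=2.1.0,<2.3.0", ">=2.1.0,<2.2.0"):
        allowed = [v for v in samples if satisfies(v, spec)]
        print(f"numpy{spec}: {allowed}")
```

With these samples, numpy<2 admits only 1.26.4, numpy>=1.26,<3 admits all three, and numpy>=2.1.0,<2.2.0 admits only 2.1.0.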
Environment:
Python version: 3.12 (and 3.13)
unstructured version: latest release (also tried installing from git)
onnx version: 1.17.0 (and likely later versions)
protobuf version: 5.29.1 (and likely later versions)
Operating System: Windows (but the issue is likely platform-independent)
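The environment details above can be collected with a short stdlib-only script, which makes it easier to paste exact versions into a report. This is a hypothetical helper; the package list is an assumption based on the packages named in this issue.

```python
import platform
from importlib import metadata

# Assumed list: the packages mentioned in this issue.
PACKAGES = ("unstructured", "unstructured-inference", "onnx", "numpy", "protobuf")


def installed_version(name: str) -> str:
    """Return the installed version of `name`, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages: tuple[str, ...] = PACKAGES) -> list[str]:
    """Build 'key: value' lines describing the interpreter, OS and package versions."""
    lines = [
        f"Python version: {platform.python_version()}",
        f"Operating System: {platform.system()} {platform.release()}",
    ]
    lines += [f"{name} version: {installed_version(name)}" for name in packages]
    return lines


if __name__ == "__main__":
    print("\n".join(environment_report()))
```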
Thank you for considering this request. Updating the NumPy dependency will significantly improve the compatibility of unstructured with modern Python data science ecosystems.