
Fix: Widen lxml dependency ceiling (resolves #2019) #2050

Open
mrinal22258 wants to merge 1 commit into unclecode:main from mrinal22258:fix/lxml-version-pin-2019

@mrinal22258

Fixes #2019

Root Cause

crawl4ai currently pins lxml to ~=5.3 (equivalent to >=5.3,<6.0). However, packages commonly installed alongside it, such as scrapling (used for bypassing bot detection), require lxml>=6.1.1.

Because these two ranges do not overlap:

  1. Poetry dependency resolution fails outright with "version solving failed".
  2. pip ends up with whichever lxml was installed last (5.4.0 in this case), silently breaking the package that expected 6.x API compatibility.

In addition, the rigid <6.0 pin causes build failures on Python 3.14 (#1903) where prebuilt wheels for 5.x are not available and compiling from source fails, whereas lxml>=6.0.2 has been confirmed to work.

Proposed Changes

This PR widens the constraint to lxml>=5.3,<7. That keeps an upper bound below 7.0 while allowing 6.x releases to resolve.

Dependency Diff

# pyproject.toml / requirements.txt
-lxml~=5.3
+lxml>=5.3,<7
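To make the effect of the diff concrete, here is a minimal sketch that checks a few lxml versions mentioned in this PR against the old and new ranges. It is not how pip or Poetry evaluate specifiers: the helpers `release_tuple` and `in_range` are hypothetical, compare only the first three numeric components, and ignore pre-releases and other PEP 440 details.

```python
import re


def release_tuple(version: str) -> tuple:
    """Return the first three numeric components, zero-padded.

    Simplified on purpose: "5.3" -> (5, 3, 0). Pre-release tags are ignored.
    """
    numbers = [int(n) for n in re.findall(r"\d+", version)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def in_range(version: str, low: str, high: str) -> bool:
    """True if low <= version < high (half-open range, like >=low,<high)."""
    return release_tuple(low) <= release_tuple(version) < release_tuple(high)


OLD_RANGE = ("5.3", "6.0")  # lxml~=5.3, i.e. >=5.3,<6.0
NEW_RANGE = ("5.3", "7")    # lxml>=5.3,<7

for candidate in ("5.4.0", "6.0.2", "6.1.1", "7.0"):
    old_ok = in_range(candidate, *OLD_RANGE)
    new_ok = in_range(candidate, *NEW_RANGE)
    print(f"lxml {candidate}: old range={old_ok}, new range={new_ok}")
```

Under these assumptions, 6.1.1 falls outside the old range and inside the new one, while 7.0 stays excluded by both.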

Verification & Testing

Pip Co-installation: Verified in a clean virtual environment:

pip install -e .
pip install scrapling
python -c "import lxml; from lxml import etree, html; print(lxml.__version__)"
# Output: 6.1.1 (Successful co-installation and load)
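As a complement to the import check above, the following sketch lists the lxml constraint that each installed distribution declares, which can help spot a conflicting pin. It uses the standard-library `importlib.metadata`; the helper names are hypothetical, the distribution names `crawl4ai` and `scrapling` are assumed to match what is installed, and the exact requirement strings depend on each package's metadata.

```python
import re
from importlib import metadata


def filter_requirements(requirements, dependency):
    """Keep only requirement strings whose project name is exactly `dependency`."""
    # The negative lookahead rejects longer names such as "lxml-stubs".
    pattern = re.compile(rf"^{re.escape(dependency)}(?![\w.-])", re.IGNORECASE)
    return [req for req in requirements if pattern.match(req)]


def declared_constraints(dist_name, dependency="lxml"):
    """Return the constraints `dist_name` declares on `dependency`, or [] if not installed."""
    try:
        requirements = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []
    return filter_requirements(requirements, dependency)


# Assumed distribution names; an empty list means "not installed" or "no lxml requirement".
for name in ("crawl4ai", "scrapling"):
    print(name, declared_constraints(name))
```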

Poetry Resolution: Verified in a clean Poetry project:

poetry init
poetry add scrapling
poetry add <local-crawl4ai-path>
# Successfully resolved and wrote lockfile without conflicts.

Test Suite: Ran the extraction-strategy regression tests and the GFM table parsing tests to check that nothing broke between lxml 5.x and 6.x:

pytest tests/regression/test_reg_extraction.py   # 20 passed
pytest tests/test_table_gfm_compliance.py        # 13 passed

Note: This change may also resolve or mitigate the Python 3.14 build issues described in #1903 by allowing the resolver to pick a newer lxml release, though this has not been independently verified on Python 3.14.

@mrinal22258

Author

Hi @ntohidi,

I noticed your feedback on the dependency upgrade PR about splitting the changes into smaller PRs.

I already opened this focused PR (#2050), which only widens the lxml dependency ceiling to address #2019, with regression tests and dependency resolution verification included.

If this issue is still open and this approach aligns with the project's direction, I'd appreciate it if you could take a look when you have time. If there's anything you'd like changed, I'm happy to update it.

If this is no longer needed or another fix has already been merged, just let me know and I'll close the PR accordingly.

Thanks!

Development

Successfully merging this pull request may close these issues.

[Bug]: lxml~=5.3 is too old, can't install crawl4ai & scrapling at the same time
