@christopher-w-murphy christopher-w-murphy commented Dec 1, 2025

Summary

Fixes #1642

List of files changed and why

crawl4ai/__init__.py

I added the missing deep_crawling objects ContentRelevanceFilter and ContentTypeScorer to the package exports. This was the simplest fix because it avoids the need to enhance the deserialization logic.

crawl4ai/async_configs

I implemented part of part one of the suggested fix. This isn't strictly necessary given the above change, but it will make further enhancements to the deserialization logic easier.

  • Imported importlib
  • Modified from_serializable_dict() so that it can search multiple modules in order. However, given the above change, it currently still searches only the crawl4ai module.
  • In contrast with part one of the suggested fix, I did not add special handling for FilterChain in from_serializable_dict() since it isn't necessary.
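
As a rough illustration of the "search multiple modules in order" idea, here is a minimal sketch. It is not the actual from_serializable_dict() code: the helper name resolve_class and its arguments are hypothetical, and the example uses standard-library modules so it runs without crawl4ai installed.

```python
import importlib
from typing import Sequence


def resolve_class(name: str, module_names: Sequence[str]) -> type:
    """Return the first class called `name`, searching modules in the given order.

    Hypothetical helper for illustration; not part of crawl4ai.
    """
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module is not importable, try the next one
        candidate = getattr(module, name, None)
        if isinstance(candidate, type):
            return candidate
    raise LookupError(f"{name!r} not found in: {', '.join(module_names)}")


# "math" has no OrderedDict, so the lookup falls through to "collections".
cls = resolve_class("OrderedDict", ["math", "collections"])
print(cls.__module__, cls.__name__)
```

With a single-entry module list, this behaves like a plain attribute lookup on one module, which matches the current situation described above where only the crawl4ai module is searched.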

crawl4ai/deep_crawling/filters.py

Here ContentRelevanceFilter was updated to address part two of the suggested fix.

  • Added "query" to __slots__ for serialization support
  • Modified __init__() to accept both list and string for query
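
The two changes can be sketched as follows. This is a stand-in class, not the real ContentRelevanceFilter: the class name, the threshold parameter, the to_params() helper, and the choice to join a list into one space-separated string are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Union


class RelevanceFilterSketch:
    """Hypothetical stand-in for a filter that uses __slots__."""

    # A serializer that walks __slots__ only sees attributes listed here,
    # so "query" must be present for it to survive a round trip.
    __slots__ = ("query", "threshold")

    def __init__(self, query: Union[str, List[str]], threshold: float = 0.5):
        # Accept either a single string or a list of strings.
        # Joining the list is an assumption, not the confirmed behavior.
        self.query = query if isinstance(query, str) else " ".join(query)
        self.threshold = threshold


def to_params(obj: Any) -> Dict[str, Any]:
    """Collect constructor parameters from __slots__ (illustrative only)."""
    return {name: getattr(obj, name) for name in obj.__slots__}


print(to_params(RelevanceFilterSketch("web crawling")))
print(to_params(RelevanceFilterSketch(["web", "crawling"])))
```

Both calls produce the same parameters, which is the property the string and list test cases rely on.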

crawl4ai/docker_client

For testing purposes I needed to be able to pass a timeout parameter to self._request. The parameter hooks_default has a default value that I didn't change, so this won't affect production.

tests/docker/test_filter_deep_crawl.py

I added three test cases involving the ContentRelevanceFilter to reproduce the issue and then confirm that my updates to the code fixed the issue. There are now five test cases in total.

  • Docker client
  • REST API w/ string query
  • REST API w/ list query
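
For readers unfamiliar with the REST cases, the sketch below shows what the two query shapes might look like on the wire. It assumes the server accepts objects in a {"type": ..., "params": ...} envelope; the helper name and the example query values are hypothetical, and the actual test file may build its requests differently.

```python
import json
from typing import Any, Dict, List, Union


def relevance_filter_payload(query: Union[str, List[str]]) -> Dict[str, Any]:
    """Build a serialized filter description (assumed envelope format)."""
    return {"type": "ContentRelevanceFilter", "params": {"query": query}}


string_case = relevance_filter_payload("deep crawling")
list_case = relevance_filter_payload(["deep", "crawling"])

# Both shapes must survive JSON encoding before the server can decode them.
for payload in (string_case, list_case):
    assert json.loads(json.dumps(payload)) == payload

print(json.dumps(list_case))
```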

How Has This Been Tested?

  1. Build the Docker container locally: IMAGE=local-test docker compose up
  2. Run the tests on the updated code: python ~/crawl4ai/tests/docker/test_filter_deep_crawl.py

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added/updated unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@christopher-w-murphy christopher-w-murphy changed the title from "[fix] Docker server does not decode ContentRelevanceFilter" to "[Fix]: Docker server does not decode ContentRelevanceFilter" Dec 1, 2025
@christopher-w-murphy christopher-w-murphy marked this pull request as ready for review December 1, 2025 21:50
@ntohidi ntohidi changed the base branch from main to develop December 3, 2025 10:30
ntohidi (Collaborator) commented Dec 3, 2025

Thank you for your contribution 🚀

@ntohidi ntohidi merged commit 5a8fb57 into unclecode:develop Dec 3, 2025

Development

Successfully merging this pull request may close these issues.

[Bug]: Docker server does not decode ContentRelevanceFilter