Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regressions with failOnFailedSeed option #572

Merged
merged 11 commits into from
May 15, 2024

Conversation

tw4l
Copy link
Contributor

@tw4l tw4l commented May 14, 2024

Fixes #563

This PR makes a few changes to fix a regression in behavior around failOnFailedSeed for the 1.x releases:

  • Fail with exit code 1, not 17, when pages are unreachable due to DNS not resolving or other network errors if the page is a seed and failOnFailedSeed is set
  • Extend tests, add test to ensure crawl succeeds on 404 seed status code if failOnINvalidStatus isn't set

(Edited after reverting change related to invalid statuses)

tw4l added 3 commits May 14, 2024 15:48
When pages are unreachable due to DNS not resolving, the crawler
was failing with exit code 17, which conflicts with expected and
documented behavior.
This prevents a breaking change from the 0.x series of releases,
which did not have failOnInvalidStatus, which may otherwise cause
unexpected results for users on upgrading.
@tw4l tw4l requested a review from ikreymer May 14, 2024 19:52
@ikreymer
Copy link
Member

  • Fail crawl if seed has 0/4xx/5xx status code, even if failOnInvalidStatus is not set (otherwise, users who previously only had to pass one argument now have to pass two, which is a breaking change likely to result in unexpected behavior; for non-seeds, we only consider pages with a 4xx/5xx status code a failure if failOnInvalidStatus is set)

Hmm, I think that may be more confusing. The --failOnInvalidStatus was introduced in 1.x, and yes it is a breaking change from 0.x, but I think that's ok, given it was a major version change. I think there should be a way to only fail on invalid/unloadable pages, and pages that do load, but return an error code. The browser does not treat 4xx/5xx responses as failures, as they do load, and #563 is referring to invalid/unloadable URLs specifically it seems.

@tw4l
Copy link
Contributor Author

tw4l commented May 15, 2024

@ikreymer Reverted and tests extended to include status code, which let me catch another issue in the argParser. Should be all good now!

Copy link
Member

@ikreymer ikreymer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the cli updates to make it more clear

@ikreymer ikreymer merged commit 8318039 into main May 15, 2024
4 checks passed
@tw4l tw4l deleted the issue-563-fail-on-failed-seed-regression branch May 15, 2024 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parameter --failOnFailedSeed exits Docker with ExitCode 0
2 participants