How to handle errors like "Connection broken: IncompleteRead" #725

Closed
wschoot opened this issue Nov 10, 2022 · 5 comments · Fixed by #787

wschoot commented Nov 10, 2022

I'm tracking a website that sometimes gives me an error message like:

('Connection broken: IncompleteRead(7450 bytes read, 646 more expected)', IncompleteRead(7450 bytes read, 646 more expected))

The configuration I'm using includes the following statements that seem to have no effect on this particular error:

ignore_connection_errors: true
ignore_http_error_codes: 1xx, 4xx, 5xx
timeout: 0

I've also tried "treating" it as a timeout, by setting a stricter timeout and ignoring timeout errors like so:

ignore_connection_errors: true
ignore_http_error_codes: 1xx, 4xx, 5xx
ignore_timeout_errors: true
timeout: 10

But it doesn't help. What else can I try? This is urlwatch v2.25 on Linux.
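A plausible reason why neither option has any effect: a truncated body is neither a connection error nor a timeout. In Python's standard library it is http.client.IncompleteRead, a subclass of http.client.HTTPException rather than of ConnectionError or TimeoutError; in requests (which urlwatch uses) a truncated body surfaces as requests.exceptions.ChunkedEncodingError, which is not a subclass of requests.exceptions.ConnectionError. The classify helper below is hypothetical, not urlwatch code; it only shows that a filter keyed on those exception types lets this error through.

```python
import http.client


def classify(exc: BaseException) -> str:
    """Hypothetical helper (not part of urlwatch): name the ignore_* option
    whose exception type would match `exc`, if any."""
    # TimeoutError and ConnectionError are sibling subclasses of OSError.
    if isinstance(exc, TimeoutError):
        return "ignore_timeout_errors"
    if isinstance(exc, ConnectionError):
        return "ignore_connection_errors"
    # http.client.IncompleteRead derives from HTTPException (not OSError),
    # so neither branch above matches it.
    return "not covered"


# Rebuild the reported error: 7450 bytes received, 646 still expected.
err = http.client.IncompleteRead(b"x" * 7450, 646)
print(repr(err))                          # IncompleteRead(7450 bytes read, 646 more expected)
print(classify(err))                      # not covered
print(classify(ConnectionResetError()))   # ignore_connection_errors
```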

thp (Owner) commented Nov 18, 2022

Have you checked whether the website sends invalid Content-Length headers? Or whether it's just a temporary situation under load? We could add a separate configuration option, something like ignore_incomplete_reads: true. Want to make a PR? :)
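One way to test the invalid-Content-Length hypothesis is to compare the advertised header with the number of body bytes that actually arrive. The following is a minimal standard-library sketch, not a urlwatch feature: check_content_length and ShortBodyHandler are hypothetical names, and the local test server deliberately announces 10 more bytes than it sends so the failure is reproducible.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def check_content_length(url: str, timeout: float = 10.0) -> str:
    """Hypothetical diagnostic: GET `url` once and compare the declared
    Content-Length with the number of body bytes actually received."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        declared = resp.getheader("Content-Length")
        try:
            body = resp.read()
        except http.client.IncompleteRead as exc:
            # exc.expected is None for chunked responses (no length declared)
            missing = exc.expected if exc.expected is not None else "unknown"
            return (f"truncated: Content-Length={declared}, "
                    f"got {len(exc.partial)} bytes, {missing} missing")
        return f"ok: Content-Length={declared}, got {len(body)} bytes"
    finally:
        conn.close()


class ShortBodyHandler(BaseHTTPRequestHandler):
    """Local test server that announces 10 more bytes than it sends."""

    def do_GET(self):
        body = b"Hello, world!"  # 13 bytes
        self.send_response(200)
        self.send_header("Content-Length", str(len(body) + 10))  # claims 23
        self.end_headers()
        self.wfile.write(body)
        # Default protocol_version is HTTP/1.0, so the server closes the
        # connection after this handler returns.

    def log_message(self, format, *args):  # silence request logging
        pass


server = HTTPServer(("127.0.0.1", 0), ShortBodyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    report = check_content_length(f"http://127.0.0.1:{server.server_port}/")
finally:
    server.shutdown()
    server.server_close()
print(report)  # truncated: Content-Length=23, got 13 bytes, 10 missing
```

Because the problem only shows up intermittently, running such a check repeatedly (for example from a scheduled job) and logging the results is one way to tell a consistently wrong header from occasional truncation under load.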

wschoot (Author) commented Nov 18, 2022

I was unable to test this manually because it only happens sometimes. I haven't yet put in the effort to set up a cron job that runs curl and saves the output so I can retrace the calls. I'm not very comfortable with Python, so making PRs is not my forte, I'm afraid :)

wfrisch (Contributor) commented Jan 24, 2024

A minimal reproducer that serves incomplete HTTP chunks:
https://gist.github.com/wfrisch/bc00bfa049f2aab76dbb73215b1f5bb5

I have regularly observed the same problem in the wild here: https://www.mozilla.org/en-US/security/advisories/

("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
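For readers who want a self-contained illustration of this failure class, here is an independent standard-library sketch (it is not the code from the gist above): the hypothetical TruncatedChunkHandler announces Transfer-Encoding: chunked, sends one valid chunk, and then closes the connection without the terminating zero-length chunk. The messages quoted in this thread come from urllib3, which requests uses; Python's own http.client reports the same condition as http.client.IncompleteRead instead.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class TruncatedChunkHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: starts a chunked response, then hangs up
    before sending the terminating zero-length chunk."""
    protocol_version = "HTTP/1.1"  # chunked transfer coding is an HTTP/1.1 feature

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Transfer-Encoding", "chunked")
        self.send_header("Connection", "close")  # server closes after this response
        self.end_headers()
        self.wfile.write(b"5\r\nhello\r\n")  # one valid 5-byte chunk...
        # ...and no "0\r\n\r\n" terminator: the connection simply ends here.

    def log_message(self, format, *args):  # silence request logging
        pass


def fetch(port: int) -> bytes:
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
    try:
        conn.request("GET", "/")
        return conn.getresponse().read()
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), TruncatedChunkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    fetch(server.server_port)
    outcome = "complete"
except http.client.IncompleteRead as exc:
    outcome = f"IncompleteRead after {len(exc.partial)} bytes: {exc.partial!r}"
finally:
    server.shutdown()
    server.server_close()
print(outcome)
```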

wfrisch added a commit to wfrisch/urlwatch that referenced this issue Jan 24, 2024
Sometimes web servers return incomplete responses, triggering an
`InvalidChunkLength` exception in urlwatch. Enable this job option to
ignore these errors.

thp#725

wfrisch (Contributor) commented Jan 24, 2024

Adding this option was straightforward.
Feature branch: https://github.com/wfrisch/urlwatch/tree/feat/ignore_incomplete_reads

Steps to reproduce:
Run ./http-serve-incomplete-chunks.py (https://gist.github.com/wfrisch/bc00bfa049f2aab76dbb73215b1f5bb5)

Before:
urls.yaml:

name: "incomplete-chunk-server"
url: "http://localhost:8080"

urlwatch --urls urls.yaml
[...]
("Connection broken: InvalidChunkLength(got length b'\\r\\n', 0 bytes read)", InvalidChunkLength(got length b'\r\n', 0 bytes read))

After:
urls.yaml:

name: "incomplete-chunk-server"
url: "http://localhost:8080"
ignore_incomplete_reads: true

./urlwatch --urls urls.yaml

→ exit code 0
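Conceptually, the new job option acts as a targeted exception filter: only the truncated-response error is suppressed, and every other failure still surfaces. The sketch below illustrates that idea with the standard library's http.client.IncompleteRead; run_job, truncated_fetch and the dict-based job are hypothetical names invented for illustration and are not urlwatch's implementation (urlwatch's URL jobs use requests, whose exception types differ).

```python
import http.client
from typing import Callable, Optional


def run_job(fetch: Callable[[], bytes], job: dict) -> Optional[bytes]:
    """Hypothetical job runner: call `fetch` and return its data.

    If the job sets ignore_incomplete_reads, a truncated response is
    treated as "nothing to report" (None) instead of an error.
    Every other exception still propagates.
    """
    try:
        return fetch()
    except http.client.IncompleteRead:
        if job.get("ignore_incomplete_reads", False):
            return None
        raise


def truncated_fetch() -> bytes:
    # Stand-in for a server that sends 13 of 23 announced bytes.
    raise http.client.IncompleteRead(b"Hello, world!", 10)


job = {"url": "http://localhost:8080/invalid-content-length",
       "ignore_incomplete_reads": True}
print(run_job(truncated_fetch, job))  # None

try:
    run_job(truncated_fetch, {"url": job["url"]})
except http.client.IncompleteRead as exc:
    print("reported:", repr(exc))  # reported: IncompleteRead(13 bytes read, 10 more expected)
```

Keeping the filter tied to one exception type, rather than broadening ignore_connection_errors, means genuine connection failures on the same job are still reported.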

wfrisch (Contributor) commented Feb 1, 2024

An improved reproducer now also emulates regular incomplete reads (wrong Content-Length),
as requested in the first comment:
https://gist.github.com/wfrisch/63d1163645fa01e3ab1296e752769359

cat urls.yaml

url: "http://localhost:8080/invalid-content-length"
  # ignore_incomplete_reads: true
---
url: "http://localhost:8080/invalid-chunk-length"
  # ignore_incomplete_reads: true

urlwatch --urls urls.yaml

===========================================================================
01. ERROR: http://localhost:8080/invalid-content-length
02. ERROR: http://localhost:8080/invalid-chunk-length
===========================================================================

---------------------------------------------------------------------------
ERROR: http://localhost:8080/invalid-content-length
---------------------------------------------------------------------------
('Connection broken: IncompleteRead(13 bytes read, 10 more expected)', IncompleteRead(13 bytes read, 10 more expected))
---------------------------------------------------------------------------


---------------------------------------------------------------------------
ERROR: http://localhost:8080/invalid-chunk-length
---------------------------------------------------------------------------
("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
---------------------------------------------------------------------------

The new option silences both errors.

wfrisch added five commits to wfrisch/urlwatch that referenced this issue Feb 14, 2024
thp closed this as completed in #787 Feb 15, 2024
thp pushed a commit that referenced this issue Feb 15, 2024
* Add job option `ignore_incomplete_reads`.

Sometimes web servers return incomplete responses, triggering an
`InvalidChunkLength` exception in urlwatch. Enable this job option to
ignore these errors.

#725