merge V1.0.4 change -> main: #527

Merged: 8 commits, Apr 4, 2024

@ikreymer ikreymer commented Apr 4, 2024

refactor handling of max size for html/js/css (copy of #525)

  • due to a typo (and lack of type-checking!), matchFetchSize was passed in instead of maxFetchSize, so text/css/js responses over 5MB (instead of over 25MB) were not properly streamed back to the browser
  • add type checking to AsyncFetcherOptions to avoid this in the future.
  • refactor to avoid checking size altogether for 'essential resources' (html document, js and css): always fetch them fully and continue in the browser. Only apply rewriting if <25MB.
    fixes "Website fails to fully load properly since 1.0.0" #522
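
The failure mode described above can be illustrated with a minimal sketch. It is not the project's code: the option shape, the helper names (`limitFromUntyped`, `limitFromTyped`, `planEssentialFetch`), the resource-type labels and the exact byte values are assumptions made for illustration. Only the names `matchFetchSize`, `maxFetchSize` and `AsyncFetcherOptions`, and the 5MB/25MB figures, come from the description.

```typescript
// Sizes taken from the PR text; the exact constants in the project are assumed.
const DEFAULT_MAX_FETCH_SIZE = 5_000_000;
const MAX_TEXT_REWRITE_SIZE = 25_000_000;

// Hypothetical, simplified option shape (the real AsyncFetcherOptions may differ).
type AsyncFetcherOptions = {
  url: string;
  maxFetchSize?: number;
};

// Untyped variant: a misspelled key is silently ignored and the default wins.
function limitFromUntyped(opts: Record<string, unknown>): number {
  const value = opts["maxFetchSize"];
  return typeof value === "number" ? value : DEFAULT_MAX_FETCH_SIZE;
}

// Typed variant: an object literal with an unknown key fails to compile.
function limitFromTyped(opts: AsyncFetcherOptions): number {
  return opts.maxFetchSize ?? DEFAULT_MAX_FETCH_SIZE;
}

// Hypothetical resource-type labels for the 'essential resources'.
const ESSENTIAL_TYPES = new Set(["document", "script", "stylesheet"]);

// Essential resources skip the size check entirely; rewriting still has a cap.
function planEssentialFetch(resourceType: string, size: number) {
  const essential = ESSENTIAL_TYPES.has(resourceType);
  return {
    skipSizeCheck: essential,
    rewrite: essential && size < MAX_TEXT_REWRITE_SIZE,
  };
}

const scriptUrl = "https://example.com/app.js";

// The typo goes unnoticed here, so the 5MB default applies instead of 25MB.
const buggyLimit = limitFromUntyped({
  url: scriptUrl,
  matchFetchSize: MAX_TEXT_REWRITE_SIZE,
});

// The correctly spelled option is honored.
const fixedLimit = limitFromTyped({
  url: scriptUrl,
  maxFetchSize: MAX_TEXT_REWRITE_SIZE,
});

// With the typed signature the typo is rejected at compile time:
// limitFromTyped({ url: scriptUrl, matchFetchSize: MAX_TEXT_REWRITE_SIZE });
```

In this sketch the typed signature relies on TypeScript's excess property check for object literals, which is what turns the misspelled key into a compile error rather than a silent fallback to the default.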

ikreymer and others added 8 commits March 26, 2024 11:21

- subtract extraSeeds when computing limit
- don't include redirect seeds in seen list when serializing
- tests: adjust saved-state-test to also check total pages when crawl is
done

fixes #508
(for 1.0.3 release)

sitemap fixes, follow up to #496
- support parsing sitemap URLs that end in .gz, with gzip decompression
- support both `application/xml` and `text/xml` as valid sitemap
content-types (add test for both)
- ignore extraHops for sitemap-found URLs by setting them past the extraHops
limit (otherwise, all sitemap URLs would be treated as links from the seed
page)
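
As a rough illustration of the first two sitemap items above, the sketch below accepts both XML content types and gunzips bodies fetched from URLs ending in `.gz`. The helper names `isSitemapContentType` and `decodeSitemapBody` are hypothetical and not taken from the project; the decompression uses Node's built-in `zlib` module, which may or may not be what the crawler itself uses.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";

// Both content types named in the commit message are treated as valid.
const SITEMAP_CONTENT_TYPES = ["application/xml", "text/xml"];

// Hypothetical helper: compare only the MIME type, ignoring parameters
// such as "; charset=utf-8".
function isSitemapContentType(header: string | null): boolean {
  if (!header) {
    return false;
  }
  const mime = header.split(";")[0].trim().toLowerCase();
  return SITEMAP_CONTENT_TYPES.includes(mime);
}

// Hypothetical helper: gunzip the body when the URL path ends in .gz,
// otherwise decode the bytes as-is.
function decodeSitemapBody(url: string, body: Uint8Array): string {
  const isGzipped = new URL(url).pathname.endsWith(".gz");
  const bytes = isGzipped ? gunzipSync(body) : body;
  return new TextDecoder().decode(bytes);
}

// Usage: round-trip a tiny sitemap through gzip.
const sampleXml =
  "<urlset><url><loc>https://example.com/</loc></url></urlset>";
const decodedXml = decodeSitemapBody(
  "https://example.com/sitemap.xml.gz",
  gzipSync(sampleXml),
);
```

Checking the URL path rather than the full URL string keeps a query string after `.gz` from defeating the suffix test; that is a design choice of this sketch, not something stated in the commit.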
@ikreymer ikreymer merged commit 97b95fd into main Apr 4, 2024
4 checks passed
@ikreymer ikreymer deleted the v1.0.4-release branch April 4, 2024 00:38