Fix raw:// URL parsing logic
#752
Walkthrough
The code modifies the handling of raw HTML URLs in the crawler strategy to correctly strip both "raw:" and "raw://" prefixes when extracting HTML content. A new test and fixture are added to verify that both prefix variants are processed correctly and return the expected HTML.
Sequence Diagram(s)
sequenceDiagram
participant Test as test_raw_urls
participant Strategy as AsyncPlaywrightCrawlerStrategy
participant Response as AsyncCrawlResponse
Test->>Strategy: crawl(raw:<html...> or raw://<html...>)
alt URL starts with "raw://"
Strategy->>Strategy: Extract HTML after "raw://"
else URL starts with "raw:"
Strategy->>Strategy: Extract HTML after "raw:"
end
Strategy->>Response: Return AsyncCrawlResponse(html)
Test->>Test: Assert response.html == basic_html
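The branching in the diagram above can be sketched as a small helper. This is only an illustration of the prefix handling described in the walkthrough, not the actual code in the crawler strategy; the function name `strip_raw_prefix` is hypothetical.

```python
def strip_raw_prefix(url: str) -> str:
    """Return the HTML payload of a "raw:" or "raw://" URL.

    Hypothetical helper for illustration; the real strategy may
    structure this differently.
    """
    # Check the longer prefix first: every "raw://" URL also starts
    # with "raw:", so testing "raw:" first would leave a stray "//".
    for prefix in ("raw://", "raw:"):
        if url.startswith(prefix):
            return url[len(prefix):]
    raise ValueError(f"not a raw URL: {url!r}")
```

Ordering the prefixes from longest to shortest is the key design choice here: it keeps both variants on one code path while making sure the "//" is never left in the extracted HTML.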
Assessment against linked issues
Out-of-scope changes: No out-of-scope changes found.
Summary
Fixes #1118
List of files changed and why
How Has This Been Tested?
Tested with a parametrized test that calls AsyncPlaywrightCrawlerStrategy.crawl with a "raw:" URL and a "raw://" URL and checks that the returned HTML matches the expected HTML in both cases.
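A test of this shape might look roughly like the sketch below. It assumes pytest is available and uses a hypothetical stand-in coroutine, `fake_crawl`, in place of AsyncPlaywrightCrawlerStrategy.crawl so that the example runs without a browser; the `BASIC_HTML` constant stands in for the fixture mentioned in the walkthrough and its contents are made up.

```python
import asyncio

import pytest

# Made-up fixture content for illustration only.
BASIC_HTML = "<html><body><h1>Hello</h1></body></html>"


async def fake_crawl(url: str) -> str:
    """Hypothetical stand-in for the real crawl method.

    It only mimics the raw-URL branch: strip the prefix, return the HTML.
    """
    for prefix in ("raw://", "raw:"):  # longest prefix first
        if url.startswith(prefix):
            return url[len(prefix):]
    raise ValueError(f"not a raw URL: {url!r}")


@pytest.mark.parametrize("prefix", ["raw:", "raw://"])
def test_raw_urls(prefix: str) -> None:
    # Both prefix variants should yield exactly the original HTML.
    html = asyncio.run(fake_crawl(f"{prefix}{BASIC_HTML}"))
    assert html == BASIC_HTML
```

Parametrizing over the prefix keeps the two cases in one test body, so a regression in either variant shows up as a separately named failure.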
Checklist: