I'm working on a web scraping project that requires both browser automation and direct HTTP requests. Could the maintainers suggest best practices for mixing Playwright-based page handling with regular HTTP requests in Crawlee?
How to Switch from Playwright Login to HTTP Crawler After Cookie Acquisition?
Hello @kwdiwt and thank you for your interest in Crawlee! If you don't need to use a `BrowserPool`, I suggest that you just perform the login using `playwright` without any Crawlee wrappers, retrieve the cookies and use them to construct your `HttpCrawler` (or any of its subclasses - `CheerioCrawler` etc.):

```javascript
const crawler = new HttpCrawler({
    // ...
    sessionPoolOptions: {
        sessionOptions: {
            cookieJar: {
                yourCookie: 'value',
            }, // this can be a tough-cookie CookieJar instance as well
        },
    },
});

await crawler.run();
```

If you need to perform the login for each new session (perhaps to avoid getting blocked), you can use the `createSessionFunction` option (https://crawlee.dev/api/core/inter…).
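As a sketch of the cookie handoff between the two stages: Playwright's `context.cookies()` returns an array of `{ name, value, domain, path, ... }` objects, which can be flattened into a plain name-to-value map for the HTTP crawler. The helper name below is a hypothetical illustration, not part of Crawlee's API.

```javascript
// Hypothetical helper (not part of Crawlee): flatten the cookie objects
// returned by Playwright's context.cookies() into a plain name -> value
// map suitable for passing as sessionOptions.cookieJar.
function cookiesToJarInit(playwrightCookies) {
    const jar = {};
    for (const { name, value } of playwrightCookies) {
        jar[name] = value;
    }
    return jar;
}

// Example input, shaped like Playwright's context.cookies() output
// after a successful login:
const cookies = [
    { name: 'sessionid', value: 'abc123', domain: 'example.com', path: '/' },
    { name: 'csrftoken', value: 'xyz789', domain: 'example.com', path: '/' },
];

console.log(cookiesToJarInit(cookies));
// → { sessionid: 'abc123', csrftoken: 'xyz789' }
```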
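The per-session login pattern can be sketched roughly as follows. The `Session` class and `performLogin` below are illustrative stand-ins, not Crawlee's real implementation: in actual code you would import `Session` from `crawlee` and run a Playwright login inside `performLogin`, then pass the function as the `createSessionFunction` option.

```javascript
// Stand-in for Crawlee's Session (assumption, for illustration only).
class Session {
    constructor(id) {
        this.id = id;
        this.cookies = {};
    }
    setCookies(cookies) {
        Object.assign(this.cookies, cookies);
    }
}

// Stand-in for a real Playwright login flow that returns auth cookies.
function performLogin() {
    return { sessionid: 'abc123' };
}

// Each new session performs its own login, so a fresh identity gets
// fresh cookies - useful when a shared login would get blocked.
function createSessionFunction(sessionPool) {
    const session = new Session(`session_${Math.random().toString(36).slice(2)}`);
    session.setCookies(performLogin());
    return session;
}

const session = createSessionFunction(null);
console.log(session.cookies.sessionid); // → 'abc123'
```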