playwrightUtils for blocking patterns #2810
-
I am currently using https://crawlee.dev/api/playwright-crawler/namespace/playwrightUtils#blockRequests with extraUrlPatterns to block certain URLs, such as www.googletagmanager.com and images.taboola.com. However, the stats from my proxy provider suggest that my scraper is still letting these requests through. Am I doing something wrong, or does this not work the way I think it does? I recall blocking requests more effectively in Puppeteer, and I have tried a few things in Playwright, but none of them seem to work correctly. Any ideas?
-
Hi @tsrseerist and thanks for your interest in Crawlee! Could you please provide a minimal code snippet that reproduces the error so that we can troubleshoot this more efficiently?
-
Sure:
I use a router for my scraping, but this is basically all I am doing.
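Something along these lines (simplified; the blocked hosts are the ones mentioned above, and the handler body and start URL are just placeholders):

```ts
import { PlaywrightCrawler, createPlaywrightRouter, playwrightUtils } from 'crawlee';

const router = createPlaywrightRouter();

router.addDefaultHandler(async ({ page, request, log }) => {
    // Block the unwanted hosts on top of Crawlee's default blocked patterns.
    await playwrightUtils.blockRequests(page, {
        extraUrlPatterns: ['www.googletagmanager.com', 'images.taboola.com'],
    });

    log.info(`Scraping ${request.url}`);
    // ...actual scraping logic...
});

const crawler = new PlaywrightCrawler({
    requestHandler: router,
});

await crawler.run(['https://example.com']);
```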
-
Sure:
There's really not much to it; just something basic like this should work. Also, let me know if I am using extraUrlPatterns the correct way too.
-
Can you try calling `blockRequests` in a pre-navigation hook (https://crawlee.dev/api/playwright-crawler/interface/PlaywrightCrawlerOptions#preNavigationHooks) instead of in the request handler?
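The request handler only runs after the initial navigation has finished, so requests fired during the page load can slip through before blocking set up there takes effect. Roughly something like this, with the extra patterns and start URL assumed from the question above:

```ts
import { PlaywrightCrawler, playwrightUtils } from 'crawlee';

const crawler = new PlaywrightCrawler({
    preNavigationHooks: [
        // Runs before page.goto(), so the blocking rules are already in place
        // when the initial page load triggers its third-party requests.
        async ({ page }) => {
            await playwrightUtils.blockRequests(page, {
                extraUrlPatterns: ['www.googletagmanager.com', 'images.taboola.com'],
            });
        },
    ],
    requestHandler: async ({ page, request, log }) => {
        log.info(`Scraping ${request.url}`);
        // ...scraping logic as before...
    },
});

await crawler.run(['https://example.com']);
```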