Closed
Labels: bug (Something isn't working.), t-tooling (Issues with this label are in the ownership of the tooling team.)
Description
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/playwright (PlaywrightCrawler)
Issue description
Calling enqueueLinks() without any parameters in the request handler, starting from https://crawlee.dev/, eventually escapes the domain, and the crawler starts scraping everything.
Run log: https://console.apify.com/actors/PFaajt3k6oOp1YRAU/runs/0SfY5Ocr1dgQjhSIS#log
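For reference, Crawlee documents `same-hostname` as the default strategy of `enqueueLinks()`, so a bare call should only enqueue links whose hostname matches the current page. The snippet below is a minimal sketch of that expected check, not Crawlee's implementation. `isSameHostname` is a hypothetical helper written for illustration only.

```javascript
// Hypothetical helper (not Crawlee's code): returns true when `href`, found on
// the page at `baseUrl`, stays on the same hostname as that page.
function isSameHostname(href, baseUrl) {
    let target;
    try {
        // Resolve relative links such as "/docs" against the page URL.
        target = new URL(href, baseUrl);
    } catch {
        return false; // unparsable href: never enqueue it
    }
    // Only http(s) links are crawlable; mailto:, javascript: etc. are dropped.
    if (target.protocol !== 'http:' && target.protocol !== 'https:') return false;
    return target.hostname === new URL(baseUrl).hostname;
}

console.log(isSameHostname('/docs/quick-start', 'https://crawlee.dev/')); // true
console.log(isSameHostname('https://github.com/apify/crawlee', 'https://crawlee.dev/')); // false
```

Under this expectation, any github.com URL showing up in the crawl log is a link that should have been filtered out.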
Code sample
import { PlaywrightCrawler } from 'crawlee';
import { Actor } from 'apify';

await Actor.init();

const crawler = new PlaywrightCrawler({
    proxyConfiguration: await Actor.createProxyConfiguration(),
});

crawler.router.addDefaultHandler(async (ctx) => {
    const $ = await ctx.parseWithCheerio();
    const title = $('html title').text();
    const h1 = $('body h1').text();
    const proxy = ctx.proxyInfo?.username;
    ctx.log.info(`processing ${ctx.request.url}`, { title, h1, proxy });
    await ctx.pushData({ url: ctx.request.url, title, h1 });
    // No options: relies on the default enqueue strategy.
    await ctx.enqueueLinks();
});

await crawler.run(['https://crawlee.dev/']);
await Actor.exit();
Package version
3.10.3 beta
Node.js version
20
Operating system
No response
Apify platform
- Tick me if you encountered this issue on the Apify platform
I have tested this on the next release
No response
Other context
No response
Activity
janbuchar commented on Jun 7, 2024
Thanks for the report! Do you know whether some page in the Crawlee docs redirects elsewhere, or is the actual enqueueStrategy check failing (rather than the post-redirect check)?
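The question separates two places where scope can leak: the strategy check applied when links are enqueued, and a post-redirect check applied after a page has loaded. The sketch below illustrates only the second idea: compare the originally requested URL with the URL the browser ended up on. It is not Crawlee's code. `redirectStaysInScope` and its strategy names are assumptions modeled on the enqueue strategies, and `same-domain` is left out because it needs a public-suffix list.

```javascript
// Hypothetical post-redirect check (not Crawlee's code): after navigation,
// decide whether the final URL is still inside the scope of the request.
function redirectStaysInScope(requestUrl, loadedUrl, strategy = 'same-hostname') {
    const from = new URL(requestUrl);
    const to = new URL(loadedUrl);
    switch (strategy) {
        case 'all':
            return true; // any redirect target is acceptable
        case 'same-origin':
            return from.origin === to.origin; // scheme + host + port must match
        case 'same-hostname':
            return from.hostname === to.hostname; // scheme and port may differ
        default:
            throw new Error(`unsupported strategy: ${strategy}`);
    }
}

console.log(redirectStaysInScope('https://crawlee.dev/old', 'https://crawlee.dev/new')); // true
console.log(redirectStaysInScope('https://crawlee.dev/x', 'https://github.com/apify/crawlee')); // false
```

If pages that were never redirected still show up off-domain, the leak must be in the enqueue-time check rather than here.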
B4nan commented on Jun 7, 2024
Looking at the storage, it feels like it's not about redirects: we have the "Edit this page" links in there too. There are a few more links here; I don't think they come from redirects either.
B4nan commented on Jun 7, 2024
It almost feels like the adaptive enqueueLinks is not checking the strategies at all; maybe it's not about the post-redirect check at all.
fix: Fix link filtering in enqueueLinks in AdaptivePlaywrightCrawler (a…
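To make "not checking the strategies at all" concrete, the sketch below shows what the missing step amounts to. Every extracted link is resolved against the page URL and filtered by the strategy before anything is enqueued. This is an illustration under assumptions, not the linked fix. `filterLinksByStrategy` is hypothetical, `same-domain` is omitted (it needs a public-suffix list), and the GitHub edit URL is made up to stand in for an "Edit this page" link.

```javascript
// Hypothetical sketch (not the actual fix): filter extracted links by the
// enqueue strategy *before* enqueueing them. If this step is skipped,
// off-site links such as a docs site's "Edit this page" links reach the queue.
const SUPPORTED = ['all', 'same-origin', 'same-hostname'];

function filterLinksByStrategy(hrefs, pageUrl, strategy = 'same-hostname') {
    if (!SUPPORTED.includes(strategy)) throw new Error(`unsupported strategy: ${strategy}`);
    const base = new URL(pageUrl);
    const kept = new Set(); // de-duplicates absolute URLs
    for (const href of hrefs) {
        let url;
        try {
            url = new URL(href, base); // resolve relative links
        } catch {
            continue; // skip malformed hrefs
        }
        if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
        url.hash = ''; // fragments point at the same document
        const allowed =
            strategy === 'all' ||
            (strategy === 'same-origin' && url.origin === base.origin) ||
            (strategy === 'same-hostname' && url.hostname === base.hostname);
        if (allowed) kept.add(url.href);
    }
    return [...kept];
}

const links = [
    '/docs/quick-start',
    '/docs/quick-start#installation',
    'https://github.com/apify/crawlee/edit/master/docs/example.md', // illustrative "Edit this page" URL
    'mailto:someone@example.com',
];
console.log(filterLinksByStrategy(links, 'https://crawlee.dev/'));
// [ 'https://crawlee.dev/docs/quick-start' ]
```

With `'all'` the GitHub edit link would also be kept, which matches the symptom in the storage.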