
Enqueue strategy check after redirects is not working with adaptive crawler #2525

@B4nan (Member)

Description

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/playwright (PlaywrightCrawler)

Issue description

Call enqueueLinks() without any parameters in the request handler, starting from https://crawlee.dev/. At some point the crawler escapes the domain and starts scraping everything.

Log of an affected run: https://console.apify.com/actors/PFaajt3k6oOp1YRAU/runs/0SfY5Ocr1dgQjhSIS#log

Code sample

import { PlaywrightCrawler } from 'crawlee';
import { Actor } from 'apify';

await Actor.init();

const crawler = new PlaywrightCrawler({
    proxyConfiguration: await Actor.createProxyConfiguration(),
});
crawler.router.addDefaultHandler(async (ctx) => {
    const $ = await ctx.parseWithCheerio();
    const title = $('html title').text();
    const h1 = $('body h1').text();
    const proxy = ctx.proxyInfo?.username;
    ctx.log.info(`processing ${ctx.request.url}`, { title, h1, proxy });
    await ctx.pushData({ url: ctx.request.url, title, h1 });
    // No options: the default strategy is 'same-hostname', so only crawlee.dev links are expected.
    await ctx.enqueueLinks();
});
await crawler.run(['https://crawlee.dev/']);
await Actor.exit();
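With no options, enqueueLinks() is expected to keep the crawl on crawlee.dev, because its default strategy is `same-hostname`. The sketch below spells out what that check should admit, plus a guard that could be passed as `transformRequestFunction` as a belt-and-braces filter. `passesStrategy` and `strategyGuard` are hypothetical names, not Crawlee APIs; the `same-domain` strategy is left out because it needs a public suffix list, and it has not been verified that the affected code path honors `transformRequestFunction`.

```javascript
// Hypothetical helper, not part of Crawlee: approximates whether a link found
// on `baseUrl` passes the given enqueue strategy.
// 'same-domain' is omitted because it requires a public suffix list.
function passesStrategy(candidateUrl, baseUrl, strategy = 'same-hostname') {
    const base = new URL(baseUrl);
    const candidate = new URL(candidateUrl, base); // resolves relative links
    switch (strategy) {
        case 'all':
            return true;
        case 'same-hostname':
            return candidate.hostname === base.hostname;
        case 'same-origin':
            // Origin compares scheme, hostname and port.
            return candidate.origin === base.origin;
        default:
            throw new Error(`Unsupported strategy in this sketch: ${strategy}`);
    }
}

// Hypothetical factory: returns a function in the shape enqueueLinks() accepts
// for `transformRequestFunction`; returning false drops the request.
function strategyGuard(baseUrl, strategy = 'same-hostname') {
    return (requestOptions) =>
        passesStrategy(requestOptions.url, baseUrl, strategy) ? requestOptions : false;
}
```

In the reproduction, the call would become `await ctx.enqueueLinks({ transformRequestFunction: strategyGuard('https://crawlee.dev/') });`. Anchoring the guard to the start URL rather than to the current page means a page that has already escaped cannot widen the crawl further.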

Package version

3.10.3 beta

Node.js version

20

Operating system

No response

Apify platform

  • Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

No response

Other context

No response

Activity

Labels added on Jun 7, 2024: bug (Something isn't working) and t-tooling (Issues with this label are in the ownership of the tooling team).
janbuchar (Contributor) commented on Jun 7, 2024

Thanks for the report! Do you know whether any page in the Crawlee docs redirects elsewhere, or is the enqueueStrategy check itself failing (rather than the post-redirect check)?
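The question above separates two places where an off-domain URL can slip through: the strategy check on the link when it is enqueued, and the post-redirect check on the URL the browser finally lands on. A minimal diagnostic sketch, assuming each processed request exposes the enqueued `url` and the final `loadedUrl` (as Crawlee's `Request` does) and assuming a same-hostname policy; `classifyEscape` is a hypothetical name.

```javascript
// Hypothetical diagnostic, not a Crawlee API: given a processed request with
// the enqueued `url` and the final `loadedUrl` after redirects, report which
// check should have stopped it under a same-hostname policy.
function classifyEscape(request, startUrl) {
    const startHost = new URL(startUrl).hostname;
    const requestedHost = new URL(request.url).hostname;
    // loadedUrl may be missing if the page never loaded; fall back to url.
    const loadedHost = new URL(request.loadedUrl ?? request.url).hostname;
    if (requestedHost !== startHost) return 'enqueue-check'; // the link itself was off-domain
    if (loadedHost !== startHost) return 'post-redirect-check'; // a redirect left the domain
    return 'ok';
}
```

The reproduction only stores `ctx.request.url`, so `ctx.request.loadedUrl` would have to be pushed as well before this can be run over the stored results. An `'enqueue-check'` result for a GitHub URL, for example, would point at the strategy check rather than at redirects.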

B4nan (Member, Author) commented on Jun 7, 2024

Looking at the storage, it doesn't seem to be about redirects: the "Edit this page" links are in there too.

[screenshot of the storage]

A few more links here; I don't think they come from redirects either.

[screenshot]
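To audit the storage as described above, the stored items can be grouped by the hostname they ended up on. A sketch assuming items shaped like the ones pushed in the reproduction (`{ url, title, h1 }`); `offHostnameCounts` is a hypothetical helper.

```javascript
// Hypothetical audit helper: count stored items whose URL left the start
// hostname, grouped by the hostname they ended up on.
function offHostnameCounts(items, startUrl) {
    const startHost = new URL(startUrl).hostname;
    const counts = new Map();
    for (const { url } of items) {
        const host = new URL(url).hostname;
        if (host === startHost) continue; // still on the start hostname
        counts.set(host, (counts.get(host) ?? 0) + 1);
    }
    // Most frequent escaped hostnames first.
    return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```

With Crawlee, the items could be loaded with `const { items } = await (await Dataset.open()).getData();` before calling the helper.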

B4nan (Member, Author) commented on Jun 7, 2024

It almost feels like the adaptive crawler's enqueueLinks() is not checking the strategies at all, so maybe this is not about the post-redirect check after all.

A commit referencing this issue was added on Jun 25, 2025: 8a3b6f8


Issue #2525 · apify/crawlee