Should queued task take care about closing the page? #34

BR0kEN- · 2018-10-06T19:43:07Z

My use case is the following: create a cluster with Cluster.CONCURRENCY_BROWSER and never close it.

const { connect } = require('amqplib');
const { Cluster } = require('puppeteer-cluster');
const { crawler, puppeteerOptions, redis } = require('./docroot');
const { Resource } = require('./docroot/Component');

(async ({ RABBITMQ_USER, RABBITMQ_PASS, RABBITMQ_HOST, RABBITMQ_PORT, RABBITMQ_QUEUE, RABBITMQ_THREADS, REDIS_LIST }) => {
  const cluster = await Cluster.launch({
    monitor: true,
    concurrency: Cluster.CONCURRENCY_BROWSER,
    maxConcurrency: Number(RABBITMQ_THREADS),
    puppeteerOptions,
  });

  const channel = await (await connect(`amqp://${RABBITMQ_USER}:${RABBITMQ_PASS}@${RABBITMQ_HOST}:${RABBITMQ_PORT}`)).createChannel();

  await channel.assertQueue(RABBITMQ_QUEUE, {
    durable: false,
  });

  await cluster.task(async ({ data, page }) => {
    const { resource, message } = data;
    const metadata = await crawler.crawl(resource, page);

    await redis.rpush(REDIS_LIST, JSON.stringify(metadata));

    channel.ack(message);
  });

  channel.consume(RABBITMQ_QUEUE, message => {
    const content = JSON.parse(message.content.toString('utf8'));
    const resource = new Resource(content.resource);

    if (Array.isArray(content.links_to_check_for)) {
      resource.setLinks(content.links_to_check_for);
    }

    cluster.queue({ resource, message });
  });
})(process.env);

As you can see above, the cluster's queue is filled whenever RabbitMQ delivers a message. This means the process is effectively a daemon and shouldn't be stopped. I'm wondering whether the pages the cluster creates should be closed explicitly once they are no longer needed (await page.close() after const metadata = await crawler.crawl(resource, page);), or whether that is done automatically.


thomasdondorf · 2018-10-07T07:01:42Z

It is done automatically. As soon as your task function finishes (or a timeout happens) the opened resources will be closed. You do not need to worry about that.

If you want to see it in action, you can run puppeteer locally and disable headless mode.
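As a minimal sketch of disabling headless mode: puppeteer-cluster passes puppeteerOptions through to Puppeteer's launch call, so setting headless: false there should make the browser windows visible. The helper name withVisibleBrowser is hypothetical and only used for illustration.

```javascript
// Hypothetical helper (not part of puppeteer-cluster): returns a copy of the
// cluster launch options with headless mode turned off, so the browser
// windows and their pages can be watched while tasks run.
function withVisibleBrowser(launchOptions) {
  return {
    ...launchOptions,
    puppeteerOptions: { ...launchOptions.puppeteerOptions, headless: false },
  };
}

// Usage (requires puppeteer-cluster to be installed):
// const { Cluster } = require('puppeteer-cluster');
// const cluster = await Cluster.launch(withVisibleBrowser({
//   concurrency: Cluster.CONCURRENCY_BROWSER,
//   maxConcurrency: 2,
// }));
```

With the windows visible, you should be able to watch each page open when a task starts and disappear once the task function returns.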

BR0kEN- · 2018-10-08T08:38:36Z

Thanks for the response, @thomasdondorf. Another question came to mind: is page.close() called by puppeteer-cluster itself? I'm asking because I got a lot of Protocol error (Page.navigate): Target closed., stack=Error: Protocol error (Page.navigate): Target closed. errors during a ~45-minute run of the process.

BR0kEN- · 2018-10-08T15:35:21Z

Another question: can I control page termination myself instead of relying on the automatic cleanup? Right now it seems the page is getting closed (context lost) before all operations have completed.

thomasdondorf · 2018-10-08T18:25:59Z

The page will be closed as soon as your task function has finished executing.

But you can use a Promise to wait until you are done. Then you call the resolve function to close the page.

await cluster.task(async ({ data, page }) => {
    await new Promise((resolve) => {
        // do some asynchronous stuff
        // maybe call an async function like setTimeout?
        setTimeout(() => {
            // do more stuff...
            // When we are done we call resolve() to resolve the promise
            resolve(); // this will finish the task function and the page will be closed
        }, 3000);
    });
});

Be careful with asynchronous functions though. Asynchronously thrown errors cannot be caught by the library. So don't forget try-catch blocks where necessary.
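To illustrate the warning about asynchronously thrown errors, here is a minimal sketch. The helper runLater is hypothetical (it is not part of puppeteer-cluster); it wraps the callback body in try-catch and forwards any failure to reject, so the error travels through the promise the task function is awaiting instead of being thrown outside it.

```javascript
// Hypothetical helper: runs `work` after `delayMs` and settles the returned
// promise with its outcome.
function runLater(work, delayMs) {
  return new Promise((resolve, reject) => {
    setTimeout(async () => {
      try {
        // `work` may be synchronous or asynchronous; await covers both.
        resolve(await work());
      } catch (err) {
        // Without this catch the error would be thrown outside the promise
        // chain, where the awaiting task function could never observe it.
        reject(err);
      }
    }, delayMs);
  });
}

// A task function that keeps the page open until the deferred work settles.
async function task({ page }) {
  await runLater(() => page.title(), 3000);
}

// Usage (requires puppeteer-cluster): await cluster.task(task);
```

Because the task function awaits the promise, it does not return (and the page is not closed) until the deferred work has either succeeded or failed.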

thomasdondorf added the question Further information is requested label Oct 7, 2018

BR0kEN- closed this as completed Oct 8, 2018

BR0kEN- reopened this Oct 8, 2018

thomasdondorf closed this as completed Dec 21, 2018

MinSomai mentioned this issue Nov 6, 2023

[QUESTION] Should page be closed when using long lasting cluster #367

Closed
