
Should queued task take care about closing the page? #34

Closed
BR0kEN- opened this issue Oct 6, 2018 · 4 comments
Labels
question Further information is requested

Comments


BR0kEN- commented Oct 6, 2018

My use case is the following: create a cluster with Cluster.CONCURRENCY_BROWSER and never close it.

```js
const { connect } = require('amqplib');
const { Cluster } = require('puppeteer-cluster');
const { crawler, puppeteerOptions, redis } = require('./docroot');
const { Resource } = require('./docroot/Component');

(async ({ RABBITMQ_USER, RABBITMQ_PASS, RABBITMQ_HOST, RABBITMQ_PORT, RABBITMQ_QUEUE, RABBITMQ_THREADS, REDIS_LIST }) => {
  const cluster = await Cluster.launch({
    monitor: true,
    concurrency: Cluster.CONCURRENCY_BROWSER,
    maxConcurrency: Number(RABBITMQ_THREADS),
    puppeteerOptions,
  });

  const channel = await (await connect(`amqp://${RABBITMQ_USER}:${RABBITMQ_PASS}@${RABBITMQ_HOST}:${RABBITMQ_PORT}`)).createChannel();

  // assertQueue returns a promise in amqplib's promise API, so wait for it.
  await channel.assertQueue(RABBITMQ_QUEUE, {
    durable: false,
  });

  await cluster.task(async ({ data, page }) => {
    const { resource, message } = data;
    const metadata = await crawler.crawl(resource, page);

    await redis.rpush(REDIS_LIST, JSON.stringify(metadata));

    // Acknowledge the message only after the crawl result has been stored.
    channel.ack(message);
  });

  channel.consume(RABBITMQ_QUEUE, message => {
    // amqplib passes null when the consumer is cancelled by the server.
    if (message === null) {
      return;
    }

    const content = JSON.parse(message.content.toString('utf8'));
    const resource = new Resource(content.resource);

    if (Array.isArray(content.links_to_check_for)) {
      resource.setLinks(content.links_to_check_for);
    }

    cluster.queue({ resource, message });
  });
})(process.env);
```

As you can see above, the cluster's queue is filled whenever RabbitMQ delivers a message. This means the process behaves like a daemon and shouldn't be stopped. I'm wondering whether the pages that the cluster creates should be closed explicitly once they are no longer needed (`await page.close()` after `const metadata = await crawler.crawl(resource, page);`), or whether that is done automatically?

@thomasdondorf
Owner

It is done automatically. As soon as your task function finishes (or a timeout happens) the opened resources will be closed. You do not need to worry about that.

If you want to see it in action, you can run puppeteer locally and disable headless mode.
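
A rough way to picture that lifecycle is sketched below. This is an illustration only: `makeFakePage` and `runTask` are hypothetical stand-ins, not puppeteer-cluster internals, and the timeout case is left out. The point is that the page is released once the task function settles, whether it succeeded or threw.

```javascript
// Hypothetical stand-in for a Puppeteer page; it only tracks whether it is closed.
function makeFakePage() {
  return {
    closed: false,
    async goto(url) {
      if (this.closed) throw new Error('Target closed.');
      return url;
    },
    async close() {
      this.closed = true;
    },
  };
}

// Simplified model of a worker (an assumption, not the library's code):
// run the task function, then always release the page.
async function runTask(taskFn, data, page = makeFakePage()) {
  try {
    return await taskFn({ data, page });
  } finally {
    await page.close(); // runs on success and on failure alike
  }
}
```

In this model, anything the task function starts but does not await can still be running when `page.close()` is called, and would then fail with an error such as `Target closed.`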

thomasdondorf added the question label on Oct 7, 2018
BR0kEN- closed this as completed on Oct 8, 2018
BR0kEN- reopened this on Oct 8, 2018

BR0kEN- commented Oct 8, 2018

Thanks for the response, @thomasdondorf. Another question came to mind: is `page.close()` called by puppeteer-cluster? I'm asking because I got a lot of `Protocol error (Page.navigate): Target closed.` errors (`stack=Error: Protocol error (Page.navigate): Target closed.`) during a ~45-minute run of the process.


BR0kEN- commented Oct 8, 2018

Another question: can I control page termination myself instead of relying on the automatic cleanup? At the moment the page seems to get closed (its context is lost) before all operations have completed.

@thomasdondorf
Owner

The page will be closed as soon as your task function has finished executing, that is, once the promise it returns settles.

But you can use a Promise to wait until you are done. When you call its resolve function, the task function finishes and the page is closed.

```js
await cluster.task(async ({ data, page }) => {
    await new Promise((resolve) => {
        // do some asynchronous stuff
        // maybe call an async function like setTimeout?
        setTimeout(() => {
            // do more stuff...
            // When we are done we call resolve() to resolve the promise
            resolve(); // this will finish the task function and the page will be closed
        }, 3000);
    });
});
```

Be careful with asynchronous functions though. Asynchronously thrown errors cannot be caught by the library. So don't forget try-catch blocks where necessary.
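
One way to apply that advice is sketched here with a hypothetical helper, `waitAndRun`, which is not part of puppeteer-cluster. It wraps the callback body in try-catch and hands any error to `reject`, so a failure inside the timer callback becomes a rejected promise instead of an error thrown outside the task function.

```javascript
// Hypothetical helper: runs `work` after `delayMs` and settles the promise
// with its outcome instead of letting an error escape from the timer callback.
function waitAndRun(work, delayMs) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      try {
        // If work() returns a promise, the outer promise adopts its outcome.
        resolve(work());
      } catch (err) {
        // A synchronous throw inside the callback becomes a rejection.
        reject(err);
      }
    }, delayMs);
  });
}
```

Inside a task function this could be used as `await waitAndRun(() => { /* do more stuff */ }, 3000);`. The page then stays open until the callback has run, and any error rejects the promise returned by the task function.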
