Proxies #368

Open
kc1nn4y opened this issue Jan 23, 2021 · 10 comments
Comments

kc1nn4y commented Jan 23, 2021

Is it possible to use different proxies per browser instance?
I want to create something so that every instance has a different proxy through which the browser will retrieve information.

@ejames17

You can pass browser launch options via puppeteerOptions. Note that a --proxy-server argument set this way applies to every browser the cluster launches:

    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_PAGE,
        maxConcurrency: 2,
        puppeteerOptions: {
            headless: false,
            devtools: true, // Puppeteer spells this option in lowercase
            ignoreHTTPSErrors: true,
            timeout: 0,
            args: [
                '--no-sandbox',
                '--disable-setuid-sandbox',
                '--window-size=1920,1080',
                '--proxy-server=http://localhost:8888'
            ],
            ignoreDefaultArgs: ['--enable-automation']
        }
    });
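
For a different proxy per browser, one option is to build a separate launch-options object for each proxy. The sketch below is only an illustration: buildPerBrowserOptions is a hypothetical helper (not part of puppeteer-cluster), and the proxy addresses are placeholders.

```javascript
// Hypothetical helper: builds one Puppeteer launch-options object per proxy.
function buildPerBrowserOptions(proxies, baseOptions = {}) {
  return proxies.map((proxy) => ({
    ...baseOptions,
    args: [
      // Drop any proxy flag inherited from the base options.
      ...(baseOptions.args || []).filter((arg) => !arg.startsWith('--proxy-server=')),
      `--proxy-server=${proxy}`,
    ],
  }));
}

const perBrowserOptions = buildPerBrowserOptions(
  ['http://localhost:8888', 'http://localhost:8889'], // placeholder proxies
  { headless: false, args: ['--no-sandbox'] },
);
console.log(perBrowserOptions.map((options) => options.args));
```

The resulting array could be passed as perBrowserOptions to Cluster.launch together with Cluster.CONCURRENCY_BROWSER, with maxConcurrency equal to the array length. Comments further down this thread report the proxy being ignored in some setups, so verify the outgoing IP before relying on it.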

amunim commented Feb 26, 2021

+1, I would also like to use a different proxy for each browser instance. @Yannicko, have you found a solution?

amunim commented Mar 1, 2021

I found the solution just a few searches after I posted the comment.

Anyone else looking for a solution can use this: proxy-per-page

@farruhsydykov

Somehow it doesn't work for me :/

hatemjaber commented Dec 30, 2021

I tried puppeteerOptions and perBrowserOptions, both individually and together, and the proxy is completely ignored.

code-ric commented Jan 14, 2022

> I tried puppeteerOptions and perBrowserOptions individually and together at the same time and the proxy is completely ignored.

I experience the same behavior. Have you found a solution to this yet?

Edit:
It's a bit late, but I found a solution to this problem for anyone using a proxy server. Please continue reading:

First, I created a new concurrency implementation by copying the Browser concurrency and renaming it to BrowserProxy.
Then I changed the code in workerInstance to check whether the options contain the --proxy-server argument, like this:

class BrowserProxy extends ConcurrencyImplementation_1.default {

  ...

  let page;
  let context; // puppeteer typings are old...
  // Look up the proxy flag first so a missing flag (or missing args) does not throw on .split().
  const proxyArg = (options.args || []).find(arg => arg.startsWith('--proxy-server='));
  const proxyServer = proxyArg ? proxyArg.split('--proxy-server=')[1] : null;
  const contextOptions = {proxyServer};
  return {
    jobInstance: () => __awaiter(this, void 0, void 0, function* () {

  ...

If the argument is present, its value is saved and passed to createIncognitoBrowserContext like this:

  ...

  jobInstance: () => __awaiter(this, void 0, void 0, function* () {
      yield util_1.timeoutExecute(BROWSER_TIMEOUT, (() => __awaiter(this, void 0, void 0, function* () {
          context = yield chrome.createIncognitoBrowserContext(contextOptions);
          page = yield context.newPage();
      }))());
      return {
  ...

After that, make the matching changes in all the Concurrency files so that puppeteer-cluster can use your new concurrency type, like this:

const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_BROWSERPROXY, // the custom concurrency type added above
    maxConcurrency: 1,
    timeout: properties.taskTimeout, // properties is defined elsewhere in my code
    puppeteerOptions: {
        headless: false,
        ignoreHTTPSErrors: true,
        args: [
          `--proxy-server=${proxy_server}`, // proxy_server is defined elsewhere in my code
          '--no-sandbox',
        ]
    },
    puppeteer: puppeteer,
    monitor: false,
    retryLimit: 3,
    retryDelay: 3500
});

There is probably a better way to handle this, but it was my first approach to fixing the issue.
Let me know if that helped you in any way.

@hatemjaber

@cedricdsc I'm sorry for the delayed response; I just got a chance to reply to your question. I did something similar to you, but a little different. Here's my solution:

I created a proxyServer variable holding the proxy server for this instance:
const proxyServer = chrome.process()?.spawnargs.find(it => it.startsWith("--proxy-server"))?.split("=")[1] || undefined;

and I changed context to:
context = await chrome.createIncognitoBrowserContext({ proxyServer });
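
Both variants above read the proxy out of a list of command-line arguments. As a minimal sketch, the lookup can be pulled into a small function; extractProxyServer is a hypothetical name, not something puppeteer or puppeteer-cluster provides. It returns undefined when the flag is missing and keeps any "=" characters inside the proxy value, which split("=")[1] would cut off.

```javascript
// Hypothetical helper: returns the value of --proxy-server from an argument list, or undefined.
function extractProxyServer(args) {
  const prefix = '--proxy-server=';
  const flag = (args || []).find((arg) => typeof arg === 'string' && arg.startsWith(prefix));
  // slice() keeps everything after the prefix, so values containing "=" survive intact.
  return flag ? flag.slice(prefix.length) || undefined : undefined;
}

// Works with launch args as well as with the spawnargs of a browser process.
console.log(extractProxyServer(['--no-sandbox', '--proxy-server=http://localhost:8888']));
console.log(extractProxyServer(['--no-sandbox']));
```

The result could then be passed as { proxyServer } to createIncognitoBrowserContext, as in the snippets above.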

code-ric commented Jan 17, 2022

> @cedricdsc I'm sorry for the delayed response, just got a chance to reply to your question. I did something similar to you but a little different, here's my solution:
>
> I created a proxyServer variable with the proxy server for this instance:
> const proxyServer = chrome.process()?.spawnargs.find(it => it.startsWith("--proxy-server"))?.split("=")[1] || undefined;
>
> and i changed context to:
> context = await chrome.createIncognitoBrowserContext({ proxyServer });

That's another way to do it. Good you found it too 👍

RestfuI commented Jan 18, 2023

Hi all,

I have found a solution for those struggling with the lack of per-request or per-browser proxy support in puppeteer-cluster. I achieved this with the puppeteer-page-proxy package (the proxy-per-page approach mentioned above).

I hope this solution helps others in a similar situation. Please see the example code below for implementation details.

proxies.json

[
    "xxx.xxx.xxx.xxx:xxxxx",
    "xxx.xxx.xxx.xxx:xxxxx",
    "xxx.xxx.xxx.xxx:xxxxx",
    "xxx.xxx.xxx.xxx:xxxxx",
    "xxx.xxx.xxx.xxx:xxxxx",
    "xxx.xxx.xxx.xxx:xxxxx"
]

index.ts

import { Cluster } from 'puppeteer-cluster';
import ProxyList from '../proxies.json';
import useProxy from 'puppeteer-page-proxy';

(async () => {

    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 2,
        monitor: true,        
        puppeteerOptions: {
            headless: false
        }
    });

    await cluster.task(async ({page, data: url}) => {

        await useProxy(page, `direct://${getProxy()}`);

        await page.goto(url);
    });

    cluster.queue('https://ipinfo.io');
    cluster.queue('https://ipinfo.io');

    await cluster.idle();
    await cluster.close();
})();

function getProxy() {
    return ProxyList[Math.floor(Math.random() * ProxyList.length)];
}

The end result can be seen in the screenshot:
image
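
getProxy above picks a proxy at random, so two consecutive jobs can end up on the same proxy. If an even spread matters, a round-robin picker is one alternative. This is only a sketch: createProxyRotator is a hypothetical helper and the addresses are placeholders.

```javascript
// Hypothetical helper: returns a function that cycles through the proxy list in order.
function createProxyRotator(proxies) {
  if (!Array.isArray(proxies) || proxies.length === 0) {
    throw new Error('proxy list must be a non-empty array');
  }
  let next = 0;
  return () => {
    const proxy = proxies[next];
    next = (next + 1) % proxies.length; // wrap around at the end of the list
    return proxy;
  };
}

const nextProxy = createProxyRotator(['10.0.0.1:8080', '10.0.0.2:8080']); // placeholder addresses
console.log(nextProxy(), nextProxy(), nextProxy());
```

In the task above, nextProxy() could stand in for getProxy() when building the proxy string passed to useProxy.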

joone commented Mar 15, 2024

I have implemented proxy support in my forked version, available at: https://github.com/joone/headless-cluster.
