Doesn't work with multiple tabs #13

Closed
marcusdiy opened this issue Feb 20, 2024 · 18 comments
Labels: question (Further information is requested); Solved (This problem has been solved.)

Comments

@marcusdiy

I was wondering what is different about this project. I get that it helps to bootstrap the real browser and connect to it.
But what I don't get is why you launch two browsers: first one that is hidden, and then the one that is actually used.
Wouldn't launching a single browser directly and connecting to it have the same effect with much less overhead?

@zfcsoftware
Owner

Hello. The second recommended usage exists for tasks such as installing Chrome extensions. When Chromium is started with the extension-loading command passed in its flags, the extension is not installed. This applies only to the first (default) usage: a Chromium instance is launched and Puppeteer attaches to it with .connect, which does not allow loading Chrome extensions. Those who do not need to install extensions may prefer the default usage. For installing Chrome extensions or specifying args, the second usage is needed.

@zfcsoftware zfcsoftware added the question Further information is requested label Feb 20, 2024
@marcusdiy
Author

marcusdiy commented Feb 20, 2024

Your examples don't mention extensions, yet they seem to be the main point now? There is no example either...
Btw, doesn't the --load-extension= argument allow installing extensions? Sorry, I'm no expert, I just want to understand.
Maybe there is some caveat that I'm missing?

@zfcsoftware
Owner

> Your examples don't mention extensions, yet they seem to be the main point now? There is no example either... Btw, doesn't the --load-extension= argument allow installing extensions? Sorry, I'm no expert, I just want to understand. Maybe there is some caveat that I'm missing?

The primary browser is not launched with Puppeteer. We launch Chromium directly. I tried for a few hours to install an extension on Chromium via the command line, but I could not get it to work and did not have time to investigate more deeply. Therefore, I presented two usages. The first one starts Chromium and connects with puppeteer.connect, returning the browser and page variables. Only one browser opens with this method. However, a Chrome extension cannot be loaded in this case.
The other one starts a Chromium and returns the connection port. The rest is up to the user: whatever you would do after starting Puppeteer normally with .launch can be done here as well.
https://stackoverflow.com/questions/67049065/puppeteer-unable-to-load-chrome-extension-in-browser
Instructions for loading extensions are not given for the second usage because it works exactly like plain Puppeteer. You can load the extension as described in the link above. However, two browsers run in total in this case.
Since the package has not received much attention (few stars), I cannot dedicate much time to development. In the future, I plan to solve problems such as extension loading and to build a structure where everything runs in a single browser instance.
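
The second usage described above, where only a connection port is handed back, can be sketched as follows. This is a minimal illustration, not the library's code: it assumes the browser was started with a remote debugging port and uses Chromium's `/json/version` DevTools endpoint to find the WebSocket URL. The function names, the host, and port 9222 are assumptions made for the example.

```javascript
// Sketch: resolve the WebSocket endpoint of a Chromium that was started
// with --remote-debugging-port=<port>. Host and port are assumptions.

function devtoolsVersionUrl(port, host = '127.0.0.1') {
  return `http://${host}:${port}/json/version`;
}

// fetchImpl is injectable so the function can be exercised without a browser.
// The default relies on the global fetch available in Node 18+.
async function resolveWsEndpoint(port, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(devtoolsVersionUrl(port));
  if (!res.ok) {
    throw new Error(`DevTools endpoint answered with HTTP ${res.status}`);
  }
  const info = await res.json();
  if (!info.webSocketDebuggerUrl) {
    throw new Error('No webSocketDebuggerUrl in /json/version response');
  }
  return info.webSocketDebuggerUrl;
}

// Usage (needs a running browser and the puppeteer package):
//   const ws = await resolveWsEndpoint(9222);
//   const browser = await puppeteer.connect({ browserWSEndpoint: ws });
```

Once the WebSocket URL is known, everything after `puppeteer.connect` is ordinary Puppeteer code.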

@marcusdiy
Author

Hmm, strange. Did you try passing the path of the extension folder or the .crx? The latter seems to be more problematic.
I can launch Chrome via Node's spawn, providing the extension folder path, and it works with headless too.
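
A rough sketch of that approach, assuming an unpacked extension folder. The helper names, the default port, and the `chromePath` argument are placeholders invented for this example, and flag behavior may differ between Chrome and Chromium versions.

```javascript
// Sketch: build Chromium flags for loading an unpacked extension folder.
// All paths and the default port are placeholders.
function buildChromeArgs({ extensionDir, port = 9222, userDataDir }) {
  const args = [
    `--remote-debugging-port=${port}`,
    `--load-extension=${extensionDir}`,
    // Keep every other installed extension disabled.
    `--disable-extensions-except=${extensionDir}`,
  ];
  if (userDataDir) args.push(`--user-data-dir=${userDataDir}`);
  return args;
}

// Starts the browser as a child process and returns the ChildProcess.
// chromePath must point at a real Chrome/Chromium binary on your machine.
async function launchChrome(chromePath, options) {
  // Dynamic import works from both CommonJS and ES module files.
  const { spawn } = await import('node:child_process');
  return spawn(chromePath, buildChromeArgs(options), { stdio: 'ignore' });
}

// Usage:
//   const child = await launchChrome('/path/to/chrome', { extensionDir: '/path/to/extension' });
```

Puppeteer can then attach to the spawned browser through the debugging port instead of launching its own.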

@zfcsoftware
Owner

> Hmm, strange. Did you try passing the path of the extension folder or the .crx? The latter seems to be more problematic. I can launch Chrome via Node's spawn, providing the extension folder path, and it works with headless too.

I tried different folder layouts and .crx files. However, at that time the library was starting the browser with the chromium library. Now I am starting it with @sparticuz/chromium, so maybe extensions can be loaded with this browser.
If you can share your code and how you use it, ideally with a video, I can update the library as soon as possible.

@marcusdiy
Author

marcusdiy commented Feb 21, 2024

It goes something like this
[image: puppeteer-load-extention]

@zfcsoftware
Owner

zfcsoftware commented Feb 21, 2024

> It goes something like this [image: puppeteer-load-extention]

Thank you for your contributions. I will update the library soon.

@marcusdiy
Author

marcusdiy commented Feb 21, 2024

You are welcome. Using two browsers just looked a bit strange to me... maybe there are other advantages that I'm missing. Anyway, .crx installation can probably be automated too, because it's just a zip in the end.
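
To illustrate the "just a zip" remark: a .crx file is a ZIP archive preceded by a small header, so the archive can be recovered by skipping that header. The sketch below handles the CRX2 and CRX3 layouts. The function names are made up for this example, and no signature verification is attempted.

```javascript
// Sketch: locate the ZIP payload inside a .crx buffer.
function crxZipOffset(buf) {
  if (buf.length < 12 || buf.toString('latin1', 0, 4) !== 'Cr24') {
    throw new Error('Not a CRX file');
  }
  const version = buf.readUInt32LE(4);
  if (version === 3) {
    // magic (4) + version (4) + header length (4) + header bytes
    return 12 + buf.readUInt32LE(8);
  }
  if (version === 2) {
    // magic (4) + version (4) + key length (4) + signature length (4)
    // + public key bytes + signature bytes
    if (buf.length < 16) throw new Error('Truncated CRX2 header');
    return 16 + buf.readUInt32LE(8) + buf.readUInt32LE(12);
  }
  throw new Error(`Unsupported CRX version ${version}`);
}

// Returns the ZIP portion of the file as a Buffer view (no copy).
function crxToZip(buf) {
  const zip = buf.subarray(crxZipOffset(buf));
  // A non-empty ZIP archive starts with the bytes "PK\x03\x04".
  if (zip.length < 4 || zip.readUInt32LE(0) !== 0x04034b50) {
    throw new Error('ZIP signature not found after CRX header');
  }
  return zip;
}
```

The resulting buffer can be written to disk and extracted with any ZIP tool, and the extracted folder passed to `--load-extension`.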

@marcusdiy
Author

marcusdiy commented Feb 21, 2024

Btw, there is something wrong with my approach, as it seems to fail the Cloudflare bypass... whereas your method works.
Maybe I'm missing some flag? Will double check.

@marcusdiy
Author

marcusdiy commented Feb 21, 2024

In the end it seems to boil down to this. I still can't figure out why and how it works though 😂
Why did you add that check? Thanks
[image]

@marcusdiy
Author

marcusdiy commented Feb 21, 2024

Got a guess. Using your script, if I call await browser.newPage() it won't work. But why?
Maybe it's because the new page is being filtered out by targetFilter, so it's kind of disconnected from Puppeteer?
And that's why it's not being detected by Cloudflare either... so maybe there is a need for a way to connect and disconnect.
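
The guess above can be made concrete with a small sketch of how a `targetFilter` callback behaves. `targetFilter` is an option of `puppeteer.connect`: targets for which the callback returns false are ignored by Puppeteer, so if a new tab's target is rejected, Puppeteer may never hand back a usable page for it. This is not the library's actual filter. The helper names and the URL fragment are hypothetical, and the callback argument differs between Puppeteer versions, which is why both shapes are handled.

```javascript
// Reads the URL from either a Puppeteer Target (url() method) or a raw
// CDP TargetInfo object (url property), depending on the Puppeteer version.
function targetUrl(target) {
  return typeof target.url === 'function' ? target.url() : target.url;
}

// Returns a callback suitable for the targetFilter option: targets whose URL
// contains any of the given fragments are hidden from Puppeteer.
function makeTargetFilter(skipFragments) {
  return (target) => {
    const url = targetUrl(target) || '';
    return !skipFragments.some((fragment) => url.includes(fragment));
  };
}

// Usage (needs puppeteer and a running browser; the fragment is an assumption):
//   const browser = await puppeteer.connect({
//     browserWSEndpoint: ws,
//     targetFilter: makeTargetFilter(['challenges.cloudflare.com']),
//   });
```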

@zfcsoftware zfcsoftware added the Solved This problem has been solved. label Feb 25, 2024
@zfcsoftware
Owner

> Got a guess. Using your script, if I call await browser.newPage() it won't work. But why? Maybe it's because the new page is being filtered out by targetFilter, so it's kind of disconnected from Puppeteer? And that's why it's not being detected by Cloudflare either... so maybe there is a need for a way to connect and disconnect.

targetFilter prevents this. I will definitely try to find a solution. I have updated the library. Thank you for your contributions.

@marcusdiy
Author

marcusdiy commented Feb 27, 2024

Yes, thanks for the update. Here is a test case.

import { connect } from 'puppeteer-real-browser'
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth'

puppeteer.use(StealthPlugin())
function sleep(ms) { return new Promise(res => setTimeout(res, ms)) }

connect({
    headless: 'auto', args: [], customConfig: {}, skipTarget: [],
    fingerprint: true, turnstile: true, connectOption: {},
    // proxy:{ host:'',  port:'', username:'', password:'' }
}).then(async response => {
    const { browser, page } = response
    await page.goto('https://nopecha.com/demo/cloudflare');
    console.log('Now a new tab should appear');
    let page2 = await browser.newPage();
    console.log('...and it should goto cloudflare test page, but it won\'t');
    await page2.goto('https://nopecha.com/demo/cloudflare');
}).catch(error => {
    console.log(error.message)
})

@marcusdiy marcusdiy reopened this Feb 27, 2024
@marcusdiy marcusdiy changed the title from "Question: whats the point?" to "Doesn't work with multiple tabs" Feb 27, 2024
@zfcsoftware
Owner

Yes, thanks for the update. Here a test case.

import { connect } from 'puppeteer-real-browser'
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth'

puppeteer.use(StealthPlugin())
function sleep(ms) { return new Promise(res => setTimeout(res, ms)) }

connect({
    headless: 'auto', args: [], customConfig: {}, skipTarget: [],
    fingerprint: true, turnstile: true, connectOption: {},
    // proxy:{ host:'',  port:'', username:'', password:'' }
}).then(async response => {
    const { browser, page } = response
    await page.goto('https://nopecha.com/demo/cloudflare');
    await page.evaluate(() => { document.body.style.background = '#d9ffe7' });
    console.log('Now a new tab should appear');
    let page2 = await browser.newPage();
    console.log('...and it should goto cloudflare test page, but it won\'t');
    await page2.goto('https://nopecha.com/demo/cloudflare');
}).catch(error => {
    console.log(error.message)
})

I'll post an update soon

@marcusdiy
Author

marcusdiy commented Feb 27, 2024

Maybe a solution would be to disconnect Puppeteer when the URL matches certain patterns and connect again later, after making sure it's not a protected page... Btw, the problem might still bite back if they check for the framework's presence on the following pages.
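
A minimal sketch of that disconnect-and-reconnect idea, under stated assumptions: the pattern list, the wait time, and both function names are hypothetical, and the `connect` argument stands for whatever re-creates the connection. `browser.wsEndpoint()` and `browser.disconnect()` are regular Puppeteer calls. Whether this actually avoids detection is untested here.

```javascript
// Hypothetical list of URL patterns treated as "protected".
const PROTECTED = [/\/demo\/cloudflare/, /challenges\.cloudflare\.com/];

function isProtectedUrl(url, patterns = PROTECTED) {
  return patterns.some((re) => re.test(url));
}

// Detaches from the browser while a protected page settles, then reattaches.
// `connect` re-creates the connection, for example:
//   (ws) => puppeteer.connect({ browserWSEndpoint: ws })
async function detachWhileProtected(browser, url, connect, waitMs = 5000) {
  if (!isProtectedUrl(url)) return browser;
  const ws = browser.wsEndpoint();
  // disconnect() drops the automation connection but leaves the browser running.
  await browser.disconnect();
  await new Promise((resolve) => setTimeout(resolve, waitMs));
  return connect(ws);
}
```

The caller would continue with the returned browser object, since the old one is no longer connected after a detach.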

@zfcsoftware
Owner

let page2 = await browser.newPage();
console.log('...and it should goto cloudflare test page, but it won\'t');
await page2.goto('https://nopecha.com/demo/cloudflare');

I have released an update, could you please try and use it as it is in the readme file?

@marcusdiy
Author

Sure thing

@marcusdiy
Author

Works, great success 👍
[image]
