Crawl nested html documents loaded dynamically #185

Closed
ab14p opened this issue Mar 29, 2018 · 5 comments

@ab14p

ab14p commented Mar 29, 2018

What is the current behavior?

#150

With robots.txt set to false

The crawler waits using waitFor with a timeout of 10 seconds, but it is still not able to crawl nested documents. For example, I want to extract the src of an <iframe> that contains an ad (display advertisement).

I enabled the screenshot option to check whether the ad iframe had loaded before the evaluatePage function was executed. I can see the ad in the screenshot, but the function does not return the src of the <iframe>.
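
For reference, the extraction described above could be sketched roughly as follows. This is not the reporter's code: the name collectFrameSources is hypothetical, and the sketch assumes plain DOM access inside the page. It reads the src of every <iframe> and descends into nested frame documents where the page is allowed to read them; cross-origin frames usually return null or throw on contentDocument, so their inner documents are skipped.

```javascript
// Hypothetical helper (not from the issue): collect the src of every <iframe>,
// descending into nested frame documents that the page is allowed to read.
// `doc` defaults to the page's own document when called with no arguments.
function collectFrameSources(doc = document) {
  const sources = [];
  for (const frame of Array.from(doc.querySelectorAll('iframe'))) {
    const src = frame.getAttribute('src');
    if (src) sources.push(src);

    let inner = null;
    try {
      // Cross-origin frames return null or throw here.
      inner = frame.contentDocument;
    } catch (err) {
      inner = null;
    }
    // Recurse only into frame documents that are actually readable.
    if (inner) sources.push(...collectFrameSources(inner));
  }
  return sources;
}
```

Assuming the crawler evaluates evaluatePage in the page without arguments, this could be passed as evaluatePage: collectFrameSources. Frames injected after the function runs would still be missed, so it only helps once the ad has actually loaded.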

What is the expected behavior?

Be able to crawl nested HTML documents, such as ads, that are loaded dynamically by JavaScript.

Can you please provide an example or solution for this?

Please tell us about your environment:

  • Version: 1.5.0
  • Platform / OS version:
  • Node.js version: 6.11.2

@BubuAnabelas

Maybe you could use the waitUntil option in the crawler.queue() method in combination with waitFor, or pass a function to waitFor that checks whether the ad was loaded.
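
A minimal sketch of that suggestion, under some assumptions: adFrameLoaded and buildQueueOptions are hypothetical names, the 'iframe' selector is a placeholder for whatever matches the ad, and the options and args keys of waitFor are assumed to mirror Puppeteer's page.waitFor(selectorOrFunctionOrTimeout, options, ...args).

```javascript
// Predicate meant to run inside the page: true once at least one element
// matching the selector has a non-empty src attribute.
function adFrameLoaded(selector) {
  return Array.from(document.querySelectorAll(selector))
    .some((frame) => Boolean(frame.getAttribute('src')));
}

// Hypothetical helper that builds the options object for crawler.queue().
function buildQueueOptions(url, selector = 'iframe') {
  return {
    url,
    // Wait until there are no network connections left before evaluating.
    waitUntil: 'networkidle0',
    waitFor: {
      selectorOrFunctionOrTimeout: adFrameLoaded,
      options: { timeout: 20000 }, // assumed key, give up after 20 seconds
      args: [selector],            // assumed key, forwarded to the predicate
    },
  };
}
```

With a launched crawler this would be used as crawler.queue(buildQueueOptions('https://example.com/')). Unlike a fixed timeout, the predicate lets the crawl continue as soon as the frame has a src.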

@yujiosaka
Owner

@ab14p
Isn't it because it was blocked by the cross-origin policy?
Please try passing the --disable-web-security argument. See https://github.com/yujiosaka/headless-chrome-crawler#launch-options for more details.
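
As a rough illustration of passing that argument, the sketch below builds launch options that forward the flag to Chromium. buildLaunchOptions and crawl are hypothetical names, and the crawler module is injected as a parameter so the sketch carries no hard dependency. The evaluatePage body here is only an example and reads the top-level document.

```javascript
// Hypothetical helper: builds launch options that forward Chromium flags.
function buildLaunchOptions(extraArgs = []) {
  return {
    // Disables the same-origin policy in Chromium; only use on trusted pages.
    args: ['--disable-web-security', ...extraArgs],
    // Runs in the page and returns the src of each <iframe> it can see.
    evaluatePage: () => Array.from(document.querySelectorAll('iframe'), (f) => f.src),
    // result.result holds whatever evaluatePage returned.
    onSuccess: (result) => console.log(result.result),
  };
}

// HCCrawler is expected to be require('headless-chrome-crawler').
async function crawl(HCCrawler, url) {
  const crawler = await HCCrawler.launch(buildLaunchOptions());
  await crawler.queue(url);
  await crawler.onIdle(); // resolves when the queue is empty
  await crawler.close();
}
```

Usage would be crawl(require('headless-chrome-crawler'), 'https://example.com/'). Whether the flag changes anything depends on whether cross-origin access was the actual cause.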

@ab14p
Author

ab14p commented Apr 2, 2018

@yujiosaka @BubuAnabelas
I used waitUntil: 'networkidle0' and waitFor: { selectorOrFunctionOrTimeout: 20000 }.
I also tried disabling the cross-origin policy through the launch options as shown below, but still no luck.
args: ['--disable-web-security']

@yujiosaka
Owner

@ab14p
I still have not encountered the same situation.
Can you provide the code?

@yujiosaka
Owner

Closing this issue because no further information has been provided for a month.
Please feel free to reopen it if you have further questions.
