Implementing Ajax functionality #97

Open
FabianL1 opened this issue Nov 13, 2015 · 5 comments

Comments

@FabianL1

Hi there,
I would like to implement Ajax functionality via Selenium + PhantomJS, but I really need a starting point: where is the actual content fetched, and how does crawler4j extract links?

If you help me with that, I will implement the Ajax feature over the next few days. You can also contact me at flurz123@gmail.com.

Thanks in advance
Best
Fabian

@wlqpku

wlqpku commented Nov 16, 2015

I have used PhantomJS + CasperJS, but I have not implemented it in crawler4j yet. I think adding this extension is a good choice. Should we create a new project for PhantomJS + CasperJS for Java?

@rzo1
Contributor

rzo1 commented Nov 20, 2015

Link extraction is done here, with the help of a parser class (which can also be found in the source):

Line 316ff: https://github.com/yasserg/crawler4j/blob/master/src/main/java/edu/uci/ics/crawler4j/crawler/WebCrawler.java

Fetching is done here:
https://github.com/yasserg/crawler4j/blob/master/src/main/java/edu/uci/ics/crawler4j/fetcher/PageFetcher.java
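To illustrate the fetch/extract split described above, here is a minimal stand-alone sketch. It is **not** crawler4j's code. It uses only the JDK (`java.net.http`, Java 11+) and a simple regex instead of crawler4j's parser class. The names `FetchAndExtract`, `fetch`, and `extractLinks` are hypothetical. The sketch also shows why Ajax content is missed: the fetch step only sees the HTML the server sends, not what JavaScript adds afterwards in a browser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FetchAndExtract {

    // Matches href="..." or href='...' (case-insensitive). A regex is a simplification;
    // a real HTML parser handles many more cases.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Step 1 (fetching): returns only the raw HTML the server sends.
    // Content that JavaScript/Ajax would add later in a browser is NOT included.
    public static String fetch(HttpClient client, String url)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Step 2 (link extraction): collects hrefs, resolves relative ones against the
    // page URL, drops fragment-only links and strips fragments.
    public static Set<String> extractLinks(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            if (href.isEmpty() || href.startsWith("#")) {
                continue; // same-page anchor, nothing new to crawl
            }
            try {
                String resolved = base.resolve(href).toString();
                int hash = resolved.indexOf('#');
                links.add(hash >= 0 ? resolved.substring(0, hash) : resolved);
            } catch (IllegalArgumentException e) {
                // skip malformed hrefs
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/about\">About</a> <a href='https://example.org/x'>X</a>";
        System.out.println(extractLinks("https://example.com/index.html", html));
    }
}
```

Running `main` prints `[https://example.com/about, https://example.org/x]`. Because `extractLinks` works on a string, any source of HTML could feed it. That includes a headless browser that has already executed a page's scripts.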

@wlqpku

wlqpku commented Feb 6, 2016

Recently, many sites have started loading their pages in a Node.js style. To crawl such pages, we cannot simply use wget or curl; we need to render the page. Can crawler4j do that?
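One possible way to add rendering without slowing down every request is to try the cheap static fetch first. The crawler would then fall back to a rendering engine only when the page looks script-driven. This is a hypothetical design sketch, not something crawler4j provides. `PageSource`, `needsRendering`, `withFallback`, and the 50-character threshold are all assumptions. The renderer itself is left abstract; it could wrap Selenium + PhantomJS, CasperJS, or HtmlUnit, the tools mentioned in this thread.

```java
import java.util.Locale;

public class RenderingHandOff {

    /** Hypothetical abstraction: anything that turns a URL into HTML. */
    public interface PageSource {
        String load(String url) throws Exception;
    }

    /**
     * Rough heuristic (an assumption, not crawler4j behaviour): a page probably
     * needs JavaScript rendering if it ships scripts but has almost no visible text.
     */
    public static boolean needsRendering(String html) {
        String lower = html.toLowerCase(Locale.ROOT);
        boolean hasScripts = lower.contains("<script");
        // Remove script blocks, then all tags, then collapse whitespace.
        String text = lower.replaceAll("(?s)<script.*?</script>", " ")
                           .replaceAll("(?s)<[^>]*>", " ")
                           .replaceAll("\\s+", " ")
                           .trim();
        return hasScripts && text.length() < 50; // arbitrary illustrative threshold
    }

    /** Uses the static source first and only calls the (hypothetical) renderer when needed. */
    public static PageSource withFallback(PageSource staticSource, PageSource renderer) {
        return url -> {
            String html = staticSource.load(url);
            return needsRendering(html) ? renderer.load(url) : html;
        };
    }
}
```

The design choice here is cost: a headless browser is far slower than a plain HTTP request, so it is only invoked for pages whose static HTML is essentially an empty shell plus scripts.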

@edgarjoao

PhantomJS doesn't support NTLM authentication yet 👎 (ariya/phantomjs#11037).

@edgarjoao

To accomplish this, you can use HtmlUnit for Ajax processing.

4 participants