Implementing Ajax functionality #97

FabianL1 · 2015-11-13T16:14:13Z

Hi there,
I would like to implement Ajax functionality via Selenium + PhantomJS, but I really need a starting point: where is the actual content fetched, and how does crawler4j extract links?

If you can help me with that, I will implement the Ajax feature over the next few days. You can also contact me at flurz123@gmail.com.

Thanks in advance
Best
fabian

wlqpku · 2015-11-16T10:01:52Z

I have used PhantomJS + CasperJS, but I have not integrated them with crawler4j yet. I think adding this extension is a good choice. Should we create a new project for PhantomJS + CasperJS for Java?

rzo1 · 2015-11-20T11:35:26Z

Link extraction is done here (with the help of a parser class, which can also be found in the source):

Line 316ff: https://github.com/yasserg/crawler4j/blob/master/src/main/java/edu/uci/ics/crawler4j/crawler/WebCrawler.java

Fetching is done here:
https://github.com/yasserg/crawler4j/blob/master/src/main/java/edu/uci/ics/crawler4j/fetcher/PageFetcher.java
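
To make the extension point concrete, here is a minimal sketch of where a rendering step could sit between fetching and link extraction. It is not crawler4j code: the class `AjaxCrawlSketch`, the methods `extractLinks` and `crawlStep`, and the renderer hook are hypothetical names, the regex is only a stand-in for crawler4j's real parser, and the "renderer" below fakes what a headless browser would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of crawler4j.
final class AjaxCrawlSketch {

    // Simplified stand-in for a real HTML parser: captures href values of <a> tags.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns every href found in the given HTML, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // One crawl step: the fetched HTML passes through a renderer before links
    // are extracted. A real renderer might delegate to a headless browser;
    // UnaryOperator.identity() reproduces the non-rendering behaviour.
    static List<String> crawlStep(String fetchedHtml, UnaryOperator<String> renderer) {
        return extractLinks(renderer.apply(fetchedHtml));
    }

    public static void main(String[] args) {
        String raw = "<a href=\"/static\">s</a><div id=\"app\"></div>";

        // Fake renderer (assumption): simulates a script filling the #app container.
        UnaryOperator<String> fakeRenderer = html -> html.replace(
                "<div id=\"app\"></div>",
                "<div id=\"app\"><a href=\"/loaded-by-ajax\">x</a></div>");

        System.out.println(crawlStep(raw, UnaryOperator.identity())); // [/static]
        System.out.println(crawlStep(raw, fakeRenderer));             // [/static, /loaded-by-ajax]
    }
}
```

The point of the sketch is only that links injected by scripts become visible to the extractor once a rendering step runs first; how that step is wired into `PageFetcher` or `WebCrawler` is left open.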

wlqpku · 2016-02-06T02:48:05Z

Recently, many sites load Node.js-style JavaScript in their pages. To crawl such pages, we cannot simply use wget or curl; we need to render the page. Can crawler4j do that?

edgarjoao · 2016-03-04T20:59:29Z

PhantomJS doesn't support NTLM authentication yet 👎 (see ariya/phantomjs#11037).

edgarjoao · 2016-03-04T21:00:29Z

To accomplish this, you can use HtmlUnit for Ajax processing.
