Is there a way to see which page was the parent of a URL we are crawling? #18

huglester · 2015-04-04T06:58:57Z

Hello,

For example, I do:
$urls = (new Centipede\Crawler('domain.com'))->crawl();

and then use Guzzle to GET all the pages and check their statuses, etc.

But suppose I see that the URL '/en/about-us' is not accessible (it gives a 404). Is there a way to check on which parent URL this '/en/about-us' link was found? The link could be inside a single article, for instance, so it would be hard to hunt down.

Thank you!

umpirsky · 2015-06-01T17:01:20Z

@huglester No, but I guess you can override Centipede\Crawler and build a tree of URLs.
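As a rough illustration of the "tree of URLs" idea, here is a minimal sketch of a breadth-first crawl that remembers which pages link to each discovered URL. It does not use or override Centipede\Crawler, whose internals are not shown in this thread. The function `crawlWithParents` and the `$extractLinks` callable are hypothetical names introduced for the example; in practice `$extractLinks` would fetch the page and return the links found on it.

```php
<?php

/**
 * Breadth-first crawl that records, for every discovered URL,
 * the pages on which a link to it was found.
 *
 * $extractLinks is a hypothetical helper (an assumption, not part of
 * Centipede): it takes a URL and returns the list of links on that page.
 *
 * Returns a map of URL => list of parent URLs.
 */
function crawlWithParents(string $startUrl, callable $extractLinks, int $maxDepth = 2): array
{
    $parents = [$startUrl => []]; // the start page has no parent yet
    $queue = [[$startUrl, 0]];    // pairs of [url, depth]

    while ($queue) {
        [$url, $depth] = array_shift($queue);
        if ($depth >= $maxDepth) {
            continue; // do not follow links beyond the depth limit
        }
        foreach ($extractLinks($url) as $link) {
            if (!isset($parents[$link])) {
                // First time we see this URL: schedule it for crawling.
                $parents[$link] = [];
                $queue[] = [$link, $depth + 1];
            }
            if (!in_array($url, $parents[$link], true)) {
                $parents[$link][] = $url; // remember where the link was found
            }
        }
    }

    return $parents;
}

// Usage with a fake in-memory site standing in for real HTTP requests.
$site = [
    '/' => ['/en/blog', '/en/contact'],
    '/en/blog' => ['/en/blog/article-1', '/'],
    '/en/blog/article-1' => ['/en/about-us'],
];
$parents = crawlWithParents('/', fn (string $url): array => $site[$url] ?? [], 3);

// Pages that link to the URL that later turns out to be a 404:
print_r($parents['/en/about-us']);
```

With a map like this, a URL that returns a 404 during the status check can be looked up directly to find the pages that need fixing.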

umpirsky closed this as completed Jun 1, 2015
