Web Crawler

Description

This is a work in progress and is not yet complete.

Web Crawler is an open source library that lets users crawl a collection of webpages and run customized analyzers on each page.

Installation

Add the library to your PHP project using Composer:

composer require travy/web-crawler

Use Case

The Crawler automatically collects every URL listed in an HTML anchor tag on the page at the root URL. Each page that is visited is run through a collection of Analyzers. Analyzers can perform whatever tasks the application needs, such as pruning the markup in order to build a search engine, or almost any other analysis of the page.
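To make the link-gathering step concrete, here is a minimal sketch of pulling href values out of anchor tags using PHP's built-in DOMDocument. The function name extractAnchorUrls is hypothetical and is not part of this library; the Crawler's own implementation may work differently.

```php
<?php
declare(strict_types=1);

/**
 * Return the unique href values of all anchor tags found in $html.
 * Illustrative helper only (assumed name); not part of travy/web-crawler.
 */
function extractAnchorUrls(string $html): array
{
    // DOMDocument::loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world markup is often malformed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        // Skip anchors without a target and in-page fragment links.
        if ($href === '' || $href[0] === '#') {
            continue;
        }
        $urls[] = $href;
    }

    // De-duplicate while keeping first-seen order.
    return array_values(array_unique($urls));
}
```

Relative links such as /about are returned unchanged in this sketch; a crawler would still need to resolve them against the root URL before visiting them.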

Custom Analyzer

Analyzers can be created by extending the AbstractAnalyzer class:

class MyAnalyzer extends AbstractAnalyzer
{
    public function analyze($url, $html, Dom $parser)
    {
        // Perform tasks on the visited page, e.g. inspect $html or query $parser
    }
}
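As an illustration of what the body of analyze() might do for the search-engine use case, the sketch below reduces a page's markup to a list of lowercase words. The helper name extractWords is hypothetical and uses only PHP built-ins; it is not provided by this library.

```php
<?php
declare(strict_types=1);

/**
 * Reduce an HTML document to a list of lowercase words, e.g. for a search index.
 * Hypothetical helper (assumed name); not part of travy/web-crawler.
 */
function extractWords(string $html): array
{
    // Drop script and style blocks so their contents are not indexed.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;

    // Put a space before each tag so adjacent elements do not merge words,
    // then strip the tags and decode entities such as &amp;.
    $text = strip_tags(str_replace('<', ' <', $html));
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Lowercase (ASCII letters only) and split on anything that is not a letter or digit.
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on invalid UTF-8 input.
    return $words === false ? [] : $words;
}
```

Inside an analyzer, this could be called from analyze() as $words = extractWords($html); before storing the words wherever the application needs them.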

Analyzer Registry

The AnalyzerRegistry holds the list of all Analyzers that should be used during a crawl. Each analyzer is registered under a unique key so that individual entries can be manipulated later if needed.

$analyzer = new MyAnalyzer();

$analyzerRegistry = new AnalyzerRegistry();
$analyzerRegistry->register($analyzer, 'add-to-database');

$crawler = new Crawler('https://google.com', $analyzerRegistry);
$crawler->crawl();
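To show why registering each analyzer under a unique key is useful, here is a minimal sketch of a keyed registry. The class KeyedRegistry and its methods add, get and all are assumptions made for illustration; this is not the library's AnalyzerRegistry, whose internals are not described here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical keyed registry, for illustration only.
 * Not the library's AnalyzerRegistry; class and method names are assumptions.
 */
final class KeyedRegistry
{
    /** @var array<string, object> items indexed by their unique key */
    private array $items = [];

    /** Store $item under $key, refusing to overwrite an existing entry. */
    public function add(object $item, string $key): void
    {
        if (isset($this->items[$key])) {
            throw new InvalidArgumentException("Key '$key' is already registered.");
        }
        $this->items[$key] = $item;
    }

    /** Look up a single item by key so it can be inspected or adjusted. */
    public function get(string $key): object
    {
        if (!isset($this->items[$key])) {
            throw new OutOfBoundsException("No item registered under '$key'.");
        }
        return $this->items[$key];
    }

    /** Return every registered item, in registration order. */
    public function all(): array
    {
        return $this->items;
    }
}
```

With a structure like this, a caller can fetch one entry by its key (for example 'add-to-database') and change its settings without touching the others, while a crawl loop simply iterates over all().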
