php-curl-scraper

A simple web scraper written in PHP with cURL. It crawls a website and extracts information from each page, and can be run from the CLI or through a web request.

This is essentially an MVP/starting point that can be adapted to a specific objective with a bit of tweaking and expansion.

Usage

Generic

url: URL to start scraping from; http:// is assumed if no scheme is provided
limit: maximum number of pages to scrape before stopping
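
The scheme defaulting described for url could look roughly like the sketch below. The function name normalizeStartUrl is hypothetical and is not taken from this project's code; the real logic may differ.

```php
<?php
// Hypothetical helper (not part of this repo): prepend http:// when the
// given URL has no scheme, mirroring the behaviour described for "url".
function normalizeStartUrl(string $url): string
{
    $url = trim($url);

    // Keep the URL as-is if it already starts with a scheme such as
    // http:// or https://.
    if (preg_match('#^[a-z][a-z0-9+.\-]*://#i', $url) === 1) {
        return $url;
    }

    return 'http://' . $url;
}

echo normalizeStartUrl('example.com'), PHP_EOL;        // http://example.com
echo normalizeStartUrl('https://secure.com'), PHP_EOL; // https://secure.com
```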

CLI

php scrape.php url pageLimit [json: true|false]
php scrape.php http://insecure.com 5
php scrape.php https://secure.com 5 true

Outputs via print_r unless the third argument is true, in which case it prints JSON.
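
As an illustration of the two output modes, here is a minimal sketch. The helpers wantsJson and formatResults, the exact comparison against the string "true", and the shape of the results array are all assumptions made for the example, not the project's actual code.

```php
<?php
// Hypothetical helper: decide the output mode from CLI arguments.
// Assumes the third argument is compared against the literal string "true".
function wantsJson(array $argv): bool
{
    return isset($argv[3]) && strtolower($argv[3]) === 'true';
}

// Hypothetical helper: render results as JSON or as print_r text.
function formatResults(array $results, bool $asJson): string
{
    if ($asJson) {
        return json_encode($results, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
    }

    // Passing true as the second argument makes print_r return the string
    // instead of printing it directly.
    return print_r($results, true);
}

// Made-up result data, purely for demonstration.
$results = ['http://example.com/' => ['title' => 'Example']];

echo formatResults($results, wantsJson(['scrape.php', 'example.com', '5', 'true'])), PHP_EOL;
echo formatResults($results, wantsJson(['scrape.php', 'example.com', '5'])), PHP_EOL;
```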

Web

.../scrape.php?url=example.com&limit=5

Outputs JSON.

Caveats

Link parsing ignores query parameters and fragments (hashes)
"/" and "/index.ext" are currently treated as two different pages
Error handling is minimal
The Page class can be expanded to extract additional info from scraped pages, which can then be accessed via the ScrapeController
You can set a specific user agent string in CurlUtil
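
To make the first two caveats concrete, the sketch below strips the query string and fragment from a link using PHP's built-in parse_url. The function stripQueryAndFragment is hypothetical and only approximates the behaviour described above; the project's own link parser may work differently.

```php
<?php
// Hypothetical helper (not from this repo): drop the query string and
// fragment from a link, keeping scheme, host, port and path.
// User info (user:pass@) is deliberately ignored in this sketch.
function stripQueryAndFragment(string $link): string
{
    $parts = parse_url($link);
    if ($parts === false) {
        return $link; // leave seriously malformed URLs untouched
    }

    $result = '';
    if (isset($parts['scheme'])) {
        $result .= $parts['scheme'] . '://';
    }
    if (isset($parts['host'])) {
        $result .= $parts['host'];
    }
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }

    // An absolute URL without a path is treated as "/".
    $result .= $parts['path'] ?? (isset($parts['host']) ? '/' : '');

    return $result;
}

echo stripQueryAndFragment('http://example.com/page.html?x=1#top'), PHP_EOL; // http://example.com/page.html
echo stripQueryAndFragment('/about?ref=nav'), PHP_EOL;                       // /about
echo stripQueryAndFragment('http://example.com'), PHP_EOL;                   // http://example.com/
echo stripQueryAndFragment('http://example.com/index.html'), PHP_EOL;        // http://example.com/index.html
```

With this kind of normalization, pages that differ only by query string collapse into one entry, while the last two results remain distinct strings, which is why "/" and "/index.ext" count as separate pages.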

Testing

The test/ folder contains a set of basic pages to crawl.

License

ISC
