Scrapers

Some tools to retrieve text or files from remote Web pages.

Grabit.pl

My first Web scraper. It expects one argument: the name of a file containing a newline-delimited list of URLs. When invoked, it opens an interactive prompt that asks which type of file to download, then downloads all files of that type linked from each of the listed Web pages.

Here are the instructions for using it:

1. Put the URLs of all the pages you want to scrape into a text file named FOO, one per line.
2. Run perl grabit.pl FOO
3. When prompted, choose which type of file you want to grab.
4. Enjoy!
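The workflow described above (read a list of URLs, ask for a file type, download every matching linked file) can be sketched roughly as follows. This is not the code of grabit.pl; it is a minimal illustration assuming the core HTTP::Tiny module for fetching and a simple regex over href attributes in place of a real HTML parser. Only absolute http(s) links are followed, and the sub names read_urls, links_of_type and main are hypothetical.

```perl
#!/usr/bin/perl
# Minimal sketch of the grabit.pl workflow; NOT the original script.
# Assumptions (hypothetical): links are found with a simple href regex
# and only absolute http(s) links are downloaded.
use strict;
use warnings;
use HTTP::Tiny;                    # core module since Perl 5.14
use File::Basename qw(basename);

# Read a newline-delimited list of URLs, skipping blank lines.
sub read_urls {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my @urls;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;   # trim surrounding whitespace and the newline
        push @urls, $line if length $line;
    }
    close $fh;
    return @urls;
}

# Return absolute href targets in $html whose path ends in ".$ext"
# (case-insensitive), in document order.
sub links_of_type {
    my ($html, $ext) = @_;
    my @links;
    while ($html =~ /href\s*=\s*["']([^"']+)["']/gi) {
        my $link = $1;
        push @links, $link
            if $link =~ m{^https?://}i && $link =~ /\.\Q$ext\E$/i;
    }
    return @links;
}

# Prompt for a file type, then fetch every matching link from every page.
sub main {
    my ($list_file) = @_;
    local $| = 1;                  # flush the prompt before reading STDIN
    print "Which file extension should be downloaded (e.g. pdf)? ";
    chomp(my $ext = <STDIN> // '');
    $ext =~ s/^\s*\.?//;           # drop leading spaces and an optional dot
    $ext =~ s/\s+$//;              # drop trailing spaces
    die "No file extension given\n" unless length $ext;

    my $http = HTTP::Tiny->new;
    for my $page (read_urls($list_file)) {
        my $res = $http->get($page);
        unless ($res->{success}) {
            warn "Skipping $page: $res->{status} $res->{reason}\n";
            next;
        }
        for my $link (links_of_type($res->{content}, $ext)) {
            my $file = basename($link);    # save under the URL's file name
            my $got  = $http->mirror($link, $file);
            warn "Failed to fetch $link: $got->{status}\n" unless $got->{success};
        }
    }
}

# Run only when given an argument, so the subs can be reused or tested.
if (@ARGV) { main($ARGV[0]) }
else       { warn "usage: $0 URL_LIST_FILE\n" }
```

It is invoked the same way as in step 2, with the sketch's own file name in place of grabit.pl, and then given an extension such as pdf at the prompt. Relative links are skipped here for brevity; resolving them against the page URL (for example with the CPAN URI module's new_abs) would be a natural extension.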
