GrabCartoons
GrabCartoons is a comic-summarizing utility. It is modular, and it is easy to write modules for new comics.
You can see a sample of grabcartoons output here.
Installation
You can download the latest source code for this project in either zip or tar formats. It should run as-is on most modern Perl installations.
You can also clone this git repository:
git clone https://github.com/zzamboni/grabcartoons.git
You can run ./grabcartoons.pl directly from within the source directory, or run make install to install it under /usr/local. You can specify the PREFIX variable if you want to install somewhere else (e.g. make install PREFIX=/some/path).
GrabCartoons works out of the box on Linux/Unix/macOS. Windows is not explicitly supported, but it can be made to work with some changes. See issue #11 for details.
Usage
Basic usage example:
./grabcartoons.pl sinfest xkcd savage_chickens gocomics.com:gasoline > sample-output.html
And then open sample-output.html in your web browser.
Full set of options:
./grabcartoons.pl --help
GrabCartoons version 2.8.4
Usage: ./grabcartoons.pl [ options ] [ comic_id ...]
    --all or -a        generate a page with all the known comics on stdout.
    --list [t:] or -l  produce a list of the known comic_id's on stdout. If t:
                       is given, the list of comics from the given template
                       is produced.
    --htmllist [t:]    produce HTML list of known comic_id's on stdout. If t:
                       is given, the list of comics from the given template
                       is produced.
    --file or -f       read list of comics from specified file.
    --random n         select n comics at random (they will be output after
                       any other comics requested)
    --write or -w      write output to specified file instead of stdout.
    --version or -V    print version number
    --verbose or -v    be verbose
    --help or -h       print this message.
    --notitles or -t   do not show comic titles (for those that have them)
    --templates        produce a list of defined templates
    --genmodules       for any template specifications (template:comictag),
                       write a snippet to comictag.pl in the directory
                       specified by --genout.
    --genout dir       output directory for generated comics.
                       (default: /Users/taazadi1/.grabcartoons/modules)
By default, it will produce a page with the given comics on stdout.
comic_id can be:
  - Any of the predefined modules (e.g. sinfest, adam_at_home)
  - Of the form 'template:comic title', including quotes if the title has
    spaces (e.g. 'gocomics.com:Citizen Dog', comics.com:Frazz). This will
    generate on the fly a module for the given comic.
  - Of the form 'template:*' or 'template:', which means "all the comics
    from the named template". This can also be passed as argument to the
    --list and --htmllist options to produce the listing from the given
    template instead of from the built-in modules.
Available comics
You can see the list of available comics using the --list or --htmllist options.
Here’s the list of comics for which we currently have modules:
- Abstruse Goose (abstrusegoose)
- Achewood (achewood)
- Adam@Home (adam_at_home)
- A Girl And Her Fed (agirlandherfed)
- Alien Loves Predator (alien_loves_predator)
- Applegeeks (applegeeks)
- A Softer World (asofterworld)
- Atland (atland)
- Better Book Titles (betterbooktitles)
- Bloom County 2019 (bloom-county)
- Bloom County (bloom-county-old)
- Buttersafe (buttersafe)
- Calvin and Hobbes (calvin_and_hobbes)
- Camp Weedonwantcha (campcomic)
- Cathy Classics (cathy)
- Chopping Block (choppingblock)
- Cow and Boy (cowandboy)
- Ctrl+Alt+Del (ctrlaltdel)
- Dan’s Daily Cartoon (danscartoons)
- Dick Tracy (dick_tracy)
- Diesel Sweeties (diesel_sweeties)
- Dilbert (dilbert)
- Dinosaur Comics (dinosaur_comics)
- Doonesbury (doonesbury)
- Errant Story (errantstory)
- Extra Ordinary (extraordinary)
- Full Frontal Nerdity (ffn)
- Formal Sweatpants (formalsweatpants)
- FoxTrot (foxtrot)
- Garfield (garfield)
- Get Fuzzy (getfuzzy)
- Glasbergen (glasbergen)
- Goats (goats)
- Goblins (goblins)
- Girls with Slingshots (gws)
- Herman (herman)
- Irregular Webcomic (irregular)
- The Joy of Tech (joy_of_tech)
- Junior Scientist Power Hour (jspowerhour)
- Kevin and Kell (kevin_and_kell)
- The Last Halloween (lasthalloween)
- Liberty Meadows (liberty_meadows)
- Lighter than Heir (lighter_than_heir)
- Little Gamers (little_gamers)
- MacHall (machall)
- MegaTokyo (megatokyo)
- Monty (monty)
- Mother Goose & Grimm (mother_goose)
- Scenes From A Multiverse (multiverse)
- Nedroid (nedroid)
- 9 to 5 (nine_to_five)
- Nodwick (nodwick)
- Non Sequitur (non_sequitur)
- The Oatmeal (oatmeal)
- Off the Mark (offthemark)
- Order of the Stick (oots)
- Pearls Before Swine (pearls)
- Penny Arcade (penny_arcade)
- Piled Higher and Deeper (phd)
- Power Nap (powernap)
- pVp (pvp)
- Questionable Content (questionable_content)
- Real Life Adventures (real_life_adventures)
- Red Meat (redmeat)
- Robot Hugs (robot_hugs)
- Rose is Rose (rose_is_rose)
- Savage Chickens (savage_chickens)
- Schlock Mercenary (schlock_mercenary)
- Sherman’s Lagoon (sherman)
- Shit Happens (shithappens)
- Sinfest (sinfest)
- Skadi (skadi)
- Sluggy Freelance (sluggy_freelance)
- Saturday Morning Breakfast Cereal (smbc)
- Sufficiently Remarkable (sufficiently_remarkable)
- The Trenches (the_trenches)
- The Zombie Hunters (the_zombie_hunters)
- Three Panel Soul (three_panel_soul)
- Toothpaste for Dinner (toothpastefordinner)
- Unshelved (unshelved)
- User Friendly (user_friendly)
- What’s Normal Anyway? (whatsnormalanyway)
- Wondermark (wondermark)
- xkcd (xkcd)
- Zen Pencils (zenpencils)
- Ziggy (ziggy)
Templates
GrabCartoons also includes templates that allow you to fetch any comic from a given site or using a common mechanism. At the moment we have the following templates:
Templates defined:
    arcamax.com        Comics hosted at arcamax.com
    comics.com         Comics hosted at gocomics.com
    comicskingdom.com  Comics hosted at comicskingdom.com
    gocomics.com       Comics hosted at gocomics.com
    og-image           Comics that can be extracted from the og:image property on their page
Templates define a common way of fetching all the comics from certain sites (such as comics.com or comicskingdom.com) that host multiple comic strips, or by using a common mechanism (e.g. sites that publish their latest comic using the og:image property). If a template exists, you can easily define new modules for comics from that site, or even request them on the fly without having to write a module, by specifying the comic_id as template:title.
How to define your own comics:
Modules are defined in files with the .pl extension, which specify where and how to fetch the comic.
Each comic definition is a set of key/value pairs assigned as a Perl hashref to an element of the %COMIC hash. For example:
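A minimal sketch of what such a definition might look like. The comic name, URL and image pattern are hypothetical placeholders (example.com does not host a real comic); only the field names Title, Page and Regex come from the field list in this document. The "our %COMIC" line is there only so the snippet compiles on its own, on the assumption that grabcartoons itself provides the hash when it loads a module.

```perl
use strict;
use warnings;

# Declared here only so the snippet compiles standalone (assumption:
# inside grabcartoons the %COMIC hash already exists when modules load).
our %COMIC;

# Hypothetical comic; the hash key 'example_comic' is its short name.
$COMIC{example_comic} = {
    Title => 'Example Comic',
    Page  => 'https://www.example.com/comic/',
    # The first parenthesized group ($1) must capture the image URL.
    Regex => qr{<img\s+src="(https://www\.example\.com/strips/[^"]+\.png)"}i,
};
```

Saved as a .pl file in one of the module directories, a definition like this would be requested by its short name, here example_comic.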
If the comic is from a site for which a template exists, the definition is even easier: you just have to specify the comic name and the template. For example:
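A sketch of a template-based definition, using the Citizen Dog and gocomics.com pairing that appears in the --help output. The short name citizen_dog is chosen here for illustration, and the "our %COMIC" line is only there to make the snippet compile by itself.

```perl
use strict;
use warnings;

# Declared here only so the snippet compiles standalone.
our %COMIC;

# The template supplies the fetching logic; the module only names the
# comic and says which template to use.
$COMIC{citizen_dog} = {
    Title    => 'Citizen Dog',
    Template => 'gocomics.com',
};
```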
Each template defines how to automatically convert the comic title into a “tag” (which normally becomes part of the URL for the comic). If the automatic conversion does not work appropriately, you can manually specify the tag. For example:
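A sketch of an explicit tag override. Note the assumption: the field is written here as Tag, which is not documented in the field list in this document, so check modules/20templates.pl for the name the templates actually read. The comic title and tag value are hypothetical.

```perl
use strict;
use warnings;

# Declared here only so the snippet compiles standalone.
our %COMIC;

$COMIC{example_strip} = {
    Title    => 'The Example Strip!',
    # ASSUMPTION: 'Tag' is the field name for the manual override.
    # The value is a made-up tag for a made-up comic.
    Tag      => 'example-strip',
    Template => 'gocomics.com',
};
```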
The key used for the %COMIC hash is the “short name” of the comic. The valid fields in the hash are:
- Title: title of the comic.
- Page: URL where to get it.
- Regex: regex to obtain the image. It must put the image in $1 (the first parenthesized group).
- LinkRelImageSrc: if true, the image URL will be automatically obtained from the first <link rel = "image_src"> element in the page. This is increasingly being used by web comics to ease sharing on Facebook and other sites. If this flag is specified, no Regex or other method needs to be specified.
- MultipleMatches: if true, then all matches of Regex will be returned, concatenated, after doing any changes specified by SubstOnRegexResult or Prepend/Append on each element. If MultipleMatches is in effect, then the result of $1 + SubstOnRegexResult + Prepend/Append is expected to be an HTML snippet, not just an image URL.
- ExtraImgAttrsRegex: regular expression to obtain additional attributes of the comic’s <img> tag. It has to match on the same line that Regex matches. If not specified, a generic text is used for the “alt” image attribute.
- TitleRegex: regular expression to capture the title of the comic. It can match on any line before Regex matches. If it does not match, no title is displayed (just the comic name). Only works for comics for which Regex is also defined.
- SubstOnRegexResult: an array of two- or three-element array references containing [ regex, string, [global] ]. If specified, the substitution specified by each element will be applied to the string captured by Regex or by StartRegex/EndRegex, before applying any Prepend/Append strings. The tuples are applied in the order in which they are specified. If “global” is given and true, a global replace will be done; otherwise only the first occurrence will be replaced. The replacement string may include other fields, referenced as {FieldName}.
- Prepend/Append: strings to prepend or append to $1 (or to the string captured by StartRegex/EndRegex) before returning it. May make use of other fields, referenced as {FieldName}.
- StartRegex/EndRegex: regular expressions that specify the first and last lines to capture. The matching lines are included in the output if InclusiveCapture is true, and not included if InclusiveCapture is false (the default). If EndRegex is not specified, everything from StartRegex to the end of the page is captured. If Regex is also specified, it is only matched inside the region defined by StartRegex/EndRegex.
- InclusiveCapture: true/false value that specifies whether the lines that match StartRegex/EndRegex should be returned in the output. False by default.
- RedirectMatch / RedirectURLCapture / RedirectURLAppend / RedirectURLPrepend / MultipleRedirects: these parameters control generalized redirection support. By default, they are set so that standard redirection using the META REFRESH tag is followed, but they can be set to redirect on arbitrary patterns. This is how it works: if the RedirectMatch regex matches on any line of the page, then the RedirectURLCapture pattern is applied to the same line, and should contain one capture group which returns the new URL to fetch and use. If RedirectURLAppend/RedirectURLPrepend are specified, these strings are concatenated with the result of the capture group before using it as the new URL. By default the Redirect* patterns are NOT passed along when fetching the new page, to prevent infinite redirection. This behavior can be modified by setting MultipleRedirects to a true value, so that multiple redirects using the same parameters are supported.
- StaticURL: static image URL to return.
- StaticHTML: static HTML snippet to return.
- Function: a function to call. It receives the comic snippet as argument, and must return ($html, $title, $error).
- NoShowTitle: if true, do not display the title of the comic (for those that always have it in the drawing).
- Template: if present, specifies a template that will be used for this comic (e.g. for comics coming from a single syndicated site, so the mechanism is the same for all of them). Essentially the fields from the template and the $COMIC snippet are merged and then processed in the usual way. If the template contains a _Template_Code attribute, it is executed on the merged snippet before processing it. Templates are defined in the file modules/20templates.pl.
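As an illustration of the Function field, here is a hedged sketch of a definition that builds its own HTML. It rests on two assumptions that the description above does not spell out: that the "comic snippet" passed in is the comic's own hashref, and that an undefined third return value means "no error". The comic, URL and strip title are hypothetical.

```perl
use strict;
use warnings;

# Declared here only so the snippet compiles standalone.
our %COMIC;

$COMIC{example_function} = {
    Title    => 'Example Function Comic',
    Function => sub {
        # ASSUMPTION: the argument is this comic's hashref.
        my ($snippet) = @_;
        my $url  = 'https://www.example.com/latest.png';   # placeholder URL
        my $html = qq(<img src="$url" alt="$snippet->{Title}">);
        # Return ($html, $title, $error); undef is assumed to mean success.
        return ($html, 'Latest strip', undef);
    },
};
```

Since Function has the highest precedence, any Regex or StaticURL in the same definition would presumably be ignored.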
Precedence (from higher to lower) is Function, StaticURL, StaticHTML, StartRegex/EndRegex and Regex.
Both Regex and StartRegex/EndRegex use Page, and optionally Prepend, Append, ExtraImgAttrsRegex, TitleRegex and SubstOnRegexResult.
StartRegex/EndRegex optionally uses InclusiveCapture.
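To show how the region-capture fields fit together, here is a sketch of a definition that captures a block of lines, rewrites relative image paths, and appends a source line. The site, markup patterns and short name are hypothetical, and whether the substitution regex may be given as a qr// object rather than a plain string is an assumption.

```perl
use strict;
use warnings;

# Declared here only so the snippet compiles standalone.
our %COMIC;

$COMIC{example_block} = {
    Title            => 'Example Block Comic',
    Page             => 'https://www.example.com/today/',
    # Capture from the opening comic <div> through the next closing </div>.
    StartRegex       => qr{<div id="comic">},
    EndRegex         => qr{</div>},
    InclusiveCapture => 1,    # keep the matching lines themselves
    # Each tuple is [ regex, string, global ]; applied in order, before Append.
    SubstOnRegexResult => [
        [ qr{src="/}, 'src="https://www.example.com/', 1 ],
    ],
    # {Page} refers to the Page field of this same definition.
    Append => '<p>Source: {Page}</p>',
};
```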
Comic definitions are loaded from the modules directory, from your $HOME/.grabcartoons modules directory, and from any directories (separated by colons) contained in the GRABCARTOONS_DIRS environment variable.
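The search order described above can be sketched as follows. This is not the code grabcartoons uses, only an illustration: module_dirs and its $base_dir parameter are hypothetical, and the per-user path $HOME/.grabcartoons/modules is an assumption based on the --genout default shown in the help output.

```perl
use strict;
use warnings;

# Sketch only: list the directories a module might be loaded from.
# $base_dir stands for the directory that contains grabcartoons.pl.
sub module_dirs {
    my ($base_dir) = @_;
    my @dirs = ("$base_dir/modules");
    # ASSUMPTION: the per-user directory is $HOME/.grabcartoons/modules.
    push @dirs, "$ENV{HOME}/.grabcartoons/modules" if defined $ENV{HOME};
    # GRABCARTOONS_DIRS holds extra directories separated by colons;
    # empty entries are skipped.
    push @dirs, grep { length } split /:/, ($ENV{GRABCARTOONS_DIRS} // '');
    return @dirs;
}

print "$_\n" for module_dirs('.');
```

With GRABCARTOONS_DIRS set to something like /opt/comics:/srv/comics, both of those directories would appear after the two default locations.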
The easiest way is probably to take one of the existing modules and base yours on that.
Contributions
If you develop any new modules, please share them! You can either post them to the project’s issue tracker, or fork the project, add your modules, and submit a pull request.