GrabCartoons is a comic-summarizing utility. It is modular, and it is very easy to write modules for new comics.
Perl Perl 6 JavaScript CSS Makefile
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
modules
.gitignore
ChangeLog
LICENSE
Makefile
README
convert.pl
grabcartoons.pl

README

grabcartoons
------------
Web page: http://zzamboni.org/grabcartoons/

This is the program I use for viewing comics on the web. It's not as
sophisticated as many of the other nice programs out there (check
freshmeat.net, search for "comics" or "strips"), but it does the job
for me. YMMV.

######################################################################
PLEASE NOTE

To avoid depriving free comic web sites of their well-deserved
traffic, grabcartoons does not download the comics to the local
machine. Instead, it creates a page with links to the comics at their
original locations. Therefore, when a page generated by grabcartoons
is viewed, the comics are downloaded from their original pages and not
from local copies. Furthermore, the link to the graphics are extracted
from the comics' web pages, which grabcartoons downloads when it
runs. Finally, each comic in a grabcartoons-generated page is a link
to the original web page.

Grabcartoons is intended for personal use only, not for providing
public access to web comics.

######################################################################


How to install:
---------------

You need to have either:
        LWP::UserAgent and HTTP::Request modules
     or wget in your path (grabcartoons knows how to use it)
     or lynx or some other utility (modify $XTRN_CMD in grabcartoons.pl)

Other than that, you should be able to just run grabcartoons.pl from the
directory where you unpacked the distribution, and it should
automatically find its modules/ directory.

You can also run "make install", after editing the Makefile to check
the value of PREFIX (or run "make install PREFIX=/some/path".

How to use:
-----------

Usage: ./grabcartoons.pl [ options ] [ comic_id ...]
    --all      or -a   generate a page with all the known comics on stdout.
    --list     or -l   produce a list of the known comic_id's on stdout.
    --htmllist         produce HTML list of known comic_id's on stdout.
    --file     or -f   read list of comics from specified file.
    --write    or -w   write output to specified file instead of stdout.
    --version  or -V   print version number
    --verbose  or -v   be verbose
    --help     or -h   print this message.
    --notitles or -t   do not show comic titles (for those that have them)
    --templates        produce a list of defined templates

By default, it will produce a page with the given comics on stdout.

comic_id can be:
  - Any of the predefined modules (e.g. sinfest, adam_at_home)
  - Of the form 'template:comic title', including quotes if the title has
    spaces (e.g. 'gocomis.com:Citizen Dog', comics.com_big:Frazz). This will
    generate on the fly a module for the given comic.

Any errors will be included in the generated page.

About the --notitles option: some comics have a title for each
individual strip. By default the title is displayed next to the comic
name, unless --notitles is specified.


About templates:
----------------

Templates define a common way of fetching all the comics from certain
sites (such as comics.com or gocomics.com) that host multiple comic
strips. If a template exists, you can easily define new modules for
comics from that site, or even request them on the fly without having
to write a module, by specifying the comic_id as "template:comic title".


How to define your own comics:
------------------------------

-----------------------------------------------------------------------
PLEASE NOTE: THIS HAS CHANGED COMPLETELY IN VERSION 2.0. If you have
modules written for pre-2.0 grabcartoons, you can use the convert.pl
included in the distribution to semi-automate the conversion. See the
comments at the top of convert.pl for instructions.
-----------------------------------------------------------------------

Modules are defined in files with .pl extension which define where and how
to fetch the comic.

Each comic definition is a set of pair/value keys assigned as a Perl hashref
to an element of the %COMIC hash. For example:

$COMIC{xkcd} = {
		Title => 'xkcd',
		Page => 'http://xkcd.com/',
		Regex => qr!img[^<>]*src="(http://imgs\.xkcd\.com/comics.*?)"!i,
		TitleRegex => qr!^<h1>(.+)</h1><br/>!i,
		ExtraImgAttrsRegex => qr!img[^<>]*src="http://imgs\.xkcd\.com/comics.*?" (title=".*")!i,
	       };

If the comic is from a site for which a template exists, the
definition is even easier, you just have to specify the comic name and
the template. For example:

$COMIC{calvin_and_hobbes} = {
			     'Title' => 'Calvin and Hobbes',
			     'Template' => 'gocomics.com',
			    };

Each template defines how to automatically convert the comic title
into a "tag" (which normally becomes part of the URL for the
comic). If the automatic conversion does not work appropriately, you
can manually specify the tag. For example:

$COMIC{adam_at_home}={
		      Title	=> 'Adam @ Home',
		      Tag       => 'adamathome',
		      Template  => 'gocomics.com',
		     };

The key used for the %COMIC hash is the "short name" of the comic, and the
valid fields in the hash are:

    Title   => title of the comic
    Page    => URL where to get it
    Regex   => regex to obtain image, must put the image in $1
                  (the first parenthesized group)
    LinkRelImageSrc => if true, the image URL will be automatically
               obtained from the first <link rel="image_src"> element in
               the page. This is increasingly being used by web comics
               to ease sharing on Facebook and other sites. If this
               flag is specified no Regex or other method needs to be
               specified.
    MultipleMatches => if true, then all matches of Regex will be
               returned, concatenated, after doing any changes
               specified by SubstOnRegexResult or Prepend/Append
               on each element. If MultipleMatches is in effect,
               then the result of $1 + SubstOnRegexResult + Prepend/Append
               is expected to be an HTML snippet, not just an image URL.
    ExtraImgAttrsRegex => regular expression to obtain additional
               attributes of the comic's <img> tag. It has to 
               match on the same line that Regex matches. If not
               specified, a generic text is used for the "alt"
               image attribute.
    TitleRegex => regular expression to capture the title of the
               comic. It can match on any line _before_ Regex matches.
               If it does not match, no title is displayed (just the comic name).
               Only works for comics for which Regex is also defined.
    SubstOnRegexResult => an array of two- or three-element array
           references containing [ regex, string, [global] ]. If
           specified, the substitution specified by each element
           will be applied to the string captured by Regex or by
           Start/EndRegex, before applying any Prepend/Append
           strings.  Each tuple will be applied in the order they
           are specified. If "global" is given and true, a global
           replace will be done, otherwise only the first ocurrence
           will be replaced.  The replacement string may include
           other fields, referenced as {FieldName}.
    Prepend/Append => strings to prepend or append to $1 (or to the string
           captured by Start/EndRegex) before returning it. May make use of
           other fields, referenced as {FieldName}
    StartRegex/EndRegex => regular expressions that specify the first
           and last lines to capture. The matching lines are included in
           the output if InclusiveCapture == 1, and not included
           if InclusiveCapture == 0 (the default).
           If EndRegex is not specified, everything from StartRegex to
           the end of the page is captured.
           If Regex is also specified, it is only matched for inside the
           region defined by StartRegex/EndRegex
    InclusiveCapture => true/false value that specifies whether the lines
           that match Start/EndRegex should be returned in the output. By
           default InclusiveCapture == false.
    RedirectMatch
    RedirectURLCapture
    RedirectURLAppend
    RedirectURLPrepend
    MultipleRedirects
          These parameters control generalized redirection
          support. By default, these parameters are set so that
          standard redirection using the META REFRESH tag is
          followed, but can be set to redirect on arbitrary
          patterns. This is how it works: if the RedirectMatch regex
          matches on any line of the page, then the
          RedirectURLCapture pattern is applied to the same line,
          and should contain one capture group which returns the new
          URL to use. If RedirectURLAppend/Prepend are specified,
          these strings are concatenated with the result of the
          capture group before using it as the new URL. By default
          the patterns are passed NOT along when fetching the new page,
          to prevent infinite redirection. This behavior can be
          modified by setting MultipleRedirects to a true value, so that
          multiple redirects using the same parameters are
          supported.
    StaticURL => static image URL to return
    StaticHTML => static HTML snippet to return
    Function  => a function to call. It receives the commic snippet as
          argument, and must return ($html, $title, $error)
    NoShowTitle => if true, do not display the title of the comic
          (for those that always have it in the drawing)
    Template => if present, specified a template that will be used
          for this comic (e.g. for comics coming from a single
          syndicated site, so the mechanism is the same for all of them)
          Essentially the fields from the template and the $COMIC snippet
          are merged and then processed in the usual way.
          If the template contains a _Template_Code atribute, it is
          executed on the merged snippet before processing it.
          Templates are defined in modules/20templates.pl.

Precedence (from higher to lower) is Function, StaticURL, StaticHTML,
StartRegex/EndRegex and Regex.

Both Regex and Start/EndRegex use Page, and optionally Prepend, Append,
ExtraImgAttrsRegex, TitleRegex and SubstOnRegexResult.

Start/EndRegex optionally uses InclusiveCapture.

Comic definitions are loaded from the modules/ directory, from your
$HOME/.grabcartoons/modules directory, and from any directories
(separated by colons) contained in the GRABCARTOONS_DIRS environment
variable.

The easiest way is probably to take one of the existing modules (for
example, xkcd.pl) and base yours on that.

If you develop any new modules, please share them with me! You can either:

a) Post them to the issue tracker in the github project page:
   http://github.com/zzamboni/grabcartoons/issues
b) Fork the project at github, add your modules, and send me a pull
   request.

Notes:
------

I developed this program for personal use. If you have any problems,
try looking at the code first. Feel free to post any questions,
suggestions or code to the issue tracker. I cannot promise any kind of
support, but I would be happy to hear from you if you use
grabcartoons.

Contributors:
-------------

Ben Kuperman - first user apart from me, developer and maintainer of
               many modules.

Module contributors: Nick Adams, Alain Brunet, Scott Baker, Yanick. 


Enjoy,
--Diego Zamboni
  diego@zzamboni.org