1 - I got the source:
wget -r http://mitpress.mit.edu/sicp/full-text/book/book.html
2 - used Nokogiri (formerly hpricot) to:
- remove 'navigation' divs
- add <mbp:pagebreak /> tags at the top of each html body to keep lines from getting split at page-breaks
3 - removed cover page 'book.html' since there's already a cover image
4 - set text-indent: 0 for p tags, since kindle indents about 1em by default, which deformatted the code snips (the source marks them up with <p><tt>)
set height="2em" on div tags in 'References' section (kindle doesn't support the CSS for controlling this)
6 - added jump table to top of index
7 - built opf and ncx with ruby. toc.ncx allows 'nav points' for the 5-way kindle knob to get you from chapter to chapter
8 - reverted item 5 above: several years later, simple p tags with an 85% font look fine.
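Item 7 above mentions building toc.ncx with Ruby. As a rough sketch of what that could look like (not the repo's actual generator), the following builds a minimal NCX navMap from a list of chapters. The chapter titles, file names, and method names are hypothetical, and a real NCX file also carries head metadata that is left out here.

```ruby
require 'cgi'

# Hypothetical chapter list: titles and file names are placeholders,
# not the real SICP file names.
CHAPTERS = [
  ['Chapter 1', 'chapter-1.html'],
  ['Chapter 2', 'chapter-2.html']
].freeze

# Build one <navPoint>; playOrder gives the reading order the device
# steps through when jumping from chapter to chapter.
def nav_point(title, src, order)
  %(<navPoint id="navpoint-#{order}" playOrder="#{order}">) +
    "<navLabel><text>#{CGI.escapeHTML(title)}</text></navLabel>" +
    %(<content src="#{CGI.escapeHTML(src)}"/>) +
    '</navPoint>'
end

# Wrap the nav points in a minimal NCX document (head metadata omitted).
def build_ncx(book_title, chapters)
  points = chapters.each_with_index.map do |(title, src), i|
    nav_point(title, src, i + 1)
  end
  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">',
    "<docTitle><text>#{CGI.escapeHTML(book_title)}</text></docTitle>",
    '<navMap>',
    *points,
    '</navMap>',
    '</ncx>'
  ].join("\n")
end

puts build_ncx('SICP', CHAPTERS)
```

Each chapter becomes one nav point, so the order of the list is the order the 5-way knob walks through.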
The old build of sicp.mobi in this repo seems to work fine. But if you want or need to build it yourself, you need to:
- generate the input
- run that input through 'kindlegen' to produce a large but functional book
- optionally strip out the source that kindlegen includes in the output using 'kindlestrip.py'
A set of Rake tasks will do any or all of these steps for you, in the correct order as needed.
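To illustrate how Rake can run these steps "in the correct order as needed", here is a minimal sketch of a file-task chain. It is not the repo's Rakefile: the constant names and placeholder task bodies are assumptions, and only the commands and file names that appear in this README are taken from it.

```ruby
require 'rake' # only needed when run as a plain script; a Rakefile gets it for free

# Names below are illustrative guesses, not copied from the real Rakefile.
OPF        = 'artifacts/sicp.opf'
NCX        = 'artifacts/toc.ncx'
LARGE_BOOK = 'artifacts/sicp-large.mobi'
SMALL_BOOK = 'artifacts/sicp.mobi'

directory 'artifacts'

# Generated inputs; the bodies stand in for the real XML generation.
file NCX => 'artifacts' do |t|
  File.write(t.name, '<!-- ncx placeholder -->')
end

file OPF => 'artifacts' do |t|
  File.write(t.name, '<!-- opf placeholder -->')
end

# kindlegen exits with status 1 for "success with warnings", so the
# block tolerates that status instead of aborting on any non-zero exit.
file LARGE_BOOK => [NCX, OPF] do
  sh "kindlegen #{OPF} -c2 -verbose > artifacts/kindlegen.log" do |ok, res|
    raise 'kindlegen failed' unless ok || res.exitstatus == 1
  end
  mv SMALL_BOOK, LARGE_BOOK
end

# Stripping the embedded source yields the compact book.
file SMALL_BOOK => LARGE_BOOK do
  sh "kindlestrip/kindlestrip.py #{LARGE_BOOK} #{SMALL_BOOK}"
end

task build_large: LARGE_BOOK
task build: SMALL_BOOK
task default: :build
```

Because each file task lists its inputs as prerequisites, asking for the compact book pulls in the large book, which in turn pulls in the generated XML, and anything already up to date is skipped.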
- install 'kindlegen' somewhere in your path.
- make sure you have Ruby 1.8.7 or later
- make sure you have the
- ideally but optionally, have Python installed
    $ cd ~/sicp-kindle

    # to build the book (compact binary -- requires Python)
    $ rake build
    # or just
    $ rake

    # to just build a large but usable book, without stripping (no Python needed)
    $ rake build_large

    # to clean up the output artifacts
    $ rake clean

    # hack/test cycle
    $ rake clean build

    # learn about Rake itself
    $ rake -h

    # to see the available tasks and their dependencies
    $ rake -T && rake -P

    # if you just want to see what the XML looks like without building,
    # generate it like this:
    $ rake artifacts/toc.ncx artifacts/sicp.opf
When you run Rake (:build task), you'll see this for a few minutes:
    mkdir -p artifacts
    generating 'artifacts/toc.ncx'
    generating 'artifacts/sicp.opf'
    running 'kindlegen artifacts/sicp.opf -c2 -verbose > artifacts/kindlegen.log'
    writing to directory 'artifacts'
Then probably this:
    kindlegen exit status: 1
    Success with warnings: see artifacts/kindlegen.log for information
    mv artifacts/sicp.mobi artifacts/sicp-large.mobi
The warnings refer to unresolved links that have no real effect on the formatting or navigability of the book itself. There are 2 kinds: HTML anchors referring back to the TOC page, and GIF images. The anchors are actually broken links that persist in the live MIT source to this day. In the Kindle book they just don't do anything, so it's not a real issue. The GIFs actually work like they should, so with respect to those Kindlegen is just crazy.
To deflate the output size, the final Rake task strips out the input source, which Amazon must have some reason for including yet removes when publishing in its store anyway. It uses Paul Durrant's Python script from this GitHub repo. So at the end, you'll see something like:
    running 'kindlestrip/kindlestrip.py artifacts/sicp.mobi artifacts/sicp-stripped.mobi'
    KindleStrip v1.35.0. Written 2010-2012 by Paul Durrant and Kevin Hendricks.
    Found SRCS section number 750, and count 2
       beginning at offset 1082728 and ending at offset 1202268
    done
    Header Bytes: 53524353000000100000002f00000001
    kindlestrip/kindlestrip.py exit status: 0
According to kindlegen's usage output, you can say -c0, -c1, -c2 or nothing at all. With respect to runtime and output size, "nothing" is about the same as -c1.
-c0 is just a second or two faster than those, but an ENORMOUS file is produced. Size of a house, I'm serious.
-c2 (my default) is SLOW but the smallest.
You can alter the setting in the Rakefile in the LARGE_BOOK file task.
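As a small illustration of making that setting adjustable, the sketch below builds the kindlegen command line from a compression level. The helper name kindlegen_command and its parameters are hypothetical; the real Rakefile may well hard-code the flag instead.

```ruby
# Build the kindlegen command line for a given compression level.
# level may be 0, 1, 2, or nil to pass no compression flag at all.
# (Hypothetical helper; not taken from the repo's Rakefile.)
def kindlegen_command(opf, level: 2, log: 'artifacts/kindlegen.log')
  unless [nil, 0, 1, 2].include?(level)
    raise ArgumentError, "unsupported compression level: #{level.inspect}"
  end

  flag = level.nil? ? nil : "-c#{level}"
  # compact drops the nil entry when no flag is wanted
  ['kindlegen', opf, flag, '-verbose', '>', log].compact.join(' ')
end

puts kindlegen_command('artifacts/sicp.opf')             # default -c2
puts kindlegen_command('artifacts/sicp.opf', level: nil) # no flag
```

With the default level this produces the same command the build log prints, so switching to a faster setting while hacking is a one-argument change.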
Here are some results from my 2010 MBP running KindleGen 2.9.
    # No compression setting
    $ time kindlegen artifacts/sicp.opf -verbose > artifacts/kindlegen.log
    real    0m9.258s
    user    0m15.597s
    sys     0m1.156s
    $ ll artifacts/*.mobi
    -rw-r--r-- 1 twcamper staff 4040242 Sep 4 18:51 artifacts/sicp-large.mobi
    -rw-r--r-- 1 twcamper staff 2833465 Sep 4 18:51 artifacts/sicp.mobi

    # '-c0' compression setting -- fast, fattest
    $ time kindlegen artifacts/sicp.opf -c0 -verbose > artifacts/kindlegen.log
    real    0m7.864s
    user    0m12.949s
    sys     0m1.207s
    $ ll artifacts/*.mobi
    -rw-r--r-- 1 twcamper staff 6816182 Sep 4 18:52 artifacts/sicp-large.mobi
    -rw-r--r-- 1 twcamper staff 5609417 Sep 4 18:52 artifacts/sicp.mobi

    # '-c1' compression setting
    $ time kindlegen artifacts/sicp.opf -c1 -verbose > artifacts/kindlegen.log
    real    0m9.245s
    user    0m15.952s
    sys     0m1.160s
    $ ll artifacts/*.mobi
    -rw-r--r-- 1 twcamper staff 4040078 Sep 4 18:52 artifacts/sicp-large.mobi
    -rw-r--r-- 1 twcamper staff 2833465 Sep 4 18:52 artifacts/sicp.mobi

    # '-c2' compression setting -- slowest, smallest
    $ time kindlegen artifacts/sicp.opf -c2 -verbose > artifacts/kindlegen.log
    real    2m13.958s
    user    3m58.323s
    sys     0m1.684s
    $ ll artifacts/*.mobi
    -rw-r--r-- 1 twcamper staff 3249518 Sep 4 18:56 artifacts/sicp-large.mobi
    -rw-r--r-- 1 twcamper staff 2042629 Sep 4 18:56 artifacts/sicp.mobi
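To put those numbers side by side, here is a short Ruby sketch that expresses each run relative to -c1. The figures are the wall-clock times and stripped sicp.mobi sizes from the listing above; the constant and method names are made up for the example.

```ruby
# Wall-clock seconds and stripped sicp.mobi size in bytes,
# copied from the benchmark listing.
RESULTS = {
  'none' => { seconds: 9.258,   bytes: 2_833_465 },
  '-c0'  => { seconds: 7.864,   bytes: 5_609_417 },
  '-c1'  => { seconds: 9.245,   bytes: 2_833_465 },
  '-c2'  => { seconds: 133.958, bytes: 2_042_629 }
}.freeze

# Return [size_ratio, time_ratio] of one setting relative to a baseline.
def ratios(flag, baseline: '-c1')
  run  = RESULTS.fetch(flag)
  base = RESULTS.fetch(baseline)
  [run[:bytes].to_f / base[:bytes], run[:seconds] / base[:seconds]]
end

RESULTS.each_key do |flag|
  size, time = ratios(flag)
  printf("%-5s size x%.2f  time x%.2f\n", flag, size, time)
end
```

On these figures -c2 gives a file roughly 28% smaller than -c1 at about 14 times the build time, while -c0 nearly doubles the size for a saving of little more than a second.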
If you start with pristine HTML source from MIT (see links above), there is a :fix_html_source task that uses Nokogiri as an HTML parser to add mobi pagebreaks.
You could build on that tiny amount of code to manipulate the source however you see fit. Nokogiri is a joy to use for such work, though installation of the gem has historically been tricky depending on your local setup.
Note that Nokogiri is required only at runtime in FixHTMLSource now, so users just building the book from the included source will never need it.
$ rake fix_html_source
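For quick experiments without installing Nokogiri, the page-break step can be approximated with a plain regular expression. This is a dependency-free stand-in, not what the :fix_html_source task does; the method name add_pagebreak is hypothetical, and a real HTML parser is the more robust choice for anything beyond this one insertion.

```ruby
PAGEBREAK = '<mbp:pagebreak />'

# Insert a mobi page break right after the opening <body> tag.
# Returns the input unchanged if a page break is already present,
# so running it twice over the same file is harmless.
def add_pagebreak(html)
  return html if html.include?(PAGEBREAK)

  # sub replaces only the first match, i.e. the opening body tag
  html.sub(/<body\b[^>]*>/i) { |tag| "#{tag}\n#{PAGEBREAK}" }
end

sample = '<html><body bgcolor="#ffffff"><p>Some text</p></body></html>'
puts add_pagebreak(sample)
```

Making the step idempotent means it can sit in a hack/test cycle without piling up page breaks in the source.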
- Read the damn book!