Howdy y'all,
IndexTool is a quick and dirty tool of mine for reviewing the indexing of books written in LaTeX. It isn't used to generate the index pages, or to produce any typeset results at all; rather, it reviews the manually indexed content to look for duplicates and mistakes.
The tool assumes that you have one file per chapter, that the same word should not be indexed twice on one page or twice within a chapter, and that each word is defined first in ASCII with any accent marks or highlighting appearing later.
It first generates a SQLite3 database from the book's .idx file and from its .tex source code; subsequent calls then query that database, which allows for fast and interactive searches on individual terms. You can make manual queries on the database without trouble.
73 from Pizza Rat City,
--Travis Goodspeed
IndexTool can be installed either with a traditional Unix make clean install or by go install github.com/travisgoodspeed/indextool@latest.
IndexTool works by first generating a SQLite3 database of your LaTeX source code (*.tex) and of the index entries that LaTeX writes out via \makeindex (*.idx). This database needs to be regenerated frequently as you correct your indexing, so you ought to have a target in your Makefile that rebuilds it.
index: book.idx *.tex
	indextool book.idx *.tex
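To make the input concrete: LaTeX writes one \indexentry{term}{page} line to the .idx file for each \index call. The sketch below is not IndexTool's actual parser; it only illustrates how such lines could be turned into (name, page) rows like those of the entries table. The Entry type, the parseIdx helper, and the regular expression are assumptions made for illustration, and terms containing nested braces are deliberately ignored.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Entry is a hypothetical stand-in for one row of the entries table.
// Page is a string because front matter may use roman numerals.
type Entry struct {
	Name string
	Page string
}

// indexEntryRE matches simple lines such as \indexentry{PaX}{19}.
// It does not handle braces nested inside the term.
var indexEntryRE = regexp.MustCompile(`^\\indexentry\{([^{}]*)\}\{([^{}]*)\}`)

// parseIdx extracts entries from the text of an .idx file,
// silently skipping lines that do not look like index entries.
func parseIdx(idx string) []Entry {
	var entries []Entry
	sc := bufio.NewScanner(strings.NewReader(idx))
	for sc.Scan() {
		m := indexEntryRE.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if m != nil {
			entries = append(entries, Entry{Name: m[1], Page: m[2]})
		}
	}
	return entries
}

func main() {
	sample := "\\indexentry{PaX}{19}\n\\indexentry{Nintendo!GameBoy}{42}\n"
	for _, e := range parseIdx(sample) {
		fmt.Printf("%s on page %s\n", e.Name, e.Page)
	}
}
```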
Because indexing inherently involves the discretion of a human editor, the tool's findings are all considered warnings, rather than errors.
After the database is generated, run the tool with no parameters to do a quick sanity check of your indexing. Anything reported at this stage is likely a serious mistake, such as duplicate indexing.
x270% indextool
Duplicate entry 'PaX' on page 19.
Entry Capitalization: JavaScript or Javascript ?
x270%
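The duplicate check can be pictured as counting (name, page) pairs. The following is a minimal sketch of that idea in Go, working on an in-memory slice rather than on the database; the Entry type and the duplicates function are hypothetical names, and the message format simply imitates the sample output above.

```go
package main

import "fmt"

// Entry is a hypothetical stand-in for one row of the entries table.
type Entry struct {
	Name string
	Page string
}

// duplicates returns each (name, page) pair that occurs more than once,
// reported a single time in the order its first repeat is seen.
func duplicates(entries []Entry) []Entry {
	seen := make(map[Entry]int)
	var dups []Entry
	for _, e := range entries {
		seen[e]++
		if seen[e] == 2 { // report a repeated pair only once
			dups = append(dups, e)
		}
	}
	return dups
}

func main() {
	entries := []Entry{
		{"PaX", "19"},
		{"ELF", "19"},
		{"PaX", "19"}, // same term, same page: a duplicate
		{"PaX", "20"}, // same term, different page: fine
	}
	for _, d := range duplicates(entries) {
		fmt.Printf("Duplicate entry '%s' on page %s.\n", d.Name, d.Page)
	}
}
```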
Some queries are a bit more intensive, taking more than the tenth of a second we'd like to budget for the default tests. These are grouped into "deep mode", which must be enabled separately using the -d flag.
dell% indextool -d
...
Index Capitalization: BrainFuck or Brainfuck ?
Index Capitalization: CoreBoot or Coreboot ?
Index Capitalization: FastColl or Fastcoll ?
Index Capitalization: GNUPG or GnuPG ?
Index Capitalization: GameBoy or Gameboy ?
Index Capitalization: JavaScript or Javascript ?
Index Capitalization: Nintendo!GameBoy or Nintendo!Gameboy ?
Index Capitalization: PDF.JS or PDF.js ?
Index Capitalization: PostScript or Postscript ?
Index Capitalization: SHAttered or Shattered ?
Index Capitalization: WINE or Wine ?
Index Capitalization: X86 or x86 ?
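A capitalization warning of this kind amounts to finding names that are equal once case is ignored but differ as written. Here is a rough Go sketch of that grouping; it is an illustration under that assumption, not the query IndexTool actually runs, and caseConflicts is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// caseConflicts groups names by their lowercased form and returns the
// groups holding more than one distinct spelling. Spellings within a
// group, and the groups themselves, are sorted for stable output.
func caseConflicts(names []string) [][]string {
	groups := make(map[string]map[string]bool)
	for _, n := range names {
		key := strings.ToLower(n)
		if groups[key] == nil {
			groups[key] = make(map[string]bool)
		}
		groups[key][n] = true
	}
	var out [][]string
	for _, set := range groups {
		if len(set) < 2 {
			continue // only one spelling: no conflict
		}
		var spellings []string
		for s := range set {
			spellings = append(spellings, s)
		}
		sort.Strings(spellings)
		out = append(out, spellings)
	}
	sort.Slice(out, func(i, j int) bool { return out[i][0] < out[j][0] })
	return out
}

func main() {
	names := []string{"GameBoy", "Gameboy", "PaX", "WINE", "Wine", "GameBoy"}
	for _, g := range caseConflicts(names) {
		fmt.Printf("Index Capitalization: %s ?\n", strings.Join(g, " or "))
	}
}
```

Because the key is the whole lowercased name, a subentry such as Nintendo!GameBoy only conflicts with another spelling of the same full path, which matches how it appears in the listing above.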
You can also perform specific queries. For example, we can do a full text search for PaX, to identify all files which contain the word but have not indexed it.
x270% indextool -s PaX
Missing 'PaX' index in sample/ch2.tex.
We can also do it in a case-sensitive manner, which is handy for words like BASIC that have very different meanings in lower case.
x271% indextool -S BASIC
Missing 'BASIC' index in submissions/rabbit.tex.
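The logic behind both searches is the same: report every file whose text mentions the word but which carries no index of that name. The Go sketch below illustrates this with plain substring matching on in-memory data rather than the database's full-text index, so it is an approximation of the behavior, not the tool's implementation. The File type and the missingIndex function are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// File is a hypothetical in-memory stand-in for one chapter: its text
// plus the names it passes to \index{...}.
type File struct {
	Name    string
	Body    string
	Indices []string
}

// missingIndex returns the names of files whose body mentions word but
// which have no index of that name. When caseSensitive is false, both
// the text match and the index match ignore case.
func missingIndex(files []File, word string, caseSensitive bool) []string {
	norm := func(s string) string {
		if caseSensitive {
			return s
		}
		return strings.ToLower(s)
	}
	var missing []string
	for _, f := range files {
		if !strings.Contains(norm(f.Body), norm(word)) {
			continue // the word never appears in this file
		}
		indexed := false
		for _, idx := range f.Indices {
			if norm(idx) == norm(word) {
				indexed = true
				break
			}
		}
		if !indexed {
			missing = append(missing, f.Name)
		}
	}
	return missing
}

func main() {
	files := []File{
		{"sample/ch1.tex", "PaX hardens the kernel.", []string{"PaX"}},
		{"sample/ch2.tex", "We revisit PaX here.", nil},
		{"sample/ch3.tex", "The basic idea is simple.", nil},
	}
	for _, name := range missingIndex(files, "PaX", false) {
		fmt.Printf("Missing 'PaX' index in %s.\n", name)
	}
	// Case-sensitive: the lowercase "basic" in ch3 is not a match.
	for _, name := range missingIndex(files, "BASIC", true) {
		fmt.Printf("Missing 'BASIC' index in %s.\n", name)
	}
}
```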
We can list the entries (those that appear in the .idx file of the book as it is actually rendered) with indextool -l, or the indices (those that appear anywhere in the source code, even if they aren't rendered) with indextool -L. The distinction is handy in that you can update the -L listing without recompiling your book; the -l listing is more accurate, and might be confined to just the volume you are currently compiling. For example, here is a listing where the GameBoy has been indexed in several inconsistent ways.
dell% indextool -L | grep -i gameboy
GameBoy
GameBoy Advance
Gameboy
Nintendo!GameBoy
Nintendo!Gameboy
Super GameBoy
dell%
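Since the indices come straight from the source code, a listing like this one can be approximated by scanning the .tex text for \index{...} calls. The Go sketch below shows one way to do that; the listIndices helper and its regular expression are assumptions for illustration, and names containing nested braces (for example, formatted entries) are not handled.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// indexRE matches simple \index{name} calls without nested braces.
// It does not match \indexentry, which has no brace right after "index".
var indexRE = regexp.MustCompile(`\\index\{([^{}]*)\}`)

// listIndices returns the sorted, de-duplicated index names found in
// the given LaTeX sources.
func listIndices(sources ...string) []string {
	seen := make(map[string]bool)
	for _, src := range sources {
		for _, m := range indexRE.FindAllStringSubmatch(src, -1) {
			seen[m[1]] = true
		}
	}
	names := make([]string, 0, len(seen))
	for n := range seen {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	ch1 := `The GameBoy\index{GameBoy} and the Super GameBoy\index{Super GameBoy}.`
	ch2 := `A Gameboy\index{Gameboy} cartridge.\index{Nintendo!GameBoy}\index{GameBoy}`
	for _, n := range listIndices(ch1, ch2) {
		fmt.Println(n)
	}
}
```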
IndexTool's SQLite3 database is available with a default filename of indextool.db. You can open it to perform queries directly, if that would be handy. Search for db.Query in indextool.go to find example queries.
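The database schema is roughly as follows. The tex_* tables are shadow tables that SQLite creates automatically to back the FTS4 virtual table.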
/* From the .tex files.*/
CREATE TABLE indices(filename, name);
/* From the .idx files. */
CREATE TABLE entries(filename, name, page);
CREATE VIRTUAL TABLE tex using fts4(filename, body)
/* tex(filename,body) */;
CREATE TABLE IF NOT EXISTS 'tex_content'(docid INTEGER PRIMARY KEY, 'c0filename', 'c1body');
CREATE TABLE IF NOT EXISTS 'tex_segments'(blockid INTEGER PRIMARY KEY, block BLOB);
CREATE TABLE IF NOT EXISTS 'tex_segdir'(level INTEGER,idx INTEGER,start_block INTEGER,leaves_end_block INTEGER,end_block INTEGER,root BLOB,PRIMARY KEY(level, idx));
CREATE TABLE IF NOT EXISTS 'tex_docsize'(docid INTEGER PRIMARY KEY, size BLOB);
CREATE TABLE IF NOT EXISTS 'tex_stat'(id INTEGER PRIMARY KEY, value BLOB);