HTTPS clone URL
Subversion checkout URL
A lightweight, low-memory character encoding detector suitable for massive files.
Fetching latest commit...
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
detenc A lightweight, low-memory character encoding detector. Paul Battley <firstname.lastname@example.org> http://www.reevoo.com/ detenc is a fast character encoding detector for Western European text. It can determine whether a file is encoded in US-ASCII, UTF-8, ISO-8859-15, WINDOWS-1252, or something else. It can distinguish ISO-8859-15 and WINDOWS-1252 where there is enough information: this means that Euro signs are handled correctly. The program was written to help normalise the encoding of very large data feeds (of the order of several gigabytes) at Reevoo. It uses very little memory and can determine the encoding of a two-gigabyte file in under a minute. Build The program is written in C and uses standard libraries. The test suite requires Ruby (1.8 or 1.9) and Rake to be available make # builds binary make check # runs test suite make install # installs binary to /usr/local/bin Use detenc FILENAME (FILENAME ...) This will output the filename and encoding, one per line. To print just the encoding, use the -q switch.