- Allow the user to process only certain aspects of the PDF file. For example, if they're only
interested in metadata or bookmarks, there's no point in walking the pages tree.
- maybe a third option to Reader.parse?
parse(io, receiver, {:pages => true, :fonts => false, :metadata => true, :bookmarks => false})
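A rough sketch of how such an options hash might be handled, assuming every section is processed by default. The names `DEFAULT_SECTIONS`, `sections_to_process` and `enabled_sections` are made up for illustration and are not part of the library:

```ruby
# Hypothetical sketch: none of these names exist in the library.
# Assumption: every section is processed unless the caller switches it off.
DEFAULT_SECTIONS = {
  :pages     => true,
  :fonts     => true,
  :metadata  => true,
  :bookmarks => true
}.freeze

# Merge caller-supplied options over the defaults, rejecting unknown keys
# so that a typo such as :bookmark does not silently do nothing.
def sections_to_process(options = {})
  unknown = options.keys - DEFAULT_SECTIONS.keys
  raise ArgumentError, "unknown option(s): #{unknown.inspect}" unless unknown.empty?
  DEFAULT_SECTIONS.merge(options)
end

# Return only the names of the sections that should actually be walked.
def enabled_sections(options = {})
  sections_to_process(options).select { |_name, enabled| enabled }.keys
end
```

With this shape, a caller interested only in metadata and bookmarks would pass `{:pages => false, :fonts => false}` and the parser could skip the pages tree entirely.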
- Detect when a font's encoding is a CMap (generally used for pre-Unicode, multibyte Asian encodings), and display a user-friendly error
- Provide a way to get raw access to a particular object. Good for testing purposes
- Tweak encoding mappings to differentiate between bytes that are invalid for an encoding, and bytes that are unchanged.
poppler seems to do this in a quite reasonable way. Original Encoding -> Glyph Names -> Unicode. As of 0.6 we go straight
from the Original encoding to Unicode.
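A minimal sketch of the two-stage idea (original encoding -> glyph names -> Unicode). The tables are tiny illustrative excerpts, and the constant and method names are invented for this example. For simplicity, any byte missing from the excerpt is treated as invalid and mapped to U+FFFD, whereas a byte such as 0x41 comes out unchanged because its glyph happens to have the same codepoint:

```ruby
# Hypothetical sketch: tiny excerpts only, not complete encoding tables.

# Stage 1: byte in the original encoding -> glyph name.
WIN_ANSI_GLYPHS = {
  0x41 => :A,
  0x80 => :Euro,
  0x93 => :quotedblleft
}.freeze

# Stage 2: glyph name -> Unicode codepoint.
GLYPH_TO_UNICODE = {
  :A            => 0x0041,
  :Euro         => 0x20AC,
  :quotedblleft => 0x201C
}.freeze

# U+FFFD REPLACEMENT CHARACTER marks bytes with no mapping in this sketch.
REPLACEMENT = 0xFFFD

# Map a single byte to a codepoint by going through the glyph name.
def byte_to_unicode(byte)
  glyph = WIN_ANSI_GLYPHS[byte]
  return REPLACEMENT if glyph.nil?   # byte is not valid in this encoding
  GLYPH_TO_UNICODE.fetch(glyph, REPLACEMENT)
end

# Decode an array of bytes into a UTF-8 string.
def decode(bytes)
  bytes.map { |b| byte_to_unicode(b) }.pack("U*")
end
```

Going through glyph names keeps "invalid" and "unchanged" distinct: an unchanged byte still has an entry in both tables, while an invalid byte has none.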
- Support for CJK text (convert to UTF-8 like all other encodings; see section 5.9 of the PDF spec)
- Will require significantly improved handling of CMaps, including creating a bunch of predefined ones
- Add a way to extract raster images
- see XObjects section of spec (section 4.7)
- Add a way to extract font data?
- Work out why specs/data/zlib*.pdf isn't parsed correctly when all the major PDF viewers can display it correctly
- Ship some extra receivers in the standard package, particularly ones that are useful for running
rspec over generated PDF files
- When we encounter Identity-H encoded text with no ToUnicode CMap, render the glyphs and treat them as images, as there's no
sensible way to convert them to Unicode
- Improve metadata support
- Add support for additional filters: ASCIIHexDecode, ASCII85Decode, LZWDecode, RunLengthDecode, CCITTFaxDecode, JBIG2Decode, DCTDecode, JPXDecode, Crypt?
- Add support for additional encodings:
- PDFDocEncoding
- Identity-V (I *think* this relates to vertical text. Not sure how we'd support it sensibly)
- Investigate how R->L text is handled
- Add support for object streams (spec section 3.4.6)
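
Of the additional filters listed above, ASCIIHexDecode and RunLengthDecode are simple enough to sketch in a few lines. The following is a standalone illustration of the decoding rules as the PDF spec describes them, not the library's implementation. The method names are invented here, and how the methods would plug into the library's filter handling is left open:

```ruby
# Hypothetical sketch: standalone decoders, not wired into any library.

# ASCIIHexDecode: pairs of hex digits become bytes. Whitespace is ignored,
# ">" marks end of data, and a trailing odd digit is padded with "0".
def ascii_hex_decode(data)
  hex = data.split(">", 2).first.to_s.gsub(/\s+/, "")
  raise ArgumentError, "non-hex character in stream" unless hex =~ /\A[0-9A-Fa-f]*\z/
  hex += "0" if hex.length.odd?
  [hex].pack("H*")
end

# RunLengthDecode: each run starts with a length byte.
#   0..127   copy the next (length + 1) bytes literally
#   129..255 repeat the next byte (257 - length) times
#   128      end of data
def run_length_decode(data)
  bytes = data.bytes
  out = []
  i = 0
  while i < bytes.length
    length = bytes[i]
    break if length == 128
    raise ArgumentError, "truncated run" if bytes[i + 1].nil?
    if length < 128
      out.concat(bytes[i + 1, length + 1])
      i += length + 2
    else
      out.concat([bytes[i + 1]] * (257 - length))
      i += 2
    end
  end
  out.pack("C*")
end
```

Both methods take and return plain strings of bytes, so they could be tested in isolation before deciding how filters get chained.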