Skip to content


Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Fetching contributors…

Cannot retrieve contributors at this time

42 lines (32 sloc) 2.068 kb
- Allow more than just page content and metadata to be parsed (see spec section 3.6.1)
- bookmarks?
- outline?
- articles?
- viewer prefs?
- Don't remove comment when tokenising in the middle of a string
- Tweak encoding mappings to differentiate between bytes that are invalid for an encoding, and bytes that are unchanged.
poppler seems to do this in a quite reasonable way. Original Encoding -> Glyph Names -> Unicode. As of 0.6 we go straight
from the Original encoding to Unicode.
- detect when a font's encoding is a CMap (generally used for pre-Unicode, multibyte asian encodings), and display a user friendly error
- Provide a way to get raw access to a particular object. Good for testing purposes
- Improve interpretation of non content stream data (ie metadata). Use PDFDofEncoding, recognise UTF16 strings, recognise dates, etc
- Support Cross Reference Streams (spec 3.4.7)
- Add a way to extract raster images
- see XObjects section of spec (section 4.7)
- Add a way to extract font data?
- Support for CJK text (convert to UTF-8 like all other encodings. See Section 5.9 of the PDF spec)
- Will require significantly improved handling of CMaps, including creating a bunch of predefined ones
- Work out why specs/data/zlib*.pdf isn't parsed correctly when all the major PDF viewers can display it correctly
- Ship some extra receivers in the standard package, particuarly ones that are useful for running
rspec over generated PDF files
- When we encounter Identity-H encoded text with no ToUnicode CMap, render the glyphs and treat them as images, as there's no
sensible way to convert them to unicode
- Add support for additional filters: ASCIIHexDecode, ASCII85Decode, LZWDecode, RunLengthDecode, CCITTFaxDecode, JBIG2Decode, DCTDecode, JPXDecode, Crypt?
- Add support for additional encodings:
- PDFDocEncoding
- Identity-V(I *think* this relates to vertical text. Not sure how we'd support it sensibly)
- Investigate how R->L text is handled
- Add support for object streams (spec section 3.4.6)
Jump to Line
Something went wrong with that request. Please try again.