Conversion Issues
Description: This document catalogs issues found when converting PDFs to HTML with pdf2htmlEX. It gives examples of each problem from every PDF in which it was found, along with an estimate of how frequently each problem occurs.
Frequency (low-medium)
*SS taken from HTML converted MFT-TEST-ASSEMBLED-LINKED-RGB.pdf
*SS taken from HTML converted clifford.pdf
pdf2htmlEX applies a width and a margin to spans to correct for kerning.
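The idea of an offset span can be sketched roughly as follows. This is only an illustration, not pdf2htmlEX's actual output: the function names and the "offset" class name are hypothetical, and it assumes that a positive kerning offset is expressed as a width while a negative one is expressed as a negative left margin.

```python
import html


def offset_span(offset_px: float) -> str:
    """Return an empty inline-block span that shifts the following text.

    Assumption: positive offsets become a width, negative offsets become
    a negative left margin. The class name "offset" is hypothetical.
    """
    if offset_px >= 0:
        style = f"display:inline-block;width:{offset_px:.2f}px"
    else:
        style = f"display:inline-block;margin-left:{offset_px:.2f}px"
    return f'<span class="offset" style="{style}"></span>'


def render_line(chunks):
    """Build the HTML for one line.

    chunks is a list of (text, offset_after_px) pairs; a zero offset
    means no correction span is needed after that text run.
    """
    parts = []
    for text, offset in chunks:
        parts.append(html.escape(text))
        if offset:
            parts.append(offset_span(offset))
    return "".join(parts)
```

For example, render_line([("A", -0.8), ("V", 0)]) places a span with a negative margin between the two letters, pulling the "V" closer to the "A". Every such span splits the text into separate elements, which is what makes the offsets visible to anything that walks the HTML.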
*SS taken from HTML converted MFT-TEST-ASSEMBLED-LINKED-RGB.pdf
*SS taken from HTML converted clifford.pdf
This is not a problem on the Kindle.
*SS taken from HTML converted MFT-TEST-ASSEMBLED-LINKED-RGB.pdf
*SS taken from HTML converted MFT-TEST-ASSEMBLED-LINKED-RGB.pdf
This happens because we use the optimize text command line option to remove some spans that interfere with word selection. Optimize text reduces the number of spans in a line and adjusts the letter spacing and word spacing of the entire line to account for the reduction. It's an imperfect approximation.
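A small sketch of why a single per-line spacing value can only approximate the original offsets. This is not pdf2htmlEX's algorithm; it assumes, for illustration only, that the removed per-gap offsets are replaced by their average, and the function names are hypothetical.

```python
def uniform_letter_spacing(offsets):
    """Return one spacing value (px) that preserves the total line width.

    offsets holds the original per-gap corrections between consecutive
    glyphs. Averaging them is an assumption made for this sketch.
    """
    if not offsets:
        return 0.0
    return sum(offsets) / len(offsets)


def position_errors(offsets):
    """Return how far each glyph drifts (px) from its original position
    once every gap uses the uniform spacing instead of its own offset."""
    spacing = uniform_letter_spacing(offsets)
    errors = []
    drift = 0.0
    for off in offsets:
        drift += spacing - off  # error accumulates gap by gap
        errors.append(drift)
    return errors
```

With offsets of [0.0, 2.0, 0.0, 2.0] the uniform spacing is 1.0 px. The line ends at the right place, but glyphs in the middle sit up to 1 px away from where the PDF put them, which is the kind of shift visible in the screenshots.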
*SS taken from HTML converted Generation Kill.pdf
*SS taken from HTML converted clifford.pdf
pdf2htmlEX guesses when to insert a space in its offset spans. The guess is based on the width of a space and the kerning of the characters. If a false positive occurs, a word is broken by a space character.
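The guessing described above can be sketched as a simple threshold test. This is a hedged illustration rather than pdf2htmlEX's real logic: the function name, the input shape, and the threshold value of 0.5 are all assumptions.

```python
def insert_spaces(glyphs, space_width, threshold=0.5):
    """Rebuild a line of text, guessing where spaces belong.

    glyphs is a list of (char, gap_before_px) pairs. A space is inserted
    when the gap before a glyph is at least threshold * space_width.
    The threshold of 0.5 is a hypothetical value for this sketch.
    """
    limit = threshold * space_width
    out = []
    for i, (ch, gap) in enumerate(glyphs):
        if i > 0 and gap >= limit:
            out.append(" ")
        out.append(ch)
    return "".join(out)


# A genuine word gap: the 3.8 px gap exceeds the 2.0 px limit.
real_gap = [("t", 0.0), ("o", 0.1), ("d", 3.8), ("a", 0.0), ("y", 0.2)]

# A false positive: loose spacing inside one word also exceeds the limit.
loose_word = [("W", 0.0), ("A", -0.5), ("V", 2.3), ("E", 0.1)]
```

With a space width of 4.0 px, insert_spaces(real_gap, 4.0) gives "to day" as intended, while insert_spaces(loose_word, 4.0) gives "WA VE", a word broken by a guessed space. Raising the threshold reduces such false positives at the cost of missing narrow but genuine word gaps.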
*SS taken from HTML converted Fire-in-My-Belly-TEST-RGB-LINKED.pdf
pdf2htmlEX guesses when to insert spaces between characters when it reduces spans with optimize text. The guess is based on the width of a space and the kerning of the characters. When the guess is a false positive, an extra space appears in the text output, sometimes breaking up words.
*SS taken from HTML converted GS-26-pdftk.pdf
*SS taken from HTML converted clifford.pdf
*SS taken from HTML converted clifford.pdf
*SS taken from HTML converted Minecraft.pdf
PDFs Referenced













