Some PDF files cause veraPDF to include characters that are invalid in XML in its output (this is with the default mrr output). It's hardly devastating, but it is a bit of a pain because it causes XMLStarlet to abort parsing the output. Here are some examples:
This has an invalid character in the title tag extracted from the PDF:
http://web.archive.org/web/20080511210957/http://www.plymouth.gov.uk/5th_december_2007.pdf
This is a similar source, but this time shows a series of invalid characters in the title tags in the Pages feature extract:
http://web.archive.org/web/20071030162909/http://www.somersetpct.nhs.uk/about_us/board_meetings/November_2006/Papers/8%20(D)%20PEC%20Terms%20of%20Reference%20and%20Chair.pdf
This has an invalid character in the description of a rule in the validation output:
http://web.archive.org/web/20060930024642/http://www.tvcs.org.uk/pdfs/06leaflet1.pdf
These seem to be the two sources of invalid characters that I've come across so far, other than this:
http://web.archive.org/web/20071031010646/http://www.somersetpct.nhs.uk/about_us/board_meetings/December_2006/Papers/11%20(H)%20Risk%20Management%20Strategy%20and%20Policy%20Appendix%206%20-%20IPEC.pdf
This one is reported by XMLStarlet as invalid, but XMLStarlet has no problems parsing it. As far as I can see there are no problems with it, but it may be worth further investigation with a decent XML tool, so I thought I'd include it anyway.
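For a second opinion on a report like this one, a strict parser can be asked where, if anywhere, it considers the document malformed. The following is only a minimal sketch using Python's standard library parser; `check_well_formed` is a hypothetical helper name, and it assumes the veraPDF report has already been read into memory as bytes.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_bytes):
    """Return None if the report parses, else (line, column, message)."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the problem.
        line, column = err.position
        return line, column, str(err)
    return None


if __name__ == "__main__":
    # A control character inside an element is rejected by the parser.
    print(check_well_formed(b"<title>bad \x01 char</title>"))
    print(check_well_formed(b"<title>fine</title>"))
```

If this returns `None` for a file that XMLStarlet flags as invalid, the difference is more likely to lie in how XMLStarlet validates than in the well-formedness of the report itself.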
The test data sets I'm passing on to Carl would be useful for double-checking that no similar bugs remain once fixes have been applied (I can send them on if helpful). Carl mentioned adding these data sets to his automated tests. Depending on the outcome of the last example (and any others we find), it might be useful to validate the output for one of these large corpora as part of the automated tests, to pick up any issues of this kind in the future.
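As a rough idea of what such an automated check could look like, here is a sketch that scans report text for characters outside the XML 1.0 `Char` production (tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF). The function names are hypothetical, and it assumes the reports are UTF-8 files on disk; it does not catch invalid characters written as numeric character references such as `&#x1;`.

```python
import re

# Anything NOT matched by the XML 1.0 Char production is invalid in a document.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return a list of (offset, code point) pairs for invalid characters."""
    return [
        (match.start(), "U+%04X" % ord(match.group()))
        for match in _INVALID_XML_CHARS.finditer(text)
    ]


def scan_report(path):
    """Scan one report file; assumes UTF-8 (an assumption, not a veraPDF guarantee)."""
    with open(path, "rb") as handle:
        # Undecodable bytes become U+FFFD, which is a valid XML character,
        # so encoding errors are not reported by this check.
        text = handle.read().decode("utf-8", errors="replace")
    return find_invalid_xml_chars(text)
```

Running `scan_report` over every report generated from a corpus and failing the test on any non-empty result would flag both the metadata and the rule-description cases above.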