Tags: py-pdf/pypdf

5.3.1

Verified

This tag was signed with the committer’s verified signature.
REL: 5.3.1

## What's new

### Bug Fixes (BUG)
- Use the correct name StandardEncoding for the predefined cmap (#3156) by @stefan6419846
- Handle inline images containing `EI ` sequences (#3152) by @stefan6419846
- Fix check box value which should be name object (#3124) by @stefan6419846
- Fix stream position on inline image fallback extraction (#3120) by @stefan6419846
- Fix object count for incremental writer (#3117) by @m32

### Robustness (ROB)
- Avoid index errors on empty lines in xref table (#3162) by @stefan6419846
- Improve handling of LZW decoder table overflow (#3159) by @stefan6419846
- Ignore non-numbers for width when building font width map (#3158) by @stefan6419846
- Avoid negative seek values when reading partially broken files (#3157) by @stefan6419846
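
The negative-seek item above concerns a failure mode that can be reproduced with the standard library alone: an absolute seek to a negative offset raises `ValueError`. The sketch below is not pypdf's code. `seek_back` is a hypothetical helper that clamps the target at the start of the stream, which is one way a reader could stay tolerant when a partially broken file yields an offset larger than the data available.

```python
import io


def seek_back(stream: io.BufferedIOBase, count: int) -> int:
    """Move `count` bytes backwards, clamping at the start of the stream.

    Hypothetical helper for illustration, not part of pypdf.
    """
    target = max(0, stream.tell() - count)
    stream.seek(target)
    return target


stream = io.BytesIO(b"%PDF-1.7")
stream.seek(3)

# An unclamped absolute seek to a negative offset fails.
try:
    stream.seek(stream.tell() - 10)
    unclamped_failed = False
except ValueError:
    unclamped_failed = True

# The clamped variant lands on offset 0 instead.
position = seek_back(stream, 10)
```

Clamping trades strictness for tolerance: the caller keeps parsing from the start of the data instead of aborting on the first bad offset.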

### Documentation (DOC)
- Fixed PageObject.images example usage for replacing image (#3149) by @jutoth

[Full Changelog](5.3.0...5.3.1)

5.3.0

REL: 5.3.0

## What's new

### New Features (ENH)
- Handle attachments in /Kids and provide object-oriented API (#3108) by @stefan6419846

### Bug Fixes (BUG)
- Handle annotations being None on merging (#3111) by @stefan6419846

### Robustness (ROB)
- Prevent excessive layout mode text output from Type3 fonts (#3082) by @shartzog

### Documentation (DOC)
- stefan6419846 becomes BDFL of pypdf (#3078) by @MartinThoma

### Developer Experience (DEV)
- Remove ignoring multiple Ruff rules by @j-t-1
- Remove unused mutmut configuration (#3092) by @stefan6419846

### Testing (TST)
- Fix warning assertions to use `pytest.warns()` (#3083) by @mgorny

[Full Changelog](5.2.0...5.3.0)

5.2.0

REL: 5.2.0

## What's new

### Deprecations (DEP)
- Deprecate CCITParameters with replacement CCITTParameters (#3019) by @j-t-1
- Correct deprecation of interiour_color (#2947) by @j-t-1

### New Features (ENH)
- Support alternative (U)F names for embedded file retrieval (#3072) by @stefan6419846
- Adding support for reading .metadata.keywords (#2939) by @Lucas-C

### Bug Fixes (BUG)
- Handle further Tf operators in text extraction layout mode (#3073) by @blushingpenguin
- Ensure `add_metadata` can deal with `_info = None` (#3040) by @xmo-odoo
- Handle IndirectObject in CCITTFaxDecode filter (#2965) by @stefan6419846
- Handle chained colorspace for inline images when no filter is set (#3008) by @stefan6419846
- Avoid extracting inline images twice and dropping other operators (#3002) by @stefan6419846
- Fixed reference of value with `str.__new__` in TextStringObject (#2952) by @thomas-forte
- Handle indirect objects in font width calculations (#2967) by @nsw42
- Title sometimes is bytes and not str (#2930) by @reformy
- Fix undefined variable for text extraction (regression) (#2934) by @stefan6419846
- Don't close stream passed to PdfWriter.write() (#2909) by @alexaryn

### Robustness (ROB)
- Handle zero height fonts when extracting text (#3075) by @blushingpenguin
- Deal with content streams not containing streams (#3005) by @stefan6419846
- Gracefully handle some text operators when the operands are missing (#3006) by @stefan6419846
- Fall back to non-Adobe Ascii85 format for missing end markers (#3007) by @stefan6419846
- Ignore odd-length strings when processing cmap lines (#3009) by @stefan6419846
- Skip annotation destination being NullObject in PdfWriter (#2964) by @stefan6419846
- Skip destination page being None in PdfWriter (#2963) by @dxsooo
- Fix infinite loop case when reading null objects within an Array by @jakep-allenai
- Fixing infinite loop in ArrayObject read_from_stream (#2928) by @jakep-allenai
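
The Ascii85 item above can be illustrated with Python's `base64` module, where `a85decode(..., adobe=True)` insists on the `~>` end marker. This is a minimal sketch of the fallback idea only, not pypdf's implementation. `decode_ascii85` is a hypothetical helper that retries without Adobe framing when the end marker is missing.

```python
import base64


def decode_ascii85(data: bytes) -> bytes:
    """Decode Ascii85 data with or without the Adobe `~>` end marker.

    Hypothetical helper for illustration, not part of pypdf.
    """
    data = data.strip()
    try:
        # Adobe framing: optional `<~` prefix, mandatory `~>` suffix.
        return base64.a85decode(data, adobe=True)
    except ValueError:
        # End marker missing: drop a stray `<~` prefix and decode the bare payload.
        if data.startswith(b"<~"):
            data = data[2:]
        return base64.a85decode(data, adobe=False)


payload = b"pypdf release notes"
framed = base64.a85encode(payload, adobe=True)  # wrapped in <~ ... ~>
truncated = framed[:-2]                         # end marker lost

decoded_framed = decode_ascii85(framed)
decoded_truncated = decode_ascii85(truncated)
```

Both calls return the original payload. Only the second one takes the fallback path.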

### Documentation (DOC)
- Add note about default line colors (#3014) by @stefan6419846

### Developer Experience (DEV)
- Remove ignoring Ruff rule PGH004 (#3071) by @j-t-1
- Tidy ignore array in tool.ruff.lint (#3069) by @j-t-1
- Move Windows CI to Python 3.13 (#3003) by @stefan6419846
- Move to Ubuntu 22.04 (#3004) by @stefan6419846

### Maintenance (MAINT)
- Fix formatting of warning message and include exception message (#3076) by @stefan6419846
- Narrow return type for `ContentStream.operations` (#2941) by @kmurphy4

### Testing (TST)
- Fix image similarity for upcoming Ubuntu 24.04 (#3039) by @stefan6419846
- Replace broken Apache Tika Corpora urls (#3041) by @stefan6419846

### Code Style (STY)
- Add form feed to WHITESPACES (#3054) by @j-t-1
- Lots of small internal changes by @j-t-1

[Full Changelog](5.1.0...5.2.0)

5.1.0

REL: 5.1.0

## What's new

### New Features (ENH)
- Add `layout_mode_font_height_weight` argument to `PageObject.extract_text()` (#2920) by @hpierre001

### Bug Fixes (BUG)
- Fix font specifier for FreeText annotation (#2893) by @ssjkamei
- Line breaks are not generated due to incorrect calculation of text leading (#2890) by @ssjkamei
- Improve handling of spaces in text extraction (#2882) by @ssjkamei

### Robustness (ROB)
- Soft failure for flate encode image mode 1 with wrong LUT size (#2900) by @stefan6419846

### Documentation (DOC)
- Use latest package versions (#2907) by @stefan6419846
- Correct example of reading FileAttachment annotation (#2906) by @j-t-1

### Developer Experience (DEV)
- Update pinned requirements (#2918) by @stefan6419846
- Make make_release.py compatible with Windows environment (#2894) by @pubpub-zz

### Maintenance (MAINT)
- Remove references to outdated Python versions (#2919) by @stefan6419846
- Generalize the method of obtaining space_code (#2891) by @ssjkamei
- Unnecessary character mapping process (#2888) by @ssjkamei
- New LZW decoding implementation (#2887) by @MartinThoma

### Testing (TST)
- Add LzwCodec for encoding (#2883) by @MartinThoma

### Code Style (STY)
- Capitalize error messages (#2903) by @j-t-1
- Modify error messages in PdfWriter (#2902) by @j-t-1

[Full Changelog](5.0.1...5.1.0)

5.0.1

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
REL: 5.0.1 (#2884)

## Version 5.0.1, 2024-09-29

### New Features (ENH)
- Add `full` parameter to PdfWriter constructor (#2865)

### Bug Fixes (BUG)
- Update pyproject.toml with minimum Python version of 3.8 (#2859)
- Cope with unbalanced delimiters in dictionary object (#2878)
- Cope with encoding with too many differences (#2873)
- Missing spaces in extract_text() method (#1328) (#2868)
- Tolerate truncated files and no warning when jumping startxref (#2855)

### Robustness (ROB)
- Repair PDF with invalid Root object (#2880)
- Continue parsing dictionary object when error is detected (#2872)
- Merge documents with invalid pages in named destinations (#2857)
- Tolerate comments in arrays (#2856)

### Developer Experience (DEV)
- Use latest Python version for benchmarking (#2879)

### Maintenance (MAINT)
- Add tests to source distributions (#2874)
- Refactor _update_field_annotation (#2862)

[Full Changelog](5.0.0...5.0.1)

5.0.0

REL: 5.0.0 (#2851)

## Version 5.0.0, 2024-09-15

This version drops support for Python 3.7 (not maintained since July 2023), PdfMerger (use PdfWriter instead) and AnnotationBuilder (use annotations instead).


### Deprecations (DEP)
- Remove the deprecated PdfMerger and AnnotationBuilder classes and other deprecations cleanup (#2813)
- Drop Python 3.7 support (#2793)

### New Features (ENH)
- Add capability to remove /Info from PDF (#2820)
- Add incremental capability to PdfWriter (#2811)
- Add UniGB-UTF16 encodings (#2819)
- Accept utf strings for metadata (#2802)
- Report PdfReadError instead of RecursionError (#2800)
- Compress PDF files merging identical objects (#2795)
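
The last item above is about deduplication. A minimal sketch of the general idea follows, assuming objects are already serialized to bytes and keyed by object number. `merge_identical` and this data layout are hypothetical and far simpler than pypdf's real object model.

```python
from __future__ import annotations

import hashlib


def merge_identical(objects: dict[int, bytes]) -> tuple[dict[int, bytes], dict[int, int]]:
    """Keep one copy per distinct payload and map every id to its survivor.

    Hypothetical helper for illustration, not part of pypdf.
    """
    first_seen: dict[bytes, int] = {}  # digest -> id of the copy that is kept
    kept: dict[int, bytes] = {}
    remap: dict[int, int] = {}
    for obj_id in sorted(objects):
        digest = hashlib.sha256(objects[obj_id]).digest()
        survivor = first_seen.setdefault(digest, obj_id)
        if survivor == obj_id:
            kept[obj_id] = objects[obj_id]
        remap[obj_id] = survivor
    return kept, remap


objects = {
    1: b"<< /Type /Font /BaseFont /Helvetica >>",
    2: b"<< /Type /Page >>",
    3: b"<< /Type /Font /BaseFont /Helvetica >>",  # duplicate of object 1
}
kept, remap = merge_identical(objects)
```

Here objects 1 and 2 survive, and `remap` redirects references to object 3 onto object 1. A real writer would also have to rewrite every indirect reference using such a map.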

### Bug Fixes (BUG)
- Fix sheared image (#2801)

### Robustness (ROB)
- Robustify .set_data() (#2821)
- Raise PdfReadError when missing /Root in trailer (#2808)
- Fix extract_text() issues on damaged PDFs (#2760)
- Handle images with empty data when processing an image from bytes (#2786)

### Developer Experience (DEV)
- Fix coverage uploads (#2832)
- Test against Python 3.13 (#2776)


[Full Changelog](4.3.1...5.0.0)

4.3.1

MartinThoma Martin Thoma
## Version 4.3.1, 2024-07-21

### Bug Fixes (BUG)
- Cope with Matrix entry in field annotations (#2736)

### Robustness (ROB)
- Cope with fields with upside down box/rectangle (#2729)

### Maintenance (MAINT)
- Add deprecate_with_replacement to StreamObject.initializeFromD… (#2728)
- Deal with cryptography>=43 moving ARC4 (#2765)

[Full Changelog](4.3.0...4.3.1)

4.3.0

Unverified

This tag is not signed, but one or more authors requires that any tag attributed to them is signed.
REL: 4.3.0

## What's new

### New Features (ENH)
- Accept ETen-B5 and UniCNS-UTF16 encodings (#2721) by @pubpub-zz
- Add decode_as_image() to ContentStreams (#2615) by @pubpub-zz
- Add context manager for PdfReader (#2666) by @tibor-reiss
- Add capability to set font and size in fields (#2636) by @pubpub-zz
- Allow passing input file without named argument (#2576) by @pubpub-zz

### Bug Fixes (BUG)
- Fix deprecation for Ressources when using old constants (#2705) by @stefan6419846
- Fix image issues with 4-bit encoding and LUT starting with UTF16_BOM (#2675) by @pubpub-zz
- Reading large compressed images takes a huge amount of time to process (#2644) by @snanda85
- Highlighted Text Cannot Be Printed (#2604) by @Nifury
- Fix UnboundLocalError on malformed pdf (#2619) by @farjasju

### Documentation (DOC)
- Various improvements on docstrings and examples by @j-t-1

### Robustness (ROB)
- Cope with missing Standard 14 fonts in fields (#2677) by @pubpub-zz
- Improve inline image extraction (#2622) by @pubpub-zz
- Cope with loops in Fields tree (#2656) by @pubpub-zz
- Discard /I in choice fields for compatibility with Acrobat (#2614) by @pubpub-zz
- Cope with some issues in pillow (#2595) by @pubpub-zz
- Cope with some image extraction issues (#2591) by @pubpub-zz

### Maintenance (MAINT)
- Deprecate interiour_color with replacement interior_color (#2706) by @j-t-1
- Add deprecate_with_replacement to PdfWriter.find_bookmark (#2674) by @j-t-1

### Code Style (STY)
- Change Link to be a non-markup annotation (#2714) by @j-t-1

[Full Changelog](4.2.0...4.3.0)

4.2.0

MartinThoma Martin Thoma
Version 4.2.0, 2024-04-07

## What's new

### New Features (ENH)
- Allow multiple charsets for NameObject.read_from_stream (#2585)
- Add support for /Kids in page labels (#2562)
- Allow updating fields on many pages (#2571)
- Tolerate PDF with invalid xref pointed objects (#2335)
- Add Enforce from PDF2.0 in viewer_preferences (#2511)
- Add += and -= operators to ArrayObject (#2510)

### Bug Fixes (BUG)
- Fix merge_page sometimes generating unknown operator 'QQ' (#2588)
- Fix fields update where annotations are kids of field (#2570)
- Process CMYK images without a filter correctly (#2557)
- Extract text in layout mode without finding resources (#2555)
- Prevent recursive loop in some PDF files (#2505)

### Robustness (ROB)
- Tolerate "truncated" xref (#2580)
- Replace error by warning for EOD in RunLengthDecode/ASCIIHexDecode (#2334)
- Rebuild xref table if one entry is invalid (#2528)
- Robustify stream extraction (#2526)
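
The EOD item above swaps a hard failure for a warning. The sketch below shows that pattern for ASCIIHexDecode using only the standard library. It is an illustration under assumptions, not pypdf's code: `ascii_hex_decode` and the warning text are hypothetical.

```python
import binascii
import warnings


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode ASCIIHexDecode data, warning when the `>` EOD marker is absent.

    Hypothetical helper for illustration, not part of pypdf.
    """
    digits = bytearray()
    for byte in data:
        if byte == ord(">"):     # EOD marker reached
            break
        if chr(byte).isspace():  # whitespace between digits is skipped
            continue
        digits.append(byte)
    else:
        # The loop ended without `break`, so no EOD marker was found.
        warnings.warn("Missing EOD in ASCIIHexDecode data", stacklevel=2)
    if len(digits) % 2:
        digits.append(ord("0"))  # a trailing odd digit is padded with 0
    return binascii.unhexlify(bytes(digits))


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    complete = ascii_hex_decode(b"48 65 6C 6C 6F>")
    truncated = ascii_hex_decode(b"48 65 6C 6C 6F")
```

Both inputs decode to the same bytes. Only the one without the `>` marker records a warning, so callers can still inspect the output.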

### Documentation (DOC)
- Update release process for latest changes (#2564)
- Encryption/decryption: Clone document instead of copying all pages (#2546)
- Minor improvements (#2542)
- Update annotation list (#2534)
- Update references and formatting (#2529)
- Correct threads reference, plus minor changes (#2521)
- Minor readability increases (#2515)
- Simplify PaperSize examples (#2504)
- Minor improvements (#2501)

### Developer Experience (DEV)
- Remove unused dependencies (#2572)
- Remove page labels PR link from message (#2561)
- Fix changelog generator regarding whitespace and handling of "Other" group (#2492)
- Add REL to known PR prefixes (#2554)
- Release using the REL commit instead of git tag (#2500)
- Unify code between PdfReader and PdfWriter (#2497)
- Bump softprops/action-gh-release from 1 to 2 (#2514)

### Maintenance (MAINT)
- Ressources → Resources (and internal name childs) (#2550)
- Fix typos found by codespell (#2549)
- Update Read the Docs configuration (#2538)
- Add root_object, _info and _ID to PdfReader (#2495)

### Testing (TST)
- Allow loading truncated images if required (#2586)
- Fix download issues from #2562 (#2578)
- Improve test_get_contents_from_nullobject to show real use-case (#2524)
- Add missing test annotations (#2507)

[Full Changelog](4.1.0...4.2.0)

4.1.0

MartinThoma Martin Thoma
Version 4.1.0, 2024-03-03

## What's new

### New Features (ENH)
- Add get_pages_from_field (#2494) by @pubpub-zz
- Add reattach_fields function (#2480) by @pubpub-zz
- Automatic access to pointed object for IndirectObject (#2464) by @pubpub-zz

### Bug Fixes (BUG)
- Missing error on name without leading / (#2387) by @Rak424
- encode_pdfdocencoding() always returns bytes (#2440) by @sbourlon
- BI in text content identified as image tag (#2459) by @pubpub-zz

### Robustness (ROB)
- Missing basefont entry in type 3 font (#2469) by @pubpub-zz

### Documentation (DOC)
- Amend robustness documentation (#2479) by @j-t-1

### Developer Experience (DEV)
- Fix changelog for UTF-8 characters (#2462) by @stefan6419846

### Maintenance (MAINT)
- Add _get_page_number_from_indirect in writer (#2493) by @pubpub-zz
- Remove user assignment for feature requests (#2483) by @stefan6419846
- Remove reference to old 2.0.0 branch (#2482) by @stefan6419846

### Testing (TST)
- Fix benchmark failures (#2481) by @stefan6419846
- Resolve file naming conflict in test_iss1767 (#2445) by @sbourlon

[Full Changelog](4.0.2...4.1.0)