Tags: py-pdf/pypdf
REL: 5.3.1

## What's new

### Bug Fixes (BUG)
- Use the correct name StandardEncoding for the predefined cmap (#3156) by @stefan6419846
- Handle inline images containing `EI ` sequences (#3152) by @stefan6419846
- Fix check box value which should be name object (#3124) by @stefan6419846
- Fix stream position on inline image fallback extraction (#3120) by @stefan6419846
- Fix object count for incremental writer (#3117) by @m32

### Robustness (ROB)
- Avoid index errors on empty lines in xref table (#3162) by @stefan6419846
- Improve handling of LZW decoder table overflow (#3159) by @stefan6419846
- Ignore non-numbers for width when building font width map (#3158) by @stefan6419846
- Avoid negative seek values when reading partially broken files (#3157) by @stefan6419846

### Documentation (DOC)
- Fixed PageObject.images example usage for replacing image (#3149) by @jutoth

[Full Changelog](5.3.0...5.3.1)

REL: 5.3.0

## What's new

### New Features (ENH)
- Handle attachments in /Kids and provide object-oriented API (#3108) by @stefan6419846

### Bug Fixes (BUG)
- Handle annotations being None on merging (#3111) by @stefan6419846

### Robustness (ROB)
- Prevent excessive layout mode text output from Type3 fonts (#3082) by @shartzog

### Documentation (DOC)
- stefan6419846 becomes BDFL of pypdf (#3078) by @MartinThoma

### Developer Experience (DEV)
- Remove ignoring multiple Ruff rules by @j-t-1
- Remove unused mutmut configuration (#3092) by @stefan6419846

### Testing (TST)
- Fix warning assertions to use `pytest.warns()` (#3083) by @mgorny

[Full Changelog](5.2.0...5.3.0)

REL: 5.2.0

## What's new

### Deprecations (DEP)
- Deprecate with replacement CCITParameters (#3019) by @j-t-1
- Correct deprecation of interiour_color (#2947) by @j-t-1

### New Features (ENH)
- Support alternative (U)F names for embedded file retrieval (#3072) by @stefan6419846
- Adding support for reading .metadata.keywords (#2939) by @Lucas-C

### Bug Fixes (BUG)
- Handle further Tf operators in text extraction layout mode (#3073) by @blushingpenguin
- Ensure `add_metadata` can deal with `_info = None` (#3040) by @xmo-odoo
- Handle IndirectObject in CCITTFaxDecode filter (#2965) by @stefan6419846
- Handle chained colorspace for inline images when no filter is set (#3008) by @stefan6419846
- Avoid extracting inline images twice and dropping other operators (#3002) by @stefan6419846
- Fixed reference of value with `str.__new__` in TextStringObject (#2952) by @thomas-forte
- Handle indirect objects in font width calculations (#2967) by @nsw42
- Title sometimes is bytes and not str (#2930) by @reformy
- Fix undefined variable for text extraction (regression) (#2934) by @stefan6419846
- Don't close stream passed to PdfWriter.write() (#2909) by @alexaryn

### Robustness (ROB)
- Handle zero height fonts when extracting text (#3075) by @blushingpenguin
- Deal with content streams not containing streams (#3005) by @stefan6419846
- Gracefully handle some text operators when the operands are missing (#3006) by @stefan6419846
- Fall back to non-Adobe Ascii85 format for missing end markers (#3007) by @stefan6419846
- Ignore odd-length strings when processing cmap lines (#3009) by @stefan6419846
- Skip annotation destination being NullObject in PdfWriter (#2964) by @stefan6419846
- Skip destination page being None in PdfWriter (#2963) by @dxsooo
- Fix infinite loop case when reading null objects within an Array by @jakep-allenai
- Fixing infinite loop in ArrayObject read_from_stream (#2928) by @jakep-allenai

### Documentation (DOC)
- Add note about default line colors (#3014) by @stefan6419846

### Developer Experience (DEV)
- Remove ignoring Ruff rule PGH004 (#3071) by @j-t-1
- Tidy ignore array in tool.ruff.lint (#3069) by @j-t-1
- Move Windows CI to Python 3.13 (#3003) by @stefan6419846
- Move to Ubuntu 22.04 (#3004) by @stefan6419846

### Maintenance (MAINT)
- Fix formatting of warning message and include exception message (#3076) by @stefan6419846
- Narrow return type for `ContentStream.operations` (#2941) by @kmurphy4

### Testing (TST)
- Fix image similarity for upcoming Ubuntu 24.04 (#3039) by @stefan6419846
- Replace broken Apache Tika Corpora urls (#3041) by @stefan6419846

### Code Style (STY)
- Add form feed to WHITESPACES (#3054) by @j-t-1
- Lots of small internal changes by @j-t-1

[Full Changelog](5.1.0...5.2.0)

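The 5.2.0 robustness entry about falling back to the non-Adobe Ascii85 format (#3007) can be illustrated with the standard library alone. The following is a minimal sketch of the general idea, not pypdf's actual implementation: the function name `decode_ascii85_lenient` is hypothetical, and it assumes the only damage is a missing `<~` or `~>` marker.

```python
import base64


def decode_ascii85_lenient(data: bytes) -> bytes:
    """Decode Ascii85 data whether or not the Adobe markers are present.

    Hypothetical helper for illustration only; pypdf's own decoder may
    behave differently.
    """
    data = data.strip()
    # Adobe-style streams are framed as <~ ... ~>. base64.a85decode with
    # adobe=True raises ValueError when the end marker is missing, so the
    # markers are removed by hand and the plain format is decoded instead.
    if data.startswith(b"<~"):
        data = data[2:]
    if data.endswith(b"~>"):
        data = data[:-2]
    return base64.a85decode(data)


framed = base64.a85encode(b"pypdf", adobe=True)  # includes <~ and ~>
truncated = framed[:-2]                          # end marker lost
print(decode_ascii85_lenient(framed) == decode_ascii85_lenient(truncated))
```

Stripping the markers up front keeps a single decoding path, so a stream that lost its end marker is handled the same way as an intact one.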
REL: 5.1.0

## What's new

### New Features (ENH)
- Add `layout_mode_font_height_weight` argument to `PageObject.extract_text()` (#2920) by @hpierre001

### Bug Fixes (BUG)
- Fix font specifier for FreeText annotation (#2893) by @ssjkamei
- Line breaks are not generated due to incorrect calculation of text leading (#2890) by @ssjkamei
- Improve handling of spaces in text extraction (#2882) by @ssjkamei

### Robustness (ROB)
- Soft failure for flate encode image mode 1 with wrong LUT size (#2900) by @stefan6419846

### Documentation (DOC)
- Use latest package versions (#2907) by @stefan6419846
- Correct example of reading FileAttachment annotation (#2906) by @j-t-1

### Developer Experience (DEV)
- Update pinned requirements (#2918) by @stefan6419846
- Make make_release.py compatible with Windows environment (#2894) by @pubpub-zz

### Maintenance (MAINT)
- Remove references to outdated Python versions (#2919) by @stefan6419846
- Generalize the method of obtaining space_code (#2891) by @ssjkamei
- Unnecessary character mapping process (#2888) by @ssjkamei
- New LZW decoding implementation (#2887) by @MartinThoma

### Testing (TST)
- Add LzwCodec for encoding (#2883) by @MartinThoma

### Code Style (STY)
- Capitalize error messages (#2903) by @j-t-1
- Modify error messages in PdfWriter (#2902) by @j-t-1

[Full Changelog](5.0.1...5.1.0)

REL: 5.0.1 (#2884)

## Version 5.0.1, 2024-09-29

### New Features (ENH)
- Add `full` parameter to PdfWriter constructor (#2865)

### Bug Fixes (BUG)
- Update pyproject.toml with minimum Python version of 3.8 (#2859)
- Cope with unbalanced delimiters in dictionary object (#2878)
- Cope with encoding with too many differences (#2873)
- Missing spaces in extract_text() method (#1328) (#2868)
- Tolerate truncated files and no warning when jumping startxref (#2855)

### Robustness (ROB)
- Repair PDF with invalid Root object (#2880)
- Continue parsing dictionary object when error is detected (#2872)
- Merge documents with invalid pages in named destinations (#2857)
- Tolerate comments in arrays (#2856)

### Developer Experience (DEV)
- Use latest Python version for benchmarking (#2879)

### Maintenance (MAINT)
- Add tests to source distributions (#2874)
- Refactor _update_field_annotation (#2862)

[Full Changelog](5.0.0...5.0.1)

REL: 5.0.0 (#2851)

## Version 5.0.0, 2024-09-15

This version drops support for Python 3.7 (not maintained since July 2023), PdfMerger (use PdfWriter instead) and AnnotationBuilder (use annotations instead).

### Deprecations (DEP)
- Remove the deprecated PdfMerger and AnnotationBuilder classes and other deprecations cleanup (#2813)
- Drop Python 3.7 support (#2793)

### New Features (ENH)
- Add capability to remove /Info from PDF (#2820)
- Add incremental capability to PdfWriter (#2811)
- Add UniGB-UTF16 encodings (#2819)
- Accept utf strings for metadata (#2802)
- Report PdfReadError instead of RecursionError (#2800)
- Compress PDF files merging identical objects (#2795)

### Bug Fixes (BUG)
- Fix sheared image (#2801)

### Robustness (ROB)
- Robustify .set_data() (#2821)
- Raise PdfReadError when missing /Root in trailer (#2808)
- Fix extract_text() issues on damaged PDFs (#2760)
- Handle images with empty data when processing an image from bytes (#2786)

### Developer Experience (DEV)
- Fix coverage uploads (#2832)
- Test against Python 3.13 (#2776)

[Full Changelog](4.3.1...5.0.0)

## Version 4.3.1, 2024-07-21

### Bug Fixes (BUG)
- Cope with Matrix entry in field annotations (#2736)

### Robustness (ROB)
- Cope with fields with upside down box/rectangle (#2729)

### Maintenance (MAINT)
- Add deprecate_with_replacement to StreamObject.initializeFromD… (#2728)
- Deal with cryptography>=43 moving ARC4 (#2765)

[Full Changelog](4.3.0...4.3.1)

REL: 4.3.0

## What's new

### New Features (ENH)
- Accept ETen-B5 and UniCNS-UTF16 encodings (#2721) by @pubpub-zz
- Add decode_as_image() to ContentStreams (#2615) by @pubpub-zz
- context manager for PdfReader (#2666) by @tibor-reiss
- Add capability to set font and size in fields (#2636) by @pubpub-zz
- Allow to pass input file without named argument (#2576) by @pubpub-zz

### Bug Fixes (BUG)
- Fix deprecation for Ressources when using old constants (#2705) by @stefan6419846
- Fix images issue 4 bits encoding and LUT starting with UTF16_BOM (#2675) by @pubpub-zz
- Reading large compressed images takes huge time to process (#2644) by @snanda85
- Highlighted Text Cannot Be Printed (#2604) by @Nifury
- Fix UnboundLocalError on malformed pdf (#2619) by @farjasju

### Documentation (DOC)
- Various improvements on docstrings and examples by @j-t-1

### Robustness (ROB)
- Cope with missing Standard 14 fonts in fields (#2677) by @pubpub-zz
- Improve inline image extraction (#2622) by @pubpub-zz
- Cope with loops in Fields tree (#2656) by @pubpub-zz
- Discard /I in choice fields for compatibility with Acrobat (#2614) by @pubpub-zz
- Cope with some issues in pillow (#2595) by @pubpub-zz
- Cope with some image extraction issues (#2591) by @pubpub-zz

### Maintenance (MAINT)
- Deprecate interiour_color with replacement interior_color (#2706) by @j-t-1
- Add deprecate_with_replacement to PdfWriter.find_bookmark (#2674) by @j-t-1

### Code Style (STY)
- Change Link to be a non-markup annotation (#2714) by @j-t-1

[Full Changelog](4.2.0...4.3.0)

Version 4.2.0, 2024-04-07

## What's new

### New Features (ENH)
- Allow multiple charsets for NameObject.read_from_stream (#2585)
- Add support for /Kids in page labels (#2562)
- Allow to update fields on many pages (#2571)
- Tolerate PDF with invalid xref pointed objects (#2335)
- Add Enforce from PDF2.0 in viewer_preferences (#2511)
- Add += and -= operators to ArrayObject (#2510)

### Bug Fixes (BUG)
- Fix merge_page sometimes generating unknown operator 'QQ' (#2588)
- Fix fields update where annotations are kids of field (#2570)
- Process CMYK images without a filter correctly (#2557)
- Extract text in layout mode without finding resources (#2555)
- Prevent recursive loop in some PDF files (#2505)

### Robustness (ROB)
- Tolerate "truncated" xref (#2580)
- Replace error by warning for EOD in RunLengthDecode/ASCIIHexDecode (#2334)
- Rebuild xref table if one entry is invalid (#2528)
- Robustify stream extraction (#2526)

### Documentation (DOC)
- Update release process for latest changes (#2564)
- Encryption/decryption: Clone document instead of copying all pages (#2546)
- Minor improvements (#2542)
- Update annotation list (#2534)
- Update references and formatting (#2529)
- Correct threads reference, plus minor changes (#2521)
- Minor readability increases (#2515)
- Simplify PaperSize examples (#2504)
- Minor improvements (#2501)

### Developer Experience (DEV)
- Remove unused dependencies (#2572)
- Remove page labels PR link from message (#2561)
- Fix changelog generator regarding whitespace and handling of "Other" group (#2492)
- Add REL to known PR prefixes (#2554)
- Release using the REL commit instead of git tag (#2500)
- Unify code between PdfReader and PdfWriter (#2497)
- Bump softprops/action-gh-release from 1 to 2 (#2514)

### Maintenance (MAINT)
- Ressources → Resources (and internal name childs) (#2550)
- Fix typos found by codespell (#2549)
- Update Read the Docs configuration (#2538)
- Add root_object, _info and _ID to PdfReader (#2495)

### Testing (TST)
- Allow loading truncated images if required (#2586)
- Fix download issues from #2562 (#2578)
- Improve test_get_contents_from_nullobject to show real use-case (#2524)
- Add missing test annotations (#2507)

[Full Changelog](4.1.0...4.2.0)

Version 4.1.0, 2024-03-03

## What's new

### New Features (ENH)
- Add get_pages_from_field (#2494) by @pubpub-zz
- Add reattach_fields function (#2480) by @pubpub-zz
- Automatic access to pointed object for IndirectObject (#2464) by @pubpub-zz

### Bug Fixes (BUG)
- missing error on name without leading / (#2387) by @Rak424
- encode_pdfdocencoding() always returns bytes (#2440) by @sbourlon
- BI in text content identified as image tag (#2459) by @pubpub-zz

### Robustness (ROB)
- Missing basefont entry in type 3 font (#2469) by @pubpub-zz

### Documentation (DOC)
- Amend robustness documentation (#2479) by @j-t-1

### Developer Experience (DEV)
- Fix changelog for UTF-8 characters (#2462) by @stefan6419846

### Maintenance (MAINT)
- Add _get_page_number_from_indirect in writer (#2493) by @pubpub-zz
- Remove user assignment for feature requests (#2483) by @stefan6419846
- Remove reference to old 2.0.0 branch (#2482) by @stefan6419846

### Testing (TST)
- Fix benchmark failures (#2481) by @stefan6419846
- Resolve file naming conflict in test_iss1767 (#2445) by @sbourlon

[Full Changelog](4.0.2...4.1.0)