V4 Use EXIF byte order for EXIF encoded strings. #2943

Merged
JimBobSquarePants merged 5 commits into main from js/issue-2906 on Jun 12, 2025

Conversation

JimBobSquarePants
Member

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

Fixes #2906

When decoding Unicode-encoded text from EXIF, we should follow the specification's guidance and determine the correct decoding approach based on the following priority:

  1. Check for a Byte Order Mark (BOM):
    If a BOM is present at the start of the encoded string, its endianness should override any other assumptions. This is explicitly allowed by the EXIF 2.2 and 2.3 specifications, even though later versions (e.g., CIPA DC-X008-2019) no longer mention it directly.

  2. Fallback to TIFF Byte Order:
    If no BOM is present, the byte order of the Unicode text must match the byte order declared in the TIFF header. This was the originally intended behavior according to EXIF 2.2/2.3, and is still necessary for correctly decoding existing conformant files.
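
The priority order above can be sketched in a few lines. This is a minimal Python illustration, not ImageSharp's C# implementation; the function name `decode_exif_utf16` and the `tiff_big_endian` flag are hypothetical, and the input is assumed to be the bytes that follow the 8-byte character code prefix.

```python
import codecs


def decode_exif_utf16(payload: bytes, tiff_big_endian: bool) -> str:
    """Decode UTF-16 text taken from an EXIF encoded-string tag.

    `payload` is assumed to be the bytes after the 8-byte character code
    prefix; `tiff_big_endian` reflects the TIFF header ("MM" vs "II").
    """
    # 1. A BOM, when present, overrides the TIFF byte order.
    #    It is consumed here and is not part of the decoded text.
    if payload.startswith(codecs.BOM_UTF16_BE):
        return payload[2:].decode("utf-16-be")
    if payload.startswith(codecs.BOM_UTF16_LE):
        return payload[2:].decode("utf-16-le")

    # 2. No BOM: fall back to the byte order declared in the TIFF header.
    return payload.decode("utf-16-be" if tiff_big_endian else "utf-16-le")
```

For example, a payload that starts with a big-endian BOM decodes as big-endian text even when the surrounding TIFF structure is little-endian.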

This change ensures that we're correctly interpreting the UserComment, GPSProcessingMethod, and GPSAreaInformation tags, all of which use the 8-byte encoding prefix scheme and may contain UTF-16 encoded text.
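
For readers unfamiliar with the 8-byte prefix scheme, the following Python sketch splits a raw tag value into its character code and payload. The prefix byte sequences are the ones defined by the EXIF specification; the helper name `split_character_code` and the returned labels are hypothetical and do not mirror ImageSharp's API.

```python
# Character code prefixes defined by the EXIF specification (8 bytes each).
EXIF_CHARACTER_CODES = {
    b"ASCII\x00\x00\x00": "ascii",
    b"JIS\x00\x00\x00\x00\x00": "jis",
    b"UNICODE\x00": "unicode",
    b"\x00" * 8: "undefined",
}


def split_character_code(value: bytes) -> tuple[str, bytes]:
    """Split a raw tag value into (character code label, payload bytes)."""
    if len(value) < 8:
        raise ValueError("value is shorter than the 8-byte character code")
    prefix, payload = value[:8], value[8:]
    code = EXIF_CHARACTER_CODES.get(prefix)
    if code is None:
        raise ValueError(f"unknown character code prefix: {prefix!r}")
    return code, payload
```

Only the payload of a value labeled "unicode" is subject to the byte order question this pull request addresses.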

On encoding, we now honor the TIFF byte order for Unicode strings and optionally support emitting a BOM when desired.
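
The encoding side can be illustrated the same way. This is a hedged Python sketch under the assumption that the BOM, when requested, matches the TIFF byte order; `encode_exif_unicode` and its `emit_bom` parameter are hypothetical names, not ImageSharp's C# API.

```python
import codecs

UNICODE_PREFIX = b"UNICODE\x00"  # 8-byte character code from the EXIF spec


def encode_exif_unicode(text: str, tiff_big_endian: bool,
                        emit_bom: bool = False) -> bytes:
    """Encode text as an EXIF Unicode value in the TIFF byte order."""
    codec = "utf-16-be" if tiff_big_endian else "utf-16-le"
    bom = b""
    if emit_bom:
        # Assumption: the optional BOM uses the same byte order as the text.
        bom = codecs.BOM_UTF16_BE if tiff_big_endian else codecs.BOM_UTF16_LE
    return UNICODE_PREFIX + bom + text.encode(codec)
```

With a big-endian TIFF header the letter "A" is written as the bytes `00 41`; with a little-endian header it is written as `41 00`.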

This brings ImageSharp into alignment with the EXIF specification and improves compatibility with real-world image metadata.

In addition to this fix, I've reimplemented the equality operators for ExifTag so that they behave correctly, and cleaned up the code to match the repository's standards.

Copilot AI left a comment

Pull Request Overview

This PR updates EXIF metadata handling to correctly use the TIFF byte order for decoding and encoding EXIF encoded strings while cleaning up and modernizing the code.

  • Implements BOM detection and byte order handling for Unicode encoded strings.
  • Applies new C# pattern matching and array initializer syntax throughout the EXIF value implementations.
  • Improves XML documentation and equality operator implementations for ExifTag.

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| IExifValue.cs | Updated XML comments and added explicit public member modifiers. |
| ExifSignedShortArray.cs, ExifSignedShort.cs, ExifSignedByte.cs, ExifShortArray.cs, ExifShort.cs, ExifRationalArray.cs, ExifNumberArray.cs | Refactored value range checks and array initializations to use new C# syntax patterns. |
| ExifEncodedString.cs | Changed TrySetValue to accept a ByteOrder parameter with a parameterless wrapper defaulting to LittleEndian. |
| ExifByteArray.cs, ExifByte.cs, ExifArrayValue{TValueType}.cs | Updated array initializer syntax for brevity. |
| ExifTag.cs | Revised operator overloads and documentation capitalization. |
| ExifWriter.cs, ExifReader.cs, ExifProfile.cs, ExifEncodedStringHelpers.cs, ExifDataType.cs, Ifd/EntryReader.cs | Modernized collection initializations and applied minor name/style updates. |

JimBobSquarePants merged commit 0c13a36 into main on Jun 12, 2025
8 checks passed
JimBobSquarePants deleted the js/issue-2906 branch on June 12, 2025 at 13:06

Linked issue (#2906): EXIF UserComment tag writes Unicode text in UTF-16LE instead of UTF-16BE as specified in the standard