Escape XML special chars in _build_gdal_metadata_xml (#1614)#1617
Merged
Conversation
`_build_gdal_metadata_xml` embedded caller-supplied dict keys and
values into the GDALMetadata XML payload using plain f-strings, with
no XML escaping. Any string containing `& < > " '` corrupted the
document and round-tripped through `_parse_gdal_metadata` as `{}`;
a crafted key like `foo" malicious="bar` injected extra attributes
into the `<Item>` element.
Route every text slot through `xml.sax.saxutils.escape` and every
attribute slot through `quoteattr`, mirroring the helpers added in
`_vrt.py` by #1607. Sample indices stay int-cast literals.
Reachable from `to_geotiff`, `_write_vrt_tiled`, and
`write_geotiff_gpu` whenever the user supplies
`attrs['gdal_metadata']` as a dict.
Found by the geotiff security sweep (Cat 5, MEDIUM). Same bug class
as #1607 on a separate code path. Existing
`test_features.TestGDALMetadata::test_build_gdal_metadata_xml`
still passes; new tests pin escape behaviour for every XML special
character and guard against attribute / element injection.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens GeoTIFF GDALMetadata XML serialization to prevent malformed XML, silent metadata loss on round-trip, and XML attribute/element injection when caller-supplied metadata contains XML special characters.
Changes:
- Escape GDALMetadata XML element text and quote/escape XML attribute values in
_build_gdal_metadata_xml(xrspatial/geotiff/_geotags.py). - Add regression tests covering all XML special characters, injection attempts, unicode handling, and an end-to-end
to_geotiffround trip (xrspatial/geotiff/tests/test_gdal_metadata_xml_escape_1614.py). - Update the security sweep tracking state to record the fix for issue #1614 (
.claude/sweep-security-state.csv).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
xrspatial/geotiff/_geotags.py |
Escapes/quotes all caller-controlled GDALMetadata XML fields to prevent malformed XML and injection. |
xrspatial/geotiff/tests/test_gdal_metadata_xml_escape_1614.py |
Adds focused regression + round-trip tests for XML escaping and injection resistance. |
.claude/sweep-security-state.csv |
Records the #1614 geotiff sweep finding as addressed. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
_build_gdal_metadata_xml(xrspatial/geotiff/_geotags.py). The helper interpolated caller-supplied keys and values via plain f-strings, so& < > " 'in the metadata dict produced malformed XML that_parse_gdal_metadataswallowed as{}on read, and a crafted key likefoo" malicious="barinjected attributes into the<Item>element.xml.sax.saxutils.escapeand every attribute slot throughquoteattr, mirroring the helpers added in_vrt.pyby write_vrt: XML special characters in crs_wkt and source filenames are emitted unescaped #1607.xrspatial/geotiff/tests/test_gdal_metadata_xml_escape_1614.pycovering every XML special character, attribute injection, element injection, unicode pass-through, and an end-to-endto_geotiffround trip.Found by the geotiff security sweep (Cat 5, MEDIUM). Same bug class as #1607, on a separate code path the earlier fix did not cover. One fix per PR per the security-sweep policy.
Closes #1614.
Test plan
pytest xrspatial/geotiff/tests/test_gdal_metadata_xml_escape_1614.py(15 new tests pass)pytest xrspatial/geotiff/tests/test_security.py xrspatial/geotiff/tests/test_geotags.py xrspatial/geotiff/tests/test_metadata_round_trip_1484.py xrspatial/geotiff/tests/test_vrt_xml_escape_1607.py xrspatial/geotiff/tests/test_features.py::TestGDALMetadata(90 passed, 2 skipped, no regressions)test_features.TestGDALMetadata::test_build_gdal_metadata_xmlstill passes (the legacy<Item name="DataType">Generic</Item>shape is preserved becausequoteattrpicks double quotes for plain strings).