Make image metadata available to the processors. #625

sbaechler · 2015-12-04T11:25:19Z

Image metadata (not just Exif, but also IPTC and XMP data) can contain information that Thumbor could use.

As far as I can see, PIL can only read Exif data. This would mean a CLI tool such as exiftool would have to be used. Exif and IPTC are key-value based. XMP has a tree structure.

Possible use cases would be:

The focus point of an image could be set by the photographer and stored in the Exif data, so it would not have to be defined by whoever uploads the image into the CMS.
Some images should not be cropped because of copyright issues. If a metadata entry existed that prevents an image from being cropped, then it would just have padding added instead.

You could assign the issue to me. I'll try to implement a solution within the next few weeks.

masom · 2015-12-04T13:48:23Z

There was a discussion in #577 about creating a Media object to hold the image buffer + metadata.

Adding IPTC and XMP metadata would make a lot of sense.

heynemann · 2015-12-04T15:54:46Z

+1

sbaechler · 2016-01-07T17:10:25Z

This turned out to be more difficult than I thought.

There is a Python XMP Toolkit that can read and write XMP metadata to and from an image file. Unfortunately, it only works with image files, not image buffers. The limitation seems to come from the underlying C library, Exempi.

For now, the only way to extract XMP metadata from an image buffer is to do it by hand. I have created a prototype implementation for JPEG images and the PIL engine here: https://github.com/sbaechler/thumbor/commit/3cb53677870029ca09045f0ebfb03693a08899ee

The XMP data can then be further processed by the XMP toolkit.

My ultimate goal is to be able to write XMP data as well. If we used the Media object and the FileStorage, it would be possible to pass the file path along to a filter that would modify the metadata.

gi11es · 2016-01-13T17:19:14Z

If that framework only deals with files, why not dump the buffer into a temp file?
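
A minimal sketch of that idea, using only the standard library. The helper name `read_via_temp_file` and its `reader` callback are hypothetical; `reader` stands in for whatever file-based metadata call would be used and is not an API of any particular toolkit.

```python
import os
import tempfile


def read_via_temp_file(buffer, reader, suffix=".jpg"):
    """Write buffer to a temporary file, call reader(path), then clean up."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(buffer)
        # The file is closed here, so a file-based library can open it.
        return reader(path)
    finally:
        os.remove(path)


def _echo(path):
    """Stand-in reader that returns the path and the file's contents."""
    with open(path, "rb") as handle:
        return path, handle.read()


used_path, contents = read_via_temp_file(b"fake image bytes", _echo)
```

The cost is one extra write and read per image, plus the need for a writable temp directory.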

sbaechler · 2016-01-14T12:27:29Z

@gi11es This would add a dependency on the file system. I found another Python XMP framework, Pyexiv2, which is based on a different C library. I'll give that one a try first.

masom · 2016-01-14T14:28:28Z

@sbaechler the current filters and optimizers receive a tmp file buffer, already on the filesystem.

sbaechler · 2016-01-22T16:57:23Z

@masom Maybe that temp file won't be available once the Media object is used.
I tried another version that uses Pyexiv2 to read the image metadata. This library works well with image buffers. It is a bit tricky to install, because it needs some C libraries that have to be installed first. It's also deprecated in favor of another library called GExiv2 (a GNOME library). However, I did not manage to get that library working on my system. Since both are based on the same underlying library (Exiv2), it should be easy to swap them once the maintainers fix the installation issues. Maybe it's even possible to support both libraries if they have the same interface.

The best thing about Pyexiv2 is that it supports not only XMP, but also IPTC and Exif. It can also convert all data into Python objects.

https://github.com/sbaechler/thumbor/commit/060f0e6ed70e3fbbe67b2339f63b0a2e4f47bb44

I added the code to the Engine class. Maybe it is better to keep it in a separate module. That way any filter that needed to access the metadata would have to instantiate the Metadata class, but the code itself would be independent from the engine (or the Media object).

masom · 2016-01-22T16:58:22Z

A lot of tools don't work well with STDIN / STDOUT.

ffmpeg for instance will stutter and produce weird output if the input file is STDIN.

gi11es · 2016-01-22T17:00:02Z

That's because by default ffmpeg is interactive while it runs. You can press a key to abort its processing. There's an ffmpeg option to turn that off, though.

gi11es · 2016-01-22T17:02:47Z

Actually maybe not an option, but I remember that there's a way to work around that problem.

sbaechler · 2016-01-24T21:52:52Z

@masom Pyexiv2 doesn't use STDIN/STDOUT. It's a Python library. The biggest issue is that it requires the boost and exiv2 C++ libraries, which have to be built with Python bindings. But still, this is the easiest and most comfortable way of getting the image metadata that I have found so far.

heynemann · 2016-01-24T21:58:53Z

I don't think hard dependencies are an issue as long as this is a plugin and not built-in.

sbaechler · 2016-01-24T22:11:46Z

The metadata can only be extracted from the raw buffer, not from engine.image. PIL strips all metadata when creating an Image object. Therefore, if the metadata should be available to the app, it has to be extracted in the Engine.load() or Engine.create_image() method, both of which are in core.

The metadata extraction is only done if the library is installed.
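
A rough sketch of that optional-dependency pattern, not the merged code. It assumes the legacy pyexiv2 interface, in which `ImageMetadata.from_buffer()` builds a container from raw bytes and `read()` parses it. The function name `read_metadata` and its `backend` parameter are hypothetical and exist only so the logic can be exercised without the library installed.

```python
try:
    import pyexiv2  # optional: needs the exiv2 and boost C++ libraries
except ImportError:
    pyexiv2 = None


def read_metadata(buffer, backend=pyexiv2):
    """Return parsed metadata for a raw image buffer, or None if unavailable.

    Must be called with the original bytes, before PIL builds an Image,
    because the Image object no longer carries the metadata.
    """
    if backend is None:
        return None  # library not installed: skip extraction silently
    metadata = backend.ImageMetadata.from_buffer(buffer)
    metadata.read()
    return metadata


# Stand-in backend that mimics only the two calls used above.
class _FakeMetadata(object):
    def __init__(self, buffer):
        self.buffer = buffer
        self.was_read = False

    def read(self):
        self.was_read = True


class _FakeImageMetadata(object):
    @staticmethod
    def from_buffer(buffer):
        return _FakeMetadata(buffer)


class _FakeBackend(object):
    ImageMetadata = _FakeImageMetadata


without_library = read_metadata(b"raw bytes", backend=None)
with_library = read_metadata(b"raw bytes", backend=_FakeBackend)
```

Filters would then need to handle the `None` case, since the metadata is only present when the library is available.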

Another option is to extract the metadata from the temp file in the filters. The downside of this is that it creates additional I/O.

sbaechler · 2016-02-17T17:01:21Z

Fixed in #661

jimas14 · 2021-02-23T20:18:49Z

@sbaechler Does this fix only provide metadata to the engine? Is there still more work to do to have IPTC data persist on output images?

sbaechler · 2021-03-01T21:25:33Z

@jimas14 Back then there was an effort (#577) to create a rich Media object that gets passed through the pipeline, instead of using a global context and just passing the buffer around. I don't know what the current state of that effort is, but it would simplify adding metadata back to the transformed image.

sbaechler mentioned this issue Feb 11, 2016

Add Metadata extraction and make this and the target size available for filters. #661

Merged

sbaechler closed this as completed Feb 17, 2016
