Fix expensive decompression in JPEG GetImageInfo #44066

mkuchnik · 2020-10-15T22:18:51Z

The current implementation of GetImageInfo starts a libjpeg decompression pass over input images. Certain images (e.g., progressive JPEG) trigger a full image decompression, resulting in a performance degradation. This commit switches to an equivalent but cheaper libjpeg call for evaluating image dimensions.

As a result, ExtractJpegShape is currently more than 50x slower on these images.

To reproduce, download input image as 'test_img.jpg':

Convert it to progressive jpeg:
jpegtran -progressive test_img.jpg > test_img_progressive.jpg

Then benchmark with test_img_progressive.jpg, once with use_shape=False and once with use_shape=True. On my machine, the first finishes in 0.12 seconds and the second in 8.85 seconds (a roughly 70x slowdown).

import tensorflow as tf
import numpy as np

import timeit

# Inputs
#filenames = ["test_img.jpg"]
filenames = ["test_img_progressive.jpg"]
#use_shape = False
use_shape = True
# End Inputs

def read_path(file_path):
    img = tf.io.read_file(file_path)
    return img

def jpeg_to_shape(image_buffer):
    shape = tf.image.extract_jpeg_shape(image_buffer)
    return shape

def fake_jpeg_to_shape(image_buffer):
    """Pretend we read the shape using extract_jpeg_shape."""
    shape = np.array([1920, 1080, 3], dtype=np.int32)
    return shape

def create_dataset():
    dataset = tf.data.Dataset.from_tensor_slices(filenames)
    dataset = dataset.map(read_path)
    dataset = dataset.cache()
    if use_shape:
        dataset = dataset.map(jpeg_to_shape)
    else:
        dataset = dataset.map(fake_jpeg_to_shape)
    return dataset

def noop():
    return None

def run_loop(dataset):
    for x in dataset:
        noop()

def main():
    dataset = create_dataset()
    dataset = dataset.repeat(1000) # run 1e3 times
    timeit_results = timeit.timeit(lambda: run_loop(dataset),
                                   number=1
                                   )
    print("Elapsed time:\n{}".format(timeit_results))

if __name__ == "__main__":
    main()

Profiling shows that the data-decoding part of the decompression pass is being performed on the JPEG. In other words, the JPEG is decompressed at nearly the full cost of decompression, rather than simply having its shape extracted (a metadata operation).
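To illustrate why shape extraction can be a pure metadata operation, here is a minimal pure-Python sketch that reads the dimensions from the JPEG start-of-frame (SOF) segment and stops before any entropy-coded scan data. It is not the libjpeg code path and not what this PR changes; the helper name `jpeg_header_info` is hypothetical, and the sketch assumes a well-formed file whose SOF segment precedes the first scan.

```python
import struct

# SOF markers carry the frame dimensions. 0xC4 (DHT), 0xC8 (JPG) and
# 0xCC (DAC) fall in the same range but are not SOF markers.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}
# Progressive variants: SOF2, SOF6, SOF10, SOF14.
PROGRESSIVE_SOF = {0xC2, 0xC6, 0xCA, 0xCE}


def jpeg_header_info(data):
    """Return (height, width, channels, is_progressive) from header segments.

    Hypothetical helper for illustration only: it walks the marker
    segments and never touches the compressed image data.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset {}".format(pos))
        marker = data[pos + 1]
        if marker == 0xFF:
            # Fill byte before a marker; skip it.
            pos += 1
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            if pos + 10 > len(data):
                raise ValueError("truncated SOF segment")
            # SOF payload: precision (1), height (2), width (2), components (1).
            _, height, width, channels = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            return height, width, channels, marker in PROGRESSIVE_SOF
        if marker == 0xDA:
            # Start of scan: compressed data begins, so stop looking.
            break
        # The segment length counts its own two bytes but not the marker.
        pos += 2 + length
    raise ValueError("no SOF marker found before scan data")
```

Under these assumptions, passing the bytes of test_img_progressive.jpg (read in binary mode) should return the shape together with a flag showing whether the jpegtran conversion produced a progressive file, without decoding any image data.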

This PR replaces the jpeg_start_decompress call with a call to jpeg_calc_output_dimensions, which fills out the required cinfo fields without decompressing the data.

google-ml-butler bot added the size:XS CL Change Size: Extra Small label Oct 15, 2020

google-cla bot added the cla: yes label Oct 15, 2020

mkuchnik mentioned this pull request Oct 15, 2020

Expensive decompression in JPEG GetImageInfo #44067

Closed

Fix typo

d846e71

gbaned self-assigned this Oct 16, 2020

gbaned requested a review from mihaimaruseac October 16, 2020 04:08

mihaimaruseac approved these changes Oct 16, 2020

google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Oct 16, 2020

mihaimaruseac requested a review from hyeygit October 16, 2020 18:52

kokoro-team removed the kokoro:force-run Tests on submitted change label Oct 16, 2020

copybara-service bot merged commit 1ce9513 into tensorflow:master Oct 17, 2020
