
Error when train on customized dataset: Invalid JPEG data or crop window, data size 36864 #455

Closed
panfeng-hover opened this issue Jul 20, 2019 · 11 comments



panfeng-hover commented Jul 20, 2019

It seems to be an "Invalid JPEG data or crop window" error, but I double-checked that the images in my TFRecords are JPEGs. I am wondering what possible reasons could cause this error?

The code I use to check the image format in the TFRecords:

from io import BytesIO

import tensorflow as tf
from PIL import Image
from tqdm import tqdm

for tfrecord in tqdm(tfrecord_files):
    for example in tqdm(tf.python_io.tf_record_iterator(tfrecord)):  # TF 1.x API
        data = tf.train.Example.FromString(example)
        encoded_jpg = data.features.feature['image/encoded'].bytes_list.value[0]
        img = Image.open(BytesIO(encoded_jpg))
        assert img.format == 'JPEG'
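Note that `Image.open` only parses enough of the header to identify the format, so a truncated file can still pass the assertion above. A stricter byte-level pre-check (a stdlib-only sketch; `looks_like_jpeg` is a hypothetical helper that only verifies the SOI/EOI framing markers, not the compressed stream, and some encoders do append trailing bytes after EOI) could look like:

```python
def looks_like_jpeg(data: bytes) -> bool:
    """Cheap structural check: JPEG bytes must start with the SOI
    marker (FF D8) and end with the EOI marker (FF D9)."""
    return (
        len(data) >= 4
        and data[:2] == b"\xff\xd8"
        and data[-2:] == b"\xff\xd9"
    )

# A truncated file keeps its SOI marker but loses the EOI marker:
intact = b"\xff\xd8" + b"\x00" * 100 + b"\xff\xd9"
truncated = intact[:50]
print(looks_like_jpeg(intact), looks_like_jpeg(truncated))  # True False
```

A stronger (but slower) check is to force a full decode of each image, e.g. `Image.open(BytesIO(encoded_jpg)).load()`, at the cost of decompressing every file.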

The log when I met the error:

E0719 23:46:18.549607 139925925385984 error_handling.py:70] Error recorded frominfeed: From /job:worker/replica:0/task:0:
Invalid JPEG data or crop window, data size 36864
         [[{{node parser/case/cond/else/_20/cond_jpeg/then/_0/DecodeJpeg}}]]
         [[input_pipeline_task0/while/IteratorGetNext_1]]
E0719 23:46:18.572818 139925916993280 error_handling.py:70] Error recorded fromoutfeed: From /job:worker/replica:0/task:0:
Bad hardware status: 0x1
         [[node OutfeedDequeueTuple_4 (defined at /home/panfeng/projects/tpu/models/official/mask_rcnn/distributed_executer.py:115) ]]

Original stack trace for u'OutfeedDequeueTuple_4':
  File "tpu/models/official/mask_rcnn/mask_rcnn_main.py", line 156, in <module>
    tf.app.run(main)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "tpu/models/official/mask_rcnn/mask_rcnn_main.py", line 151, in main
    run_executer(params, train_input_fn, eval_input_fn)
  File "tpu/models/official/mask_rcnn/mask_rcnn_main.py", line 99, in run_executer
    executer.train(train_input_fn, FLAGS.eval_after_training, eval_input_fn)
  File "/home/panfeng/projects/tpu/models/official/mask_rcnn/distributed_executer.py", line 115, in train
    input_fn=train_input_fn, max_steps=self._model_params.total_steps)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2721, in train
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 362, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1184, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2560, in _call_model_fn
    config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1142, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2870, in _model_fn
    host_ops = host_call.create_tpu_hostcall()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1943, in create_tpu_hostcall
    device_ordinal=ordinal_id)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_tpu_ops.py", line 3190, in outfeed_dequeue_tuple
    device_ordinal=device_ordinal, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()
E0719 23:46:19.930372 139927321310656 error_handling.py:70] Error recorded fromtraining_loop: From /job:worker/replica:0/task:0:
9 root error(s) found.
  (0) Cancelled: Node was closed
  (1) Cancelled: Node was closed
  (2) Cancelled: Node was closed
  (3) Cancelled: Node was closed
  (4) Cancelled: Node was closed
  (5) Cancelled: Node was closed
  (6) Cancelled: Node was closed
  (7) Cancelled: Node was closed
  (8) Invalid argument: Gradient for resnet50/batch_normalization_32/beta:0 is NaN : Tensor had NaN values
         [[node CheckNumerics_98 (defined at /home/panfeng/projects/tpu/models/official/mask_rcnn/distributed_executer.py:115) ]]
0 successful operations.
0 derived errors ignored.

@saberkun (Member)

Is there any data corruption? It turns out to be very common, e.g. tensorflow/tensorflow#7434.

In this case the error happens in the input pipeline. It is necessary to debug on CPU and validate that the data can be accessed correctly. I would recommend writing a simple program to test the data pipeline. Here is an example that reads data in eager mode: https://github.com/tensorflow/tpu/blob/master/models/official/mnasnet/post_quantization.py#L49
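The TFRecord framing itself can also be validated on CPU without TensorFlow, since the wire format is documented: each record is a little-endian uint64 payload length, a 4-byte masked CRC of the length, the payload, and a 4-byte masked CRC of the payload. Below is a stdlib-only sketch (`iter_tfrecord` is a hypothetical helper; it skips CRC verification and does not parse the `tf.train.Example` protobuf inside each record):

```python
import struct

def iter_tfrecord(path):
    """Yield the raw payload of each record in a TFRecord file.

    Wire format per record: uint64 length (little-endian),
    uint32 masked CRC of the length, `length` payload bytes,
    uint32 masked CRC of the payload. CRCs are skipped here.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated length header")
            (length,) = struct.unpack("<Q", header)
            f.read(4)  # skip the CRC of the length
            payload = f.read(length)
            if len(payload) < length:
                raise ValueError("truncated record payload")
            f.read(4)  # skip the CRC of the payload
            yield payload
```

Running something like `sum(1 for _ in iter_tfrecord(path))` over each shard will at least confirm that the record framing survived the file transfer.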

@panfeng-hover (Author)

Thanks for your reply. Yes, it was due to a file-transfer issue: I generated the TFRecords on another remote machine. I later hit corrupted TFRecord files with an error similar to corrupted record at 12, which I fixed by increasing the number of shards.


mkr2667 commented Dec 12, 2020

InvalidArgumentError: Invalid JPEG data or crop window, data size 114304 [[{{node DecodeJpeg}}]]

I am getting this error when I run the code below:

import numpy as np
import tensorflow as tf
from tqdm import tqdm

def load_image(image_path):
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (299, 299))
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    return img, image_path

image_model = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')
new_input = image_model.input
hidden_layer = image_model.layers[-1].output
image_features_extract_model = tf.keras.Model(new_input, hidden_layer)

# img_name_vector: list of image file paths, defined earlier
encode_train = sorted(set(img_name_vector))
image_dataset = tf.data.Dataset.from_tensor_slices(encode_train)
image_dataset = image_dataset.map(load_image, num_parallel_calls=1).batch(64)
for img, path in tqdm(image_dataset):
    print("\nimage path {} : {}".format(img, path))
    batch_features = image_features_extract_model(img)
    batch_features = tf.reshape(batch_features, (batch_features.shape[0], -1, batch_features.shape[3]))
    for bf, p in zip(batch_features, path):
        path_of_feature = p.numpy().decode("utf-8")
        # print("{}:{}".format(path_of_feature, bf.numpy()))
        np.save(path_of_feature, bf.numpy())

The log when I met the error:
InvalidArgumentError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py in execution_mode(mode)
2101 ctx.executor = executor_new
-> 2102 yield
2103 finally:

11 frames
InvalidArgumentError: Invalid JPEG data or crop window, data size 114304
[[{{node DecodeJpeg}}]] [Op:IteratorGetNext]

During handling of the above exception, another exception occurred:

InvalidArgumentError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/executor.py in wait(self)
65 def wait(self):
66 """Waits for ops dispatched in this executor to finish."""
---> 67 pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
68
69 def clear_error(self):

InvalidArgumentError: Invalid JPEG data or crop window, data size 114304
[[{{node DecodeJpeg}}]]

Could you please help me resolve this? The same question was asked on Stack Overflow, but there is no clear answer on the internet. Please answer ASAP.


milad-4274 commented Dec 19, 2020

I faced a similar problem: there is a problem in some of your training data. You can use the code below to check which JPEG images are corrupted and delete them.

from struct import unpack
import os


marker_mapping = {
    0xffd8: "Start of Image",
    0xffe0: "Application Default Header",
    0xffdb: "Quantization Table",
    0xffc0: "Start of Frame",
    0xffc4: "Define Huffman Table",
    0xffda: "Start of Scan",
    0xffd9: "End of Image"
}


class JPEG:
    def __init__(self, image_file):
        with open(image_file, 'rb') as f:
            self.img_data = f.read()
    
    def decode(self):
        data = self.img_data
        while(True):
            marker, = unpack(">H", data[0:2])
            # print(marker_mapping.get(marker))
            if marker == 0xffd8:
                data = data[2:]
            elif marker == 0xffd9:
                return
            elif marker == 0xffda:
                data = data[-2:]
            else:
                lenchunk, = unpack(">H", data[2:4])
                data = data[2+lenchunk:]            
            if len(data)==0:
                break        


bads = []

for img in tqdm(images):
  image = osp.join(root_img,img)
  image = JPEG(image) 
  try:
    image.decode()   
  except:
    bads.append(img)


for name in bads:
  os.remove(osp.join(root_img,name))

I used yasoob's script to decode the JPEG images.


rdvelazquez commented Sep 2, 2021

Thank you @milad-4274 (and yasoob) for sharing this jpeg checking script. It saved the day for us!

For others who may be looking at this, I made a few small revisions to your script to get it working for us, the most important of which was replacing:

            if len(data)==0:
                break    

with:

            if len(data)==0:
               raise TypeError("issue reading jpeg file")    

The other small changes were importing tqdm (from tqdm import tqdm), replacing osp.join with os.path.join, and reading in the list of images with something like:

for dirName, subdirList, fileList in os.walk(img_dir):
    imagesList = fileList
    for img in tqdm(imagesList):

Thanks again 👍

UPDATE:
The script found one bad image (out of ~200,000) but after removing that image we still saw the invalid JPEG error.
Our next approach is to use the image size printed in the error message to find the offending image (ls -l | grep <image_size>) and then remove images with that exact file size. This seems to work for JPEGs because, although our images mostly share the same pixel dimensions, their file sizes are fairly unique.
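The `ls -l | grep` trick above can also be done recursively in Python (a sketch; `find_files_by_size` is a hypothetical helper, and note that the "data size" in the error message is the size of the byte buffer handed to the decoder, which matches the file size only when the record stores the file verbatim):

```python
import os

def find_files_by_size(root, size_bytes):
    """Return paths under `root` whose on-disk size is exactly `size_bytes`."""
    matches = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) == size_bytes:
                matches.append(path)
    return matches
```

For the error at the top of this thread, that would be something like `find_files_by_size("images/", 36864)`.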


OnSebii commented Sep 28, 2021

(quoting @rdvelazquez's comment above)

And how do I run this script? The line for dirName, subdirList, fileList in os.walk(img_dir): raises NameError: name 'img_dir' is not defined.

@rdvelazquez

@OnSebii You need to define the path to the directory where your images are stored img_dir = "./<path_to_image_dir>/" as either an absolute path or a relative path (from where your python script is called) above the for dirName, subdirList, fileList in os.walk(img_dir): line.


antonison commented Jun 10, 2022

(quoting @rdvelazquez's reply above)

I did everything, but it does not recognize root_img. It raises an error that reads as follows:
NameError: name 'root_img' is not defined

What should I replace it with? Thank you!

@sarLum52

(quoting the exchange between @rdvelazquez and @antonison above)

I am having the same issue with root_img too. Were you able to resolve it? I am pretty new to all of this.

@choudharyfaisal

Set both variables to the same path, i.e. the directory that holds your images:

img_dir = "path/to/your/images"
root_img = "path/to/your/images"  # same path as img_dir

@biphasic

Here is the complete code, with modifications, that does the job for me:

from struct import unpack
import os
from tqdm import tqdm

marker_mapping = {
    0xffd8: "Start of Image",
    0xffe0: "Application Default Header",
    0xffdb: "Quantization Table",
    0xffc0: "Start of Frame",
    0xffc4: "Define Huffman Table",
    0xffda: "Start of Scan",
    0xffd9: "End of Image"
}


class JPEG:
    def __init__(self, image_file):
        with open(image_file, 'rb') as f:
            self.img_data = f.read()
    
    def decode(self):
        data = self.img_data
        while(True):
            marker, = unpack(">H", data[0:2])
            # print(marker_mapping.get(marker))
            if marker == 0xffd8:
                data = data[2:]
            elif marker == 0xffd9:
                return
            elif marker == 0xffda:
                data = data[-2:]
            else:
                lenchunk, = unpack(">H", data[2:4])
                data = data[2+lenchunk:]            
            if len(data)==0:
                raise TypeError("issue reading jpeg file")            


# list all files in directory
folder_path = 'data/train_v2'
image_paths = os.listdir(folder_path)

corrupted_jpegs = []

for img_path in tqdm(image_paths):
    full_image_path = os.path.join(folder_path, img_path)
    image = JPEG(full_image_path)
    try:
        image.decode()
    except Exception:
        corrupted_jpegs.append(img_path)
        print(f"Corrupted image: {img_path}")

print(corrupted_jpegs)

10 participants