
LSTM Object Detection Model Does Not Run #6253

Open
wrkgm opened this issue Feb 22, 2019 · 79 comments
@wrkgm

wrkgm commented Feb 22, 2019

System information

  • What is the top-level directory of the model you are using: lstm_object_detection
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Trying to
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): 1.12.0
  • Bazel version (if compiling from source):
  • CUDA/cuDNN version: 9.0
  • GPU model and memory: GTX 1070 ti
  • Exact command to reproduce: python train.py --train_dir=training --pipeline_config_path=configs/lstm_ssd_mobilenet_v1_imagenet.config

Describe the problem

Training the LSTM object detection model does not work. After making a tfrecord, modifying the config as necessary, creating a training dir, and running the command, I get this error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to explicitly squeeze dimension 0 but dimension was not 1: 0
         [[Node: Squeeze_1 = Squeeze[T=DT_INT64, squeeze_dims=[0], _device="/job:localhost/replica:0/task:0/device:CPU:0"](split_2)]]

More documentation, including a simple example of how to make a tfrecord and train, would be very helpful. I have tried two ways of creating a tfrecord, both shown below. I thought maybe the record structure was wrong, but if I put a typo in the record keys I get a different error complaining about that, so the records are probably structured correctly. I also looked at the model in TensorBoard and modified the training code in slim/learning.py to fetch values from individual nodes near Squeeze_1, printing each node, its output, and the output's shape. Here are the results from these attempts:

try run:  split_1:0
Tensor("split_1:0", shape=(?, ?, 4), dtype=float32, device=/device:CPU:0)
value: []
size: (0, 0, 4)

try run:  ParseSingleSequenceExample/ParseSingleSequenceExample:0
Tensor("ParseSingleSequenceExample/ParseSingleSequenceExample:0", shape=(), dtype=string, device=/device:CPU:0)
value: b''
size: ()

try run:  ResizeImage/resize_images/ResizeBilinear:0
Tensor("ResizeImage/resize_images/ResizeBilinear:0", shape=(4, 256, 256, 3), dtype=float32, device=/device:CPU:0)
value: (big numpy array)
size: (4, 256, 256, 3)

It seems that split_1 and ParseSingleSequenceExample are not actually receiving any data, which causes the squeeze error since there is nothing to squeeze, yet ResizeImage still receives data.
Additionally, if I ONLY fetch ResizeImage/resize_images/ResizeBilinear:0, I can fetch it a few times in a loop before it fails. Perhaps the pipeline fails after one batch?
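For reference, the "try run" values above were fetched with logic roughly like the sketch below, dropped into the training loop (a minimal, hypothetical helper; it assumes a running tf.Session `sess` whose default graph is the training graph):

import tensorflow as tf

def try_run(sess, tensor_name):
    # Hypothetical debugging helper: look a tensor up by name and print its value and shape.
    tensor = tf.get_default_graph().get_tensor_by_name(tensor_name)
    print('try run: ', tensor_name)
    print(tensor)
    value = sess.run(tensor)
    print('value:', value)
    print('size:', getattr(value, 'shape', ()))  # bytes values have no .shape

# e.g. try_run(sess, 'split_1:0')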

I'm not sure if this counts as a duplicate, but here are some related threads:
#6027
#5869
https://stackoverflow.com/questions/54093931/lstm-object-detection-tensorflow

I've also emailed the authors and heard nothing back.

EDIT:

I should mention that I removed ssd_random_crop from the data_augmentation_options in the config because it was giving me the error "the function ssd_random_crop requires argument groundtruth_weights".
I am not sure whether this matters.

Source code / logs

I tried two ways of creating tfrecords. The first was taken from tf_sequence_example_decoder_test.py in this repo; the only change was switching to sequences of length 4 to match the config file.

import numpy as np
import tensorflow as tf
from tensorflow.core.example import example_pb2, feature_pb2

path = 'train.tfrecord'  # placeholder output path for the tfrecord

writer = tf.python_io.TFRecordWriter(path)
with tf.Session() as sess:
    for _ in range(2000):
        image_tensor = np.random.randint(255, size=(16, 16, 3)).astype(np.uint8)
        print(image_tensor)

        encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).eval()

        sequence_example = example_pb2.SequenceExample(
            context=feature_pb2.Features(
                feature={
                    'image/format':
                        feature_pb2.Feature(
                            bytes_list=feature_pb2.BytesList(
                                value=['jpeg'.encode('utf-8')])),
                    'image/height':
                        feature_pb2.Feature(
                            int64_list=feature_pb2.Int64List(value=[16])),
                    'image/width':
                        feature_pb2.Feature(
                            int64_list=feature_pb2.Int64List(value=[16])),
                }),
            feature_lists=feature_pb2.FeatureLists(
                feature_list={
                    'image/encoded':
                        feature_pb2.FeatureList(feature=[
                            feature_pb2.Feature(
                                bytes_list=feature_pb2.BytesList(
                                    value=[encoded_jpeg])), feature_pb2.Feature(
                                bytes_list=feature_pb2.BytesList(
                                    value=[encoded_jpeg])), feature_pb2.Feature(
                                bytes_list=feature_pb2.BytesList(
                                    value=[encoded_jpeg])), feature_pb2.Feature(
                                bytes_list=feature_pb2.BytesList(
                                    value=[encoded_jpeg]))
                        ]),
                    'image/object/bbox/xmin':
                        feature_pb2.FeatureList(feature=[
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[0.0])),
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[0.0])),
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[0.0])),
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[0.0]))
                        ]),
                    'image/object/bbox/xmax':
                        feature_pb2.FeatureList(feature=[
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[1.0])),
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[1.0])),
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[1.0])),
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[1.0]))
                        ]),
                    'image/object/bbox/ymin':
                        feature_pb2.FeatureList(feature=[
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[0.0])),
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[0.0])),
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[0.0])),
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[0.0]))
                        ]),
                    'image/object/bbox/ymax':
                        feature_pb2.FeatureList(feature=[
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[1.0])),
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[1.0])),
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[1.0])),
                            feature_pb2.Feature(
                                float_list=feature_pb2.FloatList(value=[1.0]))
                        ]),
                    'image/object/class/label':
                        feature_pb2.FeatureList(feature=[
                            feature_pb2.Feature(
                                int64_list=feature_pb2.Int64List(value=[1])),
                            feature_pb2.Feature(
                                int64_list=feature_pb2.Int64List(value=[1])),
                            feature_pb2.Feature(
                                int64_list=feature_pb2.Int64List(value=[1])),
                            feature_pb2.Feature(
                                int64_list=feature_pb2.Int64List(value=[1]))
                        ]),
                }))

        writer.write(sequence_example.SerializeToString())
writer.close()

I also tried adapting a method I found here: https://github.com/wakanda-ai/tf-detectors
For this I used a couple of sample XML files in PASCAL VOC format from a training set I have for one of the normal object_detection models.

    # Iterate frames
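    # NOTE (added for context): this is a fragment of the SequenceExample-building
    # function adapted from the linked repo. `dicts` holds the parsed PASCAL VOC
    # annotation dict for each frame, and the accumulator lists used below
    # (filenames, encodeds, sources, keys, formats, xmins, ymins, xmaxs, ymaxs,
    # names, occludeds, generateds, class_labels) are initialized earlier in that function.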
    for data, img_path in zip(dicts, imgs_path):
        ## open single frame
        with tf.gfile.FastGFile(img_path, 'rb') as fid:
            encoded_jpg = fid.read()
        encoded_jpg_io = io.BytesIO(encoded_jpg)
        image = Image.open(encoded_jpg_io)
        if image.format != 'JPEG':
            raise ValueError('Image format not JPEG')
        key = hashlib.sha256(encoded_jpg).hexdigest()

        ## validation
        assert int(data['size']['height']) == height
        assert int(data['size']['width']) == width

        ## iterate objects
        xmin, ymin = [], []
        xmax, ymax = [], []
        name = []
        classval =  []
        occluded = []
        generated = []
        if 'object' in data:
            for obj in data['object']:
                xmin.append(float(obj['bndbox']['xmin']) / width)
                ymin.append(float(obj['bndbox']['ymin']) / height)
                xmax.append(float(obj['bndbox']['xmax']) / width)
                ymax.append(float(obj['bndbox']['ymax']) / height)
                name.append(obj['name'].encode('utf8'))
                classval.append(1)
                occluded.append(0)
                generated.append(0)
        else:
            xmin.append(float(-1))
            ymin.append(float(-1))
            xmax.append(float(-1))
            ymax.append(float(-1))
            name.append('NoObject'.encode('utf8'))
            classval.append(1)
            occluded.append(0)
            generated.append(0)

        ## append tf_feature to list
        filenames.append(dataset_util.bytes_feature(data['filename'].encode('utf8')))
        encodeds.append(dataset_util.bytes_feature(encoded_jpg))
        sources.append(dataset_util.bytes_feature(data['source']['database'].encode('utf8')))
        keys.append(dataset_util.bytes_feature(key.encode('utf8')))
        formats.append(dataset_util.bytes_feature('jpeg'.encode('utf8')))
        xmins.append(dataset_util.float_list_feature(xmin))
        ymins.append(dataset_util.float_list_feature(ymin))
        xmaxs.append(dataset_util.float_list_feature(xmax))
        ymaxs.append(dataset_util.float_list_feature(ymax))
        names.append(dataset_util.bytes_list_feature(name))
        occludeds.append(dataset_util.int64_list_feature(occluded))
        generateds.append(dataset_util.int64_list_feature(generated))
        class_labels.append(dataset_util.int64_list_feature(classval))

    # Non sequential features
    context = tf.train.Features(feature={
        'video/folder': dataset_util.bytes_feature(folder.encode('utf8')),
        'video/frame_number': dataset_util.int64_feature(len(imgs_path)),
        'video/height': dataset_util.int64_feature(height),
        'video/width': dataset_util.int64_feature(width),
        })
    # Sequential features
    tf_feature_lists = {
        'image/filename': tf.train.FeatureList(feature=filenames),
        'image/encoded': tf.train.FeatureList(feature=encodeds),
        'image/sources': tf.train.FeatureList(feature=sources),
        'image/key/sha256': tf.train.FeatureList(feature=keys),
        'image/format': tf.train.FeatureList(feature=formats),
        'image/object/bbox/xmin': tf.train.FeatureList(feature=xmins),
        'image/object/bbox/xmax': tf.train.FeatureList(feature=xmaxs),
        'image/object/bbox/ymin': tf.train.FeatureList(feature=ymins),
        'image/object/bbox/ymax': tf.train.FeatureList(feature=ymaxs),
        'image/object/class/text': tf.train.FeatureList(feature=names),
        'image/object/class/label': tf.train.FeatureList(feature=class_labels),
        'image/object/occluded': tf.train.FeatureList(feature=occludeds),
        'image/object/generated': tf.train.FeatureList(feature=generateds),
        }
    feature_lists = tf.train.FeatureLists(feature_list=tf_feature_lists)
    # Make single sequence example
    tf_example = tf.train.SequenceExample(context=context, feature_lists=feature_lists)
    return tf_example

Tfrecords created with both of these approaches yielded identical errors.
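A quick way to sanity-check what ParseSingleSequenceExample will see is to read the file back and print the stored keys (a minimal sketch, assuming TF 1.x and the `path` written above):

import tensorflow as tf

# Print the context and feature_list keys of the first SequenceExample in the file.
for record in tf.python_io.tf_record_iterator(path):
    example = tf.train.SequenceExample.FromString(record)
    print('context keys:     ', sorted(example.context.feature.keys()))
    print('feature_list keys:', sorted(example.feature_lists.feature_list.keys()))
    break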

@whasyt

whasyt commented Feb 25, 2019

I've hit the same "squeeze" issue as @wrkgm.

I tried commenting out the data_augmentation_options in the config:

#  data_augmentation_options {
#    random_horizontal_flip {
#    }
#  }
#  data_augmentation_options {
#    ssd_random_crop {
#      groundtruth_weights: 1.0
#    }
#  }

and then get this error:

root@747-Super-Server:~/tensorflow/models/research# python lstm_object_detection/train.py --train_dir=/home1/lstmDetection/model --pipeline_config_path=lstm_object_detection/configs/lstm_ssd_mobilenet_v1_imagenet.config
.....
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [All sequence lengths must match, but received lengths: 0 All sequence lengths must match, but received lengths: 0 All sequence lengths must match, but received lengths: 1]
         [[Node: batch_sequences_with_states/Assert_2/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch_sequences_with_states/Equal_2, batch_sequences_with_states/StringJoin)]]

@ashkanee

ashkanee commented Feb 25, 2019

(Quotes @wrkgm's full issue description above, including both tfrecord-creation scripts.)

@wrkgm I used this code to convert ImageNet VID 2015 to tfrecord, but get a different error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Can not squeeze dim[0], expected a dimension of 1, got 0 [[{{node Squeeze_1}} = Squeeze[T=DT_INT64, squeeze_dims=[0], _device="/job:localhost/replica:0/task:0/device:CPU:0"](split_2)]]

@ashkanee

ashkanee commented Feb 25, 2019

One important point: I also faced an error saying:

"the function ssd_random_crop requires argument groundtruth_weights"

I fixed it by adding "groundtruth_weights" and setting it equal to "None".

Did you face a similar error? I thought the current error might be related to it.

@wrkgm
Author

wrkgm commented Feb 26, 2019

@ashkanee Good point! I forgot that I faced this error as well. I simply removed ssd_random_crop entirely from the data augmentation options. I will edit the main post to mention this, thanks.

@ashkanee

@ashkanee Good point! I forgot I did face this error as well. I simply removed ssd_random_crop entirely from data augmentation options. Will edit main post to mention this, thanks

I added groundtruth_weights for ssd_random_crop and fixed that bug, but got the same error as you. Based on this, I guess it may not be the source of the problem.

@ashkanee

ashkanee commented Feb 26, 2019

@dreamdragon Also, one more observation:

1- If I use the checkpoint, I get the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 10) and num_split 4 [[{{node split}} = Split[T=DT_UINT8, num_split=4, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch_sequences_with_states/InputQueueingStateSaver/ExpandDims_1/dim, map/TensorArrayStack/TensorArrayGatherV3)]]

which seems to be related to GPU issues.

2- If I do not use the checkpoint, I get a similar error to @wrkgm:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Can not squeeze dim[0], expected a dimension of 1, got 0 [[{{node Squeeze_1}} = Squeeze[T=DT_INT64, squeeze_dims=[0], _device="/job:localhost/replica:0/task:0/device:CPU:0"](split_2)]]

Based on the above, it seems your issue may be related to the missing checkpoint.

Important point: do you get warnings about deprecated functions? I do. Maybe the error is caused by deprecated functions in TensorFlow; this is probably worth looking into.

@wrkgm Can I ask you the following:

1- Can you please check what happens if you add the checkpoint? I use the checkpoint weights here

2- Is it possible for you to explain how you used "tf_sequence_example_decoder_test.py" to generate tfrecord files? Does it produce separate training and validation files?
Using this mixes the training and validation files.

Thanks!

Edit:
I am trying to train on a single GPU (Tesla V100-SXM2-16GB); it seems this issue may be related to training with more than one GPU. There is more information here: #266

Update: the errors are most probably not related to the GPU.

Edit:
Last observation: the errors are not related to the checkpoint. Based on my experiments they come from the preprocessing / data augmentation step, since empty tensors are being passed on. This may be caused by the way the data are converted to tfrecord.

@ashkanee

ashkanee commented Feb 26, 2019

@wrkgm Do you get the warning:

WARNING:tensorflow:From /home/ashkan/models/research/object_detection/core/preprocessor.py:1218: calling squeeze (from tensorflow.python.ops.array_ops) with squeeze_dims is deprecated and will be removed in a future version. Instructions for updating: Use the axis argument instead

I guess it may help.

Edit: Fixing this still results in the error.

@ashkanee

ashkanee commented Feb 26, 2019

I looked into the tensors in the file seq_dataset_builder.py and noticed that the following tensors are empty:

tensor_dict['groundtruth_boxes']
tensor_dict['groundtruth_classes']

In addition, I get one of the following errors each time (they appear in random order, which seems related to the randomness of the data augmentation, since they go away if you comment out the data augmentation in the config file):

tensorflow.python.framework.errors_impl.InvalidArgumentError: Can not squeeze dim[0], expected a dimension of 1, got 0 [[{{node Squeeze_10}} = Squeeze[T=DT_INT64, squeeze_dims=[0], _device="/job:localhost/replica:0/task:0/device:CPU:0"](strided_slice_7)]]

tensorflow.python.framework.errors_impl.InvalidArgumentError: Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 10) and num_split 4 [[{{node split}} = Split[T=DT_UINT8, num_split=4, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch_sequences_with_states/InputQueueingStateSaver/ExpandDims_1/dim, Print)]]

There are tf.split and tf.squeeze calls afterwards, and my understanding is that the errors occur because empty tensors are being split or squeezed.
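The squeeze failure is easy to reproduce in isolation, which supports this reading (a standalone sketch using only stock TF ops, unrelated to the model code):

import numpy as np
import tensorflow as tf

# If the first dimension is unknown at graph-build time, squeezing axis 0 only
# fails at run time, and an empty (0-row) groundtruth tensor reproduces the error.
boxes = tf.placeholder(tf.int64, shape=[None, 4])
squeezed = tf.squeeze(boxes, axis=[0])  # requires dim 0 == 1 at run time
with tf.Session() as sess:
    sess.run(squeezed, feed_dict={boxes: np.zeros((0, 4), dtype=np.int64)})
    # -> InvalidArgumentError: Can not squeeze dim[0], expected a dimension of 1, got 0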

@ymodak ymodak added the models:research models that come under research directory label Feb 27, 2019
@ymodak ymodak added the stat:awaiting model gardener Waiting on input from TensorFlow model gardener label Feb 27, 2019
@wrkgm
Author

wrkgm commented Mar 1, 2019

@ashkanee I just modified the code from that file to generate a tfrecord full of randomly generated numpy arrays and pointed the train path in the config at it. I haven't worried about splitting train and test yet; I'm just trying to get the code to run at all. I tried with and without a checkpoint and got a similar error.

It's almost certainly not related to the GPU. I've got a GTX 1070ti. And I get that same warning. I agree it likely has something to do with the input pipeline or preprocessing steps.

@Aaronreb

While running the train.py file from lstm_object_detection, I get the following error:

Traceback (most recent call last):
File "lstm_object_detection/train.py", line 185, in
tf.app.run()
File "/home/kt-ml1/.local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "lstm_object_detection/train.py", line 94, in main
FLAGS.pipeline_config_path)
File "/home/kt-ml1/models-master/models-master/research/lstm_object_detection/utils/config_util.py", line 46, in get_configs_from_pipeline_file
text_format.Merge(proto_str, pipeline_config)
File "/home/kt-ml1/.local/lib/python3.6/site-packages/google/protobuf/text_format.py", line 574, in Merge
descriptor_pool=descriptor_pool)
File "/home/kt-ml1/.local/lib/python3.6/site-packages/google/protobuf/text_format.py", line 631, in MergeLines
return parser.MergeLines(lines, message)
File "/home/kt-ml1/.local/lib/python3.6/site-packages/google/protobuf/text_format.py", line 654, in MergeLines
self._ParseOrMerge(lines, message)
File "/home/kt-ml1/.local/lib/python3.6/site-packages/google/protobuf/text_format.py", line 676, in _ParseOrMerge
self._MergeField(tokenizer, message)
File "/home/kt-ml1/.local/lib/python3.6/site-packages/google/protobuf/text_format.py", line 735, in _MergeField
'that message's _pb2 module must be imported as well' % name)
google.protobuf.text_format.ParseError: 18:26 : Extension "object_detection.protos.lstm_model" not registered. Did you import the _pb2 module which defines it? If you are trying to place the extension in the MessageSet field of another message that is in an Any or MessageSet field, that message's _pb2 module must be imported as well

I figured out that there is something wrong with the config file in lstm_object_detection.
Can someone please help me understand what changes we need to make in the config file to run it successfully?

@dreamdragon
Contributor

Replace object_detection.protos.lstm_model with lstm_object_detection.protos.lstm_model in the config.

We will fix this issue in the codebase shortly.

@Aaronreb

Aaronreb commented May 15, 2019

Replace object_detection.protos.lstm_model with lstm_object_detection.protos.lstm_model in the config.

We will fix this issue in the codebase shortly.

Done, thanks.
But this got me another error:

TypeError: Expected binary or unicode string, got <object_detection.core.matcher.Match object at 0x7f5b089379b0>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "lstm_object_detection/train.py", line 185, in
tf.app.run()
File "/home/kt-ml1/.local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "lstm_object_detection/train.py", line 181, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/kt-ml1/models-master/models-master/research/lstm_object_detection/trainer.py", line 293, in train
clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
File "/home/kt-ml1/models-master/models-master/research/slim/deployment/model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "/home/kt-ml1/models-master/models-master/research/lstm_object_detection/trainer.py", line 174, in _create_losses
losses_dict = detection_model.loss(prediction_dict, true_image_shapes)
File "/home/kt-ml1/models-master/models-master/research/lstm_object_detection/meta_architectures/lstm_ssd_meta_arch.py", line 165, in loss
match_list = [matcher.Match(match) for match in tf.unstack(batch_match)]
File "/home/kt-ml1/.local/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1149, in unstack
value = ops.convert_to_tensor(value)
File "/home/kt-ml1/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1039, in convert_to_tensor
return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
File "/home/kt-ml1/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1097, in convert_to_tensor_v2
as_ref=False)
File "/home/kt-ml1/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1175, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/kt-ml1/.local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 304, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/home/kt-ml1/.local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 245, in constant
allow_broadcast=True)
File "/home/kt-ml1/.local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 283, in _constant_impl
allow_broadcast=allow_broadcast))
File "/home/kt-ml1/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 562, in make_tensor_proto
"supported type." % (type(values), values))
TypeError: Failed to convert object of type <class 'list'> to Tensor. Contents: [<object_detection.core.matcher.Match object at 0x7f5b089379b0>, <object_detection.core.matcher.Match object at 0x7f5b087b75c0>, <object_detection.core.matcher.Match object at 0x7f5b08626fd0>, <object_detection.core.matcher.Match object at 0x7f5b084a9b00>, <object_detection.core.matcher.Match object at 0x7f5b083302b0>, <object_detection.core.matcher.Match object at 0x7f5b0819a6a0>, <object_detection.core.matcher.Match object at 0x7f5b03fe2710>, <object_detection.core.matcher.Match object at 0x7f5b03e672b0>, <object_detection.core.matcher.Match object at 0x7f5b03cd36a0>, <object_detection.core.matcher.Match object at 0x7f5b03b53710>, <object_detection.core.matcher.Match object at 0x7f5b039d92b0>, <object_detection.core.matcher.Match object at 0x7f5b038456a0>, <object_detection.core.matcher.Match object at 0x7f5b036ca710>, <object_detection.core.matcher.Match object at 0x7f5b0354e2b0>, <object_detection.core.matcher.Match object at 0x7f5b034396a0>, <object_detection.core.matcher.Match object at 0x7f5b032bb710>, <object_detection.core.matcher.Match object at 0x7f5b0313f2b0>, <object_detection.core.matcher.Match object at 0x7f5b02faf6a0>, <object_detection.core.matcher.Match object at 0x7f5b02e315f8>, <object_detection.core.matcher.Match object at 0x7f5b02cb32b0>, <object_detection.core.matcher.Match object at 0x7f5b02b226a0>, <object_detection.core.matcher.Match object at 0x7f5b029a5710>, <object_detection.core.matcher.Match object at 0x7f5b028282b0>, <object_detection.core.matcher.Match object at 0x7f5b026936a0>, <object_detection.core.matcher.Match object at 0x7f5b02518710>, <object_detection.core.matcher.Match object at 0x7f5b0239c2b0>, <object_detection.core.matcher.Match object at 0x7f5b022096a0>, <object_detection.core.matcher.Match object at 0x7f5b0208f710>, <object_detection.core.matcher.Match object at 0x7f5b01f0f2b0>, <object_detection.core.matcher.Match object at 0x7f5b01dfc6a0>, <object_detection.core.matcher.Match object at 0x7f5b01c03710>, <object_detection.core.matcher.Match object at 0x7f5b01a842b0>]. Consider casting elements to a supported type.

PS: I have commented out the data augmentation options in the config file because they were giving me the groundtruth error mentioned above.

As @ashkanee said, the problem seems to be coming from tensors in the file seq_dataset_builder.py, as the tensors are empty.

Any help would be appreciated.

@tensorflowbutler tensorflowbutler removed the stat:awaiting model gardener Waiting on input from TensorFlow model gardener label May 16, 2019
@mswarrow

I think I've found a solution. While trying to understand the scripts, I noticed that the feature keys expected by TFSequenceExampleDecoder are different from those used in DatasetBuilderTest (seq_dataset_builder_test.py). So I used a script for creating tfrecords similar to @wrkgm's first version, but replaced image/object/bbox/... with bbox/... and image/object/class/label with bbox/label/index.
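Concretely, relative to the first script in the issue, the feature_lists block would use keys along these lines (a sketch only; the *_frames names are placeholders for the per-frame Feature lists built exactly as in that script):

# Renamed feature_list keys expected by TFSequenceExampleDecoder (the *_frames
# variables are placeholders for the per-frame Feature lists from the original script).
feature_lists = feature_pb2.FeatureLists(
    feature_list={
        'image/encoded': feature_pb2.FeatureList(feature=encoded_frames),
        'bbox/xmin': feature_pb2.FeatureList(feature=xmin_frames),
        'bbox/xmax': feature_pb2.FeatureList(feature=xmax_frames),
        'bbox/ymin': feature_pb2.FeatureList(feature=ymin_frames),
        'bbox/ymax': feature_pb2.FeatureList(feature=ymax_frames),
        'bbox/label/index': feature_pb2.FeatureList(feature=label_frames),
    })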

@Aaronreb

I think I've found a solution. When trying to understand the scripts, I noticed that keys to features were specified in TFSequenceExampleDecoder and they were different from those in DatasetBuilderTest (seq_dataset_builder_test.py). So, I used a script for creating tf records similar to @wrkgm's first version, but replaced image/object/bbox/... with bbox/... and image/object/class/label with bbox/label/index.

Did you successfully train the model?

@mswarrow

Well, not yet. I've just solved this tensorflow.python.framework.errors_impl.InvalidArgumentError and verified that training starts (iterations are shown in TensorBoard).

@mswarrow

I do plan to train a model on my dataset and will report the results here.

@Aaronreb

I do plan to train a model on my dataset - shall inform you here of the results

On what dataset did you make the tfrecords?

@mswarrow

It's a specific dataset I use for my project. I can't share the details, but in general it is a set of annotated (boxes + labels) images organised in folders, one folder per sequence. I just take each sequence, split it into snippets of length 4 (as specified in lstm_ssd_mobilenet_v1_imagenet.config), and convert each snippet into a tf.train.SequenceExample.
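For anyone doing the same, the splitting step is essentially just chunking with padding; a rough sketch (the helper name is made up, and padding the last snippet by repeating its final frame is one option, not necessarily what mswarrow does):

def split_into_snippets(frames, snippet_len=4):
    # Chunk a list of frames into fixed-length snippets; pad the last snippet
    # by repeating its final frame so every snippet has snippet_len frames.
    snippets = []
    for start in range(0, len(frames), snippet_len):
        snippet = list(frames[start:start + snippet_len])
        while len(snippet) < snippet_len:
            snippet.append(snippet[-1])
        snippets.append(snippet)
    return snippets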

@Aaronreb

Aaronreb commented Jun 14, 2019

It's a specific dataset I use for my project. I can't tell the details, but in general, it is a set of annotated (boxes + labels) images organised in folders - one folder for one sequence. I just take each sequence, split it into snippets of length 4 (as specified in lstm_ssd_mobilenet_v1_imagenet.config) and convert each snippet into TF SequenceExample.

Can we use the same tfrecords we used for object detection?
When using those tfrecords I'm getting the following issue:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Name: , Feature list 'image/encoded' is required but could not be found. Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?
[[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]]

@mswarrow

mswarrow commented Jun 14, 2019

I haven't worked with object detection tfrecords, but I assume they are not SequenceExamples, but just Examples, and they don't have the required feature lists like image/encoded, bbox/xmin, etc.

@Aaronreb

I haven't worked with object detection tfrecords, but I assume they are not SequenceExamples, but just Examples, and they don't have the required feature lists like image/encoded, bbox/xmin, etc.

So what do we have to edit in the LSTM config file? What do we have to put as the input path?

@mswarrow

I'm not fully sure we can use object detection tfrecords for training lstm object detection. If I'm right, the seq_dataset_builder.py script expects SequenceExamples for training. Their length can be configured in the LSTM config file, but you can't replace them with Examples (and I assume images in object detection datasets are stored independently as Examples) unless you modify the dataset builder itself. I may be wrong, of course, as I don't know the exact object detection tfrecord format :)

@Aaronreb

I'm not fully sure we can use object detection tfrecords for training lstm object detection. If I'm right the seq_dataset_builder.py script wants SequenceExamples for training. Their length can be configured in lstm config file, but you can't replace them with Examples (and I assume, images in object detection datasets are stored independently as Examples) unless you modify the dataset builder itself. It may happen that I'm wrong, of course, as I don't know the exact object detection tfrecord format :)

Yeah, I think I went with the wrong approach.
So, what exactly should "path/to/sequence_example/data" be in the config file?

@mswarrow

mswarrow commented Jun 14, 2019

I just have a single train.tfrecord file (path/to/train.tfrecord), which is a collection of tf.train.SequenceExample protos written one after another. Each SequenceExample has a context with the features image/format, image/height and image/width, and feature lists with features named image/encoded, bbox/xmin, ..., bbox/ymax and bbox/label/index. See the first tfrecord-creating script from @wrkgm's post; the only difference is in the feature names.

@Aaronreb

I just have a single train.tfrecord file (path/to/train.tfrecord) which is a collection of tf.train.SequenceExample written one after another, each tf.train.SequenceExample has context with features image/format, image/height and image/width, and feature lists with features named image/encoded, bbox/xmin, ..., bbox/ymax and bbox/label/index. See the first tfrecord creating script from @wrkgm post. The only difference is in feature names

I am trying to use tfrecords made from the OID dataset.
Do we have to make tfrecords from the VID dataset?

@spaul13

spaul13 commented Jun 14, 2019

I have used the pets_example.record from object_detection/test_data/ and I am also getting the same error. @mswarrow, @Aaronreb, can you please share the tfrecord file you are using for training and evaluation of this LSTM model (path/to/sequence_example/data)? It would also be great if you could show how to generate tfrecord files for the LSTM model.

I have tried @wrkgm's code to build the sequential dataset tfrecords in order to train the model, but I am getting the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [All sequence lengths must match, but received lengths: 0 All sequence lengths must match, but received lengths: 0 All sequence lengths must match, but received lengths: 4]
[[{{node batch_sequences_with_states/Assert_2/Assert}}]]

It seems like a sequence-length mismatch issue. Can anyone please tell me which parameter to modify to resolve this?

@mswarrow @Aaronreb @ashkanee @dreamdragon @whasyt can you please tell me what to use as the fine_tune_checkpoint file? I tried adding "object_detection/test_ckpt/ssd_inception_v2.pb",
but that raised an error, so I have disabled fine-tuning for now. What should be used as the fine_tune_checkpoint file?

Any help will highly be appreciated.

@yuchen2580

@yuchen2580
Hi, I attach my total_loss graph from TensorBoard below. One reason, I guess, is that the input reader does not seem to shuffle, so for a period of time we are training on snippets from the same image sequence; when training moves on to snippets with different classes, the loss climbs back up. I am trying to train again with shuffle: true in the config.

(total_loss graph)

I will let you know if I solve the problem, and please let me know if you figure anything out :)

@KanaSukita
Hi, I get the same result.
I suspect the parameters in the config are not the ones they used, though no specific parameters are given in their paper either.
What is your video length? The paper says 10, but the config file uses 4.
The training procedure also seems different in the paper.

@KanaSukita

@yuchen2580
Hi, I am also using 4 for the video length. I think the parameters in the config are definitely not the ones they applied in the experiment: in the config the width multiplier and resolution are fixed, while in the paper they are 1 / 0.5 and 320 / 256. But I don't think they are the key to our problem.

@yuchen2580

yuchen2580 commented Jul 9, 2019


@KanaSukita
I'm still trying to check the evaluation code, which seems to have been originally written in Python 2.
I have doubts about how the weights are restored.

Maybe we should train a MobileNet SSD on the VID dataset first, then train from that checkpoint as the paper suggests, and use a video sequence length of 10 for training the network. But I doubt this is the problem either...

@Shruthi-Sampathkumar

Shruthi-Sampathkumar commented Jul 9, 2019

@KanaSukita
Hi, I trained the model on VID 2015 as well. The result is similar to yours.
I trained it for num_steps = 10000 with video_length = 4; the total loss reaches 0.3-0.8.
The evaluation result is similar to yours.

Another thing I noticed is that if I resume training in the middle, the loss climbs back up.
It feels like it cannot restore the previously saved state.

Perhaps there is something wrong with loading the parameters.
How many steps did you use for training?
What was the loss when you finished training?

@KanaSukita, did you train on the 86 GB 2015 VID data? Can you share your tfrecord conversion script? Also, did you have 'image/format' in the sequential features or the context features?
I am encountering a bus error when I run train.py. I am not sure if it is because of my tfrecord conversion or my config file changes. Any help is appreciated. Thanks.

@KanaSukita

(Quotes @Shruthi-Sampathkumar's comment above.)

Hi @Shruthi-Sampathkumar
I used code based on tf-detectors.

Not sure if it is right.

import os
import io
import glob
import math
import hashlib
import logging
import utils.dataset_util as dataset_util
import tensorflow as tf

from lxml import etree
from PIL import Image

class_dict = {
'n02691156': 1,
'n02419796': 2,
'n02131653': 3,
'n02834778': 4,
'n01503061': 5,
'n02924116': 6,
'n02958343': 7,
'n02402425': 8,
'n02084071': 9,
'n02121808': 10,
'n02503517': 11,
'n02118333': 12,
'n02510455': 13,
'n02342885': 14,
'n02374451': 15,
'n02129165': 16,
'n01674464': 17,
'n02484322': 18,
'n03790512': 19,
'n02324045': 20,
'n02509815': 21,
'n02411705': 22,
'n01726692': 23,
'n02355227': 24,
'n02129604': 25,
'n04468005': 26,
'n01662784': 27,
'n04530566': 28,
'n02062744': 29,
'n02391049': 30
}

flags = tf.app.flags
flags.DEFINE_string('root_dir', '', 'Root directory to raw VID 2015 dataset.')
flags.DEFINE_string('set', 'train', 'Convert training set, validation set.')
flags.DEFINE_string('output_path', './data/VID2015', 'Path to output TFRecord')
flags.DEFINE_integer('start_shard', 0, 'Start index of TFRcord files')
flags.DEFINE_integer('num_shards', 10, 'The number of TFRcord files')
flags.DEFINE_integer('num_frames', 4, 'The number of frame to use')
flags.DEFINE_integer('num_examples', -1, 'The number of video to convert to TFRecord file')
FLAGS = flags.FLAGS

SETS = ['train', 'val', 'test']
MAX_INTERVAL = 5

def sample_frames(xml_files):
    samples_size = (len(xml_files) - 1) // FLAGS.num_frames + 1
    samples = []
    for s in range(samples_size):
        start = FLAGS.num_frames * s
        end = FLAGS.num_frames * (s+1)
        sample = xml_files[start:end]
        while len(sample) < FLAGS.num_frames:
            sample.append(sample[-1])
        samples.append(sample)
    return samples

def gen_shard(examples_list, annotations_dir, out_filename,
              root_dir, _set):
    writer = tf.python_io.TFRecordWriter(out_filename)
    for indx, example in enumerate(examples_list):
        ## sample frames
        xml_pattern = os.path.join(annotations_dir, example + '/*.xml')
        print(xml_pattern)
        xml_files = sorted(glob.glob(xml_pattern))
        samples = sample_frames(xml_files)
        for sample in samples:
            dicts = []
            for xml_file in sample:
                ## process per single xml
                with tf.gfile.GFile(xml_file, 'r') as fid:
                    xml_str = fid.read()
                xml = etree.fromstring(xml_str)
                dic = dataset_util.recursive_parse_xml_to_dict(xml)['annotation']
                dicts.append(dic)
            tf_example = dicts_to_tf_example(dicts, root_dir, _set)
            writer.write(tf_example.SerializeToString())
    writer.close()
    return

def dicts_to_tf_example(dicts, root_dir, _set):
    """ Convert XML derived dict to tf.Example proto.
    """
    # Non sequential data
    folder = dicts[0]['folder']
    filenames = [dic['filename'] for dic in dicts]
    height = int(dicts[0]['size']['height'])
    width = int(dicts[0]['size']['width'])

    # Get image paths
    imgs_dir = os.path.join(root_dir,
                            'Data/VID/{}'.format(_set),
                            folder)
    imgs_path = sorted([os.path.join(imgs_dir, filename) + '.JPEG'
                        for filename in filenames])
            #glob.glob(imgs_dir + '/*.JPEG'))

    # Frames Info (image)
    filenames = []
    encodeds = []
    sources = []
    keys = []
    formats = []
    # Frames Info (objects)
    xmins, ymins = [], []
    xmaxs, ymaxs = [], []
    class_indices = []
    names = []
    occludeds = []
    generateds = []

    # Iterate frames
    for data, img_path in zip(dicts, imgs_path):

        ## open single frame
        with tf.gfile.FastGFile(img_path, 'rb') as fid:
            encoded_jpg = fid.read()
        encoded_jpg_io = io.BytesIO(encoded_jpg)
        image = Image.open(encoded_jpg_io)
        if image.format != 'JPEG':
            raise ValueError('Image format not JPEG')
        key = hashlib.sha256(encoded_jpg).hexdigest()

        ## validation
        assert int(data['size']['height']) == height
        assert int(data['size']['width']) == width

        ## iterate objects
        xmin, ymin = [], []
        xmax, ymax = [], []
        class_index = []
        name = []
        occluded = []
        generated = []
        if 'object' in data:
            for obj in data['object']:
                xmin.append(float(obj['bndbox']['xmin']) / width)
                ymin.append(float(obj['bndbox']['ymin']) / height)
                xmax.append(float(obj['bndbox']['xmax']) / width)
                ymax.append(float(obj['bndbox']['ymax']) / height)
                class_index.append(class_dict[obj['name']])
                name.append(obj['name'].encode('utf8'))
                occluded.append(int(obj['occluded']))
                generated.append(int(obj['generated']))
        '''
        else:
            xmin.append(float(-1))
            ymin.append(float(-1))
            xmax.append(float(-1))
            ymax.append(float(-1))
            class_index.append(0)
            name.append('NoObject'.encode('utf8'))
            occluded.append(0)
            generated.append(0)
        '''
        ## append tf_feature to list
        filenames.append(dataset_util.bytes_feature(data['filename'].encode('utf8')))
        encodeds.append(dataset_util.bytes_feature(encoded_jpg))
        sources.append(dataset_util.bytes_feature(data['source']['database'].encode('utf8')))
        keys.append(dataset_util.bytes_feature(key.encode('utf8')))
        formats.append(dataset_util.bytes_feature('jpeg'.encode('utf8')))
        xmins.append(dataset_util.float_list_feature(xmin))
        ymins.append(dataset_util.float_list_feature(ymin))
        xmaxs.append(dataset_util.float_list_feature(xmax))
        ymaxs.append(dataset_util.float_list_feature(ymax))

        class_indices.append(dataset_util.int64_list_feature(class_index))
        names.append(dataset_util.bytes_list_feature(name))
        occludeds.append(dataset_util.int64_list_feature(occluded))
        generateds.append(dataset_util.int64_list_feature(generated))

    # Non sequential features
    context = tf.train.Features(feature={
        'video/folder': dataset_util.bytes_feature(folder.encode('utf8')),
        'video/frame_number': dataset_util.int64_feature(len(imgs_path)),
        'video/height': dataset_util.int64_feature(height),
        'video/width': dataset_util.int64_feature(width),
        })
    # Sequential features
    tf_feature_lists = {
        'image/filename': tf.train.FeatureList(feature=filenames),
        'image/encoded': tf.train.FeatureList(feature=encodeds),
        'image/sources': tf.train.FeatureList(feature=sources),
        'image/key/sha256': tf.train.FeatureList(feature=keys),
        'image/format': tf.train.FeatureList(feature=formats),
        'bbox/xmin': tf.train.FeatureList(feature=xmins),
        'bbox/xmax': tf.train.FeatureList(feature=xmaxs),
        'bbox/ymin': tf.train.FeatureList(feature=ymins),
        'bbox/ymax': tf.train.FeatureList(feature=ymaxs),
        'bbox/label/index': tf.train.FeatureList(feature=class_indices),
        'image/object/string': tf.train.FeatureList(feature=names),
        'image/object/occluded': tf.train.FeatureList(feature=occludeds),
        'image/object/generated': tf.train.FeatureList(feature=generateds),
        }
    feature_lists = tf.train.FeatureLists(feature_list=tf_feature_lists)
    # Make single sequence example
    tf_example = tf.train.SequenceExample(context=context, feature_lists=feature_lists)

    return tf_example

def main(_):
root_dir = FLAGS.root_dir

if FLAGS.set not in SETS:
    raise ValueError('set must be in : {}'.format(SETS))

# Read Example list files
logging.info('Reading from VID 2015 dataset. ({})'.format(root_dir))
list_file_pattern = 'ImageSets/VID/{}*.txt'.format(FLAGS.set)
examples_paths = sorted(glob.glob(os.path.join(root_dir, list_file_pattern)))
#print('examples_paths', examples_paths)
examples_list = []
for examples_path in examples_paths:
    examples_list.extend(dataset_util.read_examples_list(examples_path))
if FLAGS.set != 'train':
    examples_list2 = [e[:-7] for e in examples_list]
    examples_list = sorted(list(set(examples_list2)))
if FLAGS.num_examples > 0:
    examples_list = examples_list[:FLAGS.num_examples]
#print('examples_list', examples_list)

# Sharding
start_shard = FLAGS.start_shard
num_shards = FLAGS.num_shards
num_digits = math.ceil(math.log10(max(num_shards-1,2)))
shard_format = '%0'+ ('%d'%num_digits) + 'd'
examples_per_shard = int(math.ceil(len(examples_list)/float(num_shards)))
annotations_dir = os.path.join(root_dir,
                               'Annotations/VID/{}'.format(FLAGS.set))
print('annotations_dir', annotations_dir)
# Generate each shard
for i in range(start_shard, num_shards):
    start = i * examples_per_shard
    end = (i+1) * examples_per_shard
    out_filename = os.path.join(FLAGS.output_path,
            'VID_2015-'+(shard_format % i)+'.tfrecord')
    if os.path.isfile(out_filename): # Don't recreate data if restarting
        continue
    print (str(i)+'of'+str(num_shards)+'['+str(start)+':'+str(end),']'+out_filename)
    gen_shard(examples_list[start:end], annotations_dir, out_filename,
            root_dir, FLAGS.set)
return

if name == 'main':
tf.app.run()
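One quick sanity check on the shards this script writes is to parse a single SequenceExample back and list its keys; a key mismatch with what the lstm_object_detection input reader expects can show up later as empty tensors, much like the squeeze error at the top of this issue. A minimal sketch, assuming TF 1.x and the script's default --output_path and shard naming:

import tensorflow as tf

# Read the first shard written by the script above (default flags) and dump its keys.
path = './data/VID2015/VID_2015-0.tfrecord'
record = next(tf.python_io.tf_record_iterator(path))
example = tf.train.SequenceExample.FromString(record)
print('context keys:', sorted(example.context.feature.keys()))
print('feature_list keys:', sorted(example.feature_lists.feature_list.keys()))
print('frames:', len(example.feature_lists.feature_list['image/encoded'].feature))

If the printed keys do not line up with the keys the lstm_object_detection input reader parses, the decoder may end up with empty box tensors even though the images themselves load fine.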

@Shruthi-Sampathkumar
Copy link

Thanks @KanaSukita. I modified my script according to yours. My training is up and running now.

@KanaSukita
Copy link

Hi @Shruthi-Sampathkumar, did you successfully train the model?

@Shruthi-Sampathkumar
Copy link

I was able to start the training, but the loss is stuck at 0.2955, @KanaSukita. I am not able to figure out the issue here. Maybe it is due to num_classes. I have two non-background object classes. Should I include labels as 0 and 1, or as 1 and 2?

@Shruthi-Sampathkumar
Copy link

Could you share your config parameters such as shuffle_buffer_size, queue_capacity, prefetch_size, and min_after_dequeue, along with the unroll length and video length? I think I need to modify those. My network is not learning. @KanaSukita @yuchen2580

@KanaSukita
Copy link

My config didn't get my network learning either.

@Shruthi-Sampathkumar
Copy link

Should we convert labels to one hot encoding before converting to tfrecords? @KanaSukita @yuchen2580.

@Shruthi-Sampathkumar
Copy link

Shruthi-Sampathkumar commented Jul 23, 2019

My num_classes in the config is 2 (I have two non-background classes). My input tfrecord contains class indices 0 (background), 1, and 2. I have enabled add_background_class. I am getting a shape mismatch error between logits and labels: logits are (10, 324, 3) and labels are (10, 324, 2). May I know where I am going wrong? I am not able to understand why my target tensor ends up with class depth 2. Thanks in advance for the help.

@rrtaylor
Copy link
Contributor

rrtaylor commented Aug 2, 2019

It sounds like your model is assuming that there is a background class, but your label map does not account for one. Are you configuring your labels such that class index 0 is reserved for the background class?
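To make the shape bookkeeping concrete, here is a minimal NumPy sketch, assuming the usual convention that num_classes counts foreground classes only and index 0 is reserved for background: with two foreground classes the classification head outputs num_classes + 1 = 3 scores per anchor, so one-hot targets built without the background slot end up with depth 2 and clash with the depth-3 logits, which matches the (10, 324, 3) vs (10, 324, 2) mismatch above.

import numpy as np

num_classes = 2                       # foreground classes only
class_indices = np.array([0, 1, 2])   # 0 = background, 1 and 2 = the two object classes

# Targets that include the background slot: depth num_classes + 1 = 3 (matches the logits).
labels_with_bg = np.eye(num_classes + 1)[class_indices]               # shape (3, 3)

# Targets built without the background slot: depth num_classes = 2 (clashes with the logits).
labels_no_bg = np.eye(num_classes)[np.maximum(class_indices - 1, 0)]  # shape (3, 2)

print(labels_with_bg.shape, labels_no_bg.shape)  # (3, 3) (3, 2)

If I understand the config correctly, num_classes should count only the foreground classes (2 here), the label map should start at index 1, and index 0 stays free for the implicit background class.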

@poltavski
Copy link

@KanaSukita @yuchen2580 thank you for providing the information.
Were your training results adequate for real use? If so, did you train your models from scratch on VID 2015, or did you use pretrained SSD layers as mentioned in the paper?
If the results were not usable, what other approaches or models did you move on to?

@gtfaiwxm
Copy link

@KanaSukita @yuchen2580 I also trained the model on VID 2015. The result is similar to yours.
I trained it for num_steps = 200000 steps with video_length = 4; the total loss reaches 0.8-1.2.
The evaluation result is similar to yours. What's more, I evaluated on the training data, and the result is
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.001
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.003
I0823 11:58:52.690257 139904859080448 eval_util.py:80] Writing metrics to tf summary.
I0823 11:58:52.690582 139904859080448 eval_util.py:87] DetectionBoxes_Precision/mAP: 0.000172
I0823 11:58:52.690917 139904859080448 eval_util.py:87] DetectionBoxes_Precision/mAP (large): 0.000198
I0823 11:58:52.691135 139904859080448 eval_util.py:87] DetectionBoxes_Precision/mAP (medium): 0.000000
I0823 11:58:52.691316 139904859080448 eval_util.py:87] DetectionBoxes_Precision/mAP (small): 0.000000
I0823 11:58:52.691494 139904859080448 eval_util.py:87] DetectionBoxes_Precision/mAP@.50IOU: 0.001100
I0823 11:58:52.691670 139904859080448 eval_util.py:87] DetectionBoxes_Precision/mAP@.75IOU: 0.000010
I0823 11:58:52.691846 139904859080448 eval_util.py:87] DetectionBoxes_Recall/AR@1: 0.000519
I0823 11:58:52.692019 139904859080448 eval_util.py:87] DetectionBoxes_Recall/AR@10: 0.000519
I0823 11:58:52.692484 139904859080448 eval_util.py:87] DetectionBoxes_Recall/AR@100: 0.000519
I0823 11:58:52.692668 139904859080448 eval_util.py:87] DetectionBoxes_Recall/AR@100 (large): 0.002523
I0823 11:58:52.692843 139904859080448 eval_util.py:87] DetectionBoxes_Recall/AR@100 (medium): 0.000000
I0823 11:58:52.693013 139904859080448 eval_util.py:87] DetectionBoxes_Recall/AR@100 (small): 0.000000
Have you solved this problem? How can it be solved?

@yuchen2580
Copy link

@poltavski @gtfaiwxm @Shruthi-Sampathkumar @KanaSukita
Sorry for the late reply. I did not make further progress on this repo. The code needs to be examined carefully...

FYI, if you are not using it for research, you can use another video detector to boost speed while maintaining accuracy:
https://github.com/msracver/Deep-Feature-Flow

@gtfaiwxm
Copy link

@yuchen2580 Hello, may I ask when you expect to complete the code check?

@evgps
Copy link

evgps commented Sep 5, 2019

@yuchen2580, @gtfaiwxm, @KanaSukita, Can you please share your conversion script for Imagenet-VID?

@Enlistedman
Copy link

@yuchen2580
Hi, I attach my total loss graph from TensorBoard below. One reason, I guess, is that the input reader does not seem to apply shuffling, so for a stretch of time we train on snippets from the same image sequence; when training moves on to snippets with different classes, the loss climbs back up. I am trying to train again with shuffle: true in the config.
total_loss
I will let you know if I solve the problem, and please let me know if you figure anything out :)

@KanaSukita
Hi, I get the same result.
I suspect the parameters in the config are not the ones they used, though no specific parameters are given in their paper either.
What is your video length? The paper says 10, but the config file shows 4.
Also, the training procedure in the paper seems different.

@yuchen2580
Hi, I meet the same problem.
Have you tried adjusting the learning rate?
Any help would be appreciated.

@wangpichao
Copy link

@Enlistedman Hi, I meet the same problem: the loss does not converge. Have you figured out the problem? Any help would be appreciated.

@Enlistedman
Copy link

Sorry for the late reply. I did not solve this problem. Have you made any progress?

@wangpichao
Copy link

Not yet

@Enlistedman
Copy link

@yuchen2580 @KanaSukita Did you successfully train the model? Any help would be appreciated.

@UmarSpa
Copy link

UmarSpa commented Feb 17, 2020

Screenshot from 2020-02-17 10-20-27
My loss doesn't converge either. Has anyone been able to train this model?

@petinhoss7
Copy link

Can @gtfaiwxm, or anyone who managed to train the model, share your label map for ImageNet VID? I have issues when evaluating because of the format. Thanks

@amiiiirrrr
Copy link

amiiiirrrr commented May 5, 2020

Hi,
I faced the problem that the model was not converging. I then decided to train the model (LSTM SSD MobileNet V2) on a small part of the data, about 2069 images (60 MB), and I saw the model converge. In the article "Looking Fast and Slow: Memory-Guided Mobile Video Object Detection", the authors mention that they used a network pretrained on the ImageNet classification dataset for initialization.
(attached plots: overfit, overfit (1))

The network overfits on this small dataset, which suggests there is no problem with the tfrecords.
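For anyone who wants to repeat this overfitting sanity check, the conversion script posted earlier in this thread already exposes a --num_examples flag, so a tiny training set can be produced directly. A rough sketch; the script filename and dataset path below are placeholders for your own setup:

import subprocess

# Hypothetical invocation of the conversion script shared above; adjust the
# filename and paths. --num_examples keeps only a few videos, small enough
# that the model should overfit them if the records and config are sane.
subprocess.check_call([
    'python', 'create_vid_tfrecord.py',   # placeholder name for the script above
    '--root_dir', '/path/to/ILSVRC2015',  # placeholder dataset root
    '--set', 'train',
    '--output_path', './data/VID2015_small',
    '--num_shards', '1',
    '--num_examples', '5',
])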

@uvipen
Copy link

uvipen commented Jun 18, 2020

@amirzzzz Hi, could you please share the code you use for creating tfrecords from VID 2015, and your label_map? Thank you in advance.

@meikorol
Copy link

Can you tell me how to train the LSTM object detection code? I have no clue how to train it.
