Training object detection model with multiple training sets #3031

Closed
izzrak opened this issue Dec 20, 2017 · 23 comments

Comments

@izzrak

izzrak commented Dec 20, 2017

I couldn't find anything in the documentation that shows how to assign multiple input paths. Is there any way to train a model with two or more datasets without converting them into one big tfrecord file?

@monomon

monomon commented Dec 20, 2017

+1, I was planning on building a script for this, but didn't think to ask if it's built-in already.

@FightForCS

+1, I want to train a detection model with more than one tfrecord.

_, string_tensor = parallel_reader.parallel_read(
    config.input_path,
    reader_class=tf.TFRecordReader,
    num_epochs=(input_reader_config.num_epochs
                if input_reader_config.num_epochs else None),
    num_readers=input_reader_config.num_readers,
    shuffle=input_reader_config.shuffle,
    dtypes=[tf.string, tf.string],
    capacity=input_reader_config.queue_capacity,
    min_after_dequeue=input_reader_config.min_after_dequeue)

This code seems to read only one tfrecord; maybe adding a loop to read multiple tfrecords would help.

@izzrak
Author

izzrak commented Dec 26, 2017

@FightForCS There is a way to read a series of tfrecords; take a look at the examples in the slim folder. It is possible to load a list of file paths by using slim.dataset.Dataset, but you may need to rewrite the script you are using.

@byungjae89

You can simply assign a list of file paths by changing the config file

from

train_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/train.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.pbtxt"
}

to

train_input_reader: {
  tf_record_input_reader {
    input_path: ["PATH_TO_BE_CONFIGURED/train_a.record", 
                 "PATH_TO_BE_CONFIGURED/train_b.record"]
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.pbtxt"
}

This change may only work when the multiple tfrecord files use the same label_map.

@izzrak
Author

izzrak commented Dec 28, 2017

I'm closing this issue since I found the answer in the code.

The Object Detection API uses a parallel reader to import your dataset; here are the comments from the developers:

Usage:
      data_sources = ['path_to/train*']
      key, value = parallel_read(data_sources, tf.CSVReader, num_readers=4)

  Args:
    data_sources: a list/tuple of files or the location of the data, i.e.
      /path/to/train@128, /path/to/train* or /tmp/.../train*

So basically you can define a list of input paths as @byungjae89 mentioned above, or simply provide a glob over the input directory like

input_path: my_dataset/train/*

The reader will read the entire folder for you.
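
As a minimal sketch of what that looks like when calling the reader directly (TF 1.x with tf.contrib.slim; the file names below are only placeholders, not from this thread):

import tensorflow as tf
from tensorflow.contrib.slim.python.slim.data import parallel_reader

# Any mix of explicit files and glob patterns can be listed here.
data_sources = ["my_dataset/train_a.record",
                "my_dataset/train_b.record"]   # or simply "my_dataset/train/*"

# Returns (key, value) string tensors drawn from all of the listed sources.
_, string_tensor = parallel_reader.parallel_read(
    data_sources,
    reader_class=tf.TFRecordReader,
    num_readers=4,
    shuffle=True)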

@izzrak izzrak closed this as completed Dec 28, 2017
@willSapgreen

Hello @izzrak and @byungjae89,

Thank you for sharing the approach to training with multiple tfrecords.

However, can you confirm that the model really trains with all of those tfrecords, and not just the first one?

I followed @byungjae89's approach and added three tfrecords to the config file, intentionally using incorrect names for the 2nd and 3rd tfrecords (the 1st tfrecord has the correct name). The whole training run (200k iterations) completes without any problem, but an error pops up immediately if I use an incorrect name for the 1st tfrecord. It seems that only the 1st tfrecord is used in training.

I am working on visualizing the training input images to confirm my suspicion. Please let me know if you have had a similar experience.

Thank you.

@CasiaFan

If you are using protoc 3.5, import multiple tfrecord files in the following format:

train_input_reader: {
  tf_record_input_reader {
    input_path: "/path/to/tfrecord.1"
    input_path: "/path/to/tfrecord.2"
  }
  label_map_path: "/path/to/label_map.pbtxt"
}

@failure-to-thrive

@willSapgreen, "surprisingly", those input_path entries are not exact paths, they are PATTERNS. Scrutinizing the tf.gfile.Glob call at https://github.com/tensorflow/models/blob/master/research/object_detection/builders/dataset_builder.py#L61 reveals that you only get a NotFound error if something is wrong in the folder part of the path. Something incorrect in the file part of the path is accepted, and an empty value is returned. I.e.:
incorrect_folder/correct_file -> NotFound
correct_folder/incorrect_file -> OK
Of course, if none of your input_path entries are right, you will end up with some other error down the road anyway.
However, if at least one input_path is right, chances are that typos in the file parts of the others will be silently skipped!
Be careful!
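
A rough sketch of that behaviour (TF 1.x; the folder and file names below are made up):

import tensorflow as tf

# An existing folder with a non-matching file pattern: Glob quietly returns [].
print(tf.gfile.Glob("correct_folder/incorrect_file*"))

# A non-existent folder raises tf.errors.NotFoundError instead.
try:
    tf.gfile.Glob("incorrect_folder/correct_file*")
except tf.errors.NotFoundError as e:
    print("NotFound:", e)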

@kevin-apl

Hi @izzrak,

Could you specify where you found the documentation you quoted (the parallel_read usage notes), please?

Thanks,
Kevin

@gzchenjiajun

When filling out the tf_record_input_reader parameter, can you specify a directory plus a filename pattern, instead of filling in each sub-file one by one? Listing every file feels too time consuming.

@CasiaFan @izzrak

@CasiaFan

If you want to read all the record files under a directory, where the files end with the suffix .tfrecord, the input configuration can be written like this:

tf_record_input_reader {
    input_path: "/path/to/*.tfrecord"
  }

@gzchenjiajun

@gzchenjiajun

Generating more tfrecords and configuring the loading of multiple tfrecords is solved now, but I found that it does not help with memory (still OOM; I thought that more tfrecords could reduce memory usage and let me increase the batch size parameter). How should I handle this?

@CasiaFan @kevin-apl @failure-to-thrive @willSapgreen

@CasiaFan

CasiaFan commented Nov 17, 2019 via email

@gzchenjiajun

Quoting @CasiaFan's reply (via email): "OOM is mainly caused by a large input batch, rather than by the number of tfrecords. The tfrecords only provide the data source for training and evaluation. If you don't want to reduce the batch size, add more GPU cards or try a smaller input size."

Does that mean that once the hardware reaches its limit, the batch size can no longer be increased, whether I use one tfrecord or multiple tfrecords?
Or is there a way for multiple tfrecords to achieve a larger batch size?
For example, does loading tfrecords in batches not help with memory? Is this officially supported by tensorflow/tensorflow? I have not found much information about it.

@CasiaFan

Yep, the tf.data API operates on the input data stream from the tfrecord files. Even if multiple tfrecords are provided, the input batch size is set during reading based on the hardware and model configuration, and has nothing to do with the number of tfrecords. As for the reason to split datasets into shards, I think this post may help you. @gzchenjiajun
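
As a rough illustration (a simplified tf.data sketch, not the exact object_detection input pipeline; the file pattern and numbers are made up), the batch size is chosen when batching the stream and is bounded by GPU memory, regardless of how many record files feed it:

import tensorflow as tf

# All shards feed one stream of serialized examples.
files = tf.data.Dataset.list_files("my_dataset/train-*.record", shuffle=True)
dataset = files.interleave(tf.data.TFRecordDataset, cycle_length=4)

# The batch size is set here and is limited by GPU memory,
# whether the data came from 1 or 100 tfrecord files.
dataset = dataset.shuffle(2048).batch(24)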

@gzchenjiajun

OK, so splitting into multiple tfrecords does not help increase batch_size.
Then I have two more questions:

  1. Apart from directly upgrading the hardware (or something like half-precision inference), how can I increase batch_size?
  2. If splitting the tfrecord does not help, can the tfrecords instead be read in batches during training (read one, train on it, then discard it and start the next one)? Would that help with memory?

@CasiaFan

@CasiaFan

CasiaFan commented Nov 21, 2019 via email

@Muhammad-Talha-MT

Hi, it's quite easy to use all of the record files.
My record files share a common name prefix (screenshot of the file listing omitted).
If you have similar files, you just have to place a * at the end of the common prefix of your record names:

train_input_reader: {
  label_map_path: "/data/tf_template_detection/workspace/data/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/data/tf_template_detection/workspace/data/coco_train.record*"
  }
}

@Petros626

@byungjae89 Could you tell me whether you can also pass several paths for the train and test images? How should this be handled?

@Petros626

What if a protoc version other than 3.5 is used? I don't know which version is required for tensorflow 1.15 gpu, because I haven't set up my environment yet. I'm also interested in how you add the image paths for these different tfrecord files: if you have several tfrecord files from different data locations, should the images all be in one big train and test folder (all images from all locations), or stay in the same place as the tfrecord files?

@shakeel-sial-arhamsoft

shakeel-sial-arhamsoft commented Sep 28, 2023

I have gone through all of the solutions above but, sadly, none of them worked for me. So I found a simpler solution, which I am going to share.
I simply changed the optional parameters while creating the tfrecord file so that it produces only one output tfrecord file, which I then used.

In my case, I had to create a COCO tfrecord file, for which I used the create_coco_tf_record.py file from the TensorFlow OD model zoo git repository. It has a function named "_create_tf_record_from_coco_annotations" with a parameter "num_shards", which sets the number of output files.

Here is the link to the file: https://github.com/tensorflow/models/blob/master/research/object_detection/dataset_tools/create_coco_tf_record.py
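
Conversely, if num_shards is left greater than one, the sharded outputs follow the usual naming scheme (for example coco_train.record-00000-of-00100), and a wildcard input_path, as shown earlier in this thread, should pick up every shard; the paths below are only placeholders:

train_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/coco_train.record-*"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.pbtxt"
}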

@Petros626

In my opinion your post misses the topic. The discussion is about how to read TFRecord files from several input paths, not how to create the TFRecord in the first place, as in your COCO case.
