Training object detection model with multiple training sets #3031

Closed
izzrak opened this issue Dec 20, 2017 · 23 comments

Comments

@izzrak

izzrak commented Dec 20, 2017

I couldn't find anything in the documentation that shows how to assign multiple input paths. Is there any way to train a model with two or more datasets without converting them into one big tfrecord file?

@monomon

monomon commented Dec 20, 2017

+1, I was planning on building a script for this, but didn't think to ask if it's built-in already.

@FightForCS

+1, I want to train a detection model with more than one tfrecord.

_, string_tensor = parallel_reader.parallel_read(
    config.input_path,
    reader_class=tf.TFRecordReader,
    num_epochs=(input_reader_config.num_epochs
                if input_reader_config.num_epochs else None),
    num_readers=input_reader_config.num_readers,
    shuffle=input_reader_config.shuffle,
    dtypes=[tf.string, tf.string],
    capacity=input_reader_config.queue_capacity,
    min_after_dequeue=input_reader_config.min_after_dequeue)

This code seems to read only one tfrecord; maybe adding a loop to read multiple tfrecords would help.

@izzrak
Author

izzrak commented Dec 26, 2017

@FightForCS There is a way to read a series of tfrecords; take a look at the examples in the slim folder. It is possible to load a list of file paths by using slim.dataset.Dataset, but you may need to rewrite the script you are using.

@byungjae89

You can simply assign a list of file paths by changing the config file

from

train_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/train.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.pbtxt"
}

to

train_input_reader: {
  tf_record_input_reader {
    input_path: ["PATH_TO_BE_CONFIGURED/train_a.record", 
                 "PATH_TO_BE_CONFIGURED/train_b.record"]
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.pbtxt"
}

This change may only work when the multiple tfrecord files use the same label_map.

@izzrak
Author

izzrak commented Dec 28, 2017

I'm closing this issue since I found the answer in the code.

The Object Detection API uses a parallel reader to import your dataset; here are the comments from the developers:

Usage:
      data_sources = ['path_to/train*']
      key, value = parallel_read(data_sources, tf.CSVReader, num_readers=4)

  Args:
    data_sources: a list/tuple of files or the location of the data, i.e.
      /path/to/train@128, /path/to/train* or /tmp/.../train*

So basically you can define a list of input paths as @byungjae89 mentioned above, or simply provide a glob over the input directory like

input_path: my_dataset/train/*

The reader will read the entire folder for you.
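
As a minimal sketch of what that looks like when calling the reader directly (TF 1.x with tf.contrib.slim; the file names below are only placeholders, not from this thread):

import tensorflow as tf
from tensorflow.contrib.slim.python.slim.data import parallel_reader

# Any mix of explicit files and glob patterns can be listed here.
data_sources = ["my_dataset/train_a.record",
                "my_dataset/train_b.record"]   # or simply "my_dataset/train/*"

# Returns (key, value) string tensors drawn from all of the listed sources.
_, string_tensor = parallel_reader.parallel_read(
    data_sources,
    reader_class=tf.TFRecordReader,
    num_readers=4,
    shuffle=True)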

@izzrak izzrak closed this as completed Dec 28, 2017
@willSapgreen

Hello @izzrak and @byungjae89,

Thank you for sharing the approach to training with multiple tfrecords.

However, can you confirm that the model really trains with all of those tfrecords, and not just the first one?

I followed @byungjae89's approach and added three tfrecords to the config file, intentionally using incorrect names for the 2nd and 3rd tfrecords (the 1st tfrecord has the correct name). The whole training run (200k iterations) completes without any problem, but an error pops up immediately if I use an incorrect name for the 1st tfrecord. It seems that only the 1st tfrecord is used in training.

I am working on visualizing the training input images to confirm my suspicion. Please let me know if you have had a similar experience.

Thank you.

@CasiaFan

If you are using protoc 3.5, import multiple tfrecord files in the following format:

train_input_reader: {
  tf_record_input_reader {
    input_path: "/path/to/tfrecord.1"
    input_path: "/path/to/tfrecord.2"
  }
  label_map_path: "/path/to/label_map.pbtxt"
}

@failure-to-thrive

@willSapgreen, "surprisingly", those input_path entries are not exact paths, they are PATTERNS. Scrutinizing the tf.gfile.Glob call at https://github.com/tensorflow/models/blob/master/research/object_detection/builders/dataset_builder.py#L61 reveals that you only get a NotFound error if something is wrong in the folder part of the path. Something incorrect in the file part of the path is accepted, and an empty value is returned. I.e.:
incorrect_folder/correct_file -> NotFound
correct_folder/incorrect_file -> OK
Of course, if none of your input_path entries are right, you will end up with some other error down the road anyway.
However, if at least one input_path is right, chances are that typos in the file parts of the others will be silently skipped!
Be careful!
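
A rough sketch of that behaviour (TF 1.x; the folder and file names below are made up):

import tensorflow as tf

# An existing folder with a non-matching file pattern: Glob quietly returns [].
print(tf.gfile.Glob("correct_folder/incorrect_file*"))

# A non-existent folder raises tf.errors.NotFoundError instead.
try:
    tf.gfile.Glob("incorrect_folder/correct_file*")
except tf.errors.NotFoundError as e:
    print("NotFound:", e)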

@kevin-apl

Hi @izzrak,

Could you specify where you found the documentation you quoted (the parallel_read usage notes), please?

Thanks,
Kevin

@gzchenjiajun

When filling out the tf_record_input_reader parameter, can you specify a directory plus a filename pattern, instead of filling in each sub-file one by one? Listing every file feels too time consuming.

@CasiaFan @izzrak

@CasiaFan

If you want to read all the record files under a directory, where the files end with the suffix .tfrecord, the input configuration can be written like this:

tf_record_input_reader {
    input_path: "/path/to/*.tfrecord"
  }

@gzchenjiajun

@gzchenjiajun

Generating more tfrecords and configuring the loading of multiple tfrecords is solved now, but I found that it does not help with memory (still OOM; I thought that more tfrecords could reduce memory usage and let me increase the batch size parameter). How should I handle this?

@CasiaFan @kevin-apl @failure-to-thrive @willSapgreen

@CasiaFan

CasiaFan commented Nov 17, 2019 via email

@gzchenjiajun

Quoting @CasiaFan's reply (via email): "OOM is mainly caused by a large input batch, rather than by the number of tfrecords. The tfrecords only provide the data source for training and evaluation. If you don't want to reduce the batch size, add more GPU cards or try a smaller input size."

Does that mean that once the hardware reaches its limit, the batch size can no longer be increased, whether I use one tfrecord or multiple tfrecords?
Or is there a way for multiple tfrecords to achieve a larger batch size?
For example, does loading tfrecords in batches not help with memory? Is this officially supported by tensorflow/tensorflow? I have not found much information about it.

@CasiaFan

Yep, the tf.data API operates on the input data stream from the tfrecord files. Even if multiple tfrecords are provided, the input batch size is set during reading based on the hardware and model configuration, and has nothing to do with the number of tfrecords. As for the reason to split datasets into shards, I think this post may help you. @gzchenjiajun
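
As a rough illustration (a simplified tf.data sketch, not the exact object_detection input pipeline; the file pattern and numbers are made up), the batch size is chosen when batching the stream and is bounded by GPU memory, regardless of how many record files feed it:

import tensorflow as tf

# All shards feed one stream of serialized examples.
files = tf.data.Dataset.list_files("my_dataset/train-*.record", shuffle=True)
dataset = files.interleave(tf.data.TFRecordDataset, cycle_length=4)

# The batch size is set here and is limited by GPU memory,
# whether the data came from 1 or 100 tfrecord files.
dataset = dataset.shuffle(2048).batch(24)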

@gzchenjiajun

OK, so splitting into multiple tfrecords does not help increase batch_size.
Then I have two more questions:

  1. Apart from directly upgrading the hardware (or something like half-precision inference), how can I increase batch_size?
  2. If splitting the tfrecord does not help, can the tfrecords instead be read in batches during training (read one, train on it, then discard it and start the next one)? Would that help with memory?

@CasiaFan

@CasiaFan

CasiaFan commented Nov 21, 2019 via email

@Muhammad-Talha-MT

Hi, it's quite easy to use all of the record files.
My record files share a common name prefix (screenshot of the file listing omitted).
If you have similar files, you just have to place a * at the end of the common prefix of your record names:

train_input_reader: {
  label_map_path: "/data/tf_template_detection/workspace/data/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/data/tf_template_detection/workspace/data/coco_train.record*"
  }
}

@Petros626

@byungjae89 Could you tell me whether you can also pass several paths for the train and test images? How should this be handled?

@Petros626

What if a protoc version other than 3.5 is used? I don't know which version is required for tensorflow 1.15 gpu, because I haven't set up my environment yet. I'm also interested in how you add the image paths for these different tfrecord files: if you have several tfrecord files from different data locations, should the images all be in one big train and test folder (all images from all locations), or stay in the same place as the tfrecord files?

@shakeel-sial-arhamsoft

shakeel-sial-arhamsoft commented Sep 28, 2023

I have gone through all of the solutions above but, sadly, none of them worked for me. So I found a simpler solution, which I am going to share.
I simply changed the optional parameters while creating the tfrecord file so that it produces only one output tfrecord file, which I then used.

In my case, I had to create a COCO tfrecord file, for which I used the create_coco_tf_record.py file from the TensorFlow OD model zoo git repository. It has a function named "_create_tf_record_from_coco_annotations" with a parameter "num_shards", which sets the number of output files.

Here is the link to the file: https://github.com/tensorflow/models/blob/master/research/object_detection/dataset_tools/create_coco_tf_record.py
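
Conversely, if num_shards is left greater than one, the sharded outputs follow the usual naming scheme (for example coco_train.record-00000-of-00100), and a wildcard input_path, as shown earlier in this thread, should pick up every shard; the paths below are only placeholders:

train_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/coco_train.record-*"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.pbtxt"
}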

@Petros626

In my opinion your post misses the topic. The discussion is about how to read TFRecord files from several input paths, not how to create the TFRecord in the first place, as in your COCO case.
