
Train my own dataset (string handwritten digits) #1

Closed
bilalltf opened this issue Mar 11, 2022 · 5 comments
Labels
question Further information is requested

Comments

@bilalltf

assert len(datasets) > 0, 'datasets should not be an empty iterable' # type: ignore[arg-type]
AssertionError: datasets should not be an empty iterable

My config.py :

""" Default CONFIGURATIONS """
exp_name = 'logs' # Where to store logs and models
train_data = '../data_lmdb/training/' # path to training dataset
valid_data = '../data_lmdb/validation/' # path to validation dataset

eval_data = '../data_lmdb/validation/' # path to evaluation dataset
benchmark_all_eval = False # evaluate 10 benchmark evaluation datasets

manualSeed = 1111 # for random seed setting
workers = 4 # number of data loading workers, default=4
batch_size = 768 # input batch size
num_gpu = 1 # number of GPU devices, by default 0
num_iter = 300000 # number of iterations to train for
valInterval = 2000 # Interval between each validation
saved_model = '' # path to a model to continue training from; leave as '' if there is none
FT = False # whether to do fine-tuning
adam = False # Whether to use adam (default is Adadelta)
lr = 1.0 # learning rate, default=1.0 for Adadelta
beta1 = 0.9 # beta1 for adam. default=0.9
rho = 0.95 # decay rate rho for Adadelta. default=0.95
eps = 1e-8 # eps for Adadelta. default=1e-8
grad_clip = 5 # gradient clipping value. default=5
baiduCTC = False # whether to use Baidu warp-ctc for the CTC loss
""" Data processing """
select_data = 'MJ-ST' # select training data (default is MJ-ST, which means MJ and ST used as training data)
batch_ratio = '0.5-0.5' # assign ratio for each selected data in the batch
total_data_usage_ratio = 1.0 # total data usage ratio, this ratio is multiplied to total number of data
batch_max_length = 25 # maximum-label-length
imgH = 32 # the height of the input image
imgW = 100 # the width of the input image
rgb = False # use rgb input
character = '0123456789' # character label
sensitive = False # for sensitive character mode
PAD = False # whether to keep ratio then pad for image resize
data_filtering_off = False # for data_filtering_off mode
""" Model Architecture """
Transformation = 'TPS' # Transformation stage. None|TPS
FeatureExtraction = 'ResNet' # FeatureExtraction stage. VGG|RCNN|ResNet
SequenceModeling = 'BiLSTM' # SequenceModeling stage. None|BiLSTM
Prediction = 'Attn' # Prediction stage. CTC|Attn
num_fiducial = 20 # number of fiducial points of TPS-STN
input_channel = 1 # the number of input channel of Feature extractor
output_channel = 512 # the number of output channel of Feature extractor
hidden_size = 256 # the size of the LSTM hidden state
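One thing worth checking with this config: pipelines in the style of deep-text-recognition-benchmark filter out samples whose labels contain characters outside `character` or exceed `batch_max_length` (unless `data_filtering_off` is set). If every sample is filtered out, the dataset list ends up empty and you hit exactly the `datasets should not be an empty iterable` assertion. A minimal sketch of that filtering logic (`keep_sample` is a hypothetical helper, not a function from the repo):

```python
# Hypothetical sketch of the label filtering applied during dataset
# construction: labels longer than batch_max_length, or containing
# characters outside the configured character set, are dropped.
character = '0123456789'
batch_max_length = 25

def keep_sample(label: str) -> bool:
    """Return True if the label survives filtering."""
    if len(label) > batch_max_length:
        return False
    return all(ch in character for ch in label)

labels = ['123', '4x7', '00042']
kept = [lb for lb in labels if keep_sample(lb)]
# '4x7' is dropped because 'x' is not in the digit character set
```

If your ground-truth labels contain stray characters (spaces, letters, punctuation), every sample can be silently filtered away, leaving the loader with nothing.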

I changed this line in create_lmdb_dataset.py:
env = lmdb.open(outputPath, map_size=1099511627776)
to:
env = lmdb.open(outputPath, map_size=1073741824)

because I was getting this error: lmdb.Error: There is not enough space on the disk

@yakhyo
Owner

yakhyo commented Mar 11, 2022

Hi @bilalltf, this occurs when there isn't enough space to map the data into memory. You decreased map_size, but it is still too large to fit; keep decreasing map_size until it does.

@bilalltf
Author

Thank you for your reply. I fixed that, but I still have this problem:

dataset_root: ./data_lmdb/training dataset: /
sub-directory: /. num samples: 6698
num total samples of /: 6698 x 1.0 (total_data_usage_ratio) = 6698
num samples of / per batch: 768 x 1.0 (batch_ratio) = 768
Traceback (most recent call last):
  File "C:\Users\rosetta\train.py", line 262, in <module>
    train(opt)
  File "C:\Users\rosetta\train.py", line 31, in train
    train_dataset = Batch_Balanced_Dataset(opt)
  File "C:\Users\rosetta\utils\dataset.py", line 69, in __init__
    self.dataloader_iter_list.append(iter(_data_loader))
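The traceback is cut off here, but given the assertion quoted at the top of the issue (`datasets should not be an empty iterable`, raised by PyTorch's `ConcatDataset` when it is handed an empty list), a guard that fails early with an actionable message can make this easier to debug. A hypothetical sketch (`check_nonempty` is not a function from the repo):

```python
# Hypothetical guard: raise a descriptive error if no datasets survived
# loading/filtering, instead of hitting ConcatDataset's bare assertion.
def check_nonempty(datasets, name=''):
    datasets = list(datasets)
    if not datasets:
        raise ValueError(
            f'no datasets found for {name!r}: check that select_data '
            f'matches the lmdb sub-directory names, and that label '
            f'filtering did not drop every sample')
    return datasets
```

Called just before the `ConcatDataset`/`DataLoader` construction in `Batch_Balanced_Dataset.__init__`, this turns the opaque assertion into a message pointing at the usual causes.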

@bilalltf
Author

Can you help me resolve this in a meeting?

@yakhyo
Owner

yakhyo commented Mar 11, 2022

@bilalltf yes, GMT+9, at 2 pm.

@bilalltf
Author

I fixed the issue by setting workers = 0.
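This fix fits the traceback: the failure happened at `iter(_data_loader)`, which is where PyTorch spawns the worker subprocesses. On Windows, DataLoader workers are started with the `spawn` method, and objects that don't pickle cleanly (LMDB environments, for example) can break worker startup; `workers = 0` loads data in the main process and sidesteps this entirely. A hedged sketch of making that choice explicit (`safe_workers` is a hypothetical helper, not part of the repo):

```python
import sys

# Hypothetical helper: use worker subprocesses only on platforms where
# the default multiprocessing start method is fork; on Windows (spawn),
# fall back to loading in the main process.
def safe_workers(requested: int, platform: str = sys.platform) -> int:
    if platform.startswith('win'):
        return 0
    return max(0, requested)

# e.g. in config.py:
#   workers = safe_workers(4)
```

The trade-off is throughput: with 0 workers, data loading runs serially in the training process, which can become a bottleneck on large datasets.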

@yakhyo yakhyo added the question Further information is requested label Mar 14, 2022