
Train my own dataset (string handwritten digits) #1

Closed
bilalltf opened this issue Mar 11, 2022 · 5 comments
Labels
question Further information is requested

Comments

@bilalltf

assert len(datasets) > 0, 'datasets should not be an empty iterable' # type: ignore[arg-type]
AssertionError: datasets should not be an empty iterable

My config.py :

""" Default CONFIGURATIONS """
exp_name = 'logs' # Where to store logs and models
train_data = '../data_lmdb/training/' # path to training dataset
valid_data = '../data_lmdb/validation/' # path to validation dataset

eval_data = '../data_lmdb/validation/' # path to evaluation dataset
benchmark_all_eval = False # evaluate 10 benchmark evaluation datasets

manualSeed = 1111 # for random seed setting
workers = 4 # number of data loading workers, default=4
batch_size = 768 # input batch size
num_gpu = 1 # number of GPU devices, by default 0
num_iter = 300000 # number of iterations to train for
valInterval = 2000 # Interval between each validation
saved_model = '' # path to a model to continue training from; leave as '' if there is none
FT = False # whether to do fine-tuning
adam = False # Whether to use adam (default is Adadelta)
lr = 1.0 # learning rate, default=1.0 for Adadelta
beta1 = 0.9 # beta1 for adam. default=0.9
rho = 0.95 # decay rate rho for Adadelta. default=0.95
eps = 1e-8 # eps for Adadelta. default=1e-8
grad_clip = 5 # gradient clipping value. default=5
baiduCTC = False # whether to use Baidu warp-ctc for the CTC loss
""" Data processing """
select_data = 'MJ-ST' # select training data (default is MJ-ST, which means MJ and ST used as training data)
batch_ratio = '0.5-0.5' # assign ratio for each selected data in the batch
total_data_usage_ratio = 1.0 # total data usage ratio, this ratio is multiplied to total number of data
batch_max_length = 25 # maximum-label-length
imgH = 32 # the height of the input image
imgW = 100 # the width of the input image
rgb = False # use rgb input
character = '0123456789' # character label
sensitive = False # for sensitive character mode
PAD = False # whether to keep ratio then pad for image resize
data_filtering_off = False # for data_filtering_off mode
""" Model Architecture """
Transformation = 'TPS' # Transformation stage. None|TPS
FeatureExtraction = 'ResNet' # FeatureExtraction stage. VGG|RCNN|ResNet
SequenceModeling = 'BiLSTM' # SequenceModeling stage. None|BiLSTM
Prediction = 'Attn' # Prediction stage. CTC|Attn
num_fiducial = 20 # number of fiducial points of TPS-STN
input_channel = 1 # the number of input channel of Feature extractor
output_channel = 512 # the number of output channel of Feature extractor
hidden_size = 256 # the size of the LSTM hidden state
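One thing worth checking with this config: pipelines in the style of deep-text-recognition-benchmark filter out samples whose labels contain characters outside `character` or exceed `batch_max_length` (unless `data_filtering_off` is set). If every sample is filtered out, the dataset list ends up empty and you hit exactly the `datasets should not be an empty iterable` assertion. A minimal sketch of that filtering logic (`keep_sample` is a hypothetical helper, not a function from the repo):

```python
# Hypothetical sketch of the label filtering applied during dataset
# construction: labels longer than batch_max_length, or containing
# characters outside the configured character set, are dropped.
character = '0123456789'
batch_max_length = 25

def keep_sample(label: str) -> bool:
    """Return True if the label survives filtering."""
    if len(label) > batch_max_length:
        return False
    return all(ch in character for ch in label)

labels = ['123', '4x7', '00042']
kept = [lb for lb in labels if keep_sample(lb)]
# '4x7' is dropped because 'x' is not in the digit character set
```

If your ground-truth labels contain stray characters (spaces, letters, punctuation), every sample can be silently filtered away, leaving the loader with nothing.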

I changed this line in create_lmdb_dataset.py:
env = lmdb.open(outputPath, map_size=1099511627776)
to:
env = lmdb.open(outputPath, map_size=1073741824)

because I was getting this error: lmdb.Error: There is not enough space on the disk

@yakhyo
Owner

yakhyo commented Mar 11, 2022

Hi @bilalltf, this occurs when there isn't enough space to map the data into memory. You decreased map_size, but it is still too large to fit; keep decreasing map_size until it does.

@bilalltf
Author

Thank you for your reply. I fixed that, but I still have this problem:

dataset_root: ./data_lmdb/training dataset: /
sub-directory: /. num samples: 6698
num total samples of /: 6698 x 1.0 (total_data_usage_ratio) = 6698
num samples of / per batch: 768 x 1.0 (batch_ratio) = 768
Traceback (most recent call last):
  File "C:\Users\rosetta\train.py", line 262, in <module>
    train(opt)
  File "C:\Users\rosetta\train.py", line 31, in train
    train_dataset = Batch_Balanced_Dataset(opt)
  File "C:\Users\rosetta\utils\dataset.py", line 69, in __init__
    self.dataloader_iter_list.append(iter(_data_loader))
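The traceback is cut off here, but given the assertion quoted at the top of the issue (`datasets should not be an empty iterable`, raised by PyTorch's `ConcatDataset` when it is handed an empty list), a guard that fails early with an actionable message can make this easier to debug. A hypothetical sketch (`check_nonempty` is not a function from the repo):

```python
# Hypothetical guard: raise a descriptive error if no datasets survived
# loading/filtering, instead of hitting ConcatDataset's bare assertion.
def check_nonempty(datasets, name=''):
    datasets = list(datasets)
    if not datasets:
        raise ValueError(
            f'no datasets found for {name!r}: check that select_data '
            f'matches the lmdb sub-directory names, and that label '
            f'filtering did not drop every sample')
    return datasets
```

Called just before the `ConcatDataset`/`DataLoader` construction in `Batch_Balanced_Dataset.__init__`, this turns the opaque assertion into a message pointing at the usual causes.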

@bilalltf
Author

Can you help me resolve this in a meeting?

@yakhyo
Owner

yakhyo commented Mar 11, 2022

@bilalltf yes, GMT+9, at 2 pm.

@bilalltf
Author

I fixed the issue by setting workers = 0.
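This fix fits the traceback: the failure happened at `iter(_data_loader)`, which is where PyTorch spawns the worker subprocesses. On Windows, DataLoader workers are started with the `spawn` method, and objects that don't pickle cleanly (LMDB environments, for example) can break worker startup; `workers = 0` loads data in the main process and sidesteps this entirely. A hedged sketch of making that choice explicit (`safe_workers` is a hypothetical helper, not part of the repo):

```python
import sys

# Hypothetical helper: use worker subprocesses only on platforms where
# the default multiprocessing start method is fork; on Windows (spawn),
# fall back to loading in the main process.
def safe_workers(requested: int, platform: str = sys.platform) -> int:
    if platform.startswith('win'):
        return 0
    return max(0, requested)

# e.g. in config.py:
#   workers = safe_workers(4)
```

The trade-off is throughput: with 0 workers, data loading runs serially in the training process, which can become a bottleneck on large datasets.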

@yakhyo yakhyo added the question Further information is requested label Mar 14, 2022