Issue running GenderTraining program on Ubuntu #205

MisterMcDuck · 2022-06-25T20:57:29Z

Hello,

I attempted to follow the instructions provided at https://github.com/takuya-takeuchi/DlibDotNet/wiki/Tutorial-for-Linux
and https://github.com/takuya-takeuchi/FaceRecognitionDotNet/tree/master/tools/GenderTraining

to train a gender model as specified. I compiled everything with CUDA support, and I can confirm that CUDA works, as I've previously trained dlib networks on this machine.

I always specified 64/desktop cuda 112 when building the libraries. However, when I try to run the training program, I receive this error:

ubuntu@ip-172-30-0-90:~/DNN/DotNet/FaceRecognitionDotNet/tools/GenderTraining/bin/x64/Release/netcoreapp2.0$ ls
DlibDotNet.dll  DlibDotNet.xml            GenderTraining.dll  GenderTraining.runtimeconfig.dev.json  libDlibDotNetNativeDnn.so                   libDlibDotNetNativeDnnGenderClassification.so
DlibDotNet.pdb  GenderTraining.deps.json  GenderTraining.pdb  GenderTraining.runtimeconfig.json      libDlibDotNetNativeDnnAgeClassification.so

ubuntu@ip-172-30-0-90:~/DNN/DotNet/FaceRecognitionDotNet/tools/GenderTraining/bin/x64/Release/netcoreapp2.0$ dotnet GenderTraining.dll train -d=/home/ubuntu/DNN/DotNet/FaceRecognitionDotNet/tools/GenderTraining/UTKDataset -b=400 -e=600 -v=20
           Dataset: /home/ubuntu/DNN/DotNet/FaceRecognitionDotNet/tools/GenderTraining/UTKDataset
             Epoch: 600
     Learning Rate: 0.001
 Min Learning Rate: 1E-05
    Min Batch Size: 400
Validation Interval: 20

Start load train images
Load train images: 7824
Start load test images
Load test images: 1954

**************************** FATAL ERROR DETECTED ****************************

Error detected at line 202.
Error detected in file /opt/data/FaceRecognitionDotNet/src/DlibDotNet/src/dlib/dlib/../dlib/dnn/trainer.h.
Error detected in function void dlib::dnn_trainer<net_type, solver_type>::train_one_step(const std::vector<typename net_type::input_type>&, const std::vector<typename net_type::training_label_type>&) [with net_type = dlib::add_loss_layer<dlib::loss_multiclass_log_, dlib::add_layer<dlib::fc_<2ul, (dlib::fc_bias_mode)0u>, dlib::add_layer<dlib::dropout_, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::fc_<512ul, (dlib::fc_bias_mode)0u>, dlib::add_layer<dlib::dropout_, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::fc_<512ul, (dlib::fc_bias_mode)0u>, dlib::add_layer<dlib::max_pool_<3l, 3l, 2, 2>, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::con_<384l, 3l, 3l, 1, 1, 1, 1>, dlib::add_layer<dlib::bn_<(dlib::layer_mode)0u>, dlib::add_layer<dlib::max_pool_<3l, 3l, 2, 2>, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::con_<256l, 5l, 5l, 1, 1, 2, 2>, dlib::add_layer<dlib::bn_<(dlib::layer_mode)0u>, dlib::add_layer<dlib::max_pool_<3l, 3l, 2, 2>, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::con_<96l, 7l, 7l, 4, 4>, dlib::input_rgb_image_sized<227ul>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void> >; solver_type = dlib::sgd; typename net_type::input_type = dlib::matrix<dlib::rgb_pixel>; typename net_type::training_label_type = long unsigned int].

Failing expression was data.size() == labels.size().


******************************************************************************

Aborted (core dumped)

I'm not sure how the two std::vectors could have differing sizes. If you think it would help, I could try this on Windows, as this is just an Amazon EC2 instance.

Thanks for any advice you can give!


takuya-takeuchi · 2022-06-27T03:40:49Z

@MisterMcDuck
This issue may be related to takuya-takeuchi/DlibDotNet#272
So I think we have to modify the DlibDotNet code.

MisterMcDuck · 2022-06-27T18:45:58Z

@takuya-takeuchi

Thanks for the link. I can confirm that with a simple modification as done in the linked PR, it's now training.

The modification I made, just for testing:

---- src/GenderClassification/dlib/dnn/loss/multiclass_log/gender/Gender.h ----
index a9daf2d..e779c16 100644
@@ -9,8 +9,8 @@
 #include "defines.h"
 #include "DlibDotNet.Native.Dnn/dlib/dnn/loss/multiclass_log/template.h"
 
-typedef unsigned long gender_out_type;
-typedef unsigned long gender_train_label_type;
+typedef uint32_t gender_out_type;
+typedef uint32_t gender_train_label_type;
 
 MAKE_LOSSMULTICLASSLOG_FUNC(gender_train_type,  matrix_element_type::RgbPixel, dlib::rgb_pixel, matrix_element_type::UInt32, gender_train_label_type, 100)

and

 src/DlibDotNet.Native.Dnn/dlib/dnn/loss/multiclass_log/LossMulticlassLogBase.h 
index 38d30f3..6f73d7d 100644
@@ -8,8 +8,8 @@
 
 #include "../LossBase.h"
 
-typedef unsigned long loss_multiclass_log_out_type;
-typedef unsigned long loss_multiclass_log_train_label_type;
+typedef uint32_t loss_multiclass_log_out_type;
+typedef uint32_t loss_multiclass_log_train_label_type;
 
 using namespace dlib;
 using namespace std;

and the result:

dotnet GenderTraining.dll train -d /media/chris/DATA/Datasets/UTKDataset/output
            Dataset: /media/chris/DATA/Datasets/UTKDataset/output
              Epoch: 300
      Learning Rate: 0.001
  Min Learning Rate: 1E-05
     Min Batch Size: 256
Validation Interval: 30

Start load train images
Load train images: 7824
Start load test images
Load test images: 1954
step#: 0     learning rate: 0.001  average loss: 0            steps without apparent progress: 0
step#: 5     learning rate: 0.001  average loss: 0.769476     steps without apparent progress: 0
step#: 9     learning rate: 0.001  average loss: 0.769381     steps without apparent progress: 0
step#: 14    learning rate: 0.001  average loss: 0.725918     steps without apparent progress: 7

If I get some time I'll try to put together a PR, but it would need to cover all the cases rather than just this one.

takuya-takeuchi · 2022-06-28T14:42:07Z

I think we should use uint64_t,
because dlib uses uint64_t when it is built on Linux.
Otherwise, using uint32_t causes an 'explicit type conversion'.

But you can continue training with your code.
This is not a serious problem; at most it could cause a compile warning.
Thanks :)
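
A minimal sketch of why the width of the label type matters, not taken from DlibDotNet: `unsigned long` is 64 bits wide on typical 64-bit Linux and macOS toolchains but 32 bits wide on 64-bit Windows, whereas `uint32_t` and `uint64_t` have the same width everywhere. The helper names below are hypothetical and exist only for illustration.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Width in bits of type T on the platform this is compiled for.
// (Hypothetical helper, not part of dlib or DlibDotNet.)
template <typename T>
constexpr std::size_t bit_width_of() {
    return sizeof(T) * CHAR_BIT;
}

// True when `unsigned long` has the same width as a 32-bit label type.
constexpr bool unsigned_long_is_32_bit() {
    return bit_width_of<unsigned long>() == 32;
}

// True when `unsigned long` has the same width as a 64-bit label type.
constexpr bool unsigned_long_is_64_bit() {
    return bit_width_of<unsigned long>() == 64;
}
```

On an LP64 platform such as the Ubuntu instance above, `unsigned_long_is_64_bit()` would be expected to evaluate to true, which is consistent with a native `unsigned long` typedef not matching a UInt32 label array.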

MisterMcDuck · 2022-06-29T23:16:02Z

I did see issues when keeping UInt32, e.g. half-sized arrays during the validation phase, but worked around them. Out of curiosity I implemented UInt64 support for LossMulticlassLog, but I think these would be breaking changes for the library, which I assume you would want to avoid. I separated the commits into basic support in std::vector and the breaking changes in LossMulticlassLog; if you're interested, you can see them here:

takuya-takeuchi/DlibDotNet@master...MisterMcDuck:DlibDotNet:feature/UInt64

I guess overrides could be used, but I don't know how much interest there is in these changes.
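
One plausible reading of the half-sized arrays, stated here as an assumption rather than something confirmed in this thread: if a buffer is filled with N elements of one width and then viewed as elements of a different width, the apparent element count scales with the ratio of the two sizes. The function name below is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Number of whole `To` elements that fit in a buffer that was filled with
// `count` elements of type `From`. This is plain size arithmetic; it does
// not read or reinterpret any memory.
// (Hypothetical helper, not part of dlib or DlibDotNet.)
template <typename From, typename To>
constexpr std::size_t apparent_count(std::size_t count) {
    return count * sizeof(From) / sizeof(To);
}
```

With 400 labels written as `uint32_t` and read as `uint64_t`, `apparent_count<std::uint32_t, std::uint64_t>(400)` gives 200, which would be enough to trip a check such as `data.size() == labels.size()`.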

takuya-takeuchi · 2022-07-23T09:43:29Z

@MisterMcDuck
Thanks for your contribution, and sorry for the late reply.

I created a new PR from your branch.
takuya-takeuchi/DlibDotNet#281

Your change looks good to me :)
TBH, I am not worried about breaking changes.

I will try to build and test it on Windows, Linux and OSX.

Thanks a lot.

takuya-takeuchi · 2022-07-29T16:01:23Z

It should be resolved in 1.3.0.7.

takuya-takeuchi self-assigned this Jun 27, 2022

takuya-takeuchi added the bug Something isn't working label Jun 27, 2022

takuya-takeuchi mentioned this issue Jul 19, 2022

System.AccessViolationException when running the age training #206

Closed

This was referenced Jul 23, 2022

Fix wrong type for LossMulticlassLog takuya-takeuchi/DlibDotNet#281

Closed

Fix wrong type for LossMulticlassLog takuya-takeuchi/DlibDotNet#282

Merged

Age/Gender/Emotion classification does not work on Linux/OSX #208

Merged

takuya-takeuchi closed this as completed Jul 29, 2022
