Imbalance of C/NC samples #9

d-zh · 2021-03-31T14:18:05Z

Hi,
I split the PIE and JAAD datasets as described in your paper and code, but I find that the C/NC samples are imbalanced.
In the PIE dataset, the sample counts are as follows:

	NC	C
Train	3576	1194
Test	2742	1074

In the JAAD_beh dataset, the sample counts are as follows:

	NC	C
Train	374	1760
Test	704	1177

In the PIE dataset, NC samples far outnumber C samples. In the JAAD dataset it is the opposite: C samples outnumber NC samples. I think this imbalance is harmful when training a model. Is the split result correct? Could you please explain this distribution?

ykotseruba · 2021-03-31T14:44:37Z

Yes, we are aware of the imbalance in the dataset. In JAAD the imbalance is caused by the fact that we were mostly interested in pedestrians who cross. PIE contains continuous driving footage so it corresponds to a more naturalistic distribution of pedestrians, many of whom stand near the road waiting to cross.

Please see the benchmark code and the text of the paper for details on how to deal with the imbalance. In short, you can either subsample the larger class or keep all the data and set the class weights inversely proportional to the number of samples in each class. Both methods work well in training; the second option gives you more training data.
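To make the two options concrete, here is a minimal sketch in plain Python. It is not the benchmark's actual code: the function names `inverse_frequency_weights` and `subsample_majority` are hypothetical, and the weight normalisation (total / (number of classes × class count)) is just one common choice. The counts are the PIE training numbers quoted above.

```python
import random


def inverse_frequency_weights(counts):
    """Per-class weights proportional to 1 / class count.

    Normalised as total / (n_classes * count), so a perfectly balanced
    dataset would give every class a weight of 1.0. (Assumed normalisation,
    not necessarily the one used in the benchmark.)
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


def subsample_majority(samples, labels, seed=0):
    """Randomly drop samples so each class keeps as many as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # rng.sample draws `target` items without replacement
        for sample in rng.sample(group, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# PIE training split counts from the issue above
pie_train_counts = {"NC": 3576, "C": 1194}
weights = inverse_frequency_weights(pie_train_counts)
print(weights)  # NC is roughly 0.67, C is roughly 2.00
```

With weighting, all 4770 PIE training samples are kept and each C sample counts about three times as much as an NC sample in the loss. With subsampling, the same split would shrink to 1194 samples per class.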

ykotseruba closed this as completed Mar 31, 2021
