
Imbalance of C/NC samples #9

Closed
d-zh opened this issue Mar 31, 2021 · 1 comment


d-zh commented Mar 31, 2021

Hi,
I split the PIE and JAAD datasets as described in your paper and code, but I find that the C/NC samples are imbalanced.
In the PIE dataset, the sample counts are as follows:

| Split | NC   | C    |
|-------|------|------|
| Train | 3576 | 1194 |
| Test  | 2742 | 1074 |

In the JAAD_beh dataset, the sample counts are as follows:

| Split | NC  | C    |
|-------|-----|------|
| Train | 374 | 1760 |
| Test  | 704 | 1177 |

In the PIE dataset, NC samples far outnumber C samples. In the JAAD dataset it is the opposite: there are more C samples than NC samples. I think this imbalance is harmful for training a model. Is the split result correct? Could you please explain this distribution?

Owner

ykotseruba commented Mar 31, 2021

Yes, we are aware of the imbalance in the dataset. In JAAD the imbalance is caused by the fact that we were mostly interested in pedestrians who cross. PIE contains continuous driving footage so it corresponds to a more naturalistic distribution of pedestrians, many of whom stand near the road waiting to cross.

Please see the code of the benchmark and the text of the paper for details on how to deal with the imbalance. In short, you can either subsample the larger class or keep all the data and set the class weights inversely proportional to the number of samples in each class. Both methods work well in training; the second option gives you more training data.
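
To make the two options concrete, here is a minimal Python sketch using the PIE training counts quoted above. It is not the benchmark's actual code: the helper names `inverse_frequency_weights` and `subsample_to_smallest` are hypothetical, the labels 0 = NC and 1 = C are an assumed encoding, and the scaling `total / (n_classes * count)` is one common normalization (the same heuristic scikit-learn calls "balanced"), chosen here as an assumption.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight}, with weight inversely proportional to class count.

    Scaled so that every weight is 1.0 when the classes are perfectly balanced.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


def subsample_to_smallest(samples, labels, seed=0):
    """Randomly drop samples from larger classes to match the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = min(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        for sample in rng.sample(items, target):
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# PIE training split from the issue: 3576 NC (assumed label 0), 1194 C (assumed label 1).
labels = [0] * 3576 + [1] * 1194
samples = list(range(len(labels)))  # placeholder sample ids, not real sequences

# Option 2: keep all data, weight the loss per class.
weights = inverse_frequency_weights(labels)

# Option 1: subsample the larger class.
balanced_samples, balanced_labels = subsample_to_smallest(samples, labels)
```

With these counts the C class receives roughly three times the weight of the NC class, while subsampling keeps 1194 samples per class and discards the remaining NC samples. A dictionary shaped like `weights` is the form that, for example, Keras `Model.fit` accepts through its `class_weight` argument.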
