
In the generation phase, is it necessary to increase the batch size to the length of the dataset? #7

Closed
jiangtann opened this issue Aug 11, 2020 · 3 comments

Comments

@jiangtann

https://github.com/THUDM/GCC/blob/master/generate.py#L90

I find that if I comment out this line, i.e. keep the batch size at 32, inference is much faster.

Why do you set the batch size to the length of the dataset? Does it affect the performance?
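For context, my understanding is that the referenced line effectively does something like the following (a rough sketch, not the exact code from generate.py; the loader setup here is simplified):

```python
import torch

def build_eval_loader(dataset, batch_size=32, num_workers=12, full_batch=True):
    # Sketch of the line in question: override the batch size so the whole
    # dataset is processed as a single batch.
    if full_batch:
        batch_size = len(dataset)
    return torch.utils.data.DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=False,            # keep embeddings aligned with the input order
        num_workers=num_workers,
    )
```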

@qibinc
Collaborator

qibinc commented Aug 11, 2020

Hi @jiangtanzju ,

TL;DR: It won't affect the performance.

We once ran a set of experiments that generated embeddings with BatchNorm kept in train mode, which computes the batch mean and variance on the fly and mitigates the discrepancy between the downstream graphs and the pretraining graphs. In that case, if the instances in the test dataset are not shuffled, a small batch size causes data leakage, which is why we set the batch size to the length of the dataset in the first place. Unless you want to try the same thing, you can safely ignore this line.
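To make the leakage point concrete, here is a minimal illustration (my own sketch, not GCC code) of how BatchNorm in train mode makes an instance's output depend on the other instances in its batch, which is why a full-dataset batch was used in that experiment:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
bn.train()                       # train mode: normalize with the current batch's stats

x = torch.randn(64, 4)
full = bn(x)                                             # one full-dataset batch
small = torch.cat([bn(chunk) for chunk in x.split(8)])   # bs = 8, unshuffled

# The outputs differ because each small batch is normalized with its own
# mean/variance, so an instance's embedding depends on its batchmates.
print(torch.allclose(full, small))   # False
```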

As for the inference time, this line shouldn't have much effect. I ran time xx twice, with and without this line; the CPU time was:

  1. 314.11s system
  2. 312.81s system

@jiangtann
Author

jiangtann commented Aug 11, 2020

OK, I see.

The inference speed on my side differs by about a factor of ten. Observing carefully with htop, I find that when bs is the length of the dataset, only one dataloader num_worker is active. When bs is the length of the dataset // 2, only 2 num_workers are active. When bs is the length of the dataset // 4, only 4 num_workers are active, even though my num_workers=12.

Only when bs < len(dataset) // num_workers do all the workers run at once (see the sketch below).
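My understanding of why this happens (a rough sketch, not code from the repo): each DataLoader worker prepares whole batches, so the number of workers that can be busy at once is capped by the number of batches, i.e. ceil(len(dataset) / batch_size).

```python
import math

def busy_workers(dataset_len, batch_size, num_workers=12):
    # Each DataLoader worker builds whole batches, so the number of busy
    # workers is bounded by the number of batches.
    num_batches = math.ceil(dataset_len / batch_size)
    return min(num_workers, num_batches)

n = 10000
print(busy_workers(n, n))        # 1   (bs = len(dataset))
print(busy_workers(n, n // 2))   # 2
print(busy_workers(n, n // 4))   # 4
print(busy_workers(n, n // 8))   # 8
print(busy_workers(n, n // 16))  # 12  (capped by num_workers)
```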

You can see that when I set bs = the length of the dataset // 8, only 8 num_workers are working, while the other num_workers sit idle:
(htop screenshot)

Anyway, thanks for your detailed reply 😋.

@qibinc
Collaborator

qibinc commented Aug 11, 2020

Hi @jiangtanzju ,

The computation in each dataloader worker is spent mainly in scipy.linalg.eigsh, here: https://github.com/THUDM/GCC/blob/master/gcc/datasets/data_util.py#L251

It seems that in your setup this function is not parallelized, so a single dataloader worker only utilizes 100% of one CPU. With MKL LAPACK installed, this function can run multi-threaded; in that case, even if the other dataloader workers are idle, the one active worker utilizes all of the CPUs, so in my setup the time doesn't change much. Anyway, this shouldn't matter, since you can simply increase the number of active loaders by decreasing batch_size.
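For anyone hitting the same bottleneck, here is a rough sketch (my own example, not the code in data_util.py) of the kind of eigsh call involved. Whether it runs multi-threaded depends on the BLAS/LAPACK backend (e.g. MKL), and the thread count can be controlled with environment variables such as MKL_NUM_THREADS / OMP_NUM_THREADS:

```python
import numpy as np
import scipy.sparse as sparse
from scipy.sparse.linalg import eigsh

# Toy symmetric sparse graph.
n = 500
adj = sparse.random(n, n, density=0.02, format="csr", random_state=0)
adj = adj + adj.T

# Symmetrically normalized adjacency D^{-1/2} A D^{-1/2}; its top eigenvectors
# correspond to the smallest eigenvectors of the normalized Laplacian, which
# serve as positional features for each sampled subgraph.
degs = np.asarray(adj.sum(axis=1)).ravel()
inv_sqrt = sparse.diags(1.0 / np.sqrt(np.maximum(degs, 1e-12)))
norm_adj = inv_sqrt @ adj @ inv_sqrt

# The ARPACK/LAPACK-heavy call that dominates each worker's CPU time.
vals, vecs = eigsh(norm_adj, k=8, which="LA", tol=1e-2)
print(vals.shape, vecs.shape)
```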
