update and fix bugs #51

Merged: 1 commit, May 15, 2018

Conversation

zhanghang1989
Owner

This PR should have addressed most of the issues:
fixes #48
fixes #47
fixes #46
fixes #45
fixes #35

d-li14 commented May 15, 2018

Sorry, in my case the loss explosion issue during evaluation reported in #35 and #45 still exists. I am using the PyTorch 0.4.0 compatible code, from encoding.nn import BatchNorm2d. Is there anything wrong? @zhanghang1989

@zhanghang1989
Owner Author

Hi @d-li14, the code is working okay with the PyTorch master branch.
Can you provide a minimal piece of code that reproduces the error? I will take a look into it.

d-li14 commented May 15, 2018

Thanks for your timely reply. I have tried the module in some semantic segmentation code. Taking drn training on the Cityscapes dataset as a simple example, just replacing this line of the original syncbn with encoding.nn.BatchNorm2d makes the losses during evaluation look weird, like this (a sketch of the swap follows the log):

[2018-05-16 03:24:16,377 segment.py:232 validate] Test: [0/31]  Time 16.048 (16.048)    Loss 26639412.0000 (26639412.0000)    Score 0.593 (0.593)
[2018-05-16 03:24:20,453 segment.py:232 validate] Test: [10/31] Time 0.405 (1.829)      Loss 33462836.0000 (29009771.8182)    Score 0.880 (0.561)
[2018-05-16 03:24:28,128 segment.py:232 validate] Test: [20/31] Time 0.412 (1.324)      Loss 20237872.0000 (27516075.8095)    Score 2.562 (0.626)
[2018-05-16 03:24:32,216 segment.py:232 validate] Test: [30/31] Time 0.423 (1.029)      Loss 15786181.0000 (22639639.2903)    Score 2.251 (0.671)
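
For reference, a minimal sketch of the kind of drop-in swap described above; the model below is an illustrative stand-in, not the actual drn network, and only the from encoding.nn import BatchNorm2d import comes from this repo.

import torch.nn as nn
from encoding.nn import BatchNorm2d   # synchronized BN shipped with this repo

def build_toy_segmenter(norm_layer=nn.BatchNorm2d):
    # Stand-in for the real segmentation network: the only thing that changes
    # when swapping in sync BN is which class builds the normalization layers.
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1),
        norm_layer(16),
        nn.ReLU(inplace=True),
        nn.Conv2d(16, 19, 1),   # 19 Cityscapes classes
    )

model = build_toy_segmenter(norm_layer=BatchNorm2d)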

@zhanghang1989
Owner Author

Typically, it is not necessary to calculate the loss in evaluation mode. You can try adding the following code at https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/nn/syncbn.py#L41

def forward(self, input):
    if not self.training:
        # In eval mode, fall back to plain batch normalization
        # (torch.nn.functional.batch_norm) with the accumulated running stats.
        return batch_norm(
            input, self.running_mean, self.running_var, self.weight, self.bias,
            self.training, self.momentum, self.eps)
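
As a quick self-contained check of what that early return does (using the stock torch.nn.BatchNorm2d here rather than the sync version), the eval-mode path is just the functional batch_norm applied with the running statistics:

import torch
import torch.nn as nn
import torch.nn.functional as F

bn = nn.BatchNorm2d(3).eval()
x = torch.randn(2, 3, 4, 4)

with torch.no_grad():
    y_module = bn(x)
    y_functional = F.batch_norm(
        x, bn.running_mean, bn.running_var, bn.weight, bn.bias,
        training=False, momentum=bn.momentum, eps=bn.eps)

print(torch.allclose(y_module, y_functional))   # True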

d-li14 commented May 15, 2018

Well, exactly, there is no need to calculate the loss in evaluation mode. But as the log above shows, the score is also abnormally low compared to training mode, which suggests the model makes completely unreasonable predictions during evaluation.

@penguinshin

I am not a user of this repo, but I am experiencing a similar issue with my own train/eval code. Basically, if I train a one-layer neural net in train mode on a single batch, then switch to eval mode and evaluate on the same batch I trained on, the loss is much worse. I'm not sure whether I'm doing something silly or whether there is a bug in PyTorch.

@zhanghang1989
Owner Author

That is because of the different behavior of the BatchNorm layer in training/eval mode.

@penguinshin

Yes, but this problem is still happening with a single batch only. Surely after training 100 steps on a single batch in training mode, the running mean/std parameters wouldn't change, right?

@zhanghang1989
Owner Author

train mode: using mean(x) and var(x)
eval mode: using accumulated_mean and accumulated_var, which are averaged over the dataset
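
A minimal sketch of that difference with the stock torch.nn.BatchNorm2d (momentum defaults to 0.1, so after a single training step the running statistics are still close to their initial values of 0 and 1):

import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(1)
x = 5.0 + 3.0 * torch.randn(8, 1, 4, 4)    # batch far from N(0, 1)

bn.train()
y_train = bn(x)             # normalized with mean(x) and var(x) of this batch

bn.eval()
with torch.no_grad():
    y_eval = bn(x)          # normalized with running_mean / running_var

print(y_train.mean().item())    # roughly 0: batch statistics remove the offset
print(y_eval.mean().item())     # far from 0: running stats are still near init

After many updates on the same batch the running estimates do converge to the batch statistics, which is why the train/eval gap usually closes once enough training steps have accumulated.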

penguinshin commented Oct 17, 2018 via email
