Various module improvements #165

Merged · 13 commits · Aug 23, 2023
Conversation

RussellALA
Collaborator

These changes include a few stability improvements and minor bugfixes to FrEIA modules:

  • The hard permutation in the AllInOne coupling block was changed from a permutation matrix multiplication (ch x ch parameters) to an indexing tensor (ch parameters), which significantly reduces model size, especially after flattening high-dimensional data (see the sketch after this list)
  • softclamping in the AllInOne block now no longer scales the near-linear part by the clamping boundaries (clamp * tanh(x) -> clamp * tanh(x/clamp))
  • added a new parameter "domain_clamping" to BinnedSplineBase, which softclamps the total domain width and height of the spline to (-domain_clamping, domain_clamping). Since the total width/height is always > 0 before clamping, this effectively restricts the spline domain to (0, domain_clamping).
  • In rational quadratic splines, if the discriminant check fails, the violating value is now included in the error message, to differentiate between numerical errors (a negative discriminant close to 0) and training instabilities (NaN values).
  • fixed a bug in ActNorm that prevented loading of >1D ActNorm layers. The ActNorm scale and mean are now initialized full-dimensionally in the constructor, not just on the channel dimension (previously the full shape only appeared after the values were set on the first batch).
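For illustration, a minimal, self-contained sketch (not the FrEIA implementation) of the permutation change from the first bullet: applying a fixed permutation via an index tensor is equivalent to multiplying by a permutation matrix, but stores only ch values instead of ch x ch.

import torch

ch = 512
x = torch.randn(8, ch)
perm = torch.randperm(ch)

# Old style: dense (ch x ch) permutation matrix.
w_perm = torch.eye(ch)[perm]
y_matrix = x @ w_perm.T

# New style: a single index tensor with ch entries.
y_index = x[:, perm]

assert torch.allclose(y_matrix, y_index)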

…-clamps the width of the spline to (-domain_clamping, domain_clamping). Since the total width before clamping is always > 0, effectively the domain is clamped to (0, domain_clamping).
…s to be in line with the original actnorm implementation of glow. This full shape was already used when actnorm is initialized, but the inconsistency in the constructor prevented loading of >1D actnorm models.
Collaborator

@fdraxler fdraxler left a comment

Looks good, only minor comments :)

FrEIA/modules/all_in_one_block.py (outdated, resolved)
Comment on lines 163 to 166
self.w_perm = nn.Parameter(torch.FloatTensor(w).view(channels, channels, *([1] * self.input_rank)),
                           requires_grad=False)
self.w_perm_inv = nn.Parameter(torch.FloatTensor(w.T).view(channels, channels, *([1] * self.input_rank)),
                               requires_grad=False)
Collaborator

While we're here: Isn't torch.from_numpy preferred over tensor(array)?

Collaborator

Also a bit strange that the parameter is being saved as a view. A reshape or view().contiguous() seems more appropriate, unless the view has a specific purpose?

Collaborator Author

I tested the suggested changes, and they don't seem to break any of my local or PR pipeline tests. torch.from_numpy loses the explicit typecast to float and instead uses double (the scipy default). I personally think this is unnecessary, and if you agree I propose reverting this change back to torch.FloatTensor. As for view().contiguous(), I see no reason this should not be included.
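For context, a minimal sketch (not FrEIA code) of the dtype behaviour being discussed:

import numpy as np
import torch

w = np.random.rand(4, 4)                   # numpy/scipy default to float64

print(torch.from_numpy(w).dtype)           # torch.float64 ("double"), dtype is preserved
print(torch.FloatTensor(w).dtype)          # torch.float32, explicit cast to float

# If from_numpy is kept, an explicit cast restores float32:
print(torch.from_numpy(w).float().dtype)   # torch.float32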

FrEIA/modules/splines/binned.py (outdated, resolved)
FrEIA/modules/splines/binned.py (resolved)
@fdraxler
Collaborator

And it looks like one of the tests is not passing – can you run them on your machine to see if this is caused by your changes?

…ant assertion in rational quadratic splines runtime error, raise runtime error in allinone block if x does not match dims
….from_numpy(numpy) and tensor.view() to tensor.view().contiguous()
@RussellALA
Collaborator Author

> And it looks like one of the tests is not passing – can you run them on your machine to see if this is caused by your changes?

Yes, the AIO block failed to throw an exception if the wrong input shape was given with hard permutations. → Fixed
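A hypothetical sketch of the kind of check this refers to (names and message are illustrative, not the exact FrEIA code):

# Inside the block's forward pass, assuming self.dims_in[0] holds the expected
# per-sample shape and x is the incoming tensor:
if tuple(x.shape[1:]) != tuple(self.dims_in[0]):
    raise RuntimeError(f"Expected input of shape {self.dims_in[0]}, "
                       f"but got {tuple(x.shape[1:])}.")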

@fdraxler fdraxler merged commit 2eb14cd into master Aug 23, 2023
4 checks passed
@mjack3

mjack3 commented Sep 20, 2023

The new ActNorm implementation does not match the official implementation.

@RussellALA
Collaborator Author

> The new ActNorm implementation does not match the official implementation.

In what respect do they differ? Note that the "new" implementation does not change how ActNorm functionally behaves in FrEIA; it simply fixes a bug that prevented models using ActNorm on non-vector-valued (>1D) inputs from being loaded.

@mjack3

mjack3 commented Sep 22, 2023

In the latest pull request, the ActNorm layer was modified to add parameters per channel, height, and width. However, if we look at the original paper:

"We propose an actnorm layer (for activation normalizaton), that performs an affine transformation of the activations using a scale and bias parameter per channel, similar to batch normalization".

This can also be checked using its official code. To sum up, to me the previous implementation was the correct one.

@LarsKue
Collaborator

LarsKue commented Sep 22, 2023

> In the latest pull request, the ActNorm layer was modified to add parameters per channel, height, and width.

I verified that this is true.

> The previous implementation was the correct one.

This is not correct. The later dimensions should be initialized to be of unit size. I think something like this code should solve the issue:

dims = list(dims_in)                            # e.g. [channels, height, width]
param_dims = copy(dims)                         # shallow copy (copy.copy)
param_dims[1:] = [1] * (len(param_dims) - 1)    # keep channels, set trailing dims to 1
scale = torch.empty(1, *param_dims)             # -> shape (1, channels, 1, 1)

@mjack3

mjack3 commented Sep 22, 2023

I do not understand what you mean by "The later dimensions should be initialized to be of unit size." Could you elaborate on the problem with the original implementation by @ardizzone?

self.dims_in = dims_in[0]                              # e.g. (channels, height, width)
param_dims = [1, self.dims_in[0]] + [1 for i in range(len(self.dims_in) - 1)]
self.scale = nn.Parameter(torch.zeros(*param_dims))    # shape (1, channels, 1, ..., 1)
self.bias = nn.Parameter(torch.zeros(*param_dims))

It looks perfectly fine to me.
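For a concrete check of the shapes this constructor produces (the input shape below is just a hypothetical example, with self. dropped so the snippet runs standalone):

dims_in = ((3, 32, 32),)       # hypothetical (channels, height, width) input

shape = dims_in[0]
param_dims = [1, shape[0]] + [1 for _ in range(len(shape) - 1)]
print(param_dims)              # [1, 3, 1, 1] -> one scale/bias per channel, broadcast over h and w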

@LarsKue
Collaborator

LarsKue commented Sep 22, 2023

The original code from Lynton Ardizzone, while functional, was unreadable and not maintainable. I refactored this block to reflect its intended use in f1e73d1 and e0287a2, but my implementation relies on implicit tensor broadcasting. I am not sure when this is a problem, but nevertheless @RussellALA approached me later stating that it is. In any case, for the initialization, Lynton's code should work fine, and so should my proposal above.

I will push a solution to this on Monday, if @RussellALA has not by then, but otherwise I consider this issue solved.

@mjack3

mjack3 commented Sep 22, 2023

Well, although it is a good change, ActNorm should normalize along the channels only... am I wrong?

@RussellALA
Collaborator Author

I think there has been some confusion about what the actual change introduced in this PR was and about how ActNorm should work.

> In the latest pull request, the ActNorm layer was modified to add parameters per channel, height, and width.

This is incorrect. The ActNorm layer in FrEIA does indeed act per channel x height x width, but it was already modified to do so in f1e73d1.

"We propose an actnorm layer (for activation normalizaton), that performs an affine transformation of the activations using a scale and bias parameter per channel, similar to batch normalization".

This is correct. FrEIA's ActNorm was not faithful to the original implementation before this PR and still is not. This should be fixed (in another PR ;) ).

The reason for the confusion might be that the ActNorm parameters were overwritten during initialization with the first batch. So even though we define

dim = next(iter(dims_in))[0]                        # only the channel dimension
self.log_scale = nn.Parameter(torch.empty(1, dim))  # shape (1, channels)
self.loc = nn.Parameter(torch.empty(1, dim))

in __init__, later we call

def initialize(self, batch: torch.Tensor):
    self.is_initialized.data = torch.tensor(True)
    # For a batch of shape (N, c, h, w), these statistics have shape (1, c, h, w).
    self.log_scale.data = torch.log(torch.std(batch, dim=0, keepdim=True))
    self.loc.data = torch.mean(batch, dim=0, keepdim=True)

This overwrites the tensor data with a different shape than the one the parameters were initialized with, which is what broke loading of saved >1D models.
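A minimal sketch (not the actual FrEIA code) of the direction the fix takes: the parameters are created with the full per-sample shape up front, so the data-dependent statistics below no longer change the parameter shape and >1D checkpoints load cleanly.

import torch
import torch.nn as nn

class ActNormSketch(nn.Module):
    def __init__(self, dims_in):
        super().__init__()
        shape = next(iter(dims_in))                      # e.g. (c, h, w)
        self.log_scale = nn.Parameter(torch.zeros(1, *shape))
        self.loc = nn.Parameter(torch.zeros(1, *shape))

    def initialize(self, batch: torch.Tensor):
        # std/mean over the batch dimension keep shape (1, c, h, w),
        # which now matches the parameters defined in __init__.
        self.log_scale.data = torch.log(torch.std(batch, dim=0, keepdim=True))
        self.loc.data = torch.mean(batch, dim=0, keepdim=True)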
