Various module improvements #165

Merged · 13 commits · Aug 23, 2023
Conversation

RussellALA
Collaborator

These changes include a few stability improvements and minor bugfixes to FrEIA modules:

  • The hard permutation in the AllInOne coupling block was changed from a permutation matrix multiplication (ch x ch parameters) to an indexing tensor (ch parameters), which significantly reduces model size, especially after flattening high-dimensional data (see the sketch after this list)
  • softclamping in the AllInOne block now no longer scales the near-linear part by the clamping boundaries (clamp * tanh(x) -> clamp * tanh(x/clamp))
  • added a new parameter "domain_clamping" to BinnedSplineBase, which softclamps the total domain width and height of the spline to (-domain_clamping, domain_clamping). Since the total width/height is always > 0 before clamping, this effectively restricts the spline domain to (0, domain_clamping).
  • In rational quadratic splines, if the discriminant check fails, the violating value is now included in the error message, to differentiate between numerical errors (a negative discriminant close to 0) and training instabilities (NaN values).
  • fixed a bug in ActNorm that prevented loading of >1D ActNorm layers. The ActNorm scale and mean are now initialized full-dimensionally in the constructor, not just on the channel dimension (previously the full shape only appeared after the values were set on the first batch).
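For illustration, a minimal, self-contained sketch (not the FrEIA implementation) of the permutation change from the first bullet: applying a fixed permutation via an index tensor is equivalent to multiplying by a permutation matrix, but stores only ch values instead of ch x ch.

import torch

ch = 512
x = torch.randn(8, ch)
perm = torch.randperm(ch)

# Old style: dense (ch x ch) permutation matrix.
w_perm = torch.eye(ch)[perm]
y_matrix = x @ w_perm.T

# New style: a single index tensor with ch entries.
y_index = x[:, perm]

assert torch.allclose(y_matrix, y_index)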

…-clamps the width of the spline to (-domain_clamping, domain_clamping). Since the total width before clamping is always > 0, effectively the domain is clamped to (0, domain_clamping).
…s to be in line with the original actnorm implementation of glow. This full shape was already used when actnorm is initialized, but the inconsistency in the constructor prevented loading of >1D actnorm models.
Collaborator

@fdraxler fdraxler left a comment

Looks good, only minor comments :)

FrEIA/modules/all_in_one_block.py (outdated, resolved)
Comment on lines 163 to 166
self.w_perm = nn.Parameter(torch.FloatTensor(w).view(channels, channels, *([1] * self.input_rank)),
                           requires_grad=False)
self.w_perm_inv = nn.Parameter(torch.FloatTensor(w.T).view(channels, channels, *([1] * self.input_rank)),
                               requires_grad=False)
Collaborator

While we're here: Isn't torch.from_numpy preferred over tensor(array)?

Collaborator

Also a bit strange that the parameter is being saved as a view. A reshape or view().contiguous() seems more appropriate, unless the view has a specific purpose?

Collaborator Author

I tested the suggested changes, and they don't seem to break any of my local or PR pipeline tests. torch.from_numpy loses the explicit typecast to float and instead uses double (the scipy default). I personally think this is unnecessary, and if you agree I propose reverting this change back to torch.FloatTensor. As for view().contiguous(), I see no reason this should not be included.
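For context, a minimal sketch (not FrEIA code) of the dtype behaviour being discussed:

import numpy as np
import torch

w = np.random.rand(4, 4)                   # numpy/scipy default to float64

print(torch.from_numpy(w).dtype)           # torch.float64 ("double"), dtype is preserved
print(torch.FloatTensor(w).dtype)          # torch.float32, explicit cast to float

# If from_numpy is kept, an explicit cast restores float32:
print(torch.from_numpy(w).float().dtype)   # torch.float32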

FrEIA/modules/splines/binned.py (outdated, resolved)
FrEIA/modules/splines/binned.py (resolved)
@fdraxler
Collaborator

And it looks like one of the tests is not passing – can you run them on your machine to see if this is caused by your changes?

…ant assertion in rational quadratic splines runtime error, raise runtime error in allinone block if x does not match dims
….from_numpy(numpy) and tensor.view() to tensor.view().contiguous()
@RussellALA
Collaborator Author

> And it looks like one of the tests is not passing – can you run them on your machine to see if this is caused by your changes?

Yes, the AIO block failed to throw an exception if the wrong input shape was given with hard permutations. → Fixed
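A hypothetical sketch of the kind of check this refers to (names and message are illustrative, not the exact FrEIA code):

# Inside the block's forward pass, assuming self.dims_in[0] holds the expected
# per-sample shape and x is the incoming tensor:
if tuple(x.shape[1:]) != tuple(self.dims_in[0]):
    raise RuntimeError(f"Expected input of shape {self.dims_in[0]}, "
                       f"but got {tuple(x.shape[1:])}.")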

@fdraxler fdraxler merged commit 2eb14cd into master Aug 23, 2023
4 checks passed
@mjack3

mjack3 commented Sep 20, 2023

The new ActNorm implementation does not match the official implementation.

@RussellALA
Collaborator Author

> The new ActNorm implementation does not match the official implementation.

In what respect do they differ? Note that the "new" implementation does not change how ActNorm functionally behaves in FrEIA; it simply fixes a bug that prevented models using ActNorm on non-vector-valued (>1D) inputs from being loaded.

@mjack3

mjack3 commented Sep 22, 2023

In the latest pull request, the ActNorm layer was modified to add parameters per channel, height, and width. However, if we look at the original paper:

"We propose an actnorm layer (for activation normalizaton), that performs an affine transformation of the activations using a scale and bias parameter per channel, similar to batch normalization".

This can also be checked using its official code. To sum up, to me the previous implementation was the correct one.

@LarsKue
Collaborator

LarsKue commented Sep 22, 2023

> In the latest pull request, the ActNorm layer was modified to add parameters per channel, height, and width.

I verified that this is true.

> The previous implementation was the correct one.

This is not correct. The later dimensions should be initialized to be of unit size. I think something like this code should solve the issue:

dims = list(dims_in)                            # e.g. [channels, height, width]
param_dims = copy(dims)                         # shallow copy (copy.copy)
param_dims[1:] = [1] * (len(param_dims) - 1)    # keep channels, set trailing dims to 1
scale = torch.empty(1, *param_dims)             # -> shape (1, channels, 1, 1)

@mjack3

mjack3 commented Sep 22, 2023

I do not understand what you mean by "The later dimensions should be initialized to be of unit size." Could you elaborate on the problem with the original implementation by @ardizzone?

self.dims_in = dims_in[0]                              # e.g. (channels, height, width)
param_dims = [1, self.dims_in[0]] + [1 for i in range(len(self.dims_in) - 1)]
self.scale = nn.Parameter(torch.zeros(*param_dims))    # shape (1, channels, 1, ..., 1)
self.bias = nn.Parameter(torch.zeros(*param_dims))

It looks perfectly fine to me.
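For a concrete check of the shapes this constructor produces (the input shape below is just a hypothetical example, with self. dropped so the snippet runs standalone):

dims_in = ((3, 32, 32),)       # hypothetical (channels, height, width) input

shape = dims_in[0]
param_dims = [1, shape[0]] + [1 for _ in range(len(shape) - 1)]
print(param_dims)              # [1, 3, 1, 1] -> one scale/bias per channel, broadcast over h and w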

@LarsKue
Collaborator

LarsKue commented Sep 22, 2023

The original code from Lynton Ardizzone, while functional, was unreadable and not maintainable. I refactored this block to reflect its intended use in f1e73d1 and e0287a2, but my implementation relies on implicit tensor broadcasting. I am not sure when this is a problem, but nevertheless @RussellALA approached me later stating that it is. In any case, for the initialization, Lynton's code should work fine, and so should my proposal above.

I will push a solution to this on Monday, if @RussellALA has not by then, but otherwise I consider this issue solved.

@mjack3

mjack3 commented Sep 22, 2023

Well, although it is a good change, ActNorm should normalize along the channels only... am I wrong?

@RussellALA
Collaborator Author

I think there has been some confusion about what the actual change introduced in this PR was and about how ActNorm should work.

> In the latest pull request, the ActNorm layer was modified to add parameters per channel, height, and width.

This is incorrect. The ActNorm layer in FrEIA does indeed act per channel x height x width, but it was already modified to do so in f1e73d1.

"We propose an actnorm layer (for activation normalizaton), that performs an affine transformation of the activations using a scale and bias parameter per channel, similar to batch normalization".

This is correct. FrEIA's ActNorm was not faithful to the original implementation before this PR and still is not. This should be fixed (in another PR ;) ).

The reason for the confusion might be that the ActNorm parameters were overwritten during initialization with the first batch. So even though we define

dim = next(iter(dims_in))[0]                        # only the channel dimension
self.log_scale = nn.Parameter(torch.empty(1, dim))  # shape (1, channels)
self.loc = nn.Parameter(torch.empty(1, dim))

in __init__, later we call

def initialize(self, batch: torch.Tensor):
    self.is_initialized.data = torch.tensor(True)
    # For a batch of shape (N, c, h, w), these statistics have shape (1, c, h, w).
    self.log_scale.data = torch.log(torch.std(batch, dim=0, keepdim=True))
    self.loc.data = torch.mean(batch, dim=0, keepdim=True)

This overwrites the tensor data with a different shape than the one the parameters were initialized with, which is what broke loading of saved >1D models.
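A minimal sketch (not the actual FrEIA code) of the direction the fix takes: the parameters are created with the full per-sample shape up front, so the data-dependent statistics below no longer change the parameter shape and >1D checkpoints load cleanly.

import torch
import torch.nn as nn

class ActNormSketch(nn.Module):
    def __init__(self, dims_in):
        super().__init__()
        shape = next(iter(dims_in))                      # e.g. (c, h, w)
        self.log_scale = nn.Parameter(torch.zeros(1, *shape))
        self.loc = nn.Parameter(torch.zeros(1, *shape))

    def initialize(self, batch: torch.Tensor):
        # std/mean over the batch dimension keep shape (1, c, h, w),
        # which now matches the parameters defined in __init__.
        self.log_scale.data = torch.log(torch.std(batch, dim=0, keepdim=True))
        self.loc.data = torch.mean(batch, dim=0, keepdim=True)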
