Description
The model defined in compression/models/bmshj2018.py differs from the one described in the paper (figure below):
The paper has 3 ReLUs in the hyper-decoder, but the HyperSynthesisTransform class omits the last ReLU.
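To make the discrepancy concrete, here is a minimal PyTorch sketch of the two variants as I understand them; the channel sizes and kernel/stride choices are my assumptions, not copied from the repository:

```python
import torch.nn as nn

# Minimal sketch of the two hyper-decoder variants being compared.
# N, M and the kernel/stride choices are my assumptions, not taken
# from the actual code.
N, M = 128, 192

# Paper figure: a ReLU follows every layer, including the last one.
hyper_decoder_paper = nn.Sequential(
    nn.ConvTranspose2d(N, N, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
    nn.ConvTranspose2d(N, N, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
    nn.Conv2d(N, M, 3, stride=1, padding=1), nn.ReLU(),
)

# HyperSynthesisTransform as I read it: same stack, but the final ReLU
# is dropped, so the output is unconstrained.
hyper_decoder_code = nn.Sequential(
    nn.ConvTranspose2d(N, N, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
    nn.ConvTranspose2d(N, N, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
    nn.Conv2d(N, M, 3, stride=1, padding=1),
)
```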
Are the results shown in the paper based on the paper's definition, or the published code?
I am trying to replicate Ballé 2018 to compare against my own work. For that I used the PyTorch implementation at https://github.com/liujiaheng/compression and was getting good results, but I noticed several differences from the paper's definition. One is the missing ReLU, which is absent from this code as well. liujiaheng also replaces the Gaussian distribution with a Laplace distribution ~~and the hyper-encoder's absolute value with an exponential function~~ (which also doesn't seem to be mentioned in the paper). Once I revert those modifications I no longer seem to get results as good (although training is still far from complete).
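For clarity, this is how I understand the distribution swap I reverted; a rough sketch with made-up function names, using the usual discretized-likelihood rate estimate for the quantized latents:

```python
import torch
from torch.distributions import Laplace, Normal

# Rough sketch (names are mine): estimated bits for quantized latents y_hat
# given predicted scales, under the paper's Gaussian vs. liujiaheng's Laplace.
def estimate_bits(y_hat, scales, dist_cls=Normal, eps=1e-9):
    dist = dist_cls(torch.zeros_like(scales), scales)
    # Probability mass of each integer bin: CDF(y + 0.5) - CDF(y - 0.5).
    p = dist.cdf(y_hat + 0.5) - dist.cdf(y_hat - 0.5)
    return -torch.log2(p.clamp(min=eps)).sum()

y_hat = torch.randint(-10, 11, (1, 192, 16, 16)).float()  # made-up latents
scales = torch.rand(1, 192, 16, 16) + 0.1                  # made-up scales
bits_gaussian = estimate_bits(y_hat, scales, Normal)
bits_laplace = estimate_bits(y_hat, scales, Laplace)
```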
edit: liujiaheng's exponential function is actually in the decoder. I put it back because it seems to be mentioned in the paper's equations, and possibly in your code as well. I edited the post above with strikethrough accordingly, though that means I don't yet know the performance of each configuration.
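To illustrate the two positivity mappings in question (a sketch under my reading; shapes are made up):

```python
import torch

raw = torch.randn(1, 192, 16, 16)  # unconstrained hyper-decoder output (made-up shape)
scales_exp = torch.exp(raw)        # exponential: strictly positive scales
scales_abs = torch.abs(raw)        # absolute value: non-negative, but can be exactly zero
```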