T1 flag optimizations (#172) #945

rouault merged 19 commits into uclouvain:master on Jun 13, 2017



rouault commented Jun 2, 2017

This patch set consists of:

  • porting Carl Hetherington's T1 patch, which optimized the encoder side
  • fixing/implementing VSC encoding in it
  • adapting C. Hetherington's tricks to the decoder side
  • using macros in MQC decoding and in the sig/ref/clean passes for better assembly generation

Results on the performance test suite (ref time is current master):

../../data/input/nonregression/kodak_2layers_lrcp.j2c, 3 iterations, 1 threads, DECOMPRESS: ref_time 2226 ms, new_time 1967 ms, (improvement) -11.6 %
../../data/input/nonregression/kodak_2layers_lrcp.j2c, 5 iterations, 2 threads, DECOMPRESS: ref_time 3019 ms, new_time 2781 ms, (improvement) -7.9 %
../../data/input/nonregression/kodak_2layers_lrcp.j2c, 10 iterations, 4 threads, DECOMPRESS: ref_time 4864 ms, new_time 4548 ms, (improvement) -6.5 %
../../data/input/conformance/p0_07.j2k, 3 iterations, 1 threads, DECOMPRESS: ref_time 5647 ms, new_time 4959 ms, (improvement) -12.2 %
../../data/input/conformance/p0_04.j2k, 10 iterations, 1 threads, DECOMPRESS: ref_time 703 ms, new_time 622 ms, (improvement) -11.5 %
../../data/input/nonregression/X_4_2K_24_185_CBR_WB_000.tif, 3 iterations, 1 threads, COMPRESS: ref_time 6588 ms, new_time 5291 ms, (improvement) -19.7 %
TOTAL: ref_time 23049 ms, new_time 20172 ms, (improvement) -12.5 %
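
For reference, the "(improvement)" column is consistent with a relative change of new_time against ref_time, so a negative value means faster. The sketch below assumes that formula; the function name improvement_percent is hypothetical and is not taken from the performance test suite.

```c
#include <assert.h>

/* Relative change of new_ms versus ref_ms, in percent.
 * Negative values mean the new code is faster.
 * Assumption: this is the formula behind the "(improvement)" column. */
static double improvement_percent(unsigned ref_ms, unsigned new_ms)
{
    assert(ref_ms > 0); /* avoid dividing by zero */
    return 100.0 * ((double)new_ms - (double)ref_ms) / (double)ref_ms;
}
```

With the first line of the table, improvement_percent(2226, 1967) is about -11.6, and with the TOTAL line, improvement_percent(23049, 20172) is about -12.5.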

On the (private) image MAPA.jp2 (recoded with standard flags), decoding time goes from 50.168 s to 45.030 s, a reduction of about 10% in decoding time as well.

rouault added some commits May 20, 2017

T1: use more compact flags to optimize cache usage in encoder passes. (

Ported from Carl Hetherington's work (actually through Matthieu Darbois's port
on top of OpenJPEG 2.1.0)

Can reduce total encoding time by 10-15%

WARNING: VSC mode is not implemented, and so is a temporary regression
that must be fixed.
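
To illustrate the general idea of compact flags, here is a minimal sketch that packs the per-sample state of 4 vertically adjacent samples into one 32-bit word, so a whole column can be tested with a single load and mask. The layout, the type flag_word_t and every name below are hypothetical; this is not the bit layout used by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 3 state bits per sample, 4 samples per word. */
typedef uint32_t flag_word_t;

#define FLAG_SIG      0x1u  /* sample is significant */
#define FLAG_REFINED  0x2u  /* sample already went through refinement */
#define FLAG_VISITED  0x4u  /* sample was coded in the current bit-plane */
#define BITS_PER_ROW  3u

/* Significance bits of all 4 rows, for a one-shot column test. */
#define ALL_SIG_MASK (FLAG_SIG | (FLAG_SIG << 3) | (FLAG_SIG << 6) | (FLAG_SIG << 9))

/* Returns w with the given flag set for the given row (0..3). */
static flag_word_t set_flag(flag_word_t w, unsigned row, uint32_t flag)
{
    assert(row < 4);
    return w | (flag << (row * BITS_PER_ROW));
}

/* Non-zero if the given flag is set for the given row (0..3). */
static int has_flag(flag_word_t w, unsigned row, uint32_t flag)
{
    assert(row < 4);
    return ((w >> (row * BITS_PER_ROW)) & flag) != 0;
}

/* Non-zero if any of the 4 samples of the column is significant. */
static int column_has_significance(flag_word_t w)
{
    return (w & ALL_SIG_MASK) != 0;
}
```

The point of such a layout is that a pass over a column touches one word instead of four separate flag entries, which is where the cache benefit would come from.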
Force inlining of mqc decoding and pass steps through heavy use of macros, so as to get better register allocation
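
As a rough illustration of the macro approach, the sketch below copies a reader's state into local variables, runs a macro-expanded step on those locals, and writes the state back once at the end. It uses a plain bit reader rather than the MQ coder, and raw_reader_t, READ_BIT and read_bits are hypothetical names, not identifiers from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reader state: next byte, current byte, bits left in it. */
typedef struct {
    const uint8_t *bp;
    uint32_t c;
    uint32_t ct;
} raw_reader_t;

/* One decoding step, expanded in place on local variables so that the
 * compiler can keep bp, c and ct in registers across iterations.
 * The caller must guarantee that enough input bytes are available. */
#define READ_BIT(bit, bp, c, ct)          \
    do {                                  \
        if ((ct) == 0) {                  \
            (c) = *(bp)++;                \
            (ct) = 8;                     \
        }                                 \
        (ct)--;                           \
        (bit) = ((c) >> (ct)) & 1u;       \
    } while (0)

/* Reads n bits (n <= 32), most significant bit first. */
static uint32_t read_bits(raw_reader_t *r, unsigned n)
{
    /* Download the state into locals once... */
    const uint8_t *bp = r->bp;
    uint32_t c = r->c;
    uint32_t ct = r->ct;
    uint32_t v = 0;
    uint32_t bit = 0;
    unsigned i;

    assert(n <= 32);
    for (i = 0; i < n; ++i) {
        READ_BIT(bit, bp, c, ct);
        v = (v << 1) | bit;
    }

    /* ...and upload it back once at the end. */
    r->bp = bp;
    r->c = c;
    r->ct = ct;
    return v;
}
```

Compared with a helper function that takes a pointer to the struct, the macro never forces the state through memory inside the loop, which is the register allocation effect the commit title refers to.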
Simplify VSC handling: instead of masking out bits when reading the 4th row,
do not set them when updating flags of the 1st row.
MQC/RAW decoder: use an artificial 0xFF 0xFF terminating marker.
This saves comparing the current pointer with the end of buffer pointer.
This results in at least a tiny speed improvement for raw decoding, and
smaller code size for MQC as well.

This removes the remains of the raw.h/.c files, which were only used for
decoding. Encoding already uses the mqc structure.
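
The sentinel idea can be sketched as follows: append an artificial 0xFF 0xFF after the data, then let the reader stop on the usual marker test (0xFF followed by a byte greater than 0x8F) instead of comparing against an end pointer. This is a simplified illustration under those assumptions, with hypothetical function names, and not the decoder code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies len payload bytes into buf and appends the artificial
 * 0xFF 0xFF terminating marker. buf must hold at least len + 2 bytes. */
static void install_sentinel(unsigned char *buf,
                             const unsigned char *payload, size_t len)
{
    assert(buf != NULL && payload != NULL);
    memcpy(buf, payload, len);
    buf[len] = 0xFF;
    buf[len + 1] = 0xFF;
}

/* Counts the bytes consumed before a marker, i.e. 0xFF followed by a
 * byte greater than 0x8F. There is no end-of-buffer comparison: the
 * sentinel guarantees that the loop terminates inside the buffer. */
static size_t count_until_marker(const unsigned char *bp)
{
    size_t n = 0;
    for (;;) {
        if (bp[0] == 0xFF && bp[1] > 0x8F) {
            return n;
        }
        ++bp;
        ++n;
    }
}
```

A 0xFF inside the data that is followed by a byte of 0x8F or less is consumed normally; only the marker pattern stops the loop, and the sentinel ensures such a pattern always exists.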

detonin added the in progress label Jun 2, 2017

rouault merged commit 9a9b069 into uclouvain:master Jun 13, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

detonin removed the in progress label Jun 13, 2017




rouault commented Jun 13, 2017

Has been merged into master.

detonin added the enhancement label Aug 3, 2017
