Coding speed for 9/7 on 32-bit platforms (x86/ARM) can be improved with a quick fix #220

Closed
gcode-importer opened this issue Apr 15, 2013 · 8 comments

@gcode-importer

Originally reported on Google Code with ID 220

The proposed patch has been tested on trunk at revision 2343.

Tested on:
Win XP SP3 x86 VC10 SP1
Linux CentOS 5.5 x86_64 compilation with -m32 (GCC 4.1.2 / Red Hat 4.1.2-48)
Linux Ubuntu 11.10 ARMEL compilation with -march=armv7-a -mfloat-abi=softfp -mfpu=neon -mtune=cortex-a9 (GCC 4.6.3 / Sourcery CodeBench Lite 2012.03-57)

The proposed patch requires neither ARMv7 nor NEON capabilities.

Measuring the overall time to compress Bretagne2.ppm, Cevennes1.bmp and X_4_2K_24_185_CBR_WB_000.tif
using "time ./opj_compress -ImgDir ./tmp/ -OutFor jp2 -I" showed a 10-15% speed-up.

Regards,
Matthieu DARBOIS


Reported by mayeut on 2013-04-15 13:55:20

@gcode-importer

This is of course an issue of type enhancement, but I didn't see how to create one...

Reported by mayeut on 2013-04-15 13:59:39

@gcode-importer

Changed the register constraints for the ARM version. This potentially saves 2 registers.

Reported by mayeut on 2013-04-16 11:13:03


- _Attachment: [fix_mul.patch](https://storage.googleapis.com/google-code-attachments/openjpeg/issue-220/comment-2/fix_mul.patch)_

@gcode-importer

Reported by malaterre on 2014-02-25 12:43:32

  • Labels added: Priority-Low
  • Labels removed: Priority-Medium

@gcode-importer

Hi,

I updated the patch for tag 2.1.0.

Please find some time ratios below. The whole encoding time is taken into account.
Input images are 8-bit grayscale images encoded using the 9/7 wavelet, and the timings include the 8-bit to 32-bit conversion.

0.964 (Linux x86, GCC 4.4)
0.983 (Linux ARMv7, GCC 4.6)
0.989 (Linux ARMv5, GCC 4.6)
0.918 (Windows x86, VC8)
0.872 (Windows x86, VC10)

x64 shows almost no improvement (as expected, less than 1%)

Regards,
Matthieu

Reported by mayeut on 2014-05-28 07:12:03


- _Attachment: [openjpeg-2.1.0-emul.patch](https://storage.googleapis.com/google-code-attachments/openjpeg/issue-220/comment-4/openjpeg-2.1.0-emul.patch)_
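A small sketch for reading the ratios above as time saved and as a speed-up factor. It assumes each ratio is (encoding time with the patch) / (encoding time without it), so values below 1 mean the patched build is faster; the function names are illustrative only.

```c
#include <stdio.h>

/* Ratios reported in the comment above. Assumption: each value is
   (encoding time with patch) / (encoding time without patch). */
static const struct {
    const char *platform;
    double ratio;
} results[] = {
    { "linux x86 gcc4.4",   0.964 },
    { "linux armv7 gcc4.6", 0.983 },
    { "linux armv5 gcc4.6", 0.989 },
    { "windows x86 vc8",    0.918 },
    { "windows x86 vc10",   0.872 },
};

/* Percentage of the original encoding time saved by the patch. */
static double time_saved_pct(double ratio)
{
    return (1.0 - ratio) * 100.0;
}

/* Throughput speed-up factor (e.g. 1.15 means 15% more images per second). */
static double speedup_factor(double ratio)
{
    return 1.0 / ratio;
}

/* Print one line per measured platform. */
static void print_report(void)
{
    size_t i;
    for (i = 0; i < sizeof results / sizeof results[0]; ++i) {
        printf("%-20s time saved %4.1f%%  speed-up x%.3f\n",
               results[i].platform,
               time_saved_pct(results[i].ratio),
               speedup_factor(results[i].ratio));
    }
}
```

Under that assumption, the VC10 figure of 0.872 corresponds to 12.8% less encoding time, or a speed-up factor of about 1.147.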

@gcode-importer

Reported by mayeut on 2014-09-18 20:31:48

  • Labels added: Type-Enhancement
  • Labels removed: Type-Defect

@gcode-importer

Given the results, I took a look at the generated assembly, and it looks like GCC and Clang are doing their
job, so hand-written assembly is not needed for Linux/macOS on x86 and ARM.

The optimization also applies to the MCT, even on x64 (it got rid of a useless operation),
where the MCT is sped up by 40%.

Reported by mayeut on 2014-12-13 10:10:42

  • Status changed: Started
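The patches attached to this issue are not reproduced here, so the following is only a minimal sketch of the idea under stated assumptions: a Q13 fixed-point format (coefficients scaled by 8192, results rounded to nearest) and hypothetical names `fix_mul_q13` and `ict_forward_q13`, which are not OpenJPEG's API. It illustrates why plain C can be enough: both operands are 32-bit values, so GCC and Clang can emit a single widening 32x32->64 multiply on 32-bit x86 and ARM. It also illustrates why the MCT benefits from the same helper: a fixed-point forward irreversible colour transform (ICT) costs nine such multiplies per pixel.

```c
#include <stdint.h>

/* Hypothetical helper: multiply a sample by a Q13 coefficient
   (coefficient = real value * 8192), rounding to nearest.
   Both operands are sign-extended 32-bit values, so the compiler can use
   one signed 32x32->64 multiply instead of a general 64x64 multiply.
   Assumes arithmetic right shift of negative values (GCC, Clang, MSVC). */
static inline int32_t fix_mul_q13(int32_t a, int32_t b)
{
    int64_t t = (int64_t)a * b; /* b is converted to int64_t implicitly */
    t += 4096;                  /* + 0.5 in Q13: round to nearest */
    return (int32_t)(t >> 13);  /* back to integer scale */
}

/* Forward ICT of JPEG 2000 Part 1 on one pixel, with the standard
   coefficients rounded to Q13:
     Y  =  0.299   R + 0.587   G + 0.114   B
     Cb = -0.16875 R - 0.33126 G + 0.5     B
     Cr =  0.5     R - 0.41869 G - 0.08131 B
   Inputs are assumed to be DC level-shifted (signed) samples. */
static void ict_forward_q13(int32_t r, int32_t g, int32_t b,
                            int32_t *y, int32_t *cb, int32_t *cr)
{
    *y  =  fix_mul_q13(r, 2449) + fix_mul_q13(g, 4809) + fix_mul_q13(b, 934);
    *cb = -fix_mul_q13(r, 1382) - fix_mul_q13(g, 2714) + fix_mul_q13(b, 4096);
    *cr =  fix_mul_q13(r, 4096) - fix_mul_q13(g, 3430) - fix_mul_q13(b, 666);
}
```

For example, a pure red sample (127, 0, 0) maps to (38, -21, 64), close to the floating-point values (37.97, -21.43, 63.5). Each output is rounded per term, so a gray pixel can come out with chroma of ±1 instead of exactly 0.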

@gcode-importer

This issue was updated by revision r2956.

Reported by mayeut on 2014-12-13 10:27:28

@gcode-importer

Still need to add the VC 8+ optimization.

Reported by mayeut on 2014-12-13 10:28:16
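For the remaining VC 8+ work, one possible approach (a sketch, not necessarily what the project adopted) is to call MSVC's `__emul` intrinsic on 32-bit x86, which returns the signed 64-bit product of two 32-bit ints, and keep the plain C expression for every other compiler. The helper name `fix_mul_q13`, the Q13 rounding, and the assumption that `__emul` is available from VC8 (`_MSC_VER` 1400) onward are all assumptions here. `<stdint.h>` types are used for brevity; very old MSVC releases that lack that header would need substitute typedefs.

```c
#include <stdint.h>
#if defined(_MSC_VER) && (_MSC_VER >= 1400) && defined(_M_IX86)
#include <intrin.h> /* __emul */
#endif

/* Hypothetical Q13 multiply: (a * b + 4096) >> 13, rounding to nearest.
   Plain "static" rather than "static inline" so that older MSVC C
   compilers accept it; optimizing compilers inline it anyway. */
static int32_t fix_mul_q13(int32_t a, int32_t b)
{
#if defined(_MSC_VER) && (_MSC_VER >= 1400) && defined(_M_IX86)
    /* 32-bit MSVC: request the signed 32x32->64 multiply explicitly
       instead of relying on the compiler to narrow a 64-bit multiply. */
    int64_t t = __emul(a, b);
#else
    /* GCC/Clang and 64-bit targets: plain C is enough. */
    int64_t t = (int64_t)a * b;
#endif
    t += 4096;                 /* + 0.5 in Q13 */
    return (int32_t)(t >> 13); /* assumes arithmetic right shift */
}
```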
