Reduce memory decoding #968
Merged
We can use the same buffer for the tile decoding and the final image, and save the intermediate buffer to transfer between those.

Effect on the decoding of MAPA (9944 x 13498 x 3 components of size byte)
Peak memory from 4.5 GB to 2.7 GB

Now:
n5: 2699708767 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 1610689344 0x4E77E07: opj_aligned_malloc (opj_malloc.c:61) <-- final image
  n1: 1610689344 0x4E7195B: opj_alloc_tile_component_data (tcd.c:676)
   n1: 1610689344 0x4E722D2: opj_tcd_init_decode_tile (tcd.c:816)
    n1: 1610689344 0x4E4BCF1: opj_j2k_read_tile_header (j2k.c:8597)
     n1: 1610689344 0x4E4C742: opj_j2k_decode_tiles (j2k.c:10324)
      n1: 1610689344 0x4E4E20E: opj_j2k_decode (j2k.c:7826)
       n1: 1610689344 0x4E52E42: opj_jp2_decode (jp2.c:1564)
        n0: 1610689344 0x40369E: main (opj_decompress.c:1459)
 n1: 815554560 0x4E72231: opj_tcd_init_decode_tile (tcd.c:1217) <-- working memory for code blocks: 9944*13498/64/64*8192*3
  n1: 815554560 0x4E4BCF1: opj_j2k_read_tile_header (j2k.c:8597)
   n1: 815554560 0x4E4C742: opj_j2k_decode_tiles (j2k.c:10324)
    n1: 815554560 0x4E4E20E: opj_j2k_decode (j2k.c:7826)
     n1: 815554560 0x4E52E42: opj_jp2_decode (jp2.c:1564)
      n0: 815554560 0x40369E: main (opj_decompress.c:1459)
 n1: 219758391 0x4E4C0BF: opj_j2k_read_tile_header (j2k.c:4661) <-- ingestion of code stream
  n1: 219758391 0x4E4C742: opj_j2k_decode_tiles (j2k.c:10324)
   n1: 219758391 0x4E4E20E: opj_j2k_decode (j2k.c:7826)
    n1: 219758391 0x4E52E42: opj_jp2_decode (jp2.c:1564)
     n0: 219758391 0x40369E: main (opj_decompress.c:1459)
 n1: 39822000 0x4E7224F: opj_tcd_init_decode_tile (tcd.c:1224) <-- OPJ_J2K_DEFAULT_NB_SEGS*sizeof(opj_tcd_seg_t) per codeblock
  n1: 39822000 0x4E4BCF1: opj_j2k_read_tile_header (j2k.c:8597)
   n1: 39822000 0x4E4C742: opj_j2k_decode_tiles (j2k.c:10324)
    n1: 39822000 0x4E4E20E: opj_j2k_decode (j2k.c:7826)
     n1: 39822000 0x4E52E42: opj_jp2_decode (jp2.c:1564)
      n0: 39822000 0x40369E: main (opj_decompress.c:1459)
 n0: 13884472 in 49 places, all below massif's threshold (1.00%)

Before:
n5: 4493329848 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n2: 1610709160 0x4E77C87: opj_aligned_malloc (opj_malloc.c:61)
  n1: 1610689344 0x4E717DB: opj_alloc_tile_component_data (tcd.c:676)
   n1: 1610689344 0x4E72152: opj_tcd_init_decode_tile (tcd.c:816)
    n1: 1610689344 0x4E4BCF1: opj_j2k_read_tile_header (j2k.c:8597)
     n1: 1610689344 0x4E4C64A: opj_j2k_decode_tiles (j2k.c:10318)
      n1: 1610689344 0x4E4E08E: opj_j2k_decode (j2k.c:7826)
       n1: 1610689344 0x4E52CC2: opj_jp2_decode (jp2.c:1564)
        n0: 1610689344 0x40369E: main (opj_decompress.c:1459)
  n0: 19816 in 2 places, all below massif's threshold (1.00%)
 n1: 1610689344 0x4E43F36: opj_j2k_update_image_data.isra.7 (j2k.c:8743)
  n1: 1610689344 0x4E4C5C1: opj_j2k_decode_tiles (j2k.c:10358)
   n1: 1610689344 0x4E4E08E: opj_j2k_decode (j2k.c:7826)
    n1: 1610689344 0x4E52CC2: opj_jp2_decode (jp2.c:1564)
     n0: 1610689344 0x40369E: main (opj_decompress.c:1459)
 n1: 815554560 0x4E720B1: opj_tcd_init_decode_tile (tcd.c:1217)
  n1: 815554560 0x4E4BCF1: opj_j2k_read_tile_header (j2k.c:8597)
   n1: 815554560 0x4E4C64A: opj_j2k_decode_tiles (j2k.c:10318)
    n1: 815554560 0x4E4E08E: opj_j2k_decode (j2k.c:7826)
     n1: 815554560 0x4E52CC2: opj_jp2_decode (jp2.c:1564)
      n0: 815554560 0x40369E: main (opj_decompress.c:1459)
 n1: 402672336 0x4E4C545: opj_j2k_decode_tiles (j2k.c:10336)
  n1: 402672336 0x4E4E08E: opj_j2k_decode (j2k.c:7826)
   n1: 402672336 0x4E52CC2: opj_jp2_decode (jp2.c:1564)
    n0: 402672336 0x40369E: main (opj_decompress.c:1459)
 n0: 53704448 in 58 places, all below massif's threshold (1.00%)
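The byte counts reported by massif can be cross-checked against the image dimensions. The sketch below is only an arithmetic aid with hypothetical helper names. It relies on two inferences that the commit message does not state explicitly: that the 1610689344-byte buffer stores 4 bytes per sample, and that the 402672336-byte buffer removed by this change stored 1 byte per sample.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a full-image buffer.
 * bytes_per_sample is an assumption supplied by the caller. */
static uint64_t image_buffer_bytes(uint64_t width, uint64_t height,
                                   uint64_t components,
                                   uint64_t bytes_per_sample)
{
    return width * height * components * bytes_per_sample;
}

/* Number of fixed-size records contained in a reported total.
 * Asserts that the total is a whole multiple of the record size. */
static uint64_t record_count(uint64_t total_bytes, uint64_t record_bytes)
{
    assert(record_bytes != 0 && total_bytes % record_bytes == 0);
    return total_bytes / record_bytes;
}
```

With these helpers, `image_buffer_bytes(9944, 13498, 3, 4)` reproduces the 1610689344 bytes of the final image, and `record_count(815554560, 8192)` gives the number of 8192-byte code block buffers behind the "working memory for code blocks" entry. The formula in the massif annotation (9944*13498/64/64*8192*3) is an approximation of that figure, not an exact match.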
rouault force-pushed the reduce_memory_decoding branch from f21c9e2 to 6287ed2 (August 7, 2017 16:23)
b7594c0 may put opj_tcd_tilecomp_t->data allocated by opj_alloc_tile_component_data() as the image->comps[].data. As opj_alloc_tile_component_data() uses opj_aligned_malloc(), we must be sure to use opj_aligned_malloc()/opj_aligned_free() in all places where we alloc/free image->comps[].data. Note: this might have some compatibility impact in case user code itself allocates or frees image->comps[].data.
As bin/common/color.c used to directly call malloc()/free(), we need to export functions dedicated to allocating/freeing image component data.
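A minimal sketch of what such a dedicated alloc/free pair could look like is shown below. The names `image_data_alloc` and `image_data_free` are hypothetical and are not taken from the commit. The 16-byte alignment comes from the PR description, and the platform split (posix_memalign() versus _aligned_malloc()) is an assumption about how an aligned allocator is typically wrapped.

```c
#if !defined(_WIN32)
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign() */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h>               /* _aligned_malloc(), _aligned_free() */
#endif

/* Hypothetical helper: allocate image component data with 16-byte
 * alignment. Returns NULL on failure or when size is 0. */
static void *image_data_alloc(size_t size)
{
    void *ptr = NULL;
    if (size == 0) {
        return NULL;
    }
#if defined(_WIN32)
    ptr = _aligned_malloc(size, 16);
#else
    if (posix_memalign(&ptr, 16, size) != 0) {
        ptr = NULL;
    }
#endif
    return ptr;
}

/* Hypothetical helper: release memory obtained from image_data_alloc().
 * On Windows the matching deallocator must be used; plain free() on an
 * _aligned_malloc() pointer is invalid. */
static void image_data_free(void *ptr)
{
#if defined(_WIN32)
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}
```

Application code such as a color conversion routine would then call this pair instead of malloc()/free() whenever it replaces a component buffer, so the allocator and deallocator always match.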
Currently we allocate at least 8192 bytes for each codeblock, and copy the relevant parts of the codestream into that per-codeblock buffer as we decode packets. As the whole codestream for the tile is ingested in memory and stays alive during the decoding, we can point directly to it instead of copying. But to do that, we need an intermediate concept, a 'chunk' of code-stream segment, given that segments may be made of data at different places in the code-stream when quality layers are used.

With that change, peak memory for the decoding of MAPA_005.jp2 goes from the 2.7 GB of the previous improvement down to 1.9 GB.

New profile:
n4: 1885648469 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 1610689344 0x4E78287: opj_aligned_malloc (opj_malloc.c:61)
  n1: 1610689344 0x4E71D7B: opj_alloc_tile_component_data (tcd.c:676)
   n1: 1610689344 0x4E7272C: opj_tcd_init_decode_tile (tcd.c:816)
    n1: 1610689344 0x4E4BDD9: opj_j2k_read_tile_header (j2k.c:8618)
     n1: 1610689344 0x4E4C8A2: opj_j2k_decode_tiles (j2k.c:10349)
      n1: 1610689344 0x4E4E36E: opj_j2k_decode (j2k.c:7847)
       n1: 1610689344 0x4E52FA2: opj_jp2_decode (jp2.c:1564)
        n0: 1610689344 0x40374E: main (opj_decompress.c:1459)
 n1: 219232541 0x4E4BBF0: opj_j2k_read_tile_header (j2k.c:4685)
  n1: 219232541 0x4E4C8A2: opj_j2k_decode_tiles (j2k.c:10349)
   n1: 219232541 0x4E4E36E: opj_j2k_decode (j2k.c:7847)
    n1: 219232541 0x4E52FA2: opj_jp2_decode (jp2.c:1564)
     n0: 219232541 0x40374E: main (opj_decompress.c:1459)
 n1: 39822000 0x4E727A9: opj_tcd_init_decode_tile (tcd.c:1219)
  n1: 39822000 0x4E4BDD9: opj_j2k_read_tile_header (j2k.c:8618)
   n1: 39822000 0x4E4C8A2: opj_j2k_decode_tiles (j2k.c:10349)
    n1: 39822000 0x4E4E36E: opj_j2k_decode (j2k.c:7847)
     n1: 39822000 0x4E52FA2: opj_jp2_decode (jp2.c:1564)
      n0: 39822000 0x40374E: main (opj_decompress.c:1459)
 n0: 15904584 in 52 places, all below massif's threshold (1.00%)
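The 'chunk' idea can be illustrated with a small sketch. The type and function names below (`seg_chunk`, `chunks_total_len`, `chunks_gather`) are hypothetical and do not reflect the actual structures in the patch. Each chunk borrows a pointer into the tile's in-memory codestream, so nothing is copied while packets are parsed. The gather step is an assumption about how a consumer needing contiguous bytes could assemble them on demand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type: one contiguous piece of a segment. The pointer is
 * borrowed from the codestream buffer and is never freed through it. */
typedef struct {
    const uint8_t *data;
    size_t         len;
} seg_chunk;

/* Total payload length of a segment spread over n chunks. */
static size_t chunks_total_len(const seg_chunk *chunks, size_t n)
{
    size_t total = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        total += chunks[i].len;
    }
    return total;
}

/* Copy the chunks, in order, into one contiguous buffer.
 * Returns the number of bytes written, or 0 if out is too small. */
static size_t chunks_gather(const seg_chunk *chunks, size_t n,
                            uint8_t *out, size_t out_cap)
{
    size_t off = 0;
    size_t i;
    if (chunks_total_len(chunks, n) > out_cap) {
        return 0;
    }
    for (i = 0; i < n; ++i) {
        memcpy(out + off, chunks[i].data, chunks[i].len);
        off += chunks[i].len;
    }
    return off;
}
```

A segment whose bytes sit at two different offsets of the codestream (for example, one contribution per quality layer) is then described by two `seg_chunk` entries rather than by a private 8192-byte copy.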
Instead of having the chunk array at the segment level, we can move it down to the codeblock itself, since segments are filled in sequential order. This limits the number of memory allocations and slightly decreases the memory usage.

On MAPA_005.jp2:
n4: 1871312549 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 1610689344 0x4E781E7: opj_aligned_malloc (opj_malloc.c:61)
  n1: 1610689344 0x4E71D1B: opj_alloc_tile_component_data (tcd.c:676)
   n1: 1610689344 0x4E726CF: opj_tcd_init_decode_tile (tcd.c:816)
    n1: 1610689344 0x4E4BE39: opj_j2k_read_tile_header (j2k.c:8617)
     n1: 1610689344 0x4E4C902: opj_j2k_decode_tiles (j2k.c:10348)
      n1: 1610689344 0x4E4E3CE: opj_j2k_decode (j2k.c:7846)
       n1: 1610689344 0x4E53002: opj_jp2_decode (jp2.c:1564)
        n0: 1610689344 0x40374E: main (opj_decompress.c:1459)
 n1: 219232541 0x4E4BC50: opj_j2k_read_tile_header (j2k.c:4683)
  n1: 219232541 0x4E4C902: opj_j2k_decode_tiles (j2k.c:10348)
   n1: 219232541 0x4E4E3CE: opj_j2k_decode (j2k.c:7846)
    n1: 219232541 0x4E53002: opj_jp2_decode (jp2.c:1564)
     n0: 219232541 0x40374E: main (opj_decompress.c:1459)
 n1: 23893200 0x4E72735: opj_tcd_init_decode_tile (tcd.c:1225)
  n1: 23893200 0x4E4BE39: opj_j2k_read_tile_header (j2k.c:8617)
   n1: 23893200 0x4E4C902: opj_j2k_decode_tiles (j2k.c:10348)
    n1: 23893200 0x4E4E3CE: opj_j2k_decode (j2k.c:7846)
     n1: 23893200 0x4E53002: opj_jp2_decode (jp2.c:1564)
      n0: 23893200 0x40374E: main (opj_decompress.c:1459)
 n0: 17497464 in 52 places, all below massif's threshold (1.00%)
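One way to picture the codeblock-level layout is sketched below. It is a hypothetical illustration with invented names (`codeblock`, `cblk_chunk`, `MAX_SEGS`), not the structures from the patch. Because segments are filled in order, a single growable chunk array per codeblock is enough, and each segment only needs to track how many bytes it has received.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_SEGS 10   /* hypothetical fixed bound for this sketch */

/* Hypothetical chunk: borrowed pointer into the codestream buffer. */
typedef struct {
    const uint8_t *data;
    size_t         len;
} cblk_chunk;

/* Hypothetical codeblock: one chunk array shared by all its segments. */
typedef struct {
    cblk_chunk *chunks;
    size_t      nchunks;
    size_t      capacity;
    size_t      seg_len[MAX_SEGS];  /* bytes received per segment */
    size_t      nsegs;
} codeblock;

/* Append a chunk for segment seg. Only the current last segment or the
 * next new one is accepted, which mirrors sequential filling.
 * Returns 1 on success, 0 on rejection or allocation failure. */
static int codeblock_add_chunk(codeblock *cb, size_t seg,
                               const uint8_t *data, size_t len)
{
    if (seg >= MAX_SEGS || seg > cb->nsegs || seg + 1 < cb->nsegs) {
        return 0;
    }
    if (cb->nchunks == cb->capacity) {
        size_t new_cap = cb->capacity ? cb->capacity * 2 : 4;
        cblk_chunk *grown = realloc(cb->chunks, new_cap * sizeof *grown);
        if (grown == NULL) {
            return 0;
        }
        cb->chunks = grown;
        cb->capacity = new_cap;
    }
    if (seg == cb->nsegs) {
        cb->seg_len[seg] = 0;
        cb->nsegs++;
    }
    cb->chunks[cb->nchunks].data = data;
    cb->chunks[cb->nchunks].len = len;
    cb->nchunks++;
    cb->seg_len[seg] += len;
    return 1;
}

/* Free the chunk array; the codestream bytes themselves are not owned. */
static void codeblock_release(codeblock *cb)
{
    free(cb->chunks);
    cb->chunks = NULL;
    cb->nchunks = 0;
    cb->capacity = 0;
    cb->nsegs = 0;
}
```

Compared with one chunk array per segment, this needs at most one growing allocation per codeblock, which is consistent with the commit's stated goal of fewer allocations.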
rouault force-pushed the reduce_memory_decoding branch from 6287ed2 to 9211469 (August 7, 2017 16:33)
The community was consulted about the change to how opj_image component data is allocated in this email thread: https://groups.google.com/forum/#!topic/openjpeg/RoG0gICchsk
On MAPA_002.jp2, peak memory goes from 4.5 GB to 1.9 GB (including the 1.6 GB needed for the whole image)
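The improvements consist of two main independent changes: using the same buffer for tile decoding and for the final image, and pointing code blocks directly at the in-memory codestream instead of copying it into per-codeblock buffers.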
The major one has an annoying side effect: because tile component data buffers are 16-byte aligned (for SSE code paths), every code path that allocates or frees image->comps[].data must use aligned malloc/free as well. On Linux, with posix_memalign(), the mismatch is not visible because such buffers can also be released with free(), but on Windows a mismatch of allocator and deallocator crashes. This might be seen as a slight API incompatibility for anyone who manually allocates or frees the buffers of an opj_image. However, nothing in the API indicates that this is something people are actually allowed to do, or which allocator and deallocator are supposed to be used, and standard decoding/encoding code does not need to do it.