Loading a 100 MB .mat file produces peak RSS of 20 GB #55
There are 2 variables with 5011x5011 cells. Each cell allocates 80 bytes for the … As a hack, the empty cells can be freed by

```diff
diff --git a/src/mat5.c b/src/mat5.c
index 075d3d2..cd281f6 100644
--- a/src/mat5.c
+++ b/src/mat5.c
@@ -1792,6 +1792,8 @@ ReadNextCell( mat_t *mat, matvar_t *matvar )
             nbytes = uncomp_buf[1];
             if ( !nbytes ) {
                 /* empty cell */
+                Mat_VarFree(cells[i]);
+                cells[i] = NULL;
                 continue;
             } else if ( uncomp_buf[0] != MAT_T_MATRIX ) {
                 Mat_VarFree(cells[i]);
```

This helps in your case. Technically, though, the result is no longer the same, and I do not feel comfortable committing this change. I would rather recommend that you get rid of the high-dimensional cell array.
I just noticed that when MATLAB reads such a cell array with empty cells, it does not allocate the usual array header (overhead). So I will think about the hack and its consequences for, e.g., Mat_VarSize or Mat_VarPrint.
Thanks for the very fast response! Now I see; I will strip the extra empty dimensions. Somehow … The issue might still be troublesome in an adversarial/DoS setting.
* Memory optimization: Only allocate one empty field or cell per struct or cell array, respectively
* Use the reference counter num_empty of the internal structure to keep track of the number of referenced empty fields or cells
* As reported by #55
Can you please test and confirm that 464de5c also solves the issue for matio-ffi.torch? Thanks.
Do you know if function …

```shell
/usr/bin/time -f %M ./tools/matdump SelectiveSearchVOC2007trainval.mat.edgeboxes.mat > log.txt
# 391452
/usr/bin/time -f %M ./tools/matdump -d SelectiveSearchVOC2007trainval.mat.edgeboxes.mat > log.txt
# 779612
```
I also noticed that the test suite currently lacks struct/cell arrays with empty fields or cells. I'll add these cases.
It definitely loads the data. Even more, from what I can see, …

Finally, this explains the doubled memory consumption of …
…s from v5 MAT file
* Memory optimization: Free the internal struct member, which is unused for empty variables
* As reported by #55
Performance comparison

1. Matio v1.5.10: 7751748 (last commit before dd1d2cd), so basically matio v1.5.10 (and earlier)

```shell
/usr/bin/time -f "%es %MK" ./tools/matdump SelectiveSearchVOC2007trainval.mat.edgeboxes.mat > log.txt
# 474.62s 3639144K
/usr/bin/time -f "%es %MK" ./tools/matdump -d SelectiveSearchVOC2007trainval.mat.edgeboxes.mat > log.txt
# 752.46s 3629448K
```

2. Upcoming matio v1.5.11: dd1d2cd (part of the upcoming v1.5.11)

```shell
/usr/bin/time -f "%es %MK" ./tools/matdump SelectiveSearchVOC2007trainval.mat.edgeboxes.mat > log.txt
# 34.50s 2744772K
/usr/bin/time -f "%es %MK" ./tools/matdump -d SelectiveSearchVOC2007trainval.mat.edgeboxes.mat > log.txt
# 56.12s 3132860K
```

For memory usage, this is simply a compromise between performance and backward compatibility. I was surprised, though, by the observed speed-up of more than one order of magnitude.

3. Even better

The best values are obtained if empty cells are freed again, as in the hack mentioned in #55 (comment). However, such a patch would not be backward-compatible. For example, API functions like Mat_VarPrint or Mat_VarWrite would then produce different output.

```shell
/usr/bin/time -f "%es %MK" ./tools/matdump SelectiveSearchVOC2007trainval.mat.edgeboxes.mat > log.txt
# 18.17s 391072K
/usr/bin/time -f "%es %MK" ./tools/matdump -d SelectiveSearchVOC2007trainval.mat.edgeboxes.mat > log.txt
# 40.91s 779360K
```
I'm using matio 1.5.10 and matio-ffi.torch. I have a 100 MB file that causes matio to allocate a suspiciously large amount of memory:

I'm probably missing something obvious, but such memory consumption seems a little fishy to me. Doing the same with matdump gives:

Does this discrepancy of 5 GB vs. 20 GB mean that matio-ffi.torch is using matio sub-optimally?

log.txt contains the substring `Empty` many times.

File uploaded to my OneDrive: https://1drv.ms/u/s!Apx8USiTtrYmprRlRQmgSbPJNcWzEw