Hello model RAM size required #5

Closed
noomio opened this issue Nov 23, 2020 · 11 comments

@noomio

noomio commented Nov 23, 2020

Hi,

I'm trying to run the hello example on a small embedded system, but I'm unsure how much memory is required to allocate this model (when running onnx_context_alloc).

I have roughly 2 MB; is that enough?
Is there a smaller model I can test with, defined as a const char array like `static const unsigned char mnist_onnx[] = { ... }`?

@jianjunjiang
Member

Add this function to print heap usage statistics (it uses glibc's mallinfo()).

```c
#include <malloc.h> /* mallinfo() is a glibc extension */
#include <stdio.h>

/* Print the current heap statistics reported by mallinfo(). */
static void display_mallinfo(void)
{
    struct mallinfo mi = mallinfo();

    printf("Total non-mmapped bytes (arena):       %d\n", mi.arena);
    printf("# of free chunks (ordblks):            %d\n", mi.ordblks);
    printf("# of free fastbin blocks (smblks):     %d\n", mi.smblks);
    printf("# of mapped regions (hblks):           %d\n", mi.hblks);
    printf("Bytes in mapped regions (hblkhd):      %d\n", mi.hblkhd);
    printf("Max. total allocated space (usmblks):  %d\n", mi.usmblks);
    printf("Free bytes held in fastbins (fsmblks): %d\n", mi.fsmblks);
    printf("Total allocated space (uordblks):      %d\n", mi.uordblks);
    printf("Total free space (fordblks):           %d\n", mi.fordblks);
    printf("Topmost releasable block (keepcost):   %d\n", mi.keepcost);
}
```

```
============== Before alloc context ==============
Total non-mmapped bytes (arena):       138816
# of free chunks (ordblks):            1
# of free fastbin blocks (smblks):     0
# of mapped regions (hblks):           0
Bytes in mapped regions (hblkhd):      0
Max. total allocated space (usmblks):  0
Free bytes held in fastbins (fsmblks): 0
Total allocated space (uordblks):      3536
Total free space (fordblks):           135280
Topmost releasable block (keepcost):   135280

============== After alloc context ==============
Total non-mmapped bytes (arena):       286272
# of free chunks (ordblks):            1
# of free fastbin blocks (smblks):     0
# of mapped regions (hblks):           0
Bytes in mapped regions (hblkhd):      0
Max. total allocated space (usmblks):  0
Free bytes held in fastbins (fsmblks): 0
Total allocated space (uordblks):      232736
Total free space (fordblks):           53536
Topmost releasable block (keepcost):   53536

============== Before onnx run ==============
Total non-mmapped bytes (arena):       286272
# of free chunks (ordblks):            1
# of free fastbin blocks (smblks):     0
# of mapped regions (hblks):           0
Bytes in mapped regions (hblkhd):      0
Max. total allocated space (usmblks):  0
Free bytes held in fastbins (fsmblks): 0
Total allocated space (uordblks):      232736
Total free space (fordblks):           53536
Topmost releasable block (keepcost):   53536

============== After onnx run ==============
Total non-mmapped bytes (arena):       450112
# of free chunks (ordblks):            3
# of free fastbin blocks (smblks):     0
# of mapped regions (hblks):           0
Bytes in mapped regions (hblkhd):      0
Max. total allocated space (usmblks):  0
Free bytes held in fastbins (fsmblks): 0
Total allocated space (uordblks):      235552
Total free space (fordblks):           214560
Topmost releasable block (keepcost):   133728
```

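2 MB of memory is enough: after the run the arena stands at 450,112 bytes (about 440 KiB). mnist is the smallest model; for other models you can use `xxd -i` to convert the .onnx file into a C array.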
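The four dumps above can be turned into per-phase costs. A small worked calculation in C, using only the numbers from those logs; the struct and function names are illustrative and not part of libonnx:

```c
#include <stddef.h>

/* Heap figures copied from the four mallinfo() dumps above, in bytes. */
struct heap_snapshot {
    const char * label;
    long arena;    /* total non-mmapped bytes obtained from the system */
    long uordblks; /* bytes currently in use by the program */
};

static const struct heap_snapshot snaps[] = {
    { "before alloc context", 138816, 3536   },
    { "after alloc context",  286272, 232736 },
    { "before onnx run",      286272, 232736 },
    { "after onnx run",       450112, 235552 },
};

#define SNAP_COUNT (sizeof(snaps) / sizeof(snaps[0]))

/* Change in bytes in use (uordblks) from snapshot `from` to snapshot `to`. */
long in_use_delta(size_t from, size_t to)
{
    return snaps[to].uordblks - snaps[from].uordblks;
}

/* Largest arena among the recorded snapshots. */
long largest_arena(void)
{
    long peak = 0;
    for(size_t i = 0; i < SNAP_COUNT; i++)
        if(snaps[i].arena > peak)
            peak = snaps[i].arena;
    return peak;
}
```

With these figures, allocating the context adds 229,200 bytes in use, the run leaves only 2,816 more bytes in use afterwards, and the largest arena among the four snapshots is 450,112 bytes, roughly 21% of 2 MiB. The snapshots only sample four moments, so this shows the footprint at those points rather than an exact peak.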

@noomio
Author

noomio commented Nov 23, 2020

Thanks.

Unfortunately I'm not running on Linux.

It's a Cortex-A7 with ThreadX, and debugging is very limited (no JTAG).

I'm unable to run much at the moment, as it fails and I can't trace it easily.

@noomio
Author

noomio commented Nov 23, 2020

Hi,

I traced the fault down to memalign. I had to add my own implementation, as I had already done for malloc, free and realloc.

It runs the benchmark, but freeing some objects doesn't work properly, probably because of my memalign.

Thanks!
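One possible reason frees misbehave: a memalign() layered on top of malloc() usually returns a pointer into the middle of a larger block, and handing that pointer to the plain free() can corrupt the heap. A common pattern, sketched here with hypothetical names `my_memalign` and `my_aligned_free` (not part of libonnx or ThreadX), over-allocates and saves the original pointer just below the aligned address so the matching free can recover it. It only helps if every such block is released through the matching function.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical aligned allocator layered on malloc(). `align` must be a
 * power of two; the block must be released with my_aligned_free(). */
void * my_memalign(size_t align, size_t size)
{
    void * raw;
    uintptr_t start, aligned;

    if(align == 0 || (align & (align - 1)) != 0)
        return NULL;
    if(align < sizeof(void *))
        align = sizeof(void *);
    if(size > SIZE_MAX - align - sizeof(void *))
        return NULL;

    /* room for the payload, worst-case padding and one saved pointer */
    raw = malloc(size + align - 1 + sizeof(void *));
    if(!raw)
        return NULL;

    /* first address after the saved-pointer slot, rounded up to `align` */
    start = (uintptr_t)raw + sizeof(void *);
    aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    /* remember the block malloc() really returned */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Release a block obtained from my_memalign(); NULL is ignored. */
void my_aligned_free(void * ptr)
{
    if(ptr)
        free(((void **)ptr)[-1]);
}
```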

@jianjunjiang
Member

Just use malloc instead of memalign; 512-byte alignment is not necessary.
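Following this advice amounts to a memalign() replacement that ignores the requested alignment and forwards to malloc(), so every block can be released with the ordinary free(). A minimal sketch; `memalign_via_malloc` and `MIN_ALIGN` are hypothetical names, and rejecting blocks that are not 8-byte aligned is an added safety check (enough for double data), not something the library does itself.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed minimum alignment for tensor data (double). */
#define MIN_ALIGN 8

/* Hypothetical memalign() stand-in for a port that lacks one: the requested
 * alignment is ignored and plain malloc() is used, so free() works as-is. */
void * memalign_via_malloc(size_t align, size_t size)
{
    void * p;

    (void)align; /* large alignments such as 512 are not required */
    p = malloc(size);
    /* refuse blocks that would break 8-byte access to double data */
    if(p && ((uintptr_t)p % MIN_ALIGN) != 0)
    {
        free(p);
        return NULL;
    }
    return p;
}
```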

@noomio
Author

noomio commented Nov 24, 2020

So leaving the alignment at 4 and just allocating len bytes should be sufficient?

@noomio
Author

noomio commented Nov 24, 2020

It worked ;)

@jianjunjiang
Member

You must ensure 8-byte alignment, because tensor data may be of type double. On 32-bit systems malloc is usually 8-byte aligned; on 64-bit systems it is usually 16-byte aligned, i.e. twice the size of a void *. Confirm your malloc's alignment.
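A quick way to confirm a custom malloc's alignment on the target is to allocate a batch of blocks of assorted sizes, keep them alive so they land at different addresses, and check each one. A sketch; `malloc_alignment_ok` and `PROBE_COUNT` are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PROBE_COUNT 64

/* Returns 1 if every probe block is aligned to `align`, 0 if any is not,
 * and -1 if `align` is 0 or malloc() fails. */
int malloc_alignment_ok(size_t align)
{
    void * blocks[PROBE_COUNT];
    int result = 1;
    size_t i;

    if(align == 0)
        return -1;
    for(i = 0; i < PROBE_COUNT; i++)
    {
        blocks[i] = malloc(i * 3 + 1); /* 1, 4, 7, ... bytes */
        if(!blocks[i])
        {
            result = -1;
            break;
        }
        if(((uintptr_t)blocks[i] % align) != 0)
            result = 0;
    }
    /* free the i blocks that were successfully allocated */
    while(i > 0)
        free(blocks[--i]);
    return result;
}
```

Calling `malloc_alignment_ok(8)` once at start-up and halting if it does not return 1 catches a misaligned allocator before any tensor data is placed in it. For comparison, a conforming C11 malloc must return memory aligned for `max_align_t`.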

@noomio
Author

noomio commented Nov 24, 2020

I have added this:

```c
UCHAR mem_heap[MALLOC_BYTE_POOL_SIZE] __attribute__ ((aligned (8)));
```

@jianjunjiang
Member

Writing a customized malloc should be OK, as long as it returns 8-byte-aligned blocks for the data of onnx_tensor_t.
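Aligning the pool's base is only half of the requirement: each block handed out must also start on an 8-byte boundary. The hypothetical bump allocator below (`pool_malloc`, `POOL_SIZE` and `POOL_ALIGN` are illustrative names, and it never frees) shows why an aligned base plus block sizes rounded up to a multiple of 8 keeps every pointer 8-byte aligned. Whether a given RTOS pool allocator rounds sizes this way is worth checking for your port.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE  (64 * 1024) /* hypothetical, stands in for MALLOC_BYTE_POOL_SIZE */
#define POOL_ALIGN 8           /* alignment required for tensor data */

/* C11 spelling of __attribute__((aligned(8))) */
static alignas(POOL_ALIGN) unsigned char pool[POOL_SIZE];
static size_t pool_used;

/* Bump allocator: base is 8-byte aligned and every block size is rounded
 * up to a multiple of 8, so every returned pointer is 8-byte aligned. */
void * pool_malloc(size_t size)
{
    size_t need;
    void * p;

    if(size == 0)
        size = 1;
    if(size > POOL_SIZE)
        return NULL;
    need = (size + POOL_ALIGN - 1) & ~(size_t)(POOL_ALIGN - 1);
    if(need > POOL_SIZE - pool_used)
        return NULL;
    p = &pool[pool_used];
    pool_used += need;
    return p;
}
```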

@noomio
Author

noomio commented Nov 24, 2020

It seems to work. I'm also able to run the mnist model.


The debug output just isn't quite right yet; I need to figure out the printf implementation.

@noomio
Author

noomio commented Nov 24, 2020

I just need to append an LF after every CR, as I'm on Windows.
So far so good.
Thanks for your help. The library is great!
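Terminals on Windows generally expect CR LF line endings. One option, sketched here as an assumption about the port, is to normalize every line ending (a lone CR, a lone LF, or an existing CR LF) to CR LF in the output path. `to_crlf` is a hypothetical helper, not part of libonnx or ThreadX; in a real port the same logic would sit in whatever character-output hook the printf implementation calls.

```c
#include <stddef.h>

/* Copy `in` to `out`, turning each CR, LF or CR LF into CR LF.
 * Returns the length written (excluding the terminator), or
 * (size_t)-1 if `out` is too small. */
size_t to_crlf(const char * in, char * out, size_t out_size)
{
    size_t n = 0;

    if(out_size == 0)
        return (size_t)-1;
    while(*in)
    {
        if(*in == '\r' || *in == '\n')
        {
            /* treat an existing "\r\n" as a single line ending */
            if(in[0] == '\r' && in[1] == '\n')
                in++;
            in++;
            if(n + 3 > out_size) /* CR + LF + terminator */
                return (size_t)-1;
            out[n++] = '\r';
            out[n++] = '\n';
        }
        else
        {
            if(n + 2 > out_size) /* character + terminator */
                return (size_t)-1;
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
    return n;
}
```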

@noomio noomio closed this as completed Nov 24, 2020