error:Check failed: r == 0 (-1 vs. 0) , cannot allocate memory #1430

Closed
Yuxin-Yu opened this issue Apr 30, 2024 · 2 comments

Yuxin-Yu commented Apr 30, 2024

I deployed the DPU on my own board and, for debugging, enabled all DEBUG output in the VART library. When I then ran the resnet50 inference app, I hit the following error:

I20240313 10:02:46.674741  1102 buffer_object_dpcma.cpp:170]  ret=1 register factory methord of BufferObjectDpCma for  xir::BufferObject with priority `2`
I20240313 10:02:48.139557  1102 dpu_controller_dnndk.cpp:272] **start register the dnndk dpu controller***
I20240313 10:02:48.147981  1102 dpu_controller.cpp:42] add factory method 00_dnndk
I20240313 10:02:48.151921  1102 dpu_controller_dnndk.cpp:278] register the dnndk dpu controller
I20240313 10:02:48.892221  1102 dpu_controller_dnndk.cpp:75]  fingerprint: 0x101000056010400 0x101000056010400
I20240313 10:02:48.899423  1102 dpu_controller_dnndk.cpp:244] sfm_num 0 dpu_num 2
I20240313 10:02:48.906765  1102 dpu_controller.cpp:57] create dpu controller via 00_dnndk ret= 0x2aac5223a8
I20240313 10:02:48.930130  1102 dpu_session_imp.cpp:68] create dpu session @0x2aac522230 device_core_id_ 0 device_id 0 is_ddr 1 dpu_name unknown
I20240313 10:03:40.180531  1102 dpu_kernel.cpp:40] filename resnet50.xmodel kernel resnet50_0 ret.get() 0x2aac522868
I20240313 10:03:40.186887  1102 dpu_kernel.cpp:187] get workspace sizes for ResNet_0
I20240313 10:03:40.198680  1102 dpu_kernel.cpp:210] total workspace size = 2189936
I20240313 10:03:40.202526  1102 dpu_kernel.cpp:53] create dpu kernel. graph ResNet_0;sub graph subgraph_ResNet__ResNet_AvgPool2d_avgpool__8346_fix @0x2aac68e1f0
I20240313 10:03:40.207616  1102 dpu_kernel_ddr.cpp:39]  create dpu kernel @0x2aac5226c8 cu=unknown:dpu0 device_id=0 device_core_id=0
I20240313 10:03:40.211458  1102 dpu_kernel.cpp:94] loading parameter for ResNet_0
I20240313 10:03:45.230115  1102 dpu_kernel.cpp:143] loading release code for ResNet_0
I20240313 10:03:45.606895  1102 buffer_object_dpcma.cpp:142] phy 0x87800000 offset 0x0 size 1349424
I20240313 10:03:45.701643  1102 buffer_object_dpcma.cpp:129] sync_for_write offset 0 size 1349424
I20240313 10:03:45.704753  1102 dpu_kernel_ddr.cpp:104] loading release code  1349424 bytes to 0x87800000
I20240313 10:03:57.239750  1102 dpu_session_base_imp.cpp:120] session is created.subgraph: subgraph_ResNet__ResNet_AvgPool2d_avgpool__8346_fix
I20240313 10:03:57.244108  1102 dpu_session_base_imp.cpp:124] input tensor:mytensor{ResNet__ResNet_QuantStub_quant_stub__input_1_fix:(1,224,224,3), fixpos=5}
I20240313 10:03:57.251224  1102 dpu_session_base_imp.cpp:127] output tensor:mytensor{ResNet__ResNet_Linear_fc__inputs_fix:(1,1000), fixpos=2}
I20240313 10:03:57.257324  1102 dpu_session_imp.cpp:141]  create dpu runner @ 0x2aac522230 device_id= 0 device_core_id=0
I20240313 10:03:57.332792  1102 tensor_buffer_allocator_imp.cpp:162] device_id 0 device_core_id 0 cu_name
I20240313 10:04:02.656090  1102 zero_copy_helper.cpp:250]
{       reg_id = 0;
        type = CONST;
        size = 14753792; 
        reg_id = 1;
        type = DATA_GLOBAL;
        size = 2038400;  
        reg_id = 2;
        type = DATA_LOCAL_INPUT;
        size = 150528;   
        reg_id = 3;
        type = DATA_LOCAL_OUTPUT;
        size = 1008;
},
I20240313 10:04:03.177399  1102 tensor_buffer_allocator_imp.cpp:267] info[0]=reg_info_t{id=0;type=CONST;location=HOST_PHY;size=14753792;batch=1;device_id=0;device_core_id=0;cu_name=;backstore=null}
I20240313 10:04:03.181497  1102 tensor_buffer_allocator_imp.cpp:267] info[1]=reg_info_t{id=1;type=DATA_GLOBAL;location=HOST_PHY;size=2038400;batch=1;device_id=0;device_core_id=0;cu_name=;backstore=null}
[ 1371.282644] ------------[ cut here ]------------
[ 1371.282812] WARNING: CPU: 0 PID: 1102 at mm/page_alloc.c:4544 __alloc_pages+0x134/0x194
[ 1371.283378] Modules linked in: dpu
[ 1371.283658] CPU: 0 PID: 1102 Comm: resnet50 Not tainted 6.7.4-dirty #2
[ 1371.283910] Hardware name: 
[ 1371.284020] epc : __alloc_pages+0x134/0x194
[ 1371.284350]  ra : __dma_direct_alloc_pages.isra.0+0xec/0x202
[ 1371.284748] epc : ffffffff80154752 ra : ffffffff8007645c sp : ffffffd807007b90
[ 1371.284930]  gp : ffffffff812ddc08 tp : ffffffd801724000 t0 : ffffffd8070f0480
[ 1371.285098]  t1 : ffffffff80c00508 t2 : ffffffff80c00588 s0 : ffffffd807007c00
[ 1371.285284]  s1 : 0000000000000cc0 a0 : 0000000000000cc0 a1 : 000000000000000c
[ 1371.285432]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000001
[ 1371.285574]  a5 : ffffffff812bd110 a6 : 1827c73817390500 a7 : 000000000000001d
[ 1371.285738]  s2 : 0000000000000e11 s3 : 0000000000000000 s4 : ffffffff812df0b0
[ 1371.285890]  s5 : 000000000000000c s6 : ffffffff80df1488 s7 : 6db6db6db6db6db7
[ 1371.286052]  s8 : ffffffff812dfb60 s9 : 0000000000000000 s10: 0000000000000041
[ 1371.286204]  s11: f000000000000000 t3 : 0000003f80d167f2 t4 : 0000000000000076
[ 1371.286366]  t5 : 0000000000000065 t6 : 0000000000000064
[ 1371.286488] status: 0000000200000120 badaddr: ffffffff80154752 cause: 0000000000000003
[ 1371.286670] [<ffffffff80154752>] __alloc_pages+0x134/0x194
[ 1371.287070] [<ffffffff8007645c>] __dma_direct_alloc_pages.isra.0+0xec/0x202
[ 1371.287486] [<ffffffff80076756>] dma_direct_alloc+0x144/0x2d6
[ 1371.287832] [<ffffffff8007584c>] dma_alloc_attrs+0x80/0x94
[ 1371.288182] [<ffffffff0132e530>] xlnx_dpu_ioctl+0xcc6/0xf66 [dpu]
[ 1371.303750] [<ffffffff8017bdca>] __riscv_sys_ioctl+0x70/0x8e
[ 1371.304108] [<ffffffff806b4018>] do_trap_ecall_u+0x56/0xbe
[ 1371.304514] [<ffffffff806bc1c6>] ret_from_exception+0x0/0x66
[ 1371.304886] ---[ end trace 0000000000000000 ]---
I20240313 10:04:03.186955  1102 tensor_buffer_allocator_imp.cpp:267] info[2]=reg_info_t{id=2;type=DATA_LOCAL_INPUT;location=HOST_PHY;size=150528;batch=1;device_id=0;device_core_id=0;cu_name=;backstore=null}
I20240313 10:04:03.191352  1102 tensor_buffer_allocator_imp.cpp:267] info[3]=reg_info_t{id=3;type=DATA_LOCAL_OUTPUT;location=HOST_PHY;size=1008;batch=1;device_id=0;device_core_id=0;cu_name=;backstore=null}
I20240313 10:04:03.197788  1102 tensor_buffer_allocator_imp.cpp:313] key=reg_0_sg_0x2aac68e1f0_device_0
F20240313 10:04:03.419384  1102 buffer_object_dpcma.cpp:57] Check failed: r == 0 (-1 vs. 0) , cannot allocate memory
*** Check failure stack trace: ***
./run2.sh: line 16:  1102 Aborted                 env LD_LIBRARY_PATH=/root/app-riscv/samples/lib-riscv-debug ./samples/bin/resnet50 "$image_file"

The failing check is on this call: auto r = ioctl(fd_->fd(), DPUIOC_CREATE_BO, &req_alloc); What could cause it to fail?
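
As a rough aid to reading the log, the sketch below tallies the four register sizes printed by zero_copy_helper.cpp and estimates how many contiguous pages each one would need. The 4 KiB page size and the power-of-two rounding (the way the kernel page allocator sizes its requests) are assumptions, not something the log states, and the helper names are illustrative only.

```python
import math

PAGE_SIZE = 4096  # assumption: 4 KiB pages; the log does not state the page size

# Sizes in bytes, copied from the zero_copy_helper.cpp dump in the log.
REG_SIZES = {
    "CONST": 14753792,
    "DATA_GLOBAL": 2038400,
    "DATA_LOCAL_INPUT": 150528,
    "DATA_LOCAL_OUTPUT": 1008,
}


def page_order(size_bytes, page_size=PAGE_SIZE):
    """Smallest order such that (page_size << order) >= size_bytes."""
    pages = math.ceil(size_bytes / page_size)
    return (pages - 1).bit_length()


def summarize(sizes):
    """Return (name, size, order, rounded_size) for each register region."""
    rows = []
    for name, size in sizes.items():
        order = page_order(size)
        rows.append((name, size, order, PAGE_SIZE << order))
    return rows


if __name__ == "__main__":
    for name, size, order, rounded in summarize(REG_SIZES):
        print(f"{name:18s} {size:>10d} B  order {order:2d}  ({rounded} B contiguous)")
    print("total requested:", sum(REG_SIZES.values()), "bytes")
```

Under those assumptions the CONST region (14753792 bytes) is exactly 3602 pages and rounds up to an order-12 block of 16 MiB, while the three data regions sum to 2189936 bytes, matching the "total workspace size" line in the log. The value 0xc in register a1 of the kernel warning would be consistent with an order-12 request, though that reading of the register dump is an assumption.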

@KrishnaGaihre

@Yuxin-Yu Can you share your boot log (PetaLinux boot log)? Do the show_dpu and xdputil query commands work? If so, please include their output, along with the dmesg output, with the boot log.
Are you using the Vitis AI 3.0 based DPU IP and resnet50 app? Which tool version are you using: 2022.2, or newer?
Regards,

@Yuxin-Yu

Hi @KrishnaGaihre, I have resolved this issue. It was caused by the CMA capacity being too small; after I changed the CMA capacity to the default 256MB, the error went away.
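
For anyone hitting the same failure, one quick way to see whether CMA is the limiting factor is to compare the CmaTotal and CmaFree fields of /proc/meminfo (present on kernels built with CMA support) against the largest buffer the runtime requests. The sketch below is a minimal check under that assumption. The function names are hypothetical, and the 14753792-byte figure is the CONST region size from the log in this issue. Free CMA exceeding the request is necessary but not sufficient, because the free pages must also be contiguous.

```python
def parse_meminfo_kb(text):
    """Parse /proc/meminfo-style text into a {field: value_in_kB} dict."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def cma_can_fit(meminfo_text, required_bytes):
    """True if CmaFree is at least required_bytes; False if CMA is absent."""
    info = parse_meminfo_kb(meminfo_text)
    free_kb = info.get("CmaFree")
    if free_kb is None:
        return False  # kernel built without CMA, or field not reported
    return free_kb * 1024 >= required_bytes


if __name__ == "__main__":
    LARGEST_BUFFER = 14753792  # CONST region size taken from the log above
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError as exc:
        print("could not read /proc/meminfo:", exc)
    else:
        info = parse_meminfo_kb(text)
        print("CmaTotal (kB):", info.get("CmaTotal"))
        print("CmaFree  (kB):", info.get("CmaFree"))
        print("largest buffer fits in free CMA:", cma_can_fit(text, LARGEST_BUFFER))
```

The thread does not say how the CMA size was changed. On many systems it is set with the cma= kernel boot parameter (for example cma=256M) or through a reserved-memory node in the device tree, so which of these applies to this board is an assumption.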
