Improve spec test execution by adding retry logic for transient errors #4393

lum1n0us · 2025-06-19T03:41:31Z

Let's see how it goes.

…nt errors (bytecodealliance#4393)" This reverts commit 64cafaf.

* Revert "Improve spec test execution by adding retry logic for transient errors (#4393)" This reverts commit 64cafaf. * Revert "Add error handling for sgx ci (#4222)" This reverts commit 8ad4789.

* build(deps): Bump github/codeql-action from 3.28.18 to 3.28.19 (#4346) Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.18 to 3.28.19. - [Release notes](https://github.com/github/codeql-action/releases) - [Commits](github/codeql-action@v3.28.18...v3.28.19) --- updated-dependencies: - dependency-name: github/codeql-action dependency-version: 3.28.19 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * wasi_socket_ext.c: avoid tls to make this library-friendly (#4338) * Enable aot memory64 sw bounds checks by default (#4350) - enable aot memory64 sw bounds checks by default * build(deps): Bump requests from 2.32.3 to 2.32.4 in /build-scripts (#4349) Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4. - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.32.3...v2.32.4) --- updated-dependencies: - dependency-name: requests dependency-version: 2.32.4 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * wasi_nn_types.h: remove a seemingly stale comment (#4348) * add heap-type check for GC when ref.null (#4300) - According to [Link 1](https://webassembly.github.io/gc/core/valid/instructions.html#xref-syntax-instructions-syntax-instr-ref-mathsf-ref-null-mathit-ht), we must ensure that the heap type is valid when ref.null. - According to [Link 2](https://webassembly.github.io/gc/core/valid/types.html#heap-types), a heap type is considered valid if it is either a concrete heap type or an abstract heap type. However, in this function, the check for abstract heap types (absheaptype) was clearly missing, so this condition needs to be added explicitly in the if statement. - When GC is disabled, no change is needed. - When GC is enabled, heap types in WAMR are LEB-encoded values ([Link 3](https://webassembly.github.io/gc/core/appendix/index-types.html)). Therefore, we must use read_leb_int32 to parse the heap type correctly. And we can compute the original type1 using type1 = (uint8)((int32)0x80 + heap_type);. * wamr-wasi-extensions: add a cmake package to provide our wasi extension (#4344) * wasi_ephemeral_nn.h: add a convenience wrapper header * wamr-wasi-extensions: add a cmake package to provide our wasi extension the sample app was tested with: * wasmtime * iwasm with #4308 currently only contains wasi-nn. maybe it makes sense to add lib-socket things as well. cf. #4288 * wasi_nn_openvino.c: remove the tensor layout adjustment logic (#4308) the logic in question seems like an attempt to work around some application bugs. my wild guess is that it was for classification-example. cf. bytecodealliance/wasmtime#10867 * Update type validation in load_table_import() and load_table() (#4296) Prevent from value type. https://webassembly.github.io/spec/core/valid/types.html#table-types https://webassembly.github.io/gc/core/syntax/types.html#reference-types * Follow #4268 to deprecate wamr_ide-related components (#4341) refer to: Bypass wamr_ide-related components from the release process. (#4268) * clean up incompatible running mode checks in test script and ci (#4342) Rearrange the content of do_execute_in_running_mode() in alphabetical order. Add an incompatible check for x86_32. Now, all belows will be bypassed: - jit, fast-jit, multi-tier-jit - memory64 - multi-memory - simd * Update WABT downloads URL (#4357) Plus, skip unsupported running mode instead quit during wamr compiler test * Modify AOT static PGO to conform to llvm-18 and add a CI job to test static PGO on the coremark benchmark (#4345) * static PGO compatible with llvm18 and add CI job to test static PGO on coremark benchmark * update comments and warning info, bitmaps section in llvm profdata shouldn't be used in PGO * Collective fix for typos and minor bugs (#4369) * wasi-nn: fix backend leak on multiple loads (#4366) cf. #4340 * build(deps): Bump github/codeql-action from 3.28.19 to 3.29.0 (#4371) Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.19 to 3.29.0. - [Release notes](https://github.com/github/codeql-action/releases) - [Commits](github/codeql-action@v3.28.19...v3.29.0) --- updated-dependencies: - dependency-name: github/codeql-action dependency-version: 3.29.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * add validation for array type in load_init_expr(GC only) (#4370) * wasi_nn_openvino.c: remove broken xml check (#4365) `xml.buf[xml.size]` check is broken because it accesses past the end of the buffer. anyway, openvino doesn't seem to care the NUL termination. * wamr-wasi-extensions: add lib-socket things (#4360) * improve installation steps for wasi-sdk and wabt on Windows (#4359) * wasi_ephemeral_nn.h: prefix identfiers to avoid too generic names (#4358) * wasi_nn_openvino.c: add a missing buffer overflow check in get_output (#4353) cf. #4351 * send an empty/error reply from server (#4362) Signed-off-by: Su Yihan <yihan.su@intel.com> * wasi_nn_openvino.c: remove pre/postprocessing and layout assumptions (#4361) as wasi-nn doesn't have these concepts, the best we can do without risking breaking certain applications here is to pass through tensors as they are. this matches wasmtime's behavior. tested with: * wasmtime classification-example (with this change, this example fails on tensor size mismatch instead of implicitly resizing it.) * license-plate-recognition-barrier-0007, a converted version with non-fp32 output. [1] (with this change, this model outputs integers as expected.) [1] https://github.com/openvinotoolkit/open_model_zoo/tree/cd7ebe313b69372763e76b82e5d24935308fece4/models/public/license-plate-recognition-barrier-0007 * add nn-cli example (#4373) an example application with flexible cli options which aims to allow us to perform any wasi-nn operations. eg. ``` --load-graph=file=fixture/model.xml,file=fixture/model.bin,id=graph --init-execution-context=graph-id=graph,id=ctx --set-input=file=fixture/tensor.bgr,context-id=ctx,dim=1,dim=3,dim=224,dim=224 --compute=context-id=ctx --get-output=context-id=ctx,file=output.bin ``` * wasi-nn: apply the shared library hack to darwin as well (#4374) copied from the linux version. i'm a bit skeptical with this workaround though. it might be simpler to prohibit the use of wamr api in these shared libraries. after all, what these libraries do is nothing specific to wasm. * wasi-nn: don't try to deinit uninitialized backend (#4375) cf. #4339 * core/iwasm/libraries/wasi-nn/test/build.sh: add a tip for intel mac (#4389) i keep forgetting this and had to re-investigate it at least twice. hopefully this can be helpful for others too. * wasi_nn_tensorflowlite.cpp: reject non-fp32 input earlier (#4388) this backend assumes fp32 here and there. it's safer to reject unexpected inputs explicitly. * Fix several issues related to night-run CI and test scripts. (#4385) - remove duplicated options - fix test script - change ci to use binary * core/iwasm/libraries/wasi-nn/test: use the correct version of keras (#4383) * wasi-nn: fix tensor_data abi for wasi_ephemeral_nn (#4379) it's "(list u8)" in the witx definition. the new definition matches both of our own host definition (struct tensor_wasm) and wasmtime. cf. #4352 * enable WAMR_BUILD_WASI_EPHEMERAL_NN by default (#4381) cf. #4326 * deprecate legacy WAMR-specific "wasi_nn" module (#4382) wasi_nn.h: deprecate legacy "wasi_nn" cf. #4326 * wasi-nn: add minimum serialization on WASINNContext (#4387) currently this is not necessary because context (WASINNContext) is local to instance. (wasm_module_instance_t) i plan to make a context shared among instances in a cluster when fixing #4313. this is a preparation for that direction. an obvious alternative is to tweak the module instance context APIs to allow declaring some kind of contexts instance-local. but i feel, in this particular case, it's more natural to make "wasi-nn handles" shared among threads within a "process". note that, spec-wise, how wasi-nn behaves wrt threads is not defined at all because wasi officially doesn't have threads yet. i suppose, at this point, that how wasi-nn interacts with wasi-threads is something we need to define by ourselves, especially when we are using an outdated wasi-nn version. with this change, if a thread attempts to access a context while another thread is using it, we simply make the operation fail with the "busy" error. this is intended for the mimimum serialization to avoid problems like crashes/leaks/etc. this is not intended to allow parallelism or such. no functional changes are intended at this point yet. cf. #4313 #2430 * Improve spec test execution by adding retry logic for transient errors (#4393) * wasi_nn_openvino.c: implement multiple models per instance (#4380) tested with two models: ``` --load-graph=id=graph1,file=public/license-plate-recognition-barrier-0007/FP32/license-plate-recognition-barrier-0007.xml,file=public/license-plate-recognition-barrier-0007/FP32/license-plate-recognition-barrier-0007.bin \ --load-graph=id=graph2,file=classify/model.xml,file=classify/model.bin \ --init-execution-context=id=exec1,graph-id=graph1 \ --init-execution-context=id=exec2,graph-id=graph2 \ --set-input=context-id=exec1,dim=1,dim=24,dim=94,dim=3,file=out.bin \ --set-input=context-id=exec2,file=classify/banana-3x224x224-bgr.bin,dim=1,dim=3,dim=224,dim=224 \ --compute=context-id=exec1 \ --compute=context-id=exec2 \ --get-output=context-id=exec1,file=exec1-result.bin \ --get-output=context-id=exec2,file=exec2-result.bin ``` a detailed HOWTO: #4380 (comment) * wamr-wasi-extensions/socket: disable reference-types (#4392) and add a comment to explain why. * CI: fix the description of upload_url (#4407) * wasi-nn: fix context lifetime issues (#4396) * wasi-nn: fix context lifetime issues use the module instance context api instead of trying to roll our own with a hashmap. this fixes context lifetime problems mentioned in #4313. namely, * wasi-nn resources will be freed earlier now. before this change, they used to be kept until the runtime shutdown. (wasm_runtime_destroy) after this change, they will be freed together with the associated instances. * wasm_module_inst_t pointer uniqueness assumption (which is wrong after wasm_runtime_deinstantiate) was lifted. as a side effect, this change also makes a context shared among threads within a cluster. note that this is a user-visible api/abi breaking change. before this change, wasi-nn "handles" like wasi_ephemeral_nn_graph were thread-local. after this change, they are shared among threads within a cluster, similarly to wasi file descriptors. spec-wise, either behavior should be ok simply because wasi officially doesn't have threads yet. althogh i feel the latter semantics is more intuitive, if your application depends on the thread-local behavior, this change breaks your application. tested with wamr-wasi-extensions/samples/nn-cli, modified to call each wasi-nn operations on different threads. (if you are interested, you can find the modification at https://github.com/yamt/wasm-micro-runtime/tree/yamt-nn-wip-20250619.) cf. #4313 #2430 * runtime_lib.cmake: enable WAMR_BUILD_MODULE_INST_CONTEXT for wasi-nn as we do for wasi (WAMR_BUILD_LIBC_WASI) * wasi_nn_tensorflowlite.cpp: fix get_output return size (#4390) it should be byte size, not the number of (fp32) values. i'm ambivalent about how to deal with the compatibility for the legacy wamr-specific "wasi_nn". for now, i avoided changing it. (so that existing tests using the legacy abi, namely test_tensorflow.c and test_tensorflow_quantized.c, passes as they are.) if we have any users who still want to use the legacy abi, i suppose they consider the compatibility is more important than the consistency with other backends. cf. #4376 * Refactor copy callstack feature (#4401) - Change `WAMR_ENABLE_COPY_CALLSTACK` to `WAMR_BUILD_COPY_CALL_STACK`, as `WAMR_BUILD` is the prefix for a command line option. - Change `WAMR_ENABLE_COPY_CALLSTACK` to `WASM_ENABLE_COPY_CALL_STACK`, as `WASM_ENABLE` is the prefix for a macro in the source code. - Change `CALLSTACK` to `CALL_STACK` to align with the existing `DUMP_CALL_STACK` feature. - Continue using `WASMCApiFrame` instead of `wasm_frame_t` outside of *wasm_c_api.xxx* to avoid a typedef redefinition warning, which is identified by Clang. * loader: add type index checking (#4402) * Fix handling of non-nullable global_type during global import (#4408) * wasi_nn_llamacpp.c: make this compilable (#4403) * fix bug in bh_vector when extending (#4414) * Collective fix (#4413) * Fix vector growth check and typos in core (#9) * Fix resource cleanup in memory and running modes tests (#10) * Add end of file empty line in wasm_running_modes_test.cc * wasi-nn: make the host use the wasi_ephemeral_nn version of tensor_data (#4411) the motivations: * make the actual input size available to the backends. (currently the backends have to make a guess from shape/type.) * make the host logic look a bit similar to wasi_ephemeral_nn. this is a backend api/abi change. * wasi_nn_llamacpp.c: fix buffer overruns in set_input (#4420) note: for some reasons, wasmedge seems to ignore type/dimensions for the input of ggml. some user code relies on it. cf. second-state/WasmEdge-WASINN-examples#196 note: despite the comment in our code, the input doesn't seem nul-terminated. * wasi_nn_llamacpp.c: remove an unused variable (#4415) * Fix few shadow warnings (#4409) - declaration of ‘memidx’ shadows a previous local - declaration of ‘count’ shadows a previous local * CI: build wamr-wasi-extensions (#4394) * wamr-wasi-extensions: separate test scripts also, allow to specify the prefix directory. for the convenience of the CI. * CI: build wamr-wasi-extensions fragments are copied from compilation_on_macos.yml. (thus intel copyright notice) * wasi_nn_openvino.c: fix a debug build (#4416) after "wasi_nn_openvino.c: implement multiple models per instance" change. (#4380) * loader: fix a potential overflow issue (#4427) * CI: revert SGX retry attempts (#4421) * Revert "Improve spec test execution by adding retry logic for transient errors (#4393)" This reverts commit 64cafaf. * Revert "Add error handling for sgx ci (#4222)" This reverts commit 8ad4789. * implement extended const expr (#4318) * add a toggle to enable extended const on wamrc (#4412) --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Su Yihan <yihan.su@intel.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: YAMAMOTO Takashi <yamamoto@midokura.com> Co-authored-by: TianlongLiang <111852609+TianlongLiang@users.noreply.github.com> Co-authored-by: Liu Jia <jia3.liu@intel.com> Co-authored-by: liang.he <liang.he@intel.com> Co-authored-by: Su Yihan <yihan.su@intel.com>

lum1n0us linked an issue Jun 19, 2025 that may be closed by this pull request

CI: SGX 143 failure #4363

Open

lum1n0us force-pushed the fix/retry_test_wamr_on_sgx branch 2 times, most recently from 7221de1 to 4310210 Compare June 19, 2025 06:13

Improve spec test execution by adding retry logic for transient errors

87144d6

lum1n0us force-pushed the fix/retry_test_wamr_on_sgx branch from 4310210 to 87144d6 Compare June 19, 2025 06:22

lum1n0us marked this pull request as ready for review June 19, 2025 06:57

lum1n0us requested review from loganek, no1wudi, TianlongLiang, wenyongh, xujuntwt95329 and yamt as code owners June 19, 2025 06:57

yamt approved these changes Jun 19, 2025

View reviewed changes

TianlongLiang approved these changes Jun 20, 2025

View reviewed changes

lum1n0us merged commit 64cafaf into bytecodealliance:main Jun 20, 2025
66 checks passed

lum1n0us deleted the fix/retry_test_wamr_on_sgx branch June 20, 2025 07:49

yamt added a commit to yamt/wasm-micro-runtime that referenced this pull request Jun 27, 2025

Revert "Improve spec test execution by adding retry logic for transie…

31a80f3

…nt errors (bytecodealliance#4393)" This reverts commit 64cafaf.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve spec test execution by adding retry logic for transient errors #4393

Improve spec test execution by adding retry logic for transient errors #4393

Uh oh!

lum1n0us commented Jun 19, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Improve spec test execution by adding retry logic for transient errors #4393

Improve spec test execution by adding retry logic for transient errors #4393

Uh oh!

Conversation

lum1n0us commented Jun 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lum1n0us commented Jun 19, 2025 •

edited

Loading