
[Web] Different results for a simple two-layer network between wasm and Linux builds #24618

@ognjentodic

Description


Describe the issue

I have a simple two-layer network that predicts a score between 0 and 1. We recently updated onnxruntime from 1.16 to 1.21, and I am noticing odd behavior in the wasm build of our application: scores that should be low never drop below 0.56, whereas the Linux build produces scores close to 0 for the same cases, as expected.

I've made sure that the inputs are identical (I ended up hard-coding specific values for testing), and I consistently get this difference. Is there a reasonable explanation for this behavior?

Below is the logging output for session initialization on both platforms (there are a couple of differences, but nothing stands out). I also tried recompiling with onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1 to get more details about inference (e.g., intermediate results, to see where the difference starts). However, I only got basic information (node placements and input/output names and shapes, shown under INFERENCE LOGS below), and it matches between Linux and wasm.
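
Once tensor values can be captured from both builds, locating the first divergent tensor is mechanical. One way to capture them is to expose `/Relu_output_0` as an extra graph output; a debug dump is another. The sketch below is a hypothetical helper, not an onnxruntime API. It walks tensor names in execution order and reports the first tensor whose largest absolute difference exceeds a tolerance. The `atol` default of 1e-5 is an arbitrary assumption, and the dictionaries are assumed to hold values as nested Python lists.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def flatten(t) -> List[float]:
    """Flatten a number or an arbitrarily nested list of numbers."""
    if isinstance(t, (int, float)):
        return [float(t)]
    out: List[float] = []
    for item in t:
        out.extend(flatten(item))
    return out


def max_abs_diff(a, b) -> Tuple[float, int]:
    """Largest |a_i - b_i| over the flattened tensors and its flat index.
    A NaN in either tensor is reported as an infinite difference."""
    fa, fb = flatten(a), flatten(b)
    if len(fa) != len(fb):
        raise ValueError(f"size mismatch: {len(fa)} vs {len(fb)}")
    worst, worst_idx = 0.0, -1
    for i, (x, y) in enumerate(zip(fa, fb)):
        d = abs(x - y)
        if math.isnan(d):
            return math.inf, i
        if d > worst:
            worst, worst_idx = d, i
    return worst, worst_idx


def first_divergence(
    order: Sequence[str],
    linux: Dict[str, object],
    wasm: Dict[str, object],
    atol: float = 1e-5,
) -> Optional[Tuple[str, float, int]]:
    """Return (tensor name, max abs diff, flat index) for the first tensor,
    in execution order, whose values differ by more than atol; else None."""
    for name in order:
        diff, idx = max_abs_diff(linux[name], wasm[name])
        if diff > atol:
            return name, diff, idx
    return None
```

For this model, the inference logs give the order as `input`, `/Relu_output_0`, `output`. If `input` already differs, the problem is upstream of the session. If only `output` differs, the second Gemm (or its fused activation) is the place to look.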

What is a good way to log more details during inference? I am setting the verbosity level on the session, and I believe I read somewhere that controlling verbosity via RunOptions requires a debug build (these are release builds of onnxruntime).

SESSION LOGS:
Linux:

2025-04-30 16:18:43.566952841 [I:onnxruntime:, inference_session.cc:590 TraceSessionOptions] Session Options { execution_mode:0 execution_order:DEFAULT enable_profiling:0 optimized_model_filepath:"" enable_mem_pattern:1 enable_mem_reuse:1 enable_cpu_mem_arena:1 profile_file_prefix:onnxruntime_profile_ session_logid: session_log_severity_level:-1 session_log_verbosity_level:0 max_num_graph_transformation_steps:10 graph_optimization_level:3 intra_op_param:OrtThreadPoolParams { thread_pool_size: 1 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str: set_denormal_as_zero: 0 } inter_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str: set_denormal_as_zero: 0 } use_per_session_threads:0 thread_pool_allow_spinning:1 use_deterministic_compute:0 config_options: { } }
2025-04-30 16:18:43.567092674 [I:onnxruntime:, inference_session.cc:491 ConstructorCommon] Using global/env threadpools since use_per_session_threads_ is false
2025-04-30 16:18:43.569227882 [I:onnxruntime:, inference_session.cc:1732 Initialize] Initializing session.
2025-04-30 16:18:43.569321591 [I:onnxruntime:, inference_session.cc:1769 Initialize] Adding default CPU execution provider.
2025-04-30 16:18:43.569462966 [I:onnxruntime:ONNX_Runtime, bfc_arena.cc:29 BFCArena] Creating BFCArena for Cpu with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 max_power_of_two_extend_bytes: 1073741824 memory limit: 18446744073709551615 arena_extend_strategy: 0
2025-04-30 16:18:43.569514132 [V:onnxruntime:ONNX_Runtime, bfc_arena.cc:66 BFCArena] Creating 21 bins of max chunk size 256 to 268435456
2025-04-30 16:18:43.570167549 [I:onnxruntime:, graph_transformer.cc:15 Apply] GraphTransformer QDQSelectorActionTransformer modified: 0 with status: OK
2025-04-30 16:18:43.570207132 [I:onnxruntime:, graph_transformer.cc:15 Apply] GraphTransformer ConvActivationFusion modified: 0 with status: OK
2025-04-30 16:18:43.570247549 [I:onnxruntime:, graph_transformer.cc:15 Apply] GraphTransformer MatMulNBitsFusion modified: 0 with status: OK
2025-04-30 16:18:43.570331882 [I:onnxruntime:, graph_transformer.cc:15 Apply] GraphTransformer QDQFinalCleanupTransformer modified: 0 with status: OK
2025-04-30 16:18:43.570648757 [I:onnxruntime:, graph_transformer.cc:15 Apply] GraphTransformer NhwcTransformer modified: 0 with status: OK
2025-04-30 16:18:43.570720382 [V:onnxruntime:, session_state.cc:1243 VerifyEachNodeIsAssignedToAnEp] Node placements
2025-04-30 16:18:43.570761507 [V:onnxruntime:, session_state.cc:1246 VerifyEachNodeIsAssignedToAnEp] All nodes placed on [CPUExecutionProvider]. Number of nodes: 2
2025-04-30 16:18:43.570929882 [V:onnxruntime:, session_state.cc:131 CreateGraphInfo] SaveMLValueNameIndexMapping
2025-04-30 16:18:43.571038716 [V:onnxruntime:, session_state.cc:177 CreateGraphInfo] Done saving OrtValue mappings.
2025-04-30 16:18:43.571082091 [I:onnxruntime:, allocation_planner.cc:2574 CreateGraphPartitioner] Use DeviceBasedPartition as default
2025-04-30 16:18:43.571546174 [I:onnxruntime:, session_state_utils.cc:280 SaveInitializedTensors] Saving initialized tensors.
2025-04-30 16:18:43.571642216 [I:onnxruntime:ONNX_Runtime, bfc_arena.cc:347 AllocateRawInternal] Extending BFCArena for Cpu. bin_num:0 (requested) num_bytes: 16 (actual) rounded_bytes:256
2025-04-30 16:18:43.571709882 [I:onnxruntime:ONNX_Runtime, bfc_arena.cc:206 Extend] Extended allocation by 1048576 bytes.
2025-04-30 16:18:43.571745424 [I:onnxruntime:ONNX_Runtime, bfc_arena.cc:209 Extend] Total allocated bytes: 1048576
2025-04-30 16:18:43.571796299 [I:onnxruntime:ONNX_Runtime, bfc_arena.cc:212 Extend] Allocated memory at 0x8510d40 to 0x8610d40
2025-04-30 16:18:43.572197924 [I:onnxruntime:, session_state_utils.cc:432 SaveInitializedTensors] Done saving initialized tensors
2025-04-30 16:18:43.572354174 [I:onnxruntime:ONNX_Runtime, bfc_arena.cc:281 Reserve] Reserving memory in BFCArena for Cpu size: 30336
2025-04-30 16:18:43.572585966 [I:onnxruntime:ONNX_Runtime, bfc_arena.cc:281 Reserve] Reserving memory in BFCArena for Cpu size: 256
2025-04-30 16:18:43.572823757 [I:onnxruntime:, inference_session.cc:2185 Initialize] Session successfully initialized.

Wasm:
2025-05-01 10:55:30.001159 [I:onnxruntime:, inference_session.cc:590 TraceSessionOptions] Session Options { execution_mode:0 execution_order:DEFAULT enable_profiling:0 optimized_model_filepath:"" enable_mem_pattern:1 enable_mem_reuse:1 enable_cpu_mem_arena:1 profile_file_prefix:onnxruntime_profile_ session_logid: session_log_severity_level:-1 session_log_verbosity_level:0 max_num_graph_transformation_steps:10 graph_optimization_level:3 intra_op_param:OrtThreadPoolParams { thread_pool_size: 1 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str: set_denormal_as_zero: 0 } inter_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str: set_denormal_as_zero: 0 } use_per_session_threads:0 thread_pool_allow_spinning:1 use_deterministic_compute:0 config_options: { } }
2025-05-01 10:55:30.001619 [I:onnxruntime:, inference_session.cc:410 operator()] Flush-to-zero and denormal-as-zero are off
2025-05-01 10:55:30.001769 [I:onnxruntime:, inference_session.cc:491 ConstructorCommon] Using global/env threadpools since use_per_session_threads_ is false
2025-05-01 10:55:30.009573 [I:onnxruntime:, inference_session.cc:1732 Initialize] Initializing session.
2025-05-01 10:55:30.015554 [I:onnxruntime:, graph_transformer.cc:15 Apply] GraphTransformer QDQSelectorActionTransformer modified: 0 with status: OK
2025-05-01 10:55:30.015728 [I:onnxruntime:, graph_transformer.cc:15 Apply] GraphTransformer ConvActivationFusion modified: 0 with status: OK
2025-05-01 10:55:30.015879 [I:onnxruntime:, graph_transformer.cc:15 Apply] GraphTransformer MatMulNBitsFusion modified: 0 with status: OK
2025-05-01 10:55:30.016183 [I:onnxruntime:, graph_transformer.cc:15 Apply] GraphTransformer QDQFinalCleanupTransformer modified: 0 with status: OK
2025-05-01 10:55:30.017184 [I:onnxruntime:, graph_transformer.cc:15 Apply] GraphTransformer NhwcTransformer modified: 0 with status: OK
2025-05-01 10:55:30.017658 [V:onnxruntime:, session_state.cc:1243 VerifyEachNodeIsAssignedToAnEp] Node placements
2025-05-01 10:55:30.017813 [V:onnxruntime:, session_state.cc:1246 VerifyEachNodeIsAssignedToAnEp] All nodes placed on [CPUExecutionProvider]. Number of nodes: 2
2025-05-01 10:55:30.018543 [V:onnxruntime:, session_state.cc:131 CreateGraphInfo] SaveMLValueNameIndexMapping
2025-05-01 10:55:30.018798 [V:onnxruntime:, session_state.cc:177 CreateGraphInfo] Done saving OrtValue mappings.
2025-05-01 10:55:30.019069 [I:onnxruntime:, allocation_planner.cc:2574 CreateGraphPartitioner] Use DeviceBasedPartition as default
2025-05-01 10:55:30.021789 [I:onnxruntime:, session_state_utils.cc:280 SaveInitializedTensors] Saving initialized tensors.
2025-05-01 10:55:30.022654 [I:onnxruntime:, session_state_utils.cc:432 SaveInitializedTensors] Done saving initialized tensors
2025-05-01 10:55:30.024779 [I:onnxruntime:, inference_session.cc:2185 Initialize] Session successfully initialized.
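
Timestamps and pointer values make the two session logs above noisy to compare by eye. The sketch below uses only the Python standard library. It strips the leading timestamp, masks hex addresses such as `0x8510d40`, and reports lines that appear in only one of the two logs. The regular expressions assume the default log line format shown above, and the function names are hypothetical.

```python
import difflib
import re
from typing import Iterable, List

# Leading "YYYY-MM-DD HH:MM:SS.fraction " timestamp, as in the logs above.
_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\s+")
# Heap addresses such as 0x8510d40 change from run to run.
_POINTER = re.compile(r"0x[0-9a-fA-F]+")


def normalize(lines: Iterable[str]) -> List[str]:
    """Drop blank lines, strip timestamps, and mask pointer values."""
    out: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        line = _TIMESTAMP.sub("", line)
        out.append(_POINTER.sub("0x?", line))
    return out


def diff_logs(linux_lines: Iterable[str], wasm_lines: Iterable[str]) -> List[str]:
    """Lines only in the Linux log start with '- ', lines only in the
    wasm log with '+ '; unchanged lines and ndiff '? ' hints are dropped."""
    return [
        d
        for d in difflib.ndiff(normalize(linux_lines), normalize(wasm_lines))
        if d.startswith(("- ", "+ "))
    ]
```

Typical use would be `diff_logs(open("linux.log"), open("wasm.log"))`, where the file names are placeholders for wherever the logs were saved.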

INFERENCE LOGS:
Placement: CPUExecutionProvider
FusedGemm node: fused /fc1/Gemm
Input 0 Name: input
Shape: {3,474}
Input 1 Name: fc1.weight
was missing data type
Input 2 Name: ortshared_1_1_4_0_token_25
Shape: {4}
Placement: CPUExecutionProvider
Output 0 Name: /Relu_output_0
Shape: {3,4}
Placement: CPUExecutionProvider
FusedGemm node: fused /fc2/Gemm
Input 0 Name: /Relu_output_0
Shape: {3,4}
Input 1 Name: ortshared_1_2_4_0_token_24
was missing data type
Input 2 Name: ortshared_1_1_1_0_token_26
Shape: {1}
Placement: CPUExecutionProvider
Output 0 Name: output
Shape: {3,1}

To reproduce

Not available at the moment.

Urgency

No response

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.21

Execution Provider

'wasm'/'cpu' (WebAssembly CPU)

Labels

platform:web (issues related to ONNX Runtime Web; typically submitted using template)
