requested allocation size 0xd55559f0 exceeds maximum supported size of 0xc0000000 when running tf.conv2d on wasm backend #7617

@liliquan0118

Description


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): compiled from source
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow.js installed from (npm or script link): tfjs-4.4.0
  • TensorFlow.js version (use command below): tfjs-4.4.0
  • Browser version: Chrome 112.0.5615.121 (Official Build) (64-bit)
  • TensorFlow.js Converter Version:

Describe the current behavior

When running the following code under a tfjs-backend-wasm package compiled with the "-fsanitize=address" option, AddressSanitizer reported the error "requested allocation size 0xd55559f0 exceeds maximum supported size of 0xc0000000."

        var input = tf.ones([1, 16, 7, 4]);
        var filter = tf.fill([17, 13, 4, 4], 3, "float32");
        var prediction = await tf.conv2d(input, filter, [25, 24], -4, "NHWC", [1, 1], "ceil");

The error message is as follows:
[screenshot: AddressSanitizer allocation-size error]

After debugging, we found that the issue is caused by the pad parameter being negative. In the function conv2d(args: {inputs: Conv2DInputs, backend: BackendWasm, attrs: Conv2DAttrs}) (in tfjs-backend-wasm/src/kernels/Conv2D.ts), pad is -4 with the type number. However, when it is passed to the wasmConv2d function (in tfjs/tfjs-backend-wasm/src/cc/conv2d_impl.cc), pad is converted to the size_t type, and its value wraps from -4 to 4294967292. This causes the value of indirection_buffer_size, computed in the function

static enum xnn_status setup_convolution2d_nhwc(
  xnn_operator_t convolution_op,
  size_t batch_size,
  size_t input_height,
  size_t input_width,
  const void* input,
  void* output,
  uint32_t datatype_init_flags,
  uint32_t log2_input_element_size,
  uint32_t log2_filter_element_size,
  uint32_t extra_weights_elements_size,
  uint32_t log2_output_element_size,
  size_t num_threads){
...
 const size_t indirection_buffer_size = sizeof(void*) * kernel_size * tiled_output_size; // value is 3579148592

      if (input_height != convolution_op->last_input_height ||
          input_width != convolution_op->last_input_width)
      {
        // error: requested allocation size 0xd55559f0 exceeds maximum supported size of 0xc0000000
        const void** indirection_buffer = (const void**) xnn_reallocate_memory((void*) convolution_op->indirection_buffer, indirection_buffer_size);  
...
}

(in xnnpack/src/operators/convolution-nhwc.c), to be 3579148592, which exceeds the maximum size that xnn_reallocate_memory can allocate.

Describe the expected behavior

The program returns a normal numerical result, or rejects the invalid negative pad with a clear validation error, instead of encountering a memory error.

Standalone code to reproduce the issue

To reproduce this issue, compile the tfjs-backend-wasm source code with the "-fsanitize=address" option and run the code above under the compiled package. The compiled package is included in the attachment; you can reproduce the issue directly by opening "bug.html" in reproducecode.zip.
reproducecode.zip
