[Performance] ORT takes ~11GB memory for quantizing a model of size ~1GB

### Describe the issue

I observed that ORT takes 11541.5 MB of GPU memory with the CUDAExecutionProvider while quantizing a model of size 1.3 GB. The model has a single input of shape 1x2x1024x2048. I was able to reduce the memory usage with the following options, but it won't go any lower than the figure above.

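```python
import onnxruntime as ort
sess_options = ort.SessionOptions()
sess_options.add_session_config_entry("session.use_device_allocator_for_initializers", "1")
providers = [("CUDAExecutionProvider", {"arena_extend_strategy": "kSameAsRequested"})]
run_options = ort.RunOptions()
gpu_str = "gpu:0"  # GPU arena to shrink (device 0 assumed)
run_options.add_run_config_entry("memory.enable_memory_arena_shrinkage", f"cpu:0;{gpu_str}")
```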
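The run options only take effect if they are passed to each `session.run` call, which the snippet above does not show. A minimal sketch of the full flow follows, assuming a single GPU (`device_id` 0) plus a CPU fallback provider; `shrinkage_value` and `run_with_shrinkage` are hypothetical helper names, and `feeds` is whatever input dict the model expects.

```python
def shrinkage_value(gpu_ids, include_cpu=True):
    """Value for "memory.enable_memory_arena_shrinkage", e.g. "cpu:0;gpu:0"."""
    entries = ["cpu:0"] if include_cpu else []
    entries += [f"gpu:{i}" for i in gpu_ids]
    return ";".join(entries)


def run_with_shrinkage(model_path, feeds, device_id=0):
    """Run one inference with the memory settings from this issue (sketch)."""
    import onnxruntime as ort  # lazy import: shrinkage_value has no ORT dependency

    sess_options = ort.SessionOptions()
    sess_options.add_session_config_entry(
        "session.use_device_allocator_for_initializers", "1")
    providers = [
        ("CUDAExecutionProvider",
         {"device_id": device_id, "arena_extend_strategy": "kSameAsRequested"}),
        "CPUExecutionProvider",  # fallback for nodes the CUDA EP cannot run
    ]
    session = ort.InferenceSession(model_path, sess_options, providers=providers)

    run_options = ort.RunOptions()
    # The listed arenas are shrunk at the end of a Run() that carries this option,
    # so run_options must be passed to every session.run call below.
    run_options.add_run_config_entry(
        "memory.enable_memory_arena_shrinkage", shrinkage_value([device_id]))
    return session.run(None, feeds, run_options)
```

Keeping the string builder separate makes it easy to list extra GPUs (for example `shrinkage_value([0, 1])`) without hand-editing the semicolon-separated value.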

Is there a better combination of options that can reduce GPU memory usage even further?
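For scale, a quick sketch of how much of that footprint the model input itself accounts for, assuming float32 elements (the input dtype is not stated above); `tensor_mib` is a hypothetical helper:

```python
from math import prod


def tensor_mib(shape, bytes_per_elem=4):
    """Size of a dense tensor in MiB; bytes_per_elem=4 assumes float32."""
    return prod(shape) * bytes_per_elem / (1024 ** 2)


input_shape = (1, 2, 1024, 2048)  # the model's single input, per this issue
print(f"input: {tensor_mib(input_shape):.1f} MiB")  # input: 16.0 MiB
```

At roughly 16 MiB, the input is a small fraction of the ~11.5 GB observed, so the bulk presumably comes from weights, intermediate activations, and allocator or workspace overhead rather than the input buffer.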


### To reproduce

NA

### Urgency

NA

### Platform

Linux

### OS Version

Ubuntu 24.04

### ONNX Runtime Installation

Released Package

### ONNX Runtime Version or Commit ID

1.22

### ONNX Runtime API

Python

### Architecture

X64

### Execution Provider

CUDA

### Execution Provider Library Version

CUDA 12.9

### Model File

NA

### Is this a quantized model?

Yes
