Jiaxuan Ren2, Junhan Zhu1, Huan Wang1*
1Westlake University, 2University of Electronic Science and Technology of China
*Corresponding author: wanghuan [at] westlake [dot] edu [dot] cn
This paper presents CR-Diff, a method to improve the cross-resolution visual consistency of UNet-based diffusion models by masking out some parameters in the model, i.e., network pruning. While pruning is traditionally used to reduce model size, here we repurpose it to help diffusion models generalize to unseen resolutions. The samples above compare the original SDXL model with its counterpart modified by CR-Diff. The original SDXL is trained at 1024×1024 resolution and generalizes poorly to other resolutions (e.g., 400×560, 480×360), whereas after CR-Diff prunes some parameters (the remaining parameters are unchanged), it generates much more coherent images at these unseen resolutions. This suggests that some parameters in UNet-based diffusion models may act like “impurities”, and pruning, traditionally considered harmful to model capacity, can actually purify the model and improve its cross-resolution generalization.
Large-scale text-to-image diffusion models, while powerful, suffer from performance degradation when applied to unseen resolutions. Existing approaches primarily focus on architectural modifications or retraining, and fail to provide a simple, training-free solution to improve cross-resolution generalization. To bridge this gap, we present CR-Diff, a pruning-based framework that enhances the generative quality of diffusion UNets under resolution shifts.
Specifically,

- We reveal a counterintuitive phenomenon: moderate pruning can improve generation quality, particularly at unseen resolutions, indicating that certain parameters beneficial at default resolutions may become adverse under distribution shifts;
- To exploit this insight, we propose a block-wise pruning strategy that assigns differentiated sparsity ratios across downsampling, middle, and upsampling blocks, aligning pruning strength with their intrinsic functional importance;
- Furthermore, we introduce a pruned output amplification mechanism that rebalances the outputs of the dense and pruned subnetworks, enhancing beneficial signals while suppressing adverse effects, and supporting prompt-specific refinement for targeted improvements.
Extensive experiments demonstrate that CR-Diff can improve perceptual fidelity and semantic coherence across various diffusion backbones and unseen resolutions, while largely preserving the performance at default resolutions.
Illustration of the proposed CR-Diff framework for improving cross-resolution generation in UNet-based diffusion models. The method follows a two-stage pruning and optimizing paradigm. First, target UNet blocks are assigned adaptive sparsity levels via a block-wise pruning strategy, where magnitude-based criteria are applied with differentiated ratios across downsampling, middle, and upsampling modules to extract resolution-stable parameter subsets. Subsequently, a pruned output amplification mechanism refines the denoising process by amplifying the pruned subnetwork outputs with a coefficient k > 1, effectively suppressing adverse contributions from the dense model. This process leads to more stable denoising trajectories and produces higher-quality images with improved structural coherence and semantic alignment across resolutions.
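As a concrete illustration of the magnitude-based, block-wise pruning described above, the sketch below builds a binary mask that zeroes out the smallest-magnitude entries of a weight tensor at a given sparsity ratio; applying different ratios to the downsampling, middle, and upsampling blocks yields the block-wise scheme. The function name, the NumPy formulation, and the per-block ratios are illustrative assumptions, not the repository's actual implementation or searched values.

```python
import numpy as np

def magnitude_prune_mask(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a 0/1 mask keeping the largest-|w| entries; `sparsity` is the fraction pruned."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)                  # number of weights to remove
    if k == 0:
        return np.ones_like(weights)
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return (np.abs(weights) > threshold).astype(weights.dtype)

# Hypothetical per-block sparsity ratios (stand-ins, not the searched ones):
block_ratios = {"down": 0.05, "mid": 0.10, "up": 0.02}
rng = np.random.default_rng(0)
blocks = {name: rng.standard_normal((64, 64)) for name in block_ratios}
pruned = {name: w * magnitude_prune_mask(w, block_ratios[name])
          for name, w in blocks.items()}
```

In the real pipeline these masks would be applied to the UNet's parameter tensors, with one ratio per block group; the remaining weights are left unchanged, matching the "extract resolution-stable parameter subsets" step in the figure.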
Performance comparison on unseen resolutions. Across the evaluated models and resolutions, CR-Diff improves most metrics relative to the dense model; bold values indicate where CR-Diff outperforms the dense model.
Cross-resolution comparison on a subset of 5K prompts from the MS-COCO 2014 validation set. CR-Diff shows consistent gains in both ImageReward and visual fidelity compared to the original SDXL. Dense denotes the original unpruned model. Each group corresponds to a specific prompt, and the ImageReward (IR) scores are shown below each image.
This repository is a minimal, self-contained implementation of CR-Diff for reproducing our pruning and Pruned Output Amplification (POA) experiments. It is distilled from the original Masking_Diffusion codebase and keeps only:
- Pruning configuration evaluation pipeline: load pretrained diffusion models, apply structured UNet/DiT pruning, generate images, and compute metrics (FID / CLIP / ImageReward / PickScore / Aesthetic).
- Lightweight evaluation scripts under `evaluation/`.
- Pruning core libraries under `lib/`.
- Simple single-image inference entry (POA-based): `POA_inference.py`.
The goal is to make it easy to reproduce CR-Diff’s pruning and evaluation results and to provide a simple entry point for generating images from a text prompt.
```bash
git clone https://github.com/xuan9-9/CR-Diff.git
cd CR-Diff
conda create -n crdiff python=3.10 -y
conda activate crdiff
pip install -r requirements.txt
```

You need to install the following diffusion models from Hugging Face (or have them cached locally):
- SD1.5: `runwayml/stable-diffusion-v1-5`
- SD2.1: `stabilityai/stable-diffusion-2-1`
- SDXL: `stabilityai/stable-diffusion-xl-base-1.0`
- SD3-Medium: `stabilityai/stable-diffusion-3-medium-diffusers`
- FLUX: `black-forest-labs/FLUX.1-dev` and `black-forest-labs/FLUX.1-schnell`
You also need an evaluation dataset:
- COCO 2014 (train/val + captions), if you want to reproduce the COCO-based experiments:
  - Images: `train2014/`, `val2014/` (see the COCO official site)
  - Captions: `annotations/captions_train2014.json`, `annotations/captions_val2014.json`
If you want to use the COCO 5k subset we used in the paper:
```bash
cd CR-Diff
python MSCOCO_pre_process/prepare_coco_5k.py
```

This will create:

- `data/val2014_samples/` – 5k sampled images
- `data/val2014_samples_prompts.csv` – prompts CSV compatible with the evaluation scripts
You can also prepare your own dataset as long as it follows the same CSV and naming conventions (see the Data section above).
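A custom prompts CSV can be generated with a few lines of Python. Note that the column names used below (`file_name`, `caption`) are assumptions for illustration only; match whatever columns the scripts under `evaluation/` actually expect.

```python
import csv

# Hypothetical (filename, caption) pairs for a custom dataset.
samples = [
    ("img_000001.png", "A cat holding a sign that says hello world"),
    ("img_000002.png", "A red bicycle leaning against a brick wall"),
]

with open("custom_samples_prompts.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["file_name", "caption"])  # assumed header; check evaluation/ scripts
    writer.writerows(samples)
```

The image filenames should match the files in your sampled-images directory, mirroring the `data/val2014_samples/` + `data/val2014_samples_prompts.csv` pairing produced by the COCO preprocessing script.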
Run the Block-Wise Pruning Ratio Strategy (Simulated Annealing search):
```bash
cd CR-Diff
python BWP_strategy.py
```

After it finishes, you should get a config file like `outresults/SA_Ratio/sdxl_512x512/config.json`.
This file will be used by the subsequent pruning experiments and POA inference.
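As background on what `BWP_strategy.py` searches for, a generic simulated-annealing loop over per-block sparsity ratios looks roughly like the sketch below. The toy objective, step sizes, and cooling schedule are stand-ins; the real script optimizes an image-quality metric over actual pruned models, not this closed-form function.

```python
import math
import random

random.seed(0)

def objective(ratios):
    # Toy stand-in for the real image-quality objective: prefers moderate,
    # block-dependent sparsity (purely illustrative targets).
    targets = {"down": 0.05, "mid": 0.10, "up": 0.02}
    return -sum((ratios[b] - targets[b]) ** 2 for b in ratios)

ratios = {"down": 0.5, "mid": 0.5, "up": 0.5}     # initial per-block sparsity
best, best_score = dict(ratios), objective(ratios)
temp = 1.0
for step in range(2000):
    cand = dict(ratios)
    block = random.choice(list(cand))              # perturb one block's ratio
    cand[block] = min(max(cand[block] + random.uniform(-0.05, 0.05), 0.0), 0.95)
    delta = objective(cand) - objective(ratios)
    # Accept improvements always; accept worse moves with temperature-scaled probability.
    if delta > 0 or random.random() < math.exp(delta / temp):
        ratios = cand
        if objective(ratios) > best_score:
            best, best_score = dict(ratios), objective(ratios)
    temp *= 0.999                                  # geometric cooling
```

The best ratios found would then be serialized into a config such as the `config.json` above for the downstream pruning and POA steps.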
You can either use the provided shell script:
```bash
cd CR-Diff
bash unet_pruning_experiment.sh
```

or call the Python script directly:

```bash
cd CR-Diff
python unet_pruning_experiment.py \
    --config_json outresults/SA_Ratio/sdxl_512x512/config.json \
    --prompt "A cat holding a sign that says hello world" \
    --prompt_tag demo
```

This will prune the UNet according to `best_global_params_ratio` in `config.json`, run the experiment, and save logs and a demo image under `outresults/unet_experiment/`.
Finally, run POA inference for a single prompt:
```bash
cd CR-Diff
python POA_inference.py \
    --config_json outresults/SA_Ratio/sdxl_512x512/config.json \
    --prompt "A cat holding a sign that says hello world" \
    --output outputs/poa_sample.png \
    --poa_weight 1.5
```

Or use the minimal helper script:

```bash
cd CR-Diff
bash POA_inference.sh
```

You can then compare:

- Dense baseline: run `POA_inference.py` with `--no_prune`
- Pruned + POA: run with the BWP `config.json` and a non-zero `--poa_weight`
If you have any questions, please contact us at jiaxuan.ren@std.uestc.edu.cn.
If you find this work useful, please consider citing:
@misc{ren2026crdiff,
title={Cross-Resolution Diffusion Models via Network Pruning},
author={Jiaxuan Ren and Junhan Zhu and Huan Wang},
year={2026},
eprint={2604.05524},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2604.05524},
}