An end-to-end benchmark suite of multi-modal DNN applications for system-architecture co-design

xfhelen/MMBench


MMBench: End-to-End Benchmarking Tool for Analyzing the Hardware-Software Implications of Multi-modal DNNs

Ⅰ. Introduction & Background

Multi-modal DNNs have become increasingly popular across application domains because they deliver significant accuracy improvements over state-of-the-art (SOTA) uni-modal DNNs.

Figure: representative multi-modal DNN applications in self-driving, medical, multimedia, and robotics domains.

To understand the implications of multi-modal DNNs for hardware-software co-design, we developed MMBench, an end-to-end benchmarking tool that evaluates the performance of multi-modal DNNs at both the architecture and system levels.

Ⅱ. Overview of MMBench

Proposed method

MMBench provides profiling tools built on the standard CPU and NVIDIA GPU profilers: the PyTorch Profiler, Nsight Systems, and Nsight Compute. Together, these tools let researchers comprehensively understand the execution of multi-modal DNNs. The figure below shows how they work together to analyze DNN performance.

Figure: MMBench profiling workflow.
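As a minimal sketch of the PyTorch Profiler side of this workflow (the toy two-modality model below is an illustrative stand-in, not an actual MMBench workload), per-operator CPU time for a multi-modal forward pass can be collected like this:

```python
import torch
from torch.profiler import profile, ProfilerActivity

class ToyMultiModalNet(torch.nn.Module):
    """Two unimodal encoders fused by concatenation, then a linear head."""
    def __init__(self):
        super().__init__()
        self.image_encoder = torch.nn.Linear(64, 32)
        self.audio_encoder = torch.nn.Linear(16, 32)
        self.head = torch.nn.Linear(64, 10)

    def forward(self, image, audio):
        fused = torch.cat([self.image_encoder(image),
                           self.audio_encoder(audio)], dim=-1)
        return self.head(fused)

model = ToyMultiModalNet()
image, audio = torch.randn(8, 64), torch.randn(8, 16)

# Record per-operator events on the CPU (add ProfilerActivity.CUDA on GPU).
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    model(image, audio)

# Per-operator summary, sorted by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

The same context manager wraps a full application run in practice; Nsight Systems and Nsight Compute are then layered on top for timeline- and kernel-level views.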

Unique features

In all, MMBench has the following unique features, closely tied to the characteristics of multi-modal DNNs, which distinguish it from general-purpose benchmarks:

  • Fine-grained Network Characterization
  • End-to-End Application Execution
  • User-friendly Profiler Integration

Ⅲ. Implementation Details

Workloads in MMBench

MMBench includes 9 applications drawn from five of the most important multi-modal research domains, as shown below, covering a wide range of today's multi-modal DNN workloads.

| Application | Domain | Size | Modalities | Unimodal models | Fusion models | Task type |
|---|---|---|---|---|---|---|
| Avmnist | Multimedia | Small | Image, audio | CNN | Concat/Tensor | Classification |
| MMimdb | Multimedia | Medium | Image, text | CNN+Transformer | Concat/Tensor | Classification |
| CMU-MOSEI | Affective computing | Large | Language, vision, audio | CNN+Transformer | Concat/Tensor/Transformer | Regression |
| Sarcasm | Affective computing | Small | Language, vision, audio | CNN+Transformer | Concat/Tensor/Transformer | Classification |
| Medical VQA | Medical | Large | Image, text | CNN+Transformer | Transformer | Generation |
| Medical Segmentation | Medical | Large | MRI scans (T1, T1c, T2, FLAIR) | CNN+Transformer | Transformer | Segmentation |
| MuJoCo Push | Robotics | Medium | Image, force, proprioception, control | CNN+RNN | Concat/Tensor/Transformer | Classification |
| Vision & Touch | Robotics | Large | Image, force, proprioception, depth | CNN+RNN | Concat/Tensor | Classification |
| TransFuser | Autonomous driving | Large | Image, LiDAR | ResNet-34, ResNet-18 | Transformer | Classification |


Encoders, fusion and head methods

From the software perspective, the selected applications employ many kinds of subnets (mainly as encoders), fusion methods, and head methods, which together constitute a complete multi-modal DNN.
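To make the fusion stage concrete, here is a hedged sketch of two fusion styles named in the table above, concatenation and tensor (outer-product) fusion; the shapes are illustrative, not taken from any MMBench application:

```python
import torch

def concat_fusion(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Simplest fusion: concatenate the two modality embeddings
    # along the feature dimension.
    return torch.cat([a, b], dim=-1)

def tensor_fusion(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Tensor fusion: outer product of the two embeddings, flattened
    # into one joint feature vector capturing cross-modal interactions.
    return torch.einsum("bi,bj->bij", a, b).flatten(start_dim=1)

# Batch of 4, with 8- and 6-dimensional modality embeddings.
a, b = torch.randn(4, 8), torch.randn(4, 6)
assert concat_fusion(a, b).shape == (4, 14)   # 8 + 6 features
assert tensor_fusion(a, b).shape == (4, 48)   # 8 * 6 features
```

Tensor fusion grows multiplicatively with embedding sizes, which is one reason the fusion choice has clear hardware implications.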

Figure: encoders, fusion methods, and heads that compose a multi-modal DNN.

Ⅳ. Profiling Method and Code

Nsight Systems and Nsight Compute

Measurement scripts for Nsight Systems and Nsight Compute are provided in the scripts folder; follow the instructions there to run the experiments.
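For orientation, generic invocations of the two Nsight CLIs look like the following; the application name and output paths here are placeholders, not the repository's actual script names, so consult the scripts folder for the exact commands.

```shell
# Timeline-level profiling with Nsight Systems (CUDA + NVTX ranges):
#   nsys profile -o mmbench_timeline --trace=cuda,nvtx python run_app.py
#
# Kernel-level metrics with Nsight Compute:
#   ncu -o mmbench_kernels python run_app.py
```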

PyTorch Profiler

The code for measuring with the PyTorch Profiler is contained in each application's own folder. Results are generated in the log folder.
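A minimal sketch of producing such a result file (the "log" folder layout matches the README's description, but the tiny model and file name are illustrative assumptions) is to export a Chrome-trace JSON from a profiled run:

```python
import os
import torch
from torch.profiler import profile, ProfilerActivity

os.makedirs("log", exist_ok=True)

# Tiny stand-in model; each MMBench application profiles its own network.
model = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU())
x = torch.randn(4, 32)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(x)

# The exported JSON can be opened in chrome://tracing or Perfetto.
prof.export_chrome_trace("log/trace.json")
```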

Ⅴ. Acknowledgement

Some code and applications were adapted from MultiBench.

Ⅵ. Contributors

Our team has been working on related technologies since 2018. Thank you to everyone for contributing to this project.

Correspondence to:

Ⅶ. Related Publications

Characterizing and Understanding End-to-End Multi-modal Neural Networks on GPUs
Xiaofeng Hou, Cheng Xu, Jiacheng Liu, Xuehan Tang, Lingyu Sun, Chao Li and Kwang-Ting Cheng
IEEE Computer Architecture Letters (CAL)

MMBench: Benchmarking End-to-End Multi-modal DNNs and Understanding Their Hardware-Software Implications
Cheng Xu, Xiaofeng Hou, Jiacheng Liu, Chao Li, Tianhao Huang, Xiaozhi Zhu, Mo Niu, Lingyu Sun, Peng Tang, Tongqiao Xu, Kwang-Ting Cheng, Minyi Guo
IEEE International Symposium on Workload Characterization (IISWC)

If you find this repository useful, please cite our paper:

@article{hou2022characterizing,
  title={Characterizing and Understanding End-to-End Multi-modal Neural Networks on GPUs},
  author={Xiaofeng Hou and Cheng Xu and Jiacheng Liu and Xuehan Tang and Lingyu Sun and Chao Li and Kwang-Ting Cheng},
  journal={IEEE Computer Architecture Letters (CAL)},
  year={2022}
}

@INPROCEEDINGS{10289409,
  author={Xu, Cheng and Hou, Xiaofeng and Liu, Jiacheng and Li, Chao and Huang, Tianhao and Zhu, Xiaozhi and Niu, Mo and Sun, Lingyu and Tang, Peng and Xu, Tongqiao and Cheng, Kwang-Ting and Guo, Minyi},
  booktitle={2023 IEEE International Symposium on Workload Characterization (IISWC)}, 
  title={MMBench: Benchmarking End-to-End Multi-modal DNNs and Understanding Their Hardware-Software Implications}, 
  year={2023},
}
