GitHub - xiaoxiawu-microsoft/MLPruning: MLPruning, PyTorch, NLP, BERT, Structured Pruning

Introduction

MLPruning is a multilevel structured pruning library for transformer-based models. The library supports training BERT models with head/row pruning and block-wise sparsity pruning. It also incorporates the block-sparse MatMul kernel from Triton, so that the resulting sparsity yields an actual speedup at inference time.
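
As a rough illustration of what block-wise sparsity pruning means, the sketch below zeroes whole square tiles of a weight matrix and returns the tile-level mask. It is not the MLPruning implementation: the function name `block_prune`, the `keep_ratio` parameter, and the L1-magnitude scoring are assumptions made for this example, and the pruning criterion used by the library is described in the paper and may differ. Plain Python lists are used only to keep the example dependency-free.

```python
def block_prune(weight, block, keep_ratio):
    """Zero out the lowest-magnitude (block x block) tiles of a 2-D matrix.

    weight: list of equal-length rows; both dimensions must be divisible
            by `block`.
    keep_ratio: fraction of tiles to keep (at least one tile is kept).
    Returns (pruned_weight, tile_mask).
    Hypothetical helper for illustration, not part of MLPruning.
    """
    rows, cols = len(weight), len(weight[0])
    if rows % block or cols % block:
        raise ValueError("matrix dimensions must be divisible by the block size")
    tile_rows, tile_cols = rows // block, cols // block

    # Score each tile by the sum of absolute values of its entries (L1 norm).
    # Assumption: magnitude scoring stands in for the library's criterion.
    scores = {}
    for ti in range(tile_rows):
        for tj in range(tile_cols):
            scores[(ti, tj)] = sum(
                abs(weight[i][j])
                for i in range(ti * block, (ti + 1) * block)
                for j in range(tj * block, (tj + 1) * block)
            )

    # Keep the highest-scoring tiles and drop the rest.
    n_keep = max(1, round(keep_ratio * len(scores)))
    kept = set(sorted(scores, key=scores.get, reverse=True)[:n_keep])

    tile_mask = [
        [1 if (ti, tj) in kept else 0 for tj in range(tile_cols)]
        for ti in range(tile_rows)
    ]
    pruned = [
        [weight[i][j] if tile_mask[i // block][j // block] else 0.0
         for j in range(cols)]
        for i in range(rows)
    ]
    return pruned, tile_mask


weight = [
    [0.9, -0.8, 0.01, 0.02],
    [0.7, 0.6, -0.03, 0.01],
    [0.02, 0.01, 0.5, -0.4],
    [-0.01, 0.03, 0.3, 0.6],
]
pruned, tile_mask = block_prune(weight, block=2, keep_ratio=0.5)
print(tile_mask)  # [[1, 0], [0, 1]]
```

Because entire tiles are removed rather than scattered individual weights, the surviving nonzeros stay in dense blocks, which is the layout a block-sparse kernel can exploit.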

Please see the paper (arXiv:2105.14636) for more details on the MLPruning algorithm.

For Training

Please refer to the training folder for more details.

For Sparse Kernel Inference

Please refer to the inference folder for more details.
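
To show why block-wise sparsity can translate into a real speedup, here is a minimal pure-Python sketch of a block-sparse matrix multiply that visits only the tiles flagged as nonzero. It is not Triton's kernel and does not use Triton's API; the function name `block_sparse_matmul` and the `layout` argument are assumptions made for this example.

```python
def block_sparse_matmul(a, b, layout, block):
    """Compute a @ b, visiting only the tiles of `a` flagged in `layout`.

    a: (m x k) list of rows, b: (k x n) list of rows.
    layout[ti][tk] is 1 if the (ti, tk) tile of `a` may hold nonzeros,
    and 0 if that tile was pruned.
    Hypothetical helper for illustration, not Triton's implementation.
    """
    m, n = len(a), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for ti, layout_row in enumerate(layout):
        for tk, nonzero in enumerate(layout_row):
            if not nonzero:
                continue  # pruned tile: no multiply-adds are performed
            for i in range(ti * block, (ti + 1) * block):
                for kk in range(tk * block, (tk + 1) * block):
                    a_ik = a[i][kk]
                    for j in range(n):
                        out[i][j] += a_ik * b[kk][j]
    return out


a = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 6.0],
    [0.0, 0.0, 7.0, 8.0],
]
b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
layout = [[1, 0], [0, 1]]  # only the two diagonal tiles of `a` are kept
result = block_sparse_matmul(a, b, layout, block=2)
print(result)  # [[1.0, 2.0], [3.0, 4.0], [17.0, 5.0], [23.0, 7.0]]
```

With half of the tiles pruned, the loop performs half of the multiply-adds of a dense product; a GPU kernel applies the same idea at the level of thread blocks.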

Citation

MLPruning was developed as part of the following paper. If you find the library useful for your work, we would appreciate it if you cite the paper:

@article{yao2021mlpruning,
  title={MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models},
  author={Yao, Zhewei and Ma, Linjian and Shen, Sheng and Keutzer, Kurt and Mahoney, Michael W},
  journal={arXiv preprint arXiv:2105.14636},
  year={2021}
}

