From ee8388d928a7ef643af5a3348f21f75fd6042da2 Mon Sep 17 00:00:00 2001
From: Woosuk Kwon
Date: Thu, 14 Sep 2023 00:10:18 +0000
Subject: [PATCH 1/4] Add link to paper

---
 README.md | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 66b683480b69..5d5ea7de79da 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ Easy, fast, and cheap LLM serving for everyone
 
-| Documentation | Blog | Discussions |
+| Documentation | Blog | Paper | Discussions |
 
@@ -104,3 +104,17 @@ For details, check out our [blog post](https://vllm.ai).
 
 We welcome and value any contributions and collaborations.
 Please check out [CONTRIBUTING.md](./CONTRIBUTING.md) for how to get involved.
+
+## Citation
+
+If you use vLLM for your research, please cite our [paper](https://arxiv.org/abs/2309.06180):
+```bibtex
+@misc{kwon2023efficient,
+    title={Efficient Memory Management for Large Language Model Serving with PagedAttention},
+    author={Woosuk Kwon and Zhuohan Li and Siyuan Zhuang and Ying Sheng and Lianmin Zheng and Cody Hao Yu and Joseph E. Gonzalez and Hao Zhang and Ion Stoica},
+    year={2023},
+    eprint={2309.06180},
+    archivePrefix={arXiv},
+    primaryClass={cs.LG}
+}
+```

From 1536394a4c418af095dad11b06fd21740465fb18 Mon Sep 17 00:00:00 2001
From: Woosuk Kwon
Date: Thu, 14 Sep 2023 00:16:04 +0000
Subject: [PATCH 2/4] Add news

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 5d5ea7de79da..5847043e490e 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,7 @@ Easy, fast, and cheap LLM serving for everyone
 ---
 
 *Latest News* 🔥
+- [2023/09] We released our [PagedAttention paper](https://arxiv.org/abs/2309.06180) on arXiv!
 - [2023/08] We would like to express our sincere gratitude to [Andreessen Horowitz](https://a16z.com/2023/08/30/supporting-the-open-source-ai-community/) (a16z) for providing a generous grant to support the open-source development and research of vLLM.
 - [2023/07] Added support for LLaMA-2! You can run and serve 7B/13B/70B LLaMA-2s on vLLM with a single command!
 - [2023/06] Serving vLLM On any Cloud with SkyPilot. Check out a 1-click [example](https://github.com/skypilot-org/skypilot/blob/master/llm/vllm) to start the vLLM demo, and the [blog post](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/) for the story behind vLLM development on the clouds.
From e86a1810b33eed59e95edd3912996dc84a00d2e5 Mon Sep 17 00:00:00 2001
From: Woosuk Kwon
Date: Thu, 14 Sep 2023 00:33:14 +0000
Subject: [PATCH 3/4] Add paper in doc

---
 docs/source/index.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/source/index.rst b/docs/source/index.rst
index 6420b98e591e..e6d0bc67c003 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -43,6 +43,7 @@ vLLM is flexible and easy to use with:
 For more information, check out the following:
 
 * `vLLM announcing blog post <https://vllm.ai>`_ (intro to PagedAttention)
+* `vLLM paper <https://arxiv.org/abs/2309.06180>`_ (SOSP 2023)
 * `How continuous batching enables 23x throughput in LLM inference while reducing p50 latency `_ by Cade Daniel et al.

From f3d944899b79e0aebcc0bd284522f77de0a8ba2e Mon Sep 17 00:00:00 2001
From: Woosuk Kwon
Date: Thu, 14 Sep 2023 00:35:05 +0000
Subject: [PATCH 4/4] arxiv -> sosp

---
 README.md | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 5847043e490e..ab5007288a74 100644
--- a/README.md
+++ b/README.md
@@ -110,12 +110,10 @@ Please check out [CONTRIBUTING.md](./CONTRIBUTING.md) for how to get involved.
 
 If you use vLLM for your research, please cite our [paper](https://arxiv.org/abs/2309.06180):
 ```bibtex
-@misc{kwon2023efficient,
-    title={Efficient Memory Management for Large Language Model Serving with PagedAttention},
-    author={Woosuk Kwon and Zhuohan Li and Siyuan Zhuang and Ying Sheng and Lianmin Zheng and Cody Hao Yu and Joseph E. Gonzalez and Hao Zhang and Ion Stoica},
-    year={2023},
-    eprint={2309.06180},
-    archivePrefix={arXiv},
-    primaryClass={cs.LG}
+@inproceedings{kwon2023efficient,
+    title={Efficient Memory Management for Large Language Model Serving with PagedAttention},
+    author={Woosuk Kwon and Zhuohan Li and Siyuan Zhuang and Ying Sheng and Lianmin Zheng and Cody Hao Yu and Joseph E. Gonzalez and Hao Zhang and Ion Stoica},
+    booktitle={Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles},
+    year={2023}
 }
 ```