
Conversation

zhuohan123 (Member)

No description provided.

@zhuohan123 zhuohan123 requested a review from WoosukKwon June 22, 2023 07:33
@WoosukKwon WoosukKwon (Collaborator) left a comment

Thanks!

@zhuohan123 zhuohan123 merged commit 83658c8 into main Jun 22, 2023
@zhuohan123 zhuohan123 deleted the bumpup-version-0-1-1 branch June 22, 2023 07:33
@zhuohan123 zhuohan123 restored the bumpup-version-0-1-1 branch June 22, 2023 07:34
@WoosukKwon WoosukKwon deleted the bumpup-version-0-1-1 branch June 22, 2023 08:01
michaelfeil pushed a commit to michaelfeil/vllm that referenced this pull request Jun 24, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024
SUMMARY:
* update the NIGHTLY workflow to be whl-centric
* update benchmarking jobs to use the generated whl

TEST PLAN:
Runs on remote push. I'm also triggering NIGHTLY manually.

---------

Co-authored-by: andy-neuma <andy@neuralmagic.com>
Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Domenic Barbuzzi <dbarbuzzi@gmail.com>
mht-sharma pushed a commit to mht-sharma/vllm that referenced this pull request Oct 30, 2024
dtrifiro added a commit to dtrifiro/vllm that referenced this pull request Apr 7, 2025
"variables" in `docker-bake.hcl` can have defaults, but are overridden
by env vars with the same name. We can remove these (useless) defaults
and fix the name for `GITHUB_REPO` (it's actually `GITHUB_REPOSITORY`)


Example:
```bash 
env \
  GITHUB_REPOSITORY=neuralmagic/nm-vllm-ent \
  PYTHON_VERSION=3.12 \
  GITHUB_SHA=$(git rev-parse HEAD) \
  VLLM_VERSION=0.8.3 \
  docker buildx bake cuda --print
```
output:
```json
{
  "group": {
    "default": {
      "targets": [
        "cuda"
      ]
    }
  },
  "target": {
    "cuda": {
      "context": ".",
      "dockerfile": "Dockerfile.ubi",
      "args": {
        "BASE_UBI_IMAGE_TAG": "9.5-1739420147",
        "FLASHINFER_VERSION": "https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.1.post1/flashinfer_python-0.2.1.post1+cu124torch2.5-cp38-abi3-linux_x86_64.whl",
        "LIBSODIUM_VERSION": "1.0.20",
        "PYTHON_VERSION": "3.12",
        "VLLM_TGIS_ADAPTER_VERSION": "0.6.3"
      },
      "labels": {
        "org.opencontainers.image.source": "https://github.com/neuralmagic/nm-vllm-ent",
        "vcs-ref": "9803ee1c6d30330c9dc3fca6d42491794f135013",
        "vcs-type": "git"
      },
      "tags": [
        "quay.io/vllm/vllm:0.8.3",
        "quay.io/vllm/vllm:9803ee1c6d30330c9dc3fca6d42491794f135013",
        "quay.io/vllm/vllm:2025-04-04-17-55"
      ],
      "platforms": [
        "linux/amd64"
      ]
    }
  }
}
```
chaojun-zhang pushed a commit to chaojun-zhang/vllm that referenced this pull request Jun 17, 2025
* use 2025.1.1 instead (vllm-project#196)

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

* Use standalone_compile by default in torch >= 2.8.0 (vllm-project#18846)

Signed-off-by: rzou <zou3519@gmail.com>

* fix xpu compile issue

---------

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: rzou <zou3519@gmail.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
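
As a hedged illustration of the version gate mentioned in the second bullet (this is not the actual vLLM patch, and the helper name is hypothetical), the feature would only be enabled when the installed torch is new enough:

```python
from packaging import version

import torch


def standalone_compile_supported() -> bool:
    # Hypothetical helper: enable the standalone_compile path only on
    # torch >= 2.8.0, matching the commit message above. Comparing the
    # parsed release tuple keeps 2.8.0 pre-release builds on the new
    # path as well.
    return version.parse(torch.__version__).release >= (2, 8, 0)
```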
jikunshang added a commit to jikunshang/vllm that referenced this pull request Jun 18, 2025
zhenwei-intel pushed a commit to zhenwei-intel/vllm that referenced this pull request Jun 23, 2025
jikunshang added a commit to jikunshang/vllm that referenced this pull request Jun 24, 2025
amy-why-3459 pushed a commit to amy-why-3459/vllm that referenced this pull request Sep 15, 2025
…nd v_cache. (vllm-project#204)

This PR changes the shape of the KV cache to avoid the view of k_cache and
v_cache. In addition, it caches the metadata of k_cache and v_cache to avoid
duplicate slice operations, improving performance.

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
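
A minimal sketch of the metadata-caching idea described above, assuming a toy [2, num_layers, ...] cache layout (the real change targets vLLM's KV-cache shapes; all names here are illustrative):

```python
import torch


class KVCacheMeta:
    """Illustrative holder that computes per-layer K/V slices once and
    reuses them, instead of re-slicing the cache on every forward step."""

    def __init__(self, kv_cache: torch.Tensor) -> None:
        # kv_cache: [2, num_layers, ...] where index 0 is K and 1 is V.
        # Slicing a tensor returns a view, so storing these views up front
        # avoids repeating the slice (and its bookkeeping) each step.
        num_layers = kv_cache.shape[1]
        self.k_cache = [kv_cache[0, i] for i in range(num_layers)]
        self.v_cache = [kv_cache[1, i] for i in range(num_layers)]


kv = torch.zeros(2, 4, 8, 16)  # toy cache: 2 (K/V) x 4 layers x ...
meta = KVCacheMeta(kv)
assert meta.k_cache[0].shape == (8, 16)
```

Because torch slicing returns views, the cached entries alias the underlying storage, so later writes to `kv` remain visible through `meta.k_cache` without extra copies.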