[Core][MultiModalHasher] Don't convert memoryviews to bytes during hashing #24925
Conversation
Code Review
This pull request introduces a performance optimization for hashing multimodal inputs by avoiding intermediate byte conversions and copies. The use of generators to yield bytes or memoryview objects, which are then directly consumed by the blake3 hasher, is a solid approach. The changes are well-reasoned and the performance gains are evident. However, I've identified a pre-existing critical issue in the handling of bfloat16 tensors that will cause a runtime error. This should be addressed to ensure the stability of the hashing mechanism.
vllm/multimodal/hasher.py
Unrelated to this PR: what's the reason for converting all images to RGBA before hashing? This not only introduces some compute cost but also increases the amount of numpy data that needs to be hashed by 30%.
I remember this was due to some security concerns. See #17378
Looking at the unit tests, this prevents hash collisions for images with different modes or palettes. How do you feel about hashing the mode, palette and data separately? That should avoid the need to convert all images first. I'm happy to make a follow-up PR for something like this.
data = {"mode": obj.mode, "data": np.asarray(obj)}
if obj.palette is not None:
    data["palette"] = obj.palette.palette
yield from cls.iter_item_to_bytes("image", data)
> hashing the mode, palette and data separately

Yeah, I was going to suggest that; it should be much cheaper than needing to convert the image itself.
Feel free to do it
I'll make a follow-up PR, since this would actually change the hash.
vllm/multimodal/hasher.py
.view(np.uint8) needs to be used since blake3 only supports uint8/int8 buffers. Though calling .view(np.uint8) is very fast since it doesn't need to copy the data.
Very much appreciate the contribution and optimization! @lgeiger

> blake3 already accepts uint8 buffers so this PR removes the need to convert everything back to bytes in item_to_bytes

I actually didn't know this 😅 and I left a question
LGTM but @DarkLight1337 should also take a look
Thanks for optimizing this!
This solution still has problems. In fact, there are still many parameters in the image that may affect its state, such as the tRNS (transparency) chunk.
This PR aims at addressing efficiency; the uniqueness of the hash should be the same as before. If that's not the case (or if you feel that the uniqueness has some problems), please open a separate issue and provide more details.
Purpose
As mentioned in #22044, hashing large multimodal inputs can be inefficient. #19484 removed a data copy of numpy arrays by relying on memory views for C-contiguous data.
blake3 already accepts uint8 buffers, so this PR removes the need to convert everything back to bytes in item_to_bytes.
See vllm/vllm/multimodal/hasher.py, line 81 at 5bcc153.
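A minimal sketch of this idea — yielding buffers straight into the hasher rather than concatenating everything into one bytes object first. Here hashlib.blake2b stands in for blake3 (both accept buffer-protocol objects in update()), and iter_item_to_buffers is a hypothetical name, not the PR's actual function:

```python
import hashlib

import numpy as np

def iter_item_to_buffers(key: str, obj):
    """Hypothetical generator yielding bytes or memoryview objects
    that the hasher consumes directly, avoiding intermediate copies."""
    yield key.encode("utf-8")
    if isinstance(obj, np.ndarray) and obj.flags.c_contiguous:
        # Zero-copy: reinterpret the array's memory as a uint8 memoryview.
        yield obj.view(np.uint8).data
    elif isinstance(obj, np.ndarray):
        yield obj.tobytes()  # non-contiguous data still needs a copy
    elif isinstance(obj, bytes):
        yield obj
    else:
        yield repr(obj).encode("utf-8")

h = hashlib.blake2b()  # stand-in for blake3; same buffer-based update()
for buf in iter_item_to_buffers("image", np.zeros((2, 2), dtype=np.float32)):
    h.update(buf)
digest = h.hexdigest()
```

Feeding the hasher incrementally like this produces the same digest as hashing the concatenated bytes, but without materializing the concatenation.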
Implementation-wise, serialize_item now yields either bytes or a memoryview, which blake3 directly consumes.
Test Plan
Correctness should be covered by the existing hasher tests on CI.
The performance can be measured using:
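The original benchmark snippet is not preserved on this page; a stand-in micro-benchmark along the same lines (hashlib.blake2b instead of blake3, and a synthetic 4K RGBA-sized array) might look like:

```python
import hashlib
import timeit

import numpy as np

# Roughly a 4K RGBA image worth of data (~33 MB).
img = np.zeros((2160, 3840, 4), dtype=np.uint8)

def hash_with_copy(a: np.ndarray) -> str:
    h = hashlib.blake2b()
    h.update(a.tobytes())  # materializes an extra ~33 MB bytes object
    return h.hexdigest()

def hash_with_view(a: np.ndarray) -> str:
    h = hashlib.blake2b()
    h.update(a.view(np.uint8).data)  # zero-copy memoryview
    return h.hexdigest()

# Both paths must produce identical digests.
assert hash_with_copy(img) == hash_with_view(img)

t_copy = timeit.timeit(lambda: hash_with_copy(img), number=5)
t_view = timeit.timeit(lambda: hash_with_view(img), number=5)
print(f"copy: {t_copy:.3f}s  view: {t_view:.3f}s")
```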
Test Result
For a 4K image this speeds up hashing by ~30%. This is not massive, but for a lot of multimodal input it might become noticeable.