Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash collisions in SVG loader #2020

Closed
lmdsp opened this issue Mar 2, 2024 · 4 comments
Closed

Hash collisions in SVG loader #2020

lmdsp opened this issue Mar 2, 2024 · 4 comments
Assignees
Labels
bug Something isn't working renderer Core rendering showstopper Regression bugs / Critical errors
Milestone

Comments

@lmdsp
Copy link
Collaborator

lmdsp commented Mar 2, 2024

The picture loader in tvgLoader.cpp uses this 'hash' function:

uint64_t HASH_KEY(const char* data, uint64_t size)
{
    return (((uint64_t) data) << 32) | size;
}

static LoadModule* _findFromCache(const char* data, uint32_t size, const string& mimeType)
{
    // ...
    auto loader = _activeLoaders.head;

    auto key = HASH_KEY(data, size);

    while (loader) {
        if (loader->type == type && loader->hashkey == key) {

I'm not sure what the intended effect was, but this doesn't hash the image (svg or other) data at all.
It will only work for example if the user loads data from static const char* buffers.

We ran into this problem when attempting to load two svgs that only differ by their opacity value from an buffer external which turns out stayed had the same adress (but obviously different contents).
This is extremely confusing as their is no mention of this behavior in the Picture::load() documentation.

This can easily be fixed by using a real hash function that hashes the full data contents.
In our internal fork we replaced this with XXH3 (BSD) which is probably the fastest hash implementation out there (31Gb/s) and of great quality (low collision rate).

While I understand it might not be feasible to rely on this library for various reasons,
IMHO their should be away to customize the hash function ThorVG uses, either by passing a function pointer or through the use of a macro/define.

To entirely avoid any potential collisions, one could also do a memcmp of the data when hash keys and size are identical.

@hermet hermet added bug Something isn't working renderer Core rendering showstopper Regression bugs / Critical errors labels Mar 3, 2024
@hermet hermet self-assigned this Mar 3, 2024
@hermet hermet added this to the 0.13 milestone Mar 3, 2024
@hermet
Copy link
Member

hermet commented Mar 3, 2024

@lmdsp thanks for reporting & suggestion. We will look into this soon.

@hermet
Copy link
Member

hermet commented Mar 3, 2024

In our internal fork we replaced this with XXH3 (BSD) which is probably the fastest hash implementation out there (31Gb/s) and of great quality (low collision rate).

ps. ThorVG welcomes contributions offering better solutions.

hermet added a commit that referenced this issue Mar 5, 2024
The previous loader cache mechanism encountered a problem
when the user changed the content of the cached data.

In such cases, a new request would not be processed
because the renderer would use the previously cached content.

So far, the TVG cache mechanism utilizes a pointer hash key
for the fastest hashing mechanism available.
One limitation is that it assumes the address is unique for the data.

To resolve this, we modified the caching policy.
Now, the renderer will not cache copied data;
it will only cache the given data when it is deemed shareable.

issue: #2020
@hermet
Copy link
Member

hermet commented Mar 5, 2024

After this change, ThorVG does not cache data when picture data is copied.
In fact, employing a complex caching mechanism here is not proper,
As in most cases, user would not expect the caching with the data copying.

#2026

hermet added a commit that referenced this issue Mar 6, 2024
The previous loader cache mechanism encountered a problem
when the user changed the content of the cached data.

In such cases, a new request would not be processed
because the renderer would use the previously cached content.

So far, the TVG cache mechanism utilizes a pointer hash key
for the fastest hashing mechanism available.
One limitation is that it assumes the address is unique for the data.

To resolve this, we modified the caching policy.
Now, the renderer will not cache copied data;
it will only cache the given data when it is deemed shareable.

issue: #2020
@hermet
Copy link
Member

hermet commented Mar 6, 2024

the fix will be applied in 0.12.7

@hermet hermet closed this as completed Mar 6, 2024
hermet added a commit that referenced this issue Mar 8, 2024
The previous loader cache mechanism encountered a problem
when the user changed the content of the cached data.

In such cases, a new request would not be processed
because the renderer would use the previously cached content.

So far, the TVG cache mechanism utilizes a pointer hash key
for the fastest hashing mechanism available.
One limitation is that it assumes the address is unique for the data.

To resolve this, we modified the caching policy.
Now, the renderer will not cache copied data;
it will only cache the given data when it is deemed shareable.

issue: #2020
hermet added a commit that referenced this issue Apr 5, 2024
The previous loader cache mechanism encountered a problem
when the user changed the content of the cached data.

In such cases, a new request would not be processed
because the renderer would use the previously cached content.

So far, the TVG cache mechanism utilizes a pointer hash key
for the fastest hashing mechanism available.
One limitation is that it assumes the address is unique for the data.

To resolve this, we modified the caching policy.
Now, the renderer will not cache copied data;
it will only cache the given data when it is deemed shareable.

issue: #2020
hermet added a commit that referenced this issue Apr 6, 2024
The previous loader cache mechanism encountered a problem
when the user changed the content of the cached data.

In such cases, a new request would not be processed
because the renderer would use the previously cached content.

So far, the TVG cache mechanism utilizes a pointer hash key
for the fastest hashing mechanism available.
One limitation is that it assumes the address is unique for the data.

To resolve this, we modified the caching policy.
Now, the renderer will not cache copied data;
it will only cache the given data when it is deemed shareable.

issue: #2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working renderer Core rendering showstopper Regression bugs / Critical errors
Projects
Status: Done 0.13
Development

No branches or pull requests

2 participants