Skip to content

feat: add open addressing hash set #4705

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 37 commits into
base: main
Choose a base branch
from

Conversation

BorysTheDev
Copy link
Contributor

@BorysTheDev BorysTheDev commented Mar 5, 2025

#4915

Added main functionality of OAHSet class
added tests

Current hash table is a classic open addressing hash table with linear probing, but with next modification:

  1. every entry contains additional hash data that allows avoid collisions
  2. resize is done when the number of elements == size of buckets
  3. if there is no place for a new entry it is placed in the extension point vector (stored only in every 16th elements)
  4. search for element is linear in the next 16 entries or in the extension point vector

For now, benchmarks are the next, but the code wasn't optimized for performance yet

Benchmark CPU OAHSet CPU StringSet
BM_Clone/elements:32000 1425050 ns 1331906 ns
BM_Fill/elements:32000 3411021 ns 926307 ns
BM_Clear/elements:32000 179716 ns 360826 ns
BM_Add/elements:1000/Key Size:10 30650 ns 25309 ns
BM_Add/elements:10000/Key Size:10 635063 ns 251686 ns
BM_Add/elements:100000/Key Size:10 7044638 ns 3523484 ns
BM_Add/elements:1000/Key Size:100 34389 ns 29599 ns
BM_Add/elements:10000/Key Size:100 688113 ns 338721 ns
BM_Add/elements:100000/Key Size:100 7900725 ns 5036263 ns
BM_Add/elements:1000/Key Size:1000 82427 ns 76512 ns
BM_Add/elements:10000/Key Size:1000 1309386 ns 1025027 ns
BM_Add/elements:100000/Key Size:1000 16908210 ns 20320960 ns
BM_AddMany/elements:1000/Key Size:10 101403 ns 24962 ns
BM_AddMany/elements:10000/Key Size:10 1076909 ns 253995 ns
BM_AddMany/elements:100000/Key Size:10 11202105 ns 3062356 ns
BM_AddMany/elements:1000/Key Size:100 105353 ns 29585 ns
BM_AddMany/elements:10000/Key Size:100 1123937 ns 322189 ns
BM_AddMany/elements:100000/Key Size:100 12108941 ns 4798887 ns
BM_AddMany/elements:1000/Key Size:1000 150628 ns 77652 ns
BM_AddMany/elements:10000/Key Size:1000 1725259 ns 1169702 ns
BM_AddMany/elements:100000/Key Size:1000 20425583 ns 17881405 ns
BM_Erase/elements:1000/Key Size:10 17769 ns 19052 ns
BM_Erase/elements:10000/Key Size:10 270268 ns 201989 ns
BM_Erase/elements:100000/Key Size:10 5210262 ns 2716412 ns
BM_Erase/elements:1000/Key Size:100 23975 ns 22230 ns
BM_Erase/elements:10000/Key Size:100 356889 ns 274502 ns
BM_Erase/elements:100000/Key Size:100 8264147 ns 4470247 ns
BM_Erase/elements:1000/Key Size:1000 76743 ns 69454 ns
BM_Erase/elements:10000/Key Size:1000 1129342 ns 845411 ns
BM_Erase/elements:100000/Key Size:1000 17723758 ns 19108144 ns
BM_Get/elements:1000/Key Size:10 10129 ns 11026 ns
BM_Get/elements:10000/Key Size:10 72916 ns 101239 ns
BM_Get/elements:100000/Key Size:10 1700811 ns 2256349 ns
BM_Get/elements:1000/Key Size:100 15117 ns 14620 ns
BM_Get/elements:10000/Key Size:100 142955 ns 192986 ns
BM_Get/elements:100000/Key Size:100 2599509 ns 3984429 ns
BM_Get/elements:1000/Key Size:1000 58808 ns 59685 ns
BM_Get/elements:10000/Key Size:1000 674511 ns 790375 ns
BM_Get/elements:100000/Key Size:1000 12420843 ns 19972071 ns
BM_Grow 5333783 ns 7546771 ns

@BorysTheDev BorysTheDev changed the title feat: add intrusive string feat: add intrusive string DO NOT REVIEW Mar 5, 2025
@BorysTheDev BorysTheDev force-pushed the feat_intrusive_string_set branch 4 times, most recently from 72aad85 to 54db498 Compare March 7, 2025 15:15
@BorysTheDev BorysTheDev changed the title feat: add intrusive string DO NOT REVIEW feat: add intrusive stringset DO NOT REVIEW Mar 11, 2025
@BorysTheDev BorysTheDev force-pushed the feat_intrusive_string_set branch from 54db498 to 9e074cf Compare March 17, 2025 16:43
@BorysTheDev BorysTheDev force-pushed the feat_intrusive_string_set branch 4 times, most recently from f487504 to 6237fbd Compare April 18, 2025 13:35
@BorysTheDev BorysTheDev force-pushed the feat_intrusive_string_set branch 2 times, most recently from 085ba97 to 02002f1 Compare May 15, 2025 08:02
@BorysTheDev BorysTheDev force-pushed the feat_intrusive_string_set branch from a42f779 to c0eef55 Compare May 21, 2025 09:28
@BorysTheDev BorysTheDev force-pushed the feat_intrusive_string_set branch from 5af11e2 to b2dcb0f Compare May 29, 2025 15:18
@BorysTheDev BorysTheDev force-pushed the feat_intrusive_string_set branch 3 times, most recently from 1632b1d to 9e3ef11 Compare June 11, 2025 10:20
@BorysTheDev BorysTheDev force-pushed the feat_intrusive_string_set branch from 60557e7 to 14554d9 Compare June 17, 2025 11:59
@BorysTheDev BorysTheDev changed the title feat: add intrusive stringset DO NOT REVIEW feat: add intrusive stringset Jun 17, 2025
@BorysTheDev BorysTheDev marked this pull request as ready for review June 17, 2025 12:03
@BorysTheDev BorysTheDev requested review from kostasrim and romange June 17, 2025 12:03
@BorysTheDev BorysTheDev changed the title feat: add intrusive stringset feat: add open addressing hash set Jun 17, 2025
Copy link
Contributor

@vyavdoshenko vyavdoshenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks promising.
However, besides that code, the comprehensive documentation must also be added. For example, what to do if a new type should be added?

It seems additional testing should be added, i.e. memory leaks.

// we can assume that high 12 bits of user address space
// can be used for tagging. At most 52 bits of address are reserved for
// some configurations, and usually it's 48 bits.
// https://docs.kernel.org/arch/arm64/memory.html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that true for other hardware platforms? x86_64?
I guess this approach is useful (or even possible) for Linux only? I guess yes, so this is the performance update for Linux only. We have to have a general approach always to support cross-platform builds.

I think the build system should be updated to integrate this approach, to use it for Linux only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a crossplatform approach

Comment on lines +232 to +233
auto* expiry_pos = Raw();
std::memcpy(expiry_pos, &at_sec, sizeof(at_sec));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint32_t* expiry_pos = Raw();
*expiry_pos = at_sec;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with reinterpret_cast it will work. thx

Comment on lines +368 to +372
if (IsVector()) {
delete &AsVector();
} else {
zfree(Raw());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me, if I am wrong, but data_ always initializes with zmalloc, but not always frees with zfree.
This code looks dangerous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data_ can be vector whisch initializes with new or pointer to string which initializes with zmalloc. It looks dangerous, but it is ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants