Switch the presized cuckoo map from strict mod to Lemire's uniform range mapping trick.

This removes the dependency on Eigen's TensorIntDiv, which doesn't work properly on Android. The new mapping is 10-20% faster on x86 and should be much faster on CUDA if needed. (Further optimizations could force the use of __umulhi on CUDA, but this table is designed for the CPU, so I don't see a reason to complicate things.)

OLD:

    CPU: Intel Haswell with HyperThreading (6 cores)
    dL1: 32KB  dL2: 256KB  dL3: 15MB

    Benchmark              Time(ns)    CPU(ns)  Iterations
    ------------------------------------------------------
    BM_CuckooFill/1000        14859      14846       46766
    BM_CuckooFill/10M     835154969  834427162         100
    BM_CuckooRead/1000           10         10    67484647
    BM_CuckooRead/10M            56         56    10000000

NEW:

    Benchmark              Time(ns)    CPU(ns)  Iterations
    ------------------------------------------------------
    BM_CuckooFill/1000        12385      12374       56240
    BM_CuckooFill/10M     696061920  695467681         100
    BM_CuckooRead/1000            9          9    78725288
    BM_CuckooRead/10M            44         44    15487881

This change will have bad consequences for anyone who violates the table's requirement that keys be pre-hashed into random-looking uint64s before insertion: because the range mapping takes its bucket index from the high bits of the key, poorly distributed keys crowd into a few buckets, and the table will not reach its full capacity. (It won't return wrong results; it will return an error on insert.) That's a documented requirement, but we'll want to make sure that nobody is misusing it.

Also updates the TooManyKeys test to be more robust (it now fails an assertion if the table fails to fill during the fill phase) and to pre-hash its keys as it should.

Change: 131976392
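A minimal sketch of the two ideas in this message: Lemire's range mapping, which computes floor(x * n / 2^64) with one widening multiply instead of the division behind x % n, and why keys must be pre-hashed first. It assumes a compiler that provides unsigned __int128 (GCC or Clang). The names FastRange64, SplitMix64 and CountDistinctBuckets are hypothetical and do not come from the TensorFlow source; SplitMix64 merely stands in for whatever pre-hash a caller uses.

```cpp
#include <cstdint>
#include <set>

// Hypothetical helper: maps x into [0, n) as floor(x * n / 2^64), i.e.
// Lemire's multiply-shift range reduction. The bucket index is the high
// 64 bits of the 128-bit product, so it is driven by the high bits of x.
// Requires unsigned __int128 (GCC/Clang extension).
inline uint64_t FastRange64(uint64_t x, uint64_t n) {
  return static_cast<uint64_t>((static_cast<unsigned __int128>(x) * n) >> 64);
}

// Hypothetical stand-in for "pre-hashing": the SplitMix64 output function.
// It is a bijection on uint64_t that spreads small or sequential inputs
// across all 64 bits.
inline uint64_t SplitMix64(uint64_t x) {
  uint64_t z = x + 0x9E3779B97F4A7C15ULL;
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

// Counts how many distinct buckets in [0, num_buckets) the keys
// 0 .. num_keys - 1 land in, with or without pre-hashing.
inline uint64_t CountDistinctBuckets(uint64_t num_keys, uint64_t num_buckets,
                                     bool prehash) {
  std::set<uint64_t> buckets;
  for (uint64_t k = 0; k < num_keys; ++k) {
    uint64_t key = prehash ? SplitMix64(k) : k;
    buckets.insert(FastRange64(key, num_buckets));
  }
  return buckets.size();
}
```

With raw sequential keys, CountDistinctBuckets(1000, 1000, false) returns 1: every product k * 1000 is far below 2^64, so its high 64 bits are zero and all keys land in bucket 0. Plain x % n would have spread those same keys perfectly, which is why the pre-hashing requirement bites harder after this change. With prehash set to true, the same keys should spread across hundreds of buckets.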
Showing with 72 additions and 29 deletions.