Flat sorted array with very fast insert and erase operations
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.

README.md

Copyright (C) 2012 Łukasz Czerwiński

AssocVector

AssocVector is a container like a std::map but it is implemented using an array instead of a balanced tree. Comparing with balanced tree an array uses less memory and due to keeping data locally binary search in a sorted array can be faster by a constant factor than in a balanced tree. AssocVector was inspired by Loki::AssocVector written by Andrei Alexandrescu, however insert and erase methods were significantly improved (O(sqrt(N)) vs. O(N)). All the other methods should not be slower.

Project website

https://github.com/wo3kie/assocVector

Requirements

C++11
Loki (optional for performance test)

How to build it?

make

How to run it?

For unit test

./ut

For performance test

./pt

How to use it?

AssocVector is absolutely compatible with std::map

    av[ "c++" ] = 1983;  
    av[ "java" ] = 1995;  
    av[ "scala" ] = 2001;  
    assert( av[ "c++" ] == 1983 );  
    assert( av.count( "java" ) == 1 );  
    assert( av.find( "scala" ) != av.end() );  

How does AssocVector work?

AssocVector is composed of three arrays

  • first with obects called 'storage'
  • second with objects called 'buffer'
  • third with pointers called 'erased'

All arrays are kept sorted all the time. Insert puts a new item into 'buffer'. Since 'buffer' is much shorter than 'storage' this is an effective oprtation. When 'buffer' is filled completely it is merged with 'storage'. This operation may not be cheap but it is performed only from time to time. Lets take a look at an example:

    +---+---+---+---+---+---+---+---+---+
    | 0 | 1 | 4 | 6 | 8 | 9 | 11| 13| 32|
    +---+---+---+---+---+---+---+---+---+
    |   |   |   |  
    +---+---+---+  

After insert 2

    +---+---+---+---+---+---+---+---+---+
    | 0 | 1 | 4 | 6 | 8 | 9 | 11| 13| 32|
    +---+---+---+---+---+---+---+---+---+
    | 2 |   |   |  
    +---+---+---+  

After insert 3

    +---+---+---+---+---+---+---+---+---+
    | 0 | 1 | 4 | 6 | 8 | 9 | 11| 13| 32|
    +---+---+---+---+---+---+---+---+---+
    | 2 | 3 |   |  
    +---+---+---+  

After insert 49

    +---+---+---+---+---+---+---+---+---+
    | 0 | 1 | 4 | 6 | 8 | 9 | 11| 13| 32|
    +---+---+---+---+---+---+---+---+---+
    | 2 | 3 | 49|  
    +---+---+---+  

After insert 5

    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+  
    | 0 | 1 | 2 | 3 | 4 | 6 | 8 | 9 | 11| 13| 32| 49|   |   |   |   |  
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+  
    | 5 |   |   |   |
    +---+---+---+---+  

After AssocVector::merge called

    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 8 | 9 | 11| 13| 32| 49|   |   |   |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    |   |   |   |   |
    +---+---+---+---+  

Erase of item is implemented in two ways. Item present in 'buffer' is erased immediately. Item present in 'storage' is not erased, but it is marked as 'removed' using 'erased' table. If 'erased' table is full all coresponding items are removed from 'storage'. Size of 'erased' is the same as size of 'buffer' and it is equal to sqrt('storage'.size()).

    +---+---+---+---+---+---+---+---+---+
    | 0 | 1 | 4 | 6 | 8 | 9 | 11| 13| 32|
    +---+---+---+---+---+---+---+---+---+
    | 2 | 3 | 49|  
    +---+---+---+  
    |   |   |   |  
    +---+---+---+  

After erase 6

    +---+---+---+---+---+---+---+---+---+
    | 0 | 1 | 4 | 6 | 8 | 9 | 11| 13| 32|
    +---+---+---+---+---+---+---+---+---+
    | 2 | 3 | 49|  
    +---+---+---+  
    |*6 |   |   |  
    +---+---+---+  

After erase 2

    +---+---+---+---+---+---+---+---+---+
    | 0 | 1 | 4 | 6 | 8 | 9 | 11| 13| 32|
    +---+---+---+---+---+---+---+---+---+
    | 3 | 49|   |  
    +---+---+---+  
    |*6 |   |   |  
    +---+---+---+  

After erase 49

    +---+---+---+---+---+---+---+---+---+
    | 0 | 1 | 4 | 6 | 8 | 9 | 11| 13| 32|
    +---+---+---+---+---+---+---+---+---+
    | 3 |   |   |  
    +---+---+---+  
    |*6 |   |   |  
    +---+---+---+  

After erase 0

    +---+---+---+---+---+---+---+---+---+
    | 0 | 1 | 4 | 6 | 8 | 9 | 11| 13| 32|
    +---+---+---+---+---+---+---+---+---+
    | 3 |   |   |  
    +---+---+---+  
    |*0 |*6 |   |  
    +---+---+---+  

After erase 11

    +---+---+---+---+---+---+---+---+---+
    | 0 | 1 | 4 | 6 | 8 | 9 | 11| 13| 32|
    +---+---+---+---+---+---+---+---+---+
    | 3 |   |   |  
    +---+---+---+  
    |*0 |*6 |*11|  
    +---+---+---+  

After erase 50

    +---+---+---+---+---+---+---+---+---+
    | 1 | 3 | 4 | 8 | 9 | 13| 32|   |   |
    +---+---+---+---+---+---+---+---+---+
    |   |   |   |  
    +---+---+---+  
    |*50|   |   |  
    +---+---+---+  

Why does a buffer have a size equal sqrt of storage?

Inserting items to the beginning of AssocVector may be considered as a sequence of following operations:

    InsertToBuffer  
    Merge  
    
    InsertToBuffer  
    Merge  
    
    InsertToBuffer  
    Merge  
    
    ...  
  
    InsertToBuffer  
    Merge  

We may assume that a storage has a size equal N and a buffer has a size equal B. If a buffer has a size of B, couple of operations (InsertToBuffer, Merge) is executed N/B times. The bigger buffer is, the longer it takes to insert an item into it but the less frequently it necesitates 'merge' operation, and on the other hand the smaller buffer is, the shorter it takes to insert an item into it but 'merge' operation has to be performed more often. We have to find such a buffer size B as a function of storage size N, to make all these operations optimal.

One InsertToBuffer takes 1+2+3+4+...+B = (1+B)(B/2) = (B/2)(1+B) assignments. All InsertToBuffer operations take N/B times more, that's (N/B)(B/2)(1+B) = (N/2)(1+B) assignments.

We can spot also that
1st Merge takes B assignments
2nd Merge takes 2B assignments (B moves for making a place, B assignments)
3rd Merge takes 3B assignments (2B moves for making a place, B assignments)
4th Merge takes 4B assignments (3B moves for making a place, B assignments)
...
last (N/B)th Merge takes (N/B)B = N assignments.

All Merge operations together take (B+2B+3B+4B+...+(N/B)B) assignments:
(B+2B+3B+4B+...+(N/B)B)
= B(1+2+3+4+...+(N/B))
= B((1+N/B)(N/B)/2)
= (1+N/B)(N/2) assignments

Putting it all together we are getting, that all operations take (N/2)(1+B)+(1+(N/B))(N/2)
= (N/2)((1+B)+(1+N/B))
= (N/2)(B+N/B+2) assignments.

Now we have to find such a B that is optimal. In matematical words we would like to find a minimum for the function
f(B) = (N/2)(B+N/B+2)

Function f(B) = (N/2)(B+N/B+2) has minimum value if f(B) = B+N/B has a minimum as well

f(B) = B+N/B
f'(B) = 1-N/(B^2)
f'(B) = 0
1 = N/(B^2)
B = sqrt(N).

Contributors

I would like to thank Arkadiusz Nagórka for making a great code review and finding some bugs.