-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non string (composite) keys #3
Comments
For now the HAT-trie only supports a contiguous sequence of chars (could be used to store binary data as a '\0' char inside the sequence is supported). I'm not that familiar with the composite keys of If you want to store something like: struct employee {
std::string first_name;
std::string last_name;
}; You could store in the HAT-trie the concatenation of the first name and the last name and provide a string_view into this concatenation in your interface. struct employee {
std::string_view first_name() const;
std::string_view last_name() const;
}; |
I'm more thinking of binary types like: struct order {
int64_t price;
int32_t id;
}; And then I'd need to find an order by it's price + id combination and iterate over orders at the same |
You can do something like this. #include <iostream>
#include "htrie_set.h"
struct order {
int64_t price;
int32_t id;
};
int main() {
order o1{10, 1};
order o2{5, 2};
order o3{10, 3};
order o4{16, 4};
static_assert(std::is_pod<order>::value, "");
tsl::htrie_set<char> test;
test.insert_ks(reinterpret_cast<char*>(&o1), sizeof(o1));
test.insert_ks(reinterpret_cast<char*>(&o2), sizeof(o2));
test.insert_ks(reinterpret_cast<char*>(&o3), sizeof(o3));
test.insert_ks(reinterpret_cast<char*>(&o4), sizeof(o4));
int64_t price = 10;
auto its_prefix = test.equal_prefix_range_ks(
reinterpret_cast<char*>(&price), sizeof(price));
order o;
std::string key_buffer;
// 3: 10 1: 10
for(auto it = its_prefix.first; it != its_prefix.second; ++it) {
it.key(key_buffer);
assert(key_buffer.size() == sizeof(o));
std::memcpy(&o, key_buffer.data(), sizeof(o));
std::cout << o.id << ": " << o.price << std::endl;
}
} If I'm not mistaken, there should be no undefined behaviour if You also can't really find objects with a id lower than 'x' (you could check for id bigger than 'x' if you use an unsigned int by knowing the binary representation, but well...). What is the problem with |
"hakish" and slight "undefined behavior" is perfectly fine as long as gcc compiled x64 is doing good.
the hash set is heavily modified too i.e. a lot of deletes, inserts going on. I'll try to benchmark trie vs boost for my cases, thanks so much for your help! if you'd also need to know the sum of all |
Yes I think it should be good to do a benchmark as I'm not sure that the HAT-trie offers a significant advantage compared to For the sum, I'm not sure to understand. You can do: std::size_t sum = 0;
for(auto it = its_prefix.first; it != its_prefix.second; ++it) {
it.key(key_buffer);
std::memcpy(&o, key_buffer.data(), sizeof(o));
sum += o.qty;
} You want to have an aggregate field for each price to avoid this calculation? |
Exactly, an aggregate and some way to get N highest & lowest aggregates by price. Now have a separate |
I think it's better to have a separate structure for the aggregates. As the trie stores binary data, you could store multiple data types in the trie. Example you will have a char buffer that looks like this for an order: And for an aggregate: The first byte differentiate an order from an aggregate. When you retrieve your data you would do: if(key_buffer[0] == ORDER_TYPE) {
std::memcpy(&ord, key_buffer.data() + 1, sizeof(order));
}
else {
assert(key_buffer[0] == AGGREGATE_TYPE);
std::memcpy(&aggr, key_buffer.data() + 1, sizeof(aggragate));
} But I don't see any advantage compared to a separate map while it really increases the complexity of the code. |
#define NONIUS_RUNNER
#include <boost/multi_index/composite_key.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/mem_fun.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/pool/pool.hpp>
#include "../../hat-trie/src/htrie_set.h"
#include "../../nonius/nonius.h++"
#include <unordered_set>
struct order
{
int64_t price;
int32_t id;
};
order ht_o1 {10, 1};
order us_o1 {10, 1};
order mi_o1 {10, 1};
struct order_hash
{
size_t operator()(const order &o) const
{
return std::hash<int64_t>()(o.price) + std::hash<int32_t>()(o.id);
}
};
struct order_equal
{
bool operator()(const order &x, const order &y) const
{
return x.price == y.price && x.id == y.id;
}
};
using namespace boost::multi_index;
using mindex_set = multi_index_container<
order,
indexed_by<
hashed_unique<
member<order, const int32_t, &order::id>>,
ordered_non_unique<
composite_key<
order,
member<order, int64_t, &order::price>,
member<order, int32_t, &order::id>>>>>;
tsl::htrie_set<char> ht_test;
std::unordered_set<order, order_hash, order_equal> us_test;
mindex_set mi_test;
NONIUS_BENCHMARK("htrie-basic", [](nonius::chronometer meter)
{
ht_test.clear();
meter.measure([&](int i)
{
++ht_o1.id;
if (ht_o1.id % 10 == 0)
++ht_o1.price;
ht_test.insert_ks(reinterpret_cast<char*>(&ht_o1), sizeof(ht_o1));
});
})
NONIUS_BENCHMARK("unordered_set-basic", [](nonius::chronometer meter)
{
us_test.clear();
meter.measure([&](int i)
{
++us_o1.id;
if (us_o1.id % 10 == 0)
++us_o1.price;
us_test.insert(us_o1);
});
})
NONIUS_BENCHMARK("multi_index-basic", [](nonius::chronometer meter)
{
mi_test.clear();
meter.measure([&](int i)
{
++mi_o1.id;
if (mi_o1.id % 10 == 0)
++mi_o1.price;
mi_test.insert(mi_o1);
});
})
|
htrie seem to be quite complicated indeed so basic inserts are getting slower. probably got to stick to simple data structures for now. |
Yes, I think the HAT-trie is not well-suited for your usecase. It could probably be speed-up to take advantage of the fixed size of the entries (the HAT-trie here must support strings of variable length) but it's not the main purpose of the structure, there are probably more suited datastructures for this. |
Hi! Does the hat-trie have to be string keys only? Would be really great to have something like arbitrary type composite keys along the lines of MultiIndex container http://www.boost.org/doc/libs/1_64_0/libs/multi_index/doc/tutorial/key_extraction.html#composite_keys
The text was updated successfully, but these errors were encountered: