Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache for gzipped elevation tiles #3120

Merged
merged 21 commits into from
Sep 9, 2021
Merged

Conversation

molind
Copy link
Contributor

@molind molind commented Jun 1, 2021

Issue

As discussed in #3071 current implementation don't work correctly with gzipped elevation tiles. This is basic implementation of cache with parallel tile decoding support. I've used unpacked tile limit = 50. But it could be tuned from config or any other way you want.

Please review and let me know what should I improve.

@kevinkreiser kevinkreiser self-requested a review June 1, 2021 18:26

// using memory maps
mutable std::vector<std::pair<format_t, midgard::mem_map<char>>> mapped_cache;
Copy link
Member

@kevinkreiser kevinkreiser Jun 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is the main question to me. completely removing memory mapping will have some performance impact. i dont care as much about peformance impact for offline processing like that we have in valhalla_build_tiles but when we run the service and someone does a /height request it needs to be super fast (as it was before). with a naive caching scheme we will not be able to beat what the OS is doing with memory mapping but we'd have to test it to find out.

indeed i know that memory mapping many small files can be slower than just reading them from disk and in that scenario its faster to have an in memory cache like we do for graph tiles but there we are talking about 10s of kilobytes and here we are talking about tiles that are 25 megabytes in size. anyway we need to consider how this change affects the service. it could be for the better (ie we load up the 300GB of compressed elevation data and now its fast enough to be usable but still not need a 1.6TB disk). i just know from 3 years ago when i implemented lz4 it ended up not being fast enough performance wise though i will say i dont remember what my caching looked like (though i guess git history would tell us).

EDIT: i see memory mapping still exists as before but in a different section. ill take a closer look at the code in the coming days

anyway im going start downloading the elevation locally so i can try out all the different configurations. i do think i could test the compressed data on one of my nvme drives but for the other test configurations i'll need to use my raided sata disks. please dont be discouraged if it takes a while to get this merged in, testing properly will take a while!

@molind
Copy link
Contributor Author

molind commented Jun 2, 2021

It runs right the same way as earlier version. mapped_cache is now called just cache.

std::vector<cache_item_t> cache;

To store all required information cache_item_t was added.
class cache_item_t {
private:
format_t format;
midgard::mem_map<char> data;
int usages;
const char* unpacked;
public:
cache_item_t() : format(format_t::UNKNOWN), usages(0), unpacked(nullptr) {
}
cache_item_t(cache_item_t&&) = default;
~cache_item_t() {
free((void*)unpacked);
}
bool init(const std::string& file, format_t format);
inline const char* get_data() const {
return data.get();
}
inline format_t get_format() const {
return format;
}
inline int& get_usages() {
return usages;
}
inline const char* get_unpacked() {
return unpacked;
}
inline const char* detach_unpacked() {
auto rv = unpacked;
unpacked = nullptr;
return rv;
}
bool unpack(const char* unpacked);
static boost::optional<std::pair<uint16_t, sample::format_t>>
parse_hgt_name(const std::string& name);
};

I've tried to keep workflow identical to current one without any useless thread synchronisation when all data stored in RAW files. cache filled in sample constructor.
auto files = get_files(data_source);
for (const auto& f : files) {
// make sure its a valid index
auto data = cache_item_t::parse_hgt_name(f);
if (data && data->second != format_t::UNKNOWN) {
if (!cache[data->first].init(f, data->second)) {
LOG_WARN("Corrupt elevation data: " + f);
}
}
}
}

And cache_item_t::init maps the file. No matter it's RAW or GZIP.
bool sample::cache_item_t::init(const std::string& path, format_t format) {
auto size = file_size(path);
if (format == format_t::RAW && size != HGT_BYTES) {
return false;
}
this->format = format;
data.map(path, size, POSIX_MADV_SEQUENTIAL);
return true;
}

When coordinate requested in get or get_all function they call source function.
sample::tile_data sample::source(uint16_t index) {
// bail if its out of bounds
if (index >= TILE_COUNT) {
return tile_data(this, index, false, nullptr);
}
// if we dont have anything maybe its lazy loaded
auto& item = cache[index];
if (item.get_data() == nullptr) {
auto f = data_source + get_hgt_file_name(index);
item.init(f, format_t::RAW);
}
if (item.get_format() == format_t::UNKNOWN) {
return tile_data(this, index, false, nullptr);
}
// we have it raw or we dont
if (item.get_format() == format_t::RAW) {
return tile_data(this, index, false, (const int16_t*)item.get_data());
}

It works the same way as earlier version. Checks tile inside the cache. If there is no mapped data inside cache_item it tries to lazy load hgt file. This part don't use any locks and it should work at the same speeds as version in master.
Magic happens later in source function:
mutex.lock();
auto it = pending_tiles.find(index);
if (it != pending_tiles.end()) {
auto future = it->second;
mutex.unlock();
return future.get();
}
const char* unpacked = item.get_unpacked();
if (unpacked) {
auto rv = tile_data(this, index, true, (const int16_t*)unpacked);
mutex.unlock();
return rv;
}
std::promise<tile_data> promise;
it = pending_tiles.emplace(index, promise.get_future()).first;
if (reusable.size() >= UNPACKED_TILES_COUNT) {
for (auto i : reusable) {
if (cache[i].get_usages() <= 0) {
reusable.erase(i);
unpacked = cache[i].detach_unpacked();
break;
}
}
}
if (!unpacked) {
unpacked = (char*)malloc(HGT_BYTES);
}
reusable.insert(index);
auto rv = tile_data(this, index, true, (const int16_t*)unpacked);
mutex.unlock();
if (!item.unpack(unpacked)) {
rv = tile_data(this, index, false, nullptr);
}
mutex.lock();
promise.set_value(rv);
pending_tiles.erase(it);
mutex.unlock();
return rv;

It checks current tile in pending_tiles map. It stores future objects of tile_data class. Main goal of tile_data is to increment and decrement reference counter for unpacked buffers. We don't want to unpack everything at once. And it tries to keep count of unpacked buffers under 50 (UNPACKED_TILES_COUNT ).
If there is no pending job it checks unpacked data inside the cache and returns it. If there is no unpacked data in cache_item_t it allocates memory and releases the lock before unpack. Because of this different gzipped tiles could be unpacked in parallel. If requests for same tile called from different threads, they'll wait in future.get();

@kevinkreiser
Copy link
Member

@molind i have the data downloaded and should be able to test it out over the weekend. thanks for your patience

@molind
Copy link
Contributor Author

molind commented Jun 10, 2021

I’ve generated routing data for a whole planet this week with packed elevation data. Everything worked fine. Waiting for your test results.

@roadrouting
Copy link

Sorry for the off-topic comment, but I've been meaning to ask: roughly how long should a full planet build with elevation data take?

Our initial experiments have been taking 3-4 days on a very overprovisioned machine. From what we've seen in issues and comments, this feels way too long, and we suspect there is a slow disk causing problems.

@dnesbitt61
Copy link
Member

Last planet build I did (with elevation) was about 36 hours. Make sure you use an SSD - I remember trying on a laptop with a hard drive and it was agonizingly slow.

@kevinkreiser
Copy link
Member

in my experience on amazon infra (and on my personal machine) it takes about 24 hours. these are machines with SSDs and 24-32 threads and something like 32-64GB of ram.

@kevinkreiser kevinkreiser self-requested a review June 17, 2021 15:13
@kevinkreiser
Copy link
Member

@molind im sorry again i just havent found the time... i just wanted to let you konw i didnt forget, i will review it!

@@ -241,11 +200,155 @@ template <class coord_t> double sample::get(const coord_t& coord) const {
return value / adjust;
}

template <class coords_t> std::vector<double> sample::get_all(const coords_t& coords) const {
::valhalla::skadi::sample::sample(const std::string& data_source) : data_source(data_source) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know it was like this before but do you mind removing the namespacing here since its not needed

Comment on lines 136 to 139
std::vector<cache_item_t> cache;
std::unordered_set<uint16_t> reusable;
std::unordered_map<uint16_t, std::shared_future<tile_data>> pending_tiles;
std::recursive_mutex mutex;
Copy link
Member

@kevinkreiser kevinkreiser Jul 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@molind what do you think about using the pimpl idiom here to encapsulate all of the cache, basically:

// in sample.h we simple forward declare a generic cache inside the sample class
class sample {
 ...
 protected:
  class cache_t;
  std::unique_ptr<cache_t> cache;
};

//then into sample.cc we move all of the details (everything in protected access)
struct tile_data_t {
  ...
};
struct cache_item_t {
  ...
};
struct cache_t {
  ...
};

I would like to basically keep the public interface to the sample class as free of details as possible. i know its a little bit more work but i think its better for the end user of the class. maybe i have totally overlooked a detail about why we cant do this but i think the top level methods of sample:: are just the constructor and the couple methods to get samples from the data.

let me know what you think!

item.init(f, format_t::RAW);
}

if (item.get_format() == format_t::UNKNOWN) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (item.get_format() == format_t::UNKNOWN) {
// it wasnt in cache and when we tried to load it the file was of unknown type
if (item.get_format() == format_t::UNKNOWN) {

return tile_data(this, index, false, (const int16_t*)item.get_data());
}

mutex.lock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mutex.lock();
// we were able to load it but the format wasnt RAW, which only leaves the GZIP format
// now we've got to schedule or wait for some gunzipping to get the raw bytes into cache
mutex.lock();

@kevinkreiser
Copy link
Member

@molind ive added a couple trivial suggestions and one bigger refactor type suggestion. please have a look and let me know what you think. other than those things, this pr looks perfectly fine to me thank you for clearing that up!

@molind
Copy link
Contributor Author

molind commented Jul 21, 2021

Great. I'll change it according to the code review.

BTW, Generated routing data for planet on EX52-NVME from Hetzner and it took just 2 hours to add elevation info from gzipped tiles.

2021/07/21 15:02:53.166361 [INFO] Adding elevation to 181525 tiles with 12 threads...
2021/07/21 17:11:14.106379 [INFO] Finished

@molind
Copy link
Contributor Author

molind commented Sep 6, 2021

@kevinkreiser Refactored as you suggested.

src/skadi/sample.cc Outdated Show resolved Hide resolved
src/skadi/sample.cc Outdated Show resolved Hide resolved
src/skadi/sample.cc Outdated Show resolved Hide resolved
@kevinkreiser
Copy link
Member

refactor looks great, i like how narrow the api in the header file remains even t hough you added all this functionality. i had a couple of nitpicks and questions, but nothign major!

Evgen Bodunov added 2 commits September 6, 2021 21:32
Clear cache pointer before assignement, otherwise it will try to call decrement_usages on garbage.
@molind
Copy link
Contributor Author

molind commented Sep 7, 2021

@kevinkreiser Done.

Copy link
Member

@kevinkreiser kevinkreiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you so much for bearing with me during the review process

@@ -37,6 +37,7 @@
* REMOVED: Unused overloads of `to_response` function [#3167](https://github.com/valhalla/valhalla/pull/3167)

* **Bug Fix**
* FIXED: Cache for gzipped elevation tiles [#3120](https://github.com/valhalla/valhalla/pull/3120)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is in the wrong section, since we've release since you prd this. i will fix it manually after merge

@kevinkreiser kevinkreiser merged commit 1ad0030 into valhalla:master Sep 9, 2021
std::unordered_set<uint16_t> reusable;
// Map of pending tiles. No matter how many requests received, only one inflate job per tile
// started.
std::unordered_map<uint16_t, std::shared_future<tile_data>> pending_tiles;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flagging here that this fails to compile in newest gcc and clang:

Gcc

/usr/include/c++/11.1.0/bits/unordered_map.h:105:18:   required from ‘class std::unordered_map<short unsigned int, std::shared_future<valhalla::skadi::tile_data> >’
/home/anders/Documents/Programming/mapbox/valhalla/src/skadi/sample.cc:173:63:   required from here
/usr/include/c++/11.1.0/future:906:21: error: static assertion failed: result type must be destructible
  906 |       static_assert(is_destructible<_Res>{},
      |                     ^~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/11.1.0/future:906:21: note: ‘std::is_destructible<valhalla::skadi::tile_data>()’ evaluates to false
/usr/include/c++/11.1.0/future: In instantiation of ‘class std::future<valhalla::skadi::tile_data>’:
/home/anders/Documents/Programming/mapbox/valhalla/src/skadi/sample.cc:329:23:   required from here
/usr/include/c++/11.1.0/future:769:21: error: static assertion failed: result type must be destructible
  769 |       static_assert(is_destructible<_Res>{},
      |                     ^~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/11.1.0/future:769:21: note: ‘std::is_destructible<valhalla::skadi::tile_data>()’ evaluates to false
/usr/include/c++/11.1.0/future: In instantiation of ‘class std::promise<valhalla::skadi::tile_data>’:
/home/anders/Documents/Programming/mapbox/valhalla/src/skadi/sample.cc:342:27:   required from here
/usr/include/c++/11.1.0/future:1065:21: error: static assertion failed: result type must be destructible
 1065 |       static_assert(is_destructible<_Res>{},
      |                     ^~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/11.1.0/future:1065:21: note: ‘std::is_destructible<valhalla::skadi::tile_data>()’ evaluates to false
[106/180] Building CXX object src/thor/CMakeFiles/valhalla-thor.dir/astar_bss.cc.o


# version
g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --with-isl --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-install-libiberty --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-libunwind-exceptions --disable-werror gdc_include_dir=/usr/include/dlang/gdc
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.1.0 (GCC) 

Clang

/home/anders/Documents/Programming/mapbox/valhalla/src/skadi/sample.cc:173:63: note: in instantiation of template class 'std::unordered
_map<unsigned short, std::shared_future<valhalla::skadi::tile_data>>' requested here                                                   
  std::unordered_map<uint16_t, std::shared_future<tile_data>> pending_tiles;                                                           
                                                              ^                                                                        In file included from /home/anders/Documents/Programming/mapbox/valhalla/src/skadi/sample.cc:6:                                        /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/11.1.0/../../../../include/c++/11.1.0/future:769:21: error: cannot initialize object parameter of type 'const std::integral_constant<bool, false>' with an expression of type 'is_destructible<valhalla::skadi::tile_data>'               static_assert(is_destructible<_Res>{},                                                                                                               ^~~~~~~~~~~~~~~~~~~~~~~                                                                                            
/home/anders/Documents/Programming/mapbox/valhalla/src/skadi/sample.cc:329:23: note: in instantiation of template class 'std::future<va
lhalla::skadi::tile_data>' requested here                                                                                              
    auto future = it->second;                                                                                                          
                      ^                                                                                                                
In file included from /home/anders/Documents/Programming/mapbox/valhalla/src/skadi/sample.cc:6:
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/11.1.0/../../../../include/c++/11.1.0/future:1065:21: error: cannot initialize object paramet
er of type 'const std::integral_constant<bool, false>' with an expression of type 'is_destructible<valhalla::skadi::tile_data>'
      static_assert(is_destructible<_Res>{},
                    ^~~~~~~~~~~~~~~~~~~~~~~
/home/anders/Documents/Programming/mapbox/valhalla/src/skadi/sample.cc:342:27: note: in instantiation of template class 'std::promise<v
alhalla::skadi::tile_data>' requested here
  std::promise<tile_data> promise;
                          ^
3 errors generated.

# Version

clang++ --version
clang version 12.0.1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on STL version on your machine. It builds fine with libc++ 9, but won't build on libc++ 11.1 :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, seems like I didn’t build master since the merge.. can smth be done about that? If I can’t build Valhalla anymore on arch linux, I’d have a semi-serious problem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh another arch user! I have a semi-serious problem too :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets make an issue out of it and who ever gets to it first can take a stab at it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gcc with libstdc++ 11.1.0 throws this error if tile_data forward declared before cache_t
https://godbolt.org/z/8sYhf8Wqb
while clang 12.0.1 with libc++ works fine. -stdlib=libc++
https://godbolt.org/z/vPYTdb3aW

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants