
Batch writes to edges file #2851

Merged: 18 commits into valhalla:master on Feb 15, 2021

Conversation

kevinventullo (Contributor)

Issue

Instead of writing to the edge file inline, keep track of the writes in an in-memory
vector, sort them by edge id, and apply them all at the end. Sorting them
means the writes to the edges file are now sequential, which vastly speeds them up.

In our internal system on the whole world, this reduced the processing time of this stage from 10 hours to about 7 minutes.
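
For illustration, a minimal sketch of the batching idea (hypothetical names and record layout, not the actual graphbuilder.cc code): buffer the updates, sort them by edge id, then apply them in one forward pass over the file.

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical update record: which edge record to patch and the value to write.
struct EdgeUpdate {
  uint32_t edge_id;
  uint32_t node_index;
};

// Instead of seeking to a random offset for every update, collect them all,
// sort by edge id, and write them back in ascending file order.
void apply_updates(std::FILE* edge_file, std::vector<EdgeUpdate>& updates,
                   size_t record_size, size_t field_offset) {
  std::sort(updates.begin(), updates.end(),
            [](const EdgeUpdate& a, const EdgeUpdate& b) { return a.edge_id < b.edge_id; });
  for (const auto& u : updates) {
    std::fseek(edge_file, static_cast<long>(u.edge_id * record_size + field_offset), SEEK_SET);
    std::fwrite(&u.node_index, sizeof(u.node_index), 1, edge_file);
  }
}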

Tasklist

  • Add tests
  • Add #fixes with the issue number that this PR addresses
  • Update the docs with any new request parameters or changes to behavior described
  • Update the changelog

Requirements / Relations

Link any requirements here. Other pull requests this PR is based on?

@kevinventullo changed the title from "Update graphbuilder.cc" to "Batch writes to edges file" on Feb 10, 2021
@kevinventullo (Contributor, Author)

cc @kevinkreiser

@kevinkreiser self-requested a review on February 10, 2021 at 01:31
uint32_t run_index = 0;
uint32_t node_index = 0;
size_t node_count = 0;
Node last_node{};
std::map<GraphId, size_t> tiles;
nodes.transform([&edges, &run_index, &node_index, &node_count, &last_node, &tiles](Node& node) {
// If these get too large, consider using a sequence instead of a vector here.
std::vector<std::pair<uint32_t, uint32_t>> starts, ends;
kevinkreiser (Member)

At 8 bytes per record and about 400,000,000 graph edges (double that because of start/end) we get something like 3.2GB (and maybe double that again if we are unlucky with vector allocation). This is definitely close to needing a sequence. Would you be opposed to something as simple as a struct (right here, above the function) to encapsulate the two indices, and then throwing them into a sequence?

kevinventullo (Contributor, Author)

Yep, I can do that!

kevinventullo (Contributor, Author)

I think it might be easier to have two separate sequences; otherwise you need an extra bit of data for each entry indicating whether it's a start or an end. If instead the sequence was a mapping [edge -> (start,end)], you run into the same random insert issue as you iterate through the nodes.

kevinkreiser (Member)

Ah yep, sorry, I was being imprecise with my wording above. I meant 2 sequences in total, 1 for each mapping.
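
To make the thread concrete, here is roughly what that could look like (a sketch only; the struct name is made up, and how midgard::sequence is constructed and sorted is an assumption rather than a confirmed API):

#include <cstdint>

// One fixed-size POD record per edge endpoint, so it can live in a disk-backed
// sequence instead of an in-memory std::vector.
struct EdgeEndpoint {
  uint32_t edge_index; // which record in the edges file to update
  uint32_t run_index;  // node index to write into that record
};

// Two sequences in total, one per mapping:
//   starts: edge_index -> run_index of the edge's first node
//   ends:   edge_index -> run_index of the edge's last node
// e.g. (assumed interface) midgard::sequence<EdgeEndpoint> starts(start_file, true);
// During the node pass each endpoint is appended; afterwards both sequences are
// sorted by edge_index so the edges file can be patched in a single forward pass.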

}

// next node
last_node = node;
++node_index;
});

LOG_INFO("Nodes processed. Now adding data to edges.");
kevinkreiser (Member)

The rest of this looks perfectly fine. It's unfortunate that looping over both sets at the same time ends up being so verbose, but there isn't a clearer way to do it, so it looks good to me!

@mandeepsandhu (Contributor)

Curious as to what's the reason for the big speedup in writes? Typically the writes would go to the page cache first (assuming write-back), which is quite fast, and then later get flushed to disk. This is of course on a typical Linux server/desktop setup, which would be configured as such. Was the system where you observed the speedup configured differently?

auto edge = *element;
edge.sourcenode_ = run_index;
element = edge;
starts.emplace_back(node.start_of, run_index);
}
// if this node marks the end of an edge, go tell the edge where the first node in the
// series is
mandeepsandhu (Contributor)

Nit: perhaps update this comment to match the one above.

@kevinventullo (Contributor, Author)

> Curious as to what's the reason for the big speedup in writes? Typically the writes would go to the page cache first (assuming write-back), which is quite fast, and then later get flushed to disk. This is of course on a typical Linux server/desktop setup, which would be configured as such. Was the system where you observed the speedup configured differently?

I don't think it was differently configured. My guess is that while the system does use a write-back cache, all the jumping around means that roughly every write is going to a new page, so you end up needing to flush much more often as the in-memory write cache fills up.
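
A back-of-envelope illustration of that effect (the numbers are placeholders; the edge record size in particular is made up):

#include <cstdio>

int main() {
  constexpr long long kEdges = 400'000'000LL; // rough planet edge count mentioned above
  constexpr long long kEdgeRecordBytes = 32;  // hypothetical on-disk edge record size
  constexpr long long kPageBytes = 4096;      // typical page size

  // Sorted, sequential updates: consecutive records share pages, so each page
  // of the edges file is dirtied and written back roughly once.
  long long sequential_page_writes = kEdges * kEdgeRecordBytes / kPageBytes; // ~3 million

  // Random-order updates: by the time we come back to a region its page has
  // likely already been flushed, so in the worst case there is about one page
  // write-back per update.
  long long random_page_writes = kEdges; // up to 400 million

  std::printf("sequential: ~%lld page write-backs\n", sequential_page_writes);
  std::printf("random:     up to %lld page write-backs\n", random_page_writes);
  return 0;
}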

@kevinkreiser (Member)

@kevinventullo I'll get this review through this evening, thanks again!

@kevinventullo (Contributor, Author)

> @kevinventullo I'll get this review through this evening, thanks again!

Thank you for your patience! I'm curious to see how this performs on your hardware.

@kevinkreiser (Member)

I'm going to reduce the blast radius of this a bit. I really wish that from the beginning we had passed a base directory for all the temp stuff we create (like many other applications do); then we could just choose what all the random .bin files would be named (I mean we do now, but it's at the very beginning of the process). I'm sure it was just something that changed over time and never got refactored.

@kevinkreiser (Member)

I ran again with Germany to see what the difference would be:

master:

2021/02/13 04:01:32.526292 [INFO] Creating graph edges from ways...
2021/02/13 04:01:35.279684 [INFO] Finished with 22840796 graph edges
2021/02/13 04:01:35.385875 [INFO] Sorting graph...
2021/02/13 04:01:41.573242 [INFO] Finished with 17861214 graph nodes

this branch:

2021/02/13 03:40:09.233344 [INFO] Creating graph edges from ways...
2021/02/13 03:40:11.829682 [INFO] Finished with 22840796 graph edges
2021/02/13 03:40:11.928681 [INFO] Sorting graph...
2021/02/13 03:40:19.518808 [INFO] Nodes processed. Now adding data to edges.
2021/02/13 03:40:24.296359 [INFO] Sorting starts and ends done. Populating edges...
2021/02/13 03:40:24.751309 [INFO] Finished with 17861214 graph nodes

This branch is quite a bit slower. That difference may evaporate when we are talking about the planet, which I haven't tried. I'll have to give that another go once I have one downloaded.

@kevinventullo (Contributor, Author)

> I'm going to reduce the blast radius of this a bit. I really wish that from the beginning we had passed a base directory for all the temp stuff we create (like many other applications do); then we could just choose what all the random .bin files would be named (I mean we do now, but it's at the very beginning of the process). I'm sure it was just something that changed over time and never got refactored.

Oh, perfect! Yeah, it felt a little weird piping the tmp file through, but I didn't feel emboldened enough to do a whole refactor 😃

@kevinventullo (Contributor, Author)

> I ran again with Germany to see what the difference would be [...] This branch is quite a bit slower. That difference may evaporate when we are talking about the planet, which I haven't tried. I'll have to give that another go once I have one downloaded.

Ah I see. Yeah, the reduction I saw went from 10 hours on the whole planet to 7 minutes. It definitely sounds like our problems would be fixed by adding more memory.

@kevinkreiser (Member)

@kevinventullo For reference, I'm on a machine with 32 GB of RAM, but to be honest, if we can get similar performance on a machine with a couple gigs of RAM, then it's worth it. It would be great to one day finish making everything distributed and allow a swarm of low-RAM VMs to work on the whole tileset in parallel. The main hurdle there is that that swarm will often need access to adjacent tiles, but something like NFS (while usually too slow for these needs) might be just fine here, where you only need to look up a couple of adjacent tiles rather than access the whole dataset... anyway, just thinking ahead 😄

Anyway, I ran the planet last night and should have the results of the change here shortly. Even if it's a couple minutes longer, I think it's still an OK change to make to enable lower-RAM situations. It is a bit concerning that code coverage doesn't hit all the branches inside the loop; I need to look closely at that to see why that would be the case despite all the different datasets we load in CI.

@kevinkreiser (Member)

Good news. Even for me it's slightly faster on the planet:
master:

2021/02/13 18:38:16.460137 [INFO] Sorting graph...
2021/02/13 18:46:39.194044 [INFO] Finished with 284443165 graph nodes

this branch:

2021/02/14 01:03:51.243432 [INFO] Sorting graph...
2021/02/14 01:07:53.464111 [INFO] Nodes processed. Now adding data to edges.
2021/02/14 01:09:31.314837 [INFO] Sorting starts and ends done. Populating edges...

2021/02/14 01:09:43.687472 [INFO] Finished with 284443165 graph nodes

@kevinkreiser (Member)

@kevinventullo I couldn't ignore that coverage check, and after thinking about it a bit I couldn't see how it was possible that we'd have any mismatches in the begin/end node lists. We should have exactly one begin and one end node for each edge and the same set of edges in each list, so I've added a couple of asserts and simplified the looping a bit. Let me know what you think: 4d2f110
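
For reference, the lockstep pass could look roughly like this (a sketch with hypothetical names and an in-memory stand-in for the sequences, not the actual 4d2f110 change), assuming both lists are sorted by edge index and contain exactly one entry per edge:

#include <cassert>
#include <cstdint>
#include <vector>

struct EdgeEndpoint {
  uint32_t edge_index; // which edge record to patch
  uint32_t run_index;  // node index to store in that record
};

struct Edge {
  uint32_t sourcenode_ = 0;
  uint32_t targetnode_ = 0;
};

// Walk the two edge-sorted lists in lockstep and patch each edge exactly once.
void patch_edges(std::vector<Edge>& edges,
                 const std::vector<EdgeEndpoint>& starts,
                 const std::vector<EdgeEndpoint>& ends) {
  // Every edge has exactly one begin and one end node, so both lists must
  // cover the same set of edges in the same sorted order.
  assert(starts.size() == ends.size());
  for (size_t i = 0; i < starts.size(); ++i) {
    assert(starts[i].edge_index == ends[i].edge_index);
    Edge& edge = edges[starts[i].edge_index];
    edge.sourcenode_ = starts[i].run_index;
    edge.targetnode_ = ends[i].run_index;
  }
}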

@kevinkreiser (Member)

@nilsnolde I've again disabled the mingw workflow as it's failing to get one of the dependencies again ☹️

@kevinventullo (Contributor, Author)

> @kevinventullo I couldn't ignore that coverage check, and after thinking about it a bit I couldn't see how it was possible that we'd have any mismatches in the begin/end node lists. We should have exactly one begin and one end node for each edge and the same set of edges in each list, so I've added a couple of asserts and simplified the looping a bit. Let me know what you think: 4d2f110

Oh, I was wondering about this! Yes, that looks so much cleaner. Thank you for all these awesome improvements to the PR!

@kevinkreiser merged commit b34a9d9 into valhalla:master on Feb 15, 2021