Fix Arm Builds and Add CI Workflow for Arm #4213
How's it going? Any luck so far? 🤔 |
This will not solve your problem, it’s "just" about testing on arm64. Once GHA adds an M1/M2 runner for public use, we can think about distributing Docker images for M1. The docker build already takes 30 mins, which is a bit annoying on master merges. Cross compiling would easily double that. GH still has it on their roadmap for end of 2023. |
this does seem to reproduce many of the issues we have on m1s though. i got it building on arm linux but a bunch of tests fail. most of them are just that the expected shape of a route doesn't quite match (small differences in floating point numbers) but there are a few that are more substantial. the isochrones test fails in a way that makes the outputs seem quite different, and the gtfs tests do some boost polygon crap that doesn't seem to work at all (why do we even use that? we already know it's not aware of spherical geometry. imho we should remove it). |
so i decided to focus most of my time on the isochrones issue since it seems to be the one with the most egregious output. what i did was build the
I can check the actual output just to see what it looks like, but based on the above log, it looks like the data that the arm machine created works as we expect when run through the algorithm on the amd machine. This means that we can trust the whole data creation pipeline and can focus on where the algorithm might be screwing up on the arm platform. my bet is all the iterator and unordered map shenanigans i did to turn the output of the conrec algorithm into actual polygons/lines. in fact, i recently refactored the whole admins stuff to use geos' c api and in doing so i rewrote the code that pieces segments together into rings. i was very happy with the simplification of this algorithm (though geos has its own version of line merging) and i thought, why wasn't it done this way in the isochrone generation code? i might take a swag at that first as a solution. just for completeness here is the output of the isochrone test when run completely within the arm machine:
you'll see that the rings it creates are very small in terms of number of vertices compared to the expected answers |
alright, after reviewing the isochrones geometry generation code i added some informational printing to it. the good news about the algorithm is that it's deterministic in its iteration and doesn't rely on unordered data structures for that. it scans by contour line value and by grid cell, so the output of the algorithm can be directly compared across platforms. so what i first did was make it output whenever it generated a line segment from a particular grid cell, and then i diffed the line segments generated on both platforms. as we've seen in other tests, they are only off by tiny amounts of floating point wiggle (0.000001): yet the output of the above arm run suggests that the algorithm is unable to connect the segments to each other. let me see if i can prove that... what i will do is annotate each segment with what happens to it. there are a number of options for that:
my hypothesis is that on arm many more segments stay orphans because somehow looking up connectable segments in the map doesn't work (it uses direct pointll equality). looking at the logs we get pretty convincing numbers regarding the hypothesis: we can look at the logs in detail and reason about why a segment becomes an orphan when it shouldn't... and there you see it right off the bat: the second segment on the arm platform should be prepended to the first one, as it is on the amd platform, but it gets orphaned instead. i can only imagine this is because of floating point noise on the end point, which is very unfortunate... |
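The failure mode described above can be illustrated with a minimal sketch. Everything here is hypothetical: `Point`, `PointHash`, `approx_equal` and `has_connectable` are stand-ins invented for illustration, not Valhalla's actual `PointLL` or isochrone code. It only shows that a hash map keyed on exact floating point equality misses an end point that differs in the last few bits, even though a tolerance comparison would call the two points the same.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical stand-in for a lng/lat point; not Valhalla's actual PointLL.
struct Point {
  double lng;
  double lat;
  // Exact comparison: any difference in the last bits makes two points unequal.
  bool operator==(const Point& o) const { return lng == o.lng && lat == o.lat; }
};

// Hash that is consistent with the exact equality above.
struct PointHash {
  std::size_t operator()(const Point& p) const {
    std::size_t h1 = std::hash<double>{}(p.lng);
    std::size_t h2 = std::hash<double>{}(p.lat);
    return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
  }
};

// Tolerance comparison: fine for diffing output, but not usable as a hash key.
inline bool approx_equal(const Point& a, const Point& b, double eps) {
  return std::fabs(a.lng - b.lng) <= eps && std::fabs(a.lat - b.lat) <= eps;
}

// True if some segment is registered under exactly this end point.
inline bool has_connectable(const std::unordered_map<Point, int, PointHash>& by_end,
                            const Point& end) {
  return by_end.find(end) != by_end.end();
}
```

With a map containing `{-76.5, 40.25}`, a lookup for `{-76.5 + 1e-12, 40.25}` finds nothing, so in a scheme like this the second segment would be left as an orphan.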
ok yep, if i round the points on the segments to 7 decimal places the test still fails, but it's only off by one coordinate, meaning it's acceptable. so a quick rundown of what's left to do:
|
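The rounding experiment mentioned above could look roughly like the following sketch. It is an assumption about one way to do it, not the change that was actually made in this PR: `QuantizedKey`, `QuantizedKeyHash`, `quantize`, `make_key` and `EndpointIndex` are hypothetical names. The idea is to round each coordinate to 7 decimal places, store it as an integer, and key the end point lookup on those integers so that last-bit noise no longer changes the key.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical key type: a point quantized to 7 decimal places per coordinate.
using QuantizedKey = std::pair<std::int64_t, std::int64_t>;

struct QuantizedKeyHash {
  std::size_t operator()(const QuantizedKey& k) const {
    std::size_t h1 = std::hash<std::int64_t>{}(k.first);
    std::size_t h2 = std::hash<std::int64_t>{}(k.second);
    return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
  }
};

// Round a coordinate to 7 decimal places and keep the result as an integer.
inline std::int64_t quantize(double v) {
  return std::llround(v * 1e7);
}

inline QuantizedKey make_key(double lng, double lat) {
  return {quantize(lng), quantize(lat)};
}

// Maps a quantized end point to a (hypothetical) segment index.
using EndpointIndex = std::unordered_map<QuantizedKey, int, QuantizedKeyHash>;
```

One caveat with any quantization scheme: two values that are extremely close can still fall on opposite sides of a rounding boundary and end up with different keys, so this reduces the problem rather than eliminating it.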
…he newer compiler
looks like the docker stuff should work. it's building fine in the github workflow but it seems like it will be a while before it completes: https://github.com/valhalla/valhalla/actions/runs/6574607552/job/17859996388 i'm going to merge this in the meantime |
circle ci has support for arm. the docs say it's 20.04, which is not what we target on x86, but at the very least i plan to get the code actually building on arm. locally i've built it on arm just fine but it segfaults during valhalla_build_tiles. we'll see what the matter is and see if we can patch it up