Fix Arm Builds and Add CI Workflow for Arm #4213

kevinkreiser · 2023-07-21T02:35:25Z

CircleCI has support for arm. the docs say it's Ubuntu 20.04, which is not what we target on x86, but at the very least i plan to get the code actually building on arm. locally i've built it on arm just fine, but it segfaults during valhalla_build_tiles. we'll see what the matter is and whether we can patch it up.

Asher-JH · 2023-08-09T05:47:44Z

How's it going? Any luck so far? 🤔

nilsnolde · 2023-08-09T08:25:25Z

This will not solve your problem; it's "just" about testing on arm64. Once GHA adds an M1/M2 runner for public use, we can think about distributing Docker images for M1. The Docker build already takes 30 minutes, which is a bit annoying on master merges. Cross compiling would easily double that.

GH still has it on their roadmap for the end of 2023.

kevinkreiser · 2023-10-11T01:44:32Z

this does seem to reproduce many of the issues we have on M1s, though. i got it building on arm linux, but a bunch of tests fail. most of them are just cases where the expected shape of a route doesn't quite match (small differences in floating point numbers), but a few are more substantial. the isochrone test fails in a way that makes the outputs look quite different, and the gtfs tests do some boost polygon crap that doesn't seem to work at all (why do we even use that? we already know it's not aware of spherical geometry; imho we should remove it).

kevinkreiser · 2023-10-11T14:27:21Z

so i decided to focus most of my time on the isochrone issue, since it has the most egregious output. i built the run-isochrone test target on both aarch64 and amd64, took the data created on the aarch64 platform, copied it over to the amd64 platform, and ran test/isochrone manually. here's the output:

kkreiser@L50X1P73:~/sandbox/valhalla/build$ test/isochrone 
[==========] Running 5 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 5 tests from Isochrones
[ RUN      ] Isochrones.Basic
/home/kkreiser/sandbox/valhalla/test/isochrone.cc:96: Failure
Expected equality of these values:
  actual_geom.Size()
    Which is: 271
  expected_geom.Size()
    Which is: 272
/home/kkreiser/sandbox/valhalla/test/isochrone.cc:96: Failure
Expected equality of these values:
  actual_geom.Size()
    Which is: 271
  expected_geom.Size()
    Which is: 272
[  FAILED  ] Isochrones.Basic (89 ms)
[ RUN      ] Isochrones.OriginEdge
[          ] generating map PBF at test/data/isochrones/origin_edge/map.pbf
[          ] building tiles in test/data/isochrones/origin_edge
[          ] isochrone with mjolnir.tile_dir = test/data/isochrones/origin_edge with locations b with costing pedestrian
[          ] Valhalla request is: {"locations":[{"lon":0.03593248178978409,"lat":0.0,"type":"break"}],"costing":"pedestrian","costing_options":{"pedestrian":{"speed_types":["freeflow","constrained","predicted"]}},"verbose":true,"contours":[{"time":"10"}],"shape_match":"map_snap"}
[       OK ] Isochrones.OriginEdge (394 ms)
[ RUN      ] Isochrones.LongEdge
[          ] generating map PBF at test/data/isochrones/long_edge/map.pbf
[          ] building tiles in test/data/isochrones/long_edge
[          ] isochrone with mjolnir.tile_dir = test/data/isochrones/long_edge with locations a with costing pedestrian
[          ] Valhalla request is: {"locations":[{"lon":0.0,"lat":-0.0017966240891925996,"type":"break"}],"costing":"pedestrian","costing_options":{"pedestrian":{"speed_types":["freeflow","constrained","predicted"]}},"verbose":true,"contours":[{"time":"15"}],"shape_match":"map_snap"}
[       OK ] Isochrones.LongEdge (393 ms)
[ RUN      ] Isochrones.test_clear_reserved_memory
[       OK ] Isochrones.test_clear_reserved_memory (0 ms)
[ RUN      ] Isochrones.test_max_reserved_labels_count
[       OK ] Isochrones.test_max_reserved_labels_count (0 ms)
[----------] 5 tests from Isochrones (877 ms total)

[----------] Global test environment tear-down
[==========] 5 tests from 1 test suite ran. (877 ms total)
[  PASSED  ] 4 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Isochrones.Basic

I can check the actual output to see what it looks like, but based on the log above, the data the arm machine created works as expected when run through the algorithm on the amd machine (the only failure is an off-by-one vertex count, 271 vs 272). This means we can trust the whole data creation pipeline and focus on where the algorithm might be going wrong on the arm platform. my bet is all the iterator/unordered map shenanigans i did to turn the output of the conrec algorithm into actual polygons/lines.

in fact, i recently refactored all of the admins code to use GEOS' C API, and in doing so i rewrote the code that pieces segments together into rings. i was very happy with how much simpler that algorithm became (though GEOS has its own version of line merging), and it made me wonder why i didn't do it this way in the isochrone generation code. i might take a swag at that first as a solution.
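For illustration, the kind of segment-to-ring stitching described here can be sketched in a few lines of C++. This is a hypothetical standalone example, not the GEOS-based admins code and not GEOS's line merger: it assumes every segment is a two-point piece oriented so that one segment's end is the next one's start, and that no two segments share a start point.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical point type; std::pair compares both coordinates exactly.
using Point = std::pair<double, double>;

// Assumed orientation: a segment's b is the following segment's a.
struct Segment {
  Point a, b;
};

// Chain oriented segments into closed rings by repeatedly following start -> end.
// Chains that never return to their starting point are dropped in this sketch.
std::vector<std::vector<Point>> stitch_rings(const std::vector<Segment>& segments) {
  std::map<Point, Point> next;  // start point -> end point
  for (const auto& s : segments) {
    next[s.a] = s.b;
  }
  std::vector<std::vector<Point>> rings;
  while (!next.empty()) {
    const Point start = next.begin()->first;
    std::vector<Point> ring{start};
    Point current = start;
    bool closed = false;
    while (true) {
      auto it = next.find(current);
      if (it == next.end()) {
        break;  // dangling chain: no segment starts where this one ended
      }
      current = it->second;
      next.erase(it);  // each segment is consumed exactly once
      ring.push_back(current);
      if (current == start) {
        closed = true;
        break;
      }
    }
    if (closed) {
      rings.push_back(std::move(ring));
    }
  }
  return rings;
}
```

Because the map keys are compared exactly, this sketch only connects segments whose endpoints match bit for bit.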

just for completeness, here is the output of the isochrone test when run entirely on the arm machine:

[FAIL] isochrone
[==========] Running 5 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 5 tests from Isochrones
[ RUN      ] Isochrones.Basic
/home/ubuntu/valhalla/test/isochrone.cc:96: Failure
Expected equality of these values:
  actual_geom.Size()
    Which is: 4
  expected_geom.Size()
    Which is: 356
/home/ubuntu/valhalla/test/isochrone.cc:96: Failure
Expected equality of these values:
  actual_geom.Size()
    Which is: 5
  expected_geom.Size()
    Which is: 272
/home/ubuntu/valhalla/test/isochrone.cc:96: Failure
Expected equality of these values:
  actual_geom.Size()
    Which is: 4
  expected_geom.Size()
    Which is: 272
[  FAILED  ] Isochrones.Basic (269 ms)
[ RUN      ] Isochrones.OriginEdge
[          ] generating map PBF at test/data/isochrones/origin_edge/map.pbf
[          ] building tiles in test/data/isochrones/origin_edge
[          ] isochrone with mjolnir.tile_dir = test/data/isochrones/origin_edge with locations b with costing pedestrian
[          ] Valhalla request is: {"locations":[{"lon":0.03593248178978409,"lat":0.0,"type":"break"}],"costing":"pedestrian","costing_options":{"pedestrian":{"speed_types":["freeflow","constrained","predicted"]}},"verbose":true,"contours":[{"time":"10"}],"shape_match":"map_snap"}
/home/ubuntu/valhalla/test/isochrone.cc:188: Failure
Expected equality of these values:
  within(WaypointToBoostPoint("b"), polygon)
    Which is: false
  true
[  FAILED  ] Isochrones.OriginEdge (1330 ms)
[ RUN      ] Isochrones.LongEdge
[          ] generating map PBF at test/data/isochrones/long_edge/map.pbf
[          ] building tiles in test/data/isochrones/long_edge
[          ] isochrone with mjolnir.tile_dir = test/data/isochrones/long_edge with locations a with costing pedestrian
[          ] Valhalla request is: {"locations":[{"lon":0.0,"lat":-0.0017966240891925996,"type":"break"}],"costing":"pedestrian","costing_options":{"pedestrian":{"speed_types":["freeflow","constrained","predicted"]}},"verbose":true,"contours":[{"time":"15"}],"shape_match":"map_snap"}
[       OK ] Isochrones.LongEdge (1294 ms)
[ RUN      ] Isochrones.test_clear_reserved_memory
[       OK ] Isochrones.test_clear_reserved_memory (0 ms)
[ RUN      ] Isochrones.test_max_reserved_labels_count
[       OK ] Isochrones.test_max_reserved_labels_count (0 ms)
[----------] 5 tests from Isochrones (2894 ms total)

[----------] Global test environment tear-down
[==========] 5 tests from 1 test suite ran. (2894 ms total)
[  PASSED  ] 3 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] Isochrones.Basic
[  FAILED  ] Isochrones.OriginEdge

 2 FAILED TESTS
make[3]: *** [test/CMakeFiles/run-isochrone.dir/build.make:73: test/isochrone.log] Error 1
make[3]: *** Deleting file 'test/isochrone.log'
make[2]: *** [CMakeFiles/Makefile2:8718: test/CMakeFiles/run-isochrone.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:8725: test/CMakeFiles/run-isochrone.dir/rule] Error 2
make: *** [Makefile:3699: run-isochrone] Error 2

you'll see that the rings it creates have very few vertices (4 or 5) compared to the expected answers (272 or 356).

kevinkreiser · 2023-10-11T16:47:48Z

alright, after reviewing the isochrone geometry generation code i added some informational printing to it. the good news is that the algorithm is deterministic in its iteration and doesn't rely on unordered data structures for that. it scans by contour line value and by grid cell, so the output of the algorithm can be compared directly across platforms.

so the first thing i did was make it output every line segment it generated from a particular grid cell, and then i diffed the segments generated on both platforms. as we've seen in other tests, they are only off by tiny amounts of floating point wiggle (0.000001):

yet the output of the arm run above suggests that the algorithm is unable to connect the segments to each other. let me see if i can prove that. what i will do is annotate each segment with what happens to it. there are a number of possible outcomes:
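As a rough illustration of why a fixed scan order makes the output diffable, here is a simplified marching-squares-style cell scan. It is an assumption-laden stand-in, not CONREC (which splits each cell into triangles around a center point) and not Valhalla's implementation: grid values sit at integer (x, y) positions, each cell edge that straddles the contour value gets a linearly interpolated crossing, and saddle cells are skipped.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Point = std::pair<double, double>;
struct Segment {
  Point a, b;
  double contour;
};

// Where value c crosses the edge from p (value vp) to q (value vq), by linear interpolation.
Point interpolate(Point p, double vp, Point q, double vq, double c) {
  assert(vq != vp);  // only called on edges that straddle c
  const double t = (c - vp) / (vq - vp);
  return {p.first + t * (q.first - p.first), p.second + t * (q.second - p.second)};
}

// Scan contour values in order, then cells row by row and column by column, so the
// emitted sequence is deterministic. grid[row][col] is the value at (x = col, y = row).
std::vector<Segment> scan(const std::vector<std::vector<double>>& grid,
                          const std::vector<double>& contours) {
  std::vector<Segment> out;
  for (double c : contours) {
    for (std::size_t r = 0; r + 1 < grid.size(); ++r) {
      for (std::size_t k = 0; k + 1 < grid[r].size(); ++k) {
        // Cell corners in counter-clockwise order, with their values.
        const std::array<Point, 4> p{{{double(k), double(r)}, {double(k + 1), double(r)},
                                      {double(k + 1), double(r + 1)}, {double(k), double(r + 1)}}};
        const std::array<double, 4> v{grid[r][k], grid[r][k + 1], grid[r + 1][k + 1], grid[r + 1][k]};
        std::vector<Point> hits;
        for (int e = 0; e < 4; ++e) {
          const int f = (e + 1) % 4;
          if ((v[e] - c) * (v[f] - c) < 0) {  // edge strictly straddles the contour
            hits.push_back(interpolate(p[e], v[e], p[f], v[f], c));
          }
        }
        if (hits.size() == 2) {  // saddle cells (4 crossings) skipped for brevity
          out.push_back({hits[0], hits[1], c});
        }
      }
    }
  }
  return out;
}
```

Given the same grid, the loops visit contours and cells in the same order on every platform, so printing each segment as it is emitted yields logs that line up one to one in a plain diff.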

the segment completes a ring
the segment bridges two other segments
the segment is appended to another segment
the segment is prepended to another segment
the segment is an orphan (at least temporarily)
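The bookkeeping behind those five outcomes can be sketched with two maps from a chain's exact front and back points to the chain. The `Stitcher` type and its names are hypothetical, not Valhalla's isochrone code; the sketch assumes segments are consistently oriented and only counts completed rings rather than emitting them.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <utility>

using Point = std::pair<double, double>;
using Chain = std::deque<Point>;

enum class Outcome { CompletesRing, Bridges, Appended, Prepended, Orphan };

// Open chains indexed by their exact front and back points. std::map compares the
// coordinates exactly, so a lookup only succeeds on a bit-for-bit endpoint match.
struct Stitcher {
  std::map<int, Chain> chains;
  std::map<Point, int> by_front, by_back;
  int next_id = 0;
  int rings = 0;

  void index(int id) { by_front[chains[id].front()] = id; by_back[chains[id].back()] = id; }
  void unindex(int id) { by_front.erase(chains[id].front()); by_back.erase(chains[id].back()); }

  Outcome add(Point a, Point b) {
    auto tail = by_back.find(a);   // a chain that ends where this segment starts
    auto head = by_front.find(b);  // a chain that starts where this segment ends
    if (tail != by_back.end() && head != by_front.end()) {
      const int t = tail->second, h = head->second;
      unindex(t);
      if (t == h) {  // both ends belong to the same chain: it closes into a ring
        chains.erase(t);
        ++rings;
        return Outcome::CompletesRing;
      }
      unindex(h);  // two different chains: join t + h through this segment
      chains[t].insert(chains[t].end(), chains[h].begin(), chains[h].end());
      chains.erase(h);
      index(t);
      return Outcome::Bridges;
    }
    if (tail != by_back.end()) {
      const int t = tail->second;
      unindex(t);
      chains[t].push_back(b);
      index(t);
      return Outcome::Appended;
    }
    if (head != by_front.end()) {
      const int h = head->second;
      unindex(h);
      chains[h].push_front(a);
      index(h);
      return Outcome::Prepended;
    }
    const int id = next_id++;  // nothing connects (yet): start a new chain
    chains[id] = Chain{a, b};
    index(id);
    return Outcome::Orphan;
  }
};
```

Both lookups are exact, so if either endpoint carries a little floating point noise, the segment falls through to the orphan case even when a neighbor is geometrically right there.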

my hypothesis is that on arm many more segments stay orphans because looking up connectable segments in the map somehow doesn't work (it uses direct pointll equality). the logs give pretty convincing numbers in support of the hypothesis:

we can look at the logs in detail and reason about why a segment becomes an orphan when it shouldn't... and there you see it right off the bat:

the second segment on the arm platform should be prepended to the first one, as it is on the amd platform, but it gets orphaned instead. i can only imagine this is because of floating point noise on the endpoint, which is very unfortunate...
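The failure mode in this hypothesis is easy to reproduce in isolation: with exact key comparison, a single unit in the last place (ulp) of difference makes a lookup miss, while a tolerance-based check still finds the neighbor. Both helpers below are hypothetical stand-ins; the linear scan in `near_match` is for clarity only, since a real index would need something like a grid or spatial hash to stay fast.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <map>
#include <utility>

using Point = std::pair<double, double>;  // hypothetical stand-in for a lon/lat point

// Move a coordinate up by n ulps, simulating tiny platform-dependent wiggle.
double nudge(double v, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    v = std::nextafter(v, std::numeric_limits<double>::infinity());
  }
  return v;
}

// Exact lookup: std::map compares the coordinates exactly, like keying open
// segments directly by their end points.
bool exact_match(const std::map<Point, int>& ends, const Point& p) {
  return ends.count(p) > 0;
}

// Tolerance lookup: any stored end point within eps on both axes counts as a match.
bool near_match(const std::map<Point, int>& ends, const Point& p, double eps) {
  for (const auto& entry : ends) {
    const Point& q = entry.first;
    if (std::abs(q.first - p.first) <= eps && std::abs(q.second - p.second) <= eps) {
      return true;
    }
  }
  return false;
}
```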

kevinkreiser · 2023-10-11T18:45:42Z

ok yep, if i round the points on the segments to 7 decimal places the test still fails, but it's only off by one coordinate, which is acceptable.

so a quick rundown of what's left to do:

unbork CI so that all the builds run again, including the arm one
arm builds need to disable warnings because the compiler emits many more warnings
add test workarounds for all the tests where the shape is ever so slightly different (or, if that's too annoying, disable those tests on arm)
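One way to apply that rounding so it actually affects the lookup is to key the endpoint map on fixed-point integers instead of doubles. This is a sketch under that assumption; `quantize` is a hypothetical helper, and the 1e7 scale simply corresponds to the 7 decimal places mentioned above.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical fixed-point key: each coordinate scaled by 1e7 and rounded to the
// nearest integer, i.e. rounded to 7 decimal places but compared as integers.
using Key = std::pair<long long, long long>;

Key quantize(double lon, double lat) {
  assert(std::isfinite(lon) && std::isfinite(lat));
  return {std::llround(lon * 1e7), std::llround(lat * 1e7)};
}
```

An endpoint map such as `std::map<Key, int>` then treats points that differ only below the seventh decimal as the same endpoint. Rounding only absorbs noise that stays inside one bucket, though: two values a hair apart on either side of a rounding boundary still quantize to different keys, so a few mismatches can remain.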
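For the tests whose shapes differ only by floating point noise, one workaround is to compare vertices within a tolerance instead of exactly. `shapes_near` below is a hypothetical helper, not an existing Valhalla test utility, and the tolerance is whatever the test author deems acceptable (for example 1e-6 degrees, in line with the wiggle observed above).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Point = std::pair<double, double>;  // hypothetical (lon, lat) pair

// Compare two shapes vertex by vertex with an absolute tolerance instead of
// exact equality, so tiny cross-platform differences do not fail a test.
bool shapes_near(const std::vector<Point>& actual, const std::vector<Point>& expected, double eps) {
  assert(eps >= 0.0);
  if (actual.size() != expected.size()) {
    return false;
  }
  for (std::size_t i = 0; i < actual.size(); ++i) {
    if (std::abs(actual[i].first - expected[i].first) > eps ||
        std::abs(actual[i].second - expected[i].second) > eps) {
      return false;
    }
  }
  return true;
}
```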


kevinkreiser · 2023-10-19T14:41:17Z

looks like the Docker stuff should work; it's building fine in the GitHub workflow, but it seems like it will be a while before it completes: https://github.com/valhalla/valhalla/actions/runs/6574607552/job/17859996388

i'm going to merge this in the meantime

kevinkreiser added 13 commits July 20, 2023 22:26

see if arm has a jammy image available

da9d83e

mason wont have clangformat for arm probably

8709d32

forget format for real th is time

921341a

use newer vanilla ubuntu for arm, older versions of luajit are broken

9c8f079

circle syntax

3a77b61

syntax again

30cf8f6

try to share more and always use venv

ecda4ed

not sure where to get pip thought it came with venv

5d26478

really every time?

39ef8b5

git too

8f4617c

stuff

adc0942

no format for now

2a656d1

need sleep

62ed8b9

ImreSamu mentioned this pull request Aug 9, 2023

Segmentation Fault when running the image on M1 Mac gis-ops/docker-valhalla#102

Closed

nilsnolde mentioned this pull request Aug 17, 2023

mac os arm build isochrones incorrect #4108

Closed

have to disable warnings to get it to build on arm

dc1670d

Merge branch 'master' into kk_arm

1df558f

kevinkreiser added 7 commits October 12, 2023 09:00

Merge remote-tracking branch 'origin/master' into kk_arm

6f2ceee

try with sudo

b400188

Merge remote-tracking branch 'origin/kk_arm' into kk_arm

6048fe8

kick

8fabb74

no sudo on arm?

d4fb673

circle syntax

c0f094c

skip format check for c++ on arm

298a494

kevinkreiser added 11 commits October 18, 2023 20:07

Merge remote-tracking branch 'origin/kk_arm' into kk_arm

dcd992b

circle lint

9972b5f

more ci lint

dfe5dda

Merge remote-tracking branch 'origin/master' into kk_arm

5fa05a5

maybe this will work

48be349

check if we need to install stuff

83e4a38

use virtual env if in one..

32b5ea1

no file command

e448c9e

on the release build we have to disable benchmark werror because of the newer compiler

778a52d

compiler too new with bugs!

cccaf54

typo

3c269b1

nilsnolde previously approved these changes Oct 19, 2023


trying out multiplatform builds for docker

c3236ea

kevinkreiser dismissed nilsnolde’s stale review via c3236ea October 19, 2023 11:34

kevinkreiser added 6 commits October 19, 2023 07:35

typo

8d40470

try with sudo

0ab4f33

typo

4f14f5e

tag not working?

721b952

oops

152fc8d

Merge branch 'master' into kk_arm

86f9598

nilsnolde previously approved these changes Oct 19, 2023


kevinkreiser added 2 commits October 19, 2023 10:33

update automatic workflow for docker publishing

08746a6

Merge remote-tracking branch 'origin/kk_arm' into kk_arm

5416fcd

kevinkreiser dismissed nilsnolde’s stale review via 5416fcd October 19, 2023 14:33

kevinkreiser merged commit 9aa1c1c into master Oct 19, 2023
2 of 7 checks passed

kevinkreiser deleted the kk_arm branch October 19, 2023 14:41

kevinkreiser mentioned this pull request Oct 19, 2023

split arm and amd workflows #4346

Merged

Temetz mentioned this pull request Oct 20, 2023

Latest build broken because of cmake / protobuf changes? gis-ops/docker-valhalla#123

Closed

kevinkreiser changed the title ~~playing around with arm builds~~ Fix Arm Builds and Add CI Workflow for Arm Oct 26, 2023
