-
Notifications
You must be signed in to change notification settings - Fork 661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
our cmake find modules are not robust to pkgconfig installations #4253
Conversation
…n a fresh almalinux with only vcpkg dependencies. it's easiest to keep this as a patch for valhalla packaging and not contribute it upstream. this is just for demmo
OSX is segfaulting?! That's weird.. |
I tried hard to reproduce the CI failure for OSX, no chance.. Also, how can it even cause a segfault if nothing is being run?! This is what's happening while building with
Anyone got any idea? Or could even try this out on a local x64 Mac? git clone https://github.com/valhalla/valhalla --branch nn-cmake-find-modules --single-branch
cd valhalla
cmake -B build_test
make -C build_test -j$(sysctl -n hw.logicalcpu) |
I have a feeling all those compiler warnings might've caught up with us.. |
Or might just try the M1 one.. |
oh jeez.. tried some more to rule out some clang bug:
So nothing ruled out, but honestly out of strategies.. I'll focus on M1. |
Finally I got a traceback from
seems like with pkg-config spatialite doesn't play balls with sqlite.. I upgraded XCode now to the 14.2.0, which is what I have locally on my x64 OSX. if that's it, I'll forever hate mac even more! for reference, here the commands for circleci ssh:
|
last try with xcode 15.1.0. might also be the spatialite version, which was 5.0.1 on previous xcode versions, but is now 5.1.0. ah right, robin hood has deprecations. so much for "good forever";) |
boost failed some shit too where clang must've removed some deprecated c++03 shite.. trying 1.83.0 |
ok that finally seems to have worked! I have ZERO idea why this tiny change of using pkg-config to find libspatialite would segfault all of a sudden when nothing else changed.. but well, happy to see it's working with the latest version. do note, this was only a problem with intel mac. ubuntu 22.04 works just fine with the older 5.0.1 version. |
@@ -3,15 +3,15 @@ version: 2.1 | |||
executors: | |||
macos: | |||
macos: | |||
xcode: 13.4.1 | |||
xcode: 15.1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this enables libspatialite 5.1.0 which seems to be needed on intel mac, with 5.0.1 there was a segfault, see #4253 (comment)
environment: | ||
HOMEBREW_NO_AUTO_UPDATE: 1 | ||
CXXFLAGS: -DGEOS_INLINE | ||
|
||
commands: | ||
mac_deps: | ||
steps: | ||
- run: brew install protobuf cmake ccache libtool libspatialite pkg-config luajit curl wget czmq lz4 spatialite-tools unzip | ||
- run: brew install autoconf automake protobuf cmake ccache libtool libspatialite pkg-config luajit curl wget czmq lz4 spatialite-tools unzip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wasn't installed with the newer mac image apparently
mkdir -p build | ||
cd build | ||
cmake .. | ||
- run: cmake -B build -DENABLE_SINGLE_FILES_WERROR=OFF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the new clang there's tons of more warnings treated as errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also no way to enable this again anytime soon, it's also robin hood which causes the errors/warnings
CMakeLists.txt
Outdated
@@ -134,7 +134,7 @@ find_package(Threads REQUIRED) | |||
find_package(ZLIB REQUIRED) | |||
|
|||
# try to find an installed boost or install locally with conan | |||
set(boost_VERSION "1.71") | |||
set(boost_VERSION "1.80") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should bump this to 1.83 too to be consistent with the conanfile.txt. might as well take the chance to upgrade to latest version IMO
if(NOT MSVC) | ||
set_property(TARGET CURL::CURL APPEND PROPERTY INTERFACE_LINK_LIBRARIES "${CURL_LIBRARIES}") | ||
target_link_libraries(CURL::CURL INTERFACE ${CURL_LIBRARIES}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bit easier to look at
@@ -192,7 +195,7 @@ if(NOT Protobuf_FOUND) | |||
set(CMAKE_FIND_PACKAGE_PREFER_CONFIG OFF) | |||
endif() | |||
|
|||
message(STATUS "Using pbf headers from ${PROTOBUF_INCLUDE_DIR}") | |||
message(STATUS "Using pbf headers from ${PROTOBUF_INCLUDE_DIRS}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is weird.. I compiled on intel mac and all of a sudden it wouldn't see the protobuf include dir anymore.. I dug a bit and found that the protobuf-options.cmake
was using the plural version. seems to be working on linux too, probably there both is defined
boost:without_context=True | ||
boost:without_contract=True | ||
boost:without_coroutine=True | ||
boost:without_date_time=True | ||
boost:without_exception=True | ||
boost:without_fiber=True | ||
boost:without_filesystem=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is also pretty annoying.. it wouldn't let me "build" the header-only boost without including those. IIRC filesystem is actually a compiled library. but even so, I couldn't see that conan compiles anything, so it's probably just an optional dependency of some other library
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to move to vcpkg as soon as possible after merging this PR
@@ -416,7 +416,8 @@ TEST(UtilMidgard, TestTrimPolylineWithFloatGeoPoint) { | |||
// Worst case is they may quantized at 1.69m intervals (for an epsilon change). | |||
// https://stackoverflow.com/a/28420164 | |||
// The length comparisons below do better than that, but not a lot. | |||
constexpr double MAX_FLOAT_PRECISION = 0.05; // Should be good for 5cm at this lon/lat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linux-debug was fine, but osx failed with
[ RUN ] UtilMidgard.TestTrimPolylineWithFloatGeoPoint
/Users/distiller/project/test/util_midgard.cc:451: Failure
The difference between length(clip.begin(), clip.end()) and length(line.begin(), line.end()) * 0.0001f is 0.056293416414652825, which exceeds MAX_FLOAT_PRECISION, where
length(clip.begin(), clip.end()) evaluates to 0.095041990280151367,
length(line.begin(), line.end()) * 0.0001f evaluates to 0.038748573865498542, and
MAX_FLOAT_PRECISION evaluates to 0.050000000000000003.
0.1% portion should be clipped
OH WTFFF.. 2 admin tests fail too:
Of course locally it doesn't fail.. Don't know what to do anymore for real.. FWIW, these are the cmake configs: local cmake config
CI cmake config
|
If anyone has any inclination to help out/advice/fix, I'd appreciate it! |
Some further investigation: Curiously it only fails on a valhalla/src/mjolnir/adminbuilder.cc Lines 142 to 146 in 7eb4784
The actual segfault happens on line 146 in See the map here, there's only one outer polygon with a few ways, no inners: I stepped through the code with What's going on here? GEOS segfault traceback circleci
|
all this confirms to me is that we should in fact completely remove boost geometry from the equation. we shouldnt have started using it anywhere in the project but sadly we didnt know what was going on with its development. one less dependency to wonder about would be good too. why do i say that? to me what you are saying sounds a lot like boosts container has random garbage, when default initialized, in the inners. default initialization will be zero'd out in a debug build but not in a release build. maybe you can find a way to zero initialize our use of the boost objects but honestly i wish i had continued my quest here: #3863 |
those containers are all |
if that doesn't work, I'll try another boost version.. if it's not that, it must be some shit with clang?! the only suspicious block might be this one: for (auto& poly : polys) {
multipolygon_t buffered;
// here the postponed_inners vector wasn't explicitly initialized, but it's class `poly` was default initialized
// and inners() returns also an explicitly default intialized vector
// could swap() be problematic? no right?
poly.polygon.inners().swap(poly.postponed_inners);
buffer_polygon(poly.polygon, multipolygon);
} |
no you are right, vector should be immune to default initialization problems |
src/mjolnir/adminbuilder.cc
Outdated
inner_rings.push_back(geos_helper_t::from_striped_container(inner)); | ||
// there is some weird AppleClang bug where it would iterate over non-existing elements | ||
// in the inners() vector and eventually segfault in from_striped_container() | ||
std::cout << "Size of polygon.inners(): " << polygon.inners().size() << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not kidding: this line printing to stdout
is actually making the tests pass! Compiler bug wasting many hours of my life..
After pulling my hair out on this one, I migrate to M1 after all: #4500. I expected it to be more work, but it seems to just work:) |
this is also ready finally |
I tried for way too long to get a clean build of Valhalla on a fresh (more or less) CentOS 8, with dependencies only installed by
vcpkg
. After an initial struggle with the build environment, it started to build, but couldn't link properly, it was all linking errors for libgeos and libspatialite. I was going down the rabbit hole of brushing up my CMake knowledge and eventually came to the conclusion that our CMake Find modules for those 2 libraries for some reason don't link the dependencies to spatialite (quite a lot) and geos (mostly its own c++ -> c). I don't know why, since it found the libraries when CMake was configuring. I tried a few more things, but it in the end it still couldn't properly link libz.a to spatialite and I just gave up. Though I'm quite sure I almost had it.. I really do wonder how the hell it's working so smoothly on Windows.The below solution is to use pkg-config instead of relying on the CMake Find modules. That should work for all platforms/environments with pkg-config installed. Also Windows should be fine if
vcpkg
is used as a package manager. I don't have a strong opinion about merging this, I can keep it as a patch downstream.I'd PR a bit of README soon, now that I have a vcpkg-only Valhalla build without any system package manager or manual builds (no ENABLE_SERVICES yet).