Jenkins build sometimes fails in postgres #5380
Comments
I looked through a number of build logs for patterns. Several builds had real compile errors. One seemed to be a network issue (no route to host). A few seemed to be caused by malformed dependency files, such as: .deps/fmgrtab.Po:139: *** missing separator. Stop. These errors are most likely parallelism or network issues.
The generation of dependency files could be causing this error with parallel make: if a .Po dependency file is being written while another make process is trying to read it, we could see a makefile format error. The postgres makefile has an option to not generate dependencies, which might be good for Jenkins running a clean build, but it is not clear whether that changes other compiler configuration.
A trial build with autoupdate=no looks very promising for building postgres without dependency files. We would need to make that an optional flag that Jenkins can use.
Changing the build to not enable dependencies works with release/debug builds but not with asan/tsan builds. The alternative approach is to leave dependencies alone and retry the postgres build if it hits this error.
Summary: Re-try postgres build if transient error message is found in the error log. For now, the only message we look for is due to creation of dependency files that can cause a make error: .deps/fmgrtab.Po:139: *** missing separator. Stop.

Test Plan: yb_build.sh, seeding a postgres code error that looks like the transient error. Jenkins: compile only

Reviewers: mbautin, jharveysmith
Reviewed By: jharveysmith
Subscribers: rsami, yql
Differential Revision: https://phabricator.dev.yugabyte.com/D9158
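The retry approach described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual `build_postgres.py` code: the names `TRANSIENT_ERROR_RE`, `has_transient_error`, `run_with_retries`, and the `max_attempts` default are all hypothetical, and the regular expression is only an assumption about how the "missing separator" message might be matched.

```python
import re
from typing import Callable, Tuple

# Hypothetical pattern for the one transient error named in this issue.
# \s+ tolerates either one or two spaces before "Stop.".
TRANSIENT_ERROR_RE = re.compile(r"\.Po:\d+: \*\*\* missing separator\.\s+Stop\.")


def has_transient_error(stderr_text: str) -> bool:
    """Return True if any line of stderr looks like the known transient error."""
    return any(TRANSIENT_ERROR_RE.search(line) for line in stderr_text.splitlines())


def run_with_retries(run_build: Callable[[], Tuple[int, str]],
                     max_attempts: int = 3) -> int:
    """Call run_build() until it succeeds, fails with a non-transient error,
    or max_attempts is reached. run_build returns (exit_code, stderr_text)."""
    exit_code = 1
    for _attempt in range(max_attempts):
        exit_code, stderr_text = run_build()
        if exit_code == 0:
            return 0
        if not has_transient_error(stderr_text):
            # A real compile error: retrying would only waste time.
            return exit_code
    return exit_code
```

In a real build script, `run_build` would wrap the `make` invocation and read back the saved stderr file. Only retrying on a recognized message keeps genuine compile errors from being masked by repeated attempts.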
I think we should not close this because the current workaround does not address the root cause.
Summary: Improving `compile_commands.json` generation to achieve fully error-free offline indexing of our C/C++ code using clangd-indexer. Soon clangd-indexer will be packaged with the LLVM build that we distribute and use, and then we can run clangd-indexer during our CI/CD and verify that the compile_commands.json file remains correct.

Also fixing a bug in the `build_postgres.py` script where we are detecting and retrying transient errors during PostgreSQL build. It looks like these retries were not happening because we were not splitting postgres build stderr into lines.

Removing logic to re-execute the remote compilation script on the same host with Bash debugging turned on compiler-wrapper.sh in case of empty stderr. This was causing errors from which the retry loop was failing to recover, while making the logic more complicated and not providing any benefits in diagnosing issues. Instead of this, simply retrying all non-zero error codes in case stderr is empty.

Also fixing various shellcheck issues.
Test Plan: Jenkins

```
/opt/yb-build/llvm/yb-llvm-v12.0.0-1624139287-d28af7c6/bin/clangd-indexer --executor=all-TUs compile_commands.json >clangd.dex
```

Then start Visual Studio Code with the following settings:

```
"clangd.path": "/opt/yb-build/llvm/yb-llvm-v12.0.0-1624139287-d28af7c6/bin/clangd",
"clangd.arguments": [
    "-index-file=/my/home/directory/code/yugabyte-db/clangd.dex",
    "--background-index=false"
],
```

and verify that code navigation works and that the Clangd server initializes properly and loads the static index:

```
I[18:58:31.050] clangd version 12.0.0 (https://github.com/llvm/llvm-project.git d28af7c654d8db0b68c175db5ce212d74fb5e9bc)
I[18:58:31.050] PID: 554546
I[18:58:31.050] Working directory: /my/home/directory/code/yugabyte-db
I[18:58:31.050] argv[0]: /opt/yb-build/llvm/yb-llvm-v12.0.0-1624139287-d28af7c6/bin/clangd
I[18:58:31.050] argv[1]: -index-file=/my/home/directory/code/yugabyte-db/clangd.dex
I[18:58:31.050] argv[2]: --background-index=false
I[18:58:31.050] Starting LSP over stdin/stdout
I[18:58:31.050] <-- initialize(0)
I[18:58:31.050] Client supports legacy semanticHighlights notification and standard semanticTokens request, choosing the latter (no notifications).
I[18:58:31.051] --> reply:initialize(0) 0 ms
I[18:58:31.101] <-- initialized
I[18:58:31.103] <-- textDocument/didOpen
I[18:58:31.196] Loaded compilation database from /my/home/directory/code/yugabyte-db/compile_commands.json
```

Also `./yb_build.sh --shellcheck`

Reviewers: rsami, rskannan, jharveysmith, steve.varnau
Reviewed By: steve.varnau
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D12036
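The commit above says the retries were not happening because the postgres build stderr was not split into lines. The exact code is not shown in this thread, so the following is only one plausible way such a bug can arise in Python, with hypothetical names (`TRANSIENT_ERROR_RE`, `matches_any_line`) and a made-up stderr sample: iterating over a string yields single characters, so a per-line pattern check never matches.

```python
import re

# Hypothetical pattern; \s+ tolerates one or two spaces before "Stop.".
TRANSIENT_ERROR_RE = re.compile(r"missing separator\.\s+Stop\.")


def matches_any_line(lines) -> bool:
    """Return True if any element of `lines` contains the transient error."""
    return any(TRANSIENT_ERROR_RE.search(line) for line in lines)


# Made-up stderr content for illustration only.
stderr_text = (
    "make[3]: Entering directory 'src/backend/utils'\n"
    ".deps/fmgrtab.Po:139: *** missing separator.  Stop.\n"
)

# Passing the raw string iterates over characters, so nothing ever matches.
buggy = matches_any_line(stderr_text)

# Splitting into lines first lets the pattern see each full line.
fixed = matches_any_line(stderr_text.splitlines())

print(buggy, fixed)  # prints: False True
```

Whatever the actual cause was, checking the detection function against a seeded error (as the earlier test plan did with a postgres code error that looks like the transient one) is a cheap way to confirm the retry path is really exercised.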
Standard error from external program {{ make MAKELEVEL=0 -j 80 }} running in '/nfusr/centos-gcp-cloud/jenkins-worker-4a1/jenkins/jenkins-github-yugabyte-db-centos-master-clang-release-1176/build/release-clang-dynamic-ninja/postgres_build', saving stdout to {{ /nfusr/centos-gcp-cloud/jenkins-worker-4a1/jenkins/jenkins-github-yugabyte-db-centos-master-clang-release-1176/build/release-clang-dynamic-ninja/postgres_build/make.out }}, stderr to {{ /nfusr/centos-gcp-cloud/jenkins-worker-4a1/jenkins/jenkins-github-yugabyte-db-centos-master-clang-release-1176/build/release-clang-dynamic-ninja/postgres_build/make.err }}:
cat: access/objfiles.txt: No such file or directory
cat: bootstrap/objfiles.txt: No such file or directory
cat: catalog/objfiles.txt: No such file or directory
cat: parser/objfiles.txt: No such file or directory
cat: commands/objfiles.txt: No such file or directory
cat: executor/objfiles.txt: No such file or directory
cat: foreign/objfiles.txt: No such file or directory
cat: lib/objfiles.txt: No such file or directory
...