BuildWindows on Azure DevOps is broken (build docs hangs) #10820

Closed
hdorio opened this issue Feb 6, 2022 · 7 comments
Labels
bug Observed behavior contradicts documented or intended behavior

Comments

@hdorio
Contributor

hdorio commented Feb 6, 2022

Zig Version

0.10.0-dev.369+12c2de6ee

Steps to Reproduce

I'm unable to reproduce the issue.

Expected Behavior

The BuildWindows job should terminate.

Actual Behavior

The BuildWindows > 'Build and test' step times out.
My guess is that the build doc command hangs.

Here are the results of BuildWindows with a separate build doc step, from #10812.
I made the "Documentation" step time out after 60 minutes.

| run | run start | end of tests | runtime | tests | documentation | result | link |
|---|---|---|---|---|---|---|---|
| 20220206.42 | 2022-02-06T19:37:19.8335761Z | 2022-02-06T20:32:48.3944696Z | 1h45m48s | 48m42s | 6m46s | success | https://dev.azure.com/ziglang/zig/_build/results?buildId=19720&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=048377b9-4d72-5444-6498-1de1d1101a1e&l=1 |
| 20220207.1 | 2022-02-07T00:22:37.7758166Z | 2022-02-07T01:39:21.4958734Z | | 1h07m37s | timeout | fail | https://dev.azure.com/ziglang/zig/_build/results?buildId=19732&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=45960c34-79ee-509a-888b-9d7338b9a12b&l=23889 |
| 20220207.18 | 2022-02-07T09:50:41.5920169Z | 2022-02-07T11:06:47.9031813Z | | 1h06m25s | timeout | fail | https://dev.azure.com/ziglang/zig/_build/results?buildId=19749&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=45960c34-79ee-509a-888b-9d7338b9a12b |
| 20220207.25 | 2022-02-07T14:11:30.5585816Z | 2022-02-07T15:55:27.3106282Z | | 1h30m28s | timeout | fail | https://dev.azure.com/ziglang/zig/_build/results?buildId=19756&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=45960c34-79ee-509a-888b-9d7338b9a12b&l=23913 |
| 20220207.28 | 2022-02-07T18:02:13.3611039Z | 2022-02-07T20:12:47.2468025Z | | 1h53m51s | timeout | fail | https://dev.azure.com/ziglang/zig/_build/results?buildId=19759&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=45960c34-79ee-509a-888b-9d7338b9a12b&l=23913 |
| After PR 10813 merge and rebase | | | | | | | |
| 20220207.50 | 2022-02-08T06:04:27.5285924Z | 2022-02-08T07:22:27.1751063Z | 2h3m37s | 1h08m11s | 9m22s | success | https://dev.azure.com/ziglang/zig/_build/results?buildId=19781&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=45960c34-79ee-509a-888b-9d7338b9a12b&l=23929 |
@hdorio hdorio added the bug Observed behavior contradicts documented or intended behavior label Feb 6, 2022
@marler8997
Contributor

marler8997 commented Feb 6, 2022

It's currently unclear whether the build doc command is hanging or just running too slowly sometimes. There appears to be a lot of variance in the performance of Azure Pipelines; up to a factor of 3 has been observed. On successful runs, we've seen the "build doc" command take 1 hour or 3 hours. The timeout on the whole job is 6 hours, and I've seen it take 2 hours just to get to the "build doc" command.

To confirm whether it's hanging or just taking a long time, I suggest we open a PR that temporarily adds some progress reporting to the "build doc" command. This will both confirm whether it's still running and give us a better idea of how much performance variance there actually is.
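
As a rough illustration of the idea, here is a minimal Python sketch of an external wrapper that logs a heartbeat while a command runs. It is not the progress reporting proposed above, which would live inside the "build doc" command itself; the function name `run_with_heartbeat` and its parameters are hypothetical and not part of the Zig CI scripts.

```python
import subprocess
import sys
import time


def run_with_heartbeat(cmd, interval=60.0, log=print):
    """Run cmd and log a heartbeat every `interval` seconds until it exits.

    Hypothetical helper for illustration only.
    Returns (exit_code, heartbeat_count).
    """
    started = time.monotonic()
    # The child inherits stdout/stderr, so nothing is captured here.
    proc = subprocess.Popen(cmd)
    beats = 0
    while True:
        try:
            # wait() returns the exit code, or raises after `interval` seconds.
            code = proc.wait(timeout=interval)
            break
        except subprocess.TimeoutExpired:
            beats += 1
            log(f"still running after {time.monotonic() - started:.0f}s")
    log(f"exited with code {code} after {time.monotonic() - started:.0f}s")
    return code, beats


if __name__ == "__main__":
    # Example: wrap a short-lived child process.
    run_with_heartbeat([sys.executable, "-c", "print('hello')"], interval=5.0)
```

A heartbeat like this only shows that the child has not exited and how much wall-clock time has passed. Telling a slow run from a stuck one would still need progress output from inside the command.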

@hdorio
Contributor Author

hdorio commented Feb 6, 2022

This is unrelated to #10008.

If you inspect the failed runs:

#20220205.18 std/math: optimize division with divisors less than a half-limb
start: 2022-02-05T08:14:25.5548219Z
hangs: 2022-02-05T09:33:10.6464988Z
https://dev.azure.com/ziglang/zig/_build/results?buildId=19652&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=c222f6c4-0121-56a3-9785-67eccb26a308&l=34378

#20220205.22 stage2 ARM: clarify usage of unfreezeRegs in airSliceElemVal
start: 2022-02-05T11:46:59.3018466Z
hangs: 2022-02-05T12:56:36.1545144Z
https://dev.azure.com/ziglang/zig/_build/results?buildId=19656&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=c222f6c4-0121-56a3-9785-67eccb26a308&l=34372

#20220205.25 Add compiler-rt functions for f80
start: 2022-02-05T14:33:05.9805610Z
hangs: 2022-02-05T20:30:22.7094250Z (BuildMacOS)
https://dev.azure.com/ziglang/zig/_build/results?buildId=19659&view=logs&j=28c53fb9-3e1b-538f-b6d2-6fc003711cae&t=b577613a-f97c-568b-2fe2-5ce83fbc18c3&l=21187

#20220205.26 stage2: add support for Nvptx target
start: 2022-02-05T14:33:27.2461481Z
hangs: 2022-02-05T16:16:17.4711385Z
https://dev.azure.com/ziglang/zig/_build/results?buildId=19660&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=c222f6c4-0121-56a3-9785-67eccb26a308&l=34374

#20220205.27 std: allow tests to use cache and setOutputDir
start: 2022-02-05T14:34:19.6140127Z
hangs: 2022-02-05T15:49:21.7040484Z
https://dev.azure.com/ziglang/zig/_build/results?buildId=19661&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=c222f6c4-0121-56a3-9785-67eccb26a308&l=34369

#20220205.36 x86_64: add distinct MCValue representing symbol index in the linker
start: 2022-02-05T19:31:05.1271733Z
hangs: 2022-02-05T20:49:07.2658722Z
https://dev.azure.com/ziglang/zig/_build/results?buildId=19670&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=c222f6c4-0121-56a3-9785-67eccb26a308&l=34376

#20220205.41 stage2: implement @sqrt for f{16,32,64}
start: 2022-02-05T21:27:20.7202382Z
hangs: 2022-02-05T22:31:48.0159866Z
https://dev.azure.com/ziglang/zig/_build/results?buildId=19675&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=c222f6c4-0121-56a3-9785-67eccb26a308&l=34406

#20220206.2 x86_64: add distinct MCValue representing symbol index in the linker
start: 2022-02-06T00:51:17.9716887Z
hangs: 2022-02-06T02:48:06.0697917Z
https://dev.azure.com/ziglang/zig/_build/results?buildId=19680&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=c222f6c4-0121-56a3-9785-67eccb26a308&l=34380

#20220206.12 CLI: remove remainders of --verbose-ast and --verbose-tokenize
start: 2022-02-06T03:43:43.2386974Z
hangs: 2022-02-06T05:27:26.3100898Z
https://dev.azure.com/ziglang/zig/_build/results?buildId=19690&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=c222f6c4-0121-56a3-9785-67eccb26a308&l=34362

#20220206.32 stage2 ARM: fix load and store for abi_size < 4
start: 2022-02-06T17:20:31.8407079Z
hangs: 2022-02-06T19:09:00.5092720Z
https://dev.azure.com/ziglang/zig/_build/results?buildId=19710&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=c222f6c4-0121-56a3-9785-67eccb26a308&l=34373

#20220206.45 stage2: lower unnamed constants in Elf and MachO
start: 2022-02-06T22:12:21.1562100Z
hangs: 2022-02-06T23:30:16.3395080Z
https://dev.azure.com/ziglang/zig/_build/results?buildId=19723&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=c222f6c4-0121-56a3-9785-67eccb26a308&l=34408

#20220207.11 turn echo back on in windows_msvc_script.bat
start: 2022-02-07T05:47:43.6860484Z
hangs: 2022-02-07T07:13:53.7941157Z
https://dev.azure.com/ziglang/zig/_build/results?buildId=19742&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=c222f6c4-0121-56a3-9785-67eccb26a308&l=34414

You will see that (except for the macOS timeout) every run hangs right after executing the last test, at or before the 2.5-hour elapsed-time mark.

816/816 zig.system.darwin.macos.test "std-native-ReleaseSmall-bare-single detect"... OK
737 passed; 79 skipped; 0 failed.

That leaves 3.5 hours for the build doc command to execute.

The build doc command takes around 20 minutes to complete on my virtual machine running on my 7-year-old laptop.

I expect the CI to complete a 20-minute task in less than 3.5 hours. Even if Azure is 5 times slower, it should make it.
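
The arithmetic behind this argument can be checked with a short Python sketch. It assumes that the "start" and "hangs" timestamps listed above can be subtracted directly to get the elapsed time before the hang. The 6-hour job timeout, the 2.5-hour mark, and the 20-minute local build time are the figures quoted in this thread; the variable names are made up for the example.

```python
from datetime import datetime, timedelta


def parse_ts(ts):
    # Keep whole seconds only: the logs use 7 fractional digits plus "Z",
    # which strptime's %f (6 digits at most) cannot parse directly.
    return datetime.strptime(ts[:19], "%Y-%m-%dT%H:%M:%S")


# (start, hangs) pairs for the failed BuildWindows runs listed above.
WINDOWS_RUNS = {
    "20220205.18": ("2022-02-05T08:14:25.5548219Z", "2022-02-05T09:33:10.6464988Z"),
    "20220205.22": ("2022-02-05T11:46:59.3018466Z", "2022-02-05T12:56:36.1545144Z"),
    "20220205.26": ("2022-02-05T14:33:27.2461481Z", "2022-02-05T16:16:17.4711385Z"),
    "20220205.27": ("2022-02-05T14:34:19.6140127Z", "2022-02-05T15:49:21.7040484Z"),
    "20220205.36": ("2022-02-05T19:31:05.1271733Z", "2022-02-05T20:49:07.2658722Z"),
    "20220205.41": ("2022-02-05T21:27:20.7202382Z", "2022-02-05T22:31:48.0159866Z"),
    "20220206.2": ("2022-02-06T00:51:17.9716887Z", "2022-02-06T02:48:06.0697917Z"),
    "20220206.12": ("2022-02-06T03:43:43.2386974Z", "2022-02-06T05:27:26.3100898Z"),
    "20220206.32": ("2022-02-06T17:20:31.8407079Z", "2022-02-06T19:09:00.5092720Z"),
    "20220206.45": ("2022-02-06T22:12:21.1562100Z", "2022-02-06T23:30:16.3395080Z"),
    "20220207.11": ("2022-02-07T05:47:43.6860484Z", "2022-02-07T07:13:53.7941157Z"),
}

JOB_TIMEOUT = timedelta(hours=6)                 # whole-job timeout
TESTS_DONE_BY = timedelta(hours=2, minutes=30)   # upper bound claimed above
LOCAL_DOC_BUILD = timedelta(minutes=20)          # local build doc time

# Elapsed time between "start" and "hangs" for each run.
elapsed = {
    run: parse_ts(hang) - parse_ts(start)
    for run, (start, hang) in WINDOWS_RUNS.items()
}
longest = max(elapsed.values())

# Time left for build doc, and how many times slower than the local
# build it could be while still fitting in that budget.
doc_budget = JOB_TIMEOUT - TESTS_DONE_BY
headroom = doc_budget / LOCAL_DOC_BUILD

for run, delta in elapsed.items():
    print(run, delta)
print("longest:", longest, "budget:", doc_budget, "headroom:", headroom)
```

Under these assumptions, every listed Windows run reaches the hang in under 2.5 hours, and the remaining budget is large enough that a 5x slowdown of a 20-minute build would still fit.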

@hdorio
Contributor Author

hdorio commented Feb 6, 2022

This pull request turns echo back on, which shows that the build doc command is indeed launched:
#10781

This pull request addresses a possible cause of the hang in the build doc command:
#10813

@marler8997
Contributor

It would be nice, but unfortunately I don't think #10813 applies to this CI issue. It only applies to programs that use ChildProcess.exec, but the docs step uses RunStep with the default stdout/stderr capture set to inherit, which means no output collection.

> I expect the CI to complete a 20-minute task in less than 3.5 hours.

It's not clear how long it should take because of all the variance. My guess is that Azure machines are likely competing for resources, which would account for this variance.

@hdorio
Contributor Author

hdorio commented Feb 6, 2022

> It only applies to programs that are using ChildProcess.exec but the docs step uses RunStep

Maybe we are not looking at the same version. At 12c2de6ee, in the file doc/docgen.zig, the function genHtml() calls ChildProcess.exec() at line 1329 (and in other places).

> It's not clear how long it should take because of all the variance. My guess is that azure machines are likely competing for resources which would account for this variance.

If that were the case, the tests would be slower by the same factor, and they are not.
The macOS timeout is likely what you describe: the tests are not finished, and the run is canceled midway because they are too slow.
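
This "same factor" argument can be put into numbers with a small Python sketch based on the BuildWindows results table in the first comment. It assumes that the single duration shown for each failed run is the time spent in the tests, which is one reading of that table and not something the thread confirms. The variable names are hypothetical.

```python
from datetime import timedelta

# Successful run 20220206.42: time for tests and for documentation.
tests_ok = timedelta(minutes=48, seconds=42)
docs_ok = timedelta(minutes=6, seconds=46)

# Slowest failed run (20220207.28), read as its test duration (assumption).
tests_slowest_failed = timedelta(hours=1, minutes=53, seconds=51)

# The "Documentation" step was given a 60-minute timeout.
doc_timeout = timedelta(minutes=60)

# If the machine were uniformly slower, documentation would slow down
# by the same factor as the tests.
slowdown = tests_slowest_failed / tests_ok
docs_if_uniformly_slow = docs_ok * slowdown

print(f"slowdown factor: {slowdown:.2f}")
print("projected documentation time:", docs_if_uniformly_slow)
print("fits in timeout:", docs_if_uniformly_slow < doc_timeout)
```

With these inputs, the projected documentation time stays far below the 60-minute timeout, which is consistent with the claim that a uniformly slow machine alone would not explain the timeouts.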

@marler8997
Contributor

> Maybe we are not looking at the same version. At 12c2de6 in the file doc/docgen.zig the function genHtml() calls ChildProcess.exec() at line 1329 (but not only).

Oh, I didn't realize that the "docgen" step itself also spawns more child processes. In that case, this could be the fix. Let's cross our fingers and hope :)
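
For readers unfamiliar with why collecting a child's output can hang at all, here is a general illustration in Python. It is not a description of what #10813 changes, which this thread does not spell out. The hazard shown is the well-known one where a parent reads one pipe to the end while the child is blocked writing to the other, full pipe. Python's `communicate()` avoids it by draining both pipes concurrently. The helper name `collect_output` and the child script are made up for the example.

```python
import subprocess
import sys


def collect_output(cmd):
    """Run cmd and collect stdout and stderr without risking a pipe deadlock.

    Hypothetical helper for illustration only.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads both pipes concurrently until EOF, then waits.
    # Reading proc.stdout to EOF first and proc.stderr afterwards could
    # block forever if the child fills the stderr pipe in the meantime.
    out, err = proc.communicate()
    return proc.returncode, out, err


# A child that writes a large amount to stderr before writing to stdout,
# which is the pattern that defeats a naive "read stdout, then stderr".
CHILD = "import sys; sys.stderr.write('e' * 200000); sys.stdout.write('o' * 200000)"

if __name__ == "__main__":
    code, out, err = collect_output([sys.executable, "-c", CHILD])
    print(code, len(out), len(err))
```

Whether the hang in this issue had this exact shape is an assumption. The sketch only shows why output collection is a plausible place for a child process to get stuck.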

@hdorio hdorio changed the title BuildWindows on Azure CI is broken BuildWindows on Azure DevOps is broken (build docs hangs) Feb 7, 2022
@hdorio
Contributor Author

hdorio commented Feb 9, 2022

@marler8997 since the merge of #10813, I have counted 17 builds on the master branch without any timeout.
Thank you very much 👍.

I consider this issue fixed.

@hdorio hdorio closed this as completed Feb 9, 2022