Release CI asset upload failure #520

Closed
stephanosio opened this issue Jul 11, 2022 · 6 comments · Fixed by #522

@stephanosio
Member

The "Upload release assets" step in the Release CI fails for an unknown reason, and the release assets indeed do not get uploaded to the corresponding release.

Failure seen here: https://github.com/zephyrproject-rtos/sdk-ng/runs/7266059491?check_suite_focus=true

@stephanosio stephanosio added bug area: CI Issues related to Continuous Integration labels Jul 11, 2022
@stephanosio stephanosio added this to the 0.15.0 milestone Jul 11, 2022
@stephanosio stephanosio self-assigned this Jul 11, 2022
@stephanosio
Member Author

This is likely due to either 1) the large number of artifacts uploaded to the sdk-ng repository, or 2) the large size of some release artifacts.

Re 1: Try cleaning up old build artifacts and re-run the workflow.
Re 2: GitHub limits the size per release asset to 2GB and the Windows build is pretty close to this limit (1.8GB).
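A minimal sketch of a pre-upload size check for hypothesis 2, under stated assumptions: the limit is modeled as 2 GiB, the warning threshold `warnFraction` and the file name are hypothetical, and 1.8GB is taken as 1.8 × 10^9 bytes. It is not part of the release workflow.

```typescript
// Assumption: the "2GB" per-asset limit is modeled as 2 GiB.
const ASSET_LIMIT_BYTES = 2 * 1024 * 1024 * 1024;

interface AssetCheck {
  name: string;
  sizeBytes: number;
  fractionOfLimit: number;
  overLimit: boolean;
  nearLimit: boolean;
}

// Classify one asset against the per-asset limit.
// `warnFraction` is a hypothetical threshold for "close to the limit".
function checkAsset(name: string, sizeBytes: number, warnFraction = 0.8): AssetCheck {
  const fractionOfLimit = sizeBytes / ASSET_LIMIT_BYTES;
  const overLimit = sizeBytes >= ASSET_LIMIT_BYTES;
  return {
    name,
    sizeBytes,
    fractionOfLimit,
    overLimit,
    nearLimit: !overLimit && fractionOfLimit >= warnFraction,
  };
}

// Hypothetical asset name; the size mirrors the 1.8GB Windows build.
const windowsBundle = checkAsset("windows-bundle", 1.8e9);
console.log(windowsBundle.name, windowsBundle.nearLimit, windowsBundle.overLimit);
```

A check like this would only flag the size risk early; it would not explain a failure for assets that are still under the limit.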

@stephanosio
Member Author

  1. the large amount of artifacts uploaded to the sdk-ng repository

After cleaning up the stale artifacts, the release asset upload still fails:
https://github.com/zephyrproject-rtos/sdk-ng/runs/7277825063?check_suite_focus=true

This leaves no. 2 as the more likely cause.

@stephanosio
Member Author

Re no. 2, issue #521 investigated the cause of the distribution bundle size increase and determined that it is an inevitable consequence of upgrading from GCC 10.3 to 12.1, so no further action can be taken to reduce it.

In this regard, if the release asset upload failure is indeed due to the (approximately) 2GB per-asset size limit, then an alternate distribution method has to be considered:

  1. Upload the full distribution bundle (and other files) to an alternate distribution point (e.g. S3)
    • additional cost ($0.09/GB for the first 10TB for S3)
  2. Distribute only the minimal distribution bundle via the GitHub release mechanism
    • this requires the users to have internet access at the time of installing the Zephyr SDK
    • the full distribution bundle may be uploaded to an alternate distribution point as a workaround for the above

@stephanosio
Member Author

Quick and dirty cost simulation for approach no. 1 above, using AWS S3:

Assuming:

  • 200 downloads per day
  • $0.09/GB for outbound data transfer
  • distribution bundle size of 1.7GB (average)

200 downloads/day * 1.7 GB/download = 340 GB/day
340 GB/day * 30 days/month = 10200 GB/month
10200 GB/month * 0.09 USD/GB = 918 USD/month

The cost of $918 per month is non-negligible and this is assuming a fairly generous 200 downloads per day. The downloads per day will likely increase in the future and the cost will likely become prohibitive.
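The estimate above can be parameterized to see how it scales with download volume. This is a rough sketch: the function name and the 1000 downloads/day scenario are hypothetical, and a flat per-GB rate is assumed, so any pricing tiers beyond the first 10TB are ignored.

```typescript
interface TransferEstimate {
  gbPerDay: number;
  gbPerMonth: number;
  usdPerMonth: number;
}

// Flat-rate outbound transfer estimate (assumption: no pricing tiers).
function estimateMonthlyTransferCost(
  downloadsPerDay: number,
  bundleSizeGb: number,
  usdPerGb: number,
  daysPerMonth = 30
): TransferEstimate {
  const gbPerDay = downloadsPerDay * bundleSizeGb;
  const gbPerMonth = gbPerDay * daysPerMonth;
  return { gbPerDay, gbPerMonth, usdPerMonth: gbPerMonth * usdPerGb };
}

// Figures from the comment above: 200 downloads/day, 1.7 GB, $0.09/GB.
const baseline = estimateMonthlyTransferCost(200, 1.7, 0.09);
console.log(baseline.usdPerMonth.toFixed(2));

// Hypothetical growth scenario: five times the download volume.
const growth = estimateMonthlyTransferCost(1000, 1.7, 0.09);
console.log(growth.usdPerMonth.toFixed(2));
```

Because the model is linear in the download count, the monthly cost grows in direct proportion to it.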

@stephanosio
Member Author

Re-running the failed step with debug logging enabled displays the following error message:

##[debug]Node Action run completed with exit code 137
##[debug]Finishing: Upload release assets

Full log: https://github.com/zephyrproject-rtos/sdk-ng/runs/7310832786?check_suite_focus=true#step:5:131

Exit code 137 (128 + 9, i.e. termination by SIGKILL) is likely due to the node process, in which the release action runs, being killed because the system ran out of memory.
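The reading of exit code 137 relies on the common shell convention that a status of 128 + N means the process was terminated by signal N. A small sketch of that decoding follows; the function name is hypothetical, the signal table is deliberately partial and uses the usual Linux numbers, and a process could in principle also exit with such a code on its own.

```typescript
// Partial signal table (common Linux numbering); not exhaustive.
const SIGNAL_NAMES: Record<number, string> = {
  2: "SIGINT",
  9: "SIGKILL",
  15: "SIGTERM",
};

interface ExitInfo {
  killedBySignal: boolean;
  signal?: number;
  signalName?: string;
}

// Decode the conventional "128 + signal number" exit status.
function describeExitCode(code: number): ExitInfo {
  if (code > 128 && code <= 128 + 64) {
    const signal = code - 128;
    return { killedBySignal: true, signal, signalName: SIGNAL_NAMES[signal] };
  }
  return { killedBySignal: false };
}

const info = describeExitCode(137);
console.log(info.signal, info.signalName);
```

SIGKILL is the signal the Linux out-of-memory killer sends, which is why this code is consistent with, though not proof of, memory exhaustion.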

Looking at the softprops/action-gh-release implementation, this seems to make sense because it uses the readFileSync function to read the full content of the release assets into memory (instead of using, for instance, a stream).

Also note that the action attempts to read all release assets into memory at once:

      const files = paths(config.input_files);
...
      const assets = await Promise.all(
        files.map(async path => {
          const json = await upload(

The zephyr_runner has 32GB of RAM available to it and the total size of the release assets for the latest build is approx. 16GB; considering the baseline memory usage in the runner environment and all the overheads, running out of memory is not unlikely.
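To make the memory argument concrete, here is a rough model of how much asset data is buffered at once when uploads are grouped into batches. It assumes every asset in a batch is held fully in memory until the batch finishes, and it ignores all other overheads; the asset sizes are hypothetical values in MB chosen to total 16000, and both function names are illustrative only.

```typescript
// Split a list into consecutive batches of at most `batchSize` items.
function splitIntoBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Peak amount of asset data buffered at once, assuming each batch is
// read fully into memory and released before the next batch starts.
function peakBufferedMb(assetSizesMb: number[], batchSize: number): number {
  let peak = 0;
  for (const batch of splitIntoBatches(assetSizesMb, batchSize)) {
    const held = batch.reduce((sum, size) => sum + size, 0);
    peak = Math.max(peak, held);
  }
  return peak;
}

// Hypothetical asset sizes in MB (total: 16000).
const sizesMb = [1800, 1700, 1700, 1600, 1600, 1500, 1500, 1400, 1200, 1000, 1000];

console.log("all at once:", peakBufferedMb(sizesMb, sizesMb.length));
console.log("batches of 3:", peakBufferedMb(sizesMb, 3));
console.log("one at a time:", peakBufferedMb(sizesMb, 1));
```

Under this model, smaller batches lower the peak roughly in proportion to the batch size, while streaming each file instead of buffering it would remove the dependence on asset size altogether.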

stephanosio added a commit to stephanosio/zephyr-sdk-ng that referenced this issue Jul 13, 2022
This commit modifies the CI release workflow to upload the release
assets in multiple parts because the `softprops/action-gh-release`
action attempts to load all specified release assets into the runner
memory at once and this may cause the runner instance to run out of
memory.

For more details, refer to the GitHub issue zephyrproject-rtos#520.

Revert this when the action is updated to use the streams.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
stephanosio added a commit that referenced this issue Jul 13, 2022
This commit modifies the CI release workflow to upload the release
assets in multiple parts because the `softprops/action-gh-release`
action attempts to load all specified release assets into the runner
memory at once and this may cause the runner instance to run out of
memory.

For more details, refer to the GitHub issue #520.

Revert this when the action is updated to use the streams.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
@stephanosio
Member Author

While #522 fixed the problem at hand, we will likely hit the 2GB per-asset size limit in the near future and will need to come up with a solution.
