Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proprietary toolchain support sustainability #63738

Open
fabiobaltieri opened this issue Oct 10, 2023 · 19 comments · Fixed by #63759
Open

Proprietary toolchain support sustainability #63738

fabiobaltieri opened this issue Oct 10, 2023 · 19 comments · Fixed by #63759
Assignees
Labels
area: Toolchains Toolchains RFC Request For Comments: want input from the community

Comments

@fabiobaltieri
Copy link
Member

fabiobaltieri commented Oct 10, 2023

Introduction

Problem description

Hi, I recently bumped into few compiler quirks:

  • the first priority check implementation broke ARCMWDT due to differences in section naming
  • the second implementation broke ARMClang due to missing symbols
  • the troubleshooting of the second implementation was slowed down because ARMClang support was already broken
  • a recent build time optimization added a warning in clang builds for pretty much anything but native_posix that turned out to be an llvm bug
  • we are now blocking a Kconfig option removal for unclear compiler limitations (may not be the case, but it adds to the point)

None of the above were detectable in CI. On paper the project supports a handful of different compilers, but in practice the only ones tested in CI are

Any breakage that does not hit any of these is going to go undetected and has to get caught manually by a reviewer or post-submit, potentially post-release, so it's regression and potential backport.

More importantly, many of those toolchains are proprietary, most developer have no access to them, so one has to rely on someone else to test their code, but at the same time there's not toolchain maintainers or a known list of point of contacts for validating changes against a specific toolchain.

Proposed change

  • Supported proprietary toolchains should have an area in the MAINTAINER file listing users who can test potentially breaking changes. Ideally there should be a way for Zephyr contributors to access proprietary toolchains for troubleshooting, or the support team should provide help in running the changes on the toolchain and providing build artifacts, or contributing fixes and workarounds directly.
  • Supported open source toolchains should have some level of testing in CI (Clang/LLVM beyond native_posix is what I'm thinking)
@fabiobaltieri fabiobaltieri added the RFC Request For Comments: want input from the community label Oct 10, 2023
@fabiobaltieri
Copy link
Member Author

fabiobaltieri commented Oct 10, 2023

@fabiobaltieri fabiobaltieri changed the title Proprietary toolchain compatibility testing Proprietary toolchain support sustainability Oct 10, 2023
@cfriedt
Copy link
Member

cfriedt commented Oct 10, 2023

At one point, we spoke about reporting test results from trusted test runners outside of Zephyr CI such as SoC vendors like ST or NXP.

The toolchain spec should be an integral part of that. Possibly an enumeration. Custom LLVM and custom GCC should also be a thing. We have for example CROSS_COMPILE, but it would be great to capture more metadata than that for reporting purposes.

Like external modules, external toolchains would need to run a subset of tests (often proprietary toolchains only support a limited number of platforms).

Maybe we could reuse some of that machinery

@fabiobaltieri
Copy link
Member Author

At one point, we spoke about reporting test results from trusted test runners outside of Zephyr CI such as SoC vendors like ST or NXP.

Would it make sense to investigate the possibility of having some custom runners with those toolchains available running a limited set of core tests? Something on the order of magnitude of the current Clang/LLVM one.

@tejlmand
Copy link
Collaborator

Would it make sense to investigate the possibility of having some custom runners with those toolchains available running a limited set of core tests? Something on the order of magnitude of the current Clang/LLVM one.

it makes a lot of sense, and has been discussed in Toolchain WG, but guess it mostly has to do with priorities.
So if anyone have the skills and bandwidth to carry out such work, then it will be highly appreciated.

tejlmand added a commit to tejlmand/zephyr that referenced this issue Oct 10, 2023
Fixes: zephyrproject-rtos#63738

Create dedicated entries for ARC MWDT, arm compiler 6, and one Api toolchains.

This helps contributors to identify whom to contact in case of issues related
to those toolchains.

The Zephyr SDK, cross-compile, and other GCC based compilers are covered as
part of the general cmake/toolchain,compiler,linker,bintools entry in the
MAINTAINERS file.

Signed-off-by: Torsten Rasmussen <Torsten.Rasmussen@nordicsemi.no>
@tejlmand
Copy link
Collaborator

please find the following proposal as a starting point: #63759

Hopefully this will provide more visible information on whom to contact in case of issues related to those toolchains.
Not the final solution, but a step in a better direction.

tejlmand added a commit to tejlmand/zephyr that referenced this issue Oct 10, 2023
Fixes: zephyrproject-rtos#63738

Create dedicated entries for ARC MWDT, arm compiler 6, and one Api toolchains.

This helps contributors to identify whom to contact in case of issues related
to those toolchains.

The Zephyr SDK, cross-compile, and other GCC based compilers are covered as
part of the general cmake/toolchain,compiler,linker,bintools entry in the
MAINTAINERS file.

Signed-off-by: Torsten Rasmussen <Torsten.Rasmussen@nordicsemi.no>
@abrodkin
Copy link
Collaborator

I do support that discussion here as we (Synopsys ARC maintainers) and our customers are suffering from exactly that problem: even though we do quite extensive in-house testing with our proprietary tools, simulators and boards, we only do it post-merge upstream. In that sense we'd prefer to enable 3rd-party developers to see problems which are introduced by their changes before merging these changes in the upstream main tree.

But then we start seeing logistical problems:

  1. From one point of view it's hard to make proprietary tools and even more so HW boards freely available to the wide audience. There're usually legal and availability/scalability problems.
  2. From another point of view all the above difficulties could be solved in one or another way, but that requires some work to be done. For example, come-up with extensible Zephyr CI pipeline architecture so that external (vendor managed) runners could be involved. And on top of the initial implementation we'll get problems with stability (i.e. we'll need troubleshoot things in those external runners) and, again, scalability (even if we may allocate 10 boards, it will be nowhere close to required bandwidth).

That said I fully agree with @tejlmand here: if there's somebody who's willing to bite that bullet and work on that, we'll provide all the support with reviewing, testing etc.

tejlmand added a commit to tejlmand/zephyr that referenced this issue Oct 11, 2023
Fixes: zephyrproject-rtos#63738

Create dedicated entries for ARC MWDT, arm compiler 6, and one Api toolchains.

This helps contributors to identify whom to contact in case of issues related
to those toolchains.

The Zephyr SDK, cross-compile, and other GCC based compilers are covered as
part of the general cmake/toolchain,compiler,linker,bintools entry in the
MAINTAINERS file.

Signed-off-by: Torsten Rasmussen <Torsten.Rasmussen@nordicsemi.no>
tejlmand added a commit to tejlmand/zephyr that referenced this issue Oct 11, 2023
Fixes: zephyrproject-rtos#63738

Create dedicated entries for ARC MWDT, arm compiler 6, and one Api toolchains.

This helps contributors to identify whom to contact in case of issues related
to those toolchains.

The Zephyr SDK, cross-compile, and other GCC based compilers are covered as
part of the general cmake/toolchain,compiler,linker,bintools entry in the
MAINTAINERS file.

Signed-off-by: Torsten Rasmussen <Torsten.Rasmussen@nordicsemi.no>
@fabiobaltieri
Copy link
Member Author

@tejlmand @abrodkin do you know if the licensing model of those toolchains would even allow usage in CI for a public project? I imagine we could not just make a public docker image with those, but if someone were to put the work to set it up, would it be possible licensing wise to have custom runners with proprietary compilers installed?

I'm not thinking board testing or anything particularly extensive, just something to build few core samples to just test the core stuff, at least as a start, but if the licensing does not allow that then it's not even worth discussing it. :-)

@tejlmand
Copy link
Collaborator

@tejlmand @abrodkin do you know if the licensing model of those toolchains would even allow usage in CI for a public project?

Past talks with ARM regarding Arm compiler 6 licenses has been positive, so should be possible, but we haven't gotten to the point where we have a concrete way of doing the implementation with CI.
I would say we're lacking that step before having a final 👍 / 👎 to the solution.

@cfriedt
Copy link
Member

cfriedt commented Oct 11, 2023

@tejlmand
GitHub also fully supports Remote Test Runners as shown in a recent Tech Talk by @kartben featuring @szczys from Golioth.
https://www.youtube.com/live/940O1CUgh4Q?si=zJlRCSqFqO73tHla

https://blog.golioth.io/golioth-hil-testing-part1/

Technically speaking, Synopsys (or another partner) could easily instantiate the existing public Zephyr container image (on-premesis for hardware-in-the-loop testing or in a private cloud for emulation), add a volume on top of that for the proprietary toolchains, add a secret for a toolchain license key, and lastly add another secret for AWS storage (or elsewhere) to provide test reports.

Very doable. My only concerns are the volume of traffic and of course security (it would be best to host the runner in an isolated environment).

@fabiobaltieri
Copy link
Member Author

fabiobaltieri commented Oct 11, 2023

Very doable. My only concerns are the volume of traffic and of course security (it would be best to host the runner in an isolated environment).

Yeah that would be the main concern, there's some documentation here about that: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners#self-hosted-runner-security, it's not trivial to set up properly but it can certainly be done.

tejlmand added a commit to tejlmand/zephyr that referenced this issue Oct 11, 2023
Fixes: zephyrproject-rtos#63738

Create dedicated entries for ARC MWDT, arm compiler 6, and one Api
toolchains.

This helps contributors to identify whom to contact in case of issues
related to those toolchains.

The Zephyr SDK, cross-compile, and other GCC based compilers are covered
as part of the general cmake/toolchain,compiler,linker,bintools entry in
the MAINTAINERS file.

Signed-off-by: Torsten Rasmussen <Torsten.Rasmussen@nordicsemi.no>
@abrodkin
Copy link
Collaborator

From discussion with @stephanosio a couple of months ago I understood that idea of using a private/internal CI image in the main Zephyr CI was rejected by the TSC in the past. That's because it limits user's ability to easily reproduce the issues seen in the CI locally (i.e. users won't be able to simply download the CI docker image and run the tests locally).

And indeed, that's a good question: how to react on failures reproduced by vendor runners?

I mean, what if your PR triggers some issue in a vendor CI. At best you see a build issue and from compiler error message may get enough information for the fix. But what if a test fails during execution: how are you going to debug it? I see only 2 options:

  1. Get creative and re-spin your PR until the problem is gone.
  2. Get hold off the vendor's maintainers and "co-debug" with the person.

If we're OK with such limitations, let's try to prototype some solution for "external" vendor Zephyr CI runners.

Also, speaking of Synopsys runner I'd say we don't need to have on-premises runners which will build tests and run on SDK's QEMU or on our proprietary nSIM simulator. Instead, the same cloud infrastructure which is used by the main Zephyr CI will do for us. I mean the same cloud provider like AWS or whatever is used now.

jhedberg pushed a commit that referenced this issue Oct 12, 2023
Fixes: #63738

Create dedicated entries for ARC MWDT, arm compiler 6, and one Api
toolchains.

This helps contributors to identify whom to contact in case of issues
related to those toolchains.

The Zephyr SDK, cross-compile, and other GCC based compilers are covered
as part of the general cmake/toolchain,compiler,linker,bintools entry in
the MAINTAINERS file.

Signed-off-by: Torsten Rasmussen <Torsten.Rasmussen@nordicsemi.no>
@fabiobaltieri fabiobaltieri reopened this Oct 12, 2023
@fabiobaltieri
Copy link
Member Author

Yeah ideally the toolchain would be generally available, but having it in CI only seems like a better compromise than not testing at all.

Instead, the same cloud infrastructure which is used by the main Zephyr CI will do for us. I mean the same cloud provider like AWS or whatever is used now.

Sounds ideal, private runners seems like a bit of a logistic nightmare. I've no idea how the docker setup works though, @cfriedt could you expand a bit on that?

I'm also wondering if there's some other open source project on GitHub running such a setup that we could look into.

@stephanosio stephanosio self-assigned this Oct 16, 2023
@stephanosio
Copy link
Member

As @abrodkin mentioned, the problem has more to do with the project policy (i.e. to ensure that any problems in the Zephyr main CI are locally reproducible) than technical hurdles.

In fact, I have already tested the "private" CI Docker image idea for supporting ARM FVP in our CI in #45099; but, it was rejected for the aforementioned reason.

Yeah ideally the toolchain would be generally available, but having it in CI only seems like a better compromise than not testing at all.

@fabiobaltieri The compromise suggested by the TSC at the time was to create standalone workflows (separate from the Zephyr main CI twister workflow) that run with proprietary tools; but, this comes with some logistical issues (see #45099 (comment)).

By the way, in terms of the CI infrastructure, we already have a very flexible Kubernetes-based self-hosted runner deployment where we can use private Docker images for a specific set of runners or even mount an NFS volume containing proprietary tools as needed. With the current infrastructure, we do not need externally hosted runners, which would truly be a logistical nightmare.

Chenhongren pushed a commit to Chenhongren/zephyr that referenced this issue Oct 26, 2023
Create dedicated entries for ARC MWDT, arm compiler 6, and one Api
toolchains.

This helps contributors to identify whom to contact in case of issues
related to those toolchains.

The Zephyr SDK, cross-compile, and other GCC based compilers are covered
as part of the general cmake/toolchain,compiler,linker,bintools entry in
the MAINTAINERS file.

(cherry picked from commit 2f79f5e)

Original-Fixes: zephyrproject-rtos#63738
Original-Signed-off-by: Torsten Rasmussen <Torsten.Rasmussen@nordicsemi.no>
GitOrigin-RevId: 2f79f5e
Change-Id: Ic01f3f9411ac143e1faae10e19c13ebe60f818a4
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/zephyr/+/4933966
Tested-by: Keith Short <keithshort@chromium.org>
Tested-by: ChromeOS Prod (Robot) <chromeos-ci-prod@chromeos-bot.iam.gserviceaccount.com>
Reviewed-by: Keith Short <keithshort@chromium.org>
Commit-Queue: Keith Short <keithshort@chromium.org>
microbuilder pushed a commit to microbuilder/zephyr that referenced this issue Oct 30, 2023
Fixes: zephyrproject-rtos#63738

Create dedicated entries for ARC MWDT, arm compiler 6, and one Api
toolchains.

This helps contributors to identify whom to contact in case of issues
related to those toolchains.

The Zephyr SDK, cross-compile, and other GCC based compilers are covered
as part of the general cmake/toolchain,compiler,linker,bintools entry in
the MAINTAINERS file.

Signed-off-by: Torsten Rasmussen <Torsten.Rasmussen@nordicsemi.no>
@marc-hb
Copy link
Collaborator

marc-hb commented Feb 29, 2024

I mean, what if your PR triggers some issue in a vendor CI. At best you see a build issue and from compiler error message may get enough information for the fix. But what if a test fails during execution: how are you going to debug it?

Good point. Also, trying to somewhat "centralize" the CI of multiple, proprietary toolchains may not be very "scalable"[1]

FWIW SOF CI (tries to) test the Zephyr main branch daily. The feedback loop is obviously slower than testing PRs before merge but it's not too bad and has exposed many toolchain issues fairly quickly. Post-merge but very soon after merge.

All in all this has so far proved to be a decent trade-off and this obviously scales to any number of proprietary tools because there is zero centralized involvement. The lack of permissions management and other administrative tasks like licenses is especially nice (I saw scary keywords like "security" and "logistical" lurking above already...)

Testing daily means the workload is constant, which in turn means more tests can be run for longer. Testing daily and pre-merge are of course not mutually exclusive; but trying to test every PR with some proprietary toolchain would be putting the cart before the horses if that toolchain does not already have a good track record with some independent automation able to test the Zephyr main branch regularly and engineers actually looking at failures and trying to fix them. In my experience, the second most common CI sin is trying to automate something that isn't maintained much in the first place [2]

My 2 SOF cents.

[1] An approach where some sort of "publisher" workflow notifies a list of distributed and independent CIs would help.
[2] The most common CI sin is regularly testing configurations that developers don't use.

@fabiobaltieri
Copy link
Member Author

Good point. Also, trying to somewhat "centralize" the CI of multiple, proprietary toolchains may not be very "scalable"[1]

I would more interested in a plan to make such toolchain available to any Zephyr developer as needed somehow.

@marc-hb
Copy link
Collaborator

marc-hb commented Feb 29, 2024

I would more interested in a plan to make such toolchain available to any Zephyr developer as needed somehow.

Anyone can change sof/west.yml today, point it at their (unmerged) Zephyr commit and submit it to SOF CI.

I agree a more direct method would be extremely useful. It's also more complicated to set up.

@stephanosio
Copy link
Member

I would more interested in a plan to make such toolchain available to any Zephyr developer as needed somehow.

The problem is more on the legal (licensing) side than the technical side -- obviously, you cannot just let anyone use a proprietary toolchain for which Zephyr Project has been licensed, without getting a permission from the company that licensed it.

On the technical side, it is actually quite simple -- just provide a one-time SSH access to an ephemeral runner (mxschmitt/action-tmate) with the proprietary toolchain.

@marc-hb
Copy link
Collaborator

marc-hb commented Mar 2, 2024

You mean it's technically easy to provide access to a proprietary toolchain if it is... not proprietary? :-)

@cfriedt
Copy link
Member

cfriedt commented Mar 2, 2024

Sounds ideal, private runners seems like a bit of a logistic nightmare. I've no idea how the docker setup works though, @cfriedt could you expand a bit on that?

I remember using private runners via gitlab a long time ago. It was surprisingly easy. Not entirely sure how it's done with GitHub, but I did linked to a video about it up above.

TL;DR - private runners get authenticated with a key and communicate via a server API as if they were regular runners.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Toolchains Toolchains RFC Request For Comments: want input from the community
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

7 participants