Skip to content

Deplatforming #7049

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 47 commits into from
Jul 10, 2025
Merged

Deplatforming #7049

merged 47 commits into from
Jul 10, 2025

Conversation

eddyashton
Copy link
Member

@eddyashton eddyashton commented Jun 11, 2025

The build-time distinction between SNP and Virtual is unhelpful, results in a huge amount of duplication in our CMake. This was necessary on SGX, where the targets had significantly different build options, but is no longer doing anything.

This PR removes the COMPILE_TARGET option from CMake. This has a bunch of knock-on effects, that I think fall into a few distinct camps:

  1. De-duplicated targets. We no longer have ccf_kv.host and ccf_kv.snp, we just have ccf_kv. I've manually tried to verify in each instance that the calls to each target were always identical, but I may have missed some.
  2. Removal of PLATFORM_ preprocessor define. Most places were #if PLATFORM_VIRTUAL || PLATFORM_SNP, so the #if/#endif pair can be trivially removed. Deciding what kind of quote we produce is now a run-time decision (each binary is capable of trying to produce either format), based on the existing enclave.platform config argument. I think that ought to be promoted to a CLI argument (security critical), but that's deferred.
  3. Test clarity. Functions that check whether an ioctl device is available (in C++ and Python) are now named to show that they're testing for SNP_SUPPORT. The decision of whether this run (of an e2e test or a node) should try to get a virtual or SNP attesation is driven by configuration, not by this capability detection.
  4. Simpler library names. We still (for now) produce a single cchost and then a .so for each "enclave" application. But those applications are now libjs_generic.so, rather than libjs_generic.virtual.so/libjs_generic.snp.so.
  5. Gluing those together it turns out it is still convenient to have a global "default test platform" in CMake, so that we can indicate whether we think we're testing virtual or SNP. That looks a bit like COMPILE_TARGET, but it's now called CCF_TEST_PLATFORM, and hopefully used in few enough places to clarify intent. Perhaps the default value should be derived from capability detection? TBD.

I've only run a handful of builds and tests locally, this is likely missing some obvious breakages, so opening as a Draft initially and awaiting CI coverage.

Also, I think this is a big enough breaking change that it shouldn't affect 6.0, so I propose a release/6.x branch.

Follow-ups:

  • Remove enclave section in configuration #6623
  • De-duplicate builds in CI: build once and copy to test runners
  • Tear out more enclave/platform distinctions in the test infra. I'm always wary of how hard we can go on this while maintaining some lts_ story, but I think right now we can be pretty aggressive? But let's do it separately to derisk.

UPDATE

As often happens, this PR lived for a while and changed significantly in that period. The description above is now slightly inaccurate, but kept for posterity. The final merge leans hard on platform detection to avoid any required top-level configuration. Overrides via env var are still possible, but the default behaviour (independently per-node and in Python infra) is to test once at startup for SNP device support, else fall-back to virtual.

@achamayou
Copy link
Member

achamayou commented Jun 11, 2025

Deciding what kind of quote we produce is now a run-time decision (each binary is capable of trying to produce either format), based on the existing enclave.platform config argument. I think that ought to be promoted to a CLI argument (security critical), but that's deferred.

It seems to me that in 99% of cases, this can be automatically decided without security implications, because either the join policy, or the consortium-issued transition service to open decide whether the attestation is acceptable.

Having a way to force virtual is likely useful, either through an env var or through the configuration, but is not security critical for the reason above. Whether that's good usability is arguable.

Gluing those together it turns out it is still convenient to have a global "default test platform" in CMake, so that we can indicate whether we think we're testing virtual or SNP. That looks a bit like COMPILE_TARGET, but it's now called CCF_TEST_PLATFORM, and hopefully used in few enough places to clarify intent. Perhaps the default value should be derived from capability detection? TBD.

Because we (still) use ctest as a launcher, I agree that this is good, but that's something we could decide to revisit, as the decrease in compile-time parameters also diminishes the value of that layer.

@achamayou
Copy link
Member

Also, I think this is a big enough breaking change that it shouldn't affect 6.0, so I propose a release/6.x branch.

100%

@eddyashton
Copy link
Member Author

It seems to me that in 99% of cases, this can be automatically decided without security implications, because either the join policy, or the consortium-issued transition service to open decide whether the attestation is acceptable.

Having a way to force virtual is likely useful, either through an env var or through the configuration, but is not security critical for the reason above. Whether that's good usability is arguable.

I'm coming round to the idea of automatic platform detection, let's discuss tomorrow. Just listing a few discussion points:

  • Can we still force the platform to emit a certain kind of attestation when we want to test that?
  • Do we silently lose test coverage when the automated capability detection fails/is flaky?
  • Do we promote visibility of the chosen platform somewhere? It's in the attestation if we get that far, and I guess we argue that behaviour up-to that point is identical? So we log as soon as we branch, and store it in the KV ASAP.

@achamayou
Copy link
Member

Can we still force the platform to emit a certain kind of attestation when we want to test that?

I assume you mean a virtual attestation? I would say yes, not because I can think of a reason why it might be useful, but because it seems really easy to do, and could come in handy when isolating failures.

Do we silently lose test coverage when the automated capability detection fails/is flaky?

So long as the tests/test infra enforce a specific platform is being tested, I don't think so? If we are worried that the test auto-detection can fail silently too, we could push the number of tests run to bencher, and set up a threshold alert?
Having said that ./tests --platform=SNP is pretty simple, and not subject to this failure mode at all. It may well be the right thing to do.

Do we promote visibility of the chosen platform somewhere? It's in the attestation if we get that far, and I guess we argue that behaviour up-to that point is identical? So we log as soon as we branch, and store it in the KV ASAP.

The attestation gets stored in Tx0. I think we want to tighten the initial consortium check to only open if we're happy with the platform, and we should be good from there.

@eddyashton eddyashton added the run-long-test Run Long Test job label Jun 13, 2025
@eddyashton
Copy link
Member Author

I'm currently convinced that auto-detection is fine, and we don't need any explicit/tested overrides. In my head:

This means a CCF node will look for SNP IOCTL devices, and produce an SNP attestation if it finds them (at this point requiring some additional SNP endorsement server config), else it'll fall back and produce a virtual attestation. The Python infra will do the same check independently, to decide whether it is expecting SNP attestations (with associated tests), and to confirm the attestation format before submitting service open. CMake needs no knowledge of what platforms the tests are running on/targeting, so this can go entirely (I think - maybe something about test labels that need to contain platform info, but I think that can be inserted later). Then we convince ourselves that some of the tests are actually running on SNP, by writing a standalone "confirm_this_is_snp" script/util, and calling that early in the appropriate CI jobs.

Possible future things we can support in future:

  • Running SNP nodes from non-SNP host-infra. We're moving away from this in the current infra, since it's expensive to maintain and unused. The story here would be simple - the detection needs to run on the target, not the host.
  • A manual override for getting a virtual attestation despite SNP being available. I've wanted to do this once before, but very manually. Tempted to bury a magic env var that provides this, and exclude it from any automated testing? Plumbing the same override through the Python infra would be expensive and unnecessary, but is also easier to patch in manually later.
  • Running cross-platform upgrade tests. This would need either of the above options - infra launches remote nodes, or is able to override what some local nodes do. Not currently tested, can add when it's needed.

@eddyashton eddyashton marked this pull request as ready for review July 9, 2025 13:31
@eddyashton eddyashton requested a review from a team as a code owner July 9, 2025 13:31
@achamayou achamayou requested a review from Copilot July 9, 2025 14:29
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes the build‐time split between SNP and Virtual targets in both CMake and code, consolidating enclave handling into a single build with runtime platform detection.

  • Eliminates COMPILE_TARGET and PLATFORM_* defines, simplifying CMake scripts and sample configs
  • Introduces infra.platform_detection for test suites and replaces compile‐time checks with is_snp()/is_virtual()
  • Updates C++ host/enclave layers to use ccf::pal::platform and refactors quote generation

Reviewed Changes

Copilot reviewed 70 out of 71 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tla/make_traces.sh Removed -DCOMPILE_TARGET=virtual from trace‐generation script
tests/infra/platform_detection.py Added runtime detection (is_snp, is_virtual, SNP_SUPPORT)
tests/infra/network.py Unified key setup to always symlink
tests/connections.py Replaced conditional timeout with fixed value
tests/infra/utils.py Removed enclave_type parameter and adjusted signature
Comments suppressed due to low confidence (2)

tests/infra/network.py:542

  • The original code used cp for SNP platforms and ln -s for Virtual; unconditionally using ln -s may break the intended behavior on SNP. Consider applying infra.platform_detection.is_snp() here to choose cp when true, preserving the prior logic.
        cmd = ["ln", "-s"]

tests/connections.py:193

  • [nitpick] Tests previously used a longer timeout when running on SNP (timeout=3); fixing it at 1s may introduce flakiness on SNP platforms. Consider restoring a conditional timeout based on infra.platform_detection.is_snp().
                            timeout=1,

@achamayou achamayou added this pull request to the merge queue Jul 10, 2025
Merged via the queue into microsoft:main with commit e455069 Jul 10, 2025
24 checks passed
@achamayou achamayou deleted the no_platforms branch July 10, 2025 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run-long-test Run Long Test job
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants