Compute PCRs, using labels and caches #14
Conversation
scripts/clean-cluster-kind.sh (Outdated)
fi
done
kubectl delete deploy cocl-operator -n confidential-clusters || true
kubectl delete job compute-pcrs -n confidential-clusters || true
Already left this comment in the other PR, but the job deletion should be handled by the operator.
The other alternative is to have a file on a PVC and append the values there, but that requires a CSI provider; not a big deal, though.
Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
Force-pushed from 6b11791 to b1c2782
        ..Default::default()
    },
]),
restart_policy: Some("Never".to_string()),
nit: do they have this constant already defined in the crate?
no :(
if status.completion_time.is_none() {
    info!("Job {name} changed, but had not completed");
    return Ok(Action::requeue(Duration::from_secs(300)));
}
let jobs: Api<Job> = Api::namespaced(ctx.client.clone(), &ctx.operator_namespace);
I think we are missing what we do if the job fails. Do we restart it? Currently, you are deleting it whether it succeeded or failed.
Currently, you are deleting it whether it succeeded or failed

No, completion_time is only Some on success. New pods for that job will be created within the default back-off limit (6); after that, the job will stick around as failed.
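For reference, a minimal sketch of that check against the k8s-openapi Job type (the helper name is illustrative):

use k8s_openapi::api::batch::v1::Job;

// Sketch of the semantics described above: `completion_time` is only
// populated once a Job succeeds, so a Job whose pods exhausted the
// back-off limit never passes this check.
fn job_succeeded(job: &Job) -> bool {
    job.status
        .as_ref()
        .and_then(|status| status.completion_time.as_ref())
        .is_some()
}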
operator/src/reference_values.rs (Outdated)
let context = Arc::new(ctx);
// refine watcher if we handle more than one type of job
tokio::spawn(
    Controller::new(jobs, watcher::Config::default())
Does the Rust crate allow monitoring only a subset of jobs, e.g. filtering by label? We are only interested in the PCR calculation jobs. If there is such filtering, it should reduce the number of objects that are monitored.
This is already constrained to the namespace, but would you prefer a label anyway?
If we add other kinds of jobs, it will be better, yes.
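For the record, kube's watcher config does support label selectors. A minimal sketch, assuming a hypothetical app=compute-pcrs label on the jobs the operator creates (label name and function are illustrative):

use k8s_openapi::api::batch::v1::Job;
use kube::runtime::{watcher, Controller};
use kube::{Api, Client};

// Sketch: restrict the watcher to jobs carrying the assumed
// `app=compute-pcrs` label, so the controller only receives events
// for those jobs instead of every Job in the namespace.
fn pcr_job_controller(client: Client, namespace: &str) -> Controller<Job> {
    let jobs: Api<Job> = Api::namespaced(client, namespace);
    let config = watcher::Config::default().labels("app=compute-pcrs");
    Controller::new(jobs, config)
}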
crds/src/lib.rs (Outdated)
pub struct ImagePcr {
    pub first_seen: DateTime<Utc>,
    pub pcrs: Vec<Pcr>,
}
Is this a new CRD? I thought it should be some kind of metadata in the image manifest
I thought it should be some kind of metadata in the image manifest
Do you mean the label content? I assume the label content to be exactly this. ImagePcrs is what is expected in the operator-managed config map ("Create new config map to cache known PCR values and parts"), as an interpretation of this suggestion from #13:
CRD to store for each image seen in the cluster:
- image sha256sum / tag
- Date first seen
- PCR values (i.e. the label from the container above)
where I ultimately didn't go for a CRD for reasons described in the original PR comment (counter-suggestions still welcome)
Ah so, IIUC, this will be the content of the config map?
Yes. It could look like this (most recently tested with a labelled image that I created, thus the tag):
data:
image-pcrs.json: '{"quay.io/jnaucke/fedora-tagged-pcrs:v2":{"first_seen":"2025-09-12T08:06:49.324536935Z","pcrs":[{"id":4,"value":"551bbd142a716c67cd78336593c2eb3b547b575e810ced4501d761082b5cd4a8","parts":[{"name":"EV_EFI_ACTION", ...
Then, please don't define it in the CRDs. Just move it into the reference values module/file.
Ah now I see where you're coming from. The issue is that the fallback compute-pcrs job also writes to that config map (alternative would be job output parsing which we previously disregarded) and needs to know its structure. Alternatives that I'd see:
- rename the crds crate (to k8s_resources? meh)
- define the map in operator like you suggested, and import that from compute-pcrs instead. I'm not a fan because that would trigger a rebuild of compute-pcrs upon any change of operator (hence the Cargo.toml edit in its Containerfile), which with the current Podman setup is not incremental.
- make it a CRD after all (would require some RFC1035-compatible identification of images which, FWIW, I already ended up making for job naming in the meantime)
Or define another crate. This seems the cleanest option to me
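A sketch of what such a shared crate could hold, with illustrative names; the parts field's full structure is truncated in the example above, so it is omitted here:

// Hypothetical shared crate (name illustrative, e.g. image-pcrs-types)
// that both operator and compute-pcrs could depend on, so the config
// map structure is defined in exactly one place.
use std::collections::HashMap;

use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Pcr {
    pub id: u32,
    pub value: String,
    // `parts` omitted; its full structure is truncated in the example above.
}

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct ImagePcr {
    pub first_seen: DateTime<Utc>,
    pub pcrs: Vec<Pcr>,
}

// Keyed by image reference, matching the image-pcrs.json example above.
pub type ImagePcrs = HashMap<String, ImagePcr>;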
@Jakob-Naucke when @6-dehan has implemented the first unit tests in #20, please add the tests covering this PR. The logic is becoming slightly more complex.
Force-pushed from b1c2782 to 92d0f4a
Force-pushed from 92d0f4a to 2f29bd9
- Create a new config map to cache known PCR values and parts.
- Fill with PCR label if present in boot image.
- Recompute reference values based on config map.

Technicalities:
- Pivot compute-pcrs job from setting reference values directly to writing to the new config map. It is now used as a fallback.
- Move reference value-specific, non-Trustee-interacting operations to a new module.

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
Force-pushed from 2f29bd9 to 99b9a89
Looks good!
Supersedes the reference value input workflow and #12.
Uses a config map for value caching instead of a CRD because identifying a CRD with an image ref is a little weird (character type & length limits). I'm not a huge fan either, though, because this means rewriting the entire map on every new image (as we do with the reference values…), so I'm open to other suggestions.
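To make the "rewriting the entire map" point concrete, here is a rough sketch of the read-modify-write cycle, assuming the ImagePcr type sketched earlier in this thread and hypothetical names for the config map ("pcr-cache") and its key ("image-pcrs.json"):

use std::collections::HashMap;

use k8s_openapi::api::core::v1::ConfigMap;
use kube::{api::PostParams, Api, Client};

// Rough sketch of the cache update described above; `pcr-cache` and
// `image-pcrs.json` are assumed names. Note that the whole JSON value
// is re-serialized even when only a single image entry is added.
async fn cache_image_pcrs(
    client: Client,
    namespace: &str,
    image: &str,
    entry: ImagePcr,
) -> anyhow::Result<()> {
    let cms: Api<ConfigMap> = Api::namespaced(client, namespace);
    let mut cm = cms.get("pcr-cache").await?;
    let data = cm.data.get_or_insert_with(Default::default);
    // Deserialize the existing map, or start fresh if the key is absent.
    let mut map: HashMap<String, ImagePcr> = data
        .get("image-pcrs.json")
        .map(|json| serde_json::from_str(json))
        .transpose()?
        .unwrap_or_default();
    map.insert(image.to_string(), entry);
    // Rewrite the entire serialized map, as discussed.
    data.insert("image-pcrs.json".into(), serde_json::to_string(&map)?);
    cms.replace("pcr-cache", &PostParams::default(), &cm).await?;
    Ok(())
}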