feat(webapp): apply default repository policy on ECR repo creation #3467
Self-hosters that run the webapp's ECR account separately from their EKS worker account hit a 403 Forbidden on every new project's first run: `ensureEcrRepositoryExists` calls CreateRepository but never sets a repository policy, so kubelet can't pull the runner image cross-account. Add an optional `DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY` env var (raw IAM policy JSON, V4 mirror as well). When set, the webapp calls SetRepositoryPolicy after CreateRepository, baking the operator's cross-account pull rule into every new repo automatically. Existing repos are unaffected — they keep their current policy. Cloud is unaffected — the env var is optional and unset by default. Verified locally against a self-host on EKS with cross-account ECR: without the policy, runners stayed in ImagePullBackOff with 403; with it, the same flow completes a hello-world run end-to-end in ~5s.
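The create-then-set-policy flow described above could look roughly like the following. This is a minimal sketch, not the PR's actual code: the `EcrCreateOps` interface and the `createRepositoryWithDefaultPolicy` function are hypothetical stand-ins, and a real implementation would presumably send `CreateRepositoryCommand` and `SetRepositoryPolicyCommand` through `@aws-sdk/client-ecr` instead.

```typescript
// Hypothetical minimal surface of an ECR client, used so the sketch stays
// self-contained. The real webapp talks to ECR through the AWS SDK.
interface EcrCreateOps {
  createRepository(repositoryName: string): Promise<void>;
  setRepositoryPolicy(repositoryName: string, policyText: string): Promise<void>;
}

// Creates the repository, then applies the operator-supplied policy
// only when one is configured.
async function createRepositoryWithDefaultPolicy(
  ecr: EcrCreateOps,
  repositoryName: string,
  defaultPolicy?: string
): Promise<void> {
  await ecr.createRepository(repositoryName);

  // Env var unset (or empty): behave exactly as before, no policy call.
  if (!defaultPolicy) {
    return;
  }

  // Raw IAM policy JSON is passed through unchanged.
  await ecr.setRepositoryPolicy(repositoryName, defaultPolicy);
}
```

Passing the policy as an optional argument keeps the unset case a no-op, which matches the claim that Cloud is unaffected.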
Walkthrough
Adds optional environment variables `DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY` and `V4_DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY` (v4 falls back to v3). Extends `RegistryConfig` with `ecrDefaultRepositoryPolicy`, threads that value from `getRegistryConfig` through `getDeploymentImageRef` into `ensureEcrRepositoryExists`/`createEcrRepository`, and applies the provided raw IAM policy to newly created ECR repositories using `SetRepositoryPolicyCommand`. For existing repositories, the code attempts idempotent policy reconciliation and catches/logs reconciliation failures instead of throwing.
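As an illustration of the v4-to-v3 fallback mentioned in the walkthrough, here is a small sketch. The `PolicyEnv` and `RegistryPolicyConfig` types and the `resolvePolicyConfig` function are hypothetical; only the two env var names and the `ecrDefaultRepositoryPolicy` field come from the PR, and the actual resolution in `getRegistryConfig` may be structured differently.

```typescript
// Hypothetical slice of the parsed environment; the real env schema has many more keys.
type PolicyEnv = {
  DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY?: string;
  V4_DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY?: string;
};

// Hypothetical slice of RegistryConfig carrying only the new field.
type RegistryPolicyConfig = {
  ecrDefaultRepositoryPolicy?: string;
};

// For v4, prefer the V4_ variable and fall back to the v3 one when it is unset.
function resolvePolicyConfig(env: PolicyEnv, isV4: boolean): RegistryPolicyConfig {
  const v3Policy = env.DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY;
  const policy = isV4
    ? (env.V4_DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY ?? v3Policy)
    : v3Policy;
  return { ecrDefaultRepositoryPolicy: policy };
}
```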
Mirrors how the existing-repo branch already reconciles cache settings. SetRepositoryPolicy is idempotent, so applying it on every deploy is safe and covers two recovery cases that the previous version didn't:
1. A previous repo creation succeeded but SetRepositoryPolicy failed mid-flight, leaving the repo without a policy. Without reconciliation, the existing-repo branch would just return the repo and runners would keep getting 403 Forbidden forever, with manual intervention required.
2. The operator updates DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY to grant pull to additional accounts/principals. Existing repos need to pick up the new value, not just freshly created ones.
The factored `applyEcrRepositoryPolicy` helper is shared between the create and reconcile call sites, keeping behavior identical.
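A rough sketch of how one shared helper could serve both call sites follows. Everything except the `applyEcrRepositoryPolicy` name is hypothetical: the `EcrReconcileOps` interface, the `ensureRepositoryWithPolicy` function, and the signatures are illustrative only. It also assumes that the create path lets a policy failure propagate while the existing-repo path catches and logs it, as the review walkthrough describes.

```typescript
// Hypothetical minimal ECR surface so the sketch runs without the AWS SDK.
interface EcrReconcileOps {
  repositoryExists(name: string): Promise<boolean>;
  createRepository(name: string): Promise<void>;
  setRepositoryPolicy(name: string, policyText: string): Promise<void>;
}

// Shared helper: a no-op when no policy is configured. Safe to repeat
// because setting the same policy again is idempotent.
async function applyEcrRepositoryPolicy(
  ecr: EcrReconcileOps,
  name: string,
  policy?: string
): Promise<void> {
  if (!policy) return;
  await ecr.setRepositoryPolicy(name, policy);
}

async function ensureRepositoryWithPolicy(
  ecr: EcrReconcileOps,
  name: string,
  policy?: string,
  log: (message: string, error: unknown) => void = console.warn
): Promise<"existing" | "created"> {
  if (await ecr.repositoryExists(name)) {
    try {
      // Reconcile: repairs a repo left without a policy and picks up
      // an updated env var value.
      await applyEcrRepositoryPolicy(ecr, name, policy);
    } catch (error) {
      // Assumption: a reconcile failure must not block the deploy.
      log(`Failed to reconcile repository policy for ${name}`, error);
    }
    return "existing";
  }

  await ecr.createRepository(name);
  // Assumption: on the create path a failure propagates to the caller.
  await applyEcrRepositoryPolicy(ecr, name, policy);
  return "created";
}
```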
Summary
Self-hosters that operate the webapp's ECR account separately from the account running the EKS workers (e.g., a shared platform account that hosts the registry plus per-team accounts that host clusters) currently hit a 403 Forbidden the first time any project is deployed:
`ensureEcrRepositoryExists` in `apps/webapp/app/v3/getDeploymentImageRef.server.ts` calls `CreateRepository` and `PutLifecyclePolicy`, but never `SetRepositoryPolicy`, so the new repo inherits the AWS default (only the registry-owner account can read/pull). Workers in the cluster account get a 403 on every deploy. The only workarounds today are running a one-off post-create script or pre-creating every repo by hand.
Proposed change
Add an optional env var, `DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY`, holding raw IAM policy JSON. When set, the webapp calls `SetRepositoryPolicy` immediately after `CreateRepository` so every new repo carries that policy from creation. Operators control the principal/actions; we don't bake in any opinions about cross-account boundaries.
Example value (for the typical self-host case: grant pull to the cluster account):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowClusterAccountPull",
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::<cluster-account-id>:root"},
    "Action": [
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchGetImage",
      "ecr:BatchCheckLayerAvailability"
    ]
  }]
}
Why env var (not a chart-level field)
- `DEPLOY_REGISTRY_ECR_TAGS`, `DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN`, etc.) which are already operator-supplied via `webapp.extraEnvVars` in self-host setups.
- `RepositoryCreationTemplate` from the AWS provider isn't an alternative here: it only applies to repos created via pull-through cache or replication, not to `ecr:CreateRepository` API calls.
Implementation
- `apps/webapp/app/env.server.ts`: declare `DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY` and the V4 fallback.
- `apps/webapp/app/v3/registryConfig.server.ts`: propagate `ecrDefaultRepositoryPolicy` to `RegistryConfig`.
- `apps/webapp/app/v3/getDeploymentImageRef.server.ts`: `createEcrRepository` accepts the policy; if set, calls `SetRepositoryPolicy` after `PutLifecyclePolicy`.
- `docs/self-hosting/env/webapp.mdx`: documentation row added under Deploy & Registry.
Verification
Verified end-to-end against a self-hosted Trigger.dev on EKS where the ECR account is separate from the cluster account:
- main): the new project's first run pod stays in `ImagePullBackOff` with `403 Forbidden`.
- `ecr:BatchGetImage` / `GetDownloadUrlForLayer` / `BatchCheckLayerAvailability` to the cluster account: a fresh `trigger.dev deploy --env prod` followed by a `hello-world` run completes in ~5s end-to-end on the first try.
Also manually confirmed that existing repos are untouched (the call only fires inside `createEcrRepository`, which only runs when `DescribeRepositories` returned `RepositoryNotFoundException`).
Out of scope
- `webapp.extraEnvVars`, so this follows the same pattern. Happy to add a first-class chart field in a follow-up if that's the preferred direction.
- `DEPLOY_REGISTRY_ECR_TAGS` is handled today.
This is a draft pending CI / CodeRabbit pass; happy to iterate on direction (e.g., split into per-action env vars, or extend the chart values schema) if any of the above choices feels off.