Add more boskos projects #29

Closed
bobcatfish opened this issue May 28, 2019 · 6 comments · Fixed by #35

@bobcatfish
Contributor

Expected Behavior

We should have enough boskos clusters available that tests are able to run across all our projects.

Actual Behavior

Sometimes projects fail with:

I0528 16:44:41.318] 2019/05/28 16:44:41 main.go:312: Something went wrong: failed to prepare test environment: --provider=gke boskos failed to acquire project: resource not found

We think this happens when too many tests are running at the same time (e.g. tektoncd/pipeline#909).
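
To make the suspected failure mode concrete, here is a minimal sketch of a fixed-size pool that runs out when more jobs ask for a project than there are projects. It is a toy model, not the real boskos API: `ToyPool`, `ResourceNotFound`, and the `project-N` names are hypothetical, and the pool size of 7 is taken from the later note that adding 7 projects doubles the count.

```python
class ResourceNotFound(Exception):
    """Raised when no project in the pool is free."""


class ToyPool:
    """Toy stand-in for a boskos-style pool (hypothetical, not the boskos API)."""

    def __init__(self, names):
        # Every project starts out free.
        self.state = {name: "free" for name in names}

    def acquire(self):
        # Hand out the first free project and mark it busy.
        for name, state in self.state.items():
            if state == "free":
                self.state[name] = "busy"
                return name
        raise ResourceNotFound("resource not found")

    def release(self, name):
        # Simplification: the project becomes free again immediately.
        self.state[name] = "free"


pool = ToyPool([f"project-{i}" for i in range(7)])

# Seven concurrent test jobs each take one project.
held = [pool.acquire() for _ in range(7)]

# An eighth job arriving before any release has nothing left to acquire.
try:
    pool.acquire()
    eighth_failed = False
except ResourceNotFound:
    eighth_failed = True
```

In this model the only remedies are a larger pool or faster releases, which matches the reasoning behind adding more projects.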

Additional Info

https://github.com/tektoncd/plumbing#boskos

@bobcatfish bobcatfish added this to the Pipelines 0.5 🐱 milestone May 28, 2019
@bobcatfish bobcatfish self-assigned this Jun 12, 2019
bobcatfish added a commit to bobcatfish/community that referenced this issue Jun 12, 2019
Unfortunately since the resources we are currently using for our
infrastructure are Google resources, we couldn't just give all governing
board members owner access to all the projects we are using. In the
long run we probably want to change this so that the infrastructure is
billed to the CDF, but in the short term we've scaled back the
permissions and are trying to pinpoint exactly the right set to make
sure ppl can do what they need to do.

I also tried to add these permissions to all the boskos projects but
there are about to be 14 of them
(tektoncd/plumbing#29) and it turns out to be
super tedious to apply these individual permissions across 14 different
projects - and they can't share roles. So I decided to keep it simple
and not give everyone access to the boskos projects for now (especially
since afaik no one has ever needed to interact with them directly) and
to open tektoncd/plumbing#34 about finding a
way to not need 16+ different projects with their own permissions - as
long as this is okay with the non-Google governing folks! If it isn't
I'll add the permissions to all the projects.
bobcatfish added a commit to bobcatfish/plumbing that referenced this issue Jun 12, 2019
We have been seeing errors in our automated end to end tests which
indicate that boskos (which we use to manage GCP projects for our end
to end tests - see tektoncd#34 about maybe not using it anymore) had no projects
left. So this PR doubles the number of boskos projects.

Note that I have already manually applied this change to the boskos
cluster.

As you can see in tektoncd/community#25 I
decided to not give everyone access to all 7 new projects b/c managing
fine grained access across 14 boskos projects is no fun at all 😩 but I
can add them anyway if folks want them.

Fixes tektoncd#29
tekton-robot pushed a commit that referenced this issue Jun 12, 2019
tekton-robot pushed a commit to tektoncd/community that referenced this issue Jun 12, 2019
@bobcatfish
Contributor Author

I missed a step! Added by @dlorenc to our docs in #38

@bobcatfish
Contributor Author

Looks like it strikes again: tektoncd/pipeline#1888 (comment)

@bobcatfish
Contributor Author

It's the same error but when I look at some of the (many) boskos projects we have, they are definitely not all in use. For example:

So whatever is going wrong seems like something else 🤔

@bobcatfish
Contributor Author

as activity from ~2 hours ago

oh wait a second, maybe that's PST 🤔

@bobcatfish
Contributor Author

okay so discovered a couple things: boskos thinks all the projects are "dirty" and this is in the logs for the "boskos janitor":

  msg: "failed to clean up project tekton-prow-10, error info: Activated service account credentials for: [prow-account@tekton-releases.iam.gserviceaccount.com]
ERROR: (gcloud.compute.instances.list) Some requests did not succeed:
 - Invalid value for field 'zone': 'asia-northeast3-a'. Unknown zone.
 - Invalid value for field 'zone': 'asia-northeast3-b'. Unknown zone.
 - Invalid value for field 'zone': 'asia-northeast3-c'. Unknown zone.

Fail to list resource 'instances' from project 'tekton-prow-10'
ERROR: (gcloud.compute.disks.delete) unrecognized arguments: --global 

To search the help text of gcloud commands, run:
  gcloud help -- SEARCH_TERMS
Error try to delete resources: CalledProcessError()
ERROR: (gcloud.container.clusters.list) ResponseError: code=404, message=Not Found.
[=== Start Janitor on project 'tekton-prow-10' ===]
[=== Activating service_account /etc/test-account/service-account.json ===]
[=== Finish Janitor on project 'tekton-prow-10' with status 1 ===]
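
The janitor message above mixes several separate failures, so it can help to pull out which gcloud commands failed and which zones were rejected. The following is a rough sketch that does this with regular expressions. `summarize_janitor_log` is a hypothetical helper written for this thread, not part of boskos, and the patterns are only assumed to fit the log format shown here.

```python
import re

# Matches lines like "ERROR: (gcloud.compute.disks.delete) unrecognized arguments: --global"
GCLOUD_ERROR = re.compile(r"^ERROR: \((gcloud\.[\w.\-]+)\)")
# Matches lines like " - Invalid value for field 'zone': 'asia-northeast3-a'. Unknown zone."
UNKNOWN_ZONE = re.compile(r"Invalid value for field 'zone': '([^']+)'\. Unknown zone\.")


def summarize_janitor_log(text):
    """Return (failed gcloud commands, unknown zones) in order of appearance."""
    failed_commands = []
    unknown_zones = []
    for line in text.splitlines():
        command = GCLOUD_ERROR.match(line.strip())
        if command:
            failed_commands.append(command.group(1))
        zone = UNKNOWN_ZONE.search(line)
        if zone:
            unknown_zones.append(zone.group(1))
    return failed_commands, unknown_zones


# Excerpt copied from the janitor log in this comment.
sample = """ERROR: (gcloud.compute.instances.list) Some requests did not succeed:
 - Invalid value for field 'zone': 'asia-northeast3-a'. Unknown zone.
 - Invalid value for field 'zone': 'asia-northeast3-b'. Unknown zone.
 - Invalid value for field 'zone': 'asia-northeast3-c'. Unknown zone.
ERROR: (gcloud.compute.disks.delete) unrecognized arguments: --global
ERROR: (gcloud.container.clusters.list) ResponseError: code=404, message=Not Found.
"""

commands, zones = summarize_janitor_log(sample)
```

On this excerpt the helper reports three failing commands (`compute.instances.list`, `compute.disks.delete`, `container.clusters.list`) and the three `asia-northeast3` zones. Since the janitor finishes with status 1, it seems plausible that the project is never marked clean again.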

@bobcatfish
Contributor Author

it's also possible that this is because we're using such an old version of boskos and it might need an update
