
Migrate Buildkite CI from AWS to GKE agent queues #5912

Merged: @mstifflin merged 1 commit into master from tifflin/test-helm on Apr 25, 2024.


@mstifflin (Contributor) commented Apr 16, 2024

What changed?

  • Update the Buildkite pipeline YAML to work with the newly provisioned queues in Google Kubernetes Engine (GKE).
  • Use the agent-stack-k8s v0.8.0 Helm chart, which expects its own pipeline YAML syntax; steps must follow that syntax to onboard successfully.
  • Install buildkite-agent in the Dockerfile for use in the code coverage step.
    • The mount that worked on AWS's VM-based infrastructure doesn't work in Kubernetes' container-based setup, so the CLI is installed directly for simplicity.
  • Increase the timeout for a test (TestScanAllTrees) that was flakier on GKE.
  • Leave the unit-test step on AWS while the timeout issues are debugged.
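As a rough illustration of the agent-stack-k8s syntax mentioned above, a single step targeting a GKE queue might look like the sketch below. It assumes the `kubernetes` plugin with a `podSpec`, as described in the agent-stack-k8s README; the queue name `gke-workers`, the container image, and the `make test` command are hypothetical and are not taken from this PR's pipeline files.

```yaml
steps:
  - label: "unit test"                  # hypothetical label
    agents:
      queue: gke-workers                # hypothetical GKE queue name (the AWS queue in this PR is `workers`)
    plugins:
      - kubernetes:                     # agent-stack-k8s reads the pod definition from this plugin
          podSpec:
            containers:
              - image: golang:1.22      # hypothetical image; any tool the step calls must be installed in it
                command: [make]         # hypothetical command
                args: [test]            # hypothetical make target
    artifact_paths:
      - ".build/coverage/*.out"
    retry:
      automatic:
        limit: 1
```

Because each job runs in a container rather than on a VM, tools the step relies on (such as buildkite-agent for the coverage step) have to be present in the image, which is why this PR installs the CLI in the Dockerfile.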

Why?

  • Migrate from AWS to Google Cloud. Buildkite Enterprise recommends GKE (rather than other GCP compute options) as the way to get a queue with autoscaling compute.

How did you test it?

Potential risks

  • CI builds may break or become flaky.
  • Mitigation: revert this change with git revert.

Release notes
n/a

Documentation Changes
n/a


codecov bot commented Apr 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.56%. Comparing base (9524eca) to head (14ae582).

❗ Current head 14ae582 differs from pull request most recent head e90efb0. Consider uploading reports for the commit e90efb0 to get more accurate results


See 6 files with indirect coverage changes.


Powered by Codecov. Last update 9524eca...e90efb0.

coveralls commented Apr 16, 2024

Pull Request Test Coverage Report for Build 018f1666-ddc2-4a01-895c-4add087713ec

Details

  • 0 of 1 (0.0%) changed or added relevant lines in 1 file is covered.
  • 32 unchanged lines in 12 files lost coverage.
  • Overall coverage increased (+0.02%) to 67.744%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| common/persistence/persistence-tests/historyV2PersistenceTest.go | 0 | 1 | 0.0% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| common/task/weighted_round_robin_task_scheduler.go | 1 | 89.05% |
| service/history/task/transfer_standby_task_executor.go | 2 | 86.83% |
| common/persistence/sql/sqlplugin/postgres/task.go | 2 | 73.4% |
| common/persistence/sql/sqlplugin/postgres/db.go | 2 | 80.0% |
| service/matching/db.go | 2 | 73.23% |
| common/util.go | 2 | 91.78% |
| service/matching/taskListManager.go | 2 | 81.16% |
| common/log/tag/tags.go | 3 | 50.46% |
| service/history/queue/timer_queue_processor_base.go | 3 | 77.82% |
| common/task/fifo_task_scheduler.go | 4 | 83.51% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 018f127b-cd0f-43c3-b47a-35b58b7623db | 0.02% |
| Covered Lines | 99367 |
| Relevant Lines | 146680 |

💛 - Coveralls

@davidporter-id-au (Contributor) left a comment:

Thank you for doing this.

@mstifflin force-pushed the tifflin/test-helm branch 2 times, most recently from 7a5a20d to e1ffb10, April 18, 2024 20:42
@mstifflin enabled auto-merge (squash) April 18, 2024 21:01
@mstifflin disabled auto-merge April 20, 2024 10:37
@mstifflin force-pushed the tifflin/test-helm branch 9 times, most recently from f84bb0a to d2db515, April 24, 2024 23:27
Pipeline YAML under review:

    artifact_paths:
      - ".build/coverage/*.out"
      - ".build/coverage/metadata.txt"
    retry:
      automatic:
        limit: 1
    agents:
      queue: workers

A contributor commented:

If this job is failing on K8s in the pull-request pipeline, it will probably also fail in the master pipeline.

@mstifflin (Contributor, author) replied:

I did set pipeline-master's unit-test step back to AWS, but I realized I had added an extra "make just build" command to it that wasn't there previously. Will update.

Migrate pipeline-pull-request queues to GKE
Bump the test timeout for TestScanAllTrees since it keeps timing out
Migrate pipeline-master Buildkite queues to GKE
Add comment on how buildkite-agent is installed, add comment on yq explode
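The last commit mentions `yq explode`. As a hedged illustration, assuming the pipeline files reuse settings through YAML anchors and aliases (this page does not show whether they do), yq v4's `explode(.)` operator replaces every alias with a copy of the anchored node and drops the anchors, for example `yq 'explode(.)' pipeline.yml`. The file name, step labels, commands, and the `retry_defaults` anchor below are hypothetical.

```yaml
# Input (hypothetical): the retry policy is defined once and reused via an alias.
steps:
  - label: "build"                  # hypothetical step
    command: make build             # hypothetical command
    retry: &retry_defaults          # anchor on the shared mapping
      automatic:
        limit: 1
  - label: "unit test"              # hypothetical step
    command: make test              # hypothetical command
    retry: *retry_defaults          # alias; explode(.) inlines the mapping here

# After `yq 'explode(.)'`, the anchor is removed and the second step's retry becomes a plain copy:
#     retry:
#       automatic:
#         limit: 1
```

Expanding aliases this way means whatever consumes the file sees only plain mappings and never has to resolve anchors itself.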
@mstifflin merged commit 869eb00 into master Apr 25, 2024
18 of 19 checks passed
@mstifflin deleted the tifflin/test-helm branch April 25, 2024 19:49

4 participants