Migrate Taskcluster to postgres by helfi92 · Pull Request #154 · taskcluster/taskcluster-rfcs

helfi92 · 2019-12-16T16:32:13Z

No description provided.

djmitche

This looks good at a read-through. I wonder if we shouldn't follow the IETF process and create a new RFC, rather than revise this one in-place? For example https://tools.ietf.org/html/rfc293 updates RFC288 but is obsoleted by RFC298. It probably doesn't matter. @ccooper, do you have an opinion there?

We talked about audiences that might be interested in this RFC:

- People deploying Taskcluster (cloudops)
- People using Taskcluster (releng, firefox-ci at large)
- People developing Taskcluster (TC team)

I think this addresses what we know of those groups' requirements, and hopefully has enough detail that they can raise any concerns early in the process.

helfi92 · 2019-12-17T19:58:36Z

> I wonder if we shouldn't follow the IETF process and create a new RFC, rather than revise this one in-place?

Done!

helfi92 · 2020-01-16T02:01:44Z

@edunham @sciurus I updated the RFC. Do you mind giving it another look, please? Thanks!

edunham · 2020-01-16T19:58:25Z

+To test the scalability and performance of the system, we will do an import of
+the FirefoxCI production database (minus the secrets) into the postgres database
+on the staging deployment and then observe if the database crashes or if there
+are any noticeable performance issues that arise.


Great, thanks for adding this! Let's make sure we do the parallel request testing on the system to which we've imported production amounts of data. We want to test 2 things:

1. Can production quantities of data be imported successfully?
2. Does the DB perform as expected when it has production quantities of data in it?

As currently written, this tests 1 but not 2. To test both, we could just make sure to do the Parallel Requests testing on the same instance that we've imported the prod quantities of data into.

Good observation. I'll update the RFC to make this clearer.
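A minimal sketch of the second check discussed above (parallel requests against the instance that already holds the imported data). Everything here is an assumption for illustration: `fake_request` is a hypothetical stand-in for a real API or stored-procedure call, and the request counts are arbitrary. It only shows the shape of such a test, not how the Taskcluster team actually ran theirs.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles
from typing import Callable, Dict, List


def run_parallel_requests(
    request: Callable[[], None], total: int, concurrency: int
) -> Dict[str, float]:
    """Issue `total` calls to `request` from `concurrency` threads and
    summarize the observed latencies in seconds."""

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        request()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies: List[float] = list(pool.map(timed_call, range(total)))

    # quantiles(n=20) returns 19 cut points: index 9 is the median and
    # index 18 is the 95th percentile.
    cuts = quantiles(latencies, n=20)
    return {
        "count": float(len(latencies)),
        "p50": cuts[9],
        "p95": cuts[18],
        "max": max(latencies),
    }


def fake_request() -> None:
    # Hypothetical stand-in for one request against the staging database
    # that holds the imported production-sized data set.
    time.sleep(0.001)


if __name__ == "__main__":
    print(run_parallel_requests(fake_request, total=200, concurrency=20))
```

Running the same function before and after the import, with a real request in place of `fake_request`, would give a rough before/after comparison of latency under load.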

edunham · 2020-01-16T20:00:41Z

+## Backups and Restores
+
+Teams operating Taskcluster will rely on the cloud provider's backup system to
+handle backups and restores.


A note about what backup/restore guarantees we're getting from the old system and would want to be no worse than in the new one might be handy here. There are often many options to choose from between the extremes of "no backups" vs "keep everything forever", and if there's any general guidance then this could be a good place to put it. If not, we can figure that out per-deployment.

I think we'll want to figure that out per-deployment. The tools available are basically unrelated to Taskcluster and the design of this project, so there's not much more to say here other than this.

edunham · 2020-01-16T20:06:12Z

+Direct SQL access to the database is *not allowed*. Taskcluster will allow
+ad-hoc read-only queries on the data-set via stored procedures with access
+controlled by Postgres permissions. This feature will most likely be done after
+step 2 of the transition.


Thanks for clarifying the read-only intent here. I assume that the details of how postgres permissions will need to be configured will be forthcoming with the update that adds this feature?

Actually upon reading the next section it's sounding like TC will handle all the postgres perms stuff internally.

Correct. Taskcluster will manage Postgres permissions internally.

edunham · 2020-01-16T20:08:29Z

+(configured in Kubernetes), and on install/upgrade we'll use the admin user to
+create a non-admin user for each service, with appropriate GRANTs for that
+service's access. Deployers of Taskcluster will pick the passwords for all the
+non-admin users (configured in Kubernetes). It's up to the deployer to create


By "configured in Kubernetes", you mean "encrypted then passed as env vars like all the other secrets we currently provide to services", right?
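As a rough illustration of the install/upgrade step quoted above (the admin user creates one non-admin user per service, with GRANTs, using deployer-chosen passwords), the sketch below builds the SQL statements such a step might issue. The service name, table name, privilege list, and the `<SERVICE>_DB_PASSWORD` environment variable naming are all hypothetical; whether passwords actually arrive as environment variables is exactly the question asked in this comment.

```python
from typing import List, Mapping


def quote_ident(name: str) -> str:
    """Quote a Postgres identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a Postgres string literal, doubling embedded single quotes.

    Assumes standard_conforming_strings is on, so backslashes are literal.
    """
    return "'" + value.replace("'", "''") + "'"


def service_user_statements(
    access: Mapping[str, Mapping[str, List[str]]],
    env: Mapping[str, str],
) -> List[str]:
    """Build CREATE USER and GRANT statements for each service.

    `access` maps service name -> table name -> list of privileges.
    `env` supplies each password under the hypothetical key
    <SERVICE>_DB_PASSWORD.
    """
    statements: List[str] = []
    for service, tables in access.items():
        user = quote_ident(service)
        password = env[f"{service.upper()}_DB_PASSWORD"]
        statements.append(
            f"CREATE USER {user} PASSWORD {quote_literal(password)}"
        )
        for table, privileges in tables.items():
            statements.append(
                f"GRANT {', '.join(privileges)} "
                f"ON TABLE {quote_ident(table)} TO {user}"
            )
    return statements
```

For example, `service_user_statements({"queue": {"tasks": ["SELECT", "INSERT"]}}, {"QUEUE_DB_PASSWORD": "example"})` yields one `CREATE USER` statement and one `GRANT` statement, which the admin connection would then execute.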

edunham

I'm happy with this and have no objections to moving it to final comment period.

imbstack · 2020-02-05T20:27:13Z

+## Permissions
+
+Taskcluster will manage permissions to tables/schemas and deployers will manage
+user accounts. The deployment will have an "admin" postgres user/password


Does the admin password need to be in kubernetes at all? Can it live outside of it?

It does not, and in fact the deployment docs specifically warn against including it in the kubernetes config.

imbstack · 2020-02-05T20:27:47Z

+
+## Ad-hoc Queries
+
+Direct SQL access to the database is *not allowed*. Taskcluster will allow


I was thinking we allowed ad-hoc queries but only on a reporting db

Ad-hoc queries will run on a read-only db. What is a reporting db? Let me know if this doesn't answer your question.

imbstack · 2020-02-05T20:29:00Z

+using the existing stored procedure that returned the single column. That new
+stored procedure is then deployed before the code that uses it is deployed.
+
+A consequence of this design is that "procedures are forever" -- an upgrade can


I think we can delete a procedure but it has to happen in a later upgrade, right?

We also talked about supporting rollbacks during the all-hands. Can we add a section here talking about them and how we will support them or justifying why we won't.

> I think we can delete a procedure but it has to happen in a later upgrade, right?

Rather than delete a procedure, a safer alternative would be to change the body of the function to return an empty array.

> We also talked about supporting rollbacks during the all-hands. Can we add a section here talking about them and how we will support them or justifying why we won't.

As long as the procedure signature is not changed, rolling back shouldn't cause any issues.

> Can we add a section here talking about them

Will do.
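The idea in this thread (keep every procedure name and signature callable, and retire a procedure by emptying its body rather than dropping it) can be illustrated with a toy in-memory model. This is only a sketch: `ProcedureCatalog` and the procedure names are hypothetical, and a real deployment would express the same thing as stored procedures in Postgres rather than Python callables.

```python
from typing import Callable, Dict, List

Procedure = Callable[..., List[dict]]


class ProcedureCatalog:
    """Toy stand-in for the set of stored procedures in a database."""

    def __init__(self) -> None:
        self._procedures: Dict[str, Procedure] = {}

    def define(self, name: str, body: Procedure) -> None:
        # Same name, new body: the caller-visible signature stays put.
        self._procedures[name] = body

    def retire(self, name: str) -> None:
        # Keep the name callable, but make it return no rows, so code
        # from an older release (for example after a rollback) does not
        # fail with "function does not exist".
        if name not in self._procedures:
            raise KeyError(name)
        self._procedures[name] = lambda *args, **kwargs: []

    def call(self, name: str, *args) -> List[dict]:
        return self._procedures[name](*args)


rows = [{"task_id": "abc", "state": "pending"}]
catalog = ProcedureCatalog()

# Original procedure: returns a single column.
catalog.define(
    "get_task",
    lambda task_id: [
        {"task_id": r["task_id"]} for r in rows if r["task_id"] == task_id
    ],
)

# An upgrade adds a second procedure instead of changing the first one.
catalog.define(
    "get_task_with_state",
    lambda task_id: [dict(r) for r in rows if r["task_id"] == task_id],
)

# A later upgrade retires the old procedure without removing it.
catalog.retire("get_task")
```

After the last step, old callers of `get_task` receive an empty result instead of an error, while new callers use `get_task_with_state`.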

imbstack · 2020-02-05T20:29:55Z

+
+### Tracing
+
+Taskcluster will use New Relic to a have better visibility of the database,


Can this be expanded on? We haven't used New Relic with tc before. What sort of changes do we need to make to support it? Why New Relic instead of something else, etc.?

Not saying I disagree, just want to know more.

To support it, I think you'd want to add an env var to conditionally load the newrelic package, so we can turn it on in stage and prod but not in local dev. You'll also need to add a couple of env vars for config. Beyond that, hopefully it will monkeypatch the relevant libraries like pg and just work™

New Relic was my recommendation because it's something Mozilla has licenses for and some other teams use it extensively. https://cloud.google.com/trace/ plus https://googleapis.dev/nodejs/trace/latest/ would be a potential alternative.

The goal of using a tracing / apm service during the migration is to get clearer visibility into query performance over time than we can get from application logging or pg views and logs alone.
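The env-var gating suggested above can be sketched generically. Taskcluster services run on Node.js, where the agent would be loaded through the `newrelic` package; the Python below only illustrates the gating pattern, and the variable names `TRACING_ENABLED` and `TRACING_MODULE` are hypothetical, not settings that Taskcluster or New Relic define.

```python
import importlib
import os
from types import ModuleType
from typing import Callable, Mapping, Optional


def maybe_load_tracing(
    env: Mapping[str, str] = os.environ,
    loader: Callable[[str], ModuleType] = importlib.import_module,
) -> Optional[ModuleType]:
    """Import the tracing agent only when it is explicitly enabled.

    Returns the imported module, or None when tracing is switched off
    (for example in local development).
    """
    if env.get("TRACING_ENABLED", "").lower() != "true":
        return None
    # The module to import is itself configuration, so stage and prod
    # can enable an agent without any code change.
    return loader(env["TRACING_MODULE"])
```

Calling this once at process start, before the database client library is imported, gives an agent the chance to instrument that library; with the variable unset the function is a no-op.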

imbstack · 2020-02-05T20:31:51Z

Looking good so far!

djmitche · 2020-05-01T14:05:41Z

Having almost finished the project, we should probably (fix the check failure and) merge this!

helfi92 · 2020-05-01T14:22:46Z

I forgot this was still open. Agreed. I'll take care of this. Thanks for the ping.

helfi92 self-assigned this Dec 16, 2019

helfi92 requested a review from djmitche December 16, 2019 16:32

djmitche approved these changes Dec 16, 2019

Comment thread rfcs/0065-Migrate-queue-to-postgres.md Outdated

helfi92 force-pushed the rfc-65 branch 5 times, most recently from 44a80ca to 0876a8e Compare December 17, 2019 19:56

helfi92 changed the title ~~Update RFC#65~~ Migrate Taskcluster to Postgres Dec 17, 2019

helfi92 force-pushed the rfc-65 branch from 0876a8e to 8bb5d8c Compare December 17, 2019 19:57

helfi92 changed the title ~~Migrate Taskcluster to Postgres~~ Migrate Taskcluster to postgres Dec 17, 2019

helfi92 added the Phase: Proposal label Dec 18, 2019

sciurus reviewed Dec 19, 2019

edunham reviewed Jan 9, 2020

Comment thread rfcs/0154-Migrate-taskcluster-to-postgres.md

Comment thread rfcs/0154-Migrate-taskcluster-to-postgres.md Outdated

helfi92 force-pushed the rfc-65 branch from a5a9df8 to b34aea6 Compare January 16, 2020 01:57

edunham reviewed Jan 16, 2020

helfi92 requested review from edunham and sciurus January 21, 2020 15:11

sciurus approved these changes Jan 23, 2020

edunham approved these changes Jan 23, 2020

helfi92 added Phase: Final Comment and removed Phase: Proposal labels Jan 23, 2020

imbstack reviewed Feb 5, 2020

helfi92 removed the Phase: Final Comment label Mar 16, 2020

helfi92 added the Phase: Decided label Mar 16, 2020

helfi92 force-pushed the rfc-65 branch 2 times, most recently from 14b6246 to 8664bdf Compare May 1, 2020 16:10

helfi92 added 8 commits May 1, 2020 12:10

Add RFC#154

cac8ee5

Update Summary, Motivation, Schema Migration, Permissions

374a0af

Rename filename to say taskcluster rather than queue

0c50e3c

Add paragraph on confidence and performance

0881f3e

Add performance secion, point of no return

16aa42d

Mention that parallel requests are on on prod data

7e03f1e

Add section on rolling back

2f0f925

Add rfc in TOC

9d2c036

helfi92 force-pushed the rfc-65 branch from 8664bdf to 9d2c036 Compare May 1, 2020 16:10

helfi92 merged commit 8a76163 into taskcluster:master May 1, 2020

helfi92 deleted the rfc-65 branch May 1, 2020 16:12


5 participants
