
PXP-7805 Audit Service SQS #1603

Merged (9 commits) on Jun 11, 2021

Conversation

@paulineribeyre (Contributor) commented on May 11, 2021:

Jira Ticket: PXP-7805

Goes with uc-cdis/audit-service#2 and uc-cdis/fence#923.

Breaking change for environments that already have the audit service deployed: the new version of the audit service will require running gen3 kube-setup-audit-service again and adding the following to the audit service configuration file:

PULL_FROM_QUEUE: true
QUEUE_CONFIG:
  type: aws_sqs
  sqs_url: <sqs url>
  region: <sqs region>

New Features

  • New CLI module "gen3 sqs" to manage AWS SQS queues
  • Setting up the audit service now involves the creation of an AWS SQS queue
  • Fence now uses the service account "fence-sa", which has permission to push messages to the audit SQS queue
  • The audit service now uses the service account "audit-service-sa", which has permission to read messages from the audit SQS queue (see the push/pull sketch after this list)
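
As a rough illustration of that push/pull split (a hedged sketch only, not the actual fence or audit-service code; the queue URL, region, and message fields below are placeholders standing in for the QUEUE_CONFIG values above):

# Hypothetical sketch: fence pushes audit records, the audit service pulls them.
# Queue URL, region, and message fields are placeholders, not real resources.
import json
import boto3

SQS_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/audit-sqs"  # placeholder
sqs = boto3.client("sqs", region_name="us-east-1")  # the QUEUE_CONFIG "region"

# fence side: push one audit record onto the queue
sqs.send_message(
    QueueUrl=SQS_URL,
    MessageBody=json.dumps({"category": "presigned_url", "username": "someuser"}),
)

# audit-service side (PULL_FROM_QUEUE: true): poll, process, then delete
response = sqs.receive_message(QueueUrl=SQS_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
for message in response.get("Messages", []):
    record = json.loads(message["Body"])  # process the audit record here
    sqs.delete_message(QueueUrl=SQS_URL, ReceiptHandle=message["ReceiptHandle"])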

Deployment changes

  • The new version of the audit service will require running gen3 kube-setup-audit-service and gen3 kube-setup-fence again, and updating the audit-service and fence configuration files

@williamhaley (Contributor):

I'm sorry to add this comment late in the game, but this PR makes me think about our coupling with AWS. I know we're already pretty tightly coupled, but I was wondering if there are abstract open source messaging platforms we considered before SQS? Realistically I think we're pretty far into AWS world so maybe it's not even worth considering.

local saName
local roleName

# fence can push messages to the queue
Contributor:

(thinking out loud)

I wonder if this should be in kube-setup-fence-service.sh. I could see an argument for having it here, but I also think that the audit-service only cares about the receiving policy, and the senders should be responsible for their ability to put to the queue. But then again the queue is created here so 🤷

Maybe it makes sense here since the audit-service "owns" this queue, but I guess I also wonder about the implementation and the order we roll services. Do we have a dependency sequence now that's not obvious? Do we have a way of making sure the apps wait to init until after the queue is created and they're able to put/get from it?

Contributor (author):

Yeah, I could move it to kube-setup-fence-service.sh. The audit SQS and policies would then get created even if the audit-service is not in use, but that's maybe not a big deal.

> the order we roll services

Right now fence rolls after audit-service.

> Do we have a way of making sure the apps wait to init until after the queue is created and they're able to put/get from it?

The SQS and the SQS policies are set up during kube-setup, and the service container doesn't start until kube-setup completes, so I think we should be good?

Contributor:

Ah, good point. I didn't think about the fact that setup would always run first anyway. And good point about fence potentially having unused policies!

# 5 min visibility timeout; avoid consuming the same message twice
visibility_timeout_seconds = 300
# 1209600s = 14 days (max value); time AWS will keep unread messages in the queue
message_retention_seconds = 1209600
Contributor:

Do we have any alerting/monitoring on the queues? Even with 14 days retention we could potentially lose messages in the queue. If that happened on a Commons running audit-service and we didn't detect it we could lose audit history. If there was an AWS outage/processing backup we could also potentially have a huge backlog for the audit-service to process.

Contributor (author):

I haven't looked into that. We might be able to set up CloudWatch monitoring.

Contributor (author):

I created ticket DEVOPS-52 to look into this.
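
For reference, a minimal sketch of the kind of alert that ticket could cover (a hedged illustration only; the queue name, threshold, and SNS topic below are placeholders, not existing resources). An alarm on the age of the oldest message would flag a processing backlog well before the 14-day retention limit is reached:

# Hypothetical CloudWatch alarm for the audit queue; names and ARNs are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
cloudwatch.put_metric_alarm(
    AlarmName="audit-sqs-oldest-message-age",
    Namespace="AWS/SQS",
    MetricName="ApproximateAgeOfOldestMessage",
    Dimensions=[{"Name": "QueueName", "Value": "audit-sqs"}],
    Statistic="Maximum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=604800,  # 7 days in seconds, half of the 14-day retention
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:alerts-topic"],  # placeholder SNS topic
)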

@williamhaley (Contributor) previously approved these changes on May 13, 2021 and left a comment:

Overall I think this makes sense. I do have some broad/vague questions about how we handle queues and similar services in a way that isn't AWS-specific, but I don't know enough to speak to our general direction on that.

Adding SQS also gives us one more service to monitor/watch. Not saying that's necessarily bad, but more operational overhead for instance admins to consider.

@paulineribeyre (Contributor, author):

@williamhaley

> our coupling with AWS. I know we're already pretty tightly coupled, but I was wondering if there are abstract open source messaging platforms we considered before SQS? Realistically I think we're pretty far into AWS world so maybe it's not even worth considering.

> Adding SQS also gives us one more service to monitor/watch. Not saying that's necessarily bad, but more operational overhead for instance admins to consider.

No, we already use SQS for multiple use cases (ssjdispatcher, DCF replication). I actually chose to use SQS because we already use it and I thought we would already have code in cloud-automation ready for me to use. Turns out that wasn't the case since the existing code is not generic, but I tried to make the new gen3 sqs module in this PR generic. I do agree with your point about AWS coupling, but we already heavily use AWS and SQS. Introducing a new messaging platform would mean more operational overhead for instance admins.

To counter that, I am writing the new fence code and audit-service code in a way that'll make it easy to add support for other types of queues. But I think it's fair to assume cloud-automation can deploy using AWS.
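
To illustrate what that kind of abstraction could look like (a hedged sketch only; the class and method names here are hypothetical and not the actual fence or audit-service code):

# Hypothetical pluggable queue backend; names are illustrative only.
from abc import ABC, abstractmethod

import boto3


class QueueBackend(ABC):
    """Minimal interface a service could code against instead of calling SQS directly."""

    @abstractmethod
    def push(self, message_body: str) -> None: ...

    @abstractmethod
    def pull(self, max_messages: int = 10) -> list: ...


class SQSBackend(QueueBackend):
    """Backend used when the QUEUE_CONFIG type is aws_sqs."""

    def __init__(self, sqs_url: str, region: str):
        self.sqs_url = sqs_url
        self.client = boto3.client("sqs", region_name=region)

    def push(self, message_body: str) -> None:
        self.client.send_message(QueueUrl=self.sqs_url, MessageBody=message_body)

    def pull(self, max_messages: int = 10) -> list:
        response = self.client.receive_message(
            QueueUrl=self.sqs_url, MaxNumberOfMessages=max_messages, WaitTimeSeconds=20
        )
        return response.get("Messages", [])

Another QUEUE_CONFIG "type" value would simply map to a different QueueBackend implementation.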

@williamhaley (Contributor):

Thanks for all your replies @paulineribeyre! That makes a lot of sense. I completely forgot SQS is already used in a place I'd seen with batch jobs 🙄

williamhaley previously approved these changes on May 18, 2021
williamhaley previously approved these changes on Jun 9, 2021
williamhaley previously approved these changes on Jun 10, 2021