Configure health monitoring alerts #250
Blocked by #387.
Had a quick chat with @julialeague about this to help set my head straight on a path forward — thanks Julia!! There are a lot of things we could be looking at, but without knowing usage patterns well (#137), we can’t know what unusual looks like. Therefore, I propose:
At a future point, once we know what "normal" traffic looks like, we can consider monitoring things like:
Speaking to these points: given the rate of change of these files, I think they’d result in fragile tests. Considering that the "pass" state is that a URL is inaccessible, I think we’d quickly be asserting things which were no longer of help (for example, that a nonsense URL was inaccessible). Instead, I think we might want to consider introducing a security scanner for these. GitHub’s super-linter seems to be a good option; it uses tflint for identifying formatting/known linter problems and terrascan for identifying security risks. I’ll file a new issue for these, for post-MVP.
I believe this can be accomplished with Route53 health checks, which would trigger a CloudWatch alarm.
This can be achieved by a Route53 health check as well, hitting the health check endpoint on the public API. Unfortunately, we no longer have a set URL for the public API — it will be either https://public-api-green.dawson.ustaxcourt.gov/public-api/health or https://public-api-blue.dawson.ustaxcourt.gov/public-api/health. I’ll need to file an issue (closely related to flexion#6864) and have it fixed before I can fully implement this health check.
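As a rough Terraform sketch of the Route53 approach described above (all resource names are illustrative, and the FQDN below is a placeholder using one of the blue/green hostnames until flexion#6864 gives us a single, stable public API URL):

```hcl
# Hypothetical sketch: a Route53 health check against the public API
# health endpoint. The fqdn is a placeholder until a single public
# API URL exists.
resource "aws_route53_health_check" "public_api" {
  fqdn              = "public-api-blue.dawson.ustaxcourt.gov" # placeholder
  port              = 443
  type              = "HTTPS"
  resource_path     = "/public-api/health"
  failure_threshold = 3
  request_interval  = 30
}
```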
Already handled by a CloudWatch metric, and we can add an alarm.
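A minimal sketch of the alarm wiring, assuming a health check resource named `aws_route53_health_check.public_api` and an SNS topic `aws_sns_topic.alerts` already exist (both names are illustrative, not actual configuration). Note that Route53 publishes its health check metrics to CloudWatch in us-east-1:

```hcl
# Hypothetical sketch: alarm when the Route53 health check reports
# unhealthy. Route53 health check metrics live in the AWS/Route53
# namespace in us-east-1.
resource "aws_cloudwatch_metric_alarm" "public_api_unhealthy" {
  alarm_name          = "public-api-unhealthy" # illustrative name
  namespace           = "AWS/Route53"
  metric_name         = "HealthCheckStatus"
  comparison_operator = "LessThanThreshold"
  threshold           = 1
  evaluation_periods  = 2
  period              = 60
  statistic           = "Minimum"
  dimensions = {
    HealthCheckId = aws_route53_health_check.public_api.id
  }
  alarm_actions = [aws_sns_topic.alerts.arn]
}
```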
CloudWatch Alarms look like a natural clearinghouse for status, and they use SNS for notifications.
Spoke with @mmarcotte on direction here — we’ll use a simple SNS configuration for now and go with the Route53 approach. We know we have a blind spot: if AWS has a catastrophic outage, we will not be notified (because the notification system may also be offline). I’ll file an issue to consider using an external notification service like Opsgenie for post-MVP.
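The "simple SNS configuration" mentioned above might look roughly like this in Terraform (topic name and email address are placeholders, not actual configuration):

```hcl
# Hypothetical sketch of a minimal SNS notification path for alarms.
resource "aws_sns_topic" "alerts" {
  name = "health-alerts" # illustrative name
}

# Email is the simplest subscription protocol; subscribers must
# confirm via the email AWS sends before notifications are delivered.
resource "aws_sns_topic_subscription" "alerts_email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "ops@example.gov" # placeholder address
}
```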
Reported flexion#6903 to get a single API endpoint for the system health JSON.
Latest Elasticsearch alarms are in https://github.com/ustaxcourt/ef-cms/compare/add-es-alarms, blocked on running account-specific Terraform steps; discussion is in Slack.
flexion#6903 is completed by flexion#7177, awaiting PR to the Court. |
Awaiting #608. |
As the Court, so that we can ensure we have a secure and available system, we need assurance that, after code updates, known weak points are not vulnerable to leaking sensitive data and the application remains available.
Acceptance criteria
Alerts are configured when: