Overhaul router log handling #1570

Closed
wants to merge 29 commits into from

@smlx smlx commented Jan 22, 2020

Checklist

  • Affected Issues have been mentioned in the Closing issues section
  • Documentation has been written/updated
  • PR title is ready for changelog and subsystem label(s) applied

This change implements the router logs part of the logging infrastructure rebuild (#1297).

It introduces three new services:

  • logs-tee listens for remote syslog UDP packets and forwards them to two (or more) destinations.
    This allows the new router logging system to be deployed alongside the existing system until it has been tested in production, after which the existing system can be removed.
  • router-logs-collector takes remote syslog, converts it to JSON, and forwards it to a RabbitMQ exchange.
  • logs-dispatcher takes logs from a RabbitMQ queue in batches and sends them to an Elasticsearch index.
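
To illustrate the syslog-to-JSON step that router-logs-collector performs, here is a minimal Python sketch. It is not the code in this PR (the collector does its parsing in fluentd); it only assumes the incoming packets follow the classic RFC 3164 shape `<PRI>TIMESTAMP HOST TAG[PID]: MESSAGE`. The function name and the output field names are hypothetical.

```python
import json
import re

# Assumed input shape (RFC 3164): <PRI>TIMESTAMP HOST TAG[PID]: MESSAGE
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$",
    re.DOTALL,
)


def syslog_to_json(packet: bytes) -> str:
    """Convert one syslog datagram into a JSON string (hypothetical field names)."""
    text = packet.decode("utf-8", errors="replace").rstrip("\n")
    match = SYSLOG_RE.match(text)
    if match is None:
        # Keep unparseable input instead of silently dropping it.
        return json.dumps({"message": text, "parse_error": True})

    fields = match.groupdict()
    # PRI encodes facility * 8 + severity.
    pri = int(fields.pop("pri"))
    fields["facility"], fields["severity"] = divmod(pri, 8)

    # The PID is optional in the tag; emit it as an integer only when present.
    pid = fields.pop("pid")
    if pid is not None:
        fields["pid"] = int(pid)

    return json.dumps(fields, sort_keys=True)
```

Returning a marked record for unparseable input, rather than raising, keeps a single malformed packet from interrupting the stream.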

Here's the route that router logs now take:

haproxy -> logs-tee -> router-logs-collector -> rabbitMQ -> logs-dispatcher -> elasticsearch
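
The first hop in the route above, logs-tee, amounts to a UDP fan-out. The following Python sketch shows the idea only; it is not how the service in this PR is implemented, and the function names and addresses are hypothetical.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Address = Tuple[str, int]


def fan_out(
    payload: bytes,
    destinations: Iterable[Address],
    send: Callable[[bytes, Address], object],
) -> int:
    """Send one datagram to every destination; return the number of successful sends."""
    delivered = 0
    for dest in destinations:
        try:
            send(payload, dest)
            delivered += 1
        except OSError:
            # A failing destination must not block delivery to the others.
            continue
    return delivered


def serve(listen: Address, destinations: List[Address]) -> None:
    """Receive syslog datagrams on `listen` and copy each one to all destinations."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(listen)
        while True:
            payload, _sender = sock.recvfrom(65535)
            fan_out(payload, destinations, sock.sendto)
```

Under these assumptions, `serve(("0.0.0.0", 5140), [("old-collector", 5140), ("new-collector", 5140)])` would copy every packet to both the existing and the new collector. The host names and port are placeholders.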

The key changes from the existing router log infrastructure are:

  • sends logs from all environments to a single "project index" in Elasticsearch.
    The aim of this is to improve Elasticsearch performance by reducing the number of indices.
    This relies on kubernetes namespace labelling with lagoon/project: <projectname>, which is read by fluentd during log parsing.
  • uses fluentd parsing only (no more logstash)
  • routes through rabbitMQ to provide a central buffer for logs and enable e.g. dispatching logs out to other (external) endpoints.
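
As a rough illustration of the "project index" idea, the Python sketch below groups log records by the lagoon/project namespace label, so that logs from all environments of a project end up together. The record layout (a `kubernetes.namespace_labels` mapping) is an assumption made for this example, not something taken from the fluentd configuration in this PR, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Label named in the PR description; the record layout below is assumed.
PROJECT_LABEL = "lagoon/project"


def project_of(record: dict, default: str = "unknown") -> str:
    """Return the project name from the (assumed) namespace labels of a record."""
    labels = record.get("kubernetes", {}).get("namespace_labels", {})
    return labels.get(PROJECT_LABEL, default)


def group_by_project(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group records by project, regardless of which environment produced them."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        grouped[project_of(record)].append(record)
    return dict(grouped)
```

Records from namespaces without the label fall into a single "unknown" group here; whether to index or discard those is a separate decision.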

This PR doesn't include .lagoon.yml, because I'm not sure how to test that locally.

It does include integration tests for the new components. You can run them locally:

# bring up minimal testbed
make minishift/start MINISHIFT_MEMORY=6GB MINISHIFT_CPUS=4 && make up COMPOSE_FILE=docker-compose.logs.yaml
# run tests
make build:tests && make tests/logs
# clean up ES index
curl -s --user admin:admin -XDELETE http://localhost:9200/router-logs-local_development_cluster-mytestproject-$(date +%Y.%m)
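
The index name in the clean-up command above appears to follow the pattern router-logs-<cluster>-<project>-<year>.<month>. A minimal Python sketch of that naming, inferred only from the command (the function name is hypothetical):

```python
from datetime import date


def router_logs_index(cluster: str, project: str, when: date) -> str:
    """Build an index name of the form router-logs-<cluster>-<project>-YYYY.MM."""
    # Mirrors `$(date +%Y.%m)` in the clean-up command.
    return f"router-logs-{cluster}-{project}-{when:%Y.%m}"
```

For example, `router_logs_index("local_development_cluster", "mytestproject", date.today())` should match the index deleted by the curl command when run in the same month.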

Note that this PR builds on top of #1428 and includes #1526, #1489, #1566, #1569, and #1542, which have been split out for easier review.

The easiest diff for changes in this PR is auto-build-system...logs-overhaul

Closing issues

Partially implements #1364
Closes #1298, #674
Relates to #1297, #1164, #1454, #1146

The effective version of the previous build-dep was 7.3
Simplify the build by reusing the same variable name.
* use BASE_VERSION
* add versions to pecl extensions, as recommended upstream
Also remove recursive make from minishift and add minishift/start
target.
Remove logic from Makefile as well as build/ directory and associated
.gitignore entries.
Use dependencies to avoid recursive make.
* use bare keycloak service name in docker-compose.yaml where possible
* make keycloak service name configurable where necessary
* configure the keycloak service URL in the `make up` target on linux
Avoid .1 as that is the host gateway.
Minishift can be slow, so bump the number of retries to give it more
time to update routes.
This service intercepts router logs from openshift and copies them to
both the old and the new router log collector.
This service takes router logs, parses them, and sends them to the
broker.
This service collects logs from the broker and sends them to the
logs-db.
This is required for the router logs test suite.
The compose file to use can be targeted via the `up` Makefile target:

  make up COMPOSE_FILE=docker-compose.logs.yaml
@smlx smlx added the 3-logging-reporting Logging & Reporting subsystem label Jan 22, 2020
@smlx smlx requested a review from dasrecht January 22, 2020 07:13

smlx commented May 8, 2020

This is partially superseded by #1859. I'll close this one; maybe some bits will make it into another PR.

@smlx smlx closed this May 8, 2020
@smlx smlx deleted the logs-overhaul branch June 23, 2020 06:50