Processes log updates from Travis Worker, streams them to the web client, aggregates them, and archives to S3.

Travis Logs

Travis Logs processes log updates which are streamed from Travis Worker instances via RabbitMQ. The log parts are streamed via Pusher to the web client (Travis Web) and added to the database.

Once all log parts have been received and a timeout has passed (10 seconds by default), the log parts are aggregated into one final log.

Travis Logs archives logs to S3. Once it has verified that a log was archived correctly, the log content is purged from the database.

Local Development

When developing locally, one may want to set certain config params via env vars, such as a DATABASE_URL that points to a valid PostgreSQL server. See the .example.env file for examples.

Process types

Some of the process types listed in ./Procfile depend on other process types, while others are independent:

drain process

The drain process is responsible for consuming log part messages via AMQP and batching them together as enqueued jobs in the log_parts sidekiq queue.
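
As a rough illustration of the batching step, the sketch below buffers incoming log part payloads and hands them off in groups. The LogPartsBatcher class, its batch_size parameter, and the payload shape are hypothetical. The real drain process consumes from AMQP and enqueues sidekiq jobs; both are replaced here by plain Ruby so the example runs on its own.

```ruby
# Hypothetical sketch: buffers log part payloads and hands them off in batches.
# The block given to .new stands in for enqueueing one job on the log_parts
# sidekiq queue.
class LogPartsBatcher
  def initialize(batch_size:, &enqueue)
    @batch_size = batch_size
    @enqueue = enqueue
    @buffer = []
  end

  # Called once per message received from the (simulated) AMQP consumer.
  def receive(payload)
    @buffer << payload
    flush if @buffer.size >= @batch_size
  end

  # Hands off whatever is buffered, e.g. on a timer or at shutdown.
  def flush
    return if @buffer.empty?

    @enqueue.call(@buffer)
    @buffer = []
  end
end

enqueued_batches = []
batcher = LogPartsBatcher.new(batch_size: 2) { |batch| enqueued_batches << batch }

[
  { 'id' => 1, 'number' => 1, 'log' => "$ bundle install\n" },
  { 'id' => 1, 'number' => 2, 'log' => "$ rake\n" },
  { 'id' => 1, 'number' => 3, 'log' => "Done.\n" }
].each { |payload| batcher.receive(payload) }

# Flush the remainder that never reached a full batch.
batcher.flush
```

Batching trades a small delay for fewer, larger jobs, which is why the sketch also exposes an explicit flush for the partially filled buffer.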

web process

The web process runs a Sinatra web app that exposes APIs to handle interactions with other Travis applications and the external Pusher service.

worker_critical process

The worker_critical process is responsible for handling jobs from the following sidekiq queue:

logs.pusher_forwarding sidekiq queue

The jobs in the logs.pusher_forwarding queue forward each log part individually to Pusher.

worker_high process

The worker_high process is responsible for handling jobs from the following sidekiq queues:

log_parts sidekiq queue

The jobs in the log_parts sidekiq queue write batches of log parts records to the log_parts table.

aggregate sidekiq queue

The jobs in the aggregate sidekiq queue combine all log_parts records for a given log id into a single content blob that is set on the corresponding logs record, and then delete the log_parts records.
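
The aggregation step can be sketched with in-memory stand-ins for the two tables. Everything named here (LogPartRecord, the number field used for ordering, and the aggregate method) is an assumption made for illustration; the real jobs operate on PostgreSQL.

```ruby
# Hypothetical in-memory stand-in for a row in the log_parts table.
# Ordering parts by a `number` field is an assumption of this sketch.
LogPartRecord = Struct.new(:log_id, :number, :content)

# Combines all parts for log_id into one blob stored in `logs`, then removes
# those parts. Returns false and changes nothing when there are no parts.
def aggregate(log_id, log_parts, logs)
  parts = log_parts.select { |part| part.log_id == log_id }
  return false if parts.empty?

  logs[log_id] = parts.sort_by(&:number).map(&:content).join
  log_parts.reject! { |part| part.log_id == log_id }
  true
end

log_parts = [
  LogPartRecord.new(1, 2, "world\n"),
  LogPartRecord.new(2, 1, "other log\n"),
  LogPartRecord.new(1, 1, 'hello ')
]
logs = { 1 => nil, 2 => nil }

aggregated = aggregate(1, log_parts, logs)
```

After the call, logs[1] holds the joined content and only the parts belonging to log 2 remain in log_parts.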

worker_low process

The worker_low process is responsible for handling jobs from the following sidekiq queues:

archive sidekiq queue

Jobs in the archive sidekiq queue move the content of each fully aggregated log record from the database to S3. Once archiving is complete, a job is sent for consumption in the purge sidekiq queue.

purge sidekiq queue

Jobs in the purge sidekiq queue set the log record content to NULL after verifying that the archived (S3) content fully matches the log record content. If there is a mismatch, the log id is sent to the archive sidekiq queue for re-archiving.
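
A minimal sketch of the verify-then-purge decision follows. The purge method, the hash used as a log record, and the array used as the archive queue are all hypothetical, and comparing the full strings is only one way the match could be checked.

```ruby
# Hypothetical sketch: `log` is a Hash standing in for a logs record and
# `archive_queue` is an Array standing in for the archive sidekiq queue.
def purge(log, archived_content, archive_queue)
  if !archived_content.nil? && archived_content == log[:content]
    # Archived copy matches, so the database copy can be dropped.
    log[:content] = nil
    :purged
  else
    # Mismatch: keep the content and ask for the log to be archived again.
    archive_queue << log[:id]
    :requeued
  end
end

archive_queue = []
intact = { id: 1, content: "build passed\n" }
truncated = { id: 2, content: "build failed\n" }

intact_result = purge(intact, "build passed\n", archive_queue)
truncated_result = purge(truncated, 'build fai', archive_queue)
```

The content is only cleared on the success path, so a failed or partial upload never leaves a log without a copy somewhere.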

aggregate_sweeper process

The aggregate_sweeper process is an optional process that periodically queries the log_parts table for records that may have been missed by the event-based aggregation process that flows through the aggregate sidekiq queue.
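
The sweep can be pictured as a periodic scan for log ids whose parts have been sitting around longer than expected. The sketch below assumes a created_at timestamp on each part and a hypothetical older_than threshold in seconds; neither is taken from the actual schema or configuration.

```ruby
# Hypothetical stand-in for a log_parts row; `created_at` is assumed.
PendingLogPart = Struct.new(:log_id, :created_at)

# Returns the distinct log ids that still have parts older than the cutoff,
# i.e. candidates that the event-based aggregation may have missed.
def sweepable_log_ids(log_parts, now:, older_than:)
  cutoff = now - older_than
  log_parts.select { |part| part.created_at < cutoff }.map(&:log_id).uniq
end

parts = [
  PendingLogPart.new(1, Time.utc(2017, 1, 1, 11, 0, 0)),
  PendingLogPart.new(1, Time.utc(2017, 1, 1, 11, 5, 0)),
  PendingLogPart.new(2, Time.utc(2017, 1, 1, 11, 59, 30))
]

stale = sweepable_log_ids(parts, now: Time.utc(2017, 1, 1, 12, 0, 0), older_than: 300)
```

Here only log 1 is returned, since the parts for log 2 are still newer than the five-minute cutoff and may yet be aggregated by the normal flow.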

Database specifics

Schema management

The schema and migrations for travis-logs are managed with sqitch. All of the deploy, verify, and revert scripts may be found in the ./db/ directory.

Data lifecycle

The process types above use PostgreSQL for various operations, with a structure of two tables: logs and log_parts. Normal operations may be generalized as a progression from writing to log_parts, to combining those records into logs, and then moving the content to S3.

Because of this churn, the log_parts table is mostly empty space at any one time, and the size reported by PostgreSQL is significantly larger than the live data it holds. To a lesser degree, the logs table is also mostly empty, although its live record count will continue to grow over the lifetime of a deployment because metadata is retained after the content has been moved to S3.

Partitioned log_parts

In order to address the empty space growth caused by the high record churn of log_parts, the deployments of travis-logs used for hosted Travis CI use the pg_partman extension to drop daily partitions that are 2 days old.
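
To make the retention rule concrete, the sketch below works out which daily partitions would be eligible for dropping on a given day. pg_partman handles this itself; the expired_partitions method, the reading of "2 days old" as "at least 2 days before today", and the log_parts_pYYYY_MM_DD naming pattern are assumptions made for illustration.

```ruby
require 'date'

# Hypothetical helper: returns the partition dates that are at least
# `retention_days` days older than `today`.
def expired_partitions(partition_dates, today:, retention_days: 2)
  partition_dates.select { |date| (today - date) >= retention_days }
end

today = Date.new(2017, 7, 5)
partitions = (Date.new(2017, 7, 1)..today).to_a

expired = expired_partitions(partitions, today: today)

# Assumed naming pattern, shown only to tie dates back to table names.
names = expired.map { |date| date.strftime('log_parts_p%Y_%m_%d') }
```

Dropping a whole partition reclaims its space immediately, which is what makes this approach attractive for a table with such high churn.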

The partitions are maintained by running the partman.run_maintenance query, triggered via a daily Heroku scheduled job. The log_parts table is accessed constantly in production, and various operations within partman.run_maintenance require an AccessExclusiveLock on the log_parts table. For that reason, the maintenance operation includes a redis-based switch that prevents other processes from accessing the log_parts table while it runs.

During the maintenance operation, sidekiq workers will sleep and retry, then resume upon maintenance completion. Any requests to web dynos during maintenance that require access to the log_parts table will return 503. This is certainly not ideal, and more changes may be considered to further reduce production impact in the future. In practice, the complete maintenance operation lasts about 1 minute.
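
The behaviour described above can be sketched as a guard around any code that touches log_parts. The MaintenanceSwitch class is an in-memory stand-in for the redis-based switch, and with_log_parts_access, MaintenanceError, max_retries, and sleep_interval are hypothetical names chosen for this example.

```ruby
# Hypothetical in-memory stand-in for the redis-based maintenance switch.
class MaintenanceSwitch
  def initialize
    @on = false
  end

  def on!
    @on = true
  end

  def off!
    @on = false
  end

  def on?
    @on
  end
end

class MaintenanceError < StandardError; end

# Runs the block only while maintenance is off. While the switch is on, it
# sleeps and retries up to max_retries times, then gives up with an error.
def with_log_parts_access(switch, max_retries:, sleep_interval:)
  retries = 0
  while switch.on?
    raise MaintenanceError, 'log_parts is under maintenance' if retries >= max_retries

    retries += 1
    sleep(sleep_interval)
  end
  yield
end

switch = MaintenanceSwitch.new

# Worker-style usage: willing to wait for maintenance to finish.
result = with_log_parts_access(switch, max_retries: 3, sleep_interval: 0.01) { :wrote_batch }

# Web-style usage: no waiting, so a request during maintenance becomes a 503.
switch.on!
status = begin
  with_log_parts_access(switch, max_retries: 0, sleep_interval: 0) { 200 }
rescue MaintenanceError
  503
end
```

Giving workers a retry budget while failing web requests immediately keeps background jobs from being lost without holding HTTP connections open for the length of the maintenance window.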

License & copyright information

See LICENSE file.

Copyright (c) 2011-2017 Travis CI GmbH