Docker based worker for TaskCluster
Latest commit 3f068ff Aug 23, 2016 @gregarndt gregarndt committed on GitHub Merge pull request #242 from gregarndt/improve_image_loading
Attempt to remove files once they are no longer needed

Docker Worker

Docker task host for Linux.

Each task is evaluated in a restricted docker container. Docker has a bunch of awesome utilities for making this work well. Since the images are copy-on-write (COW), running any number of task hosts is plausible and we can manage their overall usage.

We manipulate the docker hosts through the docker remote API.

See the doc site for how to use the worker from an existing worker-type. The docs here are for hacking on the worker itself.

Requirements

  • Node 0.12.x
  • Docker
  • Packer (to build AMI)

Usage

# from the root of this repo (also see --help)
node --harmony bin/worker.js <config>

Configuration

The defaults contain all configuration options for the docker worker. In particular, these are important:

  • taskcluster: the credentials needed to authenticate all pull jobs from taskcluster.

  • pulse: the credentials for listening to pulse exchanges.

  • registries: registry credentials.

  • influx: connection string and settings for sending metrics to influx.

Directory Structure

Environment

docker-worker runs in an Ubuntu environment with various packages and kernel modules installed.

Within the root of the repo are a Vagrantfile and a vagrant.sh script that simplify creating a local environment that mimics the one used in production. This environment allows one not only to run the worker tests but also to run images used in TaskCluster in an environment similar to production without needing to configure special things on the host.

Loopback Devices

The v4l2loopback and snd-aloop kernel modules are installed to allow loopback audio/video devices to be available within tasks that require them. For information on how to configure these modules as they are in production, consult the vagrant script used for creating a local environment.

Running tests

There are a few components that must be configured for the tests to work properly (e.g. docker, kernel modules, and other packages). To ease the setup, a Vagrantfile is provided in this repo that can set up an environment very similar to the one docker-worker uses in production.

Setting up vagrant

  1. Install VirtualBox
  2. Install Vagrant
  3. Install vagrant-reload by running vagrant plugin install vagrant-reload
  4. Within the root of the repo, run vagrant up

*** Note: TASKCLUSTER_ACCESS_TOKEN, TASKCLUSTER_CLIENT_ID, PULSE_USERNAME, and PULSE_PASSWORD are configured within the virtual environment if they are available locally when building. ***

Logging into virtual machine and configuring environment

  1. vagrant ssh
  2. The tests require TASKCLUSTER_ACCESS_TOKEN, TASKCLUSTER_CLIENT_ID, PULSE_USERNAME, and PULSE_PASSWORD to be set up within the environment. If they were not available locally when building, add them to the virtual machine now.
  3. cd /vagrant # Your local checkout of the docker-worker repo is made available under the '/vagrant' directory
  4. ./build.sh # Builds some of the test images that are required
  5. npm install # Installs all the necessary node modules
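
Before running the tests, it can help to confirm that the four variables from step 2 are actually present. The following is a minimal sketch and not part of the repo; the helper name missingTestEnv is hypothetical.

```javascript
'use strict';

// Variables the steps above say the tests require.
const REQUIRED_TEST_ENV = [
  'TASKCLUSTER_ACCESS_TOKEN',
  'TASKCLUSTER_CLIENT_ID',
  'PULSE_USERNAME',
  'PULSE_PASSWORD'
];

// Hypothetical helper: returns the names that are unset or empty in `env`.
function missingTestEnv(env) {
  return REQUIRED_TEST_ENV.filter(name => !env[name]);
}

const missing = missingTestEnv(process.env);
if (missing.length > 0) {
  console.error('Missing environment variables: ' + missing.join(', '));
} else {
  console.log('All test credentials are set.');
}
```

Saved to a scratch file inside the virtual machine and run with node, this reports which credentials still need to be exported.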

Running Tests

The following set of scopes are needed for running the test suites:

  • queue:create-task:no-provisioning-nope/dummy-type-*
  • queue:poll-task-urls
  • queue:claim-task
  • queue:resolve-task
  • queue:create-artifact:public/*
  • queue:get-artifact:private/docker-worker-tests/*
  • queue:cancel-task
  • assume:worker-type:no-provisioning-nope/dummy-type-*
  • assume:scheduler-id:docker-worker-tests/*
  • assume:worker-id:random-local-worker/dummy-worker-*
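
To see whether a set of granted scopes covers the list above, a small check can be sketched as follows. It assumes the usual TaskCluster convention that a scope ending in * satisfies any scope sharing that prefix, and it ignores role expansion behind assume: scopes. The function names are hypothetical and not part of docker-worker.

```javascript
'use strict';

// Scopes listed above as required by the test suites.
const REQUIRED_SCOPES = [
  'queue:create-task:no-provisioning-nope/dummy-type-*',
  'queue:poll-task-urls',
  'queue:claim-task',
  'queue:resolve-task',
  'queue:create-artifact:public/*',
  'queue:get-artifact:private/docker-worker-tests/*',
  'queue:cancel-task',
  'assume:worker-type:no-provisioning-nope/dummy-type-*',
  'assume:scheduler-id:docker-worker-tests/*',
  'assume:worker-id:random-local-worker/dummy-worker-*'
];

// True if `granted` satisfies `required`: either an exact match, or
// `granted` ends in '*' and `required` starts with the text before the '*'.
function satisfies(granted, required) {
  if (granted === required) {
    return true;
  }
  return granted.endsWith('*') && required.startsWith(granted.slice(0, -1));
}

// Returns the required scopes that no granted scope satisfies.
function missingScopes(grantedScopes) {
  return REQUIRED_SCOPES.filter(
    required => !grantedScopes.some(granted => satisfies(granted, required))
  );
}

// Example input only; real credentials will carry a different scope set.
console.log(missingScopes(['queue:*', 'assume:worker-type:*']));
```

An empty result means every scope in the list is covered by the granted set under these assumptions.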

  1. All the tests can be run with npm test or ./test/test.sh. Under most circumstances, however, one only wants to run a single test suite.
  2. For individual test files, run ./node_modules/mocha/bin/mocha --bail test/<file>
  3. For running tests within a test file, add "--grep " when running the above command to capture just the individual test name.
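
The single-file and single-test invocations above can be assembled programmatically, for instance from a helper script. This is only a sketch: mochaCommand is a hypothetical helper, and the file and test names in the example are placeholders rather than files known to exist in the repo.

```javascript
'use strict';

// Hypothetical helper: builds the command line for one test file,
// optionally narrowed to tests whose name matches `grep`.
function mochaCommand(testFile, grep) {
  const args = ['./node_modules/mocha/bin/mocha', '--bail', testFile];
  if (grep) {
    args.push('--grep', grep);
  }
  return args;
}

// Placeholder file and test names, for illustration only.
console.log(mochaCommand('test/example_test.js').join(' '));
console.log(mochaCommand('test/example_test.js', 'example test name'));
```

The returned array can be handed to child_process.spawn(args[0], args.slice(1), {stdio: 'inherit'}), which avoids shell quoting problems when the test name contains spaces.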

*** Note: Sometimes things don't go as planned and tests will hang until they time out. To get more insight into what went wrong, set "DEBUG=" when running the tests to get more detailed output. ***

Common problems

  • Time synchronization : if you're running docker in a VM, the VM's clock may drift. This often results in stale warnings on the queue.

Deployment

The below is a detailed guide to how deployment works. If you know what you're doing and just need a check list, see: deployment check list

Requirements

  • packer
  • make
  • node 0.12.x
  • credentials for required services (i

Amazon Credentials

docker-worker is currently deployed to AWS EC2. Using packer to configure and deploy an AMI requires Amazon credentials to be specified. Follow this document to configure the environment appropriately.

Building AMIs

The docker worker deploy script is essentially a wrapper around packer with an interactive configuration script to ensure you're not missing particular environment variables. There are two primary workflows that are important.

  1. Building the base AMI. Do this when:

    • You need to add new apt packages.

    • You need to update docker (see above).

    • You need to run some expensive one-off installation.

    • You need to update ssl/gpg keys

      Note that you need to manually update the sourceAMI field in the app.json file after you create a new base AMI.

      Also note that generating this base AMI requires access to the ssl and gpg keys that the worker needs.

      Example:

      ./deploy/bin/build base
  2. Building the app AMI. Do this when:

    • You want to deploy new code/features.

    • You need to update diamond/statsd/configs (not packages).

      Note: Just because you deploy an AMI does not mean anyone is using it. Usually you also need to update a provisioner workerType with the new AMI id.

      Example:

      ./deploy/bin/build app

Everything related to the deployment of the worker is in the deploy folder, which has a number of other important sub folders.

  • deploy/packer : The packer folder contains a list (app/base) of AMIs which need to be created. Typically you only need to build the "app" AMI, which is built on a pre-existing base AMI (see sourceAMI in app.json).

  • deploy/variables.js : contains the list of variables for the deployment and possible defaults

  • deploy/template : This folder is a mirror of what will be deployed on the server, but with mustache-like variables (see variables.js for the list of all possible variables). If you need to add a script, config, etc., add it here in one of the sub folders.

  • deploy/deploy.json : A generated file (created by running deploy/bin/build or make -C deploy). This file contains all the variables needed to deploy the application.

  • deploy/target : Contains the final files to be uploaded when creating the AMI, with all template values substituted. It is useful to check this by running make -C deploy prior to building the full AMI.

  • deploy/bin/build : The script responsible for invoking packer with the correct arguments and creating the artifacts which need to be uploaded to the AMI.
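
To illustrate how files in deploy/template might turn into files in deploy/target, here is a minimal substitution sketch. The {{name}} delimiter is an assumption (the text above only says the variables are mustache-like), and renderTemplate, the template line, and the variable name are hypothetical; the real build may work differently.

```javascript
'use strict';

// Replaces each {{name}} placeholder with the matching value from `variables`.
// Throws if a placeholder has no value, so a missing variable is caught
// before anything is uploaded.
function renderTemplate(template, variables) {
  return template.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(variables, name)) {
      throw new Error('No value for template variable: ' + name);
    }
    return String(variables[name]);
  });
}

// Hypothetical template line and variable name, for illustration only.
console.log(renderTemplate('worker_type={{workerType}}', {workerType: 'example'}));
```

Failing on an unknown variable rather than leaving the placeholder in place mirrors the advice above to inspect deploy/target before building the full AMI.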

Block-Device Mapping

The AMI built with packer will mount all available instance storage under /mnt and use this for storing docker images and containers. In order for this to work you must specify a block device mapping that maps ephemeral[0-9] to /dev/sd[b-z].

It should be noted that they'll appear in the virtual machine as /dev/xvd[b-z], as this is how Xen storage devices are named under newer kernels. However, the format and mount script will mount them all as a single partition on /mnt using LVM.

An example block device mapping looks as follows:

  {
    "BlockDeviceMappings": [
      {
        "DeviceName": "/dev/sdb",
        "VirtualName": "ephemeral0"
      },
      {
        "DeviceName": "/dev/sdc",
        "VirtualName": "ephemeral1"
      }
    ]
  }
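
For instance types with more than two instance-store volumes, the mapping can be generated rather than written by hand. The sketch below follows the ephemeral[0-9] to /dev/sd[b-z] rule described above; both function names are hypothetical and not part of the deploy scripts.

```javascript
'use strict';

// Builds a mapping for `count` instance-store volumes:
// ephemeral0 -> /dev/sdb, ephemeral1 -> /dev/sdc, and so on.
function blockDeviceMappings(count) {
  if (!Number.isInteger(count) || count < 1 || count > 10) {
    throw new RangeError('count must be between 1 and 10 (ephemeral0-9)');
  }
  const mappings = [];
  for (let i = 0; i < count; i++) {
    mappings.push({
      DeviceName: '/dev/sd' + String.fromCharCode('b'.charCodeAt(0) + i),
      VirtualName: 'ephemeral' + i
    });
  }
  return {BlockDeviceMappings: mappings};
}

// Name under which a mapped device appears inside the instance,
// for example /dev/sdb -> /dev/xvdb.
function guestDeviceName(deviceName) {
  return deviceName.replace(/^\/dev\/sd/, '/dev/xvd');
}

console.log(JSON.stringify(blockDeviceMappings(2), null, 2));
console.log(blockDeviceMappings(2).BlockDeviceMappings.map(m => guestDeviceName(m.DeviceName)));
```

Calling blockDeviceMappings(2) reproduces the two-volume example shown above.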

Updating Schema

Schema changes are not deployed automatically, so if the schema has been changed, run the upload-schema.js script to update it.

Before running the upload schema script, ensure that AWS credentials are loaded into your environment. See Configuring AWS with Node.

Run the upload-schema.js script to update the schema:

babel-node --harmony bin/upload-schema.js

Post-Deployment Verification

After creating a new AMI, operation can be verified by updating a test worker type in the AWS Provisioner and submitting tasks to it. Ensure that the tasks were claimed and completed with a successful outcome. Also add features/capabilities to the tasks based on the code changes made in this release.

Further verification should be done if underlying packages, such as docker, change. Stress tests should be used (submit a graph with 1000 tasks) to ensure that all tasks have the expected outcome and complete in an expected amount of time.

Errors from docker-worker are reported into papertrail and should be monitored during roll out of new AMIs. Searching for the AMI Id along with ("task resolved" OR "claim task") should give a rough idea if work is being done using these new AMIs.
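
The search string described above can be built per AMI during a rollout. This is a minimal sketch: rolloutSearchQuery is a hypothetical helper, the AMI id is a placeholder, and the exact search syntax papertrail accepts should be confirmed against its documentation.

```javascript
'use strict';

// Builds the search string described above for a given AMI id.
function rolloutSearchQuery(amiId) {
  return amiId + ' ("task resolved" OR "claim task")';
}

// 'ami-00000000' is a placeholder, not a real AMI id.
console.log(rolloutSearchQuery('ami-00000000'));
```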