Merge pull request #1978 from weaveworks/weave-1.5-doc-updates
Operational guide. Fixes #726, fixes #1102
bboreham committed Jun 3, 2016
2 parents 9b6154c + e092a51 commit 5dd7590
Showing 8 changed files with 539 additions and 1 deletion.
2 changes: 1 addition & 1 deletion site/installing-weave/systemd.md
@@ -19,7 +19,7 @@ normally placed in `/etc/systemd/system/weave.service`.
After=docker.service
[Service]
EnvironmentFile=-/etc/sysconfig/weave
-ExecStartPre=/usr/local/bin/weave launch $PEERS
+ExecStartPre=/usr/local/bin/weave launch --no-restart $PEERS
ExecStart=/usr/bin/docker attach weave
ExecStop=/usr/local/bin/weave stop
[Install]
5 changes: 5 additions & 0 deletions site/operational-guide.md
@@ -0,0 +1,5 @@
---
title: Operational Guide
menu_order: 35
---
Operations manual.
55 changes: 55 additions & 0 deletions site/operational-guide/autoscaling.md
@@ -0,0 +1,55 @@
---
title: Autoscaling
menu_order: 40
---

### Bootstrap

An autoscaling configuration begins with a fixed cluster:

* Configured as per the [Uniform Fixed Cluster](/site/operational-guide/uniform-fixed-cluster.md)
scenario.
* Hosted on reserved or protected instances to ensure long-term
stability.
* Ideally sized at a minimum of three or five nodes (you can make your
  fixed cluster bigger to accommodate base load as required; a minimum
  is specified here purely in the interests of resilience).

Building on this foundation, arbitrary numbers of dynamic peers can be
added or removed concurrently as desired, without requiring any
changes to the configuration of the fixed cluster. As with the fixed
cluster, dynamically added nodes recover automatically from reboots
and partitions.

### Scale-out

On each additional dynamic peer, at boot, via
[systemd](/site/installing-weave/systemd.md) or equivalent:

weave launch --no-restart --ipalloc-init=observer $PEERS

In this case `$PEERS` means all peers in the _fixed cluster_, initial
and subsequently added, which have not been explicitly removed. It
should include fixed peers which are temporarily offline or stopped.

Note that you do not have to keep track of and specify the addresses
of other dynamic peers in `$PEERS` - they will discover and connect to
each other via the fixed cluster.
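
As a concrete sketch (the addresses are hypothetical), a dynamic peer
joining a three-node fixed cluster might be launched with:

    weave launch --no-restart --ipalloc-init=observer 10.0.0.1 10.0.0.2 10.0.0.3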

> The use of `--ipalloc-init=observer` prevents dynamic peers from
> coming to a consensus on their own - this is important to stop a
> clique forming amongst a group of dynamically added peers if they
> become partitioned from the fixed cluster after having learned about
> each other via discovery.

### Scale-in

On dynamic peer to be removed:

weave reset

If for any reason you cannot arrange for `weave reset` to be run on
the peer before the underlying host is destroyed (for example when
using spot instances that can be destroyed without notice), you will
need an asynchronous process to [reclaim lost IP address
space](/site/operational-guide/tasks.md#detect-reclaim-ipam).
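
One possible shape for such a process is sketched below. It is only an
illustration: the exact `weave status ipam` output should be checked for
your version of Weave Net, and `weave rmpeer` must only be run, on a
single peer, for peers that you are certain will never return.

    # run periodically on one designated peer
    weave status ipam            # inspect for peers reported as unreachable
    # for each such peer that is known to be gone for good:
    weave rmpeer <peer name>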
143 changes: 143 additions & 0 deletions site/operational-guide/concepts.md
@@ -0,0 +1,143 @@
---
title: Concepts
menu_order: 10
---
This section describes some essential concepts with which you will
need to be familiar before moving on to the example deployment
scenarios.

## Host

For the purposes of this documentation we consider a host to be an
installation of the Linux operating system which is running an
instance of the Docker Engine. It may be executing directly on bare
hardware or inside a virtual machine.

## Peer

A peer is a running instance of Weave Net, typically one per host.

## Peer Name

Peers in the weave network are identified by a 48-bit value formatted
like an Ethernet MAC address, e.g. `01:23:45:67:89:ab`. This 'peer
name' is used for various purposes:

* Routing of packets between containers on the overlay network
* Recording the origin peer of DNS entries
* Recording ownership of IP address ranges

Whilst it is desirable for the peer name to remain stable across
restarts, it is essential that it is unique - if two or more peers
share the same name, chaos will ensue, including but not limited to
double allocation of addresses and an inability to route packets on the
overlay network. Consequently, when the router is launched on a host it
derives its peer name in the following order of preference:

* From the command line; user is responsible for uniqueness and
stability
* From the BIOS product UUID, which is generally stable across
restarts and unique across different physical hardware and certain
cloned VMs
* From the hypervisor UUID, which is generally stable across restarts
and unique across VMs which do not provide access to a BIOS product
UUID
* From a random value, practically unique across different physical
hardware and cloned VMs but not stable across restarts

The appropriate strategy for assigning peer names depends on the type
and method of your particular deployment and is discussed in more
detail below.
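
For example, to assign a peer name explicitly at launch (a sketch
assuming the `--name` option; the value shown is arbitrary, and
uniqueness then becomes your responsibility):

    weave launch --name ca:fe:00:00:00:01 <other peers>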

## Peer Discovery

Peer discovery is a mechanism which allows peers to learn about new
weave hosts from existing peers without being explicitly told. Peer
discovery is
[enabled by default](/site/using-weave/finding-adding-hosts-dynamically.md).

## Network Partition

A network partition is a transient condition during which some
arbitrary subsets of peers are unable to communicate with each
other - perhaps because a network switch has failed, or a fibre-optic
line has been severed. Weave is designed to allow peers and their
containers to make maximum safe progress under conditions of
partition, healing automatically once the partition is over.

## IP Address Manager (IPAM)

[IPAM](/site/ipam.md) is the subsystem responsible for dividing up a
large contiguous block of IP addresses (known as the IP allocation
range) amongst peers so that individual addresses may be uniquely
assigned to containers anywhere on the overlay network.

When a new network is formed an initial division of the IP allocation
range must be made. Two (mutually exclusive) mechanisms with different
tradeoffs are provided to perform this task: seeding and consensus.

### Seeding

Seeding requires each peer to be told the list of peer names amongst
which the address space is to be divided initially. There are some
constraints and consequences:

* Every peer added to the network _must_ receive the same seed list,
for all time, or they will not be able to join together to form a
single cohesive whole
* Because the 'product UUID' and 'random value' methods of peer name
assignment are unpredictable, the end user must by necessity also
specify peer names
* Even though every peer _must_ receive the same seed, that seed does
_not_ have to include every peer in the network, nor does it have to
be updated when new peers are added (in fact, due to the first
constraint above, it must not be)


Example configurations are given in the section on deployment
scenarios:

* [Uniform Dynamic Cluster](/site/operational-guide/uniform-dynamic-cluster.md)
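
As a minimal sketch of the syntax (the abbreviated peer names and the
options shown are assumptions to be checked against that scenario), the
first of three seed peers might be launched with:

    weave launch --name ::1 --ipalloc-init seed=::1,::2,::3 $PEERS

with `::2` and `::3` used analogously on the other two seed peers.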

### Consensus

Alternatively, when a new network is formed for the first time peers
can be configured to co-ordinate amongst themselves to automatically
divide up the IP allocation range. This process is known as consensus
and requires each peer to be told the total number of expected peers
(the 'initial peer count') in order to prevent formation of disjoint
groups of peers which would, ultimately, result in duplicate IP
addresses.

Example configurations are given in the section on deployment
scenarios:

* [Interactive Deployment](/site/operational-guide/interactive.md)
* [Uniform Fixed Cluster](/site/operational-guide/uniform-fixed-cluster.md)
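
As a minimal sketch (assuming the `--init-peer-count` launch option),
each peer in a planned cluster of three would be launched with:

    weave launch --init-peer-count 3 $PEERS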

### Observers

Finally, an option is provided to start a peer as an _observer_. Such
peers require neither a seed peer name list nor an initial peer
count; instead they rely on the existence of other peers in the
network which have been so configured. When an observer needs address
space, it asks for it from one of the peers which took part in the
initial division, triggering consensus if necessary.

Example configurations are given in the section on deployment
scenarios:

* [Autoscaling](/site/operational-guide/autoscaling.md)

## Persistence

Certain information is remembered between launches of weave (for
example across reboots):

* The division of the IP allocation range amongst peers
* Allocation of addresses to containers

This information is persisted transparently in a volume container, but
can be
[destroyed explicitly](/site/operational-guide/tasks.md#reset)
if necessary.
63 changes: 63 additions & 0 deletions site/operational-guide/interactive.md
@@ -0,0 +1,63 @@
---
title: Interactive Deployment
menu_order: 20
---

This pattern is recommended for exploration and evaluation only, as
the commands described herein are interactive and not readily amenable
to automation and configuration management. Nevertheless, the
resulting weave network will survive host reboots without the use of a
systemd unit as long as Docker is configured to start on boot.

### Bootstrap

On initial peer:

weave launch

### Add a Peer

On new peer:

weave launch <extant peers>

Where `<extant peers>` means all peers in the network, initial and
subsequently added, which have not been explicitly removed. It should
include peers which are temporarily offline or stopped.

You must then execute:

weave prime

to ensure that the new peer has joined the existing network; you
_must_ wait for this to complete successfully before moving on to add
further new peers. If this command blocks, it means that some
issue (such as a network partition or failed peers) is preventing
a quorum from being reached - you will need to [address
that](/site/troubleshooting.md) before moving on.
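
For example, with hypothetical addresses, adding a third host to a
two-host network:

    # on the new host
    weave launch 10.0.0.1 10.0.0.2
    weave prime                  # wait for this to return before adding more peers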

### Stop a Peer

A peer can be stopped temporarily with the following command:

weave stop

Such a peer retains its IP address allocation information for the
next `weave launch`, but forgets any discovered peers and any
modifications to the initial peer list made with `weave
connect` or `weave forget`. Note that if the host reboots, Docker
will restart the peer automatically.

### Remove a Peer

On peer to be removed:

weave reset

Then optionally on each remaining peer:

weave forget <removed peer>

This step is not mandatory, but it will eliminate log noise and
spurious network traffic by stopping reconnection attempts and
preventing further connection attempts after restart.
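
For example, with hypothetical host names, removing `host3` from a
three-host network:

    # on host3
    weave reset

    # then, optionally, on host1 and host2
    weave forget host3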
99 changes: 99 additions & 0 deletions site/operational-guide/tasks.md
@@ -0,0 +1,99 @@
---
title: Administrative Tasks
menu_order: 60
---
##<a name="start-on-boot"></a>Configure Weave to Start Automatically on Boot

`weave launch` runs all of weave's containers with a Docker restart
policy of `always`, so as long as you have launched weave manually
once and your system is configured to start Docker on boot, weave
will be started automatically on system restarts.

If you're aiming for a non-interactive installation, you can use
systemd to launch weave after Docker - see [systemd
docs](/site/installing-weave/systemd.md) for details.
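
You can check that the restart policy is in place with a standard Docker
command (the router container is named `weave`):

    docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' weave    # expect "always"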

##<a name="detect-reclaim-ipam"></a>Detect and Reclaim Lost IP Address Space

The recommended way of removing a peer is to run `weave reset` on that
peer before the underlying host is decommissioned or repurposed - this
ensures that the portion of the IPAM allocation range assigned to the
peer is released for reuse. Under certain circumstances this operation
may not be successful, or indeed possible:

* If the peer in question is partitioned from the rest of the network
when `weave reset` is executed on it
* If the underlying host is no longer available to execute `weave
reset` due to a hardware failure or other unplanned termination (for
example when using autoscaling with spot-instances that can be
destroyed without notice)

In some cases you may already be aware of the problem, as you were
unable to execute `weave reset` successfully or because you know
through other channels that the host has died - in these cases you can
proceed straight to the section on reclaiming lost space.

However in some scenarios it may not be obvious that space has been
lost, in which case you can check for it periodically with the
following command on any peer:

weave status ipam

This will list the names of unreachable peers; if you are satisfied
that they are truly gone, rather than temporarily unreachable due to a
partition, you can reclaim their space manually.

When a peer dies unexpectedly, the remaining peers will consider its
address space to be unavailable even after it has remained unreachable
for prolonged periods; there is no universally applicable time limit
after which one of the remaining peers could decide unilaterally that
it is safe to appropriate the space for itself, and so an
administrative action is required to reclaim it.

The `weave rmpeer` command is provided to perform this task, and must
be executed on _one_ of the remaining peers. That peer will take
ownership of the freed address space.
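
For example (the peer name shown is hypothetical - use the name reported
as unreachable by `weave status ipam`):

    weave rmpeer 6e:d4:9d:3a:6b:42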

##<a name="cluster-upgrade"></a>Upgrade a Cluster

Protocol versioning and feature negotiation are employed in Weave Net
to enable incremental rolling upgrades - each major release maintains
the ability to speak to the preceding major release at a minimum, and
connected peers only utilise features which both support. The general
upgrade procedure is as follows:

On each peer in turn:

* Stop the old weave with `weave stop` (or `systemctl stop weave` if
you're using a systemd unit file)
* Download the new weave script and replace the existing one
* Start the new weave with `weave launch <existing peer list>` (or
`systemctl start weave` if you're using a systemd unit file)

This will result in some downtime as the first launch with the new
script has to pull the new container images; if you wish to minimise
downtime you can download the new script to a temporary location
first:

* Download the new weave script to a temporary location e.g.
`/path/to/new/weave`
* Pull the new images with `/path/to/new/weave setup`
* Stop the old weave with `weave stop` (or `systemctl stop weave` if
you're using a systemd unit file)
* Replace the existing script with the new one
* Start the new weave with `weave launch <existing peer list>` (or
`systemctl start weave` if you're using a systemd unit file)
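
The same procedure, expressed as a shell sketch (the download URL, paths
and use of systemd are assumptions - adapt them to your deployment):

    curl -L <release URL> -o /path/to/new/weave
    chmod +x /path/to/new/weave
    /path/to/new/weave setup                      # pre-pull the new images
    systemctl stop weave                          # or: weave stop
    cp /path/to/new/weave /usr/local/bin/weave    # replace the existing script
    systemctl start weave                         # or: weave launch <existing peer list>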

> NB Always check the release notes for specific versions in case
> there are any special caveats or deviations from the standard
> procedure.

##<a name="reset"></a>Reset Persisted Data

Weave Net persists information in a data volume container named
`weavedb`. If you wish to start from a completely clean slate (for
example to withdraw a peer from one network and join it to another)
you can issue the following command:

weave reset
