Merge pull request #1978 from weaveworks/weave-1.5-doc-updates
Operational guide. Fixes #726, fixes #1102
bboreham committed Jun 3, 2016
2 parents 9b6154c + e092a51 commit 5dd7590
Showing 8 changed files with 539 additions and 1 deletion.
2 changes: 1 addition & 1 deletion site/installing-weave/systemd.md
@@ -19,7 +19,7 @@ normally placed in `/etc/systemd/system/weave.service`.
After=docker.service
[Service]
EnvironmentFile=-/etc/sysconfig/weave
-ExecStartPre=/usr/local/bin/weave launch $PEERS
+ExecStartPre=/usr/local/bin/weave launch --no-restart $PEERS
ExecStart=/usr/bin/docker attach weave
ExecStop=/usr/local/bin/weave stop
[Install]
5 changes: 5 additions & 0 deletions site/operational-guide.md
@@ -0,0 +1,5 @@
---
title: Operational Guide
menu_order: 35
---
Operations manual.
55 changes: 55 additions & 0 deletions site/operational-guide/autoscaling.md
@@ -0,0 +1,55 @@
---
title: Autoscaling
menu_order: 40
---

### Bootstrap

An autoscaling configuration begins with a fixed cluster:

* Configured as per the [Uniform Fixed Cluster](/site/operational-guide/uniform-fixed-cluster.md)
scenario.
* Hosted on reserved or protected instances to ensure long-term
stability.
* Ideally sized at a minimum of three or five nodes (you can make your
  fixed cluster bigger to accommodate base load as required; a minimum
  is specified here purely in the interests of resilience).

Building on this foundation, arbitrary numbers of dynamic peers can be
added or removed concurrently as desired, without requiring any
changes to the configuration of the fixed cluster. As with the fixed
cluster, dynamically added nodes recover automatically from reboots
and partitions.

### Scale-out

On each additional dynamic peer, at boot, via
[systemd](/site/installing-weave/systemd.md) or equivalent:

weave launch --no-restart --ipalloc-init=observer $PEERS

In this case `$PEERS` means all peers in the _fixed cluster_, initial
and subsequently added, which have not been explicitly removed. It
should include fixed peers which are temporarily offline or stopped.

Note that you do not have to keep track of and specify the addresses
of other dynamic peers in `$PEERS` - they will discover and connect to
each other via the fixed cluster.
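
As a concrete sketch (the addresses are hypothetical), a dynamic peer
joining a three-node fixed cluster might be launched with:

    weave launch --no-restart --ipalloc-init=observer 10.0.0.1 10.0.0.2 10.0.0.3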

> The use of `--ipalloc-init=observer` prevents dynamic peers from
> coming to a consensus on their own - this is important to stop a
> clique forming amongst a group of dynamically added peers if they
> become partitioned from the fixed cluster after having learned about
> each other via discovery.

### Scale-in

On dynamic peer to be removed:

weave reset

If for any reason you cannot arrange for `weave reset` to be run on
the peer before the underlying host is destroyed (for example when
using spot instances that can be destroyed without notice), you will
need an asynchronous process to [reclaim lost IP address
space](/site/operational-guide/tasks.md#detect-reclaim-ipam).
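
One possible shape for such a process is sketched below. It is only an
illustration: the exact `weave status ipam` output should be checked for
your version of Weave Net, and `weave rmpeer` must only be run, on a
single peer, for peers that you are certain will never return.

    # run periodically on one designated peer
    weave status ipam            # inspect for peers reported as unreachable
    # for each such peer that is known to be gone for good:
    weave rmpeer <peer name>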
143 changes: 143 additions & 0 deletions site/operational-guide/concepts.md
@@ -0,0 +1,143 @@
---
title: Concepts
menu_order: 10
---
This section describes some essential concepts with which you will
need to be familiar before moving on to the example deployment
scenarios.

## Host

For the purposes of this documentation we consider a host to be an
installation of the Linux operating system which is running an
instance of the Docker Engine. It may be executing directly on bare
hardware or inside a virtual machine.

## Peer

A peer is a running instance of Weave Net, typically one per host.

## Peer Name

Peers in the weave network are identified by a 48-bit value formatted
like an Ethernet MAC address, e.g. `01:23:45:67:89:ab`. This 'peer
name' is used for various purposes:

* Routing of packets between containers on the overlay network
* Recording the origin peer of DNS entries
* Recording ownership of IP address ranges

Whilst it is desirable for the peer name to remain stable across
restarts, it is essential that it is unique - if two or more peers
share the same name, chaos will ensue, including but not limited to
double allocation of addresses and an inability to route packets on the
overlay network. Consequently, when the router is launched on a host it
derives its peer name in the following order of preference:

* From the command line; user is responsible for uniqueness and
stability
* From the BIOS product UUID, which is generally stable across
restarts and unique across different physical hardware and certain
cloned VMs
* From the hypervisor UUID, which is generally stable across restarts
and unique across VMs which do not provide access to a BIOS product
UUID
* From a random value, practically unique across different physical
hardware and cloned VMs but not stable across restarts

The appropriate strategy for assigning peer names depends on the type
and method of your particular deployment and is discussed in more
detail below.
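
For example, to assign a peer name explicitly at launch (a sketch
assuming the `--name` option; the value shown is arbitrary, and
uniqueness then becomes your responsibility):

    weave launch --name ca:fe:00:00:00:01 <other peers>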

## Peer Discovery

Peer discovery is a mechanism which allows peers to learn about new
weave hosts from existing peers without being explicitly told. Peer
discovery is
[enabled by default](/site/using-weave/finding-adding-hosts-dynamically.md).

## Network Partition

A network partition is a transient condition during which some
arbitrary subsets of peers are unable to communicate with each
other - perhaps because a network switch has failed, or a fibre-optic
line has been severed. Weave is designed to allow peers and their
containers to make maximum safe progress under conditions of
partition, healing automatically once the partition is over.

## IP Address Manager (IPAM)

[IPAM](/site/ipam.md) is the subsystem responsible for dividing up a
large contiguous block of IP addresses (known as the IP allocation
range) amongst peers so that individual addresses may be uniquely
assigned to containers anywhere on the overlay network.

When a new network is formed an initial division of the IP allocation
range must be made. Two (mutually exclusive) mechanisms with different
tradeoffs are provided to perform this task: seeding and consensus.

### Seeding

Seeding requires each peer to be told the list of peer names amongst
which the address space is to be divided initially. There are some
constraints and consequences:

* Every peer added to the network _must_ receive the same seed list,
for all time, or they will not be able to join together to form a
single cohesive whole
* Because the 'product UUID' and 'random value' methods of peer name
assignment are unpredictable, the end user must by necessity also
specify peer names
* Even though every peer _must_ receive the same seed, that seed does
_not_ have to include every peer in the network, nor does it have to
be updated when new peers are added (in fact, due to the first
constraint above, it must not be)


Example configurations are given in the section on deployment
scenarios:

* [Uniform Dynamic Cluster](/site/operational-guide/uniform-dynamic-cluster.md)
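
As a minimal sketch of the syntax (the abbreviated peer names and the
options shown are assumptions to be checked against that scenario), the
first of three seed peers might be launched with:

    weave launch --name ::1 --ipalloc-init seed=::1,::2,::3 $PEERS

with `::2` and `::3` used analogously on the other two seed peers.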

### Consensus

Alternatively, when a new network is formed for the first time peers
can be configured to co-ordinate amongst themselves to automatically
divide up the IP allocation range. This process is known as consensus
and requires each peer to be told the total number of expected peers
(the 'initial peer count') in order to prevent formation of disjoint
groups of peers which would, ultimately, result in duplicate IP
addresses.

Example configurations are given in the section on deployment
scenarios:

* [Interactive Deployment](/site/operational-guide/interactive.md)
* [Uniform Fixed Cluster](/site/operational-guide/uniform-fixed-cluster.md)
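
As a minimal sketch (assuming the `--init-peer-count` launch option),
each peer in a planned cluster of three would be launched with:

    weave launch --init-peer-count 3 $PEERS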

### Observers

Finally, an option is provided to start a peer as an _observer_. Such
peers require neither a seed peer name list nor an initial peer
count; instead they rely on the existence of other peers in the
network which have been so configured. When an observer needs address
space, it asks for it from one of the peers which took part in the
initial division, triggering consensus if necessary.

Example configurations are given in the section on deployment
scenarios:

* [Autoscaling](/site/operational-guide/autoscaling.md)

## Persistence

Certain information is remembered between launches of weave (for
example across reboots):

* The division of the IP allocation range amongst peers
* Allocation of addresses to containers

This information is persisted transparently in a volume container, but
can be
[destroyed explicitly](/site/operational-guide/tasks.md#reset)
if necessary.
63 changes: 63 additions & 0 deletions site/operational-guide/interactive.md
@@ -0,0 +1,63 @@
---
title: Interactive Deployment
menu_order: 20
---

This pattern is recommended for exploration and evaluation only, as
the commands described herein are interactive and not readily amenable
to automation and configuration management. Nevertheless, the
resulting weave network will survive host reboots without the use of a
systemd unit as long as Docker is configured to start on boot.

### Bootstrap

On initial peer:

weave launch

### Add a Peer

On new peer:

weave launch <extant peers>

Where `<extant peers>` means all peers in the network, initial and
subsequently added, which have not been explicitly removed. It should
include peers which are temporarily offline or stopped.

You must then execute:

weave prime

to ensure that the new peer has joined the existing network; you
_must_ wait for this to complete successfully before moving on to add
further new peers. If this command blocks, it means that some
issue (such as a network partition or failed peers) is preventing
a quorum from being reached - you will need to [address
that](/site/troubleshooting.md) before moving on.
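
For example, with hypothetical addresses, adding a third host to a
two-host network:

    # on the new host
    weave launch 10.0.0.1 10.0.0.2
    weave prime                  # wait for this to return before adding more peers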

### Stop a Peer

A peer can be stopped temporarily with the following command:

weave stop

Such a peer retains its IP address allocation information for the
next `weave launch`, but forgets any discovered peers and any
modifications to the initial peer list made with `weave
connect` or `weave forget`. Note that if the host reboots, Docker
will restart the peer automatically.

### Remove a Peer

On peer to be removed:

weave reset

Then optionally on each remaining peer:

weave forget <removed peer>

This step is not mandatory, but it will eliminate log noise and
spurious network traffic by stopping reconnection attempts and
preventing further connection attempts after restart.
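
For example, with hypothetical host names, removing `host3` from a
three-host network:

    # on host3
    weave reset

    # then, optionally, on host1 and host2
    weave forget host3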
99 changes: 99 additions & 0 deletions site/operational-guide/tasks.md
@@ -0,0 +1,99 @@
---
title: Administrative Tasks
menu_order: 60
---
##<a name="start-on-boot"></a>Configure Weave to Start Automatically on Boot

`weave launch` runs all of weave's containers with a Docker restart
policy of `always`, so as long as you have launched weave manually
once and your system is configured to start Docker on boot, weave
will be started automatically on system restarts.

If you're aiming for a non-interactive installation, you can use
systemd to launch weave after Docker - see [systemd
docs](/site/installing-weave/systemd.md) for details.
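
You can check that the restart policy is in place with a standard Docker
command (the router container is named `weave`):

    docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' weave    # expect "always"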

##<a name="detect-reclaim-ipam"></a>Detect and Reclaim Lost IP Address Space

The recommended way of removing a peer is to run `weave reset` on that
peer before the underlying host is decommissioned or repurposed - this
ensures that the portion of the IPAM allocation range assigned to the
peer is released for reuse. Under certain circumstances this operation
may not be successful, or indeed possible:

* If the peer in question is partitioned from the rest of the network
when `weave reset` is executed on it
* If the underlying host is no longer available to execute `weave
reset` due to a hardware failure or other unplanned termination (for
example when using autoscaling with spot-instances that can be
destroyed without notice)

In some cases you may already be aware of the problem, as you were
unable to execute `weave reset` successfully or because you know
through other channels that the host has died - in these cases you can
proceed straight to the section on reclaiming lost space.

However in some scenarios it may not be obvious that space has been
lost, in which case you can check for it periodically with the
following command on any peer:

weave status ipam

This will list the names of unreachable peers; if you are satisfied
that they are truly gone, rather than temporarily unreachable due to a
partition, you can reclaim their space manually.

When a peer dies unexpectedly, the remaining peers will consider its
address space to be unavailable even after it has remained unreachable
for prolonged periods; there is no universally applicable time limit
after which one of the remaining peers could decide unilaterally that
it is safe to appropriate the space for itself, and so an
administrative action is required to reclaim it.

The `weave rmpeer` command is provided to perform this task, and must
be executed on _one_ of the remaining peers. That peer will take
ownership of the freed address space.
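
For example (the peer name shown is hypothetical - use the name reported
as unreachable by `weave status ipam`):

    weave rmpeer 6e:d4:9d:3a:6b:42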

##<a name="cluster-upgrade"></a>Upgrade a Cluster

Protocol versioning and feature negotiation are employed in Weave Net
to enable incremental rolling upgrades - each major release maintains
the ability to speak to the preceding major release at a minimum, and
connected peers only utilise features which both support. The general
upgrade procedure is as follows:

On each peer in turn:

* Stop the old weave with `weave stop` (or `systemctl stop weave` if
you're using a systemd unit file)
* Download the new weave script and replace the existing one
* Start the new weave with `weave launch <existing peer list>` (or
`systemctl start weave` if you're using a systemd unit file)

This will result in some downtime as the first launch with the new
script has to pull the new container images; if you wish to minimise
downtime you can download the new script to a temporary location
first:

* Download the new weave script to a temporary location e.g.
`/path/to/new/weave`
* Pull the new images with `/path/to/new/weave setup`
* Stop the old weave with `weave stop` (or `systemctl stop weave` if
you're using a systemd unit file)
* Replace the existing script with the new one
* Start the new weave with `weave launch <existing peer list>` (or
`systemctl start weave` if you're using a systemd unit file)
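
The same procedure, expressed as a shell sketch (the download URL, paths
and use of systemd are assumptions - adapt them to your deployment):

    curl -L <release URL> -o /path/to/new/weave
    chmod +x /path/to/new/weave
    /path/to/new/weave setup                      # pre-pull the new images
    systemctl stop weave                          # or: weave stop
    cp /path/to/new/weave /usr/local/bin/weave    # replace the existing script
    systemctl start weave                         # or: weave launch <existing peer list>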

> NB Always check the release notes for specific versions in case
> there are any special caveats or deviations from the standard
> procedure.

##<a name="reset"></a>Reset Persisted Data

Weave Net persists information in a data volume container named
`weavedb`. If you wish to start from a completely clean slate (for
example to withdraw a peer from one network and join it to another)
you can issue the following command:

weave reset
