docs: Add initial LMAT instructions

weaveworks-liquidmetal · Aug 2, 2022 · 51cfe3d
1 parent 2dc51e8

Showing 3 changed files with 451 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -9,16 +9,273 @@ and run: mdtoc -inplace README.md
 - [What they test](#what-they-test)
 - [How they work](#how-they-work)
 - [How to run...](#how-to-run)
-  - [Locally](#locally)
+  - [Locally (option 1)](#locally-option-1)
+    - [Tunables](#tunables)
+  - [Locally (option 2)](#locally-option-2)
+  - [Locally (option 3)](#locally-option-3)
   - [In CI](#in-ci)
 <!-- /toc -->
 
 ## What they test
 
+The Liquid Metal Acceptance Tests (LMATS) are the highest-level test suite for
+the Liquid Metal project: they ensure that the basic behaviour exposed to a user
+does what it should.
+
+They ensure that the 2 key components of Liquid Metal ([flintlock][flintlock]
+and [CAPMVM][capmvm]) work properly together.
+
+They run daily as a GitHub Action. See [here][actions] for runs and results.
+
 ## How they work
 
+This repo contains the infrastructure config and "trigger points" for running the
+LMATS on a non-local (as in not on your computer) bare-metal environment.
+The test code itself for now lives in [CAPMVM][capmvm-e2e].
+
+There are 2 main parts to this repo:
+- [`terraform/`][tf] which contains manifests for provisioning bare-metal infrastructure
+   and configuring flintlock.
+- [`cmd/`][tool] which triggers the execution of the tests.
+
+The sequence of events for a full run is:
+- Terraform section...
+  - Check capacity of Equinix for requested device types and elect metro with
+    sufficient space
+  - Generate SSH keys for use during infrastructure provisioning and later test
+    execution
+  - Create Equinix project
+  - Create 1 host to act as the CAPI management cluster and network "hub"
+  - Create 2 further hosts to run flintlock (_the count can be overridden_)
+  - Bootstrap some rudimentary networking
+  - Provision flintlock
+  - Prepare the "management" host to run CAPI
+- Test runner section (over SSH to the management host)...
+  - Prepare configuration based on the output of the Terraform step and any
+    action inputs
+  - Clone CAPMVM on the management host
+  - Change into the directory and run the e2e tests
+- E2E section (streamed over SSH from the management host)...
+  - Create a kind cluster
+  - Initialise the cluster with required CAPI controllers
+  - Generate a template for the CAPMVM workload
+  - Apply the workload to the kind cluster
+  - Ensure all supplied flintlock hosts have been used
+  - Deploy an application to the workload cluster
+- Teardown
+
 ## How to run...
 
-### Locally
+This system is primarily intended to be used by:
+- CI (we cannot enable KVM in action runners, so we have to do a lot of infra
+  provisioning)
+- People who do not want, do not have, or have totally borked their local flintlock /
+  general Liquid Metal environment on their own computer
+
+It is possible, although not really advisable or necessary, to run it locally and
+there are a few options for doing so.
+
+### Locally (option 1)
+
+To run the LMATS against non-local bare-metal infrastructure, first clone and
+change into this repo:
+
+```bash
+git clone https://github.com/weaveworks-liquidmetal/liquid-metal-acceptance-tests
+cd liquid-metal-acceptance-tests
+```
+
+Set the required environment variables:
+
+```bash
+export METAL_AUTH_TOKEN=
+export METAL_ORG_ID=
+```
+
+_If you are a quicksilver team-member, or part of Weaveworks, these credentials
+can be found in 1Pass. Ask Claudia if you are not sure where._
+
+Call the Make command:
+
+```bash
+make all
+```
+
+This process is quite lengthy: you are looking at 10-20 mins. The test section
+alone can take up to 5 mins to run (I am working on making that faster).
+
+To work in steps, or to run the tests several times with the same infrastructure,
+you can call the individual targets:
+
+```bash
+make tf-up
+make e2e # add any flags here as E2E_ARGS="--foo bar" see 'Tunables' below for more
+make tf-down
+```
+
+#### Tunables
+
+The following configuration options/variables can be changed via the environment:
+- `PROJECT_NAME`: change the name of the project to be created in Equinix (default:
+  `"liquid-metal-acceptance-tests"`). Note that project names in Equinix are not
+  unique, so you cannot use this to select an existing project.
+- `FLINTLOCK_VERSION`: change the version of flintlock used in the tests (default:
+  [latest][flintlock-releases]).
+- `DEVICE_COUNT`: change the number of bare-metal hosts which will run flintlock
+  (default: `2`).
+- `DEVICE`: change the type of Equinix devices (default: `c3.small.x86`).
+- `E2E_ARGS`: append flags to the test command:
+  - `-version`: the version of CAPMVM to use in the tests (if set, will override
+    `-repo` and `-branch`). Must exactly match the tag name of the release, e.g. `v0.1.0`.
+  - `-repo`: the URL to a repo (fork) of CAPMVM to use in the tests.
+  - `-branch`: the name of a branch to use in the tests. Can be used in combination
+    with `-repo` or alone to target a branch of the upstream repo.
+  - These flags are properties of the test runner. For more information on how
+    that works and what other flags are available, see the [tool readme][tool].
+
+For example, to run the LMATS against version `v0.1.0` of flintlock and against
+a branch on my fork of CAPMVM:
+
+```bash
+export FLINTLOCK_VERSION=v0.1.0
+make all E2E_ARGS="--repo https://github.com/Callisto13/cluster-api-provider-microvm --branch e2e"
+```
+
+### Locally (option 2)
+
+If you are not interested in running the tests against a bare-metal host so far
+away, you can simply run the E2Es in CAPMVM without any of this. You won't need
+to clone this repo, but you will need two others and will need to put a bit more
+work into setting up.
+
+_Note this will only be applicable to people running Linux._
+
+First set up a flintlock server:
+
+```bash
+git clone https://github.com/weaveworks-liquidmetal/flintlock
+cd flintlock
+sudo ./hack/scripts/provision.sh --grpc-address 0.0.0.0:9090 --dev --insecure
+# the script will ask you to confirm some choices
+cd ..
+```
+
+Then clone CAPMVM:
+
+```bash
+git clone https://github.com/weaveworks-liquidmetal/cluster-api-provider-microvm
+cd cluster-api-provider-microvm
+```
+
+Ensure you have the following installed:
+- [kind](https://kind.sigs.k8s.io/)
+- [docker](https://docs.docker.com/engine/install/ubuntu/)
+- [kubectl](https://kubernetes.io/docs/tasks/tools/)
+- [clusterctl](https://cluster-api.sigs.k8s.io/user/quick-start.html#install-clusterctl)
+
+And run the tests 100% locally from the CAPMVM repo:
+
+```bash
+FL=$(hostname -I | awk '{print $1}') # should get the first private IP of your machine
+export FLINTLOCK_HOSTS="$FL:9090"
+make e2e
+```
+
+More options/flags are available on the tests at this level; see their dedicated
+[docs][capmvm-e2e] for details.
+
+### Locally (option 3)
+
+The last option is for those who have borked, or just don't want to set up, their
+own flintlock but still want to iterate on a local version of CAPMVM. Here we
+have a mix of both worlds: you use the LMATS to provision flintlock on remote
+Equinix hosts, and then tell the local E2Es where those hosts are.
+
+Clone this repo:
+
+```bash
+git clone https://github.com/weaveworks-liquidmetal/liquid-metal-acceptance-tests
+cd liquid-metal-acceptance-tests
+```
+
+Set the required environment variables:
+
+```bash
+export METAL_AUTH_TOKEN=
+export METAL_ORG_ID=
+```
+
+Create the Equinix infrastructure:
+
+```bash
+make tf-up
+# take note of the 'host_ips' in the terraform output
+```
+
+TODO: there is some additional networking needed here to ensure that CAPMVM can
+access the load balancer address of the created workload cluster. I will add it
+at some point. https://github.com/weaveworks-liquidmetal/liquid-metal-acceptance-tests/issues/5
+
+Then clone CAPMVM:
+
+```bash
+git clone https://github.com/weaveworks-liquidmetal/cluster-api-provider-microvm
+cd cluster-api-provider-microvm
+```
+
+Ensure you have the following installed:
+- [kind](https://kind.sigs.k8s.io/)
+- [docker](https://docs.docker.com/engine/install/ubuntu/)
+- [kubectl](https://kubernetes.io/docs/tasks/tools/)
+- [clusterctl](https://cluster-api.sigs.k8s.io/user/quick-start.html#install-clusterctl)
+
+And run the tests locally:
+
+```bash
+# replace the ips here with the ones you noted from the terraform output
+export FLINTLOCK_HOSTS="1.2.3.4:9090,5.6.7.8:9090"
+make e2e
+```
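If you have several `host_ips` to paste in, a small helper can build the `FLINTLOCK_HOSTS` value for you. This is only a sketch: the function name `format_flintlock_hosts` is hypothetical and not part of either repo, and it assumes every host listens on the same port (`9090` in the examples in this README).

```shell
# Hypothetical helper (not part of this repo or CAPMVM): join a list of host
# IPs into the comma-separated "ip:port" form used for FLINTLOCK_HOSTS.
# Usage: format_flintlock_hosts <port> <ip> [<ip> ...]
format_flintlock_hosts() {
  local port="$1"
  shift
  local out="" ip
  for ip in "$@"; do
    # Prefix a comma only when something has already been appended.
    out="${out:+$out,}${ip}:${port}"
  done
  printf '%s\n' "$out"
}

# Example with the placeholder IPs used in this README:
#   export FLINTLOCK_HOSTS="$(format_flintlock_hosts 9090 1.2.3.4 5.6.7.8)"
```

The port is taken as an argument so that the helper still works if your flintlock servers were provisioned with a different `--grpc-address`.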
+
+More options/flags are available on the tests at this level; see their dedicated
+[docs][capmvm-e2e] for details.
+
+Don't forget to destroy the infrastructure when you are done:
+
+```bash
+make tf-down
+```
 
 ### In CI
+
+The LMATS will run every day automatically, but they can also be triggered manually
+and configured to run with a combination of component versions.
+
+_Note: this option is only available to members of Weaveworks._
+
+Navigate to the [actions tab][actions].
+
+Select the `Run workflow` dropdown on the right.
+
+To run with the default settings, click the green `Run workflow` button.
+
+Otherwise you can configure any/all of the below before triggering:
+- `flintlock_version`: the version of flintlock to use in the tests.
+- `capmvm_version`: the version of CAPMVM to use in the tests (if set will override
+  `capmvm_repo` and `capmvm_branch`). Must match exactly the tag name of the release, eg: `v0.1.0`.
+- `capmvm_repo`: the URL to a repo (fork) of CAPMVM to use in the tests.
+- `capmvm_branch`: the name of a branch to use in the tests. Can be used in combination
+  with `capmvm_repo` or alone to target a branch of the upstream repo.
+
+It can take up to 20 mins to provision the infra and run the tests. The result will
+be posted in the `#team-quicksilver` slack channel.
+
+If anything goes wrong there is a step in the action to remove all the infra.
+I will be exposing an option to keep things around if needed.
+
+[flintlock]: https://github.com/weaveworks-liquidmetal/flintlock
+[capmvm]: https://github.com/weaveworks-liquidmetal/cluster-api-provider-microvm
+[capmvm-e2e]: https://github.com/weaveworks-liquidmetal/cluster-api-provider-microvm/test/e2e
+[flintlock-releases]: https://github.com/weaveworks-liquidmetal/flintlock/releases
+[tool]: /cmd
+[tf]: /terraform
+[actions]: https://github.com/weaveworks-liquidmetal/liquid-metal-acceptance-tests/main/workflows/nightly_e2e.yml
diff --git a/cmd/README.md b/cmd/README.md
@@ -0,0 +1,100 @@
+# Cmd
+
+This is a small helper tool to run the Acceptance tests (LMATS) on remote
+Equinix infrastructure.
+
+This tool will change when we have built the scheduler component.
+
+This tool will go away when I have figured out some more networking for the infra.
+
+## What and why
+
+There are 2 reasons it exists:
+1. To save time on networking complexity during my initial stab at these tests,
+  I chose not to set it up so that the CAPI management cluster could be run
+  from outside the Equinix infra network.
+  _Technically_ it can be, since the flintlock servers are bound to a public
+  interface, but the next hurdle then would have been the control plane
+  load balancer address: I would have had to figure out a way to dynamically reserve
+  an IPv4 address and then ensure that it was allocated to the workload cluster.
+  This is not easy to do in Equinix.
+  Alternatively, I would have had to automate a VPN to route the private subnets
+  of the infra, which again is a pain. At some point I will get to solving these.
+2. Until we develop the dynamic scheduler, we need to inject the individual
+  flintlock server IPs into any CAPMVM workload cluster template. This is a pain
+  to do with CAPI/clusterctl and naturally these IPs are not known ahead of time
+  (although I could do something with DNS I suppose? But then would I have to deal
+  with records not being updated in time for the test?). So the tests are built
+  to receive the IPs and then alter the template; this tool handles the extraction,
+  formatting and pass-through of the created infra IPs from the Terraform output
+  to the tests. See [here][capmvm-e2e] for more on how the e2es work.
+
+So for now, the tests are triggered locally but actually run from within one of
+the Equinix machines.
+
+The sequence of events is as follows:
+- The tool is built and called from the Makefile (`make e2e`)
+- It processes and validates any given flags
+- It parses the `../terraform/terraform.tfstate` file for the `outputs.host_ips`
+  and `outputs.management_ip`
+- The `host_ips` are formatted ready for use as flintlock addresses by the tests
+- The command to run over SSH is built from the `e2e.sh` template
+- A connection to the `management_ip` is opened using the keys created by the terraform
+  provisioning script
+- The command is executed
+  - Clone CAPMVM at the set version/repo/branch
+  - `cd` and start tests
+  - `cd ..` and remove the directory
+- All output is streamed back in real time
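The remote part of the sequence above (clone, run, clean up) could look roughly like the command built below. This is a sketch under assumptions, not the contents of the real `e2e.sh` template: the function name `build_remote_command` and the checkout directory `capmvm-under-test` are hypothetical, and passing the hosts via `FLINTLOCK_HOSTS` to `make e2e` is borrowed from the local instructions in the root README.

```shell
# Sketch only: prints a one-line command that clones CAPMVM at a given ref,
# runs the e2e tests with the flintlock hosts, then removes the checkout.
# Arguments are assumed to contain no shell metacharacters (no escaping is done).
# Usage: build_remote_command <repo-url> <branch-or-tag> <flintlock-hosts>
build_remote_command() {
  local repo="$1" ref="$2" hosts="$3"
  local dir="capmvm-under-test" # hypothetical directory name
  # The subshell keeps the 'cd' local, so cleanup runs from the starting
  # directory; 'rc' preserves the clone/test exit status across the cleanup.
  printf 'git clone --depth 1 --branch %s %s %s && (cd %s && FLINTLOCK_HOSTS=%s make e2e); rc=$?; rm -rf %s; exit $rc\n' \
    "$ref" "$repo" "$dir" "$dir" "$hosts" "$dir"
}

# Example (not executed here; MANAGEMENT_IP is a placeholder, while the key
# path and user are the tool's documented defaults):
#   ssh -i keys/lm-ed root@"$MANAGEMENT_IP" \
#     "$(build_remote_command https://github.com/weaveworks-liquidmetal/cluster-api-provider-microvm main 1.2.3.4:9090)"
```

Using a subshell instead of `cd ..` is a deliberate deviation from the listed steps: if the clone fails, nothing outside the starting directory is touched.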
+
+## How to use
+
+The tool is most often called from the root Makefile:
+
+```bash
+make build-e2e # creates the binary
+make e2e # executes the tool
+```
+
+The tool has various flags, none of which need to be set:
+
+```
+Usage of ./cmd/bin/e2e:
+  -address string
+        IP address of host to run SSH command on. (optional)
+  -branch string
+        Branch within CAPMVM repository to clone for tests. (optional)
+  -command string
+        Non-standard command to run on the target machine. (optional)
+  -flintlock-hosts string
+        Comma separated list of flintlock server addresses with ports. (optional)
+  -private-key string
+        Path to file containing private key for connection address. (optional) (default "keys/lm-ed")
+  -repo string
+        URL of non-default CAPMVM repository to clone for tests. (optional)
+  -state-file string
+        Path to terraform state file from which to derive host addresses. (optional) (default "terraform/terraform.tfstate")
+  -user string
+        User to run command as. (optional) (default "root")
+  -version string
+        Version of CAPMVM to test against. (optional)
+```
+
+These can be passed either to the binary directly:
+
+```bash
+./cmd/bin/e2e -repo foo
+```
+
+Or when calling the `make` command (preferred):
+
+```bash
+make e2e E2E_ARGS="-repo foo"
+```
+
+Some flags have an order of precedence:
+- If `-version` is set, `-repo` and `-branch` will be ignored
+- If either `-flintlock-hosts` or `-address` is set, the tool will not look up the
+  required connection/test info from the terraform output.
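The first precedence rule can be illustrated with a few lines of shell. This is only a model of the documented behaviour, not the tool's actual code: the function name `resolve_clone_target` is hypothetical, and the fallbacks (upstream repo when `-repo` is unset, `main` when `-branch` is unset, and `-version` always resolving against upstream) are assumptions.

```shell
# Illustrative model of the -version / -repo / -branch precedence.
# Prints "<repo>@<ref>" for the checkout that would be used.
UPSTREAM_REPO="https://github.com/weaveworks-liquidmetal/cluster-api-provider-microvm"

# Usage: resolve_clone_target <version> <repo> <branch> (pass "" for unset)
resolve_clone_target() {
  local version="$1" repo="$2" branch="$3"
  if [ -n "$version" ]; then
    # -version wins: -repo and -branch are ignored.
    # Assumption: a version is a release tag on the upstream repo.
    printf '%s@%s\n' "$UPSTREAM_REPO" "$version"
    return
  fi
  # Assumption: unset values fall back to upstream and "main".
  printf '%s@%s\n' "${repo:-$UPSTREAM_REPO}" "${branch:-main}"
}

# Examples:
#   resolve_clone_target v0.1.0 https://example.com/fork e2e   # repo/branch ignored
#   resolve_clone_target "" https://example.com/fork e2e       # fork at branch e2e
#   resolve_clone_target "" "" e2e                             # upstream at branch e2e
```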
+
+[capmvm-e2e]: https://github.com/weaveworks-liquidmetal/cluster-api-provider-microvm/test/e2e