Release 2021 05 10#453

Merged
arianvp merged 76 commits into master from release_2021-05-10 on May 10, 2021

@arianvp (Contributor) commented May 10, 2021

2021-05-10

Features

  • Airgap installer is available. See [./offline/docs.md] for rudimentary
    instructions. We will integrate this into https://docs.wire.com/ over time.
  • Switched to nix+direnv for installing all the required dependencies for wire-server-deploy. If you do not want to use these tools you can use the quay.io/wire/wire-server-deploy container image and mount wire-server-deploy into it.
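
For readers who skip Nix and direnv, the container route could look roughly like the sketch below. The mount target `/wire-server-deploy`, the working directory and the untagged image reference are assumptions for illustration; the release notes only name the image. `DRY_RUN` is a hypothetical switch of this sketch and defaults to printing the command instead of running it.

```shell
#!/bin/sh
# Sketch only: mount target, working directory and the untagged image
# reference are assumptions; the release notes only name the image.
set -eu

IMAGE="${IMAGE:-quay.io/wire/wire-server-deploy}"
CHECKOUT="${CHECKOUT:-$PWD}"   # local wire-server-deploy checkout
DRY_RUN="${DRY_RUN:-1}"        # hypothetical switch: 1 prints, 0 executes

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start the tooling container with the checkout mounted into it.
enter_deploy_container() {
    run docker run --rm -it \
        -v "$CHECKOUT:/wire-server-deploy" \
        -w /wire-server-deploy \
        "$IMAGE"
}

enter_deploy_container
```

Whether the image drops into a shell by default is not stated in the notes, so a command may need to be appended after the image name.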

Versions

  • wire version 2.106.0 when using the offline installer. However, airgap
    bundles for charts might be moved to the wire-server repository in the future
    to decouple wire-server releases from the base platform.
  • kubespray 2.15.0 (kubernetes 1.19.7)
  • ansible-restund v0.2.6 (restund version v0.4.16b1.0.53)
  • ansible-minio v2.1.0
  • ansible-cassandra version v0.1.3
  • ansible-elasticsearch 6.6.0

Breaking changes

  • Nix and direnv are used for installing all required tooling.

  • Charts have been moved to wire-server. The chart lifecycle is now tied to
    wire-server and is decoupled from the underlying platform. Charts in wire-server
    should be installed with helm 3.

  • Our kubespray reference implementation has been bumped to kubespray 2.15.0
    and kubernetes 1.19.7. This allows us to use Kubespray's support for offline deployments
    and new Kubernetes API features.

    If you were using our reference playbooks for setting up kubernetes, there is
    no direct upgrade path. Instead you should set up a new cluster, migrate the
    deployments there, and then point to the new cluster. This is rather easy at
    the moment, as we only run stateless services in Kubernetes.

  • Restund role was bumped and uses docker instead of rkt now.
    We advise bringing up a fresh restund server, so that rkt is not installed.
    See wireapp/ansible-restund@4db0bc0

    If you want to re-use your existing server we recommend:

    1. SSH into your restund server.
    2. Run systemctl stop restund.service
    3. Back on your own machine, run the restund.yml playbook.
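
The three steps for re-using an existing restund server can be sketched as a small script. `RESTUND_HOST` and `INVENTORY` are hypothetical placeholders, and `DRY_RUN` is a switch of this sketch that defaults to printing the commands rather than executing them. Stopping the service may need root privileges on the server, which the notes do not spell out.

```shell
#!/bin/sh
# Sketch only: RESTUND_HOST and INVENTORY are hypothetical placeholders.
set -eu

RESTUND_HOST="${RESTUND_HOST:-restund01.example.com}"
INVENTORY="${INVENTORY:-hosts.ini}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 prints, 0 executes

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_restund_in_place() {
    # Steps 1 and 2: stop the old rkt-based service on the server itself.
    run ssh "$RESTUND_HOST" systemctl stop restund.service
    # Step 3: from the operator machine, apply the bumped role.
    run ansible-playbook -i "$INVENTORY" restund.yml
}

upgrade_restund_in_place
```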

jschaul and others added 30 commits January 11, 2021 17:23
…modules for ansible dependencies (#404)

* remove poetry, use Nix to provide the ansible we need

Also, set NIX_PATH when entering via direnv, so nix-shell does the right
thing when in there.

Move to ansible 2.9

If we want to bump kubespray to the latest release, we need a newer
ansible, as 2.7 is not supported anymore.

nix: use pkgs.ansible from nixpkgs, and python3 instead of python37

Dockerfile: Include python for localhost python interpreter

Just like in hegemony

pythonForAnsible: move into overlay.nix

Dockerfile: stop creating the symlink from pythonForAnsible to /usr/bin/python

pythonForAnsible is already part of `env`.

* remove download_cli_binaries, provide kubectl and helm with nix

* Use git submodules to provide kubespray

We invented our own ansible playbook, just to clone a git repo, because
`ansible-galaxy` didn't work out here.

Let's use git submodules to clone this git repo at a specific commit.

* migrate remaining external roles from ansible-galaxy to git submodules.

stop .gitignore-ing roles-external, add transitive dependencies

This is super scary. Apparently, ansible roles can depend on other
roles, and ansible-galaxy tries to resolve them. However, it doesn't
ship any lockfile, meaning `ansible-galaxy` downloads might suddenly
break if external roles didn't pin their own dependencies.

Thankfully we don't use it anymore. This adds the remaining external
roles as git submodules, too.
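
Pinning kubespray or an external role through a submodule can be sketched as follows. The kubespray URL is the upstream repository; the path `roles-external/kubespray` and the `<commit-or-tag>` placeholder are illustrative and not taken from this commit. `DRY_RUN` is a hypothetical switch of this sketch that defaults to printing the git commands.

```shell
#!/bin/sh
# Sketch only: destination path and pinned ref are placeholders.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 prints, 0 executes

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Add a repository as a submodule and pin it to one exact ref.
pin_submodule() {
    url="$1"; dest="$2"; ref="$3"
    run git submodule add "$url" "$dest"
    run git -C "$dest" checkout "$ref"
    # The superproject records the submodule's commit hash, so this
    # commit plays the role of the missing lockfile.
    run git add .gitmodules "$dest"
    run git commit -m "Pin $dest to $ref"
}

pin_submodule https://github.com/kubernetes-sigs/kubespray \
    roles-external/kubespray "<commit-or-tag>"
```

After a fresh clone, `git submodule update --init --recursive` checks out exactly the recorded commits.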

* ansible/Makefile: remove download-* targets

No more ansible-galaxy required.

* github-actions: Build nix environment

This makes sure the nix environment works, and that from-source
dependencies are cached at https://wire-server.cachix.org

* reintroduce removed comment

Co-authored-by: Florian Klink <flokli@flokli.de>
* ansible: move to kubespray v2.14.2

This updates us to kubernetes v1.18.2

We haven't updated kubernetes in a while, and we want new offline
deployments to use a recent version. We want to keep up cadence.

There is no officially supported migration path from our previous
version of kubespray to this one.

Due to the stateless nature of kubernetes, we recommend setting up a new
cluster with this version and then redeploying the stateless workloads.

Kubespray itself only supports 3 kubernetes versions; so with this
checkout it is not possible to update from t

* Add changelog
Upgrade required terraform version from 0.13.1 to 0.13.6
Co-authored-by: Lucendio <gregor.jahn@wire.com>
* bump minio

* [ansible:minio] Fully adapted ansible-minio role integration

A leftover from integrating the new role version

Co-authored-by: Lucendio <gregor.jahn@wire.com>
NOTE: when the previous version of hetzner-kubernetes has been used, part of the
TF state must be moved/migrated

dns module:
* removed var 'inject_addition_subtree' and instead indicating adding a sub-tree
  by whether 'domain' is defined or not
* default to an empty 'subdomains' instead of a pre-defined & opinionated list

instantiate DNS module:
* subdomains must explicitly be defined in environment configuration
…#411)

These two are some leftovers and should help to prevent some side-effects. The
inventory change reflects what Kubespray actually expects. And adding the condition
for flushing the SRV records prevents exactly that when using the bootstrap.yaml
playbook (like it is in the first localhost play at the top).
[helm] Introduce glue for the less copy&pasta approach

* added Makefile and docs
* requires a `helmfile.yaml` in ${ENV_DIR}
* add Helmfile as nix dependency
* fix local ansible python

* Render onto will thames...

* Fix error: with_dict expects a dict

* Actual fix for with_dict
jschaul and others added 21 commits March 2, 2021 16:45
#428)

* update bin/secrets.sh to aid in creating fresh environments with fresh secrets

* comments about usage

* Add an ansible inventory file as output, too

* try local zauth; fall back to docker zauth nicely
We now use Ansible v2.8 (#404) but also moved the variable out of the generated
inventory (#418). This change re-introduces the setting on a global level. Environments
that rely on an older version of Ansible, as well as those running on older systems, still
have to set it in the respective inventory.
This causes helmfile to be rebuilt the first time.
at the moment CD skips fakehost plays, because it's not in the inventory;
defining some stub is not a nice solution, if the default (implicit)
works just fine. I guess we will find out shortly.

* this might only work properly with Nix when using Ansible >= v2.9
  but that is just a guess atm. We may need to go back to Ansible 2.7
  here
* partially reverting #415; current guess is that the error "boto
  required for this module" mentioned in the PR came from running
  it locally
* Fix ansible module dependencies

What we were doing before was way too complicated. Ansible itself
doesn't have any dependency on boto to function. The _target host_
needs to have boto installed.

By default the implicit localhost sets ansible_python_interpreter to
ansible_playbook_python. This used to work but stopped working. (Or
maybe it never worked?) The python that ansible runs the playbooks as
can't find boto. I don't know what changed. Maybe how nix assembles
python packages has changed.

Instead, we configure ansible_python_interpreter to point to the
environment we built with nix explicitly; which contains a python that
has boto installed.

This way boto can be found and the playbooks should succeed!

We could use the same simplification in hegemony I think.

* Use lookup('env')

This resolves the complete path.  Makes debugging a bit easier. And
without this it didn't work on Gregor's machine (Reason still unclear to
me)
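
When chasing the "boto required for this module" error, it can help to ask the interpreter directly. The sketch below assumes the interpreter path is exported as `LOCALHOST_PYTHON` (a name that appears later in this PR) and that this is the value the `lookup('env', ...)` call resolves; that wiring is an assumption, as is the `can_import` helper.

```shell
#!/bin/sh
# Sketch only: assumes the interpreter path is exported as LOCALHOST_PYTHON;
# how that maps onto ansible_python_interpreter is an assumption.
set -eu

# Succeeds if the given Python interpreter can import the given module.
can_import() {
    interpreter="$1"; module="$2"
    [ -x "$interpreter" ] && "$interpreter" -c "import $module" >/dev/null 2>&1
}

# Fall back to whatever python3 is on PATH when the variable is unset.
PYTHON="${LOCALHOST_PYTHON:-$(command -v python3 || true)}"

if can_import "$PYTHON" boto; then
    echo "boto is importable from $PYTHON"
else
    echo "boto is NOT importable from ${PYTHON:-<no interpreter found>}"
fi
```

Running this inside and outside the nix environment shows quickly whether the interpreter Ansible is pointed at is the one that ships boto.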
Please note that this is meant to only exist temporarily until we have a
solid release cycle for the platform in place. Regardless, overriding the
Kubespray version, defined by the submodule pinning, from within ENV_DIR
should be avoided at all costs.
Otherwise we might run into issues with its subsequent dependency.
See https://github.com/cloudalchemy/ansible-node-exporter#warning
1. Remove duplicate submodule
2. Remove the second kubectl role too
  * since it's not used anymore according to @arianvp
3. Remove helm role
  * since it's not used anymore according to @arianvp
To prevent duplication, simplify maintainability and prevent version drift, this
change set aims to merge existing Makefiles into one located at the root of the
code base.

The resulting Makefile introduces a concept of target dependencies (see check-*),
which are certain files or folders put in place by a previous step (e.g. Terraform
generates an inventory consumed by Ansible targets). Thus, running `make decrypt`
is still required, but the check-*-inputs

* adds asserts to log extraction playbook to get rid of native make conditions; set
  local default location to ENV_DIR
* the necessity of setting ENV or ENV_DIR did not change; it's just that the check
  has been moved to the top of the Makefile. This way make fails as early as possible
* targets abstracting Terraform invocations now require the '.terraform' folder to
  exist, which implies `terraform init` to be invoked upfront, if '.terraform/' is
  not already there. Also, `make init` was renamed to `make re-init` due to the
  described implicit behaviour.

NOTE: the existing Makefiles will be removed in a follow-up PR
* bump elasticsearch role

Version didn't work with ansible 2.9

* ansible: Restructure example inventory files to follow ansible documented directory structure

This is needed because the documented way to customize kubespray is
through group_vars. And this allows people to do that.

We use this for an example inventory file for offline

* bin/offline: Remove

Scripts for making a collection of offline charts will be moved to
wire-server in a follow-up PR, so having this code here does not make
sense anymore

* mirror apt repositories

Adds scripts that mirror the required Debian packages for kubespray
and all our other playbooks

* offline: Build and upload docker image with offline environment

* Download all the requirements for offline kubespray and all other
ansible playbooks

Next step is pointing ansible to the offline artifact

* Offline kubespray and ansible

Set up variables such that all downloaded sources are fetched from
the assets tarball. This requires an assethost to host those assets

An example inventory file is included to showcase an offline setup

* Set turn secret to bogus placeholder value

This is just so that `helm template` succeeds with the example
values.yaml :)

* Mirror helm charts and their container dependencies

* rename default inventory file to 99-static

This means it gets picked up if you specify an inventory as a
_directory_ instead of as a file
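
The effect of the rename can be tried with a throwaway inventory directory. The layout follows the Ansible convention of keeping `group_vars` next to the inventory sources; the host entry, the address and the variable are made up for illustration.

```shell
#!/bin/sh
# Sketch only: host name, address and variable are hypothetical.
set -eu

INVENTORY_DIR="$(mktemp -d)/inventory"
mkdir -p "$INVENTORY_DIR/group_vars/all"

# Static inventory source; the numeric prefix controls where it sorts
# among other sources in the same directory.
cat > "$INVENTORY_DIR/99-static" <<'EOF'
[all]
node1 ansible_host=192.0.2.10
EOF

# Variables for every host live next to the inventory sources.
echo 'example_setting: true' > "$INVENTORY_DIR/group_vars/all/example.yml"

# Show the resulting layout, relative to the inventory directory.
(cd "$INVENTORY_DIR" && find . -type f | sort)

# Command to point Ansible at the directory instead of a single file.
echo "ansible-inventory -i $INVENTORY_DIR --list"
```

Passing the directory to `-i` lets Ansible read every inventory source inside it, so `99-static` no longer has to be named explicitly.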

* Some fixes

* Fix offline/ci.sh script

helm containers had wrong directory structure; also tarball wouldn't
build

* Add note to inventory about default ip address

* cassandra: Remove hostname role from cassandra

No need to override hostnames (as far as I know)

* cassandra: Do not install ntpd in offline

We don't need it.

I actually question if we need it at all?

we're not using the ntpd server functionality; only the client
functionality, but Ubuntu 18.04 ships with an NTP client by default
https://ubuntu.com/server/docs/network-ntp

But this is something for another time. Let's just conditionally disable
it

* cassandra: Use JRE; not JDK

We do not need the java compiler to run cassandra (I hope?)

* cassandra: Force AWS autodetection off

* Copy example values into offline artifact

* Enable team-settings and account-pages

* Add secrets script to generate secrets for offline

This needs to be re-integrated with secrets.sh at some point, but I
couldn't bother with the rebase for now.

* restund: Remove hostname and ansible-role-ntp roles

We have no need to modify the hostname for restund servers or to have
ntpd installed

* restund: Disable TLS turn for now

* restund: remove vars_prompt for restund_zrest_secret

This value is auto-generated now and known

* Get rid of intermediate assets directory

This allows you to run `offline/ci.sh` and then the root directory is
ready for offline deploy. Useful for interactive development

* Adjust github pipeline

Compression of artifact has been moved into offline/ci.sh script

* Add docker alias for offline environment

People should run `source ./bin/offline-env.sh`

Will add this to the docs

* move offline-cluster to bin folder

* Add wrapper script for easy remote call of install instructions

Will be used in CD

* Add offline-helm script for Continuous delivery

We'll use this to test out if all the offline helm artifacts indeed
install

* Fix helm_external playbook

It didn't pick up the network interfaces unless they were in [all:vars]
which was a bit awkward.

* Generate secrets and deploy helm charts

* Add continuous delivery for testing offline

* Use tarball instead of gzipped tarball

Tradeoff between bandwidth and compression. And we're compressing the
entire assets tarball at the end anyway

* Add sftd to offline package

* ci.sh: Remove inbetween artifacts

Github Actions otherwise doesn't have enough disk space!

We keep the possibility for incrementality locally; as that's useful

* terraform: Block all traffic but DNS and NTP

This allows us to simulate an offline environment where NTP is still
available.

Cassandra needs servers to be relatively in sync. However how that is
done is out of scope for us at the moment so we assume some external NTP
server.

In the future we could perhaps set up our own NTP servers in offline
as what we care about is only _relative_ clock difference; not absolute

* Use ssh-agent so that we do not have to put the private key in the
cloud-init config

* Some fixes to the offline deploy scripts

* Fix offline-env alias

* Mark terraform outputs sensitive

* Disable CD for now

Github Actions is too slow. We run this manually on Hetzner for now.

* Add docs for offline deploy

* Update .gitignore

Co-authored-by: Lucendio <gregor.jahn@wire.com>

* Add a note about the recently introduced firewall hardening to the docs

...and how to mitigate that

* ci.sh: Pin wire-server version

In the future we probably wanna move the helm chart scripts to the
wire-server repo and automatically bundle an offline deploy of the helm
charts for each wire-server release

* HACK: free up some disk space in github actions

* Update restund pin

We updated the submodule; so new container

Co-authored-by: Lucendio <gregor.jahn@wire.com>

* Set LOCALHOST_PYTHON in nix-based container

* Re-enable CD; but with cleanup handled by Github

The cleanup action will always be run; even if the job is cancelled.
This makes sure there are no dangling resources.

However we should probably just use remote state instead; to make sure
to handle all edge cases.

* Make github actions not idle the connection

* Update changelog and add support for pushing tagged releases

Also upload docker container for people who do not use airgap install

* [skip ci] Fix inventory path

* Disable pretty tags for now. There's a bug I don't want to debug at this
point

We can do the release tagging bits in a follow-up PR.

* Fix container upload

Co-authored-by: Jun Matsushita <jun@iilab.org>
Co-authored-by: Lucendio <gregor.jahn@wire.com>
To prevent nginx from crashing due to a type conversion issue.
See wireapp/ansible-sft#32
* Update wire-server version to 2.106.0

Also fix some scripts
@arianvp force-pushed the release_2021-05-10 branch from 50c40a6 to f971e55 May 10, 2021 08:09
@arianvp requested a review from lucendio May 10, 2021 08:09
@arianvp force-pushed the release_2021-05-10 branch from f971e55 to c941c89 May 10, 2021 09:43

@lucendio (Contributor) left a comment

You might want to adjust the release notes according to the following comments:

  • mention Ansible version?
  • swapped docker and rkt
  • typo: brining

@arianvp merged commit df43a17 into master May 10, 2021
@arianvp deleted the release_2021-05-10 branch May 10, 2021 11:29