Merged
Helm charts were added to that git repository in wireapp/wire-server#1293 Related ticket: https://wearezeta.atlassian.net/browse/SQPIT-124
…r or are obsolete, update README (#402)
…modules for ansible dependencies (#404)
* remove poetry, use Nix to provide the ansible we need. Also, set NIX_PATH when entering via direnv, so nix-shell does the right thing when in there. Move to ansible 2.9: if we want to bump kubespray to the latest release, we need a newer ansible, as 2.7 is not supported anymore. nix: use pkgs.ansible from nixpkgs, and python3 instead of python37. Dockerfile: include python for the localhost python interpreter, just like in hegemony. pythonForAnsible: move into overlay.nix. Dockerfile: stop creating the symlink from pythonForAnsible to /usr/bin/python; pythonForAnsible is already part of `env`.
* remove download_cli_binaries, provide kubectl and helm with nix
* Use git submodules to provide kubespray. We invented our own ansible playbook just to clone a git repo, because `ansible-galaxy` didn't work out here. Let's use git submodules to clone this git repo at a specific commit.
* migrate remaining external roles from ansible-galaxy to git submodules; stop .gitignore-ing roles-external, add transitive dependencies. This is super scary: apparently, ansible roles can depend on other roles, and ansible-galaxy tries to resolve them. However, it doesn't ship any lockfile, meaning `ansible-galaxy` downloads might suddenly break if the external roles didn't pin their dependencies. Thankfully we don't use it anymore. This adds the remaining external roles as git submodules, too.
* ansible/Makefile: remove download-* targets. No more ansible-galaxy required.
* github-actions: build the nix environment. This makes sure the nix environment works, and that from-source dependencies are cached at https://wire-server.cachix.org
* reintroduce removed comment

Co-authored-by: Florian Klink <flokli@flokli.de>
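The submodule approach described above can be sketched as follows. All names here are placeholders, and a throwaway local "upstream" repository stands in for the real remote (e.g. kubespray) so the snippet runs offline:

```shell
set -eu
tmp=$(mktemp -d)

# Stand-in for an external role repository (e.g. kubespray).
git init -q "$tmp/upstream"
git -C "$tmp/upstream" -c user.email=ci@example.com -c user.name=ci \
  commit -q --allow-empty -m "initial commit"

# The deployment repository that pins the role as a submodule
# (newer git needs protocol.file.allow for local-path submodules).
git init -q "$tmp/deploy"
git -C "$tmp/deploy" -c protocol.file.allow=always \
  submodule --quiet add "$tmp/upstream" roles-external/example-role

# Pin the submodule checkout to an exact commit; the pinned SHA is
# what .gitmodules plus the index record for reproducible clones.
pin=$(git -C "$tmp/upstream" rev-parse HEAD)
git -C "$tmp/deploy/roles-external/example-role" checkout -q "$pin"
```

Unlike `ansible-galaxy`, the commit recorded in the superproject acts as a lockfile: everyone cloning with `--recurse-submodules` gets exactly the pinned revision.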
* ansible: move to kubespray v2.14.2. This updates us to kubernetes v1.18.2. We haven't updated kubernetes in a while, and we want new offline deployments to use a recent version. We want to keep up cadence. There is no officially supported migration path from our previous version of kubespray to this one. Due to the stateless nature of kubernetes, we recommend setting up a new cluster with this version and then redeploying the stateless workloads. Kubespray itself only supports 3 kubernetes versions; so with this checkout it is not possible to update from t
* Add changelog
niv update
Upgrade required terraform version from 0.13.1 to 0.13.6
Co-authored-by: Lucendio <gregor.jahn@wire.com>
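The terraform 0.13.1 → 0.13.6 bump above would typically be expressed as a `required_version` constraint; the file name and constraint style below are assumptions, not taken from the repository:

```hcl
# e.g. versions.tf (location assumed)
terraform {
  required_version = "~> 0.13.6"  # previously pinned to 0.13.1
}
```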
* bump minio
* [ansible:minio] Fully adapted ansible-minio role integration. A leftover from integrating the new role version.

Co-authored-by: Lucendio <gregor.jahn@wire.com>
NOTE: when the previous version of hetzner-kubernetes has been used, part of the TF state must be moved/migrated.

dns module:
* removed var 'inject_addition_subtree'; adding a sub-tree is now indicated by whether 'domain' is defined or not
* default to an empty 'subdomains' list instead of a pre-defined & opinionated one

instantiate DNS module:
* subdomains must explicitly be defined in the environment configuration
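The new calling convention for the dns module could look roughly like this; the module source path, domain, and subdomain values are illustrative placeholders, not the repository's actual configuration:

```hcl
module "dns" {
  source = "./modules/hetzner-dns"  # assumed path

  # Defining 'domain' now implies adding the sub-tree
  # (this replaces the removed 'inject_addition_subtree' variable).
  domain = "staging"

  # No opinionated default anymore; subdomains must be listed explicitly.
  subdomains = ["webapp", "nginz-https", "nginz-ssl"]
}
```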
…#411) These two are some leftovers and should help to prevent some side effects. The inventory change reflects what Kubespray actually expects. And adding the condition for flushing the SRV records prevents exactly that when using the bootstrap.yaml playbook (as in the first localhost play at the top).
[helm] Introduce glue for the less copy&pasta approach
* added Makefile and docs
* requires a `helmfile.yaml` in ${ENV_DIR}
* add Helmfile as nix dependency
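For illustration, a minimal `${ENV_DIR}/helmfile.yaml` could look like the following; the release name, chart repository URL, and values path are assumptions, not the repository's actual layout:

```yaml
repositories:
  - name: wire
    url: https://s3-eu-west-1.amazonaws.com/public.wire.com/charts  # assumed repo URL

releases:
  - name: wire-server          # hypothetical release
    namespace: wire
    chart: wire/wire-server
    values:
      - values/wire-server/values.yaml  # assumed path inside ENV_DIR
```

With such a file in place, `helmfile sync` run from ENV_DIR applies all releases at once, which is the "less copy&pasta" glue the Makefile targets wrap.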
* fix local ansible python
* Render onto will thames...
* Fix error: with_dict expects a dict
* Actual fix for with_dict
#428)
* update bin/secrets.sh to aid in creating fresh environments with fresh secrets
* comments about usage
* Add an ansible inventory file as output, too
* try local zauth; fall back to docker zauth nicely
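The "try local zauth; fall back to docker" behaviour can be sketched like this; the container image name is an assumption, and the real bin/secrets.sh may differ:

```shell
# Prefer a zauth binary already on $PATH; otherwise fall back to
# running it via a container image (image name is a placeholder).
zauth_cmd() {
  if command -v zauth >/dev/null 2>&1; then
    echo "zauth"
  else
    echo "docker run --rm quay.io/wire/zauth"
  fi
}
```

A script can then invoke `$(zauth_cmd) -m gen-keypair ...` without caring which variant is available on the machine generating the secrets.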
Add helm plugins
This causes helmfile to be rebuilt the first time.
At the moment CD skips fakehost plays, because it's not in the inventory; defining some stub is not a nice solution if the default (implicit) works just fine. I guess we will find out shortly.
* this might only work properly with Nix when using Ansible >= v2.9, but that is just a guess atm. We may need to go back to Ansible 2.7 here
* partially reverting #415; the current guess is that the error "boto required for this module" mentioned in the PR came from running it locally
* Fix ansible module dependencies
What we were doing before was way too complicated. Ansible itself
doesn't have any dependency on boto to function. The _target host_
needs to have boto installed.
By default the implicit localhost sets ansible_python_interpreter to
ansible_playbook_python. This used to work but stopped working. (Or
maybe it never worked?) The python that ansible runs the playbooks as
can't find boto. I don't know what changed. Maybe how nix assembles
python packages has changed.
Instead, we configure ansible_python_interpreter to point to the
environment we built with nix explicitly; which contains a python that
has boto installed.
This way boto can be found and the playbooks should succeed!
We could use the same simplification in hegemony I think.
* Use lookup('env')
This resolves the complete path, which makes debugging a bit easier. And
without this it didn't work on Gregor's machine (reason still unclear to
me).
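Concretely, the fix described above amounts to a group_vars entry along these lines; the environment variable name (LOCALHOST_PYTHON, which later commits mention) and the file location are assumptions:

```yaml
# group_vars/all.yml (location assumed)
# Point the implicit localhost at the nix-built python environment that
# has boto installed, resolving the full path from the environment
# instead of relying on ansible_playbook_python.
ansible_python_interpreter: "{{ lookup('env', 'LOCALHOST_PYTHON') }}"
```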
Please note that this is meant to only exist temporarily until we have a solid release cycle for the platform in place. Regardless, overriding the Kubespray version, defined by the submodule pinning, from within ENV_DIR should be avoided at all costs.
Otherwise we might run into issues with its subsequent dependency. See https://github.com/cloudalchemy/ansible-node-exporter#warning
To prevent duplication, improve maintainability, and prevent version drift, this change set merges the existing Makefiles into one located at the root of the code base. The resulting Makefile introduces a concept of target dependencies (see check-*), which are certain files or folders put in place by a previous step (e.g. Terraform generates an inventory consumed by Ansible targets). Thus, running `make decrypt` is still required, but the check-*-inputs
* adds asserts to the log extraction playbook to get rid of native make conditions; set local default location to ENV_DIR
* the necessity of setting ENV or ENV_DIR did not change; it's just that the check has been moved to the top of the Makefile. This way make fails as early as possible
* targets abstracting Terraform invocations now require the '.terraform' folder to exist, which implies `terraform init` has to be invoked upfront if '.terraform/' is not already there. Also, `make init` was renamed to `make re-init` due to the described implicit behaviour.

NOTE: the existing Makefiles will be removed in a follow-up PR
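The target-dependency idea can be illustrated with a small Makefile sketch; the file names below follow the description above (a Terraform-generated inventory, a `.terraform` folder), while everything else is an assumption (recipe lines must be tab-indented):

```makefile
# Fail as early as possible if neither ENV nor ENV_DIR is set.
ifndef ENV_DIR
ifndef ENV
$(error "Please define either ENV or ENV_DIR")
endif
ENV_DIR := $(CURDIR)/environments/$(ENV)
endif

# check-* targets assert that a previous step left its outputs in place.
.PHONY: check-inventory check-terraform bootstrap

check-inventory:
	test -f $(ENV_DIR)/gen/terraform-inventory.yml  # produced by a terraform target

check-terraform:
	test -d $(ENV_DIR)/.terraform  # implies `terraform init` ran upfront

bootstrap: check-inventory
	ansible-playbook -i $(ENV_DIR)/gen/terraform-inventory.yml bootstrap.yml
```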
* bump elasticsearch role. Version didn't work with ansible 2.9
* ansible: Restructure example inventory files to follow ansible's documented directory structure. This is needed because the documented way to customize kubespray is through group_vars, and this allows people to do that. We use this for an example inventory file for offline
* bin/offline: Remove scripts for making a collection of offline charts. Will be moved to wire-server in a follow-up PR, so having this code here does not make sense anymore
* mirror apt repositories. Adds scripts that mirror the required debian packages for kubespray and all our other playbooks
* offline: Build and upload docker image with offline environment
* Download all the requirements for offline kubespray and all other ansible playbooks. Next step is pointing ansible to the offline artifact
* Offline kubespray and ansible. Set up variables such that all downloaded sources are fetched from the assets tarball. This requires an assethost to host those assets. An example inventory file is included to showcase an offline setup
* Set turn secret to bogus placeholder value. This is just so that `helm template` succeeds with the example values.yaml :)
* Mirror helm charts and their container dependencies
* rename default inventory file to 99-static. This means it gets picked up if you specify an inventory as a _directory_ instead of as a file
* Some fixes
* Fix offline/ci.sh script. helm containers had the wrong directory structure; also the tarball wouldn't build
* Add note to inventory about default ip address
* cassandra: Remove hostname role from cassandra. No need to override hostnames (as far as I know)
* cassandra: Do not install ntpd in offline. We don't need it. I actually question if we need it at all? We're not using the ntpd server functionality, only the client functionality, and Ubuntu 18.04 ships with an NTP client by default: https://ubuntu.com/server/docs/network-ntp But this is something for another time. Let's just conditionally disable it
* cassandra: Use JRE, not JDK. We do not need the java compiler to run cassandra (I hope?)
* cassandra: Force AWS autodetection off
* Copy example values into offline artifact
* Enable team-settings and account-pages
* Add secrets script to generate secrets for offline. This needs to be re-integrated with secrets.sh at some point, but I couldn't bother with the rebase for now.
* restund: Remove hostname and ansible-role-ntp roles. We have no need to modify the hostname for restund servers or to have ntpd installed
* restund: Disable TLS turn for now
* restund: remove vars_prompt for restund_zrest_secret. This value is auto-generated now and known
* Get rid of intermediate assets directory. This allows you to run `offline/ci.sh` and then the root directory is ready for offline deploy. Useful for interactive development
* Adjust github pipeline. Compression of the artifact has been moved into the offline/ci.sh script
* Add docker alias for offline environment. People should run `source ./bin/offline-env.sh`. Will add this to the docs
* move offline-cluster to bin folder
* Add wrapper script for easy remote call of install instructions. Will be used in CD
* Add offline-helm script for continuous delivery. We'll use this to test whether all the offline helm artifacts indeed install
* Fix helm_external playbook. It didn't pick up the network interfaces unless they were in [all:vars], which was a bit awkward.
* Generate secrets and deploy helm charts
* Add continuous delivery for testing offline
* Use tarball instead of gzipped tarball. Tradeoff between bandwidth and compression. And we're compressing the entire assets tarball at the end anyway
* Add sftd to offline package
* ci.sh: Remove in-between artifacts. Github Actions otherwise doesn't have enough disk space! We keep the possibility for incrementality locally, as that's useful
* terraform: Block all traffic but DNS and NTP. This allows us to simulate an offline environment where NTP is still available. Cassandra needs servers to be relatively in sync. However, how that is done is out of scope for us at the moment, so we assume some external NTP server. In the future we could perhaps set up our own NTP servers in offline, as what we care about is only the _relative_ clock difference, not the absolute one
* Use ssh-agent so that we do not have to put the private key in the cloud-init config
* Some fixes to the offline deploy scripts
* Fix offline-env alias
* Mark terraform outputs sensitive
* Disable CD for now. Github Actions is too slow. We run this manually on Hetzner for now.
* Add docs for offline deploy
* Update .gitignore
* Add a note about the recently introduced firewall hardening to the docs, and how to mitigate that
* ci.sh: Pin wire-server version. In the future we probably wanna move the helm chart scripts to the wire-server repo and automatically bundle an offline deploy of the helm charts for each wire-server release
* HACK: free up some disk space in github actions
* Update restund pin. We updated the submodule, so new container
* Set LOCALHOST_PYTHON in nix-based container
* Re-enable CD, but with cleanup handled by Github. The cleanup action will always be run, even if the job is cancelled. This makes sure there are no dangling resources. However we should probably just use remote state instead, to make sure to handle all edge cases.
* Make github actions not idle the connection
* Update changelog and add support for pushing tagged releases. Also upload a docker container for people who do not use the airgap install
* [skip ci] Fix inventory path
* Disable pretty tags for now. There's a bug I don't want to debug at this point. We can do the release tagging bits in a follow-up PR.
* Fix container upload

Co-authored-by: Jun Matsushita <jun@iilab.org>
Co-authored-by: Lucendio <gregor.jahn@wire.com>
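The `bin/offline-env.sh` alias mentioned above presumably boils down to wrapping `docker run`; here is a hedged sketch where the image tag, mount point, and function name are all assumptions:

```shell
# Wrapper that runs tooling inside the offline deploy container,
# mounting the current checkout. Image tag is a placeholder.
WSD_CONTAINER="${WSD_CONTAINER:-quay.io/wire/wire-server-deploy:latest}"

wsd() {
  docker run --rm -it \
    -v "$PWD:/wire-server-deploy" \
    -w /wire-server-deploy \
    "$WSD_CONTAINER" "$@"
}
```

After `source`-ing such a file, commands like `wsd ansible-playbook ...` run with all tooling pinned by the container, independent of what is installed on the host.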
To prevent nginx from crashing due to a type conversion issue. See wireapp/ansible-sft#32
Not needed anymore
* Update wire-server version to 2.106.0 Also fix some scripts
lucendio (Contributor) approved these changes on May 10, 2021 and left a comment:
You might want to adjust the release notes according to the following comments:
- mention Ansible version?
- `docker` and `rkt` are swapped
- typo: "brining"
2021-05-10
Features
instructions. We will integrate this into https://docs.wire.com/ over time
quay.io/wire/wire-server-deploy container image and mount wire-server-deploy into it.

Versions
bundles for charts might be moved to wire-server repository in the future; to
decouple wire-server releases from the base platform.
Breaking changes
Nix and direnv are used for installing all required tooling.
charts have been moved to wire-server. Chart lifecycle is now tied to
wire-server instead and is decoupled from the underlying platform. Charts in wire-server
should be installed with helm 3.
Our kubespray reference implementation has been bumped to kubespray 2.15.0
and kubernetes 1.19.7. This allows us to use Kubespray's support for offline deployments
and new Kubernetes API features.
If you were using our reference playbooks for setting up kubernetes, there is
no direct upgrade path. Instead you should set up a new cluster; migrate the
deployments there, and then point to the new cluster. This is rather easy
at the moment, as we only run stateless services in Kubernetes.
Restund role was bumped and uses
`docker` instead of `rkt` now. We advise bringing up a fresh
`restund` server, so that `rkt` is not installed. See wireapp/ansible-restund@4db0bc0
If you want to re-use your existing server we recommend:
- on the `restund` server: `systemctl stop restund.service`
- re-run the `restund.yml` playbook