This repository has been archived by the owner on Nov 3, 2021. It is now read-only.

adapt to k8s environment #76

Merged: 117 commits, Nov 21, 2017
Commits
beb8ef5
check if running on k8s draft
Sep 8, 2017
d7946c9
k8s identification started
Sep 15, 2017
222dab6
echo about pushed image
Sep 15, 2017
e7574e8
file echos
Sep 15, 2017
099e38b
temp delivery.yaml pushing none master branches
Sep 15, 2017
b70736c
temp delivery.yaml pushing none master branches
Sep 15, 2017
39fae30
temp delivery.yaml pushing none master branches
Sep 15, 2017
1dbb2ae
which python3
Sep 18, 2017
ac6c034
check if python3 is installed as part of the bus docker build
Sep 18, 2017
03c96fd
python3 output as part of docker build
Sep 18, 2017
72957d0
debug commands added
Sep 18, 2017
c3b755c
more debugs
Sep 18, 2017
9191157
more debugs
Sep 18, 2017
f3f906a
/details in order not to mess with linux /etc folder
Sep 18, 2017
33c0278
pip3 upgrade pip3
Sep 18, 2017
9e0f668
no pip3 upgrade pip3
Sep 18, 2017
81bcb85
cat /details/labels output
Sep 18, 2017
5f076bf
fewer debug output
Sep 18, 2017
f6207e9
new line after cat output
Sep 18, 2017
e98e22d
debug output changed
Sep 20, 2017
f20cf5a
less output
Sep 20, 2017
d746f76
delivery.yaml update to latest version with pipeline, commands and when
Sep 22, 2017
a79c9a1
fix delivery.yaml
Sep 22, 2017
82215c6
fix delivery.yaml
Sep 22, 2017
27ec835
fix delivery.yaml
Sep 22, 2017
83732e6
fix delivery.yaml
Sep 22, 2017
add6d4c
more detailed steps
Sep 22, 2017
f8fd31a
one step
Sep 22, 2017
73eaf1f
two step
Sep 22, 2017
622e1e2
tow step
Sep 22, 2017
fb21c8a
one step
Sep 22, 2017
d4ff3e1
three steps
Sep 22, 2017
d10f5ce
one step, 3 commands
Sep 22, 2017
ecb630f
cat ghe-backup-secret file
Sep 26, 2017
1df4156
list dir
Sep 26, 2017
120dc82
cat kms key file
Sep 26, 2017
80ee6ce
cat kms key file2
Sep 26, 2017
928166b
readlink and ls for all sym links
Sep 26, 2017
770ea53
debug echos
Sep 26, 2017
4652dda
no readlink
Sep 26, 2017
3442611
sym link read in ghe-backup-secret
Sep 26, 2017
226c9f3
cat sym link read
Sep 26, 2017
d4aa0fb
head instead of cat
Sep 26, 2017
b7aee2a
private key file identified
Sep 26, 2017
545936b
write private ssh key file 1st attempt
Sep 29, 2017
f1b277f
SSHKEY unbound variable fix
Sep 29, 2017
b21e288
get rid of labels
Sep 29, 2017
b5bcccd
adding debug lines
Sep 29, 2017
b2d6806
if elif fi
Oct 6, 2017
c9c5e1d
no head for private key file
Oct 6, 2017
c5a26d3
--cache-from added
Oct 6, 2017
da2853d
fix 19649BASEIMAGE
Oct 6, 2017
6c5fd33
BASE_IMAGE - BASEIMAGE fix
Oct 9, 2017
8fb0e73
BASE_IMAGE - BASEIMAGE fix2
Oct 9, 2017
e8727b2
docker cache for automata as well
Oct 9, 2017
1094755
after cron
Oct 29, 2017
e165d93
change echos
Oct 29, 2017
6bd0ad5
github-master
Oct 30, 2017
8e803d3
screen added to docker image for debugging later purposes in prod (ta…
Oct 30, 2017
5ec9ee8
push to Piero One only from master branch
Oct 30, 2017
ae90398
only one docker file anymore
Oct 30, 2017
9515e27
Pier One namespace and ghe-backup docker image
Oct 30, 2017
341d6b9
Pier One namespace and ghe-backup docker image
Oct 30, 2017
c1f9924
ghe backup kubernetes
Oct 30, 2017
3df69f3
drop echo
Oct 30, 2017
1f4f580
drop comments
Oct 30, 2017
3fbd855
overview drawing with kubernetes
Oct 30, 2017
bfb29d6
default cron and alternative
Oct 30, 2017
972e99c
alternative cron readme adaption
Oct 30, 2017
20dc9b4
apt-get & pip3 versions, no user root
Oct 30, 2017
35791ae
user application with sudo permissions
Oct 30, 2017
4e58bb2
build docker image also for none master branches to test that
Oct 30, 2017
c84dc72
dockerfile refactored
Oct 30, 2017
206969b
python version cleanup
Oct 30, 2017
041a771
python version cleanup
Oct 30, 2017
8ab5439
docker file refactored
Oct 30, 2017
d15c84b
docker file refactored
Oct 30, 2017
cd5bd73
docker file refactored
Oct 30, 2017
a68501d
no cache for now
Oct 30, 2017
d6b09bc
CMD fix
Oct 30, 2017
d98721e
one COPY command
Oct 30, 2017
503d8ba
one more COPY command
Oct 30, 2017
d9d79e1
original copy commands and application user
Oct 30, 2017
4105949
allow su
Oct 30, 2017
1a69e34
fewer RUN and cron w/ application user
Oct 30, 2017
a9cd328
RUN cmd fix
Oct 30, 2017
c6bbfdf
create dir before cloning into
Oct 30, 2017
e65d1b0
RUN cmd fix
Oct 30, 2017
5020460
2 RUN cmds
Oct 30, 2017
6306703
no-install-recommends
Oct 30, 2017
72cf5c6
RUN cmd fix
Oct 30, 2017
5078fc9
RUN cmd fix
Oct 30, 2017
69377a1
drop user root comment
Oct 30, 2017
3a7b2bb
push form feature branch to test cluster test deployment
Nov 1, 2017
2e4cafe
image ghe-backup-kubernetes
Nov 1, 2017
4936d93
cron path: /usr/sbin/cron
Nov 2, 2017
342962b
call final-docker-cmd.sh differently
Nov 2, 2017
f5bccff
chown on fifo, run final command as sudo due to cron
Nov 3, 2017
75212af
comment about sbin and priviledged users
Nov 3, 2017
0ccf9a6
change ssh key creation according to https://github.com/zalando/ghe-b…
Nov 6, 2017
f5f0253
base image upgrade, install bash and cron, versions coming later
Nov 9, 2017
53d0aa8
cron and bash package versions
Nov 9, 2017
524b192
application user is sufficient
Nov 9, 2017
f196a46
change ownership of /data at runtime of not owned by application user
Nov 9, 2017
b555130
no screen as I can't connect to a terminal
Nov 9, 2017
1d0d41c
ssh was not installed, version later
Nov 9, 2017
6e158a0
ssh version
Nov 9, 2017
69b78de
rsync added to docker file, version later, readme updated
Nov 10, 2017
a65f28c
rsync version
Nov 10, 2017
1c9fb0c
replace backtick substitution with command as in https://github.com/…
Nov 13, 2017
df31184
restore section added
Nov 14, 2017
9e60cf7
section title camelcase
Nov 14, 2017
f6a979f
/data folder handling moved to start_backup shell script because on r…
Nov 15, 2017
8e30779
/data folder handling moved to start_backup shell script because on r…
Nov 15, 2017
89e3e4d
restore section refined
Nov 15, 2017
62035f3
fix: /data/ghe-production-data/ could not be created because owner of…
Nov 16, 2017
0b7d95d
deploy only if on master branch
Nov 17, 2017
57 changes: 36 additions & 21 deletions DockerfileAutomata → Dockerfile
@@ -1,23 +1,36 @@
FROM registry.opensource.zalan.do/stups/python:3.5-cd26
FROM registry.opensource.zalan.do/stups/python:3.6.2-14
Review comment (Contributor): latest version of stups/python is 3.6.3-15 published 42h ago

Reply (Contributor Author): indeed lets do this with a new ticket: #78

MAINTAINER lothar.schulz@zalando.de

USER root
# folder structure and user
RUN \
useradd -d /backup -u 998 -o application && \
mkdir -p /data/ghe-production-data/ && mkdir -p /backup/backup-utils/ && \
# read package lists
apt-get update -y && \
apt-get install -y sudo && \
# create application user
useradd -d /backup -u 998 -o -c "application user" application && \
# allow su
echo "application ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/application && \
chmod 0440 /etc/sudoers.d/application && \
# update w/ latest security patches
# install python pip3 pyyaml & english, git, screen
apt-get install -y --no-install-recommends unattended-upgrades python3=3.5.1-3 python3-dev=3.5.1-3 && \
apt-get install -y --no-install-recommends python3-pip=8.1.1-2ubuntu0.4 python3-yaml=3.11-3build1 && \
apt-get install -y --no-install-recommends language-pack-en=1:16.04+20161009 git=1:2.7.4-0ubuntu1.3 && \
apt-get install -y --no-install-recommends ssh=1:7.2p2-4ubuntu2.2 && \
apt-get install -y --no-install-recommends bash=4.3-14ubuntu1.2 && \
apt-get install -y --no-install-recommends rsync=3.1.1-3ubuntu1 && \
apt-get install -y --no-install-recommends cron=3.0pl1-128ubuntu2 && \
# install boto3
pip3 install --upgrade boto==2.48.0 boto3==1.4.7 && \
# clean apt-get lists
rm -rf /var/lib/apt/lists/* && \
# create directories
mkdir -p /backup/backup-utils/ && \
mkdir -p /kms && mkdir -p /var/log/ && mkdir /delete-instuck-backups
WORKDIR /backup

# read package lists
# update w/ latest security patches
# install python pip3 boto3 pyyaml & english & git
# clone backup-utils
RUN \
apt-get update -y && \
apt-get install -y unattended-upgrades python3 python3-dev python3-pip python3-yaml language-pack-en git && \
pip3 install --upgrade boto boto3 && \
rm -rf /var/lib/apt/lists/* && \
# clone backup-utils
git clone -b stable https://github.com/github/backup-utils.git && \
git -C /backup/backup-utils pull

@@ -33,31 +46,33 @@ COPY start_backup.sh /start_backup.sh
COPY python/delete_instuck_progress.py /delete-instuck-backups/delete_instuck_progress.py

# copy cron job
COPY cron-ghe-backup-automata /etc/cron.d/ghe-backup
COPY cron-ghe-backup /etc/cron.d/ghe-backup

# copy finale CMD commands
COPY final-docker-cmd.sh /backup/final-docker-cmd.sh


#PLACEHOLDER_4_COPY_SCM_SOURCE_JSON

# change mode of files
RUN \
chown -R application: /data && \
# change mode of files
chown -R application: /backup && \
chown -R application: /kms && \
chown -R application: /delete-instuck-backups && \
chown -R root: /start_backup.sh && \
chown -R application: /start_backup.sh && \
chmod 0700 /kms/extract_decrypt_kms.py && \
chmod 0700 /kms/convert-kms-private-ssh-key.sh && \
chmod 0644 /etc/cron.d/ghe-backup && \
chmod 0700 /delete-instuck-backups/delete_instuck_progress.py && \
chmod 0700 /start_backup.sh && \
chmod 0700 /backup/final-docker-cmd.sh && \
mkfifo /var/log/ghe-prod-backup.log

# delete_instuck_progress log
RUN \
mkfifo /var/log/ghe-prod-backup.log && \
chown -R application: /var/log/ghe-prod-backup.log && \
touch /var/log/ghe-delete-instuck-progress.log && \
chown -R application: /var/log/ghe-delete-instuck-progress.log

CMD ["/backup/final-docker-cmd.sh"]
USER application

# https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#user mentions to avoid sudo,
# however cron as part of the final-docker-cmd.sh has to run as
CMD "/backup/final-docker-cmd.sh"
63 changes: 0 additions & 63 deletions DockerfileBus

This file was deleted.

7 changes: 3 additions & 4 deletions Jenkinsfile
@@ -47,14 +47,14 @@ node('kraken') {

node('kraken') {
stage("Build Docker Image") {
docker(dockerRepo, fullImageName, "DockerfileAutomata" , false)
docker(dockerRepo, fullImageName, "Dockerfile" , false)
}
}

if (env.BRANCH_NAME == 'master') {
node('kraken') {
stage("Build and Push Docker") {
docker(dockerRepo, fullImageName, "DockerfileAutomata" , true)
docker(dockerRepo, fullImageName, "Dockerfile" , true)
}
}
timeout(time: 60, unit: "MINUTES") {
@@ -99,8 +99,7 @@ if (env.BRANCH_NAME == 'master') {
def docker(String dockerRepo, String fullImageName, String dockerfile, boolean pushImage) {
if (pushImage == true) {
sh "/tools/run :stups -- scm-source"
sh "/tools/run :stups -- sed -i 's&.*#PLACEHOLDER_4_COPY_SCM_SOURCE_JSON.*&COPY scm-source.json /scm-source.json&' DockerfileBus"
sh "/tools/run :stups -- sed -i 's&.*#PLACEHOLDER_4_COPY_SCM_SOURCE_JSON.*&COPY scm-source.json /scm-source.json&' DockerfileAutomata"
sh "/tools/run :stups -- sed -i 's&.*#PLACEHOLDER_4_COPY_SCM_SOURCE_JSON.*&COPY scm-source.json /scm-source.json&' Dockerfile"
}

sh "/tools/run :stups -- docker build --rm -t $fullImageName -f $dockerfile ."
126 changes: 116 additions & 10 deletions README.md
@@ -7,11 +7,32 @@
[Zalando Tech's ](https://tech.zalando.com/) [Github Enterprise](https://enterprise.github.com/) backup approach.

## Overview
[Github Enterprise](https://enterprise.github.com/) at Zalando Tech is a Ha setup running master and replica instances on AWS. The AWS account that runs the [high availability](https://help.github.com/enterprise/2.5/admin/guides/installation/high-availability-configuration/) setup also runs one backup host. There is a second backup host running in a different AWS account. We believe this backup gives us reliable backup data even in case one AWS is compromised.
[Github Enterprise](https://enterprise.github.com/) at Zalando Tech is a
[high availability](https://help.github.com/enterprise/2.11/admin/guides/installation/configuring-github-enterprise-for-high-availability/)
setup running master and replica instances on AWS.
The AWS account that runs the [high availability](https://help.github.com/enterprise/2.11/admin/guides/installation/configuring-github-enterprise-for-high-availability/)
setup also runs one backup host.
Another backup host can run in a different AWS account.
[Zalando Tech's ](https://tech.zalando.com/) [Github Enterprise](https://enterprise.github.com/) backup
can also run as a [POD](https://kubernetes.io/docs/concepts/workloads/pods/pod/#what-is-a-pod)
inside a [Kubernetes](https://kubernetes.io/) cluster.

We believe this backup approach provides reliable backup data even in case one AWS account or Kubernetes cluster is compromised.

![overview](/ZalandoGithubEnterprise.jpg "backup approach overview")

Basically [Zalando Tech's ](https://tech.zalando.com/) [Github Enterprise](https://enterprise.github.com/) backup
wraps github's [backup-utils](https://github.com/github/backup-utils) in a
[Docker](https://www.docker.com/) container.
If running on AWS, an [EBS volume](https://aws.amazon.com/de/ebs/) stores the actual backup data.
This way one can access the data even if the respective backup host is down.
If running on Kubernetes, a [stateful set](https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/)
including [volumes](https://kubernetes.io/docs/concepts/storage/volumes/) and
[volume claims](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) stores the actual backup data.
See a sample [statefulset below](https://github.com/zalando/ghe-backup/blob/master/README.md#kubernetes-stateful-set,-volume,-volume-claim).
[Zalando Kubernetes](https://github.com/zalando-incubator/kubernetes-on-aws#kubernetes-on-aws) is based on AWS, so [volume claims
are based on EBS](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#aws).

![overview](/backup_overview.PNG "backup approach overview")

Basically ghe-backup wraps github's [backup-utils](https://github.com/github/backup-utils) in a [Docker](https://www.docker.com/) container. An [EBS volume](https://aws.amazon.com/de/ebs/) stores the actual backup data to be able to access the data even if the regarding backup host is down.

## Local docker development

@@ -43,8 +64,13 @@ e.g.

### IAM [policy](http://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies.html) settings

Github Enterprise backup hosts contain private ssh keys that have to match with public ssh keys registered on the Github Enterprise main instance.
Private ssh keys should not be propagated unencrypted with deployments. AWS KMS allows to encrypt any kind of data, so this service is used to encrypt the private ssh key. KMS actions are managed by policies to make sure only configured tasks can be performed.
[Zalando Tech's ](https://tech.zalando.com/) [Github Enterprise](https://enterprise.github.com/) backup hosts contain
private ssh keys that have to match with public ssh keys registered on the Github Enterprise main instance.
Private ssh keys should not be propagated unencrypted with deployments.
AWS KMS can encrypt any kind of data, so this service is used to encrypt the private ssh key for both variants of
[Zalando Tech's](https://tech.zalando.com/) [Github Enterprise](https://enterprise.github.com/) backup: running on AWS and running on Kubernetes.
KMS actions are managed by policies to make sure only configured tasks can be performed.
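
To illustrate the decryption step described above, here is a minimal Python sketch. It is not the repository's `extract_decrypt_kms.py`; the function name `decrypt_ssh_key` and the assumption that the ciphertext arrives base64-encoded are hypothetical. The only external behaviour it relies on is that a boto3 KMS client offers `decrypt(CiphertextBlob=...)` and returns a dict with a `Plaintext` entry.

```python
import base64


def decrypt_ssh_key(kms_client, ciphertext_b64):
    """Return the plaintext private key for a base64-encoded KMS ciphertext.

    kms_client is expected to behave like boto3.client("kms"): decrypt()
    takes CiphertextBlob (bytes) and returns a dict with a "Plaintext" entry.
    The base64 encoding of the input is an assumption of this sketch.
    """
    ciphertext = base64.b64decode(ciphertext_b64)
    response = kms_client.decrypt(CiphertextBlob=ciphertext)
    plaintext = response["Plaintext"]
    # boto3 returns bytes; decode so the key can be written as a text file.
    if isinstance(plaintext, bytes):
        return plaintext.decode("utf-8")
    return plaintext
```

With boto3 installed, the call might look like `decrypt_ssh_key(boto3.client("kms"), encrypted_key)`. The call only succeeds if the policy attached to the host or pod allows `kms:Decrypt` on the key in question.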

A kms policy similar to the one shown below is needed to:
* allow kms decryption of the encrypted ssh key
* access s3 bucket
@@ -109,6 +135,31 @@ Pls go to bashtest directory:

*Make sure* you run ```./cleanup-tests.sh ``` in order to clean up afterwards.


### Running in an additional AWS account
Please adapt the cron tab definitions when running in another AWS account e.g. to the values in cron-ghe-backup-alternative.
This lowers the load on the Github Enterprise master with respect to backup attempts.


### Restore

Restoring backups is based on github's [using the backup and restore commands](https://github.com/github/backup-utils#using-the-backup-and-restore-commands).
The actual _ghe-restore_ command gets issued from the backup host. Please note: the backup restore can run for several hours.
[Nohup](https://en.wikipedia.org/wiki/Nohup) is recommended to keep the restore process running even if the shell connection is lost.

Sample steps include:
```
# put the ghe instance to restore to into maintenance mode
# ssh into your ec2 instance and exec into your container
# docker exec -it [container label or ID] bash/sh
# or
# exec into your pod
# kubectl exec -it [your pod e.g. statefulset-ghe-backup-0] bash/sh
nohup /backup/backup-utils/bin/ghe-restore -f [IP address of the ghe master to restore] &
# monitor the restore progress
tail -f nohup.out
```
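
The steps above could also be composed programmatically. The following Python sketch only builds the command line; it does not run anything. The function name `restore_command` and the log path `/backup/restore.log` are hypothetical. Output is redirected explicitly because `nohup` only writes `nohup.out` when attached to a terminal, which is not the case for a non-interactive `exec`.

```python
import shlex

GHE_RESTORE = "/backup/backup-utils/bin/ghe-restore"


def restore_command(target_host, pod=None, container=None,
                    log_path="/backup/restore.log"):
    """Build the argv that starts ghe-restore under nohup in the backup container.

    Exactly one of pod (Kubernetes) or container (Docker) must be given.
    log_path is a hypothetical location chosen for this sketch.
    """
    if (pod is None) == (container is None):
        raise ValueError("pass exactly one of pod or container")
    inner = "nohup {} -f {} > {} 2>&1 &".format(
        GHE_RESTORE, shlex.quote(target_host), shlex.quote(log_path)
    )
    if pod is not None:
        return ["kubectl", "exec", pod, "--", "sh", "-c", inner]
    return ["docker", "exec", container, "sh", "-c", inner]
```

The returned list can be passed to `subprocess.run` on a machine that has `kubectl` or `docker` access; progress can then be followed with `tail -f` on the log file inside the container.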

## Contribution
pls refer to [CONTRIBUTING.md](CONTRIBUTING.md)

@@ -179,10 +230,65 @@ and run it like:
### EBS volumes with Senza
Please follow these instructions: [senza's storage guide](https://docs.stups.io/en/latest/user-guide/storage.html) to create an EBS volume the stups way.

## Blog Post
You can find more context and details on [Zalando's](https://github.com/zalando/) [tech blog](https://tech.zalando.com/).
Blog post one is basically why and how ghe-backup was done: [tech.zalando.com/blog/multi-aws-github-enterprise-backup](https://tech.zalando.com/blog/multi-aws-github-enterprise-backup/).
Blog post two explains how changes gets continuously delivered using [Lizzy](https://github.com/zalando/lizzy/): [tech.zalando.com/blog/ci-pipelines-with-lizzy](https://tech.zalando.com/blog/ci-pipelines-with-lizzy/).
### Kubernetes stateful set, volume, volume claim

The statefulset resource definition is the main kubernetes configuration file:
```
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: statefulset-ghe-backup
spec:
  serviceName: deploy-ghe-backup
  replicas: 1
  template:
    metadata:
      labels:
        app: ghe-backup
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      containers:
      - name: container-{ghe-backup}
        image: pierone.zalando/machinery/ghe-backup-kubernetes:latest
        resources:
          requests:
            cpu: 100m
            memory: 1Gi
          limits:
            cpu: 400m
            memory: 4Gi
        volumeMounts:
        - name: data-{ghe-backup}
          mountPath: /data
        - name: {ghe-backup}-secret
          mountPath: /meta/ghe-backup-secret
          readOnly: true
        - name: podinfo
          mountPath: /details
          readOnly: false
      volumes:
      - name: {ghe-backup}-secret
        secret:
          secretName: {ghe-backup}-secret
      - name: podinfo
        downwardAPI:
          items:
          - path: "labels"
            fieldRef:
              fieldPath: metadata.labels
  volumeClaimTemplates:
  - metadata:
      name: data-ghe-backup
      annotations:
        volume.beta.kubernetes.io/storage-class: standard
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1000Gi
```
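
In the sample above, the `{ghe-backup}` parts read as placeholders. Once they are substituted, every `volumeMounts` name has to match either a `volumes` entry or a `volumeClaimTemplates` entry, otherwise Kubernetes rejects the resource. A small Python sketch of that check, working on the manifest as a plain dict (the function name `unresolved_mounts` and the trimmed `sample` are hypothetical):

```python
def unresolved_mounts(statefulset):
    """Return volumeMount names with no matching volume or volumeClaimTemplate."""
    spec = statefulset["spec"]
    pod_spec = spec["template"]["spec"]
    defined = {v["name"] for v in pod_spec.get("volumes", [])}
    defined |= {t["metadata"]["name"] for t in spec.get("volumeClaimTemplates", [])}
    mounted = [
        m["name"]
        for c in pod_spec.get("containers", [])
        for m in c.get("volumeMounts", [])
    ]
    return [name for name in mounted if name not in defined]


# Trimmed copy of the sample manifest with the placeholders left unsubstituted.
sample = {
    "spec": {
        "template": {"spec": {
            "containers": [{"volumeMounts": [
                {"name": "data-{ghe-backup}"},
                {"name": "{ghe-backup}-secret"},
                {"name": "podinfo"},
            ]}],
            "volumes": [{"name": "{ghe-backup}-secret"}, {"name": "podinfo"}],
        }},
        "volumeClaimTemplates": [{"metadata": {"name": "data-ghe-backup"}}],
    }
}

print(unresolved_mounts(sample))  # ['data-{ghe-backup}']
```

The one reported name disappears as soon as `{ghe-backup}` is replaced by `ghe-backup` in the mount, so that it matches the claim template `data-ghe-backup`.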

===
### License
Binary file added ZalandoGithubEnterprise.jpg
2 changes: 1 addition & 1 deletion backup.config
@@ -1,4 +1,4 @@
GHE_HOSTNAME="github.bus.zalan.do"
GHE_HOSTNAME="github-master.bus.zalan.do"
GHE_DATA_DIR="/data/ghe-production-data"
GHE_NUM_SNAPSHOTS=40
GHE_EXTRA_SSH_OPTS=" -o StrictHostKeyChecking=no "
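
For checking a change like the hostname switch above, the shell-style `KEY=value` lines of `backup.config` can be read from Python. This is a rough sketch under the assumption that the file contains only simple assignments; `parse_backup_config` is a hypothetical helper, not part of backup-utils, which sources the file as shell.

```python
import shlex


def parse_backup_config(text):
    """Parse simple shell-style KEY=value assignments into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        # shlex handles the double quotes and drops '#' comments.
        for token in shlex.split(line, comments=True):
            key, sep, value = token.partition("=")
            if sep:
                settings[key] = value
    return settings


config_text = '''
GHE_HOSTNAME="github-master.bus.zalan.do"
GHE_DATA_DIR="/data/ghe-production-data"
GHE_NUM_SNAPSHOTS=40
GHE_EXTRA_SSH_OPTS=" -o StrictHostKeyChecking=no "
'''

settings = parse_backup_config(config_text)
print(settings["GHE_HOSTNAME"])  # github-master.bus.zalan.do
```

Values stay strings, so `GHE_NUM_SNAPSHOTS` comes back as `"40"` and quoted values keep their inner spaces.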
Binary file removed backup_overview.PNG
Binary file not shown.