disaster-recovery

broken-kube-ca
broken-opa-gatekeeper
complete-power-outage
etcd-split-brain
namespace-stuck-terminating
rebuild-from-scratch
run-away-app
Kubernetes Master Class - 5_11 - Disaster Recovery.pptx
README.md
recycle-all-labs.sh

README.md

Recovering from a disaster with Rancher and Kubernetes

Everything breaks at some point. Whether it is infrastructure (DNS, network, storage, etc.) or Kubernetes itself, something will fail eventually. In this session, we walk through some common failure scenarios: how to identify each failure and how to respond as quickly as possible, using the same troubleshooting steps, scripts, and tools Rancher Support uses when supporting our Enterprise customers. Finally, we cover how to recover from these types of failures, either in place or from scratch. This session includes documentation and scripts for reproducing all of these failures (based on actual events) in a lab environment.

YouTube video

Overview

Terms
Common scenarios with lab examples
- Identifying the issue
- Troubleshooting
- Restoring/Recovering
- Preventive tasks
- Reproducing in a lab
Q&A

Terms

Rancher Server is a set of pods that run the main orchestration engine and UI for Rancher.
RKE (Rancher Kubernetes Engine) is the tool Rancher uses to create and manage Kubernetes clusters.
Local/upstream cluster is the cluster where the Rancher server is installed; this is usually an RKE-built cluster.
Downstream cluster(s) are Kubernetes clusters that Rancher is managing.
Managed clusters are created and managed by Rancher.
Imported clusters were created outside Rancher and then imported.
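
As a rough illustration of how these terms relate, the sketch below models them as a small Python data structure. It assumes that managed and imported are the two kinds of downstream cluster, which is how the list above reads. The class names, the `describe` helper, and the cluster names are hypothetical and are not part of Rancher or RKE.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    LOCAL = "local/upstream"   # the cluster where the Rancher server is installed
    DOWNSTREAM = "downstream"  # a cluster that Rancher is managing


class Origin(Enum):
    MANAGED = "managed"    # created and managed by Rancher
    IMPORTED = "imported"  # created outside Rancher, then imported


@dataclass(frozen=True)
class Cluster:
    name: str                        # hypothetical cluster name
    role: Role
    origin: Optional[Origin] = None  # assumed to apply to downstream clusters only


def describe(cluster: Cluster) -> str:
    """Return a one-line description using the terms defined above."""
    if cluster.role is Role.LOCAL:
        return f"{cluster.name}: local/upstream cluster (runs the Rancher server)"
    kind = cluster.origin.value if cluster.origin else "unspecified"
    return f"{cluster.name}: downstream cluster ({kind})"


if __name__ == "__main__":
    print(describe(Cluster("rancher-local", Role.LOCAL)))
    print(describe(Cluster("prod", Role.DOWNSTREAM, Origin.IMPORTED)))
```

Knowing which kind of cluster has failed matters when choosing a lab scenario, since a failure in the local cluster affects the Rancher server itself.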
