Control health checks and toggle upstream node status in load balancers with ease.
Updated May 22, 2017 · Go
Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.
[INACTIVE] Terraform provider for Arachnys' Cabot. Create, manage, and manipulate status checks and alerts for services.
Endpoint monitoring and DNS failover agent written in Go
Maia is a CLI that allows you to execute remote commands on multiple machines at once.
External Node Classifier written in Go
The Skinny Distributed Lock Service
Capstone project for Udacity's Cloud Native Application Architecture Nanodegree
A simple way to test connections to memcached
Keep Kubernetes Deployments up-to-date with the `latest` container images
A place to upload the material I studied to build expertise while doing DevOps engineering / SRE work. It is personal study, but I hope parts of it can serve as a useful reference.
Demo repository for "Go for Operations" workshop.
Chaos testing, network emulation, and stress testing tool for containers
A Chaos Engineering Platform for Kubernetes.
An easy-to-use and powerful chaos engineering experiment toolkit (an easy-to-use, powerful chaos experiment injection tool open-sourced by Alibaba).
Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published on the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
A chaos engineering platform for supporting the complete fault drill lifecycle.