Configuration via annotations #64

Open: wants to merge 3 commits into `master`
52 changes: 52 additions & 0 deletions README.md
@@ -20,6 +20,7 @@ Pod-Reaper is configurable through environment variables. The pod-reaper specifi
- `REQUIRE_LABEL_VALUES` comma-separated list of metadata label values (of key-value pair) that pod-reaper should require
- `REQUIRE_ANNOTATION_KEY` pod metadata annotation (of key-value pair) that pod-reaper should require
- `REQUIRE_ANNOTATION_VALUES` comma-separated list of metadata annotation values (of key-value pair) that pod-reaper should require
- `RULES` comma-separated list of rules to load regardless of default

Additionally, at least one rule must be enabled, or the pod-reaper will error and exit. See the Rules section below for configuring and enabling rules.

@@ -37,6 +38,41 @@ EXCLUDE_LABEL_VALUES=disabled,false
CHAOS_CHANCE=.001
```

#### Annotations

Rule configuration may be overridden by annotations on individual pods. For single-value rules, the annotation value replaces the configured rule value. For multi-value rules, the annotation's comma-separated values are added to the configured rule values. See [Implemented Rules](#implemented-rules) for the available annotations.
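To illustrate the two override modes, here is a minimal Go sketch. `effectiveChance` and `effectiveStatuses` are hypothetical helpers, not pod-reaper functions; the annotation keys are the ones documented under Implemented Rules. As in `rules/chaos.go`, an annotation that fails to parse falls back to the configured value (the real rule also logs a warning, which is omitted here).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Annotation keys as documented in the Implemented Rules section.
const (
	annotationChaosChance       = "pod-reaper/chaos-chance"
	annotationContainerStatuses = "pod-reaper/container-statuses"
)

// effectiveChance (hypothetical): single-value rule, the annotation replaces the configured value.
func effectiveChance(configured float64, annotations map[string]string) float64 {
	raw := annotations[annotationChaosChance]
	if raw == "" {
		return configured
	}
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return configured // invalid annotation: keep the configured value
	}
	return v
}

// effectiveStatuses (hypothetical): multi-value rule, annotation values are appended.
func effectiveStatuses(configured []string, annotations map[string]string) []string {
	merged := append([]string(nil), configured...) // copy so the configured slice is never modified
	if raw := annotations[annotationContainerStatuses]; raw != "" {
		merged = append(merged, strings.Split(raw, ",")...)
	}
	return merged
}

func main() {
	ann := map[string]string{
		annotationChaosChance:       "0.5",
		annotationContainerStatuses: "OOMKilled",
	}
	fmt.Println(effectiveChance(0.001, ann))                         // 0.5
	fmt.Println(effectiveStatuses([]string{"CrashLoopBackOff"}, ann)) // [CrashLoopBackOff OOMKilled]
}
```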

Example configuration using annotations, starting with the environment variables:

```sh
# pod-reaper configuration
NAMESPACE=test
SCHEDULE=@every 30s

# enable at least one rule
MAX_UNREADY=5m
RULES=duration,unready
```

Pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test
  annotations:
    pod-reaper/max-duration: 1h
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
    ports:
    - containerPort: 80
```

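In this configuration, the Duration and Unready rules are loaded. Because `MAX_DURATION` is not set, the Duration rule only flags pods that carry a `pod-reaper/max-duration` annotation; since every loaded rule must flag a pod for it to be reaped, pods without that annotation will not be reaped. The example pod will be reaped if it has been running for more than 1 hour and has been unready for more than 5 minutes.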

### `NAMESPACE`

Default value: "" (which will look at ALL namespaces)
@@ -134,10 +170,18 @@ Default value: Logrus

This environment variable modifies the structured log format for easy ingestion into different logging systems, including Stackdriver via the Fluentd format. Available formats: Logrus, Fluentd

### `RULES`

This is an optional, comma-separated list of rules which should be loaded. If a rule is specified here, it will be loaded even if it does not have a configuration defined in an environment variable. This is used to load rules which only operate on annotations.

Available rules: chaos, container_status, duration, pod_status, unready

## Implemented Rules

### Chaos Chance

Annotation: `pod-reaper/chaos-chance`

Flags a pod for reaping based on a random number generator.

Enabled and configured by setting the environment variable `CHAOS_CHANCE` with a floating-point value. A random number generator will generate a value in the range `[0,1)`, and if the generated value is below the configured chaos chance, the pod will be flagged for reaping.
@@ -154,6 +198,8 @@ Remember that pods can be excluded from reaping if the pod has a label matching

### Container Status

Annotation: `pod-reaper/container-statuses`

Flags a pod for reaping based on a container within a pod having a specific container status.

Enabled and configured by setting the environment variable `CONTAINER_STATUSES` with a comma-separated list (no whitespace) of statuses. If any container in the pod is in a waiting or terminated state with a status in the specified list, the pod will be flagged for reaping.
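A self-contained sketch of the container status check. The `containerState` types are hypothetical stand-ins that mirror only the `Waiting`/`Terminated` shape of `v1.ContainerState`, and the sketch assumes the compared status is each state's `Reason`.

```go
package main

import "fmt"

// Minimal hypothetical stand-ins for the Kubernetes container state types; only Reason is modeled.
type stateWaiting struct{ Reason string }
type stateTerminated struct{ Reason string }
type containerState struct {
	Waiting    *stateWaiting
	Terminated *stateTerminated
}

// matchingStatus returns the first reap status matching a waiting or terminated reason of any container.
func matchingStatus(states []containerState, reapStatuses []string) (string, bool) {
	for _, want := range reapStatuses {
		for _, s := range states {
			if s.Waiting != nil && s.Waiting.Reason == want {
				return want, true
			}
			if s.Terminated != nil && s.Terminated.Reason == want {
				return want, true
			}
		}
	}
	return "", false
}

func main() {
	states := []containerState{
		{Waiting: &stateWaiting{Reason: "CrashLoopBackOff"}},
		{}, // a running container has neither field set
	}
	status, ok := matchingStatus(states, []string{"OOMKilled", "CrashLoopBackOff"})
	fmt.Println(status, ok) // CrashLoopBackOff true
}
```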
@@ -169,6 +215,8 @@ Note that this will not catch statuses that are describing the entire pod like t

### Pod Status

Annotation: `pod-reaper/pod-statuses`

Flags a pod for reaping based on the pod status.

Enabled and configured by setting the environment variable `POD_STATUSES` with a comma-separated list (no whitespace) of statuses. If the pod status is in the specified list, the pod will be flagged for reaping.
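A sketch of the pod status match with the annotation values appended to the configured list, as described under Annotations. `podStatusMatches` is hypothetical, its `podStatus` argument stands in for whichever pod-level field the rule actually reads (that code is not part of this diff), and `Evicted` is only an example value.

```go
package main

import (
	"fmt"
	"strings"
)

const annotationPodStatuses = "pod-reaper/pod-statuses"

// podStatusMatches reports whether podStatus is in the configured statuses plus any from the annotation.
func podStatusMatches(podStatus string, configured []string, annotations map[string]string) bool {
	statuses := append([]string(nil), configured...) // copy before appending annotation values
	if raw := annotations[annotationPodStatuses]; raw != "" {
		statuses = append(statuses, strings.Split(raw, ",")...)
	}
	for _, s := range statuses {
		if s == podStatus {
			return true
		}
	}
	return false
}

func main() {
	// With RULES=pod_status and no POD_STATUSES, the configured list is empty.
	fmt.Println(podStatusMatches("Evicted", nil, nil)) // false
	fmt.Println(podStatusMatches("Evicted", nil,
		map[string]string{annotationPodStatuses: "Evicted,Unknown"})) // true
}
```

With an empty configured list, only annotated pods can match, which is the "explicit load without default" case exercised in the tests of this PR.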
@@ -184,12 +232,16 @@ Note that pod status is different than container statuses as it checks the statu

### Duration

Annotation: `pod-reaper/max-duration`

Flags a pod for reaping based on the pod's current run duration.

Enabled and configured by setting the environment variable `MAX_DURATION` with a valid Go duration string, as accepted by `time.ParseDuration` (example: "1h15m30s"). If a pod has been running longer than the specified duration, the pod will be flagged for reaping.

### Unready

Annotation: `pod-reaper/max-unready`

Flags a pod for reaping based on the time the pod has been unready.

Enabled and configured by setting the environment variable `MAX_UNREADY` with a valid Go duration string (example: "10m"). If a pod has been unready longer than the specified duration, the pod will be flagged for reaping.
33 changes: 27 additions & 6 deletions rules/chaos.go
@@ -7,10 +7,15 @@ import (
"strconv"
"time"

"github.com/sirupsen/logrus"
v1 "k8s.io/api/core/v1"
)

const envChaosChance = "CHAOS_CHANCE"
const (
ruleChaos = "chaos"
envChaosChance = "CHAOS_CHANCE"
annotationChaosChance = annotationPrefix + "/chaos-chance"
)

var _ Rule = (*chaos)(nil)

@@ -23,18 +28,34 @@ func init() {
}

func (rule *chaos) load() (bool, string, error) {
value, active := os.LookupEnv(envChaosChance)
if !active {
explicit := explicitLoad(ruleChaos)
value, hasDefault := os.LookupEnv(envChaosChance)
if !explicit && !hasDefault {
return false, "", nil
}
chance, err := strconv.ParseFloat(value, 64)
if err != nil {
if !explicit && err != nil {
return false, "", fmt.Errorf("invalid chaos chance %s", err)
}
rule.chance = chance
return true, fmt.Sprintf("chaos chance %s", value), nil

if rule.chance != 0 {
return true, fmt.Sprintf("chaos chance %s", value), nil
}
return true, fmt.Sprint("chaos (no default)"), nil
Comment on lines 30 to +45
Collaborator

I want to make sure I understand the desired behavior here, as there are 4 possible cases based on explicit load and default value.

  1. NO explicit load and NO default value -- don't load the rule
  2. NO explicit load but default value -- load the rule (previous behavior) -- this case honors the annotation overrides?
  3. explicit load but NO default value -- load the rule, but the rule would also say "don't reap" unless the pod had an annotation override and the condition for that override was met?
  4. explicit load and default value -- loads the rule and a default value, but honors the annotation overrides.

Wondering if we need a way to determine whether or not the reaper should honor the annotations? I guess the concern is making sure that if the reaper isn't intended to be overwritten that it can't be. Maybe I'm missing that piece here (I'm still looking!)

There might also be a case where explicit loading of the rule is set but there's also a value that isn't parsable as a float (e.g. `lizard`) here?

Contributor Author

You are correct for all 4 cases. If a rule is loaded (explicitly or with a default value), it will always check for an annotation and use the annotation's value if found. I don't think we need an option to explicitly enable annotations. It should be safe to assume that if a user added a pod-reaper/* annotation, they intend to opt-in to annotation override. If a pod is not annotated, then pod-reaper will behave the same as it does now.

The annotation is parsed in `ShouldReap()`. If a pod is annotated with `pod-reaper/chaos-chance: lizard`, a warning will be logged, and the behavior will fall back to the default value.

Contributor Author

The chaos rule might be a weird edge case since all enabled rules must return true. If you were to enable chaos without a default, it would prevent any pod which was not annotated from being reaped. I would prefer that a pod is reaped if any rule returns true, but that would be a potentially breaking change. We could also implement reap/spare logic like you mentioned in #44 (comment), but I wanted to keep things simple if possible.

Collaborator

yeah, reaping if any rule returns true would be breaking. If it operated by reaping when any rule returned true, there wouldn't be a way to get an "and" function between multiple rules. How I've handled the "reap on any rule = true" situation is to make multiple reapers, each with only one rule.

}

func (rule *chaos) ShouldReap(pod v1.Pod) (bool, string) {
return rand.Float64() < rule.chance, "was flagged for chaos"
chance := rule.chance
annotationValue := pod.Annotations[annotationChaosChance]
if annotationValue != "" {
annotationChance, err := strconv.ParseFloat(annotationValue, 64)
if err == nil {
chance = annotationChance
} else {
logrus.Warnf("pod %s has invalid chaos chance: %s", pod.Name, err)
}
}

return rand.Float64() < chance, "was flagged for chaos"
}
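The four load cases from the review thread above, plus the `lizard` question, can be checked against a pure-function copy of the branching in `chaos.load()`. `loadChaos` is a hypothetical refactoring that takes the results of `explicitLoad` and `os.LookupEnv` as arguments instead of reading the environment.

```go
package main

import (
	"fmt"
	"strconv"
)

// loadChaos mirrors the branching of chaos.load() in this PR.
func loadChaos(explicit bool, value string, hasDefault bool) (loaded bool, chance float64, err error) {
	if !explicit && !hasDefault {
		return false, 0, nil // case 1: rule not loaded
	}
	chance, err = strconv.ParseFloat(value, 64)
	if !explicit && err != nil {
		return false, 0, fmt.Errorf("invalid chaos chance %s", err)
	}
	// As in the PR, the parse error is ignored when the rule is explicitly loaded;
	// ParseFloat returns 0 on a syntax error, so the default chance becomes 0.
	return true, chance, nil
}

func main() {
	cases := []struct {
		name       string
		explicit   bool
		value      string
		hasDefault bool
	}{
		{"1: no RULES entry, no CHAOS_CHANCE", false, "", false},
		{"2: CHAOS_CHANCE only", false, "0.001", true},
		{"3: RULES=chaos only", true, "", false},
		{"4: RULES=chaos and CHAOS_CHANCE", true, "0.001", true},
		{"RULES=chaos, CHAOS_CHANCE=lizard", true, "lizard", true},
		{"CHAOS_CHANCE=lizard only", false, "lizard", true},
	}
	for _, c := range cases {
		loaded, chance, err := loadChaos(c.explicit, c.value, c.hasDefault)
		fmt.Printf("%-36s loaded=%v chance=%v err=%v\n", c.name, loaded, chance, err)
	}
}
```

Running it shows one consequence of `!explicit && err != nil`: with `RULES=chaos`, a malformed `CHAOS_CHANCE=lizard` is accepted silently and the default chance becomes `0`, whereas without the explicit load the same value is reported as an error.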
43 changes: 42 additions & 1 deletion rules/chaos_test.go
@@ -5,7 +5,7 @@ import (
"testing"

"github.com/stretchr/testify/assert"
"k8s.io/api/core/v1"
v1 "k8s.io/api/core/v1"
)

func TestChaosLoad(t *testing.T) {
@@ -32,6 +32,14 @@ func TestChaosLoad(t *testing.T) {
assert.Equal(t, "", message)
assert.False(t, loaded)
})
t.Run("explicit load without default", func(t *testing.T) {
os.Clearenv()
os.Setenv(envExplicitLoad, ruleChaos)
loaded, message, err := (&chaos{}).load()
assert.NoError(t, err)
assert.Equal(t, "chaos (no default)", message)
assert.True(t, loaded)
})
}

func TestChaosShouldReap(t *testing.T) {
@@ -52,4 +60,37 @@ func TestChaosShouldReap(t *testing.T) {
shouldReap, _ := chaos.ShouldReap(v1.Pod{})
assert.False(t, shouldReap)
})
t.Run("annotation override reap", func(t *testing.T) {
os.Clearenv()
os.Setenv(envChaosChance, "0.0") // default never
chaos := chaos{}
chaos.load()
pod := v1.Pod{}
pod.Annotations = map[string]string{
annotationChaosChance: "1.0", // override always
}
shouldReap, message := chaos.ShouldReap(pod)
assert.True(t, shouldReap)
assert.Equal(t, "was flagged for chaos", message)
})
t.Run("annotation override no reap", func(t *testing.T) {
os.Clearenv()
os.Setenv(envChaosChance, "1.0") // default always
chaos := chaos{}
chaos.load()
pod := v1.Pod{}
pod.Annotations = map[string]string{
annotationChaosChance: "0.0", // override never
}
shouldReap, _ := chaos.ShouldReap(pod)
assert.False(t, shouldReap)
})
t.Run("explicit load no annotation", func(t *testing.T) {
os.Clearenv()
os.Setenv(envExplicitLoad, ruleChaos)
chaos := chaos{}
chaos.load()
shouldReap, _ := chaos.ShouldReap(v1.Pod{})
assert.False(t, shouldReap)
})
}
32 changes: 25 additions & 7 deletions rules/container_status.go
@@ -5,10 +5,14 @@ import (
"os"
"strings"

"k8s.io/api/core/v1"
v1 "k8s.io/api/core/v1"
)

const envContainerStatus = "CONTAINER_STATUSES"
const (
containerStatusName = "container_status"
envContainerStatus = "CONTAINER_STATUSES"
annotationContainerStatus = annotationPrefix + "/container-statuses"
)

var _ Rule = (*containerStatus)(nil)

@@ -17,16 +21,30 @@ type containerStatus struct {
}

func (rule *containerStatus) load() (bool, string, error) {
value, active := os.LookupEnv(envContainerStatus)
if !active {
explicit := explicitLoad(containerStatusName)
value, hasDefault := os.LookupEnv(envContainerStatus)
if !explicit && !hasDefault {
return false, "", nil
}
rule.reapStatuses = strings.Split(value, ",")
return true, fmt.Sprintf("container status in [%s]", value), nil
if value != "" {
rule.reapStatuses = strings.Split(value, ",")
}

if len(rule.reapStatuses) != 0 {
return true, fmt.Sprintf("container status in [%s]", value), nil
}
return true, "container status (no default)", nil
}

func (rule *containerStatus) ShouldReap(pod v1.Pod) (bool, string) {
for _, reapStatus := range rule.reapStatuses {
reapStatuses := rule.reapStatuses
annotationValue := pod.Annotations[annotationContainerStatus]
if annotationValue != "" {
annotationValues := strings.Split(annotationValue, ",")
reapStatuses = append(reapStatuses, annotationValues...)
}

for _, reapStatus := range reapStatuses {
for _, containerStatus := range pod.Status.ContainerStatuses {
state := containerStatus.State
// check both waiting and terminated conditions
32 changes: 31 additions & 1 deletion rules/container_status_test.go
@@ -5,7 +5,7 @@ import (
"testing"

"github.com/stretchr/testify/assert"
"k8s.io/api/core/v1"
v1 "k8s.io/api/core/v1"
)

func testWaitContainerState(reason string) v1.ContainerState {
@@ -65,6 +65,14 @@ func TestContainerStatusLoad(t *testing.T) {
assert.Equal(t, "", message)
assert.False(t, loaded)
})
t.Run("explicit load without default", func(t *testing.T) {
os.Clearenv()
os.Setenv(envExplicitLoad, containerStatusName)
loaded, message, err := (&containerStatus{}).load()
assert.NoError(t, err)
assert.Equal(t, "container status (no default)", message)
assert.True(t, loaded)
})
}

func TestContainerStatusShouldReap(t *testing.T) {
@@ -87,4 +95,26 @@ func TestContainerStatusShouldReap(t *testing.T) {
shouldReap, _ := containerStatus.ShouldReap(pod)
assert.False(t, shouldReap)
})
t.Run("annotation reap", func(t *testing.T) {
os.Clearenv()
os.Setenv(envContainerStatus, "test-status")
containerStatus := containerStatus{}
containerStatus.load()
pod := testStatusPod(testWaitContainerState("another-status"))
pod.Annotations = map[string]string{
annotationContainerStatus: "another-status",
}
shouldReap, reason := containerStatus.ShouldReap(pod)
assert.True(t, shouldReap)
assert.Regexp(t, ".*another-status.*", reason)
})
t.Run("explicit load no annotation", func(t *testing.T) {
os.Clearenv()
os.Setenv(envExplicitLoad, containerStatusName)
containerStatus := containerStatus{}
containerStatus.load()
pod := testStatusPod(testWaitContainerState("not-present"))
shouldReap, _ := containerStatus.ShouldReap(pod)
assert.False(t, shouldReap)
})
}
39 changes: 32 additions & 7 deletions rules/duration.go
@@ -5,10 +5,15 @@ import (
"os"
"time"

"k8s.io/api/core/v1"
"github.com/sirupsen/logrus"
v1 "k8s.io/api/core/v1"
)

const envMaxDuration = "MAX_DURATION"
const (
ruleDuration = "duration"
envMaxDuration = "MAX_DURATION"
annotationMaxDuration = annotationPrefix + "/max-duration"
)

var _ Rule = (*duration)(nil)

@@ -17,25 +22,45 @@ type duration struct {
}

func (rule *duration) load() (bool, string, error) {
value, active := os.LookupEnv(envMaxDuration)
if !active {
explicit := explicitLoad(ruleDuration)
value, hasDefault := os.LookupEnv(envMaxDuration)
if !explicit && !hasDefault {
return false, "", nil
}
duration, err := time.ParseDuration(value)
if err != nil {
if !explicit && err != nil {
return false, "", fmt.Errorf("invalid max duration: %s", err)
}
rule.duration = duration
return true, fmt.Sprintf("maximum run duration %s", value), nil

if rule.duration != 0 {
return true, fmt.Sprintf("maximum run duration %s", value), nil
}
return true, fmt.Sprint("maximum run duration (no default)"), nil
}

func (rule *duration) ShouldReap(pod v1.Pod) (bool, string) {
duration := rule.duration
annotationValue := pod.Annotations[annotationMaxDuration]
if annotationValue != "" {
annotationDuration, err := time.ParseDuration(annotationValue)
if err == nil {
duration = annotationDuration
} else {
logrus.Warnf("pod %s has invalid max duration annotation: %s", pod.Name, err)
}
}
if duration == 0 {
return false, ""
}

podStartTime := pod.Status.StartTime
if podStartTime == nil {
return false, ""
}

startTime := time.Unix(podStartTime.Unix(), 0) // convert to standard go time
cutoffTime := time.Now().Add(-1 * rule.duration)
cutoffTime := time.Now().Add(-1 * duration)
runningDuration := time.Now().Sub(startTime)
message := fmt.Sprintf("has been running for %s", runningDuration.String())
return startTime.Before(cutoffTime), message