Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rule refactor #45

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open

Rule refactor #45

wants to merge 19 commits into from

Conversation

brianberzins
Copy link
Collaborator

Related to #44

This pull-request fundamentally changes the way that rules are loaded and evaluated with the expressed intent of making #44 easier to implement.

Previously, a rule was a structure the implemented two methods: a load method and a shouldReap method. Rules that were configured (environment variables controlling their behavior) were set, were stored into a list of rules for evaluation. Only if ALL rules returned true from the shouldReap method on a specific pod would the pod be reaped.

While this worked extremely well, it makes a couple of assumptions. The biggest is that ALL of the configuration related to what will get reaped exists within the environment variables of the pod-reaper itself. As soon as there's anything dynamic (like say from a pod-annotation), then there's the possibility of rules being configured elsewhere that aren't configured within the reaper itself and vice-versa.

In short, it worked well -- but wasn't very flexible to this kind of extension.
Here's a list of changes:

  • rules have be rewritten to simply be a type rule func(v1.Pod) (result, string) meaning that they are just functions with the proper parameters and return types.
  • tests and rule implementations have been modified to reflect this change
  • some logging has been cleaned up and made more consistent across rules

Functionally -- the behavior of the reaper should be identical to previous versions. While when and how rules are evaluated has changed, neither the configuration (environment variables) nor the test reaping behavior has been altered in this change.

A version of the reaper built with this change can be found here: brianberzins/pod-reaper:alpha-02212020

As always, I tread on the paranoid side when working on pod-reaper to help quintuple check that there won't be any accidental annihilation of clusters.

@brianberzins
Copy link
Collaborator Author

Notes from #46

  1. Documentation about how rules work needs to be updated here
  2. Failing fast on invalid configurations (at least on the pod-reaper side)

It isn't possible to know annotations of pods because they are dynamic vs the immutable setup of the pod-reaper, but the configuration of the pod-reaper itself should be able to "fail fast" and "fail safe" -- If something isn't configured correctly, I generally like to know as soon as possible.

@brianberzins
Copy link
Collaborator Author

Functionally, I'm feeling pretty dang good about this right now.
Can try it out from my latest commit here: brianberzins/pod-reaper:alpha-20200227

@jdharmon
I realized I had included more debug logging before I actually merged in your addition of log levels.
Specifically, I logged spare events where a rule was evaluated, didn't match, and was therefore explicitly spared from reaping at the debug level. I feel like depending on the use case, that could be a logging flood and I'm considering dumping it down to trace level.

@brianberzins brianberzins changed the title WIP: Rule refactor Rule refactor Feb 28, 2020
@brianberzins
Copy link
Collaborator Author

Removed WIP: I think this is good to go.

From all testing that I have done, there is nothing breaking compatibility for any rule or reaper configurations. The structure of logging has however changed, so it someone is dependent on that, then that could be considered a "breaking change".

The more eyes on this the better!

Would love feedback (good or bad) -- golang is certainly not my strongest programming language and I'm picking up new go-isms each time I work on this.

@jdharmon
Copy link
Contributor

jdharmon commented Feb 28, 2020

I would expect to see the following events at the debug level:

  • cycle start
  • spare decision
  • reap decision (at info level)
  • cycle end

I can see how a bunch of spare decisions from rules I don't care about would be annoying. That brings up a question: Should all rules be enabled? Maybe there should be an environment variable to specify which rules to enable. i.e. RULES_ENABLED=Duration,Unready

@brianberzins
Copy link
Collaborator Author

: Should all rules be enabled?

This is the big question here, and it's basically at the heart of #44
I agree that specifying a list of rules to enable helps here with some questions in that issue, specifically controlling what rules may be allowed to be configured/overridden by pod annotations.

I think by default, all rules should be there (this would keep backwards compatibility).
Right now the only solution I have come up with for limiting the rules is right along with yours.

In my head, that lead me to a simple question: "how would I log which rules are enabled". At first it seemed really simple. With your solution, even with default, just log the rules that are "enabled". But what if those are enabled, but have no configured value? Should that error? If so, there's a backwards compatibility problem. If not, what should it log? "enabled with configuration unset" or something like that.

That the question doesn't have an extremely simple answer is making me question the design of this rework.

@jdharmon
Copy link
Contributor

I think for backwards compatibility, it's OK to enable all rules by default. If a rule is not configured, or there are no decorated pods, then it does nothing. If RULES_ENABLED is specified, then enable only those rules.

But what if those are enabled, but have no configured value? Should that error? If so, there's a backwards compatibility problem. If not, what should it log? "enabled with configuration unset" or something like that.

That depends...for this issue, an error would be fine. However, for #44, all rules would need to be able to be enabled without a default configuration. For the latter, I think logging a warning is reasonable. It's not necessarily an error, but it might be. Something like

if enabled {
  default := os.Getenv("X_DEFAULT")
  if default != "" {
    log.Info("Rule X enabled")
  } else {
    log.Warn("Rule X enabled, but no default was specified. To get rid of this warning, either set a value for X_DEFAULT or specify RULES_ENABLED."
  }
}

@brianberzins brianberzins changed the title Rule refactor WIP Rule refactor Feb 29, 2020
@brianberzins brianberzins changed the title WIP Rule refactor Rule refactor Mar 2, 2020
@brianberzins
Copy link
Collaborator Author

At this point, I'm feeling pretty good about this PR.
Right now, there should be no difference in reaping before (but there are some logging changes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants