Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop ignoring order of primitives in colocation constraints #153

Merged
merged 1 commit into from
Nov 7, 2015

Conversation

fghaas
Copy link
Contributor

@fghaas fghaas commented Jul 15, 2015

cs_colocation, when used with the pcs provider, makes the incorrect
assumption that the order of primitives in colocation constraints is
arbitrary. It is not, and the following must not be equivalent:

cs_colocation { 'vip_with_service':
  primitives => [ 'nginx_vip', 'nginx_service' ],
}

cs_colocation { 'vip_with_service':
  primitives => [ 'nginx_service', 'nginx_vip' ],
}

These mean different things. The first example ensures that nginx_vip
runs on whatever node nginx_service runs on, and if nginx_service
can't be started on any node, Pacemaker won't even attempt to start
nginx_vip. However, if nginx_vip cannot be started on any node,
nginx_service will continue to be available.

The second example means the opposite. In it, nginx_service is
configured to run wherever nginx_vip runs, and if nginx_vip can't run
anywhere, nginx_service won't start. And if nginx_vip cannot be
started on any node, nginx_service will not be available.

As a result, drop artificial sorting for the primitives, and use them
exactly as specified.

Fixes issue #150.

@mkrakowitzer
Copy link
Contributor

LGTM, Would you consider adding tests?

@ffrank @igalic @nibalizer whats the policy around tests for the corosync module? There are a lot of gaps in the current testing, I kind of feel its unreasonable to expect someone to write a full test for a provider which currently has none when making minor code fixes. My feeling is:

  • If a provider currently has tests, then it should be a requirement to add them to the provider.
  • If a provider does not have any tests, it should not be a requirement to add them. An issue should opened to add a test for that provider?

@fghaas
Copy link
Contributor Author

fghaas commented Jul 16, 2015

@mkrakowitzer Your LGTM confuses me; what exactly looks good to you? My patch fixes the provider so it no longer makes incorrect assumptions about the reversibility of colocation constraints, but there is still the problem that it deals with things backwards (compared to the crm provider) or correctly (compared to non-puppetized Pacemaker).

@mkrakowitzer
Copy link
Contributor

Your patch that fixes the provider so it no longer makes incorrect assumptions about the reversibility of colocation constraints looks good to me.

Sorry I did not see the second comment, I am not 100% which is more correct. My feeling is that it should work in the sameway as the crm provider. At least then from a puppet perspetive the ordering will be applied the same regardless of the wether you are using crm or pcs provider.

Even if this does not match the way you would do it from the command line:

crmsh # crm configure colocation website-with-ip INFINITY: WebSite ClusterIP
pcs   # pcs constraint colocation add ClusterIP with WebSite INFINITY

@fghaas
Copy link
Contributor Author

fghaas commented Jul 16, 2015

OK, so your suggestion is to change

       :primitives => [rsc, with_rsc].sort,

to

       :primitives => [with_rsc, rsc],

Do I understand that correctly?

@mkrakowitzer
Copy link
Contributor

Yes, it is either this, or to modify the README documentation for colocation and modify crm resource to behave in the opposite way. The primitives should be colocated in the same way regardless of whether crm or pcs providers are being used. What are your thoughts on this?

@igalic
Copy link
Contributor

igalic commented Jul 17, 2015

this provider is notoriously hard to (acceptance) test (correctly. i.e.: on more than one machine)
but we should fix the spec tests, at the very least.

@fghaas
Copy link
Contributor Author

fghaas commented Jul 20, 2015

Thanks a lot for the tests @mkrakowitzer. As it turns out I had misinterpreted Puppet's intended behavior (mimicking the opposite of Pacemaker's default), so while my original patch is fine, its commit message is backwards and misleading. How do we best fix this? Do I pull your commits into my branch, do an interactive rebase rewording my commit, and then force-push to my branch again? Or should I just update my branch, you re-pull from me, and you resubmit?

@mkrakowitzer
Copy link
Contributor

I would say update your branches commit message and resubmit. Although I think you need to incorporate my changes on line 111 and 124.

This means that

cs_colocation { 'first_with_second':
  primitives => [ 'first', 'second' ],
}

would result in the below command being executed:

pcs   # pcs constraint colocation add first with second INFINITY id=first_with_second

Which would be equivilent to the current implementation of the crmsh provider:

crm configure colocation first_with_second INFINITY: second first

@mkrakowitzer
Copy link
Contributor

I'll submit the tests as a seperate PR.

@fghaas fghaas force-pushed the primitives-order branch 2 times, most recently from c8c1cf5 to 275e8ba Compare July 20, 2015 09:39
@fghaas
Copy link
Contributor Author

fghaas commented Jul 20, 2015

@mkrakowitzer I've updated my branch; please take a look at 275e8ba and see if you're happy with that.

@mkrakowitzer
Copy link
Contributor

As is pcs will run:

pcs constraint colocation add second with first INFINITY id=first_with_second

Which I do not think what we want. The previous commit was OK, ie the very first commit. Sorry for all the confusion. If you can just resubmit it (You mentioned updating your commit message), then we can merge it and ill submit an updated test.

@fghaas
Copy link
Contributor Author

fghaas commented Jul 20, 2015

@mkrakowitzer Of the code snippet you mentioned, can you please paste the exact cs_colocation Puppet invocation and the XML that pcs generates?

@mkrakowitzer
Copy link
Contributor

  cs_primitive { 'first':
    primitive_class => 'ocf',
    primitive_type  => 'Dummy',
    provided_by     => 'heartbeat',
    operations      => { 'monitor' => { 'interval' => '10s' } },
  }

  cs_primitive { 'second':
    primitive_class => 'ocf',
    primitive_type  => 'Dummy',
    provided_by     => 'heartbeat',
    operations      => { 'monitor' => { 'interval' => '10s' } },
  }

  cs_colocation { 'first_with_second':
    primitives => [ 'first', 'second' ],
    score      => 'INFINITY',
  }

As per most recent commit:

Debug: Executing '/usr/sbin/pcs constraint colocation add second with first INFINITY id=first_with_second'
 <rsc_colocation id="first_with_second" rsc="second" score="INFINITY" with-rsc="first"/>

As per below diff, original commit + See previous comments re line 111 and 124. This results in the same xml config being generated as the crm provider.

diff --git a/lib/puppet/provider/cs_colocation/pcs.rb b/lib/puppet/provider/cs_colocation/pcs.rb
index dad7f82..1c9b7c7 100644
--- a/lib/puppet/provider/cs_colocation/pcs.rb
+++ b/lib/puppet/provider/cs_colocation/pcs.rb
@@ -39,7 +39,7 @@ Puppet::Type.type(:cs_colocation).provide(:pcs, :parent => Puppet::Provider::Pac
       colocation_instance = {
         :name       => items['id'],
         :ensure     => :present,
-        :primitives => [with_rsc, rsc],
+        :primitives => [rsc, with_rsc],
         :score      => items['score'],
         :provider   => self.name,
         :new        => false
@@ -108,7 +108,7 @@ Puppet::Type.type(:cs_colocation).provide(:pcs, :parent => Puppet::Provider::Pac

       cmd = [ command(:pcs), 'constraint', 'colocation' ]
       cmd << "add"
-      rsc = @property_hash[:primitives].pop
+      rsc = @property_hash[:primitives].shift
       if rsc.include? ':'
         items = rsc.split[':']
         if items[1] == 'Master'
@@ -121,7 +121,7 @@ Puppet::Type.type(:cs_colocation).provide(:pcs, :parent => Puppet::Provider::Pac
         cmd << rsc
       end
       cmd << 'with'
-      rsc = @property_hash[:primitives].pop
+      rsc = @property_hash[:primitives].shift
       if rsc.include? ':'
         items = rsc.split(':')
         if items[1] == 'Master'
Debug: Executing '/usr/sbin/pcs constraint colocation add first with second INFINITY id=first_with_second'
      <rsc_colocation id="first_with_second" rsc="first" score="INFINITY" with-rsc="second"/>

@fghaas
Copy link
Contributor Author

fghaas commented Jul 20, 2015

@mkrakowitzer, the comment in the type says that if you do

  cs_colocation { 'first_with_second':
    primitives => [ 'first', 'second' ],
    score      => 'INFINITY',
  }

then first is meant to be the independent resource that gets placed first, and second is the dependent resource that gets placed depending on first, and only if first can be successfully placed. This is the exact reverse order compared to the native Pacemaker logic, but if that's what's desired, so be it.

If we are in agreement that what I described above is the desired behavior, then

<rsc_colocation id="first_with_second" rsc="second" score="INFINITY" with-rsc="first"/>

is exactly what will achieve that. And if you flip rsc and with-rsc, you'll get it backwards.

Please apply whatever logic you see fit here, and then let me know whether I need to fix up my manifests. I don't know if the comment in the type is wrong, or the crm implementation of cs_colocation is broken too.

@mkrakowitzer
Copy link
Contributor

Yep if we go by the type documentation, the crm provider it broken too. Since the crm provider is using native Pacemaker logic already, does it not make sense to make pcs use the same?

Even if the type documentation is wrong, that can be fixed.

To be honest I would say you are the expert re corosync/pacemaker, so we could you use your guidance as to what makes the most sense.

@fghaas
Copy link
Contributor Author

fghaas commented Jul 21, 2015

Well. Pacemaker itself is not really consistent because it flips the order of a colocation constraint's primitives depending on whether the constraint is binary, or contains a resource set. So the comment in the type does have some merit.

"Fixing" the crm provider in place is problematic though. I don't know how widely people even use this feature, but for anyone who does, flipping the order of primitives will break their cluster — meaning people update the module, run puppet agent, and then their cluster just stops with no sign of error, because the constraint has just silently been updated to something that can't be resolved.

So as far as the situation right now is concerned, it's a mess. Because even if we "fix" the type "documentation" (note scare quotes because it really isn't documentation; it's just a comment and the README is not very clear on the matter) and set up both providers so they do the same thing, they still won't be in line with Pacemaker because they, unlike Pacemaker, would be using the same ordering for binary constraints and for resource sets.

@mkrakowitzer
Copy link
Contributor

I don't even think this can be addressed by semver, as that would mean bumping to 1.0.0 and this module is nowhere near ready for a 1.0.0 release :/

Perhaps it is worth maintaining a separate stable branch based on current master branch(I think @ffrank mentioned this before), Then master can be used to bundle all these breaking changes (I think there are going to be plenty to get this module into somewhat reasonable condition for a 1.0.0 release)

@ffrank how is that 0.7.1 release coming along?

@ffrank
Copy link
Contributor

ffrank commented Jul 23, 2015

Not a lot, sadly. I had to switch rather drastically to book-writing mode, sorry. Hope there will be some time a few weeks down the road.

If 1.0.0 is special insofar that we consider the API as stable, does semver allow to make braking changes between different 0.x releases?

Generally: Thanks for handling this so steadfastly @mkrakowitzer. Completely agree that we cannot reasonably ask for tests where there are none in the first place. Kudos for making some! Also agree that unit tests go a long way, even without good acceptance test coverage.

I have not grokked the full thread above, but from what I understand, you settled for "same manifest yields same behavior regardless of provider". This is the correct solution. Thanks to @fghaas for the code and all the helpful feedback, most obliged!

@fghaas
Copy link
Contributor Author

fghaas commented Jul 23, 2015

@ffrank, "same manifest yields same behavior regardless of provider" is obviously required, but we haven't settled on what the desired behavior should be.

@DavidS
Copy link

DavidS commented Sep 10, 2015

Hi @fghaas , long time no see :)

Any ideas on how to proceed here?

cc @ffrank , @mkrakowitzer , @igalic

@igalic
Copy link
Contributor

igalic commented Oct 14, 2015

@roidelapluie have you merged this into your develop branch?

@ffrank
Copy link
Contributor

ffrank commented Oct 15, 2015

Patch looks good to me.

Am I right to assume that PCCI acceptance testing doesn't quite work yet (i.e. expected to fail)?

@igalic
Copy link
Contributor

igalic commented Oct 19, 2015

it's failing now, @ffrank!

@ffrank
Copy link
Contributor

ffrank commented Oct 19, 2015

I guess what I'm asking is: Does master pass pcci right now?

@nibalizer
Copy link
Member

@ffrank nope. http://ci.puppet.community/modules/puppet-community/puppet-corosync
but the fix is easy, just change the top of spec_helper_acceptance.rb to look like this:
https://github.com/puppetlabs/puppetlabs-stdlib/blob/master/spec/spec_helper_acceptance.rb#L7

@roidelapluie
Copy link
Member

See #169

@igalic igalic closed this Oct 23, 2015
@igalic igalic reopened this Oct 23, 2015
@igalic
Copy link
Contributor

igalic commented Oct 23, 2015

i've closed / reopened this pr to trigger a rerun of the tests, apparently, that didn't really happen, or they are still broken, despite #169. /cc @roidelapluie

@roidelapluie
Copy link
Member

@igalic thit PR should be rebased first I think

@roidelapluie
Copy link
Member

@igalic note that you can request rerun there: http://ci.puppet.community/rechecks

@roidelapluie
Copy link
Member

@DavidS
Copy link

DavidS commented Oct 29, 2015

@fghaas from the comments it seems that this may pass acceptance tests after rebasing on top of current master, as #169 is now merged and had at least some of the systems passing.

cs_colocation, when used with the pcs provider, makes the incorrect
assumption that the order of primitives in colocation constraints is
arbitrary. It is not, and the following must not be equivalent:

cs_colocation { 'vip_with_service':
  primitives => [ 'nginx_service', 'nginx_vip' ],
}

cs_colocation { 'vip_with_service':
  primitives => [ 'nginx_vip', 'nginx_service' ],
}

These mean different things. The first example ensures that nginx_vip
runs on whatever node nginx_service runs on, and if nginx_service
can't be started on any node, Pacemaker won't even attempt to start
nginx_vip. However, if nginx_vip cannot be started on any node,
nginx_service will continue to be available.

The second example means the opposite. In it, nginx_service is
configured to run wherever nginx_vip runs, and if nginx_vip can't run
anywhere, nginx_service won't start. But if nginx_service cannot be
started, nginx_vip is still available.

Note that the provider's expected order is directly reversed compared
to the one that standalone Pacemaker would use (no matter whether pcs
or crm is used). This is intentional as per commit 8e218d9.

At any rate, for the pcs provider the order of resources is wrong, and
injecting the sort to mask that issue is nonsense. As a result, drop
artificial sorting for the primitives, fix the order, and use them
exactly as specified.

Fixes issue voxpupuli#150.
@fghaas
Copy link
Contributor Author

fghaas commented Nov 2, 2015

@DavidS Rebased, thanks.

@bogdando
Copy link
Contributor

bogdando commented Nov 2, 2015

The failed build details lead to the 404 error.
I lost a track on what had you decided to preserve the backwards compatibility? A guy updated the module and rolled changes onto the live cluster, would he be happy with this?

igalic added a commit that referenced this pull request Nov 7, 2015
Stop ignoring order of primitives in colocation constraints
@igalic igalic merged commit 5db6da7 into voxpupuli:master Nov 7, 2015
@igalic
Copy link
Contributor

igalic commented Nov 7, 2015

thanks all. i'm merging this, because it's the right way to go about it.
as for backwards compatibility: i'm really not sure it's so good to be backwards compatible with a broken module ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants