-
-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop ignoring order of primitives in colocation constraints #153
Conversation
LGTM, Would you consider adding tests? @ffrank @igalic @nibalizer whats the policy around tests for the corosync module? There are a lot of gaps in the current testing, I kind of feel its unreasonable to expect someone to write a full test for a provider which currently has none when making minor code fixes. My feeling is:
|
@mkrakowitzer Your LGTM confuses me; what exactly looks good to you? My patch fixes the provider so it no longer makes incorrect assumptions about the reversibility of colocation constraints, but there is still the problem that it deals with things backwards (compared to the crm provider) or correctly (compared to non-puppetized Pacemaker). |
Your patch that fixes the provider so it no longer makes incorrect assumptions about the reversibility of colocation constraints looks good to me. Sorry I did not see the second comment, I am not 100% which is more correct. My feeling is that it should work in the sameway as the crm provider. At least then from a puppet perspetive the ordering will be applied the same regardless of the wether you are using crm or pcs provider. Even if this does not match the way you would do it from the command line:
|
OK, so your suggestion is to change :primitives => [rsc, with_rsc].sort, to :primitives => [with_rsc, rsc], Do I understand that correctly? |
Yes, it is either this, or to modify the README documentation for colocation and modify crm resource to behave in the opposite way. The primitives should be colocated in the same way regardless of whether crm or pcs providers are being used. What are your thoughts on this? |
this provider is notoriously hard to (acceptance) test (correctly. i.e.: on more than one machine) |
Thanks a lot for the tests @mkrakowitzer. As it turns out I had misinterpreted Puppet's intended behavior (mimicking the opposite of Pacemaker's default), so while my original patch is fine, its commit message is backwards and misleading. How do we best fix this? Do I pull your commits into my branch, do an interactive rebase rewording my commit, and then force-push to my branch again? Or should I just update my branch, you re-pull from me, and you resubmit? |
I would say update your branches commit message and resubmit. Although I think you need to incorporate my changes on line 111 and 124. This means that
would result in the below command being executed:
Which would be equivilent to the current implementation of the crmsh provider:
|
I'll submit the tests as a seperate PR. |
c8c1cf5
to
275e8ba
Compare
@mkrakowitzer I've updated my branch; please take a look at 275e8ba and see if you're happy with that. |
As is pcs will run:
Which I do not think what we want. The previous commit was OK, ie the very first commit. Sorry for all the confusion. If you can just resubmit it (You mentioned updating your commit message), then we can merge it and ill submit an updated test. |
@mkrakowitzer Of the code snippet you mentioned, can you please paste the exact |
cs_primitive { 'first':
primitive_class => 'ocf',
primitive_type => 'Dummy',
provided_by => 'heartbeat',
operations => { 'monitor' => { 'interval' => '10s' } },
}
cs_primitive { 'second':
primitive_class => 'ocf',
primitive_type => 'Dummy',
provided_by => 'heartbeat',
operations => { 'monitor' => { 'interval' => '10s' } },
}
cs_colocation { 'first_with_second':
primitives => [ 'first', 'second' ],
score => 'INFINITY',
} As per most recent commit:
As per below diff, original commit + See previous comments re line 111 and 124. This results in the same xml config being generated as the crm provider. diff --git a/lib/puppet/provider/cs_colocation/pcs.rb b/lib/puppet/provider/cs_colocation/pcs.rb
index dad7f82..1c9b7c7 100644
--- a/lib/puppet/provider/cs_colocation/pcs.rb
+++ b/lib/puppet/provider/cs_colocation/pcs.rb
@@ -39,7 +39,7 @@ Puppet::Type.type(:cs_colocation).provide(:pcs, :parent => Puppet::Provider::Pac
colocation_instance = {
:name => items['id'],
:ensure => :present,
- :primitives => [with_rsc, rsc],
+ :primitives => [rsc, with_rsc],
:score => items['score'],
:provider => self.name,
:new => false
@@ -108,7 +108,7 @@ Puppet::Type.type(:cs_colocation).provide(:pcs, :parent => Puppet::Provider::Pac
cmd = [ command(:pcs), 'constraint', 'colocation' ]
cmd << "add"
- rsc = @property_hash[:primitives].pop
+ rsc = @property_hash[:primitives].shift
if rsc.include? ':'
items = rsc.split[':']
if items[1] == 'Master'
@@ -121,7 +121,7 @@ Puppet::Type.type(:cs_colocation).provide(:pcs, :parent => Puppet::Provider::Pac
cmd << rsc
end
cmd << 'with'
- rsc = @property_hash[:primitives].pop
+ rsc = @property_hash[:primitives].shift
if rsc.include? ':'
items = rsc.split(':')
if items[1] == 'Master'
|
@mkrakowitzer, the comment in the type says that if you do cs_colocation { 'first_with_second':
primitives => [ 'first', 'second' ],
score => 'INFINITY',
} then If we are in agreement that what I described above is the desired behavior, then <rsc_colocation id="first_with_second" rsc="second" score="INFINITY" with-rsc="first"/> is exactly what will achieve that. And if you flip Please apply whatever logic you see fit here, and then let me know whether I need to fix up my manifests. I don't know if the comment in the type is wrong, or the |
Yep if we go by the type documentation, the crm provider it broken too. Since the crm provider is using native Pacemaker logic already, does it not make sense to make pcs use the same? Even if the type documentation is wrong, that can be fixed. To be honest I would say you are the expert re corosync/pacemaker, so we could you use your guidance as to what makes the most sense. |
Well. Pacemaker itself is not really consistent because it flips the order of a colocation constraint's primitives depending on whether the constraint is binary, or contains a resource set. So the comment in the type does have some merit. "Fixing" the crm provider in place is problematic though. I don't know how widely people even use this feature, but for anyone who does, flipping the order of primitives will break their cluster — meaning people update the module, run puppet agent, and then their cluster just stops with no sign of error, because the constraint has just silently been updated to something that can't be resolved. So as far as the situation right now is concerned, it's a mess. Because even if we "fix" the type "documentation" (note scare quotes because it really isn't documentation; it's just a comment and the README is not very clear on the matter) and set up both providers so they do the same thing, they still won't be in line with Pacemaker because they, unlike Pacemaker, would be using the same ordering for binary constraints and for resource sets. |
I don't even think this can be addressed by semver, as that would mean bumping to 1.0.0 and this module is nowhere near ready for a 1.0.0 release :/ Perhaps it is worth maintaining a separate stable branch based on current master branch(I think @ffrank mentioned this before), Then master can be used to bundle all these breaking changes (I think there are going to be plenty to get this module into somewhat reasonable condition for a 1.0.0 release) @ffrank how is that 0.7.1 release coming along? |
Not a lot, sadly. I had to switch rather drastically to book-writing mode, sorry. Hope there will be some time a few weeks down the road. If 1.0.0 is special insofar that we consider the API as stable, does semver allow to make braking changes between different 0.x releases? Generally: Thanks for handling this so steadfastly @mkrakowitzer. Completely agree that we cannot reasonably ask for tests where there are none in the first place. Kudos for making some! Also agree that unit tests go a long way, even without good acceptance test coverage. I have not grokked the full thread above, but from what I understand, you settled for "same manifest yields same behavior regardless of provider". This is the correct solution. Thanks to @fghaas for the code and all the helpful feedback, most obliged! |
@ffrank, "same manifest yields same behavior regardless of provider" is obviously required, but we haven't settled on what the desired behavior should be. |
Hi @fghaas , long time no see :) Any ideas on how to proceed here? cc @ffrank , @mkrakowitzer , @igalic |
@roidelapluie have you merged this into your develop branch? |
Patch looks good to me. Am I right to assume that PCCI acceptance testing doesn't quite work yet (i.e. expected to fail)? |
it's failing now, @ffrank! |
I guess what I'm asking is: Does master pass |
@ffrank nope. http://ci.puppet.community/modules/puppet-community/puppet-corosync |
See #169 |
i've closed / reopened this pr to trigger a rerun of the tests, apparently, that didn't really happen, or they are still broken, despite #169. /cc @roidelapluie |
@igalic thit PR should be rebased first I think |
@igalic note that you can request rerun there: http://ci.puppet.community/rechecks |
cs_colocation, when used with the pcs provider, makes the incorrect assumption that the order of primitives in colocation constraints is arbitrary. It is not, and the following must not be equivalent: cs_colocation { 'vip_with_service': primitives => [ 'nginx_service', 'nginx_vip' ], } cs_colocation { 'vip_with_service': primitives => [ 'nginx_vip', 'nginx_service' ], } These mean different things. The first example ensures that nginx_vip runs on whatever node nginx_service runs on, and if nginx_service can't be started on any node, Pacemaker won't even attempt to start nginx_vip. However, if nginx_vip cannot be started on any node, nginx_service will continue to be available. The second example means the opposite. In it, nginx_service is configured to run wherever nginx_vip runs, and if nginx_vip can't run anywhere, nginx_service won't start. But if nginx_service cannot be started, nginx_vip is still available. Note that the provider's expected order is directly reversed compared to the one that standalone Pacemaker would use (no matter whether pcs or crm is used). This is intentional as per commit 8e218d9. At any rate, for the pcs provider the order of resources is wrong, and injecting the sort to mask that issue is nonsense. As a result, drop artificial sorting for the primitives, fix the order, and use them exactly as specified. Fixes issue voxpupuli#150.
275e8ba
to
8ad3f7d
Compare
@DavidS Rebased, thanks. |
The failed build details lead to the 404 error. |
Stop ignoring order of primitives in colocation constraints
thanks all. i'm merging this, because it's the right way to go about it. |
cs_colocation
, when used with thepcs
provider, makes the incorrectassumption that the order of primitives in colocation constraints is
arbitrary. It is not, and the following must not be equivalent:
These mean different things. The first example ensures that nginx_vip
runs on whatever node nginx_service runs on, and if nginx_service
can't be started on any node, Pacemaker won't even attempt to start
nginx_vip. However, if nginx_vip cannot be started on any node,
nginx_service will continue to be available.
The second example means the opposite. In it, nginx_service is
configured to run wherever nginx_vip runs, and if nginx_vip can't run
anywhere, nginx_service won't start. And if nginx_vip cannot be
started on any node, nginx_service will not be available.
As a result, drop artificial sorting for the primitives, and use them
exactly as specified.
Fixes issue #150.