New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Register service in consul #802
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you forgot to take care about deregistering services. What about specifying the check with DeregisterCriticalServiceAfter
?
And in case of demote I think it should explicitly call service.deregister
patroni/dcs/consul.py
Outdated
'tags': [role] | ||
} | ||
|
||
if role in ['master', 'replica']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have couple of remarks here:
- [WIP] Standby cluster implementation #679 introduces a new role,
standby_leader
, should we register is as a usual replica or create a special tag for that? - Does it make sense to avoid registering a replica which is not in the state "running"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think the best way is to create a special tag for this and make a configuration option to tag this nodes as replica additionally.
- Yes, it make sense. Especially with
DeregisterCriticalServiceAfter
it can be deregister by consul and we won't register it again when state change to "running".
patroni/dcs/consul.py
Outdated
@catch_consul_errors | ||
def register_service(self, service_name, **kwargs): | ||
logger.info('Register service {}, params {}'.format(service_name, kwargs)) | ||
return self.retry(self._client.agent.service.register, service_name, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One should be really careful with introducing new retries and timeouts, it is very important to make sure that one cycle of HA on the primary doesn't exceed ttl
. So far Patroni does only two calls with such long retries: when it loads the cluster state (_load_cluster`) and when it refreshes the leader key (_update_leader).
In some cases it might become the 3rd place, but maybe it is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we would fail registration silently then we will never retry to register again before one of ['role', 'api_url', 'conn_url']
is changed. Otherwise is it ok to fail entire touch_member
if registration is failed?
patroni/dcs/consul.py
Outdated
self._my_member_data = data | ||
return True | ||
except Exception: | ||
logger.exception('touch_member') | ||
return False | ||
|
||
@catch_consul_errors | ||
def register_service(self, service_name, **kwargs): | ||
logger.info('Register service {}, params {}'.format(service_name, kwargs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logging.info
supports nice printf-like string formatting, please don't mix it with string.format
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it in af74036
Good point, but what will be reasonable time for such deregistration?
Is |
I think it is better to stick with |
patroni/dcs/consul.py
Outdated
api_parts = api_parts._replace(path='/{}'.format(role)) | ||
conn_parts = urlparse(data['conn_url']) | ||
params = { | ||
'address': conn_parts.hostname, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also add 'service_id': service_name_from_scope_name(self._name),
to params, otherwise it is using service_name
as a service_id, what makes impossible to run master and replica with the same agent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've added deregistration and service_id is now based on scope_name and name
register service only if state is running force service registration if `create_member` is True
patroni/dcs/consul.py
Outdated
@@ -346,8 +346,7 @@ def touch_member(self, data, ttl=None, permanent=False): | |||
args = {} if permanent else {'acquire': self._session} | |||
self._client.kv.put(self.member_path, json.dumps(data, separators=(',', ':')), **args) | |||
if self._register_service: | |||
self.update_service(self._my_member_data, data, force=create_member) | |||
self._my_member_data = data | |||
self.update_service(not create_member and member and member.data or {}, data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to resolve the conflict after the merge of #805, because it made _my_member_data
not available.
The force
argument in the update_service
is still available. I think it should be set to True
during the next call if the last attempt to call register_service
has failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update_service
now save last result and set force
to True
if last result was False
And because of this I've removed retry
from register_service
patroni/dcs/consul.py
Outdated
conn_parts = urlparse(data['conn_url']) | ||
check = base.Check.http(api_parts.geturl(), self._service_check_interval, deregister=self._client.http.ttl * 10) | ||
params = { | ||
'service_id': '{}_{}'.format(self._scope, self._name), | ||
'service_id': '{0}/{1}'.format(self._scope, self._name), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've deliberately changed the delimiter from '_'
to '/'
, because usage of underscore may lead to collisions in service_id's across different clusters.
keep last result and set force=True if last result was False removed retry in register_service
Merged, thank you! |
Mention Consul related parameters introduced by #802
* fix some warnings when running unit-tests * allow python-kubernetes up to 10.0.1 * python-consul>=0.7.1 is required due to #802
* fix some warnings when running unit-tests * allow python-kubernetes up to 10.0.1 * python-consul>=0.7.1 is required due to #802
Implementation of #771, but a bit different, register service 'scope_name' with tag 'master' or 'replica'
example with scope 'pgsql-pgpi'