Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Register service in consul #802

Merged
merged 5 commits into from Sep 7, 2018
Merged

Register service in consul #802

merged 5 commits into from Sep 7, 2018

Conversation

Isonami
Copy link
Contributor

@Isonami Isonami commented Sep 3, 2018

Implementation of #771, but a bit different, register service 'scope_name' with tag 'master' or 'replica'

example with scope 'pgsql-pgpi'

Using domain server:
Name: 127.0.0.1
Address: 127.0.0.1#53
Aliases:

pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi1.node.dc.consul.
pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi2.node.dc.consul.
[root@pgpi1 ~]# host -t SRV master.pgsql-pgpi.service.consul. 127.0.0.1
Using domain server:
Name: 127.0.0.1
Address: 127.0.0.1#53
Aliases:

master.pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi2.node.dc.consul.
[root@pgpi1 ~]# host -t SRV replica.pgsql-pgpi.service.consul. 127.0.0.1
Using domain server:
Name: 127.0.0.1
Address: 127.0.0.1#53
Aliases:

replica.pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi1.node.dc.consul.```

Copy link
Collaborator

@CyberDem0n CyberDem0n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you forgot to take care about deregistering services. What about specifying the check with DeregisterCriticalServiceAfter?
And in case of demote I think it should explicitly call service.deregister

'tags': [role]
}

if role in ['master', 'replica']:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have couple of remarks here:

  1. [WIP] Standby cluster implementation #679 introduces a new role, standby_leader, should we register is as a usual replica or create a special tag for that?
  2. Does it make sense to avoid registering a replica which is not in the state "running"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think the best way is to create a special tag for this and make a configuration option to tag this nodes as replica additionally.
  2. Yes, it make sense. Especially with DeregisterCriticalServiceAfter it can be deregister by consul and we won't register it again when state change to "running".

@catch_consul_errors
def register_service(self, service_name, **kwargs):
logger.info('Register service {}, params {}'.format(service_name, kwargs))
return self.retry(self._client.agent.service.register, service_name, **kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One should be really careful with introducing new retries and timeouts, it is very important to make sure that one cycle of HA on the primary doesn't exceed ttl. So far Patroni does only two calls with such long retries: when it loads the cluster state (_load_cluster`) and when it refreshes the leader key (_update_leader).

In some cases it might become the 3rd place, but maybe it is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we would fail registration silently then we will never retry to register again before one of ['role', 'api_url', 'conn_url'] is changed. Otherwise is it ok to fail entire touch_member if registration is failed?

self._my_member_data = data
return True
except Exception:
logger.exception('touch_member')
return False

@catch_consul_errors
def register_service(self, service_name, **kwargs):
logger.info('Register service {}, params {}'.format(service_name, kwargs))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logging.info supports nice printf-like string formatting, please don't mix it with string.format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it in af74036

patroni/dcs/consul.py Outdated Show resolved Hide resolved
@Isonami
Copy link
Contributor Author

Isonami commented Sep 4, 2018

It looks like you forgot to take care about deregistering services. What about specifying the check with DeregisterCriticalServiceAfter?

Good point, but what will be reasonable time for such deregistration?

And in case of demote I think it should explicitly call service.deregister

Is delete_leader good place to put call for service.deregister? And i don't know code very deep, is there a possible race condition with register call from touch_member?

@CyberDem0n
Copy link
Collaborator

what will be reasonable time for such deregistration?

ttl * 5 should be good enough I think.

Is delete_leader good place to put call for service.deregister? And i don't know code very deep, is there a possible race condition with register call from touch_member?

I think it is better to stick with touch_member. All calls are serialized and you can also rely on start == 'stopped'

api_parts = api_parts._replace(path='/{}'.format(role))
conn_parts = urlparse(data['conn_url'])
params = {
'address': conn_parts.hostname,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add 'service_id': service_name_from_scope_name(self._name), to params, otherwise it is using service_name as a service_id, what makes impossible to run master and replica with the same agent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've added deregistration and service_id is now based on scope_name and name

Isonami and others added 2 commits September 5, 2018 15:17
register service only if state is running
force service registration if `create_member` is True
@@ -346,8 +346,7 @@ def touch_member(self, data, ttl=None, permanent=False):
args = {} if permanent else {'acquire': self._session}
self._client.kv.put(self.member_path, json.dumps(data, separators=(',', ':')), **args)
if self._register_service:
self.update_service(self._my_member_data, data, force=create_member)
self._my_member_data = data
self.update_service(not create_member and member and member.data or {}, data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to resolve the conflict after the merge of #805, because it made _my_member_data not available.
The force argument in the update_service is still available. I think it should be set to True during the next call if the last attempt to call register_service has failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update_service now save last result and set force to True if last result was False
And because of this I've removed retry from register_service

conn_parts = urlparse(data['conn_url'])
check = base.Check.http(api_parts.geturl(), self._service_check_interval, deregister=self._client.http.ttl * 10)
params = {
'service_id': '{}_{}'.format(self._scope, self._name),
'service_id': '{0}/{1}'.format(self._scope, self._name),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've deliberately changed the delimiter from '_' to '/', because usage of underscore may lead to collisions in service_id's across different clusters.

keep last result and set force=True if last result was False
removed retry in register_service
@CyberDem0n CyberDem0n merged commit 2e9cb41 into zalando:master Sep 7, 2018
@CyberDem0n
Copy link
Collaborator

Merged, thank you!

CyberDem0n pushed a commit that referenced this pull request Nov 21, 2018
Mention Consul related parameters introduced by #802
CyberDem0n pushed a commit that referenced this pull request Oct 11, 2019
* fix some warnings when running unit-tests
* allow python-kubernetes up to 10.0.1
* python-consul>=0.7.1 is required due to #802
@CyberDem0n CyberDem0n mentioned this pull request Oct 11, 2019
CyberDem0n added a commit that referenced this pull request Oct 11, 2019
* fix some warnings when running unit-tests
* allow python-kubernetes up to 10.0.1
* python-consul>=0.7.1 is required due to #802
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants