
Add nss-lookup target to CentOS 7 unit #601

Closed

Conversation

epallerols

We found a bug while rebooting our Elasticsearch data nodes on a CentOS 7 machine. The service starts, but it keeps logging the following error:

[2016-03-16 09:58:46,764][WARN ][transport.netty          ] [vmd32dgz1] exception caught on transport layer [[id: 0x09aec994]], closing connection
java.nio.channels.UnresolvedAddressException
  at sun.nio.ch.Net.checkAddress(Net.java:101)
  at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
  at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:108)
  at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:70)
  at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:574)
  at org.elasticsearch.common.netty.channel.Channels.connect(Channels.java:634)
  at org.elasticsearch.common.netty.channel.AbstractChannel.connect(AbstractChannel.java:216)
  at org.elasticsearch.common.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:229)
  at org.elasticsearch.common.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:182)
  at org.elasticsearch.transport.netty.NettyTransport.connectToChannelsLight(NettyTransport.java:787)
  at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:754)
  at org.elasticsearch.transport.netty.NettyTransport.connectToNodeLight(NettyTransport.java:726)
  at org.elasticsearch.transport.TransportService.connectToNodeLight(TransportService.java:220)
  at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:373)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)

Restarting the service solves the issue. After checking some Elasticsearch tickets, I found that the problem seems to be related to host/network name resolution.

There are some existing tickets related to this.

I am no expert, but adding nss-lookup.target to the elasticsearch systemd unit fixes the issue for us, since we are telling the service to wait until name resolution is ready.
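In effect, the change the PR proposes is an ordering line in the unit file, along these lines (a sketch of the idea only; the real unit file contains more directives than shown, and the actual one-commit diff may differ):

```ini
[Unit]
# nss-lookup.target appended so the service only starts once name
# resolution is available (hypothetical sketch, not the actual unit).
After=network.target nss-lookup.target
```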

tylerjl (Contributor) commented Mar 16, 2016

Hey @epallerols, thanks for the PR.

I looked into this and dug this up from the man page for systemd.special:

... All services for which the
availability of full host/network name resolution is essential should be
ordered after this target, but not pull it in. ...

In the majority of cases Elasticsearch will want hostname resolution to be functional, but a standalone, single-node instance that isn't trying to resolve other hosts doesn't technically need it.

One option that may work well here: if you encounter this problem, drop an override file to amend the unit with the required ordering:

$ cat /etc/systemd/system/elasticsearch-es-01.service.d/nss.conf
[Unit]
After=nss-lookup.target

This works pretty nicely since it won't interfere with the files Puppet writes, and it lets you customize the unit as needed with just a file. Does this solution seem reasonable?
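Spelled out as commands, creating such an override looks roughly like this (a sketch; the instance name elasticsearch-es-01 follows the example above, and UNIT_DIR defaults to a scratch directory so the steps can be tried without root):

```shell
# UNIT_DIR would normally be /etc/systemd/system/elasticsearch-es-01.service.d;
# a mktemp scratch directory is used here so this can be dry-run unprivileged.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)/elasticsearch-es-01.service.d}"
mkdir -p "$UNIT_DIR"

# Write the drop-in that orders the unit after nss-lookup.target.
cat > "$UNIT_DIR/nss.conf" <<'EOF'
[Unit]
After=nss-lookup.target
EOF

cat "$UNIT_DIR/nss.conf"
# After installing the file for real, apply it with:
#   systemctl daemon-reload && systemctl restart elasticsearch-es-01.service
```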

epallerols (Author)

Hi @tylerjl,

Thank you very much for your answer. Your approach seems to be the right thing to do.
We tried it and it works like a charm, although we had to change the file to look like this:

$ cat /etc/systemd/system/elasticsearch-es-01.service.d/network.conf
[Unit]
After=network-online.target

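One caveat worth noting (this comes from the systemd.special documentation, not from the thread): ordering after network-online.target only delays a unit if the target is actually pulled into the boot transaction, and the target in turn relies on a wait-online service (such as NetworkManager-wait-online.service or systemd-networkd-wait-online.service) being enabled. A drop-in that both pulls the target in and orders after it would look like:

```ini
[Unit]
# Wants= pulls network-online.target into the boot transaction;
# After= makes this unit wait until the target is reached.
Wants=network-online.target
After=network-online.target
```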
@epallerols epallerols closed this Mar 16, 2016
cdenneen (Contributor)

Tyler, the only issue here is that the file is created by the provider after the module tells it to add it. So on a fresh install, or when adding another instance in Puppet code, that config would need to be manually added again. It should probably be maintained in Puppet code with inifile.

tylerjl (Contributor) commented Mar 16, 2016

We could probably manage the systemd overrides in the module, yes. However, it wouldn't be a priority for a while, and if someone is using Puppet anyway, my suggestion would be to just use a file { } resource to manage the overrides alongside your elasticsearch::instance.

Even something super simple like

$instance = 'foo'
file { "/etc/systemd/system/${instance}.service.d/nss.conf" : ... } ->
  elasticsearch::instance { $instance : ... }

(may not be perfect, it's pseudopuppet.)

If you do think it's a priority though, feel free to make a ticket and we'll tackle it down the road at some point.

epallerols (Author)

Hi @tylerjl and @cdenneen,

We have modified our Puppet code to include the file as @tylerjl suggested, with something similar to this:

# Drop-in that orders the instance's unit after network-online.target
file { "/usr/lib/systemd/system/elasticsearch-${es_config['cluster.name']}.service.d/network.conf":
  ensure  => file,
  content => template('templates_folder/systemd_network.conf.erb'),
  require => Package['elasticsearch'],
  notify  => Exec["systemd_reload_${es_config['cluster.name']}"],
}
