
Add nss-lookup target to CentOS 7 unit #601

Closed

Conversation

epallerols

We found a bug while rebooting our Elasticsearch data nodes on a CentOS 7 machine. The service starts, but it keeps logging the following error:

[2016-03-16 09:58:46,764][WARN ][transport.netty          ] [vmd32dgz1] exception caught on transport layer [[id: 0x09aec994]], closing connection
java.nio.channels.UnresolvedAddressException
  at sun.nio.ch.Net.checkAddress(Net.java:101)
  at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
  at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:108)
  at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:70)
  at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:574)
  at org.elasticsearch.common.netty.channel.Channels.connect(Channels.java:634)
  at org.elasticsearch.common.netty.channel.AbstractChannel.connect(AbstractChannel.java:216)
  at org.elasticsearch.common.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:229)
  at org.elasticsearch.common.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:182)
  at org.elasticsearch.transport.netty.NettyTransport.connectToChannelsLight(NettyTransport.java:787)
  at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:754)
  at org.elasticsearch.transport.netty.NettyTransport.connectToNodeLight(NettyTransport.java:726)
  at org.elasticsearch.transport.TransportService.connectToNodeLight(TransportService.java:220)
  at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:373)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)

Restarting the service solves the issue. After checking some Elasticsearch tickets, I found that the problem seems to be related to host/network name resolution.

There are some existing tickets related to this.

I am no expert, but adding nss-lookup.target to the elasticsearch systemd unit fixes the issue for us, since we are telling the service to wait until name resolution is ready.
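In effect, the change the PR proposes is an ordering line in the unit file, along these lines (a sketch of the idea only; the real unit file contains more directives than shown, and the actual one-commit diff may differ):

```ini
[Unit]
# nss-lookup.target appended so the service only starts once name
# resolution is available (hypothetical sketch, not the actual unit).
After=network.target nss-lookup.target
```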

tylerjl (Contributor) commented Mar 16, 2016

Hey @epallerols, thanks for the PR.

I looked into this and dug this up from the man page for systemd.special:

... All services for which the
availability of full host/network name resolution is essential should be
ordered after this target, but not pull it in. ...

In the majority of cases Elasticsearch will want hostname resolution to be functional, but a standalone, single-node instance that isn't trying to resolve other hosts doesn't technically need it.

One option that may work well here: if you encounter this problem, drop an override file to amend the unit with the required ordering:

$ cat /etc/systemd/system/elasticsearch-es-01.service.d/nss.conf
[Unit]
After=nss-lookup.target

This works pretty nicely since it won't interfere with the files Puppet writes, and it lets you customize the unit as needed with just a file. Does this solution seem reasonable?
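Spelled out as commands, creating such an override looks roughly like this (a sketch; the instance name elasticsearch-es-01 follows the example above, and UNIT_DIR defaults to a scratch directory so the steps can be tried without root):

```shell
# UNIT_DIR would normally be /etc/systemd/system/elasticsearch-es-01.service.d;
# a mktemp scratch directory is used here so this can be dry-run unprivileged.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)/elasticsearch-es-01.service.d}"
mkdir -p "$UNIT_DIR"

# Write the drop-in that orders the unit after nss-lookup.target.
cat > "$UNIT_DIR/nss.conf" <<'EOF'
[Unit]
After=nss-lookup.target
EOF

cat "$UNIT_DIR/nss.conf"
# After installing the file for real, apply it with:
#   systemctl daemon-reload && systemctl restart elasticsearch-es-01.service
```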

epallerols (Author)

Hi @tylerjl,

Thank you very much for your answer. Your approach seems to be the right thing to do.
We tried it and it works like a charm, although we had to change the file to look like this:

$ cat /etc/systemd/system/elasticsearch-es-01.service.d/network.conf
[Unit]
After=network-online.target

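One caveat worth noting (this comes from the systemd.special documentation, not from the thread): ordering after network-online.target only delays a unit if the target is actually pulled into the boot transaction, and the target in turn relies on a wait-online service (such as NetworkManager-wait-online.service or systemd-networkd-wait-online.service) being enabled. A drop-in that both pulls the target in and orders after it would look like:

```ini
[Unit]
# Wants= pulls network-online.target into the boot transaction;
# After= makes this unit wait until the target is reached.
Wants=network-online.target
After=network-online.target
```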
@epallerols epallerols closed this Mar 16, 2016
cdenneen (Contributor)

Tyler, the only issue here is that the file is created by the provider after the module tells it to add it. So on a fresh install, or when adding another instance in Puppet code, that config would need to be manually added again. It should probably be maintained in Puppet code with inifile.

tylerjl (Contributor) commented Mar 16, 2016

We could probably manage the systemd overrides in the module, yes. However, it wouldn't be a priority for a while, and if someone is using Puppet anyway, my suggestion would be to just use a file { } resource to manage the overrides alongside your elasticsearch::instance.

Even something super simple like

$instance = 'foo'
file { "/etc/systemd/system/${instance}.service.d/nss.conf" : ... } ->
  elasticsearch::instance { $instance : ... }

(may not be perfect, it's pseudopuppet.)

If you do think it's a priority though, feel free to make a ticket and we'll tackle it down the road at some point.

epallerols (Author)

Hi @tylerjl and @cdenneen,

We have modified our Puppet code to include the file as @tylerjl suggested, with something similar to this:

# Drop-in that orders the instance's unit after network-online.target
file { "/usr/lib/systemd/system/elasticsearch-${es_config['cluster.name']}.service.d/network.conf":
  ensure  => file,
  content => template('templates_folder/systemd_network.conf.erb'),
  require => Package['elasticsearch'],
  notify  => Exec["systemd_reload_${es_config['cluster.name']}"],
}
