The EC2 discovery plugin provides a list of seed addresses to the {ref}/modules-discovery-hosts-providers.html[discovery process] by querying the AWS API for a list of EC2 instances matching certain criteria determined by the plugin settings.
If you are looking for a hosted solution of {es} on AWS, please visit https://www.elastic.co/cloud.
The `discovery-ec2` plugin allows {es} to find the master-eligible nodes in a cluster running on AWS EC2 by querying the AWS API for the addresses of the EC2 instances running these nodes.
It is normally a good idea to restrict the discovery process just to the master-eligible nodes in the cluster. This plugin allows you to identify these nodes by certain criteria including their tags, their membership of security groups, and their placement within availability zones. The discovery process will work correctly even if it finds master-ineligible nodes, but master elections will be more efficient if this can be avoided.
The interaction with the AWS API can be authenticated using the instance role, or else custom credentials can be supplied.
To enable EC2 discovery, configure {es} to use the `ec2` seed hosts provider:

```yaml
discovery.seed_providers: ec2
```
EC2 discovery supports a number of settings. Some settings are sensitive and must be stored in the {ref}/secure-settings.html[{es} keystore]. For example, to authenticate using a particular access key and secret key, add these keys to the keystore by running the following commands:

```sh
bin/elasticsearch-keystore add discovery.ec2.access_key
bin/elasticsearch-keystore add discovery.ec2.secret_key
```
The available settings for the EC2 discovery plugin are as follows.
`discovery.ec2.access_key` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::
An EC2 access key. If set, you must also set `discovery.ec2.secret_key`. If unset, `discovery-ec2` will instead use the instance role. This setting is sensitive and must be stored in the {es} keystore.

`discovery.ec2.secret_key` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::
An EC2 secret key. If set, you must also set `discovery.ec2.access_key`. This setting is sensitive and must be stored in the {es} keystore.

`discovery.ec2.session_token` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::
An EC2 session token. If set, you must also set `discovery.ec2.access_key` and `discovery.ec2.secret_key`. This setting is sensitive and must be stored in the {es} keystore.

`discovery.ec2.endpoint`::
The EC2 service endpoint to which to connect. See https://docs.aws.amazon.com/general/latest/gr/rande.html#ec2_region to find the appropriate endpoint for the region. This setting defaults to `ec2.us-east-1.amazonaws.com`, which is appropriate for clusters running in the `us-east-1` region.

`discovery.ec2.protocol`::
The protocol to use to connect to the EC2 service endpoint, which may be either `http` or `https`. Defaults to `https`.

`discovery.ec2.proxy.host`::
The address or host name of an HTTP proxy through which to connect to EC2. If not set, no proxy is used.

`discovery.ec2.proxy.port`::
When the address of an HTTP proxy is given in `discovery.ec2.proxy.host`, this setting determines the port to use to connect to the proxy. Defaults to `80`.

`discovery.ec2.proxy.username` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::
When the address of an HTTP proxy is given in `discovery.ec2.proxy.host`, this setting determines the username to use to connect to the proxy. When not set, no username is used. This setting is sensitive and must be stored in the {es} keystore.

`discovery.ec2.proxy.password` ({ref}/secure-settings.html[Secure], {ref}/secure-settings.html#reloadable-secure-settings[reloadable])::
When the address of an HTTP proxy is given in `discovery.ec2.proxy.host`, this setting determines the password to use to connect to the proxy. When not set, no password is used. This setting is sensitive and must be stored in the {es} keystore.

`discovery.ec2.read_timeout`::
The socket timeout for connections to EC2, {time-units}[including the units]. For example, a value of `60s` specifies a 60-second timeout. Defaults to 50 seconds.

`discovery.ec2.groups`::
A list of the names or IDs of the security groups to use for discovery. The `discovery.ec2.any_group` setting determines the behaviour of this setting. Defaults to an empty list, meaning that security group membership is ignored by EC2 discovery.

`discovery.ec2.any_group`::
Defaults to `true`, meaning that instances belonging to any of the security groups specified in `discovery.ec2.groups` will be used for discovery. If set to `false`, only instances that belong to all of the security groups specified in `discovery.ec2.groups` will be used for discovery.

`discovery.ec2.host_type`::
Each EC2 instance has a number of different addresses that might be suitable for discovery. This setting allows you to select which of these addresses is used by the discovery process. It can be set to one of `private_ip`, `public_ip`, `private_dns`, `public_dns` or `tag:TAGNAME`, where `TAGNAME` refers to the name of a tag. This setting defaults to `private_ip`.
+
If you set `discovery.ec2.host_type` to a value of the form `tag:TAGNAME`, the value of the tag `TAGNAME` attached to each instance is used as that instance's address for discovery. Instances which do not have this tag set are ignored by the discovery process. For example, if you tag some EC2 instances with a tag named `elasticsearch-host-name` and set `host_type: tag:elasticsearch-host-name`, the `discovery-ec2` plugin will read each instance's host name from the value of the `elasticsearch-host-name` tag. Read more about https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html[EC2 Tags].

`discovery.ec2.availability_zones`::
A list of the names of the availability zones to use for discovery. The name of an availability zone is the region code followed by a letter, such as `us-east-1a`. Only instances placed in one of the given availability zones will be used for discovery.

`discovery.ec2.tag.TAGNAME`::
A list of the values of a tag called `TAGNAME` to use for discovery. If set, only instances that are tagged with one of the given values will be used for discovery. For instance, the following settings will only use nodes with a `role` tag set to `master` and an `environment` tag set to either `dev` or `staging`:
+
```yaml
discovery.ec2.tag.role: master
discovery.ec2.tag.environment: dev,staging
```
+
NOTE: The names of tags used for discovery may only contain ASCII letters, numbers, hyphens and underscores. In particular you cannot use tags whose name includes a colon.

`discovery.ec2.node_cache_time`::
Sets the length of time for which the collection of discovered instances is cached. {es} waits at least this long between requests for discovery information from the EC2 API. AWS may reject discovery requests if they are made too often, and this would cause discovery to fail. Defaults to `10s`.
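To illustrate how these settings fit together, here is a hypothetical `elasticsearch.yml` fragment that restricts discovery to tagged master-eligible instances in two availability zones. The region, zones, and tag value are assumptions for the example, not defaults:

```yaml
# Use the EC2 seed hosts provider
discovery.seed_providers: ec2
# Query the regional endpoint rather than the us-east-1 default
discovery.ec2.endpoint: ec2.us-west-2.amazonaws.com
# Only consider instances in these availability zones
discovery.ec2.availability_zones: us-west-2a,us-west-2b
# Only consider instances carrying this (hypothetical) tag value
discovery.ec2.tag.role: master
# Use each instance's private IP address for discovery (the default)
discovery.ec2.host_type: private_ip
```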
All secure settings of this plugin are {ref}/secure-settings.html#reloadable-secure-settings[reloadable], allowing you to update the secure settings for this plugin without needing to restart each node.
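For example, after adding or changing a secure setting in the keystore on each node, you can make the new value take effect without a restart by calling the reload secure settings API:

```
POST _nodes/reload_secure_settings
```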
The `discovery-ec2` plugin works by making a `DescribeInstances` call to the AWS EC2 API. You must configure your AWS account to allow this, which is normally done using an IAM policy. You can create a custom policy via the IAM Management Console. It should look similar to this:
```json
{
  "Statement": [
    {
      "Action": [
        "ec2:DescribeInstances"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
```
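As a sketch of how you might apply this policy with the AWS CLI, assuming the JSON above is saved as `policy.json` and your instances use an IAM role named `my-es-node-role` (the policy name, role name, and account ID in the ARN are all placeholders):

```sh
# Create the policy from the JSON document above
aws iam create-policy \
  --policy-name es-discovery-ec2 \
  --policy-document file://policy.json

# Attach the policy to the instance role used by the cluster's nodes
aws iam attach-role-policy \
  --role-name my-es-node-role \
  --policy-arn arn:aws:iam::123456789012:policy/es-discovery-ec2
```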
The `discovery-ec2` plugin can automatically set the `aws_availability_zone` node attribute to the availability zone of each node. This node attribute allows you to ensure that each shard has copies allocated redundantly across multiple availability zones by using the {ref}/modules-cluster.html#shard-allocation-awareness[Allocation Awareness] feature.
In order to enable the automatic definition of the `aws_availability_zone` attribute, set `cloud.node.auto_attributes` to `true`. For example:

```yaml
cloud.node.auto_attributes: true
cluster.routing.allocation.awareness.attributes: aws_availability_zone
```
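You can verify that the attribute has been set on each node by listing node attributes with the cat nodeattrs API:

```
GET _cat/nodeattrs?v
```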
The `aws_availability_zone` attribute can be automatically set like this when using any discovery type. It is not necessary to set `discovery.seed_providers: ec2`. However, this feature does require that the `discovery-ec2` plugin is installed.
It is important to define `network.host` correctly when deploying a cluster on EC2. By default each {es} node only binds to `localhost`, which will prevent it from being discovered by nodes running on any other instances.
You can use the {ref}/modules-network.html[core network host settings] to bind each node to the desired address, or you can set `network.host` to one of the following EC2-specific settings provided by the `discovery-ec2` plugin:
| EC2 Host Value | Description |
|---|---|
| `_ec2:privateIpv4_` | The private IP address (ipv4) of the machine. |
| `_ec2:privateDns_` | The private host of the machine. |
| `_ec2:publicIpv4_` | The public IP address (ipv4) of the machine. |
| `_ec2:publicDns_` | The public host of the machine. |
| `_ec2:privateIp_` | Equivalent to `_ec2:privateIpv4_`. |
| `_ec2:publicIp_` | Equivalent to `_ec2:publicIpv4_`. |
| `_ec2_` | Equivalent to `_ec2:privateIpv4_`. |
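For example, to bind each node to its private IPv4 address you might set:

```yaml
network.host: _ec2:privateIpv4_
```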
These values are acceptable when using any discovery type. They do not require you to set `discovery.seed_providers: ec2`. However, they do require that the `discovery-ec2` plugin is installed.
This section contains some other information about designing and managing an {es} cluster on your own AWS infrastructure. If you would prefer to avoid these operational details then you may be interested in a hosted {es} installation available on AWS-based infrastructure from https://www.elastic.co/cloud.
EC2 instances offer a number of different kinds of storage. Please be aware of the following when selecting the storage for your cluster:
- Instance Store is recommended for {es} clusters as it offers excellent performance and is cheaper than EBS-based storage. {es} is designed to work well with this kind of ephemeral storage because it replicates each shard across multiple nodes. If a node fails and its Instance Store is lost then {es} will rebuild any lost shards from other copies.
- EBS-based storage may be acceptable for smaller clusters (1-2 nodes). Be sure to use provisioned IOPS to ensure your cluster has satisfactory performance.
- EFS-based storage is not recommended or supported as it does not offer satisfactory performance. Historically, shared network filesystems such as EFS have not always offered precisely the behaviour that {es} requires of its filesystem, and this has been known to lead to index corruption. Although EFS offers durability, shared storage, and the ability to grow and shrink filesystems dynamically, you can achieve the same benefits using {es} directly.
Prefer the Amazon Linux 2 AMIs as these allow you to benefit from the lightweight nature, support, and EC2-specific performance enhancements that these images offer.
- Smaller instance types have limited network performance, in terms of both bandwidth and number of connections. If networking is a bottleneck, avoid instance types with networking labelled as `Moderate` or `Low`.
- It is a good idea to distribute your nodes across multiple availability zones and use {ref}/modules-cluster.html#shard-allocation-awareness[shard allocation awareness] to ensure that each shard has copies in more than one availability zone.
- Do not span a cluster across regions. {es} expects that node-to-node connections within a cluster are reasonably reliable and offer high bandwidth and low latency, and these properties do not hold for connections between regions. Although an {es} cluster will behave correctly when node-to-node connections are unreliable or slow, it is not optimised for this case and its performance may suffer. If you wish to geographically distribute your data, you should provision multiple clusters and use features such as {ref}/modules-cross-cluster-search.html[cross-cluster search] and {ref}/xpack-ccr.html[cross-cluster replication].
- If you have split your nodes into roles, consider tagging the EC2 instances by role to make it easier to filter and view your EC2 instances in the AWS console.
- Consider enabling termination protection for all of your data and master-eligible nodes. This will help to prevent accidental termination of these nodes which could temporarily reduce the resilience of the cluster and which could cause a potentially disruptive reallocation of shards.
- If running your cluster using one or more auto-scaling groups, consider protecting your data and master-eligible nodes against termination during scale-in. This will help to prevent automatic termination of these nodes which could temporarily reduce the resilience of the cluster and which could cause a potentially disruptive reallocation of shards. If these instances are protected against termination during scale-in then you can use {ref}/shard-allocation-filtering.html[shard allocation filtering] to gracefully migrate any data off these nodes before terminating them manually. A sketch of both protections using the AWS CLI appears after this list.
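As a minimal sketch of the last two recommendations, assuming the AWS CLI is configured; the instance ID and auto-scaling group name are placeholders:

```sh
# Enable termination protection on a data or master-eligible node's instance
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --disable-api-termination

# Protect the same instance from termination during scale-in
# within its auto-scaling group
aws autoscaling set-instance-protection \
  --auto-scaling-group-name my-es-data-nodes \
  --instance-ids i-0123456789abcdef0 \
  --protected-from-scale-in
```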