Set up separate druid public-eqiad cluster.
NOTE: It is (mostly) safe to rename the 'druid-eqiad' Zookeeper cluster
to 'druid-analytics-eqiad', because it is only used in puppet.
There is nothing in the zookeeper or druid configurations that
refers to the Zookeeper cluster name.
The JMXTrans graphite metrics for this zookeeper cluster will change, as will the
alerts. Let's be sure to silence all ZK alerts for druid100[123] when we merge this.

This also sets up a new Zookeeper cluster colocated on the
druid public-eqiad nodes called 'druid-public-eqiad'.
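
For context, puppet profiles resolve a Zookeeper cluster by name out of the
zookeeper_clusters hiera hash, so the name lives only in puppet. A minimal
sketch of that lookup pattern (variable names here are illustrative, not the
actual profile code):

    $clusters     = hiera('zookeeper_clusters')
    $cluster_name = hiera('profile::zookeeper::cluster_name')
    # hosts maps fqdn => myid; neither Zookeeper nor Druid persists the
    # cluster name itself, which is why the rename is safe.
    $zk_hosts     = $clusters[$cluster_name]['hosts']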

Bug: T176223
Change-Id: I8624fbc402105cc44818a2ee8d0db7dbab3526ee
ottomata authored and elukey committed Oct 12, 2017
1 parent 3d81714 commit bb9ab0f
Showing 13 changed files with 205 additions and 31 deletions.
14 changes: 11 additions & 3 deletions hieradata/common.yaml
@@ -445,14 +445,22 @@ zookeeper_clusters:
conf2002.codfw.wmnet: '2002'
conf2003.codfw.wmnet: '2003'

# ZK cluster for Druid (in Analytics cluster),
# colocated with Druid.
druid-eqiad:
# ZK cluster for Druid analytics-eqiad cluster (non-public),
# colocated on druid hosts.
druid-analytics-eqiad:
hosts:
druid1001.eqiad.wmnet: '1001'
druid1002.eqiad.wmnet: '1002'
druid1003.eqiad.wmnet: '1003'

# ZK cluster for Druid public-eqiad cluster (for AQS, wikistats, etc.),
# colocated on druid hosts.
druid-public-eqiad:
hosts:
druid1004.eqiad.wmnet: '1004'
druid1005.eqiad.wmnet: '1005'
druid1006.eqiad.wmnet: '1006'

# Used to sync the setting between all Kafka clusters and clients.
kafka_message_max_bytes: 4194304

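A hedged sketch of how the druid profile presumably turns a named cluster from
this hash into the Zookeeper connection string for druid.zk.service.host
(join/keys/suffix are puppet stdlib functions; this is not the actual profile
code):

    $clusters   = hiera('zookeeper_clusters')
    $zk_cluster = hiera('profile::druid::common::zookeeper_cluster_name')
    $zk_hosts   = $clusters[$zk_cluster]['hosts']
    # => 'druid1004.eqiad.wmnet:2181,druid1005.eqiad.wmnet:2181,druid1006.eqiad.wmnet:2181'
    $zk_connect = join(suffix(keys($zk_hosts), ':2181'), ',')
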
15 changes: 6 additions & 9 deletions hieradata/role/common/druid/analytics/worker.yaml
@@ -1,12 +1,10 @@

admin::groups:
- druid-admins


# Druid nodes get their own Zookeeper cluster to isolate them
# from the production ones.
# Configure the zookeeper profile.
profile::zookeeper::cluster_name: druid-eqiad
profile::zookeeper::cluster_name: druid-analytics-eqiad
# Don't page if a zookeeper server in this cluster goes down.
profile::zookeeper::is_critical: false
# Max number of connections per IP for Zookeeper
@@ -18,7 +16,7 @@ profile::zookeeper::sync_limit: 8
# To avoid version conflicts with Cloudera zookeeper package, this
# class manually specifies which debian package version should be installed.
profile::zookeeper::zookeeper_version: '3.4.5+dfsg-2+deb8u2'
profile::zookeeper::firewall::srange: '(($DRUID_HOSTS $ANALYTICS_NETWORKS))'
profile::zookeeper::firewall::srange: '$DRUID_ANALYTICS_HOSTS'

# Druid nodes also include CDH, so we need to specify a few defaults that
# let them find the Hadoop Cluster.
@@ -33,13 +31,13 @@ profile::hadoop::client::resourcemanager_hosts:
# The logical name of this druid cluster
profile::druid::common::druid_cluster_name: analytics-eqiad
# The logical name of the zookeeper cluster that druid should use
profile::druid::common::zookeeper_cluster_name: druid-eqiad
profile::druid::common::zookeeper_cluster_name: druid-analytics-eqiad

# Make druid build an extension composed of CDH jars.
profile::druid::common::use_cdh: true

# The default MySQL Druid metadata storage database name is just 'druid'.
# Since the analytics-eqiad Druid cluster was previously the only one,
# Since the analytics-eqiad Druid cluster was originally the only one,
# we set this to the default of 'druid', just to be explicit about it.
profile::druid::common::metadata_storage_database_name: 'druid'

@@ -123,12 +121,11 @@ profile::druid::middlemanager::env:


# --- Druid Overlord
# Overlord will accept indexing jobs from Hadoop nodes in the ANALYTICS_NETWORKS
profile::druid::overlord::ferm_srange: '$ANALYTICS_NETWORKS'
profile::druid::overlord::properties:
druid.indexer.runner.type: remote
druid.indexer.storage.type: metadata
profile::druid::overlord::env:
DRUID_HEAP_OPTS: "-Xmx4g -Xms4g"
DRUID_EXTRA_JVM_OPTS: "-XX:NewSize=256m -XX:MaxNewSize=256m -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"



130 changes: 130 additions & 0 deletions hieradata/role/common/druid/public/worker.yaml
@@ -0,0 +1,130 @@
admin::groups:
- druid-admins

# Druid nodes get their own Zookeeper cluster to isolate them
# from the production ones.
# Configure the zookeeper profile.
profile::zookeeper::cluster_name: druid-public-eqiad
# Don't page if a zookeeper server in this cluster goes down.
profile::zookeeper::is_critical: false
# Max number of connections per IP for Zookeeper
profile::zookeeper::max_client_connections: 1024
# Default tick_time is 2000ms; this should allow a max
# of 16 seconds of latency for Zookeeper client sessions.
# See comments in role::kafka::analytics::broker for more info.
profile::zookeeper::sync_limit: 8
# To avoid version conflicts with Cloudera zookeeper package, this
# class manually specifies which debian package version should be installed.
profile::zookeeper::zookeeper_version: '3.4.5+dfsg-2+deb8u2'
profile::zookeeper::firewall::srange: '$DRUID_PUBLIC_HOSTS'

# Druid nodes also include CDH, so we need to specify a few defaults that
# let them find the Hadoop Cluster.
profile::hadoop::client::zookeeper_cluster_name: main-eqiad
profile::hadoop::client::resourcemanager_hosts:
- analytics1001.eqiad.wmnet
- analytics1002.eqiad.wmnet


# -- Druid common configuration

# The logical name of this druid cluster
profile::druid::common::druid_cluster_name: public-eqiad
# The logical name of the zookeeper cluster that druid should use
profile::druid::common::zookeeper_cluster_name: druid-public-eqiad

# Make druid build an extension composed of CDH jars.
profile::druid::common::use_cdh: true

# Use this as the metadata storage database name in MySQL.
profile::druid::common::metadata_storage_database_name: 'druid_public_eqiad'

profile::druid::daemons_autoreload: false
profile::druid::ferm_srange: '$DRUID_PUBLIC_HOSTS'
profile::druid::monitoring_enabled: true

profile::druid::common::properties:
druid.metadata.storage.type: mysql
druid.metadata.storage.connector.host: analytics1003.eqiad.wmnet
# druid.metadata.storage.connector.password is set in the private repo.
druid.storage.type: hdfs
druid.request.logging.type: file
druid.request.logging.dir: /var/log/druid
# We need to use a special deep storage directory in HDFS so
# we don't conflict with other (e.g. analytics-eqiad) druid
# cluster deep storage.
# NOTE: This directory is ensured to exist by the
# druid::cdh::hadoop::deep_storage define included in the
# role::analytics_cluster::hadoop::master class.
druid.storage.storageDirectory: /user/druid/deep-storage-public-eqiad
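
Via the druid profile, these keys should end up as ordinary entries in Druid's
common runtime properties, roughly as below (the rendering and the connectURI
line are assumptions for illustration; the password comes from the private
repo):

    druid.metadata.storage.type=mysql
    druid.metadata.storage.connector.host=analytics1003.eqiad.wmnet
    druid.metadata.storage.connector.connectURI=jdbc:mysql://analytics1003.eqiad.wmnet:3306/druid_public_eqiad
    druid.storage.type=hdfs
    druid.storage.storageDirectory=/user/druid/deep-storage-public-eqiad

The distinct database name and deep storage directory are what keep this
cluster's state isolated from analytics-eqiad.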


# -- Druid worker service configurations

# --- Druid Broker

# Broker gets a special ferm_srange since it is the frontend query interface to Druid.
profile::druid::broker::ferm_srange: '$DOMAIN_NETWORKS'
profile::druid::broker::properties:
druid.processing.numThreads: 10
druid.processing.buffer.sizeBytes: 2147483647
# Set numMergeBuffers to use v2 groupBy engine
druid.processing.numMergeBuffers: 10
druid.server.http.numThreads: 20
druid.broker.http.numConnections: 20
druid.broker.http.readTimeout: PT5M
# Increase druid broker query cache size to 2G.
# TBD: Perhaps we should also try using memcached?
druid.cache.sizeInBytes: 2147483648
profile::druid::broker::env:
DRUID_HEAP_OPTS: "-Xmx25g -Xms25g"
DRUID_EXTRA_JVM_OPTS: "-XX:NewSize=6g -XX:MaxNewSize=6g -XX:MaxDirectMemorySize=64g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
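
As a sizing sanity check: Druid's documented rule of thumb is that a service
needs about druid.processing.buffer.sizeBytes * (druid.processing.numThreads +
druid.processing.numMergeBuffers + 1) of direct memory. For this broker that is

    2147483647 * (10 + 10 + 1) bytes ≈ 42 GiB

which fits under the 64g MaxDirectMemorySize above with headroom.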


# --- Druid Coordinator
profile::druid::coordinator::properties: {}
profile::druid::coordinator::env:
DRUID_HEAP_OPTS: "-Xmx10g -Xms10g"
DRUID_EXTRA_JVM_OPTS: "-XX:NewSize=512m -XX:MaxNewSize=512m -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"


# --- Druid Historical
profile::druid::historical::properties:
druid.processing.numThreads: 10
druid.processing.buffer.sizeBytes: 1073741824
# Set numMergeBuffers to use v2 groupBy engine
druid.processing.numMergeBuffers: 10
druid.server.http.numThreads: 20
druid.server.maxSize: 2748779069440 # 2.5 TB
druid.segmentCache.locations: '[{"path":"/var/lib/druid/segment-cache","maxSize"\:2748779069440}]'
druid.historical.cache.useCache: true
druid.historical.cache.populateCache: true
profile::druid::historical::env:
DRUID_HEAP_OPTS: "-Xmx12g -Xms12g"
DRUID_EXTRA_JVM_OPTS: "-XX:NewSize=6g -XX:MaxNewSize=6g -XX:MaxDirectMemorySize=32g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
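
The same direct-memory rule of thumb for the historicals:
1073741824 * (10 + 10 + 1) bytes = 21 GiB, within the 32g MaxDirectMemorySize
set above.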


# --- Druid MiddleManager
profile::druid::middlemanager::properties:
druid.worker.ip: "%{::fqdn}"
druid.worker.capacity: 12
druid.processing.numThreads: 3
druid.processing.buffer.sizeBytes: 536870912
druid.server.http.numThreads: 20
druid.indexer.runner.javaOpts: "-server -Xmx2g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dhadoop.mapreduce.job.user.classpath.first=true"
druid.indexer.task.defaultHadoopCoordinates: ["org.apache.hadoop:hadoop-client:cdh"]
profile::druid::middlemanager::env:
DRUID_HEAP_OPTS: "-Xmx64m -Xms64m"
DRUID_EXTRA_JVM_OPTS: "-XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"


# --- Druid Overlord

# Overlord will accept indexing jobs from Hadoop nodes in the ANALYTICS_NETWORKS
profile::druid::overlord::ferm_srange: '(($DRUID_PUBLIC_HOSTS $ANALYTICS_NETWORKS))'
profile::druid::overlord::properties:
druid.indexer.runner.type: remote
druid.indexer.storage.type: metadata
profile::druid::overlord::env:
DRUID_HEAP_OPTS: "-Xmx4g -Xms4g"
DRUID_EXTRA_JVM_OPTS: "-XX:NewSize=256m -XX:MaxNewSize=256m -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
10 changes: 5 additions & 5 deletions manifests/site.pp
@@ -757,7 +757,8 @@
interface::add_ip6_mapped { 'main': }
}

# Analytics Druid servers.
# Druid analytics-eqiad (non-public) servers.
# These power internal backends and queries.
# https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake#Druid
node /^druid100[123].eqiad.wmnet$/ {
role(druid::analytics::worker)
@@ -766,12 +767,11 @@
include ::standard
}

# 'Public' Druid servers.
# Druid public-eqiad servers.
# These power AQS and wikistats 2.0 and contain non-sensitive datasets.
# https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake#Druid
# These are currently spares, until they are ready to become
# a new Druid Cluster. WIP T176223
node /^druid100[456].eqiad.wmnet$/ {
role(spare::system)
role(druid::public::worker)

include ::base::firewall
include ::standard
16 changes: 9 additions & 7 deletions modules/network/manifests/constants.pp
@@ -160,19 +160,21 @@
'10.64.53.21', # analytics1002.eqiad.wmnet
'2620:0:861:108:f21f:afff:fee8:bc3f', # analytics1002.eqiad.wmnet
],
'druid_hosts' => [
'druid_analytics_hosts' => [
'10.64.5.101', # druid1001.eqiad.wmnet
'2620:0:861:104:1e98:ecff:fe29:e298', # druid1001.eqiad.wmnet
'10.64.36.102', # druid1002.eqiad.wmnet
'2620:0:861:106:1602:ecff:fe06:8bec', # druid1002.eqiad.wmnet
'10.64.53.103', # druid1003.eqiad.wmnet
'2620:0:861:108:1e98:ecff:fe29:e278', # druid1003.eqiad.wmnet
'10.64.5.24', # druid1004.eqiad.wmnet
'2620:0:861:104:1a66:daff:feac:87a1', # druid1004.eqiad.wmnet
'10.64.21.109', # druid1005.eqiad.wmnet
'2620:0:861:105:1a66:daff:feae:36fb', # druid1005.eqiad.wmnet
'10.64.53.32', # druid1006.eqiad.wmnet
'2620:0:861:108:1a66:daff:feac:75cd', # druid1006.eqiad.wmnet
],
'druid_public_hosts' => [
'10.64.0.35', # druid1004.eqiad.wmnet
'2620:0:861:101:1a66:daff:feac:87a1', # druid1004.eqiad.wmnet
'10.64.16.172', # druid1005.eqiad.wmnet
'2620:0:861:102:1a66:daff:feae:36fb', # druid1005.eqiad.wmnet
'10.64.48.171', # druid1006.eqiad.wmnet
'2620:0:861:107:1a66:daff:feac:75cd', # druid1006.eqiad.wmnet
],
'cache_misc' => [
'10.64.32.97', # cp1045.eqiad.wmnet
5 changes: 5 additions & 0 deletions modules/profile/manifests/druid/broker.pp
@@ -1,5 +1,10 @@
# Class: profile::druid::broker
#
# NOTE that most Druid service profiles default ferm_srange
# to profile::druid::ferm_srange, but broker
# defaults to profile::druid::broker::ferm_srange, to
# have finer control over how Druid accepts queries.
#
class profile::druid::broker(
$properties = hiera('profile::druid::broker::properties'),
$env = hiera('profile::druid::broker::env'),
7 changes: 6 additions & 1 deletion modules/profile/manifests/druid/overlord.pp
@@ -1,11 +1,16 @@
# Class: profile::druid::overlord
#
# NOTE that most Druid service profiles default ferm_srange
# to profile::druid::ferm_srange, but overlord
# defaults to profile::druid::overlord::ferm_srange, to
# have finer control over how Druid accepts indexing tasks.
#
class profile::druid::overlord(
$properties = hiera('profile::druid::overlord::properties'),
$env = hiera('profile::druid::overlord::env'),
$ferm_srange = hiera('profile::druid::overlord::ferm_srange'),
$monitoring_enabled = hiera('profile::druid::monitoring_enabled'),
$daemon_autoreload = hiera('profile::druid::daemons_autoreload'),
$ferm_srange = hiera('profile::druid::ferm_srange'),
) {

require ::profile::druid::common
modules/role/manifests/analytics_cluster/hadoop/ferm/namenode.pp
@@ -4,7 +4,7 @@
ferm::service{ 'hadoop-hdfs-namenode':
proto => 'tcp',
port => '8020',
srange => '$ANALYTICS_NETWORKS',
srange => '(($ANALYTICS_NETWORKS $DRUID_PUBLIC_HOSTS))',
}
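
The doubled parentheses in the new srange are ferm list syntax: the rule
matches sources in either address set. Assuming ferm::service interpolates
srange into saddr, the generated rule for the namenode port looks roughly like:

    proto tcp dport 8020 saddr ($ANALYTICS_NETWORKS $DRUID_PUBLIC_HOSTS) ACCEPT;

so the public Druid nodes can reach HDFS for deep storage alongside the
Analytics networks.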

ferm::service{ 'hadoop-hdfs-zkfc':
modules/role/manifests/analytics_cluster/hadoop/ferm/resourcemanager.pp
@@ -1,7 +1,6 @@
# == Class role::analytics_cluster::hadoop::ferm::resourcemanager
#
class role::analytics_cluster::hadoop::ferm::resourcemanager {

ferm::service{ 'hadoop-yarn-resourcemanager-scheduler':
proto => 'tcp',
port => '8030',
@@ -17,7 +16,7 @@
ferm::service{ 'hadoop-yarn-resourcemanager':
proto => 'tcp',
port => '8032',
srange => '$ANALYTICS_NETWORKS',
srange => '(($ANALYTICS_NETWORKS $DRUID_PUBLIC_HOSTS))',
}

ferm::service{ 'hadoop-yarn-resourcemanager-admin':
@@ -55,7 +54,5 @@
port => '9983',
srange => '$ANALYTICS_NETWORKS',
}


}

3 changes: 3 additions & 0 deletions modules/role/manifests/analytics_cluster/hadoop/master.pp
@@ -15,6 +15,9 @@
# Its deep storage directory will be /user/druid/deep-storage.
path => '/user/druid/deep-storage',
}
# The Druid public-eqiad cluster's deep storage
# directory will be /user/druid/deep-storage-public-eqiad
::druid::cdh::hadoop::deep_storage { 'public-eqiad': }

class { '::cdh::hadoop::master': }

Expand Down
2 changes: 1 addition & 1 deletion modules/role/manifests/analytics_cluster/hadoop/worker.pp
@@ -156,6 +156,6 @@
ferm::service{ 'hadoop-access':
proto => 'tcp',
port => '1024:65535',
srange => '$ANALYTICS_NETWORKS',
srange => '(($ANALYTICS_NETWORKS $DRUID_PUBLIC_HOSTS))',
}
}
7 changes: 7 additions & 0 deletions modules/role/manifests/druid/analytics/worker.pp
@@ -1,6 +1,13 @@
# Class: role::druid::analytics::worker
# Sets up the Druid analytics cluster for internal use.
# This cluster may contain data not suitable for
# use in public APIs.
#
class role::druid::analytics::worker {
system::role { 'druid::analytics::worker':
description => "Druid worker in the analytics-${::site} cluster",
}

include ::profile::druid::broker
include ::profile::druid::coordinator
include ::profile::druid::historical
20 changes: 20 additions & 0 deletions modules/role/manifests/druid/public/worker.pp
@@ -0,0 +1,20 @@
# Class: role::druid::public::worker
# Sets up the Druid public cluster for use with AQS and wikistats 2.0.
#
class role::druid::public::worker {
system::role { 'druid::public::worker':
description => "Druid worker in the public-${::site} cluster",
}

include ::profile::druid::broker
include ::profile::druid::coordinator
include ::profile::druid::historical
include ::profile::druid::middlemanager
include ::profile::druid::overlord

# Zookeeper is co-located on some public druid hosts, but not all.
if $::fqdn in $::profile::druid::common::zookeeper_hosts {
include profile::zookeeper::server
include profile::zookeeper::firewall
}
}
