This repository has been archived by the owner on May 13, 2021. It is now read-only.

initial hackery
added info

move api key to defaults

Add real username

Change order

install latest datadog-agent release

These changes are required to install the latest 5.0 release of the agent that is not affected by POODLE.
I had to switch to `stable`. I'm not sure why the latest release is not on `unstable`.

Initial Redhat Support

fixed apt usage in redhat

add release-appropriate psutil

tested on centos 5.x

psutil is included with the 'omnibus' install of the agent

Add tags to the datadog config file

updated to include process checks

updated process checks to align correctly; updated readme with new process check feature

Parameterize the use_mount setting in datadog.conf

make datadog config fully parameterized

quote variables

Add template & rendering for datadog /etc/conf.d/*.yaml files

Add info about "datadog_checks" and "datadog_use_mount" to README.md

Add info about "datadog_config" to README.md

README fixes; Include null default for datadog_checks

default value for datadog_checks needs to be an empty dict

Only create /etc/dd-agent/conf.d/process.yaml when datadog_process_checks is defined
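A guard like this can be sketched with a `when` condition (the task name, template path, and handler are illustrative, not necessarily the role's exact task):

```yaml
- name: Create process check configuration
  template:
    src: process.yaml.j2
    dest: /etc/dd-agent/conf.d/process.yaml
  # Only render the file when the variable has actually been defined
  when: datadog_process_checks is defined
  notify: restart datadog-agent
```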

Adding enabled flag for Packer builds, preprod envs, etc

Deprecate process check separate handling

The common `datadog_checks` interface should be used instead

Add more supported OS versions to metadata

Factorize datadog.conf creation

[olivier.vielpeau@datadoghq.com] Also factorized the service task

Remove need to redefine default vars in playbook

`datadog_config` does not require `api_key`, `dd_url` and `use_mount` anymore,
the values of respectively `datadog_api_key`, `datadog_url` and `datadog_use_mount`
are used if defined.
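One way to sketch this fallback is to merge the standalone variables into `datadog_config` before templating, so that explicit `datadog_config` entries win (illustrative only: `combine` requires Ansible 2.x, and the role's actual template may instead apply Jinja2 `default` filters inline):

```yaml
# Hypothetical task: build the final config dict with fallbacks
- set_fact:
    _datadog_config: "{{ {'api_key': datadog_api_key,
                          'dd_url': datadog_url,
                          'use_mount': datadog_use_mount}
                         | combine(datadog_config) }}"
```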

Template YUM repo to support different archs

[olivier.vielpeau@datadoghq.com] Use `template` task instead of `copy`

Delete unused vars/

Prepare ansible galaxy release

Update README and metadata

Fix markdown formatting of author info in README

[readme] Mention that the role needs sudo rights

Related to #3

[apt] Add apt-transport-https to the debian install

In preparation for HTTPS repo.

Changed keyserver to use port 80

Changed the keyserver to use the supported port 80 instead of the
default 11371, for use on networks that block or firewall non-standard
ports. When specifying a port number, you need to prefix the URL with
`hkp://`.
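Such a task might look like the following sketch (the key id and keyserver hostname are placeholders, not the actual Datadog values):

```yaml
- name: Add Datadog apt key via a keyserver on port 80
  apt_key:
    id: "<DATADOG_APT_KEY_ID>"  # placeholder; use the real signing key id
    keyserver: "hkp://keyserver.ubuntu.com:80"  # hkp:// prefix required with a port
```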

[yum] Check signature of the agent's RPM package

For Datadog Agent 5.5.0, the RPM package is signed with a GPG key. Let's
check the validity of this key when installing the RPM package with
Ansible.

It also enables the use of HTTPS on our yum repository.
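The role itself templates a `.repo` file, but an equivalent definition using the `yum_repository` module (Ansible ≥ 2.1) might look like this sketch; treat the URLs as illustrative:

```yaml
- name: Configure the Datadog yum repository with GPG checking
  yum_repository:
    name: datadog
    description: Datadog, Inc.
    baseurl: "https://yum.datadoghq.com/rpm/{{ ansible_architecture }}/"
    gpgcheck: yes  # verify the RPM signature at install time
    gpgkey: "https://yum.datadoghq.com/DATADOG_RPM_KEY.public"
```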

Adds explicit file permissions

This came about due to issues in the environment in which I work.
We are bound by strict CIS rules, which caused the files dropped by
Ansible to have a mode of 0600 and be owned by user root, preventing the
`dd-agent` user from reading its own config files.

This PR simply makes that user own the files so that it can read them,
allowing the daemon to start.

Note that this PR assumes the user is dd-agent. While this is true for
EL, I'm unaware of what the user is for Debian packages.
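With the `datadog_user`/`datadog_group` defaults this role defines, the fix can be sketched as explicit ownership on the template task (the mode and paths are illustrative):

```yaml
- name: Install datadog.conf with explicit ownership
  template:
    src: datadog.conf.j2
    dest: /etc/dd-agent/datadog.conf
    owner: "{{ datadog_user }}"   # defaults to dd-agent
    group: "{{ datadog_group }}"  # left as root for Debian compatibility
    mode: 0644
```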

Sets default user to root

* Uses an override for the Red Hat package

Sets default user to dd-agent

* Per recommendation, set the default user to dd-agent
* I left the group as root, as changing it would fail on the Debian package
* Removes the override, allowing end users to customize as needed. I simply overthought the solution anyway.

Add a link to the role's Ansible Galaxy page in the README

Add Galaxy install to README

Closes #17

Avoids bare variables, which are deprecated

[readme] Use `become: yes` instead of `sudo: yes`

`become: yes` is preferred since Ansible 1.9

[meta] Set min Ansible version to 1.6

Since we're using `keyserver` on `apt_key`

[meta] Add Xenial to supported platforms

Add Changelog, and prepare initial release

Allow apt repo settings to be set by playbook user

To enable deploys using a local apt mirror.

New variables:

 * datadog_apt_repo
 * datadog_apt_key_url
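In a playbook these might be used like the following sketch (the mirror hostname and key filename are made up):

```yaml
vars:
  # Point the role at a local apt mirror instead of apt.datadoghq.com
  datadog_apt_repo: "deb http://mirror.example.com/datadog/ stable main"
  datadog_apt_key_url: "http://mirror.example.com/DATADOG_APT_KEY.public"
```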

[changelog] Prepare 1.1.0 release

Adding datadog tracing to ansible-datadog

Updating config files

Adding ability to add check agents to main.yml

Adding datadog config checks to datadog role

Adding hadoop and mysql steps to datadog

Fixing snake-bit issue

updating repo to default to disabled

Enable repo for datadog trace agent

Adding monitoring acceptance tests

Adding mysql

Removing datadog tracing since it's bundled with latest dd-agent

Adding NTP as a default datadog check

Revert "Adding NTP as a default datadog check"

Make sure dd-trace-agent is uninstalled before installing datadog agent

Fixing - making yum instead of service

adding ssl cert expiry check files

bakins authored and Tracey McEvoy committed Mar 29, 2017
0 parents commit da1274b
Showing 18 changed files with 901 additions and 0 deletions.
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,14 @@
CHANGELOG
=========

# 1.1.0 / 2016-06-27

* [FEATURE] Allow APT repo settings to be user-defined. See [#20][] (thanks [@geoffwright][])

# 1.0.0 / 2016-06-08

Initial release, compatible with Ansible v1 & v2

[#20]: https://github.com/DataDog/ansible-datadog/issues/20

[@geoffwright]: https://github.com/geoffwright
90 changes: 90 additions & 0 deletions README.md
@@ -0,0 +1,90 @@
Ansible Datadog Role
========
[![Ansible Galaxy](http://img.shields.io/badge/galaxy-Datadog.datadog-660198.svg)](https://galaxy.ansible.com/Datadog/datadog/)

Install and configure Datadog base agent & checks.

Installation
------------

```
ansible-galaxy install Datadog.datadog
```

Role Variables
--------------

- `datadog_api_key` - Your Datadog API key.
- `datadog_checks` - YAML configuration for agent checks to drop into `/etc/dd-agent/conf.d`.
- `datadog_config` - Settings to place in `/etc/dd-agent/datadog.conf`.
- `datadog_process_checks` - Array of process checks and options (DEPRECATED: use `process` under
`datadog_checks` instead)
- `datadog_apt_repo` - Override default Datadog `apt` repository
- `datadog_apt_key_url` - Override default url to Datadog `apt` key

Dependencies
------------
None

Example Playbooks
-------------------------
```
- hosts: servers
  roles:
    - { role: Datadog.datadog, become: yes }  # On Ansible < 1.9, use `sudo: yes` instead of `become: yes`
  vars:
    datadog_api_key: "123456"
    datadog_config:
      tags: "mytag0, mytag1"
      log_level: INFO
    datadog_checks:
      process:
        init_config:
        instances:
          - name: ssh
            search_string: ['ssh', 'sshd']
          - name: syslog
            search_string: ['rsyslog']
            cpu_check_interval: 0.2
            exact_match: true
            ignore_denied_access: true
      ssh_check:
        init_config:
        instances:
          - host: localhost
            port: 22
            username: root
            password: changeme
            sftp_check: True
            private_key_file:
            add_missing_keys: True
      nginx:
        init_config:
        instances:
          - nginx_status_url: http://example.com/nginx_status/
            tags:
              - instance:foo
          - nginx_status_url: http://example2.com:1234/nginx_status/
            tags:
              - instance:bar
```

```
- hosts: servers
  roles:
    - { role: Datadog.datadog, become: yes, datadog_api_key: "mykey" }  # On Ansible < 1.9, use `sudo: yes` instead of `become: yes`
```

License
-------

Apache2

Author Information
------------------

brian@akins.org

dustinjamesbrown@gmail.com --Forked from brian@akins.org

Datadog <info@datadoghq.com> --Forked from dustinjamesbrown@gmail.com
33 changes: 33 additions & 0 deletions defaults/main.yml
@@ -0,0 +1,33 @@
---
datadog_enabled: yes
datadog_api_key: "youshouldsetthis"

# Comma separated list of tags
datadog_tags: ""

datadog_url: "https://app.datadoghq.com"
datadog_use_mount: "no"

# default datadog.conf options
datadog_config: {}

# default checks enabled
datadog_checks: {}

# default check agents enabled
datadog_check_agents: {}

# default user/group
datadog_user: dd-agent
datadog_group: root

# default apt repo
datadog_apt_repo: "deb http://apt.datadoghq.com/ stable main"

datadog_mysql_host: localhost
datadog_mysql_user: datadog
datadog_mysql_password: ThisNeedsToBeChangedViaVault
datadog_mysql_replication_enabled: True
datadog_mysql_extra_status_enabled: True
datadog_mysql_extra_innodb_enabled: True
datadog_mysql_extra_performance_enabled: True
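The MySQL variables above are presumably consumed by a check template; a hypothetical `conf.d/mysql.yaml` rendered from them might look like the following (the option names follow the Datadog MySQL check, but treat the whole mapping as an assumption, not the role's actual template):

```yaml
init_config:

instances:
  - server: "{{ datadog_mysql_host }}"
    user: "{{ datadog_mysql_user }}"
    pass: "{{ datadog_mysql_password }}"
    options:
      replication: "{{ datadog_mysql_replication_enabled }}"
      extra_status_metrics: "{{ datadog_mysql_extra_status_enabled }}"
      extra_innodb_metrics: "{{ datadog_mysql_extra_innodb_enabled }}"
      extra_performance_metrics: "{{ datadog_mysql_extra_performance_enabled }}"
```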
217 changes: 217 additions & 0 deletions files/nutcracker.py
@@ -0,0 +1,217 @@
"""
To test this, run 'sudo -u dd-agent dd-agent check nutcracker'
When ready:
- place this file in /etc/dd-agent/checks.d/nutcracker.py
- put the config file in /etc/dd-agent/conf.d/nutcracker.yaml
- service datadog-agent restart
"""

import hashlib
import json
import md5
import memcache
import os
import socket
import sys
import time
import uuid

from checks import AgentCheck

class NutcrackerCheck(AgentCheck):
SOURCE_TYPE_NAME = 'nutcracker'
SERVICE_CHECK = 'nutcracker.can_connect'

DEFAULT_HOST = '127.0.0.1'
DEFAULT_PORT = 11211
DEFAULT_STATS_PORT = 22222

# Pool stats. These descriptions are from 'nutcracker --describe-stats'
POOL_STATS = [
['curr_connections', 'gauge', None], # Number of current connections
['total_connections', 'rate', None], # Running total connections made
['server_ejects', 'rate', None], # times a backend server was ejected
['client_err', 'rate', None], # errors on client connections
]

# Server stats. These descriptions are from 'nutcracker --describe-stats'
SERVER_STATS = [
['server_eof', 'rate', None], # eof on server connections
['server_err', 'rate', None], # errors on server connections
['server_timedout', 'rate', 'timedout'], # timeouts on server connections
['server_connections', 'gauge', 'connections'], # active server connections
['requests', 'rate', None], # requests
['request_bytes', 'rate', None], # total request bytes
['responses', 'rate', None], # responses
['response_bytes', 'rate', None], # total response bytes
['in_queue', 'gauge', None], # requests in incoming queue
['in_queue_bytes', 'gauge', None], # current request bytes in incoming queue
['out_queue', 'gauge', None], # requests in outgoing queue
['out_queue_bytes', 'gauge', None], # current request bytes in outgoing queue
]

def _get_raw_stats(self, host, stats_port):
# Connect
self.log.debug("Connecting to %s:%s", host, stats_port)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host, stats_port))

# Read
file = s.makefile('r')
data = file.readline();
s.close()

# Load
return json.loads(data);

def _send_datadog_stat(self, item, data, tag_map, prefix):
# Break out the info
stat_key, stat_type, override_name = item

# Make sure we have a name
if not override_name:
override_name = stat_key

# Add the prefix if appropriate.
if prefix:
override_name = prefix + "_" + override_name

try:
# Get the data, make sure it's there.
stat_data = float(data.get(stat_key))
except:
# Hrm, not there. Let it be zero.
stat_data = 0

# Make the datadog metric.
metric = self.normalize(override_name.lower(), self.SOURCE_TYPE_NAME)

tags = [k+":"+v for k,v in tag_map.iteritems()]

if stat_type == 'gauge':
self.gauge(metric, stat_data, tags=tags)
return

if stat_type == 'rate':
metric += "_rate"
self.rate(metric, stat_data, tags=tags)
return

if stat_type == 'bool':
self.gauge(metric, (1 if stat_data else 0), tags=tags)
return

raise Exception("Unknown datadog stat type '%s' for key '%s'" % (stat_type, stat_key))

def _get_metrics(self, host, port, stats_port, tags, aggregation_key):
try:
raw_stats = self._get_raw_stats(host, stats_port)
except Exception as e:
self.service_check(self.SERVICE_CHECK, AgentCheck.CRITICAL)
self.event({
'timestamp': int(time.time()),
'event_type': 'get_stats',
'msg_title': 'Cannot get stats',
'msg_text': str(e),
'aggregation_key': aggregation_key
})

raise


# pprint.pprint(raw_stats)

# Get all the pool stats
for pool_key, pool_data in raw_stats.iteritems():
try:
# Pools are not separated from the other keys, blarg.
# Just check if it's a dict with one of the pool keys, if not then skip it.
pool_data['client_connections']
except:
# Not there, it's not a pool.
self.log.debug(pool_key + ": NOT A POOL");
continue

# Start the stat tags.
tags['nutcracker_pool'] = pool_key

# It's a pool. Process all the non-server stats
for item in self.POOL_STATS:
self._send_datadog_stat(item, pool_data, tags, "pool")

# Find all the servers.
for server_key, server_data in pool_data.iteritems():
try:
# Servers are not separated from the other keys, blarg.
# Just check if it's a dict with one of the server keys, if not then skip it.
server_data['in_queue_bytes']
except:
# Not there, it's not a server.
self.log.debug(server_key + ": NOT A SERVER");
continue

# Set the server in the tags.
tags['nutcracker_pool_server'] = server_key

# It's a server. Send stats.
for item in self.SERVER_STATS:
self._send_datadog_stat(item, server_data, tags, "server")

# The key for our roundtrip tests.
key = uuid.uuid4().hex

try:
# Make the connection and do a round trip.
mc = memcache.Client([host+':'+str(port)], debug=0)

mc.set(key, key)
data = mc.get(key)
mc.delete(key)
empty_data = mc.get(key)

# Did the get work?
if data != key:
raise Exception("Cannot set and get")

# Did the delete work?
if empty_data:
raise Exception("Cannot delete")

except Exception as e:
# Something failed.
metric = self.normalize("test_connect_fail", self.SOURCE_TYPE_NAME)
self.gauge(metric, 1, tags=tags)

self.service_check(self.SERVICE_CHECK, AgentCheck.CRITICAL)
self.event({
'timestamp': int(time.time()),
'event_type': 'test_data',
'msg_title': 'Cannot get/set/delete',
'msg_text': str(e),
'aggregation_key': aggregation_key
})

raise

# Connection is ok.
self.service_check(self.SERVICE_CHECK, AgentCheck.OK)

# Called by datadog as the starting point for this check.
def check(self, instance):
host = instance.get('host', self.DEFAULT_HOST)
port = int(instance.get('port', self.DEFAULT_PORT))
stats_port = int(instance.get('stats_port', self.DEFAULT_STATS_PORT))

tags = {}
for item in instance.get('tags', []):
k, v = item.split(":", 1)
tags[k] = v

tags["host"] = host + ":" + str(port)

aggregation_key = hashlib.md5(host+":"+str(port)).hexdigest()

self._get_metrics(host, port, stats_port, tags, aggregation_key)
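The docstring above mentions a config file in `/etc/dd-agent/conf.d/`; a minimal `nutcracker.yaml` matching the keys this check reads (`host`, `port`, `stats_port`, `tags`) might look like the following sketch, with hypothetical values:

```yaml
init_config:

instances:
  - host: 127.0.0.1
    port: 11211
    stats_port: 22222
    tags:
      - env:dev  # illustrative tag, in the key:value form the check splits on
```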
