Skip to content
This repository has been archived by the owner on Nov 10, 2023. It is now read-only.
Russell Christopher edited this page Nov 6, 2016 · 2 revisions

In order to run redshift-monitor, you must create a repository to store Redshift metrics in, then edit, upload and configure an AWS Lambda function.

Setup Database

Create a PostgreSQL RDS instance in Redshift. Choose whatever instance size you’d like: it won’t be very active in terms of WRITE activity, and you’ll likely create a Tableau Data Extract against this dataset from Tableau Desktop anyway…so you can probably go “small”

Connect to the database with psql or tool like pgadmin3 / pgadmin4. Create a new database named redshift-data:

Connect to the redshift-database, and execute the create-table.sql script:

Note that this script assigns ownership of the sequence and table to a user who will not exist in your Postgres instance. Before running the script, plug in a different user for the GRANTs.

Modify connection properties for your databases

Open redshift-monitor.zip and extract redshiftmonitor.js to your desktop.

Open redshiftmonitor.js with a text and/or code editor and modify the following properties for both the Redshift and PostgreSQL connections:

  • user: A valid username with permissions to connect to the server and database
  • database: The database to connect to. For Redshift, this will be the database which contains tables you wish to check for skew / distribution values. You must include a database name for this value, even if you don’t care about looking at database-specific characteristics. Keep the database property ‘redshift-data’ for pgConfig.
  • password: The password for “user”. Yeah, in plain text. Sorry. We’ll use AWS’s KMS to encrypt this value in a future build. But not yet.
  • port: The port your instance listens on
  • host: DNS hostname of your instance

When you are through, update the zip file with your modified *.js

Optionally, update either the showDebug or hourToCollectLAstSevenDaysMetrics properties:

showDebug

By default, redshift-monitor delivers basic information in the CloudWatch logs:

Setting showDebug to true will toggle to a very verbose log which can be useful for troubleshooting purposes.

hourToCollectLastSevenDaysMetrics

Some of the queries in “our collection” don’t need and/or shouldn’t be fired every hour. Use hourstoCollectLastSevenDaysMetrics to set the hour (in GMT) which they will. Keep in mind that Lambda uses GMT vs. whatever time zone you happen to live in. If this function is executed at the console in Singapore (GMT + 8) with a hourstoCollectLastSevenDaysMetrics value of 12, the “7 day” queries will fire once during the hour of 12p (4a GMT). In Lambda, the queries will fire during the hour of 4a, local Singapore time.

Don't forget to replace the redshiftmonitor.js in the zip file with the one you just modified! Later, you'll be uploading this *.zip to AWS.

Create a Lambda Function and Cloud Watch Trigger to Execute it

First, create an IAM role which will allow your function to execute:

  • Login to the AWS console and go to IAM service.
  • In the left-hand navigation list, click Roles.
  • Click Create New Role.
  • Create a role with a Role Name of myLambdaRedshiftRole. Click Next Step.
  • Click the Select button for AWS Lambda in the AWS Services Roles list.
  • Scroll and/or Filter down to the AWSLambdaVPCAccessExecutionRole. Select it.
  • Click Next Step, and then Create Role.

Create the function

  • Login to the AWS console. Select the Lambda Service.
  • If you’ve never created a Lambda function before, you’ll be presented with a welcome screen. Click Get Started Now.
  • Click Create a Lambda Function.
  • Underneath the list of “blueprints”, click Skip.
  • On the Configure Triggers page, click inside the “dotted square”, then select CloudWatch Events – Schedule.
  • You’ll be presented with an empty rule. Fill the following properties in as follows:

Rule name: Hourly

Rule Description: Something appropriate, like “Execute Every Hour”

Schedule Expression: rate (1 hour)

Enable Trigger: checked

  • Click Next.
  • In the Configure Function screen, fill in the following properties:

Name: myRedshiftFunction (or some other pithy descriptor)

Description (Optional): Knock yourself out. Describe something

Runtime: Node.js 4.3

Code entry type: Upload a Zip file

  • Click the Upload button associated with Function Package and select the redshiftmonitor.zip file you recently modified.
  • Enter (exactly) redshiftmonitor.myHandler in the Handler text box.
  • In the Role property, select Choose an existing role.
  • Under Existing role, choose the role you just created (myLamdaRedshiftRole).

  • Under Advanced Settings, change the Timeout value. There is no “right” value as the correct one depends on how quickly your Redshift cluster can answer the questions the Lambda function will ask. At 1 min, 30 sec (a very long timeout period for Lambda), I still saw a timeout every few days while driving 95%+ CPU load on my test cluster. You will need to experiment. If your function times out, you will "lose" data for the period in question as we do not retry. If you know how to leverage WLM user & query groups, consider using a Redshift login which causes the queries to be executed on the super-user queue, so less waiting would be required.

  • Click Next, and then Create Function.

  • If you wish, click the Test button. The function currently takes no input parameters, so a template like this will work fine:

  • Click Save and Test. You will likely see something like the following after a few moments:

Note: For reasons unknown, it’s not completely unusual for there to be a connection error against PostgreSQL the first time you test. Wait a minute or so, then just try testing again. Should work unless you're dealing with a "real" problem.

Where is the sample Tableau workbook?

It's here.

Where are my logs?

Over here:

  • Login to the AWS console. Select the** Lambda Service**.
  • In the Function List, click on the name of your Function.
  • Click the Monitoring tab.
  • On the right, click View logs in CloudWatch.
  • You will see a history of log streams, one for each Lambda Function execution. Open one:

  • Choosing the Text radio button will allow you to semi-easily copy/paste text…

Other Stuff:

Note that the Lambda-ready function is in a file called redshiftmonitor.js. A second file named redshift-monitor.js (note the “-“) and associated package.json file can be used on your desktop if you like to play with Node. Do a standard NPM install to pull down the packages necessary for the code to run on your desktop.