<p>This runbook takes data from an S3 bucket and populates an AWS Redshift table with it.</p>
<p>The initial reason for this runbook was to load AWS Cost and Usage Reports (CUR) into Redshift. The CUR is delivered to an S3 bucket. To run queries against it, the data must be copied into a Redshift table.</p>
<p>We have written a series of blog posts on this:</p>
<p><a href="https://unskript.com/blog/keeping-your-cloud-costs-in-check-automated-aws-cost-charts-and-alerting/" target="_blank" rel="noopener">https://unskript.com/blog/keeping-your-cloud-costs-in-check-automated-aws-cost-charts-and-alerting/</a></p>
<p><a href="https://unskript.com/blog/cloud-costs-charting-daily-ec2-usage-and-cost/" target="_blank" rel="noopener">https://unskript.com/blog/cloud-costs-charting-daily-ec2-usage-and-cost/</a></p>
<p>&nbsp;</p>
<h2>Prerequisites</h2>
<p>&nbsp;</p>
<p>Here are the steps you need to complete before you can run this runbook:</p>
<ol>
<li>Create a Cost and Usage Report in AWS (here's a <a href="https://docs.aws.amazon.com/cur/latest/userguide/cur-create.html">step-by-step guide</a>).</li>
<li>Create an AWS secret in Secrets Manager that has access to your AWS Redshift cluster.</li>
<li>Once your CUR report has started populating, you'll need to create a table in Redshift. In your S3 bucket, there will be a folder for the year/month. Inside it will be a file whose name ends in RedshiftCommands.sql.
<ol>
<li>The first line (it's really long) creates the table. Run this in the Redshift query editor v2.</li>
<li>The second line is the query that loads data into the table. You'll need it for this runbook (in the Create SQL queries step, in the RebuildSql variable).</li>
</ol>
</li>
</ol>
<p>Every month, you'll need to create the new table in Redshift manually. (This is a TODO for anyone interested in contributing!)</p>
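<p>Because a new table is needed each month, both the table name and the CUR folder name can be derived from the current date. The following is a minimal sketch of that derivation, assuming the <code>awsbillingYYYYMM</code> table naming and the <code>YYYYMMDD-YYYYMMDD</code> folder naming used later in this runbook. <code>billing_period</code> is a hypothetical helper name, not part of the runbook.</p>

```python
import datetime
from typing import Tuple


def billing_period(today: datetime.date) -> Tuple[str, str]:
    """Return (table_name, date_range) for the month containing `today`.

    Hypothetical helper: the naming scheme is assumed from this runbook.
    """
    first_of_month = today.replace(day=1)
    # Day 1 plus 32 days always lands in the following month,
    # which also covers the December -> January rollover.
    first_of_next = (first_of_month + datetime.timedelta(days=32)).replace(day=1)
    table_name = "awsbilling" + first_of_month.strftime("%Y%m")
    date_range = (
        first_of_month.strftime("%Y%m%d") + "-" + first_of_next.strftime("%Y%m%d")
    )
    return table_name, date_range


print(billing_period(datetime.date(2023, 12, 15)))
# prints ('awsbilling202312', '20231201-20240101')
```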
<p>&nbsp;</p>
<h2>What this RunBook does</h2>
<ol>
<li>Gets the AWS secret ARN from Secrets Manager. Given the secret_name input, this action returns the ARN required to make Redshift queries.</li>
<li>Create SQL queries. There are 2 queries to be run:
<ol>
<li>Truncate Table - this deletes all existing data (but keeps the columns).</li>
<li>RebuildSql - this builds the query that updates the table with the latest data from S3. It requires the query from your RedshiftCommands.sql file. We just change the table name into a variable so that it can be used month after month.</li>
</ol>
</li>
<li>AWS Redshift Query - truncate. This applies the Truncate Table query to your Redshift table.</li>
<li>AWS Get Redshift Query Details - checks that the first query has completed before the second query is run.</li>
<li>AWS Redshift Query - rebuild SQL. This query repopulates the Redshift table, which may take a while. This runbook does not check whether the query has finished, so wait a few minutes before making additional calls on the table.</li>
</ol>
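<p>The runbook looks up the truncate query's details once and does not check the rebuild query at all. One way to wait for completion is to poll the Redshift Data API until the statement reaches a terminal status. The sketch below is not part of this runbook: <code>wait_for_statement</code> is a hypothetical helper, the polling interval and timeout are illustrative, and <code>client</code> is assumed to be a boto3 <code>redshift-data</code> client. A small stub client stands in for AWS so the example runs without credentials.</p>

```python
import time

# Statuses after which the Redshift Data API will not change the statement again.
TERMINAL_STATES = {"FINISHED", "FAILED", "ABORTED"}


def wait_for_statement(client, statement_id, poll_seconds=2.0, timeout_seconds=300.0):
    """Poll describe_statement until the statement finishes, fails, or is aborted.

    Hypothetical helper; returns the terminal status string.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        details = client.describe_statement(Id=statement_id)
        status = details["Status"]
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"statement {statement_id} still {status} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)


class StubClient:
    """Stand-in for a boto3 'redshift-data' client, for illustration only."""

    def __init__(self, statuses):
        self._statuses = list(statuses)

    def describe_statement(self, Id):
        # Return the next canned status on each call.
        return {"Id": Id, "Status": self._statuses.pop(0)}


stub = StubClient(["SUBMITTED", "STARTED", "FINISHED"])
print(wait_for_statement(stub, "example-id", poll_seconds=0))
# prints FINISHED
```

<p>With a real client, the same call could be made with the <code>truncateId</code> output before the rebuild query is submitted, and the returned status checked for <code>FINISHED</code>.</p>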

In [10]:
##
##  Copyright (c) 2023 unSkript, Inc
##  All rights reserved.
##

from __future__ import annotations

import pprint
from typing import Optional

from beartype import beartype
from pydantic import BaseModel, Field


@beartype
def aws_get_secrets_manager_secretARN_printer(output):
    if output is None:
        return
    pprint.pprint({"secret": output})


@beartype
def aws_get_secrets_manager_secretARN(handle, region: str, secret_name:str) -> str:


    # Create a Secrets Manager client

    client = handle.client(
        service_name='secretsmanager',
        region_name=region
    )


    get_secret_value_response = client.get_secret_value(
        SecretId=secret_name
        )

    # Only the secret's ARN is needed here; the secret value itself is not used.
    secretArn = get_secret_value_response['ARN']
    return secretArn




task = Task(Workflow())
task.configure(inputParamsJson='''{
    "region": "region",
    "secret_name": "\\"awsuser-doug-redshift\\""
    }''')
task.configure(outputName="secretArn")

task.configure(printOutput=True)
(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(aws_get_secrets_manager_secretARN, lego_printer=aws_get_secrets_manager_secretARN_printer, hdl=hdl, args=args)

In [11]:
import datetime

today = datetime.datetime.now()

# First day of the current month.
firstOfMonth = today.replace(day=1)
# Day 1 plus 32 days always falls in the next month, so this also
# handles the December -> January rollover.
firstOfNextMonth = (firstOfMonth + datetime.timedelta(days=32)).replace(day=1)

yearmonth = firstOfMonth.strftime('%Y%m')
yearmonthday = firstOfMonth.strftime('%Y%m%d')
nextMonthYMD = firstOfNextMonth.strftime('%Y%m%d')


tableName = 'awsbilling'+ yearmonth
dateRange = yearmonthday+'-'+nextMonthYMD
#print("dateRange", dateRange)

TruncateSQL = f"truncate table {tableName}"
print("TruncateSQL", TruncateSQL)
RebuildSql = f"copy {tableName} from 's3://unskript-billing-doug/all/unskript-billing-doug/{dateRange}/unskript-billing-doug-RedshiftManifest.json' credentials     'aws_iam_role=arn:aws:iam::100498623390:role/service-role/AmazonRedshift-CommandsAccessRole-20230103T181457' region 'us-west-2'    GZIP CSV IGNOREHEADER 1 TIMEFORMAT 'auto' manifest;"
print("RebuildSql", RebuildSql)



In [12]:
##
##  Copyright (c) 2021 unSkript, Inc
##  All rights reserved.
##


from __future__ import annotations
from pydantic import BaseModel, Field
from typing import List, Dict
from unskript.connectors.aws import aws_get_paginator
import pprint
from beartype import beartype


@beartype
def aws_create_redshift_query(handle, region: str,cluster:str, database:str, secretArn: str, query:str) -> str:

    # Create a Redshift Data API client for the given region.
    client = handle.client('redshift-data', region_name=region)
    # Submit the query; execute_statement is asynchronous and returns immediately.
    response = client.execute_statement(
        ClusterIdentifier=cluster,
        Database=database,
        SecretArn=secretArn,
        Sql=query
    )
    # The statement Id can be passed to describe_statement to check progress.
    resultId = response['Id']
    print(response)
    print("resultId",resultId)

    return resultId



def unskript_default_printer(output):
    if isinstance(output, (list, tuple)):
        for item in output:
            print(f'item: {item}')
    elif isinstance(output, dict):
        for item in output.items():
            print(f'item: {item}')
    else:
        print(f'Output for {task.name}')
        print(output)

task = Task(Workflow())
task.configure(inputParamsJson='''{
    "cluster": "cluster",
    "database": "database",
    "query": "TruncateSQL",
    "region": "region",
    "secretArn": "secretArn"
    }''')
task.configure(outputName="truncateId")
task.configure(printOutput=True)
(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(aws_create_redshift_query, lego_printer=unskript_default_printer, hdl=hdl, args=args)

In [13]:
from __future__ import annotations
##
##  Copyright (c) 2023 unSkript, Inc
##  All rights reserved.
##
from pydantic import BaseModel, Field
from typing import List, Dict
from unskript.connectors.aws import aws_get_paginator
import pprint
from beartype import beartype


from typing import Optional

@beartype
def aws_get_redshift_query_details(handle, region: str, queryId:str) -> Dict:

    client = handle.client('redshift-data', region_name=region)
    # describe_statement returns metadata for the statement, including its Status.
    response = client.describe_statement(Id=queryId)
    return response




def unskript_default_printer(output):
    if isinstance(output, (list, tuple)):
        for item in output:
            print(f'item: {item}')
    elif isinstance(output, dict):
        for item in output.items():
            print(f'item: {item}')
    else:
        print(f'Output for {task.name}')
        print(output)

task = Task(Workflow())
task.configure(inputParamsJson='''{
    "queryId": "truncateId",
    "region": "region"
    }''')

task.configure(printOutput=True)
(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(aws_get_redshift_query_details, lego_printer=unskript_default_printer, hdl=hdl, args=args)

In [14]:
##
##  Copyright (c) 2021 unSkript, Inc
##  All rights reserved.
##


from __future__ import annotations
from pydantic import BaseModel, Field
from typing import List, Dict
from unskript.connectors.aws import aws_get_paginator
import pprint
from beartype import beartype


@beartype
def aws_create_redshift_query(handle, region: str,cluster:str, database:str, secretArn: str, query:str) -> str:

    # Create a Redshift Data API client for the given region.
    client = handle.client('redshift-data', region_name=region)
    # Submit the query; execute_statement is asynchronous and returns immediately.
    response = client.execute_statement(
        ClusterIdentifier=cluster,
        Database=database,
        SecretArn=secretArn,
        Sql=query
    )
    # The statement Id can be passed to describe_statement to check progress.
    resultId = response['Id']
    print(response)
    print("resultId",resultId)

    return resultId



def unskript_default_printer(output):
    if isinstance(output, (list, tuple)):
        for item in output:
            print(f'item: {item}')
    elif isinstance(output, dict):
        for item in output.items():
            print(f'item: {item}')
    else:
        print(f'Output for {task.name}')
        print(output)

task = Task(Workflow())
task.configure(inputParamsJson='''{
    "cluster": "cluster",
    "database": "database",
    "query": "RebuildSql",
    "region": "region",
    "secretArn": "secretArn"
    }''')
task.configure(printOutput=True)
(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(aws_create_redshift_query, lego_printer=unskript_default_printer, hdl=hdl, args=args)