## Setting up SageMaker Studio

Here we enable the `autoreload` extension so that modules we edit are reloaded automatically before each cell executes, without restarting the kernel. We also add the code folder, which we will deploy later, to the Python path.

In [66]:
%load_ext autoreload
%autoreload 2

import sys
from pathlib import Path

SOURCE_FOLDER = Path("../src")
SAGEMAKER_FOLDER = Path("./")
CODE_FOLDER = Path("code")

CODE_FOLDER.mkdir(parents=True, exist_ok=True)

# SOURCE_FOLDER already starts with "../", so add both folders as they are.
sys.path.append(str(CODE_FOLDER))
sys.path.append(str(SOURCE_FOLDER))

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
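Relative entries on `sys.path` are resolved against the current working directory, so it can help to register absolute paths and to avoid adding the same entry again when the cell is re-run. A minimal sketch, assuming a hypothetical helper `add_to_sys_path` that is not part of this notebook:

```python
import sys
from pathlib import Path


def add_to_sys_path(folder):
    """Append the absolute form of `folder` to sys.path once and return it."""
    resolved = str(Path(folder).resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# Hypothetical usage mirroring the folders defined above.
code_path = add_to_sys_path("code")
source_path = add_to_sys_path("../src")
```

Calling the helper a second time with the same folder leaves `sys.path` unchanged.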


Next we define our environment constants: the SageMaker Studio domain ID and the user profile.
We may need to update `USER_PROFILE` once Adrian is done working.

In [67]:
DOMAIN_ID = "d-sz2auifr74ie"
USER_PROFILE = "adrianm2-d23"

## Lifecycle Configuration

Customize SageMaker Studio using Lifecycle configurations. These are shell scripts that will be triggered by lifecycle events, such as starting a new Studio notebook.

The following script upgrades the packages on a SageMaker Studio Kernel Application.

In [68]:
%%writefile packages.sh
#!/bin/bash
# This script upgrades the packages on a SageMaker
# Studio Kernel Application.

set -eux

pip install -q --upgrade pip
pip install -q --upgrade awscli boto3
pip install -q --upgrade scikit-learn==0.23.2
pip install -q --upgrade PyYAML==6.0
pip install -q --upgrade sagemaker

Overwriting packages.sh


## Permissions

Update the Execution Policy assigned to SageMaker's Execution Role and add the appropriate permissions.

In [69]:
import sagemaker

role = sagemaker.get_execution_role()
print(role)

arn:aws:iam::294404188698:role/service-role/AmazonSageMaker-ExecutionRole-20230824T181825
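To search for the role in the IAM console it helps to have just the role name, which is the last path segment of the ARN. A small sketch; `role_name_from_arn` is a hypothetical helper, and the ARN is the one printed above:

```python
def role_name_from_arn(role_arn):
    """Return the role name, i.e. the text after the last '/' in a role ARN."""
    # An ARN has six colon-separated fields; the last one is the resource.
    resource = role_arn.split(":", 5)[5]  # e.g. "role/service-role/<name>"
    if not resource.startswith("role/"):
        raise ValueError(f"Not an IAM role ARN: {role_arn}")
    return resource.rsplit("/", 1)[1]


role_arn = (
    "arn:aws:iam::294404188698:role/service-role/"
    "AmazonSageMaker-ExecutionRole-20230824T181825"
)
role_name = role_name_from_arn(role_arn)
print(role_name)
```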


In the AWS IAM console, find the role printed above and open the custom execution policy attached to it. Replace the policy's permissions with the following definition:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "IAM0",
            "Effect": "Allow",
            "Action": [
                "iam:CreateServiceLinkedRole"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "iam:AWSServiceName": [
                        "autoscaling.amazonaws.com",
                        "ec2scheduled.amazonaws.com",
                        "elasticloadbalancing.amazonaws.com",
                        "spot.amazonaws.com",
                        "spotfleet.amazonaws.com",
                        "transitgateway.amazonaws.com"
                    ]
                }
            }
        },
        {
            "Sid": "IAM1",
            "Effect": "Allow",
            "Action": [
                "iam:CreateRole",
                "iam:PassRole",
                "iam:AttachRolePolicy"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Lambda",
            "Effect": "Allow",
            "Action": [
                "lambda:CreateFunction",
                "lambda:DeleteFunction",
                "lambda:InvokeFunctionUrl",
                "lambda:InvokeFunction",
                "lambda:UpdateFunctionCode",
                "lambda:InvokeAsync"
            ],
            "Resource": "*"
        },
        {
            "Sid": "SageMaker",
            "Effect": "Allow",
            "Action": [
                "sagemaker:UpdateDomain",
                "sagemaker:UpdateUserProfile"
            ],
            "Resource": "*"
        },
        {
            "Sid": "CloudWatch",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData",
                "cloudwatch:GetMetricData",
                "cloudwatch:DescribeAlarmsForMetric",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:CreateLogGroup",
                "logs:DescribeLogStreams"
            ],
            "Resource": "*"
        },
        {
            "Sid": "ECR",
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage"
            ],
            "Resource": "*"
        },
        {
            "Sid": "S3",
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket",
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::*"
        }
    ]
}
```
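The `S3` statement above applies to every bucket in the account. If a narrower scope is preferred, bucket-level actions can target the bucket ARN and object-level actions the objects under it. A sketch under that assumption; `scoped_s3_statements` and the `S3Bucket`/`S3Objects` Sids are hypothetical, and the bucket name is the one this notebook defines further down:

```python
import json


def scoped_s3_statements(bucket):
    """Build IAM statements that limit the S3 permissions to one bucket."""
    return [
        {
            # Actions that operate on the bucket itself.
            "Sid": "S3Bucket",
            "Effect": "Allow",
            "Action": ["s3:CreateBucket", "s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": f"arn:aws:s3:::{bucket}",
        },
        {
            # Actions that operate on objects inside the bucket.
            "Sid": "S3Objects",
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        },
    ]


statements = scoped_s3_statements("default-soil-predictions")
print(json.dumps(statements, indent=4))
```

SageMaker may also read and write its own default bucket, so a real policy could need more than one bucket listed.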

We can now create a new lifecycle configuration that we can later select as the start-up script for our kernel. This will allow us to access AWS resources from our notebooks.

[SageMaker Lifecycle Best Practices](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-lifecycle-config-install.html)

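The next cell performs the following steps:

1. Set the parameters from the command-line arguments.
2. Base64-encode the script.
3. Delete the existing lifecycle configuration.
4. Create the new lifecycle configuration.
5. Extract its ARN from the response.
6. Update the user profile to use the new configuration.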

In [70]:
%%bash -s "$DOMAIN_ID" "$USER_PROFILE"

# 1. Read the parameters passed on the command line.
DOMAIN_ID="$1"
USER_PROFILE="$2"

# 2. Base64-encode the script on a single line.
LCC_CONTENT=$(openssl base64 -A -in packages.sh)

# 3. Delete the existing lifecycle configuration, if any.
aws sagemaker delete-studio-lifecycle-config \
    --studio-lifecycle-config-name packages

# 4. Create the new lifecycle configuration.
response=$(aws sagemaker create-studio-lifecycle-config \
    --studio-lifecycle-config-name packages \
    --studio-lifecycle-config-content "$LCC_CONTENT" \
    --studio-lifecycle-config-app-type KernelGateway)

# 5. Extract the ARN from the JSON response.
arn=$(echo "${response}" | python3 -c "import sys, json; print(json.load(sys.stdin)['StudioLifecycleConfigArn'])")
echo "${arn}"

# 6. Attach the configuration to the user profile.
aws sagemaker update-user-profile --domain-id "$DOMAIN_ID" \
    --user-profile-name "$USER_PROFILE" \
    --user-settings '{
        "KernelGatewayAppSettings": {
            "LifecycleConfigArns": ["'"$arn"'"]
        }
    }'


An error occurred (ResourceInUse) when calling the DeleteStudioLifecycleConfig operation: Unable to delete LCC [arn:aws:sagemaker:us-east-1:294404188698:studio-lifecycle-config/packages] because LCC is in use by one or more Apps

An error occurred (ResourceInUse) when calling the CreateStudioLifecycleConfig operation: The ID or Name specified is already in use.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.10/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/opt/conda/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/opt/conda/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/conda/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)

An error occurred (ValidationException) when calling the UpdateUserProfile operation: 1 validation error detected: Value '[]' at 'userSettings.kernelGatewayAppSettings.lifecycleConfigArns' failed to satisfy constraint: Member must satisfy constraint: [Member must have length less than or equal to 256, Member must have length greater than or equal to 0, Member must satisfy regular expression pattern: arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:studio-lifecycle-config/.*]


CalledProcessError: Command 'b'...'' returned non-zero exit status 255.
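The run above fails in a chain: the existing `packages` configuration is still in use by a running app, so the delete is rejected; the create then collides with the existing name and returns no JSON; the ARN is therefore empty, and `update-user-profile` rejects it. One way around this is to reuse the existing configuration when the name is taken. The following is a sketch, not the notebook's own method: `ensure_lifecycle_config` and `attach_lifecycle_config` are hypothetical helpers, and `client` is assumed to behave like `boto3.client("sagemaker")`.

```python
import base64


def ensure_lifecycle_config(client, name, script_text, app_type="KernelGateway"):
    """Create a Studio lifecycle configuration, or reuse the existing one.

    Returns the ARN of the configuration.
    """
    # Same result as `openssl base64 -A`: single-line base64 of the script.
    content = base64.b64encode(script_text.encode("utf-8")).decode("ascii")
    try:
        response = client.create_studio_lifecycle_config(
            StudioLifecycleConfigName=name,
            StudioLifecycleConfigContent=content,
            StudioLifecycleConfigAppType=app_type,
        )
    except client.exceptions.ResourceInUse:
        # The name is taken: look up the existing configuration instead.
        # Its script content is left unchanged in this case.
        response = client.describe_studio_lifecycle_config(
            StudioLifecycleConfigName=name
        )
    return response["StudioLifecycleConfigArn"]


def attach_lifecycle_config(client, domain_id, user_profile, arn):
    """Set `arn` as the KernelGateway lifecycle configuration of a profile."""
    client.update_user_profile(
        DomainId=domain_id,
        UserProfileName=user_profile,
        UserSettings={"KernelGatewayAppSettings": {"LifecycleConfigArns": [arn]}},
    )


# Usage, assuming boto3 is installed and credentials are configured:
# import boto3
# from pathlib import Path
# client = boto3.client("sagemaker")
# arn = ensure_lifecycle_config(client, "packages", Path("packages.sh").read_text())
# attach_lifecycle_config(client, DOMAIN_ID, USER_PROFILE, arn)
```

To replace the script content itself, the apps using the configuration would first have to be stopped so that the delete can succeed.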

## Constants for the Pipeline

* `BUCKET`: The name of the S3 bucket where we will organize every resource we are going to use during the program. Bucket names must be globally unique.
* `S3_LOCATION`: The S3 URI prefix under which we upload our data.
* `DATA_*_FILEPATH`: The local paths where we keep our initial datasets.
* `sagemaker_client`: We'll use a [boto3 SageMaker Client](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html) instance to access SageMaker.
* `iam_client`: We'll use a [boto3 IAM Client](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/iam.html) instance to access IAM.
* `role`: The execution role attached to this notebook. We can use this role with any of the SageMaker services that need it to ensure they run with the appropriate permissions.
* `region`: The current region attached to our session.
* `sagemaker_session`: The current SageMaker session.

In [71]:
%%writefile {CODE_FOLDER}/constants.py

import boto3
import sagemaker
from pathlib import Path

BUCKET = "default-soil-predictions"
S3_LOCATION = f"s3://{BUCKET}/soilpreds"

DATA_FOLDER = Path().resolve() / "data"

DATA_ACCUWEATHER_FILEPATH = DATA_FOLDER / "accuweather_hourly_1.29_to_6.15.csv"
# NOTE: the four constants below currently point to the same file.
DATA_WEATHERLINK_COMPARE_FILEPATH = DATA_FOLDER / "meteo_data for model 30.1.2023. - 31.7.2023..csv"
DATA_WEATHERLINK_MODEL_FILEPATH = DATA_FOLDER / "meteo_data for model 30.1.2023. - 31.7.2023..csv"
DATA_SENSOR1_FILEPATH = DATA_FOLDER / "meteo_data for model 30.1.2023. - 31.7.2023..csv"
DATA_SENSOR2_FILEPATH = DATA_FOLDER / "meteo_data for model 30.1.2023. - 31.7.2023..csv"


sagemaker_client = boto3.client("sagemaker")
iam_client = boto3.client("iam")
role = sagemaker.get_execution_role()
region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()

Writing code/constants.py


In [72]:
import pandas as pd

from constants import *
from sagemaker.s3 import S3Uploader

# Create the bucket that will hold our data.
!aws s3api create-bucket --bucket $BUCKET

# Upload the AccuWeather dataset to S3.
S3Uploader.upload(local_path=str(DATA_ACCUWEATHER_FILEPATH), desired_s3_uri=S3_LOCATION)

{
    "Location": "/default-soil-predictions"
}


's3://default-soil-predictions/soilpreds/accuweather_hourly_1.29_to_6.15.csv'
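The `create-bucket` call above needs no extra arguments, presumably because this session runs in `us-east-1` (the region that appears in the lifecycle configuration ARN earlier). In any other region, S3 expects a `LocationConstraint`. A sketch of how the equivalent boto3 arguments could be built; `create_bucket_kwargs` is a hypothetical helper:

```python
def create_bucket_kwargs(bucket, region):
    """Build keyword arguments for boto3's S3 `create_bucket` call."""
    kwargs = {"Bucket": bucket}
    # us-east-1 is the default location and must not be passed explicitly.
    if region and region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


print(create_bucket_kwargs("default-soil-predictions", "us-east-1"))
print(create_bucket_kwargs("default-soil-predictions", "eu-west-1"))

# Usage, assuming boto3 is installed and credentials are configured:
# import boto3
# s3 = boto3.client("s3", region_name=region)
# s3.create_bucket(**create_bucket_kwargs(BUCKET, region))
```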

In [73]:
S3Uploader.upload(local_path=str(DATA_WEATHERLINK_COMPARE_FILEPATH), desired_s3_uri=S3_LOCATION)

's3://default-soil-predictions/soilpreds/meteo_data for model 30.1.2023. - 31.7.2023..csv'

In [74]:
S3Uploader.upload(local_path=str(DATA_WEATHERLINK_MODEL_FILEPATH), desired_s3_uri=S3_LOCATION)

's3://default-soil-predictions/soilpreds/meteo_data for model 30.1.2023. - 31.7.2023..csv'

In [75]:
S3Uploader.upload(local_path=str(DATA_SENSOR1_FILEPATH), desired_s3_uri=S3_LOCATION)

's3://default-soil-predictions/soilpreds/meteo_data for model 30.1.2023. - 31.7.2023..csv'

In [76]:
S3Uploader.upload(local_path=str(DATA_SENSOR2_FILEPATH), desired_s3_uri=S3_LOCATION)

's3://default-soil-predictions/soilpreds/meteo_data for model 30.1.2023. - 31.7.2023..csv'
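Cells 73 to 76 upload the same file four times, because the four constants currently resolve to the same path. Until distinct files are assigned, the uploads could be collapsed into one loop that skips repeated paths. A sketch with a hypothetical `upload_unique` helper; it takes the upload function as a parameter so that it can be tried without AWS access:

```python
def upload_unique(paths, s3_location, upload):
    """Upload each distinct path once and return {local path: uploaded URI}."""
    uploaded = {}
    for path in paths:
        key = str(path)
        if key not in uploaded:
            uploaded[key] = upload(local_path=key, desired_s3_uri=s3_location)
    return uploaded


# Usage in this notebook, where S3Uploader comes from sagemaker.s3:
# upload_unique(
#     [
#         DATA_ACCUWEATHER_FILEPATH,
#         DATA_WEATHERLINK_COMPARE_FILEPATH,
#         DATA_WEATHERLINK_MODEL_FILEPATH,
#         DATA_SENSOR1_FILEPATH,
#         DATA_SENSOR2_FILEPATH,
#     ],
#     S3_LOCATION,
#     S3Uploader.upload,
# )
```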