# Assignment: setup
This series of notebooks takes you through assignments that follow the ML development process using Amazon SageMaker MLOps building blocks.

The assignments are based on the provided notebooks, and you can use the code in those notebooks to complete the exercises.

Refer to the notebook [`00-start-here.ipynb`](../00-start-here.ipynb) for code snippets and general guidance for the exercises in this assignment.

## Import packages

In [2]:
import time
import os
import json
import boto3
import numpy as np  
import pandas as pd 
import sagemaker

sagemaker.__version__

'2.165.0'

## Exercise 1: AWS and SageMaker environment
- Instantiate a [SageMaker session](https://sagemaker.readthedocs.io/en/stable/api/utility/session.html)
- Get the name of the default bucket to use in relevant Amazon SageMaker interactions
- Get the SageMaker execution role
- Get the AWS region
- Instantiate a boto3 [SageMaker client](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html)


In [3]:
# Exercise 1 - write code here

# Get the variables you need to interact with the SageMaker service
boto_session = boto3.Session()
region = boto_session.region_name
# Create the SageMaker session once and reuse it for the default bucket lookup
sm_session = sagemaker.Session()
bucket_name = sm_session.default_bucket()
bucket_prefix = "from-idea-to-prod/xgboost"
sm_client = boto_session.client("sagemaker")
sm_role = sagemaker.get_execution_role()

initialized = True

print(sm_role)

arn:aws:iam::531485126105:role/service-role/AmazonSageMaker-ExecutionRole-20230614T171444


In [5]:
%store bucket_name
%store bucket_prefix
%store sm_role
%store region
%store initialized

Stored 'bucket_name' (str)
Stored 'bucket_prefix' (str)
Stored 'sm_role' (str)
Stored 'region' (str)
Stored 'initialized' (bool)


## Exercise 2: Studio environment
- Explore the notebook metadata file `/opt/ml/metadata/resource-metadata.json`
- Get the SageMaker `domain_id`
- Get the Studio user profile name
- Get the notebook image name

In [6]:
# Exercise 2 - write code here
NOTEBOOK_METADATA_FILE = "/opt/ml/metadata/resource-metadata.json"
domain_id = None

# Read the domain id only if the metadata file is present
if os.path.exists(NOTEBOOK_METADATA_FILE):
    with open(NOTEBOOK_METADATA_FILE, "rb") as f:
        domain_id = json.load(f).get("DomainId")
        print(f"SageMaker domain id: {domain_id}")

%store domain_id

SageMaker domain id: d-1qvmpqvqiuve
Stored 'domain_id' (str)
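
The exercise also asks for the user profile name and the notebook image name, which the cell above does not read. A minimal sketch of how the same metadata file might supply them follows. `DomainId` is the key used above; `UserProfileName` and `ResourceArn` are assumed key names that may differ between Studio versions, and the helper name `read_studio_metadata` is hypothetical.

```python
import json
import os


def read_studio_metadata(path="/opt/ml/metadata/resource-metadata.json"):
    """Return selected fields from the metadata file; values stay None if unavailable."""
    fields = {"domain_id": None, "user_profile_name": None, "image_name": None}
    if not os.path.exists(path):
        return fields

    with open(path, "r") as f:
        metadata = json.load(f)

    fields["domain_id"] = metadata.get("DomainId")
    # Assumption: the user profile name is stored under "UserProfileName"
    fields["user_profile_name"] = metadata.get("UserProfileName")
    # Assumption: "ResourceArn" holds the image ARN in the form "...:image/<image-name>"
    resource_arn = metadata.get("ResourceArn") or ""
    _, sep, image_name = resource_arn.partition("/")
    fields["image_name"] = image_name if sep else None
    return fields


print(read_studio_metadata())
```

Outside a Studio environment the file is absent, so every value is `None`. Inspect the raw JSON first to confirm which keys your environment actually provides.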


## Exercise 3: Data
- Download a dataset. You can use your own dataset, either loaded from local storage or downloaded from the internet
- Load data into a Pandas dataframe and view the data

In [13]:
!wget -P data/ -N https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip

--2023-06-18 07:21:16--  https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified
Saving to: ‘data/bank-additional.zip’

bank-additional.zip     [  <=>               ] 434.15K  1.30MB/s    in 0.3s    

Last-modified header missing -- time-stamps turned off.
2023-06-18 07:21:17 (1.30 MB/s) - ‘data/bank-additional.zip’ saved [444572]



In [15]:
# Exercise 3 - write code here
# df_data = pd.read_csv()
    
import zipfile

# Extract the downloaded archive into the data/ directory
with zipfile.ZipFile("data/bank-additional.zip", "r") as z:
    print("Unzipping bank-additional...")
    z.extractall("data")

print("Done")

Unzipping bank-additional...
Done
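
The cell above only extracts the archive. The second task, loading the data into a Pandas dataframe and viewing it, could look like the sketch below. It assumes the archive unpacks to `data/bank-additional/bank-additional-full.csv` and that the file is semicolon-delimited; check both against the extracted files. The helper name `load_dataset` is hypothetical.

```python
import os

import pandas as pd


def load_dataset(csv_path, sep=";"):
    """Load a delimited text file into a pandas DataFrame."""
    return pd.read_csv(csv_path, sep=sep)


# Assumed location of the extracted file; adjust it for your own dataset
DATA_FILE = "data/bank-additional/bank-additional-full.csv"

if os.path.exists(DATA_FILE):
    df_data = load_dataset(DATA_FILE)
    print(df_data.shape)   # number of rows and columns
    print(df_data.head())  # first five rows
```

If you use your own dataset, pass the delimiter it actually uses through the `sep` argument.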


## Continue with assignment 1
Navigate to the [assignment 1](01-assignment-local-development.ipynb) notebook.