In [4]:
!python -m pip install boto3 sagemaker



In [9]:
# S3 prefix
prefix = "sharma101"

# Define IAM role
import boto3
import re

import os
import numpy as np
import pandas as pd
from sagemaker import get_execution_role

# Hard-coded role ARN. When running inside SageMaker, get_execution_role() can be used instead.
role = "arn:aws:iam::377780585900:role/Sagemaker_build_role"

## Create the session

The session remembers our connection parameters to SageMaker. We'll use it to perform all of our SageMaker operations.

In [2]:
import sagemaker as sage
from time import gmtime, strftime

sess = sage.Session()

## Upload the data for training

When training large models with huge amounts of data, you'll typically use big data tools, like Amazon Athena, AWS Glue, or Amazon EMR, to create your data in S3. For the purposes of this example, we're using the classic [Iris dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set), which we have included.

We can use the tools provided by the SageMaker Python SDK to upload the data to a default bucket.

In [None]:
WORK_DIRECTORY = "datasets"

data_location = sess.upload_data(WORK_DIRECTORY, key_prefix=prefix)
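
As a rough illustration of what a directory upload produces, the sketch below lists the S3 keys that files under a local folder would map to when a key prefix is applied. It does not call AWS; `planned_s3_keys` is a hypothetical helper, the file names are made up for the demo, and the key layout (`<key_prefix>/<path relative to the directory>`) is an assumption about how `upload_data` behaves for directories, so check the SDK documentation before relying on it.

```python
import os
import tempfile


def planned_s3_keys(local_dir, key_prefix):
    """Return (local_path, s3_key) pairs for every file under local_dir.

    Hypothetical helper: assumes each file lands at
    <key_prefix>/<path relative to local_dir>, with "/" as separator.
    """
    pairs = []
    for dirpath, _, filenames in os.walk(local_dir):
        for name in filenames:
            local_path = os.path.join(dirpath, name)
            rel = os.path.relpath(local_path, start=local_dir)
            # S3 keys always use forward slashes, whatever the local OS uses.
            s3_key = "/".join([key_prefix] + rel.split(os.sep))
            pairs.append((local_path, s3_key))
    # Sort by key so the listing is deterministic.
    return sorted(pairs, key=lambda pair: pair[1])


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "images"))
    for rel in ("data.yaml", os.path.join("images", "a.jpg")):
        open(os.path.join(tmp, rel), "w").close()
    keys = [key for _, key in planned_s3_keys(tmp, "sharma101")]

print(keys)
```

Listing the planned keys before uploading can help confirm that the prefix and folder layout match what the training container expects to find.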

## Create an estimator and fit the model

In order to use SageMaker to fit our algorithm, we'll create an `Estimator` that defines how to use the container to train. This includes the configuration we need to invoke SageMaker training:

* The __container name__. This is constructed as in the shell commands above.
* The __role__, as defined above.
* The __instance count__, which is the number of machines to use for training.
* The __instance type__, which is the type of machine to use for training.
* The __output path__, which determines where the model artifact will be written.
* The __session__, which is the SageMaker session object that we defined above.

Then we call `fit()` on the estimator to train against the data that we uploaded above.
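
To make the pieces of this configuration concrete, here is a minimal sketch that assembles them as plain Python values without contacting AWS. The helper names (`ecr_image_uri`, `estimator_settings`) and the placeholder account, region, bucket, repository, and role values are all hypothetical; the dictionary keys follow the keyword names that version 2 of the SageMaker Python SDK is understood to use for `Estimator`, which is an assumption worth checking against the installed version.

```python
def ecr_image_uri(account, region, repository, tag="latest"):
    """Build the ECR image URI for a training container (hypothetical helper)."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def estimator_settings(account, region, bucket, repository, role):
    """Collect the settings described above into one dictionary.

    Hypothetical helper: the keys are assumed to match Estimator keyword
    arguments in SageMaker Python SDK v2.
    """
    return {
        "image_uri": ecr_image_uri(account, region, repository),
        "role": role,
        "instance_count": 1,               # number of training machines
        "instance_type": "ml.c4.2xlarge",  # type of training machine
        "output_path": f"s3://{bucket}/output",  # where the model artifact goes
    }


# Placeholder values for illustration only.
settings = estimator_settings(
    account="123456789012",
    region="us-east-1",
    bucket="example-bucket",
    repository="my-algorithm",
    role="arn:aws:iam::123456789012:role/ExampleRole",
)
print(settings["image_uri"])
print(settings["output_path"])
```

With a real session, a dictionary like this could be unpacked into the estimator constructor along with `sagemaker_session`, which keeps every setting named and visible in one place.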

In [21]:
data_location = "s3://sagemaker-us-east-1-377780585900/sharma101/newdentdata.zip"

In [27]:
# Look up the account ID and region to build the ECR image URI
account = sess.boto_session.client("sts").get_caller_identity()["Account"]
region = sess.boto_session.region_name
image = "{}.dkr.ecr.{}.amazonaws.com/damagedetection:latest".format(account, region)

tree = sage.estimator.Estimator(
    image_uri=image,
    role=role,
    instance_count=1,
    instance_type="ml.c4.2xlarge",
    output_path="s3://{}/output".format(sess.default_bucket()),
    sagemaker_session=sess,
)

# A plain S3 URI is passed to the container as the "training" channel
tree.fit(data_location)

INFO:sagemaker:Creating training-job with name: damageten-2024-06-05-17-26-42-524


2024-06-05 17:26:47 Starting - Starting the training job...
2024-06-05 17:27:03 Starting - Preparing the instances for training...
2024-06-05 17:27:37 Downloading - Downloading input data...
2024-06-05 17:28:22 Downloading - Downloading the training image.........
2024-06-05 17:29:42 Training - Training image download completed. Training in progress..
Extracted and deleted zip files.
Files in /opt/ml/input/data/training/: ['data.yaml', 'README.roboflow.txt', 'README.dataset.txt']
Starting the training.
cpu
engine/trainer: task=detect, mode=train, model=yolov8n.pt, data=/opt/ml/input/data/training/data.yaml, epochs=2, time=None, patience=100, batch=16, imgsz=614, save=True, save_period=-1, cache=False, device=cpu, workers=8, project=/opt/ml/model, name=train, exist_ok=True, pretrained=/opt/ml/model/train/weights/best.pt, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, clos