# Build a custom inference container for LightGBM and understand in depth how SageMaker inference works

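## Agenda
* Build a pattern 3 custom container
    * Already covered in the training notebook
* Build a pattern 2 custom container: supply the inference code files from outside the container
    * Build a container with the Inference Toolkit and MMS installed
* Run LightGBM inference on the XGBoost built-in container
    * Prepare requirements.txt
    * Write inference.py following the expected conventions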

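# Lab: Understand how SageMaker works through a LightGBM custom container

Build a custom container with LightGBM installed and train a model with a SageMaker Training job.
Observe how the custom container behaves to deepen your understanding of how SageMaker works.

The notebook takes about 20 minutes to run.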

# 0. Check the execution environment
This notebook has been tested on a SageMaker notebook instance.
* Instance type: ml.t3.medium
* Kernel: conda_python3

## 0-1. Check the Python version

In [248]:
# Python version information
import sys
sys.version # 3.8.12

'3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51) \n[GCC 9.4.0]'

In [2]:
# Check the Python version (using a system command)
!python -V # 3.8.12

Python 3.8.12


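## 0-2. Check the SageMaker SDK version

The Amazon SageMaker Python SDK is an open-source library for training and deploying machine learning models on Amazon SageMaker.

With the SDK you can train and deploy models using popular deep learning frameworks, algorithms provided by Amazon, or your own algorithms built into SageMaker-compatible Docker images.

* Documentation: https://sagemaker.readthedocs.io/en/stable/
* GitHub: https://github.com/aws/sagemaker-python-sdk

The SageMaker SDK uses a default S3 bucket with the name below. The bucket is created the first time it is needed (for example, when sagemaker.Session().default_bucket() is called), not by the import alone.  
sagemaker-＜region＞-＜account-id＞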

In [3]:
# Check the SageMaker SDK version
import sagemaker
print('Current SageMaker Python SDK Version ={0}'.format(sagemaker.__version__)) # 2.101.1

Current SageMaker Python SDK Version =2.101.1


# 1. Prepare the data

Prepare the data used for training and inference.

We use the Boston housing prices dataset bundled with scikit-learn. (Note: the dataset is removed as of scikit-learn 1.2.)  
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html

The code is based on the following script.

https://github.com/aws-samples/amazon-sagemaker-local-mode/blob/main/lightgbm_bring_your_own_container_local_training_and_serving/lightgbm_bring_your_own_container_local_training_and_serving.py

In [4]:
import sklearn
sklearn.__version__ # 1.0.1

'1.0.1'

In [5]:
import pandas as pd
pd.__version__ # 1.3.4

'1.3.4'

## 1-1. Load the data

In [6]:
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

In [7]:
data = load_boston() # A warning says the dataset will be removed in 1.2, but it does not affect this notebook


    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np


        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows::

        from sklearn.datasets import fetch_california_h

## 1-2. Feature engineering
Not performed in this notebook. The data is used as is.

## 1-3. Split the data
Split the data into training (train), validation (validation), and test (test) sets.  
The ratio is train:val:test = 3 (60%) : 1 (20%) : 1 (20%).  

In [8]:
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=45)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=45)

trainX = pd.DataFrame(X_train, columns=data.feature_names)
trainX['target'] = y_train

valX = pd.DataFrame(X_val, columns=data.feature_names)
valX['target'] = y_val

testX = pd.DataFrame(X_test, columns=data.feature_names)
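
The two-stage split above first holds out 20% for testing and then takes 25% of the remaining 80% for validation, which works out to roughly 60/20/20. The following is a small sketch of that arithmetic, not part of the original notebook. The helper name `split_sizes` is hypothetical, and it assumes scikit-learn's behavior of rounding a fractional `test_size` up to a whole number of rows.

```python
import math


def split_sizes(n_samples, test_size=0.2, val_size_of_rest=0.25):
    """Return (train, validation, test) row counts for a two-stage split."""
    # Assumption: a fractional test_size is rounded up to whole rows.
    n_test = math.ceil(test_size * n_samples)
    n_rest = n_samples - n_test
    n_val = math.ceil(val_size_of_rest * n_rest)
    n_train = n_rest - n_val
    return n_train, n_val, n_test


# The Boston housing dataset has 506 rows.
print(split_sizes(506))
```

For 506 rows this gives 102 test rows, which matches the "Invoked with 102 records" message the inference container prints later in this notebook.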

In [9]:
# Check
trainX.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,target
0,0.08829,12.5,7.87,0.0,0.524,6.012,66.6,5.5605,5.0,311.0,15.2,395.6,12.43,22.9
1,0.33983,22.0,5.86,0.0,0.431,6.108,34.9,8.0555,7.0,330.0,19.1,390.18,9.16,24.3
2,0.10469,40.0,6.41,1.0,0.447,7.267,49.0,4.7872,4.0,254.0,17.6,389.25,6.05,33.2
3,6.80117,0.0,18.1,0.0,0.713,6.081,84.4,2.7175,24.0,666.0,20.2,396.9,14.7,20.0
4,1.35472,0.0,8.14,0.0,0.538,6.072,100.0,4.175,4.0,307.0,21.0,376.73,13.04,14.5


In [10]:
# Check
valX.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,target
0,0.0315,95.0,1.47,0.0,0.403,6.975,15.3,7.6534,3.0,402.0,17.0,396.9,4.56,34.9
1,0.51183,0.0,6.2,0.0,0.507,7.358,71.6,4.148,8.0,307.0,17.4,390.07,4.73,31.5
2,19.6091,0.0,18.1,0.0,0.671,7.313,97.9,1.3163,24.0,666.0,20.2,396.9,13.44,15.0
3,0.95577,0.0,8.14,0.0,0.538,6.047,88.8,4.4534,4.0,307.0,21.0,306.38,17.28,14.8
4,0.09604,40.0,6.41,0.0,0.447,6.854,42.8,4.2673,4.0,254.0,17.6,396.9,2.98,32.0


In [11]:
# Check
testX.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT
0,0.25387,0.0,6.91,0.0,0.448,5.399,95.3,5.87,3.0,233.0,17.9,396.9,30.81
1,0.01951,17.5,1.38,0.0,0.4161,7.104,59.5,9.2229,3.0,216.0,18.6,393.24,8.05
2,4.64689,0.0,18.1,0.0,0.614,6.98,67.6,2.5329,24.0,666.0,20.2,374.68,11.66
3,3.67367,0.0,18.1,0.0,0.583,6.312,51.9,3.9917,24.0,666.0,20.2,388.62,10.58
4,0.29819,0.0,6.2,0.0,0.504,7.686,17.0,3.3751,8.0,307.0,17.4,377.51,3.92


In [12]:
# Check
y_test[0:5]

array([14.4, 33. , 29.8, 21.2, 46.7])

## 1-4. Save the data
Save the data both locally and to S3.

### 1-4-1. Save locally

In [13]:
# Create directories
from pathlib import Path

Path('./data/train').mkdir(parents=True, exist_ok=True)
Path('./data/valid').mkdir(parents=True, exist_ok=True)
Path('./data/test').mkdir(parents=True, exist_ok=True)

In [14]:
# Save locally
local_train = './data/train/boston_train.csv'
local_valid = './data/valid/boston_valid.csv'
local_test = './data/test/boston_test.csv'

trainX.to_csv(local_train, header=None, index=False)
valX.to_csv(local_valid, header=None, index=False)
testX.to_csv(local_test, header=None, index=False)

### 1-4-2. Save to S3

To get a uniquely named bucket, use sagemaker.Session().default_bucket().

https://sagemaker.readthedocs.io/en/stable/api/utility/session.html#sagemaker.session.Session

It returns a bucket name of the form sagemaker-＜region＞-＜account-id＞.
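
As a quick illustration of the naming pattern, the sketch below composes the default bucket name from a region and an account ID. The helper `default_bucket_name` is hypothetical and only mirrors the pattern stated above; in practice the SDK derives the name for you.

```python
def default_bucket_name(region, account_id):
    """Compose the default bucket name pattern: sagemaker-<region>-<account-id>."""
    return f"sagemaker-{region}-{account_id}"


# Values taken from the outputs shown in this notebook.
print(default_bucket_name("ap-northeast-1", "805433377179"))
```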

In [15]:
bucket_name = sagemaker.Session().default_bucket()
region_name = sagemaker.Session().boto_region_name
account_id =  sagemaker.Session().account_id()

In [16]:
# Check
print(bucket_name)
print(region_name)
print(account_id)

sagemaker-ap-northeast-1-805433377179
ap-northeast-1
805433377179


In [17]:
# Create a bucket (the default bucket already exists after default_bucket() is called; use this to create a different bucket)
#import boto3

#s3_resource = boto3.resource('s3')
#s3_resource.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': region_name})

In [18]:
# Upload to S3
train_s3 = sagemaker.s3.S3Uploader.upload('./data/train/boston_train.csv', f's3://{bucket_name}/demo_lightgbm/train')
valid_s3 = sagemaker.s3.S3Uploader.upload('./data/valid/boston_valid.csv', f's3://{bucket_name}/demo_lightgbm/valid')

# 2. Build the LightGBM custom container


There are broadly three patterns for building a custom container. See the following blog post for details.

https://aws.amazon.com/jp/blogs/news/sagemaker-custom-containers-pattern-training/

To understand how SageMaker works, we adopt pattern 3: a base image plus custom layers.

## 2-1. Review the Dockerfile

The files were prepared with reference to the following sample.

https://github.com/aws-samples/amazon-sagemaker-local-mode/tree/main/lightgbm_bring_your_own_container_local_training_and_serving/container

In [20]:
!pygmentize ./container/Dockerfile

# Build an image that can do training and inference in SageMaker
# This is a Python 2 image that uses the nginx, gunicorn, flask stack
# for serving inferences in a stable way.

FROM ubuntu:16.04

MAINTAINER Amazon AI <sage-learner@amazon.com>

ARG CONDA_DIR=/opt/conda
ENV PATH $CONDA_DIR/bin:$PATH

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        ca-certificates \
        cmake \
        build-essential \
        gcc \
        g++ \
        git \
        nginx \
        wget && \
    # python environment
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \

With the SageMaker SDK, a model is deployed by calling deploy().

When that happens, SageMaker runs the following command:

docker run <image> serve

As a result, the serve script below is executed.

In [34]:
!pygmentize -l py ./container/lightgbm_regression/serve

#!/usr/bin/env python

# This file implements the scoring service shell. You don't necessarily need to modify it for various
# algorithms. It starts nginx and gunicorn with the correct configurations and then simply waits until
# gunicorn exits.
#
# The flask server is specified to be the app object in wsgi.py
#
# We set the following parameters:
#
# Parameter                Environment Variable              Default Value
# ---------                --------------------              -------------
# number of workers        MODEL_SERVER_WORKERS              the number of CPU cores
# timeout                  MODEL_SERVER_TIMEOUT              60 seconds

from __future__ import print_function
import multiproc

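main calls start_server().

start_server() does the following:
* Starts nginx (acting as the web server)
* Starts gunicorn (acting as the application server)
    * As the 'wsgi:app' argument of the gunicorn command indicates, gunicorn loads the app application from the wsgi module, wsgi.py.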
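
To make the gunicorn part of start_server() concrete, here is a minimal sketch of how a serve script might assemble the gunicorn command line. It is not the sample's actual code: the function name `build_gunicorn_command` is hypothetical. The environment variables and defaults come from the comment table in the serve script, and the gevent worker class and the unix:/tmp/gunicorn.sock socket are taken from the deploy log and nginx.conf shown in this notebook.

```python
import multiprocessing
import os


def build_gunicorn_command(env=None):
    """Build the gunicorn argument list from the container's environment."""
    env = os.environ if env is None else env
    # Defaults follow the serve script's comment table: CPU count and 60 seconds.
    workers = int(env.get("MODEL_SERVER_WORKERS", multiprocessing.cpu_count()))
    timeout = int(env.get("MODEL_SERVER_TIMEOUT", 60))
    return [
        "gunicorn",
        "--timeout", str(timeout),
        "-k", "gevent",                    # worker class seen in the deploy log
        "-b", "unix:/tmp/gunicorn.sock",   # socket that nginx proxies to
        "-w", str(workers),
        "wsgi:app",                        # module wsgi.py, object app
    ]


print(" ".join(build_gunicorn_command({"MODEL_SERVER_WORKERS": "2"})))
```

A real serve script would pass a list like this to subprocess.Popen, start nginx alongside it, and wait until either process exits.
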
When nginx starts, it reads nginx.conf.

In [29]:
!pygmentize ./container/lightgbm_regression/nginx.conf

worker_processes 1;
daemon off; # Prevent forking


pid /tmp/nginx.pid;
error_log /var/log/nginx/error.log;

events {
  # defaults
}

http {
  include /etc/nginx/mime.types;
  default_type application/octet-stream;
  access_log /var/log/nginx/access.log combined;

  upstream gunicorn {
    server unix:/tmp/gunicorn.sock;
  }

  server {
    listen 8080 deferred;
    client_max_body_size 5m;

    keepalive_timeout 5;
    proxy_read_timeout 1200s;

    location

wsgi.py is the file used to hand the application over to gunicorn.

As the 'wsgi:app' argument of the gunicorn command indicates, gunicorn loads the app application from the wsgi module, wsgi.py.

In [28]:
!pygmentize ./container/lightgbm_regression/wsgi.py

import predictor as myapp

# This is just a simple wrapper for gunicorn to find your app.
# If you want to change the algorithm file, simply change "predictor" above to the
# new file.

app = myapp.app


wsgi.py loads app from predictor.py.

In [35]:
!pygmentize ./container/lightgbm_regression/predictor.py

# This is the file that implements a flask server to do inferences. It's the file that you will modify to
# implement the scoring for your own algorithm.

from __future__ import print_function

import os
import json
import pickle
import sys
import signal
import traceback
import io
import flask

import pandas as pd
import lightgbm as lgb

prefix = '/opt/ml/'
model_path = os.path.join(prefix, 'model
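
The listing above is cut off, so as a reference here is a minimal sketch of the HTTP contract a SageMaker hosting container has to satisfy: answer GET /ping for health checks and POST /invocations for predictions. It is written as a plain WSGI callable using only the standard library rather than Flask, and it is not the sample's predictor.py. The function `predict_rows` is a hypothetical stand-in for the LightGBM model.

```python
def predict_rows(rows):
    """Hypothetical stand-in for a real model: returns the sum of each row."""
    return [sum(row) for row in rows]


def app(environ, start_response):
    """Minimal WSGI app exposing /ping and /invocations."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "")

    if method == "GET" and path == "/ping":
        # Health check: a 200 response tells SageMaker the container is ready.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"\n"]

    if method == "POST" and path == "/invocations":
        if environ.get("CONTENT_TYPE") != "text/csv":
            start_response("415 Unsupported Media Type", [("Content-Type", "text/plain")])
            return [b"This sketch only supports CSV data"]
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8")
        # One CSV line per record, no header row.
        rows = [[float(v) for v in line.split(",")]
                for line in body.splitlines() if line.strip()]
        result = "\n".join(str(p) for p in predict_rows(rows))
        start_response("200 OK", [("Content-Type", "text/csv")])
        return [result.encode("utf-8")]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

Because `app` is an ordinary WSGI callable, gunicorn could serve it through the same 'wsgi:app' argument; a Flask application object plays exactly this role in the sample container.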

We have seen that the container serves a Flask application. Now let's build the image.

## 2-2. Build and push the Docker image

Building and pushing takes about 7 minutes.

In [36]:
%%sh

# The name of our algorithm
algorithm_name=sagemaker-lightgbm-regression

cd container

chmod +x lightgbm_regression/train
chmod +x lightgbm_regression/serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to ap-northeast-1 if none defined)
region=$(aws configure get region)
region=${region:-ap-northeast-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${region}|docker login --username AWS --password-stdin ${fullname}

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
Sending build context to Docker daemon  29.18kB
Step 1/10 : FROM ubuntu:16.04
16.04: Pulling from library/ubuntu
58690f9b18fc: Pulling fs layer
b51569e7c507: Pulling fs layer
da8ef40b9eca: Pulling fs layer
fb15d46c38dc: Pulling fs layer
fb15d46c38dc: Waiting
b51569e7c507: Verifying Checksum
b51569e7c507: Download complete
da8ef40b9eca: Verifying Checksum
da8ef40b9eca: Download complete
58690f9b18fc: Verifying Checksum
58690f9b18fc: Download complete
fb15d46c38dc: Verifying Checksum
fb15d46c38dc: Download complete
58690f9b18fc: Pull complete
b51569e7c507: Pull complete
da8ef40b9eca: Pull complete
fb15d46c38dc: Pull complete
Digest: sha256:91bd29a464fdabfcf44e29e1f2a5f213c6dfa750b6290e40dd6998ac79da3c41
Status: Downloaded newer image for ubuntu:16.04
 ---> b6f507652425
Step 2/10 : MAINTAINER Amazon AI <sage-learner@amazon.com>
 ---> Running in 5402ad15ebc5
Removing intermediate container 5402ad15ebc5
 ---> e3d6eaf0faae
Step 3/10 : ARG CONDA_DIR=/opt/conda
 ---> Running in

https://docs.docker.com/engine/reference/commandline/login/#credentials-store



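## 2-3. Settings before training
Check the URI of the container image pushed to ECR.

Open ECR in the AWS console and confirm that the image you created is there.

Get the image URI and paste it below.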

In [37]:
# Check
print(bucket_name)
print(region_name)
print(account_id)

sagemaker-ap-northeast-1-805433377179
ap-northeast-1
805433377179


In [38]:
image_uri = f'{account_id}.dkr.ecr.{region_name}.amazonaws.com/sagemaker-lightgbm-regression'

In [39]:
# Check
image_uri

'805433377179.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-lightgbm-regression'
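
The URI above has no tag, so Docker resolves it to the `latest` tag that the build script pushed. As a small sketch of the URI layout, the hypothetical helper `ecr_image_uri` below builds the fully qualified form with an explicit tag.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Compose an ECR image URI: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


print(ecr_image_uri("805433377179", "ap-northeast-1", "sagemaker-lightgbm-regression"))
```

Pinning an explicit tag (or a digest) makes it clearer which build a training job or endpoint is actually using.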

In [40]:
hyperparameters={'boosting_type': 'gbdt',
            'objective': 'regression',
            'num_leaves': 31,
            'learning_rate': 0.05,
            'feature_fraction': 0.9,
            'bagging_fraction': 0.8,
            'bagging_freq': 5,
            'verbose': 0}

## 2-4. Run training locally
Pull the image that was built and pushed to ECR, and run it with the local Docker daemon.

In [41]:
# Set the local file paths (S3 paths can also be specified)
train_location = 'file://'+local_train
valid_location = 'file://'+local_valid

print(train_location)
print(valid_location)

file://./data/train/boston_train.csv
file://./data/valid/boston_valid.csv


In [42]:
from sagemaker.estimator import Estimator

In [43]:
from sagemaker import get_execution_role

role = get_execution_role()

In [44]:
# Check
role

'arn:aws:iam::805433377179:role/service-role/AmazonSageMaker-ExecutionRole-20220807T102095'

Create a SageMaker Estimator.

https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html

In [45]:
local_lightgbm = Estimator(
    image_uri,
    role,
    instance_count=1,
    instance_type="local",
    hyperparameters=hyperparameters
    )

Launch the training job with the fit method.

https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.EstimatorBase.fit

In [46]:
local_lightgbm.fit({'train':train_location, 'validation': valid_location})

Creating 3jpw638uk8-algo-1-4wiz8 ... 
Creating 3jpw638uk8-algo-1-4wiz8 ... done
Attaching to 3jpw638uk8-algo-1-4wiz8
3jpw638uk8-algo-1-4wiz8 | Starting the training.
3jpw638uk8-algo-1-4wiz8 | Reading hyperparameters data: /opt/ml/input/config/hyperparameters.json
3jpw638uk8-algo-1-4wiz8 | hyperparameters_data: {'boosting_type': 'gbdt', 'objective': 'regression', 'num_leaves': '31', 'learning_rate': '0.05', 'feature_fraction': '0.9', 'bagging_fraction': '0.8', 'bagging_freq': '5', 'verbose': '0'}
3jpw638uk8-algo-1-4wiz8 | Found train files: ['/opt/ml/input/data/train/boston_train.csv']
3jpw638uk8-algo-1-4wiz8 | Found validation files: ['/opt/ml/input/data/validation/boston_valid.csv']
3jpw638uk8-algo-1-4wiz8 | building training and validation datasets
3jpw638uk8-algo-1-4wiz8 | Starting training...
3jpw638uk8-algo-1-4wiz8 | You can set `force_row_wise=true` to remove the overhead.
3jpw638uk8-algo-1-4wiz8 | A

In local mode, the training output is written to the following S3 location:

s3://sagemaker-us-west-2-805433377179/sagemaker-lightgbm-regression-2022-10-03-06-17-32-054/
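
One detail visible in the training log: every value in /opt/ml/input/config/hyperparameters.json arrives as a string ('31', '0.05'), even though the notebook passed numbers. A training script therefore has to convert them before handing them to LightGBM. The sketch below shows one way to do that; the helper `cast_hyperparameters` is hypothetical and is not taken from the sample's train script.

```python
import json


def cast_hyperparameters(raw):
    """Convert string values back to int/float where possible; keep the rest."""
    typed = {}
    for key, value in raw.items():
        try:
            # '31' -> 31, '0.05' -> 0.05
            typed[key] = json.loads(value)
        except (TypeError, ValueError):
            # 'gbdt', 'regression' and other plain strings are kept unchanged
            typed[key] = value
    return typed


raw = {"boosting_type": "gbdt", "num_leaves": "31", "learning_rate": "0.05"}
print(cast_hyperparameters(raw))
```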


## 2-5. Deploy locally

serializer: specifies the format of the input data.
https://sagemaker.readthedocs.io/en/stable/v2.html
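
Conceptually, a CSV serializer turns the object passed to predict() into a text/csv request body. The sketch below is a hand-written approximation of that idea for a list of rows. It is not the SDK's implementation, and `to_csv_payload` is a hypothetical name.

```python
def to_csv_payload(rows):
    """Join rows of values into a header-less CSV string, one record per line."""
    return "\n".join(",".join(str(v) for v in row) for row in rows)


print(to_csv_payload([[0.25387, 0.0, 6.91], [0.01951, 17.5, 1.38]]))
```

In this notebook the payload is read from a CSV file and is already a string, so the serializer mainly ensures the request is sent with the CSV content type that the container expects.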

In [47]:
local_predictor = local_lightgbm.deploy(1, 'local', serializer=sagemaker.serializers.CSVSerializer()) 

Attaching to w330hghdwg-algo-1-ngebc
w330hghdwg-algo-1-ngebc | Starting the inference server with 2 workers.
w330hghdwg-algo-1-ngebc | [2022-10-19 05:45:09 +0000] [13] [INFO] Starting gunicorn 20.1.0
w330hghdwg-algo-1-ngebc | [2022-10-19 05:45:09 +0000] [13] [INFO] Listening at: unix:/tmp/gunicorn.sock (13)
w330hghdwg-algo-1-ngebc | [2022-10-19 05:45:09 +0000] [13] [INFO] Using worker: gevent
w330hghdwg-algo-1-ngebc | [2022-10-19 05:45:09 +0000] [15] [INFO] Booting worker with pid: 15
w330hghdwg-algo-1-ngebc | [2022-10-19 05:45:09 +0000] [16] [INFO] Booting worker with pid: 16
!w330hghdwg-algo-1-ngebc | 172.18.0.1 - - [19/Oct/2022:05:45:13 +0000] "GET /ping HTTP/1.1" 200 1 "-" "python-urllib3/1.26.8"


In [48]:
!docker ps

CONTAINER ID   IMAGE                                                                             COMMAND   CREATED         STATUS         PORTS                                       NAMES
64eedfc0a8e3   805433377179.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-lightgbm-regression   "serve"   4 seconds ago   Up 4 seconds   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   w330hghdwg-algo-1-ngebc


In [49]:
# To stop a running container
#!docker stop XXXXXXXXXXX # XXXXXXXXXXX is the CONTAINER ID
#!docker ps

## 2-6. Run inference against the local endpoint

In [50]:
# Run inference
with open(local_test, 'r') as f:
    payload = f.read().strip()

predicted = local_predictor.predict(payload).decode('utf-8')
print('=' * 20)
print(predicted)

w330hghdwg-algo-1-ngebc | Invoked with 102 records
19.95642073217597
27.844891841022335
23.747437427003455
21.961517177305176
33.70952263893306
16.546899933876215
20.7577247308279
21.58941351302627
28.44096446328559
21.573610198594977
16.520022349295115
18.56239893242527
33.70952263893306
21.66404760045202
18.839854556333133
20.524517944865078
23.512192914502315
19.720552829648888
14.831841119971708
25.48273874904075
24.232639474441545
21.624005932843115
24.961489794296718
31.737194191676068
21.634052928440624
28.40721160777621
21.408363849719503
14.831841119971708
22.218594550645975
21.174456098551236
21.78791955089051
14.831841119971708
29.996695633096042
22.44097524661187
33.83316205414468
26.41403196992683
33.70952263893306
17.366188662166092
27.56686070285819
30.785697489113854
19.36938873496206
20.70626548555591
17.759853567831996
27.888269821752413
20.521395163186774
14.831841119971708
24.776417537973362
24.965857100129327
19.649289821764185
21.026797620813866
33.709522
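
The endpoint returns one prediction per line as plain text. To compare the predictions with y_test, they first have to be parsed back into numbers. The following sketch shows that step together with a simple RMSE calculation; `parse_predictions` and `rmse` are hypothetical helpers that are not part of the original notebook.

```python
import math


def parse_predictions(response_text):
    """Turn a newline-separated response body into a list of floats."""
    return [float(line) for line in response_text.splitlines() if line.strip()]


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


preds = parse_predictions("19.95642073217597\n27.844891841022335\n")
print(preds)
```

With the notebook's variables, this could be used as rmse(y_test, parse_predictions(predicted)).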

## 2-7. Launch a training job
Next, instead of using local mode,
run a SageMaker training job with the same custom container.

In [51]:
# Check
print(train_s3)
print(valid_s3)

s3://sagemaker-ap-northeast-1-805433377179/demo_lightgbm/train/boston_train.csv
s3://sagemaker-ap-northeast-1-805433377179/demo_lightgbm/valid/boston_valid.csv


In [52]:
est_lightgbm = Estimator(
    image_uri,
    role,
    instance_count=1,
    instance_type="ml.m4.2xlarge", # Specify the instance type
    hyperparameters=hyperparameters)

In [53]:
est_lightgbm.fit({'train':train_s3, 'validation': valid_s3})

2022-10-19 05:45:14 Starting - Starting the training job...
2022-10-19 05:45:38 Starting - Preparing the instances for trainingProfilerReport-1666158314: InProgress
.........
2022-10-19 05:47:03 Downloading - Downloading input data...
2022-10-19 05:47:39 Training - Downloading the training image...
2022-10-19 05:48:15 Uploading - Uploading generated training model
2022-10-19 05:48:15 Completed - Training job completed
Starting the training.
Reading hyperparameters data: /opt/ml/input/config/hyperparameters.json
hyperparameters_data: {'bagging_fraction': '0.8', 'bagging_freq': '5', 'boosting_type': 'gbdt', 'feature_fraction': '0.9', 'learning_rate': '0.05', 'num_leaves': '31', 'objective': 'regression', 'verbose': '0'}
Found train files: ['/opt/ml/input/data/train/boston_train.csv']
Found validation files: ['/opt/ml/input/data/validation/boston_valid.csv']
building training and validation datasets
Starting training...
Y

Training takes about 3 minutes.

Only about 75 seconds are billed.

## 2-8. Deploy to an endpoint

When you deploy,
SageMaker runs docker run <image> serve.

    
Deployment takes about 3 minutes.

In [54]:
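# Note: csv_serializer is deprecated in SageMaker SDK v2; sagemaker.serializers.CSVSerializer() (used in 2-5) is its replacement
from sagemaker.predictor import csv_serializer

Deploy the inference endpoint with the deploy method.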

https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.EstimatorBase.deploy

In [55]:
predictor = est_lightgbm.deploy(1, 'ml.m4.xlarge', serializer=csv_serializer, wait=True)

----------!

In [56]:
### Run inference
with open(local_test, 'r') as f:
    payload = f.read().strip()

predicted = predictor.predict(payload).decode('utf-8')
print(predicted)

The csv_serializer has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


19.95642073217597
27.844891841022335
23.747437427003455
21.961517177305176
33.70952263893306
16.546899933876215
20.7577247308279
21.58941351302627
28.44096446328559
21.573610198594977
16.520022349295115
18.56239893242527
33.70952263893306
21.66404760045202
18.839854556333133
20.524517944865078
23.512192914502315
19.720552829648888
14.831841119971708
25.48273874904075
24.232639474441545
21.624005932843115
24.961489794296718
31.737194191676068
21.634052928440624
28.40721160777621
21.408363849719503
14.831841119971708
22.218594550645975
21.174456098551236
21.78791955089051
14.831841119971708
29.996695633096042
22.44097524661187
33.83316205414468
26.41403196992683
33.70952263893306
17.366188662166092
27.56686070285819
30.785697489113854
19.36938873496206
20.70626548555591
17.759853567831996
27.888269821752413
20.521395163186774
14.831841119971708
24.776417537973362
24.965857100129327
19.649289821764185
21.026797620813866
33.70952263893306
22.770867837558004
25.12436361101226
32.04499227317

# 3. Introduce the Inference Toolkit: supply the inference code from outside the container and let SageMaker's machinery handle the application server

Building your own nginx, gunicorn, and Flask stack incurs maintenance costs.
By adopting the Inference Toolkit, you can use a ready-made serving environment instead.

* Adopt the Inference Toolkit
* Use a built-in container with requirements.txt and inference.py

This section adopts the Inference Toolkit.

https://github.com/aws/sagemaker-inference-toolkit

Sample code

Complete Example
Here is a complete example demonstrating usage of the SageMaker Inference Toolkit in your own container for deployment to a multi-model endpoint.

https://github.com/aws/amazon-sagemaker-examples/tree/main/advanced_functionality/multi_model_bring_your_own

In [262]:
!pygmentize ./container_sminftoolkit/Dockerfile

# Build an image that can do training and inference in SageMaker
# This is a Python 2 image that uses the nginx, gunicorn, flask stack
# for serving inferences in a stable way.

FROM ubuntu:16.04

MAINTAINER Amazon AI <sage-learner@amazon.com>

ARG CONDA_DIR=/opt/conda
ENV PATH $CONDA_DIR/bin:$PATH

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        ca-certificates \
        cmake \
        build-essential \
        gcc \
        g++ \
        git \
        #nginx \
        wget && \
    # python environment
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \

The entrypoint script (dockerd-entrypoint.py, shown below) corresponds to the following step.

3.Implement a serving entrypoint, which starts the model server.


https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/model_server.py

When start_model_server() is called without arguments, it uses

DEFAULT_HANDLER_SERVICE = default_handler_service.__name__

This is the Inference Toolkit's own handler service.

https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/default_handler_service.py



The handler service creates a Transformer(), which in turn creates the inference handler.

DefaultHandlerService -> Transformer -> DefaultInferenceHandler

https://github.com/aws/sagemaker-inference-toolkit/blob/3774c1a0fb4408cfa95333b75d6e30a376bffa52/src/sagemaker_inference/transformer.py


In [320]:
!pygmentize ./container_sminftoolkit/dockerd-entrypoint.py

from sagemaker_inference import model_server

#model_server.start_model_server(handler_service="/home/model-server/model_handler.py")
model_server.start_model_server()


When start_model_server() is called without arguments,
the Inference Toolkit's Transformer() is created.

https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/model_server.py

DEFAULT_HANDLER_SERVICE = default_handler_service.__name__

Because of this default, the following module is used as the handler service:

https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/default_handler_service.py

Its __init__ instantiates Transformer().

https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/transformer.py

Transformer() in turn uses the Inference Toolkit's DefaultInferenceHandler.

https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/default_inference_handler.py

Therefore, this dockerd-entrypoint.py is the minimal configuration.
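
The chain above matters because of how user code is picked up: the Transformer looks for functions such as input_fn, predict_fn, and output_fn in the user's module and falls back to the default inference handler for any that are missing. The toy model below illustrates only that fallback idea. It is a simplified illustration, not the toolkit's code; `ToyDefaultHandler` and `resolve_hooks` are hypothetical names.

```python
from types import SimpleNamespace


class ToyDefaultHandler:
    """Simplified stand-in for a default inference handler."""

    def default_input_fn(self, input_data, content_type):
        return input_data

    def default_predict_fn(self, data, model):
        raise NotImplementedError("no default prediction logic; provide predict_fn")

    def default_output_fn(self, prediction, accept):
        return str(prediction)


def resolve_hooks(user_module, defaults):
    """Prefer functions defined in the user module; otherwise use the defaults."""
    return {
        "input_fn": getattr(user_module, "input_fn", defaults.default_input_fn),
        "predict_fn": getattr(user_module, "predict_fn", defaults.default_predict_fn),
        "output_fn": getattr(user_module, "output_fn", defaults.default_output_fn),
    }


# A user module that only overrides predict_fn (hypothetical example).
user_module = SimpleNamespace(predict_fn=lambda data, model: [x * 2 for x in data])
hooks = resolve_hooks(user_module, ToyDefaultHandler())
data = hooks["input_fn"]([1, 2], "text/csv")
print(hooks["output_fn"](hooks["predict_fn"](data, None), "text/csv"))
```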

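## Explanation
There are two pieces: a handler service and an inference handler.

The handler service corresponds to the following.

2.Implement a handler service that is executed by the model server.

The model's inference handler corresponds to the following.

1.Implement an inference handler, which is responsible for loading the model and providing input, predict, and output functions. 


The handler service in 2. loads the inference handler in 1. You can also use the inference handler that the Inference Toolkit provides.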
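
For orientation, here is a minimal sketch of what an inference.py for LightGBM could look like, using the model_fn / input_fn / predict_fn / output_fn hook names that the Inference Toolkit looks up. It is not the notebook's actual inference.py. In particular, the model file name "model.txt" is an assumption, and lightgbm is imported lazily so the parsing functions can be exercised without it installed.

```python
import os


def model_fn(model_dir):
    """Load the model from the directory where model.tar.gz was extracted."""
    import lightgbm as lgb  # imported lazily; installed via requirements.txt
    # Assumption: the training job saved the booster as "model.txt".
    return lgb.Booster(model_file=os.path.join(model_dir, "model.txt"))


def input_fn(input_data, content_type):
    """Parse a header-less CSV request body into a list of float rows."""
    if content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {content_type}")
    if isinstance(input_data, (bytes, bytearray)):
        input_data = input_data.decode("utf-8")
    return [[float(v) for v in line.split(",")]
            for line in input_data.splitlines() if line.strip()]


def predict_fn(data, model):
    """Run the model on the parsed rows."""
    return list(model.predict(data))


def output_fn(prediction, accept):
    """Return one prediction per line, mirroring the output format seen earlier."""
    return "\n".join(str(p) for p in prediction)
```

Keeping each stage in its own function means you only need to override the stages that differ from the toolkit's defaults.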

In [321]:
#!pygmentize ./container_sminftoolkit/model_handler.py ### Not needed for the minimal configuration

In [295]:
%%sh

# The name of our algorithm
#algorithm_name=demo-sagemaker-multimodel
algorithm_name=demo-sagemaker-inftoolkit

#cd container
cd container_sminftoolkit

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

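# Get a login password from ECR and pass it to docker login (aws ecr get-login was removed in AWS CLI v2)
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin ${account}.dkr.ecr.${region}.amazonaws.com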

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -q -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
sha256:f8ebd71982d092267df14cafda9a43dd0a6d6191fda06998882231edd9027170
The push refers to repository [805433377179.dkr.ecr.ap-northeast-1.amazonaws.com/demo-sagemaker-inftoolkit]
097af2d7de3f: Preparing
126753c8c75e: Preparing
e29f1ec0574d: Preparing
4f6cd5801fa9: Preparing
ad7f025dd140: Preparing
e0513fab2b61: Preparing
1251204ef8fc: Preparing
47ef83afae74: Preparing
df54c846128d: Preparing
be96a3f634de: Preparing
e0513fab2b61: Waiting
1251204ef8fc: Waiting
47ef83afae74: Waiting
df54c846128d: Waiting
be96a3f634de: Waiting
126753c8c75e: Pushed
097af2d7de3f: Pushed
4f6cd5801fa9: Pushed
1251204ef8fc: Layer already exists
47ef83afae74: Layer already exists
df54c846128d: Layer already exists
be96a3f634de: Layer already exists
ad7f025dd140: Pushed
e29f1ec0574d: Pushed
e0513fab2b61: Pushed
latest: digest: sha256:ca86ab9f586b7295f6f0a6169b697a0eeaea94ad1f36e1df9d7b851e7785ca58 size: 2407


https://docs.docker.com/engine/reference/commandline/login/#credentials-store



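* dockerd-entrypoint.py is executed and attempts to start the server.
    * When a custom handler service is specified, the handler needed at server start-up is defined in model_handler.py (the minimal configuration above uses the toolkit's default handler service instead).
    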


In [296]:
!docker image ls

REPOSITORY                                                                        TAG             IMAGE ID       CREATED              SIZE
805433377179.dkr.ecr.ap-northeast-1.amazonaws.com/demo-sagemaker-inftoolkit       latest          f8ebd71982d0   About a minute ago   2.14GB
demo-sagemaker-inftoolkit                                                         latest          f8ebd71982d0   About a minute ago   2.14GB
805433377179.dkr.ecr.ap-northeast-1.amazonaws.com/demo-sagemaker-inftoolkit       <none>          f3bd019d4558   35 minutes ago       2.05GB
<none>                                                                            <none>          f475fc39cd10   23 hours ago         1.12GB
805433377179.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-lightgbm-regression   latest          4e9ccbf2b14e   25 hours ago         2.25GB
sagemaker-lightgbm-regression                                                     latest          4e9ccbf2b14e   25 hours ago         2.25GB
preprod-xgboost

## Deploy an endpoint locally
The model is the LGBM model created in the previous section.

* The source code is also specified
* LGBM is installed through requirements.txt

In [279]:
container_uri = '805433377179.dkr.ecr.ap-northeast-1.amazonaws.com/demo-sagemaker-inftoolkit:latest'

In [277]:
est_xgb.image_uri

'354813040037.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-xgboost:1.5-1'

In [278]:
est_xgb.model_data

's3://sagemaker-ap-northeast-1-805433377179/sagemaker-xgboost-2022-10-17-00-34-15-146/model.tar.gz'

In [309]:
from sagemaker.predictor import RealTimePredictor

lgb_model = sagemaker.model.Model(#est_xgb.image_uri, # URI of the XGBoost built-in container
                                  container_uri,
                                  model_data=est_xgb.model_data, # model file produced by local training
                                  role=role,
                                  predictor_cls=RealTimePredictor, # class used to create the predictor for inference
                                  source_dir='./src_builtin_container_serve', # needed when requirements.txt is used
                                  entry_point='inference.py' # when source_dir is specified, give the .py file name
                                  #entry_point='./src_builtin_container_serve/inference.py'
                                 )

In [314]:
predictor_lgb_model = lgb_model.deploy(initial_instance_count=1,
                                       instance_type='local', 
                                       serializer=csv_serializer, ### pass the data to SageMaker as a string
                                      )

Attaching to 6s1g8pjoto-algo-1-87ph1
6s1g8pjoto-algo-1-87ph1 | Collecting lightgbm
6s1g8pjoto-algo-1-87ph1 |   Downloading lightgbm-3.3.3-py3-none-manylinux1_x86_64.whl (2.0 MB)
6s1g8pjoto-algo-1-87ph1 | Installing collected packages: lightgbm
6s1g8pjoto-algo-1-87ph1 | Successfully installed lightgbm-3.3.3
6s1g8pjoto-algo-1-87ph1 | 2022-10-20T06:43:34,500 [INFO ] main com.amazonaws.ml.mms.ModelServer - 
6s1g8pjoto-algo-1-87ph1 | MMS Home: /opt/conda/lib/python3.9/site-packages
6s1g8pjoto-algo-1-87ph1 | Current directory: /
6s1g8pjoto-algo-1-87ph1 | Temp directory: /tmp
6s1g8pjoto-algo-1-87ph1 | Number of GPUs: 0
6s1g8pjoto-algo-1-87ph1 | Number of CPUs: 2
6s1g8pjoto-algo-1-87ph1 | Max heap size: 859 M
6s1g8pjoto-algo-1-87ph1 | Python 

The class RealTimePredictor has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


In [315]:
!docker ps

CONTAINER ID   IMAGE                                                                                COMMAND                  CREATED         STATUS         PORTS                                       NAMES
f380dc891702   805433377179.dkr.ecr.ap-northeast-1.amazonaws.com/demo-sagemaker-inftoolkit:latest   "python /usr/local/b…"   2 minutes ago   Up 2 minutes   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   6s1g8pjoto-algo-1-87ph1


In [312]:
!docker stop 8d42da364deb

b1nb8tpxh2-algo-1-p4uek exited with code 0
8d42da364deb
Aborting on container exit...


In [313]:
!docker ps

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES


## Run inference

In [316]:
### Run inference
with open(local_test, 'r') as f:
    payload = f.read().strip()

predicted = predictor_lgb_model.predict(payload).decode('utf-8')
print('=' * 20)
print(predicted)

The csv_serializer has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


6s1g8pjoto-algo-1-87ph1 | 2022-10-20T06:45:48,308 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - text/csv
6s1g8pjoto-algo-1-87ph1 | 2022-10-20T06:45:48,312 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - <class 'str'>
6s1g8pjoto-algo-1-87ph1 | 2022-10-20T06:45:48,313 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 0.25387,0.0,6.91,0.0,0.448,5.399,95.3,5.87,3.0,233.0,17.9,396.9,30.81
6s1g8pjoto-algo-1-87ph1 | 2022-10-20T06:45:48,313 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 0.01951,17.5,1.38,0.0,0.4161,7.104,59.5,9.2229,3.0,216.0,18.6,393.24,8.05
6s1g8pjoto-algo-1-87ph1 | 2022-10-20T06:45:48,314 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 4.64689,0.0,18.1,0.0,0.614,6.98,67.6,2.5329,24.0,666.0,20.2,374.68,11.66
6s1g8pjoto-algo-1-87ph1 | 2022-10-20T06:45:48,315 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle 

In [301]:
print(predicted)

[19.95642073217597, 27.844891841022335, 23.747437427003455, 21.961517177305176, 33.70952263893306, 16.546899933876215, 20.7577247308279, 21.58941351302627, 28.44096446328559, 21.573610198594977, 16.520022349295115, 18.56239893242527, 33.70952263893306, 21.66404760045202, 18.839854556333133, 20.524517944865078, 23.512192914502315, 19.720552829648888, 14.831841119971708, 25.48273874904075, 24.232639474441545, 21.624005932843115, 24.961489794296718, 31.737194191676068, 21.634052928440624, 28.40721160777621, 21.408363849719503, 14.831841119971708, 22.218594550645975, 21.174456098551236, 21.78791955089051, 14.831841119971708, 29.996695633096042, 22.44097524661187, 33.83316205414468, 26.41403196992683, 33.70952263893306, 17.366188662166092, 27.56686070285819, 30.785697489113854, 19.36938873496206, 20.70626548555591, 17.759853567831996, 27.888269821752413, 20.521395163186774, 14.831841119971708, 24.776417537973362, 24.965857100129327, 19.649289821764185, 21.026797620813866, 33.70952263893306,

In [317]:
type(predicted)

str
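
Because the response is a plain string, it has to be parsed before the predictions can be used numerically. The following is a minimal sketch, assuming the complete response is a well-formed JSON-style list (the printed output above is truncated, so this is not confirmed); `predicted_example` is a shortened stand-in built from the first three values shown above.

```python
import json

# Shortened stand-in for the string returned by predict(); the values are
# copied from the first three predictions printed above.
predicted_example = "[19.95642073217597, 27.844891841022335, 23.747437427003455]"

# json.loads turns the list-shaped string into a list of Python floats.
predicted_values = json.loads(predicted_example)
print(len(predicted_values), type(predicted_values[0]).__name__)
```

Passing a deserializer to `deploy()` would be another way to get parsed values directly, at the cost of coupling the client to the response format.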

In [None]:
%%sh

# The name of our algorithm
algorithm_name=sm-inf-toolkit

#cd container
cd container_sminftoolkit ### changed

#chmod +x lightgbm_regression/train ### changed
chmod +x lightgbm_regression/serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to ap-northeast-1 if none defined)
region=$(aws configure get region)
region=${region:-ap-northeast-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${region}|docker login --username AWS --password-stdin ${fullname}

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}


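# 3. Specifying the executable from outside the container

In "2. Building the LightGBM custom container", the training launch script train was placed inside the custom container,
so the container has to be rebuilt every time the source code changes.

To improve maintainability, it can be better to separate the container (the environment) from the source code.
The following shows how to specify a script file from outside the container.

## 3-0. What is SageMaker Training Toolkit?
To specify a script from outside, install SageMaker Training Toolkit in the container.

https://github.com/aws/sagemaker-training-toolkit


The train command is installed at  
/opt/conda/bin/train  


Add this to the earlier Dockerfile.
Exclude train from the files copied into the image. If train is left in place, running
docker run <image> train
executes the train script in the current directory, and the train command installed by the training toolkit cannot be run.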
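
The toolkit's train command runs the user-supplied entry point, passing hyperparameters as command-line arguments and exposing data and model paths through `SM_*` environment variables. The script below is a sketch of such an entry point, not the notebook's actual `train_practice.py` (which is not shown here); the hyperparameter names `num_leaves` and `learning_rate` are assumptions, and the channel names follow the `train` and `validation` keys passed to `fit()` later on.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as "--name value" pairs (names assumed here).
    parser.add_argument("--num_leaves", type=int, default=31)
    parser.add_argument("--learning_rate", type=float, default=0.1)
    # Paths are exposed through SM_* environment variables inside the container.
    parser.add_argument("--model_dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    parser.add_argument("--validation", default=os.environ.get("SM_CHANNEL_VALIDATION", "/opt/ml/input/data/validation"))
    # Ignore any extra arguments that are not declared above.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    args = parse_args()
    print(f"training data: {args.train}, model output: {args.model_dir}")
    # Load data from args.train, fit the model, and save it under args.model_dir.
```

Reading paths from the environment with local defaults lets the same script run both inside the container and on a development machine.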

## 3-1. Review the Dockerfile

In [None]:
!pygmentize ./container_smtrtoolkit/Dockerfile

## 3-2. build & push

In [None]:
%%sh

# The name of our algorithm
algorithm_name=sagemaker-toolkit

#cd container
cd container_smtrtoolkit ### changed

#chmod +x lightgbm_regression/train ### changed
chmod +x lightgbm_regression/serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to ap-northeast-1 if none defined)
region=$(aws configure get region)
region=${region:-ap-northeast-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${region}|docker login --username AWS --password-stdin ${fullname}

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}


## 3-3. Training (local)

In [None]:
image_uri_toolkit = f'{account_id}.dkr.ecr.{region_name}.amazonaws.com/sagemaker-toolkit'

In [None]:
# Check
image_uri_toolkit

In [None]:
est_lightgbm_toolkit = Estimator(
    image_uri_toolkit,
    role,
    instance_count=1,
    instance_type="local",
    hyperparameters=hyperparameters,
    entry_point='./src/train_practice.py'
    )

In [None]:
est_lightgbm_toolkit.fit({'train':train_s3, 'validation': valid_s3})

Training in local mode succeeded. Next, try specifying a different script.

In [None]:
est_lightgbm_toolkit2 = Estimator(
    image_uri_toolkit,
    role,
    instance_count=1,
    instance_type="local",
    hyperparameters=hyperparameters,
    #entry_point='./src/train_practice.py'
    entry_point='./src/train_practice.sh' ### changed to a shell script
    )

In [None]:
est_lightgbm_toolkit2.fit({'train':train_s3, 'validation': valid_s3})

This confirms that an arbitrary file supplied from outside the container can be executed.

## (Optional) 4. Run without a custom container by listing lightgbm in the built-in container's requirements.txt



Past versions (1.3-3, 1.2-2, 1.2-1, 1.0-1) are available here:

https://github.com/aws/sagemaker-xgboost-container/releases


In [None]:
container_uri = sagemaker.image_uris.retrieve("xgboost", region_name, "1.5-1")

In [None]:
container_uri

In [None]:
est_xgb = Estimator(
    container_uri, # XGBoost built-in container
    role,
    instance_count=1,
    instance_type="local",
    hyperparameters=hyperparameters,
    source_dir='./src_builtin_container',
    entry_point='train_practice.py'
    )

In [None]:
[1,2,3,4,5]

In [None]:
est_xgb.fit({'train':train_s3, 'validation': valid_s3})

## 4-2. Run inference

### 4-2-1. Deploy

https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.EstimatorBase.deploy


How can the source code be specified at deploy time?

https://www.youtube.com/watch?v=sngNd79GpmE&t=596s


Key point: the Estimator has to be defined again.

In [None]:
# Check
print(est_xgb)
print(est_xgb.base_job_name)
print(est_xgb.checkpoint_local_path)
print(est_xgb.checkpoint_s3_uri)
print(est_xgb.code_channel_name)
print(est_xgb.code_location)
print(est_xgb.code_uri)
print(est_xgb.collection_configs)
print(est_xgb.CONTAINER_CODE_CHANNEL_SOURCEDIR_PATH)
print(est_xgb.container_log_level)
print(est_xgb.debugger_hook_config)
print(est_xgb.debugger_rule_configs)
print(est_xgb.debugger_rules)
print(est_xgb.dependencies)
print(est_xgb.deploy_instance_type)
print(est_xgb.disable_profiler)
print(est_xgb.enable_sagemaker_metrics)
print(est_xgb.encrypt_inter_container_traffic)
print(est_xgb.entry_point)
print(est_xgb.environment)
print(est_xgb.git_config)
print(est_xgb.image_uri)
print(est_xgb.input_mode)
print(est_xgb.instance_count)
print(est_xgb.model_uri)
print(est_xgb.model_data)


### The file used for serving must be a .py file and follow the handler conventions.

This is because MMS is designed to work with .py files.

https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.Estimator
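
As a rough illustration of those conventions, a serving script usually exposes four handler functions. The sketch below is not the notebook's actual `inference.py`; it assumes the model file is named `lightgbm-regression-model.txt` (the name reported in the "Cause" section below), that requests arrive as `text/csv`, and that a JSON string is an acceptable return value for `output_fn`. How the return value has to be wrapped depends on the serving stack inside the container.

```python
import json
import os


def model_fn(model_dir):
    """Load the LightGBM booster saved by the training job."""
    # Imported lazily so the parsing helpers below can run without LightGBM.
    import lightgbm as lgb

    # Assumed file name inside the extracted model archive.
    return lgb.Booster(model_file=os.path.join(model_dir, "lightgbm-regression-model.txt"))


def input_fn(input_data, content_type):
    """Turn a text/csv request body into a list of float rows."""
    if content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {content_type}")
    if isinstance(input_data, (bytes, bytearray)):
        input_data = input_data.decode("utf-8")
    return [[float(v) for v in line.split(",")] for line in input_data.splitlines() if line.strip()]


def predict_fn(data, model):
    """Run the booster on the parsed rows."""
    import numpy as np

    return model.predict(np.asarray(data)).tolist()


def output_fn(prediction, accept):
    """Serialize predictions as a JSON list."""
    return json.dumps(prediction)


# Local check of the parsing step only (no model required).
rows = input_fn("0.25387,0.0,6.91\n0.01951,17.5,1.38", "text/csv")
print(rows)
```

Splitting the work into four functions lets each stage be overridden or tested on its own, which helps when debugging a container that fails at only one stage.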

In [None]:
# Deploy
predictor_xgb = est_xgb.deploy(1, 'local', serializer=csv_serializer, wait=True)

In [None]:
!docker ps

## Error

RuntimeError: Model /opt/ml/model/model.tar.gz cannot be loaded:


6o0805unb3-algo-1-k8ugv | [2022-10-16 02:25:50 +0000] [19] [ERROR] Exception in worker process  
6o0805unb3-algo-1-k8ugv | Traceback (most recent call last):  
6o0805unb3-algo-1-k8ugv |   File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/algorithm_mode/serve_utils.py", line 175, in get_loaded_booster  
6o0805unb3-algo-1-k8ugv |     booster = pkl.load(open(full_model_path, "rb"))  
6o0805unb3-algo-1-k8ugv | _pickle.UnpicklingError: invalid load key, '\x1f'.  

In [None]:
# Download the model and inspect it
print(est_xgb.model_data)

In [None]:
!aws s3 cp $est_xgb.model_data .

In [None]:
!tar zxvf model.tar.gz

## Cause
The model file is lightgbm-regression-model.txt, so it cannot be loaded with pickle.
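
The load key `'\x1f'` in the traceback matches the first byte of the gzip magic number, which suggests the default loader tried to unpickle a gzip-compressed file (the error message names /opt/ml/model/model.tar.gz). The snippet below is a small standalone demonstration of that failure mode using made-up bytes, not the actual model archive.

```python
import gzip
import pickle

# gzip-compressed data always starts with the magic bytes 0x1f 0x8b.
gzip_bytes = gzip.compress(b"not a pickle")
print(gzip_bytes[:2])

try:
    # Feeding gzip data to pickle fails on the very first byte.
    pickle.loads(gzip_bytes)
except pickle.UnpicklingError as exc:
    error_message = str(exc)
    print(error_message)
```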

How can the model-loading function be overridden? (This is what we wanted to do in the first place.)

https://github.com/aws/sagemaker-xgboost-container/blob/master/docker/1.5-1/final/Dockerfile.cpu

# Set SageMaker entrypoints
ENV SAGEMAKER_TRAINING_MODULE sagemaker_xgboost_container.training:main  
ENV SAGEMAKER_SERVING_MODULE sagemaker_xgboost_container.serving:main  


First, serving.main() is executed.

https://github.com/aws/sagemaker-xgboost-container/blob/master/src/sagemaker_xgboost_container/serving.py

L143

serving_env = env.ServingEnv()

Here the parameters are read from the environment variables.


L147

user_module = modules.import_module(serving_env.module_dir, serving_env.module_name)

This is where the user's module is loaded.

Line 18 shows that modules comes from sagemaker_containers.beta.framework.

from sagemaker_containers.beta.framework import (
    encoders,
    env,
    modules,
    server,
    transformer,
    worker,
)

https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/beta/framework/__init__.py

sagemaker_containers.beta.framework has been archived.

Judging from __init__.py, the implementation is in the following file.

https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_modules.py



The import happens at L258.

module = importlib.import_module(name)

def import_module(uri, name=DEFAULT_MODULE_NAME, cache=None):  # type: (str, str, bool) -> module

As this signature shows, DEFAULT_MODULE_NAME appears to be what gets imported when no name is given.
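
The import step itself can be sketched with the standard library alone. The helper `import_user_module` and the module name `demo_inference` below are made up for the demonstration and are not the library's code; the sketch only shows how `importlib.import_module` picks up a file by name once its directory is on `sys.path`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_user_module(code_dir: str, name: str):
    """Import module `name` from `code_dir` (illustrative helper)."""
    if code_dir not in sys.path:
        sys.path.insert(0, code_dir)
    # Make sure the freshly written file is visible to the import system.
    importlib.invalidate_caches()
    return importlib.import_module(name)


with tempfile.TemporaryDirectory() as code_dir:
    # Stand-in for the user's entry point.
    Path(code_dir, "demo_inference.py").write_text(
        "def model_fn(model_dir):\n    return 'loaded from ' + model_dir\n"
    )
    user_module = import_user_module(code_dir, "demo_inference")
    loaded = user_module.model_fn("/opt/ml/model")
    print(loaded)
```

If the name passed in does not match the file that defines the handlers, the import succeeds or fails independently of the handler code, which is why the module name matters so much in this investigation.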





https://github.com/aws/sagemaker-xgboost-container/blob/master/src/sagemaker_xgboost_container/serving.py

L148-149: the handlers are overridden with those from the user module loaded at L147.

user_module_transformer = _user_module_transformer(user_module)  
user_module_transformer.initialize()  


As L116 shows, the defaults are overridden by user functions such as model_fn.


def _user_module_transformer(user_module):  
    model_fn = getattr(user_module, "model_fn", default_model_fn)  
    input_fn = getattr(user_module, "input_fn", None)  
    predict_fn = getattr(user_module, "predict_fn", None)  
    output_fn = getattr(user_module, "output_fn", None)  
    transform_fn = getattr(user_module, "transform_fn", None)  
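
The `getattr` fallback pattern in this excerpt can be tried in isolation. In the sketch below, `default_model_fn`, `resolve_handlers`, and the two `SimpleNamespace` objects are stand-ins invented for illustration; only the `getattr(user_module, name, default)` idiom comes from the excerpt.

```python
from types import SimpleNamespace


def default_model_fn(model_dir):
    """Stand-in for the container's built-in loader."""
    return f"default model from {model_dir}"


def resolve_handlers(user_module):
    """Pick user-defined handlers when present, else fall back."""
    return {
        "model_fn": getattr(user_module, "model_fn", default_model_fn),
        "input_fn": getattr(user_module, "input_fn", None),
        "predict_fn": getattr(user_module, "predict_fn", None),
        "output_fn": getattr(user_module, "output_fn", None),
    }


# A user module that overrides only model_fn.
custom = SimpleNamespace(model_fn=lambda model_dir: f"LightGBM model from {model_dir}")
# A user module that defines nothing.
empty = SimpleNamespace()

custom_result = resolve_handlers(custom)["model_fn"]("/opt/ml/model")
empty_result = resolve_handlers(empty)["model_fn"]("/opt/ml/model")
print(custom_result)
print(empty_result)
```

So a `model_fn` in the user script only takes effect if that script is the module actually imported; otherwise the default loader runs silently.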

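## Is the file that defines model_fn actually imported?

At this point, it seems the information is not being passed correctly through the environment variables in the first place.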


https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_modules.py

From L237,

def import_module(uri, name=DEFAULT_MODULE_NAME, cache=None):  # type: (str, str, bool) -> module

the module name has to be passed as the second argument.

The caller is


https://github.com/aws/sagemaker-xgboost-container/blob/master/src/sagemaker_xgboost_container/serving.py

L147

user_module = modules.import_module(serving_env.module_dir, serving_env.module_name)

That is, serving_env.module_name. Is it actually being set?


From L143,

serving_env = env.ServingEnv()

ServingEnv is defined in the following file.

https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_env.py

L862

class ServingEnv(_Env):



https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_env.py

At L329, the base class

class _Env(_mapping.MappingMixin):


contains the line

module_name = os.environ.get(_params.USER_PROGRAM_ENV, None)


L595

In TrainingEnv:

        # override base class attributes  
        if self._module_name is None:  
            self._module_name = str(sagemaker_hyperparameters.get(_params.USER_PROGRAM_PARAM, None))  
        self._user_entry_point = self._user_entry_point or sagemaker_hyperparameters.get(  
            _params.USER_PROGRAM_PARAM  
        )  
        
        
        

## Would setting USER_PROGRAM_ENV be enough?


https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/parameters.py


L18

USER_PROGRAM_ENV = "SAGEMAKER_PROGRAM"  # type: str

So setting SAGEMAKER_PROGRAM should be enough.

How can it be set for a built-in container?

The YouTube video below uses the Environment argument with boto3.

https://youtu.be/sngNd79GpmE?t=780
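
To make the lookup concrete, the sketch below mimics how a container-side reader could turn `SAGEMAKER_PROGRAM` into a module name. The helper `module_name_from_env` and the stripping of a trailing `.py` are assumptions for illustration, not confirmed library behaviour; only the constant name and value come from the excerpt above.

```python
import os

USER_PROGRAM_ENV = "SAGEMAKER_PROGRAM"  # constant quoted above


def module_name_from_env(environ=os.environ):
    """Return the entry-point module name, dropping a trailing '.py' (assumed behaviour)."""
    program = environ.get(USER_PROGRAM_ENV)
    if program and program.endswith(".py"):
        return program[:-3]
    return program


# Environment a caller could hand to the container, for example through an
# Environment/env argument when the model is created.
container_env = {USER_PROGRAM_ENV: "inference.py"}
module_name = module_name_from_env(container_env)
print(module_name)
```

When the variable is missing, the helper returns `None`, which corresponds to the case where the container falls back to its default module.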

# For debugging, pull the Docker image and look inside

How can the contents of a built-in container be inspected?

For XGBoost, it appears the image is built locally.

https://github.com/aws/sagemaker-xgboost-container

In [None]:
container_uri

In [None]:
pwd

In [None]:
cd 

In [None]:
# Example

# CPU
!docker build -t xgboost-container-base:1.5-1-cpu-py3 -f /home/ec2-user/SageMaker/sagemaker-xgboost-container/docker/1.5-1/base/Dockerfile.cpu .

In [None]:
# Example

# CPU
!docker build -t preprod-xgboost-container:1.5-1-cpu-py3 -f /home/ec2-user/SageMaker/sagemaker-xgboost-container/docker/1.5-1/final/Dockerfile.cpu .

# How to get inside the container and inspect it
Open a terminal and run the following steps.


Change directory
 $ cd sagemaker-xgboost-container/
 
Build the base container
 $ docker build -t xgboost-container-base:1.5-1-cpu-py3 -f docker/1.5-1/base/Dockerfile.cpu .

Build the final container
 $ docker build -t preprod-xgboost-container:1.5-1-cpu-py3 -f docker/1.5-1/final/Dockerfile.cpu .

List the built images
$ docker image ls

Start a shell inside the container (include the image tag)
$ docker run -it preprod-xgboost-container:1.5-1-cpu-py3 /bin/bash

$ docker run -it 354813040037.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-xgboost:1.5-1 /bin/bash  

   

# Enter the container and try running the serve command

In [None]:
aws ecr get-login-password --region ${your_region} | docker login --username AWS --password-stdin ${your_aws_account_id}.dkr.ecr.${your_region}.amazonaws.com

In [None]:
!docker pull $container_uri

In [None]:
%%sh

# The name of our algorithm
algorithm_name=sagemaker-xgb-builtin

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to ap-northeast-1 if none defined)
region=$(aws configure get region)
region=${region:-ap-northeast-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${region}|docker login --username AWS --password-stdin ${fullname}



In [None]:
!docker pull $container_uri

In [None]:
container_uri

In [None]:
!docker pull 354813040037.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-xgboost:1.5-1

In [None]:
!docker login https://354813040037.dkr.ecr.ap-northeast-1.amazonaws.com

# Would using Model work instead? -> OK


From the source linked in the YouTube video:

https://github.com/aws-samples/aws-ml-jp/blob/main/sagemaker/sagemaker-inference/inference-tutorial/1_sklearn.ipynb



https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.Model

In [None]:
est_xgb.model_data

In [None]:
role

In [None]:
est_xgb.image_uri

In [None]:
from sagemaker.predictor import RealTimePredictor

lgb_model = sagemaker.model.Model(est_xgb.image_uri, # URI of the XGBoost built-in container
                                  #model_data=None, 
                                  model_data=est_xgb.model_data, # model file produced by local training
                                  #role=None, 
                                  role=role,
                                  predictor_cls=RealTimePredictor, # predictor class returned by deploy() (renamed to Predictor in sagemaker>=2)
                                  #env=None, 
                                  ###env={'SAGEMAKER_PROGRAM':'inference.py'} # environment variables
                                  #name=None, 
                                  #vpc_config=None, 
                                  #sagemaker_session=None, 
                                  #enable_network_isolation=False, 
                                  #model_kms_key=None, 
                                  #image_config=None, 
                                  #source_dir=None, 
                                  source_dir='./src_builtin_container_serve', # needed when a requirements.txt is used
                                  #code_location=None, 
                                  #entry_point=None, 
                                  entry_point='inference.py' # when source_dir is set, give the .py file name
                                  #entry_point='./src_builtin_container_serve/inference.py'
                                  #container_log_level=20, 
                                  #dependencies=None, 
                                  #git_config=None
                                 )

Deploy

https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.Model.deploy

In [None]:
predictor_lgb_model = lgb_model.deploy(#initial_instance_count=None, 
                                       initial_instance_count=1,
                                       instance_type='local', 
                                       serializer=csv_serializer, ### pass the payload to SageMaker as a string so that it is recognized
                                       #deserializer=None, 
                                       #accelerator_type=None, 
                                       #endpoint_name=None, 
                                       #tags=None, 
                                       #kms_key=None, 
                                       #wait=True, 
                                       #data_capture_config=None, 
                                       #async_inference_config=None, 
                                       #serverless_inference_config=None
                                      )

In [None]:
!docker ps

In [None]:
!docker stop c09fe0b16f36

In [None]:
!docker ps

In [None]:
### Run inference
with open(local_test, 'r') as f:
    payload = f.read().strip()

predicted = predictor_lgb_model.predict(payload).decode('utf-8')
print('=' * 20)
print(predicted)

In [None]:
tmp_l = ['0.25387', '0.0', '6.91', '0.0', '0.448', '5.399', '95.3', '5.87', '3.0', '233.0', '17.9', '396.9', '30.81\n0.01951', '17.5', '1.38', '0.0', '0.4161', '7.104', '59.5', '9.2229', '3.0', '216.0', '18.6', '393.24', '8.05\n4.64689', '0.0', '18.1', '0.0', '0.614', '6.98', '67.6', '2.5329', '24.0', '666.0', '20.2', '374.68', '11.66\n3.67367', '0.0', '18.1', '0.0', '0.583', '6.312', '51.9', '3.9917', '24.0', '666.0', '20.2', '388.62', '10.58\n0.29819', '0.0', '6.2', '0.0', '0.504', '7.686', '17.0', '3.3751', '8.0', '307.0', '17.4', '377.51', '3.92\n8.15174', '0.0', '18.1', '0.0', '0.7', '5.39', '98.9', '1.7281', '24.0', '666.0', '20.2', '396.9', '20.85\n6.65492', '0.0', '18.1', '0.0', '0.713', '6.317', '83.0', '2.7344', '24.0', '666.0', '20.2', '396.9', '13.99']

In [None]:
len(tmp_l)

In [None]:
hoge = [[1,2,3],[1,2,3]]

In [None]:
len(hoge[0])

In [None]:
a_str = '0.25387,0.0,6.91,0.0,0.448,5.399,95.3,5.87,3.0,233.0,17.9,396.9,30.81\n0.01951,17.5,1.38,0.0,0.4161,7.104,59.5,9.2229,3.0,216.0,18.6,393.24,8.05'





In [None]:
#transformed_data = a_str.splitlines()
#transformed_data = a_str.splitlines().split(',')
transformed_data = [i.split(',') for i in a_str.splitlines()]

In [None]:
transformed_data

In [None]:
# transformed_data is already a list of rows, so convert each field to float
llist = [[float(v) for v in row] for row in transformed_data]

In [None]:
llist

In [None]:
# Define the Estimator again
est_xgb_serve = Estimator(
    container_uri, # XGBoost built-in container
    role,
    model_data=est_xgb.model_data, ### URI of the model saved by training
    instance_count=1,
    instance_type="local",
    #hyperparameters=hyperparameters,
    source_dir='./src_builtin_container_serve',
    entry_point='serve.py' 
    )

In [None]:
# Deploy
predictor_xgb = est_xgb_serve.deploy(1, 'local', serializer=csv_serializer, wait=True)

ValueError: Estimator is not associated with a training job

# Error: association with a training job

https://stackoverflow.com/questions/63340328/how-to-define-a-sagemaker-estimator-object-using-a-pre-trained-model-and-then-de

In [None]:
est_xgb_serve2 = est_xgb_serve.attach(TrainingJobName)

In [None]:
!docker ps

In [None]:
### Run inference
with open(local_test, 'r') as f:
    payload = f.read().strip()

predicted = predictor_xgb.predict(payload).decode('utf-8')
print(predicted)

# END of Contents ===============

# 5. Cleanup
To avoid unexpected charges, delete the following resources.

* SageMaker inference endpoint
* ECR
* S3
* SageMaker notebook instance

# References
* Understanding SageMaker training jobs
    * https://github.com/aws-samples/aws-ml-jp/tree/main/sagemaker/sagemaker-traning/tutorial
* SageMaker PyTorch Training Toolkit
    * https://github.com/aws/sagemaker-pytorch-training-toolkit/
* SageMaker PyTorch Inference Toolkit
    * https://github.com/aws/sagemaker-pytorch-inference-toolkit
* SageMaker Inference Toolkit
    * https://docs.aws.amazon.com/sagemaker/latest/dg/amazon-sagemaker-toolkits.html
    * https://github.com/aws/sagemaker-inference-toolkit