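# Building a custom LightGBM inference container to understand how SageMaker inference works

In this notebook we build a custom container with LightGBM installed, train a model with a SageMaker Training job, and then run inference.
By observing how the custom container behaves, we get a deeper understanding of how SageMaker handles inference.

The notebook takes about XX minutes to run.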

# 0. Checking the execution environment
This notebook was tested on a SageMaker notebook instance.
* Instance type: ml.t3.medium
* Kernel: conda_python3

## 0-1. Checking the Python version

In [2]:
# Python version information
import sys
sys.version # 3.8.12

'3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51) \n[GCC 9.4.0]'

In [3]:
# Check the Python version (via a shell command)
!python -V # 3.8.12

Python 3.8.12


## 0-2. Checking the SageMaker SDK version

The Amazon SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker.

With the SDK you can train and deploy models using popular deep learning frameworks, algorithms provided by Amazon, or your own algorithms built into SageMaker-compatible Docker images.

* Documentation: https://sagemaker.readthedocs.io/en/stable/
* GitHub: https://github.com/aws/sagemaker-python-sdk

The SDK uses a default bucket named sagemaker-<region>-<account>. It is created on the first call to sagemaker.Session().default_bucket() if it does not already exist.

In [4]:
# Check the SageMaker SDK version
import sagemaker
print('Current SageMaker Python SDK Version ={0}'.format(sagemaker.__version__)) # 2.112.2

Current SageMaker Python SDK Version =2.112.2


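# 1. Preparing the data

We prepare the data used for training and inference.

We use the Boston housing prices dataset bundled with scikit-learn. (Note: it is removed in scikit-learn 1.2.)  
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html

The code is based on the following script.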

https://github.com/aws-samples/amazon-sagemaker-local-mode/blob/main/lightgbm_bring_your_own_container_local_training_and_serving/lightgbm_bring_your_own_container_local_training_and_serving.py

In [5]:
import sklearn
sklearn.__version__ # 1.0.1

'1.0.1'

In [6]:
import pandas as pd
pd.__version__ # 1.3.4

'1.3.4'

## 1-1. Loading the data

In [7]:
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

In [8]:
data = load_boston() # Emits a warning that the dataset is removed in 1.2; this does not affect the run


    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np


        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows::

        from sklearn.datasets import fetch_california_h

## 1-2. Feature engineering
Not performed in this notebook. The data is used as is.

## 1-3. Splitting the data
We split the data into training (train), validation (validation), and test (test) sets.  
The ratio is train:val:test = 3 (60%) : 1 (20%) : 1 (20%).  

In [9]:
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=45)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=45)

trainX = pd.DataFrame(X_train, columns=data.feature_names)
trainX['target'] = y_train

valX = pd.DataFrame(X_val, columns=data.feature_names)
valX['target'] = y_val

testX = pd.DataFrame(X_test, columns=data.feature_names)
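
The two-stage split gives 60/20/20 because the second `test_size=0.25` applies to the 80% left over after the first split. A quick check of that arithmetic, using only the fractions from the cell above:

```python
# Fractions taken from the two train_test_split calls above.
first_test_size = 0.2    # held out as the test set
second_test_size = 0.25  # share of the remainder used for validation

test_frac = first_test_size
val_frac = (1 - first_test_size) * second_test_size
train_frac = (1 - first_test_size) * (1 - second_test_size)

print(round(train_frac, 2), round(val_frac, 2), round(test_frac, 2))
```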

In [73]:
# Check
print(trainX.shape)
trainX.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,target
0,0.08829,12.5,7.87,0.0,0.524,6.012,66.6,5.5605,5.0,311.0,15.2,395.6,12.43,22.9
1,0.33983,22.0,5.86,0.0,0.431,6.108,34.9,8.0555,7.0,330.0,19.1,390.18,9.16,24.3
2,0.10469,40.0,6.41,1.0,0.447,7.267,49.0,4.7872,4.0,254.0,17.6,389.25,6.05,33.2
3,6.80117,0.0,18.1,0.0,0.713,6.081,84.4,2.7175,24.0,666.0,20.2,396.9,14.7,20.0
4,1.35472,0.0,8.14,0.0,0.538,6.072,100.0,4.175,4.0,307.0,21.0,376.73,13.04,14.5


In [11]:
# Check
print(valX.shape)
valX.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,target
0,0.0315,95.0,1.47,0.0,0.403,6.975,15.3,7.6534,3.0,402.0,17.0,396.9,4.56,34.9
1,0.51183,0.0,6.2,0.0,0.507,7.358,71.6,4.148,8.0,307.0,17.4,390.07,4.73,31.5
2,19.6091,0.0,18.1,0.0,0.671,7.313,97.9,1.3163,24.0,666.0,20.2,396.9,13.44,15.0
3,0.95577,0.0,8.14,0.0,0.538,6.047,88.8,4.4534,4.0,307.0,21.0,306.38,17.28,14.8
4,0.09604,40.0,6.41,0.0,0.447,6.854,42.8,4.2673,4.0,254.0,17.6,396.9,2.98,32.0


In [12]:
# Check
print(testX.shape)
testX.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT
0,0.25387,0.0,6.91,0.0,0.448,5.399,95.3,5.87,3.0,233.0,17.9,396.9,30.81
1,0.01951,17.5,1.38,0.0,0.4161,7.104,59.5,9.2229,3.0,216.0,18.6,393.24,8.05
2,4.64689,0.0,18.1,0.0,0.614,6.98,67.6,2.5329,24.0,666.0,20.2,374.68,11.66
3,3.67367,0.0,18.1,0.0,0.583,6.312,51.9,3.9917,24.0,666.0,20.2,388.62,10.58
4,0.29819,0.0,6.2,0.0,0.504,7.686,17.0,3.3751,8.0,307.0,17.4,377.51,3.92


In [13]:
# Check
y_test[0:5]

array([14.4, 33. , 29.8, 21.2, 46.7])

## 1-4. Saving the data
We save the data both locally and to S3.

### 1-4-1. Saving locally

In [14]:
# Create the directories
from pathlib import Path

Path('./data/train').mkdir(parents=True, exist_ok=True)
Path('./data/valid').mkdir(parents=True, exist_ok=True)
Path('./data/test').mkdir(parents=True, exist_ok=True)

In [15]:
# Save locally
local_train = './data/train/boston_train.csv'
local_valid = './data/valid/boston_valid.csv'
local_test = './data/test/boston_test.csv'

trainX.to_csv(local_train, header=None, index=False)
valX.to_csv(local_valid, header=None, index=False)
testX.to_csv(local_test, header=None, index=False)

### 1-4-2. Saving to S3

To get a uniquely named bucket, we use sagemaker.Session().default_bucket().

https://sagemaker.readthedocs.io/en/stable/api/utility/session.html#sagemaker.session.Session

It returns a bucket named sagemaker-<region>-<account id>.

In [16]:
bucket_name = sagemaker.Session().default_bucket()
region_name = sagemaker.Session().boto_region_name
account_id =  sagemaker.Session().account_id()

In [17]:
# Check
print(bucket_name)
print(region_name)
print(account_id)

sagemaker-ap-northeast-1-390731033655
ap-northeast-1
390731033655


In [18]:
# Create a bucket (the default bucket already exists after default_bucket() above; use this when creating a different bucket)
#import boto3

#s3_resource = boto3.resource('s3')
#s3_resource.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': region_name})

In [19]:
# Upload to S3
train_s3 = sagemaker.s3.S3Uploader.upload('./data/train/boston_train.csv', f's3://{bucket_name}/demo_lightgbm/train')
valid_s3 = sagemaker.s3.S3Uploader.upload('./data/valid/boston_valid.csv', f's3://{bucket_name}/demo_lightgbm/valid')

In [74]:
# Check: the S3 URIs of the uploaded files are returned
print(train_s3)
print(valid_s3)

s3://sagemaker-ap-northeast-1-390731033655/demo_lightgbm/train/boston_train.csv
s3://sagemaker-ap-northeast-1-390731033655/demo_lightgbm/valid/boston_valid.csv


# 2. Building the LightGBM custom container


There are three broad patterns for building a custom training container. See the following blog post for details.

https://aws.amazon.com/jp/blogs/news/sagemaker-custom-containers-pattern-training/

To understand how SageMaker works, we start with the approach of a base image (ubuntu:16.04) plus custom layers.

## 2-1. Reviewing the Dockerfile

The files were prepared with reference to the following sample.

https://github.com/aws-samples/amazon-sagemaker-local-mode/tree/main/lightgbm_bring_your_own_container_local_training_and_serving/container

First, let's look at the Dockerfile.

In [20]:
!pygmentize ./container/Dockerfile

# Build an image that can do training and inference in SageMaker
# This is a Python 2 image that uses the nginx, gunicorn, flask stack
# for serving inferences in a stable way.

FROM ubuntu:16.04

MAINTAINER Amazon AI <sage-learner@amazon.com>

ARG CONDA_DIR=/opt/conda
ENV PATH $CONDA_DIR/bin:$PATH

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        ca-certificates \
        cmake \
        build-essential \
        gcc \
        g++ \
        git \
        nginx \
        wget && \
    # python environment
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \

### Explanation: what SageMaker does when building an inference endpoint
With the SageMaker SDK, an inference endpoint is deployed by calling the deploy() method.

https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html

At that point, SageMaker runs the following command.

docker run <Docker image> serve

In this custom container, that executes the serve script placed in /opt/program.

Let's look at the serve script.

In [75]:
!pygmentize -l py ./container/lightgbm_regression/serve

#!/usr/bin/env python

# This file implements the scoring service shell. You don't necessarily need to modify it for various
# algorithms. It starts nginx and gunicorn with the correct configurations and then simply waits until
# gunicorn exits.
#
# The flask server is specified to be the app object in wsgi.py
#
# We set the following parameters:
#
# Parameter                Environment Variable              Default Value
# ---------                --------------------              -------------
# number of workers        MODEL_SERVER_WORKERS              the number of CPU cores
# timeout                  MODEL_SERVER_TIMEOUT              60 seconds

from __future__ import print_function
import multiproc

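The script ends by calling start_server(), which does the following.

* Starts nginx (acting as the web server / reverse proxy)
    * It reads nginx.conf.
* Starts gunicorn (acting as the application server)
    * As the 'wsgi:app' argument of the gunicorn command indicates, it loads the app application from the wsgi module, wsgi.py.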
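
The two launch steps above can be sketched as the command lines they boil down to. This is a minimal illustration rather than the actual serve script: `build_commands` is a hypothetical helper, the `/opt/program/nginx.conf` path is an assumption based on where the serve script lives, and the gunicorn options are chosen to be consistent with the nginx.conf and container logs shown in this notebook. Nothing is launched here.

```python
import multiprocessing
import os

# Defaults follow the comment header of the serve script shown above.
cpu_count = multiprocessing.cpu_count()
model_server_timeout = int(os.environ.get("MODEL_SERVER_TIMEOUT", 60))
model_server_workers = int(os.environ.get("MODEL_SERVER_WORKERS", cpu_count))


def build_commands(workers, timeout):
    """Return the nginx and gunicorn command lines (assumed layout, not executed)."""
    nginx_cmd = ["nginx", "-c", "/opt/program/nginx.conf"]  # assumed config path
    gunicorn_cmd = [
        "gunicorn",
        "--timeout", str(timeout),
        "-k", "gevent",                   # worker class reported in the container log
        "-b", "unix:/tmp/gunicorn.sock",  # socket that nginx proxies to
        "-w", str(workers),
        "wsgi:app",                       # module wsgi.py, object app
    ]
    return nginx_cmd, gunicorn_cmd


nginx_cmd, gunicorn_cmd = build_commands(model_server_workers, model_server_timeout)
print(" ".join(nginx_cmd))
print(" ".join(gunicorn_cmd))
```

In a real serve script these lists would typically be handed to `subprocess.Popen`, and the script would then wait until one of the two processes exits.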

Let's look at nginx.conf.

In [76]:
!pygmentize ./container/lightgbm_regression/nginx.conf

worker_processes 1;
daemon off; # Prevent forking


pid /tmp/nginx.pid;
error_log /var/log/nginx/error.log;

events {
  # defaults
}

http {
  include /etc/nginx/mime.types;
  default_type application/octet-stream;
  access_log /var/log/nginx/access.log combined;

  upstream gunicorn {
    server unix:/tmp/gunicorn.sock;
  }

  server {
    listen 8080 deferred;
    client_max_body_size 5m;

    keepalive_timeout 5;
    proxy_read_timeout 1200s;

    location

nginx passes the /ping and /invocations requests it receives from SageMaker to the gunicorn upstream configured above.
As stated below, the container must listen on port 8080.

https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html

How Containers Serve Requests  
Containers need to implement a web server that responds to /invocations and /ping on port 8080.
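
To make the contract concrete, here is a minimal sketch of a server that answers GET /ping and POST /invocations. It uses only the Python standard library as a stand-in for the nginx + gunicorn + flask stack of this container, and the "model" is a placeholder that just counts the CSV records it receives. The class and function names are hypothetical, and the example binds to a free local port instead of 8080 so that it can run anywhere.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class InferenceHandler(BaseHTTPRequestHandler):
    """Answers the two routes SageMaker calls: GET /ping and POST /invocations."""

    def _send(self, status, body, content_type="text/plain"):
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        # Health check: a 200 response means the container is ready for traffic.
        if self.path == "/ping":
            self._send(200, "\n")
        else:
            self._send(404, "not found")

    def do_POST(self):
        if self.path != "/invocations":
            self._send(404, "not found")
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length).decode("utf-8")
        # Placeholder "model": report how many CSV records were received.
        records = [line for line in payload.splitlines() if line.strip()]
        self._send(200, json.dumps({"records": len(records)}), "application/json")

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def serve_in_background(port=0):
    """Start the server on a free port (SageMaker itself expects 8080)."""
    server = HTTPServer(("127.0.0.1", port), InferenceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = serve_in_background()
base = f"http://127.0.0.1:{server.server_port}"
with urllib.request.urlopen(f"{base}/ping") as response:
    ping_status = response.status
request = urllib.request.Request(f"{base}/invocations", data=b"1,2,3\n4,5,6", method="POST")
with urllib.request.urlopen(request) as response:
    invocation_body = response.read().decode("utf-8")
server.shutdown()
print(ping_status, invocation_body)
```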

Next, let's look at wsgi.py, the file gunicorn uses to locate the application.

It loads app from predictor.py.

In [77]:
!pygmentize ./container/lightgbm_regression/wsgi.py

import predictor as myapp

# This is just a simple wrapper for gunicorn to find your app.
# If you want to change the algorithm file, simply change "predictor" above to the
# new file.

app = myapp.app


Let's look at predictor.py.

It uses the flask framework to implement the handlers for /ping and /invocations.

In [24]:
!pygmentize ./container/lightgbm_regression/predictor.py

# This is the file that implements a flask server to do inferences. It's the file that you will modify to
# implement the scoring for your own algorithm.

from __future__ import print_function

import os
import json
import pickle
import sys
import signal
import traceback
import io
import flask

import pandas as pd
import lightgbm as lgb

prefix = '/opt/ml/'
model_path = os.path.join(prefix, 'model

## 2-2. Building and pushing the Docker image
We build the custom container reviewed above.

Build and push take about 7 minutes.

In [25]:
%%sh

# The name of our algorithm
algorithm_name=sagemaker-lightgbm-regression

cd container

chmod +x lightgbm_regression/train
chmod +x lightgbm_regression/serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to ap-northeast-1 if none defined)
region=$(aws configure get region)
region=${region:-ap-northeast-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${region}|docker login --username AWS --password-stdin ${fullname}

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
Sending build context to Docker daemon   25.6kB
Step 1/10 : FROM ubuntu:16.04
 ---> b6f507652425
Step 2/10 : MAINTAINER Amazon AI <sage-learner@amazon.com>
 ---> Using cache
 ---> dd27a6b39b57
Step 3/10 : ARG CONDA_DIR=/opt/conda
 ---> Using cache
 ---> 3961c7f42f57
Step 4/10 : ENV PATH $CONDA_DIR/bin:$PATH
 ---> Using cache
 ---> ccc4779db86e
Step 5/10 : RUN apt-get update &&     apt-get install -y --no-install-recommends         ca-certificates         cmake         build-essential         gcc         g++         git         nginx         wget &&     wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh &&     /bin/bash Miniconda3-latest-Linux-x86_64.sh -f -b -p $CONDA_DIR &&     export PATH="$CONDA_DIR/bin:$PATH" &&     conda config --set always_yes yes --set changeps1 no &&     conda install -q -y numpy scipy scikit-learn pandas flask gevent gunicorn &&     git clone --recursive --branch stable --depth 1 https://github.com/Microsoft/LightGBM && 

https://docs.docker.com/engine/reference/commandline/login/#credentials-store



## 2-3. Settings before training
In the AWS console, open ECR and confirm that the image you pushed is there.

Then set the image URI.

In [78]:
# Check
print(bucket_name)
print(region_name)
print(account_id)

sagemaker-ap-northeast-1-390731033655
ap-northeast-1
390731033655


In [79]:
# Set the image URI
image_uri = f'{account_id}.dkr.ecr.{region_name}.amazonaws.com/sagemaker-lightgbm-regression'

In [80]:
# Check
image_uri

'390731033655.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-lightgbm-regression'

In [81]:
# Hyperparameters passed to LightGBM for training
hyperparameters={'boosting_type': 'gbdt',
            'objective': 'regression',
            'num_leaves': 31,
            'learning_rate': 0.05,
            'feature_fraction': 0.9,
            'bagging_fraction': 0.8,
            'bagging_freq': 5,
            'verbose': 0}

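## 2-4. Running training in local mode
First, we train the model in local mode.
Local mode runs the container directly on the notebook instance, so there is no wait for a training instance to be provisioned and for the image to be set up on it. This makes it convenient for debugging code.

The SDK pulls the image we pushed to ECR and runs it with the local Docker daemon.

In [83]:
# Set the local file paths (S3 paths also work)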
train_location = 'file://'+local_train
valid_location = 'file://'+local_valid

print(train_location)
print(valid_location)

file://./data/train/boston_train.csv
file://./data/valid/boston_valid.csv


In [84]:
from sagemaker.estimator import Estimator

In [85]:
from sagemaker import get_execution_role

role = get_execution_role()

In [86]:
# Check
role

'arn:aws:iam::390731033655:role/TeamRole'

Create a SageMaker Estimator.

https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html

In [87]:
local_lightgbm = Estimator(
    image_uri,
    role,
    instance_count=1,
    instance_type="local",
    hyperparameters=hyperparameters
    )

Start training with the fit method.

https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.EstimatorBase.fit

In [88]:
local_lightgbm.fit({'train':train_location, 'validation': valid_location})

Creating wwx3yxyitg-algo-1-wftc2 ... 
Creating wwx3yxyitg-algo-1-wftc2 ... done
Attaching to wwx3yxyitg-algo-1-wftc2
wwx3yxyitg-algo-1-wftc2 | Starting the training.
wwx3yxyitg-algo-1-wftc2 | Reading hyperparameters data: /opt/ml/input/config/hyperparameters.json
wwx3yxyitg-algo-1-wftc2 | hyperparameters_data: {'boosting_type': 'gbdt', 'objective': 'regression', 'num_leaves': '31', 'learning_rate': '0.05', 'feature_fraction': '0.9', 'bagging_fraction': '0.8', 'bagging_freq': '5', 'verbose': '0'}
wwx3yxyitg-algo-1-wftc2 | Found train files: ['/opt/ml/input/data/train/boston_train.csv']
wwx3yxyitg-algo-1-wftc2 | Found validation files: ['/opt/ml/input/data/validation/boston_valid.csv']
wwx3yxyitg-algo-1-wftc2 | building training and validation datasets
wwx3yxyitg-algo-1-wftc2 | Starting training...
wwx3yxyitg-algo-1-wftc2 | You can set `force_row_wise=true` to remove the overhead.
wwx3yxyitg-algo-1-wftc2 | A

Training results from local mode are also stored in S3.

s3://sagemaker-<region>-<account id>/sagemaker-lightgbm-regression-yyyy-MM-dd-HH-mm-ss-fff/

* model.tar.gz
* output.tar.gz
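
To see what a training job actually packaged, the archive can be listed with the standard `tarfile` module. The sketch below builds a small stand-in archive so that it runs without S3 access. The member name `model.txt` is hypothetical; the real contents depend on what the train script writes to /opt/ml/model. For the real artifact, one option is to download it first, for example with `sagemaker.s3.S3Downloader.download(s3_uri, local_path)`, and then call `list_archive` on the downloaded file.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def list_archive(path):
    """Return the member names stored in a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as tar:
        return sorted(tar.getnames())


# Stand-in archive so the sketch runs offline; the member name is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "model.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        payload = b"placeholder model bytes"
        info = tarfile.TarInfo(name="model.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    members = list_archive(archive)

print(members)
```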

To learn more about Training jobs, see also the AWS Black Belt session.
https://www.youtube.com/watch?v=byEawTm4O4E

## 2-5. Deploying locally

serializer: specifies the format of the input data.
https://sagemaker.readthedocs.io/en/stable/v2.html
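
As an illustration of what a CSV request body looks like, the sketch below turns rows of feature values into a headerless CSV string with the standard library. It is a stand-in for the idea, not the SDK's `CSVSerializer` implementation, and `to_csv_payload` is a hypothetical helper. In this notebook the payload is read from the test CSV file and is therefore already a CSV string when it is handed to the predictor.

```python
import csv
import io


def to_csv_payload(rows):
    """Serialize rows of feature values into a headerless CSV string."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().rstrip("\n")


payload = to_csv_payload([[0.25387, 0.0, 6.91], [0.01951, 17.5, 1.38]])
print(payload)
```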

In [89]:
# Preparation: stop all running containers
!docker stop $(docker ps -q)

t8bj7uf19d-algo-1-vmubs | [2022-10-25 05:42:07 +0000] [27] [INFO] Handling signal: term
t8bj7uf19d-algo-1-vmubs exited with code 0
4c284ba0ab8b
Aborting on container exit...


In [90]:
# Check
!docker ps

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES


After confirming that no containers are running, we deploy locally.

In [91]:
local_predictor = local_lightgbm.deploy(1, 'local', serializer=sagemaker.serializers.CSVSerializer()) 

Attaching to czk28afd58-algo-1-tu1u5
czk28afd58-algo-1-tu1u5 | Starting the inference server with 2 workers.
czk28afd58-algo-1-tu1u5 | [2022-10-25 05:43:02 +0000] [10] [INFO] Starting gunicorn 20.1.0
czk28afd58-algo-1-tu1u5 | [2022-10-25 05:43:02 +0000] [10] [INFO] Listening at: unix:/tmp/gunicorn.sock (10)
czk28afd58-algo-1-tu1u5 | [2022-10-25 05:43:02 +0000] [10] [INFO] Using worker: gevent
czk28afd58-algo-1-tu1u5 | [2022-10-25 05:43:02 +0000] [12] [INFO] Booting worker with pid: 12
czk28afd58-algo-1-tu1u5 | [2022-10-25 05:43:02 +0000] [13] [INFO] Booting worker with pid: 13
!czk28afd58-algo-1-tu1u5 | 172.18.0.1 - - [25/Oct/2022:05:43:04 +0000] "GET /ping HTTP/1.1" 200 1 "-" "python-urllib3/1.26.8"


In [92]:
# Check
!docker ps

CONTAINER ID   IMAGE                                                                             COMMAND   CREATED          STATUS          PORTS                                       NAMES
b39ccd0a7006   390731033655.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-lightgbm-regression   "serve"   13 seconds ago   Up 12 seconds   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   czk28afd58-algo-1-tu1u5


This confirms that the container is running locally.

## 2-6. Running inference against the local endpoint

In [93]:
# Run inference
with open(local_test, 'r') as f:
    payload = f.read().strip()

predicted = local_predictor.predict(payload).decode('utf-8')
print('=' * 20)
print(predicted)

czk28afd58-algo-1-tu1u5 | Invoked with 102 records

19.95642073217597
27.844891841022335
23.747437427003455
21.961517177305176
33.70952263893306
16.546899933876215
20.7577247308279
21.58941351302627
28.44096446328559
21.573610198594977
16.520022349295115
18.56239893242527
33.70952263893306
21.66404760045202
18.839854556333133
20.524517944865078
23.512192914502315
19.720552829648888
14.831841119971708
25.48273874904075
24.232639474441545
21.624005932843115
24.961489794296718
31.737194191676068
21.634052928440624
28.40721160777621
21.408363849719503
14.831841119971708
22.218594550645975
21.174456098551236
21.78791955089051
14.831841119971708
29.996695633096042
22.44097524661187
33.83316205414468
26.41403196992683
33.70952263893306
17.366188662166092
27.56686070285819
30.785697489113854
19.36938873496206
20.70626548555591
17.759853567831996
27.888269821752413
20.521395163186774
14.831841119971708
24.776417537973362
24.965857100129327
19.649289821764185
21.026797620813866
33.70952

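## 2-7. Submitting a training job
Next, instead of using local mode,
we run a training job with the same custom container.

Passing an instance type as the Estimator's instance_type argument submits a training job.

In [95]:
# Check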
print(train_s3)
print(valid_s3)

s3://sagemaker-ap-northeast-1-390731033655/demo_lightgbm/train/boston_train.csv
s3://sagemaker-ap-northeast-1-390731033655/demo_lightgbm/valid/boston_valid.csv


In [96]:
est_lightgbm = Estimator(
    image_uri,
    role,
    instance_count=1,
    instance_type="ml.m4.2xlarge", # specify the instance type
    hyperparameters=hyperparameters)

In [97]:
est_lightgbm.fit({'train':train_s3, 'validation': valid_s3})

2022-10-25 05:44:45 Starting - Starting the training job...
2022-10-25 05:45:09 Starting - Preparing the instances for trainingProfilerReport-1666676685: InProgress
.........
2022-10-25 05:46:32 Downloading - Downloading input data...
2022-10-25 05:47:10 Training - Downloading the training image..Starting the training.
Reading hyperparameters data: /opt/ml/input/config/hyperparameters.json
hyperparameters_data: {'bagging_fraction': '0.8', 'bagging_freq': '5', 'boosting_type': 'gbdt', 'feature_fraction': '0.9', 'learning_rate': '0.05', 'num_leaves': '31', 'objective': 'regression', 'verbose': '0'}
Found train files: ['/opt/ml/input/data/train/boston_train.csv']
Found validation files: ['/opt/ml/input/data/validation/boston_valid.csv']
building training and validation datasets
Starting training...
You can set `force_col_wise=true` to remove the overhead.
[1]#011valid_0's l2: 84.4849
Training until valid

Training takes about 3 minutes.

Only about 75 seconds of that is billed.

## 2-8. Deploying to an endpoint

When the model is deployed, SageMaker runs docker run <image> serve.

Deployment takes about 3 minutes.

In [43]:
from sagemaker.predictor import csv_serializer

Deploy the inference endpoint with the deploy method.

https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.EstimatorBase.deploy

In [44]:
predictor = est_lightgbm.deploy(1, 'ml.m4.xlarge', serializer=sagemaker.serializers.CSVSerializer(), wait=True)

-----!

In [45]:
### Run inference
with open(local_test, 'r') as f:
    payload = f.read().strip()

predicted = predictor.predict(payload).decode('utf-8')
print(predicted)


19.95642073217597
27.844891841022335
23.747437427003455
21.961517177305176
33.70952263893306
16.546899933876215
20.7577247308279
21.58941351302627
28.44096446328559
21.573610198594977
16.520022349295115
18.56239893242527
33.70952263893306
21.66404760045202
18.839854556333133
20.524517944865078
23.512192914502315
19.720552829648888
14.831841119971708
25.48273874904075
24.232639474441545
21.624005932843115
24.961489794296718
31.737194191676068
21.634052928440624
28.40721160777621
21.408363849719503
14.831841119971708
22.218594550645975
21.174456098551236
21.78791955089051
14.831841119971708
29.996695633096042
22.44097524661187
33.83316205414468
26.41403196992683
33.70952263893306
17.366188662166092
27.56686070285819
30.785697489113854
19.36938873496206
20.70626548555591
17.759853567831996
27.888269821752413
20.521395163186774
14.831841119971708
24.776417537973362
24.965857100129327
19.649289821764185
21.026797620813866
33.70952263893306
22.770867837558004
25.12436361101226
32.04499227317

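# 3. Supplying the inference code externally and using the serving stack provided by SageMaker
To supply the inference code from outside the container, we introduce the SageMaker Inference Toolkit.

https://github.com/aws/sagemaker-inference-toolkit

In addition, instead of the model-serving stack we implemented with nginx, gunicorn, and flask in the previous section, we reuse the one provided by SageMaker.
To do so, we install a library called MMS (Multi Model Server).

https://github.com/awslabs/multi-model-server/tree/master/docker

* Install the SageMaker Inference Toolkit and Multi Model Server
* Supply requirements.txt and inference.py in the same way as with a built-in container

For more on using MMS, see also the following sample code.

https://github.com/aws/amazon-sagemaker-examples/tree/main/advanced_functionality/multi_model_bring_your_own


## 3-1. Reviewing the Dockerfile

First, let's look at the Dockerfile we use.
It installs Java, which MMS requires, and then installs MMS and the inference toolkit.

LightGBM is installed through requirements.txt, so it is not listed in the Dockerfile. (It could be listed there instead.)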

In [98]:
!pygmentize ./container_sminftoolkit/Dockerfile

FROM python:3.8
WORKDIR /usr/src/app
RUN apt-get update && apt-get upgrade -y && apt-get install -y openjdk-11-jdk-headless
RUN pip install --no-cache-dir multi-model-server sagemaker-inference
COPY dockerd-entrypoint.py ./
ENTRYPOINT ["python", "/usr/src/app/dockerd-entrypoint.py"]


## 3-2. Reviewing the entry point

We look at the ENTRYPOINT defined in the Dockerfile. It is what runs when deploy() in the SageMaker SDK triggers

docker run \<image> serve

This corresponds to the following step described for the SageMaker Inference Toolkit.

3.Implement a serving entrypoint, which starts the model server.


https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/model_server.py

When called without arguments, start_model_server() uses

DEFAULT_HANDLER_SERVICE = default_handler_service.__name__

This is the handler service of the inference toolkit.

https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/default_handler_service.py



The handler service creates a Transformer(), which in turn creates the inference handler.

DefaultHandlerService -> Transformer -> DefaultInferenceHandler

https://github.com/aws/sagemaker-inference-toolkit/blob/3774c1a0fb4408cfa95333b75d6e30a376bffa52/src/sagemaker_inference/transformer.py


In [47]:
!pygmentize ./container_sminftoolkit/dockerd-entrypoint.py

from sagemaker_inference import model_server
model_server.start_model_server()


When called without arguments, start_model_server()
ends up creating the inference toolkit's Transformer().

https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/model_server.py

Because of

DEFAULT_HANDLER_SERVICE = default_handler_service.__name__

the default handler service is loaded:

https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/default_handler_service.py

Its __init__ instantiates Transformer().

https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/transformer.py

Transformer() uses the inference toolkit's DefaultInferenceHandler.

https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/default_inference_handler.py

Therefore, this dockerd-entrypoint.py is the minimal configuration.

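## Explanation
There are two pieces: a handler service and an inference handler.

The handler service corresponds to the following.

2.Implement a handler service that is executed by the model server.

The model's inference handler corresponds to the following.

1.Implement an inference handler, which is responsible for loading the model and providing input, predict, and output functions. 


The handler service in 2. loads the inference handler in 1. The inference handler provided by the inference toolkit can be used as is.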
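
As a rough illustration of the "load the model, input, predict, output" functions, an entry-point script for the toolkit might look like the sketch below. The notebook does not show its own inference.py, so this is an assumption about what such a file could contain. The function names `model_fn`, `input_fn`, `predict_fn`, and `output_fn` are the ones the toolkit's Transformer looks for, while `MODEL_FILE_NAME` is hypothetical and must match whatever the training script saved.

```python
import csv
import io
import os

# Hypothetical file name: use whatever the training script wrote into model.tar.gz.
MODEL_FILE_NAME = "model.txt"


def model_fn(model_dir):
    """Load the model from the directory where SageMaker unpacked model.tar.gz."""
    import lightgbm as lgb  # imported lazily; installed via requirements.txt

    return lgb.Booster(model_file=os.path.join(model_dir, MODEL_FILE_NAME))


def input_fn(input_data, content_type):
    """Parse a text/csv request body into rows of floats."""
    if content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {content_type}")
    if isinstance(input_data, (bytes, bytearray)):
        input_data = input_data.decode("utf-8")
    reader = csv.reader(io.StringIO(input_data))
    return [[float(value) for value in row] for row in reader if row]


def predict_fn(data, model):
    """Run the loaded model on the parsed rows."""
    return model.predict(data)


def output_fn(prediction, accept):
    """Return one prediction per line, matching the responses shown earlier."""
    return "\n".join(str(value) for value in prediction)
```

Keeping the LightGBM import inside `model_fn` means the parsing and formatting functions can be exercised without the library installed, which is handy when debugging the request handling on its own.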

In [48]:
#!pygmentize ./container_sminftoolkit/model_handler.py ### not needed for the minimal configuration

In [49]:
%%sh

# The name of our algorithm
#algorithm_name=demo-sagemaker-multimodel
algorithm_name=demo-sagemaker-inftoolkit

#cd container
cd container_sminftoolkit

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get a login password from ECR and pass it to docker login
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -q -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
sha256:a88b9da80bae3eaf6df922c948a07855886e0bd667b17878b717497ec98ee02b
The push refers to repository [390731033655.dkr.ecr.ap-northeast-1.amazonaws.com/demo-sagemaker-inftoolkit]
98e1c540fc48: Preparing
5bc8538c2aef: Preparing
c9ceeb472b7a: Preparing
96b22d555b0b: Preparing
08f7737fec66: Preparing
dee03037c4fc: Preparing
17517a754285: Preparing
0c7daf9a72c8: Preparing
75ba02937496: Preparing
288cf3a46e32: Preparing
186da837555d: Preparing
955c9335e041: Preparing
8e079fee2186: Preparing
0c7daf9a72c8: Waiting
75ba02937496: Waiting
288cf3a46e32: Waiting
186da837555d: Waiting
955c9335e041: Waiting
8e079fee2186: Waiting
dee03037c4fc: Waiting
17517a754285: Waiting
5bc8538c2aef: Layer already exists
08f7737fec66: Layer already exists
98e1c540fc48: Layer already exists
96b22d555b0b: Layer already exists
c9ceeb472b7a: Layer already exists
dee03037c4fc: Layer already exists
17517a754285: Layer already exists
75ba02937496: Layer already exists
186da837555d: Layer already exists
2

https://docs.docker.com/engine/reference/commandline/login/#credentials-store



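* dockerd-entrypoint.py is executed and tries to start the server.
    * The handler needed when the server starts is described in model_handler.py.
    


## Deploying an endpoint locally
The model is the LightGBM model created in the previous section.

* The source code is also specified
* LightGBM is installed through requirements.txt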

In [50]:
container_uri = f'{account_id}.dkr.ecr.{region_name}.amazonaws.com/demo-sagemaker-inftoolkit:latest'

In [51]:
container_uri

'390731033655.dkr.ecr.ap-northeast-1.amazonaws.com/demo-sagemaker-inftoolkit:latest'

In [52]:
### Use the model built by the training job in 2-7
est_lightgbm.model_data

's3://sagemaker-ap-northeast-1-390731033655/sagemaker-lightgbm-regression-2022-10-25-04-37-16-219/output/model.tar.gz'

In [53]:
!docker ps

CONTAINER ID   IMAGE                                                                             COMMAND   CREATED         STATUS         PORTS                                       NAMES
2f5a1ac7a5c9   390731033655.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-lightgbm-regression   "serve"   6 minutes ago   Up 6 minutes   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   ijko95ia2e-algo-1-x5jwn


In [54]:
# Stop all running containers
!docker stop $(docker ps -q)

ijko95ia2e-algo-1-x5jwn | [2022-10-25 04:43:36 +0000] [10] [INFO] Handling signal: term
ijko95ia2e-algo-1-x5jwn exited with code 0
2f5a1ac7a5c9
Aborting on container exit...


In [55]:
!docker ps

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES


In [56]:
from sagemaker.predictor import RealTimePredictor  # v1 name; renamed to Predictor in sagemaker>=2

lgb_model = sagemaker.model.Model(#est_xgb.image_uri, # URI of the built-in XGBoost container
                                  container_uri, # URI of the custom container pushed to ECR
                                  model_data=est_lightgbm.model_data, # model file generated by the local training run
                                  role=role,
                                  predictor_cls=RealTimePredictor, # predictor class used to call the endpoint
                                  source_dir='./src_builtin_container_serve', # needed when a requirements.txt is used
                                  entry_point='inference.py' # when source_dir is set, give just the .py file name
                                  #entry_point='./src_builtin_container_serve/inference.py'
                                 )

In [57]:
predictor_lgb_model = lgb_model.deploy(initial_instance_count=1,
                                       instance_type='local', 
                                       serializer=csv_serializer, # send the payload to SageMaker as a CSV string (v1 name; CSVSerializer in sagemaker>=2)
                                      )

Attaching to c8t5d8mv7c-algo-1-eitup
c8t5d8mv7c-algo-1-eitup | Collecting lightgbm
c8t5d8mv7c-algo-1-eitup |   Downloading lightgbm-3.3.3-py3-none-manylinux1_x86_64.whl (2.0 MB)
c8t5d8mv7c-algo-1-eitup | Collecting scikit-learn!=0.22.0
c8t5d8mv7c-algo-1-eitup |   Downloading scikit_learn-1.1.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (31.2 MB)
c8t5d8mv7c-algo-1-eitup | Collecting joblib>=1.0.0
c8t5d8mv7c-algo-1-eitup |   Downloading joblib-1.2.0-py3-none-any.whl (297 kB)

The class RealTimePredictor has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


In [58]:
!docker ps

CONTAINER ID   IMAGE                                                                                COMMAND                  CREATED         STATUS         PORTS                                       NAMES
ac1dd8df0088   390731033655.dkr.ecr.ap-northeast-1.amazonaws.com/demo-sagemaker-inftoolkit:latest   "python /usr/src/app…"   9 seconds ago   Up 9 seconds   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   c8t5d8mv7c-algo-1-eitup


In [59]:
#!docker stop f380dc891702

In [60]:
#!docker ps

## Run inference
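The payload sent in the next cell is plain CSV text read from `local_test`. As a minimal sketch of what such a string looks like, the block below builds an equivalent payload with the standard library only. The helper name `rows_to_csv_payload` is hypothetical, and the sketch assumes 13 feature columns with no header, which is what the container log of this run suggests; the two rows are copied from that log.

```python
import csv
import io


def rows_to_csv_payload(rows):
    """Serialize feature rows to header-less CSV text, one sample per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue().strip()


# Two feature rows taken from the container log (assumed layout: 13 features, no target column).
rows = [
    [8.64476, 0.0, 18.1, 0.0, 0.693, 6.193, 92.6, 1.7912, 24.0, 666.0, 20.2, 396.9, 15.17],
    [0.02177, 82.5, 2.03, 0.0, 0.415, 7.61, 15.7, 6.27, 2.0, 348.0, 14.7, 395.38, 3.11],
]
payload = rows_to_csv_payload(rows)
print(payload)
```

A string built this way could be passed to `predict` in place of the file contents, provided the column order matches the training data.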

In [61]:
# Run inference
with open(local_test, 'r') as f:
    payload = f.read().strip()

predicted = predictor_lgb_model.predict(payload).decode('utf-8')
print('=' * 20)
print(predicted)

The csv_serializer has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,043 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - /opt/ml/model
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,044 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - /opt/ml/model
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,059 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - code
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,059 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - lightgbm-regression-model.txt
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,060 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model.tar.gz
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,060 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - code
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,060 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - lightgbm-

In [62]:
print(predicted)

c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,160 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 8.64476,0.0,18.1,0.0,0.693,6.193,92.6,1.7912,24.0,666.0,20.2,396.9,15.17
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,162 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 0.02177,82.5,2.03,0.0,0.415,7.61,15.7,6.27,2.0,348.0,14.7,395.38,3.11
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,162 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 0.13914,0.0,4.05,0.0,0.51,5.572,88.5,2.5961,5.0,296.0,16.6,396.9,14.69
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,162 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 8.05579,0.0,18.1,0.0,0.584,5.427,95.4,2.4298,24.0,666.0,20.2,352.58,18.14
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,162 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 0.22876,0.0,8.56,0.0,0.52,6.405,85.4,2.7147,5.0,384.0,20.9,70.8,10.63

In [63]:
type(predicted)

c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,192 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 15.8744,0.0,18.1,0.0,0.671,6.545,99.1,1.5192,24.0,666.0,20.2,396.9,21.08
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,192 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 0.03537,34.0,6.09,0.0,0.433,6.59,40.4,5.4917,7.0,329.0,16.1,395.75,9.5
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,196 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 0.09252,30.0,4.93,0.0,0.428,6.606,42.2,6.1899,6.0,300.0,16.6,383.78,7.37
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,196 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 0.32543,0.0,21.89,0.0,0.624,6.431,98.8,1.8125,4.0,437.0,21.2,396.9,15.39
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,198 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 0.16902,0.0,25.65,0.0,0.581,5.986,88.4,1.9929,2.0,188.0,19.1,385.02,14.81


str

c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,203 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 0.80271,0.0,8.14,0.0,0.538,5.456,36.6,3.7965,4.0,307.0,21.0,288.99,11.69
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,203 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 0.05479,33.0,2.18,0.0,0.472,6.616,58.1,3.37,7.0,222.0,18.4,393.36,8.93
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,204 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 22.5971,0.0,18.1,0.0,0.7,5.0,89.5,1.5184,24.0,666.0,20.2,396.9,31.99
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,204 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 0.2498,0.0,21.89,0.0,0.624,5.857,98.2,1.6686,4.0,437.0,21.2,392.04,21.32
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,204 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 0.17783,0.0,9.69,0.0,0.585,5.569,73.5,2.3999,6.0,391.0,19.2,395.77,15.1

## Receiving the return value as something other than str
Explanation of the Deserializer.
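As a rough illustration of what a deserializer does, the sketch below turns a response body into a Python list of floats. The class `FloatListDeserializer` is hypothetical and does not import the SageMaker SDK; it only mirrors the `deserialize(stream, content_type)` shape that SDK deserializers expose, and it assumes the endpoint returns a JSON-style list like the one printed in this notebook.

```python
import io
import json


class FloatListDeserializer:
    """Hypothetical deserializer: converts a body such as b"[1.0, 2.5]" into list[float]."""

    ACCEPT = ("application/json",)

    def deserialize(self, stream, content_type):
        # Read the whole response, then always release the stream.
        try:
            text = stream.read().decode("utf-8")
        finally:
            stream.close()
        return [float(value) for value in json.loads(text)]


# Stand-in for the HTTP response body of the endpoint.
body = io.BytesIO(b"[19.95642073217597, 27.844891841022335, 23.747437427003455]")
values = FloatListDeserializer().deserialize(body, "application/json")
print(type(values), values[0])
```

With an object of this shape passed as `deserializer=` at deploy time, `predict` would return the list directly, so the manual `.decode('utf-8')` step would no longer be needed.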

# (optional) Run LightGBM inference in the XGBoost container

A custom container for LightGBM also exists:
< URL >
    

In [64]:
xgb_container_uri = sagemaker.image_uris.retrieve("xgboost", region_name, "1.5-1")

c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,219 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 0.21038,20.0,3.33,0.0,0.4429,6.812,32.2,4.1007,5.0,216.0,14.9,396.9,4.85
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,220 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 14.4208,0.0,18.1,0.0,0.74,6.461,93.3,2.0026,24.0,666.0,20.2,27.49,18.05
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,224 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 0.54011,20.0,3.97,0.0,0.647,7.203,81.8,2.1121,5.0,264.0,13.0,392.8,9.59
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,224 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 9.33889,0.0,18.1,0.0,0.679,6.38,95.6,1.9682,24.0,666.0,20.2,60.72,24.08
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,227 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 8.20058,0.0,18.1,0.0,0.713,5.936,80.3,2.7792,24.0,666.0,20.2,3.5,16.94

In [65]:
xgb_container_uri

c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,304 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -  17.75985357 27.88826982 20.52139516 14.83184112 24.77641754 24.9658571
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,306 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -  19.64928982 21.02679762 33.70952264 22.77086784 25.12436361 32.04499227
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,308 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -  20.30087172 25.99375103 14.83184112 17.55319877 21.07028654 21.1224133
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,311 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -  16.37705308 15.56837359 33.70952264 27.6115402  19.3427611  17.95097211
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,312 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -  33.70952264 33.83316205 25.48273875 20.66079602 30.6994611  21.03539072


'354813040037.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-xgboost:1.5-1'

c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,317 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -  22.18581949 19.29909953 16.57470006 23.91268407 28.5757133  16.23633985
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,319 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -  27.50109599 15.7539434  17.48873231 16.54689993 27.60136795 27.83102845


In [66]:
from sagemaker.predictor import RealTimePredictor  # v1 name; renamed to Predictor in sagemaker>=2

lgb_model = sagemaker.model.Model(xgb_container_uri, # URI of the built-in XGBoost container
                                  model_data=est_lightgbm.model_data, # model file generated by the local training run
                                  role=role,
                                  predictor_cls=RealTimePredictor, # predictor class used to call the endpoint
                                  source_dir='./src_builtin_container_serve', # needed when a requirements.txt is used
                                  entry_point='inference.py' # when source_dir is set, give just the .py file name
                                 )

c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,324 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -  23.56626434 27.50109599 21.64384786 18.78723742 17.91867525 14.83184112
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,325 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -  22.73504239 18.93446151 20.7828134  15.12466062 26.58007209 33.83316205
c8t5d8mv7c-algo-1-eitup | 2022-10-25T04:43:49,325 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -  22.24168964 18.00348773 26.12094103 21.85762035 23.06703081 31.38145811]


In [67]:
!docker ps

CONTAINER ID   IMAGE                                                                                COMMAND                  CREATED          STATUS         PORTS                                       NAMES
ac1dd8df0088   390731033655.dkr.ecr.ap-northeast-1.amazonaws.com/demo-sagemaker-inftoolkit:latest   "python /usr/src/app…"   10 seconds ago   Up 9 seconds   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   c8t5d8mv7c-algo-1-eitup


In [68]:
!docker stop $(docker ps -q)

ac1dd8df0088
[36mc8t5d8mv7c-algo-1-eitup exited with code 0
[0mAborting on container exit...


In [69]:
!docker ps

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES


In [70]:
predictor_lgb_model = lgb_model.deploy(initial_instance_count=1,
                                       instance_type='local', 
                                       serializer=csv_serializer, # send the payload to SageMaker as a CSV string (v1 name; CSVSerializer in sagemaker>=2)
                                       #deserializer=None, 
                                      )

Attaching to t8bj7uf19d-algo-1-vmubs
t8bj7uf19d-algo-1-vmubs | [2022-10-25:04:43:56:INFO] No GPUs detected (normal if no gpus installed)
t8bj7uf19d-algo-1-vmubs | [2022-10-25:04:43:56:INFO] No GPUs detected (normal if no gpus installed)
t8bj7uf19d-algo-1-vmubs | [2022-10-25:04:43:56:INFO] nginx config: 
t8bj7uf19d-algo-1-vmubs | worker_processes auto;
t8bj7uf19d-algo-1-vmubs | daemon off;
t8bj7uf19d-algo-1-vmubs | pid /tmp/nginx.pid;
t8bj7uf19d-algo-1-vmubs | error_log  /dev/stderr;
t8bj7uf19d-algo-1-vmubs |
t8bj7uf19d-algo-1-vmubs | worker_rlimit_nofile 4096;
t8bj7uf19d-algo-1-vmubs |
t8bj7uf19d-algo-1-vmubs | events {
t8bj7uf19d-algo-1-vmubs |   worker_connections 2048;
t8bj7uf19d-algo-1-vmubs | }
t8bj7uf19d-algo-1-vmubs |
t8bj7uf19d-algo-1-vmubs | http {
t8bj7uf19d-algo-1-vmubs |   include /etc/nginx/mime.types;

The class RealTimePredictor has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


In [71]:
# Run inference
with open(local_test, 'r') as f:
    payload = f.read().strip()

predicted = predictor_lgb_model.predict(payload).decode('utf-8')
print('=' * 20)
print(predicted)

The csv_serializer has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


t8bj7uf19d-algo-1-vmubs | [2022-10-25:04:44:05:INFO] No GPUs detected (normal if no gpus installed)
t8bj7uf19d-algo-1-vmubs | [2022-10-25:04:44:05:INFO] Installing module with the following command:
t8bj7uf19d-algo-1-vmubs | /miniconda3/bin/python3 -m pip install . -r requirements.txt
t8bj7uf19d-algo-1-vmubs | Processing /opt/ml/code
t8bj7uf19d-algo-1-vmubs |   Preparing metadata (setup.py) ... done
t8bj7uf19d-algo-1-vmubs | Building wheels for collected packages: inference
t8bj7uf19d-algo-1-vmubs |   Building wheel for inference (setup.py) ... done
t8bj7uf19d-algo-1-vmubs |   Created wheel for inference: filename=inference-1.0.0-py2.py3-none-any.whl size=14235 sha256=527b28c9f75f724517f786afd3ab0184d0dc959193faf8bd5746c8460e722d4a
t8bj7uf19d-algo-1-vmubs |   Stored in directory: /home/model-server/tmp/pip-ephem-wheel-cache-_iedeg24/wheels/f3/75/57/158162e9eab7af12b5c338c279b3a81f103b89d7

In [72]:
print(predicted)

[19.95642073217597, 27.844891841022335, 23.747437427003455, 21.961517177305176, 33.70952263893306, 16.546899933876215, 20.7577247308279, 21.58941351302627, 28.44096446328559, 21.573610198594977, 16.520022349295115, 18.56239893242527, 33.70952263893306, 21.66404760045202, 18.839854556333133, 20.524517944865078, 23.512192914502315, 19.720552829648888, 14.831841119971708, 25.48273874904075, 24.232639474441545, 21.624005932843115, 24.961489794296718, 31.737194191676068, 21.634052928440624, 28.40721160777621, 21.408363849719503, 14.831841119971708, 22.218594550645975, 21.174456098551236, 21.78791955089051, 14.831841119971708, 29.996695633096042, 22.44097524661187, 33.83316205414468, 26.41403196992683, 33.70952263893306, 17.366188662166092, 27.56686070285819, 30.785697489113854, 19.36938873496206, 20.70626548555591, 17.759853567831996, 27.888269821752413, 20.521395163186774, 14.831841119971708, 24.776417537973362, 24.965857100129327, 19.649289821764185, 21.026797620813866, 33.70952263893306,

# END of Contents =======================

# Cleanup

# Reference

## (optional) 4. Run without a custom container by listing lightgbm in the built-in container's requirements.txt



Past versions (1.3-3, 1.2-2, 1.2-1, 1.0-1) are available here:

https://github.com/aws/sagemaker-xgboost-container/releases


## 4-2. Run inference

### 4-2-1. Deploy

https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.EstimatorBase.deploy


How can the source code be specified at deploy time?

https://www.youtube.com/watch?v=sngNd79GpmE&t=596s


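Key point: the Estimator has to be defined again.

### The serving file must be a .py file and follow the expected conventions.

This is because MMS (Multi Model Server) is designed to handle .py files.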
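As a minimal sketch of those conventions, a serving script can expose the four functions `model_fn`, `input_fn`, `predict_fn` and `output_fn` that the container looks up by name. The `MeanModel` class below is a hypothetical stand-in so that the example runs without LightGBM installed; a real script would load the trained model inside `model_fn` instead.

```python
import json


class MeanModel:
    """Stand-in for a trained model so the sketch runs without LightGBM."""

    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]


def model_fn(model_dir):
    # A real handler would load the artifact from model_dir here, for example with
    # lightgbm.Booster(model_file=f"{model_dir}/lightgbm-regression-model.txt").
    return MeanModel()


def input_fn(request_body, content_type):
    # Parse a header-less CSV payload into a list of float rows.
    if content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {content_type}")
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode("utf-8")
    return [
        [float(value) for value in line.split(",")]
        for line in request_body.splitlines()
        if line.strip()
    ]


def predict_fn(input_data, model):
    return model.predict(input_data)


def output_fn(prediction, accept):
    # Return the predictions as a JSON list regardless of the accept type (sketch only).
    return json.dumps(list(prediction))


model = model_fn("/opt/ml/model")
parsed = input_fn("1.0,2.0,3.0\n4.0,5.0,6.0", "text/csv")
response = output_fn(predict_fn(parsed, model), "application/json")
print(response)
```

The last three lines chain the functions by hand in the order the serving stack is expected to call them: load once, then parse, predict and format for each request.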

https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.Estimator

## Error

RuntimeError: Model /opt/ml/model/model.tar.gz cannot be loaded:


6o0805unb3-algo-1-k8ugv | [2022-10-16 02:25:50 +0000] [19] [ERROR] Exception in worker process  
6o0805unb3-algo-1-k8ugv | Traceback (most recent call last):  
6o0805unb3-algo-1-k8ugv |   File "/miniconda3/lib/python3.8/site-packages/sagemaker_xgboost_container/algorithm_mode/serve_utils.py", line 175, in get_loaded_booster  
6o0805unb3-algo-1-k8ugv |     booster = pkl.load(open(full_model_path, "rb"))  
6o0805unb3-algo-1-k8ugv | _pickle.UnpicklingError: invalid load key, '\x1f'.  

## Cause
The model artifact is lightgbm-regression-model.txt, a LightGBM text model, so it cannot be loaded with pickle.
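The same kind of error can be reproduced with the standard library alone. The sketch below assumes, as the path in the RuntimeError suggests, that the default loader was handed the gzip archive `model.tar.gz`; the byte string is a stand-in, not the real archive.

```python
import gzip
import pickle

# A gzip stream always starts with the magic bytes 0x1f 0x8b.
archive_bytes = gzip.compress(b"stand-in for the tar archive contents")
print(archive_bytes[:2])

error_message = None
try:
    # pickle reads the first byte as an opcode and rejects 0x1f.
    pickle.loads(archive_bytes)
except pickle.UnpicklingError as exc:
    error_message = str(exc)
    print(error_message)
```

The `'\x1f'` in the traceback is therefore consistent with pickle being pointed at compressed data rather than at a pickled booster.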

How can the model-loading function be overridden? (This is what I wanted to do in the first place.)

https://github.com/aws/sagemaker-xgboost-container/blob/master/docker/1.5-1/final/Dockerfile.cpu

# Set SageMaker entrypoints
ENV SAGEMAKER_TRAINING_MODULE sagemaker_xgboost_container.training:main  
ENV SAGEMAKER_SERVING_MODULE sagemaker_xgboost_container.serving:main  


First, serving.main() is executed.

https://github.com/aws/sagemaker-xgboost-container/blob/master/src/sagemaker_xgboost_container/serving.py

L143

serving_env = env.ServingEnv()

This line loads the parameters from the environment variables.


L147

user_module = modules.import_module(serving_env.module_dir, serving_env.module_name)

Here the user module is loaded.

Looking at L18, `modules` appears to be sagemaker_containers.beta.framework.modules.

from sagemaker_containers.beta.framework import (
    encoders,
    env,
    modules,
    server,
    transformer,
    worker,
)

https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/beta/framework/__init__.py

sagemaker_containers.beta.framework has been archived.

Judging from `__init__`, the current location is the following:

https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_modules.py



The import happens at L258.

module = importlib.import_module(name)

def import_module(uri, name=DEFAULT_MODULE_NAME, cache=None):  # type: (str, str, bool) -> module

As the signature shows, DEFAULT_MODULE_NAME appears to be loaded when no name is given.
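To make the role of the `name` argument concrete, here is a simplified sketch built on `importlib`. The function `import_user_module` is hypothetical, the value of `DEFAULT_MODULE_NAME` is a placeholder chosen for this example, and the download and install steps that the real `import_module` may perform are left out.

```python
import importlib
import sys
import tempfile
from pathlib import Path

DEFAULT_MODULE_NAME = "default_user_module_name"  # placeholder value for this sketch


def import_user_module(module_dir, name=DEFAULT_MODULE_NAME):
    """Make module_dir importable, then import the module called `name`."""
    if module_dir not in sys.path:
        sys.path.insert(0, module_dir)
    importlib.invalidate_caches()
    return importlib.import_module(name)


with tempfile.TemporaryDirectory() as module_dir:
    # Write a tiny user script; the file name is hypothetical.
    script = Path(module_dir, "user_inference_sketch.py")
    script.write_text("def model_fn(model_dir):\n    return 'loaded from ' + model_dir\n")

    user_module = import_user_module(module_dir, "user_inference_sketch")
    loaded = user_module.model_fn("/opt/ml/model")

    try:
        import_user_module(module_dir)  # no name given, so the default name is used
        default_import_ok = True
    except ModuleNotFoundError:
        default_import_ok = False

print(loaded, default_import_ok)
```

In this simplified setting, omitting the name means the user script is never imported, which is the situation the notes here are trying to rule out.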





https://github.com/aws/sagemaker-xgboost-container/blob/master/src/sagemaker_xgboost_container/serving.py

L148-149: the defaults are overridden by the user module loaded at L147.

user_module_transformer = _user_module_transformer(user_module)  
user_module_transformer.initialize()  


As L116 shows, user functions such as model_fn take precedence over the defaults.


def _user_module_transformer(user_module):  
    model_fn = getattr(user_module, "model_fn", default_model_fn)  
    input_fn = getattr(user_module, "input_fn", None)  
    predict_fn = getattr(user_module, "predict_fn", None)  
    output_fn = getattr(user_module, "output_fn", None)  
    transform_fn = getattr(user_module, "transform_fn", None)  
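The `getattr(..., default)` pattern in that excerpt can be tried in isolation. The sketch below uses hypothetical names (`resolve_handlers` and the two in-memory modules) and only shows the lookup rule: a function defined in the user module wins, otherwise the default is used.

```python
import types


def default_model_fn(model_dir):
    return "default loader"


def resolve_handlers(user_module):
    """Pick user-defined functions when present, otherwise fall back to defaults."""
    return {
        "model_fn": getattr(user_module, "model_fn", default_model_fn),
        "input_fn": getattr(user_module, "input_fn", None),
        "predict_fn": getattr(user_module, "predict_fn", None),
        "output_fn": getattr(user_module, "output_fn", None),
    }


# A user module that defines model_fn, and one that defines nothing.
user_module = types.ModuleType("inference")
user_module.model_fn = lambda model_dir: "user loader"
empty_module = types.ModuleType("empty")

with_override = resolve_handlers(user_module)
without_override = resolve_handlers(empty_module)

print(with_override["model_fn"]("/opt/ml/model"))
print(without_override["model_fn"]("/opt/ml/model"))
```

So if the default pickle-based loader still runs, the likely explanation is that the user module was not the one imported, not that the override mechanism failed.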

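## Is the file that defines model_fn actually imported?

At this point I suspect the information is not being passed correctly through the environment variables in the first place.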


https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_modules.py

From L237:

def import_module(uri, name=DEFAULT_MODULE_NAME, cache=None):  # type: (str, str, bool) -> module

The module name has to be passed as the second argument.

It is called from:


https://github.com/aws/sagemaker-xgboost-container/blob/master/src/sagemaker_xgboost_container/serving.py

L147

user_module = modules.import_module(serving_env.module_dir, serving_env.module_name)

The value passed is serving_env.module_name. Is it actually being set?


From L143:

serving_env = env.ServingEnv()

This is defined in the following file.

https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_env.py

L862

class ServingEnv(_Env):



https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_env.py

L329 contains:

class _Env(_mapping.MappingMixin):


module_name = os.environ.get(_params.USER_PROGRAM_ENV, None)


L595

TrainingEnv contains:

        # override base class attributes  
        if self._module_name is None:  
            self._module_name = str(sagemaker_hyperparameters.get(_params.USER_PROGRAM_PARAM, None))  
        self._user_entry_point = self._user_entry_point or sagemaker_hyperparameters.get(  
            _params.USER_PROGRAM_PARAM  
        )  
        
        
        

## Is it enough to set USER_PROGRAM_ENV?


https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/parameters.py


L18

USER_PROGRAM_ENV = "SAGEMAKER_PROGRAM"  # type: str

It looks like setting SAGEMAKER_PROGRAM is what is needed.

How can it be set for a built-in container?

The YouTube video below uses the Environment argument with boto3.

https://youtu.be/sngNd79GpmE?t=780
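As a hedged sketch of that idea (not the code from the video), the request for the boto3 `create_model` call can carry the variables in `PrimaryContainer.Environment`. The helper name, the model name, the role ARN and the submit directory are all assumptions made for this example; the block only builds the request dictionary and does not call AWS.

```python
def build_create_model_request(model_name, image_uri, model_data_url, role_arn, program, submit_dir):
    """Assemble keyword arguments for the SageMaker create_model API."""
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {
            "Image": image_uri,
            "ModelDataUrl": model_data_url,
            "Environment": {
                "SAGEMAKER_PROGRAM": program,                # user script to import
                "SAGEMAKER_SUBMIT_DIRECTORY": submit_dir,    # where the script lives
            },
        },
    }


request = build_create_model_request(
    model_name="lightgbm-on-xgboost-container",  # hypothetical name
    image_uri="354813040037.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-xgboost:1.5-1",
    model_data_url="s3://sagemaker-ap-northeast-1-390731033655/sagemaker-lightgbm-regression-2022-10-25-04-37-16-219/output/model.tar.gz",
    role_arn="arn:aws:iam::123456789012:role/ExampleRole",  # placeholder ARN
    program="inference.py",
    submit_dir="/opt/ml/model/code",  # assumed location of the script inside the container
)
print(request["PrimaryContainer"]["Environment"])

# With credentials configured, the request could then be sent with:
# boto3.client("sagemaker").create_model(**request)
```

Whether the built-in container honours these variables in the way needed here would still have to be confirmed by running it.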

# Pull the Docker image and look inside for debugging

How can the contents of a built-in container be inspected?

For XGBoost, it seems the image is built locally.

https://github.com/aws/sagemaker-xgboost-container

# How to get inside the container and inspect it
Open a terminal and run the following steps.


Change directory
$ cd sagemaker-xgboost-container/

Build the base container
$ docker build -t xgboost-container-base:1.5-1-cpu-py3 -f docker/1.5-1/base/Dockerfile.cpu .

Build the final container
$ docker build -t preprod-xgboost-container:1.5-1-cpu-py3 -f docker/1.5-1/final/Dockerfile.cpu .

List the built images
$ docker image ls

Go inside and inspect (specify the image together with its tag)
$ docker run -it preprod-xgboost-container:1.5-1-cpu-py3 /bin/bash

$ docker run -it 354813040037.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-xgboost:1.5-1 /bin/bash  

   

# Go inside the container and try running the serve command

# Would it work if Model is used instead? -> OK


From the source linked in the YouTube video:

https://github.com/aws-samples/aws-ml-jp/blob/main/sagemaker/sagemaker-inference/inference-tutorial/1_sklearn.ipynb



https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.Model

Deploy

https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.Model.deploy

ValueError: Estimator is not associated with a training job

# Error: association with the training job

https://stackoverflow.com/questions/63340328/how-to-define-a-sagemaker-estimator-object-using-a-pre-trained-model-and-then-de

# END of Contents ===============

# 5. Cleanup
To avoid unexpected charges, delete the following resources.

* SageMaker inference endpoint
* ECR
* S3
* SageMaker notebook instance
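A possible shape for the first two items is sketched below. The function `delete_inference_resources` and the resource names other than the ECR repository are hypothetical, and `RecordingClient` is a stand-in that only records calls so the example runs without AWS credentials. In real use the two clients would be `boto3.client("sagemaker")` and `boto3.client("ecr")`. Endpoints deployed with `instance_type='local'` exist only as Docker containers and are stopped with `docker stop` as shown earlier.

```python
class RecordingClient:
    """Stand-in for a boto3 client: records each call instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any unknown attribute is treated as an API method and recorded when called.
        def record(**kwargs):
            self.calls.append((name, kwargs))
            return {}

        return record


def delete_inference_resources(sagemaker_client, ecr_client, endpoint_name,
                               endpoint_config_name, model_name, repository_name):
    """Delete a hosted endpoint, its config and model, then the ECR repository."""
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    sagemaker_client.delete_model(ModelName=model_name)
    # force=True also removes the images stored in the repository.
    ecr_client.delete_repository(repositoryName=repository_name, force=True)


sagemaker_client = RecordingClient()
ecr_client = RecordingClient()
delete_inference_resources(
    sagemaker_client,
    ecr_client,
    endpoint_name="example-endpoint",           # hypothetical
    endpoint_config_name="example-endpoint",    # hypothetical
    model_name="example-model",                 # hypothetical
    repository_name="demo-sagemaker-inftoolkit",
)
print([name for name, _ in sagemaker_client.calls], ecr_client.calls)
```

S3 objects and the notebook instance are not covered by this sketch and would need to be removed separately.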

# References
* Understanding SageMaker training jobs
    * https://github.com/aws-samples/aws-ml-jp/tree/main/sagemaker/sagemaker-traning/tutorial
* SageMaker PyTorch Training Toolkit
    * https://github.com/aws/sagemaker-pytorch-training-toolkit/
* SageMaker PyTorch Inference Toolkit
    * https://github.com/aws/sagemaker-pytorch-inference-toolkit
* SageMaker Inference Toolkit
    * https://docs.aws.amazon.com/sagemaker/latest/dg/amazon-sagemaker-toolkits.html
    * https://github.com/aws/sagemaker-inference-toolkit