## Getting Started with Python Client
[Document - Getting Started](http://docs.h2o.ai/driverless-ai/pyclient/docs/html/examples/getting-started.html)

---

Install [driverlessai](https://pypi.org/project/driverlessai/) (the Python client library):  
`pip install driverlessai` or `conda install -c h2oai driverlessai`

---

In [1]:
import driverlessai

In [7]:
# Load the Driverless AI user name and password
import json
with open('idpass.json') as f:
    idpass = json.load(f)
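The cell above assumes that `idpass.json` exists and contains the keys `id` and `pass`, which are the two keys read when connecting. Below is a minimal sketch of a loader that fails early when either key is missing. The helper name `load_credentials` and the demo values are hypothetical and not part of the driverlessai library.

```python
import json
import tempfile
from pathlib import Path

# Keys the notebook reads from idpass.json when connecting.
REQUIRED_KEYS = ("id", "pass")


def load_credentials(path):
    """Read a JSON credentials file and check that both expected keys exist."""
    with open(path, encoding="utf-8") as f:
        creds = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError(f"missing keys in {path}: {missing}")
    return creds


# Demonstration with a throwaway file that holds placeholder values only.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "idpass.json"
    demo_path.write_text(
        json.dumps({"id": "demo-user", "pass": "demo-password"}),
        encoding="utf-8",
    )
    demo_creds = load_credentials(demo_path)

print(sorted(demo_creds))
```

Keeping the credentials in a separate file, as the notebook does, avoids hard-coding them in a shared notebook.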

In [6]:
# Connect to the Driverless AI server
dai = driverlessai.Client(address='http://3.88.181.75:12345', username=idpass['id'], password=idpass['pass'])
dai

<class 'driverlessai._core.Client'> http://3.88.181.75:12345

About the [Client class](http://docs.h2o.ai/driverless-ai/pyclient/docs/html/client.html#client)

In [10]:
# Load data into Driverless AI
ds = dai.datasets.create(
    data='s3://h2o-public-test-data/smalldata/iris/iris.csv',
    data_source='s3',
    name='iris-getting-started'
)

Complete 100.00% - [4/4] Computed stats for column C5


The data appears on the Datasets page under the name 'iris-getting-started':
<img src="img/data_load.png" width=800px>

In [12]:
ds

<class 'Dataset'> fb8d6c2a-6038-11eb-9a4f-0242ac110002 iris-getting-started

About the [Dataset class](http://docs.h2o.ai/driverless-ai/pyclient/docs/html/client.html#datasets)

In [13]:
# Check basic information about the dataset
print("Dataset name: ", ds.name)
print("Columns: ", ds.columns)
print("Shape: ", ds.shape)

Dataset name:  iris-getting-started
Columns:  ['C1', 'C2', 'C3', 'C4', 'C5']
Shape:  (150, 5)


In [25]:
ds.head()

C1,C2,C3,C4,C5
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa


In [23]:
# Check the dataset details
print(ds.column_summaries()[["C1","C5"]])    # show only C1 and C5

--- C1 ---

 4.3|███████
    |█████████████████
    |██████████
    |████████████████████
    |████████████
    |███████████████████
    |█████████████
    |████
    |████
 7.9|████

Data Type: real
Logical Types: []
Datetime Format: 
Count: 150
Missing: 0
Mean: 5.84
SD: 0.828
Min: 4.3
Max: 7.9
Unique: 35
Freq: 10
--- C5 ---

     Iris-setosa|████████████████████
 Iris-versicolor|████████████████████
  Iris-virginica|████████████████████

Data Type: str
Logical Types: []
Datetime Format: 
Count: 150
Missing: 0
Unique: 3
Freq: 50



The same information as shown on the Dataset Details page:
<img src="img/data_detail.png" width=500px>
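To make the fields in the column summary concrete, here is a rough sketch that computes similar statistics locally for the five `C1` values shown by `ds.head()`. It makes two assumptions that the summary output does not confirm: that `Count` is the total number of rows, and that `Freq` is the number of occurrences of the most common value. The helper `summarize` is hypothetical and not part of the driverlessai library.

```python
from collections import Counter
from statistics import mean


def summarize(values):
    """Compute simple summary statistics; None marks a missing value."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "count": len(values),                      # assumed: total rows
        "missing": len(values) - len(present),
        "mean": mean(present),
        "min": min(present),
        "max": max(present),
        "unique": len(counts),
        "freq": counts.most_common(1)[0][1],       # assumed: count of the mode
    }


# The C1 values of the five rows printed by ds.head() above.
c1_preview = [5.1, 4.9, 4.7, 4.6, 5.0]
summary = summarize(c1_preview)
print(summary)
```

These numbers describe only the five preview rows, so they differ from the full-column values reported by `column_summaries()`.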

In [15]:
type(ds.column_summaries())

driverlessai._datasets.DatasetColumnSummaryCollection

In [26]:
# List the datasets uploaded to Driverless AI
dai.datasets.list()

[<class 'Dataset'> fb8d6c2a-6038-11eb-9a4f-0242ac110002 iris-getting-started,
 <class 'Dataset'> d17c41ee-5ecc-11eb-b134-0242ac110002 sample_simple.csv,
 <class 'Dataset'> 110fae96-5c88-11eb-a8f5-0242ac110002 date_sample3.csv,
 <class 'Dataset'> 1f09a98e-5c42-11eb-bf60-0242ac110002 date_sample.csv,
 <class 'Dataset'> cae49b84-5bab-11eb-96ed-0242ac110002 TitanicData.csv]
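Since the listing is an ordinary Python list, a dataset can be looked up by name with a small helper. The sketch below assumes only that each item exposes a `.name` attribute, as `ds.name` does earlier in this notebook. The helper `find_by_name` is hypothetical, and the stand-in objects replace real `Dataset` instances so that the example runs without a server.

```python
from types import SimpleNamespace


def find_by_name(datasets, name):
    """Return the first item whose .name equals `name`, or None if absent."""
    return next((d for d in datasets if d.name == name), None)


# Stand-ins for Dataset objects; only the .name attribute is used.
demo_datasets = [
    SimpleNamespace(name="iris-getting-started"),
    SimpleNamespace(name="sample_simple.csv"),
    SimpleNamespace(name="TitanicData.csv"),
]

match = find_by_name(demo_datasets, "TitanicData.csv")
no_match = find_by_name(demo_datasets, "does-not-exist.csv")
print(match.name, no_match)
```

With a live connection, the same helper could presumably be called as `find_by_name(dai.datasets.list(), 'TitanicData.csv')`.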

In [27]:
# Split the dataset into training and test sets
ds_split = ds.split_to_train_test(train_size=0.7, train_name="iris_train", test_name="iris_test")
ds_split

Complete


{'train_dataset': <class 'Dataset'> 46d33edc-603c-11eb-9a4f-0242ac110002 iris_train,
 'test_dataset': <class 'Dataset'> 46d36aa6-603c-11eb-9a4f-0242ac110002 iris_test}

The training (train_dataset) and test (test_dataset) data as shown on the Datasets page:
<img src="img/train_test.png" width=800px>
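With `train_size=0.7`, the 150 rows are divided into 105 training rows and 45 test rows. The following sketch illustrates that arithmetic with a plain shuffled split. It does not show how Driverless AI performs the split internally, and the helper `split_rows` is hypothetical.

```python
import random


def split_rows(rows, train_size, seed=0):
    """Shuffle the rows and cut them into a train part and a test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]


# Row indices stand in for the 150 rows of the iris dataset.
train_rows, test_rows = split_rows(range(150), train_size=0.7)
print(len(train_rows), len(test_rows))  # prints "105 45"
```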

In [32]:
# Experiment settings
settings = {
    'task': 'classification',
    'target_column': ds.columns[-1],
    'accuracy': 1,
    'time': 1
}

In [30]:
# Preview the experiment settings (the experiment itself has not been run yet)
dai.experiments.preview(**ds_split, **settings)

ACCURACY [1/10]:
- Training data size: *105 rows, 5 cols*
- Feature evolution: *[Constant, DecisionTree, GLM, LightGBM, XGBoostGBM]*, *1/4 validation split*
- Final pipeline: *One of [Constant, DecisionTree, GLM, LightGBM, XGBoostGBM], single final model, validated with 4-fold CV*

TIME [1/10]:
- Feature evolution: *2 individuals*, up to *3 iterations*
- Early stopping: disabled

INTERPRETABILITY [8/10]:
- Feature pre-pruning strategy: None
- Monotonicity constraints: enabled
- Feature engineering search space: [CVCatNumEncode, CVTargetEncode, CatOriginal, Cat, Frequent, Interactions, NumCatTE, OneHotEncoding, Original, TextOriginal, Text]
- Pre-trained PyTorch NLP models (with fine-tuning): ['disabled']

[Constant, DecisionTree, GLM, LightGBM, XGBoostGBM] models to train:
- Model and feature tuning: *2*
- Feature evolution: *3*
- Final pipeline: *9*
- Per-model Hyper. opt. trials: *0* (evolution) *0* (final)

Estimated runtime: *minutes*
Auto-click Finish/Abort if not done in: *1 day*

To check the same preview on the Experiment page (before running), the settings above must be entered there:
<img src="img/experiment_setting.png" width=600px>
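The call `dai.experiments.preview(**ds_split, **settings)` relies on Python's keyword-argument unpacking: the keys of both dictionaries become keyword arguments of a single call. The sketch below shows the mechanism with a hypothetical stand-in function, `preview_stub`, which only echoes what it receives and is not part of the driverlessai library.

```python
def preview_stub(train_dataset, test_dataset=None, **settings):
    """Hypothetical stand-in that returns the keyword arguments it was given."""
    return {"train_dataset": train_dataset, "test_dataset": test_dataset, **settings}


# Plain strings stand in for the Dataset objects held in ds_split.
ds_split_demo = {"train_dataset": "iris_train", "test_dataset": "iris_test"}
settings_demo = {
    "task": "classification",
    "target_column": "C5",
    "accuracy": 1,
    "time": 1,
}

# Both dictionaries are unpacked into one set of keyword arguments.
merged = preview_stub(**ds_split_demo, **settings_demo)
print(merged)
```

If the two dictionaries shared a key, Python would raise a `TypeError` for the repeated keyword argument, so the dataset keys and the settings keys need to stay distinct.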

In [35]:
# Run the experiment
ex = dai.experiments.create(**ds_split, **settings, name='iris-getting-started')

Experiment launched at: http://3.88.181.75:12345/#/experiment?key=98616740-603e-11eb-9a4f-0242ac110002
Running 100.00% - Status: Complete                                              


The progress can also be checked on the Experiments page:
<img src="img/experiment_running.png" width=800px>

In [36]:
ex

<class 'Experiment'> 98616740-603e-11eb-9a4f-0242ac110002 iris-getting-started

About the [Experiment class](http://docs.h2o.ai/driverless-ai/pyclient/docs/html/client.html#experiments)

In [37]:
# Experiment summary
ex.summary()

Status: Complete
Experiment: iris-getting-started (98616740-603e-11eb-9a4f-0242ac110002)
  Version: 1.9.1, 2021-01-27 01:27
  Settings: 1/1/8, seed=281155033, GPUs disabled
  Train data: iris_train (105, 5)
  Validation data: N/A
  Test data: [Test] (45, 4)
  Target column: C5 (3-class)
System specs: Docker/Linux, 31 GB, 8 CPU cores, 0/0 GPU
  Max memory usage: 0.444 GB, 0 GB GPU
Recipe: AutoDL (5 iterations, 2 individuals)
  Validation scheme: stratified, 1 internal holdout
  Feature engineering: 3 features scored (2 selected)
Timing: MOJO latency: 0.00995 millis (2.1kB)
  Data preparation: 4.95 secs
  Shift/Leakage detection: 1.15 secs
  Model and feature tuning: 10.71 secs (6 models trained)
  Feature evolution: 0.95 secs (0 of 3 model trained)
  Final pipeline training: 13.29 secs (9 models trained)
  Python / MOJO scorer building: 39.75 secs / 9.08 secs
Validation score: LOGLOSS = 1.096729 (constant preds)
Validation score: LOGLOSS = 0.2694015 +/- 5.551115e-17 (baseline)
Validatio

The same results on the Experiment page (after completion):
<img src="img/experiment_done.png" width=800px>

In [45]:
# Download the experiment artifacts (the files are saved to the Jupyter working directory on the client machine)
ex.artifacts.download(overwrite=True)

Downloaded 'report.docx'
Downloaded 'h2oai_experiment_logs_b433119a-5fb1-11eb-bb69-0242ac110002.zip'
Downloaded 'mojo.zip'
Downloaded 'scorer.zip'
Downloaded 'h2oai_experiment_summary_b433119a-5fb1-11eb-bb69-0242ac110002.zip'
Downloaded 'test_preds.csv'
Downloaded 'train_preds.csv'


{'autoreport': 'report.docx',
 'logs': 'h2oai_experiment_logs_b433119a-5fb1-11eb-bb69-0242ac110002.zip',
 'mojo_pipeline': 'mojo.zip',
 'python_pipeline': 'scorer.zip',
 'summary': 'h2oai_experiment_summary_b433119a-5fb1-11eb-bb69-0242ac110002.zip',
 'test_predictions': 'test_preds.csv',
 'train_predictions': 'train_preds.csv'}

In [46]:
!ls

PyClient_test_20210126.ipynb
h2oai_experiment_logs_b433119a-5fb1-11eb-bb69-0242ac110002.zip
h2oai_experiment_summary_b433119a-5fb1-11eb-bb69-0242ac110002.zip
mojo.zip
report.docx
scorer.zip
test_preds.csv
train_preds.csv
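Several of the downloaded artifacts are zip archives. A quick way to see what an archive contains without extracting it is the standard-library `zipfile` module, sketched below. The helper `list_archive` is hypothetical, and the demo builds a placeholder archive whose member name is invented for illustration; it says nothing about the real contents of `mojo.zip` or `scorer.zip`.

```python
import tempfile
import zipfile
from pathlib import Path


def list_archive(path):
    """Return the member names of a zip archive without extracting it."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()


# Build a throwaway archive so the example runs without a real download.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("placeholder/README.txt", "stand-in content")
    members = list_archive(demo_zip)

print(members)
```

After a real download, the same helper could be pointed at one of the returned paths, for example `list_archive('mojo.zip')`.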
