<h1> Hyper-parameter tuning </h1>

In this notebook, you will learn how to carry out hyper-parameter tuning.

This notebook takes several hours to run.

<h2> Environment variables for project and bucket </h2>

Change the cell below to reflect your Project ID and bucket name. See Lab 3a for setup instructions.

In [1]:
import os
PROJECT = 'just-aloe-200223' # REPLACE WITH YOUR PROJECT ID
BUCKET = 'synergi' # REPLACE WITH YOUR BUCKET NAME
REGION = 'us-east1' # REPLACE WITH YOUR BUCKET REGION e.g. us-central1

In [2]:
# for bash
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION

In [3]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

Updated property [core/project].
Updated property [compute/region].


<h1> 0. Train locally </h1>

In [None]:
# ! gsutil cp gs://synergi/train/us000000000000 ./train
# ! gsutil cp gs://synergi/val/us000000000000 ./val
# !mkdir -p ${PWD}/sample

# Drop the CSV header line, then keep the first 100 rows
# !tail -n +2 ./train > ./tmp 
# !head -n 100 ./tmp > ${PWD}/sample/train.csv
# !wc -l ${PWD}/sample/train.csv

# !tail -n +2 ./val > ./tmp 
# !head -n 100 ./tmp > ${PWD}/sample/val.csv
# !wc -l ${PWD}/sample/val.csv

# !rm ./tmp 
!ls ${PWD}/taxi_trained
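The commented-out shell pipeline above builds a small local sample by dropping the header row of each exported shard and keeping the first 100 data rows. A minimal Python sketch of the same step, assuming Python 3 and that each shard starts with exactly one header line; `make_sample` is a hypothetical helper, not part of the trainer package.

```python
import itertools
import os


def make_sample(src_path, dst_path, n_rows=100):
    """Copy the first n_rows data rows of src_path to dst_path, skipping the header.

    Hypothetical helper mirroring `tail -n +2 | head -n 100`.
    Returns the number of rows written (the `wc -l` check).
    """
    # Create the sample/ directory if it does not exist yet.
    os.makedirs(os.path.dirname(dst_path) or ".", exist_ok=True)
    with open(src_path) as src, open(dst_path, "w") as dst:
        next(src, None)  # drop the CSV header line
        for line in itertools.islice(src, n_rows):
            dst.write(line)
    with open(dst_path) as f:
        return sum(1 for _ in f)
```

For example, `make_sample('./train', os.path.join(os.getcwd(), 'sample', 'train.csv'))` would produce the file that the local training cell reads.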

In [None]:
%%bash
rm -rf taxifare.tar.gz flight_trained
export PYTHONPATH=${PYTHONPATH}:${PWD}/taxifare
python -m trainer.task \
  --train_data_paths="${PWD}/sample/train*" \
  --eval_data_paths=${PWD}/sample/val.csv  \
  --output_dir=${PWD}/flight_trained \
  --train_steps=100 \
  --train_batch_size=32 \
  --eval_batch_size=32 \
  --hidden_units='3 2 1' \
  --job-dir=/tmp

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7efc6de38190>, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': '/content/datalab/synergi/taxi_trained/', '_save_summary_steps': 100}

In [None]:
from google.datalab.ml import TensorBoard
OUTDIR='/content/datalab/synergi/flight_trained'
print(OUTDIR)
TensorBoard().start(OUTDIR)

In [None]:
! gsutil cp gs://synergi/val/us000000000000 ./val
!head -10 ${PWD}/sample/val.csv

In [None]:
%%writefile /tmp/test.json
{"week":21,"dow": 1, "month": 5, "airline": "EV", "arrival_airport": "GTR", "departure_airport": "AEX", "depart_minutes": 895, "scheduled_flight_time": 80, "departure_lat": 31.32, "departure_lon": -92.54, "arrival_lat": 33.45, "arrival_lon": -88.59}

In [None]:
! cat /tmp/test.json 
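`gcloud ml-engine local predict --json-instances` reads newline-delimited JSON, one instance per line. A minimal sketch, assuming Python 3, that writes the same instance programmatically so that several test flights can be scored in one call; `write_instances` is a hypothetical helper.

```python
import json


def write_instances(instances, path):
    """Write prediction instances as newline-delimited JSON (one object per line)."""
    with open(path, "w") as f:
        for inst in instances:
            f.write(json.dumps(inst) + "\n")


# The same flight as in /tmp/test.json above.
instance = {
    "week": 21, "dow": 1, "month": 5, "airline": "EV",
    "arrival_airport": "GTR", "departure_airport": "AEX",
    "depart_minutes": 895, "scheduled_flight_time": 80,
    "departure_lat": 31.32, "departure_lon": -92.54,
    "arrival_lat": 33.45, "arrival_lon": -88.59,
}

if __name__ == "__main__":
    write_instances([instance], "/tmp/test.json")
```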

In [None]:
%%bash
model_dir=$(ls ${PWD}/flight_trained/export/exporter | tail -1)
gcloud ml-engine local predict \
  --model-dir=${PWD}/flight_trained/export/exporter/${model_dir} \
  --json-instances=/tmp/test.json
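If training is run more than once, the exporter directory can contain several exports, so the prediction cell should pick the newest one. A minimal sketch, assuming Python 3 and that each export subdirectory is named by a numeric timestamp (in-progress `temp-…` directories are skipped); `latest_export_dir` is a hypothetical helper.

```python
import os


def latest_export_dir(exporter_dir):
    """Return the newest export subdirectory under exporter_dir.

    Assumes exports are named by numeric timestamps, so the largest
    number is the most recent export.
    """
    versions = [d for d in os.listdir(exporter_dir)
                if d.isdigit() and os.path.isdir(os.path.join(exporter_dir, d))]
    if not versions:
        raise FileNotFoundError("no numeric export directories in " + exporter_dir)
    # Compare as integers, not strings.
    return os.path.join(exporter_dir, max(versions, key=int))
```

For example, `latest_export_dir(os.path.join(os.getcwd(), 'flight_trained', 'export', 'exporter'))` gives the value to pass to `--model-dir`.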

<h1> 3. Make sure outputs do not clobber each other </h1>

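We append the trial number to the output directory so that the parallel trials of a tuning job each write their checkpoints and exports to a separate subdirectory.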

In [6]:
!grep -A 5 "trial" taxifare/trainer/task.py

    # Append trial_id to path if we are doing hptuning
    # This code can be removed if you are not using hyperparameter tuning
    arguments['output_dir'] = os.path.join(
        arguments['output_dir'],
        json.loads(
            os.environ.get('TF_CONFIG', '{}')
        ).get('task', {}).get('trial', '')
    ) 

    # Run the training job:
    try:
        model.train_and_evaluate(arguments)
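The same logic as the `task.py` excerpt above, pulled out into a standalone function so it can be tested without a tuning job. A minimal sketch, assuming Python 3; `trial_output_dir` is a hypothetical name, and the `TF_CONFIG` strings in the usage are made-up examples.

```python
import json
import os


def trial_output_dir(output_dir, tf_config=None):
    """Append the hyper-parameter tuning trial ID (if any) to output_dir."""
    if tf_config is None:
        tf_config = os.environ.get("TF_CONFIG", "{}")
    trial = json.loads(tf_config).get("task", {}).get("trial", "")
    # With no trial ID this only adds a trailing slash.
    return os.path.join(output_dir, trial)
```

With `'{"task": {"trial": "7"}}'` the result is `gs://bucket/out/7`; outside of tuning (`'{}'`) it stays `gs://bucket/out/`.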


<h1> 4. Create hyper-parameter configuration </h1>

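The file specifies the search space for each hyper-parameter.  Cloud ML Engine (Cloud MLE) searches intelligently within these constraints rather than trying every combination: here it runs at most 30 trials (maxTrials), 3 at a time (maxParallelTrials), looking for values that minimize the rmse metric reported by the trainer.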

In [7]:
%%writefile hyperparam.yaml
trainingInput:
  scaleTier: STANDARD_1
  hyperparameters:
    goal: MINIMIZE
    maxTrials: 30
    maxParallelTrials: 3
    hyperparameterMetricTag: rmse
    params:
    - parameterName: train_batch_size
      type: INTEGER
      minValue: 64
      maxValue: 512
      scaleType: UNIT_LOG_SCALE
    - parameterName: nbuckets
      type: INTEGER
      minValue: 10
      maxValue: 20
      scaleType: UNIT_LINEAR_SCALE
    - parameterName: hidden_units
      type: CATEGORICAL
      categoricalValues: ["128 32", "256 128 16", "64 64 64 8"]       

Overwriting hyperparam.yaml
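The scaleType controls how candidate values are spread over [minValue, maxValue]. A minimal sketch of the idea, mapping a unit value u in [0, 1] onto the range; this illustrates what linear and log scaling mean and is not the service's internal search algorithm. `scale_unit_value` is a hypothetical helper.

```python
def scale_unit_value(u, min_value, max_value, scale_type):
    """Map u in [0, 1] onto [min_value, max_value] for the given scale type."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    if scale_type == "UNIT_LINEAR_SCALE":
        # Equal steps in u are equal differences in the value.
        return min_value + u * (max_value - min_value)
    if scale_type == "UNIT_LOG_SCALE":
        # Requires min_value > 0: equal steps in u are equal ratios of the value.
        return min_value * (max_value / min_value) ** u
    raise ValueError("unsupported scale type: " + scale_type)
```

At u = 0.5, train_batch_size (64 to 512, log scale) lands near 181, the geometric midpoint, while nbuckets (10 to 20, linear scale) lands at 15. Log scaling gives small batch sizes as much attention as large ones.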


<h1> 5. Run the training job </h1>

Just add --config=hyperparam.yaml to the usual training command.

In [16]:
%%bash
OUTDIR=gs://${BUCKET}/us_model
JOBNAME=lab4a_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
gcloud ml-engine jobs submit training $JOBNAME \
   --region=$REGION \
   --module-name=trainer.task \
   --package-path=${PWD}/taxifare/trainer \
   --job-dir=$OUTDIR \
   --staging-bucket=gs://${BUCKET} \
   --scale-tier=STANDARD_1 \
   --runtime-version=1.4 \
   --config=hyperparam.yaml \
   -- \
   --train_data_paths="gs://$BUCKET/train/us*" \
   --eval_data_paths="gs://${BUCKET}/val/us*"  \
   --output_dir=$OUTDIR \
   --train_steps=5000

gs://synergi/us_model us-east1 lab4a_180429_205608
jobId: lab4a_180429_205608
state: QUEUED


CommandException: 1 files/objects could not be removed.
Job [lab4a_180429_205608] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ml-engine jobs describe lab4a_180429_205608

or continue streaming the logs with the command

  $ gcloud ml-engine jobs stream-logs lab4a_180429_205608
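For each trial, Cloud MLE passes the chosen values to the trainer as command-line flags named after each parameterName (after the `--` separator, alongside the fixed flags). The trainer must therefore accept `--train_batch_size`, `--nbuckets` and `--hidden_units`. The real `task.py` argument parsing is not shown in this notebook, so the parser below is a hypothetical sketch with placeholder defaults, assuming Python 3.

```python
import argparse


def parse_args(argv=None):
    """Parse trainer flags, including those filled in per trial by the tuning service."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_data_paths", required=True)
    parser.add_argument("--eval_data_paths", required=True)
    parser.add_argument("--output_dir", required=True)
    parser.add_argument("--train_steps", type=int, default=5000)
    # Tuned parameters: flag names must match parameterName in hyperparam.yaml.
    parser.add_argument("--train_batch_size", type=int, default=512)
    parser.add_argument("--nbuckets", type=int, default=10)
    parser.add_argument("--hidden_units", default="128 32")
    parser.add_argument("--job-dir", dest="job_dir", default="")
    args = parser.parse_args(argv)
    # CATEGORICAL value such as "64 64 64 8" -> [64, 64, 64, 8]
    args.hidden_units = [int(n) for n in args.hidden_units.split()]
    return args
```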


<h1> 6. Train chosen model on full dataset </h1>

Look at the last section of the <a href="feateng.ipynb">feature engineering notebook</a>.  The extra parameters in the command below (--train_batch_size, --nbuckets, --hidden_units) are based on the hyper-parameter tuning results.
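To read off the winning values, one option is to parse the job description as JSON. A minimal sketch, assuming Python 3 and that the description follows the Cloud ML Engine v1 job resource, where `trainingOutput.trials` lists each trial's `trialId`, `hyperparameters` and `finalMetric.objectiveValue`; `best_trial` is a hypothetical helper.

```python
def best_trial(job, goal="MINIMIZE"):
    """Return the trial with the best final objective from a parsed job description."""
    # Skip trials that have not reported a final metric (e.g. still running or failed).
    trials = [t for t in job.get("trainingOutput", {}).get("trials", [])
              if "finalMetric" in t]
    if not trials:
        raise ValueError("no completed trials with a final metric")

    def objective(trial):
        return float(trial["finalMetric"]["objectiveValue"])

    # hyperparam.yaml uses goal: MINIMIZE on rmse, so lower is better.
    return min(trials, key=objective) if goal == "MINIMIZE" else max(trials, key=objective)
```

For example, pass it `json.loads` of the output of `gcloud ml-engine jobs describe <JOBNAME> --format=json`, then copy the returned trial's `hyperparameters` into the command below.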

Copyright 2016 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License

In [None]:
%%bash

WARNING -- this uses significant resources and is optional. Remove this line to run the block.

OUTDIR=gs://${BUCKET}/taxifare/feateng2m
JOBNAME=lab4a_$(date -u +%y%m%d_%H%M%S)
TIER=STANDARD_1 
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
gcloud ml-engine jobs submit training $JOBNAME \
   --region=$REGION \
   --module-name=trainer.task \
   --package-path=${PWD}/taxifare/trainer \
   --job-dir=$OUTDIR \
   --staging-bucket=gs://$BUCKET \
   --scale-tier=$TIER \
   --runtime-version=1.4 \
   -- \
   --train_data_paths="gs://cloud-training-demos/taxifare/train*" \
   --eval_data_paths="gs://cloud-training-demos/taxifare/valid*"  \
   --output_dir=$OUTDIR \
   --train_steps=418168 \
   --train_batch_size=512 --nbuckets=16 --hidden_units="64 64 64 8"