# Classify Images via Transfer Learning

We will build a simple model in Cloud ML using a small set of labeled flower images. This dataset was selected only for ease of explanation; we've successfully used the same implementation on several proprietary datasets, covering cases like interior-design classification (e.g., carpet vs. hardwood floor) and animated-character classification. The code can be found at https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/flowers and can easily be adapted to run on different datasets.

## Preprocess Data
We start with a set of labeled images in a Google Cloud Storage bucket and preprocess them to extract the image features from the bottleneck layer (typically the penultimate layer) of the Inception network. Although processing images in this manner can be reasonably expensive, each image can be processed independently and in parallel, making this task a great candidate for Cloud Dataflow.

We process each image to produce its feature representation (also known as an embedding) in the form of a k-dimensional vector of floats (in our case 2,048 dimensions). The preprocessing includes converting the image format, resizing images, and running the converted image through a pre-trained model to get the embeddings. The final output is written to the directory specified by `--output_path`.

In [None]:
import os
import time

GCS_BUCKET = 'gs://lytx-dry-run'  # CHANGE THIS TO YOUR BUCKET
PROJECT = 'indigo-bloom-118922'  # CHANGE THIS TO YOUR PROJECT ID
REGION = 'us-central1'  # OPTIONALLY CHANGE THIS
TIME = str(int(time.time()))  # Unix timestamp keeps each run's outputs separate
GCS_PATH = GCS_BUCKET + '/' + TIME

# Export the settings so the shell cells below can read them.
os.environ['GCS_BUCKET'] = GCS_BUCKET
os.environ['PROJECT'] = PROJECT
os.environ['REGION'] = REGION
os.environ['GCS_PATH'] = GCS_PATH

In [None]:
!gsutil mb $GCS_BUCKET
!gsutil cp -r gs://cloud-ml-data/img/flower_photos/train_set.csv $GCS_BUCKET/img/flower_photos/
!gsutil cp -r gs://cloud-ml-data/img/flower_photos/eval_set.csv $GCS_BUCKET/img/flower_photos/
!gsutil cp -r gs://cloud-ml-data/img/flower_photos/dict.txt $GCS_BUCKET/img/flower_photos/

In [None]:
!gsutil cat -r 0-85  $GCS_BUCKET/img/flower_photos/eval_set.csv
!gsutil cat $GCS_BUCKET/img/flower_photos/dict.txt

In [None]:
!gsutil cat gs://cloud-ml-data/img/flower_photos/train_set.csv | wc -l
!gsutil cat gs://cloud-ml-data/img/flower_photos/eval_set.csv | wc -l

### Machine Learning Pipeline 
We are setting up the following pipeline:
![Machine Learning Pipeline](pipeline.png)

### SKIP THIS STEP & Go to Next - Preprocessing
(Optional: this takes about 50 minutes to run in Dataflow over 20 single-core VMs.)
<br><br>
<br>
Each input element is transformed as (uri, label_ids, image_bytes) -> (tensorflow.Example).

  The output proto contains 'label', 'image_uri' and 'embedding'.
  The 'embedding' is calculated by feeding the image into the input layer of the image
  neural network and reading the output of the bottleneck layer of the network.
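
As a rough illustration of where the `(uri, label_ids)` part of each element could come from, the sketch below parses CSV rows and a label dictionary using only the standard library. It assumes each CSV row has the form `uri,label` and that the dictionary file lists one label per line; the helper names `load_label_dict` and `parse_csv_rows` and the sample data are hypothetical and are not taken from `trainer/preprocess.py`.

```python
import csv
import io


def load_label_dict(dict_text):
    """Map each label name to its line index (assumed: one label per line)."""
    labels = [line.strip() for line in dict_text.splitlines() if line.strip()]
    return {label: index for index, label in enumerate(labels)}


def parse_csv_rows(csv_text, label_to_id):
    """Yield (uri, label_ids) pairs from rows shaped like 'uri,label[,label...]'."""
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        uri, label_names = row[0], row[1:]
        # Labels missing from the dictionary are silently dropped in this sketch.
        label_ids = [label_to_id[name.strip()] for name in label_names
                     if name.strip() in label_to_id]
        yield uri, label_ids


# Hypothetical sample data for illustration only.
SAMPLE_DICT = "daisy\ndandelion\nroses\nsunflowers\ntulips\n"
SAMPLE_CSV = (
    "gs://example-bucket/flowers/img_001.jpg,daisy\n"
    "gs://example-bucket/flowers/img_002.jpg,roses\n"
)

label_to_id = load_label_dict(SAMPLE_DICT)
parsed = list(parse_csv_rows(SAMPLE_CSV, label_to_id))
print(parsed)
```

In the real pipeline the image bytes would then be read from each URI and passed through the network to obtain the embedding; that step is omitted here because it depends on TensorFlow and Cloud Storage access.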

In [None]:
!python trainer/preprocess.py \
  --input_dict $GCS_BUCKET/img/flower_photos/dict.txt \
  --input_path $GCS_BUCKET/img/flower_photos/eval_set.csv \
  --output_path $GCS_PATH/preproc/eval \
  --num_workers 30 \
  --cloud

In [None]:
!python trainer/preprocess.py \
  --input_dict $GCS_BUCKET/img/flower_photos/dict.txt \
  --input_path $GCS_BUCKET/img/flower_photos/train_set.csv \
  --output_path $GCS_PATH/preproc/train \
  --num_workers 30 \
  --cloud

### Preprocessed Images: Simply copy the TFRecords below (avoids the 50-minute processing time)

In [None]:
!gsutil -m cp gs://lytx-experiment/1512512700/preproc/* $GCS_PATH/preproc/

### Training
Once we've preprocessed the data, we can train a simple classifier. The network comprises a single fully-connected layer with ReLU activations, with one output for each label in the dictionary, replacing the original output layer. The final output is computed using the softmax function.
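
To make the forward pass concrete, here is one possible reading of the description above, sketched in plain Python: a fully connected ReLU layer over the embedding, followed by a linear layer producing one logit per label, followed by softmax. The split into two layers, the toy dimensions, and every weight value are assumptions chosen for illustration; this is not the code in `trainer/`.

```python
import math


def relu(values):
    """Element-wise max(0, x)."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer; `weights` holds one row of coefficients per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding, hidden_w, hidden_b, out_w, out_b):
    """Embedding -> ReLU layer -> one logit per label -> probabilities."""
    hidden = relu(dense(embedding, hidden_w, hidden_b))
    logits = dense(hidden, out_w, out_b)
    return softmax(logits)


# Toy example: a 4-dimensional "embedding" (the real one has 2,048) and 2 labels.
embedding = [1.0, -2.0, 0.5, 3.0]
hidden_w = [[0.5, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
hidden_b = [0.0, 0.0, 0.0]
out_w = [[1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0]]
out_b = [0.0, 0.0]

probs = classify(embedding, hidden_w, hidden_b, out_w, out_b)
predicted_label_id = probs.index(max(probs))
print(probs, predicted_label_id)
```

The probabilities sum to 1, and the index of the largest one corresponds to a line in the label dictionary.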


![Training](incept_v3.png)

In [None]:
%%bash
echo $GCS_PATH
echo $GCS_BUCKET

In [None]:
%%bash
JOBNAME=lytx_$(date -u +%y%m%d_%H%M%S)
gcloud ml-engine jobs submit training $JOBNAME \
  --stream-logs \
  --module-name=trainer.task \
  --package-path=./trainer \
  --staging-bucket=$GCS_BUCKET \
  --region=us-central1 \
  --runtime-version=1.0 \
  --scale-tier=STANDARD_1 \
  -- \
  --output_path=$GCS_PATH/$JOBNAME/output \
  --eval_data_paths=$GCS_PATH/preproc/eval* \
  --train_data_paths=$GCS_PATH/preproc/train*

## TensorBoard - View our Training Progress

In [None]:
from google.datalab.ml import TensorBoard
TensorBoard().start('gs://lytx-dry-run/1513014343/lytx_171211_182542/output')

In [None]:
%%bash
MODEL_NAME="lytx_dry_run"
MODEL_VERSION="v1"
MODEL_LOCATION="gs://lytx-dry-run/1513014343/lytx_171211_182542/output/model" #REPLACE this with the location of your model
gcloud ml-engine models create ${MODEL_NAME} --regions us-central1
gcloud ml-engine versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION} 

In [None]:
!gsutil cp gs://cloud-ml-data/img/flower_photos/daisy/100080576_f52e8ee070_n.jpg daisy.jpg

In [None]:
%%bash
python -c 'import base64, sys, json; img = base64.b64encode(open(sys.argv[1], "rb").read()).decode("ascii"); print(json.dumps({"key":"0", "image_bytes": {"b64": img}}))' daisy.jpg > request.json
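
The one-liner above can be hard to read, so here is the same idea unpacked as a small sketch. It builds one JSON object per image in the `{"key": ..., "image_bytes": {"b64": ...}}` shape used above and joins the objects with newlines, which is the layout `--json-instances` reads. The helper names `build_instance` and `build_request_body` are hypothetical, and placeholder bytes stand in for real JPEG contents.

```python
import base64
import json


def build_instance(key, image_bytes):
    """Return one JSON line holding the key and the base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"key": key, "image_bytes": {"b64": encoded}})


def build_request_body(images):
    """Join one instance per line; `images` is a list of (key, bytes) pairs."""
    return "\n".join(build_instance(key, data) for key, data in images) + "\n"


# Placeholder bytes; in practice these would come from open(path, "rb").read().
fake_daisy = b"\xff\xd8\xff\xe0 placeholder daisy bytes"
fake_rose = b"\xff\xd8\xff\xe0 placeholder rose bytes"

body = build_request_body([("0", fake_daisy), ("1", fake_rose)])
first = json.loads(body.splitlines()[0])
print(first["key"], len(body.splitlines()))
```

Writing `body` to `request.json` would give a file with two instances, one per line.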

In [None]:
%%bash
gcloud ml-engine predict --model lytx_dry_run --json-instances request.json

In [None]:
!gsutil cat $GCS_BUCKET/img/flower_photos/dict.txt

In [None]:
%%bash
wget https://www.heirloomroses.com/media/catalog/product/cache/1/image/650x/040ec09b1e35df139433887a97daa66f/g/r/gr842-love-1_1.jpg
mv gr842-love-1_1.jpg rose.jpg

In [None]:
%%bash
python -c 'import base64, sys, json; img = base64.b64encode(open(sys.argv[1], "rb").read()).decode("ascii"); print(json.dumps({"key":"1", "image_bytes": {"b64": img}}))' rose.jpg >> request.json

In [None]:
%%bash
gcloud ml-engine predict --model lytx_dry_run --json-instances request.json