
AmoebaNet-D on TPU

This code implements the AmoebaNet-D model described in the AmoebaNet paper, which should be cited as: Real, E., Aggarwal, A., Huang, Y. and Le, Q.V., 2018. Regularized Evolution for Image Classifier Architecture Search. arXiv preprint arXiv:1802.01548.

Acknowledgements

The starting point for this code was the NASNet implementation at https://github.com/tensorflow/models/tree/master/research/slim/nets/nasnet and the image preprocessing code at https://github.com/tensorflow/tpu/blob/master/models/experimental/inception/inception_preprocessing.py.

Prerequisites

Set up a Google Cloud project

Follow the instructions at the Quickstart Guide to get a GCE VM with access to Cloud TPU.

To run this model, you will need the following (example setup commands appear after this list):

  • A GCE VM instance with an associated Cloud TPU resource
  • A GCS bucket to store your training checkpoints
  • (Optional): The ImageNet training and validation data preprocessed into TFRecord format, and stored in GCS.
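
For instance, assuming the Cloud SDK and the ctpu tool from the Quickstart Guide are installed, a minimal setup might look like this (the bucket name, TPU name, and zone below are placeholders, not values from this repository):

# Placeholder names and zone; substitute your own.
gsutil mb -l us-central1 gs://your-bucket                 # GCS bucket for checkpoints
ctpu up --zone=us-central1-b --name=amoebanet-tutorial    # creates a paired GCE VM and Cloud TPU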

Installing extra packages

The AmoebaNet trainer uses a few extra packages. We can install them now:

pip install -U pillow
pip install -U --no-deps tensorflow-serving-api

Formatting the data

The data is expected to be in TFRecord format, as generated by this script.

If you do not have the ImageNet dataset prepared, you can use a randomly generated fake dataset to test the model. It is located at gs://cloud-tpu-test-datasets/fake_imagenet.
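
For example, you can point the data directory used by the training command below straight at the fake dataset:

export DATA_DIR=gs://cloud-tpu-test-datasets/fake_imagenet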

Training the model

Train the model by executing the following command (substituting the appropriate values):

python amoeba_net.py \
  --tpu=$TPU_NAME \
  --data_dir=$DATA_DIR \
  --model_dir=$MODEL_DIR
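
Here $TPU_NAME, $DATA_DIR, and $MODEL_DIR are assumed to already be set in your shell; the values below are placeholders:

export TPU_NAME=amoebanet-tutorial            # name of your Cloud TPU
export DATA_DIR=gs://your-bucket/imagenet     # TFRecord training and validation data
export MODEL_DIR=gs://your-bucket/amoebanet   # where checkpoints and summaries are written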

If you are not running this script on a GCE VM in the same project and zone as your Cloud TPU, you will need to add the --project and --zone flags specifying the corresponding values for the Cloud TPU you'd like to use.
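
For example (the project and zone values here are placeholders):

python amoeba_net.py \
  --tpu=$TPU_NAME \
  --project=your-gcp-project \
  --zone=us-central1-b \
  --data_dir=$DATA_DIR \
  --model_dir=$MODEL_DIR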

This will train an AmoebaNet-D model on ImageNet with a batch size of 256 on a single Cloud TPU. With the default flags, the model should train to above 80% accuracy in under 48 hours (including evaluation time every few epochs).

You can launch TensorBoard (e.g. tensorboard --logdir=$MODEL_DIR) to view loss curves and other metadata regarding your training run. (Note: if you launch TensorBoard on your VM, be sure to configure SSH port forwarding or the GCE firewall rules appropriately.)
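
One way to do this, assuming a local Cloud SDK install (the VM name and zone are placeholders), is to tunnel TensorBoard's default port over SSH and then start TensorBoard on the VM:

# On your local machine: forward port 6006 to the VM.
gcloud compute ssh your-vm-name --zone=us-central1-b -- -L 6006:localhost:6006
# Then, on the VM:
tensorboard --logdir=$MODEL_DIR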

You can also train the AmoebaNet-D model to 93% top-5 accuracy in under 7.5 hours using the following command:

python amoeba_net.py \
  --tpu=$TPU_NAME \
  --data_dir=$DATA_DIR \
  --model_dir=$MODEL_DIR \
  --num_cells=6 \
  --image_size=224 \
  --num_epochs=35 \
  --train_batch_size=1024 \
  --eval_batch_size=1024 \
  --lr=2.56 \
  --lr_decay_value=0.88 \
  --lr_warmup_epochs=0.35 \
  --mode=train \
  --iterations_per_loop=1251
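
Note that with --train_batch_size=1024, the --iterations_per_loop=1251 setting corresponds to roughly one pass over ImageNet's 1,281,167 training images per loop (1,281,167 / 1024 ≈ 1251).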

Understanding the code

For more detailed information, read the documentation within each file.

Additional notes

About the model and training regime

The model is the result of the evolutionary neural architecture search presented in Regularized Evolution for Image Classifier Architecture Search.

TODO: give some more details
