AmoebaNet-D on TPU
This code was implemented based on results in the AmoebaNet paper, which should be cited as: Real, E., Aggarwal, A., Huang, Y. and Le, Q.V., 2018. Regularized Evolution for Image Classifier Architecture Search. arXiv preprint arXiv:1802.01548.
This code was branched from the NASNet implementation in https://github.com/tensorflow/models/tree/master/research/slim/nets/nasnet and from the image preprocessing code in https://github.com/tensorflow/tpu/blob/master/models/experimental/inception/inception_preprocessing.py.
Set up a Google Cloud project
Follow the instructions at the Quickstart Guide to get a GCE VM with access to Cloud TPU.
To run this model, you will need:
- A GCE VM instance with an associated Cloud TPU resource
- A GCS bucket to store your training checkpoints
- (Optional): The ImageNet training and validation data preprocessed into TFRecord format, and stored in GCS.
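As a rough sketch, the resources above can be provisioned from the command line. The names, zone, region, accelerator type, and TensorFlow version below are placeholders — substitute values appropriate for your own project:

```shell
# Create a GCS bucket for checkpoints (placeholder name and region).
gsutil mb -l us-central1 gs://my-amoebanet-bucket

# Create a Cloud TPU in the same zone as your VM (placeholder values).
gcloud compute tpus create my-tpu \
  --zone=us-central1-b \
  --accelerator-type=v2-8 \
  --version=1.13
```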
Installing extra packages
The AmoebaNet trainer uses a few extra packages. We can install them now:
```shell
pip install -U pillow
pip install -U --no-deps tensorflow-serving-api
```
Formatting the data
The data is expected to be formatted in TFRecord format, as generated by this script.
If you do not have the ImageNet dataset prepared, you can use a randomly generated fake dataset to test the model. It is located at
Training the model
Train the model by executing the following command (substituting the appropriate values):
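The placeholder variables can be set like this (the values are illustrative — substitute your own TPU name and GCS paths):

```shell
export TPU_NAME=my-tpu                      # name of your Cloud TPU
export DATA_DIR=gs://my-bucket/imagenet     # location of the TFRecord data
export MODEL_DIR=gs://my-bucket/amoebanet   # where checkpoints are written
```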
```shell
python amoeba_net.py \
  --tpu=$TPU_NAME \
  --data_dir=$DATA_DIR \
  --model_dir=$MODEL_DIR
```
If you are not running this script on a GCE VM in the same project and zone as
your Cloud TPU, you will need to add flags specifying the corresponding values
for the Cloud TPU you'd like to use.
This will train an AmoebaNet-D model on ImageNet with a batch size of 256 on a single Cloud TPU. With all flags at their defaults, the model should train to above 80% accuracy in under 48 hours (including evaluation time every few epochs).
You can launch TensorBoard (e.g. tensorboard --logdir=$MODEL_DIR) to view loss
curves and other metadata regarding your training run. (Note: if you launch
TensorBoard on your VM, be sure to configure SSH port forwarding or the GCE
firewall rules so that you can reach it from your browser.)
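One way to set up the port forwarding, assuming TensorBoard's default port of 6006 and a VM named my-vm (a placeholder), is:

```shell
# Forward local port 6006 to the VM so TensorBoard is reachable
# at http://localhost:6006 in your browser.
gcloud compute ssh my-vm -- -L 6006:localhost:6006
```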
You can also train the AmoebaNet-D model to 93% top-5 accuracy in under 7.5 hours using the following command:
```shell
python amoeba_net.py \
  --tpu=$TPU_NAME \
  --data_dir=$DATA_DIR \
  --model_dir=$MODEL_DIR \
  --num_cells=6 \
  --image_size=224 \
  --num_epochs=35 \
  --train_batch_size=1024 \
  --eval_batch_size=1024 \
  --lr=2.56 \
  --lr_decay_value=0.88 \
  --lr_warmup_epochs=0.35 \
  --mode=train \
  --iterations_per_loop=1251
```
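Two of these flags can be sanity-checked with quick arithmetic: the ImageNet training split has 1,281,167 images, so at a batch size of 1024 one epoch is 1,281,167 // 1024 = 1251 steps, which plausibly explains --iterations_per_loop=1251. The warmup and decay flags suggest a schedule along the lines of the sketch below — this is an illustration of what such flags typically mean, not the actual implementation (see amoeba_net.py for the real schedule):

```python
IMAGENET_TRAIN_IMAGES = 1281167  # standard ILSVRC-2012 training split size

def steps_per_epoch(batch_size):
    """Number of whole batches in one epoch of ImageNet training data."""
    return IMAGENET_TRAIN_IMAGES // batch_size

def learning_rate(epoch, base_lr=2.56, decay=0.88, warmup_epochs=0.35):
    """Hypothetical warmup + exponential-decay schedule.

    Linear warmup from 0 to base_lr over `warmup_epochs`, then decay by a
    factor of `decay` for each subsequent epoch.
    """
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    return base_lr * decay ** (epoch - warmup_epochs)
```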
Understanding the code
For more detailed information, read the documentation within each file.
About the model and training regime
The model is the result of the evolutionary neural architecture search presented in Regularized Evolution for Image Classifier Architecture Search.
TODO: give some more details