## EfficientNets - Base Model

To perform transfer learning and fine-tuning, we will first train our base model with EfficientNet as a "feature extractor".

In [1]:
# Define key variables
import sys
import os
import pickle
import sagemaker
from sagemaker.session import Session

sys.path.append('../source')
session = Session()
bucket = session.default_bucket()
role = sagemaker.get_execution_role()

Load the metadata from preprocessing.

***Note: Make sure you have already preprocessed the data for the EfficientNet version you set below (e.g. `EfficientNet-b3`) before training the base model.***

In [2]:
root_dir = '../data/mit_indoor_67/metadata/'
efficientnet = 'efficientnet-b0'  # change to different versions for training base model
metadata_file = root_dir + efficientnet.replace("-", "_") + ".pkl"
# Use a context manager so the file handle is closed after loading
with open(metadata_file, 'rb') as f:
    metadata = pickle.load(f)

Define the `output_path`, source directory, and dependencies.

In [3]:
prefix = 'mit_indoor_67'
output_path = os.path.join('s3://', bucket, prefix)
print('model artefacts will be saved to: {}'.format(output_path))

model artefacts will be saved to: s3://sagemaker-us-east-2-194071253362/mit_indoor_67


In [4]:
# Define source directory in training
source_dir = '../source'
dependencies = ['../source/dataset', '../source/utils']

The training script:

*Note: Apart from optimizer selection, the data loading, model selection, and training are all wrapped in helper classes/functions.*

In [5]:
!pygmentize ../source/main.py

[34mimport[39;49;00m [04m[36margparse[39;49;00m
[34mimport[39;49;00m [04m[36mjson[39;49;00m
[34mimport[39;49;00m [04m[36mos[39;49;00m
[34mimport[39;49;00m [04m[36mpandas[39;49;00m [34mas[39;49;00m [04m[36mpd[39;49;00m
[34mimport[39;49;00m [04m[36mnumpy[39;49;00m [34mas[39;49;00m [04m[36mnp[39;49;00m
[34mimport[39;49;00m [04m[36mtorch[39;49;00m
[34mimport[39;49;00m [04m[36mtorch[39;49;00m[04m[36m.[39;49;00m[04m[36moptim[39;49;00m [34mas[39;49;00m [04m[36moptim[39;49;00m
[34mimport[39;49;00m [04m[36mtorch[39;49;00m[04m[36m.[39;49;00m[04m[36mnn[39;49;00m [34mas[39;49;00m [04m[36mnn[39;49;00m
[34mimport[39;49;00m [04m[36mtime[39;49;00m
[34mimport[39;49;00m [04m[36mcopy[39;49;00m
[34mimport[39;49;00m [04m[36msubprocess[39;49;00m
[34mimport[39;49;00m [04m[36mrandom[39;49;00m
[34mfrom[39;49;00m [04m[36mglob[39;49;00m [34mimport[39;49;00m glob
[34mfrom[39;49;00m [04m[36mPIL[39;4

Set job name

In [6]:
from time import gmtime, strftime
# Change model name if replicating final model
job_name = "mitindoor67-{}-base-{}".format(efficientnet, strftime("%Y-%m-%d-%H-%M-%S", gmtime()))
# job_name = "mitindoor67-{}-final-{}".format(efficientnet, strftime("%Y-%m-%d-%H-%M-%S", gmtime()))
print(job_name)

mitindoor67-efficientnet-b0-base-2020-12-10-03-30-18
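
The job name above avoids underscores because SageMaker training job names are limited to 63 characters made of letters, digits, and hyphens. As a rough sketch, the naming step could be wrapped in a small check like the one below. `make_job_name` is a hypothetical helper, not part of `../source`, and the regular expression is the pattern given for `TrainingJobName` in the SageMaker API reference.

```python
import re
from time import gmtime, strftime

# Pattern documented for TrainingJobName: 1-63 characters,
# letters, digits, and hyphens, starting and ending with an alphanumeric.
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def make_job_name(model_name, stage, now=None):
    """Build a timestamped job name (hypothetical helper for illustration)."""
    timestamp = strftime("%Y-%m-%d-%H-%M-%S", now or gmtime())
    name = "mitindoor67-{}-{}-{}".format(model_name, stage, timestamp)
    if not JOB_NAME_PATTERN.match(name):
        raise ValueError("invalid SageMaker job name: {}".format(name))
    return name


print(make_job_name("efficientnet-b0", "base"))
```

Passing a name that still contains an underscore, such as `efficientnet_b0`, raises a `ValueError` locally instead of failing when the job is submitted.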


### Model Training

Set the instance details, PyTorch framework version, and hyperparameters:

We will use the `AdamW` optimizer with its default settings, an `lr` (learning rate) of 0.001 and a `weight_decay` of 0.01, together with `subsetrandom` sampling. The classifier head is a simple fully connected one: a `dropout` of 0.5 followed by a final classification layer with `num_classes` = 67.
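
A minimal sketch of such a head and optimizer in PyTorch is shown below. It is not the code in `../source`, which wraps model construction in helper classes. The feature width of 1280 is an assumption matching EfficientNet-b0's final feature map, and other versions use different widths.

```python
import torch
import torch.nn as nn
import torch.optim as optim

NUM_FEATURES = 1280  # assumed backbone output width (EfficientNet-b0); varies by version
NUM_CLASSES = 67     # MIT Indoor 67 categories

# Dropout followed by a single fully connected classification layer
head = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(NUM_FEATURES, NUM_CLASSES),
)

# AdamW with the values quoted in the text (these are also PyTorch's defaults)
optimizer = optim.AdamW(head.parameters(), lr=1e-3, weight_decay=1e-2)

# Random tensor standing in for a batch of 32 pooled feature vectors
features = torch.randn(32, NUM_FEATURES)
logits = head(features)
print(logits.shape)
```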

To make sure the base model converges and does not overfit, we initially train it for up to 20 epochs with a `patience` of 3: if the model does not improve for 3 consecutive epochs, training stops early.
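
The early-stopping rule can be sketched as a small counter, here with `patience=3` as passed in the hyperparameters below. This is an illustration only: the `EarlyStopping` class, the `min_delta` parameter, and the loss values are assumptions, and the actual logic lives in the training helpers.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs in a row."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch and return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember the new best and reset the counter
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses, for illustration only
val_losses = [1.90, 1.42, 1.31, 1.33, 1.35, 1.32, 1.30]
stopper = EarlyStopping(patience=3)
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        print("stopping after epoch {}".format(epoch))
        break
```

With these values the best loss is reached at epoch 3, the next three epochs do not beat it, and the loop stops after epoch 6 without ever seeing the last value.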

In [7]:
entry_point = 'main.py'  # training script
instance_type = 'ml.p3.2xlarge'  # training on ml.p2.xlarge may take hours, so we use ml.p3.2xlarge
instance_count = 1  # number of instances
framework_version = '1.6.0'  # PyTorch version
py_version = 'py3'  # Python version

# Comment out lines below to train Final Model
hyperparameters = {
                    'model' : 'EfficientNet-lite0',
                    'epochs': 20, # Set to 20 epochs initially
                    'batch-size' : 32, 
                    'sampling' : 'subsetrandom',
                    'optimizer' : 'adamw',
                    'workers' : 7,  # num_cpu - 1
                    'blocks-unfrozen' : 0,
                    'patience' : 3,
                    'dropout' : 0.5
                }  


# Uncomment lines below to train Final Model
# hyperparameters = {
#                     'model' : 'EfficientNet-b3',
#                     'epochs': 20, # Set to 20 epochs initially
#                     'batch-size' : 32, 
#                     'sampling' : 'subsetrandom',
#                     'optimizer' : 'adamw',
#                     'workers' : 7,  # num_cpu - 1
#                     'blocks-unfrozen' : 23,
#                     'lr' : 1e-4,
#                     'weight-decay' : 0.0072299723855532155,
#                     'dropout' : 0.7005047299544908,
#                     'patience' : 3,
#                 }  
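
SageMaker passes each entry of `hyperparameters` to the entry-point script as a command-line argument (for example `--batch-size 32`), so the script typically reads them with `argparse`. The sketch below shows one way that could look. It covers only a subset of the keys above, the default values are assumptions, and it is not the actual contents of `main.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse a subset of the notebook's hyperparameters (defaults are assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, default="EfficientNet-b0")
    parser.add_argument("--epochs", type=int, default=20)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--optimizer", type=str, default="adamw")
    parser.add_argument("--blocks-unfrozen", type=int, default=0)
    parser.add_argument("--patience", type=int, default=3)
    parser.add_argument("--dropout", type=float, default=0.5)
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--weight-decay", type=float, default=1e-2)
    return parser.parse_args(argv)


# Simulate the arguments a final-model job would receive
args = parse_args(["--model", "EfficientNet-b3", "--blocks-unfrozen", "23", "--lr", "1e-4"])
print(args.model, args.blocks_unfrozen, args.lr, args.batch_size)
```

Note that `argparse` turns hyphens in option names into underscores, so `'blocks-unfrozen'` is read back as `args.blocks_unfrozen`.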

Construct the estimator

In [8]:
from sagemaker.pytorch import PyTorch

# initial attempt
estimator = PyTorch(entry_point=entry_point,
                    source_dir=source_dir,
                    dependencies=dependencies,
                    role=role,
                    instance_count=instance_count,
                    instance_type=instance_type,
                    framework_version=framework_version,
                    py_version=py_version,
                    output_path=output_path,
                    sagemaker_session=session,
                    hyperparameters=hyperparameters)

### Fitting the data

Fit the training and validation data

In [9]:
# Comment out the lines below to train final model
estimator.fit({
    'train': metadata['train'],
    'val' : metadata['val']},
    job_name=job_name,
    wait = False)

# Uncomment lines below to replicate training of Final Model
# base_model_job = 'mitindoor67-efficientnet-b3-base-2020-11-28-07-08-59' # BASE JOB NAME HERE
# base_model = os.path.join(output_path, base_model_job, 'output', 'model.tar.gz')

# print(base_model)

# estimator.fit({
#     'train': metadata['train'],
#     'val' : metadata['val'],
#     'base' : base_model},
#     job_name=job_name,
#     wait = False)

Once the base model is trained, we move on to hyperparameter tuning for the final model. We will feed in the base model, unfreeze a fraction of its layers, and [tune hyperparameters by Bayesian search](./EfficientNets-HPO.ipynb) using Amazon SageMaker's Hyperparameter Tuner.