# `keras-unet-collection.models` user guide

This user guide is written for `keras-unet-collection == 0.0.10`.

In [1]:
import tensorflow as tf
from tensorflow import keras

In [2]:
print('TensorFlow {}; Keras {}'.format(tf.__version__, keras.__version__))

TensorFlow 2.3.0; Keras 2.4.0


### Step 1: importing `models` from `keras_unet_collection`

In [3]:
from keras_unet_collection import models

### Step 2: defining your hyper-parameters

Commonly used hyper-parameter options are listed below. Full details are available through Python's built-in `help` function, e.g., `help(models.unet_2d)`.

* `input_size`: a tuple or list that defines the shape of input tensors. `models.resunet_a_2d` requires integer dimensions; the other models also accept `None` dimensions. Note that `None` dimensions cannot be combined with the PReLU activation.

* `filter_num`: a list that defines the number of convolutional filters for each down- and upsampling block.
    * For `unet_2d`, `att_unet_2d`, `unet_plus_2d`, `r2_unet_2d`, depth $\ge$ 2 is expected.
    * For `resunet_a_2d` and `u2net_2d`, depth $\ge$ 3 is expected.


* `n_labels`: number of output targets, e.g., `n_labels=2` for binary classification.

* `activation`: the activation function of hidden layers. Available choices are "ReLU", "LeakyReLU", "PReLU", "ELU", "GELU", "Snake".

* `output_activation`: the activation function of the output layer. Recommended choices are "Sigmoid", "Softmax", None (linear), "Snake".

* `batch_norm`: if specified as True, all convolutional layers will be configured as stacked "Conv2D-BN-Activation" blocks.

* `stack_num_down`: number of stacked convolutional layers per downsampling level.

* `stack_num_up`: number of stacked convolutional layers (after concatenation) per upsampling level. 

* `pool`: if specified as False, the downsampling (encoding) blocks will be configured with strided convolutional layers (2-by-2 linear kernels with a stride of 2 and an activation function). Otherwise (`pool=True`), max-pooling is used.

* `unpool`: if specified as False, the upsampling (decoding) blocks will be configured with transpose convolutional layers (2-by-2 transpose kernels with a stride of 2 and an activation function). Otherwise (`unpool=True`), reflective padding is used.
    
* `name`: user-specified prefix of the configured layer and model. Use `keras.models.Model.summary` to identify the exact name of each layer.
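
The constraints listed above can be checked before a model is built. The following is a minimal sketch in plain Python; `check_hyperparams` is a hypothetical helper, not part of `keras_unet_collection`, and it encodes only the rules stated in this guide.

```python
# Hypothetical helper: encodes only the constraints stated in this guide.
ACTIVATIONS = ("ReLU", "LeakyReLU", "PReLU", "ELU", "GELU", "Snake")

def check_hyperparams(input_size, filter_num, activation, min_depth=2):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if activation not in ACTIVATIONS:
        problems.append("unknown activation: {}".format(activation))
    if len(filter_num) < min_depth:
        problems.append("depth {} is below the minimum of {}".format(len(filter_num), min_depth))
    # PReLU cannot be combined with undefined (None) input dimensions.
    if activation == "PReLU" and any(dim is None for dim in input_size):
        problems.append("PReLU requires a fully specified input_size")
    return problems

# One problem is reported: PReLU with None dimensions.
print(check_hyperparams((None, None, 3), [64, 128, 256], "PReLU"))
# No problems are reported for a fully specified input.
print(check_hyperparams((256, 256, 1), [16, 32, 64, 128, 256], "PReLU"))
```

Pass `min_depth=3` when checking settings for `resunet_a_2d` or `u2net_2d`.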

### Step 3: configuring your model (examples are provided)

**Example 1**: U-net for binary classification with:

1. Five down- and upsampling levels (or four downsampling levels and one bottom level).

2. Two convolutional layers per downsampling level.

3. One convolutional layer (after concatenation) per upsampling level.

4. Gaussian Error Linear Unit (GELU) activation, Softmax output activation, batch normalization.

5. Downsampling through max-pooling.

6. Upsampling through reflective padding.

In [4]:
unet = models.unet_2d((None, None, 3), [64, 128, 256, 512, 1024], n_labels=2,
                      stack_num_down=2, stack_num_up=1,
                      activation='GELU', output_activation='Softmax', 
                      batch_norm=True, pool=True, unpool=True, name='unet')
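
With `input_size=(None, None, 3)`, the spatial size is fixed only when data is passed in. Assuming each downsampling step halves the height and width (as 2-by-2 max-pooling with a stride of 2 does), a five-level network halves them four times, so input sizes divisible by 16 keep the encoder and decoder feature maps aligned for concatenation. The sketch below illustrates this arithmetic; `level_sizes` is a hypothetical name and not part of the library.

```python
def level_sizes(size, depth):
    """Spatial size at each level, assuming every downsampling step halves it."""
    sizes = [size]
    for _ in range(depth - 1):
        if sizes[-1] % 2 != 0:
            raise ValueError("size {} cannot be halved evenly".format(sizes[-1]))
        sizes.append(sizes[-1] // 2)
    return sizes

print(level_sizes(256, depth=5))  # [256, 128, 64, 32, 16]
```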

**Example 2**: (2-d) Vnet for binary classification with:

1. Input size of (256, 256, 1); PReLU does not support input tensors with `None` dimensions.

2. Five down- and upsampling levels (or four downsampling levels and one bottom level).

3. The number of stacked convolutional layers in the residual path increases with downsampling level from one to three (and, symmetrically, decreases with upsampling level).
    * `res_num_ini=1`
    * `res_num_max=3`

4. PReLU activation, Softmax output activation, batch normalization.

5. Downsampling through strided convolutional layers.

6. Upsampling through transpose convolutional layers.

In [5]:
vnet = models.vnet_2d((256, 256, 1), filter_num=[16, 32, 64, 128, 256], n_labels=2,
                      res_num_ini=1, res_num_max=3, 
                      activation='PReLU', output_activation='Softmax', 
                      batch_norm=True, pool=False, unpool=False, name='vnet')
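
One plausible reading of how `res_num_ini` and `res_num_max` translate into per-level layer counts is sketched below. It assumes the count grows by one per downsampling level and is capped at `res_num_max`; this schedule is an assumption made for illustration, and the docstring of `models.vnet_2d` remains the reference for the exact rule. `res_num_schedule` is a hypothetical name.

```python
def res_num_schedule(depth, res_num_ini, res_num_max):
    """Stacked-layer count per level: start at res_num_ini, add one per level, cap at res_num_max."""
    return [min(res_num_ini + level, res_num_max) for level in range(depth)]

# Five levels, as in Example 2.
print(res_num_schedule(5, res_num_ini=1, res_num_max=3))  # [1, 2, 3, 3, 3]
```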

**Example 3**: attention-Unet for single target regression with:

1. Four down- and upsampling levels.

2. Two convolutional layers per downsampling level.

3. Two convolutional layers (after concatenation) per upsampling level.

4. ReLU activation, linear output activation (`None`), batch normalization.

5. Additive attention, ReLU attention activation.

6. Downsampling through strided convolutional layers.

7. Upsampling through transpose convolutional layers.
        

In [6]:
att_unet = models.att_unet_2d((None, None, 3), [64, 128, 256, 512], n_labels=1,
                              stack_num_down=2, stack_num_up=2,
                              activation='ReLU', atten_activation='ReLU', attention='add', output_activation=None, 
                              batch_norm=True, pool=False, unpool=False, name='att-unet')

**Example 4**: U-net++ for three-label classification with:

1. Four down- and upsampling levels.

2. Two convolutional layers per downsampling level.

3. Two convolutional layers (after concatenation) per upsampling level.

4. LeakyReLU activation, Softmax output activation, no batch normalization.

5. Downsampling through max-pooling.

6. Upsampling through transpose convolutional layers.

7. Deep supervision.

In [7]:
xnet = models.unet_plus_2d((None, None, 3), [64, 128, 256, 512], n_labels=3,
                           stack_num_down=2, stack_num_up=2,
                           activation='LeakyReLU', output_activation='Softmax', 
                           batch_norm=False, pool=True, unpool=False, deep_supervision=True, name='xnet')

----------
deep_supervision = True
names of output tensors are listed as follows (the last one is the final output):
	xnet_output_sup1
	xnet_output_sup2
	xnet_output_final
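
With deep supervision the model returns several outputs, so training needs a loss (and a target) for each of them. Keras `Model.compile` accepts `loss` and `loss_weights` as dictionaries keyed by output name. The sketch below builds such dictionaries from the names printed above; the loss choice and the weights are illustrative assumptions, not library defaults.

```python
# Output names as printed by the cell above.
output_names = ["xnet_output_sup1", "xnet_output_sup2", "xnet_output_final"]

# Illustrative choice (an assumption, not a library default):
# weight the final output fully and each supervision output at half.
losses = {name: "categorical_crossentropy" for name in output_names}
loss_weights = {name: 0.5 for name in output_names[:-1]}
loss_weights[output_names[-1]] = 1.0

# These dictionaries could then be passed as
# xnet.compile(optimizer="adam", loss=losses, loss_weights=loss_weights),
# provided the printed names match the model's output names.
print(loss_weights)
```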


**Example 5**: UNet 3+ for binary classification with:

1. Four down- and upsampling levels.

2. Two convolutional layers per downsampling level.

3. One convolutional layer (after concatenation) per upsampling level.

4. ReLU activation, Sigmoid output activation, no batch normalization (`batch_norm=False`).

5. Downsampling through max-pooling.

6. Upsampling through transpose convolutional layers.

7. Deep supervision.

In [8]:
unet3plus = models.unet_3plus_2d((None, None, 3), n_labels=2, filter_num_down=[64, 128, 256, 512], 
                                  filter_num_skip='auto', filter_num_aggregate='auto', 
                                  stack_num_down=2, stack_num_up=1, activation='ReLU', output_activation='Sigmoid',
                                  batch_norm=False, pool=True, unpool=False, deep_supervision=True, name='unet3plus')

Automated hyper-parameter determination is applied with the following details:
----------
	Number of convolution filters after each full-scale skip connection: filter_num_skip = [64, 64, 64]
	Number of channels of full-scale aggregated feature maps: filter_num_aggregate = 256
----------
deep_supervision = True
names of output tensors are listed as follows (the last one is the final output):
	unet3plus_output_sup0_activation
	unet3plus_output_sup1_activation
	unet3plus_output_final_activation
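
The automatically determined values printed above are consistent with a simple rule: every full-scale skip connection uses the filter number of the first downsampling level, and the aggregated feature map concatenates one such branch per level. The sketch below is a hypothetical rule that reproduces the printed values for this example; the library's actual rule may differ for other inputs, and `auto_unet3plus_filters` is not a library function.

```python
def auto_unet3plus_filters(filter_num_down):
    """Hypothetical rule that reproduces the printed values for this example."""
    depth = len(filter_num_down)
    base = filter_num_down[0]
    filter_num_skip = [base] * (depth - 1)   # one entry per upsampling level
    filter_num_aggregate = depth * base      # channels after concatenating `depth` branches
    return filter_num_skip, filter_num_aggregate

print(auto_unet3plus_filters([64, 128, 256, 512]))  # ([64, 64, 64], 256)
```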


* `filter_num_skip` and `filter_num_aggregate` can be specified explicitly:

In [9]:
unet3plus = models.unet_3plus_2d((512, 512, 3), n_labels=2, filter_num_down=[64, 128, 256, 512], 
                                  filter_num_skip=[64, 64, 64], filter_num_aggregate=256, 
                                  stack_num_down=2, stack_num_up=1, activation='ReLU', output_activation='Sigmoid',
                                  batch_norm=False, pool=True, unpool=False, deep_supervision=True, name='unet3plus')

----------
deep_supervision = True
names of output tensors are listed as follows (the last one is the final output):
	unet3plus_output_sup0_activation
	unet3plus_output_sup1_activation
	unet3plus_output_final_activation


**Example 6**: R2U-net for binary classification with:

1. Four down- and upsampling levels.

2. Two recurrent convolutional layers per downsampling level and one per upsampling level (`stack_num_down=2`, `stack_num_up=1`), each with two iterations (`recur_num=2`).

3. ReLU activation, Softmax output activation, batch normalization (`batch_norm=True`).

4. Downsampling through max-pooling.

5. Upsampling through reflective padding.

In [10]:
r2_unet = models.r2_unet_2d((None, None, 3), [64, 128, 256, 512], n_labels=2,
                            stack_num_down=2, stack_num_up=1, recur_num=2,
                            activation='ReLU', output_activation='Softmax', 
                            batch_norm=True, pool=True, unpool=True, name='r2-unet')

**Example 7**: ResUnet-a for 16-label classification with:

1. Input size of (128, 128, 3).

2. Six downsampling levels followed by an Atrous Spatial Pyramid Pooling (ASPP) layer with 256 filters.

3. Six upsampling levels followed by an ASPP layer with 128 filters.

4. Dilation rates of {1, 3, 15, 31} for shallow layers, {1, 3, 15} for intermediate layers, and {1} for deep layers.

5. ReLU activation, Sigmoid output activation, batch normalization.

6. Upsampling through reflective padding.

* (Downsampling is fixed to strided convolutional layers.)

In [11]:
resunet_a = models.resunet_a_2d((128, 128, 3), [32, 64, 128, 256, 512, 1024], 
                                dilation_num=[1, 3, 15, 31], 
                                n_labels=16, aspp_num_down=256, aspp_num_up=128, 
                                activation='ReLU', output_activation='Sigmoid', 
                                batch_norm=True, unpool=True, name='resunet')

Received dilation rates: [1, 3, 15, 31]
Received dilation rates are not defined on a per downsampling level basis.
Automated determinations are applied with the following details:
	depth-0, dilation_rate = [1, 3, 15, 31]
	depth-1, dilation_rate = [1, 3, 15, 31]
	depth-2, dilation_rate = [1, 3, 15]
	depth-3, dilation_rate = [1, 3, 15]
	depth-4, dilation_rate = [1]
	depth-5, dilation_rate = [1]
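
The per-level rates printed above follow a pattern: the two shallowest levels keep all rates, the intermediate levels drop the largest rate, and the deepest levels use a rate of 1 only. The sketch below is a hypothetical rule that reproduces the printed result for six levels; it is not taken from the library source, and `expand_dilation` is an illustrative name.

```python
def expand_dilation(dilation_num, depth):
    """Hypothetical rule that reproduces the printed per-level rates for depth=6."""
    deep = (depth - 2) // 2
    per_level = []
    for level in range(depth):
        if level <= 1:
            per_level.append(list(dilation_num))       # shallow: all rates
        elif level <= deep + 1:
            per_level.append(list(dilation_num[:-1]))  # intermediate: drop the largest rate
        else:
            per_level.append([1])                      # deep: no dilation
    return per_level

for level, rates in enumerate(expand_dilation([1, 3, 15, 31], depth=6)):
    print("depth-{}, dilation_rate = {}".format(level, rates))
```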


* `dilation_num` can be specified per down- and upsampling level:

In [12]:
resunet_a = models.resunet_a_2d((128, 128, 3), [32, 64, 128, 256, 512, 1024], 
                                dilation_num=[[1, 3, 15, 31], [1, 3, 15, 31], [1, 3, 15], [1, 3, 15], [1,], [1,],],
                                n_labels=16, aspp_num_down=256, aspp_num_up=128, 
                                activation='ReLU', output_activation='Sigmoid', 
                                batch_norm=True, unpool=True, name='resunet')

**Example 8**: U^2-Net for binary classification with:

1. Six downsampling levels, with the first four built with RSU and the last two (one downsampling layer, one bottom layer) built with RSU-4F.
    * `filter_num_down=[64, 128, 256, 512]`
    * `filter_mid_num_down=[32, 32, 64, 128]`
    * `filter_4f_num=[512, 512]`
    * `filter_4f_mid_num=[256, 256]`

2. Six upsampling levels, with the deepest layer built with RSU-4F and the other four layers built with RSU.
    * `filter_num_up=[64, 64, 128, 256]`
    * `filter_mid_num_up=[16, 32, 64, 128]`

3. ReLU activation, Sigmoid output activation, batch normalization.

4. Deep supervision.

5. Downsampling through strided convolutional layers.

6. Upsampling through transpose convolutional layers.

* In the original work of U^2-Net, down- and upsampling were achieved through max-pooling (`pool=True`) and bilinear interpolation (`unpool=True`).

In [13]:
u2net = models.u2net_2d((None, None, 3), n_labels=2, 
                        filter_num_down=[64, 128, 256, 512], filter_num_up=[64, 64, 128, 256], 
                        filter_mid_num_down=[32, 32, 64, 128], filter_mid_num_up=[16, 32, 64, 128], 
                        filter_4f_num=[512, 512], filter_4f_mid_num=[256, 256], 
                        activation='ReLU', output_activation='Sigmoid', 
                        batch_norm=True, pool=False, unpool=False, deep_supervision=True, name='u2net')

----------
The depth of u2net_2d = len(filter_num_down) + len(filter_4f_num) = 6
----------
deep_supervision = True
names of output tensors are listed as follows (the last one is the final output):
	u2net_output_sup0_activation
	u2net_output_sup1_activation
	u2net_output_sup2_activation
	u2net_output_sup3_activation
	u2net_output_sup4_activation
	u2net_output_sup5_activation
	u2net_output_final_activation


* `u2net_2d` supports automated determination of filter numbers per down- and upsampling level. Auto-mode may produce a slightly larger network.

In [14]:
u2net = models.u2net_2d((None, None, 3), n_labels=2, 
                        filter_num_down=[64, 128, 256, 512],
                        activation='ReLU', output_activation='Sigmoid', 
                        batch_norm=True, deep_supervision=True, name='u2net')

Automated hyper-parameter determination is applied with the following details:
----------
	Number of RSU output channels within downsampling blocks: filter_num_down = [64, 128, 256, 512]
	Number of RSU intermediate channels within downsampling blocks: filter_mid_num_down = [16, 32, 64, 128]
	Number of RSU output channels within upsampling blocks: filter_num_up = [64, 128, 256, 512]
	Number of RSU intermediate channels within upsampling blocks: filter_mid_num_up = [16, 32, 64, 128]
	Number of RSU-4F output channels within downsampling and bottom blocks: filter_4f_num = [512, 512]
	Number of RSU-4F intermediate channels within downsampling and bottom blocks: filter_4f_num = [256, 256]
----------
Explicitly specifying keywords listed above if their "auto" settings do not satisfy your needs
----------
The depth of u2net_2d = len(filter_num_down) + len(filter_4f_num) = 6
----------
deep_supervision = True
names of output tensors are listed as follows (the last one is the final output):
	u2n