About input sizes
To ensure correct semantic concatenations, choose an input size that yields even spatial dimensions in every encoder block except the last. For example, for a U-Net with depth=4, an input size of 120² gives intermediate output shapes of [60², 30², 15²] in the encoder path. A U-Net with depth=5 is not recommended for the same input size, because it would apply a max pooling operation to odd spatial dimensions (here, the 15² feature map), which should be avoided.
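To see why odd dimensions cause trouble, the sketch below traces the sizes involved in each skip connection. It is only an illustration under assumptions that are not stated in the text above: size-preserving convolutions, 2x2 max pooling with stride 2 that floors odd sizes, and upsampling by a factor of 2 in the decoder. The function name `skip_connection_sizes` is hypothetical.

```python
def skip_connection_sizes(input_size, depth):
    """Return (encoder_size, upsampled_size) for every skip connection.

    Assumptions (not taken from a specific implementation): convolutions
    keep the spatial size, pooling halves it with floor division, and the
    decoder upsamples by a factor of 2.
    """
    # Encoder path: `depth` blocks, with pooling after every block but the last.
    encoder = [input_size]
    for _ in range(depth - 1):
        encoder.append(encoder[-1] // 2)  # an odd size loses one pixel here

    # At each level the decoder upsamples the map coming from the level below
    # and concatenates it with the encoder map of the current level.
    return [(encoder[i], 2 * encoder[i + 1]) for i in range(depth - 1)]


for depth in (4, 5):
    pairs = skip_connection_sizes(120, depth)
    mismatches = [(skip, up) for skip, up in pairs if skip != up]
    print(f"depth={depth}: mismatches={mismatches}")
# depth=4: mismatches=[]
# depth=5: mismatches=[(15, 14)]
```

With depth=5, the 15² encoder map would have to be concatenated with a 14² upsampled map, which is exactly the mismatch the rule above avoids.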

To make our lives easier, we can numerically compute the maximum network depth for a given input dimension with a simple function:

In [1]:
shape = 1920


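def compute_max_depth(shape, max_depth=10, print_out=True):
    """Return [shape, shape / 2, shape / 4, ...] for as long as halving is exact."""
    shapes = [shape]
    for level in range(1, max_depth):
        # Continue only while `shape` is divisible by 2**level and the
        # downsampled size is still larger than 1.
        if shape % 2 ** level == 0 and shape / 2 ** level > 1:
            # True division is used, so the downsampled sizes are floats.
            shapes.append(shape / 2 ** level)
            if print_out:
                print(f'Level {level}: {shape / 2 ** level}')
        else:
            if print_out:
                print(f'Max-level: {level - 1}')
            break

    return shapes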


out = compute_max_depth(shape, print_out=True, max_depth=10)

Level 1: 960.0
Level 2: 480.0
Level 3: 240.0
Level 4: 120.0
Level 5: 60.0
Level 6: 30.0
Level 7: 15.0
Max-level: 7


which tells us that an input size of 1920 can be halved 7 times, so we can design a U-Net with up to 7 downsampling levels without having to worry about semantic mismatches. Conversely, we can also numerically determine the possible input shapes for a given depth:

In [14]:
low = 10
high = 1024
depth = 5


def compute_possible_shapes(low, high, depth, print_out=True):
    """Map each input size in [low, high] that supports `depth` levels to its shapes."""
    possible_shapes = {}
    for shape in range(low, high + 1):
        shapes = compute_max_depth(shape,
                                   max_depth=depth,
                                   print_out=False)
        # Keep the size only if all `depth` levels were reached.
        if len(shapes) == depth:
            possible_shapes[shape] = shapes
    if print_out:
        print(possible_shapes)
    return possible_shapes


possible_shapes = compute_possible_shapes(low, high, depth)


{32: [32, 16.0, 8.0, 4.0, 2.0], 48: [48, 24.0, 12.0, 6.0, 3.0], 64: [64, 32.0, 16.0, 8.0, 4.0], 80: [80, 40.0, 20.0, 10.0, 5.0], 96: [96, 48.0, 24.0, 12.0, 6.0], 112: [112, 56.0, 28.0, 14.0, 7.0], 128: [128, 64.0, 32.0, 16.0, 8.0], 144: [144, 72.0, 36.0, 18.0, 9.0], 160: [160, 80.0, 40.0, 20.0, 10.0], 176: [176, 88.0, 44.0, 22.0, 11.0], 192: [192, 96.0, 48.0, 24.0, 12.0], 208: [208, 104.0, 52.0, 26.0, 13.0], 224: [224, 112.0, 56.0, 28.0, 14.0], 240: [240, 120.0, 60.0, 30.0, 15.0], 256: [256, 128.0, 64.0, 32.0, 16.0], 272: [272, 136.0, 68.0, 34.0, 17.0], 288: [288, 144.0, 72.0, 36.0, 18.0], 304: [304, 152.0, 76.0, 38.0, 19.0], 320: [320, 160.0, 80.0, 40.0, 20.0], 336: [336, 168.0, 84.0, 42.0, 21.0], 352: [352, 176.0, 88.0, 44.0, 22.0], 368: [368, 184.0, 92.0, 46.0, 23.0], 384: [384, 192.0, 96.0, 48.0, 24.0], 400: [400, 200.0, 100.0, 50.0, 25.0], 416: [416, 208.0, 104.0, 52.0, 26.0], 432: [432, 216.0, 108.0, 54.0, 27.0], 448: [448, 224.0, 112.0, 56.0, 28.0], 464: [464, 232.0, 116.0, 58.0

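The output lists every input size between low and high that supports the requested depth; for depth=5 these are the multiples of 16 from 32 upward. Deeper networks leave far fewer options: with eight halvings (depth=9 in the function above), only 3 input shapes in this range qualify (512, 768 and 1024). But I dare say that such a network with these input sizes is probably not useful in practice.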
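The brute-force search can also be cross-checked without a loop over every candidate. Under the same validity rule as `compute_max_depth` (the size must be divisible by 2**(depth - 1) and must still be larger than 1 after the last halving), the valid sizes are the multiples of 2**(depth - 1) starting at 2**depth. The helper below is a minimal sketch of that observation; its name, `valid_input_sizes`, is hypothetical.

```python
def valid_input_sizes(low, high, depth):
    """List the input sizes in [low, high] that can be halved depth - 1 times.

    Mirrors the rule used by compute_max_depth: exact division by
    2**(depth - 1) with a result greater than 1.
    """
    step = 2 ** (depth - 1)
    # Smallest multiple of `step` that is >= low (ceiling division).
    first_multiple = -(-low // step) * step
    # The result of the last halving must exceed 1, so start at 2 * step.
    start = max(2 * step, first_multiple)
    return list(range(start, high + 1, step))


sizes = valid_input_sizes(10, 1024, 5)
print(len(sizes), sizes[:3], sizes[-1])
# 63 [32, 48, 64] 1024
```

The first entries agree with the keys printed by `compute_possible_shapes` for depth=5, and the same call with depth=9 returns only three sizes.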