<a href="https://colab.research.google.com/github/tesfayeamare/Unsupervised-Domain-Adaptation/blob/main/domain_adaptation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Unsupervised domain adaptation

Domain adaptation is the process of adapting a model trained on one domain, called the source domain, to perform well on another domain, called the target domain.

## Formal definition

We are given a source dataset $\mathcal S = \{X_i^S,y_i^S\}_{i=1}^{N_S}$ of images and associated labels, and an unlabelled target dataset $\mathcal T = \{X_i^T\}_{i=1}^{N_T}$, where $X_i\in\mathcal X$, $y_i\in\mathcal Y$, and $\mathcal Y=\{1,2,\dots,K\}$, with $K$ the number of object categories.

The task is to learn a function $F_\theta:\mathcal X\to\mathcal Y$ with parameters $\theta$ that maps an input image $X$ to a class label $y$ and performs well on target data.



In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import math

from os.path import basename

from datetime import datetime

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data as D

from torch.autograd import Function
from torch.utils.tensorboard import SummaryWriter

import torchvision
import torchvision.models as M
import torchvision.transforms as T

from torchvision import datasets

In [None]:
%load_ext tensorboard

In [None]:
%tensorboard --logdir /content/drive/MyDrive/DeepLearning/runs

## Dataset

In this assignment we use a subset of the [Adaptiope](https://openaccess.thecvf.com/content/WACV2021/html/Ringwald_Adaptiope_A_Modern_Benchmark_for_Unsupervised_Domain_Adaptation_WACV_2021_paper.html) object recognition dataset. It follows the common 80%/20% training/test split.

The categories used are:

* backpack,
* bookcase,
* car jack,
* comb,
* crown,
* file cabinet,
* flat iron,
* game controller,
* glasses,
* helicopter,
* ice skates,
* letter tray,
* monitor,
* mug,
* network switch,
* over-ear headphones,
* pen,
* purse,
* stand mirror, and
* stroller.

The domains used are *product images* and *real life*; evaluation is performed in both directions between the domains.

In [None]:
_root_path = '/content/drive/MyDrive/DeepLearning/Adaptiope'

dataset_paths = [
    (f'{_root_path}/product_images', f'{_root_path}/real_life'),
    (f'{_root_path}/real_life', f'{_root_path}/product_images')
]

## Helpers

This section includes helper functions useful for training and evaluation.

### Random transform

Random transformations of images can improve the generalization ability of the model. By randomly transforming the training data, the model is exposed to a wider variety of input variations, which can help it learn more robust features that are not sensitive to small changes in the data. This is especially true for domain adaptation, where the model must generalize better than usual to perform well on the target dataset.

Transformations include:

- random crop,
- horizontal flip,
- [automatic augments](https://pytorch.org/vision/stable/auto_examples/plot_transforms.html#augmix),
- Gaussian blur,
- [color jitter](https://pytorch.org/vision/stable/auto_examples/plot_transforms.html#colorjitter), and
- grayscale.

Apart from the initial resize, they are not applied to every image: each has a predefined chance of happening.

#### Examples

Some examples of color jitter and automatic augments:

![Color jitter](https://pytorch.org/vision/stable/_images/sphx_glr_plot_transforms_006.png)

![Automatic augments](https://pytorch.org/vision/stable/_images/sphx_glr_plot_transforms_023.png)

### Accuracy

As the evaluation metric, we use the validation accuracy, i.e. the fraction of correctly classified samples.

$$
\text{Accuracy} = \frac{\mathit{TP} + \mathit{TN}}{\mathit{TP} + \mathit{TN} + \mathit{FP} + \mathit{FN} }
$$
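
In the multi-class setting used here, accuracy reduces to the share of samples whose highest-scoring class matches the label. A minimal pure-Python sketch of that computation follows; the function name `accuracy_from_scores` and the toy scores are illustrative and not part of the notebook.

```python
def accuracy_from_scores(scores, targets):
    """Fraction of rows whose highest-scoring class equals the target label."""
    correct = 0
    for row, target in zip(scores, targets):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        correct += int(predicted == target)
    return correct / len(targets)

# toy example: 4 samples, 3 classes
scores = [
    [2.0, 0.1, 0.3],  # predicted class 0
    [0.2, 1.5, 0.1],  # predicted class 1
    [0.9, 0.8, 0.1],  # predicted class 0
    [0.1, 0.2, 3.0],  # predicted class 2
]
targets = [0, 1, 1, 2]

print(accuracy_from_scores(scores, targets))  # 0.75 (3 of 4 correct)
```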

In [None]:
def batch_to_cuda(dataLoader, device='cuda:0'):
  '''
  Move a batch from a data loader to cuda device.
  '''
  for batch in dataLoader:
      for i, t in enumerate(batch):
          batch[i] = t.to(device)
      yield batch

def calc_split(dataset, split):
  '''
  Used to compute the split of a dataset by giving its percentage. 
  '''
  assert math.isclose(split[0] + split[1], 1), 'Sum of the split must be 1'

  l = len(dataset)
  # derive the second length from the first so the two always sum to len(dataset)
  first = math.floor(l * split[0])
  return first, l - first

def get_randomTransforms(chance=0.2):
  '''
  Return a composition of random transformation with custom chance. If zero
  return an empty transformation.
  '''
  assert chance >= 0 and chance <= 1, 'Chance must be between 0 and 1'

  if chance == 0:
    return T.Compose([])

  return T.Compose([
    T.Resize(400),
    T.RandomApply([T.RandomCrop(300)], p=chance),
    T.RandomHorizontalFlip(p=0.5),
    T.RandomApply([T.AugMix()], p=chance),
    T.RandomApply([T.GaussianBlur(kernel_size=(5, 9), sigma=(0.1, 5))], p=chance),
    T.RandomApply([T.ColorJitter(brightness=.5, hue=.3)], p=chance),
    T.RandomApply([T.Grayscale(3)], p=chance),
  ])

def get_dataLoaders(batch_size, path_source, path_target, source_transform, target_transform):
  '''
  Given the source and target paths (and their respective transforms), return
  data loaders for the source set, the target training split (80%), and the
  target test split (20%).
  '''

  ds_source = torchvision.datasets.ImageFolder(
    root=path_source, transform=source_transform)
  ds_target = torchvision.datasets.ImageFolder(
    root=path_target, transform=target_transform)
  
  ds_target_train, ds_target_test = D.random_split(
    ds_target, calc_split(ds_target, [0.8, 0.2]))

  return (
    D.DataLoader(
      ds_source, batch_size=batch_size, shuffle=True, num_workers=2),
    D.DataLoader(
      ds_target_train, batch_size=batch_size, shuffle=True, num_workers=2),
    D.DataLoader(
      ds_target_test, batch_size=batch_size, num_workers=2)
  )

def get_accuracy(outputs, targets):
  return (outputs.max(dim=1)[1].eq(targets).sum() / outputs.size(dim=0)).item()
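
The 80%/20% split of the target dataset needs two integer lengths that add up exactly to the dataset size. A small sketch of that arithmetic, assuming a hypothetical helper name `split_lengths` and illustrative dataset sizes:

```python
import math

def split_lengths(n_items, train_fraction=0.8):
    """Return (train, test) lengths that always add up to n_items."""
    n_train = math.floor(n_items * train_fraction)
    return n_train, n_items - n_train

print(split_lengths(2000))  # (1600, 400)
print(split_lengths(1999))  # (1599, 400)
```

Computing the second length as a remainder avoids rounding both parts independently.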

## Model

The model used is an adversarial neural network inspired by [Unsupervised Domain Adaptation by Backpropagation](https://arxiv.org/abs/1409.7495). The architecture includes a deep feature extractor and a deep label predictor, which together form a standard feed-forward architecture. Unsupervised domain adaptation is then achieved by adding a *domain classifier* connected to the feature extractor via a *gradient reversal layer* that multiplies the gradient by a certain negative constant during backpropagation.

<center>
  <img width="500px" src="https://i.imgur.com/BwQZMXb.png"></img>
</center>



The focus is on learning features that combine:

*   discriminativeness, and
*   domain-invariance.

This is achieved by jointly optimizing the *label predictor*, which predicts the class labels, and the *domain classifier*, which discriminates between the source and target domains.

One important point of this idea is that the parameters of the underlying deep feature mapping are optimized in order to minimize the loss of the label classifier and to maximize the loss of the domain classifier. The latter encourages domain-invariant features to emerge in the course of the optimization.
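
This min-max idea can be sketched as a saddle-point problem in the notation of the cited paper. The notation is assumed here rather than defined in this notebook: $\theta_f$, $\theta_y$, $\theta_d$ are the parameters of the feature extractor, label predictor, and domain classifier; $L_y^i$ and $L_d^i$ are the label and domain losses on sample $i$; $d_i=0$ marks a source sample; and $N=N_S+N_T$. The entropy term that this notebook adds for target samples is left out.

$$
E(\theta_f,\theta_y,\theta_d) = \sum_{i:\,d_i=0} L_y^i(\theta_f,\theta_y) - \lambda \sum_{i=1}^{N} L_d^i(\theta_f,\theta_d)
$$

$$
(\hat\theta_f,\hat\theta_y) = \arg\min_{\theta_f,\theta_y} E(\theta_f,\theta_y,\hat\theta_d),
\qquad
\hat\theta_d = \arg\max_{\theta_d} E(\hat\theta_f,\hat\theta_y,\theta_d)
$$

The domain classifier minimizes its own loss (it maximizes $E$), while the feature extractor is pushed in the opposite direction on that same loss.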

In particular we wanted to test how it is possible to adapt the architecture proposed in the paper to different pre-trained networks.

### Gradient reversal

During forward propagation, the layer acts as an identity transform. During backpropagation, instead, it takes the gradient from the subsequent level (the domain classifier) and multiplies it by $-\lambda$. As a result, the feature extractor is updated to *maximize* the error of the domain classifier instead of minimizing it.

It is implemented by extending [`torch.autograd.Function`](https://pytorch.org/docs/stable/autograd.html).

#### Lambda

During training, the hyperparameter $\lambda$ controls the trade-off between the two objectives that shape the features (class classification and domain classification). Its value gradually changes from 0 to 1, which helps suppress the noisy signal from the domain classifier at the early stages of training.

It is computed in the following way: 

$$
\lambda_p = \frac{2}{1+\exp(-\gamma\cdot p)} -1.
$$

Here $\gamma$ is set to 10, and $p$ is the current training progress, going from 0 to 1.

![Lambda](https://i.imgur.com/HLJNMCG.png)
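
To get a feel for how quickly the schedule saturates, the formula can be evaluated at a few progress values. This is a stand-alone sketch; the function name `lambda_schedule` is illustrative.

```python
import math

def lambda_schedule(p, gamma=10.0):
    """Gradient reversal weight for training progress p in [0, 1]."""
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0

for p in (0.0, 0.1, 0.5, 1.0):
    print(f'p={p:.1f} -> lambda={lambda_schedule(p):.4f}')
```

The expression is algebraically equal to $\tanh(\gamma p / 2)$, so with $\gamma = 10$ the weight is already close to 1 halfway through training.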

In [None]:
class GradientReversal(Function):
  @staticmethod
  def forward(ctx, x, l):

    ctx.l = l
    
    return x

  @staticmethod
  def backward(ctx, grad_output):
    
    output =  - ctx.l * grad_output

    return output, None
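
A quick sanity check for a layer of this kind is to confirm that the forward pass leaves the values unchanged while the backward pass flips and scales the gradient. The sketch below re-declares a minimal reversal function under the hypothetical name `ReverseGrad` so that it runs on its own; the input values and the constant `0.5` are arbitrary.

```python
import torch
from torch.autograd import Function

class ReverseGrad(Function):
    """Identity in the forward pass, gradient times -lam in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # one gradient per forward input: x gets the reversed gradient, lam gets none
        return -ctx.lam * grad_output, None

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = ReverseGrad.apply(x, 0.5)
y.sum().backward()

print(y.detach())  # same values as x
print(x.grad)      # -0.5 for every element instead of +1
```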

### UDAB

We wanted to provide a framework that is as universal as possible. Hence the UDAB module takes as input the class classifier `y` and the domain classifier `d`. These are custom layers that should be designed with the architecture of the feature extractor in mind. For instance, ResNet works better with a flatter class classifier than AlexNet.

In [None]:
class UDAB(nn.Module):
  def __init__(self, y, d):
    super(UDAB, self).__init__()
    
    self.l = 1.0
    self.y = y
    self.d = d
  
  def forward(self, x):
    y = self.y(x)
    d = self.d(GradientReversal.apply(torch.clone(x), self.l))
    
    return y, d

## Training

#### Baseline

Train the model in a supervised way on the source domain, and evaluate it, as it is, on the target domain. This step is called the **source only version** or baseline. Note that for our model this is simply accomplished by ignoring the domain classifier branch and by feeding only source domain images and labels at training time.

#### Domain Adaptation

The next part, **domain adaptation**, enables the UDAB plugin by also training on the unlabelled target dataset. Hence the loss of the domain classifier branch is used here.

### Step-by-step

A summary of the training and evaluation process:

1. Train a baseline version of the model, also called source only, on the full source domain. Evaluate on the target domain.
2. Train a domain adaptation version of the model on the full source domain and on the images of the target domain, but not their labels. Evaluate on the target domain.
3. Compare the performance of the two versions.

Each step is repeated for every dataset (P to RW and RW to P) and for two different pre-trained networks: ResNet18 and EfficientNet.



### Optimizer

The `get_optimizer` function is used to change the learning rate of the Adam optimizer for specific layers of the model. It takes as input a model, a list of name prefixes of the fast layers, and two learning rates (`lr_fast` and `lr_slow`). The layers marked as fast are trained with the higher learning rate; these usually correspond to the fully connected classification layers at the end of the pre-trained model.
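
The underlying mechanism is PyTorch's per-parameter-group options: the optimizer accepts a list of dictionaries, each with its own `lr`. A minimal sketch on a toy model follows; the module names `backbone` and `head` and the layer sizes are hypothetical and only serve the illustration.

```python
import torch.nn as nn
import torch.optim as optim

# hypothetical two-part model: a "backbone" and a newly added "head"
model = nn.Sequential()
model.add_module('backbone', nn.Linear(8, 4))
model.add_module('head', nn.Linear(4, 2))

# route parameters to the fast or slow group by name prefix
fast, slow = [], []
for name, param in model.named_parameters():
    (fast if name.startswith('head.') else slow).append(param)

optimizer = optim.Adam([
    {'params': fast, 'lr': 1e-2},  # new layers learn quickly
    {'params': slow, 'lr': 1e-5},  # pre-trained layers are only fine-tuned
])

for group in optimizer.param_groups:
    print(len(group['params']), group['lr'])
```

Because matching is done on name prefixes, the prefix has to agree with the attribute name of the replaced head in the chosen network.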

In [None]:
def _startswith_all(name, values):
  '''
  Return True if name starts with at least one of the given prefixes.
  '''
  return any(name.startswith(value) for value in values)

def get_optimizer(model, fast_layers, lr_fast, lr_slow):
  layers = ([], [])
  
  print(f'Learning rates: fast {lr_fast}, slow {lr_slow}')

  for name, param in model.named_parameters():
    #print(name)
    if _startswith_all(name, fast_layers):
      print(f'Fast layer: {name}')
      layers[0].append(param) # fast
    else:
      layers[1].append(param) # slow
          
  print('Other layers will be set to slow')

  return optim.Adam([
    { 'params': layers[0], 'lr': lr_fast },
    { 'params': layers[1], 'lr': lr_slow } 
  ])

In [None]:
def _get_d_source(size, value):
  '''
  Return the domain target tensor: `value` repeated once per sample in a
  batch of the given size.
  '''
  return torch.tensor(value, dtype=torch.float).repeat(size, 1).cuda()

def _get_lambda(batch_idx, epoch_idx, batches, epochs):
  '''
  Lambda is computed given the current progression in the training phase.
  '''
  return 2 / (1 + math.exp(-10 * 
    ((batch_idx + epoch_idx * batches) / (epochs * batches)) )) - 1

def _step_baseline(
  model,
  dataLoader,
  loss_fn,
  optimizer,
  gradient_clipping_value=1,
):
  model.train()
  
  n = 0; sum_loss = 0; sum_accuracy = 0
  
  for x, y in batch_to_cuda(dataLoader):
    # forward step
    r, _ = model(x)
    # calculate loss
    loss = loss_fn(r, y)
    # calculate gradient
    loss.backward()
    # apply gradient clipping (if passed)
    if gradient_clipping_value is not None:
        torch.nn.utils.clip_grad_norm_(model.parameters(), gradient_clipping_value)
    # update parameters
    optimizer.step()
    # reset gradient
    optimizer.zero_grad()
    # compute statistics
    n += 1
    sum_loss += loss.item()
    sum_accuracy += get_accuracy(r, y)
  
  return sum_loss / n, sum_accuracy / n

def _step(
  model,
  epoch_idx,
  epochs,
  dl_source,
  dl_target,
  loss_fn_source_y,
  loss_fn_target_y,
  loss_fn_d,
  optimizer,
  gradient_clipping_value=1,
):
  model.train()
  # compute number of batches for lambda
  batches = min(len(dl_source), len(dl_target))
  
  n = 0; sum_loss = 0; sum_accuracy = 0
  
  for batch_idx, ((x_source, y_source), (x_target, _)) in enumerate(zip(dl_source, dl_target)):
    # calculate variable lambda
    model.l = _get_lambda(batch_idx, epoch_idx, batches, epochs)
    # train on source domain with labels
    x_source = x_source.cuda()
    y_source = y_source.cuda()
    r_source, d = model(x_source)
    # calculate losses
    loss_source_y = loss_fn_source_y(r_source, y_source)
    loss_source_d = loss_fn_d(d, _get_d_source(d.size(dim=0), [0, 1]))
    # train on target domain without labels
    x_target = x_target.cuda()
    r_target, d = model(x_target)
    # calculate loss
    loss_target_y = loss_fn_target_y(r_target)
    loss_target_d = loss_fn_d(d, _get_d_source(d.size(dim=0), [1, 0]))
    # sum losses
    loss = loss_source_y + loss_target_y + loss_source_d + loss_target_d
    # compute gradient
    loss.backward()
    # apply gradient clipping (if passed)
    if gradient_clipping_value is not None:
        torch.nn.utils.clip_grad_norm_(model.parameters(), gradient_clipping_value)
    # update parameters
    optimizer.step()
    # reset gradient
    optimizer.zero_grad()
    # compute statistics
    n += 1
    sum_loss += loss.item()
    sum_accuracy += get_accuracy(r_source, y_source)
  
  return sum_loss / n, sum_accuracy / n

def _eval(
  model,
  dataLoader,
  loss_fn,
):    
  model.eval()
  
  n = 0; sum_loss = 0; sum_accuracy = 0
  
  with torch.no_grad():
    for x, y in batch_to_cuda(dataLoader):
      # forward pass
      r, _ = model(x)
      # compute loss
      loss = loss_fn(r, y)
      # compute statistics
      n += 1
      sum_loss += loss.item()
      sum_accuracy += get_accuracy(r, y)
          
  return sum_loss / n, sum_accuracy / n

### Training parameters

#### Loss functions

Cross-entropy loss is used for class classification when the label is known (source samples) and for domain classification. For samples from the target dataset, whose labels are unknown, an entropy loss on the predicted class distribution is used instead. Minimizing it encourages the network to commit to a single class label for each target sample.
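
The effect of the entropy term can be seen on two toy predictions: an undecided one and a confident one. The sketch below uses plain Python instead of PyTorch so that it runs on its own; the helper names `softmax` and `entropy` and the logits are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0)

uniform = entropy([0.0, 0.0, 0.0, 0.0])  # undecided: all classes equally likely
peaked = entropy([10.0, 0.0, 0.0, 0.0])  # confident: one class dominates

print(f'undecided: {uniform:.4f}, confident: {peaked:.4f}')
```

The undecided prediction has the maximum entropy, $\log K$ for $K$ classes, while the confident one is close to zero, so minimizing the entropy pushes predictions on target images toward a single class.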

In [None]:
def entropyLoss(x):
  p = F.softmax(x, dim=1)
  q = F.log_softmax(x, dim=1)
  return -1. * (p * q).sum(-1).mean()

In [None]:
# class losses
loss_fn_source_y = nn.CrossEntropyLoss()
loss_fn_target_y = entropyLoss

# domain loss
loss_fn_d = nn.CrossEntropyLoss()

#### Epochs

Each experiment runs for 100 epochs.

In [None]:
epochs = 100

#### Learning rates

Learning rate `lr_slow` is for the feature extractor part of the pre-trained network, while `lr_fast` is for the class classification and domain classification parts.

In [None]:
lr_fast = 1e-2
lr_slow = 1e-5

### Baseline

A helper function `training_loop_baseline` is invoked every time a new experiment is performed. It trains the network on both datasets and saves the results to TensorBoard.

In [None]:
def training_loop_baseline(
  model,
  model_name,
  batch_size,
  optimizer_fast_layers,
  transforms,
  chance=0.2
):
  
  for idx, (source, target) in enumerate(dataset_paths):
    print(f'Dataset n. {idx+1}\nBatch size: {batch_size}\nModel: {model_name}')

    start = datetime.now()
    
    dl_source, dl_target, _ = get_dataLoaders(
        batch_size,
        source, target,
        source_transform=T.Compose([get_randomTransforms(chance), transforms]),
        target_transform=transforms)
    
    optimizer = get_optimizer(model, optimizer_fast_layers, lr_fast, lr_slow)

    writer = SummaryWriter(f'/content/drive/MyDrive/DeepLearning/runs/' + 
                            f'{model_name}_dataset-{idx+1}_baseline')

    for epoch_idx in range(epochs):

      print(f'\rEpoch n. {epoch_idx+1:03}/{epochs:03}..', end='')

      loss, accuracy = _step_baseline(
          model,
          dl_source,
          loss_fn_source_y,
          optimizer,
          1)

      writer.add_scalar('training/loss', loss, epoch_idx)
      writer.add_scalar('training/accuracy', accuracy, epoch_idx)

      loss, accuracy = _eval(
          model,
          dl_target,
          loss_fn_source_y)

      writer.add_scalar('validation/loss', loss, epoch_idx)
      writer.add_scalar('validation/accuracy', accuracy, epoch_idx)

    print(f'\rTraining completed in {datetime.now() - start}')

#### ResNet18

The architecture of the UDAB plugin is inspired by the fully connected layer of ResNet18: a single flat linear layer.

```
ResNet18.fc = Linear(in_features=512, out_features=1000, bias=True)
```

In [None]:
model = M.resnet18(weights=M.ResNet18_Weights.DEFAULT)

in_features = model.fc.in_features; out_features=20

UDAB_ResNet18 = UDAB(
    y=nn.Sequential(
      nn.Linear(in_features, out_features)
    ),
    d=nn.Sequential(
      nn.Linear(in_features, 2)
    )
)

model.fc = UDAB_ResNet18

model.cuda()

training_loop_baseline(
    model,
    'UDAB_ResNet18_chance-0.0',
    64,
    ['fc.'],
    M.ResNet18_Weights.DEFAULT.transforms(),
    chance=0
)

#### EfficientNet

Likewise, the architecture of EfficientNet's UDAB plugin is inspired by the classifier layer of EfficientNet.

```
EfficientNet.classifier = Sequential(
  (0): Dropout(p=0.2, inplace=True)
  (1): Linear(in_features=1280, out_features=1000, bias=True)
)
```

In this way we do not change the original network concept but just adapt the output layers to the new classification problem. This also makes it possible to plug in the domain classification branch.

In [None]:
model = M.efficientnet_b0(weights=M.EfficientNet_B0_Weights.DEFAULT)

# classifier is Sequential(Dropout, Linear): read the input size from the Linear layer
in_features = model.classifier[1].in_features; out_features=20

UDAB_EfficientNet = UDAB(
    y=nn.Sequential(
      nn.Dropout(0.2),
      nn.Linear(in_features, out_features),
    ),
    d=nn.Sequential(
      nn.Dropout(0.2),
      nn.Linear(in_features, 2)
    )
)

model.classifier = UDAB_EfficientNet

model.cuda()

training_loop_baseline(
    model,
    'UDAB_EfficientNet',
    64,
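    # EfficientNet's head is named 'classifier', so the 'fc.' prefix matches no
    # parameter and every layer is trained with lr_slow (the log below lists no
    # fast layer); pass ['classifier.'] to train the new heads with lr_fast
    ['fc.'],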
    M.EfficientNet_B0_Weights.DEFAULT.transforms()
)

Dataset n. 1
Batch size: 64
Model: EfficientNet-UDAB1
Learning rates: fast 0.01, slow 1e-05
Other layers will be set to slow
Training completed in 2:19:58.966247
Dataset n. 2
Batch size: 64
Model: EfficientNet-UDAB1
Learning rates: fast 0.01, slow 1e-05
Other layers will be set to slow
Training completed in 2:26:30.250352


### Domain adaptation

A helper function `training_loop_domainAdaptation` is invoked every time a new experiment is performed. This function trains the network on both datasets and saves the results to TensorBoard.

In [None]:
def training_loop_domainAdaptation(
  model,
  model_name,
  batch_size,
  optimizer_fast_layers,
  transforms,
):
    
  for idx, (source, target) in enumerate(dataset_paths):
    print(f'Dataset n. {idx+1}\nModel: {model_name}')

    start = datetime.now()
    
    dl_source, dl_target_train, dl_target_test = get_dataLoaders(
      batch_size, source, target,
      source_transform=T.Compose([get_randomTransforms(), transforms]),
      target_transform=transforms)
    
    optimizer = get_optimizer(model, optimizer_fast_layers, lr_fast, lr_slow)

    writer = SummaryWriter(f'/content/drive/MyDrive/DeepLearning/runs/' + 
                            f'{model_name}_dataset-{idx+1}_domainAdapatation')

    for epoch_idx in range(epochs):

      print(f'\rEpoch n. {epoch_idx+1:03}/{epochs:03}..', end='')

      loss, accuracy = _step(
        model,
        epoch_idx,
        epochs,
        dl_source,
        dl_target_train,
        loss_fn_source_y,
        loss_fn_target_y,
        loss_fn_d,
        optimizer)

      writer.add_scalar('training/loss', loss, epoch_idx)
      writer.add_scalar('training/accuracy', accuracy, epoch_idx)

      loss, accuracy = _eval(
        model,
        dl_target_test,
        loss_fn_source_y)

      writer.add_scalar('validation/loss', loss, epoch_idx)
      writer.add_scalar('validation/accuracy', accuracy, epoch_idx)

    print(f'\rTraining completed in {datetime.now() - start}')

#### ResNet18

In [None]:
model = M.resnet18(weights=M.ResNet18_Weights.DEFAULT)

in_features = model.fc.in_features; out_features=20

UDAB_ResNet18 = UDAB(
    y=nn.Sequential(
      nn.Linear(in_features, out_features)
    ),
    d=nn.Sequential(
      nn.Linear(in_features, 2)
    )
)

model.fc = UDAB_ResNet18

model.cuda()

training_loop_domainAdaptation(
    model,
    'UDAB_ResNet18',
    64,
    ['fc.'],
    M.ResNet18_Weights.DEFAULT.transforms()
)

Dataset n. 1
Model: resnet18-UDAB1
Learning rates: fast 0.01, slow 1e-05
Fast layer: fc.y.0.weight
Fast layer: fc.y.0.bias
Fast layer: fc.y.1.weight
Fast layer: fc.y.1.bias
Fast layer: fc.y.3.weight
Fast layer: fc.y.3.bias
Fast layer: fc.y.4.weight
Fast layer: fc.y.4.bias
Fast layer: fc.d.0.weight
Fast layer: fc.d.0.bias
Other layers will be set to slow
Training completed in 1:45:35.919719
Dataset n. 2
Model: resnet18-UDAB1
Learning rates: fast 0.01, slow 1e-05
Fast layer: fc.y.0.weight
Fast layer: fc.y.0.bias
Fast layer: fc.y.1.weight
Fast layer: fc.y.1.bias
Fast layer: fc.y.3.weight
Fast layer: fc.y.3.bias
Fast layer: fc.y.4.weight
Fast layer: fc.y.4.bias
Fast layer: fc.d.0.weight
Fast layer: fc.d.0.bias
Other layers will be set to slow
Training completed in 1:57:46.764686


#### EfficientNet 

In [None]:
model = M.efficientnet_b0(weights=M.EfficientNet_B0_Weights.DEFAULT)

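# EfficientNet has no `fc` attribute: read the input size from the classifier's Linear layer
in_features = model.classifier[1].in_features; out_features=20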

UDAB_EfficientNet = UDAB(
    y=nn.Sequential(
      nn.Linear(in_features, 256),
      nn.BatchNorm1d(256),
      nn.Dropout(0.2),
      nn.Linear(256 , 128),
      nn.Linear(128 , out_features)
    ),
    d=nn.Sequential(
      nn.Linear(in_features, 2)
    )
)

model.classifier = UDAB_EfficientNet

model.cuda()

training_loop_domainAdaptation(
    model,
    'EfficientNet-UDAB',
    64,
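    # EfficientNet's head is named 'classifier', so the 'fc.' prefix matches no
    # parameter and every layer is trained with lr_slow (the log below lists no
    # fast layer); pass ['classifier.'] to train the new heads with lr_fast
    ['fc.'],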
    M.EfficientNet_B0_Weights.DEFAULT.transforms()
)

Dataset n. 1
Model: EfficientNet-UDAB1
Learning rates: fast 0.01, slow 1e-05
Other layers will be set to slow
Training completed in 2:00:53.955944
Dataset n. 2
Model: EfficientNet-UDAB1
Learning rates: fast 0.01, slow 1e-05
Other layers will be set to slow
Training completed in 2:05:08.022986


## Result

The following plots show the results of training ResNet18 and EfficientNet with the UDAB plugin, as described above, on both dataset 1 (product to real world) and dataset 2 (real world to product).

![](https://i.imgur.com/P7IwBr2.png)

![](https://i.imgur.com/2SQgxik.png)

![](https://i.imgur.com/K0xuOiI.png)

![](https://i.imgur.com/xeNAqGE.png)


Note how dataset-1 performs worse because of the intrinsic characteristics of the dataset: it is much easier to learn to predict product images from training on real-world objects than the opposite.

### Conclusion

Table with validation accuracy at 100 epochs:

| Model        | Dataset | Baseline | Domain adaptation | Gain |
|--------------|---------|----------|-------------------|------|
| ResNet18     | P to RW | 0.74     | 0.83              | 0.09 |
| ResNet18     | RW to P | 0.96     | 0.99              | 0.03 |
| EfficientNet | P to RW | 0.76     | 0.85              | 0.09 |
| EfficientNet | RW to P | 0.98     | 0.99              | 0.01 |
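
The gain column is simply the difference between the two accuracy columns. A short sketch that recomputes it from the values in the table (the dictionary layout is just one convenient way to hold them):

```python
# (baseline, domain adaptation) validation accuracy, copied from the table above
results = {
    ('ResNet18', 'P to RW'): (0.74, 0.83),
    ('ResNet18', 'RW to P'): (0.96, 0.99),
    ('EfficientNet', 'P to RW'): (0.76, 0.85),
    ('EfficientNet', 'RW to P'): (0.98, 0.99),
}

# absolute gain in accuracy, rounded to the precision of the table
gains = {
    key: round(adapted - baseline, 2)
    for key, (baseline, adapted) in results.items()
}

for (model_name, direction), gain in gains.items():
    print(f'{model_name:12} {direction}: {gain:.2f}')
```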

EfficientNet has higher baseline values, and therefore it also achieves higher values after domain adaptation. So if accuracy is the most important factor, EfficientNet is the best model. However, the difference from ResNet18 is very small, and on the more challenging dataset ResNet18 learns much faster. For instance, at 20 epochs of domain adaptation on dataset-1, ResNet18 reaches 0.84 accuracy versus 0.77 for EfficientNet (see figure 1 vs. figure 3).