# Reproducibility Project: WAMDA: Weighted Alignment of Sources for Multi-source Domain Adaptation

### Group 10 Jiaming Xu (5247578) and Siwei Wang (5239982)

## Introduction
One of the learning goals of the Deep Learning course is to reproduce a paper assigned by the course instructor. This blog describes our reproduction of the paper *WAMDA: Weighted Alignment of Sources for Multi-source Domain Adaptation* by Surbhi Aggarwal, Jogendra Nath Kundu, R. Venkatesh Babu, and Anirban Chakraborty. Our reproduction is based on the original paper, on other papers related to multi-source domain adaptation, and on online resources.

The paper presents a novel method for multi-source domain adaptation (MSDA), named WAMDA, which uses multiple sources to train a predictor based on their relevance to one another and their relevance scores with respect to the target. Our work reproduces the proposed approach on a single dataset, the **Office-Home dataset**, together with two other methods (ResNet and MFSAN) that are used to evaluate the effectiveness of the proposed method.

## Model structure

WAMDA is a method that can do effective multi-source domain adaptation based on the source-target and source-source similarities. The following gives information about the basic structure.

The proposed algorithm is divided into two stages. The first stage is *pre-adaptation training*, from which we obtain the relevance scores, feature extractor, source classifier, and domain classifier. The second stage is *multi-source adaptation training*, in which the weighted alignment of domains is performed and the classifiers and a target encoder are learned on this weighted aligned space. The basic model structure is shown below:

![avatar](imgs/process.png)


## Experiment Setup

The original paper ran experiments on three datasets (*Office-31*, *Office-Caltech*, and *Office-Home*). In addition, it uses four types of baselines (including *No Adapt*, *Single-Source Best*, and *Multi-Source*) to analyze the performance of MSDA methods. In our experiments, we only used the *Office-Home* dataset and implemented only two types of baselines: (1) *No Adapt*: ResNet and the proposed method; (2) *Multi-Source*: MFSAN and the proposed method. Last but not least, the implementation steps follow the original paper and are described in the next section. Due to time limitations, we only had to implement the first and third rows of Table 5 for **OfficeHomeDataset**.

## Our implementation
All the experiments were run on Google Colab, and the framework we used is PyTorch. The method is implemented in the following steps. First, the dataset was downloaded and converted into the required format. Then, we trained feature extractors, source classifiers, and a domain classifier on the datasets. After that, we extracted the relevance scores from the previous step; these scores are used in the following steps. Finally, we trained the weighted aligned source encoders, the target classifiers, and a target encoder.

The following libraries are used in our experiments.

In [None]:
import copy
import os
import time
from random import sample

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from PIL import Image
from torch.autograd import Variable
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.sampler import SubsetRandomSampler
from torchvision import datasets, transforms  # datasets.ImageFolder is used by the class-classification loaders
from tqdm import tqdm

## Dataset

We upload the *Office-Home* dataset to Google Drive and decompress it. The Office-Home dataset was created to evaluate domain adaptation algorithms for object recognition using deep learning. It consists of images from 4 different domains: Artistic images, Clip Art, Product images, and Real-World images. For each domain, the dataset contains images of 65 object categories typically found in office and home settings. Some examples from the dataset are shown below:

![avatar](imgs/dataset.jpg)
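
In the multi-source setting, each domain is used in turn as the target while the other three serve as sources. The short sketch below enumerates these four tasks. The helper name `leave_one_out_tasks` and the initial-letter naming are illustrative assumptions, chosen to match the column names of the results table (e.g. `CPR->A`).

```python
# Illustrative sketch: enumerate the leave-one-domain-out tasks on Office-Home.
DOMAINS = ["Art", "Clipart", "Product", "Real World"]

def leave_one_out_tasks(domains):
    """Return (task_name, sources, target) with each domain used once as target."""
    tasks = []
    for target in domains:
        sources = [d for d in domains if d != target]
        # Task names use the initial letter of each domain, e.g. "CPR->A".
        name = "".join(s[0] for s in sources) + "->" + target[0]
        tasks.append((name, sources, target))
    return tasks

for name, sources, target in leave_one_out_tasks(DOMAINS):
    print(f"{name}: sources={sources}, target={target}")
```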

Then, we implement a class that loads the data and assigns each image a domain label. The class also provides one-hot, transform, and balance options, which can be set according to the requirements of different models. The following code shows the process:

In [None]:
class OfficeHomeDataset(Dataset):
    def __init__(self, data_path, domain="Real World", balance=False, one_hot=False, transform=None):
        self.transform = transform
        self.domain = domain
        self.balance = balance
        self.one_hot = one_hot

        # label dict
        self.label_dict = {"Art": 0, "Clipart":1, "Product":2, 'Real World': 3}

        # Read all file names
        self.file_names = []
        if self.domain is None:
            self.n_classes = 3
            for root, dirs, files in os.walk(data_path):
                for filename in files:
                    if filename == ".DS_Store": continue
                    elif os.path.splitext(filename)[-1] == ".txt": continue
                    self.file_names.append(os.path.join(root, filename))
        else:
            self.n_classes = 2
            domain_file = []
            source_file = []
            for root, dirs, files in os.walk(data_path):
                if self.domain in root:
                    for filename in files:
                        if filename == ".DS_Store": continue
                        elif os.path.splitext(filename)[-1] == ".txt": continue
                        domain_file.append(os.path.join(root, filename))
                else:
                    for filename in files:
                        if filename == ".DS_Store": continue
                        elif os.path.splitext(filename)[-1] == ".txt": continue
                        source_file.append(os.path.join(root, filename))
            if balance:
                self.file_names = domain_file + sample(source_file, len(domain_file))
            else:
                self.file_names = domain_file + source_file
        
        print(len(self.file_names))
        # self.file_names = sample(self.file_names, 200)

    def __len__(self):
        return len(self.file_names)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        label = []
        filename = self.file_names[idx]
        img = Image.open(filename)
        if self.transform:
            img = self.transform(img)
        # print(img.shape, filename)
        source_name = filename.split('/')[-3]
        if self.domain is None:
            label.append(self.label_dict[source_name])
        else:
            if source_name == self.domain:
                label.append(1)
            else: label.append(0)
        if self.one_hot:
            label = np.array(label)
            label = np.eye(self.n_classes)[label]
            label = np.float32(label)
        else:
            label = np.array(label)
        # sample = {'image': img, 'label': label}
        sample = [img, label]

        return sample
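
The `one_hot` option relies on indexing an identity matrix with the integer label. A minimal standalone sketch of that step, assuming NumPy only (the function name `to_one_hot` is hypothetical and not part of our code):

```python
import numpy as np

def to_one_hot(label, n_classes):
    """Turn an integer label (or a list of labels) into float32 one-hot rows."""
    label = np.array(label).reshape(-1)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(n_classes)[label].astype(np.float32)

print(to_one_hot([1], n_classes=2))     # domain label 1 -> [[0. 1.]]
print(to_one_hot([0, 2], n_classes=3))
```

Note that every label must be smaller than `n_classes`; otherwise the indexing raises an `IndexError`.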

We create the dataloaders in two ways, one for the class classification task and one for the domain classification task:

In [None]:
# These two methods are used for class classification
def load_training(root_path, dir, batch_size, kwargs):
    transform = transforms.Compose(
        [transforms.Resize([256, 256]),
         transforms.RandomCrop(224),
         transforms.RandomHorizontalFlip(),
         transforms.ToTensor()])
    data = datasets.ImageFolder(root=os.path.join(root_path, dir), transform=transform)
    train_loader = torch.utils.data.DataLoader(data, batch_size=batch_size, shuffle=True, drop_last=True, **kwargs)
    return train_loader

def load_testing(root_path, dir, batch_size, kwargs):
    transform = transforms.Compose(
        [transforms.Resize([224, 224]),
         transforms.ToTensor()])
    data = datasets.ImageFolder(root=os.path.join(root_path, dir), transform=transform)
    test_loader = torch.utils.data.DataLoader(data, batch_size=batch_size, shuffle=True, **kwargs)
    return test_loader

In [1]:
# These two methods are used for domain classification
def load_training(root_path, dir, batch_size, kwargs):
    transform = transforms.Compose(
        [transforms.Resize([256, 256]),
         transforms.RandomCrop(224),
         transforms.RandomHorizontalFlip(),
         transforms.ToTensor()])
    data = OfficeHomeDataset(os.path.join(root_path, dir), transform=transform)
    train_loader = torch.utils.data.DataLoader(data, batch_size=batch_size, shuffle=True, drop_last=True, **kwargs)
    return train_loader

def load_testing(root_path, dir, batch_size, kwargs):
    transform = transforms.Compose(
        [transforms.Resize([224, 224]),
         transforms.ToTensor()])
    data = OfficeHomeDataset(os.path.join(root_path, dir), transform=transform)
    test_loader = torch.utils.data.DataLoader(data, batch_size=batch_size, shuffle=True, **kwargs)
    return test_loader

## Model structure

In the pre-adaptation stage, we need to train a source-specific feature extractor $F_{si}$ and a source-specific classifier $Q_{si}$ for each source $S_i$. For each source domain, a domain classifier ($D_{Si}$) is also trained. In the second stage, the weighted aligned source encoder $E_{Si}$ for each source $S_i$, the target encoder $E_T$, and the target classifier $Q_T$ need to be trained. The following sections describe each model structure in detail.


### Feature Extractor and Source-specific classifier

The architecture of $F_{si}$ consists of the following layers:

* ImageNet pre-trained ResNet-50 till average pool layer
* Linear FC (2048, 1024) + ELU
* Linear FC (1024, 1024) + BatchNorm + ELU
* Linear FC (1024, f_dim) + ELU
* Linear FC (f_dim, f_dim) + BatchNorm + ELU

The source-specific classifier $Q_{si}$ adds one layer on top of the feature extractor:

* Linear FC (f_dim, n_classes), where n_classes defaults to 65 in the code below

We used a pretrained ResNet model for the first layer; the training details for Office-Home can be found in the appendix of the original paper. Our implementation of the feature extractor and the source-specific classifier is as follows:

In [None]:
class SourceClassifer(nn.Module):
    def __init__(self, f_dim=256, n_classes=65):
        super(SourceClassifer, self).__init__()

        self.f_dim = f_dim
        self.n_classes = n_classes

        # Get ResNet50 model
        ResNet50 = torch.hub.load('pytorch/vision:v0.6.0', 'resnet50', pretrained=True)  # ImageNet pre-trained weights, as stated in the text
        ResNet50.fc = nn.Identity()
        self.ResNet50 = ResNet50

        self.extractor1 = nn.Sequential(
            nn.Linear(2048, 1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.f_dim),
            nn.ELU(),
            nn.Linear(self.f_dim, self.f_dim),
            nn.BatchNorm1d(self.f_dim),
            nn.ELU()
        )

        self.extractor2 = nn.Sequential(
            nn.Linear(2048, 1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.f_dim),
            nn.ELU(),
            nn.Linear(self.f_dim, self.f_dim),
            nn.BatchNorm1d(self.f_dim),
            nn.ELU()
        )

        self.extractor3 = nn.Sequential(
            nn.Linear(2048, 1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.f_dim),
            nn.ELU(),
            nn.Linear(self.f_dim, self.f_dim),
            nn.BatchNorm1d(self.f_dim),
            nn.ELU()
        )

        self.cls1 = nn.Linear(self.f_dim, self.n_classes)
        self.cls2 = nn.Linear(self.f_dim, self.n_classes)
        self.cls3 = nn.Linear(self.f_dim, self.n_classes)

    def forward(self, data_src, label_src = 0, mark = 1, training=True):
        
        if training == True:
            h1 = self.ResNet50(data_src)
            h1 = torch.flatten(h1, start_dim=1)  # size: (batch_size, dim)

            if mark == 1:
                feature1 = self.extractor1(h1)
                pred1 = self.cls1(feature1)

                cls_loss = F.cross_entropy(pred1, label_src)

                return cls_loss

            if mark == 2:
                feature2 = self.extractor2(h1)
                pred2 = self.cls2(feature2)

                cls_loss = F.cross_entropy(pred2, label_src)

                return cls_loss

            if mark == 3:
                feature3 = self.extractor3(h1)
                pred3 = self.cls3(feature3)

                cls_loss = F.cross_entropy(pred3, label_src)

                return cls_loss

        else:
            h1 = self.ResNet50(data_src)
            h1 = torch.flatten(h1, start_dim=1)  # size: (batch_size, dim)

            feature1 = self.extractor1(h1)
            pred1 = self.cls1(feature1)

            feature2 = self.extractor2(h1)
            pred2 = self.cls2(feature2)

            feature3 = self.extractor3(h1)
            pred3 = self.cls3(feature3)

            return pred1, pred2, pred3, feature1, feature2, feature3
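
The three extractor branches above share one layout, so they could also be built by a small factory function. The following is a sketch of that idea, not the code we ran: `make_extractor` and `MultiBranchHead` are hypothetical names, and the ResNet-50 backbone is replaced by random 2048-dimensional features so that the snippet runs on its own.

```python
import torch
import torch.nn as nn

def make_extractor(f_dim=256):
    """Build one source-specific extractor head (2048 -> f_dim)."""
    return nn.Sequential(
        nn.Linear(2048, 1024), nn.ELU(),
        nn.Linear(1024, 1024), nn.BatchNorm1d(1024), nn.ELU(),
        nn.Linear(1024, f_dim), nn.ELU(),
        nn.Linear(f_dim, f_dim), nn.BatchNorm1d(f_dim), nn.ELU(),
    )

class MultiBranchHead(nn.Module):
    """One extractor and one linear classifier per source domain."""
    def __init__(self, n_sources=3, f_dim=256, n_classes=65):
        super().__init__()
        self.extractors = nn.ModuleList([make_extractor(f_dim) for _ in range(n_sources)])
        self.classifiers = nn.ModuleList([nn.Linear(f_dim, n_classes) for _ in range(n_sources)])

    def forward(self, backbone_features, mark):
        # mark is 1-based, matching the forward() method above
        feature = self.extractors[mark - 1](backbone_features)
        return feature, self.classifiers[mark - 1](feature)

head = MultiBranchHead()
fake_backbone_output = torch.randn(4, 2048)  # stands in for pooled ResNet-50 features
feature, logits = head(fake_backbone_output, mark=2)
print(feature.shape, logits.shape)  # torch.Size([4, 256]) torch.Size([4, 65])
```

Registering the branches in an `nn.ModuleList` keeps their parameters visible to the optimizer while avoiding three copies of the same layer list.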

### Domain classifier

The domain classifier adds an extra block on top of the feature extractor:

* Linear FC (f_dim, f_dim/2) + ELU + Linear FC (f_dim/2, 2)

In [None]:
class DomainClassifier(nn.Module):
    def __init__(self, sourceClassifier, f_dim=256, n_classes=2):
        super(DomainClassifier, self).__init__()
        self.f_dim = f_dim
        self.half_f_dim = self.f_dim // 2
        self.n_classes = n_classes

        self.sourceClassifier = sourceClassifier

        self.domain_cls1 = nn.Sequential(
            nn.Linear(self.f_dim, self.half_f_dim),
            nn.ELU(),
            nn.Linear(self.half_f_dim, self.n_classes)
        )

        self.domain_cls2 = nn.Sequential(
            nn.Linear(self.f_dim, self.half_f_dim),
            nn.ELU(),
            nn.Linear(self.half_f_dim, self.n_classes)
        )

        self.domain_cls3 = nn.Sequential(
            nn.Linear(self.f_dim, self.half_f_dim),
            nn.ELU(),
            nn.Linear(self.half_f_dim, self.n_classes)
        )
        
    def forward(self, data_src, data_tgt=None, label_src=0, label_tgt=0, mark=1, training=True):

        if training == True:
            _, _, _, feature1, feature2, feature3 = self.sourceClassifier(data_src, training=False)
            _, _, _, feature1_tgt, feature2_tgt, feature3_tgt = self.sourceClassifier(data_tgt, training=False)

            if mark == 1:
                logits1 = self.domain_cls1(feature1)
                logits1_tgt = self.domain_cls1(feature1_tgt)
                a = 1 / data_src.shape[0]
                weights = torch.Tensor([a] * self.n_classes).to(device)

                cls_loss = F.cross_entropy(logits1, label_src, weight=weights) \
                    + F.cross_entropy(logits1_tgt, label_tgt, weight=weights)

                return cls_loss

            if mark == 2:
                logits2 = self.domain_cls2(feature2)
                logits2_tgt = self.domain_cls2(feature2_tgt)
                a = 1 / data_src.shape[0]
                weights = torch.Tensor([a] * self.n_classes).to(device)

                cls_loss = F.cross_entropy(logits2, label_src, weight=weights) \
                    + F.cross_entropy(logits2_tgt, label_tgt, weight=weights)

                return cls_loss

            if mark == 3:
                # Use the third domain head for the third source (was domain_cls1)
                logits3 = self.domain_cls3(feature3)
                logits3_tgt = self.domain_cls3(feature3_tgt)
                a = 1 / data_src.shape[0]
                weights = torch.Tensor([a] * self.n_classes).to(device)

                cls_loss = F.cross_entropy(logits3, label_src, weight=weights) \
                    + F.cross_entropy(logits3_tgt, label_tgt, weight=weights)

                return cls_loss

        else:
            _, _, _, feature1, feature2, feature3 = self.sourceClassifier(data_src, training=False)

            logits1 = self.domain_cls1(feature1)
            logits2 = self.domain_cls2(feature2)
            logits3 = self.domain_cls3(feature3)

            return logits1, logits2, logits3


### Source encoder

The architecture of the source encoder is as follows:

* Linear FC (f_dim, 1024) + BatchNorm + ELU
* Linear FC (1024, 1024) + BatchNorm + ELU
* Linear FC (1024, c_dim) + BatchNorm + ELU
* Linear FC (c_dim, c_dim) + BatchNorm + ELU

In [None]:
class SourceClassifier(nn.Module):
    def __init__(self, f_dim=256, n_classes=3):
        super(SourceClassifier, self).__init__()
        self.f_dim = f_dim
        self.n_classes = n_classes

        # Get ResNet50 model
        ResNet50 = torch.hub.load('pytorch/vision:v0.6.0', 'resnet50', pretrained=False)
        ResNet50.fc = nn.Identity()
        self.ResNet50 = ResNet50

        self.sourceFeatureExtractor = nn.Sequential(
            nn.Linear(2048, 1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.f_dim),
            nn.ELU(),
            nn.Linear(self.f_dim, self.f_dim),
            nn.BatchNorm1d(self.f_dim),
            nn.ELU()
        )

        self.classifier = nn.Sequential(
            nn.Linear(self.f_dim, self.n_classes)
        )

    def forward(self, input_batch):
        h1 = self.ResNet50(input_batch)
        h1 = torch.flatten(h1, start_dim=1)  # size: (batch_size, dim)
        source_feature = self.sourceFeatureExtractor(h1)
        classification = self.classifier(source_feature)
        return source_feature, classification

### Target encoder 
The architecture of target encoder is as follows:

* ImageNet pre-trained ResNet-50 till average pool layer
* Linear FC (2048, 1024) + ELU
* Linear FC (1024, 1024) + BatchNorm + ELU
* Linear FC (1024, c_dim) + BatchNorm + ELU
* Linear FC (c_dim, c_dim) + BatchNorm + ELU

We use a pretrained ResNet model for the first layer.

In [None]:
class TargetEncoder(nn.Module):
    def __init__(self, f_dim=256, n_classes=3):
        super(TargetEncoder, self).__init__()
        self.f_dim = f_dim
        self.n_classes = n_classes

        # Get ResNet50 model
        ResNet50 = torch.hub.load('pytorch/vision:v0.6.0', 'resnet50', pretrained=True)
        ResNet50.fc = nn.Identity()
        self.ResNet50 = ResNet50
        for param in self.ResNet50.parameters():
            param.requires_grad = False

        self.encoder = nn.Sequential(
            nn.Linear(2048, 1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.f_dim),
            nn.ELU(),
            nn.Linear(self.f_dim, self.f_dim),
            nn.BatchNorm1d(self.f_dim),
            nn.ELU()
        )

    def forward(self, input_batch):
        h1 = self.ResNet50(input_batch)
        h1 = torch.flatten(h1, start_dim=1)  # size: (batch_size, dim)
        feature = self.encoder(h1)
        return feature

### Target classifier 

The target classifier adds an extra block on top of the target encoder:

* Linear FC (c_dim, c_dim) + ELU + Linear FC (c_dim, n_classes), where n_classes defaults to 65 in the code below

In [None]:
class Targetclassifier(nn.Module):
    def __init__(self, sourceClassifier, f_dim=256, c_dim=256, n_classes=65):
        super(Targetclassifier, self).__init__()

        self.sourceClassifier = sourceClassifier
        self.f_dim = f_dim
        self.c_dim = c_dim
        self.n_classes = n_classes

        self.encoder1 = nn.Sequential(
            nn.Linear(self.f_dim, 1024),
            nn.BatchNorm1d(1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.c_dim),
            nn.BatchNorm1d(self.c_dim),
            nn.ELU(),
            nn.Linear(self.c_dim, self.c_dim),
            nn.BatchNorm1d(self.c_dim),
            nn.ELU()
        )

        self.encoder2 = nn.Sequential(
            nn.Linear(self.f_dim, 1024),
            nn.BatchNorm1d(1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.c_dim),
            nn.BatchNorm1d(self.c_dim),
            nn.ELU(),
            nn.Linear(self.c_dim, self.c_dim),
            nn.BatchNorm1d(self.c_dim),
            nn.ELU()
        )

        self.encoder3 = nn.Sequential(
            nn.Linear(self.f_dim, 1024),
            nn.BatchNorm1d(1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.c_dim),
            nn.BatchNorm1d(self.c_dim),
            nn.ELU(),
            nn.Linear(self.c_dim, self.c_dim),
            nn.BatchNorm1d(self.c_dim),
            nn.ELU()
        )

        self.cls = nn.Sequential(
            nn.Linear(self.c_dim, self.c_dim),
            nn.ELU(),
            nn.Linear(self.c_dim, self.n_classes)
        )

    def forward(self, data_src, label_src=0, mark=1, training=True, encoding=None):

        if training == True:

            if mark == 1:
                _, _, _, source_feature, _, _ = self.sourceClassifier(data_src, training=False)
                feature1 = self.encoder1(source_feature)
                pred1 = self.cls(feature1)

                loss = loss_qt(pred1, label_src, mark=1)

                return loss

            if mark == 2:
                _, _, _, _, source_feature, _ = self.sourceClassifier(data_src, training=False)
                feature2 = self.encoder2(source_feature)
                pred2 = self.cls(feature2)

                loss = loss_qt(pred2, label_src, mark=2)

                return loss

            if mark == 3:
                _, _, _, _, _, source_feature = self.sourceClassifier(data_src, training=False)
                feature3 = self.encoder3(source_feature)
                pred3 = self.cls(feature3)

                loss = loss_qt(pred3, label_src, mark=3)

                return loss

        else:
            if encoding is not None:
                # Classify a precomputed encoding (e.g. the target encoder output)
                pred = self.cls(encoding)
                return pred

            else:

                _, _, _, source_feature, _, _ = self.sourceClassifier(data_src, training=False)
                feature1 = self.encoder1(source_feature)
                pred1 = self.cls(feature1)

                _, _, _, _, source_feature, _ = self.sourceClassifier(data_src, training=False)
                feature2 = self.encoder2(source_feature)
                pred2 = self.cls(feature2)

                _, _, _, _, _, source_feature = self.sourceClassifier(data_src, training=False)
                feature3 = self.encoder3(source_feature)
                pred3 = self.cls(feature3)

                return pred1, pred2, pred3, feature1, feature2, feature3

## Training step

Before training, we implemented several loss functions (binary cross-entropy loss, CORAL loss, qt loss, align loss, distill loss, entropy loss, de loss, and T->W loss) for the different models. These losses are backpropagated to compute the gradients. The models and optimizers are as described earlier.
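
For reference, the CORAL loss compares the feature covariances of two batches. The sketch below follows the standard Deep CORAL definition, $\frac{1}{4d^2}\lVert C_S - C_T\rVert_F^2$ with unbiased covariance matrices. It is written in NumPy for illustration, and its normalization is an assumption that may differ from the PyTorch version used in our experiments.

```python
import numpy as np

def coral_distance(source, target):
    """Standard CORAL distance between two (n_samples, d) feature batches, d >= 2."""
    d = source.shape[1]
    # Unbiased (n - 1) feature covariance, one variable per column.
    cov_s = np.cov(source, rowvar=False)
    cov_t = np.cov(target, rowvar=False)
    # Squared Frobenius norm of the covariance difference, scaled by 1 / (4 d^2).
    return np.sum((cov_s - cov_t) ** 2) / (4 * d * d)

source = np.array([[0.0, 0.0], [2.0, 2.0]])
target = np.zeros((2, 2))
print(coral_distance(source, source))  # 0.0, identical batches
print(coral_distance(source, target))  # 1.0
```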

First, we use the dataloaders to load the training and test data. Each source has its own dataloader, over which the images are iterated. After that, the model is evaluated by its accuracy on the test set.

For the training settings, the batch size is 32 and the number of epochs is 10. The learning rate is given in the paper, so we simply follow the paper's learning rate settings.

The procedure is carried out in three stages: first the feature extractor is trained, then the domain classifier is trained on top of the feature extractor, and finally the remaining models (target encoder, target classifier, and source encoder) are trained together using different loss functions and optimizers. Due to space limitations, we only show the loss functions and the training function of the second part.


In [None]:
def loss_qt(output, target, mark=1, relevance=alpha, n_classes=3):
    label = target[0]
    weight = relevance[mark-1] / output.shape[0]

    loss = F.cross_entropy(output, target) * weight
    return loss

def CORAL(source, target):
    d = source.data.shape[1]

    xmt = torch.mean(target, 0, keepdim=True) - target
    xct = xmt.t() @ xmt
    
    xm = torch.mean(source, 0, keepdim=True) - source
    xc = xm.t() @ xm

    loss = torch.mean(torch.mul((xc - xct), (xc - xct)))
    loss = loss/(4*d*d)

    return loss

def loss_align(source1_output, source2_output, source3_output, target_output, K=3, alpha=alpha, beta=beta):
    loss = 0.0

    loss += alpha[0] * CORAL(source1_output, target_output)
    loss += alpha[1] * CORAL(source2_output, target_output)
    loss += alpha[2] * CORAL(source3_output, target_output)

    loss += beta["0-1"] * CORAL(source1_output, source2_output) / (K-1)
    loss += beta["0-2"] * CORAL(source1_output, source3_output) / (K-1)
    loss += beta["1-2"] * CORAL(source2_output, source3_output) / (K-1)

    return loss

def loss_tw(es, et, w, ind, mark=1):
    res = 0.0

    loss = nn.MSELoss()
    res = torch.sum(w * loss(et, es))

    return res

def loss_distill(w1, w2, w3, pred1, pred2, pred3, pred):
    loss = nn.L1Loss()

    res = 0.0
    w1 = w1.reshape(-1, 1)
    w2 = w2.reshape(-1, 1)
    w3 = w3.reshape(-1, 1)

    phi = w1 * pred1 + w2 * pred2 + w3 * pred3

    res = loss(pred, phi)

    return res

def loss_entropy(pred):
    res = 0.0
    p = F.softmax(pred, dim=-1)
    res = -1 * torch.sum(p * F.log_softmax(pred, dim=-1)) / pred.size()[0]

    return res

def loss_de(iteration, distill, entropy, m=0.0036):
    res = 0.0
    mu = min(1, m*iteration)

    res = (1-mu) * distill + mu * entropy

    return res
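
The factor `mu` in `loss_de` ramps linearly from 0 to 1, so training moves gradually from the distillation loss to the entropy loss. A small standalone sketch of this schedule follows; the helpers `mixing_factor` and `first_saturated_iteration` are hypothetical names introduced for illustration.

```python
import math

def mixing_factor(iteration, m=0.0036):
    """Linear ramp used to blend the two losses: mu = min(1, m * iteration)."""
    return min(1.0, m * iteration)

def first_saturated_iteration(m):
    """Smallest iteration at which mu reaches 1 (only the entropy loss remains)."""
    return math.ceil(1.0 / m)

for m in (0.0036, 0.01):
    print(m, first_saturated_iteration(m), mixing_factor(50, m))
```

With the default `m=0.0036` the factor saturates after 278 iterations, while the `m=0.01` passed in our training loop saturates after about 100.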

In [None]:
iteration = len(source1_weight)
print("iteration: ", iteration)
epoch = 15
cuda = True
seed = 8
log_interval = 20
class_num = 65
batch_size = 32
root_path = "./Dataset/"
source1_name = "Art"
source2_name = 'Clipart'
source3_name = 'Product'
target_name = "Real World"

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

torch.manual_seed(seed)
if cuda:
    torch.cuda.manual_seed(seed)

softmax = nn.Softmax(dim=1)

target_test_loader = load_testing(root_path, target_name, batch_size, kwargs)

def train(model1, model2, domainclassifier):

    # target_test_loader = load_training(root_path, target_name, batch_size, kwargs)
    target_iter = iter(target_test_loader)  

    for param in model1.parameters():
        param.requires_grad = False

    for i in range(1, iteration + 1):
        model1.eval()
        model2.train()         
        LEARNING_RATE = 2e-4

        # optimizer of target encoder
        optimizer2 = torch.optim.Adam([
            {'params': model2.encoder.parameters(), 'lr': LEARNING_RATE}
        ])

        try:
            target_data, __ = next(target_iter)
        except StopIteration:
            # Restart the loader once the target data is exhausted
            target_iter = iter(target_test_loader)
            target_data, __ = next(target_iter)

        target_data = target_data.to(device)
        target_data = Variable(target_data)

        pred1, pred2, pred3 = domainclassifier(target_data, training=False)
        sc_pred1, sc_pred2, sc_pred3, _, _, _ = sourceClassifier(target_data, training=False)

        logits1 = softmax(pred1)
        logits2 = softmax(pred2)
        logits3 = softmax(pred3)

        weights_sum = logits1[:, 1] * alpha[0] + logits2[:, 1] * alpha[1] + logits3[:, 1] * alpha[2]

        w1 = logits1[:, 1] * alpha[0] / weights_sum
        w2 = logits2[:, 1] * alpha[1] / weights_sum
        w3 = logits3[:, 1] * alpha[2] / weights_sum

        optimizer2.zero_grad()

        target_feature = model2(target_data)
        pred_target = model1(None, training=False, encoding=target_feature)

        _, _, _, feature1, feature2, feature3 = model1(target_data, training=False)

        loss = 0.0

        loss_value_tw = loss_tw(feature1, target_feature, w1, i-1, mark=1)
        loss_value_tw += loss_tw(feature2, target_feature, w2, i-1, mark=2)
        loss_value_tw += loss_tw(feature3, target_feature, w3, i-1, mark=3)
        loss_value_tw /= 3

        loss_value_distill = loss_distill(w1, w2, w3, sc_pred1, sc_pred2, sc_pred3, pred_target)
        loss_value_entropy = loss_entropy(pred_target)

        loss_value_de = loss_de(i, loss_value_distill, loss_value_entropy, m=0.01)

        loss = loss_value_tw + loss_value_de

        loss.backward()
        optimizer2.step()

        if i % log_interval == 0:
              print('Train target iter: {} [({:.0f}%)]\tT->WLoss: {:.6f}\tLoss_distill: {:.6f}\tLoss_entropy: {:.6f}\tLoss_de: {:.6f}\tLoss: {:.6f}'.format(
                  i, 100. * i / iteration, loss_value_tw.item(), loss_value_distill.item(), loss_value_entropy.item(), loss_value_de.item(), loss.item()))
    
    return model1, model2

def test(model1, model2):
    model1.eval()
    model2.eval()

    test_loss = 0
    correct = 0
    correct1 = 0
    correct2 = 0
    correct3 = 0
    
    with torch.no_grad():
        for data, target in target_test_loader:
            if cuda:
                data, target = data.cuda(), target.reshape(-1).cuda()
            data, target = Variable(data), Variable(target)
            feature = model2(data)

            pred = model1(None, training=False, encoding=feature)
            pred = pred.data.max(1)[1]
            correct += pred.eq(target.data.view_as(pred)).cpu().sum()

        print(target_name, '\nTest set: Accuracy: {}/{} ({:.0f}%)\n'.format(
            correct, len(target_test_loader.dataset),
            100. * correct / len(target_test_loader.dataset)))
        print('\nsource1 accnum {}, source2 accnum {}, source3 accnum {}'.format(correct1, correct2, correct3))

    return correct
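
The per-sample weights `w1`, `w2`, `w3` in `train` combine each domain classifier's probability for class 1 with the source relevance scores `alpha`, and then normalize over the sources. A NumPy sketch of the same computation follows; the probabilities and `alpha` values are made-up numbers for illustration only, and `source_weights` is a hypothetical helper name.

```python
import numpy as np

def source_weights(domain_probs, alpha):
    """Normalize alpha-scaled domain probabilities across sources.

    domain_probs: (batch, K) array, column k holds the class-1 probability
                  from the k-th domain classifier.
    alpha:        (K,) relevance score of each source.
    """
    scaled = domain_probs * alpha  # broadcast alpha over the batch
    return scaled / scaled.sum(axis=1, keepdims=True)

probs = np.array([[0.5, 0.5, 0.5],
                  [0.9, 0.1, 0.2]])
alpha = np.array([0.2, 0.3, 0.5])  # hypothetical relevance scores
w = source_weights(probs, alpha)
print(w)
print(w.sum(axis=1))  # each row sums to 1
```

When all domain classifiers give the same probability for a sample, as in the first row, the weights reduce to the normalized relevance scores.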
                  

        
        

## Result

We ran four types of experiments. For the No Adapt setting, we built a ResNet classifier: we took the pre-trained ResNet model and added a layer for classification. Surprisingly, it converged more slowly than we expected and only reached a good result after 10 epochs. We also built the proposed approach as a No Adapt domain classifier, and there is only a 4.4% difference from the result in the paper. We think this small difference may be caused by the batch size or the number of epochs. In addition, the proposed approach performs better than ResNet. For the second experiment, the average accuracy we obtained is only 2.7% lower than the result in the paper, and the effectiveness of the method can be seen from the comparison with the MFSAN result.


|Standard|Method|CPR->A|APR->C|ACR->P|ACP->R|Avg|
|----|----|----|----|----|----|----|
|No Adapt by paper|ResNet|65.3%|49.6%|79.7%|75.4%|67.5%|
|No Adapt by us|ResNet|61.9%|44.7%|73.5%|72.1%|63.1%|
|No Adapt by paper|Ours|65.6%|53.8%|78.6%|73.2%|67.8%|
|No Adapt by us|Ours|64.1%|57.7%|76.3%| |71.5%|
|Multi-Source by paper|MFSAN|72.1%|62.0%|80.3%|81.8%|74.1%|
|Multi-Source by us|MFSAN|67.7%|61.3%|77.1%|79.6%|71.4%|
|Multi-Source by paper|Ours|71.9%|61.4%|84.1%|82.3%|74.9%|
|Multi-Source by us|Ours|70.1%|59.8%|80.6%|78.2%|72.2%|
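
The multi-source comparisons can be checked directly from the table. The snippet below copies the per-task accuracies of our MFSAN and WAMDA runs and the reported averages; the variable names are illustrative.

```python
# Accuracies (%) copied from the table above.
tasks = ["CPR->A", "APR->C", "ACR->P", "ACP->R"]
mfsan_ours = [67.7, 61.3, 77.1, 79.6]  # Multi-Source by us, MFSAN
wamda_ours = [70.1, 59.8, 80.6, 78.2]  # Multi-Source by us, Ours
wamda_paper_avg, wamda_ours_avg = 74.9, 72.2

# Tasks on which our WAMDA run beats our MFSAN run.
wins = [t for t, w, m in zip(tasks, wamda_ours, mfsan_ours) if w > m]
# Gap between the paper's average and ours, in percentage points.
gap = round(wamda_paper_avg - wamda_ours_avg, 1)

print(wins)  # ['CPR->A', 'ACR->P']
print(gap)   # 2.7
```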



## Conclusion

We implemented the proposed method in PyTorch and compared it with ResNet and MFSAN on **OfficeHomeDataset**. The results support the effectiveness of WAMDA: multi-source domain adaptation based on source-source and source-target similarities achieved better accuracy than the MFSAN method on two sub-tasks. We hope our work helps readers gain a deeper understanding of multi-source domain adaptation.


## Task division

The reproduction was done by Jiaming Xu and Siwei Wang, following the criteria for a full re-implementation. We have met every Sunday since 28 March to discuss the project together.

Jiaming was responsible for coding the feature extractor, the domain classifier, and the structure of the second stage. In addition, Jiaming ran each experiment on one combination of the datasets. Last but not least, Jiaming wrote part of the report.

Siwei was responsible for coding the ResNet model and the MFSAN model. In addition, Siwei ran the experiments on the remaining three combinations of the datasets. Finally, Siwei wrote part of the report.