# Reproducibility Project: WAMDA: Weighted Alignment of Sources for Multi-source Domain Adaptation

### Group 10 Jiaming XU and Siwei Wang

## Introduction
One of the learning goals of the Deep Learning course is to reproduce a paper assigned by the course instructor. This blog describes our reproduction of the paper *WAMDA: Weighted Alignment of Sources for Multi-source Domain Adaptation* by Surbhi Aggarwal, Jogendra Nath Kundu, R. Venkatesh Babu and Anirban Chakraborty. Our reproduction is based on the original paper, other papers related to domain adaptation, and online resources.

The paper presents a novel method for multi-source domain adaptation (MSDA) named WAMDA, which weights multiple sources based on their relevance to each other and their relevance scores with respect to the target. We reproduce the proposed approach on a single dataset, the **OfficeHome dataset**, together with two other methods (ResNet and MFSAN) that are used to evaluate the effectiveness of the proposed method.

## Model structure

WAMDA performs multi-source domain adaptation based on source-target and source-source similarities. Its basic structure is as follows.

The proposed algorithm is divided into two stages. The first stage is *pre-adaptation training*, which yields the relevance scores, the feature extractors, the source classifiers and the domain classifiers. The second stage is *multi-source adaptation training*, in which the weighted alignment of the domains is performed and a classifier is learnt on this weighted aligned space. The basic model structure is shown below: ![avatar](imgs/process.png)


## Experiment Setup

The original paper ran experiments on three datasets (*Office-31, Office-Caltech and Office-Home*) and analysed the performance of MSDA methods against four types of baselines, including *No Adapt*, *Single-Source Best* and *Multi-Source*. We only ran experiments on the *Office-Home* dataset and implemented two types of baselines: (1) *No Adapt*: ResNet and the proposed method; (2) *Multi-Source*: MFSAN and the proposed method. The implementation steps follow the original paper and are described in the next section. Due to time limitations, we only implemented the first and third rows of Table 5 for the **OfficeHome dataset**.

## Our implementation
All experiments were run on Google Colab using PyTorch. The method consists of the following steps. First, the dataset was downloaded and converted into the required format. Then we trained a feature extractor, a source classifier and a domain classifier on the datasets. After that, we extracted the relevance scores from the previous step; these scores are used in the following steps. Finally, we trained the weighted aligned source encoders and the target encoder.

The following libraries are used in our experiment.

In [None]:
import os
import time
import copy
from random import sample

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from PIL import Image
from torch.autograd import Variable
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.sampler import SubsetRandomSampler
from torchvision import transforms
from tqdm import tqdm

## Dataset

We uploaded the *Office-Home* dataset to Google Drive and decompressed it. The Office-Home dataset was created to evaluate domain adaptation algorithms for object recognition using deep learning. It consists of images from 4 different domains: Artistic images, Clip Art, Product images and Real-World images. For each domain, the dataset contains images of 65 object categories typically found in office and home settings. Some examples from the dataset are shown below. ![avatar](imgs/dataset.jpg)

Then we wrote a class to load the data and assign a different label to each domain. In addition, we implemented one-hot, transform and balance options, which can be enabled according to the requirements of each model. The following code shows the process:

In [None]:
class OfficeHomeDataset(Dataset):
    def __init__(self, data_path, domain="Real World", balance=False, one_hot=False, transform=None):
        self.transform = transform
        self.domain = domain
        self.balance = balance
        self.one_hot = one_hot

        # label dict
        self.label_dict = {"Art": 0, "Clipart":1, "Product":2, 'Real World': 3}

        # Read all file names
        self.file_names = []
        if self.domain is None:
            self.n_classes = 3
            for root, dirs, files in os.walk(data_path):
                for filename in files:
                    if filename == ".DS_Store": continue
                    elif os.path.splitext(filename)[-1] == ".txt": continue
                    self.file_names.append(os.path.join(root, filename))
        else:
            self.n_classes = 2
            domain_file = []
            source_file = []
            for root, dirs, files in os.walk(data_path):
                if self.domain in root:
                    for filename in files:
                        if filename == ".DS_Store": continue
                        elif os.path.splitext(filename)[-1] == ".txt": continue
                        domain_file.append(os.path.join(root, filename))
                else:
                    for filename in files:
                        if filename == ".DS_Store": continue
                        elif os.path.splitext(filename)[-1] == ".txt": continue
                        source_file.append(os.path.join(root, filename))
            if balance:
                self.file_names = domain_file + sample(source_file, len(domain_file))
            else:
                self.file_names = domain_file + source_file
        
        print(len(self.file_names))
        # self.file_names = sample(self.file_names, 200)

    def __len__(self):
        return len(self.file_names)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        label = []
        filename = self.file_names[idx]
        img = Image.open(filename)
        if self.transform:
            img = self.transform(img)
        # print(img.shape, filename)
        source_name = filename.split('/')[-3]
        if self.domain is None:
            label.append(self.label_dict[source_name])
        else:
            if source_name == self.domain:
                label.append(1)
            else: label.append(0)
        if self.one_hot:
            label = np.array(label)
            label = np.eye(self.n_classes)[label]
            label = np.float32(label)
        else:
            label = np.array(label)
        # sample = {'image': img, 'label': label}
        sample = [img, label]

        return sample
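
The one-hot branch in `__getitem__` works by indexing an identity matrix with the integer label. A minimal sketch of that trick in isolation (the label value is made up for illustration):

```python
import numpy as np

n_classes = 3
label = np.array([2])                # integer label in a 1-element array, as in __getitem__
one_hot = np.eye(n_classes)[label]   # selects row 2 of the 3x3 identity matrix
one_hot = np.float32(one_hot)        # cast to float32, as the dataset class does

print(one_hot)        # [[0. 0. 1.]]
print(one_hot.shape)  # (1, 3)
```

Note that the result keeps a leading dimension of size 1, and that the label must be smaller than `n_classes`: with `n_classes = 3`, a label of 3 would raise an `IndexError`.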

## Model structure

In the pre-adaptation stage, we train a source-specific feature extractor $F_{si}$ and a source-specific classifier $Q_{si}$ for each source $S_i$. A domain classifier $D_{Si}$ is also trained for each source. In the second stage, we train the weighted aligned source encoder $E_{Si}$ for each source $S_i$, the target encoder $E_T$ and the target classifier $Q_T$. The following sections describe each model structure in detail.


### Feature Extractor and Source-specific classifier

The architecture of $F_{si}$ consists of the following layers:

* ImageNet pre-trained ResNet-50 till average pool layer
* Linear FC (2048, 1024) + ELU
* Linear FC (1024, 1024) + BatchNorm + ELU
* Linear FC (1024, f_dim) + ELU
* Linear FC (f_dim, f_dim) + BatchNorm + ELU

The source-specific classifier $Q_{si}$ adds one layer on top of the feature extractor:

* Linear FC (f_dim, 3)

We used a pretrained ResNet model for the first layer; the training details for Office-Home can be found in the appendix of the original paper. Our implementation of the feature extractor and the source-specific classifier is as follows:

In [None]:
class SourceClassifer(nn.Module):
    def __init__(self, f_dim=256, n_classes=65):
        super(SourceClassifer, self).__init__()

        self.f_dim = f_dim
        self.n_classes = n_classes

        # Get ResNet50 model
        ResNet50 = torch.hub.load('pytorch/vision:v0.6.0', 'resnet50', pretrained=True)  # ImageNet weights, as stated in the text
        ResNet50.fc = nn.Identity()
        self.ResNet50 = ResNet50

        self.extractor1 = nn.Sequential(
            nn.Linear(2048, 1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.f_dim),
            nn.ELU(),
            nn.Linear(self.f_dim, self.f_dim),
            nn.BatchNorm1d(self.f_dim),
            nn.ELU()
        )

        self.extractor2 = nn.Sequential(
            nn.Linear(2048, 1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.f_dim),
            nn.ELU(),
            nn.Linear(self.f_dim, self.f_dim),
            nn.BatchNorm1d(self.f_dim),
            nn.ELU()
        )

        self.extractor3 = nn.Sequential(
            nn.Linear(2048, 1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.f_dim),
            nn.ELU(),
            nn.Linear(self.f_dim, self.f_dim),
            nn.BatchNorm1d(self.f_dim),
            nn.ELU()
        )

        self.cls1 = nn.Linear(self.f_dim, self.n_classes)
        self.cls2 = nn.Linear(self.f_dim, self.n_classes)
        self.cls3 = nn.Linear(self.f_dim, self.n_classes)

    def forward(self, data_src, label_src = 0, mark = 1, training=True):
        
        if training == True:
            h1 = self.ResNet50(data_src)
            h1 = torch.flatten(h1, start_dim=1)  # size: (batch_size, dim)

            if mark == 1:
                feature1 = self.extractor1(h1)
                pred1 = self.cls1(feature1)

                cls_loss = F.cross_entropy(pred1, label_src)

                return cls_loss

            if mark == 2:
                feature2 = self.extractor2(h1)
                pred2 = self.cls2(feature2)

                cls_loss = F.cross_entropy(pred2, label_src)

                return cls_loss

            if mark == 3:
                feature3 = self.extractor3(h1)
                pred3 = self.cls3(feature3)

                cls_loss = F.cross_entropy(pred3, label_src)

                return cls_loss

        else:
            h1 = self.ResNet50(data_src)
            h1 = torch.flatten(h1, start_dim=1)  # size: (batch_size, dim)

            feature1 = self.extractor1(h1)
            pred1 = self.cls1(feature1)

            feature2 = self.extractor2(h1)
            pred2 = self.cls2(feature2)

            feature3 = self.extractor3(h1)
            pred3 = self.cls3(feature3)

            return pred1, pred2, pred3, feature1, feature2, feature3
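
To check the layer sizes without downloading ResNet-50, the head can be run on random vectors of the right width. This is only a shape check under the assumption that the backbone outputs 2048-dimensional pooled features; `pooled` is a stand-in, not real image features.

```python
import torch
import torch.nn as nn

f_dim = 256
head = nn.Sequential(
    nn.Linear(2048, 1024), nn.ELU(),
    nn.Linear(1024, 1024), nn.BatchNorm1d(1024), nn.ELU(),
    nn.Linear(1024, f_dim), nn.ELU(),
    nn.Linear(f_dim, f_dim), nn.BatchNorm1d(f_dim), nn.ELU(),
)
classifier = nn.Linear(f_dim, 65)   # 65 Office-Home object categories

pooled = torch.randn(4, 2048)       # stand-in for ResNet-50 pooled features
head.eval()                         # use BatchNorm running statistics
with torch.no_grad():
    features = head(pooled)
    logits = classifier(features)

print(features.shape)  # torch.Size([4, 256])
print(logits.shape)    # torch.Size([4, 65])
```

In training mode, `BatchNorm1d` needs more than one sample per batch, so a final batch of size 1 would raise an error.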

### Domain classifier

The domain classifier adds an extra block on top of the feature extractor:

*  Linear FC (f_dim, f_dim/2) + ELU + Linear FC (f_dim/2, 2)

In [None]:
class DomainClassifier(nn.Module):
    def __init__(self, sourceClassifier, f_dim=256, n_classes=2):
        super(DomainClassifier, self).__init__()
        self.f_dim = f_dim
        self.half_f_dim = self.f_dim // 2
        self.n_classes = n_classes

        self.sourceClassifier = sourceClassifier

        self.domain_cls1 = nn.Sequential(
            nn.Linear(self.f_dim, self.half_f_dim),
            nn.ELU(),
            nn.Linear(self.half_f_dim, self.n_classes)
        )

        self.domain_cls2 = nn.Sequential(
            nn.Linear(self.f_dim, self.half_f_dim),
            nn.ELU(),
            nn.Linear(self.half_f_dim, self.n_classes)
        )

        self.domain_cls3 = nn.Sequential(
            nn.Linear(self.f_dim, self.half_f_dim),
            nn.ELU(),
            nn.Linear(self.half_f_dim, self.n_classes)
        )
        
    def forward(self, data_src, data_tgt=None, label_src=0, label_tgt=0, mark=1, training=True):

        if training == True:
            _, _, _, feature1, feature2, feature3 = self.sourceClassifier(data_src, training=False)
            _, _, _, feature1_tgt, feature2_tgt, feature3_tgt = self.sourceClassifier(data_tgt, training=False)

            if mark == 1:
                logits1 = self.domain_cls1(feature1)
                logits1_tgt = self.domain_cls1(feature1_tgt)
                a = 1 / data_src.shape[0]
                weights = torch.Tensor([a] * self.n_classes).to(device)

                cls_loss = F.cross_entropy(logits1, label_src, weight=weights) \
                    + F.cross_entropy(logits1_tgt, label_tgt, weight=weights)

                return cls_loss

            if mark == 2:
                logits2 = self.domain_cls2(feature2)
                logits2_tgt = self.domain_cls2(feature2_tgt)
                a = 1 / data_src.shape[0]
                weights = torch.Tensor([a] * self.n_classes).to(device)

                cls_loss = F.cross_entropy(logits2, label_src, weight=weights) \
                    + F.cross_entropy(logits2_tgt, label_tgt, weight=weights)

                return cls_loss

            if mark == 3:
                logits3 = self.domain_cls3(feature3)  # use the third head for the third source
                logits3_tgt = self.domain_cls3(feature3_tgt)
                a = 1 / data_src.shape[0]
                weights = torch.Tensor([a] * self.n_classes).to(device)

                cls_loss = F.cross_entropy(logits3, label_src, weight=weights) \
                    + F.cross_entropy(logits3_tgt, label_tgt, weight=weights)

                return cls_loss

        else:
            _, _, _, feature1, feature2, feature3 = self.sourceClassifier(data_src, training=False)

            logits1 = self.domain_cls1(feature1)
            logits2 = self.domain_cls2(feature2)
            logits3 = self.domain_cls3(feature3)

            return logits1, logits2, logits3
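
The post does not list how the relevance scores are read off the trained domain classifiers. The sketch below is one plausible reading and is an assumption, not necessarily the formula used in the paper: for each source, average the probability with which its domain classifier assigns target samples to the source side, then normalise the averages so they sum to one. The function names and the `source_index` parameter are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def relevance_scores(target_logits_per_source, source_index=0):
    """target_logits_per_source[i] holds the 2-way logits that domain
    classifier i produces for each target sample (hypothetical layout)."""
    raw = []
    for logits_list in target_logits_per_source:
        probs = [softmax(logits)[source_index] for logits in logits_list]
        raw.append(sum(probs) / len(probs))   # mean "looks like source i" probability
    total = sum(raw)
    return [r / total for r in raw]           # normalise to sum to 1

# Toy logits: source 0 resembles the target most, source 2 least
scores = relevance_scores([
    [[2.0, 0.0], [1.0, 0.0]],
    [[0.0, 0.0]],
    [[-2.0, 0.0]],
])
print(scores)
```

With these toy inputs the first source receives the largest score and the third the smallest.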


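### Source encoder

The architecture of the source encoder is as follows:

* Linear FC (f_dim, 1024) + BatchNorm + ELU
* Linear FC (1024, 1024) + BatchNorm + ELU
* Linear FC (1024, c_dim) + BatchNorm + ELU
* Linear FC (c_dim, c_dim) + BatchNorm + ELU

In our code these encoders appear as `encoder1`, `encoder2` and `encoder3` inside `Targetclassifier`. The cell below shows the single-source `SourceClassifier` (feature extractor plus classifier), whose output matches the interface that `train_models` expects for `model1`.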

In [None]:
class SourceClassifier(nn.Module):
    def __init__(self, f_dim=256, n_classes=3):
        super(SourceClassifier, self).__init__()
        self.f_dim = f_dim
        self.n_classes = n_classes

        # Get ResNet50 model
        ResNet50 = torch.hub.load('pytorch/vision:v0.6.0', 'resnet50', pretrained=False)
        ResNet50.fc = nn.Identity()
        self.ResNet50 = ResNet50

        self.sourceFeatureExtractor = nn.Sequential(
            nn.Linear(2048, 1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.f_dim),
            nn.ELU(),
            nn.Linear(self.f_dim, self.f_dim),
            nn.BatchNorm1d(self.f_dim),
            nn.ELU()
        )

        self.classifier = nn.Sequential(
            nn.Linear(self.f_dim, self.n_classes)
        )

    def forward(self, input_batch):
        h1 = self.ResNet50(input_batch)
        h1 = torch.flatten(h1, start_dim=1)  # size: (batch_size, dim)
        source_feature = self.sourceFeatureExtractor(h1)
        classification = self.classifier(source_feature)
        return source_feature, classification

### Target encoder 
The architecture of the target encoder is as follows:

* ImageNet pre-trained ResNet-50 till average pool layer
* Linear FC (2048, 1024) + ELU
* Linear FC (1024, 1024) + BatchNorm + ELU
* Linear FC (1024, c_dim) + BatchNorm + ELU
* Linear FC (c_dim, c_dim) + BatchNorm + ELU

We use a pretrained ResNet-50 for the first layer and keep its weights frozen.

In [None]:
class TargetEncoder(nn.Module):
    def __init__(self, f_dim=256, n_classes=3):
        super(TargetEncoder, self).__init__()
        self.f_dim = f_dim
        self.n_classes = n_classes

        # Get ResNet50 model
        ResNet50 = torch.hub.load('pytorch/vision:v0.6.0', 'resnet50', pretrained=True)
        ResNet50.fc = nn.Identity()
        self.ResNet50 = ResNet50
        for param in self.ResNet50.parameters():
            param.requires_grad = False

        self.encoder = nn.Sequential(
            nn.Linear(2048, 1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.f_dim),
            nn.ELU(),
            nn.Linear(self.f_dim, self.f_dim),
            nn.BatchNorm1d(self.f_dim),
            nn.ELU()
        )

    def forward(self, input_batch):
        h1 = self.ResNet50(input_batch)
        h1 = torch.flatten(h1, start_dim=1)  # size: (batch_size, dim)
        feature = self.encoder(h1)
        return feature
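
`TargetEncoder` freezes its ResNet-50 by setting `requires_grad = False` on every backbone parameter. A minimal sketch of the effect, using two tiny linear layers as stand-ins for the backbone and the encoder head (the layer sizes and the SGD optimizer are illustrative assumptions):

```python
import torch
import torch.nn as nn

backbone = nn.Linear(8, 4)   # tiny stand-in for the ResNet-50 backbone
encoder = nn.Linear(4, 2)    # stand-in for the trainable encoder head

for param in backbone.parameters():
    param.requires_grad = False   # same freezing pattern as in TargetEncoder

model = nn.Sequential(backbone, encoder)

# Hand only the trainable parameters to the optimizer
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)

loss = model(torch.randn(3, 8)).pow(2).mean()
loss.backward()

print(backbone.weight.grad is None)      # True: no gradient for frozen weights
print(encoder.weight.grad is not None)   # True: the head still receives gradients
print(len(trainable))                    # 2 (weight and bias of the head)
```

Freezing the backbone reduces memory use and training time, at the cost of not adapting the low-level features to the target domain.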

### Target classifier 

The target classifier adds an extra block on top of the target encoder:
    
* Linear FC (c_dim, c_dim) + ELU + Linear FC (c_dim, 3)

In [None]:
class Targetclassifier(nn.Module):
    def __init__(self, sourceClassifier, f_dim=256, c_dim=256, n_classes=65):
        super(Targetclassifier, self).__init__()

        self.sourceClassifier = sourceClassifier
        self.f_dim = f_dim
        self.c_dim = c_dim
        self.n_classes = n_classes

        self.encoder1 = nn.Sequential(
            nn.Linear(self.f_dim, 1024),
            nn.BatchNorm1d(1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.c_dim),
            nn.BatchNorm1d(self.c_dim),
            nn.ELU(),
            nn.Linear(self.c_dim, self.c_dim),
            nn.BatchNorm1d(self.c_dim),
            nn.ELU()
        )

        self.encoder2 = nn.Sequential(
            nn.Linear(self.f_dim, 1024),
            nn.BatchNorm1d(1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.c_dim),
            nn.BatchNorm1d(self.c_dim),
            nn.ELU(),
            nn.Linear(self.c_dim, self.c_dim),
            nn.BatchNorm1d(self.c_dim),
            nn.ELU()
        )

        self.encoder3 = nn.Sequential(
            nn.Linear(self.f_dim, 1024),
            nn.BatchNorm1d(1024),
            nn.ELU(),
            nn.Linear(1024, 1024),
            nn.BatchNorm1d(1024),  # expect 2-D input
            nn.ELU(),
            nn.Linear(1024, self.c_dim),
            nn.BatchNorm1d(self.c_dim),
            nn.ELU(),
            nn.Linear(self.c_dim, self.c_dim),
            nn.BatchNorm1d(self.c_dim),
            nn.ELU()
        )

        self.cls = nn.Sequential(
            nn.Linear(self.c_dim, self.c_dim),
            nn.ELU(),
            nn.Linear(self.c_dim, self.n_classes)
        )

    def forward(self, data_src, label_src=0, mark=1, training=True, encoding=None):

        if training == True:

            if mark == 1:
                _, _, _, source_feature, _, _ = self.sourceClassifier(data_src, training=False)
                feature1 = self.encoder1(source_feature)
                pred1 = self.cls(feature1)

                loss = loss_qt(pred1, label_src, mark=1)

                return loss

            if mark == 2:
                _, _, _, _, source_feature, _ = self.sourceClassifier(data_src, training=False)
                feature2 = self.encoder2(source_feature)
                pred2 = self.cls(feature2)

                loss = loss_qt(pred2, label_src, mark=2)

                return loss

            if mark == 3:
                _, _, _, _, _, source_feature = self.sourceClassifier(data_src, training=False)
                feature3 = self.encoder3(source_feature)
                pred3 = self.cls(feature3)

                loss = loss_qt(pred3, label_src, mark=3)

                return loss

        else:
            if encoding is not None:
                # Classify a precomputed encoding (e.g. from the target encoder)
                pred = self.cls(encoding)
                return pred

            else:

                _, _, _, source_feature, _, _ = self.sourceClassifier(data_src, training=False)
                feature1 = self.encoder1(source_feature)
                pred1 = self.cls(feature1)

                _, _, _, _, source_feature, _ = self.sourceClassifier(data_src, training=False)
                feature2 = self.encoder2(source_feature)
                pred2 = self.cls(feature2)

                _, _, _, _, _, source_feature = self.sourceClassifier(data_src, training=False)
                feature3 = self.encoder3(source_feature)
                pred3 = self.cls(feature3)

                return pred1, pred2, pred3, feature1, feature2, feature3

## Training step

Before training, we implemented three loss functions (binary cross-entropy loss, CORAL loss and qt loss, `loss_qt`) for the different outputs. These losses are backpropagated to compute the gradients. The models and optimizers are those introduced above.
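
The CORAL loss itself is not listed in this post. A minimal sketch of the standard CORAL formulation (squared Frobenius distance between the source and target feature covariances, scaled by $1/(4d^2)$) is given below. It is not necessarily identical to the `criterion2` passed to `train_models`, which also takes a source index as a third argument.

```python
import torch

def coral_loss(source, target):
    """Standard CORAL loss between two (batch, d) feature matrices."""
    d = source.size(1)

    def covariance(x):
        x = x - x.mean(dim=0, keepdim=True)   # centre each feature
        return x.t() @ x / (x.size(0) - 1)    # unbiased (d, d) covariance

    diff = covariance(source) - covariance(target)
    return (diff ** 2).sum() / (4 * d * d)

torch.manual_seed(0)
src = torch.randn(32, 16)
tgt = torch.randn(32, 16) * 2.0   # same mean, larger variance

print(coral_loss(src, src).item())      # 0.0 for identical features
print(coral_loss(src, tgt).item() > 0)  # True when the covariances differ
```

Because the loss only compares second-order statistics, it needs batches with more than one sample per domain.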

First, we use data loaders to load the training and test data. Each source has its own data loader, and the images are iterated batch by batch. After that, the model is evaluated by its accuracy on the test set.
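
Since the domains differ in size, a shorter loader runs out before the longest one does, and its iterator has to be restarted. A minimal sketch of that pattern, with plain lists standing in for the data loaders (`next_batch` is a hypothetical helper name):

```python
def next_batch(iterator, loader):
    """Return the next batch, restarting the loader when it is exhausted."""
    try:
        batch = next(iterator)
    except StopIteration:
        iterator = iter(loader)
        batch = next(iterator)
    return batch, iterator

short_loader = ["a1", "a2"]                    # stand-in for a small domain
long_loader = ["b1", "b2", "b3", "b4", "b5"]   # stand-in for the largest domain

short_iter, long_iter = iter(short_loader), iter(long_loader)
pairs = []
for _ in range(len(long_loader)):
    a, short_iter = next_batch(short_iter, short_loader)
    b, long_iter = next_batch(long_iter, long_loader)
    pairs.append((a, b))

print(pairs)
# [('a1', 'b1'), ('a2', 'b2'), ('a1', 'b3'), ('a2', 'b4'), ('a1', 'b5')]
```

Catching `StopIteration` specifically, rather than every exception, keeps genuine data-loading errors visible.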

For the training settings, the batch size is 32 and the number of epochs is 10. The learning rates follow the settings given in the paper.

This procedure is run three times: first to train the feature extractor, then to train the domain classifier on top of the feature extractor, and finally to train the remaining models (target encoder, target classifier and source encoders) together, using different loss functions and optimizers. Due to space limitations, we only show the function used for training in the second part.


In [None]:
def train_models(model1, model2, path, target_path, criterion1, criterion2, optimizer1, optimizer2, num_epochs=5, batch_size=16, log_interval=40, binary_class=False, train=0.7, val=0.2, test=0.1):
    val_acc_history = []
    best_model_wts1 = copy.deepcopy(model1.state_dict())
    best_acc = 0.0
    best_model_wts2 = copy.deepcopy(model2.state_dict())
    best_loss = float("inf")
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    source_paths = [os.path.join(path, name) for name in os.listdir(path)]

    def next_batch(iterator, loader):
        # Restart an exhausted loader so that shorter domains are cycled
        try:
            batch = next(iterator)
        except StopIteration:
            iterator = iter(loader)
            batch = next(iterator)
        return batch, iterator

    def count_correct(logits, labels):
        preds = torch.max(logits, 1)[1]
        if binary_class:  # labels are one-hot encoded
            labels = torch.max(labels, 1)[1]
        return torch.sum(preds == labels).item()

    for epoch in tqdm(range(num_epochs)):
        print('\nEpoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        epoch_loss = 0.0

        # One loader per source domain, plus the target and the test loaders
        source_loaders, source_lengths = [], []
        for source_path in source_paths[:3]:
            loader, length = get_train_dataloader(source_path, batch_size=batch_size)
            source_loaders.append(loader)
            source_lengths.append(length)
        target_loader, _ = get_train_dataloader(target_path, batch_size=batch_size)
        source_test_loader, source_test_length = get_test_dataloader(path, batch_size=batch_size)

        iterations = max(source_lengths) // batch_size
        print("iterations: ", iterations)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model1.train()
                model2.train()
                running_corrects = 0
                source_iters = [iter(loader) for loader in source_loaders]
                target_iter = iter(target_loader)

                for i in range(1, iterations + 1):
                    # Target domain
                    (target_data, _), target_iter = next_batch(target_iter, target_loader)
                    target_data = target_data.to(device)

                    for k in range(len(source_loaders)):
                        # Source domain k + 1
                        (source_data, source_label), source_iters[k] = next_batch(source_iters[k], source_loaders[k])
                        source_data = source_data.to(device)
                        source_label = source_label.reshape(-1).to(device)

                        # Training of the source classifier
                        optimizer1.zero_grad()
                        _, logits = model1(source_data)
                        running_corrects += count_correct(logits, source_label)
                        loss1 = criterion1(logits, source_label)
                        loss1.backward()
                        optimizer1.step()

                        if i % log_interval == 0:
                            print('Train source{} iter: {} [({:.0f}%)]\tLoss: {:.6f}\t'.format(
                                k + 1, i, 100. * i / iterations, loss1.item()))

                        # Training of the target encoder
                        feature, _ = model1(source_data)
                        optimizer2.zero_grad()
                        target_feature = model2(target_data)
                        loss2 = criterion2(feature, target_feature, k + 1)
                        loss2.backward()
                        optimizer2.step()
                        epoch_loss += loss2.item()

                    if i % log_interval == 0:
                        print('Train target iter: {} [({:.0f}%)]\tLoss: {:.6f}\t'.format(
                            i, 100. * i / iterations, loss2.item()))
                        # Keep the target encoder with the lowest average alignment loss
                        avg_loss = epoch_loss / (log_interval * 3)
                        if avg_loss < best_loss:
                            best_loss = avg_loss
                            best_model_wts2 = copy.deepcopy(model2.state_dict())
                        epoch_loss = 0.0

            else:
                model1.eval()   # Only model1 needs a validation phase
                val_loss = 0.0
                val_corrects = 0
                val_iterations = source_test_length // batch_size
                source_test_iter = iter(source_test_loader)

                with torch.no_grad():
                    for i in range(1, val_iterations + 1):
                        (source_data, source_label), source_test_iter = next_batch(source_test_iter, source_test_loader)
                        source_data = source_data.to(device)
                        source_label = source_label.reshape(-1).to(device)
                        _, logits = model1(source_data)
                        val_corrects += count_correct(logits, source_label)
                        val_loss += criterion1(logits, source_label).item()

                epoch_acc = val_corrects / source_test_length
                if epoch_acc > best_acc:
                    # deep copy the model
                    best_acc = epoch_acc
                    best_model_wts1 = copy.deepcopy(model1.state_dict())
                val_acc_history.append(epoch_acc)
                print('Val classifier:\tLoss: {:.6f}\tAcc: {:.6f}\t'.format(
                    val_loss / max(val_iterations, 1), epoch_acc))

                model2.load_state_dict(best_model_wts2)

    return model1, model2, val_acc_history

## Result

We ran four types of experiments. For the No Adapt setting, we built a ResNet classifier by taking the pretrained ResNet model and adding a classification layer. Surprisingly, it converged more slowly than we expected and only reached a good result after 10 epochs. In the table below, our ResNet baseline averages 63.1%, compared with 67.5% reported in the paper. We also trained the proposed approach in the No Adapt setting; its reported average (71.53%, with the ACP->R entry missing) is higher than the 67.8% given in the paper, and it also outperforms our ResNet baseline.


|Setting|Method|CPR->A|APR->C|ACR->P|ACP->R|Avg|
|----|----|----|----|----|----|----|
|No Adapt (paper)|ResNet|65.3%|49.6%|79.7%|75.4%|67.5%|
|No Adapt (ours)|ResNet|61.9%|44.7%|73.5%|72.1%|63.1%|
|No Adapt (paper)|WAMDA|65.6%|53.8%|78.6%|73.2%|67.8%|
|No Adapt (ours)|WAMDA|64.1%|57.7%|76.3%| |71.53%|
|Multi-Source (paper)|MFSAN|72.1%|62.0%|80.3%|81.8%|74.1%|
|Multi-Source (ours)|MFSAN|67.7%|61.3%|77.1%|79.6%|71.4%|
|Multi-Source (paper)|WAMDA|71.9%|61.4%|84.1%|82.3%|74.9%|
|Multi-Source (ours)|WAMDA|67.69%|61.31%|77.09%|79.55%|71.43%|
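
The Avg column can be rechecked from the per-task accuracies. A small sketch for the three complete rows from our own runs (ResNet, MFSAN and the proposed method in the multi-source setting); all values are in percent and copied from the table, and the row with the missing ACP->R entry is left out:

```python
rows = {
    "ResNet, No Adapt (ours)":    [61.9, 44.7, 73.5, 72.1],
    "MFSAN, Multi-Source (ours)": [67.7, 61.3, 77.1, 79.6],
    "WAMDA, Multi-Source (ours)": [67.69, 61.31, 77.09, 79.55],
}

# Unweighted mean over the four transfer tasks
averages = {name: sum(values) / len(values) for name, values in rows.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.2f}%")
```

The recomputed means agree with the Avg column to within a few hundredths of a percentage point.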



## Conclusion

We implemented the proposed method in PyTorch and compared it with ResNet and MFSAN on the **OfficeHome dataset**. The results support the effectiveness of WAMDA: multi-source domain adaptation based on source-source and source-target similarities achieved better accuracy than the MFSAN method on two sub-tasks. We hope our work helps readers gain a deeper understanding of multi-source domain adaptation.


## Task division

The reproduction was done by Jiaming Xu and Siwei Wang. We met every Sunday from 28 March onwards to discuss the project together.

Jiaming was responsible for coding the feature extractor, the domain classifier and the structure of the second stage. In addition, Jiaming ran each experiment on one combination of the datasets and wrote part of the report.

Siwei was responsible for coding the ResNet model and the MFSAN model. Siwei also ran the experiments on the remaining three combinations of the datasets.