## 10-Mar | Debugging

### On a single dataset (Test case 1)

If the test set is the same as the training set, then at each training step the training and evaluation losses, and likewise the quality scores, should match up to a small tolerance.

* A single dataset: $A$
* At each training step:
    * model $M$ trains on $A$, producing a training loss $l_t$ and synthetic (fake) data $A_{ft}$ in the training phase
    * model $M$ evaluates on the same $A$, producing a validation loss $l_v$ and synthetic (fake) data $A_{fv}$ in the evaluation phase
    * a scoring function $s(\cdot)$ is then applied between real and fake data; for small tolerances $\epsilon_s$ and $\epsilon_l$, the expectation is  
        $| s(A, A_{ft}) - s(A, A_{fv}) | < \epsilon_s$ and $| l_t - l_v | < \epsilon_l$
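
The two inequalities above can be checked with a small helper. This is a minimal sketch rather than the project's actual code: the function name `consistency_check`, the default tolerances, and the example numbers are all hypothetical, and the losses and scores are assumed to have been computed elsewhere.

```python
def consistency_check(l_t, l_v, s_t, s_v, eps_l=1e-3, eps_s=1e-2):
    """Compare training-phase and evaluation-phase results on the same dataset A.

    l_t, l_v: training and validation loss at one training step.
    s_t, s_v: s(A, A_ft) and s(A, A_fv), computed beforehand.
    eps_l, eps_s: tolerances (hypothetical defaults, to be tuned).
    """
    loss_gap = abs(l_t - l_v)
    score_gap = abs(s_t - s_v)
    return {
        "loss_gap": loss_gap,
        "score_gap": score_gap,
        "loss_ok": loss_gap < eps_l,
        "score_ok": score_gap < eps_s,
    }


# Made-up numbers for illustration only.
result = consistency_check(l_t=0.5, l_v=0.5, s_t=0.75, s_v=0.5)
print(result["loss_ok"], result["score_ok"])
```

A failing check would suggest that the training and evaluation paths treat $A$ differently somewhere, which is what this test case is meant to surface.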



### On multiple datasets (Test case 2)

If a test set $A$ is one of the pretraining sets $(A, B, C)$, the quality score obtained after fine-tuning a pretrained model $M_p$ on $A$ should be equivalent to that of a model $M_A$ trained only on $A$.

* 3 datasets: $A, B, C$
* Pre-train a model $M_p$ on the 3 datasets $A, B, C$ sequentially
* Fine-tune the pretrained model $M_p$ separately on each of $A, B, C$ and generate synthetic (fake) data $(A_{fp}, B_{fp}, C_{fp})$
* Train separate models $M_A$, $M_B$, $M_C$ on $A, B, C$, respectively, then generate synthetic (fake) data $(A_{fs}, B_{fs}, C_{fs})$
* Apply a scoring function $s(\cdot)$ (e.g. sdmetrics) to evaluate the quality of the synthetic data against the real data; the expectation is
    $s(A, A_{fp}) \sim s(A, A_{fs})$,   
    $s(B, B_{fp}) \sim s(B, B_{fs})$,   
    $s(C, C_{fp}) \sim s(C, C_{fs})$

    ("$\sim$" denotes equivalence)
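
One way to make "$\sim$" concrete is an absolute tolerance on the score gap. The sketch below assumes the scores are plain floats that have already been computed; the name `compare_scores`, the tolerance `tol`, and the placeholder scores are hypothetical.

```python
def compare_scores(finetuned, standalone, tol=0.05):
    """Check s(X, X_fp) ~ s(X, X_fs) for every dataset X.

    finetuned:  dataset name -> score of the fine-tuned pretrained model.
    standalone: dataset name -> score of the model trained only on that dataset.
    tol: largest gap still treated as "equivalent" (hypothetical value).
    """
    report = {}
    for name in sorted(finetuned):
        gap = abs(finetuned[name] - standalone[name])
        report[name] = {"gap": gap, "equivalent": gap <= tol}
    return report


# Placeholder scores, not measured results.
s_fp = {"A": 0.75, "B": 0.5, "C": 0.875}
s_fs = {"A": 0.75, "B": 0.75, "C": 0.875}
for name, row in compare_scores(s_fp, s_fs).items():
    print(name, row)
```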

## 16-Mar | Transfer learning

* Pretraining (Source) Datasets: $D^{\text{pretrain}} = \{D_A, D_B, D_C, D_D\}$
* Transfer (Test) Datasets: $D^{\text{transfer}} = \{D_E, D_F\}$

  
* Pretraining phase: A pretrained model $M_{p}$ is trained on $D^{\text{pretrain}}$
* Fine-tuning phase: For each dataset $D^{\text{transfer}}_{i}$, the pretrained model $M_{p}$ is fine-tuned on it to produce a fine-tuned model $M_{fi}$
* The synthetic data for $D^{\text{transfer}}_{i}$ generated by $M_{fi}$ is $\hat{D}_{fi}$
* The synthetic data for $D^{\text{transfer}}_{i}$ generated by its single model $M_{i}$ (trained only on that dataset) is $\hat{D}_{i}$


The problem is to measure the performance of transfer learning from $D^{\text{pretrain}}$ to each dataset in $D^{\text{transfer}}$
* Sub-problem 1: how to avoid the forgetting problem in the pretraining phase
* Sub-problem 2: how to know whether the knowledge is transferable


### Sub-problem 1: 
Avoid forgetting by aggregating the datasets into one large dataset. This is roughly equivalent to training on each dataset alternately, one epoch at a time.


$D^{\text{pretrain}} = \{D_A, D_B, D_C, D_D\}$

* For each epoch in total epochs:  
  * Shuffle the order of the datasets in $D^{\text{pretrain}}$
  * For each dataset $D^{\text{pretrain}}_i$ in $D^{\text{pretrain}}$:  
    * Train $M_p$ on $D^{\text{pretrain}}_i$ for 1 epoch
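
The loop above can be sketched as follows. This reads the shuffle step as reshuffling the order in which the datasets are visited at each epoch, which is an assumption. `pretrain_round_robin` and the `train_one_epoch` callback are hypothetical names, and the callback stands in for whatever the real one-epoch training routine is.

```python
import random


def pretrain_round_robin(model, datasets, total_epochs, train_one_epoch, seed=0):
    """Train `model` on every dataset once per epoch, in a reshuffled order.

    datasets: dataset name -> dataset object (e.g. D_A, D_B, D_C, D_D).
    train_one_epoch: hypothetical callback (model, dataset) -> None that runs
        one epoch of training; the real training code would go there.
    Returns the visiting schedule as a list of (epoch, name) pairs.
    """
    rng = random.Random(seed)
    names = list(datasets)
    schedule = []
    for epoch in range(total_epochs):
        rng.shuffle(names)  # assumed meaning of the "Shuffle" step
        for name in names:
            train_one_epoch(model, datasets[name])
            schedule.append((epoch, name))
    return schedule


# Dummy stand-ins so the sketch runs without a real model.
visits = []
schedule = pretrain_round_robin(
    model=None,
    datasets={"D_A": "a", "D_B": "b", "D_C": "c", "D_D": "d"},
    total_epochs=2,
    train_one_epoch=lambda model, data: visits.append(data),
)
print(schedule)
```

Because every dataset is revisited in each epoch, no dataset is left untouched for long, which is the property the alternating scheme relies on to limit forgetting.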


### Sub-problem 2:
* Assumption 1: The fine-tuned model converges faster than the single model
* Assumption 2: Transfer learning is beneficial if the synthetic data generated by the fine-tuned model is of higher quality than the synthetic data generated by the model trained only on the target dataset.


For a test dataset $D_i \in D^{\text{transfer}}$:

* Train a single model $M_{i}$ on $D_i$ and generate synthetic data $\hat{D}_{i}$
* Fine-tune the pretrained model $M_p$ on $D_i$ and generate synthetic data $\hat{D}_{fi}$
* Calculate the quality scores w.r.t. the real data with an sdmetrics function $S(\cdot)$: $S(\hat{D}_{i}, D_i)$ and $S(\hat{D}_{fi}, D_i)$
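
Both assumptions could be checked from per-epoch score curves. The sketch below assumes the scores $S(\cdot)$ are floats where higher is better, recorded once per epoch for each model. The names `epochs_to_reach` and `transfer_report`, the convergence threshold `target`, and the example curves are hypothetical.

```python
def epochs_to_reach(scores_per_epoch, target):
    """Return the first 1-based epoch whose score is >= target, or None."""
    for epoch, score in enumerate(scores_per_epoch, start=1):
        if score >= target:
            return epoch
    return None


def transfer_report(single_curve, finetuned_curve, target):
    """Summarise both assumptions for one test dataset D_i.

    single_curve:    S(D_hat_i, D_i) per epoch for the single model M_i.
    finetuned_curve: S(D_hat_fi, D_i) per epoch for the fine-tuned model.
    target: score treated as "converged" (hypothetical threshold).
    """
    e_single = epochs_to_reach(single_curve, target)
    e_fine = epochs_to_reach(finetuned_curve, target)
    # Assumption 1: the fine-tuned model reaches the target in fewer epochs.
    faster = e_fine is not None and (e_single is None or e_fine < e_single)
    # Assumption 2: the fine-tuned model ends with the higher score.
    gain = finetuned_curve[-1] - single_curve[-1]
    return {"converges_faster": faster, "final_gain": gain, "better_quality": gain > 0}


# Placeholder score curves, not measured results.
single = [0.25, 0.5, 0.625, 0.75]
finetuned = [0.5, 0.75, 0.875, 0.875]
print(transfer_report(single, finetuned, target=0.75))
```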

Open question: how to ensure a fair comparison between $M_p$ and $M_i$?