### Task 1

#### 1. Basic Concepts
1. The purpose of using dataset distillation in this paper is to reduce training costs while maintaining high performance on various machine learning tasks. The authors introduce Dataset Distillation with Attention Matching (DataDAM), which condenses a large dataset into a much smaller synthetic dataset that retains its critical information, so that models trained on the synthetic set achieve accuracy close to that of models trained on the full dataset.
2. The advantages are: (page 2)
- Efficient end-to-end dataset distillation: DataDAM closely approximates the distribution of the real dataset while keeping **computational costs low**.
- Improved accuracy and scalability: DataDAM performs well across multiple benchmark datasets, reduces training costs by up to 100x, and generalizes across architectures (cross-architecture generalization). This makes it more scalable and flexible for real-world applications.
- Enhancement of downstream applications: DataDAM's distilled data improves memory efficiency in continual learning and accelerates neural architecture search (NAS) by providing a more representative proxy dataset, making both processes faster and more efficient.
3. The novelty includes: (page 2)
- Multiple Randomly Initialized DNNs: DataDAM uses multiple randomly initialized deep neural networks to extract meaningful representations from both real and synthetic datasets, unlike methods that rely on pre-trained models.
- Spatial Attention Matching (SAM): The SAM module aligns the most discriminative regions of the feature maps of the real and synthetic datasets, reducing the gap between the two datasets.
- Last-Layer Feature Alignment: It reduces disparities in the last-layer feature distributions between the real and synthetic datasets by using a complementary loss as a regularizer, ensuring high-level abstract representations are similar.
- Bias-Free Synthetic Data: The synthetic data generated by DataDAM does not introduce any bias, which is a significant improvement over prior methods, ensuring better generalization and performance.
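To make the SAM idea concrete, here is a minimal sketch of a spatial attention map computed from one layer's activations, plus the distance compared between a real and a synthetic batch. It assumes the attention map is the channel-wise sum of `|activation|^p` (with `p = 4` as a placeholder value), flattened and L2-normalized; `spatial_attention` and `sam_distance` are illustrative names, not the authors' code.

```python
import torch
import torch.nn.functional as F

def spatial_attention(feat: torch.Tensor, p: int = 4) -> torch.Tensor:
    """Collapse a (batch, C, H, W) feature map into a unit-norm spatial attention vector.

    Each spatial position is scored by summing |activation|^p over channels, so
    positions with strong responses dominate. The result is flattened to
    (batch, H*W) and L2-normalized so real and synthetic maps share one scale.
    """
    attn = feat.abs().pow(p).sum(dim=1)         # (batch, H, W)
    return F.normalize(attn.flatten(1), dim=1)  # (batch, H*W)

def sam_distance(feat_real: torch.Tensor, feat_syn: torch.Tensor, p: int = 4) -> torch.Tensor:
    """Squared distance between the mean attention maps of a real and a synthetic batch."""
    diff = spatial_attention(feat_real, p).mean(0) - spatial_attention(feat_syn, p).mean(0)
    return (diff ** 2).sum()

# A feature map with one strongly activated position: the attention peaks there.
f = torch.zeros(1, 8, 4, 4)
f[0, :, 2, 1] = 3.0
print(spatial_attention(f).argmax(dim=1).item())  # 9 (row 2, column 1 -> 2 * 4 + 1)
```

Because each map is normalized, the comparison focuses on *where* the network responds rather than on the absolute activation magnitude.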
4. The methodology of DataDAM is centered on efficiently distilling datasets through attention matching: (page 4)
- Initialization of Synthetic Dataset: The process starts by initializing a synthetic dataset, which can be done through random noise or by sampling real data.
- Feature Extraction: Real and synthetic datasets are passed through randomly initialized deep neural networks, and features are extracted at multiple layers.
- Spatial Attention Matching (SAM): Attention maps are computed for each layer, excluding the final layer. These attention maps focus on the most discriminative regions of the input image. 
- Loss Functions:
    - SAM loss ($\mathcal{L}_{\text{SAM}}$): This loss minimizes the distance between the attention maps of the real and synthetic datasets across layers.
    - Maximum Mean Discrepancy loss ($\mathcal{L}_{\text{MMD}}$): This complementary loss aligns the last-layer feature distributions of the two datasets, so that high-level abstract information is captured.
- Optimization: The synthetic dataset is optimized by minimizing a combination of $\mathcal{L}_{\text{SAM}}$ and $\mathcal{L}_{\text{MMD}}$, which reduces the difference between real and synthetic data.
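The steps above can be combined into a single outer iteration. This is a minimal sketch under several assumptions: a small hypothetical conv net stands in for ConvNet, the MMD term uses a linear kernel (squared distance between class-mean last-layer embeddings), and the weight `lam` and power `p = 4` are placeholder values. All function names are illustrative. A fresh randomly initialized network is drawn every step and never trained; only the synthetic images receive gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def random_blocks(in_ch: int, width: int = 32, depth: int = 3) -> nn.ModuleList:
    # Hypothetical stand-in for ConvNet, kept as separate blocks so every
    # layer's output can be read off; weights are random and frozen.
    blocks, c = [], in_ch
    for _ in range(depth):
        blocks.append(nn.Sequential(nn.Conv2d(c, width, 3, padding=1), nn.ReLU(), nn.AvgPool2d(2)))
        c = width
    return nn.ModuleList(blocks).requires_grad_(False)

def features(blocks: nn.ModuleList, x: torch.Tensor) -> list:
    out = []
    for block in blocks:
        x = block(x)
        out.append(x)
    return out  # out[:-1]: intermediate layers, out[-1]: last layer

def attention(f: torch.Tensor, p: int = 4) -> torch.Tensor:
    # Channel-wise |f|^p, flattened and L2-normalized per sample
    return F.normalize(f.abs().pow(p).sum(1).flatten(1), dim=1)

def distill_step(x_real, y_real, x_syn, y_syn, opt, num_classes, lam=0.01):
    net = random_blocks(x_real.shape[1])  # new random network every iteration
    loss = x_syn.new_zeros(())
    for c in range(num_classes):
        xr, xs = x_real[y_real == c], x_syn[y_syn == c]
        if len(xr) == 0 or len(xs) == 0:
            continue
        with torch.no_grad():
            fr = features(net, xr)          # real features need no gradient
        fs = features(net, xs)              # gradients flow back to x_syn
        for a, b in zip(fr[:-1], fs[:-1]):  # SAM on all but the last layer
            loss = loss + ((attention(a).mean(0) - attention(b).mean(0)) ** 2).sum()
        mean_r, mean_s = fr[-1].flatten(1).mean(0), fs[-1].flatten(1).mean(0)
        loss = loss + lam * ((mean_r - mean_s) ** 2).sum()  # linear-kernel MMD term
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In use, the synthetic images would be a leaf tensor with `requires_grad=True` passed to an optimizer such as `torch.optim.SGD([x_syn], lr=...)`, and `distill_step` would be called on a fresh real batch each iteration.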
5. Applications: (page 8)
- Continual Learning: DataDAM’s ability to condense datasets efficiently makes it highly useful in continual learning scenarios, where a model must learn incrementally while preventing catastrophic forgetting. By using the distilled datasets as a replay buffer, DataDAM can significantly improve memory efficiency and performance in incremental learning tasks.
- Neural Architecture Search (NAS): The synthetic datasets generated by DataDAM can serve as proxies in NAS tasks, allowing faster evaluation of model architectures. This leads to a significant reduction in computational costs and time during the model search process, making NAS more feasible in real-world applications.
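The replay-buffer use in continual learning can be sketched as follows: after each task, its distilled set is stored, and later training batches are extended with a random subset of the stored samples. `DistilledReplayBuffer` and its methods are hypothetical names, and uniform sampling is an assumption; the paper's actual continual-learning protocol may differ.

```python
import torch

class DistilledReplayBuffer:
    """Hypothetical replay buffer holding the distilled set of each finished task."""

    def __init__(self):
        self.xs, self.ys = [], []

    def add_task(self, x_syn: torch.Tensor, y_syn: torch.Tensor) -> None:
        # Store a detached CPU copy; a distilled set is small, so memory stays low
        self.xs.append(x_syn.detach().cpu())
        self.ys.append(y_syn.detach().cpu())

    def mix(self, x: torch.Tensor, y: torch.Tensor, k: int):
        """Return the current batch extended with up to k uniformly sampled replay samples."""
        if not self.xs:
            return x, y
        bx, by = torch.cat(self.xs), torch.cat(self.ys)
        idx = torch.randperm(len(bx))[:k]
        return torch.cat([x, bx[idx].to(x.device)]), torch.cat([y, by[idx].to(y.device)])
```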

#### 2. Data Distillation Learning

```python
import pandas as pd
import shutil
import os

annotations = pd.read_csv('mhist_dataset/annotations.csv')

img_folder = 'mhist_dataset/images'
train_folder = 'mhist_dataset/train/'
test_folder = 'mhist_dataset/test/'
```

```python
os.makedirs(train_folder, exist_ok=True)
os.makedirs(test_folder, exist_ok=True)

# ImageFolder expects one sub-directory per class, so copy each image into
# <split>/<label>/ using the MHIST 'Majority Vote Label' column (HP or SSA)
for _, row in annotations.iterrows():
    img_name = row['Image Name']
    label = row['Majority Vote Label']
    split_folder = train_folder if row['Partition'] == 'train' else test_folder

    dst_dir = os.path.join(split_folder, label)
    os.makedirs(dst_dir, exist_ok=True)
    shutil.copy(os.path.join(img_folder, img_name), os.path.join(dst_dir, img_name))

print("Images have been successfully copied")
```

```
Images have been successfully copied
```

```python
def count_files(folder):
    # Count the files in all class sub-directories of a split folder
    return sum(len(files) for _, _, files in os.walk(folder))

actual_train_count = count_files(train_folder)
actual_test_count = count_files(test_folder)

expected_train_count = len(annotations[annotations['Partition'] == 'train'])
expected_test_count = len(annotations[annotations['Partition'] == 'test'])

if actual_train_count == expected_train_count and actual_test_count == expected_test_count:
    print("All files have been copied correctly and counts match!")
else:
    print("Warning: the number of copied files does not match the annotations!")
```

```
All files have been copied correctly and counts match!
```


```python
import os
import gzip
import shutil

dataset_dir = 'mnist_dataset/MNIST/raw'
# Note: the decompressed files are raw IDX binaries, not individual image files
dataset_dest = 'mnist_dataset/MNIST/images'

os.makedirs(dataset_dest, exist_ok=True)
files_to_unzip = [f for f in os.listdir(dataset_dir) if f.endswith('.gz')]

for file in files_to_unzip:
    gz_path = os.path.join(dataset_dir, file)
    unzipped_path = os.path.join(dataset_dest, file[:-3])  # strip the '.gz' suffix

    # Stream-decompress each archive into the destination folder
    with gzip.open(gz_path, 'rb') as f_in:
        with open(unzipped_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
```
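The files written above are in the IDX binary format used by MNIST: a 4-byte header (two zero bytes, a type code, the number of dimensions), one big-endian 32-bit size per dimension, then the raw data. A minimal sketch that decodes them with NumPy; `parse_idx` and `read_idx` are hypothetical helpers, and only the unsigned-byte variant used by MNIST is handled. As an alternative, torchvision's `datasets.MNIST` with `root='mnist_dataset'` reads the same `MNIST/raw` layout.

```python
import struct
import numpy as np

def parse_idx(data: bytes) -> np.ndarray:
    """Decode an uncompressed IDX buffer (the format of the unzipped MNIST files)."""
    zero, dtype_code, ndim = struct.unpack('>HBB', data[:4])  # header: 0x0000, type, #dims
    if zero != 0 or dtype_code != 0x08:
        raise ValueError('only unsigned-byte IDX files (as in MNIST) are handled here')
    dims = struct.unpack('>' + 'I' * ndim, data[4:4 + 4 * ndim])  # big-endian uint32 sizes
    return np.frombuffer(data, dtype=np.uint8, offset=4 + 4 * ndim).reshape(dims)

def read_idx(path: str) -> np.ndarray:
    with open(path, 'rb') as f:
        return parse_idx(f.read())

# Usage (path from the cell above):
# images = read_idx('mnist_dataset/MNIST/images/train-images-idx3-ubyte')  # shape (60000, 28, 28)
```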

```python
import torch
import torch.nn.functional as F
from torch.optim import SGD
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
```

```python
import torch
from torchvision import datasets, transforms

# ToTensor converts the images to float tensors with values in [0, 1]
transform = transforms.Compose([
    transforms.ToTensor()
])

# ImageFolder infers labels from one sub-directory per class inside each split folder
train_dataset = datasets.ImageFolder(root=train_folder, transform=transform)
test_dataset = datasets.ImageFolder(root=test_folder, transform=transform)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)
```
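The methodology notes that the synthetic set can be initialized by sampling real data. A minimal sketch of that initialization: pick `ipc` (images per class) random real images for each class and turn them into a trainable leaf tensor. `init_synthetic_from_real` and the `ipc` value are illustrative assumptions; the helper reads the `targets` attribute that `ImageFolder` exposes when available, so it does not have to decode every image just to find the labels.

```python
import torch

def init_synthetic_from_real(dataset, num_classes: int, ipc: int, seed: int = 0):
    """Pick `ipc` random real images per class as the starting synthetic set.

    Returns a leaf tensor x_syn (requires_grad=True) and its labels y_syn.
    """
    g = torch.Generator().manual_seed(seed)
    targets = getattr(dataset, 'targets', None)  # ImageFolder provides .targets
    if targets is None:
        targets = [int(dataset[i][1]) for i in range(len(dataset))]
    labels = torch.as_tensor(targets)

    xs, ys = [], []
    for c in range(num_classes):
        idx = torch.nonzero(labels == c).flatten()
        chosen = idx[torch.randperm(len(idx), generator=g)[:ipc]]
        xs += [dataset[int(i)][0] for i in chosen]
        ys += [c] * len(chosen)
    x_syn = torch.stack(xs).requires_grad_(True)
    return x_syn, torch.tensor(ys)

# Usage with the MHIST dataset (2 classes); ipc=10 is an arbitrary example value:
# x_syn, y_syn = init_synthetic_from_real(train_dataset, num_classes=2, ipc=10)
```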

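```python
# Assumes networks.py from the Dataset Condensation / DataDAM codebase is importable
from networks import ConvNet

# Initialize ConvNet-3 for MNIST (1-channel 28x28 images, 10 classes)
convnet_mnist = ConvNet(channel=1, num_classes=10, net_width=64, net_depth=3, net_act='relu', net_norm='batchnorm', net_pooling='maxpooling', im_size=(28, 28))
```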


```python
# Initialize ConvNet-7 for MHIST
convnet_mhist = ConvNet(channel=3, num_classes=2, net_width=64, net_depth=7, net_act='relu', net_norm='batchnorm', net_pooling='maxpooling', im_size=(224,224))
```
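For intuition about what "ConvNet-D" means, here is a hypothetical stand-in that repeats `net_depth` blocks of 3x3 convolution, batch normalization, ReLU and 2x2 max pooling, followed by a linear classifier. The real `ConvNet` class used above may differ in details such as normalization and padding. The sketch shows why `im_size` must match the actual input resolution: the classifier's input size is computed from it. With 2x2 pooling, depth 3 reduces 28x28 MNIST images to 3x3, and depth 7 reduces 224x224 MHIST images to 1x1.

```python
import torch
import torch.nn as nn

class ConvNetSketch(nn.Module):
    """Hypothetical ConvNet-D: depth x [conv3x3 -> BatchNorm -> ReLU -> MaxPool2d(2)] + linear head."""

    def __init__(self, channel, num_classes, net_width=64, net_depth=3, im_size=(28, 28)):
        super().__init__()
        layers, c = [], channel
        for _ in range(net_depth):
            layers += [nn.Conv2d(c, net_width, kernel_size=3, padding=1),
                       nn.BatchNorm2d(net_width), nn.ReLU(inplace=True), nn.MaxPool2d(2)]
            c = net_width
        self.features = nn.Sequential(*layers)
        # Each 2x2 pooling halves the spatial size (rounding down)
        h, w = im_size[0] // 2 ** net_depth, im_size[1] // 2 ** net_depth
        self.classifier = nn.Linear(net_width * h * w, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```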


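```python
# DataLoader expects a Dataset object (such as the ImageFolder dataset above), not a folder path
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
```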
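A distilled set is usually judged the way the Basic Concepts answers describe: train a fresh network only on the synthetic images and measure its accuracy on the real test split. A minimal sketch, assuming plain SGD with momentum and cross-entropy; `train_and_evaluate`, the epoch count and the learning rate are hypothetical choices, not the paper's evaluation settings.

```python
import torch
import torch.nn.functional as F
from torch.optim import SGD
from torch.utils.data import DataLoader, TensorDataset

def train_and_evaluate(model, x_syn, y_syn, test_loader, epochs=50, lr=0.01, batch_size=64, device='cpu'):
    """Train `model` from scratch on the synthetic set, then return accuracy on the real test set."""
    model = model.to(device)
    loader = DataLoader(TensorDataset(x_syn.detach().cpu(), y_syn.cpu()), batch_size=batch_size, shuffle=True)
    opt = SGD(model.parameters(), lr=lr, momentum=0.9)

    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total
```

For example, `train_and_evaluate(convnet_mhist, x_syn, y_syn, test_loader)` would score a distilled MHIST set, assuming `x_syn` and `y_syn` hold the synthetic images and labels produced by a distillation run.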