# FLSim Tutorial: Image classification with CIFAR-10

## Introduction

In this tutorial, we will train a simple CNN image classifier on CIFAR-10 with federated learning using FLSim.

### Prerequisites

To get the most out of this tutorial, you should be comfortable with training machine learning models with **PyTorch** and familiar with the concept of **federated learning (FL)**. If you are unfamiliar with either of them or could use a refresher, please take a look at the following resources before proceeding with the tutorial:

- McMahan & Ramage (2017): [Federated Learning: Collaborative Machine Learning without Centralized Training Data](https://ai.googleblog.com/2017/04/federated-learning-collaborative.html). A short blog post from Google AI introducing the main idea of FL in a beginner-friendly way.
- McMahan et al. (2017): [Communication-Efficient Learning of Deep Networks from Decentralized Data](https://arxiv.org/pdf/1602.05629.pdf). This paper first proposes the approach of federated learning. The described algorithm is now known as federated averaging (or FedAvg for short).
- PyTorch has [extensive tutorials](https://pytorch.org/tutorials/) on their website. In particular, take a look at their [image classification tutorial using CIFAR-10](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html).

Now that you're familiar with PyTorch and FL, let's move on!

### Objectives 

By the end of this tutorial, we will have learnt how to

1. Build a data pipeline for federated learning with FLSim,
2. Create an image classification model compatible with FL training,
3. Create a metrics reporter to collect and report metrics,
4. Set hyperparameters for FL training, and
5. Launch an FL training flow using FLSim.

## Training an image classifier with FLSim

### Prerequisites
First, let us install flsim via pip with the command below:

In [2]:
# !pip install --quiet flsim

Some useful parameters for later - no need to change these.

In [3]:
USE_CUDA = True
LOCAL_BATCH_SIZE = 32
EXAMPLES_PER_USER = 500
IMAGE_SIZE = 32

# suppress large outputs
VERBOSE = False

### 0. About the dataset

For this tutorial, we will use the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html). The CIFAR-10 dataset consists of 60k color images of 32x32 pixels (3 channels each, i.e. 3x32x32 tensors) from 10 classes, with 6k images per class.
There are 50k training images (5k training images per class) and 10k test images (1k test images per class).
The classes are ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, and ‘truck’.

![img](https://pytorch.org/tutorials/_images/cifar10.png)

We can get the CIFAR-10 dataset from `torchvision.datasets`.

In [4]:
from torchvision.datasets.cifar import CIFAR10

### 1. Data pipeline

First, let's define how to build the data pipeline for federated learning:

1. We create data transforms and training, eval, and test datasets. This step is identical to preparing data in non-federated learning.

In [5]:
from torchvision import transforms

# 1. Create training, eval, and test datasets like in non-federated learning.
transform = transforms.Compose(
    [
        transforms.Resize(IMAGE_SIZE),
        transforms.CenterCrop(IMAGE_SIZE),
        transforms.ToTensor(),
        transforms.Normalize(
            (0.4914, 0.4822, 0.4465), 
            (0.2023, 0.1994, 0.2010)
        ),
    ]
)
train_dataset = CIFAR10(
    root="./cifar10", train=True, download=True, transform=transform
)
test_dataset = CIFAR10(
    root="./cifar10", train=False, download=True, transform=transform
)

Files already downloaded and verified
Files already downloaded and verified
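
To make the `Normalize` step above concrete: it subtracts a per-channel mean and divides by a per-channel standard deviation, after `ToTensor` has scaled pixel values to the range [0, 1]. The following is a minimal pure-Python sketch of that arithmetic for a single pixel. The helper `normalize_pixel` is a hypothetical name used only for illustration and is not part of torchvision.

```python
# Per-channel statistics copied from the transform above.
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.2023, 0.1994, 0.2010)


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Normalize one RGB pixel whose channel values are already in [0, 1]."""
    return tuple((value - m) / s for value, m, s in zip(rgb, mean, std))


# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel(MEAN))

# A pure white pixel lands roughly 2.5 to 2.8 standard deviations above the mean.
print(normalize_pixel((1.0, 1.0, 1.0)))
```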



There are a few extra steps to enable training with federated learning. In particular, we need to

2. Create a sharder, which defines a mapping from examples in the training data to clients. In other words, a sharder groups rows of data into client datasets and returns a list of lists of examples. FLSim provides a number of sharding strategies such as random or column-based sharding. 
In this tutorial, we use sequential sharding, which assigns the first `examples_per_shard` rows to user 0, the next `examples_per_shard` rows to user 1, etc. 

3. Create a data loader, which will shard and batchify training, eval, and test data. For each dataset, the data loader first assigns rows to clients using the sharder and then splits each client's data into batches of size `batch_size`. We choose not to drop the last batch.

4. Lastly, wrap the data loader with a data provider and return it. The data provider creates clients from the groupings in the data loader and adds metadata (e.g. number of examples/batches). Our data is now formatted such that the trainer will accept it.

Note that the concept of a client or device only applies to the training data; the eval and test set data are identical to non-federated learning.
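
Steps 2 and 3 can be illustrated on toy data. The sketch below is pure Python and does not use FLSim. The functions `sequential_shards` and `batchify` are hypothetical stand-ins that mimic what the text describes (consecutive chunks per client, then fixed-size batches without dropping the last one). How FLSim itself handles a trailing partial shard is not covered here.

```python
def sequential_shards(rows, examples_per_shard):
    """Split rows into consecutive chunks; the last chunk may be smaller."""
    return [
        rows[start:start + examples_per_shard]
        for start in range(0, len(rows), examples_per_shard)
    ]


def batchify(rows, batch_size, drop_last=False):
    """Split one client's rows into batches of at most batch_size rows."""
    batches = [
        rows[start:start + batch_size]
        for start in range(0, len(rows), batch_size)
    ]
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


toy_rows = list(range(10))

# Rows 0-3 go to client 0, rows 4-7 to client 1, and the rest to client 2.
shards = sequential_shards(toy_rows, examples_per_shard=4)
print(shards)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# Each client's shard is then batched independently.
client_batches = [batchify(shard, batch_size=3) for shard in shards]
print(client_batches[0])  # [[0, 1, 2], [3]]
```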

In [6]:
from flsim.data.data_sharder import SequentialSharder
from flsim.utils.example_utils import DataLoader, DataProvider

# 2. Create a sharder, which maps samples in the training data to clients.
sharder = SequentialSharder(examples_per_shard=EXAMPLES_PER_USER)

# 3. Shard and batchify training, eval, and test data.
fl_data_loader = DataLoader(
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    test_dataset=test_dataset,
    sharder=sharder,
    batch_size=LOCAL_BATCH_SIZE,
    drop_last=False,
)

# 4. Wrap the data loader with a data provider.
data_provider = DataProvider(fl_data_loader)
print(f"\nClients in total: {data_provider.num_train_users()}")

Creating FL User: 100user [00:11,  8.70user/s]
Creating FL User: 20user [00:02,  9.28user/s]
Creating FL User: 20user [00:02,  9.15user/s]


Clients in total: 100
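
The client count reported above follows directly from the dataset size and the shard size. As a quick sanity check, the arithmetic below reproduces it and also works out how many batches each client holds. The two `20user` progress lines are consistent with the 10k-image test set being split into shards of 500 as well, although that reading of the log is an assumption.

```python
import math

TRAIN_EXAMPLES = 50_000
TEST_EXAMPLES = 10_000
EXAMPLES_PER_USER = 500
LOCAL_BATCH_SIZE = 32

# Sequential sharding: one client per block of EXAMPLES_PER_USER rows.
train_clients = TRAIN_EXAMPLES // EXAMPLES_PER_USER
test_shards = TEST_EXAMPLES // EXAMPLES_PER_USER

# With drop_last=False, the final partial batch is kept.
batches_per_client = math.ceil(EXAMPLES_PER_USER / LOCAL_BATCH_SIZE)
last_batch_size = EXAMPLES_PER_USER - (batches_per_client - 1) * LOCAL_BATCH_SIZE

print(train_clients, test_shards)           # 100 20
print(batches_per_client, last_batch_size)  # 16 20
```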





### 2. Create the model

Now, let's see how we can create a model that is compatible with FL-training.

1. First, we define a standard, non-FL image classification PyTorch `nn.Module`. In this tutorial we use a simple CNN with 4 convolutional layers, a group norm, and a linear layer. 

2. Create a `torch.device` and choose where the model will be allocated (CUDA or CPU).

As with the data pipeline, these steps are identical to creating a model in non-federated learning. Note that, in contrast to non-federated training, we haven't moved the model to the device yet.

In [7]:
import torch
from flsim.utils.example_utils import SimpleConvNet

# 1. Define our model, a simple CNN.
model = SimpleConvNet(in_channels=3, num_classes=10)

# 2. Choose where the model will be allocated.
cuda_enabled = torch.cuda.is_available() and USE_CUDA
device = torch.device("cuda:0" if cuda_enabled else "cpu")

model, device

(SimpleConvNet(
   (layers): ModuleList(
     (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
     (1-3): 3 x Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2))
   )
   (gn_relu): Sequential(
     (0): GroupNorm(32, 32, eps=1e-05, affine=True)
     (1): ReLU()
     (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
   )
   (dropout): Dropout(p=0, inplace=False)
   (fc): Linear(in_features=288, out_features=10, bias=True)
 ),
 device(type='cpu'))
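
The `in_features=288` of the final linear layer can be derived from the printed layer shapes. The sketch below assumes that the forward pass applies the `gn_relu` block (group norm, ReLU, 2x2 max pooling) after each of the four convolutions. The printed module does not show the forward pass, so this is an assumption, but it is the reading that reproduces 288. The same layer shapes also give the total parameter count.

```python
def conv_out(size, kernel=3, stride=1, padding=2):
    """Spatial output size of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of max pooling without padding."""
    return (size - kernel) // stride + 1


size = 32  # CIFAR-10 images are 32x32
for _ in range(4):
    # Each convolution grows the map by 2, then pooling halves it (rounding down).
    size = pool_out(conv_out(size))

channels = 32
flattened = channels * size * size
print(size, flattened)  # 3 288

# Parameter count from the printed shapes (weights plus biases).
conv_params = (3 * 32 * 9 + 32) + 3 * (32 * 32 * 9 + 32)
group_norm_params = 2 * 32
fc_params = 288 * 10 + 10
total_params = conv_params + group_norm_params + fc_params
print(total_params)  # 31594
```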

As with the data pipeline, there are a few extra steps that we need to take to make sure that our model is compatible with FL. In particular, we need to

3. Wrap the PyTorch module with the FLSim `FLModel`, an abstracted version of an FL-friendly model class that is accepted by the trainer and handles metric collection, as well as the forward pass for both training and evaluation. We can recover our `nn.Module` by calling `FLModel.fl_get_module()`.

4. Move the model to GPU and enable CUDA if desired. `FLModel.fl_cuda()` internally calls `model.to(device)` to move the model to GPU.

In [8]:
from flsim.utils.example_utils import FLModel

# 3. Wrap the model with FLModel.
global_model = FLModel(model, device)
assert global_model.fl_get_module() is model

# 4. Move the model to GPU and enable CUDA if desired.
if cuda_enabled:
    global_model.fl_cuda()

### 3. Metrics Reporting

After having created our data pipeline and FL model, we will now create our metrics reporter. 
The metrics reporter allows us to collect, evaluate, and report relevant training, aggregation, and evaluation/test metrics as well as log them to TensorBoard.



In [9]:
from flsim.interfaces.metrics_reporter import Channel
from flsim.utils.example_utils import MetricsReporter

# Create a metric reporter.
metrics_reporter = MetricsReporter([Channel.TENSORBOARD, Channel.STDOUT])

There are three functions that are of particular interest:

1. `compute_scores` computes the metrics of interest for both training and aggregation (if desired) as well as evaluation/test.

2. `create_eval_metrics` creates a dictionary that stores the value for each eval metric. 

3. `compare_metrics` compares the current eval metrics that are returned by `create_eval_metrics` to the best eval metrics so far.


For this tutorial, our only metric of interest is top-1 accuracy. In general, as with the data loading and model, you should write your own metrics reporter depending on the task. For example, if you are running an NLP task you may want to have your metrics reporter track perplexity as well.
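
To give a feel for what such a reporter computes, here is a minimal pure-Python sketch of a top-1 accuracy score and a "is this the best model so far" comparison. The function names `top1_accuracy` and `is_better` are hypothetical and are not FLSim's API. The real logic lives in the `MetricsReporter` methods listed above.

```python
def top1_accuracy(predicted_labels, true_labels):
    """Return top-1 accuracy as a percentage."""
    correct = sum(int(p == t) for p, t in zip(predicted_labels, true_labels))
    return 100.0 * correct / len(true_labels)


def is_better(current_metrics, best_metrics_so_far):
    """Higher accuracy wins; anything beats having no previous result."""
    if best_metrics_so_far is None:
        return True
    return current_metrics["Accuracy"] > best_metrics_so_far["Accuracy"]


# Three of four predictions match the labels.
eval_metrics = {"Accuracy": top1_accuracy([3, 5, 9, 9], [3, 5, 1, 9])}
print(eval_metrics)  # {'Accuracy': 75.0}
print(is_better(eval_metrics, {"Accuracy": 46.96}))  # True
```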

In [10]:
import inspect

if VERBOSE:
    print(inspect.getsource(MetricsReporter.compute_scores))
    print(inspect.getsource(MetricsReporter.create_eval_metrics))
    print(inspect.getsource(MetricsReporter.compare_metrics))

### 4. Hyperparameters

We represent the hyperparameters for FL training as a JSON config, which is easy to read and edit, and convert it to an OmegaConf config before passing it to the FL trainer.

In particular, we specify a FedAvg implementation with a server learning rate (`base_fed_avg_with_lr`) and 5 users per round.
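
To make the aggregation step concrete, here is a minimal pure-Python sketch of one federated averaging round with a server learning rate. It is a simplified illustration, not FLSim's implementation: the function `fed_avg_round` is hypothetical, weights are flat lists of floats, the sign convention for the update is an assumption, and server momentum is left out.

```python
def fed_avg_round(global_weights, client_weights, client_num_examples, server_lr=1.0):
    """Move the global weights toward the example-weighted mean of client weights."""
    total_examples = sum(client_num_examples)
    new_weights = []
    for i, global_value in enumerate(global_weights):
        # Weighted average of each client's change relative to the global model.
        avg_delta = sum(
            n * (weights[i] - global_value)
            for weights, n in zip(client_weights, client_num_examples)
        ) / total_examples
        new_weights.append(global_value + server_lr * avg_delta)
    return new_weights


global_weights = [0.0, 1.0]
client_weights = [[1.0, 1.0], [3.0, 0.0]]
client_num_examples = [100, 300]

# With server_lr=1.0 this reduces to the plain weighted average of client weights.
print(fed_avg_round(global_weights, client_weights, client_num_examples))  # [2.5, 0.25]
```

With `server_lr=1.0` the result is the plain weighted average of the client models. Other values scale how far the global model moves toward that average in each round.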

In [11]:
import flsim.configs
from flsim.utils.config_utils import fl_config_from_json
from omegaconf import OmegaConf

json_config = {
    "trainer": {
        "_base_": "base_sync_trainer",
        "server": {
            "_base_": "base_sync_server",
            # there are different types of aggregators:
            # fed_avg doesn't require lr, while others such as fed_avg_with_lr or fed_adam do
            "server_optimizer": {
                "_base_": "base_fed_avg_with_lr",
                "lr": 2.13,
                "momentum": 0.9
            },
            # type of user selection sampling
            "active_user_selector": {"_base_": "base_uniformly_random_active_user_selector"},
        },
        "client": {
            # number of client's local epoch
            "epochs": 1,
            "optimizer": {
                "_base_": "base_optimizer_sgd",
                # client's local learning rate
                "lr": 0.01,
                # client's local momentum
                "momentum": 0,
            },
        },
        # number of users per round for aggregation
        "users_per_round": 5,
        # total number of global epochs
        # total #rounds = ceil(total_users / users_per_round) * epochs
        "epochs": 1,
        # frequency of reporting train metrics
        "train_metrics_reported_per_epoch": 100,
        # frequency of evaluation per epoch
        "eval_epoch_frequency": 1,
        "do_eval": True,
        # should we report train metrics after global aggregation
        "report_train_metrics_after_aggregation": True,
    }
}
cfg = fl_config_from_json(json_config)
if VERBOSE: print(OmegaConf.to_yaml(cfg))

The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  with initialize(config_path=None):
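
The comment in the config gives the total number of rounds as `ceil(total_users / users_per_round) * epochs`. Plugging in the values used in this tutorial (100 training clients, 5 users per round, 1 epoch) gives the round count that the training progress bar should display:

```python
import math

total_users = 100     # number of training clients created by the data provider
users_per_round = 5   # "users_per_round" in the config
epochs = 1            # "epochs" at the trainer level

rounds_per_epoch = math.ceil(total_users / users_per_round)
total_rounds = rounds_per_epoch * epochs
print(total_rounds)  # 20
```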


### 5. Training
Recall that we already built the data provider and created a model compatible with FL training. 
We also initialized a metrics reporter and set our desired hyperparameters.

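Now, we only need to instantiate the trainer with the model and hyperparameter config we defined earlier to launch the FL training flow. We run FL training with the above JSON config and use `eval_score` to store the final evaluation metrics. Besides the data provider and metrics reporter, the `train` call below also passes attack and check arguments. Judging by their names and the log output, one client per round (`malicious_count=1`) adds noise to its update (`attack_type='noise'`), and every client update is checked against an L2-norm bound of 0.2 (`check_type='strict'` with `'pred': 'l2norm'`).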

In [13]:
from hydra.utils import instantiate

# Instantiate the trainer.
trainer = instantiate(cfg.trainer, model=global_model, cuda_enabled=cuda_enabled)   

# Launch FL training.
final_model, eval_score = trainer.train(
    data_provider=data_provider,
    metrics_reporter=metrics_reporter,
    num_total_users=data_provider.num_train_users(),
    distributed_world_size=1,
    malicious_count=1,
    attack_type='noise',  # 'scale', 'noise', 'flip'
    attack_param={'scale_factor': -1.5,
                  'noise_std': 0.1,
                  'label_1': 5,
                  'label_2': 9},
    check_type='strict',  # 'no_check', 'strict', 'prob_zkp'
    check_param={'pred': 'l2norm', # 'l2norm', 'sphere', 'cosine'
                 'norm_bound': 0.2},
)

Round:   0%|          | 0/20 [00:00<?, ?round/s]

*** computing delta! ***
delta norm before: 0.18197865784168243
noise, delta norm after: 17.726133346557617
check failed!
*** computing delta! ***
delta norm before: 0.2043217271566391
no attack
delta norm after: 0.20000000298023224
check passed!
*** computing delta! ***
delta norm before: 0.2019915133714676
no attack
delta norm after: 0.20000000298023224
check passed!
*** computing delta! ***
delta norm before: 0.17443476617336273
no attack
delta norm after: 0.17443476617336273
check passed!
*** computing delta! ***
delta norm before: 0.17970216274261475
no attack
delta norm after: 0.17970216274261475
check passed!
Train finished Global Round: 1
(epoch = 1, round = 1, global round = 1), Loss/Training: 1.6359465916951497
(epoch = 1, round = 1, global round = 1), Accuracy/Training: 42.06161137440758


Round:   5%|▌         | 1/20 [00:28<09:00, 28.44s/round]

(epoch = 1, round = 1, global round = 1), Loss/Aggregation: 1.6238590151071548
(epoch = 1, round = 1, global round = 1), Accuracy/Aggregation: 42.04
*** computing delta! ***
delta norm before: 0.20613853633403778
noise, delta norm after: 17.611501693725586
check failed!
*** computing delta! ***
delta norm before: 0.20751015841960907
no attack
delta norm after: 0.19999998807907104
check passed!
*** computing delta! ***
delta norm before: 0.18227913975715637
no attack
delta norm after: 0.18227913975715637
check passed!
*** computing delta! ***
delta norm before: 0.19321690499782562
no attack
delta norm after: 0.19321690499782562
check passed!
*** computing delta! ***
delta norm before: 0.17303138971328735
no attack
delta norm after: 0.17303138971328735
check passed!
Train finished Global Round: 2
(epoch = 1, round = 2, global round = 2), Loss/Training: 1.6187740370631218
(epoch = 1, round = 2, global round = 2), Accuracy/Training: 42.6


Round:  10%|█         | 2/20 [00:33<04:27, 14.89s/round]

(epoch = 1, round = 2, global round = 2), Loss/Aggregation: 1.547975528240204
(epoch = 1, round = 2, global round = 2), Accuracy/Aggregation: 45.52
*** computing delta! ***
delta norm before: 0.1784418374300003
noise, delta norm after: 17.747121810913086
check failed!
*** computing delta! ***
delta norm before: 0.1832793653011322
no attack
delta norm after: 0.1832793653011322
check passed!
*** computing delta! ***
delta norm before: 0.1650639921426773
no attack
delta norm after: 0.1650639921426773
check passed!
*** computing delta! ***
delta norm before: 0.2472349852323532
no attack
delta norm after: 0.20000000298023224
check passed!


Round:  10%|█         | 2/20 [00:39<05:52, 19.56s/round]
Epoch:   0%|          | 0/1 [00:39<?, ?epoch/s]


KeyboardInterrupt: 
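
The log above suggests the following pattern: an honest client whose update norm exceeds the bound of 0.2 is scaled down to exactly 0.2 and passes the check, while the noisy update has a norm near 17.7 and fails. The pure-Python sketch below reproduces that behaviour under stated assumptions. The helpers `clip_to_norm` and `passes_l2_check` are hypothetical, the parameter count of 31,594 is derived from the printed layer shapes, and the actual check implementation is not shown in this tutorial.

```python
import math
import random


def l2_norm(vector):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in vector))


def clip_to_norm(vector, norm_bound):
    """Scale the vector down so its norm is at most norm_bound."""
    norm = l2_norm(vector)
    if norm <= norm_bound:
        return list(vector)
    scale = norm_bound / norm
    return [v * scale for v in vector]


def passes_l2_check(vector, norm_bound, tolerance=1e-6):
    """Accept an update only if its norm is within the bound (up to rounding)."""
    return l2_norm(vector) <= norm_bound + tolerance


NORM_BOUND = 0.2

# An honest update slightly above the bound is clipped and then accepted.
honest_delta = [0.15, 0.15]  # norm is about 0.212
clipped_delta = clip_to_norm(honest_delta, NORM_BOUND)
print(passes_l2_check(clipped_delta, NORM_BOUND))  # True

# Gaussian noise with std 0.1 over d parameters has norm close to 0.1 * sqrt(d).
NUM_PARAMS = 31_594  # assumption: parameter count of the simple CNN
rng = random.Random(0)
noisy_delta = [rng.gauss(0.0, 0.1) for _ in range(NUM_PARAMS)]
print(passes_l2_check(noisy_delta, NORM_BOUND))  # False
```

For this parameter count, `0.1 * sqrt(31594)` is about 17.8, which is in line with the "delta norm after" values logged for the noisy client.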

After training finishes, we evaluate the model and report the accuracy on the test set before finishing this tutorial.


In [11]:
# We can now test our trained model.
trainer.test(
    data_provider=data_provider,
    metrics_reporter=MetricsReporter([Channel.STDOUT]),
)

Running (epoch = 1, round = 1, global round = 1) for Test
(epoch = 1, round = 1, global round = 1), Loss/Test: 1.4767778711393476
(epoch = 1, round = 1, global round = 1), Accuracy/Test: 46.96


{'Accuracy': 46.96}

## Summary

In this tutorial, we first showed how to get the data. We then built a data provider by sharding the data to simulate multiple client devices, each with their own data, and splitting each client's data into batches. 
We defined a simple CNN as our model, wrapped it with a model compatible with FL training, and moved it to the GPU when CUDA is available. 
Lastly, we set the hyperparameters for FL training, launched the training flow, and evaluated our model.

### Additional resources

- For a more in-depth understanding of this tutorial, check out [example_utils.py](https://github.com/facebookresearch/FLSim/blob/main/flsim/utils/example_utils.py) where we define the data loader, data provider, simple CNN, `FLModel`, and metrics reporter that we use in this tutorial.

- [FLSim tutorials](https://github.com/facebookresearch/FLSim/tree/main/tutorials) - check out our other tutorial on sentiment classification.

- Kairouz et al. (2021): [Advances and Open Problems in Federated Learning](https://arxiv.org/pdf/1912.04977.pdf). As the title suggests, an in-depth overview of advances and open problems in FL.

- If you're interested in federated learning with differential privacy, take a look at [Opacus](https://opacus.ai/), a library that enables training PyTorch models with differential privacy. 
You can find a blog post introducing Opacus [here](https://ai.facebook.com/blog/introducing-opacus-a-high-speed-library-for-training-pytorch-models-with-differential-privacy/).

