# Efficient rank training

Let's start by setting up the notebook. Skip this step if you have installed `fedcore` via pip and can already import it.

In [22]:
import os 
os.getcwd()

'/ptls-experiments/FedCore/examples'

In [23]:
os.chdir('../..')
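Hard-coding `'../..'` only works when the notebook is started from one particular directory, and each re-run of the cell moves two more levels up. Here is a sketch of a more robust alternative. It assumes the project root is the nearest ancestor directory that contains a `fedcore/` package folder. `find_project_root` is a hypothetical helper and is not part of FedCore.

```python
from pathlib import Path


def find_project_root(start, marker="fedcore"):
    """Return the first directory at or above `start` that contains `marker/`.

    Hypothetical helper: assumes the repository root is the nearest ancestor
    holding the `fedcore` package directory.
    """
    start = Path(start).resolve()
    # check `start` itself first, then every parent up to the filesystem root
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"no directory above {start} contains {marker!r}")
```

In the notebook it would replace the cell above with `os.chdir(find_project_root(os.getcwd()))`, which gives the same result however many times it is run.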

In [24]:
# Older CUDA versions may not support expandable segments in the CUDA caching allocator; clearing the allocator config avoids this
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = ""
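Assigning an empty string discards every allocator option, not only the unsupported one. `PYTORCH_CUDA_ALLOC_CONF` holds comma-separated `option:value` pairs, for example `max_split_size_mb:128,expandable_segments:True`. A more targeted sketch therefore removes just `expandable_segments` and keeps the rest. `drop_alloc_option` is a hypothetical helper. Like the original cell, it only has an effect if it runs before PyTorch initialises CUDA.

```python
def drop_alloc_option(conf, key):
    """Remove `key:value` from a comma-separated PYTORCH_CUDA_ALLOC_CONF string."""
    kept = []
    for item in conf.split(","):
        item = item.strip()
        # skip empty entries and the option being removed
        if item and item.split(":", 1)[0].strip() != key:
            kept.append(item)
    return ",".join(kept)
```

Usage: `os.environ['PYTORCH_CUDA_ALLOC_CONF'] = drop_alloc_option(os.environ.get('PYTORCH_CUDA_ALLOC_CONF', ''), 'expandable_segments')`.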

## Configuration

In [25]:
from fedcore.tools.example_utils import get_scenario_for_api
from fedcore.api.main import FedCore
from fedcore.api.utils.checkers_collection import ApiConfigCheck
from fedcore.data.dataloader import load_data
from fedcore.api.utils.evaluation import evaluate_original_model, evaluate_optimised_model
from fedcore.repository.config_repository import DEFAULT_CLF_API_CONFIG

DATASET = 'CIFAR10'
DATASET_PARAMS = {'train_bs': 64,
                  'val_bs': 100,
                  'train_shuffle': True,
                  'val_shuffle': False}
METRIC_TO_OPTIMISE = ['accuracy', 'latency', 'throughput']
initial_assumption = 'ResNet18'
initial_assumption, learning_strategy = get_scenario_for_api('checkpoint', initial_assumption)

USER_CONFIG = {'problem': 'classification',
               'metric': METRIC_TO_OPTIMISE,
               'initial_assumption': initial_assumption,
               'pop_size': 1, # how many models to train in parallel
               'timeout': 1.3, # how long optimization process runs (in minutes)
               'learning_strategy': 'from_checkpoint', # whether we have pretrained model or not
               'learning_strategy_params': dict(epochs=0,
                                                learning_rate=0.0001,
                                                loss='crossentropy',
                                                custom_loss=[],  # custom losses require reworking the BaseNN class
                                                custom_learning_params=dict(use_early_stopping={'patience': 30,
                                                                                                'maximise_task': False,
                                                                                                'delta': 0.01})
                                                ),  # applied only when basic pretraining is needed
               'peft_strategy': 'low_rank', # the compression approach
               'peft_strategy_params': dict(
                   log_each=1, # how often to print train losses
                   eval_each=5, # how often to evaluate model on validation dataset
                   scheduler='one_cycle', # which lr scheduler to use
                   epochs=10,
                   loss='crossentropy',
                   non_adaptive_threshold=0.2, # threshold for S-strategies
                   custom_criterions={'hoer': 0.5, 'orthogonal': 0.2},  # additional losses and their weights
                   finetune_params={'epochs': 1,
                                    "learning_rate": 0.0001,
                                    'loss': 'crossentropy'} # parameters for finetuning model after rank pruning
                ),
               }

# Initialization of API config
api_config = ApiConfigCheck().update_config_with_kwargs(DEFAULT_CLF_API_CONFIG, **USER_CONFIG)
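The `custom_criterions` entry adds two weighted regularisers to the task loss: `hoer` with weight 0.5 and `orthogonal` with weight 0.2. Both appear as separate criteria in the training log. The notebook does not define them, so the NumPy sketch below rests on assumptions borrowed from SVD-based low-rank training:

- each layer is stored as `U · diag(S) · V`;
- `hoer` is the Hoyer sparsity measure `||S||_1 / ||S||_2` of the singular values;
- `orthogonal` penalises the deviation of `U` and `V` from orthonormality.

It illustrates the technique only and is not FedCore's implementation.

```python
import numpy as np


def hoyer(s, eps=1e-12):
    """Hoyer sparsity measure ||s||_1 / ||s||_2 of a singular-value vector.

    Equals 1 when only one value is non-zero and sqrt(r) when all r values
    are equal, so minimising it pushes the spectrum towards low rank.
    """
    s = np.abs(np.asarray(s, dtype=float))
    return s.sum() / max(np.linalg.norm(s), eps)


def orthogonality(u, v):
    """||U^T U - I||_F^2 + ||V V^T - I||_F^2 for U (m x r) and V (r x n)."""
    eye = np.eye(u.shape[1])
    return (np.linalg.norm(u.T @ u - eye) ** 2
            + np.linalg.norm(v @ v.T - eye) ** 2)


def regularised_loss(task_loss, factors, weights=None):
    """Task loss plus weighted regularisers summed over (U, S, V) triples."""
    weights = weights or {"hoer": 0.5, "orthogonal": 0.2}
    hoyer_term = sum(hoyer(s) for _, s, _ in factors)
    orth_term = sum(orthogonality(u, v) for u, _, v in factors)
    return (task_loss
            + weights["hoer"] * hoyer_term
            + weights["orthogonal"] * orth_term)


# An exact SVD has orthonormal factors, so only the Hoyer term contributes here.
rng = np.random.default_rng(0)
u, s, vt = np.linalg.svd(rng.standard_normal((4, 6)), full_matrices=False)
print(regularised_loss(1.0, [(u, s, vt)]))
```

Minimising `||s||_1 / ||s||_2` favours spectra with a few large singular values, so the Hoyer term drives layers towards low rank. The orthogonality term keeps `U` and `V` close to valid singular-vector bases, so truncating `S` later remains meaningful.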

## Start learning

In [26]:
# use built-in dataset loader or define your own
input_data = load_data(DATASET)
# initialize compressor instance
fedcore_compressor = FedCore(**api_config)

# if the compression pipeline is already defined, run fitting without the evolutionary search
x = fedcore_compressor.fit_no_evo(input_data)

# otherwise, run it with the evolutionary search:
# fedcore_compressor.fit(input_data);

Files already downloaded and verified
Files already downloaded and verified
Creating Dask Server
Triggered OptimizerGen at 1 epoch.
Triggered SchedulerRenewal at 1 epoch.


Batch #: 100%|██████████| 782/782 [00:45<00:00, 17.30it/s]

Triggered Evaluator at 1 epoch.



Batch #: 100%|██████████| 100/100 [00:01<00:00, 63.36it/s]

Triggered FitReport at 1 epoch.
Train # epoch: 1, value: 7.273792279041027
Valid # epoch: 1, value: 6.7633562088012695
Including:
	Criterion `train_hoer_loss`: 5.835456371307373
	Criterion `val_hoer_loss`: 5.83475923538208
	Criterion `train_orthogonal_loss`: 0.0013179627712816
	Criterion `val_orthogonal_loss`: 0.0013199361274018884

After rank pruning left only 100.0 % of conv1 layer params
After rank pruning left only 100.0 % of layer1.0.conv1 layer params
After rank pruning left only 100.0 % of layer1.0.conv2 layer params
After rank pruning left only 100.0 % of layer1.1.conv1 layer params
After rank pruning left only 100.0 % of layer1.1.conv2 layer params
After rank pruning left only 50.0 % of layer2.0.conv1 layer params
After rank pruning left only 50.0 % of layer2.0.conv2 layer params
After rank pruning left only 100.0 % of layer2.0.downsample.0 layer params
After rank pruning left only 50.0 % of layer2.1.conv1 layer params
After rank pruning left only 50.0 % of layer2.1.conv2 layer params
After rank pruning left only 75.0 % of layer3.0.conv1 layer params
After rank pruning left only 75.0 % of layer3.0.conv2 layer params
After rank pruning left only 50.0 % of layer3.0.downsample.0 layer params
After rank pruning left only 75.0 % of layer3.1.conv1 layer params
After rank pruning left only 75.0 % of layer3.1.con

Batch #: 100%|██████████| 782/782 [00:21<00:00, 35.71it/s]

Triggered Evaluator at 1 epoch.



Batch #: 100%|██████████| 100/100 [00:01<00:00, 69.04it/s]

Triggered FitReport at 1 epoch.
Train # epoch: 1, value: 0.733492976106951
Valid # epoch: 1, value: 0.941818356513977
Including:
	Criterion `train_hoer_loss`: 0.0
	Criterion `val_hoer_loss`: 0.0
	Criterion `train_orthogonal_loss`: 0.014547276310622692
	Criterion `val_orthogonal_loss`: 0.014539467170834541
Triggered Evaluator at 1 epoch.



Batch #: 100%|██████████| 100/100 [00:01<00:00, 63.81it/s]


Triggered FitReport at 1 epoch.
Train # epoch: 1, value: 0.733492976106951
Valid # epoch: 1, value: 0.6429351568222046
Including:
	Criterion `train_hoer_loss`: 0.0
	Criterion `val_hoer_loss`: 0.0
	Criterion `train_orthogonal_loss`: 0.014547276310622692
	Criterion `val_orthogonal_loss`: 0.014539467170834541
Params: 13.44 M => 11.69 M
MACs: 0.00 G => 0.00 G
Triggered OptimizerGen at 1 epoch.
Triggered SchedulerRenewal at 1 epoch.


Batch #: 100%|██████████| 782/782 [00:41<00:00, 18.81it/s]


Triggered FitReport at 1 epoch.
Train # epoch: 1, value: 6.316850726256895
Including:
	Criterion `train_hoer_loss`: 5.103714466094971
	Criterion `train_orthogonal_loss`: 0.004544746596366167
Triggered SchedulerRenewal at 2 epoch.


Batch #: 100%|██████████| 782/782 [00:43<00:00, 18.08it/s]


Triggered FitReport at 2 epoch.
Train # epoch: 2, value: 5.36578170844661
Including:
	Criterion `train_hoer_loss`: 4.387579441070557
	Criterion `train_orthogonal_loss`: 0.005620826967060566
Triggered SchedulerRenewal at 3 epoch.


Batch #: 100%|██████████| 782/782 [00:42<00:00, 18.26it/s]


Triggered FitReport at 3 epoch.
Train # epoch: 3, value: 4.764361601656355
Including:
	Criterion `train_hoer_loss`: 3.9406425952911377
	Criterion `train_orthogonal_loss`: 0.008194840513169765
Triggered SchedulerRenewal at 4 epoch.


Batch #: 100%|██████████| 782/782 [00:45<00:00, 17.01it/s]


Triggered FitReport at 4 epoch.
Train # epoch: 4, value: 4.281845031187053
Including:
	Criterion `train_hoer_loss`: 3.6321654319763184
	Criterion `train_orthogonal_loss`: 0.008937899954617023
Triggered SchedulerRenewal at 5 epoch.


Batch #: 100%|██████████| 782/782 [00:45<00:00, 17.30it/s]

Triggered Evaluator at 5 epoch.



Batch #: 100%|██████████| 100/100 [00:01<00:00, 62.77it/s]


Triggered FitReport at 5 epoch.
Train # epoch: 5, value: 3.935346496989355
Valid # epoch: 5, value: 3.996588706970215
Including:
	Criterion `train_hoer_loss`: 3.388068914413452
	Criterion `val_hoer_loss`: 3.387845754623413
	Criterion `train_orthogonal_loss`: 0.009525366127490997
	Criterion `val_orthogonal_loss`: 0.009528516791760921
Triggered SchedulerRenewal at 6 epoch.


2025-03-24 13:41:46,382 - Failed to reconnect to scheduler after 30.00 seconds, closing client


Batch #: 100%|██████████| 782/782 [00:33<00:00, 23.53it/s]


Triggered FitReport at 6 epoch.
Train # epoch: 6, value: 3.666601589573619
Valid # epoch: 5, value: 3.996588706970215
Including:
	Criterion `train_hoer_loss`: 3.1933863162994385
	Criterion `val_hoer_loss`: 3.387845754623413
	Criterion `train_orthogonal_loss`: 0.010820823721587658
	Criterion `val_orthogonal_loss`: 0.009528516791760921
Triggered SchedulerRenewal at 7 epoch.


Batch #: 100%|██████████| 782/782 [00:29<00:00, 26.47it/s]


Triggered FitReport at 7 epoch.
Train # epoch: 7, value: 3.4544218171892873
Valid # epoch: 5, value: 3.996588706970215
Including:
	Criterion `train_hoer_loss`: 3.0271987915039062
	Criterion `val_hoer_loss`: 3.387845754623413
	Criterion `train_orthogonal_loss`: 0.011851239949464798
	Criterion `val_orthogonal_loss`: 0.009528516791760921
Triggered SchedulerRenewal at 8 epoch.


Batch #: 100%|██████████| 782/782 [00:28<00:00, 27.88it/s]


Triggered FitReport at 8 epoch.
Train # epoch: 8, value: 3.3343742787075774
Valid # epoch: 5, value: 3.996588706970215
Including:
	Criterion `train_hoer_loss`: 2.8864057064056396
	Criterion `val_hoer_loss`: 3.387845754623413
	Criterion `train_orthogonal_loss`: 0.014533269219100475
	Criterion `val_orthogonal_loss`: 0.009528516791760921
Triggered SchedulerRenewal at 9 epoch.


Batch #: 100%|██████████| 782/782 [00:26<00:00, 29.31it/s]


Triggered FitReport at 9 epoch.
Train # epoch: 9, value: 3.1375955187756084
Valid # epoch: 5, value: 3.996588706970215
Including:
	Criterion `train_hoer_loss`: 2.772501230239868
	Criterion `val_hoer_loss`: 3.387845754623413
	Criterion `train_orthogonal_loss`: 0.01460874080657959
	Criterion `val_orthogonal_loss`: 0.009528516791760921
Triggered SchedulerRenewal at 10 epoch.


Batch #: 100%|██████████| 782/782 [00:31<00:00, 24.77it/s]

Triggered Evaluator at 10 epoch.



Batch #: 100%|██████████| 100/100 [00:01<00:00, 64.73it/s]

Triggered FitReport at 10 epoch.
Train # epoch: 10, value: 3.246570540815973
Valid # epoch: 10, value: 3.3572332859039307
Including:
	Criterion `train_hoer_loss`: 2.682718515396118
	Criterion `val_hoer_loss`: 2.682617664337158
	Criterion `train_orthogonal_loss`: 0.017831264063715935
	Criterion `val_orthogonal_loss`: 0.017845189198851585

After rank pruning left only 100.0 % of conv1 layer params
After rank pruning left only 100.0 % of layer1.0.conv1 layer params
After rank pruning left only 100.0 % of layer1.0.conv2 layer params
After rank pruning left only 100.0 % of layer1.1.conv1 layer params
After rank pruning left only 100.0 % of layer1.1.conv2 layer params
After rank pruning left only 50.0 % of layer2.0.conv1 layer params
After rank pruning left only 50.0 % of layer2.0.conv2 layer params
After rank pruning left only 100.0 % of layer2.0.downsample.0 layer params
After rank pruning left only 50.0 % of layer2.1.conv1 layer params
After rank pruning left only 50.0 % of layer2.1.conv2 layer params
After rank pruning left only 75.0 % of layer3.0.conv1 layer params
After rank pruning left only 75.0 % of layer3.0.conv2 layer params
After rank pruning left only 50.0 % of layer3.0.downsample.0 layer params
After rank pruning left only 75.0 % of layer3.1.conv1 layer params
After rank pruning left only 75.0 % of layer3.1.con

Batch #: 100%|██████████| 782/782 [00:17<00:00, 45.13it/s]


Triggered FitReport at 1 epoch.
Train # epoch: 1, value: 1.007878284415473
Valid # epoch: 10, value: 3.3572332859039307
Including:
	Criterion `train_hoer_loss`: 0.0
	Criterion `val_hoer_loss`: 2.682617664337158
	Criterion `train_orthogonal_loss`: 0.1446092128753662
	Criterion `val_orthogonal_loss`: 0.017845189198851585
Params: 13.44 M => 11.69 M
MACs: 0.00 G => 0.00 G
Triggered OptimizerGen at 1 epoch.
Triggered SchedulerRenewal at 1 epoch.


Batch #: 100%|██████████| 782/782 [00:30<00:00, 25.99it/s]


Triggered FitReport at 1 epoch.
Train # epoch: 1, value: 2.9766529591187187
Including:
	Criterion `train_hoer_loss`: 2.599703788757324
	Criterion `train_orthogonal_loss`: 0.016561400145292282
Triggered SchedulerRenewal at 2 epoch.


Batch #: 100%|██████████| 782/782 [00:33<00:00, 23.42it/s]


Triggered FitReport at 2 epoch.
Train # epoch: 2, value: 2.8500059681475314
Including:
	Criterion `train_hoer_loss`: 2.5225141048431396
	Criterion `train_orthogonal_loss`: 0.018443968147039413
Triggered SchedulerRenewal at 3 epoch.


Batch #: 100%|██████████| 782/782 [00:34<00:00, 22.54it/s]


Triggered FitReport at 3 epoch.
Train # epoch: 3, value: 2.7417937870830524
Including:
	Criterion `train_hoer_loss`: 2.4564473628997803
	Criterion `train_orthogonal_loss`: 0.019507018849253654
Triggered SchedulerRenewal at 4 epoch.


Batch #: 100%|██████████| 782/782 [00:32<00:00, 24.35it/s]


Triggered FitReport at 4 epoch.
Train # epoch: 4, value: 2.618494696629322
Including:
	Criterion `train_hoer_loss`: 2.391711950302124
	Criterion `train_orthogonal_loss`: 0.020174413919448853
Triggered SchedulerRenewal at 5 epoch.


Batch #: 100%|██████████| 782/782 [00:30<00:00, 25.36it/s]

Triggered Evaluator at 5 epoch.



Batch #: 100%|██████████| 100/100 [00:01<00:00, 60.02it/s]


Triggered FitReport at 5 epoch.
Train # epoch: 5, value: 2.593998151362095
Valid # epoch: 5, value: 3.1650121212005615
Including:
	Criterion `train_hoer_loss`: 2.333120107650757
	Criterion `val_hoer_loss`: 2.333142042160034
	Criterion `train_orthogonal_loss`: 0.022275250405073166
	Criterion `val_orthogonal_loss`: 0.022288622334599495
Triggered SchedulerRenewal at 6 epoch.


Batch #: 100%|██████████| 782/782 [00:29<00:00, 26.20it/s]


Triggered FitReport at 6 epoch.
Train # epoch: 6, value: 2.4951586378809743
Valid # epoch: 5, value: 3.1650121212005615
Including:
	Criterion `train_hoer_loss`: 2.277880907058716
	Criterion `val_hoer_loss`: 2.333142042160034
	Criterion `train_orthogonal_loss`: 0.022876515984535217
	Criterion `val_orthogonal_loss`: 0.022288622334599495
Triggered SchedulerRenewal at 7 epoch.


Batch #: 100%|██████████| 782/782 [00:30<00:00, 25.94it/s]


Triggered FitReport at 7 epoch.
Train # epoch: 7, value: 2.4177568669209393
Valid # epoch: 5, value: 3.1650121212005615
Including:
	Criterion `train_hoer_loss`: 2.2217540740966797
	Criterion `val_hoer_loss`: 2.333142042160034
	Criterion `train_orthogonal_loss`: 0.02384697087109089
	Criterion `val_orthogonal_loss`: 0.022288622334599495
Triggered SchedulerRenewal at 8 epoch.


Batch #: 100%|██████████| 782/782 [00:27<00:00, 28.34it/s]


Triggered FitReport at 8 epoch.
Train # epoch: 8, value: 2.3642440086130594
Valid # epoch: 5, value: 3.1650121212005615
Including:
	Criterion `train_hoer_loss`: 2.1696643829345703
	Criterion `val_hoer_loss`: 2.333142042160034
	Criterion `train_orthogonal_loss`: 0.02502591721713543
	Criterion `val_orthogonal_loss`: 0.022288622334599495
Triggered SchedulerRenewal at 9 epoch.


Batch #: 100%|██████████| 782/782 [00:30<00:00, 26.03it/s]


Triggered FitReport at 9 epoch.
Train # epoch: 9, value: 2.3160296834033467
Valid # epoch: 5, value: 3.1650121212005615
Including:
	Criterion `train_hoer_loss`: 2.120313882827759
	Criterion `val_hoer_loss`: 2.333142042160034
	Criterion `train_orthogonal_loss`: 0.026230989024043083
	Criterion `val_orthogonal_loss`: 0.022288622334599495
Triggered SchedulerRenewal at 10 epoch.


Batch #: 100%|██████████| 782/782 [00:29<00:00, 26.51it/s]

Triggered Evaluator at 10 epoch.



Batch #: 100%|██████████| 100/100 [00:01<00:00, 57.53it/s]

Triggered FitReport at 10 epoch.
Train # epoch: 10, value: 2.729780821849013
Valid # epoch: 10, value: 2.838242530822754
Including:
	Criterion `train_hoer_loss`: 2.0968384742736816
	Criterion `val_hoer_loss`: 2.096817970275879
	Criterion `train_orthogonal_loss`: 0.03219258412718773
	Criterion `val_orthogonal_loss`: 0.03218986466526985

After rank pruning left only 100.0 % of conv1 layer params
After rank pruning left only 100.0 % of layer1.0.conv1 layer params
After rank pruning left only 100.0 % of layer1.0.conv2 layer params
After rank pruning left only 100.0 % of layer1.1.conv1 layer params
After rank pruning left only 100.0 % of layer1.1.conv2 layer params
After rank pruning left only 50.0 % of layer2.0.conv1 layer params
After rank pruning left only 50.0 % of layer2.0.conv2 layer params
After rank pruning left only 100.0 % of layer2.0.downsample.0 layer params
After rank pruning left only 50.0 % of layer2.1.conv1 layer params
After rank pruning left only 50.0 % of layer2.1.conv2 layer params
After rank pruning left only 75.0 % of layer3.0.conv1 layer params
After rank pruning left only 75.0 % of layer3.0.conv2 layer params
After rank pruning left only 50.0 % of layer3.0.downsample.0 layer params
After rank pruning left only 75.0 % of layer3.1.conv1 layer params
After rank pruning left only 75.0 % of layer3.1.con

Batch #: 100%|██████████| 782/782 [00:21<00:00, 36.98it/s]


Triggered FitReport at 1 epoch.
Train # epoch: 1, value: 1.4241080399974229
Valid # epoch: 10, value: 2.838242530822754
Including:
	Criterion `train_hoer_loss`: 0.0
	Criterion `val_hoer_loss`: 2.096817970275879
	Criterion `train_orthogonal_loss`: 0.28798818588256836
	Criterion `val_orthogonal_loss`: 0.03218986466526985
Params: 13.44 M => 11.69 M
MACs: 0.00 G => 0.00 G
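Each `After rank pruning left only ... %` line reports how much of a layer survived rank pruning. The notebook does not show the rule FedCore uses to pick the rank, or exactly how `non_adaptive_threshold` enters it. The sketch below therefore demonstrates one common alternative, as an assumption: keep the smallest rank whose singular values retain a chosen fraction of the spectral energy `sum(s**2)`. `rank_by_energy` and `truncate` are hypothetical helpers.

```python
import numpy as np


def rank_by_energy(s, energy=0.9):
    """Smallest r with sum(s[:r]**2) >= energy * sum(s**2).

    `s` must be sorted in descending order, as np.linalg.svd returns it.
    """
    cumulative = np.cumsum(np.asarray(s, dtype=float) ** 2)
    r = int(np.searchsorted(cumulative, energy * cumulative[-1])) + 1
    return min(r, len(cumulative))  # never exceed the full rank


def truncate(w, energy=0.9):
    """Rank-r approximation of a 2-D weight matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    r = rank_by_energy(s, energy)
    # scale the first r columns of U by S, then project back with V
    return (u[:, :r] * s[:r]) @ vt[:r], r


w = np.diag([3.0, 1.0, 0.0])
approx, r = truncate(w, energy=0.85)
print(r)  # 1
```

An energy threshold adapts the rank to each layer's spectrum, whereas a fixed fraction of the rank treats every layer alike.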


In [27]:
opt_model = fedcore_compressor.optimised_model.model
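The log summarises the compressed model as `Params: 13.44 M => 11.69 M`. As a rough aid for reading such numbers, here is the parameter arithmetic for one convolution stored in factorised form. The layout is an assumption and is not taken from FedCore: the kernel of shape `(out, in, k, k)` is reshaped to an `out × (in·k·k)` matrix and kept as `U` (`out × r`), `S` (`r`) and `V` (`r × in·k·k`). Under this layout, a layer kept at full rank stores more values than its dense counterpart.

```python
def dense_conv_params(c_out, c_in, k):
    """Weights of a dense k x k convolution (bias ignored)."""
    return c_out * c_in * k * k


def factorised_conv_params(c_out, c_in, k, rank):
    """Values stored by U (c_out x r), S (r) and V (r x c_in*k*k)."""
    return rank * (c_out + c_in * k * k) + rank


# a 3x3 convolution with 64 input and 64 output channels, at full and half rank
for r in (64, 32):
    print(r, dense_conv_params(64, 64, 3), factorised_conv_params(64, 64, 3, r))
# 64 36864 41024
# 32 36864 20512
```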