# Evaluation Results (Tables 1, 2, and 3 in the Paper)

- **Pretrained Single_Velocity_HPT**  
   The model reported by *Kong et al. (2021)*, evaluated with the MIDI notes estimated by Automatic Music Transcription (AMT).  
   → This reflects the performance of **original AMT** systems.

- **Retrained Single_Velocity_HPT**  
   Uses the **ground-truth MIDI notes** for evaluation.  
   → This simulates a use case where **the MIDI transcription timing has already been corrected** (either manually or via audio–MIDI alignment tools).

- **Dual_Velocity_HPT** and **Triple_Velocity_HPT**  
   Follow the same setup as the retrained Single_Velocity_HPT; the goal is to **refine MIDI velocity prediction** given known note timings.
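
To make "refining velocity given known note timings" concrete, here is a minimal sketch of one simple way to read note velocities from a frame-level velocity output at ground-truth onsets. It is not the repository's code: the frame rate (100 frames per second), the 88-key layout starting at MIDI pitch 21, the `[0, 1]` output range scaled by 128, and the function name `velocities_at_known_onsets` are all assumptions chosen for illustration.

```python
# Hypothetical sketch (not from this repository): read one velocity per note
# from a frame-level velocity roll at known (e.g. ground-truth) onset times.
FRAMES_PER_SECOND = 100  # assumed frame rate of the model output
LOWEST_MIDI_PITCH = 21   # A0, the first key of an 88-key piano
VELOCITY_SCALE = 128     # assumed scale between roll values and MIDI velocity


def velocities_at_known_onsets(velocity_roll, notes):
    """Return one MIDI velocity (1..127) per note.

    velocity_roll: list of frames, each a list of 88 floats in [0, 1].
    notes: list of (onset_seconds, midi_pitch) tuples with known timings.
    """
    velocities = []
    for onset_sec, pitch in notes:
        frame = int(round(onset_sec * FRAMES_PER_SECOND))
        frame = min(max(frame, 0), len(velocity_roll) - 1)  # clamp to valid frames
        value = velocity_roll[frame][pitch - LOWEST_MIDI_PITCH]
        velocity = int(round(value * VELOCITY_SCALE))
        velocities.append(min(max(velocity, 1), 127))  # keep within MIDI range
    return velocities


if __name__ == "__main__":
    # Toy roll: 3 seconds of frames with two non-zero velocity values.
    roll = [[0.0] * 88 for _ in range(300)]
    roll[50][60 - LOWEST_MIDI_PITCH] = 0.5    # C4 at 0.50 s
    roll[120][64 - LOWEST_MIDI_PITCH] = 0.75  # E4 at 1.20 s
    print(velocities_at_known_onsets(roll, [(0.5, 60), (1.2, 64)]))  # [64, 96]
```

With AMT-estimated notes, the same lookup would use predicted onsets instead, so timing errors also affect which frame is read; fixing the timings isolates the velocity estimate itself.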
<br>
<br>

Please check the results at this link (available_later_for_anonymous). Our code automatically uploads all evaluation results to the wandb logger, which generates this online report. The following cells show a single inference and scoring run:

```
!python pytorch/inference.py exp.run_infer='multi' model.type='velo'\
     model.name='Single_Velocity_HPT'\
     dataset.test_set='smd' # maps, maestro

# !python pytorch/inference.py exp.run_infer='multi' model.type='velo'\
#      model.name='Dual_Velocity_HPT' model.input2='onset'\
#      dataset.test_set='smd' # maps, maestro

# !python pytorch/inference.py exp.run_infer='multi' model.type='velo'\
#      model.name='Triple_Velocity_HPT' model.input2='onset' model.input3='exframe'\
#      dataset.test_set='smd' # maps, maestro
```

```text
Inference Mode : MULTI
Model Name     : Single_Velocity_HPT
Test Set       : smd
Using Device   : cuda
Found 2 checkpoints in ./workspaces/checkpoints/Single_Velocity_HPT
------------------------------------------------------------
[1/2] 195000_iteration.pth
Proc 195000 Ckpt: 100%|███████████████████████| 49/49 [06:25<00:00,  7.87s/file]
[Done] Time: 385.84 sec
------------------------------------------------------------
[2/2] 200000_iteration.pth
Proc 200000 Ckpt: 100%|███████████████████████| 49/49 [06:28<00:00,  7.93s/file]
[Done] Time: 388.63 sec

All checkpoint inference completed in 774.47 sec
```
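
In `MULTI` mode, the log above shows every checkpoint in the model's checkpoint folder being processed in turn. A minimal sketch of such checkpoint discovery, assuming only the `<iteration>_iteration.pth` naming visible in the log; `list_checkpoints` is a hypothetical helper, not a function from `pytorch/inference.py`:

```python
# Hypothetical sketch: enumerate checkpoints named "<iteration>_iteration.pth"
# (the naming seen in the log) and return them in training order.
import re
import tempfile
from pathlib import Path

CKPT_PATTERN = re.compile(r"^(\d+)_iteration\.pth$")


def list_checkpoints(ckpt_dir):
    """Return checkpoint paths in ckpt_dir sorted by iteration number."""
    found = []
    for path in Path(ckpt_dir).iterdir():
        match = CKPT_PATTERN.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Sort on the parsed integer, not the file name string.
    found.sort(key=lambda item: item[0])
    return [path for _, path in found]


if __name__ == "__main__":
    # Demo on a throwaway directory instead of ./workspaces/checkpoints/...
    with tempfile.TemporaryDirectory() as tmp:
        for name in ["200000_iteration.pth", "195000_iteration.pth", "notes.txt"]:
            (Path(tmp) / name).touch()
        print([p.name for p in list_checkpoints(tmp)])
        # ['195000_iteration.pth', '200000_iteration.pth']
```

Sorting on the parsed integer matters once iteration counts differ in digit count: a plain string sort would place `95000_iteration.pth` after `200000_iteration.pth`.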


```
!python pytorch/calculate_scores.py exp.run_infer='multi' exp.num_workers=12 model.type='velo'\
     model.name='Single_Velocity_HPT'\
     dataset.test_set='smd' # maps, maestro

# !python pytorch/calculate_scores.py exp.run_infer='multi' exp.num_workers=12 model.type='velo'\
#      model.name='Dual_Velocity_HPT' model.input2='onset'\
#      dataset.test_set='smd' # maps, maestro

# !python pytorch/calculate_scores.py exp.run_infer='multi' exp.num_workers=12 model.type='velo'\
#      model.name='Triple_Velocity_HPT' model.input2='onset' model.input3='exframe'\
#      dataset.test_set='smd' # maps, maestro
```

```text
Evaluation Mode : MULTI
Model Name      : Single_Velocity_HPT
Test Set        : smd
Using device    : cpu
Found 2 checkpoints in ./workspaces/checkpoints/Single_Velocity_HPT
------------------------------------------------------------
[1/2] Evaluating: 195000_iteration.pth
[Done] Time: 36.30 sec
velocity_mae: 14.8968
velocity_std: 8.6206
velocity_recall: 0.7515
------------------------------------------------------------
[2/2] Evaluating: 200000_iteration.pth
[Done] Time: 41.78 sec
velocity_mae: 15.3933
velocity_std: 8.7335
velocity_recall: 0.7493

[Saved] Summary in Wandb and CSV: ./workspaces/logs/Single_Velocity_HPT_smd.csv
All checkpoint scores completed.
```
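
With several checkpoints per model, a practical next step is to compare their printed scores. The sketch below parses the log format shown above (the metric values are copied from that output) and selects the checkpoint with the lowest `velocity_mae`. The helper names `parse_scores` and `best_checkpoint` are hypothetical, and using MAE as the selection criterion is an illustrative choice, not necessarily how the reported tables were produced.

```python
# Sketch: pick a checkpoint from the scores printed by calculate_scores.py.
# Parsing relies only on the log format shown above; not part of the repo.
import re

LOG = """\
[1/2] Evaluating: 195000_iteration.pth
[Done] Time: 36.30 sec
velocity_mae: 14.8968
velocity_std: 8.6206
velocity_recall: 0.7515
------------------------------------------------------------
[2/2] Evaluating: 200000_iteration.pth
[Done] Time: 41.78 sec
velocity_mae: 15.3933
velocity_std: 8.7335
velocity_recall: 0.7493
"""

CKPT_RE = re.compile(r"Evaluating:\s*(\S+)")
METRIC_RE = re.compile(r"^(velocity_\w+):\s*([-\d.]+)$")


def parse_scores(log_text):
    """Map checkpoint name -> {metric name: value}."""
    scores, current = {}, None
    for line in log_text.splitlines():
        line = line.strip()
        ckpt = CKPT_RE.search(line)
        if ckpt:
            current = ckpt.group(1)
            scores[current] = {}
            continue
        metric = METRIC_RE.match(line)
        if metric and current is not None:
            scores[current][metric.group(1)] = float(metric.group(2))
    return scores


def best_checkpoint(scores, metric="velocity_mae"):
    """Return the checkpoint with the lowest value of an error metric."""
    return min(scores, key=lambda name: scores[name][metric])


if __name__ == "__main__":
    print(best_checkpoint(parse_scores(LOG)))  # 195000_iteration.pth
```

For `velocity_recall`, where higher is better, `max` would replace `min`.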


We ran inference for all model configurations on cloud GPUs with the following script, so those results are not recorded in this Jupyter notebook. You can find them in the [wandb report](https://api.wandb.ai/links/zhanh-uwa/wc7q3j5b).

```bash
#!/bin/bash
# SLURM array job: one task per entry of MODEL_CONFIGS (task IDs 0-6)
#SBATCH --array=0-6

MODEL_CONFIGS=(
    "model.name='Single_Velocity_HPT'"
    "model.name='Dual_Velocity_HPT' model.input2='onset'"
    "model.name='Dual_Velocity_HPT' model.input2='frame'"
    "model.name='Dual_Velocity_HPT' model.input2='exframe'"
    "model.name='Triple_Velocity_HPT' model.input2='onset' model.input3='frame'"
    "model.name='Triple_Velocity_HPT' model.input2='onset' model.input3='exframe'"
    "model.name='Triple_Velocity_HPT' model.input2='frame' model.input3='exframe'"
)

DATASET='maps' # 'smd', 'maps', or 'maestro'

# Get the specific config based on SLURM_ARRAY_TASK_ID
CONFIG=${MODEL_CONFIGS[$SLURM_ARRAY_TASK_ID]}

echo "Selected model config index: $SLURM_ARRAY_TASK_ID"
echo "Running inference with config: $CONFIG"
# $CONFIG is left unquoted on purpose so each key=value pair becomes a separate argument
python pytorch/inference.py exp.run_infer='multi' model.type='velo' $CONFIG dataset.test_set="$DATASET"

echo "Running scoring with config: $CONFIG"
python pytorch/calculate_scores.py exp.run_infer='multi' exp.num_workers=12 model.type='velo' $CONFIG dataset.test_set="$DATASET"
```
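
The script relies on SLURM job arrays: each array task receives its index in `SLURM_ARRAY_TASK_ID`, which selects one of the seven entries in `MODEL_CONFIGS`, so the job needs task IDs 0 to 6. The Python sketch below, a hypothetical helper rather than part of the repository, reproduces that mapping and the two command lines each task runs, with the dataset defaulting to `maps` as in the script. Falling back to task 0 when `SLURM_ARRAY_TASK_ID` is unset is an assumption for local testing only.

```python
# Hypothetical sketch (not part of the repository): mirror how the SLURM
# script picks one MODEL_CONFIGS entry per array task and builds its commands.
import os

MODEL_CONFIGS = [
    "model.name='Single_Velocity_HPT'",
    "model.name='Dual_Velocity_HPT' model.input2='onset'",
    "model.name='Dual_Velocity_HPT' model.input2='frame'",
    "model.name='Dual_Velocity_HPT' model.input2='exframe'",
    "model.name='Triple_Velocity_HPT' model.input2='onset' model.input3='frame'",
    "model.name='Triple_Velocity_HPT' model.input2='onset' model.input3='exframe'",
    "model.name='Triple_Velocity_HPT' model.input2='frame' model.input3='exframe'",
]


def array_range():
    """Task-ID range for `sbatch --array`, one task per config."""
    return f"0-{len(MODEL_CONFIGS) - 1}"


def commands_for_task(task_id, dataset="maps"):
    """Return (inference argv, scoring argv) for one array task."""
    # str.split() mimics bash word splitting of the unquoted $CONFIG: the
    # single quotes inside each entry are kept in the arguments, as in bash.
    config = MODEL_CONFIGS[task_id].split()
    infer = ["python", "pytorch/inference.py", "exp.run_infer=multi",
             "model.type=velo", *config, f"dataset.test_set={dataset}"]
    score = ["python", "pytorch/calculate_scores.py", "exp.run_infer=multi",
             "exp.num_workers=12", "model.type=velo", *config,
             f"dataset.test_set={dataset}"]
    return infer, score


if __name__ == "__main__":
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))  # 0 when run locally
    print("sbatch --array=" + array_range())
    for argv in commands_for_task(task_id):
        print(" ".join(argv))
```

Keeping one configuration per array task lets all seven model variants run in parallel on separate GPUs while sharing a single submission script.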