# Genomic Foundation Model Auto-Benchmarking
This script automatically benchmarks a genomic foundation model on a diverse set of downstream tasks.
The benchmark pipeline is automated with the OmniGenome package.
Once your foundation model is trained, you can use this script to evaluate its performance.
The script automatically loads the datasets, preprocesses the data, and evaluates the model on each task.
It then reports the performance of the model on each task.

## [Optional] Prepare your own benchmark datasets
We provide a set of benchmark datasets in the tutorials; you can use them to evaluate the performance of your model.
To evaluate the model on your own datasets, prepare them as follows:
1. Prepare the datasets in the following format:
    - The datasets should be in the `json` format.
    - The datasets should contain two columns: `sequence` and `label`.
    - The `sequence` column should contain the DNA sequences.
    - The `label` column should contain the labels of the sequences.
2. Save the datasets in a folder like the existing benchmark datasets. This folder is referred to as the `root` in the script.
3. Place the model and tokenizer in an accessible folder.
4. If the tokenizer does not work well with your datasets, you can write a custom tokenizer and model wrapper in the `omnigenome_wrapper.py` file.
More detailed documentation on writing a custom tokenizer and model wrapper will be provided.
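
The dataset format described in step 1 can be sketched as follows. This is a minimal illustration, not the package's own loader: it assumes one JSON object per line (the exact on-disk layout is an assumption), and the helper names `write_split` and `read_split`, the folder `my_benchmark/my_task`, and the toy records are all hypothetical. The `train.json` file name inside a per-task folder mirrors the paths that appear in the benchmark logs later in this tutorial.

```python
import json
import tempfile
from pathlib import Path


def write_split(records, path):
    """Write records with `sequence` and `label` fields, one JSON object per line.

    The one-object-per-line layout is an assumption made for this sketch.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for rec in records:
            # Keep only the two columns the benchmark format asks for.
            row = {"sequence": rec["sequence"], "label": rec["label"]}
            fh.write(json.dumps(row) + "\n")
    return path


def read_split(path):
    """Read the file back, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hypothetical toy data, for illustration only.
toy_records = [
    {"sequence": "ACGTACGTAC", "label": 1},
    {"sequence": "TTGACCGTAA", "label": 0},
]

# <root>/<task>/train.json, written to a throwaway temporary folder.
task_dir = Path(tempfile.mkdtemp()) / "my_benchmark" / "my_task"
train_path = write_split(toy_records, task_dir / "train.json")
loaded = read_split(train_path)
print(loaded[0])
```

Reading the file back right after writing it is a cheap way to catch missing or misnamed columns before starting a long benchmark run.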

## Prepare the benchmark environment
Before running the benchmark, install the following packages in addition to PyTorch and its dependencies.
Installation instructions for PyTorch are available at https://pytorch.org/get-started/locally/.
```bash
pip install omnigenome findfile autocuda metric-visualizer transformers
```
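
After installation, it can help to confirm that the packages are visible to the current Python environment. The sketch below uses only the standard library's `importlib.metadata`. The helper name `installed_versions` is hypothetical, and the list assumes the distribution names match the names passed to `pip` above, plus `torch` for PyTorch.

```python
from importlib import metadata

# Distribution names as given to pip above, plus PyTorch ("torch").
REQUIRED = [
    "omnigenome",
    "findfile",
    "autocuda",
    "metric-visualizer",
    "transformers",
    "torch",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(REQUIRED)
for name, version in versions.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```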

## Import the required packages

In [1]:
from omnigenome import AutoBench
import autocuda




## 1. Define the root folder of the benchmark datasets
Define the root where the benchmark datasets are stored.
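
The logs later in this tutorial show each task living in its own sub-folder of the root, with a `config.py` next to the data files. Assuming that layout, the sketch below lists the tasks found under a root folder. The helper `list_tasks` is hypothetical and is not part of OmniGenome; the demo folder is created in a temporary directory purely for illustration.

```python
import tempfile
from pathlib import Path


def list_tasks(root):
    """Return the names of sub-folders of `root` that contain a config.py."""
    root = Path(root)
    return sorted(p.parent.name for p in root.glob("*/config.py"))


# Build a throwaway root with two task folders and one unrelated folder.
demo_root = Path(tempfile.mkdtemp()) / "RGB"
for task in ["RNA-mRNA", "RNA-SNMD"]:
    (demo_root / task).mkdir(parents=True)
    (demo_root / task / "config.py").write_text("# task configuration placeholder\n")
(demo_root / "notes").mkdir()  # no config.py here, so it is not listed

tasks = list_tasks(demo_root)
print(tasks)
```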

In [2]:
root = 'RGB'  # Abbreviation of the RNA genome benchmark

## 2. Define the model and tokenizer paths
Provide the path to the model and tokenizer.

In [3]:
model_name_or_path = 'anonymous8/OmniGenome-52M'

## 3. Initialize the AutoBench
Select the available CUDA device based on your hardware.
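
If you prefer not to rely on `autocuda`, a manual fallback is sketched below. It is an illustration under stated assumptions, not what `autocuda.auto_cuda()` does internally: the helper `pick_device` is hypothetical, and it only distinguishes "a CUDA device is usable" from "run on the CPU" via PyTorch's `torch.cuda.is_available()`.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU string makes sense here.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


fallback_device = pick_device()
print(fallback_device)
```

The returned string can be passed wherever a device name is expected.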

In [4]:
device = autocuda.auto_cuda()
auto_bench = AutoBench(
    benchmark=root,
    model_name_or_path=model_name_or_path,
    device=device,  # use the device selected by autocuda above
    overwrite=True,
)

[2025-04-19 17:19:40] [OmniGenome 0.2.4alpha4]  Benchmark: RGB does not exist. Search online for available benchmarks.
[2025-04-19 17:19:40] [OmniGenome 0.2.4alpha4]  Loaded benchmarks:  ['RNA-mRNA', 'RNA-SNMD', 'RNA-SNMR', 'RNA-SSP-Archive2', 'RNA-SSP-rnastralign', 'RNA-SSP-bpRNA', 'RNA-TE-Prediction.Arabidopsis', 'RNA-TE-Prediction.Rice', 'RNA-Region-Classification.Arabidopsis', 'RNA-Region-Classification.Rice']
[2025-04-19 17:19:40] [OmniGenome 0.2.4alpha4]  Benchmark Root: __OMNIGENOME_DATA__/benchmarks/RGB
Benchmark List: ['RNA-mRNA', 'RNA-SNMD', 'RNA-SNMR', 'RNA-SSP-Archive2', 'RNA-SSP-rnastralign', 'RNA-SSP-bpRNA', 'RNA-TE-Prediction.Arabidopsis', 'RNA-TE-Prediction.Rice', 'RNA-Region-Classification.Arabidopsis', 'RNA-Region-Classification.Rice']
Model Name or Path: OmniGenome-52M
Tokenizer: None
Metric Visualizer Path: ./autobench_evaluations/RGB-OmniGenome-52M-20250419_171940.mv
BenchConfig Details: <module 'bench_metadata' from 'D:\\OneDrive - University of Exeter\\AIProjects

## 4. Run the benchmark
The downstream tasks have predefined configurations for fair comparison.
However, you may sometimes need to adjust the configuration to fit your dataset or resources,
for instance by changing the `max_length` or the batch size.
To do so, override the corresponding parameters in the `AutoBenchConfig` class.
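
Conceptually, an override replaces selected values of a task's predefined configuration and leaves the rest untouched. The sketch below illustrates this with a plain dictionary. It is not the `AutoBenchConfig` implementation: the helper `apply_overrides` is hypothetical, and the predefined values are copied from the RNA-mRNA configuration printed in the logs of this tutorial.

```python
# Values taken from the RNA-mRNA task configuration shown in the logs.
PREDEFINED = {
    "epochs": 50,
    "patience": 5,
    "learning_rate": 2e-05,
    "batch_size": 4,
    "max_length": 110,
    "seeds": [45, 46, 47],
}


def apply_overrides(config, **overrides):
    """Return a copy of `config` with the given keys replaced.

    Unknown keys raise an error so that typos do not pass silently.
    """
    unknown = set(overrides) - set(config)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    merged = dict(config)
    merged.update(overrides)
    return merged


# A quick smoke-test setting: one epoch and a single seed.
quick = apply_overrides(PREDEFINED, epochs=1, seeds=[42])
print(quick["epochs"], quick["seeds"], quick["max_length"])
```

Returning a copy keeps the predefined settings intact, which matters when the same defaults are reused for a later, full-length run.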

In [5]:
batch_size = 4
epochs = 1  # increase for real cases
seeds = [42]
auto_bench.run(epochs=epochs, batch_size=batch_size, seeds=seeds)

[2025-04-19 17:19:40] [OmniGenome 0.2.4alpha4]  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
Running evaluation for task: RNA-mRNA Progress:  1 / 10 10.0%
[2025-04-19 17:19:40] [OmniGenome 0.2.4alpha4]  Loaded config for RNA-mRNA from __OMNIGENOME_DATA__/benchmarks/RGB\RNA-mRNA\config.py
[2025-04-19 17:19:40] [OmniGenome 0.2.4alpha4]  {'task_name': 'RNA-mRNA', 'task_type': 'token_regression', 'label2id': None, 'num_labels': 3, 'epochs': 50, 'patience': 5, 'learning_rate': 2e-05, 'weight_decay': 0, 'batch_size': 4, 'max_length': 110, 'seeds': [45, 46, 47], 'compute_metrics': [<function RegressionMetric.__getattribute__.<locals>.wrapper at 0x000001F7DA8A1080>], 'train_file': 'D:\\OneDrive - University of Exeter\\AIProjects\\OmniGenomeBench\\examples\\tutorials\\__OMNIGENOME_DATA__/benchmarks/RGB\\RNA-mRNA/train.json', 'test_file': 'D:\\OneDrive - University of Exeter\\AIProjects\\OmniGenomeBench\\examples\\tutorials\\__OMNIGENOME_DATA__/benchmarks/RGB

Some weights of OmniGenomeModel were not initialized from the model checkpoint at anonymous8/OmniGenome-52M and are newly initialized: ['OmniGenome.pooler.dense.bias', 'OmniGenome.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[2025-04-19 17:19:45] [OmniGenome 0.2.4alpha4]  Model Name: OmniGenomeModelForTokenRegression
Model Metadata: {'library_name': 'OmniGenome', 'omnigenome_version': '0.2.4alpha4', 'torch_version': '2.5.1+cu12.4+gita8d6afb511a69687bbb2b7e88a3cf67917e1697e', 'transformers_version': '4.49.0', 'model_cls': 'OmniGenomeModelForTokenRegression', 'tokenizer_cls': 'EsmTokenizer', 'model_name': 'OmniGenomeModelForTokenRegression'}
Base Model Name: anonymous8/OmniGenome-52M
Model Type: omnigenome
Model Architecture: None
Model Parameters: 52.453345 M
Model Config: OmniGenomeConfig {
  "OmniGenomefold_config": null,
  "_name_or_path": "anonymous8/OmniGenome-52M",
  "attention_probs_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "anonymous8/OmniGenome-52M--configuration_omnigenome.OmniGenomeConfig",
    "AutoModel": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeModel",
    "AutoModelForMaskedLM": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeForMaskedLM",
    "AutoModel

100%|██████████| 1728/1728 [00:01<00:00, 1016.65it/s]


[2025-04-19 17:19:47] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=110, label_padding_length=110
[2025-04-19 17:19:47] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 109.0, 'max_seq_len': 109, 'min_seq_len': 109, 'avg_label_len': 110.0, 'max_label_len': 110, 'min_label_len': 110}
[2025-04-19 17:19:47] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:19:47] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 6, 6, 4, 4, 4, 4, 9, 4, 4, 9, 9, 4, 5, 5, 6, 9, 6, 5, 5, 9, 5, 5, 4,
        5, 6, 4, 4, 4, 6, 9, 4, 6, 6, 6, 4, 5, 6, 5, 5, 4, 4, 9, 5, 9, 5, 5, 4,
        9, 6, 6, 5, 6, 6, 4, 4, 6, 5, 5, 9, 6, 4, 5, 6, 6, 9, 9, 4, 4, 6, 5, 4,
        9, 6, 4, 6, 9, 9, 5, 6, 5, 9, 5, 4, 9, 6, 5, 4, 4, 4, 4, 6, 4, 4, 4, 5,
        4, 4, 5, 4, 4, 5, 4, 4, 5, 4, 4, 5, 2, 1]), 'attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1

100%|██████████| 192/192 [00:00<00:00, 1045.15it/s]


[2025-04-19 17:19:48] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=110, label_padding_length=110
[2025-04-19 17:19:48] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 109.0, 'max_seq_len': 109, 'min_seq_len': 109, 'avg_label_len': 110.0, 'max_label_len': 110, 'min_label_len': 110}
[2025-04-19 17:19:48] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:19:48] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 6, 6, 4, 4, 4, 6, 9, 9, 6, 6, 4, 5, 9, 6, 9, 9, 9, 9, 6, 4, 9, 9, 6,
        6, 9, 4, 6, 4, 9, 9, 9, 6, 4, 6, 5, 4, 4, 4, 6, 5, 9, 9, 4, 6, 4, 9, 9,
        9, 6, 9, 5, 4, 6, 9, 9, 4, 6, 6, 4, 9, 6, 6, 9, 5, 9, 6, 4, 5, 5, 4, 6,
        6, 9, 9, 9, 9, 9, 5, 6, 4, 4, 6, 5, 9, 9, 6, 4, 4, 4, 4, 6, 4, 4, 4, 5,
        4, 4, 5, 4, 4, 5, 4, 4, 5, 4, 4, 5, 2, 1]), 'attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1

Testing: 100%|██████████| 48/48 [00:03<00:00, 12.77it/s]


[2025-04-19 17:19:52] [OmniGenome 0.2.4alpha4]  {'root_mean_squared_error': 0.99506414}
[2025-04-19 17:19:52] [OmniGenome 0.2.4alpha4]  {'root_mean_squared_error': 0.99506414}


Epoch 1/1 Loss: 0.4697: 100%|██████████| 432/432 [01:17<00:00,  5.56it/s]
Testing: 100%|██████████| 48/48 [00:03<00:00, 15.86it/s]


[2025-04-19 17:21:13] [OmniGenome 0.2.4alpha4]  {'root_mean_squared_error': 0.745684}
[2025-04-19 17:21:13] [OmniGenome 0.2.4alpha4]  {'root_mean_squared_error': 0.745684}


Testing: 100%|██████████| 48/48 [00:03<00:00, 15.96it/s]


[2025-04-19 17:21:17] [OmniGenome 0.2.4alpha4]  {'root_mean_squared_error': 0.745684}
[2025-04-19 17:21:17] [OmniGenome 0.2.4alpha4]  {'root_mean_squared_error': 0.745684}

---------------------------------------------------- Raw Metric Records ----------------------------------------------------
╒═════════════════════════╤═════════════════════════════╤══════════╤═══════════╤══════════╤═══════╤═══════╤════════╤════════╕
│ Metric                  │ Trial                       │ Values   │  Average  │  Median  │  Std  │  IQR  │  Min   │  Max   │
╞═════════════════════════╪═════════════════════════════╪══════════╪═══════════╪══════════╪═══════╪═══════╪════════╪════════╡
│ root_mean_squared_error │ RGB-RNA-mRNA-OmniGenome-52M │ [0.7457] │  0.7457   │  0.7457  │   0   │   0   │ 0.7457 │ 0.7457 │
╘═════════════════════════╧═════════════════════════════╧══════════╧═══════════╧══════════╧═══════╧═══════╧════════╧════════╛
-------------------------------------- https://github.com/yangheng95/met

Some weights of OmniGenomeModel were not initialized from the model checkpoint at anonymous8/OmniGenome-52M and are newly initialized: ['OmniGenome.pooler.dense.bias', 'OmniGenome.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[2025-04-19 17:21:19] [OmniGenome 0.2.4alpha4]  Model Name: OmniGenomeModelForTokenClassification
Model Metadata: {'library_name': 'OmniGenome', 'omnigenome_version': '0.2.4alpha4', 'torch_version': '2.5.1+cu12.4+gita8d6afb511a69687bbb2b7e88a3cf67917e1697e', 'transformers_version': '4.49.0', 'model_cls': 'OmniGenomeModelForTokenClassification', 'tokenizer_cls': 'EsmTokenizer', 'model_name': 'OmniGenomeModelForTokenClassification'}
Base Model Name: anonymous8/OmniGenome-52M
Model Type: omnigenome
Model Architecture: None
Model Parameters: 52.453345 M
Model Config: OmniGenomeConfig {
  "OmniGenomefold_config": null,
  "_name_or_path": "anonymous8/OmniGenome-52M",
  "attention_probs_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "anonymous8/OmniGenome-52M--configuration_omnigenome.OmniGenomeConfig",
    "AutoModel": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeModel",
    "AutoModelForMaskedLM": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeForMaskedLM",
  

100%|██████████| 8000/8000 [00:10<00:00, 786.58it/s]


[2025-04-19 17:21:29] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=208, label_padding_length=208
[2025-04-19 17:21:30] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 202.0, 'max_seq_len': 202, 'min_seq_len': 202, 'avg_label_len': 208.0, 'max_label_len': 208, 'min_label_len': 208}
[2025-04-19 17:21:30] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:21:30] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 6, 6, 6, 4, 4, 5, 4, 6, 7, 5, 6, 4, 7, 4, 4, 6, 4, 6, 6, 4, 4, 7, 6,
        4, 6, 4, 4, 4, 5, 7, 5, 4, 6, 4, 6, 7, 7, 5, 7, 6, 5, 7, 7, 4, 5, 4, 7,
        5, 5, 4, 6, 5, 4, 6, 6, 7, 6, 4, 5, 4, 5, 7, 5, 5, 5, 4, 5, 7, 7, 5, 4,
        7, 6, 5, 5, 6, 7, 5, 7, 5, 7, 7, 7, 6, 4, 7, 6, 5, 5, 4, 7, 7, 5, 4, 4,
        4, 6, 7, 5, 7, 4, 7, 6, 5, 4, 7, 5, 5, 6, 7, 6, 7, 6, 4, 7, 7, 6, 7, 5,
        4, 6, 5, 6, 4, 5, 4, 5, 5, 4, 7, 4, 6, 4, 6, 5, 7, 5, 5, 5, 6, 7, 7, 7,
        6, 4, 6, 6, 6, 4, 4, 7, 4, 4, 7, 5, 6, 4, 7, 7, 

100%|██████████| 1000/1000 [00:01<00:00, 773.57it/s]


[2025-04-19 17:21:31] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=208, label_padding_length=208
[2025-04-19 17:21:32] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 202.0, 'max_seq_len': 202, 'min_seq_len': 202, 'avg_label_len': 208.0, 'max_label_len': 208, 'min_label_len': 208}
[2025-04-19 17:21:32] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:21:32] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 6, 7, 7, 7, 7, 6, 7, 6, 4, 7, 7, 6, 4, 7, 5, 5, 4, 6, 5, 4, 4, 4, 5,
        4, 4, 6, 7, 7, 6, 7, 5, 7, 6, 5, 7, 4, 4, 7, 7, 4, 5, 4, 7, 6, 6, 7, 6,
        6, 6, 7, 4, 5, 4, 4, 4, 6, 4, 4, 7, 7, 6, 7, 4, 4, 6, 5, 7, 6, 4, 4, 6,
        7, 4, 7, 4, 5, 7, 7, 4, 7, 6, 7, 7, 5, 4, 7, 6, 6, 4, 6, 6, 6, 4, 7, 4,
        6, 5, 7, 4, 5, 4, 7, 7, 7, 6, 4, 6, 5, 5, 7, 7, 6, 7, 7, 4, 7, 6, 4, 5,
        6, 7, 6, 6, 5, 7, 4, 4, 6, 4, 4, 7, 4, 7, 6, 7, 6, 6, 6, 4, 4, 7, 7, 7,
        6, 5, 6, 4, 7, 7, 7, 5, 7, 5, 4, 7, 4, 4, 4, 5, 

100%|██████████| 1000/1000 [00:01<00:00, 790.03it/s]


[2025-04-19 17:21:33] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=208, label_padding_length=208
[2025-04-19 17:21:33] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 202.0, 'max_seq_len': 202, 'min_seq_len': 202, 'avg_label_len': 208.0, 'max_label_len': 208, 'min_label_len': 208}
[2025-04-19 17:21:33] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:21:33] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 6, 7, 6, 4, 7, 6, 4, 7, 6, 6, 7, 7, 4, 5, 7, 7, 5, 4, 4, 6, 4, 7, 5,
        4, 6, 6, 4, 6, 4, 6, 6, 4, 4, 5, 6, 4, 4, 5, 6, 4, 4, 7, 6, 7, 6, 6, 5,
        4, 7, 7, 6, 4, 4, 5, 4, 6, 4, 6, 7, 6, 7, 7, 6, 7, 4, 6, 5, 7, 6, 6, 7,
        7, 7, 4, 5, 5, 7, 7, 5, 4, 6, 4, 6, 4, 4, 6, 4, 4, 5, 6, 7, 4, 7, 7, 7,
        4, 4, 4, 6, 6, 7, 4, 7, 7, 4, 5, 5, 4, 5, 7, 7, 5, 4, 6, 4, 7, 6, 4, 7,
        5, 7, 7, 5, 7, 6, 6, 7, 7, 7, 5, 5, 7, 5, 4, 6, 7, 5, 7, 4, 4, 4, 5, 4,
        4, 6, 4, 5, 6, 4, 7, 6, 7, 6, 7, 5, 5, 5, 7, 4, 

Evaluating: 100%|██████████| 250/250 [00:15<00:00, 15.95it/s]


[2025-04-19 17:21:49] [OmniGenome 0.2.4alpha4]  {'roc_auc_score': 0.4817431275553723}
[2025-04-19 17:21:49] [OmniGenome 0.2.4alpha4]  {'roc_auc_score': 0.4817431275553723}


Epoch 1/1 Loss: 0.6202: 100%|██████████| 2000/2000 [05:43<00:00,  5.83it/s]
Evaluating: 100%|██████████| 250/250 [00:15<00:00, 16.34it/s]


[2025-04-19 17:27:48] [OmniGenome 0.2.4alpha4]  {'roc_auc_score': 0.5957607807439937}
[2025-04-19 17:27:48] [OmniGenome 0.2.4alpha4]  {'roc_auc_score': 0.5957607807439937}


Testing: 100%|██████████| 250/250 [00:15<00:00, 16.26it/s]


[2025-04-19 17:28:05] [OmniGenome 0.2.4alpha4]  {'roc_auc_score': 0.5970559803186319}
[2025-04-19 17:28:05] [OmniGenome 0.2.4alpha4]  {'roc_auc_score': 0.5970559803186319}

---------------------------------------------------- Raw Metric Records ----------------------------------------------------
╒═════════════════════════╤═════════════════════════════╤══════════╤═══════════╤══════════╤═══════╤═══════╤════════╤════════╕
│ Metric                  │ Trial                       │ Values   │  Average  │  Median  │  Std  │  IQR  │  Min   │  Max   │
╞═════════════════════════╪═════════════════════════════╪══════════╪═══════════╪══════════╪═══════╪═══════╪════════╪════════╡
│ roc_auc_score           │ RGB-RNA-SNMD-OmniGenome-52M │ [0.5971] │  0.5971   │  0.5971  │   0   │   0   │ 0.5971 │ 0.5971 │
├─────────────────────────┼─────────────────────────────┼──────────┼───────────┼──────────┼───────┼───────┼────────┼────────┤
│ root_mean_squared_error │ RGB-RNA-mRNA-OmniGenome-52M │ [0.7457] │  0.

Some weights of OmniGenomeModel were not initialized from the model checkpoint at anonymous8/OmniGenome-52M and are newly initialized: ['OmniGenome.pooler.dense.bias', 'OmniGenome.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[2025-04-19 17:28:10] [OmniGenome 0.2.4alpha4]  Model Name: OmniGenomeModelForTokenClassification
Model Metadata: {'library_name': 'OmniGenome', 'omnigenome_version': '0.2.4alpha4', 'torch_version': '2.5.1+cu12.4+gita8d6afb511a69687bbb2b7e88a3cf67917e1697e', 'transformers_version': '4.49.0', 'model_cls': 'OmniGenomeModelForTokenClassification', 'tokenizer_cls': 'EsmTokenizer', 'model_name': 'OmniGenomeModelForTokenClassification'}
Base Model Name: anonymous8/OmniGenome-52M
Model Type: omnigenome
Model Architecture: None
Model Parameters: 52.453345 M
Model Config: OmniGenomeConfig {
  "OmniGenomefold_config": null,
  "_name_or_path": "anonymous8/OmniGenome-52M",
  "attention_probs_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "anonymous8/OmniGenome-52M--configuration_omnigenome.OmniGenomeConfig",
    "AutoModel": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeModel",
    "AutoModelForMaskedLM": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeForMaskedLM",
  

100%|██████████| 8000/8000 [00:10<00:00, 795.02it/s]


[2025-04-19 17:28:20] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=208, label_padding_length=208
[2025-04-19 17:28:21] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 202.0, 'max_seq_len': 202, 'min_seq_len': 202, 'avg_label_len': 208.0, 'max_label_len': 208, 'min_label_len': 208}
[2025-04-19 17:28:21] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:28:21] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 6, 6, 6, 4, 4, 5, 4, 6, 7, 5, 6, 4, 7, 4, 4, 6, 4, 6, 6, 4, 4, 7, 6,
        4, 6, 4, 4, 4, 5, 7, 5, 4, 6, 4, 6, 7, 7, 5, 7, 6, 5, 7, 7, 4, 5, 4, 7,
        5, 5, 4, 6, 5, 4, 6, 6, 7, 6, 4, 5, 4, 5, 7, 5, 5, 5, 4, 5, 7, 7, 5, 4,
        7, 6, 5, 5, 6, 7, 5, 7, 5, 7, 7, 7, 6, 4, 7, 6, 5, 5, 4, 7, 7, 5, 4, 4,
        4, 6, 7, 5, 7, 4, 7, 6, 5, 4, 7, 5, 5, 6, 7, 6, 7, 6, 4, 7, 7, 6, 7, 5,
        4, 6, 5, 6, 4, 5, 4, 5, 5, 4, 7, 4, 6, 4, 6, 5, 7, 5, 5, 5, 6, 7, 7, 7,
        6, 4, 6, 6, 6, 4, 4, 7, 4, 4, 7, 5, 6, 4, 7, 7, 

100%|██████████| 1000/1000 [00:01<00:00, 795.88it/s]


[2025-04-19 17:28:22] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=208, label_padding_length=208
[2025-04-19 17:28:22] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 202.0, 'max_seq_len': 202, 'min_seq_len': 202, 'avg_label_len': 208.0, 'max_label_len': 208, 'min_label_len': 208}
[2025-04-19 17:28:22] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:28:22] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 6, 7, 7, 7, 7, 6, 7, 6, 4, 7, 7, 6, 4, 7, 5, 5, 4, 6, 5, 4, 4, 4, 5,
        4, 4, 6, 7, 7, 6, 7, 5, 7, 6, 5, 7, 4, 4, 7, 7, 4, 5, 4, 7, 6, 6, 7, 6,
        6, 6, 7, 4, 5, 4, 4, 4, 6, 4, 4, 7, 7, 6, 7, 4, 4, 6, 5, 7, 6, 4, 4, 6,
        7, 4, 7, 4, 5, 7, 7, 4, 7, 6, 7, 7, 5, 4, 7, 6, 6, 4, 6, 6, 6, 4, 7, 4,
        6, 5, 7, 4, 5, 4, 7, 7, 7, 6, 4, 6, 5, 5, 7, 7, 6, 7, 7, 4, 7, 6, 4, 5,
        6, 7, 6, 6, 5, 7, 4, 4, 6, 4, 4, 7, 4, 7, 6, 7, 6, 6, 6, 4, 4, 7, 7, 7,
        6, 5, 6, 4, 7, 7, 7, 5, 7, 5, 4, 7, 4, 4, 4, 5, 

100%|██████████| 1000/1000 [00:01<00:00, 784.98it/s]


[2025-04-19 17:28:24] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=208, label_padding_length=208
[2025-04-19 17:28:24] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 202.0, 'max_seq_len': 202, 'min_seq_len': 202, 'avg_label_len': 208.0, 'max_label_len': 208, 'min_label_len': 208}
[2025-04-19 17:28:24] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:28:24] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 6, 7, 6, 4, 7, 6, 4, 7, 6, 6, 7, 7, 4, 5, 7, 7, 5, 4, 4, 6, 4, 7, 5,
        4, 6, 6, 4, 6, 4, 6, 6, 4, 4, 5, 6, 4, 4, 5, 6, 4, 4, 7, 6, 7, 6, 6, 5,
        4, 7, 7, 6, 4, 4, 5, 4, 6, 4, 6, 7, 6, 7, 7, 6, 7, 4, 6, 5, 7, 6, 6, 7,
        7, 7, 4, 5, 5, 7, 7, 5, 4, 6, 4, 6, 4, 4, 6, 4, 4, 5, 6, 7, 4, 7, 7, 7,
        4, 4, 4, 6, 6, 7, 4, 7, 7, 4, 5, 5, 4, 5, 7, 7, 5, 4, 6, 4, 7, 6, 4, 7,
        5, 7, 7, 5, 7, 6, 6, 7, 7, 7, 5, 5, 7, 5, 4, 6, 7, 5, 7, 4, 4, 4, 5, 4,
        4, 6, 4, 5, 6, 4, 7, 6, 7, 6, 7, 5, 5, 5, 7, 4, 

Evaluating: 100%|██████████| 250/250 [00:15<00:00, 16.21it/s]


[2025-04-19 17:28:40] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.2737850675253224, 'matthews_corrcoef': 0.06335346758418496}
[2025-04-19 17:28:40] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.2737850675253224, 'matthews_corrcoef': 0.06335346758418496}


Epoch 1/1 Loss: 1.2502: 100%|██████████| 2000/2000 [05:43<00:00,  5.82it/s]
Evaluating: 100%|██████████| 250/250 [00:15<00:00, 16.47it/s]


[2025-04-19 17:34:39] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.44408230644413554, 'matthews_corrcoef': 0.2776818514600447}
[2025-04-19 17:34:39] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.44408230644413554, 'matthews_corrcoef': 0.2776818514600447}


Testing: 100%|██████████| 250/250 [00:15<00:00, 16.49it/s]


[2025-04-19 17:34:55] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.4705120824131688, 'matthews_corrcoef': 0.3114356674710262}
[2025-04-19 17:34:55] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.4705120824131688, 'matthews_corrcoef': 0.3114356674710262}

---------------------------------------------------- Raw Metric Records ----------------------------------------------------
╒═════════════════════════╤═════════════════════════════╤══════════╤═══════════╤══════════╤═══════╤═══════╤════════╤════════╕
│ Metric                  │ Trial                       │ Values   │  Average  │  Median  │  Std  │  IQR  │  Min   │  Max   │
╞═════════════════════════╪═════════════════════════════╪══════════╪═══════════╪══════════╪═══════╪═══════╪════════╪════════╡
│ f1_score                │ RGB-RNA-SNMR-OmniGenome-52M │ [0.4705] │  0.4705   │  0.4705  │   0   │   0   │ 0.4705 │ 0.4705 │
├─────────────────────────┼─────────────────────────────┼──────────┼───────────┼──────────┼───────┼───────┼────────┼────────┤


Some weights of OmniGenomeModel were not initialized from the model checkpoint at anonymous8/OmniGenome-52M and are newly initialized: ['OmniGenome.pooler.dense.bias', 'OmniGenome.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[2025-04-19 17:35:00] [OmniGenome 0.2.4alpha4]  Model Name: OmniGenomeModelForTokenClassification
Model Metadata: {'library_name': 'OmniGenome', 'omnigenome_version': '0.2.4alpha4', 'torch_version': '2.5.1+cu12.4+gita8d6afb511a69687bbb2b7e88a3cf67917e1697e', 'transformers_version': '4.49.0', 'model_cls': 'OmniGenomeModelForTokenClassification', 'tokenizer_cls': 'EsmTokenizer', 'model_name': 'OmniGenomeModelForTokenClassification'}
Base Model Name: anonymous8/OmniGenome-52M
Model Type: omnigenome
Model Architecture: None
Model Parameters: 52.453345 M
Model Config: OmniGenomeConfig {
  "OmniGenomefold_config": null,
  "_name_or_path": "anonymous8/OmniGenome-52M",
  "attention_probs_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "anonymous8/OmniGenome-52M--configuration_omnigenome.OmniGenomeConfig",
    "AutoModel": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeModel",
    "AutoModelForMaskedLM": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeForMaskedLM",
  

100%|██████████| 608/608 [00:00<00:00, 1031.73it/s]


[2025-04-19 17:35:00] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=504, label_padding_length=504
[2025-04-19 17:35:00] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 130.54276315789474, 'max_seq_len': 501, 'min_seq_len': 56, 'avg_label_len': 504.0, 'max_label_len': 504, 'min_label_len': 504}
[2025-04-19 17:35:00] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:35:00] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 6, 6, 6, 5, 5, 5, 5, 9, 4, 6, 5, 9, 5, 4, 6, 9, 5, 9, 6, 6, 9, 5, 4,
        6, 4, 6, 5, 6, 5, 9, 5, 6, 6, 5, 9, 9, 4, 9, 4, 4, 5, 5, 6, 6, 6, 9, 6,
        6, 9, 5, 4, 9, 6, 6, 6, 9, 9, 5, 6, 4, 4, 5, 5, 5, 5, 4, 9, 6, 6, 6, 6,
        5, 5, 5, 4, 5, 5, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 

100%|██████████| 82/82 [00:00<00:00, 1047.57it/s]


[2025-04-19 17:35:01] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=328, label_padding_length=328
[2025-04-19 17:35:01] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 131.23170731707316, 'max_seq_len': 321, 'min_seq_len': 67, 'avg_label_len': 328.0, 'max_label_len': 328, 'min_label_len': 328}
[2025-04-19 17:35:01] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:35:01] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 9, 6, 5, 6, 5, 6, 6, 9, 4, 6, 6, 4, 6, 4, 6, 9, 6, 6, 4, 4, 5, 9, 5,
        5, 6, 4, 5, 6, 6, 6, 5, 9, 5, 4, 9, 4, 4, 5, 5, 5, 6, 9, 4, 6, 6, 9, 5,
        5, 5, 4, 6, 6, 4, 9, 5, 6, 4, 4, 4, 5, 5, 9, 6, 6, 5, 5, 6, 5, 6, 5, 4,
        4, 5, 5, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 

100%|██████████| 76/76 [00:00<00:00, 1109.12it/s]


[2025-04-19 17:35:01] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=312, label_padding_length=312
[2025-04-19 17:35:01] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 117.39473684210526, 'max_seq_len': 308, 'min_seq_len': 60, 'avg_label_len': 312.0, 'max_label_len': 312, 'min_label_len': 312}
[2025-04-19 17:35:01] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:35:01] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 9, 9, 5, 5, 6, 6, 5, 6, 4, 9, 5, 4, 9, 4, 9, 5, 9, 9, 4, 4, 4, 6, 6,
        9, 9, 4, 9, 4, 5, 5, 9, 6, 9, 9, 5, 5, 5, 4, 9, 9, 5, 5, 6, 4, 4, 5, 4,
        5, 4, 6, 5, 4, 6, 9, 5, 4, 4, 6, 5, 9, 9, 9, 4, 4, 6, 4, 6, 5, 5, 6, 4,
        9, 6, 4, 9, 4, 6, 9, 6, 5, 5, 5, 4, 5, 5, 4, 6, 5, 6, 9, 6, 4, 4, 4, 6,
        9, 4, 6, 6, 9, 5, 9, 9, 6, 5, 5, 6, 6, 4, 9, 5, 2, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 

Evaluating: 100%|██████████| 19/19 [00:01<00:00, 15.49it/s]


[2025-04-19 17:35:02] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.4720883878727751, 'matthews_corrcoef': 0.23668201635681113}
[2025-04-19 17:35:02] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.4720883878727751, 'matthews_corrcoef': 0.23668201635681113}


Epoch 1/1 Loss: 0.7500: 100%|██████████| 152/152 [00:25<00:00,  5.85it/s]
Evaluating: 100%|██████████| 19/19 [00:01<00:00, 15.53it/s]


[2025-04-19 17:35:30] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.8876411792666348, 'matthews_corrcoef': 0.8244443180430253}
[2025-04-19 17:35:30] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.8876411792666348, 'matthews_corrcoef': 0.8244443180430253}


Testing: 100%|██████████| 21/21 [00:01<00:00, 15.45it/s]


[2025-04-19 17:35:32] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.8800914439102971, 'matthews_corrcoef': 0.813313796203534}
[2025-04-19 17:35:32] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.8800914439102971, 'matthews_corrcoef': 0.813313796203534}

-------------------------------------------------------- Raw Metric Records --------------------------------------------------------
╒═════════════════════════╤═════════════════════════════════════╤══════════╤═══════════╤══════════╤═══════╤═══════╤════════╤════════╕
│ Metric                  │ Trial                               │ Values   │  Average  │  Median  │  Std  │  IQR  │  Min   │  Max   │
╞═════════════════════════╪═════════════════════════════════════╪══════════╪═══════════╪══════════╪═══════╪═══════╪════════╪════════╡
│ f1_score                │ RGB-RNA-SNMR-OmniGenome-52M         │ [0.4705] │  0.4705   │  0.4705  │   0   │   0   │ 0.4705 │ 0.4705 │
├─────────────────────────┼─────────────────────────────────────┼──────────┼───────────┼

Some weights of OmniGenomeModel were not initialized from the model checkpoint at anonymous8/OmniGenome-52M and are newly initialized: ['OmniGenome.pooler.dense.bias', 'OmniGenome.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[2025-04-19 17:35:36] [OmniGenome 0.2.4alpha4]  Model Name: OmniGenomeModelForTokenClassification
Model Metadata: {'library_name': 'OmniGenome', 'omnigenome_version': '0.2.4alpha4', 'torch_version': '2.5.1+cu12.4+gita8d6afb511a69687bbb2b7e88a3cf67917e1697e', 'transformers_version': '4.49.0', 'model_cls': 'OmniGenomeModelForTokenClassification', 'tokenizer_cls': 'EsmTokenizer', 'model_name': 'OmniGenomeModelForTokenClassification'}
Base Model Name: anonymous8/OmniGenome-52M
Model Type: omnigenome
Model Architecture: None
Model Parameters: 52.453345 M
Model Config: OmniGenomeConfig {
  "OmniGenomefold_config": null,
  "_name_or_path": "anonymous8/OmniGenome-52M",
  "attention_probs_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "anonymous8/OmniGenome-52M--configuration_omnigenome.OmniGenomeConfig",
    "AutoModel": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeModel",
    "AutoModelForMaskedLM": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeForMaskedLM",
  

100%|██████████| 3104/3104 [00:02<00:00, 1321.78it/s]


[2025-04-19 17:35:39] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=504, label_padding_length=504
[2025-04-19 17:35:39] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 101.23228092783505, 'max_seq_len': 501, 'min_seq_len': 68, 'avg_label_len': 504.0, 'max_label_len': 504, 'min_label_len': 504}
[2025-04-19 17:35:39] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:35:39] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 5, 9, 9, 9, 5, 6, 6, 9, 5, 9, 9, 9, 4, 6, 9, 6, 5, 6, 4, 9, 6, 9, 6,
        6, 4, 4, 5, 5, 4, 5, 9, 4, 5, 9, 4, 9, 5, 5, 4, 9, 9, 5, 5, 6, 4, 4, 5,
        4, 6, 4, 9, 4, 4, 6, 9, 6, 4, 4, 4, 5, 4, 5, 4, 9, 5, 4, 6, 5, 6, 5, 9,
        6, 4, 5, 6, 4, 9, 4, 6, 9, 9, 6, 4, 5, 9, 5, 6, 5, 4, 4, 6, 6, 6, 9, 5,
        5, 6, 5, 6, 4, 4, 4, 4, 9, 4, 6, 6, 9, 5, 4, 4, 6, 6, 5, 5, 6, 6, 4, 6,
        6, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 

100%|██████████| 389/389 [00:00<00:00, 1329.99it/s]


[2025-04-19 17:35:39] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=392, label_padding_length=392
[2025-04-19 17:35:39] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 95.96401028277634, 'max_seq_len': 388, 'min_seq_len': 55, 'avg_label_len': 392.0, 'max_label_len': 392, 'min_label_len': 392}
[2025-04-19 17:35:39] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:35:39] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 9, 5, 5, 9, 6, 6, 9, 6, 6, 5, 9, 4, 9, 4, 6, 5, 6, 4, 6, 6, 6, 6, 6,
        4, 4, 4, 5, 6, 5, 5, 9, 6, 4, 9, 5, 5, 5, 4, 9, 9, 5, 5, 6, 4, 4, 5, 9,
        5, 4, 6, 4, 4, 6, 5, 9, 4, 4, 6, 5, 5, 5, 5, 9, 5, 5, 4, 5, 6, 5, 5, 6,
        4, 9, 6, 6, 9, 4, 5, 9, 6, 5, 6, 6, 9, 6, 9, 9, 9, 5, 6, 5, 5, 6, 9, 6,
        6, 6, 4, 6, 4, 6, 9, 4, 6, 6, 9, 5, 4, 5, 9, 6, 5, 5, 4, 6, 6, 4, 2, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1

100%|██████████| 388/388 [00:00<00:00, 1228.98it/s]


[2025-04-19 17:35:40] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=480, label_padding_length=480
[2025-04-19 17:35:40] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 106.70618556701031, 'max_seq_len': 477, 'min_seq_len': 72, 'avg_label_len': 480.0, 'max_label_len': 480, 'min_label_len': 480}
[2025-04-19 17:35:40] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:35:40] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 6, 6, 4, 6, 6, 5, 6, 9, 6, 9, 5, 5, 6, 4, 4, 5, 9, 6, 6, 5, 9, 4, 4,
        6, 6, 4, 6, 5, 5, 6, 6, 9, 5, 9, 9, 6, 4, 4, 4, 4, 5, 5, 6, 6, 9, 6, 6,
        6, 5, 5, 6, 6, 4, 4, 6, 6, 5, 5, 5, 6, 9, 6, 9, 6, 6, 6, 9, 9, 5, 6, 4,
        6, 9, 5, 5, 5, 4, 5, 5, 6, 5, 5, 9, 5, 5, 6, 5, 5, 4, 2, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 

  self.scaler = GradScaler()
Evaluating: 100%|██████████| 97/97 [00:06<00:00, 15.76it/s]


[2025-04-19 17:35:46] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.48003478513947645, 'matthews_corrcoef': 0.251113820008015}


Epoch 1/1 Loss: 0.6091: 100%|██████████| 776/776 [02:13<00:00,  5.82it/s]
Evaluating: 100%|██████████| 97/97 [00:06<00:00, 16.06it/s]


[2025-04-19 17:38:06] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.9373069106292483, 'matthews_corrcoef': 0.9028343575144284}


  self.unwrap_model().load_state_dict(torch.load(self._model_state_dict_path))
Testing: 100%|██████████| 98/98 [00:06<00:00, 16.21it/s]


[2025-04-19 17:38:13] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.9728583458467837, 'matthews_corrcoef': 0.9579414610773724}

---------------------------------------------------------- Raw Metric Records ----------------------------------------------------------
╒═════════════════════════╤════════════════════════════════════════╤══════════╤═══════════╤══════════╤═══════╤═══════╤════════╤════════╕
│ Metric                  │ Trial                                  │ Values   │  Average  │  Median  │  Std  │  IQR  │  Min   │  Max   │
╞═════════════════════════╪════════════════════════════════════════╪══════════╪═══════════╪══════════╪═══════╪═══════╪════════╪════════╡
│ f1_score                │ RGB-RNA-SNMR-OmniGenome-52M            │ [0.4705] │  0.4705   │  0.4705  │   0   │   0   │ 0.4705 │ 0.4705 │
├─────────────────────────┼────────────────────────────────────────┼──

Some weights of OmniGenomeModel were not initialized from the model checkpoint at anonymous8/OmniGenome-52M and are newly initialized: ['OmniGenome.pooler.dense.bias', 'OmniGenome.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[2025-04-19 17:38:15] [OmniGenome 0.2.4alpha4]  Model Name: OmniGenomeModelForTokenClassification
Model Metadata: {'library_name': 'OmniGenome', 'omnigenome_version': '0.2.4alpha4', 'torch_version': '2.5.1+cu12.4+gita8d6afb511a69687bbb2b7e88a3cf67917e1697e', 'transformers_version': '4.49.0', 'model_cls': 'OmniGenomeModelForTokenClassification', 'tokenizer_cls': 'EsmTokenizer', 'model_name': 'OmniGenomeModelForTokenClassification'}
Base Model Name: anonymous8/OmniGenome-52M
Model Type: omnigenome
Model Architecture: None
Model Parameters: 52.453345 M
Model Config: OmniGenomeConfig {
  "OmniGenomefold_config": null,
  "_name_or_path": "anonymous8/OmniGenome-52M",
  "attention_probs_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "anonymous8/OmniGenome-52M--configuration_omnigenome.OmniGenomeConfig",
    "AutoModel": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeModel",
    "AutoModelForMaskedLM": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeForMaskedLM",
  

100%|██████████| 9232/9232 [00:08<00:00, 1103.79it/s]


[2025-04-19 17:38:23] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=1024, label_padding_length=1024
[2025-04-19 17:38:25] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 129.8156412478336, 'max_seq_len': 1024, 'min_seq_len': 12, 'avg_label_len': 1024.0, 'max_label_len': 1024, 'min_label_len': 1024}
[2025-04-19 17:38:25] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:38:25] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 9, 5,  ..., 1, 1, 1]), 'attention_mask': tensor([1, 1, 1,  ..., 0, 0, 0]), 'labels': tensor([-100,    2,    2,  ..., -100, -100, -100])}
[2025-04-19 17:38:25] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 4, 9,  ..., 1, 1, 1]), 'attention_mask': tensor([1, 1, 1,  ..., 0, 0, 0]), 'labels': tensor([-100,    2,    2,  ..., -100, -100, -100])}
[2025-04-19 17:38:25] [OmniGenome 0.2.4alpha4]  Detected max_length=1024 in the dataset, using it as the max_length.
[2025-04-19 17:38:25] [OmniGenome 0.2.4alpha

100%|██████████| 1161/1161 [00:01<00:00, 1081.56it/s]


[2025-04-19 17:38:26] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=1024, label_padding_length=1024
[2025-04-19 17:38:26] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 130.96554694229113, 'max_seq_len': 1024, 'min_seq_len': 14, 'avg_label_len': 1024.0, 'max_label_len': 1024, 'min_label_len': 1024}
[2025-04-19 17:38:26] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:38:26] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 4, 5,  ..., 1, 1, 1]), 'attention_mask': tensor([1, 1, 1,  ..., 0, 0, 0]), 'labels': tensor([-100,    2,    2,  ..., -100, -100, -100])}
[2025-04-19 17:38:26] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 4, 6,  ..., 1, 1, 1]), 'attention_mask': tensor([1, 1, 1,  ..., 0, 0, 0]), 'labels': tensor([-100,    2,    2,  ..., -100, -100, -100])}
[2025-04-19 17:38:26] [OmniGenome 0.2.4alpha4]  Detected max_length=1024 in the dataset, using it as the max_length.
[2025-04-19 17:38:26] [OmniGenome 0.2.4alph

100%|██████████| 1154/1154 [00:01<00:00, 1107.81it/s]


[2025-04-19 17:38:27] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=1024, label_padding_length=1024
[2025-04-19 17:38:27] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 128.01473136915078, 'max_seq_len': 1024, 'min_seq_len': 25, 'avg_label_len': 1024.0, 'max_label_len': 1024, 'min_label_len': 1024}
[2025-04-19 17:38:27] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:38:27] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 5, 5,  ..., 1, 1, 1]), 'attention_mask': tensor([1, 1, 1,  ..., 0, 0, 0]), 'labels': tensor([-100,    2,    2,  ..., -100, -100, -100])}
[2025-04-19 17:38:27] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 4, 9,  ..., 1, 1, 1]), 'attention_mask': tensor([1, 1, 1,  ..., 0, 0, 0]), 'labels': tensor([-100,    2,    2,  ..., -100, -100, -100])}
[2025-04-19 17:38:27] [OmniGenome 0.2.4alpha4]  Using Trainer: <class 'omnigenome.src.trainer.accelerate_trainer.AccelerateTrainer'>


  self.scaler = GradScaler()
Evaluating: 100%|██████████| 289/289 [00:18<00:00, 15.94it/s]


[2025-04-19 17:38:46] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.41489616879302665, 'matthews_corrcoef': 0.18007090245575402}


Epoch 1/1 Loss: 0.7926: 100%|██████████| 2308/2308 [06:54<00:00,  5.57it/s]
Evaluating: 100%|██████████| 289/289 [00:18<00:00, 16.03it/s]


[2025-04-19 17:45:59] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.7506652553022602, 'matthews_corrcoef': 0.6064882861889552}


  self.unwrap_model().load_state_dict(torch.load(self._model_state_dict_path))
Testing: 100%|██████████| 291/291 [00:18<00:00, 16.04it/s]


[2025-04-19 17:46:18] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.7502921556229177, 'matthews_corrcoef': 0.6064033194534778}

---------------------------------------------------------- Raw Metric Records ----------------------------------------------------------
╒═════════════════════════╤════════════════════════════════════════╤══════════╤═══════════╤══════════╤═══════╤═══════╤════════╤════════╕
│ Metric                  │ Trial                                  │ Values   │  Average  │  Median  │  Std  │  IQR  │  Min   │  Max   │
╞═════════════════════════╪════════════════════════════════════════╪══════════╪═══════════╪══════════╪═══════╪═══════╪════════╪════════╡
│ f1_score                │ RGB-RNA-SNMR-OmniGenome-52M            │ [0.4705] │  0.4705   │  0.4705  │   0   │   0   │ 0.4705 │ 0.4705 │
├─────────────────────────┼────────────────────────────────────────┼──

Some weights of OmniGenomeModel were not initialized from the model checkpoint at anonymous8/OmniGenome-52M and are newly initialized: ['OmniGenome.pooler.dense.bias', 'OmniGenome.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[2025-04-19 17:46:23] [OmniGenome 0.2.4alpha4]  Model Name: OmniGenomeModelForSequenceClassification
Model Metadata: {'library_name': 'OmniGenome', 'omnigenome_version': '0.2.4alpha4', 'torch_version': '2.5.1+cu12.4+gita8d6afb511a69687bbb2b7e88a3cf67917e1697e', 'transformers_version': '4.49.0', 'model_cls': 'OmniGenomeModelForSequenceClassification', 'tokenizer_cls': 'EsmTokenizer', 'model_name': 'OmniGenomeModelForSequenceClassification'}
Base Model Name: anonymous8/OmniGenome-52M
Model Type: omnigenome
Model Architecture: None
Model Parameters: 52.453345 M
Model Config: OmniGenomeConfig {
  "OmniGenomefold_config": null,
  "_name_or_path": "anonymous8/OmniGenome-52M",
  "attention_probs_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "anonymous8/OmniGenome-52M--configuration_omnigenome.OmniGenomeConfig",
    "AutoModel": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeModel",
    "AutoModelForMaskedLM": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeForMask

100%|██████████| 3399/3399 [00:08<00:00, 381.44it/s]


[2025-04-19 17:46:32] [OmniGenome 0.2.4alpha4]  
Label Distribution:
[2025-04-19 17:46:32] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:46:32] [OmniGenome 0.2.4alpha4]  Label     		Count     		Percentage
[2025-04-19 17:46:32] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:46:32] [OmniGenome 0.2.4alpha4]  0         		1539      		45.28%
[2025-04-19 17:46:32] [OmniGenome 0.2.4alpha4]  1         		1860      		54.72%
[2025-04-19 17:46:32] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:46:32] [OmniGenome 0.2.4alpha4]  Total samples: 3399
[2025-04-19 17:46:32] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=504, label_padding_length=0
[2025-04-19 17:46:32] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 501.8608414239482, 'max_seq_len': 502, 'min_seq_len': 29, 'avg_label_len': 1.0, 'max_label_len': 1, 'min_label_len': 1}
[2025-04-19 17:46:32] [OmniGenome 0.2.4alpha4] 

100%|██████████| 426/426 [00:01<00:00, 386.71it/s]


[2025-04-19 17:46:34] [OmniGenome 0.2.4alpha4]  
Label Distribution:
[2025-04-19 17:46:34] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:46:34] [OmniGenome 0.2.4alpha4]  Label     		Count     		Percentage
[2025-04-19 17:46:34] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:46:34] [OmniGenome 0.2.4alpha4]  0         		207       		48.59%
[2025-04-19 17:46:34] [OmniGenome 0.2.4alpha4]  1         		219       		51.41%
[2025-04-19 17:46:34] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:46:34] [OmniGenome 0.2.4alpha4]  Total samples: 426
[2025-04-19 17:46:34] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=504, label_padding_length=0
[2025-04-19 17:46:34] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 502.0, 'max_seq_len': 502, 'min_seq_len': 502, 'avg_label_len': 1.0, 'max_label_len': 1, 'min_label_len': 1}
[2025-04-19 17:46:34] [OmniGenome 0.2.4alpha4]  Preview of 

100%|██████████| 424/424 [00:01<00:00, 386.57it/s]


[2025-04-19 17:46:35] [OmniGenome 0.2.4alpha4]  
Label Distribution:
[2025-04-19 17:46:35] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:46:35] [OmniGenome 0.2.4alpha4]  Label     		Count     		Percentage
[2025-04-19 17:46:35] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:46:35] [OmniGenome 0.2.4alpha4]  0         		201       		47.41%
[2025-04-19 17:46:35] [OmniGenome 0.2.4alpha4]  1         		223       		52.59%
[2025-04-19 17:46:35] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:46:35] [OmniGenome 0.2.4alpha4]  Total samples: 424
[2025-04-19 17:46:35] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=504, label_padding_length=0
[2025-04-19 17:46:35] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 502.0, 'max_seq_len': 502, 'min_seq_len': 502, 'avg_label_len': 1.0, 'max_label_len': 1, 'min_label_len': 1}
[2025-04-19 17:46:35] [OmniGenome 0.2.4alpha4]  Preview of 

  self.scaler = GradScaler()
Evaluating: 100%|██████████| 106/106 [00:06<00:00, 16.22it/s]


[2025-04-19 17:46:42] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.3216, 'matthews_corrcoef': 0.0}


Epoch 1/1 Loss: 0.6482: 100%|██████████| 850/850 [02:25<00:00,  5.85it/s]
Evaluating: 100%|██████████| 106/106 [00:06<00:00, 16.55it/s]


[2025-04-19 17:49:14] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.6495471014492753, 'matthews_corrcoef': 0.30772444027319046}


  self.unwrap_model().load_state_dict(torch.load(self._model_state_dict_path))
Testing: 100%|██████████| 107/107 [00:06<00:00, 16.52it/s]


[2025-04-19 17:49:22] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.6601828557403203, 'matthews_corrcoef': 0.3295066167896064}

--------------------------------------------------------------- Raw Metric Records ---------------------------------------------------------------
╒═════════════════════════╤══════════════════════════════════════════════════╤══════════╤═══════════╤══════════╤═══════╤═══════╤════════╤════════╕
│ Metric                  │ Trial                                            │ Values   │  Average  │  Median  │  Std  │  IQR  │  Min   │  Max   │
╞═════════════════════════╪══════════════════════════════════════════════════╪══════════╪═══════════╪══════════╪═══════╪═══════╪════════╪════════╡
│ f1_score                │ RGB-RNA-SNMR-OmniGenome-52M                      │ [0.4705] │  0.4705   │  0.4705  │   0   │   0   │ 0.4705 │ 0.4705 │
├───────────────────

Some weights of OmniGenomeModel were not initialized from the model checkpoint at anonymous8/OmniGenome-52M and are newly initialized: ['OmniGenome.pooler.dense.bias', 'OmniGenome.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[2025-04-19 17:49:23] [OmniGenome 0.2.4alpha4]  Model Name: OmniGenomeModelForSequenceClassification
Model Metadata: {'library_name': 'OmniGenome', 'omnigenome_version': '0.2.4alpha4', 'torch_version': '2.5.1+cu12.4+gita8d6afb511a69687bbb2b7e88a3cf67917e1697e', 'transformers_version': '4.49.0', 'model_cls': 'OmniGenomeModelForSequenceClassification', 'tokenizer_cls': 'EsmTokenizer', 'model_name': 'OmniGenomeModelForSequenceClassification'}
Base Model Name: anonymous8/OmniGenome-52M
Model Type: omnigenome
Model Architecture: None
Model Parameters: 52.453345 M
Model Config: OmniGenomeConfig {
  "OmniGenomefold_config": null,
  "_name_or_path": "anonymous8/OmniGenome-52M",
  "attention_probs_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "anonymous8/OmniGenome-52M--configuration_omnigenome.OmniGenomeConfig",
    "AutoModel": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeModel",
    "AutoModelForMaskedLM": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeForMask

100%|██████████| 4697/4697 [00:12<00:00, 388.44it/s]


[2025-04-19 17:49:35] [OmniGenome 0.2.4alpha4]  
Label Distribution:
[2025-04-19 17:49:35] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:49:35] [OmniGenome 0.2.4alpha4]  Label     		Count     		Percentage
[2025-04-19 17:49:35] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:49:35] [OmniGenome 0.2.4alpha4]  0         		2195      		46.73%
[2025-04-19 17:49:35] [OmniGenome 0.2.4alpha4]  1         		2502      		53.27%
[2025-04-19 17:49:35] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:49:35] [OmniGenome 0.2.4alpha4]  Total samples: 4697
[2025-04-19 17:49:36] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=504, label_padding_length=0
[2025-04-19 17:49:36] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 502.0, 'max_seq_len': 502, 'min_seq_len': 502, 'avg_label_len': 1.0, 'max_label_len': 1, 'min_label_len': 1}
[2025-04-19 17:49:36] [OmniGenome 0.2.4alpha4]  Preview of

100%|██████████| 588/588 [00:01<00:00, 372.04it/s]


[2025-04-19 17:49:38] [OmniGenome 0.2.4alpha4]  
Label Distribution:
[2025-04-19 17:49:38] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:49:38] [OmniGenome 0.2.4alpha4]  Label     		Count     		Percentage
[2025-04-19 17:49:38] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:49:38] [OmniGenome 0.2.4alpha4]  0         		258       		43.88%
[2025-04-19 17:49:38] [OmniGenome 0.2.4alpha4]  1         		330       		56.12%
[2025-04-19 17:49:38] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:49:38] [OmniGenome 0.2.4alpha4]  Total samples: 588
[2025-04-19 17:49:38] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=504, label_padding_length=0
[2025-04-19 17:49:38] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 502.0, 'max_seq_len': 502, 'min_seq_len': 502, 'avg_label_len': 1.0, 'max_label_len': 1, 'min_label_len': 1}
[2025-04-19 17:49:38] [OmniGenome 0.2.4alpha4]  Preview of 

100%|██████████| 587/587 [00:01<00:00, 381.31it/s]


[2025-04-19 17:49:39] [OmniGenome 0.2.4alpha4]  
Label Distribution:
[2025-04-19 17:49:39] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:49:39] [OmniGenome 0.2.4alpha4]  Label     		Count     		Percentage
[2025-04-19 17:49:39] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:49:39] [OmniGenome 0.2.4alpha4]  0         		259       		44.12%
[2025-04-19 17:49:39] [OmniGenome 0.2.4alpha4]  1         		328       		55.88%
[2025-04-19 17:49:39] [OmniGenome 0.2.4alpha4]  ----------------------------------------
[2025-04-19 17:49:39] [OmniGenome 0.2.4alpha4]  Total samples: 587
[2025-04-19 17:49:39] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=504, label_padding_length=0
[2025-04-19 17:49:39] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 502.0, 'max_seq_len': 502, 'min_seq_len': 502, 'avg_label_len': 1.0, 'max_label_len': 1, 'min_label_len': 1}
[2025-04-19 17:49:39] [OmniGenome 0.2.4alpha4]  Preview of 

  self.scaler = GradScaler()
Evaluating: 100%|██████████| 147/147 [00:09<00:00, 15.78it/s]


[2025-04-19 17:49:49] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.3061465721040189, 'matthews_corrcoef': 0.0}


Epoch 1/1 Loss: 0.6317: 100%|██████████| 1175/1175 [03:21<00:00,  5.84it/s]
Evaluating: 100%|██████████| 147/147 [00:08<00:00, 16.49it/s]


[2025-04-19 17:53:20] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.7186528255126137, 'matthews_corrcoef': 0.4389950704998246}


  self.unwrap_model().load_state_dict(torch.load(self._model_state_dict_path))
Testing: 100%|██████████| 147/147 [00:08<00:00, 16.40it/s]


[2025-04-19 17:53:29] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.7427322170034104, 'matthews_corrcoef': 0.49504530737460883}

--------------------------------------------------------------- Raw Metric Records ---------------------------------------------------------------
╒═════════════════════════╤══════════════════════════════════════════════════╤══════════╤═══════════╤══════════╤═══════╤═══════╤════════╤════════╕
│ Metric                  │ Trial                                            │ Values   │  Average  │  Median  │  Std  │  IQR  │  Min   │  Max   │
╞═════════════════════════╪══════════════════════════════════════════════════╪══════════╪═══════════╪══════════╪═══════╪═══════╪════════╪════════╡
│ f1_score                │ RGB-RNA-SNMR-OmniGenome-52M                      │ [0.4705] │  0.4705   │  0.4705  │   0   │   0   │ 0.4705 │ 0.4705 │
├─────────────────

Some weights of OmniGenomeModel were not initialized from the model checkpoint at anonymous8/OmniGenome-52M and are newly initialized: ['OmniGenome.pooler.dense.bias', 'OmniGenome.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[2025-04-19 17:53:35] [OmniGenome 0.2.4alpha4]  Model Name: OmniGenomeModelForTokenClassification
Model Metadata: {'library_name': 'OmniGenome', 'omnigenome_version': '0.2.4alpha4', 'torch_version': '2.5.1+cu12.4+gita8d6afb511a69687bbb2b7e88a3cf67917e1697e', 'transformers_version': '4.49.0', 'model_cls': 'OmniGenomeModelForTokenClassification', 'tokenizer_cls': 'EsmTokenizer', 'model_name': 'OmniGenomeModelForTokenClassification'}
Base Model Name: anonymous8/OmniGenome-52M
Model Type: omnigenome
Model Architecture: None
Model Parameters: 52.453345 M
Model Config: OmniGenomeConfig {
  "OmniGenomefold_config": null,
  "_name_or_path": "anonymous8/OmniGenome-52M",
  "attention_probs_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "anonymous8/OmniGenome-52M--configuration_omnigenome.OmniGenomeConfig",
    "AutoModel": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeModel",
    "AutoModelForMaskedLM": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeForMaskedLM",
  

100%|██████████| 17860/17860 [02:23<00:00, 124.63it/s]


[2025-04-19 17:55:59] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=1024, label_padding_length=1024
[2025-04-19 17:56:00] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 968.9286674132139, 'max_seq_len': 1024, 'min_seq_len': 241, 'avg_label_len': 1024.0, 'max_label_len': 1024, 'min_label_len': 1024}
[2025-04-19 17:56:00] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:56:00] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 6, 7,  ..., 7, 4, 2]), 'attention_mask': tensor([1, 1, 1,  ..., 1, 1, 1]), 'labels': tensor([-100,    0,    0,  ...,    1,    1, -100])}
[2025-04-19 17:56:00] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 6, 7,  ..., 4, 4, 2]), 'attention_mask': tensor([1, 1, 1,  ..., 1, 1, 1]), 'labels': tensor([-100,    0,    0,  ...,    1,    1, -100])}
[2025-04-19 17:56:00] [OmniGenome 0.2.4alpha4]  Detected max_length=1024 in the dataset, using it as the max_length.
[2025-04-19 17:56:00] [OmniGenome 0.2.4alph

100%|██████████| 2232/2232 [00:17<00:00, 126.01it/s]


[2025-04-19 17:56:18] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=1024, label_padding_length=1024
[2025-04-19 17:56:18] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 963.9560931899641, 'max_seq_len': 1024, 'min_seq_len': 220, 'avg_label_len': 1024.0, 'max_label_len': 1024, 'min_label_len': 1024}
[2025-04-19 17:56:18] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:56:18] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 6, 7,  ..., 7, 6, 2]), 'attention_mask': tensor([1, 1, 1,  ..., 1, 1, 1]), 'labels': tensor([-100,    0,    0,  ...,    1,    1, -100])}
[2025-04-19 17:56:18] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 5, 7,  ..., 5, 5, 2]), 'attention_mask': tensor([1, 1, 1,  ..., 1, 1, 1]), 'labels': tensor([-100,    0,    0,  ...,    2,    2, -100])}
[2025-04-19 17:56:18] [OmniGenome 0.2.4alpha4]  Detected max_length=1024 in the dataset, using it as the max_length.
[2025-04-19 17:56:18] [OmniGenome 0.2.4alph

100%|██████████| 2233/2233 [00:17<00:00, 125.20it/s]


[2025-04-19 17:56:36] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=1024, label_padding_length=1024
[2025-04-19 17:56:36] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 968.8920734437976, 'max_seq_len': 1024, 'min_seq_len': 307, 'avg_label_len': 1024.0, 'max_label_len': 1024, 'min_label_len': 1024}
[2025-04-19 17:56:36] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 17:56:36] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 6, 4,  ..., 6, 4, 2]), 'attention_mask': tensor([1, 1, 1,  ..., 1, 1, 1]), 'labels': tensor([-100,    0,    0,  ...,    1,    1, -100])}
[2025-04-19 17:56:36] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 4, 4,  ..., 1, 1, 1]), 'attention_mask': tensor([1, 1, 1,  ..., 0, 0, 0]), 'labels': tensor([-100,    0,    0,  ..., -100, -100, -100])}
[2025-04-19 17:56:36] [OmniGenome 0.2.4alpha4]  Using Trainer: <class 'omnigenome.src.trainer.accelerate_trainer.AccelerateTrainer'>


  self.scaler = GradScaler()
Evaluating: 100%|██████████| 559/559 [00:34<00:00, 16.02it/s]


[2025-04-19 17:57:13] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.3125039822115352, 'matthews_corrcoef': 0.0028170192809846384}


Epoch 1/1 Loss: 0.5833: 100%|██████████| 4465/4465 [13:19<00:00,  5.58it/s]
Evaluating: 100%|██████████| 559/559 [00:34<00:00, 16.20it/s]


[2025-04-19 18:11:10] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.9573647225361115, 'matthews_corrcoef': 0.9478401072462141}


  self.unwrap_model().load_state_dict(torch.load(self._model_state_dict_path))
Testing: 100%|██████████| 558/558 [00:34<00:00, 16.14it/s]


[2025-04-19 18:11:47] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.9601409160538914, 'matthews_corrcoef': 0.9498126050346627}

------------------------------------------------------------------- Raw Metric Records -------------------------------------------------------------------
╒═════════════════════════╤══════════════════════════════════════════════════════════╤══════════╤═══════════╤══════════╤═══════╤═══════╤════════╤════════╕
│ Metric                  │ Trial                                                    │ Values   │  Average  │  Median  │  Std  │  IQR  │  Min   │  Max   │
╞═════════════════════════╪══════════════════════════════════════════════════════════╪══════════╪═══════════╪══════════╪═══════╪═══════╪════════╪════════╡
│ f1_score                │ RGB-RNA-SNMR-OmniGenome-52M                              │ [0.4705] │  0.4705   │  0.4705  │   0   │   0   

Some weights of OmniGenomeModel were not initialized from the model checkpoint at anonymous8/OmniGenome-52M and are newly initialized: ['OmniGenome.pooler.dense.bias', 'OmniGenome.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[2025-04-19 18:11:52] [OmniGenome 0.2.4alpha4]  Model Name: OmniGenomeModelForTokenClassification
Model Metadata: {'library_name': 'OmniGenome', 'omnigenome_version': '0.2.4alpha4', 'torch_version': '2.5.1+cu12.4+gita8d6afb511a69687bbb2b7e88a3cf67917e1697e', 'transformers_version': '4.49.0', 'model_cls': 'OmniGenomeModelForTokenClassification', 'tokenizer_cls': 'EsmTokenizer', 'model_name': 'OmniGenomeModelForTokenClassification'}
Base Model Name: anonymous8/OmniGenome-52M
Model Type: omnigenome
Model Architecture: None
Model Parameters: 52.453345 M
Model Config: OmniGenomeConfig {
  "OmniGenomefold_config": null,
  "_name_or_path": "anonymous8/OmniGenome-52M",
  "attention_probs_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "anonymous8/OmniGenome-52M--configuration_omnigenome.OmniGenomeConfig",
    "AutoModel": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeModel",
    "AutoModelForMaskedLM": "anonymous8/OmniGenome-52M--modeling_omnigenome.OmniGenomeForMaskedLM",
  

100%|██████████| 21988/21988 [03:22<00:00, 108.65it/s]


[2025-04-19 18:15:15] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=1024, label_padding_length=1024
[2025-04-19 18:15:16] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 998.5409314171367, 'max_seq_len': 1024, 'min_seq_len': 183, 'avg_label_len': 1024.0, 'max_label_len': 1024, 'min_label_len': 1024}
[2025-04-19 18:15:16] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 18:15:16] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 4, 7,  ..., 7, 7, 2]), 'attention_mask': tensor([1, 1, 1,  ..., 1, 1, 1]), 'labels': tensor([-100,    0,    0,  ...,    1,    1, -100])}
[2025-04-19 18:15:16] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 4, 7,  ..., 4, 7, 2]), 'attention_mask': tensor([1, 1, 1,  ..., 1, 1, 1]), 'labels': tensor([-100,    0,    0,  ...,    1,    1, -100])}
[2025-04-19 18:15:16] [OmniGenome 0.2.4alpha4]  Detected max_length=1024 in the dataset, using it as the max_length.
[2025-04-19 18:15:16] [OmniGenome 0.2.4alph

100%|██████████| 2749/2749 [00:25<00:00, 109.44it/s]


[2025-04-19 18:15:42] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=1024, label_padding_length=1024
[2025-04-19 18:15:42] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 998.8988723172063, 'max_seq_len': 1024, 'min_seq_len': 415, 'avg_label_len': 1024.0, 'max_label_len': 1024, 'min_label_len': 1024}
[2025-04-19 18:15:42] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 18:15:42] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 6, 5,  ..., 4, 4, 2]), 'attention_mask': tensor([1, 1, 1,  ..., 1, 1, 1]), 'labels': tensor([-100,    0,    0,  ...,    1,    1, -100])}
[2025-04-19 18:15:42] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 4, 7,  ..., 6, 4, 2]), 'attention_mask': tensor([1, 1, 1,  ..., 1, 1, 1]), 'labels': tensor([-100,    0,    0,  ...,    2,    2, -100])}
[2025-04-19 18:15:42] [OmniGenome 0.2.4alpha4]  Detected max_length=1024 in the dataset, using it as the max_length.
[2025-04-19 18:15:42] [OmniGenome 0.2.4alph

100%|██████████| 2749/2749 [00:25<00:00, 107.16it/s]


[2025-04-19 18:16:08] [OmniGenome 0.2.4alpha4]  Max sequence length updated -> Reset max_length=1024, label_padding_length=1024
[2025-04-19 18:16:08] [OmniGenome 0.2.4alpha4]  {'avg_seq_len': 998.3728628592215, 'max_seq_len': 1024, 'min_seq_len': 335, 'avg_label_len': 1024.0, 'max_label_len': 1024, 'min_label_len': 1024}
[2025-04-19 18:16:08] [OmniGenome 0.2.4alpha4]  Preview of the first two samples in the dataset:
[2025-04-19 18:16:08] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 7, 7,  ..., 6, 7, 2]), 'attention_mask': tensor([1, 1, 1,  ..., 1, 1, 1]), 'labels': tensor([-100,    0,    0,  ...,    1,    1, -100])}
[2025-04-19 18:16:08] [OmniGenome 0.2.4alpha4]  {'input_ids': tensor([0, 4, 6,  ..., 4, 6, 2]), 'attention_mask': tensor([1, 1, 1,  ..., 1, 1, 1]), 'labels': tensor([-100,    0,    0,  ...,    1,    1, -100])}
[2025-04-19 18:16:08] [OmniGenome 0.2.4alpha4]  Using Trainer: <class 'omnigenome.src.trainer.accelerate_trainer.AccelerateTrainer'>


  self.scaler = GradScaler()
Evaluating: 100%|██████████| 688/688 [00:42<00:00, 16.08it/s]


[2025-04-19 18:16:54] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.30871424625722743, 'matthews_corrcoef': 0.02311674133518035}
[2025-04-19 18:16:54] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.30871424625722743, 'matthews_corrcoef': 0.02311674133518035}


Epoch 1/1 Loss: 0.6063: 100%|██████████| 5497/5497 [16:22<00:00,  5.60it/s]
Evaluating: 100%|██████████| 688/688 [00:42<00:00, 16.16it/s]


[2025-04-19 18:34:01] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.9361362634555904, 'matthews_corrcoef': 0.9142724052009094}
[2025-04-19 18:34:01] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.9361362634555904, 'matthews_corrcoef': 0.9142724052009094}


  self.unwrap_model().load_state_dict(torch.load(self._model_state_dict_path))
Testing: 100%|██████████| 688/688 [00:42<00:00, 16.15it/s]


[2025-04-19 18:34:46] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.9343617175084818, 'matthews_corrcoef': 0.918299054748215}
[2025-04-19 18:34:46] [OmniGenome 0.2.4alpha4]  {'f1_score': 0.9343617175084818, 'matthews_corrcoef': 0.918299054748215}

------------------------------------------------------------------- Raw Metric Records -------------------------------------------------------------------
╒═════════════════════════╤══════════════════════════════════════════════════════════╤══════════╤═══════════╤══════════╤═══════╤═══════╤════════╤════════╕
│ Metric                  │ Trial                                                    │ Values   │  Average  │  Median  │  Std  │  IQR  │  Min   │  Max   │
╞═════════════════════════╪══════════════════════════════════════════════════════════╪══════════╪═══════════╪══════════╪═══════╪═══════╪════════╪════════╡
│ f1_score                │ RGB-RNA-SNMR-OmniGenome-52M                              │ [0.4705] │  0.4705   │  0.4705  │   0   │   0   │ 

## 5. Benchmark Checkpointing
If the benchmark is interrupted, the results collected so far are saved as a checkpoint and remain available the next time you run it.
To discard the checkpoint and start fresh, pass `overwrite=True`:
```python
AutoBench(bench_root=root, model_name_or_path=model_name_or_path, device=device, overwrite=True).run()
```
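
The general idea behind this kind of checkpointing can be sketched in a few lines. This is only an illustration under stated assumptions, not OmniGenome's actual implementation: the helper `run_with_checkpoint`, the JSON checkpoint file, and the task names are all hypothetical. It saves results after every task, skips tasks that already have results, and deletes the file when `overwrite=True`.

```python
import json
import tempfile
from pathlib import Path


def run_with_checkpoint(tasks, evaluate, checkpoint_path, overwrite=False):
    """Evaluate each task once, saving results after every task (hypothetical helper)."""
    path = Path(checkpoint_path)
    if overwrite and path.exists():
        path.unlink()  # discard earlier results and start fresh
    results = json.loads(path.read_text()) if path.exists() else {}
    for task in tasks:
        if task in results:
            continue  # already finished in an earlier run
        results[task] = evaluate(task)
        path.write_text(json.dumps(results))  # persist progress immediately
    return results


# Stand-in evaluation function that records which tasks were actually run.
calls = []


def fake_evaluate(task):
    calls.append(task)
    return {"f1_score": 0.5}  # placeholder value, not a real benchmark result


ckpt = Path(tempfile.mkdtemp()) / "bench_checkpoint.json"
run_with_checkpoint(["task_a", "task_b"], fake_evaluate, ckpt)
# The second call finds task_a and task_b in the checkpoint and only runs task_c.
run_with_checkpoint(["task_a", "task_b", "task_c"], fake_evaluate, ckpt)
print(calls)
```

Writing the file after each task, rather than once at the end, is what lets an interrupted run keep the work it has already finished.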