## Training SAR on a Toy Dataset

We now demonstrate how to train a recognizer on a provided dataset in a Python interpreter. Another common practice is to train a model from the CLI (command-line interface), as illustrated [here](https://mmocr.readthedocs.io/en/dev-1.x/get_started/quick_run.html#training).

Since training on a full academic dataset is time-consuming (it usually takes several hours or even days), we will train the SAR text recognition model on a toy dataset and visualize its predictions. Text detection and other downstream tasks such as KIE follow similar procedures.

Training a model usually consists of the following steps:
1. Convert the dataset into [formats supported by MMOCR](https://mmocr.readthedocs.io/en/dev-1.x/basic_concepts/datasets.html). No conversion is needed if the dataset was obtained from Dataset Preparer. Otherwise, you will need to download and prepare the dataset manually following the [guide](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/data_prepare/recog.html), or write a custom conversion script if your dataset is not on the list.
2. Modify the config for training. 
3. Train the model. 

In this example, we use an off-the-shelf toy dataset to train SAR, so the first step is skipped. A full demonstration of the first step can be found in the next section, Evaluating SAR.
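Steps 2 and 3 can be sketched as a single function. This is a minimal sketch, not the tutorial's own training cell: the function name `train_toy_sar` is hypothetical, the config name is the toy SAR config referenced in this notebook, and the default work directory is an assumption. Because the toy config uses the relative data root `tests/data/rec_toy_dataset/`, the sketch also assumes it is called from the root of an MMOCR checkout.

```python
def train_toy_sar(work_dir="work_dirs/sar_resnet31_parallel-decoder_5e_toy/"):
    """Train SAR on the toy dataset and return the runner (illustrative sketch)."""
    # Imported lazily so the function can be defined without MMOCR installed.
    from mmengine.hub import get_config
    from mmengine.runner import Runner

    # Step 2: load the toy config shipped with MMOCR and adjust it.
    cfg = get_config("mmocr::textrecog/sar/sar_resnet31_parallel-decoder_5e_toy.py")
    cfg.work_dir = work_dir  # assumed location for logs and checkpoints

    # Step 3: build a runner from the config and start training.
    runner = Runner.from_cfg(cfg)
    runner.train()
    return runner
```

Calling `train_toy_sar()` starts training; checkpoints are expected to appear under the chosen work directory.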

### Visualize the Toy Dataset

We first get a sense of what the toy dataset looks like by visualizing one of the images and its label. The toy dataset consists of ten images and annotation files in both JSON and LMDB formats; this tutorial uses only the JSON annotations.
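Before opening any images, it can help to list the image and label pairs in a JSON annotation file. The sketch below assumes the layout documented for MMOCR 1.x text recognition annotations (a `metainfo` block and a `data_list` whose entries carry `img_path` and `instances`). The two samples, the constant `SAMPLE_LABELS`, and the helper `list_samples` are made up for illustration and are not taken from the toy dataset.

```python
import json

# Hypothetical two-sample annotation; file names and texts are invented.
SAMPLE_LABELS = """
{
  "metainfo": {"dataset_type": "TextRecogDataset", "task_name": "textrecog"},
  "data_list": [
    {"img_path": "sample_0.jpg", "instances": [{"text": "GRAND"}]},
    {"img_path": "sample_1.jpg", "instances": [{"text": "cafe"}]}
  ]
}
"""


def list_samples(raw_json):
    """Return (image path, transcription) pairs from an annotation string."""
    annotations = json.loads(raw_json)
    pairs = []
    for item in annotations["data_list"]:
        for instance in item["instances"]:
            pairs.append((item["img_path"], instance["text"]))
    return pairs


for img_path, text in list_samples(SAMPLE_LABELS):
    print(f"{img_path}: {text}")
```

For a real file, read its contents with `open(...).read()` and pass the string to `list_samples`; image paths are then resolved relative to the dataset's `data_prefix`.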

In [None]:
# from mmengine.hub import get_config

# cfg = get_config('mmocr::textrecog/sar/sar_resnet31_parallel-decoder_5e_toy.py')
# cfg

Config (path: /home/bonting/micromamba/envs/bonting-id/lib/python3.11/site-packages/mmocr/.mim/configs/textrecog/sar/sar_resnet31_parallel-decoder_5e_toy.py): {'toy_data_root': 'tests/data/rec_toy_dataset/', 'toy_rec_train': {'type': 'OCRDataset', 'data_root': 'tests/data/rec_toy_dataset/', 'data_prefix': {'img_path': 'imgs/'}, 'ann_file': 'labels.json', 'pipeline': None, 'test_mode': False}, 'toy_rec_test': {'type': 'OCRDataset', 'data_root': 'tests/data/rec_toy_dataset/', 'data_prefix': {'img_path': 'imgs/'}, 'ann_file': 'labels.json', 'pipeline': None, 'test_mode': True}, 'default_scope': 'mmocr', 'env_cfg': {'cudnn_benchmark': False, 'mp_cfg': {'mp_start_method': 'fork', 'opencv_num_threads': 0}, 'dist_cfg': {'backend': 'nccl'}}, 'randomness': {'seed': None}, 'default_hooks': {'timer': {'type': 'IterTimerHook'}, 'logger': {'type': 'LoggerHook', 'interval': 1}, 'param_scheduler': {'type': 'ParamSchedulerHook'}, 'checkpoint': {'type': 'CheckpointHook', 'interval': 1}, 'sampler_seed':

In [None]:
from mmengine import Config

%cd ~/bonting-identification
cfg = Config.fromfile('mmocr_configs/CEGD-R_evaluation_textrecog.py')

/home/bonting/bonting-identification



Config (path: mmocr_configs/cegd-r_evaluation_textrecog.py): {'mjsynth_textrecog_data_root': 'data/mjsynth', 'mjsynth_textrecog_train': {'type': 'OCRDataset', 'data_root': 'data/mjsynth', 'ann_file': 'textrecog_train.json', 'pipeline': None, '_scope_': 'mmocr'}, 'mjsynth_sub_textrecog_train': {'type': 'OCRDataset', 'data_root': 'data/mjsynth', 'ann_file': 'subset_textrecog_train.json', 'pipeline': None, '_scope_': 'mmocr'}, 'synthtext_textrecog_data_root': 'data/synthtext', 'synthtext_textrecog_train': {'type': 'OCRDataset', 'data_root': 'data/synthtext', 'ann_file': 'textrecog_train.json', 'pipeline': None, '_scope_': 'mmocr'}, 'synthtext_sub_textrecog_train': {'type': 'OCRDataset', 'data_root': 'data/synthtext', 'ann_file': 'subset_textrecog_train.json', 'pipeline': None, '_scope_': 'mmocr'}, 'synthtext_an_textrecog_train': {'type': 'OCRDataset', 'data_root': 'data/synthtext', 'ann_file': 'alphanumeric_textrecog_train.json', 'pipeline': None, '_scope_': 'mmocr'}, 'cute80_textrecog_da

## Evaluating SAR

This section shows how to evaluate a model with pretrained weights in a Python interpreter. Another common practice is to test a model from the CLI (command-line interface), as illustrated [here](https://mmocr.readthedocs.io/en/dev-1.x/get_started/quick_run.html#testing).

Typically, the evaluation process involves several steps:

1. Convert the dataset into [formats supported by MMOCR](https://mmocr.readthedocs.io/en/dev-1.x/basic_concepts/datasets.html). No conversion is needed if the dataset was obtained from [Dataset Preparer](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/data_prepare/dataset_preparer.html), which can download, extract, and convert the dataset into an MMOCR-ready form with a single command. Otherwise, you will need to download and prepare the dataset manually following the [guide](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/data_prepare/det.html), or write a custom conversion script if your dataset is not on the list.
2. Modify the config for testing. 
3. Test the model. 
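The second step usually amounts to pointing the test dataloader at a dataset config and choosing a checkpoint. The sketch below illustrates that idea with plain dictionaries standing in for an `mmengine` `Config`. The helper `point_test_at`, the trimmed `base_cfg`, and the checkpoint path are hypothetical; the `toy_rec_test` fields mirror the toy config printed earlier.

```python
import copy


def point_test_at(cfg, dataset_cfg, checkpoint):
    """Return a copy of cfg whose test dataloader reads dataset_cfg."""
    new_cfg = copy.deepcopy(cfg)
    dataset = copy.deepcopy(dataset_cfg)
    # Dataset configs carry pipeline=None, so reuse the model's test pipeline.
    dataset["pipeline"] = new_cfg["test_pipeline"]
    new_cfg["test_dataloader"]["dataset"] = dataset
    new_cfg["load_from"] = checkpoint
    return new_cfg


# Hypothetical, heavily trimmed stand-in for a model config.
base_cfg = {
    "test_pipeline": [{"type": "LoadImageFromFile"}],
    "test_dataloader": {"batch_size": 1, "dataset": None},
    "load_from": None,
}
# Dataset entry with the same fields as the toy config shown earlier.
toy_rec_test = {
    "type": "OCRDataset",
    "data_root": "tests/data/rec_toy_dataset/",
    "data_prefix": {"img_path": "imgs/"},
    "ann_file": "labels.json",
    "pipeline": None,
    "test_mode": True,
}

test_cfg = point_test_at(base_cfg, toy_rec_test, "path/to/checkpoint.pth")
print(test_cfg["test_dataloader"]["dataset"]["ann_file"])
```

Working on deep copies keeps the original config untouched, which makes it easier to evaluate the same model on several datasets in one session.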

Now we will demonstrate how to test a model on different datasets.


### Toy Dataset

With the checkpoint we obtained in the last section, we can evaluate it on the toy dataset again. More explanation of the evaluation metrics is available [here](https://mmocr.readthedocs.io/en/dev-1.x/basic_concepts/evaluation.html).
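To build intuition for the metric names that appear in the test log (`recog/word_acc`, `recog/word_acc_ignore_case`, `recog/word_acc_ignore_case_symbol`, `recog/char_recall`, `recog/char_precision`), here is a simplified pure-Python sketch. It is not MMOCR's implementation: the symbol pattern and the use of `difflib` to count matching characters are assumptions, so exact values may differ from what MMOCR reports.

```python
import re
from difflib import SequenceMatcher

# Anything other than ASCII letters and digits counts as a symbol here.
# This is a simplification; MMOCR's own pattern may differ.
SYMBOLS = re.compile(r"[^A-Za-z0-9]")


def normalize(text):
    """Strip symbols and lowercase the text."""
    return SYMBOLS.sub("", text).lower()


def word_accuracies(preds, gts):
    """Fraction of predictions matching the ground truth under three rules."""
    exact = ignore_case = ignore_case_symbol = 0
    for pred, gt in zip(preds, gts):
        exact += pred == gt
        ignore_case += pred.lower() == gt.lower()
        ignore_case_symbol += normalize(pred) == normalize(gt)
    total = max(len(gts), 1)
    return {
        "word_acc": exact / total,
        "word_acc_ignore_case": ignore_case / total,
        "word_acc_ignore_case_symbol": ignore_case_symbol / total,
    }


def char_recall_precision(preds, gts):
    """Character-level recall and precision over normalized strings."""
    matched = pred_chars = gt_chars = 0
    for pred, gt in zip(preds, gts):
        pred, gt = normalize(pred), normalize(gt)
        blocks = SequenceMatcher(None, pred, gt).get_matching_blocks()
        matched += sum(block.size for block in blocks)
        pred_chars += len(pred)
        gt_chars += len(gt)
    return {
        "char_recall": matched / max(gt_chars, 1),
        "char_precision": matched / max(pred_chars, 1),
    }


print(word_accuracies(["Hello", "wor-ld", "cat"], ["hello", "world", "cat"]))
print(char_recall_precision(["helo"], ["hello"]))
```

Each successive word-level rule is more lenient than the last, so the three word accuracies never decrease from left to right. Character recall divides the matched characters by the ground-truth length, while precision divides by the prediction length.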

In [None]:
from mmengine.runner import Runner
import time
from mmengine import Config

%cd ~/bonting-identification
cfg = Config.fromfile('mmocr_configs/CEGD-R_evaluation_textrecog.py')

# The location of pretrained weight
cfg['load_from'] = 'https://download.openmmlab.com/mmocr/textrecog/abinet/abinet-vision_20e_st-an_mj/abinet-vision_20e_st-an_mj_20220915_152445-85cfb03d.pth'

# Optionally, give the visualizer a unique name to avoid duplicate instances
# being created across multiple runs
cfg.visualizer.name = f'{time.localtime()}'

runner = Runner.from_cfg(cfg)
runner.test()

/home/bonting/bonting-identification
07/10 13:41:58 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.11.13 | packaged by conda-forge | (main, Jun  4 2025, 14:48:23) [GCC 13.3.0]
    CUDA available: True
    MUSA available: False
    numpy_random_seed: 1089348508
    GPU 0: NVIDIA GeForce RTX 3090
    CUDA_HOME: /opt/cuda
    NVCC: Cuda compilation tools, release 12.9, V12.9.86
    GCC: gcc (GCC) 15.1.1 20250425
    PyTorch: 2.4.1
    PyTorch compiling details: PyTorch built with:
  - GCC 11.4
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2024.2.2-Product Build 20240823 for Intel(R) 64 architecture applications
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,c


07/10 13:41:58 - mmengine - INFO - Config:
auto_scale_lr = dict(base_batch_size=1536)
cute80_textrecog_data_root = 'data/cute80'
cute80_textrecog_test = dict(
    _scope_='mmocr',
    ann_file='textrecog_test.json',
    data_root='data/cute80',
    pipeline=None,
    test_mode=True,
    type='OCRDataset')
data_root = 'data/CEGD-R_MMOCR/'
dataset_type = 'OCRDataset'
default_hooks = dict(
    checkpoint=dict(_scope_='mmocr', interval=1, type='CheckpointHook'),
    logger=dict(_scope_='mmocr', interval=100, type='LoggerHook'),
    param_scheduler=dict(_scope_='mmocr', type='ParamSchedulerHook'),
    sampler_seed=dict(_scope_='mmocr', type='DistSamplerSeedHook'),
    sync_buffer=dict(_scope_='mmocr', type='SyncBuffersHook'),
    timer=dict(_scope_='mmocr', type='IterTimerHook'),
    visualization=dict(
        _scope_='mmocr',
        draw_gt=False,
        draw_pred=False,
        enable=False,
        interval=1,
        show=False,
        type='VisualizationHook'))
default



Loads checkpoint by local backend from path: ckpt/pretrained_mmocr/abinet-vision_20e_st-an_mj_20220915_152445-85cfb03d.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: data_preprocessor.mean, data_preprocessor.std

07/10 13:41:59 - mmengine - INFO - Load checkpoint from ckpt/pretrained_mmocr/abinet-vision_20e_st-an_mj_20220915_152445-85cfb03d.pth



07/10 13:41:59 - mmengine - INFO - Epoch(test) [100/641]    eta: 0:00:02  time: 0.0034  data_time: 0.0003  memory: 205
07/10 13:42:00 - mmengine - INFO - Epoch(test) [200/641]    eta: 0:00:01  time: 0.0035  data_time: 0.0003  memory: 205
07/10 13:42:00 - mmengine - INFO - Epoch(test) [300/641]    eta: 0:00:01  time: 0.0035  data_time: 0.0003  memory: 205
07/10 13:42:01 - mmengine - INFO - Epoch(test) [400/641]    eta: 0:00:00  time: 0.0034  data_time: 0.0003  memory: 205
07/10 13:42:01 - mmengine - INFO - Epoch(test) [500/641]    eta: 0:00:00  time: 0.0034  data_time: 0.0003  memory: 205
07/10 13:42:01 - mmengine - INFO - Epoch(test) [600/641]    eta: 0:00:00  time: 0.0034  data_time: 0.0003  memory: 205
07/10 13:42:01 - mmengine - INFO - Epoch(test) [641/641]    recog/word_acc: 0.0047  recog/word_acc_ignore_case: 0.0047  recog/word_acc_ignore_case_symbol: 0.0047  recog/char_recall: 0

{'recog/word_acc': 0.0047,
 'recog/word_acc_ignore_case': 0.0047,
 'recog/word_acc_ignore_case_symbol': 0.0047,
 'recog/char_recall': 0.0854,
 'recog/char_precision': 0.102}

It is also possible to evaluate stronger, more general pretrained weights, which were trained on larger datasets and achieve competitive academic performance, though they may not outperform the previous checkpoint, which is overfitted to the toy dataset. ([readme](https://mmocr.readthedocs.io/en/dev-1.x/textrecog_models.html#sar))


### SVTP Dataset

The SVTP dataset is one of the six commonly used academic test sets that systematically reflect a text recognizer's performance. We will now evaluate SAR on this dataset, first using [Dataset Preparer](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/data_prepare/dataset_preparer.html) to prepare it.

In [None]:
!python tools/dataset_converters/prepare_dataset.py svtp --task textrecog

SVTP is now available in `data/svtp`, and the dataset config is available at `configs/textrecog/_base_/datasets/svtp.py`. We first point the `test_dataloader` to SVTP, then run testing with the overfitted checkpoint. As this checkpoint is overfitted to such a small dataset, it is not surprising that it performs well on the toy dataset and poorly on SVTP.

In [None]:
from mmengine import Config

svtp_cfg = Config.fromfile('configs/textrecog/_base_/datasets/svtp.py')
svtp_cfg.svtp_textrecog_test.pipeline = cfg.test_pipeline
cfg.test_dataloader.dataset = svtp_cfg.svtp_textrecog_test

# The location of pretrained weight
cfg['load_from'] = 'work_dirs/sar_resnet31_parallel-decoder_5e_toy/epoch_100.pth'

# Optionally, give the visualizer a unique name to avoid duplicate instances
# being created across multiple runs
cfg.visualizer.name = f'{time.localtime()}'

runner = Runner.from_cfg(cfg)
runner.test()

Let's evaluate the pretrained one for comparison.

In [None]:
# The location of pretrained weight
cfg['load_from'] = 'https://download.openmmlab.com/mmocr/textrecog/sar/sar_resnet31_parallel-decoder_5e_st-sub_mj-sub_sa_real/sar_resnet31_parallel-decoder_5e_st-sub_mj-sub_sa_real_20220915_171910-04eb4e75.pth'
cfg.visualizer.name = f'{time.localtime()}'
runner = Runner.from_cfg(cfg)
runner.test()