Official Implementation for IEEE ICASSP 2026 paper "An Effective Data Augmentation Method by Asking Questions about Scene Text Images"
- Python: 3.9.19
- PyTorch (CUDA 11.8):
  - torch==2.1.0+cu118
  - torchaudio==2.1.0+cu118
  - torchvision==0.16.0+cu118
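One way to confirm that an environment matches the pins above is sketched below. It assumes the packages were installed with pip, so their versions are visible through `importlib.metadata`. The helper name `check_pins` is hypothetical and not part of this repository.

```python
from importlib import metadata

# Version pins copied from the requirements listed above.
PINS = {
    "torch": "2.1.0+cu118",
    "torchaudio": "2.1.0+cu118",
    "torchvision": "0.16.0+cu118",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatched or missing package.

    Hypothetical helper for illustration; `found` is None when the package
    is not installed.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


# Report anything that differs from the pinned versions.
for name, (expected, found) in check_pins(PINS).items():
    print(f"{name}: expected {expected}, found {found}")
```

An empty result means all three packages report exactly the pinned versions.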
- Source: WordArt (ECCV 2022) — artistic text recognition (CornerTransformer, MMOCR).
- Download: Run the download script from the WordArt folder:
  - TrOCR: python TrOCR/WordArt/download_wordart.py
  - TrOCRAug: python TrOCRAug/WordArt/download_wordart.py
- The script fetches the dataset from Google Drive and extracts it into the corresponding WordArt directory (train/test images and label files).
- Source: RRC Challenge 10 – Historical Handwritten Marriage Records. More information is available on the RRC website.
- Setup: Use the Esposalles layout under HandTrOCR and HandTrOCRAug:
Training:
- Download all 3 parts of the training set from the RRC website.
- Create a folder named train.
- Put the contents of all 3 parts into this train folder (so that each record folder lies directly under train/).
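The merge step for the training set can be sketched as follows. This is a minimal illustration that assumes each extracted part contains its record folders directly at its top level. The function name `merge_parts` and the part folder names in the usage comment are hypothetical, so adjust them to whatever the downloaded archives actually extract to.

```python
import shutil
from pathlib import Path


def merge_parts(part_dirs, train_dir):
    """Move every record folder found directly inside each part into train_dir.

    Hypothetical helper: assumes record folders sit at the top level of each part.
    Returns the names of the moved record folders.
    """
    train_dir = Path(train_dir)
    train_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for part in map(Path, part_dirs):
        for record in sorted(p for p in part.iterdir() if p.is_dir()):
            target = train_dir / record.name
            if target.exists():
                # Refuse to overwrite a record folder that was already merged.
                raise FileExistsError(f"{target} already exists")
            shutil.move(str(record), str(target))
            moved.append(record.name)
    return moved


# Example call (folder names are placeholders):
# merge_parts(["part1", "part2", "part3"], "HandTrOCR/Esposalles/train")
```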
Test:
- Download the test set from the RRC website.
- Take the Records (record folders) and place them in a folder named test.
- Inside test, create a subfolder named gt.
- Put the ground-truth XML files (also from the website) into test/gt.
The resulting structure should look like:

    HandTrOCR/Esposalles/ (or HandTrOCRAug/Esposalles/)
    ├── train/
    │   ├── <record_folder_1>/
    │   ├── <record_folder_2>/
    │   └── ...
    └── test/
        ├── gt/
        │   ├── <gt_xml_1>.xml
        │   └── ...
        ├── <record_folder_1>/
        ├── <record_folder_2>/
        └── ...

Repeat the same Esposalles layout for both HandTrOCR and HandTrOCRAug if you use both.
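Before training, it may help to check that the folders match the layout described above. The sketch below only checks that the folders exist and that train/, test/, and test/gt/ are not empty. It does not inspect the record folders or the XML contents. The function name `check_esposalles_layout` is hypothetical and not part of this repository.

```python
from pathlib import Path


def check_esposalles_layout(root):
    """Return a list of problems; an empty list means the layout looks as described.

    Hypothetical helper that checks folder presence only, not file contents.
    """
    root = Path(root)
    train, test, gt = root / "train", root / "test", root / "test" / "gt"
    problems = []
    for folder in (train, test, gt):
        if not folder.is_dir():
            problems.append(f"missing folder: {folder}")
    # Record folders must lie directly under train/ and test/.
    if train.is_dir() and not any(p.is_dir() for p in train.iterdir()):
        problems.append(f"no record folders in {train}")
    if test.is_dir() and not any(p.is_dir() and p.name != "gt" for p in test.iterdir()):
        problems.append(f"no record folders in {test}")
    # Ground-truth XML files belong in test/gt/.
    if gt.is_dir() and not any(gt.glob("*.xml")):
        problems.append(f"no ground-truth XML files in {gt}")
    return problems


# Example call:
# for problem in check_esposalles_layout("HandTrOCR/Esposalles"):
#     print(problem)
```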
Train from the project root (or adjust paths as needed):
- WordArt (TrOCR / TrOCRAug):
  - python TrOCR/train_WD.py or python TrOCR/train_Straug.py
  - python TrOCRAug/train_add.py
- Esposalles (HandTrOCR / HandTrOCRAug):
  - python HandTrOCR/train_WD.py or python HandTrOCR/train_Straug.py
  - python HandTrOCRAug/train_add.py
Each folder contains its own train script; run the one that matches the dataset and variant you want.
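The mapping from dataset and variant to train script can be written as a small lookup, sketched below. The script paths are the ones listed above. The short variant labels ("wd", "straug", "add") and the function name `train_command` are hypothetical conveniences, not names used by the repository.

```python
import sys

# Script paths as listed in the training commands; the keys are illustrative labels.
TRAIN_SCRIPTS = {
    ("wordart", "wd"): "TrOCR/train_WD.py",
    ("wordart", "straug"): "TrOCR/train_Straug.py",
    ("wordart", "add"): "TrOCRAug/train_add.py",
    ("esposalles", "wd"): "HandTrOCR/train_WD.py",
    ("esposalles", "straug"): "HandTrOCR/train_Straug.py",
    ("esposalles", "add"): "HandTrOCRAug/train_add.py",
}


def train_command(dataset, variant):
    """Return the command (as an argument list) for one dataset/variant pair."""
    key = (dataset.lower(), variant.lower())
    if key not in TRAIN_SCRIPTS:
        raise ValueError(f"unknown dataset/variant: {dataset}/{variant}")
    return [sys.executable, TRAIN_SCRIPTS[key]]


# Example, run from the project root:
# import subprocess
# subprocess.run(train_command("esposalles", "add"), check=True)
```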
- TrOCR / TrOCRAug: WordArt dataset, TrOCR-based training and augmentation.
- HandTrOCR / HandTrOCRAug: Esposalles dataset, handwritten TrOCR training and augmentation.