This repository is an implementation of the paper "Handwritten Grapheme Classification in Bengali Language Using MobileNet".
The Bengali language comprises numerous graphemes, which are the smallest functional units in a writing system. Detecting these graphemes is crucial for developing an OCR application.
OCR applications mostly run on embedded devices, so we used MobileNet, a class of efficient models for mobile and embedded vision applications. Specifically, we used MobileNetV2. Since each grapheme contains three components, this is a multilabel classification problem. As a result, we modified the softmax layer to support multilabel classification.
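As an illustration of the three-component output described above, here is a minimal sketch in plain Python. It assumes one softmax head per component on top of shared backbone features, with the losses summed; the actual change made to the softmax layer in this codebase is not shown in this README and may differ. The component names, the class counts (168, 11, 7), and the feature size are assumptions for illustration only.

```python
import math
import random

# Assumed class counts per component (not stated in this README).
NUM_CLASSES = {"grapheme_root": 168, "vowel_diacritic": 11, "consonant_diacritic": 7}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(features, weights, bias):
    """Dense layer: one output logit per row of `weights`."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]


def make_head(num_features, num_classes, rng):
    """Randomly initialised (weights, bias) for one classification head."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(num_features)] for _ in range(num_classes)]
    return weights, [0.0] * num_classes


def multi_head_forward(features, heads):
    """Apply every head to the same backbone features; one softmax per component."""
    return {name: softmax(linear(features, w, b)) for name, (w, b) in heads.items()}


def multi_head_loss(probs, targets):
    """Sum of the per-component cross-entropy losses."""
    return sum(-math.log(probs[name][targets[name]]) for name in probs)


rng = random.Random(0)
num_features = 16  # hypothetical stand-in for the backbone's pooled feature size
heads = {name: make_head(num_features, n, rng) for name, n in NUM_CLASSES.items()}
features = [rng.uniform(-1.0, 1.0) for _ in range(num_features)]
probs = multi_head_forward(features, heads)
targets = {"grapheme_root": 3, "vowel_diacritic": 1, "consonant_diacritic": 0}
loss = multi_head_loss(probs, targets)
```

Each head produces its own probability distribution, so a prediction for one grapheme is the triple of per-head argmax values.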
We used this dataset, which is also available on Kaggle. After downloading it, change $PATH$ to the dataset directory. Then, from inside the data directory, run the following commands sequentially to pre-process the data:
python create_image_pickles.py
python create_folds.py
python create_chunk.py
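The scripts themselves are not reproduced in this README. As a rough idea of what a fold-creation step does, the sketch below assigns each image id to one of k folds after a seeded shuffle. The function name, the id format, and the use of a plain (non-stratified) split are assumptions; create_folds.py may use a different strategy.

```python
import random


def assign_folds(image_ids, num_folds=5, seed=42):
    """Shuffle the ids and deal them round-robin into `num_folds` folds."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    return {image_id: i % num_folds for i, image_id in enumerate(ids)}


# Hypothetical image ids; the real ones come from the dataset's metadata.
image_ids = [f"Train_{i}" for i in range(20)]
folds = assign_folds(image_ids, num_folds=5)
fold_sizes = [sum(1 for f in folds.values() if f == k) for k in range(5)]
```

The fold index stored for each image is what the training and validation fold arguments later select on.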
Training the model requires specifying TRAINING FOLDS and VALIDATION FOLDS. In addition, BATCH_SIZE, IMAGE_WIDTH, IMAGE_LENGTH, and EPOCHS can also be specified. Command for training:
python main.py --mode train --training_folds ($Num1$, $Num2$, $Num3$, $Num4$) --validation_folds ($Num4$,)
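For readers wondering how tuple-style fold arguments like these might be read, here is a small sketch using argparse and ast.literal_eval from the standard library. It is not taken from main.py; the helper names parse_folds and build_parser are hypothetical, and only the flag names mirror the command above.

```python
import argparse
import ast


def parse_folds(text):
    """Parse a string such as "(0, 1, 2, 3)" or "(4,)" into a tuple of ints."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        raise argparse.ArgumentTypeError(f"could not parse folds from {text!r}")
    if isinstance(value, int):
        value = (value,)  # allow a bare fold number
    if not isinstance(value, tuple) or not all(isinstance(v, int) for v in value):
        raise argparse.ArgumentTypeError(f"expected a tuple of ints, got {text!r}")
    return value


def build_parser():
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Hypothetical CLI sketch")
    parser.add_argument("--mode", choices=["train", "test"], required=True)
    parser.add_argument("--training_folds", type=parse_folds, default=())
    parser.add_argument("--validation_folds", type=parse_folds, default=())
    return parser


args = build_parser().parse_args(
    ["--mode", "train", "--training_folds", "(0, 1, 2, 3)", "--validation_folds", "(4,)"]
)
```

With a parser like this, the tuples would need to be quoted on the command line, for example --training_folds "(0, 1, 2, 3)", because most POSIX shells treat unquoted parentheses as syntax.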
Command for testing:
python main.py --mode test
If you find this codebase useful, please cite our paper:
@article{taif2024Grap,
title={Handwritten Grapheme Classification in Bengali Language Using MobileNet},
author={Taif Al Musabe},
journal={techRxiv preprint techrxiv.170422019.94163857},
year={2024}
}
We referred to a tutorial from Abhishek Thakur's YouTube channel.
Our code is BSD-3 licensed. See LICENSE.txt for details.