Knowing Where and What: Unified Word Block Pretraining for Document Understanding
Our code is based on BROS.
| name | # params |
|---|---|
| utel-base-uncased | 110M |
| utel-large-uncased | 340M |
We conducted the FUNSD EE experiment on the FUNSD data preprocessed in LayoutLM; the original code can be found in this link. To run it, follow the steps below:
- Move to `preprocess/funsd/`.
- Run `bash preprocess.sh`.
- Run `preprocess_2nd.py`. This script converts the data preprocessed in LayoutLM to fit this repo.

Data will be created in `datasets/funsd/`.
Run the command below:

```shell
CUDA_VISIBLE_DEVICES=0 python train.py --config=configs/finetune_funsd_ee_bies.yaml
```
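The `bies` in the config name refers to the BIES tagging scheme (Begin/Inside/End/Single) commonly used for entity extraction as sequence labeling. As a minimal sketch of the idea (the helper and the example entities below are illustrative, not taken from this repo's code):

```python
def to_bies(entity_words):
    """Expand the words of one entity span into BIES tags:
    a single-word entity gets S; a multi-word entity gets
    B for the first word, E for the last, and I in between."""
    n = len(entity_words)
    if n == 1:
        return ["S"]
    return ["B"] + ["I"] * (n - 2) + ["E"]

# Illustrative entity spans from a form-like document.
print(to_bies(["Date:"]))                # → ['S']
print(to_bies(["John", "A.", "Smith"]))  # → ['B', 'I', 'E']
```

At fine-tuning time, each word (or word block) is classified into one of these positional tags per entity type, and contiguous tag runs are decoded back into entity spans.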