
long-range-document-transformer

Models

Pre-trained models are available here. For data-privacy reasons (the models were pre-trained with an MLM objective on private data), the classification heads have been removed, but the encoder weights remain.
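
As a minimal sketch of how the encoder-only checkpoints can be used (the file name below is hypothetical, adapt it to the archive you downloaded):

```python
import torch

# Hypothetical checkpoint file name; use the archive you downloaded.
state_dict = torch.load("layoutlm_splitpage_encoder.pt", map_location="cpu")

# Only encoder weights are present (classification heads were stripped),
# so load with strict=False and re-initialise the task head yourself:
# model.load_state_dict(state_dict, strict=False)
print(len(state_dict), "encoder tensors loaded")
```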

LayoutLM

LayoutLM[^1] was pre-trained in 3 flavours with a maximum sequence length of 518 tokens. The flavours differ in the 2D relative attention bias applied to the input. These versions are referred to as SplitPage in the paper.
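
For intuition only, here is a minimal sketch of the idea behind a 2D relative attention bias: a learned, per-head bias indexed by bucketed horizontal and vertical distances between token bounding boxes is added to the attention logits before the softmax. The linear bucketing below is a simplifying assumption, not the exact scheme used in the paper.

```python
import torch
import torch.nn as nn

def bucket(rel, num_buckets=32, max_dist=1000):
    # Simplified linear bucketing of a signed relative distance
    # (the actual models may use a different bucketing scheme).
    rel = rel.clamp(-max_dist, max_dist)
    return (rel + max_dist) * (num_buckets - 1) // (2 * max_dist)

num_heads, num_buckets, seq_len = 12, 32, 6
bias_x = nn.Embedding(num_buckets, num_heads)  # learned bias per horizontal bucket
bias_y = nn.Embedding(num_buckets, num_heads)  # learned bias per vertical bucket

# Hypothetical token box centres, normalised to a 0-1000 page grid.
x = torch.tensor([100, 220, 340, 120, 240, 360])
y = torch.tensor([ 50,  50,  50, 400, 400, 400])

rel_x = bucket(x[None, :] - x[:, None])                   # (seq, seq)
rel_y = bucket(y[None, :] - y[:, None])
bias = (bias_x(rel_x) + bias_y(rel_y)).permute(2, 0, 1)   # (heads, seq, seq)

scores = torch.randn(num_heads, seq_len, seq_len)          # raw attention logits
scores = scores + bias                                     # bias added before softmax
attn = torch.softmax(scores, dim=-1)
```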

Linformer

Linformer[^2] was pre-trained with a maximum sequence length of 2048 tokens.
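
As a rough, single-head, non-causal sketch of the Linformer mechanism (dimensions illustrative, not the repository's actual implementation): keys and values are projected from sequence length n down to a fixed k, so the score matrix is (n, k) instead of (n, n).

```python
import torch

# Linformer-style attention sketch: low-rank projection of keys and values.
n, d, k = 2048, 64, 256
q = torch.randn(n, d)
key = torch.randn(n, d)
value = torch.randn(n, d)

E = torch.randn(k, n) / n ** 0.5   # learned projections in the real model
F = torch.randn(k, n) / n ** 0.5

k_proj = E @ key                              # (k, d)
v_proj = F @ value                            # (k, d)

scores = q @ k_proj.t() / d ** 0.5            # (n, k) rather than (n, n)
out = torch.softmax(scores, dim=-1) @ v_proj  # (n, d)
```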

Cosformer

Cosformer[^3] was pre-trained in 3 flavours with a maximum sequence length of 2048 tokens. The 2D relative attention biases are similar to those used for LayoutLM, but not exactly identical.

Cosformer is not compatible with fp16 inference or training. More investigation is needed to evaluate its compatibility with bf16.
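
For intuition only, here is a simplified, single-head, non-causal sketch of the cosFormer attention mechanism: softmax attention is replaced by a ReLU feature map with a cosine re-weighting, which decomposes into two linear-attention terms. This is an assumption about the general technique, not the repository's exact code.

```python
import torch

# cosFormer-style linear attention sketch (single head, non-causal).
n, d = 2048, 64
M = n  # re-weighting horizon (>= sequence length)
q = torch.relu(torch.randn(n, d))
k = torch.relu(torch.randn(n, d))
v = torch.randn(n, d)

idx = torch.arange(1, n + 1).float()
cos_w = torch.cos(torch.pi * idx / (2 * M))[:, None]
sin_w = torch.sin(torch.pi * idx / (2 * M))[:, None]

q_cos, q_sin = q * cos_w, q * sin_w
k_cos, k_sin = k * cos_w, k * sin_w

# cos(i - j) = cos(i)cos(j) + sin(i)sin(j), so attention splits into two
# linear terms and no (n, n) score matrix is ever materialised.
num = q_cos @ (k_cos.t() @ v) + q_sin @ (k_sin.t() @ v)                 # (n, d)
den = q_cos @ k_cos.sum(0, keepdim=True).t() \
    + q_sin @ k_sin.sum(0, keepdim=True).t()                           # (n, 1)
out = num / den.clamp_min(1e-6)
```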

Example

Example coming soon :-)

Cite this work

@incollection{Douzon_2023,
    doi = {10.1007/978-3-031-41501-2_4},
    url = {https://doi.org/10.1007%2F978-3-031-41501-2_4},
    year = 2023,
    publisher = {Springer Nature Switzerland},
    pages = {47--64},
    author = {Thibault Douzon and Stefan Duffner and Christophe Garcia and J{\'{e}}r{\'{e}}my Espinas},
    title = {Long-Range Transformer Architectures for~Document Understanding},
    booktitle = {Document Analysis and Recognition {\textendash} {ICDAR} 2023 Workshops}
}

Footnotes

[^1]: https://arxiv.org/abs/1912.13318

[^2]: https://arxiv.org/abs/2006.04768

[^3]: https://arxiv.org/abs/2202.08791
