A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations (LIR)

The official implementation of the EMNLP 2021 paper A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations.

Ziyi Yang, Yinfei Yang, Daniel Cer, Eric Darve

Language-agnostic representation, and the isolation of semantic information from language information, is an emerging research direction for multilingual representation models. We explore this problem from the novel angle of geometric algebra and semantic space. A simple but highly effective method, "Language Information Removal (LIR)", factors out language identity information from the semantic-related components of multilingual representations pre-trained on multi-monolingual data. LIR is a post-training, model-agnostic method that uses only simple linear operations, e.g. matrix factorization and orthogonal projection. LIR reveals that for weak-alignment multilingual systems, the principal components of semantic spaces primarily encode language identity information. We first evaluate LIR on a cross-lingual question-answer retrieval task (LAReQA), which requires strong alignment of the multilingual embedding space. Experiments show that LIR is highly effective on this task, yielding almost 100% relative improvement in MAP for weak-alignment models. We then evaluate LIR on the Amazon Reviews and XEVAL datasets and observe that removing language information improves cross-lingual transfer performance.
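The description above (matrix factorization followed by orthogonal projection) can be illustrated with a minimal NumPy sketch. It assumes that, for each language, the top r right singular vectors of that language's embedding matrix are taken as the "language identity" directions and projected out of every embedding in that language. The function names, the choice of r, the absence of mean-centering, and the final re-normalization are all assumptions made for illustration; lir.ipynb in this repository is the reference implementation.

```python
import numpy as np


def language_principal_components(embeddings: np.ndarray, r: int) -> np.ndarray:
    """Hypothetical helper: top-r right singular vectors, shape (r, d).

    embeddings: (n_sentences, d) matrix of embeddings from ONE language.
    Assumption: no mean-centering is applied before the SVD.
    """
    # full_matrices=False returns vt with shape (min(n, d), d); its rows are orthonormal.
    _, _, vt = np.linalg.svd(embeddings, full_matrices=False)
    return vt[:r]


def remove_language_information(embeddings: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Hypothetical helper: project each row e onto the complement of span(components)."""
    # Orthogonal projection: e' = e - C^T C e, where C has orthonormal rows.
    projected = embeddings - (embeddings @ components.T) @ components
    # Assumption: re-normalize to unit length for cosine / dot-product retrieval.
    norms = np.linalg.norm(projected, axis=1, keepdims=True)
    return projected / np.maximum(norms, 1e-12)


def apply_lir(embeddings_by_lang: dict, r: int) -> dict:
    """Apply the per-language projection to a {language: (n, d) array} mapping."""
    cleaned = {}
    for lang, emb in embeddings_by_lang.items():
        comps = language_principal_components(emb, r)
        cleaned[lang] = remove_language_information(emb, comps)
    return cleaned


# Usage with synthetic data: random vectors stand in for real sentence embeddings.
rng = np.random.default_rng(0)
data = {
    "en": rng.normal(size=(100, 16)),
    "de": rng.normal(size=(80, 16)) + 3.0,  # shifted to mimic a language-specific offset
}
out = apply_lir(data, r=2)
```

Because the projection is computed separately for each language, embeddings from different languages lose their own dominant directions, which is what the abstract credits with the LAReQA retrieval gains. The sketch only shows the linear algebra; it does not reproduce the paper's numbers.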

Dataset

LIR is evaluated on the LAReQA dataset. The folder xquad-r is copied directly from the official LAReQA repository. The mlqa-r dataset can be found here.

Cite LIR

If you find LIR useful for your research, please cite our paper:

@inproceedings{yang-etal-2021-simple,
    title = "A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations",
    author = "Yang, Ziyi  and
      Yang, Yinfei  and
      Cer, Daniel  and
      Darve, Eric",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.470",
    pages = "5825--5832",
}
