🚀 Welcome to the repository for ANGOFA, a project that builds pre-trained language models tailored to Angolan languages by combining OFA embedding initialization with synthetic data.
In recent years, pre-trained language models (PLMs) have shown a remarkable capacity to transcend linguistic barriers and facilitate knowledge transfer across diverse languages. However, very low-resource languages like those spoken in Angola have been largely overlooked, leaving a gap in the multilingual landscape. ANGOFA addresses this gap by introducing four PLMs fine-tuned specifically for Angolan languages.
- 🌍 Multilingual Adaptation: PLMs tailored to Angolan languages via the Multilingual Adaptive Fine-tuning (MAFT) approach (see the sketch after this list).
- 📚 Enhanced Performance: Investigates the role of informed embedding initialization and synthetic data in improving PLM performance on downstream tasks.
- ✨ Variants: Includes ANGXLMR and ANGOFA variants, each with distinct fine-tuning processes and configurations.
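MAFT adapts an existing multilingual PLM by continuing masked-language-model pretraining on text from the target languages. Below is a minimal sketch of what that looks like with Hugging Face `transformers`; the base checkpoint, corpus path, and hyperparameters are illustrative placeholders, not the exact setup used in the paper.

```python
# Sketch of MAFT-style adaptation: continued masked-language-model (MLM)
# pretraining on Angolan-language text. The checkpoint, corpus path, and
# hyperparameters are illustrative placeholders, not the paper's setup.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "xlm-roberta-base"  # assumed multilingual starting checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# Hypothetical monolingual corpus, one sentence per line.
raw = load_dataset("text", data_files={"train": "angolan_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard dynamic token masking for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="angofa-maft",
        per_device_train_batch_size=8,
        num_train_epochs=3,
        learning_rate=5e-5,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```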
The ANGOFA and ANGXLMR model families are available on the Hugging Face model hub for easy access and experimentation: AngXLMR, AngOFA, AngXLMR-SYN, and AngOFA-SYN.
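Any of the checkpoints can be loaded with `transformers` as sketched below; the `<org>` placeholder stands for the Hugging Face organization hosting the models, which you should take from the model cards linked above.

```python
# Illustrative loading snippet; replace <org> with the actual Hugging Face
# organization hosting the checkpoints (see the model cards above).
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "<org>/AngOFA"  # or AngXLMR, AngXLMR-SYN, AngOFA-SYN
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Quick fill-mask sanity check.
text = f"Luanda is the capital of {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
```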
This work was supported in part by Oracle Cloud credits and related resources provided by Oracle.
If you find this work helpful, please consider citing our paper:
@misc{quinjica2024angofa,
  title={ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model},
  author={Osvaldo Luamba Quinjica and David Ifeoluwa Adelani},
  year={2024},
  eprint={2404.02534},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}