
🚀 Welcome to the repository for ANGOFA, a project developing pre-trained language models tailored to Angolan languages through a combination of OFA embedding initialization and synthetic data.

Overview:

In recent years, the development of pre-trained language models (PLMs) has showcased their capacity to transcend linguistic barriers and facilitate knowledge transfer across diverse languages. However, very low-resource languages like those spoken in Angola have been largely overlooked, creating a void in the multilingual landscape. ANGOFA addresses this gap by introducing four PLMs fine-tuned specifically for Angolan languages.

Key Features:

  • 🌍 Multilingual Adaptation: Tailored PLMs fine-tuned for Angolan languages using the Multilingual Adaptive Fine-tuning (MAFT) approach (see the pretraining sketch after this list).
  • 📚 Enhanced Performance: Investigates the role of informed embedding initialization and synthetic data in improving PLM performance on downstream tasks.
  • ✨ Variants: Includes ANGXLMR and ANGOFA variants, each with distinct fine-tuning processes and configurations.
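
As a rough illustration of the MAFT step mentioned above, the sketch below continues masked-language-model pretraining of a multilingual checkpoint on monolingual text. It is a minimal sketch under assumed settings: the corpus path, base checkpoint, and hyperparameters are placeholders, not the configuration used for the released models.

# Minimal MAFT-style continued pretraining sketch (placeholder paths and hyperparameters).
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

base_model = "xlm-roberta-base"  # multilingual starting checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForMaskedLM.from_pretrained(base_model)

# Monolingual corpus, one sentence per line (path is a placeholder).
dataset = load_dataset("text", data_files={"train": "angolan_corpus.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Standard masked-language-modeling objective with 15% token masking.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
args = TrainingArguments(
    output_dir="angofa-maft",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=5e-5,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()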

Hugging Face Models 🤗

The ANGOFA and ANGXLMR models are available on the Hugging Face model hub for easy access and experimentation: AngXLMR, AngOFA, AngXLMR-SYN, and AngOFA-SYN.
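
The snippet below shows one way to load and query a released checkpoint with the Hugging Face transformers library; the repository ID is a placeholder, so substitute the actual hub path from the links above.

# Load a released checkpoint and run a masked-LM forward pass.
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "zuela-ai/AngOFA"  # placeholder hub ID; use the actual path linked above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

inputs = tokenizer("Olá, mundo!", return_tensors="pt")  # Portuguese example sentence
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, sequence_length, vocab_size)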

Acknowledgment

This work was supported in part by Oracle Cloud credits and related resources provided by Oracle.

Citation

If you find this work helpful, please consider citing our paper:

@misc{quinjica2024angofa,
      title={ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model},
      author={Osvaldo Luamba Quinjica and David Ifeoluwa Adelani},
      year={2024},
      eprint={2404.02534},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
