JAIS 30b Chat

JAIS 30b Chat from Core42 is an auto-regressive bilingual LLM for Arabic and English with state-of-the-art capabilities in Arabic.
The model is based on a transformer decoder-only (GPT-3) architecture and uses the SwiGLU non-linearity. It uses ALiBi position embeddings, which enable the model to extrapolate to long sequence lengths and provide improved context-length handling. The tuned versions use supervised fine-tuning (SFT).
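As a rough illustration of the two components named above, the following is a minimal PyTorch sketch of a SwiGLU feed-forward block and an ALiBi attention bias. The layer names, sizes, and slope schedule here are illustrative assumptions, not the model's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Feed-forward block with the SwiGLU non-linearity: SiLU(x W) * (x V), then a down-projection."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """ALiBi: add a head-specific linear penalty on distance to the attention logits,
    instead of learned position embeddings, which helps length extrapolation."""
    # Standard slope schedule for a power-of-two head count: 2^(-8/n), 2^(-16/n), ...
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    positions = torch.arange(seq_len)
    distance = positions[None, :] - positions[:, None]  # entry (i, j) is j - i
    # In a causal decoder only j <= i matters; there the bias is -slope * (i - j).
    bias = slopes[:, None, None] * distance.clamp(max=0)
    return bias  # shape (n_heads, seq_len, seq_len); added to the attention scores
```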
Overview: The pretraining data for Jais-30b totals 1.63 T tokens of English, Arabic, and code. The Jais-30b-chat model is fine-tuned with both Arabic and English prompt-response pairs. We extended the fine-tuning datasets used for jais-13b-chat, which included a wide range of instructional data across various domains, covering common tasks such as question answering, code generation, and reasoning over textual content. To enhance performance in Arabic, we developed an in-house Arabic dataset and translated some open-source English instructions into Arabic.
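For reference, below is a minimal inference sketch for the chat model using the Hugging Face transformers library. The repository ID shown is an assumption, and the exact chat prompt format should be taken from the official model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "core42/jais-30b-chat-v1"  # assumed Hub ID; replace with the actual repository

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 30B parameters; reduced precision keeps memory manageable
    device_map="auto",
    trust_remote_code=True,      # the model ships custom architecture code
)

prompt = "ما هي عاصمة الإمارات؟"  # "What is the capital of the UAE?"; Arabic or English prompts both work
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```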
Data Freshness: The pretraining data has a cutoff of December 2022, with some tuning data being more recent, up to October 2023.