JAIS 30b Chat

Core42

JAIS 30b Chat from Core42 is an auto-regressive bilingual LLM for Arabic and English with state-of-the-art capabilities in Arabic.

Model Architecture

The model uses a transformer-based, decoder-only (GPT-3 style) architecture with the SwiGLU non-linearity. It uses ALiBi position embeddings, which let the model extrapolate to sequence lengths longer than those seen in training and so improve context-length handling. The tuned versions use supervised fine-tuning (SFT).
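To make the two architectural terms above concrete, here is a minimal sketch in plain Python of how ALiBi attention biases and SwiGLU gating are commonly computed. It follows the textbook formulations and is not taken from the JAIS code; the function names, the head count, and the power-of-two assumption on the number of heads are illustrative assumptions only.

```python
import math

def alibi_slopes(num_heads):
    """Per-head slopes as a geometric sequence (standard ALiBi recipe).

    Assumption: num_heads is a power of two; the head count JAIS uses
    is not stated in this card.
    """
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (i + 1) for i in range(num_heads)]

def alibi_bias(num_heads, seq_len):
    """Build bias[h][i][j], added to attention scores before softmax.

    For a query at position i and key at position j <= i the bias is
    -slope_h * (i - j), so distant keys are penalised linearly.
    Future positions (j > i) are masked with -inf (causal decoder).
    """
    slopes = alibi_slopes(num_heads)
    return [
        [
            [-m * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in slopes
    ]

def swiglu(value, gate):
    """Elementwise SwiGLU gating: value * swish(gate), swish(z) = z * sigmoid(z).

    In a real feed-forward layer, value and gate are two separate linear
    projections of the same input; the projections are omitted here.
    """
    return [v * (g / (1.0 + math.exp(-g))) for v, g in zip(value, gate)]

# Hypothetical sizes, chosen only for illustration.
bias = alibi_bias(num_heads=8, seq_len=4)
print(alibi_slopes(8)[:3])  # slopes of the first three heads
print(bias[0][3])           # bias row for the last query position, head 0
```

Because the bias depends only on the distance between query and key rather than on a learned per-position vector, the same formula applies unchanged to sequences longer than those seen during training, which is the extrapolation property the paragraph above refers to.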

Training Datasets

Overview: The pretraining data for Jais-30b totals 1.63 trillion tokens of English, Arabic, and code. The Jais-30b-chat model is fine-tuned on both Arabic and English prompt-response pairs. We extended the fine-tuning datasets used for jais-13b-chat, which included a wide range of instructional data across various domains. The data covers common tasks including question answering, code generation, and reasoning over textual content. To enhance performance in Arabic, we developed an in-house Arabic dataset and translated some open-source English instructions into Arabic.

Data Freshness: The pretraining data has a cutoff of December 2022, with some tuning data being more recent, up to October 2023.

About

Context: 8k input · 4k output
Training date: Dec 2022
Rate limit tier
Provider support

Languages (2): English and Arabic