Phi-3.5-MoE instruct (128k)

Microsoft

Phi-3.5-MoE is a lightweight, state-of-the-art open model built upon the datasets used for Phi-3 (synthetic data and filtered publicly available documents), with a focus on very high-quality, reasoning-dense data. The model supports multilingual text and comes with a 128K context length (in tokens). The model underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures.
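
For local experimentation, a minimal sketch of loading the instruct model with Hugging Face transformers is shown below. The repository id microsoft/Phi-3.5-MoE-instruct, the trust_remote_code flag, and the generation settings are assumptions, not taken from this page.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-MoE-instruct"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # pick bf16/fp16 automatically where available
    device_map="auto",       # spread the experts across available devices
    trust_remote_code=True,  # the MoE architecture may require custom modeling code
)

messages = [{"role": "user", "content": "Can you explain the basics of machine learning?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```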

Resources

🏡 Phi-3 Portal

📰 Phi-3 Microsoft Blog

📖 Phi-3 Technical Report

👩‍🍳 Phi-3 Cookbook

Model Architecture

Phi-3.5-MoE has 16x3.8B parameters, of which 6.6B are active when using 2 experts. The model is a mixture-of-experts decoder-only Transformer using a tokenizer with a vocabulary size of 32,064.
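
As an illustration only (not the actual Phi-3.5-MoE implementation), the sketch below shows how a top-2 mixture-of-experts feed-forward layer routes each token to two of its 16 experts, which is why only a fraction of the total parameters is active per token. Dimensions and layer details are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Illustrative top-2 MoE feed-forward layer (placeholder, not the real Phi-3.5-MoE code)."""

    def __init__(self, d_model, d_ff, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for k in range(self.top_k):              # each token visits top_k experts
            for e in range(len(self.experts)):
                mask = idx[:, k] == e            # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out
```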

Training Data

This is a static model trained on an offline dataset of 4.9T tokens, with a cutoff date of October 2023 for publicly available data. Future versions of the tuned models may be released as the models improve.

About

A new mixture-of-experts model
Context: 131k input · 4k output
Training date: Aug 2024
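
As a sketch of calling the hosted model through the azure-ai-inference client, the example below assumes the GitHub Models endpoint https://models.inference.ai.azure.com and the deployment name Phi-3.5-MoE-instruct (both assumptions, not confirmed by this page); max_tokens is kept within the 4k output limit above.

```python
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Assumed endpoint and credential source; check your provider's documentation.
client = ChatCompletionsClient(
    endpoint="https://models.inference.ai.azure.com",
    credential=AzureKeyCredential(os.environ["GITHUB_TOKEN"]),
)

response = client.complete(
    model="Phi-3.5-MoE-instruct",  # assumed deployment name
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Can you explain the concept of time dilation in physics?"),
    ],
    max_tokens=1000,               # stays within the 4k output token limit
    temperature=0.7,
)
print(response.choices[0].message.content)
```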

Languages (23)

English, Arabic, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hungarian, Italian