🔮 The Codex

Mixture of Experts (MoE)

An AI architecture that uses specialized sub-networks to handle different types of tasks efficiently.

📖 Apprentice Explanation

Mixture of Experts is like having a team of specialists instead of one generalist. The AI routes each question to the expert best suited to answer it, making the system both smarter and more efficient.

🧙 Archmage Notes

MoE models use a gating network to route each token to a small number of specialized expert sub-networks. Because only a subset of experts activates per token, the model's total parameter count can grow far beyond the compute spent on any single token. Mixtral, Switch Transformer, and (per unconfirmed reports) GPT-4 use MoE. Key challenges include balancing load across experts and keeping training stable. A minimal routing sketch follows.
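
The sketch below is illustrative only, not any specific model's implementation: a NumPy layer with a linear gating network that scores 8 hypothetical experts, keeps the top 2 per token, renormalizes their scores with a softmax, and mixes the chosen experts' MLP outputs. All dimensions and names are assumptions; real systems add load-balancing losses and batched expert dispatch.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MoELayer:
    """Minimal top-k mixture-of-experts layer (illustrative, unoptimized)."""
    def __init__(self, d_model, d_hidden, n_experts=8, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # Gating network: one linear map scoring every expert for each token.
        self.w_gate = rng.normal(0, 0.02, (d_model, n_experts))
        # Each expert is a tiny two-layer ReLU MLP.
        self.experts = [
            (rng.normal(0, 0.02, (d_model, d_hidden)),
             rng.normal(0, 0.02, (d_hidden, d_model)))
            for _ in range(n_experts)
        ]

    def __call__(self, tokens):
        # tokens: (n_tokens, d_model)
        scores = tokens @ self.w_gate                        # (n_tokens, n_experts)
        top = np.argsort(scores, axis=-1)[:, -self.top_k:]   # top-k expert ids per token
        gates = softmax(np.take_along_axis(scores, top, axis=-1))  # renormalized weights
        out = np.zeros_like(tokens)
        for i, token in enumerate(tokens):
            # Only the selected experts run for this token; the rest stay idle.
            for expert_idx, gate in zip(top[i], gates[i]):
                w1, w2 = self.experts[expert_idx]
                h = np.maximum(token @ w1, 0.0)
                out[i] += gate * (h @ w2)
        return out

# Usage: route 4 tokens through 8 experts while activating only 2 per token.
layer = MoELayer(d_model=16, d_hidden=32, n_experts=8, top_k=2)
x = np.random.default_rng(1).normal(size=(4, 16))
print(layer(x).shape)  # (4, 16)
```

The design choice to illustrate: top-k routing is what decouples total parameters (all experts) from per-token compute (only k experts), which is the efficiency gain the Archmage Notes describe.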