🔮The Codex
Mixture of Experts (MoE)
An AI architecture that uses specialized sub-networks to handle different types of tasks efficiently.
📖 Apprentice Explanation
Mixture of Experts is like having a team of specialists instead of one generalist. The AI routes each question to the experts best suited to answer it, so the model can hold a lot of knowledge without running all of it for every question.
🧙 Archmage Notes
MoE models use a learned gating network to route each token to a small number of specialized expert sub-networks. Only the selected experts activate per token (e.g., Mixtral 8x7B routes each token to 2 of 8 experts), which decouples total parameter count from per-token compute. Mixtral, Switch Transformer, and (reportedly) GPT-4 use MoE. Key challenges are load balancing across experts, often handled with an auxiliary loss, and training stability.
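The routing idea above can be sketched in a few lines. This is a minimal illustrative NumPy sketch, not any real model's implementation: the dimensions, the two-layer ReLU experts, and the top-k softmax gating are all toy assumptions chosen to show the mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only).
d_model, d_hidden, n_experts, top_k = 8, 16, 4, 2

# Each expert is a small two-layer MLP: W1 projects up, W2 projects back down.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.1,
     rng.standard_normal((d_hidden, d_model)) * 0.1)
    for _ in range(n_experts)
]
# The gating network is a single linear layer: one routing score per expert.
W_gate = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_layer(x):
    """Route one token vector x to its top-k experts and mix their outputs."""
    logits = x @ W_gate                   # (n_experts,) routing scores
    top = np.argsort(logits)[-top_k:]     # indices of the k highest-scoring experts
    # Softmax over only the selected experts' scores -> mixing weights.
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()
    # Only the selected experts run -- this is where the compute saving comes from.
    out = np.zeros_like(x)
    for weight, idx in zip(w, top):
        W1, W2 = experts[idx]
        out += weight * (np.maximum(x @ W1, 0) @ W2)  # ReLU MLP expert
    return out

token = rng.standard_normal(d_model)
y = moe_layer(token)
print(y.shape)  # same shape as the input token vector
```

In a real transformer this layer replaces the dense feed-forward block, routing runs per token in a batch, and an auxiliary loss nudges the gate to spread tokens evenly across experts.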
