🔮The Codex
Inference
The process by which a trained AI model generates outputs from new inputs.
📖 Apprentice Explanation
Inference is when an AI actually does its job — when you ask ChatGPT a question and it generates an answer. Training is the learning phase; inference is the doing phase.
🧙 Archmage Notes
Inference optimization techniques include quantization (INT8, INT4), pruning, distillation, speculative decoding, and KV-cache optimization. At scale, inference rather than training often dominates a model's lifetime compute cost in production deployments.
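To make the quantization idea above concrete, here is a minimal sketch of symmetric INT8 weight quantization in pure Python. The function names and the single shared scale are illustrative assumptions; production inference stacks typically use per-tensor or per-channel scales derived from calibration data.

```python
def quantize_int8(weights):
    """Map float weights to int8 values using one shared scale factor."""
    scale = max(abs(w) for w in weights) / 127  # largest magnitude maps to 127
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for use during inference."""
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.003, 0.5]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each int8 value needs 1 byte instead of 4 (float32), trading a small
# accuracy loss for lower memory and faster inference.
```

The recovered weights are close to the originals; the error per weight is bounded by half the scale, which is why outlier weights (which inflate the scale) are a key challenge in LLM quantization.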
