🔮The Codex
Inference
The process by which a trained AI model generates outputs from new inputs.
📖 Apprentice Explanation
Inference is when an AI actually does its job — when you ask ChatGPT a question and it generates an answer. Training is the learning phase; inference is the doing phase.
🧙 Archmage Notes
Inference optimization techniques include quantization (INT8, INT4), pruning, distillation, speculative decoding, and KV-cache optimization. At scale, inference rather than training often dominates a model's lifetime compute cost in production deployments.
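To make the quantization idea above concrete, here is a minimal sketch of symmetric INT8 weight quantization in pure Python. The function names and the single shared scale are illustrative assumptions; production inference stacks typically use per-tensor or per-channel scales derived from calibration data.

```python
def quantize_int8(weights):
    """Map float weights to int8 values using one shared scale factor."""
    scale = max(abs(w) for w in weights) / 127  # largest magnitude maps to 127
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for use during inference."""
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.003, 0.5]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each int8 value needs 1 byte instead of 4 (float32), trading a small
# accuracy loss for lower memory and faster inference.
```

The recovered weights are close to the originals; the error per weight is bounded by half the scale, which is why outlier weights (which inflate the scale) are a key challenge in LLM quantization.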
