🔮 The Codex
Token
The basic unit of text that AI models process — on average, roughly 3/4 of an English word.
📖 Apprentice Explanation
AI doesn't read words like we do. It breaks text into smaller pieces called tokens. A common word is usually a single token, while a rare or long word may be split into several. This matters because AI tools have limits on how many tokens they can handle at once.
🧙 Archmage Notes
Tokenization methods include BPE (Byte Pair Encoding), WordPiece, and SentencePiece. Token limits affect context window size, pricing, and model capabilities. GPT-4's tokenizer (cl100k_base) has a vocabulary of roughly 100K tokens.
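The core idea of BPE can be sketched in a few lines: start from individual characters and repeatedly merge the most frequent adjacent pair into a new symbol. This is a toy illustration of the algorithm, not the tokenizer any production model actually uses (real implementations train merge rules on large corpora and operate on bytes); the function names here are ours.

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count adjacent symbol pairs and return the most common one.
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def merge_pair(tokens, pair):
    # Replace every occurrence of the pair with a single merged symbol.
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def bpe(text, num_merges):
    # Start from single characters; each merge shrinks the sequence
    # by fusing the currently most frequent adjacent pair.
    tokens = list(text)
    for _ in range(num_merges):
        if len(tokens) < 2:
            break
        tokens = merge_pair(tokens, most_frequent_pair(tokens))
    return tokens

# Frequent fragments like "low" get merged into single tokens first.
print(bpe("low lower lowest", 4))
```

Note how the merges naturally discover shared subwords: the repeated stem "low" fuses early, while rarer suffixes stay split — the same behavior that makes real tokenizers spend one token on common words and several on rare ones.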
