🔮 The Codex

Token

The basic unit of text that AI models process; in English, one token is roughly 3/4 of a word on average.

📖 Apprentice Explanation

AI models don't read words the way we do. They break text into smaller pieces called tokens; a single word might be one token or several. This matters because AI tools limit how many tokens they can handle at once.

🧙 Archmage Notes

Common tokenization methods include BPE (Byte Pair Encoding), WordPiece, and SentencePiece. Token limits determine context-window size, API pricing, and effective model capabilities. GPT-4's tokenizer (cl100k_base) has a vocabulary of roughly 100K tokens.
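To make the BPE idea concrete, here is a minimal sketch of its core merge loop in pure Python: start from individual characters, then repeatedly fuse the most frequent adjacent pair into a single symbol. This is an illustration of the training procedure only, not any production tokenizer; the sample text and the number of merge rounds are arbitrary choices for the demo.

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count adjacent symbol pairs across the token sequence.
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def bpe_merge(tokens, pair):
    # Replace every occurrence of `pair` with one merged symbol.
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# BPE training starts from individual characters and learns merges.
tokens = list("lower lowest")
for _ in range(3):
    tokens = bpe_merge(tokens, most_frequent_pair(tokens))
print(tokens)  # after 3 merges: ['lowe', 'r', ' ', 'lowe', 's', 't']
```

After three merges the shared stem "lowe" has become a single token, which is exactly why frequent substrings end up as whole tokens in a trained vocabulary while rare words get split into several pieces.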