🔮The Codex

Prompt Injection

A security attack where malicious instructions are hidden in prompts to manipulate AI behavior.

📖 Apprentice Explanation

Prompt injection is when someone tricks an AI into ignoring its rules by hiding special instructions in their input. It's like social engineering for AI — manipulating the system to do something it shouldn't.

🧙 Archmage Notes

Prompt injection attacks include direct injection (user input), indirect injection (embedded in retrieved content), and jailbreaking. Defenses include input sanitization, output filtering, and architectural separation. Remains an unsolved challenge in LLM security.