Prompt Injection: What It Is, How It Works, and How to Defend Against It
Prompt injection is an attack where crafted input makes a large language model follow an attacker's instructions instead of the developer's. It is ranked LLM01, the number one risk, in the OWASP Top 10 for LLM Applications. Because a model cannot reliably tell the difference between trusted instructions and untrusted data, the durable defense lives at the system and runtime layer, not in the prompt itself.

What is prompt injection?
Prompt injection is the manipulation of a large language model through input that the model treats as instructions. An attacker supplies text that overrides, contradicts, or extends the developer's original prompt, and the model acts on the attacker's version. OWASP lists it as LLM01 in its Top 10 for LLM Applications, which makes it the most widely recognized risk in the category.
The reason it works is structural. A language model receives the system prompt, the developer's instructions, and the user's input as one continuous stream of text. It has no built-in boundary that marks some of that text as authoritative and the rest as data. When attacker-controlled text is convincing enough, the model can follow it. This is why prompt injection is often compared to injection attacks in traditional software, though the mechanics differ: there is no parser to escape and no query to sanitize, only a model weighing competing instructions.
How prompt injection works: direct vs indirect
There are two forms, and the difference matters for defense. Direct prompt injection is when the user typing to the model is the attacker. They enter input designed to override the system prompt, for example asking the model to ignore prior instructions and reveal its configuration or perform an action outside its intended scope.
Indirect prompt injection is the more serious form for production systems. Here the malicious instructions are not typed by the user at all. They are placed in content the model will later read: a web page the agent browses, a document it summarizes, an email in an inbox it has access to, or a record retrieved from a vector database. When the model ingests that content, the hidden instructions become part of its context and can redirect its behavior. Indirect injection is dangerous precisely because the attacker never needs direct access to the application. They only need their content to reach the model.
In both cases the injection is the entry point, not the damage. The impact depends on what the model is connected to and what it is allowed to do.
Prompt injection vs jailbreaking
Prompt injection and jailbreaking are related but not the same. Jailbreaking targets the model's safety training: the goal is to make the model produce content it was trained to refuse. Prompt injection targets the application: the goal is to change what the model does inside a larger system, such as calling a tool, exfiltrating data it can access, or taking an action on the attacker's behalf.
A system can be compromised through prompt injection with no jailbreak at all. The model behaves as designed, but the system around it grants too much capability. Conflating the two leads teams to over-invest in alignment and model guardrails while leaving the real attack surface, the model's permissions and tools, unguarded.
Why prompt injection cannot be fully patched in the model
There is no model update that eliminates prompt injection, because the weakness is not a bug in a specific model. It is a property of how language models process text. As long as instructions and data share the same channel, a sufficiently capable model can be persuaded by data that looks like instructions. Filtering and fine-tuning raise the bar, but they do not close the gap, and attackers adapt.
The practical conclusion is not that AI applications are indefensible. It is that the model should be treated as an untrusted component. You do not try to make untrusted input safe by asking it nicely. You contain what it can do.
How to defend against prompt injection
Effective defense assumes the model will be influenced and limits the blast radius when it is. The controls that matter are system-level and runtime-level, not prompt-level:
- Least-privilege tool access. Give the model only the tools and scopes it needs. Most high-impact agent failures are authorization failures, not model failures.
- Validate tool arguments against policy. Check what the model asks a tool to do before the tool does it, rather than trusting the model's output.
- Sandbox execution and separate domains. Keep file access, network access, retrieval, and memory in separate, constrained domains so a foothold in one does not become control of all.
- Gate irreversible actions. Require explicit confirmation for actions that move money, change access, or delete data.
- Monitor at runtime. Watch the tool calls, arguments, side effects, and cross-system pivots the model actually performs, so an injected instruction that turns into a harmful action is detected when it executes, not guessed at from the text.
That last point is where runtime intelligence changes the picture. Prompt hygiene tries to predict bad input. Runtime visibility observes real behavior, which is what an injected instruction ultimately has to become to cause harm. Kodem's approach to securing the AI application stack treats the model as untrusted input and focuses on what executes, and its application detection and response capability is designed to detect the moment behavior crosses from normal into exploit.
Prompt injection and the broader agentic attack chain
Prompt injection is best understood as initial access in a longer chain. After the foothold, real damage requires tool access, excessive permissions, autonomous execution, and weak oversight. Defending the chain means containing capability at every step, not just hardening the prompt. We explore that argument in depth in Prompt Injection Was Never the Real Problem, and place it in the wider context of AI application security.
Frequently Asked Questions
- What is prompt injection? Prompt injection is an attack where crafted input makes a large language model follow an attacker's instructions instead of the developer's. It is ranked LLM01, the top risk, in the OWASP Top 10 for LLM Applications.
- What is the difference between prompt injection and jailbreaking? Jailbreaking targets the model's safety training to produce content it was trained to refuse. Prompt injection targets the application to change what the model does, such as calling a tool or exfiltrating data. A system can be compromised by prompt injection with no jailbreak at all.
- What is indirect prompt injection? Indirect prompt injection hides malicious instructions in content the model later reads, such as a web page, document, email, or a record retrieved from a vector database. The attacker never needs direct access to the application; they only need their content to reach the model's context.
- Can prompt injection be fully prevented? No model update eliminates it, because instructions and data share the same channel. You raise the bar with filtering and fine-tuning, but the durable defense is to treat the model as untrusted and contain what it can do through least privilege, sandboxing, and runtime monitoring.
- Is prompt injection in the OWASP Top 10? Yes. Prompt injection is LLM01, the number one entry in the OWASP Top 10 for LLM Applications.
Related blogs

What is Agentic AI Security?
Agentic AI security protects AI agents and the tools, memory, and systems they touch. The main risks, and how to contain them at the runtime layer.
2
A Primer on Runtime Intelligence
See how Kodem's cutting-edge sensor technology revolutionizes application monitoring at the kernel level.
Platform Overview Video
Watch our short platform overview video to see how Kodem discovers real security risks in your code at runtime.
The State of the Application Security Workflow
This report aims to equip readers with actionable insights that can help future-proof their security programs. Kodem, the publisher of this report, purpose built a platform that bridges these gaps by unifying shift-left strategies with runtime monitoring and protection.
.avif)
Get real-time insights across the full stack…code, containers, OS, and memory
Watch how Kodem’s runtime security platform detects and blocks attacks before they cause damage. No guesswork. Just precise, automated protection.

Stay up-to-date on Audit Nexus
A curated resource for the many updates to cybersecurity and AI risk regulations, frameworks, and standards.


