Prompt Injection Was Never the Real Problem

A review of “The Promptware Kill Chain”

written by
Mahesh Babu
published on
January 16, 2026
topic
Application Security
AI Security

Over the last two years, “prompt injection” has become the SQL injection of the LLM era: widely referenced, poorly defined, and often blamed for failures that have little to do with prompts themselves.

A recent arXiv paper, “The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multi-Step Malware,” tries to correct that by reframing prompt injection as just the initial access phase of a broader, multi-stage attack chain.

As a security researcher working on real production AppSec and AI systems, I think this paper is directionally right but operationally incomplete.

This post is a technical critique: what the paper gets right, where the analogy breaks down, and how defenders should actually think about agentic system compromise.

What the paper gets right

1. Prompt injection is not the vulnerability; it is the entry point

The most important contribution of the paper is conceptual: prompt injection is not “the bug.” It is merely how attackers get a foothold.

In real incidents, damage happens only after that foothold is combined with:

  • tool access,
  • excessive permissions,
  • autonomous execution,
  • and insufficient oversight.

Reframing the problem as a campaign rather than a single exploit is correct—and long overdue.

Security teams that still ask “how do we stop prompt injection?” are asking the wrong question. The right question is:

What can an attacker do after the model is influenced?

2. Mapping agent abuse to a kill chain is useful, up to a point

The paper’s five stages—Initial Access, Privilege Escalation, Persistence, Lateral Movement, Actions on Objective—are familiar to defenders and help translate AI risk into existing security mental models.

That translation matters. It allows teams to reuse incident response thinking, logging strategies, and control placement rather than inventing “AI security” from scratch.

This is a net positive.

3. The real risk multipliers are autonomy and permissions

Where the paper is strongest is its emphasis on system design, not model cleverness.

Agents become dangerous when:

  • tools are over-privileged,
  • actions execute without confirmation,
  • memory is writable and reused,
  • and multiple systems are stitched together implicitly.

This aligns with what we see in production environments: most real-world failures are authorization and orchestration failures, not model failures.
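
As a minimal sketch of the first two bullets above (an over-privileged tool and unconfirmed execution), compare two bindings of the same capability; the Tool class here is a hypothetical stand-in, not any particular agent framework:

  # Sketch: over-privileged vs. contained tool bindings. The Tool class is
  # an illustrative stand-in, not a real framework.
  from dataclasses import dataclass, field
  from typing import Callable

  @dataclass
  class Tool:
      name: str
      func: Callable[..., str]
      requires_confirmation: bool = False        # human gate before execution
      allowed_recipients: set[str] = field(default_factory=set)

  def send_email(to: str, body: str) -> str:
      return f"sent to {to}"

  # Over-privileged and fully autonomous: any influenced model output can
  # trigger outbound email to any recipient, with no confirmation.
  risky = Tool(name="email", func=send_email)

  # Contained: the same capability, but the authorization decision lives
  # outside the model: a recipient allowlist plus a confirmation gate.
  contained = Tool(
      name="email",
      func=send_email,
      requires_confirmation=True,
      allowed_recipients={"support@example.com"},  # assumption: fixed allowlist
  )

Nothing about the model changes between the two bindings; only the system-side configuration does.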

Where the framing starts to break

1. “Jailbreaking = privilege escalation” is an oversimplification

The paper treats a model’s willingness to comply as the privilege boundary. That’s appealing—but inaccurate.

In real systems, the true privilege boundary is not the model. It is:

  • tool authorization scopes,
  • policy enforcement in the orchestrator,
  • execution sandboxing,
  • and human-in-the-loop gates.

Many high-impact agent failures require no jailbreak at all. The model behaves “as designed”; the system around it does not.

Over-centering jailbreaks risks pushing teams to harden alignment while leaving the actual attack surface—capabilities and permissions—wide open.
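
A minimal sketch of where that boundary actually sits, assuming a hypothetical tool_call dict proposed by the model; the TOOL_SCOPES table and confirm() gate are illustrative assumptions, not any specific framework's API:

  # Sketch: the privilege boundary enforced by the orchestrator, not the model.
  # TOOL_SCOPES and confirm() are assumptions for illustration.
  TOOL_SCOPES = {
      "read_ticket":  {"irreversible": False, "allowed_roles": {"agent"}},
      "refund_order": {"irreversible": True,  "allowed_roles": {"agent"}},
  }

  def confirm(prompt: str) -> bool:
      # Placeholder human-in-the-loop gate; a real system would route this
      # to an approval queue or operator UI.
      return input(f"{prompt} [y/N] ").strip().lower() == "y"

  def authorize(tool_call: dict, caller_role: str = "agent") -> bool:
      scope = TOOL_SCOPES.get(tool_call["name"])
      if scope is None:
          return False            # unknown tool: deny by default
      if caller_role not in scope["allowed_roles"]:
          return False            # authorization scope, not model willingness
      if scope["irreversible"]:
          return confirm(f"Allow {tool_call['name']}({tool_call.get('args')})?")
      return True

Whether the model was jailbroken is irrelevant here; the decision point sits in the orchestrator.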

2. “No architectural boundary exists” is rhetorically strong—and practically wrong

The paper suggests that because LLMs cannot inherently distinguish instructions from data, no patch can fix the problem.

That’s true inside the model. It is not true at the system level.

In practice, we can and do enforce boundaries by:

  • treating the model as an untrusted component,
  • limiting tool capabilities via least privilege,
  • validating tool arguments against policy,
  • sandboxing file and network access,
  • gating irreversible actions with explicit confirmation,
  • and separating memory, retrieval, and execution domains.

The correct conclusion is not “this is unpatchable.”
It is: assume initial access, design for containment.
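
To make the argument-validation and least-privilege items concrete, here is a minimal sketch that treats model-proposed arguments as untrusted and denies anything outside explicit policy; the allowed host and workspace path are assumptions, not recommendations:

  # Sketch: validate model-proposed tool arguments against an explicit policy
  # before execution. The allowlist values are illustrative assumptions.
  from pathlib import Path
  from urllib.parse import urlparse

  ALLOWED_HOSTS = {"api.internal.example.com"}            # assumed internal API
  ALLOWED_ROOT = Path("/srv/agent-workspace").resolve()   # assumed sandbox root

  def validate_fetch(url: str) -> bool:
      host = urlparse(url).hostname or ""
      return host in ALLOWED_HOSTS                 # no pivot to arbitrary hosts

  def validate_write(path: str) -> bool:
      target = Path(path).resolve()
      return target.is_relative_to(ALLOWED_ROOT)   # no escape from the workspace

  VALIDATORS = {"fetch_url": validate_fetch, "write_file": validate_write}

  def is_allowed(tool: str, arg: str) -> bool:
      check = VALIDATORS.get(tool)
      return bool(check and check(arg))            # unknown tools denied by default

None of this requires the model to distinguish instructions from data; it only requires the system not to trust the model.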

3. “Promptware” as malware is evocative—but imprecise

Calling language-based payloads “malware” usefully signals intent and campaign thinking. But it also imports assumptions that do not always hold:

  • There is no executable binary.
  • Persistence is often probabilistic.
  • Propagation depends on workflows, not replication.
  • Impact is mediated by permissions, not code execution.

What we are really seeing is a new class of agentic application exploits, where language is the payload and orchestration is the execution substrate.

That distinction matters for defense.

Where the paper needs more rigor

1. The framework is not yet reproducible

This is a conceptual synthesis, not a validated analytic framework.

The paper does not answer:

  • Would two analysts classify the same incident the same way?
  • Where do stages collapse or blur in practice?
  • What telemetry corresponds to each stage?

Without decision rules and observables, kill chains risk becoming storytelling devices rather than operational tools.
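
As one hedged illustration of the third question, a stage-to-observable mapping might look like the sketch below; the stage names come from the paper, while the event names are assumptions rather than a validated taxonomy:

  # Hypothetical mapping from kill-chain stage to telemetry observables.
  # Event names are illustrative assumptions, not a validated taxonomy.
  STAGE_OBSERVABLES = {
      "Initial Access":       ["retrieved_document_hash", "inbound_message_source"],
      "Privilege Escalation": ["denied_tool_call", "policy_override_attempt"],
      "Persistence":          ["agent_memory_write", "stored_instruction_reuse"],
      "Lateral Movement":     ["cross_system_tool_call", "new_credential_use"],
      "Actions on Objective": ["data_egress_event", "irreversible_action_executed"],
  }

Something like this table is what would turn the framework from narrative into a tool an analyst can apply consistently.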

2. Key phases defenders care about are missing

By starting at Initial Access, the paper sidelines:

  • reconnaissance and targeting (why this RAG corpus?),
  • weaponization (how payloads are tuned for retrieval and model family),
  • and execution mechanics (orchestrator bugs, argument injection, plugin misuse).

Ironically, those are where many preventive controls live.

3. Evidence quality is uneven

Some cited “incidents” come from media or blog posts rather than primary reports or reproducible research.

For a paper positioning itself as a security framework, evidence tiers should be explicit: peer-reviewed demo, preprint, vendor blog, anecdote.

A more useful mental model

If I were to compress this into guidance for defenders building or securing agentic systems:

  1. Assume prompt compromise will happen.
    Treat the model as an untrusted component, not a trusted authority.
  2. Shift focus from model behavior to system capability.
    The question is never “what did the model say?”
    It is always “what was it allowed to do?”
  3. Design explicit control points around tools, memory, and execution.
    Capability boundaries matter more than prompt hygiene.
  4. Instrument for intent, not text.
    Log tool calls, arguments, side effects, and cross-system pivots (see the sketch after this list).
  5. Think in chains—but defend in layers.
    Kill chains are useful for analysis; containment is what stops damage.
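
For point 4, here is a minimal logging sketch; the field names are assumptions, and the point is to record capability use and side effects rather than transcripts:

  # Sketch: structured telemetry for tool calls (intent, not text).
  # Field names are illustrative assumptions.
  import json
  import time
  import uuid

  def log_tool_call(tool: str, args: dict, side_effect: str, target_system: str) -> None:
      event = {
          "event": "tool_call",
          "trace_id": str(uuid.uuid4()),     # correlate multi-step chains
          "ts": time.time(),
          "tool": tool,
          "args": args,                      # what the agent tried to do
          "side_effect": side_effect,        # e.g. "read", "write", "egress"
          "target_system": target_system,    # surfaces cross-system pivots
      }
      print(json.dumps(event))               # stand-in for a real log sink

  log_tool_call("fetch_url", {"url": "https://example.com"}, "egress", "web")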

Bottom line

“The Promptware Kill Chain” is an important course correction. It pushes the community away from treating prompt injection as a novelty bug and toward understanding agent compromise as a system-level failure mode.

Where it falls short is in over-indexing on the malware analogy and under-specifying the architectural controls that actually prevent impact.

The future of AI security will not be won by better prompts or stricter alignment alone. It will be won by treating LLMs as untrusted components embedded in powerful systems and engineering those systems accordingly.

References

  1. Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv. https://arxiv.org/abs/1606.06565
  2. Khan, B., Schneier, B., & Brodt, O. (2026). The promptware kill chain: How prompt injections gradually evolved into a multi-step malware. arXiv. https://arxiv.org/abs/2601.09625
  3. Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P.-S., Cheng, M., Glaese, A., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L. A., Isaac, W., Legassick, S., Irving, G., & Gabriel, I. (2021). Ethical and social risks of harm from language models. arXiv. https://arxiv.org/abs/2112.04359
  4. MITRE. (2023). MITRE ATT&CK® framework. https://attack.mitre.org
  5. Saltzer, J. H., & Schroeder, M. D. (1975). The protection of information in computer systems. Proceedings of the IEEE, 63(9), 1278–1308. https://doi.org/10.1109/PROC.1975.9939
