Prompt Injection Was Never the Real Problem

A review of “The Promptware Kill Chain”

written by
Mahesh Babu
published on
January 16, 2026
topic
Application Security
AI Security

Over the last two years, “prompt injection” has become the SQL injection of the LLM era: widely referenced, poorly defined, and often blamed for failures that have little to do with prompts themselves.

A recent arXiv paper, “The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multi-Step Malware,” tries to correct that by reframing prompt injection as just the initial access phase of a broader, multi-stage attack chain.

As a security researcher working on real production AppSec and AI systems, I think this paper is directionally right but operationally incomplete.

This post is a technical critique: what the paper gets right, where the analogy breaks down, and how defenders should actually think about agentic system compromise.

What the paper gets right

1. Prompt injection is not the vulnerability; it is the entry point

The most important contribution of the paper is conceptual: prompt injection is not “the bug.” It is merely how attackers get a foothold.

In real incidents, damage happens only after that foothold is combined with:

  • tool access,
  • excessive permissions,
  • autonomous execution,
  • and insufficient oversight.

Reframing the problem as a campaign rather than a single exploit is correct—and long overdue.

Security teams that still ask “how do we stop prompt injection?” are asking the wrong question. The right question is:

What can an attacker do after the model is influenced?

2. Mapping agent abuse to a kill chain is useful, up to a point

The paper’s five stages—Initial Access, Privilege Escalation, Persistence, Lateral Movement, Actions on Objective—are familiar to defenders and help translate AI risk into existing security mental models.

That translation matters. It allows teams to reuse incident response thinking, logging strategies, and control placement rather than inventing “AI security” from scratch.

This is a net positive.

3. The real risk multipliers are autonomy and permissions

Where the paper is strongest is its emphasis on system design, not model cleverness.

Agents become dangerous when:

  • tools are over-privileged,
  • actions execute without confirmation,
  • memory is writable and reused,
  • and multiple systems are stitched together implicitly.

This aligns with what we see in production environments: most real-world failures are authorization and orchestration failures, not model failures.
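
As a minimal sketch of the first two bullets above (an over-privileged tool and unconfirmed execution), compare two bindings of the same capability; the Tool class here is a hypothetical stand-in, not any particular agent framework:

  # Sketch: over-privileged vs. contained tool bindings. The Tool class is
  # an illustrative stand-in, not a real framework.
  from dataclasses import dataclass, field
  from typing import Callable

  @dataclass
  class Tool:
      name: str
      func: Callable[..., str]
      requires_confirmation: bool = False        # human gate before execution
      allowed_recipients: set[str] = field(default_factory=set)

  def send_email(to: str, body: str) -> str:
      return f"sent to {to}"

  # Over-privileged and fully autonomous: any influenced model output can
  # trigger outbound email to any recipient, with no confirmation.
  risky = Tool(name="email", func=send_email)

  # Contained: the same capability, but the authorization decision lives
  # outside the model: a recipient allowlist plus a confirmation gate.
  contained = Tool(
      name="email",
      func=send_email,
      requires_confirmation=True,
      allowed_recipients={"support@example.com"},  # assumption: fixed allowlist
  )

Nothing about the model changes between the two bindings; only the system-side configuration does.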

Where the framing starts to break

1. “Jailbreaking = privilege escalation” is an oversimplification

The paper treats a model’s willingness to comply as the privilege boundary. That’s appealing—but inaccurate.

In real systems, the true privilege boundary is not the model. It is:

  • tool authorization scopes,
  • policy enforcement in the orchestrator,
  • execution sandboxing,
  • and human-in-the-loop gates.

Many high-impact agent failures require no jailbreak at all. The model behaves “as designed”; the system around it does not.

Over-centering jailbreaks risks pushing teams to harden alignment while leaving the actual attack surface—capabilities and permissions—wide open.
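
A minimal sketch of where that boundary actually sits, assuming a hypothetical tool_call dict proposed by the model; the TOOL_SCOPES table and confirm() gate are illustrative assumptions, not any specific framework's API:

  # Sketch: the privilege boundary enforced by the orchestrator, not the model.
  # TOOL_SCOPES and confirm() are assumptions for illustration.
  TOOL_SCOPES = {
      "read_ticket":  {"irreversible": False, "allowed_roles": {"agent"}},
      "refund_order": {"irreversible": True,  "allowed_roles": {"agent"}},
  }

  def confirm(prompt: str) -> bool:
      # Placeholder human-in-the-loop gate; a real system would route this
      # to an approval queue or operator UI.
      return input(f"{prompt} [y/N] ").strip().lower() == "y"

  def authorize(tool_call: dict, caller_role: str = "agent") -> bool:
      scope = TOOL_SCOPES.get(tool_call["name"])
      if scope is None:
          return False            # unknown tool: deny by default
      if caller_role not in scope["allowed_roles"]:
          return False            # authorization scope, not model willingness
      if scope["irreversible"]:
          return confirm(f"Allow {tool_call['name']}({tool_call.get('args')})?")
      return True

Whether the model was jailbroken is irrelevant here; the decision point sits in the orchestrator.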

2. “No architectural boundary exists” is rhetorically strong—and practically wrong

The paper suggests that because LLMs cannot inherently distinguish instructions from data, no patch can fix the problem.

That’s true inside the model. It is not true at the system level.

In practice, we can and do enforce boundaries by:

  • treating the model as an untrusted component,
  • limiting tool capabilities via least privilege,
  • validating tool arguments against policy,
  • sandboxing file and network access,
  • gating irreversible actions with explicit confirmation,
  • and separating memory, retrieval, and execution domains.

The correct conclusion is not “this is unpatchable.”
It is: assume initial access, design for containment.
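
To make the argument-validation and least-privilege items concrete, here is a minimal sketch that treats model-proposed arguments as untrusted and denies anything outside explicit policy; the allowed host and workspace path are assumptions, not recommendations:

  # Sketch: validate model-proposed tool arguments against an explicit policy
  # before execution. The allowlist values are illustrative assumptions.
  from pathlib import Path
  from urllib.parse import urlparse

  ALLOWED_HOSTS = {"api.internal.example.com"}            # assumed internal API
  ALLOWED_ROOT = Path("/srv/agent-workspace").resolve()   # assumed sandbox root

  def validate_fetch(url: str) -> bool:
      host = urlparse(url).hostname or ""
      return host in ALLOWED_HOSTS                 # no pivot to arbitrary hosts

  def validate_write(path: str) -> bool:
      target = Path(path).resolve()
      return target.is_relative_to(ALLOWED_ROOT)   # no escape from the workspace

  VALIDATORS = {"fetch_url": validate_fetch, "write_file": validate_write}

  def is_allowed(tool: str, arg: str) -> bool:
      check = VALIDATORS.get(tool)
      return bool(check and check(arg))            # unknown tools denied by default

None of this requires the model to distinguish instructions from data; it only requires the system not to trust the model.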

3. “Promptware” as malware is evocative—but imprecise

Calling language-based payloads “malware” usefully signals intent and campaign thinking. But it also imports assumptions that do not always hold:

  • There is no executable binary.
  • Persistence is often probabilistic.
  • Propagation depends on workflows, not replication.
  • Impact is mediated by permissions, not code execution.

What we are really seeing is a new class of agentic application exploits, where language is the payload and orchestration is the execution substrate.

That distinction matters for defense.

Where the paper needs more rigor

1. The framework is not yet reproducible

This is a conceptual synthesis, not a validated analytic framework.

The paper does not answer:

  • Would two analysts classify the same incident the same way?
  • Where do stages collapse or blur in practice?
  • What telemetry corresponds to each stage?

Without decision rules and observables, kill chains risk becoming storytelling devices rather than operational tools.
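
As one hedged illustration of the third question, a stage-to-observable mapping might look like the sketch below; the stage names come from the paper, while the event names are assumptions rather than a validated taxonomy:

  # Hypothetical mapping from kill-chain stage to telemetry observables.
  # Event names are illustrative assumptions, not a validated taxonomy.
  STAGE_OBSERVABLES = {
      "Initial Access":       ["retrieved_document_hash", "inbound_message_source"],
      "Privilege Escalation": ["denied_tool_call", "policy_override_attempt"],
      "Persistence":          ["agent_memory_write", "stored_instruction_reuse"],
      "Lateral Movement":     ["cross_system_tool_call", "new_credential_use"],
      "Actions on Objective": ["data_egress_event", "irreversible_action_executed"],
  }

Something like this table is what would turn the framework from narrative into a tool an analyst can apply consistently.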

2. Key phases defenders care about are missing

By starting at Initial Access, the paper sidelines:

  • reconnaissance and targeting (why this RAG corpus?),
  • weaponization (how payloads are tuned for retrieval and model family),
  • and execution mechanics (orchestrator bugs, argument injection, plugin misuse).

Ironically, those are where many preventive controls live.

3. Evidence quality is uneven

Some cited “incidents” come from media or blog posts rather than primary reports or reproducible research.

For a paper positioning itself as a security framework, evidence tiers should be explicit: peer-reviewed demo, preprint, vendor blog, anecdote.

A more useful mental model

If I were to compress this into guidance for defenders building or securing agentic systems:

  1. Assume prompt compromise will happen.
    Treat the model as an untrusted component, not a trusted authority.
  2. Shift focus from model behavior to system capability.
    The question is never “what did the model say?”
    It is always “what was it allowed to do?”
  3. Design explicit control points around tools, memory, and execution.
    Capability boundaries matter more than prompt hygiene.
  4. Instrument for intent, not text.
    Log tool calls, arguments, side effects, and cross-system pivots (see the sketch after this list).
  5. Think in chains—but defend in layers.
    Kill chains are useful for analysis; containment is what stops damage.
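
For point 4, here is a minimal logging sketch; the field names are assumptions, and the point is to record capability use and side effects rather than transcripts:

  # Sketch: structured telemetry for tool calls (intent, not text).
  # Field names are illustrative assumptions.
  import json
  import time
  import uuid

  def log_tool_call(tool: str, args: dict, side_effect: str, target_system: str) -> None:
      event = {
          "event": "tool_call",
          "trace_id": str(uuid.uuid4()),     # correlate multi-step chains
          "ts": time.time(),
          "tool": tool,
          "args": args,                      # what the agent tried to do
          "side_effect": side_effect,        # e.g. "read", "write", "egress"
          "target_system": target_system,    # surfaces cross-system pivots
      }
      print(json.dumps(event))               # stand-in for a real log sink

  log_tool_call("fetch_url", {"url": "https://example.com"}, "egress", "web")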

Bottom line

“The Promptware Kill Chain” is an important course correction. It pushes the community away from treating prompt injection as a novelty bug and toward understanding agent compromise as a system-level failure mode.

Where it falls short is in over-indexing on the malware analogy and under-specifying the architectural controls that actually prevent impact.

The future of AI security will not be won by better prompts or stricter alignment alone. It will be won by treating LLMs as untrusted components embedded in powerful systems and engineering those systems accordingly.

References

  1. Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv. https://arxiv.org/abs/1606.06565
  2. Khan, B., Schneier, B., & Brodt, O. (2026). The promptware kill chain: How prompt injections gradually evolved into a multi-step malware. arXiv. https://arxiv.org/abs/2601.09625
  3. Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P.-S., Cheng, M., Glaese, A., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L. A., Isaac, W., Legassick, S., Irving, G., & Gabriel, I. (2021). Ethical and social risks of harm from language models. arXiv. https://arxiv.org/abs/2112.04359
  4. MITRE. (2023). MITRE ATT&CK® framework. https://attack.mitre.org
  5. Saltzer, J. H., & Schroeder, M. D. (1975). The protection of information in computer systems. Proceedings of the IEEE, 63(9), 1278–1308. https://doi.org/10.1109/PROC.1975.9939
