
A critical pre-authenticated remote code execution (RCE) vulnerability, tracked as CVE-2026-22778 (CVSS 9.8), has been discovered in vLLM, a widely used inference and serving engine for large language models.
Publicly exposed vLLM deployments running video models are vulnerable to full server compromise. An attacker can trigger the flaw by submitting a malicious video link to vLLM’s API, resulting in arbitrary command execution on the underlying system without authentication.
The vulnerability affects vLLM versions 0.8.3 through 0.14.0 and stems from a chained exploit that combines an information disclosure flaw in error handling with a heap buffer overflow in a bundled video decoding dependency. When exploited together, these weaknesses allow attackers to bypass memory protections and gain code execution within the vLLM process.
While the issue is limited to deployments that enable multimodal video processing, vLLM’s API server performs no authentication by default, making this vulnerability particularly dangerous for internet-facing inference services.
Affected Versions
vLLM v0.8.3 through v0.14.0 (fixed in v0.14.1).
Immediate Actions
- Upgrade to vLLM v0.14.1 or later.
- Verify OpenCV is updated to a patched version.
- Reduce exposure of inference APIs by blocking multimodal requests containing a video_url parameter to the following endpoints: POST /v1/chat/completions and POST /v1/invocations.
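The request-blocking step above can be sketched as a small gateway check placed in front of vLLM. This is a minimal illustration, not vLLM code: the helper names (should_block, contains_video_url) are hypothetical, and in practice the check would live in a reverse proxy or API gateway.

```python
import json

# Endpoints that accept multimodal payloads in vulnerable versions.
BLOCKED_PATHS = {"/v1/chat/completions", "/v1/invocations"}

def contains_video_url(node) -> bool:
    """Recursively scan a parsed JSON body for any 'video_url' key."""
    if isinstance(node, dict):
        return "video_url" in node or any(contains_video_url(v) for v in node.values())
    if isinstance(node, list):
        return any(contains_video_url(item) for item in node)
    return False

def should_block(path: str, raw_body: bytes) -> bool:
    """Return True if the gateway should reject this request (e.g. HTTP 403)
    before it reaches vLLM. Hypothetical helper for illustration only."""
    if path not in BLOCKED_PATHS:
        return False
    try:
        body = json.loads(raw_body)
    except ValueError:
        return False  # let vLLM handle malformed JSON normally
    return contains_video_url(body)
```

Scanning the whole body recursively matters because video_url can appear nested inside the messages/content structure rather than at the top level.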
If You Can’t Patch Immediately
- Disable video and multimodal endpoints.
- Restrict access to inference APIs to trusted internal users or services.
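For deployments launched with the vLLM serving CLI, one way to disable video inputs is to cap the per-prompt video count at zero via the multimodal limit flag. The exact flag syntax varies across vLLM versions and the model name below is a placeholder, so verify against your deployed version’s help output before relying on this.

```shell
# Sketch: disable video inputs while keeping the text API available.
# Flag format differs between vLLM releases -- confirm with `vllm serve --help`.
vllm serve <your-multimodal-model> \
  --limit-mm-per-prompt video=0 \
  --host 127.0.0.1   # bind to loopback; front with an authenticating proxy
```

Binding to loopback and fronting the server with an authenticating proxy also addresses the second bullet above, restricting the API to trusted callers.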
What is vLLM?
vLLM is an open-source, high-throughput inference engine designed to efficiently serve large language models across cloud and self-hosted environments. It is widely adopted for running LLMs at scale due to its performance and memory efficiency, particularly under concurrent workloads.
As vLLM adoption has expanded beyond text-only inference to support multimodal inputs such as images and video, it has become increasingly exposed through public-facing APIs. This expanded attack surface is central to CVE-2026-22778, which affects deployments that enable video processing.
By chaining an information disclosure flaw with a heap buffer overflow in a bundled video decoding dependency, an unauthenticated attacker can achieve arbitrary code execution on vulnerable vLLM systems.
Technical Details
CVE-2026-22778 is not a single bug, but a chained exploit that combines an information disclosure issue with a heap-based buffer overflow in vLLM’s video processing pipeline. The exploit unfolds across multiple layers of the inference stack as follows:
- API request handling: An attacker sends a request to vLLM’s Completions or Invocations API containing a video_url parameter, triggering the multimodal video processing path without requiring authentication.
- Video ingestion via OpenCV: vLLM processes the supplied video using OpenCV’s cv2.VideoCapture() interface, which delegates decoding to a bundled FFmpeg library.
- JPEG2000 decoding in FFmpeg: FFmpeg invokes the JPEG2000 decoder libopenjp2 to parse video frames, trusting attacker-controlled metadata embedded in the file structure.
- Heap buffer overflow through a crafted cdef box: A malicious JPEG2000 file abuses the channel definition (cdef) box to remap image channels without validating buffer sizes, causing a heap-based buffer overflow during decoding.
- Function pointer corruption: The overflow overwrites adjacent heap memory, including a function pointer used by the decoder or cleanup routines.
- Arbitrary code execution: When the corrupted function pointer is later dereferenced, execution flow is redirected, resulting in arbitrary code execution within the vLLM process.
When chained together, these flaws allow reliable, unauthenticated remote code execution on vLLM deployments that enable video processing.
Information Disclosure through Error Handling
When vLLM receives malformed image or video input, it relies on Python imaging libraries to parse the data. In vulnerable versions, error messages generated during this process are returned directly to the client.
These error messages can include raw object representations containing heap memory addresses, for example:
cannot identify image file <_io.BytesIO object at 0x7a95e299e750>
This error message leaks a precise heap address from the vLLM process. Address Space Layout Randomization (ASLR) is a key defense against memory corruption exploits, and it only works while those addresses remain secret. By leaking heap addresses, vLLM effectively hands attackers the information needed to bypass ASLR and precisely target memory locations during exploitation.
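The leak is easy to reproduce with nothing but the standard library: Python’s default repr of a BytesIO object embeds the object’s heap address, so any error path that echoes that repr back to the client gives the attacker a live address. A minimal sketch:

```python
import io
import re

# Simulate the vulnerable error path: the default repr of a file-like
# object embeds its heap address.
buf = io.BytesIO(b"not-a-real-image")
error_message = f"cannot identify image file {buf!r}"

# From the client side, a single regex recovers a live address inside
# the server process -- exactly what is needed to defeat ASLR.
match = re.search(r"0x[0-9a-f]+", error_message)
leaked_address = int(match.group(0), 16)
print(hex(leaked_address))
```

A straightforward server-side fix is to strip or genericize object reprs before returning parse errors to clients, which is why sanitized error handling matters as much as the memory-safety bug itself.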
Heap Buffer Overflow in Video Decoding
The second flaw resides in vLLM’s video processing pipeline. When handling video inputs, vLLM uses OpenCV, which in turn relies on FFmpeg for decoding.
In affected versions, a vulnerability in FFmpeg’s JPEG2000 decoder allows a specially crafted video file to trigger a heap buffer overflow. The overflow occurs when pixel data is written beyond the bounds of an allocated buffer, overwriting adjacent memory structures.
In practice, this overflow can overwrite function pointers used during memory cleanup. Once control flow is redirected, the attacker can execute arbitrary commands within the vLLM process.
Chained Exploitation: From Input to Code Execution
An attacker can combine these two flaws into a reliable exploit chain:
- Send a malformed image or video to trigger an error response.
- Extract leaked heap addresses from the error message.
- Send a crafted JPEG2000 video payload.
- Use the known memory layout to overwrite function pointers.
- Achieve arbitrary code execution without authentication.
No credentials are required, and the attack can be carried out remotely against exposed vLLM endpoints.
Why This Matters for AI Security
CVE-2026-22778 highlights a growing reality: AI infrastructure inherits the full risk surface of traditional software stacks, including memory corruption, dependency vulnerabilities, and unsafe error handling.
As AI systems expand beyond text into images, video, and agent-driven workflows, these risks compound. Security controls must extend beyond model behavior to the infrastructure that serves them.
References
- GitHub Security Advisory. (2026, February 2). GHSA-4r2x-xpjr-7cvv: Remote code execution in vLLM via malicious video processing. GitHub. https://github.com/advisories/GHSA-4r2x-xpjr-7cvv
- National Vulnerability Database (NVD). (2026). CVE-2026-22778: Remote code execution in vLLM. National Institute of Standards and Technology. https://nvd.nist.gov/vuln/detail/CVE-2026-22778
- OX Security Research Team. (2026, February 2). CVE-2026-22778: vLLM RCE vulnerability analysis. OX Security Blog. https://www.ox.security/blog/cve-2026-22778-vllm-rce-vulnerability/
- Underhill, K. (2026, February 2). Critical vLLM flaw puts AI systems at risk of remote code execution. eSecurity Planet. https://www.esecurityplanet.com/artificial-intelligence/critical-vllm-flaw-puts-ai-systems-at-risk-of-remote-code-execution/
- Schwake, E. (2025, December 5). Critical vLLM flaw exposes the soft underbelly of AI infrastructure. Salt Security Blog. https://salt.security/blog/critical-vllm-flaw-exposes-the-soft-underbelly-of-ai-infrastructure