
A critical pre-authenticated remote code execution (RCE) vulnerability, tracked as CVE-2026-22778 (CVSS 9.8), has been discovered in vLLM, a widely used inference and serving engine for large language models.
Publicly exposed vLLM deployments running video models are vulnerable to full server compromise. An attacker can trigger the flaw by submitting a malicious video link to vLLM’s API, resulting in arbitrary command execution on the underlying system without authentication.
The vulnerability affects vLLM versions 0.8.3 through 0.14.0 and stems from a chained exploit that combines an information disclosure flaw in error handling with a heap buffer overflow in a bundled video decoding dependency. When exploited together, these weaknesses allow attackers to bypass memory protections and gain code execution within the vLLM process.
While the issue is limited to deployments that enable multimodal video processing, vLLM’s API server performs no authentication by default, making this vulnerability particularly dangerous for internet-facing inference services.
Affected Versions
vLLM v0.8.3 through v0.14.0 (fixed in v0.14.1).
Immediate Actions
- Upgrade to vLLM v0.14.1 or later.
- Verify OpenCV is updated to a patched version.
- Reduce exposure of inference APIs by blocking multimodal requests containing a video_url parameter to the following endpoints: POST /v1/chat/completions and POST /v1/invocations.
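The request-blocking step above can be sketched as a small gateway check placed in front of vLLM. This is a minimal illustration, not vLLM code: the helper names (should_block, contains_video_url) are hypothetical, and in practice the check would live in a reverse proxy or API gateway.

```python
import json

# Endpoints that accept multimodal payloads in vulnerable versions.
BLOCKED_PATHS = {"/v1/chat/completions", "/v1/invocations"}

def contains_video_url(node) -> bool:
    """Recursively scan a parsed JSON body for any 'video_url' key."""
    if isinstance(node, dict):
        return "video_url" in node or any(contains_video_url(v) for v in node.values())
    if isinstance(node, list):
        return any(contains_video_url(item) for item in node)
    return False

def should_block(path: str, raw_body: bytes) -> bool:
    """Return True if the gateway should reject this request (e.g. HTTP 403)
    before it reaches vLLM. Hypothetical helper for illustration only."""
    if path not in BLOCKED_PATHS:
        return False
    try:
        body = json.loads(raw_body)
    except ValueError:
        return False  # let vLLM handle malformed JSON normally
    return contains_video_url(body)
```

Scanning the whole body recursively matters because video_url can appear nested inside the messages/content structure rather than at the top level.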
If You Can’t Patch Immediately
- Disable video and multimodal endpoints.
- Restrict access to inference APIs to trusted internal users or services.
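For deployments launched with the vLLM serving CLI, one way to disable video inputs is to cap the per-prompt video count at zero via the multimodal limit flag. The exact flag syntax varies across vLLM versions and the model name below is a placeholder, so verify against your deployed version’s help output before relying on this.

```shell
# Sketch: disable video inputs while keeping the text API available.
# Flag format differs between vLLM releases -- confirm with `vllm serve --help`.
vllm serve <your-multimodal-model> \
  --limit-mm-per-prompt video=0 \
  --host 127.0.0.1   # bind to loopback; front with an authenticating proxy
```

Binding to loopback and fronting the server with an authenticating proxy also addresses the second bullet above, restricting the API to trusted callers.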
What is vLLM?
vLLM is an open-source, high-throughput inference engine designed to efficiently serve large language models across cloud and self-hosted environments. It is widely adopted for running LLMs at scale due to its performance and memory efficiency, particularly under concurrent workloads.
As vLLM adoption has expanded beyond text-only inference to support multimodal inputs such as images and video, it has become increasingly exposed through public-facing APIs. This expanded attack surface is central to CVE-2026-22778, which affects deployments that enable video processing.
By chaining an information disclosure flaw with a heap buffer overflow in a bundled video decoding dependency, an unauthenticated attacker can achieve arbitrary code execution on vulnerable vLLM systems.
Technical Details
CVE-2026-22778 is not a single bug, but a chained exploit that combines an information disclosure issue with a heap-based buffer overflow in vLLM’s video processing pipeline. The exploit unfolds across multiple layers of the inference stack as follows:
- API request handling: An attacker sends a request to vLLM’s Completions or Invocations API containing a video_url parameter, triggering the multimodal video processing path without requiring authentication.
- Video ingestion via OpenCV: vLLM processes the supplied video using OpenCV’s cv2.VideoCapture() interface, which delegates decoding to a bundled FFmpeg library.
- JPEG2000 decoding in FFmpeg: FFmpeg invokes the JPEG2000 decoder libopenjp2 to parse video frames, trusting attacker-controlled metadata embedded in the file structure.
- Heap buffer overflow through a crafted cdef box: A malicious JPEG2000 file abuses the channel definition (cdef) box to remap image channels without validating buffer sizes, causing a heap-based buffer overflow during decoding.
- Function pointer corruption: The overflow overwrites adjacent heap memory, including a function pointer used by the decoder or cleanup routines.
- Arbitrary code execution: When the corrupted function pointer is later dereferenced, execution flow is redirected, resulting in arbitrary code execution within the vLLM process.
When chained together, these flaws allow reliable, unauthenticated remote code execution on vLLM deployments that enable video processing.
Information Disclosure through Error Handling
When vLLM receives malformed image or video input, it relies on Python imaging libraries to parse the data. In vulnerable versions, error messages generated during this process are returned directly to the client.
These error messages can include raw object representations containing heap memory addresses, for example:
cannot identify image file <_io.BytesIO object at 0x7a95e299e750>
This error message leaks a precise heap address from the vLLM process. Address Space Layout Randomization (ASLR) is a key defense against memory corruption exploits, and it only works while those addresses remain secret. By leaking heap addresses, vLLM effectively hands attackers the information needed to bypass ASLR and precisely target memory locations during exploitation.
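The leak is easy to reproduce with nothing but the standard library: Python’s default repr of a BytesIO object embeds the object’s heap address, so any error path that echoes that repr back to the client gives the attacker a live address. A minimal sketch:

```python
import io
import re

# Simulate the vulnerable error path: the default repr of a file-like
# object embeds its heap address.
buf = io.BytesIO(b"not-a-real-image")
error_message = f"cannot identify image file {buf!r}"

# From the client side, a single regex recovers a live address inside
# the server process -- exactly what is needed to defeat ASLR.
match = re.search(r"0x[0-9a-f]+", error_message)
leaked_address = int(match.group(0), 16)
print(hex(leaked_address))
```

A straightforward server-side fix is to strip or genericize object reprs before returning parse errors to clients, which is why sanitized error handling matters as much as the memory-safety bug itself.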
Heap Buffer Overflow in Video Decoding
The second flaw resides in vLLM’s video processing pipeline. When handling video inputs, vLLM uses OpenCV, which in turn relies on FFmpeg for decoding.
In affected versions, a vulnerability in FFmpeg’s JPEG2000 decoder allows a specially crafted video file to trigger a heap buffer overflow. The overflow occurs when pixel data is written beyond the bounds of an allocated buffer, overwriting adjacent memory structures.
In practice, this overflow can overwrite function pointers used during memory cleanup. Once control flow is redirected, the attacker can execute arbitrary commands within the vLLM process.
Chained Exploitation: From Input to Code Execution
An attacker can combine these two flaws into a reliable exploit chain:
- Send a malformed image or video to trigger an error response.
- Extract leaked heap addresses from the error message.
- Send a crafted JPEG2000 video payload.
- Use the known memory layout to overwrite function pointers.
- Achieve arbitrary code execution without authentication.
No credentials are required, and the attack can be carried out remotely against exposed vLLM endpoints.
Why This Matters for AI Security
CVE-2026-22778 highlights a growing reality: AI infrastructure inherits the full risk surface of traditional software stacks, including memory corruption, dependency vulnerabilities, and unsafe error handling.
As AI systems expand beyond text into images, video, and agent-driven workflows, these risks compound. Security controls must extend beyond model behavior to the infrastructure that serves them.
References
- GitHub Security Advisory. (2026, February 2). GHSA-4r2x-xpjr-7cvv: Remote code execution in vLLM via malicious video processing. GitHub. https://github.com/advisories/GHSA-4r2x-xpjr-7cvv
- National Vulnerability Database (NVD). (2026). CVE-2026-22778: Remote code execution in vLLM. National Institute of Standards and Technology. https://nvd.nist.gov/vuln/detail/CVE-2026-22778
- OX Security Research Team. (2026, February 2). CVE-2026-22778: vLLM RCE vulnerability analysis. OX Security Blog. https://www.ox.security/blog/cve-2026-22778-vllm-rce-vulnerability/
- Underhill, K. (2026, February 2). Critical vLLM flaw puts AI systems at risk of remote code execution. eSecurity Planet. https://www.esecurityplanet.com/artificial-intelligence/critical-vllm-flaw-puts-ai-systems-at-risk-of-remote-code-execution/
- Schwake, E. (2025, December 5). Critical vLLM flaw exposes the soft underbelly of AI infrastructure. Salt Security Blog. https://salt.security/blog/critical-vllm-flaw-exposes-the-soft-underbelly-of-ai-infrastructure