CVE-2026-22778: Critical Remote Code Execution in vLLM Multimodal Inference

Kodem Security Research Team
February 3, 2026

A critical pre-authenticated remote code execution (RCE) vulnerability, tracked as CVE-2026-22778 (CVSS 9.8), has been discovered in vLLM, a widely used inference and serving engine for large language models.

Publicly exposed vLLM deployments running video models are vulnerable to full server compromise. An attacker can trigger the flaw by submitting a malicious video link to vLLM’s API, resulting in arbitrary command execution on the underlying system without authentication.

The vulnerability affects vLLM versions 0.8.3 through 0.14.0 and stems from a chained exploit that combines an information disclosure flaw in error handling with a heap buffer overflow in a bundled video decoding dependency. When exploited together, these weaknesses allow attackers to bypass memory protections and gain code execution within the vLLM process.

While the issue is limited to deployments that enable multimodal video processing, the default exposure model of vLLM makes this vulnerability particularly dangerous for internet-facing inference services. 

Affected Versions

vLLM v0.8.3 through v0.14.0 are affected. The fix ships in v0.14.1.
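The affected range can be checked with a short, standard-library-only helper. This is an illustrative sketch, not a packaging-grade version comparison: `is_affected` and its ad-hoc parsing are hypothetical names, and pre-release suffixes are simply truncated.

```python
# Illustrative helper (not part of vLLM): compare a version string
# against the affected range 0.8.3 <= v <= 0.14.0.
def parse(v: str) -> tuple[int, ...]:
    # Keep only the numeric dotted prefix (e.g. "0.14.0rc1" -> (0, 14, 0)).
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def is_affected(version: str) -> bool:
    v = parse(version)
    return parse("0.8.3") <= v <= parse("0.14.0")

print(is_affected("0.9.1"))   # in the affected range
print(is_affected("0.14.1"))  # patched release
```

In production, prefer `packaging.version.Version` for the comparison; the tuple compare above is only meant to make the affected window concrete.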

Immediate Actions

  1. Upgrade to vLLM v0.14.1 or later.
  2. Verify OpenCV is updated to a patched version.
  3. Reduce exposure of inference APIs by blocking multimodal requests containing a video_url parameter to the following endpoints:
    1. POST /v1/chat/completions
    2. POST /v1/invocations
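The blocking recommendation in step 3 can be enforced at a proxy or middleware layer before requests reach vLLM. The sketch below is a minimal, framework-agnostic filter, assuming the OpenAI-style multimodal schema in which video inputs appear as content parts with a `video_url` key; `should_block` and `contains_video_url` are illustrative names, not vLLM APIs.

```python
import json

# Endpoints named in the advisory's mitigation guidance.
BLOCKED_PATHS = {"/v1/chat/completions", "/v1/invocations"}

def contains_video_url(obj) -> bool:
    """Recursively search a parsed JSON body for a video_url part."""
    if isinstance(obj, dict):
        if "video_url" in obj or obj.get("type") == "video_url":
            return True
        return any(contains_video_url(v) for v in obj.values())
    if isinstance(obj, list):
        return any(contains_video_url(item) for item in obj)
    return False

def should_block(path: str, raw_body: bytes) -> bool:
    """Return True if the request targets a risky endpoint with video input."""
    if path not in BLOCKED_PATHS:
        return False
    try:
        body = json.loads(raw_body)
    except (ValueError, UnicodeDecodeError):
        return False  # let the backend reject malformed JSON itself
    return contains_video_url(body)
```

A reverse proxy (or an ASGI middleware wrapping the server) would call `should_block` and return HTTP 403 on a match, leaving text-only traffic untouched.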

If You Can’t Patch Immediately

  1. Disable video and multimodal endpoints.
  2. Restrict access to inference APIs to trusted internal users or services.

What is vLLM?

vLLM is an open-source, high-throughput inference engine designed to efficiently serve large language models across cloud and self-hosted environments. It is widely adopted for running LLMs at scale due to its performance and memory efficiency, particularly under concurrent workloads.

As vLLM adoption has expanded beyond text-only inference to support multimodal inputs such as images and video, it has become increasingly exposed through public-facing APIs. This expanded attack surface is central to CVE-2026-22778, which affects deployments that enable video processing.

By chaining an information disclosure flaw with a heap buffer overflow in a bundled video decoding dependency, an unauthenticated attacker can achieve arbitrary code execution on vulnerable vLLM systems.

Technical Details

CVE-2026-22778 is not a single bug, but a chained exploit that combines an information disclosure issue with a heap-based buffer overflow in vLLM’s video processing pipeline. The exploit unfolds across multiple layers of the inference stack as follows:

  1. API request handling: An attacker sends a request to vLLM’s Completions or Invocations API containing a video_url, triggering the multimodal video processing path without requiring authentication.
  2. Video ingestion via OpenCV: vLLM processes the supplied video using OpenCV’s cv2.VideoCapture() interface, which delegates decoding to a bundled FFmpeg library.
  3. JPEG2000 decoding in FFmpeg: FFmpeg invokes the JPEG2000 decoder libopenjp2 to parse video frames, trusting attacker-controlled metadata embedded in the file structure.
  4. Heap buffer overflow through crafted cdef box: A malicious JPEG2000 file abuses the channel definition cdef box to remap image channels without validating buffer sizes, causing a heap-based buffer overflow during decoding.
  5. Function pointer corruption: The overflow overwrites adjacent heap memory, including a function pointer used by the decoder or cleanup routines.
  6. Arbitrary code execution: When the corrupted function pointer is later dereferenced, execution flow is redirected, resulting in arbitrary code execution within the vLLM process.
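Step 4 above comes down to a missing bounds check: the decoder trusts channel indices taken from the cdef box. The sketch below parses a cdef payload per the JP2 box layout (a 16-bit entry count, then Cn/Typ/Asoc triples of 16 bits each) and rejects out-of-range channel indices. It mirrors the kind of validation the decoder lacked; `validate_cdef` and `num_components` are illustrative names, not openjpeg code.

```python
import struct

def validate_cdef(payload: bytes, num_components: int) -> bool:
    """Return True only if every cdef entry stays within bounds.

    JP2 'cdef' box payload layout (big-endian):
      N (uint16), then N entries of Cn (uint16), Typ (uint16), Asoc (uint16).
    """
    if len(payload) < 2:
        return False
    (n,) = struct.unpack_from(">H", payload, 0)
    if len(payload) != 2 + 6 * n:
        return False  # truncated or oversized box
    for i in range(n):
        cn, typ, asoc = struct.unpack_from(">HHH", payload, 2 + 6 * i)
        # The missing check: a channel index past the allocated
        # component buffers is what enables the out-of-bounds remap.
        if cn >= num_components:
            return False
    return True
```

A crafted file sets Cn far beyond the real component count, so the channel remap writes past the allocated buffers during decoding.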

When chained together, these flaws allow reliable, unauthenticated remote code execution on vLLM deployments that enable video processing.

Information Disclosure through Error Handling

When vLLM receives malformed image or video input, it relies on Python imaging libraries to parse the data. In vulnerable versions, error messages generated during this process are returned directly to the client.

These error messages can include raw object representations containing heap memory addresses, for example:

cannot identify image file <_io.BytesIO object at 0x7a95e299e750>

This leaks precise memory addresses from the vLLM process. Address Space Layout Randomization (ASLR) is a key defense against memory corruption exploits. By leaking heap addresses, vLLM effectively hands attackers the information needed to bypass ASLR and precisely target memory locations during exploitation.
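The leak is reproducible with nothing but the standard library: Python’s default object repr embeds the object’s heap address, and a single regex recovers it from the error string. This snippet simulates the kind of error text a vulnerable server echoes back; it does not exercise vLLM itself.

```python
import io
import re

# Simulate the error text a vulnerable server returns to the client:
# the default repr of a BytesIO object embeds its heap address.
buf = io.BytesIO(b"\x00not-an-image")
error_msg = f"cannot identify image file {buf!r}"

# An attacker only needs a regex to recover the leaked address.
match = re.search(r"0x[0-9a-fA-F]+", error_msg)
leaked = int(match.group(0), 16)
print(f"leaked heap address: {hex(leaked)}")
```

With one live address in hand, the attacker can infer the heap layout well enough to place the corrupted function pointer in the overflow stage.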

Heap Buffer Overflow in Video Decoding

The second flaw resides in vLLM’s video processing pipeline. When handling video inputs, vLLM uses OpenCV, which in turn relies on FFmpeg for decoding.

In affected versions, a vulnerability in FFmpeg’s JPEG2000 decoder allows a specially crafted video file to trigger a heap buffer overflow. The overflow occurs when pixel data is written beyond the bounds of an allocated buffer, overwriting adjacent memory structures.

In practice, this overflow can overwrite function pointers used during memory cleanup. Once control flow is redirected, the attacker can execute arbitrary commands within the vLLM process.

Chained Exploitation: From Input to Code Execution

An attacker can combine these two flaws into a reliable exploit chain:

  1. Send a malformed image or video to trigger an error response.
  2. Extract leaked heap addresses from the error message.
  3. Send a crafted JPEG2000 video payload.
  4. Use the known memory layout to overwrite function pointers.
  5. Achieve arbitrary code execution without authentication.

No credentials are required, and the attack can be carried out remotely against exposed vLLM endpoints.
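The first link of this chain can be cut server-side by scrubbing anything that looks like a pointer from error text before it reaches the client. The sketch below is a defensive illustration, not vLLM’s actual patch; `sanitize_error` is a hypothetical helper.

```python
import re

# Hex runs of 6-16 digits after "0x" are treated as probable pointers.
ADDR_RE = re.compile(r"0x[0-9a-fA-F]{6,16}")

def sanitize_error(message: str) -> str:
    """Mask anything that looks like a memory address so error
    responses no longer hand attackers an ASLR bypass."""
    return ADDR_RE.sub("0x[redacted]", message)

print(sanitize_error(
    "cannot identify image file <_io.BytesIO object at 0x7a95e299e750>"
))
# the address is masked; the rest of the message is preserved
```

Generic client-facing error messages (with full details kept in server logs) achieve the same effect more robustly.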

Why This Matters for AI Security

CVE-2026-22778 highlights a growing reality: AI infrastructure inherits the full risk surface of traditional software stacks, including memory corruption, dependency vulnerabilities, and unsafe error handling.

As AI systems expand beyond text into images, video, and agent-driven workflows, these risks compound. Security controls must extend beyond model behavior to the infrastructure that serves them.
