Summary
A memory corruption vulnerability that leading to a crash (denial-of-service) and potentially remote code execution (RCE) exists in vLLM versions 0.10.2 and later, in the Completions API endpoint. When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation.
Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default. As a result, maliciously crafted tensors can bypass internal bounds checks and trigger an out-of-bounds memory write during the call to to_dense(). This memory corruption can crash vLLM and potentially lead to code execution on the server hosting vLLM.
Details
A vulnerability that can lead to RCE from the completions API endpoint exists in vllm, where due to missing checks when loading user-provided tensors, an out-of-bounds write can be triggered. This happens because the default behavior of torch.load(tensor, weights_only=True) since pytorch 2.8.0 is to not perform validity checks for sparse tensors, and this needs to be enabled explicitly using the torch.sparse.check_sparse_tensor_invariants context manager.
The vulnerability is in the following code in vllm/entrypoints/renderer.py:148
def _load_and_validate_embed(embed: bytes) -> EngineEmbedsPrompt:
tensor = torch.load(
io.BytesIO(pybase64.b64decode(embed, validate=True)),
weights_only=True,
map_location=torch.device("cpu"),
)
assert isinstance(tensor, torch.Tensor) and tensor.dtype in (
torch.float32,
torch.bfloat16,
torch.float16,
)
tensor = tensor.to_dense()
Because of the missing checks, loading invalid prompt embedding tensors provided by the user can cause an out-of-bounds write in the call to to_dense .
Acknowledgements
Finder: AXION Security Research Team (Omri Fainaro, Bary Levy): discovery and coordinated disclosure.
Impact
All users with access to this API are able to exploit this vulnerability. Unsafe deserialization of untrusted input can be abused to achieve DoS and potentially remote code execution (RCE) in the vLLM server process. This impacts deployments running vLLM as a server or any instance that deserializes untrusted/model-provided payloads.
The application does not adequately validate input before processing it, allowing unexpected values to reach sensitive code paths. Typical impact: varies by context: data corruption, logic bypass, or denial of service.
CVE-2025-62164 has a CVSS score of 8.8 (High). The vector is network-reachable, low privileges required, and no user interaction. A CVSS score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether this affects your application depends on whether the vulnerable code is present and reachable in your environment. A fixed version is available (0.11.1); upgrading removes the vulnerable code path.
Affected versions
Security releases
Kodem intelligence
Severity tells you how bad this could be in the worst case. It does not tell you whether you are exposed. Exploitability and impact are functions of runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A vulnerable package can sit in your dependency tree and never run.
Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter. Kodem's runtime-powered SCA identifies whether this CVE is reachable in your applications.
Remediation advice
Frequently Asked Questions
- What is CVE-2025-62164? CVE-2025-62164 is a high-severity improper input validation vulnerability in vllm (pip), affecting versions >= 0.10.2, < 0.11.1. It is fixed in 0.11.1. The application does not adequately validate input before processing it, allowing unexpected values to reach sensitive code paths.
- How severe is CVE-2025-62164? CVE-2025-62164 has a CVSS score of 8.8 (High). This score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether it represents real risk in your environment depends on whether the vulnerable code is present and reachable.
- Which versions of vllm are affected by CVE-2025-62164? vllm (pip) versions >= 0.10.2, < 0.11.1 is affected.
- Is there a fix for CVE-2025-62164? Yes. CVE-2025-62164 is fixed in 0.11.1. Upgrade to this version or later.
- Is CVE-2025-62164 exploitable, and should I be worried? Whether CVE-2025-62164 is exploitable in your environment depends on whether the vulnerable code is present and reachable. A CVSS score is a worst-case rating; it does not account for your specific deployment, configuration, or usage patterns. Kodem, an Intelligent Application Security platform, uses runtime intelligence to show which vulnerabilities actually execute in production, so you can focus on the ones that represent real risk. Get a demo
- What actually determines whether CVE-2025-62164 is exploitable, and how bad it is? Exploitability and impact are not fixed properties of a CVE. They depend on runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A high CVSS score on a dependency that never runs is not the same as real risk. Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter.
- How do I fix CVE-2025-62164? Upgrade
vllmto 0.11.1 or later.