Summary
This issue arises from the prefix caching mechanism, which may expose the system to a timing side-channel attack.
Description
When a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). Our tests revealed that the timing differences caused by matching chunks are significant enough to be recognized and exploited.
For instance, if the victim has submitted a sensitive prompt or if a valuable system prompt has been cached, an attacker sharing the same backend could attempt to guess the victim's input. By measuring the TTFT based on prefix matches, the attacker could verify if their guess is correct, leading to potential leakage of private information.
Unlike token-by-token sharing mechanisms, vLLM’s chunk-based approach (PageAttention) processes tokens in larger units (chunks). In our tests, with chunk_size=2, the timing differences became noticeable enough to allow attackers to infer whether portions of their input match the victim's prompt at the chunk level.
Environment
- GPU: NVIDIA A100 (40G)
- CUDA: 11.8
- PyTorch: 2.3.1
- OS: Ubuntu 18.04
- vLLM: v0.5.1
Configuration: We launched vLLM using the default settings and adjusted chunk_size=2 to evaluate the TTFT.
Leakage
We conducted our tests using LLaMA2-70B-GPTQ on a single device. We analyzed the timing differences when prompts shared prefixes of 2 chunks, and plotted the corresponding ROC curves. Our results suggest that timing differences can be reliably used to distinguish prefix matches, demonstrating a potential side-channel vulnerability.
Results
In our experiment, we analyzed the response time differences between cache hits and misses in vLLM's PageAttention mechanism. Using ROC curve analysis to assess the distinguishability of these timing differences, we observed the following results:
- With a 1-token prefix, the ROC curve yielded an AUC value of 0.571, indicating that even with a short prefix, an attacker can reasonably distinguish between cache hits and misses based on response times.
- When the prefix length increases to 8 tokens, the AUC value rises significantly to 0.99, showing that the attacker can almost perfectly identify cache hits with a longer prefix.
Fixes
Impact
CVE-2025-46570 has a CVSS score of 2.6 (Low). The vector is network-reachable, low privileges required, and user interaction required. A CVSS score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether this affects your application depends on whether the vulnerable code is present and reachable in your environment. A fixed version is available (0.9.0); upgrading removes the vulnerable code path.
Affected versions
Security releases
Kodem intelligence
Severity tells you how bad this could be in the worst case. It does not tell you whether you are exposed. Exploitability and impact are functions of runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A vulnerable package can sit in your dependency tree and never run.
Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter. Kodem's runtime-powered SCA identifies whether this CVE is reachable in your applications.
Remediation advice
Kodem Kai can prioritize this vulnerability in your dependency tree and generate a fix recommendation.
Frequently Asked Questions
- What is CVE-2025-46570? CVE-2025-46570 is a low-severity security vulnerability in vllm (pip), affecting versions < 0.9.0. It is fixed in 0.9.0.
- How severe is CVE-2025-46570? CVE-2025-46570 has a CVSS score of 2.6 (Low). This score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether it represents real risk in your environment depends on whether the vulnerable code is present and reachable.
- Which versions of vllm are affected by CVE-2025-46570? vllm (pip) versions < 0.9.0 is affected.
- Is there a fix for CVE-2025-46570? Yes. CVE-2025-46570 is fixed in 0.9.0. Upgrade to this version or later.
- Is CVE-2025-46570 exploitable, and should I be worried? Whether CVE-2025-46570 is exploitable in your environment depends on whether the vulnerable code is present and reachable. A CVSS score is a worst-case rating; it does not account for your specific deployment, configuration, or usage patterns. Kodem, an Intelligent Application Security platform, uses runtime intelligence to show which vulnerabilities actually execute in production, so you can focus on the ones that represent real risk. Get a demo
- What actually determines whether CVE-2025-46570 is exploitable, and how bad it is? Exploitability and impact are not fixed properties of a CVE. They depend on runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A high CVSS score on a dependency that never runs is not the same as real risk. Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter.
- How do I fix CVE-2025-46570? Upgrade
vllmto 0.9.0 or later.