Summary
The /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests
Details
In serving_engine.py, the chat_template_kwargs are unpacked into kwargs passed to chat_utils.py apply_hf_chat_template with no validation on the keys or values in that chat_template_kwargs dict. This means they can be used to override optional parameters in the apply_hf_chat_template method, such as tokenize, changing its default from False to True.
Both serving_chat.py and serving_tokenization.py call into this _preprocess_chat method of serving_engine.py and they both pass in chat_template_kwargs.
So, a chat_template_kwargs like {"tokenize": True} makes tokenization happen as part of applying the chat template, even though that is not expected. Tokenization is a blocking operation, and with sufficiently large input can block the API server's event loop, which blocks handling of all other requests until this tokenization is complete.
This optional tokenize parameter to apply_hf_chat_template does not appear to be used, so one option would be to just hard-code that to always be False instead of allowing it to be optionally overridden by callers. A better option may be to not pass chat_template_kwargs as unpacked kwargs but instead as a dict, and only unpack them after the logic in apply_hf_chat_template that resolves the kwargs against the chat template.
Impact
Any authenticated user can cause a denial of service to a vLLM server with Chat Completion or Tokenize requests.
The application allocates resources such as memory, threads, or file descriptors based on untrusted input without enforcing a cap. Typical impact: resource exhaustion leading to denial of service.
CVE-2025-62426 has a CVSS score of 6.5 (Medium). The vector is network-reachable, low privileges required, and no user interaction. A CVSS score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether this affects your application depends on whether the vulnerable code is present and reachable in your environment. A fixed version is available (0.11.1); upgrading removes the vulnerable code path.
Affected versions
Security releases
Kodem intelligence
Severity tells you how bad this could be in the worst case. It does not tell you whether you are exposed. Exploitability and impact are functions of runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A vulnerable package can sit in your dependency tree and never run.
Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter. Kodem's runtime-powered SCA identifies whether this CVE is reachable in your applications.
Remediation advice
Frequently Asked Questions
- What is CVE-2025-62426? CVE-2025-62426 is a medium-severity allocation of resources without limits or throttling vulnerability in vllm (pip), affecting versions >= 0.5.5, < 0.11.1. It is fixed in 0.11.1. The application allocates resources such as memory, threads, or file descriptors based on untrusted input without enforcing a cap.
- How severe is CVE-2025-62426? CVE-2025-62426 has a CVSS score of 6.5 (Medium). This score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether it represents real risk in your environment depends on whether the vulnerable code is present and reachable.
- Which versions of vllm are affected by CVE-2025-62426? vllm (pip) versions >= 0.5.5, < 0.11.1 is affected.
- Is there a fix for CVE-2025-62426? Yes. CVE-2025-62426 is fixed in 0.11.1. Upgrade to this version or later.
- Is CVE-2025-62426 exploitable, and should I be worried? Whether CVE-2025-62426 is exploitable in your environment depends on whether the vulnerable code is present and reachable. A CVSS score is a worst-case rating; it does not account for your specific deployment, configuration, or usage patterns. Kodem, an Intelligent Application Security platform, uses runtime intelligence to show which vulnerabilities actually execute in production, so you can focus on the ones that represent real risk. Get a demo
- What actually determines whether CVE-2025-62426 is exploitable, and how bad it is? Exploitability and impact are not fixed properties of a CVE. They depend on runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A high CVSS score on a dependency that never runs is not the same as real risk. Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter.
- How do I fix CVE-2025-62426? Upgrade
vllmto 0.11.1 or later.