Summary
Workarounds
There is no way to workaround this issue in existing versions of vLLM other than preventing untrusted access to the OpenAI compatible API server.
References
Impact
This report is to highlight a vulnerability in XGrammar, a library used by the structured output feature in vLLM. The XGrammar advisory is here: https://github.com/mlc-ai/xgrammar/security/advisories/GHSA-389x-67px-mjg3
The xgrammar library is the default backend used by vLLM to support structured output (a.k.a. guided decoding). Xgrammar provides a required, built-in cache for its compiled grammars stored in RAM. xgrammar is available by default through the OpenAI compatible API server with both the V0 and V1 engines.
A malicious user can send a stream of very short decoding requests with unique schemas, resulting in an addition to the cache for each request. This can result in a Denial of Service by consuming all of the system's RAM.
Note that even if vLLM was configured to use a different backend by default, it is still possible to choose xgrammar on a per-request basis using the guided_decoding_backend key of the extra_body field of the request with the V0 engine. This per-request choice is not available when using the V1 engine.
The application allocates resources such as memory, threads, or file descriptors based on untrusted input without enforcing a cap. Typical impact: resource exhaustion leading to denial of service.
GHSA-HF3C-WXG2-49Q9 has a CVSS score of 6.5 (Medium). The vector is network-reachable, low privileges required, and no user interaction. A CVSS score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether this affects your application depends on whether the vulnerable code is present and reachable in your environment. A fixed version is available (0.8.4); upgrading removes the vulnerable code path.
Affected versions
Security releases
Kodem intelligence
Severity tells you how bad this could be in the worst case. It does not tell you whether you are exposed. Exploitability and impact are functions of runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A vulnerable package can sit in your dependency tree and never run.
Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter. Kodem's runtime-powered SCA identifies whether this CVE is reachable in your applications.
Remediation advice
Frequently Asked Questions
- What is GHSA-HF3C-WXG2-49Q9? GHSA-HF3C-WXG2-49Q9 is a medium-severity allocation of resources without limits or throttling vulnerability in vllm (pip), affecting versions >= 0.6.5, < 0.8.4. It is fixed in 0.8.4. The application allocates resources such as memory, threads, or file descriptors based on untrusted input without enforcing a cap.
- How severe is GHSA-HF3C-WXG2-49Q9? GHSA-HF3C-WXG2-49Q9 has a CVSS score of 6.5 (Medium). This score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether it represents real risk in your environment depends on whether the vulnerable code is present and reachable.
- Which versions of vllm are affected by GHSA-HF3C-WXG2-49Q9? vllm (pip) versions >= 0.6.5, < 0.8.4 is affected.
- Is there a fix for GHSA-HF3C-WXG2-49Q9? Yes. GHSA-HF3C-WXG2-49Q9 is fixed in 0.8.4. Upgrade to this version or later.
- Is GHSA-HF3C-WXG2-49Q9 exploitable, and should I be worried? Whether GHSA-HF3C-WXG2-49Q9 is exploitable in your environment depends on whether the vulnerable code is present and reachable. A CVSS score is a worst-case rating; it does not account for your specific deployment, configuration, or usage patterns. Kodem, an Intelligent Application Security platform, uses runtime intelligence to show which vulnerabilities actually execute in production, so you can focus on the ones that represent real risk. Get a demo
- What actually determines whether GHSA-HF3C-WXG2-49Q9 is exploitable, and how bad it is? Exploitability and impact are not fixed properties of a CVE. They depend on runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A high CVSS score on a dependency that never runs is not the same as real risk. Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter.
- How do I fix GHSA-HF3C-WXG2-49Q9? Upgrade
vllmto 0.8.4 or later.