CVE-2026-34756

CVE-2026-34756 is a medium-severity allocation of resources without limits or throttling vulnerability in vllm (pip), affecting versions >= 0.1.0, < 0.19.0. It is fixed in 0.19.0.

Summary

A Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the n parameter in the ChatCompletionRequest and CompletionRequest Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large n value. This completely blocks the Python asyncio event loop and causes immediate Out-Of-Memory crashes by allocating millions of request object copies in the heap before the request even reaches the scheduling queue.

Details

The root cause of this vulnerability lies in the missing upper bound checks across the request parsing and asynchronous scheduling layers:

  1. Protocol Layer:
    In vllm/entrypoints/openai/chat_completion/protocol.py, the n parameter is defined simply as an integer without any pydantic.Field constraints for an upper bound.
class ChatCompletionRequest(OpenAIBaseModel):
    # Ordered by official OpenAI API documentation
    # https://platform.openai.com/docs/api/reference/chat/create
    messages: list[ChatCompletionMessageParam]
    model: str | None = None
    frequency_penalty: float | None = 0.0
    logit_bias: dict[str, float] | None = None
    logprobs: bool | None = False
    top_logprobs: int | None = 0
    max_tokens: int | None = Field(
        default=None,
        deprecated="max_tokens is deprecated in favor of "
        "the max_completion_tokens field",
    )
    max_completion_tokens: int | None = None
    n: int | None = 1
    presence_penalty: float | None = 0.0
  1. SamplingParams Layer (Incomplete Validation):
    When the API request is converted to internal SamplingParams in vllm/sampling_params.py, the _verify_args method only checks the lower bound (self.n < 1), entirely omitting an upper bounds check.
    def _verify_args(self) -> None:
        if not isinstance(self.n, int):
            raise ValueError(f"n must be an int, but is of type {type(self.n)}")
        if self.n < 1:
            raise ValueError(f"n must be at least 1, got {self.n}.")
  1. Engine Layer (The OOM Trigger):
    When the malicious request reaches the core engine (vllm/v1/engine/async_llm.py), the engine attempts to fan out the request n times to generate identical independent sequences within a synchronous loop.
        # Fan out child requests (for n>1).
        parent_request = ParentRequest(request)
        for idx in range(parent_params.n):
            request_id, child_params = parent_request.get_child_info(idx)
            child_request = request if idx == parent_params.n - 1 else copy(request)
            child_request.request_id = request_id
            child_request.sampling_params = child_params
            await self._add_request(
                child_request, prompt_text, parent_request, idx, queue
            )
        return queue

Because Python's asyncio runs on a single thread and event loop, this monolithic for-loop monopolizes the CPU thread. The server stops responding to all other connections (including liveness probes). Simultaneously, the memory allocator is overwhelmed by cloning millions of request object instances via copy(request), driving the host's Resident Set Size (RSS) up by gigabytes per second until the OS OOM-killer terminates the vLLM process.

Impact

Vulnerability Type: Resource Exhaustion / Denial of Service

Impacted Parties:

  • Any individual or organization hosting a public-facing vLLM API server (vllm.entrypoints.openai.api_server), which happens to be the primary entrypoint for OpenAI-compatible setups.
  • SaaS / AI-as-a-Service platforms acting as reverse proxies sitting in front of vLLM without strict HTTP body payload validation or rate limitations.

Because this vulnerability exploits the control plane rather than the data plane, an unauthenticated remote attacker can achieve a high success rate in taking down production inference hosts with a single HTTP request. This effectively circumvents any hardware-level capacity planning and conventional bandwidth stress limitations.

The application allocates resources such as memory, threads, or file descriptors based on untrusted input without enforcing a cap. Typical impact: resource exhaustion leading to denial of service.

CVE-2026-34756 has a CVSS score of 6.5 (Medium). The vector is network-reachable, low privileges required, and no user interaction. A CVSS score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether this affects your application depends on whether the vulnerable code is present and reachable in your environment. A fixed version is available (0.19.0); upgrading removes the vulnerable code path.

Affected versions

vllm (>= 0.1.0, < 0.19.0)

Security releases

vllm → 0.19.0 (pip)

Kodem intelligence

Severity tells you how bad this could be in the worst case. It does not tell you whether you are exposed. Exploitability and impact are functions of runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A vulnerable package can sit in your dependency tree and never run.

Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter. Kodem's runtime-powered SCA identifies whether this CVE is reachable in your applications.

See it in your environment

Remediation advice

Upgrade vllm to 0.19.0 or later to resolve this vulnerability.

Kodem Kai can prioritize this vulnerability in your dependency tree and generate a fix recommendation.

Frequently Asked Questions

  1. What is CVE-2026-34756? CVE-2026-34756 is a medium-severity allocation of resources without limits or throttling vulnerability in vllm (pip), affecting versions >= 0.1.0, < 0.19.0. It is fixed in 0.19.0. The application allocates resources such as memory, threads, or file descriptors based on untrusted input without enforcing a cap.
  2. How severe is CVE-2026-34756? CVE-2026-34756 has a CVSS score of 6.5 (Medium). This score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether it represents real risk in your environment depends on whether the vulnerable code is present and reachable.
  3. Which versions of vllm are affected by CVE-2026-34756? vllm (pip) versions >= 0.1.0, < 0.19.0 is affected.
  4. Is there a fix for CVE-2026-34756? Yes. CVE-2026-34756 is fixed in 0.19.0. Upgrade to this version or later.
  5. Is CVE-2026-34756 exploitable, and should I be worried? Whether CVE-2026-34756 is exploitable in your environment depends on whether the vulnerable code is present and reachable. A CVSS score is a worst-case rating; it does not account for your specific deployment, configuration, or usage patterns. Kodem, an Intelligent Application Security platform, uses runtime intelligence to show which vulnerabilities actually execute in production, so you can focus on the ones that represent real risk. Get a demo
  6. What actually determines whether CVE-2026-34756 is exploitable, and how bad it is? Exploitability and impact are not fixed properties of a CVE. They depend on runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A high CVSS score on a dependency that never runs is not the same as real risk. Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter.
  7. How do I fix CVE-2026-34756? Upgrade vllm to 0.19.0 or later.

Other vulnerabilities in vllm

CVE-2026-54233CVE-2026-54236CVE-2026-53923CVE-2026-12491CVE-2026-48746

Stop the waste.
Protect your environment with Kodem.