CVE-2021-43854

CVE-2021-43854 is a high-severity uncontrolled resource consumption vulnerability in nltk (pip), affecting versions < 3.6.6. It is fixed in 3.6.6.

Workarounds

The execution time of the vulnerable functions grows much faster than linearly with the length of a malicious input. Because the cost depends on input length, the execution time can be bounded by limiting the maximum length of an input to any of the vulnerable functions. Our recommendation is to implement such a limit.
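
A minimal sketch of such a limit follows, assuming the check runs before every call to `word_tokenize`, `sent_tokenize`, or a `PunktSentenceTokenizer` method. The wrapper name `bounded_tokenize` and the 1,000-character default are hypothetical. In the measurements under Impact, 1,000-character inputs finished in milliseconds on a vulnerable version, but other crafted inputs may behave differently, so choose and test a limit that suits your own data.

```python
from typing import Callable, List

# Hypothetical limit; tune it for the inputs your application actually receives.
MAX_TOKENIZE_CHARS = 1_000


def bounded_tokenize(
    text: str,
    tokenize: Callable[[str], List[str]],
    max_chars: int = MAX_TOKENIZE_CHARS,
) -> List[str]:
    """Call `tokenize(text)` only when `text` is at most `max_chars` characters.

    Rejecting oversized input up front bounds the worst-case running time of
    a tokenizer whose cost grows with input length.
    """
    if len(text) > max_chars:
        raise ValueError(
            f"input has {len(text)} characters; the limit is {max_chars}"
        )
    return tokenize(text)
```

With NLTK installed, a call would look like `bounded_tokenize(user_text, word_tokenize)` after `from nltk.tokenize import word_tokenize`. Rejecting rather than truncating is a design choice: truncation silently changes the tokens returned, whereas an explicit error lets the caller decide how to handle oversized input.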

Impact

The vulnerability is present in PunktSentenceTokenizer, sent_tokenize and word_tokenize. Any users of this class, or these two functions, are vulnerable to a Regular Expression Denial of Service (ReDoS) attack.
In short, a specially crafted long input to any of these vulnerable functions causes them to take a significant amount of execution time. The effect is easy to observe with the following example:

import time

from nltk.tokenize import word_tokenize

n = 8
for length in [10**i for i in range(2, n)]:
    # Prepare a malicious input
    text = "a" * length
    start_t = time.time()
    # Call `word_tokenize` and naively measure the execution time
    word_tokenize(text)
    print(f"A length of {length:<{n}} takes {time.time() - start_t:.4f}s")

This produced the following output during testing:

A length of 100      takes 0.0060s
A length of 1000     takes 0.0060s
A length of 10000    takes 0.6320s
A length of 100000   takes 56.3322s
...

I canceled the execution of the program after running it for several hours.

If your program relies on any of the vulnerable functions for tokenizing unpredictable user input, we strongly recommend upgrading to a version of NLTK without the vulnerability, or applying the workaround described under Workarounds.

Crafted input forces the application to consume excessive CPU, memory, or other resources, degrading or denying service. Here the exhausted resource is CPU time spent in regular-expression matching, and the typical impact is denial of service.

CVE-2021-43854 has a CVSS score of 7.5 (High). The attack vector is network, and exploitation requires no privileges and no user interaction. A CVSS score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether this affects your application depends on whether the vulnerable code is present and reachable in your environment. A fixed version is available (3.6.6); upgrading removes the vulnerable code path.
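
To show where the 7.5 comes from, here is a sketch of the CVSS v3.1 base-score arithmetic for a scope-unchanged vector. The text above only states a network attack vector, no privileges, and no user interaction; the remaining metrics (low attack complexity, unchanged scope, and a High availability impact with no confidentiality or integrity impact) are assumptions, chosen because they fit a denial-of-service flaw and reproduce the published score. The function names are hypothetical; the weights and the `roundup` rule follow the CVSS v3.1 specification.

```python
import math

# CVSS v3.1 metric weights for the metrics assumed here.
AV_NETWORK = 0.85
AC_LOW = 0.77       # assumption: attack complexity Low
PR_NONE = 0.85      # privileges required: None
UI_NONE = 0.85      # user interaction: None
CIA_NONE = 0.0
CIA_HIGH = 0.56     # assumption: availability impact High


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score_scope_unchanged(av: float, ac: float, pr: float, ui: float,
                               c: float, i: float, a: float) -> float:
    """Base score for a vector whose scope is Unchanged."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)        # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_NETWORK, AC_LOW, PR_NONE, UI_NONE, CIA_NONE, CIA_NONE, CIA_HIGH
)
print(score)  # prints 7.5
```

Under these assumptions the impact term is about 3.60 and the exploitability term about 3.89; their sum, 7.48, rounds up to 7.5.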

Affected versions

nltk (< 3.6.6)

Security releases

nltk → 3.6.6 (pip)
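
A quick way to confirm that an environment has the fixed release is to compare the installed version against 3.6.6. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the helper names are hypothetical. Its simple parser ignores pre-release and development suffixes (it would treat `3.6.6rc1` as `3.6.6`), so `packaging.version.Version` is the better choice where full PEP 440 ordering matters.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

FIXED_VERSION = (3, 6, 6)  # first nltk release with the fix, per this advisory


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '3.6.5' -> (3, 6, 5)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_patched(version: str) -> bool:
    """True if `version` is 3.6.6 or later (suffixes such as 'rc1' are ignored)."""
    parts = release_tuple(version)
    # Pad to three components so that "3.7" compares as (3, 7, 0).
    parts = parts + (0,) * (3 - len(parts))
    return parts >= FIXED_VERSION


def installed_nltk_is_patched() -> Optional[bool]:
    """None if nltk is not installed, otherwise whether it includes the fix."""
    try:
        return is_patched(metadata.version("nltk"))
    except metadata.PackageNotFoundError:
        return None
```

For example, `is_patched("3.6.5")` is `False`, while `is_patched("3.6.6")` and `is_patched("3.7")` are `True`.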

Kodem intelligence

Severity tells you how bad this could be in the worst case. It does not tell you whether you are exposed. Exploitability and impact are functions of runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A vulnerable package can sit in your dependency tree and never run.

Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter. Kodem's runtime-powered SCA identifies whether this CVE is reachable in your applications.

Remediation advice

The problem has been patched in NLTK 3.6.6. After the fix, running the above program gives the following result:

A length of 100      takes 0.0070s
A length of 1000     takes 0.0010s
A length of 10000    takes 0.0060s
A length of 100000   takes 0.0400s
A length of 1000000  takes 0.3520s
A length of 10000000 takes 3.4641s

Execution time now grows linearly with input length, which is the desired behavior for a regular expression.
We recommend updating to NLTK 3.6.6+ if possible.
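
The contrast between the two runs can be quantified by assuming execution time follows t ∝ L^k and estimating k from each pair of consecutive rows as log(t2/t1) / log(L2/L1). The sketch below uses only the timings reported in this advisory; no new measurements are involved, and `growth_exponents` is a hypothetical helper name.

```python
import math
from typing import List, Tuple

# Timings copied from the two runs in this advisory: (input length, seconds).
BEFORE_FIX = [(100, 0.0060), (1_000, 0.0060), (10_000, 0.6320),
              (100_000, 56.3322)]
AFTER_FIX = [(100, 0.0070), (1_000, 0.0010), (10_000, 0.0060),
             (100_000, 0.0400), (1_000_000, 0.3520), (10_000_000, 3.4641)]


def growth_exponents(rows: List[Tuple[int, float]]) -> List[float]:
    """Estimate k in t ~ L**k for each pair of consecutive rows."""
    return [
        math.log(t2 / t1) / math.log(l2 / l1)
        for (l1, t1), (l2, t2) in zip(rows, rows[1:])
    ]


for label, rows in (("before", BEFORE_FIX), ("after", AFTER_FIX)):
    ks = ", ".join(f"{k:.2f}" for k in growth_exponents(rows))
    print(f"{label}: {ks}")
```

The last pair before the fix gives k of about 1.95, roughly quadratic, so each tenfold increase in length costs about a hundredfold more time. After the fix, the two largest pairs give about 0.94 and 0.99, consistent with linear growth. The smallest inputs are dominated by timing noise, so their estimates are not meaningful.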

Frequently Asked Questions

  1. What is CVE-2021-43854? CVE-2021-43854 is a high-severity uncontrolled resource consumption vulnerability in nltk (pip), affecting versions < 3.6.6. It is fixed in 3.6.6. Crafted input forces the application to consume excessive CPU, memory, or other resources, degrading or denying service.
  2. How severe is CVE-2021-43854? CVE-2021-43854 has a CVSS score of 7.5 (High). This score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether it represents real risk in your environment depends on whether the vulnerable code is present and reachable.
  3. Which versions of nltk are affected by CVE-2021-43854? nltk (pip) versions < 3.6.6 are affected.
  4. Is there a fix for CVE-2021-43854? Yes. CVE-2021-43854 is fixed in 3.6.6. Upgrade to this version or later.
  5. Is CVE-2021-43854 exploitable, and should I be worried? Whether CVE-2021-43854 is exploitable in your environment depends on whether the vulnerable code is present and reachable. A CVSS score is a worst-case rating; it does not account for your specific deployment, configuration, or usage patterns. Kodem, an Intelligent Application Security platform, uses runtime intelligence to show which vulnerabilities actually execute in production, so you can focus on the ones that represent real risk.
  6. What actually determines whether CVE-2021-43854 is exploitable, and how bad it is? Exploitability and impact are not fixed properties of a CVE. They depend on runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A high CVSS score on a dependency that never runs is not the same as real risk. Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter.
  7. How do I fix CVE-2021-43854? Upgrade nltk to 3.6.6 or later.

Other vulnerabilities in nltk

CVE-2026-54293, CVE-2026-33236, CVE-2026-33230, CVE-2026-0846, CVE-2026-0847
