Nvidia has patched a critical vulnerability in its Container Toolkit (formerly known as Nvidia Docker).
The vulnerability — tracked as CVE-2024-0132 — has been assigned a CVSS score of 9 out of 10 and can allow a rogue user or application to break out of their dedicated container and gain full access to the underlying host.
“Nvidia Container Toolkit 1.16.1 or earlier contains a Time-of-check Time-of-Use (TOCTOU) vulnerability when used with default configuration where a specifically crafted container image may gain access to the host file system,” Nvidia said in a patch note posted on its Security Bulletin.
The company added that, under certain circumstances, the successful exploitation of the vulnerability might allow code execution, denial of service, escalation of privileges, information disclosure, and data tampering.
Time of Check Time of Use vulnerability
Nvidia Container Toolkit gives Nvidia containers access to GPU hardware. These containers are specialized software packages designed to simplify the deployment of applications, particularly for artificial intelligence and machine learning use cases. The toolkit includes the tools and libraries that let applications running inside containers use the GPU.
According to a blog post from Wiz Research, whose researchers Nvidia credited with discovering the vulnerability, the flaw lets an attacker who controls a container image run by the toolkit escape that container and gain full access to the host. A container image is a lightweight, standalone, executable package containing everything required to run an application.
The flaw is a “time of check time of use” (TOCTOU) bug, a race condition that occurs when a program checks a condition and then acts on the result of that check without ensuring the condition hasn’t changed in the interim.
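To make the pattern concrete, here is a minimal Python sketch of a generic TOCTOU race on a file path. It is not the exploit for CVE-2024-0132, whose details have not been published. Every name in it (read_checked, is_inside, the between hook, the file names) is hypothetical and exists only for illustration. The between callback stands in for an attacker who wins the race, so the demonstration is deterministic.

```python
import os
import tempfile


def is_inside(path, root):
    """Check step: does `path` currently resolve to a location under `root`?"""
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(path)
    return os.path.commonpath([real_path, real_root]) == real_root


def read_checked(path, root, between=lambda: None):
    """Vulnerable check-then-use: validates the path, then opens it by name later."""
    if not is_inside(path, root):      # time of check
        raise PermissionError(path)
    between()                          # window in which the file system can change
    with open(path) as f:              # time of use: the path is resolved again
        return f.read()


def demo():
    """Simulate an attacker swapping a file for a symlink inside the race window."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "container")    # hypothetical sandbox directory
        os.mkdir(root)
        secret = os.path.join(tmp, "host_secret")  # lives outside the sandbox
        with open(secret, "w") as f:
            f.write("host data")
        target = os.path.join(root, "file")
        with open(target, "w") as f:
            f.write("container data")

        def swap():
            # Replace the already-checked file with a link pointing outside root.
            os.remove(target)
            os.symlink(secret, target)

        return read_checked(target, root, between=swap)


if __name__ == "__main__":
    print(demo())
```

The check passes because the file is inside the sandbox when it is inspected, but by the time it is opened the same name points at a file outside it. A common mitigation for this class of bug is to open the file once and validate the already-opened handle rather than the name.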
While the specific technical details of potential exploitation weren’t disclosed for security reasons, the Wiz blog shared a potential attack flow. “The attacker crafts a specially designed image to exploit CVE-2024-0132,” researchers said in the blog. “The attacker runs the malicious image on the target platform. This can be performed either directly in services allowing shared GPU resources or indirectly through a supply chain or social engineering attack such as a user running an AI image from an untrusted source.”
Who should patch?
The container-escape vulnerability, as pointed out in the patch notes, affects all Nvidia Container Toolkit versions up to and including v1.16.1. According to Wiz researchers, the toolkit is widely used, and the flaw could affect 35% of cloud environments.
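As a rough aid for triage, the affected range can be expressed as a simple version comparison. The sketch below assumes the installed toolkit version is already available as a dotted string such as "1.16.1" or "v1.16.1"; how that string is obtained is left to the operator. The function names are hypothetical, and the check does not account for the CDI exception Nvidia describes.

```python
# Last affected release according to Nvidia's bulletin as quoted in this article.
LAST_AFFECTED = (1, 16, 1)


def parse_version(text):
    """Turn a string like 'v1.16.1' or '1.16.0-rc.1' into a tuple of integers."""
    core = text.strip().lstrip("v")
    core = core.split("-")[0].split("+")[0]   # drop pre-release and build suffixes
    parts = [int(p) for p in core.split(".")]
    while len(parts) < 3:                     # treat '1.16' as '1.16.0'
        parts.append(0)
    return tuple(parts)


def is_affected(version):
    """True if the version is at or below the last affected release."""
    return parse_version(version) <= LAST_AFFECTED


if __name__ == "__main__":
    for v in ("1.9.0", "v1.16.1", "1.16.2"):
        print(v, "affected" if is_affected(v) else "not in the affected range")
```

Comparing tuples of integers rather than raw strings matters here: as text, "1.9.0" sorts after "1.16.1", which would wrongly mark an old release as safe.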
“This library is widely adopted as the go-to NVIDIA-supported solution for GPU access within containers,” the researchers added. “Moreover, it comes pre-installed in many AI platforms and virtual machine images (like AMIs), as it’s a common infrastructure requirement for AI applications.”
In shared environments such as Kubernetes, the bug can allow an attacker to escape a container and access the data and secrets of other “applications running on the same node – or even on the same cluster”, exposing the entire environment. Organizations using a shared compute model are therefore advised to update the toolkit immediately.
“An attacker could deploy a harmful container, break out of it, and use the host machine’s secrets to target the cloud service’s control systems,” the researchers said. “This could give the attacker access to sensitive information, like the source code, data, and secrets of other customers using the same service.”
Nvidia noted that the vulnerability does not affect use cases where Container Device Interface (CDI) is used. For everyone else using the Nvidia Container Toolkit, a patch is now available.