vSphere 6.7 – ESXi and TPM 2.0

With vSphere 6.7 I’m happy to announce the support of TPM 2.0! This blog will go into detail on how we are leveraging the TPM 2.0 chip found on most modern servers. I’ll also clarify some misconceptions and try to put into context what each piece is doing during the boot of ESXi 6.7.

First, we’ll start out with “What is a TPM?” and what its capabilities are.

Trusted Platform Module or “TPM”

A TPM (Trusted Platform Module) is a computer chip/microcontroller that can securely store artifacts used to authenticate the platform (your PC or laptop). These artifacts can include measurements, passwords, certificates, or encryption keys. A TPM can also be used to digitally sign content and store platform measurements that help ensure that the platform remains trustworthy. The Trusted Computing Group has a great detailed overview of what a TPM is and does. I will attempt to provide a journeyman’s overview below.

TPM Device Support

Since ESXi 5.x, ESXi has had support for TPM 1.2. Prior to 6.7, the APIs and functionality of TPM 1.2 were limited to 3rd party applications created by VMware partners.

In 6.7 we have introduced support for TPM 2.0. TPM 2.0 and TPM 1.2 are two entirely different implementations and there is no backwards compatibility. For all intents and purposes, they are considered two different devices to ESXi.

If you are running 6.5 on a server with TPM 2.0 you will not see the TPM 2.0 device because there’s no support in 6.5 for TPM 2.0. New features in 6.7 do not use the TPM 1.2 device.

TPM performance

Speed

A TPM is a very slow device. It typically lives on the same bus as serial devices, parallel ports and other low-speed devices.

Cryptographic Signing

A TPM is not designed for high-speed cryptographic operations. You’re not going to do every cryptographic operation with a TPM. A CPU is leaps and bounds faster for that. Instead, you use a TPM to sign something when you need proof that it was signed by that TPM.

Storage Space

The amount of space to store measurements and credentials is measured in KB. It’s very small. You are not going to store the keys for hundreds of VMs on a TPM!

Attestation

The term “attestation” is used by the InfoSec community quite a bit. It’s a declaration or evidence of a result. In this case we are using an attestation of a host to provide evidence that the host has booted with Secure Boot enabled, thereby ensuring only signed code is used.

How does ESXi 6.7 use a TPM 2.0 device?

At a high level, TPM 2.0 is used to store measurements of a known good boot of ESXi. This measurement is then compared by vCenter with what ESXi reports.

This is done by building upon the Secure Boot work done in vSphere 6.5. Read more about that work on my blog where I talk about ESXi and Secure Boot providing trusted assurance.

If you haven’t read that blog yet, then please stop now and go read it. It will make what is discussed next much clearer.

In other words, the TPM provides a mechanism for assuring that ESXi has booted with Secure Boot enabled. By confirming that Secure Boot is enabled we can then ensure that ESXi has booted using only digitally signed code.

This is an excellent example of the iterative approach to security we are delivering on. In 6.5 we delivered Secure Boot support. In 6.7 we built upon that by delivering TPM 2.0 to provide assurance that Secure Boot is turned on.

Note: If you are having issues when you enable Secure Boot and you are sure you have signed VIBs, please see https://kb.vmware.com/s/article/2147606 and https://kb.vmware.com/s/article/54481 for more information. You may have upgraded using ESXCLI when you should be upgrading using the ISO.

Supply Chain Assurance

Before we get started, let me address a question I have gotten in the past that revolves around “How can we be assured the firmware is valid?”. That is a great question but to be very clear, it is outside the scope of what is being discussed. Our “Root of Trust” has to start someplace and for the ESXi and TPM 2.0 boot process it starts with valid hardware and firmware.

Supply chain assurance of hardware and firmware is rooted in discussions with your server and CPU vendors. They should be able to provide you with a level of assurance that their hardware and firmware meets your security needs.

This also brings into scope things like administrator access to firmware/BIOS settings and IPMI/iDRAC/iLO type of access to the server console. To provide assurance that ESXi can boot securely means you must have a good security process in place for this type of access. That means network isolation of those interfaces, limiting access to those interfaces to only the most trusted in your org and logging all changes to those interfaces. That’s how you establish your “root of trust”.

Boot time steps

This section will provide more technical detail as to what is going on behind the scenes. Below is an animated GIF of the boot and attestation process.

UEFI Firmware

At Power On, the Host Hardware loads the UEFI Firmware. The UEFI Firmware then validates the Boot Loader against the digital certificate stored in the server vendor supplied firmware. (See the Secure Boot blog for more info). If the Boot Loader was tampered with then the UEFI Firmware would halt the boot process.

Boot Loader

The term “Boot Loader” covers two components: the actual “boot loader” and “vmkboot”. For ease of discussion and visualization we will consider them to be a single item.

Now that the UEFI firmware has validated the boot loader, the vmkboot component can be loaded. Within the vmkboot component is a VMware digital certificate. This certificate is used to validate the signature of the ESXi Kernel.

vmkboot uses the TPM 2.0 API to write measured values, represented as hashes of modules and settings, to the TPM device. This includes whether Secure Boot was enabled. Hashes are SHA-512.

This process is repeated at every boot.
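
The measuring step can be pictured as a TPM "extend" operation: a platform configuration register (PCR) is never overwritten. Instead, each new measurement is hashed together with the register's current value, so the final value depends on every item and on the order in which they were measured. The Python sketch below only illustrates that idea and is not VMware's implementation. The names measure and extend and the sample boot items are hypothetical, and SHA-512 is used simply because it is the hash mentioned above.

```python
import hashlib

# Assumption: SHA-512, as named in this post. The hash used by a real host may differ.
HASH = hashlib.sha512


def measure(blob: bytes) -> bytes:
    """Return the digest (the "measurement") of a module or setting."""
    return HASH(blob).digest()


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new PCR = H(old PCR || measurement)."""
    return HASH(pcr + measurement).digest()


# A PCR starts as all zeros at power-on.
pcr = bytes(HASH().digest_size)

# Hypothetical items measured during boot, in order.
boot_items = [b"secure-boot=enabled", b"kernel-module-bytes", b"boot-option=example"]
for item in boot_items:
    pcr = extend(pcr, measure(item))

print(pcr.hex())
```

Because each value is folded into the previous one, software cannot set the register to an arbitrary value after the fact. It can only add more measurements.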

vmkboot then validates the ESXi “VM Kernel” using the VMware digital certificate.

Kernel

The term “VM Kernel” covers multiple components: the actual “Kernel”, the Init process and the Secure Boot Verifier. For ease of discussion and visualization we will consider them to be a single item.

Kernel: The Kernel validates the Init process.

vCenter: TPM measurements, VIB metadata and untrusted event logs are sent to vCenter for inspection.

vCenter: vCenter compares TPM 2.0 stored hash values against hash values reported in the event logs and VIB metadata and makes an attestation assessment. If the values are the same, then the host has passed attestation. A quick way to demonstrate failing attestation is to disable Secure Boot!

Kernel: The Init process runs the Secure Boot Verifier, validating all VIBs. All VIB digital signatures chain to the VMware digital certificate in the Secure Boot Verifier. When this completes and all VIBs check out, processes like hostd can run and VMs can start.
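
The comparison vCenter makes in the steps above can be pictured as an event log replay: the verifier recomputes the value the reported log should produce and checks it against the value that came from the TPM. What follows is a minimal sketch of that idea only, assuming a single register and SHA-512. The names replay and attest and the sample log entries are hypothetical and are not vCenter APIs.

```python
import hashlib
import hmac

# Assumption: SHA-512, matching the hash mentioned in this post.
HASH = hashlib.sha512


def replay(event_log: list[bytes]) -> bytes:
    """Recompute the register value that the logged measurements should produce."""
    pcr = bytes(HASH().digest_size)  # registers start at all zeros
    for measurement in event_log:
        pcr = HASH(pcr + measurement).digest()
    return pcr


def attest(tpm_pcr: bytes, event_log: list[bytes]) -> bool:
    """Pass only if the untrusted log reproduces the value held by the TPM."""
    return hmac.compare_digest(replay(event_log), tpm_pcr)


# Hypothetical measurements recorded during a good boot.
good_log = [HASH(b"secure-boot=enabled").digest(), HASH(b"kernel").digest()]
tpm_value = replay(good_log)  # stands in for the value read from the TPM

# A log that was altered after the fact no longer matches the TPM value.
tampered_log = [HASH(b"secure-boot=enabled").digest(), HASH(b"modified-kernel").digest()]

print(attest(tpm_value, good_log))      # True
print(attest(tpm_value, tampered_log))  # False
```

The event log is described as untrusted for exactly this reason: it is only believed once it reproduces the value stored in the TPM.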

Security Report

After the host has completed its boot process and vCenter has compared the TPM measurements with the event log and VIB metadata, a security report is generated in the vSphere HTML5 client. You can see an example of the Security Report showing the attestation status of a number of hosts.

You will see a mix of hosts that have TPM 1.2 and 2.0 chips. TPM 1.2 hosts will always report an attestation of N/A. In the example above, you will see that host 10.20.235.198 has failed attestation.

The current method of retrieving the attestation status is via the report in the HTML5 client in vCenter. I’ll be exploring other options from an automation standpoint in a future blog.

Some will ask questions such as “But will this mean that VMs won’t run on, or vMotion to, a host that has failed attestation?” The answer is that VMs will continue to run on a host that has failed attestation.

What I can say in response is that “We are very aware of the ask for this capability” and we would really welcome your feedback.

Standalone Hosts

The question of “Can I get an attestation of a standalone host?” will come up. The answer is “No”. The reason is that there is no 3rd party (e.g. vCenter) comparing the TPM values with the ESXi event logs and VIB metadata. Querying the host directly means you are essentially asking ESXi to attest to itself.

Wrap Up

There you have it! TPM 2.0 provides the assurance that Secure Boot did its job and vCenter provides a handy report to show which hosts have failed their attestation.

Also, please check out the FAQ on vSphere Central for more info on TPM and virtual TPM. (Coming this week!)

If you have questions, post them here or find me on Twitter. My work Twitter account is: @vspheresecurity

mike