                                     Boxing clever with IOMMUs
                    Grzegorz Miłoś∗                    Derek G. Murray∗
              Input/Output Memory Management Units (IOMMUs) have been touted as the solution to many
          problems in virtualisation security. Used naïvely, they can improve fault isolation and reduce the amount of
          trusted code. We contend that it is possible to do better.
              In this paper, we introduce page boxing, a novel abstraction that allows untrusted virtual machines to
          manage data without having access to its contents. We illustrate how this can be used with an IOMMU
          to create a confidential end-to-end channel between disks and virtual machines. Unlike alternative ap-
          proaches, we avoid the use of encryption, which gives the potential for high performance.

1        Introduction
In a modern computer, any software which controls a device that is capable of direct memory access (DMA)
must be trusted. Such software may program the device to perform DMA to or from any memory location,
which allows it to breach the confidentiality and integrity of that data. In a virtualised system, these devices
may be controlled by the hypervisor or a management domain, but in either case a large amount of driver
code resides in the trusted computing base.
     IOMMUs have been proposed as a solution to this problem. These improve isolation by creating a
virtual address space for each device – akin to the virtual address spaces that an MMU creates for the CPU
– which explicitly restricts the memory that a device may use for DMA.
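
As a rough illustration of this per-device address space, the following Python sketch models an IOMMU as one translation table per device. The names (Iommu, map, translate, DmaFault) and the 4 KiB page size are assumptions made for the example; a real IOMMU walks multi-level tables in hardware, but the isolation property is the same: a DMA access with no matching entry faults.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class DmaFault(Exception):
    """Raised when a device touches an address it has no mapping for."""


class Iommu:
    """Toy IOMMU: one I/O page table (a dict) per device."""

    def __init__(self):
        # device id -> {I/O page number: (physical frame number, writable)}
        self._tables = {}

    def map(self, device, io_page, frame, writable=False):
        """Expose one physical frame to one device at the given I/O page."""
        self._tables.setdefault(device, {})[io_page] = (frame, writable)

    def unmap(self, device, io_page):
        """Remove a mapping; later accesses to that page will fault."""
        self._tables.get(device, {}).pop(io_page, None)

    def translate(self, device, io_addr, write=False):
        """Return the physical address for a DMA access, or raise DmaFault."""
        io_page, offset = divmod(io_addr, PAGE_SIZE)
        entry = self._tables.get(device, {}).get(io_page)
        if entry is None:
            raise DmaFault(f"{device}: no mapping for I/O page {io_page}")
        frame, writable = entry
        if write and not writable:
            raise DmaFault(f"{device}: I/O page {io_page} is read-only")
        return frame * PAGE_SIZE + offset
```

In this model, a mapping installed for one device says nothing about any other device, so a second device issuing DMA to the same I/O address simply faults.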
     We propose a novel mechanism, called page boxing, that uses an IOMMU to create a confidential end-
to-end channel between a virtual machine and disk, while retaining the complexity of the device subsystem
in the management domain. We base our mechanism on the split-driver device channel approach from Xen,
with a front-end in the guest VM, and a back-end that performs physical I/O in the management domain [4].
We use the MMU and IOMMU together to ensure that, while the management domain may perform DMA
involving data that belongs to the guest VM, it may not read or write this data.
     Our threat model is based on three protection domains: the management domain (Dom0, following Xen's terminology [1]),
a VM with private data (DomU) and another VM on the same host (DomV), all of which run on a trusted
hypervisor. We demonstrate how page boxing prevents Dom0 from reading or writing any data that belongs
to DomU in Section 2. We also introduce a technique called self-bootstrapping, which ensures that Dom0
cannot use a malicious DomV to access DomU’s data indirectly, in Section 3. Our approach concentrates
on confidentiality: we assume that DomU maintains integrity at the file system level (e.g. using ZFS [14]).
We do not attempt to provide availability when Dom0 is malicious.

2        Page boxing
At a high level, page boxing is a mechanism that allows Dom0 to control the movement of data without
being able to read or write it. For example, when writing a block to disk, DomU passes the data to Dom0,
through a split device, and Dom0 ultimately writes this data to disk [4]. Dom0 may rearrange the data (for
example, to schedule a multi-block transaction), but it need not inspect or modify the content of each block.
Our mechanism places this data in a “locked black box”, to which only DomU and the associated physical
device have the key. The hypervisor uses the MMU to lock the box in Dom0, and the IOMMU to make the
content of the box available to the correct hardware (see Figure 1).

                                   Figure 1: The moving boxes diagram

∗ University of Cambridge Computer Laboratory, Cambridge, United Kingdom

    The box abstraction is used to encapsulate memory frames, which can then be used to transfer I/O data
between DomU and an associated disk device in the confidential manner described above. A page box has
the following semantics:

   • The original owner domain creates a box by donating a memory frame and specifying the keyholder
     domain. If the original owner is not the keyholder, it must relinquish any mappings to that frame.
     Ownership can be granted to another domain (see below).
   • To map a box for reading or writing, the mapping domain must be both an owner and the keyholder
     for that box.
   • A domain may instruct the hypervisor to copy or move data between boxes, if and only if (i) that
     domain owns both boxes, and (ii) the keyholder for both boxes is the same.
   • The owner can arrange for DMA between a device and the box, if and only if that device is associated
     with the keyholder domain.
   • When a box is no longer in use (there are no mappings to it) the original owner can destroy it. The
     boxed memory frame is scrubbed and returned to the original owner.

    Page boxing complements Xen’s grant table mechanism. The grant tables allow VMs to share or transfer
memory between each other [4]. They are used to transfer bulk data in split I/O devices. We propose a grant
table extension that allows boxes to be shared between domains. Granting a box confers ownership of the
box on the grantee. If a box is granted to a domain, and the domain is also the keyholder for the box, it is
allowed to map the encapsulated page for reading and writing.
    Figure 1 shows how disk I/O with boxed pages can maintain confidentiality. DomU initiates a disk write
by boxing a page of data to be written out (1) and granting it to Dom0 (2). The disk device driver in Dom0
may first move the data between locked boxes (3), before finally programming the disk controller to perform
a DMA read from the boxed memory (4). The hypervisor programs the IOMMU to guarantee that the boxed
memory can only be written out to a device that is associated with the box keyholder domain.
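
The box rules and the four numbered steps of the disk write can be sketched together as a small Python model. This is an illustration under stated assumptions, not the authors' implementation: the class and method names (Hypervisor, Box, dma_to_device, disk_write), the device names and the use of a bytearray to stand in for a memory frame are all hypothetical. Only the checks themselves come from the semantics listed above.

```python
class BoxError(Exception):
    """Raised when a request violates the page-box rules."""


class Box:
    def __init__(self, original_owner, keyholder, frame):
        self.original_owner = original_owner
        self.keyholder = keyholder
        self.owners = {original_owner}   # grows when the box is granted
        self.frame = bytearray(frame)    # stands in for the donated memory frame
        self.mapped_by = set()           # domains that currently map the frame


class Hypervisor:
    """Trusted component that checks every box operation."""

    def __init__(self, device_domain):
        self.device_domain = dict(device_domain)  # device -> associated domain

    def _check_owner(self, domain, box):
        if domain not in box.owners:
            raise BoxError(f"{domain} does not own this box")

    def create(self, owner, keyholder, frame):
        # A new box starts with no mappings, which models an owner that is
        # not the keyholder relinquishing its view of the frame.
        return Box(owner, keyholder, frame)

    def grant(self, owner, box, grantee):
        self._check_owner(owner, box)
        box.owners.add(grantee)

    def map(self, domain, box):
        self._check_owner(domain, box)
        if domain != box.keyholder:
            raise BoxError(f"{domain} is not the keyholder")
        box.mapped_by.add(domain)
        return box.frame

    def unmap(self, domain, box):
        box.mapped_by.discard(domain)

    def copy(self, domain, src, dst):
        self._check_owner(domain, src)
        self._check_owner(domain, dst)
        if src.keyholder != dst.keyholder:
            raise BoxError("boxes have different keyholders")
        dst.frame[:] = src.frame  # done by the hypervisor, not by `domain`

    def dma_to_device(self, domain, box, device):
        """Let `device` read the boxed frame on behalf of `domain`."""
        self._check_owner(domain, box)
        if self.device_domain.get(device) != box.keyholder:
            raise BoxError(f"{device} is not associated with the keyholder")
        return bytes(box.frame)  # models the bytes that reach the device

    def destroy(self, domain, box):
        if domain != box.original_owner:
            raise BoxError("only the original owner may destroy a box")
        if box.mapped_by:
            raise BoxError("box is still mapped")
        box.frame[:] = bytes(len(box.frame))  # scrub before returning the frame
        return box.frame


def disk_write(hv, data, disk="disk-u"):
    """Walk the four numbered steps of Figure 1 for one page of DomU data."""
    page = hv.create("DomU", "DomU", data)                 # (1) box the page
    hv.grant("DomU", page, "Dom0")                         # (2) grant to Dom0
    staging = hv.create("Dom0", "DomU", bytes(len(data)))
    hv.copy("Dom0", page, staging)                         # (3) move between boxes
    return hv.dma_to_device("Dom0", staging, disk)         # (4) DMA to the disk
```

With a hypothetical association table such as Hypervisor({"disk-u": "DomU", "disk-v": "DomV"}), disk_write succeeds for disk-u and raises BoxError for disk-v, while any attempt by Dom0 to call map on a box whose keyholder is DomU is refused even though Dom0 owns the box.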

3    Self-bootstrapping
We use a second technique, called self-bootstrapping, to ensure that Dom0 cannot collude with a malicious
DomV to read DomU’s data. Obviously, Dom0 must not be responsible for loading DomU’s OS, because it
could substitute a malicious kernel that leaks sensitive data (e.g. using a network connection). The solution
is to store the OS in a location that only DomU can access: on a disk belonging to DomU.
     Since only DomU can access this data, it must bootstrap itself. We use a modified version of the disag-
gregated VM builder (DomB), which runs in a separate domain from Dom0 [6]. Instead of passing DomB
a kernel image, Dom0 passes a disk identifier for the VM that is to be started. DomB issues a privileged
hypercall that instructs the IOMMU to associate the specified disk with the newly created domain. (This
hypercall also ensures that the disk has not already been associated with a VM, to prevent Dom0 instigating
corruption through two VMs simultaneously writing to the same disk.)
     DomB then loads a bootloader environment into the new VM. This is a small piece of code, based on
GRUB [13] (but running in protected mode), that runs in DomU and reads a domain configuration file for
DomU from its associated disk. From there, it can identify the kernel, any modules and any command-line
parameters that should be used. It loads these before handing control over to the actual DomU kernel.
     If we assume that each VM can only be associated with one disk (and enforce this in the hypervisor),
this prevents a malicious DomV from accessing DomU’s disk, as a given VM can only access the disk
from which it boots. A more sophisticated approach would be required if multiple disks are permitted.
Self-bootstrapping also creates its own bootstrapping problem: if only DomU can write to a disk, how does
DomU’s operating system get written to that disk? We solve this by extending the bootloader environment
with PXE support, which enables installation over the network [8].
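
The association checks described in this section can be sketched as a small table kept by the hypervisor. The names below (DiskTable, associate, may_access) are hypothetical, and identifying the privileged caller by the string "DomB" is an assumption made for the example; the two refusal conditions (a disk already bound to a VM, and a VM that already has a disk) follow the text above.

```python
class AssociationError(Exception):
    """Raised when a disk association request is refused."""


class DiskTable:
    """Hypervisor-side record of which disk belongs to which VM."""

    def __init__(self, builder="DomB"):
        self.builder = builder  # the only domain allowed to make associations
        self.disk_of = {}       # domain -> disk
        self.domain_of = {}     # disk -> domain

    def associate(self, caller, disk, domain):
        """Model of the privileged hypercall issued by the VM builder."""
        if caller != self.builder:
            raise AssociationError(f"{caller} may not associate disks")
        if disk in self.domain_of:
            # Prevents two VMs from writing to the same disk simultaneously.
            raise AssociationError(f"{disk} already belongs to {self.domain_of[disk]}")
        if domain in self.disk_of:
            # One disk per VM, as assumed in the text.
            raise AssociationError(f"{domain} already has a disk")
        self.domain_of[disk] = domain
        self.disk_of[domain] = disk

    def may_access(self, domain, disk):
        """True only for the single disk the domain was booted from."""
        return self.domain_of.get(disk) == domain
```

Under these rules, once DomB has associated a disk with DomU, neither Dom0 nor a later association on behalf of DomV can give a second VM access to that disk.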

4    Conclusions and related work
DMA has long been the elephant in the corner for secure virtualisation, and IOMMUs are often proposed
as the solution. There are two main approaches: one that uses the IOMMU to give devices direct access
to VMs running unmodified OSs [2, 11], and another that limits the DMA capability of management and
driver domains to their own memory [4, 5, 6]. However, both of these approaches give devices access to the
whole of a VM’s address space. Page boxing limits the scope of DMA to explicitly nominated pages, and
thereby gives better fault isolation. Retaining split drivers, rather than giving direct device access, also has
the advantage of portability and guest simplicity, by removing device-specific code [7].
     Our approach has some similarity to Overshadow, which uses a VMM to protect user process data from
a malicious kernel [3]. Overshadow uses encryption together with shadow page tables to guarantee the
confidentiality of this data; however, it imposes a heavy burden on the CPU, which harms the performance of
applications with a high userspace-kernel data rate. Furthermore, it is not aimed at multiple-VM systems.
By contrast, we intentionally avoid encryption, and our approach is designed to work well on multiple VMs.
     The trusted platform module (TPM) enables another hardware-based approach to providing confiden-
tiality. Technologies such as BitLocker [12] and VPFS [15] use the TPM to provide trusted storage for a
symmetric encryption key, which is then used to encrypt all data that is written to disk. BitLocker only
measures the boot process, and does not protect against post-boot attacks on the OS. VPFS evades this issue
by insisting that client applications are rewritten to run directly on a microkernel. Our approach defends
against post-boot attacks on Dom0, and supports unmodified applications running on a commodity OS. We
could, however, use the dynamic root-of-trust feature in modern chipsets to measure the hypervisor, and
launch control policies to mandate that it supports page boxing [9, 10].
     In conclusion, page boxing represents a novel use for the IOMMUs that are starting to appear in com-
modity hardware. We have shown how they can be used to enforce data confidentiality in virtualised envi-
ronments, without the need for encryption.

 [1] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian
     Pratt, and Andrew Warfield. Xen and the art of virtualization. In SOSP ’03: Proceedings of the
     nineteenth ACM symposium on Operating systems principles, pages 164–177, New York, NY, USA,
     2003. ACM.

 [2] Muli Ben-Yehuda, Jon Mason, Orran Krieger, Jimi Xenidis, Leendert Van Doorn, Asit Mallick, Jun
     Nakajima, and Elsie Wahlig. Utilizing IOMMUs for Virtualization in Linux and Xen. In Proceedings
     of the 2006 Ottawa Linux Symposium, 2006.

 [3] Xiaoxin Chen, Tal Garfinkel, E. Christopher Lewis, Pratap Subrahmanyam, Carl A. Waldspurger,
     Dan Boneh, Jeffrey Dwoskin, and Dan R.K. Ports. Overshadow: a virtualization-based approach
     to retrofitting protection in commodity operating systems. In ASPLOS XIII: Proceedings of the 13th
     international conference on Architectural support for programming languages and operating systems,
     pages 2–13, New York, NY, USA, 2008. ACM.

 [4] Keir Fraser, Steven Hand, Ian Pratt, Andrew Warfield, Rolf Neugebauer, and Mark Williamson. Safe
     hardware access with the Xen virtual machine monitor. In Proceedings of the First Workshop on
     Operating System and Architectural Support for the on demand IT Infrastructure (OASIS-2004), 2004.

 [5] Hermann Härtig. Security architectures revisited. In EW10: Proceedings of the 10th workshop on
     ACM SIGOPS European workshop, pages 16–23, New York, NY, USA, 2002. ACM.

 [6] Derek Gordon Murray, Grzegorz Milos, and Steven Hand. Improving Xen security through
     disaggregation. In VEE ’08: Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on
     Virtual execution environments, pages 151–160, New York, NY, USA, 2008. ACM.

 [7] Timothy Roscoe, Kevin Elphinstone, and Gernot Heiser. Hype and virtue. In HOTOS’07: Proceedings
     of the 11th USENIX workshop on Hot topics in operating systems, pages 1–6, Berkeley, CA, USA,
     2007. USENIX Association.

 [8] (Unattributed). Preboot Execution Environment (PXE) Specification: Version 2.1, September 1999. http://, accessed
     9th June, 2008.

 [9] (Unattributed). Intel® Trusted Execution Technology Preliminary Architecture Specification.
     Technical report, Intel Corporation, 2006.
     security/downloads/31516804.pdf, accessed 9th June, 2008.

[10] (Unattributed). AMD64 Architecture Programmer’s Manual Volume 2: System Programming.
     Technical report, Advanced Micro Devices, 2007.
     content type/white papers and tech docs/24593.pdf, accessed 9th June, 2008.

[11] (Unattributed). Intel® Virtualization Technology for Directed I/O: Architecture Specification.
     Technical report, Intel Corporation, September 2007.
     computing/vptech/Intel(r) VT for Direct IO.pdf, accessed 9th June, 2008.

[12] (Unattributed). BitLocker Drive Encryption, 2008.
     en-us/windowsvista/aa905065.aspx, accessed 9th June, 2008.

[13] (Unattributed). GNU GRUB – GNU Project, 2008.,
     accessed 9th June, 2008.

[14] (Unattributed). Solaris ZFS, 2008.,
     accessed 9th June, 2008.

[15] Carsten Weinhold and Hermann Härtig. VPFS: building a virtual private file system with a small trusted
     computing base. In Eurosys ’08: Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference
     on Computer Systems 2008, pages 81–93, New York, NY, USA, 2008. ACM.

