A Fast Rejuvenation Technique for Server Consolidation with Virtual Machines
Kenichi Kourai, Shigeru Chiba
Tokyo Institute of Technology

Server consolidation with VMs
- Server consolidation is widely carried out
  • Multiple server machines are integrated on one physical machine
  • Recently, using virtual machines (VM)
  • Multiplexing resources
- VMs are run on a virtual machine monitor (VMM)
[Figure: VMs on a VMM on the hardware]

Software aging of VMMs
- Software aging of a VMM is critical
- Software aging is...
  • The phenomenon that software state degrades with time
  • E.g. exhaustion of system resources
- Software aging of a VMM affects all VMs on it
  • E.g. performance degradation
[Figure: VMs on a VMM]

Software rejuvenation of VMMs
- Preventive maintenance
  • Performed before software aging of a VMM affects its VMs
  • Occasionally stops a VMM, cleans its internal state, and restarts it
- Typical example: rebooting a VMM
  • Cleans the internal state automatically and completely
  • The easiest way

Drawbacks (1/2): Increasing service downtime
- The VMM reboot needs:
  - Rebooting all OSes running on the VMs
    • The time tends to be long
    • Larger number of VMs
    • Longer startup time of services
  - A hardware reset
    • The BIOS power-on self test is time-consuming
[Figure: timeline of OS shutdown, VMM shutdown, hardware reset, VMM boot, OS boot]

Drawbacks (2/2): Performance degradation
- The file cache is lost by the OS reboot
- OSes cannot restore performance until the file cache is re-filled
  • They strongly rely on the file cache to speed up file accesses
- The time tends to be long
  • The file cache size is increasing
  • Large amount of memory for a VM
  • Free memory as the file cache
[Figure: OS, process, file cache, disk]

Warm-VM reboot
- Fast rejuvenation technique
- Efficiently reboots only a VMM
  • The VMM reboot causes no OS reboot
- Basic idea
  • Suspend all VMs before the VMM reboot
  • Resume them after the reboot
- Challenge
  • How does a VMM efficiently deal with the large memory images of VMs?
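
The basic idea of the warm-VM reboot (suspend all VMs, reboot only the VMM, resume the VMs) can be contrasted with a cold-VM reboot in a toy model. This is only an illustrative sketch, not the RootHammer implementation: the names GuestVM, warm_vm_reboot and cold_vm_reboot are hypothetical, and a Python dict stands in for the file cache that a guest OS holds in main memory.

```python
from dataclasses import dataclass, field


@dataclass
class GuestVM:
    """Toy guest (hypothetical): file_cache stands in for OS state kept in RAM."""
    name: str
    file_cache: dict = field(default_factory=dict)
    running: bool = True


def warm_vm_reboot(vms):
    """Suspend every VM in memory, reboot only the VMM, then resume the VMs."""
    preserved = {}
    for vm in vms:
        vm.running = False        # on-memory suspend: freeze the image in place
        preserved[vm.name] = vm   # same object, nothing is copied to a disk
    # The VMM reboot happens here. In this toy model the preserved images
    # simply stay where they are, as they would without a hardware reset.
    resumed = list(preserved.values())
    for vm in resumed:
        vm.running = True         # on-memory resume: unfreeze the image
    return resumed


def cold_vm_reboot(vms):
    """Reboot every guest OS together with the VMM."""
    # An OS reboot starts from a fresh memory image, so the file cache is lost.
    return [GuestVM(name=vm.name) for vm in vms]


vms = [GuestVM("web", {"/index.html": b"page"}), GuestVM("db", {"/data": b"rows"})]
warm = warm_vm_reboot(vms)
cold = cold_vm_reboot(vms)
print([len(vm.file_cache) for vm in warm])  # file caches survive
print([len(vm.file_cache) for vm in cold])  # file caches are empty
```

The model deliberately ignores timing. It only shows why a reboot that preserves the memory images keeps the file cache, while a reboot that restarts the OSes does not.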

On-memory suspend of VMs
- Freezes the memory images of VMs on the main memory
  • That memory area is just reserved
  • The time does not depend on the memory size
- ACPI S3 state (Suspend To RAM) for VMs
  • Traditional suspend is ACPI S4 state
  • Saving them into a slow disk is inefficient
[Figure: VM, freeze, main memory, disk]

On-memory resume of VMs
- Unfreezes the memory images preserved on the main memory
- They are reused directly as the memory of VMs
  • No need to read them from a slow disk
- The file cache of OSes is also restored
  • No performance degradation
[Figure: VM, unfreeze, main memory, disk]

Quick reload of VMMs
- Directly boots a new VMM without a hardware reset
- The memory images of VMs are preserved through the VMM reboot
  • Software can keep track of them
  • A hardware reset does not guarantee this
- A VMM is rebooted quickly
  • No overhead due to a hardware reset
[Figure: old VMM, new VMM, preload, VM, main memory]

Comparison with other methods
- Cold-VM reboot
  • Needs the OS reboot
- Saved-VM reboot
  • A naive implementation of the warm-VM reboot
  • VMs are saved into a disk
[Table: reboot method (Cold-VM, Saved-VM, Warm-VM) against: depends on # of VMs, depends on mem size of VMs, depends on services, performance degradation. Cell values in extracted order: Yes No No / Yes Yes No Yes No No No No / No]

Model for availability
- Must consider the software rejuvenation of both a VMM and OSes
- Warm-VM reboot
  • The OS rejuvenation is independent
- Cold-VM reboot
  • The OS rejuvenation is affected by the VMM rejuvenation
  • The number of OS rejuvenations increases
[Figure: timelines of OS rejuvenation and VMM rejuvenation]

RootHammer
- We have implemented the warm-VM reboot in Xen 3.0.0
- On-memory suspend/resume
  • Based on Xen's suspend/resume
  • Manages the mapping from the VM memory to the physical memory
- Quick reload
  • Based on the kexec mechanism in Linux
  • Kexec for a VMM is included in the latest Xen
  • It is not for reusing the memory images
[Figure: VM memory mapped to physical memory]

Experiments
- Examine whether the warm-VM reboot reduces downtime and performance degradation
- Comparison
  • Cold-VM reboot with the OS reboot
  • Saved-VM reboot using Xen's suspend/resume
[Figure: experimental setup with a server (Linux VMs on the VMM) and a Linux client; labels: 2 dual-core Opteron, 12 GB SDRAM, 15,000 rpm SCSI disk, gigabit Ethernet]

Performance of on-memory suspend/resume
- Suspend/resume of one VM with 11 GB of memory
  - Ours: 1 sec
  - Xen's: 280 sec
    • Depends on the memory size
- Suspend/resume of 11 VMs
  - Ours: 4 sec
  - OS reboot: 58 sec
    • Depends on # of VMs

Effect of quick reload
- The time of rebooting a VMM with no VMs
- Warm-VM reboot
  • 11 sec
  • The time of quick reload is negligible
- Cold-VM reboot
  • 59 sec
  • The time due to a hardware reset is 48 sec
[Figure: bar chart for Warm-VM and Cold-VM, axis 0 to 70, bars split into VMM shutdown, hardware reset or quick reload, and VMM boot]

Downtime of services
- Warm-VM reboot
  - Always the same
    • 42 sec
- Saved-VM reboot
  - Depends on # of VMs
    • 429 sec (11 VMs)
- Cold-VM reboot
  - Affected by the service type
    • 157 sec (sshd)
    • 241 sec (JBoss)

Availability of JBoss
- The warm-VM reboot achieves four 9s
- Assumptions
  - OS rejuvenation every week
    • 34 sec
  - VMM rejuvenation every 4 weeks
    • In 0.5 week after the last OS rejuvenation
- Results
  • Warm-VM reboot: 99.993%
  • Cold-VM reboot: 99.985%
  • Saved-VM reboot: 99.977%
[Figure: timeline with OS rejuvenation at 1-week intervals and VMM rejuvenation 0.5 week after an OS rejuvenation]

Performance degradation
- The throughput of the Apache web server before and after the VMM reboot
- Warm-VM reboot
  • No degradation
- Cold-VM reboot
  • Degraded by 69%

Software rejuvenation in a cluster environment
- Clustering achieves zero downtime
  • Multiple hosts can provide the same service
- Let us consider the total throughput of all hosts in a cluster
  • m: # of hosts
  • p: throughput of one host
- Warm-VM reboot
  • (m-1)p
- Cold-VM reboot
  • (m-1)p
  • (m-0.69)p for a while after the reboot
[Figure: total throughput over time t, levels mp and (m-1)p, durations 42 sec and 241 sec]

Comparison with VM migration in a cluster environment
- VM migration achieves nearly zero downtime
  • VMs are moved to another host
  • Xen's live migration, VMware's VMotion
- Total throughput
  - Normal run
    • (m-1)p
    • One host is reserved for migration
  - Live migration
    • (m-1.12)p
[Figure: total throughput over time t, levels mp and (m-1)p, durations 42 sec and 17 min]

Related work
- Microreboot [Candea et al.'04]
  • Reboots only a part of subcomponents
  • The warm-VM reboot enables rebooting only a parent component (VMM for VMs)
- Checkpointing/restart [Randell '75]
  • Saves/restores OS processes
  • Similar to suspend/resume of VMs
- Optimizations of suspend/resume
  • Incremental suspend, compression of memory images

Conclusion
- We proposed the warm-VM reboot
  - On-memory suspend/resume
    • Freezes/unfreezes the memory images of VMs
  - Quick reload
    • Preserves the memory images through the VMM reboot
- It achieved fast rejuvenation
  • Downtime reduced by 83% at maximum
  • No performance degradation
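
The availability figures on the "Availability of JBoss" slide can be checked with a short calculation. The sketch below is one possible model, not necessarily the authors' own: availability is the fraction of one 4-week VMM rejuvenation cycle that is not spent in rejuvenation downtime. The function name and the per-cycle counts of OS rejuvenations are assumptions. In particular, the value 3.5 for the cold-VM reboot assumes that the cold reboot restarts the weekly OS rejuvenation schedule and that the VMM reboot comes 0.5 week after the last OS rejuvenation; it was chosen because it is consistent with the slide's 99.985%.

```python
WEEK = 7 * 24 * 3600  # seconds in one week


def availability(vmm_downtime, os_rejuvenations, os_downtime=34.0, cycle_weeks=4):
    """Fraction of one VMM rejuvenation cycle during which the service is up."""
    downtime = vmm_downtime + os_rejuvenations * os_downtime
    return 1.0 - downtime / (cycle_weeks * WEEK)


results = {
    # Downtimes (sec) are the JBoss figures from the "Downtime of services" slide.
    "warm-VM": availability(42.0, 4),      # OS rejuvenation is independent
    "saved-VM": availability(429.0, 4),    # OS rejuvenation is independent
    "cold-VM": availability(241.0, 3.5),   # assumed count, see the text above
}
for method, a in results.items():
    print(f"{method}: {100 * a:.3f}%")

# Downtime reduction of the warm-VM reboot relative to the cold-VM reboot (JBoss)
reduction = 1.0 - 42.0 / 241.0
print(f"downtime reduction: {100 * reduction:.0f}%")
```

Under these assumptions the script prints 99.993%, 99.977% and 99.985% for the three methods and a downtime reduction of 83%, matching the slides.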
