Jagadish Paranjape
(Computer Science 555 - Research Paper)

Role of “Storage Virtualization” in “Cloud Storage”

1. Introduction:

Server virtualization is seen as an enabling technology for ‘Cloud Computing’. Although virtual machines play a key role in providing computation as a service, ‘Storage Virtualization’ and ‘Network Virtualization’ technologies are critical in providing ‘Infrastructure as a Service’ (‘IaaS’). This paper focuses on storage-related technologies, particularly ‘Storage Virtualization’, and its importance as a complementary technology to server virtualization in realizing IaaS.

The paper begins with a brief overview of how virtualization technologies are used in a ‘modern data center’. ‘Storage Virtualization’ is then discussed in detail, followed by techniques such as Façade [4] and VectorDot [1] for maintaining storage performance. The paper concludes with a discussion of emerging ideas in IaaS.

2. Virtualization Technologies:

Modern data centers with server virtualization, storage virtualization and network virtualization are at the core of ‘Infrastructure as a Service’ (IaaS). Data centers are composed of layers of resources such as servers, switches and shared storage [1].

Servers can host multiple virtual servers using virtualization technologies like Xen [6] and VMWare [16]. Network switches connect servers with storage devices. Data center storage can take the form of a SAN (Storage Area Network), NAS (Network Attached Storage) or a combination of both.

2.1 Server Virtualization:

Hardware resources are time-multiplexed to give the appearance of dedicated hardware resources to the operating systems running within a ‘Virtual Machine’. Virtualization allows sharing of hardware resources. Operating systems running within the virtual machines can host application servers or any other application, just as an operating system running on the real hardware can [16][6].

Virtual machines offer benefits like isolation and live migration to another hardware resource. All the instructions executed by applications running within a VM can be monitored, and execution of potentially malicious instructions can be avoided. Since virtualized operating systems run at lower privilege levels than the Virtual Machine Monitors that control them, running a virtual machine is more secure than running an OS directly on the hardware [6].

Thus, hosting multiple VMs on the same server hardware brings three benefits: the capacity of the hardware can be better utilized, reducing the total power consumption; the isolation provided by virtual machines adds security to the applications running in different VMs; and live migration of VMs enables load balancing between the hardware servers [16][1].

2.2 Network Virtualization:

Virtual networks are built using overlay networks that can be used for creating virtual server farms within a data center [5]. Virtual machines on the virtual network are made invisible to the internet. Traffic thresholds can be configured for the virtual machines in order to restrict the network bandwidth usage of each VM. Virtual networks also provide additional security to the network traffic passing through them, as it is protected from other virtual networks in the same data center [14].
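The sources cited here do not say how a per-VM traffic threshold is enforced. As one illustration only, the sketch below uses a token bucket, a common rate-limiting technique chosen here as an assumption; the class name, rate and burst values are hypothetical.

```python
class TrafficThreshold:
    """Token bucket capping one VM's average bandwidth (illustrative only)."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s      # configured threshold (refill rate)
        self.capacity = burst_bytes       # largest burst the VM may send
        self.tokens = burst_bytes         # bucket starts full
        self.last = 0.0                   # time of the previous check, in seconds

    def allow(self, packet_bytes, now):
        """Return True if a packet of this size fits the threshold at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False


vm_limit = TrafficThreshold(rate_bytes_per_s=1000, burst_bytes=1500)
for size, t in [(1500, 0.0), (500, 0.1), (500, 1.0)]:
    print(t, size, vm_limit.allow(size, now=t))
```

Passing the clock in as `now` keeps the sketch deterministic; a real enforcement point would read a monotonic clock and decide whether to queue or drop packets that exceed the threshold.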

Virtual networks can therefore be used for creating multiple virtual farms that belong to different customers while sharing other data center resources like physical servers and storage.

Violin [15] is one such system that allows network virtualization. In Violin, a virtual Local Area Network (VLAN) consists of virtual machines acting as virtual hosts. Virtual machines belonging to the same VLAN may reside on different physical servers. Virtual hosts are connected to a ‘virtual switch’. Another virtual machine acts as a ‘virtual router’ with multiple virtualized network interface cards (VNICs) and connects multiple VLANs [15].

3. Storage Virtualization:

Lumb et al. point out that the concept of storage virtualization has existed since the IBM MVS mainframe operating system [2]. Storage virtualization is an abstraction that gives a unified view of the underlying heterogeneous physical storage devices (a storage pool [5]). The virtualization layer divides the physical storage into logical units and provides indirection (mapping) from the logical storage location to the physical storage location.

SNIA (Storage Networking Industry Association) [17] defines storage virtualization as follows:

1. “The act of abstracting, hiding or isolating the internal functions of a storage (sub)system or service from applications, host computers, general network resources, for the purpose of enabling application and network-independent management of storage or data” [17].
2. “The application of virtualization to storage services or devices for the purpose of aggregating functions or devices, hiding complexity, or adding new capabilities to lower level storage resources” [17].

3.1 Characteristics of Storage Virtualization:

3.1.1 Device Transparency:

Physical storage devices are transparent to the applications accessing the storage pool. Thus multiple storage devices with different performance, capacity and other storage attributes can be part of the same storage pool [5].

3.1.2 Location Transparency:

The physical location where the data actually resides is transparent to the applications. The virtualization software manages the mapping between the apparent data location and the actual physical location where the data is stored [5].

3.2 Types of Storage Virtualization:

3.2.1 File Level Virtualization:

This is the more traditional form of storage virtualization. It is implemented using distributed file systems, with a root DFS server maintaining the information necessary for resolving file names to the ‘appropriate physical storage locations’ [8]. Using file level virtualization, multiple file systems are made to appear as a single file system [1].

NAS (Network Attached Storage) operates at the granularity of a file. A NAS device is connected to a LAN, and all client requests for files are handled by the NAS appliance. The client applications are unaware of the location and device type of the physical storage [17].

3.2.2 Block Level Virtualization:

This is implemented at a lower level than a file. At its core is the abstraction of a ‘virtual disk’ [1][9]. Clients of a virtual disk (file systems, databases, etc.) see the virtual disk as a raw byte storage space composed of “blocks”, just like a regular physical disk [9].
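A minimal sketch of the virtual disk abstraction, assuming a simple extent table: the client addresses a flat range of blocks, and the layer underneath resolves each block to a backing device. The `Extent` and `VirtualDisk` names, the device names and the extent layout are hypothetical and are not taken from Petal [9] or any other cited system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A run of virtual blocks backed by one physical device (hypothetical)."""
    first_block: int    # first virtual block number covered by this extent
    block_count: int    # number of blocks in the extent
    device: str         # name of the backing physical device
    device_block: int   # block on the device where the extent starts


class VirtualDisk:
    """Presents one flat block address space over several backing devices."""

    def __init__(self, extents):
        self.extents = sorted(extents, key=lambda e: e.first_block)
        self.block_count = sum(e.block_count for e in self.extents)

    def resolve(self, block):
        """Map a virtual block number to (device, physical block number)."""
        for e in self.extents:
            if e.first_block <= block < e.first_block + e.block_count:
                return e.device, e.device_block + (block - e.first_block)
        raise IndexError(f"block {block} is outside the virtual disk")


vdisk = VirtualDisk([
    Extent(first_block=0, block_count=100, device="diskA", device_block=2000),
    Extent(first_block=100, block_count=50, device="diskB", device_block=0),
])
print(vdisk.block_count)   # 150
print(vdisk.resolve(10))   # ('diskA', 2010)
print(vdisk.resolve(120))  # ('diskB', 20)
```

The client only ever sees block numbers 0 to 149; which device holds a given block is hidden behind `resolve`.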

It is the job of the storage virtualization layer to abstract shared physical storage into ‘virtual disks’ and to resolve ‘virtual disk’ access requests into ‘physical storage’ access requests. The virtual address can take the form <Virtual Disk Identifier, Offset>, and it needs to be mapped to the appropriate physical address, which can be <Server location, Physical Disk Identifier, Offset> [9].

SANs (Storage Area Networks) operate at the granularity of disk blocks and hence are well suited for block level virtualization. A SAN can be used as the backing store for NAS; SAN-aware clients can then access the storage at block level, whereas legacy clients can continue to access files using the NAS interface [10].

For providing storage as a service, SAN-based block level virtualization looks more promising, as it gives the appearance of raw disk space by abstracting storage into virtual disks. Virtual disk capacities can be configured based on application needs. In addition, other services such as storage performance and availability can be built in at the block level [17]. Since NAS operates at the granularity of a file, achieving flexibility similar to that of block-based virtualization could be difficult.

3.3 Types of Indirection:

Depending on where the indirection software layer (which converts virtual disk access requests into physical storage access requests) is installed, the indirection is classified as ‘in-band’ or ‘out-of-band’.

3.3.1 In band indirection:

When the indirection takes place within the data path between the nodes hosting the virtual servers and the physical storage device, it is called in-band indirection [8]. In a data center environment this level of indirection can be provided by special controller hardware (e.g. IBM SVC, the SAN Volume Controller [2]) or by software in the switches used for connecting servers with the physical storage (e.g. EMC Invista [3]). Singh et al. discuss their data center testbed ‘Harmony’ and its storage virtualization setup in greater detail in [1].

3.3.2 Out of Band Indirection:

In out-of-band indirection, the virtualization software or hardware responsible for indirection is not within the data path. The virtual servers make an explicit indirection request to the out-of-band server or device to resolve the virtual address of the storage, and then contact the physical device for their storage needs [8-465]. In this approach, special software needs to be installed at each server for communicating with the out-of-band device [13].

SoftUDC [5] uses server-based storage virtualization. The storage virtualization layer (the Virtual Volume Manager) is implemented as part of the ‘gatekeeper’ layer that sits alongside the Xen VMM (Virtual Machine Monitor) [6] of every physical server hosting multiple virtual servers. The gatekeeper monitors all I/O and network traffic and is responsible for access control and I/O security. The Virtual Volume Manager is responsible for mapping the VSDs (Virtual Storage Devices) to the actual physical storage devices in the storage pool [5].

3.3.3 Indirection and scaling:

Scaling systems based on in-band indirection requires additional hardware that performs indirection. The number of servers that can access the storage pool through a single in-band indirection appliance is restricted by the bandwidth of the indirection device. (In the example above, multiple IBM SVC appliances would be required.) [13] Scaling systems that use out-of-band indirection is easier.
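The in-band bottleneck can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that every byte of server I/O passes through an appliance and that appliance bandwidth adds up linearly; the function name and all figures are hypothetical, and real sizing would also account for caching and bursty workloads.

```python
import math


def inband_appliances_needed(server_demands, appliance_bandwidth):
    """Appliances needed so that their combined bandwidth covers total demand.

    server_demands: per-server I/O demand, in the same unit as
    appliance_bandwidth (for example MB/s).
    """
    if appliance_bandwidth <= 0:
        raise ValueError("appliance bandwidth must be positive")
    total = sum(server_demands)
    # At least one appliance is always on the data path.
    return max(1, math.ceil(total / appliance_bandwidth))


# 40 servers at 100 MB/s each, behind appliances rated at 1500 MB/s (made-up numbers).
print(inband_appliances_needed([100] * 40, 1500))  # 3
```

With these illustrative numbers a single appliance saturates at 15 servers, so the appliance count grows with the number of servers.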

With out-of-band indirection, the bandwidth of the data path is not constrained by the number of servers connected to the out-of-band indirection appliance. Out-of-band virtualization suffers from other problems, however. It requires a driver module to be installed on each host OS; if an operating system does not have a compatible driver module, it cannot be configured to access the virtualized storage [13].

3.4 Multi-dimensional Storage Virtualization:

Huang et al. [7] discuss the need for multi-dimensional storage virtualization and propose the ‘Stonehenge’ system. Storage virtualization generally considers only the dimension of capacity [7], in which additional physical storage can be added to the system without having to reconfigure the whole system. ‘Stonehenge’ virtualizes capacity as well as efficiency and other physical disk attributes like bandwidth, latency and availability [7].

‘Stonehenge’ partitions physical storage into multiple ‘virtual disks’ and allows the following disk attributes to be assigned [7]:

Availability: expressed in terms of the degree of replication.
Bandwidth: expressed in terms of the number of disk access requests per unit time.
Capacity: the total amount of user data that can be put into the virtual disk.
Delay: the worst-case end-to-end delay of a disk access request.
Elasticity: the delay and bandwidth characteristics of the disk, expressed in terms of probabilities.

‘Stonehenge’ maps the attributes that the user assigns to a virtual disk onto a set of physical storage servers. If the capacity requirements of a virtual disk cannot be fulfilled by a single physical server, the virtual disk is assigned to multiple physical storage servers. Similarly, if a single physical server cannot fulfill the bandwidth requirements of a virtual disk, the virtual disk is ‘striped’ and partially assigned to multiple physical servers [7].

3.5 Benefits of Storage Virtualization:

Ease of Management: Storage virtualization simplifies the storage management task. The administrator only configures the ‘virtual disks’, and the actual storage management is handled by the virtualization software.

Improved Storage Utilization: Since multiple virtual disks can share the same physical storage device, there is better utilization of the physical storage.

Flexible storage allocation: Storage can be dynamically allocated and deallocated based on application demands. For example, if a task requires a snapshot of a database, additional storage can be allocated dynamically and released after the task completes.

Performance optimization: Using techniques like “striping”, the virtual disk can be configured to perform at a higher bandwidth than an individual physical disk can [1][7].

Inbuilt Reliability: A replication factor can be configured as part of the attributes of the virtual storage.

Non-disruptive addition or removal of physical storage: Additional physical storage can be added to the storage pool, or unused disks can be removed from the pool, transparently to the applications using the pool [13].

3.6 Load Balancing and storage performance guarantees:

Data centers with capacities of hundreds of terabytes and aggregate transfer rates of hundreds of gigabytes/second are possible with fast switched fabrics and large disk arrays [4]. Such data centers are capable of serving the storage needs of multiple organizations. Typically a storage service provider (SSP [4]) owns a data center and offers storage as a service to multiple customers.

The SSP provides certain performance
guarantees (an “SLO: Service Level Objective”) to the customer as part of an “SLA: Service Level Agreement”. The SLO specifies the capacity, availability and performance requirements that the storage should meet [4]. This can be seen as a ‘Cloud Storage Service’ offered as part of IaaS. Load balancing is critical for meeting the SLAs.

3.6.1 Real time scheduling based approach in Façade [4]:

Lumb et al. define “performance isolation” as ‘the performance experienced by a workload from a given customer should not suffer because of the variations on the workloads from other customers’ [4]. Such isolation can be provided by over-provisioning resources for the customers, but this results in increased costs. Another solution is to allocate separate physical storage resources to different customers, but this is inflexible: adding new physical storage becomes difficult and requires reconfiguration [4].

‘Façade’ [4] is a virtualization layer between the hosts and the storage devices, designed to achieve ‘performance isolation’ in order to meet the SLO [4]. Façade achieves this through real-time scheduling and by controlling the storage device queue. It is based on two assumptions [4]:

a. ‘Reducing the length of the device queue reduces the latency at the device’.
b. ‘Increasing the device queue may increase the device throughput’.

3.6.2 Virtual Server and Virtual Storage migration based approaches:

A server can become overloaded due to excessive demand for a shared resource such as memory, CPU, network switches or storage (disks). The virtual machines hosting the applications, and the corresponding virtual disks accessed by them, can be migrated from an overloaded physical server to a less loaded physical server without impacting application availability [1].

Singh et al. present ‘VectorDot’, an algorithm for selecting the server to which VMs should be moved from an overloaded server. The algorithm analyses the current state of the resources in the data center. This state comprises the status (current usage value, capacity, threshold value) of each server node, switch node, storage node and storage virtualization appliance, together with the network connection topology, the VMs and the VDisks [1]. A node (server, storage or switch) is considered overloaded when the current usage value at the node exceeds the threshold value. Determining the node to which load from an overloaded node should be moved is difficult. Even if a server node is lightly loaded, it cannot be selected as a destination if the switch to which it is connected is overloaded. The VectorDot algorithm [1] categorizes these constraints as ‘multidimensional’ and ‘hierarchical’ and tries to determine which items need to be moved and to which server.

In the SoftUDC approach [5], when a virtual machine needs to be migrated from one node to another (possibly as part of load balancing), the Virtual Volume Manager migrates the VSD (Virtual Storage Device) along with it. Since the VSD only acts as a storage access point, there is no physical movement of data [5]. Only the virtual-to-physical storage mapping information contained in the VSDs is moved along with the virtual server when it migrates from one node to another.

4. Current trends in ‘Cloud Storage’:

Live migration of VMs within the same data center (same LAN) is already used as a way of load balancing. Research is being conducted on live migration of VMs between data centers (over Wide Area Networks) [11]. This type of VM migration may be required for moving the VMs
between multiple cloud computing vendors without disrupting application availability. In intra data center migration, migrating only the ‘Virtual Disks’, which act as ‘access points’ [5] to storage, is sufficient; in inter data center migration, the actual contents of the backing storage need to be moved as well, to avoid the performance degradation that would arise from I/O over the network [11].

Another interesting idea is ‘MetaCDN’ [18], which creates an affordable ‘Content Delivery Network’ by aggregating cloud storage services provided by multiple vendors such as Amazon’s S3 [19] and Nirvanix SDN [20].

5. References:

[1] Aameek Singh, Madhukar Korupolu, Dushmanta Mohapatra (IBM Almaden Research Center). Server-Storage Virtualization: Integration and Load Balancing in Data Centers. Conference on High Performance Networking and Computing, Proceedings of the 2008 ACM/IEEE Conference on Supercomputing - Volume 00, Austin, Texas.
[2] IBM Storage Virtualization - Value to you. White Paper.
[3] EMC Invista - Storage Virtualization.
[4] Christopher R. Lumb, Arif Merchant, Guillermo A. Alvarez. Façade: virtual storage devices with performance guarantees. Conference on File and Storage Technologies 2003, Proceedings of the 2nd USENIX Conference on File and Storage Technologies. id=1090694.1090710
[5] Mahesh Kallahalla, Mustafa Uysal, Ram Swaminathan, David E. Lowell, Mike Wray, Tom Christian, Nigel Edwards, Chris I. Dalton, Frederic Gittler (HP Labs). SoftUDC: A Software-Based Data Center for Utility Computing. IEEE Computer, Volume 37, Issue 11.
[6] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield. Xen and the Art of Virtualization. ACM Symposium on Operating Systems Principles, Bolton Landing, NY, USA. Session: Virtual machine monitors, 2003, Pages 164-177.
[7] Lan Huang, Gang Peng, Tzi-cker Chiueh. Multi-dimensional storage virtualization. ACM SIGMETRICS Performance Evaluation Review, Volume 32, Issue 1.
[8] Chris Wolf, Erick M. Halter. Virtualization: From the Desktop to the Enterprise. ISBN: 1-59059-495-9, Apress, 2005.
[9] Edward K. Lee and Chandramohan A. Thekkath (Digital Equipment Corporation). Petal: Distributed Virtual Disks. ACM SIGOPS Operating Systems Review, 1996. id=248208.237157&dl=GUIDE&dl=GUIDE&idx=248208&part=periodical&WantType=periodical&title=ACM%20SIGOPS%20Operating%20Systems%20Review
[10] Anupam Bhide, Anu Engineer, Anshuman Kanetkar, Aditya Kini (Calsoft Inc), Christos Karamanolis, Dan Muntz, Zheng Zhang (HP Research Labs), Gary Thunquest (Palo Alto). File Virtualization with DirectNFS. Tenth Goddard Conference on Mass Storage Systems and Technologies 2002, Nineteenth IEEE Symposium on Mass Storage Systems.
[11] Takahiro Hirofuchi, Hirotaka Ogawa, Hidemoto Nakada, Satoshi Itoh, Satoshi Sekiguchi. A Live Storage Migration Mechanism Over WAN. 9th IEEE/ACM International Symposium on Cluster Computing and the Grid.
[12] André Brinkmann, Michael Heidebuer, Friedhelm Meyer auf der Heide, Ulrich Rückert, Kay Salzwedel, Mario Vodisek. V:Drive - Costs and Benefits of an Out-of-Band Storage Virtualization System. 12th NASA Goddard, 21st IEEE Conference on Mass Storage Systems and Technologies (MSST).
[13] A. Brinkmann, F. Meyer auf der Heide, K. Salzwedel, C. Scheideler, M. Vodisek, and U. Rückert. Storage Management as Means to Cope with Exponential Information Growth. Proceedings of SSGRR, 2003.
[14] Xuxian Jiang, Dongyan Xu, Sebastien Goasguen, Paul Ruth (Purdue University). Virtual Distributed Environments in a Shared Infrastructure. IEEE Computer, 2005.
[15] Xuxian Jiang, Dongyan Xu (Purdue University). VIOLIN: Virtual Internetworking on Overlay Infrastructure.
[16] VMWare Virtualization. /what-is-virtualization.html
[17] SNIA - Storage Networking Industry Association. Frank Bunn (VERITAS Software), Nick Simpson (Datacore Software), Robert Peglar (XIOtech Corporation), Gene Nagle (StoreAge Networking Technologies). Technical Tutorial - Storage Virtualization. _networking_primer/stor_virt/sniavirt.pdf
[18] James Broberg, Rajkumar Buyya, and Zahir Tari. MetaCDN: Harnessing 'Storage Clouds' for high performance content delivery. Journal of Network and Computer Applications, Volume 32, Issue 5.
[19] Amazon Simple Storage Service.
[20] Nirvanix Storage Delivery Network - cloud storage service.