                 APPLYING ADVANCES IN DATACENTER TECHNOLOGY
                       TO HIGHLY MOBILE AND ADVERSE
                         ENVIRONMENTAL CONDITIONS




                               By


                      Michael J. Willetts
                 michael.willetts@gmail.com




    A Graduate Research Report Submitted for IMAT 670
   In Partial Fulfillment of the requirements of the Degree of
         Master of Science of Information Technology




         University of Maryland - University College
                        December 2008
                             TABLE OF CONTENTS
                                                        Page
ABSTRACT……………………………………………………………………………………...iv
LIST OF TABLES………………………………………………………………………………..v
LIST OF FIGURES………………………………………………………………………………vi
CHAPTER
    I. INTRODUCTION
          Need for the Study…………………………………………………………….......1
          Limitations………………………………………………………………………...1
          Assumptions……………………………………………………………………….1
          Definitions of the Terms…………………………………………………………..2
    II. REVIEW OF THE LITERATURE
          Literature Review………………………………………………………………….3
    III. TECHNOLOGY REVIEW
          Environmental Systems…………………………………………………………...6
          Storage Area Networks……………………………………………………………7
          Replication Technologies………………………………………………………….8
          Snap Shot Technology…………………………………………………………….9
          Backup to Disk…………………………………………………………………...10
          De-Duplication…………………………………………………………………...10
          Blade Server Technology………………………………………………………...11
          Solid State Disks…………………………………………………………………12
          Encryption………………………………………………………………………..13
          Virtualization…………………………………………………………………….13
          Boot from SAN…………………………………………………………………..14
          WAN/IP Acceleration……………………………………………………………14




                     TABLE OF CONTENTS (CONT)
   IV. GATHERING THE REQUIREMENTS
        Defining the Need………………………………………………………………..15
        Climate Insensitive……………………………………………………………….15
        Operationally Flexible…………………………………………………………...16
        Highly Mobile……………………………………………………………………16
   V. THE ARCHITECTURE
        Building the Foundation…………………………………………………………18
        Modular Hardware……………………………………………………………….19
        Virtualization Framework………………………………………………………..19

        Support Services…………………………………………………………………20

        Application Development………………………………………………………..20

        Disaster Recovery………………………………………………………………..21

        The Big Picture…………………………………………………………………..22

   VI. CONCLUSION

        Conclusion……………………………………………………………………….25

REFERENCES………………………………………………………………………………….26




                                             Abstract
Over the past several years, datacenter technology has undergone rapid advances. In parallel,
the world has become much more of a global community. Natural disasters and regional
conflicts, among other things, have strained many governments and global organizations to
respond quickly and effectively. These missions often take place in remote locations, with little
to no basic infrastructure to support such major operations. In particular, the command and
control requirements of these missions are by their nature very difficult to support, as there is
usually very little local Information Technology (IT) support, and Internet links into these areas
are limited to expensive, unreliable, high-latency, low-bandwidth satellite connections. This
paper addresses this problem by focusing on recent datacenter innovations and how to apply
them to these environments.




                      LIST OF TABLES
                                                     Page
Cable Connection Comparison……………………………………………………………… 12




                     LIST OF FIGURES
                                                  Page
Processing Node………………………………………………………………………………….21
Storage Head Node………………………………………………………………………………21
Storage Expansion Node…………………………………………………………………………22
IP Communications Node………………………………………………………………………..22
Interconnections…………………………………………………………………………….……23








                                         CHAPTER ONE
                                        INTRODUCTION
       The rise of the information age has brought with it many advantages. Businesses and
other organizations have grown accustomed to a wealth of information being available at their
fingertips. Because of this, many business processes have become dependent on the automation
that Information Technology (IT) can provide. Almost everything in the modern world is
connected and automated in some way. These dependencies have given rise to large-scale
global information networks and data centers. They have also caused some problems.


                                       Need for the Study
        In the past 10 years, the world has seen a drastic increase in missions requiring effective,
highly mobile command and control. From the wars in Iraq and Afghanistan to the response to
Hurricane Katrina and numerous other events, the one constant to emerge is the need for
effective command and control over the local area (Talbot, 2004). The way to achieve this is by
providing an IT infrastructure that can adapt to these conditions. Historically this has been hard
to do. Over the past decades, IT has shifted from an isolated model, where a person worked on a
particular problem alone, to a collaboration-centric model, where information is placed on
servers and worked on by teams. Information is expected to be available whenever and wherever
somebody may need it. Problems arise because this model of collaboration was designed
and implemented around a highly available datacenter, supported by a highly reliable
infrastructure, with excellent high-speed communication links. None of these are guaranteed in
the environments this paper is concerned with. That being said, many innovations made
for a static datacenter can also be applied to a highly mobile environment. This paper will
discuss the needs and requirements of these environments, review technologies and
recent advances that have the potential to meet these requirements, and finally describe how
to bring the system together.
                                           Limitations
        This study will be limited to a high level architecture. Specifics of which technology
vendor solutions solve these problems will not be discussed and will be left for technical design
documents and trade studies. Specific requirements for environments, weights, and capacities
will also not be addressed in this document and will be left for a detailed requirements analysis.
                                          Assumptions
       The design put forward in this paper is currently nothing more than a thought experiment
by the author, drawing on his experience from past situations. The need is real; all other
requirements should be treated as assumptions made by the author.


                                   Definitions of the Terms
Backups: Recovery information maintained in the event of a system failure or human error.
Blade Server: A server that has been configured to be supported in a modular, highly dense
environment.
Business Continuance Volume (BCV): A disaster recovery device in a SAN that allows for
recovery from corrupted file systems or malicious data alteration.
Continuance of Operations (COOP): A disaster recovery plan that expands standard disaster
recovery planning to include site disasters.
De-Duplication: A process by which data is reduced from a large set of copies to a single
instance, leaving pointers in place of the duplicate files.
Fibre Channel (FC): A low-overhead network protocol primarily utilized in storage networking.
Heating Ventilation and Air Conditioning (HVAC): A term used to describe the climate
control technology of a system.
Input/Output Operations Per Second (IOPS): A measure of performance for disk-based
storage subsystems.
Small Computer System Interface (SCSI): A set of standards for physically connecting computers
and peripheral devices.
Storage Area Network (SAN): An architecture that attaches remote computer storage devices to
servers so that they appear locally attached.
Solid State Drive (SSD): A technology that utilizes solid state memory to store information
rather than spinning magnetic platters.
Tactical: Of or pertaining to a maneuver or plan of action designed as an expedient toward
gaining a desired end or temporary advantage.
Rack Unit (U): A unit of measure used to describe the height of equipment intended for
mounting in a 19 or 23 inch rack. One rack unit is 1.75 inches high.




                                       CHAPTER TWO
                               REVIEW OF THE LITERATURE
SAN-based data replication
Barkley, P. (2004).
        This article provides an overview of different data replication techniques. It explores
host-based replication, array-based replication, and storage network-based replication, and how
they integrate with information lifecycle management / tiered storage principles.
Deduplication: Stop repeating yourself.
Connor, D. (2006).
        This article discusses the advantages and disadvantages of data deduplication in an
enterprise level environment. It serves to point out the basic functionality of a deduplication
system and gives some examples of space savings.
Storage Networking Fundamentals: An Introduction to Storage Devices, Subsystems,
Applications, Management, and File Systems (Vol 1).
Farley, M. (2004).
        This book provides a foundation for understanding Storage Area Networks, and the
technologies that support them. It covers basic Fibre Channel, SCSI, ATA, SATA, and others.
It also covers how they integrate into a storage environment with technologies like RAID,
remote copy, dynamic multipathing and more.
Microsoft Plans Roofless Data Centers.
Hoover, J. (2008).
         This article discusses Microsoft’s adoption of containerized modular datacenters and how
it is becoming a trend with businesses that require scale on demand computing.
The Box: How the Shipping Container Made the World Smaller and the World Economy
Bigger.
Levinson, M. (2006).
       This book discusses the impact of the simple shipping container on the world; the
simple metal box has helped usher in the era of globalization since its advent in 1956. This
provides perspective on just how much change has occurred, and supplies some history behind
the major new paradigm in datacenter design.


Storage Networking Protocol Fundamentals (Vol 2).
Long, J. (2006).
        This book provides a much deeper dive into the networking protocols that
power SANs. It covers, in depth, Fibre Channel, SCSI, Ethernet, and TCP/IP, and how they
apply to the transport of SCSI.
VMmark: A Scalable Benchmark for Virtualized Systems.
Makhija, V., Herndon, B., Smith, P., Roderick, L., Zamost, E., & Anderson, J. (2006).
       This article discusses the establishment of a performance specification for virtual servers.
No previous standard was in place and customers were reliant on individual application
performance. This article attempts to synthesize a benchmark to provide an objective view of the
performance of a virtual hardware platform.
DataCore Reveals Top Ten “Lessons Learned”.
Marshall, D. (2006).
        A short article describing some of the lessons learned from a software user group
roundtable. It includes many insights into how to avoid many common problems in data center
infrastructure design.
Blade servers: Cutting edge. No matter how you slice it, blades offer advantages over
conventional servers.
McCormick, J. (2006).
       This article provides an overview of the benefits of blade servers in the enterprise
covering scaling, costs, and management benefits.
Stanford center turns to Sun Blackbox for extra capacity.
Niccolai, J. (2008).
      Another article describing how modular containerized datacenters can be used to
accommodate on demand computing.
VMWare Infrastructure 3 Advanced Technical Design Guide and Advanced Operations
Guide.
Oglesby, R., Herold, S., & Laverick, M. (2008).
    This book covers design and operations best practices for environments virtualized by
VMWare. The techniques from this apply to most virtualization implementations.
How Technology Failed in Iraq.
Talbot, D. (2004).
        This article illustrates and discusses the technology successes and failures in the most
recent large scale tactical military application of information technology. It serves to highlight
the need for a solution to this problem.


Capacity and Performance Overhead in Dynamic Resource Allocation to Virtual Containers.
Zhikui W., Xiaoyun Z., Pradeep P. & Singhal S. (2007).
      This article discusses a method of measuring and accounting for the overhead associated
with managing multiple applications or host servers on a single hardware platform. These
methods help predict how virtualization systems will scale with complexity.




                                       CHAPTER THREE
                                   TECHNOLOGY REVIEW
        As with most things, the state of the art in data center design steadily progresses. Many
of the recent advances, though not necessarily intended for use in the environments discussed in
this paper, are applicable. This section will serve as a review of some of the more relevant ones.
It is not an exhaustive list, nor an in-depth treatment of each technology; rather, it
summarizes the state of the art and identifies recent advances that could be utilized under
highly mobile and adverse conditions.


                                     Environmental Systems
       Environmental systems and design have come a long way in the data center. In times
past, when computers were more mainframe oriented, the environmental system for the data
center was designed around the computer. Since then, computing has become much more
commoditized: parts are standard and, to a certain extent, interchangeable. The traditional
“rackable server” was born out of this. Problems started occurring, however, in the early
2000s. Servers were becoming cheaper, and their power consumption was increasing. Couple
this with a decrease in the footprint of the average server, and a problem emerges. Many data
centers were not, and still are not, equipped to handle this increase in power and cooling per
unit of space. Supplying power and cooling to such densely packed, high-consumption servers
was becoming more of an art form.
       Of course, where there is need, there is also great opportunity. Many infrastructure
manufacturers seized upon these needs and designed new parts and systems to support them.
Fundamental to this became the concept of increasing the efficiency of existing cooling systems.
The concepts of hot aisles and cold aisles were embraced to help maximize cooling capacity.
Power systems were scaled to higher densities. Even problems such as Ethernet cable density
were addressed with the advent of modular systems that would allow a rack to scale as needed.
Eventually this led to a shift in datacenter design principles, where power, cooling, and cabling
were as important as determining the type of server and its specifics. The role of infrastructure
manager was born.
       During this time, the server industry was not sitting back ignoring these trends. New
CPU designs from the major manufacturers, and server design strategies generally, began to
favor performance per watt. This became caught up in the “green computing” movement; major
manufacturers now base major marketing campaigns on how “going green” can save
you and your company money in energy costs.


       Some interesting side projects came out of all this. Various manufacturers and
companies, including Sun, Rackable, and even Microsoft, came up with the concept of a data
center in a box (Hoover, 2008). In essence, they decided to take a standard sealift shipping
container and build a balanced data center within it. Since the datacenter is prebuilt from known
parts, it is possible to specify exactly what power will be necessary. All connections can be pre-
wired at a factory, with external links reaching out to the outside world. Cooling requirements of
the components and cooling capacity of the HVAC system can be balanced and implemented
within the container, all at increased efficiency. Excess capacity in the HVAC system allows a
greater range of external operating environments. The thought was: why continue to build
expensive fixed-space datacenters when all the hard work of designing and balancing the data
center can be done in advance, leaving you only to find someplace to park the container? This
concept will have a major impact on the design. (Niccolai, 2008)


                                     Storage Area Networks
        The concept of the Storage Area Network, or SAN, is not really a new one. SANs have
been around for 25 years or so. Beginning back in the days of mainframe computing, they were
specialized storage at a specialized price. Eventually, in the pursuit of greater markets, SANs
were expanded to “Open Systems,” which basically meant that things other than a mainframe
could now connect and take advantage of the possibilities provided by a SAN. But what were
these? (Farley, 2004)
    The SAN initially provided a few basic things:

    •   High Speed – As a cached disk array, it provided performance superior to local hard
        drive based equivalents.
    •   Expandability – A SAN typically could hold more disks than any single host.
    •   Shared Storage – A SAN opened the possibility of having the same storage device
        presented to multiple host systems.
    •   Reliability – A SAN can be configured to be highly redundant, providing added
        layers of protection for any data that is placed on it.
        The features described are primarily about the storage element side of the SAN. The
network side of the SAN is often overlooked. This side has also gone through many recent
innovations, including:

    •   Fibre Channel (FC) – The backbone of SANs for years, it has recently seen upgrades in
        speed from 2, to 4, and now 8 and 10 Gbit/s. It is highly optimized and very
        lightweight, incurring very little overhead, but it does require a separate set of dedicated
        FC switches to provide the communication between host and storage.
    •   Fibre Channel over IP (FCIP) – FCIP is a bridging technology. It allows FC fabrics
        in remote locations to be connected over an IP-based network.
    •   iSCSI – The cost equalizer. FC is expensive; many smaller organizations want the power
        of the storage elements of the SAN without paying the high cost of maintaining an
        expensive parallel FC network. iSCSI fits the bill here by being capable of riding an
        existing IP network and utilizing existing IP/Ethernet switch technology. It achieves this
        by encapsulating the SCSI storage commands in IP packets. It is not as high
        performance, being subject to additional overhead, latency, and Quality of Service
        (QoS) concerns, but it gets the job done when budgets do not allow for more expensive
        solutions.
    •   Fibre Channel over Ethernet (FCoE) – This represents the latest technology to enter the
        SAN arena. It hopes to combine the best aspects of both of its predecessors: the low
        latency and established storage protocols of Fibre Channel, residing on the same
        Ethernet network as the IP-based communications. This effectively sidesteps the
        encapsulation issues of iSCSI while providing the cost benefit of maintaining a single
        set of data center switches for both storage and user communication. The argument is
        that having a 10Gb link for each network is unnecessary and it would be better to share a
        connection. FCoE is so recent, however, that support from major manufacturers is only
        now coming online. This technology surely has a future in the system, but unfortunately
        it is not an option right now.
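The encapsulation idea behind iSCSI can be illustrated with a toy sketch. The header fields and layout below are invented purely for illustration; the real iSCSI PDU format (RFC 3720) is far richer than this.

```python
import struct

def encapsulate(cdb: bytes, lun: int) -> bytes:
    # Toy framing: 1-byte opcode, 1-byte LUN, 2-byte CDB length (big-endian),
    # followed by the SCSI command descriptor block itself. This is NOT the
    # real iSCSI PDU layout; it only shows the wrap-SCSI-in-a-stream concept.
    return struct.pack("!BBH", 0x01, lun, len(cdb)) + cdb

def decapsulate(payload: bytes):
    # Reverse the toy framing: parse the header, then slice out the CDB.
    _opcode, lun, length = struct.unpack("!BBH", payload[:4])
    return lun, payload[4:4 + length]

# A SCSI READ(10) command descriptor block, wrapped as if for a TCP stream.
read10 = bytes([0x28, 0, 0, 0, 0, 0x10, 0, 0, 0x08, 0])
lun, cdb = decapsulate(encapsulate(read10, lun=2))
assert lun == 2 and cdb == read10
```

The round trip above is the essence of the approach: the SCSI command survives intact, but every byte pays the cost of the extra framing and of TCP/IP delivery underneath it.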
        These features allowed the SAN to see high adoption rates in mission-critical systems,
particularly in database systems that required a high number of Input/Output Operations Per
Second (IOPS) and in clustered systems configured for high availability. The SAN, however,
was not done. Over time it learned additional tricks that kept adding value. These additional
features will have a large enough impact on the design that they are broken out and discussed
separately later. (Long, 2006)
        The power of the SAN will end up playing a pivotal role in the design. By centralizing
the storage to a single highly redundant location many possibilities are opened up that can be
taken advantage of.


                                    Replication Technologies
        Replication of data is a concept that has been around for a long time. From simple
backup copies to more modern versions, it is a concept that is best thought of as “Why keep only
one copy of something when you can keep two or more?” For this discussion we will focus on
replication in one particular form: Offsite replication capabilities.
         During the early 2000s, many disasters forced companies to re-evaluate the security of
their data. Large data centers existed, and some companies had site disaster plans in place
before, but events such as the September 11th terrorist attacks brought newfound attention to the
subject. It became clear that, many times, a Continuance of Operations (COOP) plan was not
fully developed. This brought new attention to the sets of technology that specialize in
replicating data from one site to another.
   In the replication world there are two basic categories:

    •   Host Based – Replication is performed by agents running on the host operating system,
        targeting another host configured the same way. Writes typically do not happen
        synchronously; instead, replication usually involves set recovery time windows. The
        software can also have problems, as it is subject to all changes that affect the host.
        However, it is relatively cheap.
    •   Array/Appliance Based – Replication is performed by the attached storage array (SAN),
        or by dedicated inline appliances on the storage network. This solution allows for
        synchronous replication when possible, as well as asynchronous replication when
        network or data change rate conditions require it. It also has the added benefit of not
        requiring any CPU cycles from the host, freeing it to concentrate on its designated
        application tasks. It does have the drawback of being relatively expensive, as it is
        usually only found on SANs. That being said, once the cost of a SAN is absorbed, the
        additional cost of replication capabilities is usually not substantial.
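The core mechanic shared by both categories, tracking which blocks changed and shipping only those on each pass, can be sketched as follows. This is a minimal illustration assuming in-memory volumes and hash-based change detection, not any vendor's replication engine.

```python
import hashlib

def changed_blocks(primary: bytes, replica: bytes, block_size: int = 4096):
    """Yield (offset, block) pairs where the primary differs from the replica.

    Real engines track dirty blocks as writes happen; comparing hashes here
    stands in for that bookkeeping.
    """
    for offset in range(0, len(primary), block_size):
        src = primary[offset:offset + block_size]
        dst = replica[offset:offset + block_size]
        if hashlib.sha256(src).digest() != hashlib.sha256(dst).digest():
            yield offset, src

def replicate(primary: bytearray, replica: bytearray) -> int:
    """One asynchronous replication pass: copy only changed blocks.

    Returns the number of blocks shipped, the quantity the recovery-window
    and bandwidth planning revolve around.
    """
    shipped = 0
    for offset, block in list(changed_blocks(bytes(primary), bytes(replica))):
        replica[offset:offset + len(block)] = block
        shipped += 1
    return shipped

primary = bytearray(b"A" * 8192)        # two 4 KB blocks
replica = bytearray(primary)
primary[100:105] = b"HELLO"             # host writes land on the primary first
replicate(primary, replica)             # the periodic pass catches the replica up
assert replica == primary
```

Note that only one of the two blocks crosses the wire in this example; that delta, accumulated between passes, is exactly what makes asynchronous replication workable over the constrained links this paper is concerned with.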
       Replication technologies will play a vital role in the design of this system. The very
nature of the missions dictates that there will be a risk that the mobile element of the design
could be destroyed, either by natural disaster or enemy action. The loss of the data center should
not compromise the entire mission. Replication to an alternate location will be essential.


                                     Snap Shot Technology
       Snap Shot technology is a rather recent addition; previously found only in high-end SAN
systems, it has gradually worked its way into operating systems as well. Snap Shots come in two
basic varieties.

    •   Business Continuance Volumes (BCV) – A BCV is basically a fully readable/writeable
        point-in-time copy of an active data set. A separate set of disks is synchronized to the
        primary ones and then split apart. A BCV has the advantage of being a fully independent
        copy of the data set at that point in time. The disadvantage is that BCVs are not very
        storage efficient, as they require the same amount of space as the original dataset. This is
        usually a feature of a high-end SAN system.
    •   Shadow Copies – Shadow copies are considered the standard form of Snap Shot
        technology; when most people refer to Snap Shots, they are referring to this type of
        copy. They provide features similar to BCVs but do so in a more storage-efficient
        manner, by maintaining a fairly complex set of pointers that can be assembled to
        reproduce the dataset as it existed at the moment a Snap Shot was taken. Only the
        changes to the data set are tracked, and the result is a very storage-efficient method of
        providing backup copies. One downside is that it is still one system, not a set of
        independent volumes you can do with as you wish. Another is that the entire system is
        dependent on the pointer structures and the original data set; should either become
        corrupted, the damage can cascade through the entire system. This is why Snap Shots of
        this type are explicitly not considered backups of the data.
        Snap Shots have come of age. It is almost expected that any modern file system will
support this feature to increase usability and reduce strain on the backup system. BCVs can be
utilized on the most critical systems to provide an extra level of disaster recovery, or to provide
data to auxiliary reporting systems that may require it. Snap Shots of both types could be
applied to the mission profile and will be utilized where possible in the design of the system.
(Barkley, 2004)
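The copy-on-write pointer scheme behind shadow copies can be sketched in a few lines. This is a simplified model, assuming block-level granularity and in-memory structures, not any particular file system's implementation; it also makes the dependency problem visible, since every snapshot read falls back to the live volume.

```python
class SnapshotVolume:
    """Sketch of a shadow-copy volume: snapshots store only overwritten blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # the live data set
        self.snapshots = []          # per snapshot: {block index: preserved old data}

    def snapshot(self) -> int:
        # A new snapshot costs nothing up front; it fills in as writes occur.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # Copy-on-write: preserve the old block in every open snapshot that
        # has not already preserved an older version of it.
        for snap in self.snapshots:
            snap.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        # If the block changed since the snapshot, its old value was preserved;
        # otherwise the read falls through to the live volume (the dependency
        # that makes this kind of Snap Shot no substitute for a backup).
        return self.snapshots[snap_id].get(index, self.blocks[index])

vol = SnapshotVolume([b"a", b"b", b"c"])
s0 = vol.snapshot()
vol.write(1, b"B")
assert vol.read_snapshot(s0, 1) == b"b"   # point-in-time view
assert vol.blocks[1] == b"B"              # live view
```

Only the one overwritten block consumed extra space here, which is the storage efficiency argument; the `get(..., self.blocks[index])` fallback is the cascade risk the text warns about.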


                                         Backup to Disk
         Backup to disk is another relatively recent technology. While backing up to floppy
disks and Zip drives was common for most desktop users, in an enterprise environment tape has
been king. This has gradually been changing over the past few years. While tape is still
necessary for offsite storage, it has proven to have numerous flaws. Among them is the fact that
it is very slow under certain circumstances, namely when random restores are required. The
sequential nature of tape means that an entire tape may be traversed looking for a single file. If
the file happened to be streamed to multiple drives and tapes during backup, the wait can be
even longer.
        Official statistics are lacking, but informal surveys of backup operators suggest that
almost 90% of recovery requests are for data backed up within the past 30 days. For critical
systems, you usually restore the most recent backup available, and in most user cases a missing
or corrupted file is noticed relatively quickly. What does this mean? It means that roughly 90%
of restores are performed on data that is a month old or less. This substantially shrinks the
active restore set of your backups. It also means that your active restore set is most likely to
consist of random restores, not sequential ones. It was also noticed that many problems in
backup systems stem from the mechanical nature of the tape drives and libraries themselves:
drives need cleaning, robots wear out and fail, and so do the tapes. It was not long before the
industry noticed these problems and realized that backing up to disk could solve many of them.
Backups would be fast, recoveries would be fast, and the small active restore set would allow
the most recent backups to be maintained on disk, while long-term backups could be copied to
tape and vaulted offsite. It also eliminates the possibility of a single bad tape causing multiple
problems, as these systems usually employ SAN technology to spread data over multiple disks.
Couple this with the rapid expansion of hard drive storage capacity and the solution seemed
perfect. Backup to disk then took on two forms:

   •   Backup to Disk – In this form the backup software recognizes a disk as a storage location
       and will proceed to back up to it. This is usually an option in the backup software and
       may involve added cost.
    •   Tape Library Emulation – In this form, the backup-to-disk system emulates a tape
        library when presenting itself to the backup host. This has the benefit of usually
        requiring very few changes to existing installations. In essence, you swap one tape
        library for what appears to be another, or add it as an additional “tape library” to the
        system, with all the advantages of the disk-based system.
        Backup to disk will be a necessary part of the design. Tape drives are among the most
sensitive devices in the IT world. They cannot handle dust, and the robotics of the libraries
cannot handle the constant movement. Local backups will have to be supported by backup to
disk, with offsite vaulting to tape.
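The random-restore penalty of tape can be made concrete with back-of-the-envelope arithmetic. All figures below are assumptions chosen for illustration, not measurements or vendor specifications.

```python
# Assumed figures for a rough comparison of a single random-file restore.
TAPE_SCAN_MBPS = 120        # assumed sustained sequential read rate of a drive
TAPE_LOAD_SECONDS = 90      # assumed robot pick, load, and position time
DISK_SEEK_SECONDS = 0.01    # assumed random-access latency on a disk array

def tape_restore_seconds(tape_size_gb: float, file_mb: float) -> float:
    # On average, half the tape must be traversed to reach an arbitrary file,
    # because access is strictly sequential.
    scan = (tape_size_gb * 1024 / 2) / TAPE_SCAN_MBPS
    return TAPE_LOAD_SECONDS + scan + file_mb / TAPE_SCAN_MBPS

def disk_restore_seconds(file_mb: float, disk_mbps: float = 200) -> float:
    # Disk goes straight to the data: one seek plus the transfer.
    return DISK_SEEK_SECONDS + file_mb / disk_mbps

# Restoring one 100 MB file from a 400 GB tape vs. a disk-based backup store:
print(round(tape_restore_seconds(400, 100)))   # tens of minutes of scanning
print(round(disk_restore_seconds(100), 2))     # well under a second
```

Under these assumptions the tape restore is dominated entirely by the scan term, which is why shrinking the active restore set and serving it from disk pays off so dramatically for the random restores that make up most requests.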
                                         De-duplication
        Data de-duplication is another recent development, closely tied to backup systems.
Storage space is always at a premium, and one of the greatest consumers of space is backups. By
their nature, backups provide no value until the moment they are needed; unfortunately, you
cannot afford to be without them should an emergency arise. Backups store the same data over
and over again. Backup software tries to mitigate this by performing incremental and differential
backups, where only the subsets of the data that have changed are backed up. Unfortunately, this
still does not solve the problem of multiple backup sets consuming a large amount of space. De-
duplication technology was developed to address this. Sitting in line with the backup system, it
monitors the data that passes through and stores any given file only once. All other references to
it are stored as pointers, and a pointer consumes significantly less space than a full copy of the
file. Because of this de-duplication, a greater number of backup sets can be maintained in the
same amount of storage, or the amount of storage applied to backups can be reduced, potentially
freeing space for other applications. As physical space will be at a premium in a mobile
environment, any method of reducing the number of disks or tapes required for backup is a must.
This technology will certainly be of use in the design. (Connor, 2006)
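The store-once, point-thereafter mechanism described above can be sketched in a few lines. This is an illustrative file-level model only; real products such as those Connor describes typically de-duplicate at the block or segment level, and the class and method names here are invented for the example.

```python
import hashlib

class DedupStore:
    """Illustrative file-level de-duplication store: each unique
    content blob is kept once; repeats are stored as pointers."""

    def __init__(self):
        self.blobs = {}      # content hash -> actual data (stored once)
        self.catalog = {}    # backup path -> content hash (the "pointer")

    def backup(self, path, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:     # first time this content is seen
            self.blobs[digest] = data
        self.catalog[path] = digest      # later copies cost only a pointer

    def restore(self, path):
        return self.blobs[self.catalog[path]]

    def stored_bytes(self):
        return sum(len(d) for d in self.blobs.values())

store = DedupStore()
report = b"quarterly status report" * 100
# Three nightly backups of the same unchanged file...
for night in ("mon", "tue", "wed"):
    store.backup(f"/backups/{night}/report.doc", report)
# ...consume the space of one copy, not three.
print(store.stored_bytes(), len(report) * 3)
```

The catalog grows with every backup set, but only novel content enlarges the blob store, which is why retaining many backup sets becomes affordable.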


                                     Blade Server Technology
        Rack mount server technology is nothing new; it has been around for quite a while. Its
problems, however, go back to those listed in the environmental technology section. As
quantities of servers grew so did the power and cooling requirements, as well as cabling
requirements. Blade servers were designed to address these issues. The secret is not in the
server itself as they usually consist of mostly the same technology and parts as their standard
rack mount cousins. In this case the difference is in the blade enclosure itself. Typical blade
server enclosures not only encompass the server technology, but also incorporate both IP and FC
switch technology. These switches can uplink at 10Gb to the core switches in the data center
over a few strands of fiber optic cabling. The power supply connections are also limited to a
subset of redundant power supplies that power the entire enclosure. By doing the math, we are
able to compare the cabling requirements of a typical 10U blade enclosure supporting 16 servers
to that of an equivalent number of standard 1U rack mount servers.

      Connection                          Blade Enclosure Supporting 16 Servers   16 1U Rack Mount Servers
      Network Cat6e Connections                             0                                32
      Network Fiber Optic Connections                       4                                 0
      SAN Fiber Optic Connections                           8                                32
      Power Supply Cables                                   6                                32
      Keyboard Video Mouse Connections                      1                                16
      Remote Management Connections                         2                                16
      Total                                                21                               128
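The totals in the table can be reproduced with a quick back-of-envelope calculation; the figures below are taken directly from the table above.

```python
# Cable counts per connection type:
# (blade enclosure supporting 16 servers, 16 x 1U rack mount servers)
cabling = {
    "Network Cat6e":       (0, 32),
    "Network fiber optic": (4, 0),
    "SAN fiber optic":     (8, 32),
    "Power supply":        (6, 32),
    "Keyboard/video/mouse":(1, 16),
    "Remote management":   (2, 16),
}
blade_total = sum(blade for blade, rack in cabling.values())
rack_total = sum(rack for blade, rack in cabling.values())
print(blade_total, rack_total)  # 21 vs 128 cables
print(f"Cabling reduction: {100 * (1 - blade_total / rack_total):.0f}%")
```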


        When we also consider the 6U of rack space saved, the benefits of blade server
technology begin to become clear. It is also worth noting that blades save further rack space by
not requiring external IP or FC switches, which would consume additional space and require
additional cabling. Power consumption efficiencies are realized here as well: no power supply is
100% efficient, and there are fewer power supplies converting power for use in
the blade enclosure than the standard server setup. Blade servers also offer one more key
advantage. Replacement of a single server is simplified. Once the enclosure is cabled, it stays
static. Blade servers then plug into the backplane of the enclosure and all needed connections
are provided. Changing a faulty server is now reduced to a few minutes, compared to a longer
error prone process of removing all the cabling to change the server. There are drawbacks
however. Some enclosures are dependent on a single backplane to connect all servers. Should
this backplane fail, it would affect more than one server. The other is the weight of a fully loaded
blade system: depending on the mobility requirements, it may not be possible to lift a fully
loaded enclosure. Blade servers nevertheless have a very large upside. The design will try to
incorporate blades while
overcoming these drawbacks. (McCormick, 2006)
                                         Solid State Disks
        From the very beginning, hard disk drives have had moving parts. Moving parts are
prone to failure, especially when subjected to vibrations and accelerations (with the resulting
abrupt decelerations). Moving parts, especially high speed moving parts, also consume a large
amount of energy and consequently produce a large amount of waste heat. This describes the
state of the art in modern hard drive technology: with spindle speeds ranging from 5,400 RPM to
15,000 RPM, drives contain very fast moving parts. Hard disk drives are also among the most
frequently replaced items in a server or SAN. They tolerate adverse conditions better than tapes,
but not by much; they are highly sensitive to dust, heat, and any number of other environmental
conditions.
        Recent advancements are providing a way to avoid many of these problems. Solid state
or flash memory has been steadily increasing in capacity over the past few years. It is now
approaching the point where it can be integrated with existing hard drive electronics to provide
solid state disks, or SSDs. What are the benefits? Solid state disks have no moving parts; as
such, they generate much less heat, consume less power, and are less sensitive to dust and other
particles. They also have performance benefits. Since they are not based on rotating disk
platters, random read requests can be serviced much faster, as there is no rotational seek latency.
Again, however, there are drawbacks. Capacities, though growing, are not at the same level as
magnetic disk (currently around half the common capacity), and, as with most new technologies,
SSDs are still expensive. This places them at a very poor storage/cost ratio. There are also
problems with write speeds: SSDs cannot write as fast as the fastest magnetic disks. Fortunately,
this can be compensated for by placing them in the cached disk system of a SAN, where writes
are sent to high speed cache memory and then de-staged to disk, while the read speed advantages
are magnified. This all combines to make SSDs currently viable in only a few circumstances.
One is extremely high performance applications where cost is not a true factor. The other is
environments where we expect to be power constrained and subjected to a large amount of
vibration, acceleration, and other adverse environmental conditions; in other words, the
environment we expect for this system. SSDs will play a key role in the design of this system.
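The rotational latency advantage is easy to quantify: a random read on a spinning disk waits, on average, half a revolution before the requested sector passes under the head, on top of any head seek time. A small illustrative calculation:

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds.
    One revolution takes 60,000 / rpm milliseconds."""
    return 0.5 * 60_000 / rpm

for rpm in (5_400, 15_000):
    print(f"{rpm:>6} RPM disk: {avg_rotational_latency_ms(rpm):.1f} ms average rotational latency")
# An SSD has no platters, so this term is simply zero for random reads;
# only controller and flash access time remains.
```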




                                            Encryption
        In almost any IT environment encountered today, data protection is among the most
often discussed topics. Numerous cases of lost data tapes containing personal information come
to light, and by now most people have received a letter explaining that some company they never
knew had their information has lost it and is terribly sorry. Data protection in the environment
we are considering is equally important. This section will not discuss individual application
level security or application level encryption. Instead it will focus on encryption provided at the
system level, with the intent of providing a base level of security to the entire system.
   There are two main areas that we need to be concerned with for this system.

   •   Data at Rest Encryption – Data at rest encryption is fairly self explanatory. It involves
       processes and methods to secure data while it is stationary in one location. In this case on
       the server, on the user’s workstation, or on a backup tape.
   •   Data in Flight Encryption – Data in flight encryption involves protecting the data while it
       is in transit across a network, be that a storage network or an IP network.
        From a systems perspective, data at rest encryption here covers cases of theft or loss of
physical devices containing information from the system, down to the level of an individual hard
disk. Data in flight encryption will be restricted to external gateways. Various encryption
technologies provide these capabilities; which ones are available to you depends on who you
are. Government grade encryption devices exist to cover both aspects, and commercially
available products provide similar capabilities. The system design will mostly limit the
discussion to “it exists and would go here”.
                                          Virtualization
        Virtualization is nothing new. Many people state this and at a base level it is very much
true. The basics of a multi-tasking operating system have been around for some time, and at its
core virtualization is very similar. In this case a very base level operating system called a
hypervisor is placed on a server that then allocates CPU, memory and other resources to “guest”
operating systems. The theory behind this is that no (or at least very few) current modern servers
are fully utilized. This is usually backed up by statistics of server utilization and charts showing
the growth of computing power. This is even made more evident by the recent mass
commercialization of multi-core x86 processors. Very few applications are capable of
effectively utilizing two CPUs, let alone four or eight. In these cases virtualization makes
perfect sense. There is no point having CPUs sitting idle because there is no application to take
advantage of them. Better utilized servers are more efficient, and running multiple virtual
servers on a single physical server saves space. These are the most commonly touted advantages
of virtualization. Virtualization however provides another key benefit: Encapsulation.
        Encapsulation in the virtualization sense means that a physical server is reduced to, or
encapsulated in, a file or set of files. The power of this is often underestimated. It effectively
reduces a physical entity to a logical one that is platform independent and easily transportable: a
physical entity reduced to data. This fact allows us to apply not one but two sets of high
availability options to any particular service. Nearly all of the data protection mechanisms
previously discussed can now protect this virtual server as well. Combine this with the
redundancies built into the hardware, and an extremely resilient system is possible.
Virtualization will play a significant role in the design of the system. (Oglesby, R., Herold, S. &
Laverick, M., 2008)
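Encapsulation can be illustrated with a toy model: once a server is reduced to a set of files, relocating it is nothing more than a file copy. The directory layout and file extensions below are invented for the example (loosely patterned on VMware's .vmx/.vmdk convention); this is a sketch of the idea, not any vendor's actual migration mechanism.

```python
import pathlib
import shutil
import tempfile

def migrate_vm(vm_dir: pathlib.Path, target_host_store: pathlib.Path) -> pathlib.Path:
    """Relocate an encapsulated VM by copying its backing files.
    Because the server is just data, migration is an ordinary copy."""
    dest = target_host_store / vm_dir.name
    shutil.copytree(vm_dir, dest)
    return dest

root = pathlib.Path(tempfile.mkdtemp())
vm = root / "hostA" / "mail-server"
vm.mkdir(parents=True)
(vm / "mail-server.vmdk").write_bytes(b"\x00" * 1024)    # virtual disk file
(vm / "mail-server.vmx").write_text("memsize = 2048\n")  # hardware definition
(root / "hostB").mkdir()

moved = migrate_vm(vm, root / "hostB")
print(sorted(p.name for p in moved.iterdir()))
```

In a real deployment the shared storage of the SAN makes even the copy unnecessary: both hosts see the same files, which is what enables near-zero-downtime migration.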
                                         Boot from SAN
        Part of SAN and server technology previously discussed, boot from SAN combines the
two to help form yet another layer of abstraction. Boot from SAN essentially keeps the
operating system drives of a server located on the SAN. When a server boots it accesses the
SAN, reads the drives and proceeds to boot from them. Why is this simple process relevant?
Because of the abstraction layer it provides. A server can be reconfigured to change its
identification to that of a previous server, and blade servers can even go a step further, assigning
a role to a specific slot within the enclosure. By utilizing boot from SAN a server can be
replaced with relative ease. No more swapping drives, re-imaging, or re-installing. Plug and
replace. This also allows servers to take advantage of the replication capabilities of the SAN to
provide disaster recovery. Boot from SAN will also play a role in the design of the system.
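The abstraction can be sketched as two small lookup tables: the SAN maps a World Wide Name (WWN) to a boot LUN, and the enclosure pins the WWN to a slot rather than to a physical blade, so a replacement blade inherits the role. All WWNs, LUN names, and serial numbers below are invented for illustration.

```python
# SAN-side mapping: which boot LUN is presented to which WWN.
san_boot_map = {"50:01:43:80:aa:bb:cc:01": "LUN_07_web_server_os"}
# Enclosure-side mapping: the WWN belongs to the slot, not the blade.
slot_wwn = {3: "50:01:43:80:aa:bb:cc:01"}
slot_hardware = {3: "blade-serial-0042"}   # which physical blade sits there

def boot(slot):
    """A blade inserted in `slot` presents the slot's WWN and boots the
    LUN mapped to that WWN; the hardware itself carries no identity."""
    return san_boot_map[slot_wwn[slot]]

before = boot(3)
slot_hardware[3] = "blade-serial-0099"     # swap the faulty blade for a spare
after = boot(3)
print(before, after == before)             # same OS image boots either way
```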
                                      WAN/IP Acceleration
        Acceleration is a dangerous term when it comes to computers and networks. One would
think from the name that this involves some way of making the Wide Area Network (WAN) go
faster. This is true, but not in the literal sense: the technology does not magically give a reduced
bandwidth location a faster connection. What it does do is effectively reduce the amount of data
that must traverse a network link once that data has already traversed the segment once. Data is
cached on both sides of a slow network link. A WAN acceleration device sits in line with the IP
network and watches for patterns. When it recognizes data pattern X, it can effectively compress
that packet by signaling the remote paired device to reproduce data pattern X on the remote
network. This can reduce complete packet payloads, and in some cases files, to a single integer
pointer. The amount by which bandwidth consumption can be reduced can be quite significant,
though this of course depends on how repetitive the transmitted data is. This acceleration can
apply to all forms of IP traffic, which means it can even accelerate our SAN replication if
configured correctly. In our defined environment, with its reliance on potentially slow wide area
connections, WAN/IP acceleration technology could prove very useful.
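The pattern-recognition behavior described above can be sketched as a paired cache. This is a simplified model of the idea, not any vendor's actual protocol: real devices match sub-packet byte patterns, while this sketch works on whole chunks, and all class names are invented for the example.

```python
import hashlib

class WanAccelerator:
    """Illustrative sending side: remembers chunks it has already sent,
    so a repeated chunk crosses the link as a short token, not a payload."""

    def __init__(self):
        self.seen = set()

    def encode(self, chunk: bytes):
        token = hashlib.sha256(chunk).digest()[:8]
        if token in self.seen:
            return ("token", token)        # ~8 bytes on the wire
        self.seen.add(token)
        return ("raw", chunk)              # full payload, first time only

class RemotePeer:
    """Illustrative receiving side: caches raw chunks so it can
    reproduce the data pattern locally when only a token arrives."""

    def __init__(self):
        self.cache = {}

    def decode(self, kind, value):
        if kind == "raw":
            self.cache[hashlib.sha256(value).digest()[:8]] = value
            return value
        return self.cache[value]           # reproduce the pattern locally

local, remote = WanAccelerator(), RemotePeer()
chunk = b"the same daily replication payload" * 50
wire_bytes = 0
for _ in range(3):                         # the same data is sent three times
    kind, value = local.encode(chunk)
    wire_bytes += len(value)
    assert remote.decode(kind, value) == chunk
print(wire_bytes, len(chunk) * 3)          # far fewer bytes traversed the "WAN"
```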




                                        CHAPTER FOUR
                             GATHERING THE REQUIREMENTS
                                        Defining the need
        Requirements definition can make or break a project; this is understandable, and
epitomized by the saying “garbage in, garbage out”. It is therefore essential to accurately
capture the requirements for this system in order to provide the highest probability of success.
One problem is that the very nature of these missions makes them unpredictable. It is nearly
impossible to predict every possible environment, or every condition that could be encountered.
You could be in a desert one month, a tropical jungle the next, and then end up on frozen tundra.
You could be in a modern city with available power systems and high speed Internet
connectivity, or you could wind up in a city recently ravaged by a natural disaster, with no
functioning utilities. In some cases you could end up in no city at all. This poses the question:
how can you plan for an environment if you do not know what your environment will be?
        A second problem is that it cannot be predicted what will or will not be required for any
particular mission. One operation may have you interfacing with a local government that runs
different standards than you are used to; another may bring a technology change in collaboration
software. In any of these cases it is very difficult to plan, as each new system can bring with it a
new set of requirements, including new servers and support.
        The third problem is mobility. A humanitarian crisis or a conflict can arise almost
anywhere in the world at any time. As such the system needs to be able to be transported to a
potentially remote location, and in some cases relocate within that location as needed.
    In defining the requirements for this system, it would seem that the three overarching
requirements it must be designed to are:

   •   Climate Insensitive
   •   Operationally Flexible
   •   Highly Mobile
       By breaking these areas down further, we can finally get some specific requirements
defined.
                                       Climate Insensitive
        As discussed previously, it is highly unlikely that it will be known where this system will
be sent. As such, it is impossible to plan for any one particular climate; by definition, then, we
must plan for all climates that could reasonably be encountered. The specific requirements for
temperature, humidity, dust, etc. are outside the scope of this paper, as it is not focused on
environmental control systems; however, it is possible to define the operating ranges of the
internal systems, to maximize their environmental thresholds, and then to provide these to the
external environmental design team. It is also possible to specify a limit on power consumption:
assuming the system is to be mobile and self contained, it must be able to be powered by a
mobile generator, which effectively places an upper limit on the power the system may utilize.
In order to meet these requirements, it is suggested that the system be self contained for climate,
and bounded for power based on the selected generator size. The detailed power and
environmental specifications will be left for the detailed design documents.
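As a sketch of how such a generator-derived bound might be computed, the following back-of-envelope calculation works from an assumed rating. Every number in it (generator size, derating, cooling overhead) is an illustrative placeholder, not a specified requirement of this design.

```python
# Back-of-envelope power bound, assuming a mobile generator sets the ceiling.
generator_kw = 10.0       # assumed rated output of the selected mobile generator
derate = 0.80             # assumed headroom for surge, altitude, and temperature
cooling_overhead = 0.45   # assumed watts of HVAC load per watt of IT load

usable_kw = generator_kw * derate
it_budget_kw = usable_kw / (1 + cooling_overhead)   # IT + cooling must fit the budget
print(f"IT equipment budget: {it_budget_kw:.2f} kW of the {generator_kw:.0f} kW generator")
```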
                                     Operationally Flexible
        In order to assume any mission that may be encountered, it is necessary to make the
system as flexible as possible. It must be capable of running any modern version of an x86 or
x64 based operating system. The system needs to have sufficient storage, memory, and CPU
cycles to support a variety of missions. These resources must also be capable of being
dynamically allocated with a minimum of reconfiguration. The system must also be capable of
being upgraded easily, and in the field as required. The detailed amounts of each will be left for
the detailed design documents.
                                         Highly Mobile
       The nature of the missions this system is being designed for requires a high mobility
component. The system must be capable of fitting into standard transportation systems for rapid
deployment. These systems include standard air lift capability, sea lift capability, and ground
based transportation systems. The system must be able to endure the forces encountered during
transportation, including vibration and gravitational shocks associated with each of these
transport methods. The specific weight and volume restrictions will be determined later, but the
system should support four different configurations.

   •   Man Portable – This configuration needs to be able to be torn down, stored, and
       reassembled. The components should fit in standard modular hardened enclosures.
       Weight restrictions will be in accordance with a four man lift. Cable connections
       between hardened enclosures should be kept to a minimum. This system could be
       utilized where a tactical vehicle configuration would not be needed, and the resources
       provided by a command center would not be necessary.

   •   Tactical Vehicle - The tactical vehicle configuration would be utilized in environments
       where physical infrastructure in the environment is at a minimum, and the operations
       center requires extremely high mobility.

   •   Mobile Command Center (non-tactical) – The non-tactical mobile command center
       configuration is targeted toward environments where basic infrastructure is in place and
       would support a major operation.

   •   Mobile Command Center (tactical) – The tactical mobile command center configuration
       is targeted toward environments where basic infrastructure is not in place, but major
       operation support is required.


        Which configuration would be required depends on the nature of the mission. Each
configuration should utilize similar components, differentiated only by transport mechanism and
scale of deployment. Detailed specifications will be left to the detailed design documents.




                                       CHAPTER FIVE
                                    THE ARCHITECTURE
       With the requirements identified we can now begin to work on a high level architecture
to support them. The concepts identified earlier will play a large role in the development of the
complete system.
                                    Building the Foundation
        The beginning of any architecture starts with a good foundation. As identified by the
requirements, it will be necessary to design the system for extremes in climate and mobility. In
order to achieve this, it will be necessary to build the system as a modular system. As shown in
proofs of concept and actual production designs, it is possible to build a datacenter in
containerized form. The design of the system will leverage this previous work. The foundation
will be discussed by configuration below.

   •   Man Portable – The primary concern with the man portable configuration is the
       individual component weight restriction. Any component will need to be sufficiently
       light to be lifted by a small team of personnel. Our concern here will be providing a
       modular enclosure that can protect the equipment from shocks, yet still provide easy
       access to components. Such enclosures are already in common use and will not be
       enumerated here. Power and HVAC considerations in this case are extremely limited as
       they are completely dependent on external sources, and very few cooling efficiencies will
       be achieved. It will also suffer from increased set-up times due to the extra cabling
       required to interconnect the enclosures. Because of these conditions, this configuration
       will be the least optimized and least desirable from an efficiency standpoint.

   •   Tactical Vehicle – For this configuration a standard high mobility tactical vehicle is
       modified to house a complete system. The standard HMMWV has an array of options
       available to increase and enclose the cargo space to accommodate this. This in effect
       makes the entire system mobile. By designing for a large contained system it is possible
       to build the system for a specified thermal envelope as determined by available power
       and cooling that could be integrated into the system or accompany the system on
       secondary vehicles. Once in the area, the setup time for this configuration will be sharply
       lower, as all that is required is for the vehicle to drive into the operational area and
       connect to the external network, power, and HVAC.

   •   Mobile Command Center (non-tactical) – The mobile command center (non-tactical)
       mission differs from that of the tactical vehicle in scope and required infrastructure. For
       areas that have sufficient resources, such as intact roads, it is possible to build a
       non-tactical version of the mobile command center in a common commercial truck. This
       provides the benefits of an enclosed system and lower setup time, but has the drawback
       of lower mobility in less developed operational areas.

   •   Mobile Command Center (tactical) – The mobile command center (tactical) needs to be
       able to provide the same level of service as its non-tactical counterpart, but must be able
       to be more easily transported to remote areas. Building the system into a standard
       shipping container fits this bill, as the transport systems in place around the world
       already accommodate containers, and numerous methods of local transport exist,
       including military transport systems. It also provides the same modular design benefits as
       its non-tactical cousin, but is more dependent on external transportation systems.
       (Levinson, 2006)
                                         Modular Hardware
        Selecting the hardware correctly should allow the system to be relatively independent of
the specified transport mechanism. As discussed in the technology review section, a
combination of standard blade servers and SAN components allows for space savings and high
flexibility. For extreme environments, SSD storage elements can replace standard magnetic disk
technology to reduce weight, heat, power draw, and environmental sensitivity, though at
significantly elevated cost and somewhat decreased overall capacity. A single blade enclosure
correctly configured can supply all major components that used to be implemented as separate
discrete structures. The blade enclosure will serve as one basic unit of our modular system and
be considered the processing module. The second modular unit of the system will consist of the
storage elements and be considered the storage module. A SAN system provides the most
flexible storage platform available. SAN storage will differ slightly: instead of consisting of a
single module, it will grow from a single head module with additional storage expansion
modules. Each of these modules can be specified to a maximum thermal, electrical, and physical
envelope for the environmental systems engineers. The processing module will be rated by the
total CPU cycles and RAM of the contained servers. The modular hardware will also utilize
boot from SAN technology. This feature allows the system to recover quickly from individual
component failures. When boot from SAN is used, a physical server can be replaced quickly and
easily in a blade server environment by just swapping with a replacement server. The identity of
the server is stored on the SAN and unique identification items such as MAC and WWN are
maintained in the enclosure. The actual physical server has no identity itself but serves just as a
commodity part. The SAN also has the capability of supporting backup to disk and snapshot
technology. Fibre Channel encryption technology will be utilized to provide data at rest
encryption to secure the data on the SAN. A third modular unit will be necessary, consisting of
the external connectivity systems to support the local area network and to connect to any reach
back services necessary. (Marshall, 2006)
                                    Virtualization Framework
        What truly allows the design to be practical is the application of a virtual framework.
With a virtual framework applied we no longer have to pre-specify a number of servers to satisfy
a particular problem set. Instead we are able to pool all resources available and allocate as
needed. This is a dramatic advantage over the traditional method of system design as there is a
significant amount of wasted resource in those models. With virtualization we are able to
provision resources on demand. The design will have to account for the overhead scaling required
for management as well. (Zhikui W., Xiaoyun Z., Pradeep P. & Singhal S., 2007)
        The virtualization framework also provides an additional significant advantage over the
traditional model. The encapsulation feature allows a server’s identity to be reduced to that of a
set of files. There is no longer a permanent physical connection between a service and a server.
Instead, the virtualization framework provides an abstraction layer that insulates the service from
the hardware. Reducing
services to data files allows for additional advantages in disaster recovery methods and models.
The shared storage nature of the SAN allows for migration of the virtual machines from one
hardware system to another with little to no down time. It also allows for the replication of the
data to remote sites for COOP considerations. Additional advantages are found when migration
of the mission is necessary as well. If an incoming unit is supplied with the same modular
system, then migration of the mission data is as simple as replicating the files to the new unit’s
storage modules. Historical retention of the data is possible by backing up the system to an
archive unit upon redeployment. The VMmark benchmark will help to quantify how much
virtualized performance capacity is available. (Makhija, V., Herndon, B., Smith, P., Roderick,
L., Zamost, E. & Anderson, J., 2006)
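The pool-and-allocate model, including the management overhead the design must account for, can be sketched as follows. The capacities, the 10% overhead figure, and the virtual server names are assumptions for illustration only.

```python
class ResourcePool:
    """Illustrative on-demand allocator: all CPU and RAM in the modular
    system form one pool from which virtual servers draw as missions arise."""

    def __init__(self, cpu_ghz, ram_gb, mgmt_overhead=0.10):
        # Reserve a fraction for the hypervisor/management layer (assumed 10%).
        self.cpu = cpu_ghz * (1 - mgmt_overhead)
        self.ram = ram_gb * (1 - mgmt_overhead)
        self.vms = {}

    def provision(self, name, cpu_ghz, ram_gb):
        if cpu_ghz > self.cpu or ram_gb > self.ram:
            return False                  # mission demand exceeds the pool
        self.cpu -= cpu_ghz
        self.ram -= ram_gb
        self.vms[name] = (cpu_ghz, ram_gb)
        return True

    def release(self, name):
        cpu, ram = self.vms.pop(name)     # mission ends, capacity returns
        self.cpu += cpu
        self.ram += ram

pool = ResourcePool(cpu_ghz=64.0, ram_gb=256.0)
print(pool.provision("collab-server", 8.0, 32.0))   # True
print(pool.provision("gis-server", 16.0, 64.0))     # True
pool.release("collab-server")
print(round(pool.cpu, 1))
```

The contrast with the traditional model is the point: no server is pre-dedicated to a mission, so idle capacity from one finished task is immediately available to the next.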
                                         Support Services
       With external links sometimes being restricted to highly latent and slow satellite uplinks,
WAN/IP acceleration will be necessary in this environment to help compensate for the
“bandwidth challenged” nature of these missions. This technology has already been tested and
proof of concepts done to show its effectiveness. Data de-duplication technology will also be
necessary to increase the available utilization percentage of the limited storage capacity.


                                     Application Deployment
        The requirements statement indicates that the nature of any mission is not always known
in advance. Since it is impossible to predict, it is best to plan for as many possibilities as we can.
The virtualization framework allows for the deployment of most x86 or x64 platform based
applications, including the WAN/IP acceleration and de-duplication technologies, for which
virtual versions already exist. Where possible, any application should be tested in a safe
environment and then replicated forward to an operational command post. Where this is not
practical, templates of standard virtual server builds can be maintained for quick deployment of
base services. Virtual framework compatibility should be a prerequisite of any application
targeted for this environment. As virtualization takes hold in the industry at large, however,
virtualization incompatibility is becoming less of a problem. Should a required application prove
not to support virtualization, it becomes necessary to dedicate a full blade server to that
application. This should be done only as a last resort, as it reduces the available resource pool
for the entire system.


                                        Disaster Recovery
        The modular and virtual nature of the system goes a long way toward insulating it from
hardware failures. Utilizing replication technology, however, it is also possible to maintain a
separate hot spare site should one be required. This can be a secure rear location or a secondary
mobile element. In addition, backup to disk and snapshots will be utilized to ensure that any
recovery point objectives and recovery time objectives can be met.
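A minimal sketch of the recovery point check: the worst-case data loss is the largest gap between consecutive protection events (snapshots or backup-to-disk runs). The schedule times below are assumed examples, not a specified schedule.

```python
def max_gap_hours(protection_times):
    """Largest gap between consecutive protection events, in hours."""
    times = sorted(protection_times)
    return max(later - earlier for earlier, later in zip(times, times[1:]))

# Snapshots at 00:00, 06:00, 12:00, 18:00 plus a 02:00 backup-to-disk run,
# wrapping to the next day's first snapshot at hour 24.
events = [0, 2, 6, 12, 18, 24]
gap = max_gap_hours(events)
print(f"worst-case data loss: {gap} h")  # compare against the stated RPO
```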


                                 The Big Picture
All of the above concepts are integrated and illustrated by the figures depicted below.




                                         CHAPTER SIX
                                         CONCLUSION
        Many times when we initially look at a problem, we think it is completely unique.
Applying a fairly standard set of modern datacenter architecture principles to a fairly unique
problem set reveals that this is not always the case. Requirements for high mobility and climate
insensitivity can be addressed by utilizing technologies that were not originally designed to solve
these problems directly, but that can be adapted to serve in the roles necessary for success. The
humble shipping container and an all terrain tactical vehicle can be modified to serve in these
roles quite adequately, and doing so is far from a dramatic engineering effort; examples are
already in production and could be adapted to serve. The unknown quality of these missions can
be addressed by technology originally designed to increase efficiency in the datacenter through
virtualization and modularization. By applying standard datacenter design practices we are able
to architect a solution to this problem.


                                              References
Barkley, P. (2004). SAN-based data replication. Computer Technology Review. Retrieved
       November 24, 2008 from http://findarticles.com/p/articles/mi_m0BRZ/is_6_24/
       ai_n6145654
Connor, D. (2006). Deduplication: Stop repeating yourself. Network World. Retrieved
       November 28, 2008 from http://www.networkworld.com/news/2006/091806-storage
       -deduplication.html
Farley, M. (2004). Storage Networking Fundamentals: An Introduction to Storage Devices,
       Subsystems, Applications, Management, and File Systems (Vol. 1). Cisco Press.
Hoover, J. (2008). Microsoft plans roofless data centers. InformationWeek. Retrieved
       December 4, 2008 from http://www.informationweek.com/news/hardware/data_centers/
       showArticle.jhtml?articleID=212201783&pgno=1&queryText=&isPrev=
Levinson, M. (2006). The Box: How the Shipping Container Made the World Smaller and the
       World Economy Bigger. Princeton, New Jersey: Princeton University Press.
Long, J. (2006). Storage Networking Protocol Fundamentals (Vol. 2). Cisco Press.
Makhija, V., Herndon, B., Smith, P., Roderick, L., Zamost, E. & Anderson, J. (2006).
       VMmark: A scalable benchmark for virtualized systems. VMware. Retrieved November
       24, 2008 from http://www.vmware.com/pdf/vmmark_intro.pdf
Marshall, D. (2006). DataCore reveals top ten “lessons learned”. InfoWorld. Retrieved
       November 24, 2008 from http://weblog.infoworld.com/virtualization/archives/2006/06/
       datacore_reveal.html
McCormick, J. (2006). Blade servers: Cutting edge. No matter how you slice it, blades offer
       advantages over conventional servers. Government Computer News. Retrieved
       November 24, 2008 from http://www.gcn.com/print/25_11/40644-1.html#
Niccolai, J. (2008). Stanford center turns to Sun Blackbox for extra capacity. InfoWorld.
       Retrieved December 4, 2008 from http://www.infoworld.com/article/08/01/30/Stanford-
       center-turns-to-Sun-Blackbox_1.html
Oglesby, R., Herold, S. & Laverick, M. (2008). VMware Infrastructure 3: Advanced Technical
       Design Guide and Advanced Operations Guide. Mattoon, Illinois: United Graphics Inc.
Talbot, D. (2004). How technology failed in Iraq. MIT Technology Review. Retrieved
       November 25, 2008 from http://www.technologyreview.com/computing/13893/page1/
Zhikui, W., Xiaoyun, Z., Pradeep, P. & Singhal, S. (2007). Capacity and performance overhead
       in dynamic resource allocation to virtual containers. IEEE. Retrieved November 25,
       2008 from http://www.hpl.hp.com/techreports/2007/HPL-2007-67.pdf
