Intelligent Storage Processors
Abstract
This paper provides information about storage processors for the Microsoft® Windows® family of operating systems. It gives designers guidelines for building next-generation storage systems that deliver a level of performance unattainable with general-purpose processors and, more importantly, provide a hardware platform that can be managed through the storage services embedded in Microsoft operating systems.
Contents
Introduction
Overview
History of Network Processors
What’s Different About Storage?
A Closer Look at Virtualization
Variations and Combinations
Why Storage Processors?
Logical Targets
Embedded Initiators
Advanced Storage Functions
Summary of Benefits
Intelligent Storage Processor Architecture
FibreSlice™ and Windows NT Storage Architecture
Conclusion
Windows Hardware Engineering Conference

WinHEC Sponsors’ Disclaimer: The contents of this document have not been authored or confirmed by Microsoft or the WinHEC conference co-sponsors (hereinafter “WinHEC Sponsors”). Accordingly, the information contained in this document does not necessarily represent the views of the WinHEC Sponsors and the WinHEC Sponsors cannot make any representation concerning its accuracy. THE WinHEC SPONSORS MAKE NO WARRANTIES, EXPRESS OR IMPLIED, WITH RESPECT TO THIS INFORMATION.

Microsoft, Windows, Windows Server, and Windows NT are trademarks or registered trademarks of Microsoft Corporation in the United States and/or other countries. Other product and company names mentioned herein may be the trademarks of their respective owners.
WinHEC 2003 Microsoft Windows Hardware Engineering Conference
Introduction
This paper provides information about storage processors, a new class of silicon device dedicated to accelerating storage commands and providing management capabilities that are not possible with general-purpose processors. It compares and contrasts storage processors with network processors and shows that not all storage processors are equal. It also explores storage virtualization with respect to storage processors and explains why they are a key component in the design of RAID and virtualization frameworks. Finally, it describes how storage processor technology can be combined with Microsoft enterprise storage features to differentiate and add value to products designed with both.
Overview
Until recently, storage frameworks were designed from discrete components, such as off-the-shelf generic processors and firmware, to meet competitive standards of performance and manageability. With the advent of new drive designs, packaging, and management software, this traditional approach is being supplanted by a new class of storage processor that meets the demands of a network-centric storage model. Many new storage processor devices are now becoming available, but not all storage processors are created equal. Most of these emerging products take a traditional network-centric view of storage and tend to treat storage networks as symmetric networks. In reality, storage networks are asymmetric: hosts and disks are not peers, and their behavior is very different.

To meet the evolving demands of storage and the adoption of storage virtualization, a new class of Intelligent Storage Processor (ISP) has appeared as a fundamental building block. Unlike their cousins in Local Area Network switches and routers, these devices are optimized for the complexities of a Storage Area Network (SAN) environment. To achieve maximum benefit in a storage network, a storage processor must be able to do much more than look up destinations, forward frames, probe packets, manipulate data, and gather statistics.

Traditionally, storage controllers accept read and write commands sent to a logical volume, then handle the details of mapping those requests to physical storage devices. This RAID functionality is in essence the simplest form of virtualization. When we talk about storage virtualization today, we are extending the RAID and LUN management functionality found in storage controllers to operate within network environments. Storage virtualization is a powerful tool for exploiting the storage architectures of the future.
Storage or server assets can be added, moved, changed, and modified without affecting the entire system. Storage resources can then behave more like a true utility. When resources are represented logically, they can be pooled, provisioned, and deployed with greater flexibility. Mirroring, snapshots, and replication can be used to eliminate single points of failure.

This paper highlights the unique requirements of storage virtualization, explores some of the applications that can benefit from virtualization technology, and outlines how the Aristos Logic FibreSlice architecture can be leveraged by the Windows NT® storage architecture to provide enhanced functionality such as:
- Hardware acceleration of storage applications
- Real-time data collection
- Real-time eventing
- Exceptional performance
- Exceptional manageability
History of Network Processors
Network processors first appeared in high-end Local Area Network (LAN) switches, where their greater performance and lower cost made them the clear successor to dedicated microprocessors running custom firmware and to proprietary Application-Specific Integrated Circuits (ASICs). In fact, it was the ability of network processors to combine the speed and performance of ASICs with the programming flexibility of microprocessors that made them so successful.

Network processors were able to achieve this balance because the computational tasks required of switch ports in a LAN environment are well understood. LAN network processors usually have microengines dedicated to the core LAN switching functions: Classification, Lookup, Data Manipulation, Queue Management, Filter/Forward, and Control/Management. These basic operations are performed on each arriving packet, using pipelined architectures to achieve the desired performance.

As Figure 1 shows, network processors and a fast switching fabric are all that is necessary to construct a high-performance Layer 1/2/3/4 LAN switch. The network processors are programmed to provide hardware assists, and sometimes complete offloads, for functions such as IP Security (IPSec), Virtual Local Area Networks (VLAN), Differentiated Services (DiffServ), Multi-Protocol Label Switching (MPLS), and many others.
Figure 1: Typical Network Processor-Based LAN Switch Architecture
Figure 2 shows the inner structure of a LAN Network processor, including the streaming packet and memory interfaces.
Figure 2: Functional Blocks and Interfaces in a LAN Network Processor
As you can see from Figure 2, a LAN network processor is optimized for supporting the high-overhead LAN services associated with security, classification, filtering, and routing at wire-speed frame forwarding rates.
What’s Different About Storage?
It is tempting to imagine that all of the sophistication and advantages that have gone into LAN network processors can be directly applied to storage area networks. At first glance, it certainly seems possible. Both networks use PHYs running at roughly the same speeds, both use switches, both require destination lookup and frame forwarding, and both have requirements for packet probing, manipulation, and the gathering of statistical data. Many of these functions are in fact transferable, but a LAN network processor is simply not equipped to deal efficiently with the unique requirements of the storage data path. This requires something new: an Intelligent Storage Processor (ISP). An Intelligent Storage Processor relieves a generic CPU of processing performance-critical read/write commands, accelerates storage applications, and provides manageability features. The FibreSlice™ from Aristos Logic is the first Intelligent Storage Processor in the industry.

To appreciate how storage imposes new requirements on the functional blocks involved in data transfer, compare the component-level diagram of a typical networked storage architecture in Figure 3 with the LAN equivalent in Figure 1. In the LAN network processor, it was possible to identify classifier, lookup, and scheduler functions because a LAN switch performs these functions on every port. It is completely symmetrical, and from the perspective of the LAN switch, its role is to optimize the efficiency of the frame forwarding function between ports of equivalent peers.

In the storage architecture, ports support either servers (shown as Host Bus Adapters, or HBAs) or storage (shown as the combination of a storage controller and associated physical disk drives). The servers and storage are not peers. Their relationship is an asymmetrical one of initiators (the HBAs) and targets (the storage), whose behaviors and system roles vary widely.
Figure 3: Typical Networked Storage (SAN) Architecture
When projecting the requirements of a storage processor on this model, it becomes apparent that the features and optimizations necessary to provide benefits analogous to those of the LAN architectures will vary depending on (1) where in the data path the ISP is placed, and (2) what storage-specific roles the ISP will be called on to offload and assist.

In modern RAID architectures, for example, the storage controller is responsible for a high degree of storage-specific intelligence called command termination and redirection. Command termination means the storage controller acts as a virtual target, literally intercepting read and write commands as though it were a disk drive. The storage controller then creates one or more new commands based on the "real" commands it has terminated. When the storage controller redirects the new commands (sends them to a physical device), it is acting as a virtual initiator. Command termination is necessary to support virtualization functions where the performance, protection, and redundancy provided through striping and mirroring are desired.

Once a storage controller capable of RAID functionality terminates a command, it must then apply the appropriate RAID algorithm, based on the type of command and the RAID architecture in effect. For example, a write command to a RAID 1 (mirrored) LUN will result in the termination of the write command, followed by the redirection of two new write commands, each targeted at one of the volumes in the mirrored set. The original write command will then be acknowledged by the storage controller, but only after each physical drive in the mirrored set has acknowledged its "logical initiator" in the storage controller.

In addition to command termination and RAID functionality, the storage controller often provides enhanced performance through intelligent cache management. A cache is implemented in fast memory at the virtual volume level.
The storage controller then uses the virtual initiator capability to populate the cache with data that is statistically most likely to be requested by the real initiators. A well-designed caching algorithm can often place the required data from a physical device in the storage controller's cache memory before it is requested, eliminating the need to issue new commands to the slower physical devices. Writes can also be acknowledged faster if the write updates a block locked in cache. To protect against data corruption, cache memories on storage controllers must take precautions to
assure data persistence, even in the event of power loss to volatile memory. This could be accomplished through battery back-up, or through write-through to the physical device, with the cached copy marked as dirty until an acknowledgement has been received. It should be apparent from these examples that simply classifying and moving frames in and out of the storage controller at high speeds is not enough to duplicate these sophisticated functions.

Moving even closer to the physical drives, spindle management becomes the most important optimization function. Disk drive rotational latencies and seek times are so large relative to command processing overhead that there are opportunities for “predictive reads” on individual drives, and striping related data across multiple drives can multiply the sustainable I/O Operations Per Second (IOPS) by aggregating many physical drives into single logical units.

Initiators on the HBA side have quite different optimizations. Here, cache would be a liability, since servers do not typically back up their outstanding writes with redundant battery-based power. Instead, an initiator needs hardware-assisted Direct Memory Access (DMA) to quickly map outstanding I/O contexts. A “context” is a set of saved pointers needed to keep track of where in host memory inbound data and status must be placed. Without the pipelining made possible by this approach, host processors would be bogged down processing the constant interrupts and context switching associated with hundreds of outstanding storage I/Os.
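The RAID 1 write handling described earlier in this section (terminate the host's write, redirect one new write to each mirror member, and acknowledge the host only after every member has acknowledged) can be sketched in a few lines of Python. This is a minimal illustration only: the names `PhysicalDisk` and `Raid1Controller` are hypothetical and are not taken from FibreSlice or any real controller firmware, and a real controller would issue the redirected writes concurrently and handle failures.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalDisk:
    """Hypothetical stand-in for a physical drive: stores blocks by LBA."""
    name: str
    blocks: dict = field(default_factory=dict)

    def write(self, lba: int, data: bytes) -> bool:
        self.blocks[lba] = data
        return True  # acknowledgement returned to the virtual initiator


class Raid1Controller:
    """Hypothetical controller: acts as a virtual target toward the host
    and as a virtual initiator toward each mirror member."""

    def __init__(self, mirrors):
        self.mirrors = list(mirrors)

    def write(self, lba: int, data: bytes) -> bool:
        # Command termination: the host's write command ends here.
        # Redirection: one new write is issued per mirror member.
        acks = [disk.write(lba, data) for disk in self.mirrors]
        # The host is acknowledged only after every member has acknowledged.
        return all(acks)


disk_a, disk_b = PhysicalDisk("a"), PhysicalDisk("b")
controller = Raid1Controller([disk_a, disk_b])
ok = controller.write(7, b"payload")
```

After the call, both mirror members hold the same block, and `ok` reflects the combined acknowledgement that the controller would return to the host.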
A Closer Look at Virtualization
Virtualization is nothing more than a storage service that presents a logical view of storage instead of a device-bound physical view. To understand why this concept is important, we reference the Microsoft Virtual Disk Services Model (Figure 4). The core of any virtualization service is the block aggregation function. This function can be thought of as "blocks in, blocks out." Physical devices such as disk drives and storage controllers provide the physical persistent storage (blocks in), and a mapping function creates a new linear array of blocks, which may be accessed in the same manner, using the same protocol, as a physical storage device. Since the new "device" is actually only a termination point and redirection service for storage access protocols, the term "Virtualization" has come into popular usage to describe the function.
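The "blocks in, blocks out" mapping can be illustrated with a small Python sketch. It assumes the simplest possible mapping function, plain concatenation of physical extents into one linear array of logical blocks; the class name `ConcatenatedVolume`, the device names, and the extent sizes are all hypothetical, and real block aggregation may use striping, mirroring, or other layouts instead.

```python
from bisect import bisect_right


class ConcatenatedVolume:
    """Hypothetical block aggregation: presents several physical extents
    as one linear array of logical blocks."""

    def __init__(self, extents):
        # extents: list of (device_name, block_count) pairs, the "blocks in".
        self.devices = [name for name, _ in extents]
        self.starts = []  # first logical block served by each extent
        total = 0
        for _, count in extents:
            self.starts.append(total)
            total += count
        self.size = total  # length of the logical array, the "blocks out"

    def resolve(self, logical_block):
        """Map a logical block number to (device_name, physical_block)."""
        if not 0 <= logical_block < self.size:
            raise IndexError("logical block out of range")
        index = bisect_right(self.starts, logical_block) - 1
        return self.devices[index], logical_block - self.starts[index]


volume = ConcatenatedVolume([("disk0", 100), ("disk1", 50)])
first = volume.resolve(0)
boundary = volume.resolve(100)
last = volume.resolve(149)
```

An initiator addressing `volume` sees 150 contiguous blocks and has no knowledge that block 100 is actually the first block of a second device.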
Figure 4: The Microsoft Virtual Disk Services Model
Variations and Combinations
In the shared storage model, virtualization can be performed in the host (by a software-based volume manager or RAID-enabled HBA), in the network, or in the device. A storage controller is an example of device-based virtualization. Interesting variations on these generic approaches are possible. For example, an embeddable software-based volume manager can be hosted in a fabric. Similarly, a fabric can embed a storage controller or server blades. Host bus adapters may perform some functions of a host-based volume manager application (e.g. LUN masking).

No matter where virtualization is performed, an Intelligent Storage Processor-based solution must be designed to support:

The fundamental storage services
- Mapping
- Pooling

The performance and availability services
- Read and write caching
- Command termination
- Redirection

Additionally, support should include intelligent assists for the more advanced storage data services applications:
- Snapshot
- Migration
- Remote copy
These services and features are the fundamental elements of all sophisticated storage implementations, from RAID to virtualization in a SAN.
Why Storage Processors?
Recall that a LAN network processor created value by identifying and optimizing the LAN functions that sat as potential bottlenecks in the network-critical data path. The storage-critical function of virtualization, however, is not technically in the data path. Through command termination and redirection, it terminates one path and then initiates one or more different paths. This requires a completely different set of dedicated functions, and a different conceptual framework for the designer.

At this point, we have contrasted the functional blocks in a LAN network processor with the different optimization requirements present in the storage data path. We have also shown that virtualization is the layer (or layers) providing a logical view of storage, and we have noted that the data processing requirements of a networked storage architecture are highly dependent on where in the system the processing element is placed. Those requirements are highly differentiated for efficient initiators, network elements, and target controllers.

In the last section, we identified the building blocks that are distributed or shared among all storage processing functions, mapped those functions to RAID controllers, and showed that RAID is simply a specialized form of virtualization. It is precisely these functions that should be optimized in the design of an Intelligent Storage Processor.

A generic Intelligent Storage Processor must be flexible enough to handle multiple storage applications, but specialized enough to provide the functional "building blocks" necessary to offload critical storage applications and services. Figure 5 shows a generalized ISP architecture, which may be compared and contrasted with the LAN network processor architecture in Figure 1. Reflecting the asymmetrical nature of storage networks, each port on an ISP may be routed (or in some designs, hardwired) to either logical targets or an embedded initiator.
Figure 5: Intelligent Storage Processor (ISP) Architecture
Logical Targets
Servers access logical volumes through a host bus adapter, which, in the language of storage protocol, is an Initiator. Initiators are responsible for "initiating" I/O operations such as read and write, and maintaining current state information until the operation is successfully completed. The burden of error recovery and retrying commands is also an initiator function. At the ISP, an initiator will always be routed to a port processor running Target emulation code. Targets are devices that respond to initiator requests with either data or status. Targets operate asynchronously, and are not required to maintain current state information for the duration of the I/O operation. Target Emulation has the effect of making the port respond to requests for data and status exactly as a physical storage device would. The emulated logical volume (often referred to by its Logical Unit Number, or LUN) may map directly to a physical device, or may aggregate multiple physical devices. Logical volumes may also be supported by cache memory, RAID configurations, and other functions designed to enhance the performance or reliability of the logical volume. The logical-to-physical mapping structures and services available to the resulting logical volume define and bound the benefits available from virtualization.
Embedded Initiators
Another core function in the ISP is the Embedded Initiator. This functional block has the same protocol requirements as a host bus adapter: it must initiate I/O operations to external targets and track them to completion. The embedded initiator supports the "back end" of the virtualization function, redirecting write operations or requesting the physical blocks from external devices once the logical-to-physical mapping has been resolved. Not all initiators are created equal. The process of "reverse mapping" the physical-to-logical data flow may be supported by specialized hardware engines capable of fast context switching and DMA, or by slower software parsing in buffer memory. Initiators may also be responsible for discovery of new block storage devices, recognition of device moves, failover path management, error detection, and command retries.
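The requirement that an initiator track each I/O operation to completion can be sketched as a table of outstanding contexts, each recording where returned data belongs. The Python below is an illustration under stated assumptions only: `EmbeddedInitiator`, its methods, and the tag scheme are hypothetical, and the sketch models in software what the paper describes as a job for hardware engines with fast context switching and DMA.

```python
class EmbeddedInitiator:
    """Hypothetical initiator that tracks outstanding reads by tag."""

    def __init__(self):
        self._next_tag = 0
        self._contexts = {}  # tag -> saved context for an outstanding I/O

    def issue_read(self, target, lba, buffer, offset):
        """Record a context for a new read and return its tag."""
        tag = self._next_tag
        self._next_tag += 1
        self._contexts[tag] = {
            "target": target,
            "lba": lba,
            "buffer": buffer,  # destination memory for the returned data
            "offset": offset,  # where in that memory the data is placed
        }
        return tag

    def complete(self, tag, data):
        """Place returned data using the saved context, then retire it."""
        ctx = self._contexts.pop(tag)
        start = ctx["offset"]
        ctx["buffer"][start:start + len(data)] = data
        return ctx

    def outstanding(self):
        return len(self._contexts)


host_buffer = bytearray(8)
initiator = EmbeddedInitiator()
t0 = initiator.issue_read("disk0", lba=10, buffer=host_buffer, offset=0)
t1 = initiator.issue_read("disk1", lba=10, buffer=host_buffer, offset=4)
initiator.complete(t1, b"WXYZ")  # completions may arrive out of order
initiator.complete(t0, b"ABCD")
```

Because each completion carries its tag, the data lands in the correct place regardless of the order in which the targets respond.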
Advanced Storage Functions
Just as a LAN network processor may provide benefits to the designer by optimizing advanced functions such as IPSec and packet classification, ISPs can provide
similar functional blocks geared towards the unique requirements of storage networking. The storage equivalents of the advanced functions found in LAN network processors include caching, snapshot, migration, remote copy, storage pooling through RAID mapping, and RAID redundancy. ISPs like FibreSlice can even support cluster management with designed-in support for bound, redundant controllers. Together, these features take the guesswork out of designing advanced storage products.

For example, one of the trickiest disciplines of storage controller design is cache management. Well-designed and properly optimized cache controllers can significantly increase storage performance, while a poorly designed caching algorithm may actually degrade performance.

With zero tolerance for data loss increasingly becoming a standard requirement, and with the availability of low-cost disk drives, RAID protection of logical volumes is no longer a system luxury. With RAID quickly becoming a check-off item, the system designer can accelerate time-to-market and improve system performance by selecting an ISP that includes flexible RAID functionality. Things to consider include hardware support for the XOR function, the RAID levels supported, and rebuild assists.

ISPs may also include advanced services for management and monitoring of storage resources. With storage provisioning moving towards a policy-driven model, the ability to gather statistical and real-time storage metrics from physical devices, logical volumes, and ports will become an important consideration.
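The reason XOR support and rebuild assists matter can be seen in a short worked example. The Python sketch below assumes a parity-based RAID level in which one parity block is the byte-wise XOR of the data blocks in a stripe (the paper does not tie the XOR function to a specific RAID level); the function name `xor_blocks` and the sample data are hypothetical. It shows in software the computation that an ISP would perform in hardware.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(
        reduce(lambda a, b: a ^ b, column) for column in zip(*blocks)
    )


# Three data blocks in one stripe (sample values chosen for illustration).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Simulate losing the second block, then rebuild it from the surviving
# data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

XOR-ing the survivors with the parity cancels every block except the missing one, so `rebuilt` equals the lost block. A rebuild repeats this for every stripe on the failed drive, which is why a hardware assist is worth considering.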
Summary of Benefits
By choosing to leverage an Intelligent Storage Processor in your network storage product family, you get optimization of sophisticated storage functions without the need for specialized in-house expertise. Designers can focus on value and differentiators, while achieving lower system cost, tighter integration, and higher performance than discrete designs.
Intelligent Storage Processor Architecture
Aristos Logic’s proprietary Intelligent Storage Processor technology is a scalable fabric of programmable processing elements and memory data structure controllers that allows flexible configuration and re-purposing of a single architecture for multiple SAN product implementations. Figure 6 shows the storage data path through FibreSlice, from fabric-attached hosts to external storage, with all system interfaces. By comparing Figure 6 to the LAN network processor in Figure 2, you can see that FibreSlice is architected to provide dedicated support for the functions most critical to high-performance logical volume virtualization, while still providing the external interfaces necessary to support a wide variety of system designs.
Figure 6: Functional Blocks and Interfaces in FibreSlice
FibreSlice™ and Windows NT Storage Architecture
Aristos Logic is working with the enterprise storage features in Windows Server™ 2003 to ensure that all the hardware features of FibreSlice are easily leveraged by Windows storage applications. The range of work being done includes:
- Leveraging the Storport driver model to ensure that software can drive I/O at maximum throughput and use all the hardware acceleration features of the FibreSlice architecture
- Leveraging the extensibility provided by Windows Management Instrumentation (WMI). WMI allows an IHV to differentiate its product by surfacing additional instrumentation, including data and event classes. These dynamically created classes can be discovered and used by WMI-aware systems management applications
- Leveraging the Volume Shadow Copy Service (VSS) to give storage and storage management applications platform-integrated access to the snapshot features of the FibreSlice architecture
- Leveraging the Virtual Disk Service (VDS) to give storage and storage management applications platform-integrated access to storage in an application-meaningful way (define a RAID 5 volume of 10 GB; allocate a RAID 1 volume of 20 GB; and so on)
- Leveraging the Multipath I/O development kit to use the multipath capabilities of FibreSlice and provide
Conclusion
This paper explored the differences between network processors and storage processors and detailed some important concepts of storage virtualization. It then explored how the Microsoft VDS and VSS architectures fit into this world of storage processors and how Aristos Logic is using the Windows NT enterprise storage features to differentiate its FibreSlice device.