IDC White Paper Active Archiving for Midsize Firms Addressing the Need for Preserving Business Information 
IDC VENDOR SPOTLIGHT

Active Archiving for Midsize Firms: Addressing the Need for Preserving Business Information

September 2007

Adapted from Worldwide Email Archiving Applications 2007-2011 Forecast and 2006 Vendor Shares: Storage Optimization, Mailbox Management, and Records Retention for eDiscovery and Compliance Drive Investments by Vivian Tero and Laura DuBois, IDC #206729

Sponsored by ProStor Systems

Executive Summary

Active archiving has emerged in recent years to become a business and information technology (IT) imperative. In the past, a simple backup to tape cartridge may have sufficed for making a copy of files and records to satisfy regulatory retention, preservation, and recovery requirements, but today’s organization faces far more complex needs. Information-intensive corporate environments require content-based search and fast random access while economically supporting data growth and disaster recovery requirements.

Digital information takes many forms (text, image, video, audio, mixed media) and is generated by numerous applications, each with its own file format. Much of this information is an information asset, core to everyday business operations, especially in business and financial intelligence, but if it is not properly managed, it could present a risk for a corporation.

Today’s business has high-level requirements for active data and information archiving and retrieval. Knowledge workers need easy, relevant access to strategic information in documents. Finance and accounting people need access to financial records for accounting and audit purposes. IT needs bulletproof storage that’s easy to work with, low cost, and optimized for information-rich documents created by a wide range of specific applications, from database to email.
And all companies want a solution that lowers the total cost of ownership (TCO) via lower-cost hard drives and advances in information retention technology that removes archived data from expensive primary storage, reclaims storage space, and reduces the amount of data that needs to be backed up.

In many cases, however, the importance of an information asset is unclear from just its filename. Backups to sequential tape cartridges are woefully inadequate and unreliable for today’s complex business and technological requirements. Backup tapes may be acceptable for making copies in case of a failure but are insufficient for most retrieval needs. Active archiving addresses this problem by delivering a fast, random-access (nonsequential) hard disk drive solution that makes disk an economically viable approach for archiving. The issue with today’s disk-based archiving solutions is that they still need to be backed up to some type of removable media for long-term preservation of data.

While disk-based archiving solutions have been available for the enterprise, the needs of the midmarket have largely gone unmet. IDC believes the time for an archival storage approach to address the midmarket is long overdue. Active archival solutions ensure the retention, preservation, and potential reuse of information and are poised to replace much of what has conventionally been thought of as "backup," which is for data protection and recovery in the event of a failure. ProStor Systems, using a random-access hard disk drive housed in a removable cartridge and its advanced RDX intelligent technologies, has the potential to become a key player in this market.

©2007 IDC

Active Archiving for Midsize Businesses

Active archiving is nothing new at the enterprise level. Large corporations have successfully used both tape backup cartridges and fast, reliable, nonsequential archival storage technologies for a number of years.
Yet all businesses (large, medium-sized, and small) want to increase archival storage space for email, file systems, databases, and production applications; have fast access to archived information assets; and mitigate risks to both business processes and information storage. In providing this level of service, IT people seek a cost-effective archiving solution that eliminates needless backups, reclaims expensive primary storage, and supports management of information assets.

IDC defines active archiving as a process that involves policies and technologies to support how a firm wishes to manage its information and policies. Active archiving solutions consist of the following components:

• Formalized policies and processes around the management, storage, access, search, retrieval, and movement of fixed or static content.

• Software to capture the management and disposition policies around fixed content from the application layer and translate them into archival retention and disposition policies. This software provides an automated and efficient way of storing, indexing, and retrieving fixed content, including automatically archiving or moving unchanging or fixed data out of a primary system and into a secondary system, as well as moving fixed content from a less accessible secondary system back to a nearline or primary system. This bidirectional data movement is intended to support corporate retention, storage optimization, data preservation, or intended business reuse objectives.

• Storage hardware systems that support the back-end infrastructure requirements around capacity, availability, search and performance, and accessibility, all within the context of the organization’s budget and operational constraints.

Active archiving solutions often maintain links to the archived data so that end users can still access it via normal interfaces and paths and instantiate business policies around data migration, copying, retention, expiration, deletion, and shredding.
Actively archived data is accessible without a restoration process, and the data is indexed, either at the full-content or file-attribute level, so that it can be easily searched in response to audits, discovery, and business use.

In the past few years, active archiving using removable hard disk media has become economically and technologically practical for small and midsize enterprises (SMEs) for the following reasons:

• TCO has been significantly reduced, due in part to the lower cost of hard drives and significant advances in information retention technology.

• With an external disk drive connected to the computer or server, backup and archiving are simple, and capacity scales by inserting one hard drive cartridge after another. Cartridges are reusable for thousands of cycles. Upgrades or expansions are equally scalable and can be budgeted for investment protection.

• The cartridge form factor makes archiving easy and practical for both users and IT generalists. Archiving can be as simple as a drag-and-drop function. In addition, there are many available archiving applications that can take advantage of intelligent active archiving with removable disk cartridges.

• Archiving technology supports complete application integration, both custom and commercial, using standard physical interfaces such as Ethernet and standard network protocols such as CIFS and NFS.

• Active archiving satisfies business records, financial reporting, compliance, and legal and regulatory requirements for access to specific documents within specific time periods.

• 2.5in. SATA hard disk drives in ruggedized cartridges provide fast, tiered storage, with the capability to archive over 100GB of data in less than an hour. As disk drive storage capacity increases, newer cartridges remain forward- and backward-compatible.

• Removable cartridge form factors make it simple to achieve offsite archival storage and backup protection.
Currently, cartridges are available in five capacities, from 40GB to 300GB. In 2008, capacity will increase to at least 500GB. All cartridges remain fully forward- and backward-compatible.

Changing Business Landscape Drives Need for Active Archiving

In the drive to provide high-quality service for the business while constantly striving for lower operating costs, IT departments have a vested interest in faster, more reliable, and less expensive archival technologies. However, in many midsize businesses, there may not be an IT organization as such, and the businesspeople must decide on the best archival technology and, in most cases, become the users/operators as well.

To their credit, midsize companies understand that information is the lifeblood of their businesses. Knowing what information assets the archival system is designed to protect and how it is designed to protect them helps simplify the decision-making process. Both midsize and smaller businesses should consider the following in their information archiving:

• Regulatory issues such as Sarbanes-Oxley, Gramm-Leach-Bliley, and HIPAA and business compliance issues such as transaction records, accounting ledgers, client and partner communications (e.g., in email or documents), security, and authentication files.

• Legal records, business forensics and evidentiary documentation, electronic discovery, and proof of chain of custody.

• Archival optimization. Removable hard disk archiving can be optimized to any business or organization, regardless of size and future growth patterns. Cartridges of varying capacities can be purchased whenever needed and are completely portable for offsite storage or disaster recovery protection. Archiving from primary storage to a high-performance active archive is fast and eliminates unnecessary steps in the process.

• Business optimization. Both small and large businesses are beginning to recognize the need to archive different types of information.
Varying types of information generally require different retention and disposition policies to meet business objectives or regulatory compliance. An intelligent active archive appliance should have the ability to manage many different types of archives simultaneously.

Considering ProStor Systems

ProStor Systems Inc. was launched in Boulder, Colorado, in 2004 to develop, manufacture, and market the first removable hard disk drive technology for active archiving and backup. ProStor employs 50 people. Its investors include Boulder Ventures Ltd., New Enterprise Associates (NEA), Silicon Valley Bank, Sutter Hill Ventures, and Western Technology Investment. Total funding to date is $26.4 million.

In late 2005, ProStor introduced its patented RDX disk cartridge technology, a desktop docking unit that uses removable hard disk drive cartridges for high-speed, reliable archiving and backup. Similar in size to a tape cartridge, the RDX cartridge contains a ruggedized 2.5in. SATA hard disk drive in 40GB, 80GB, 120GB, 160GB, and 300GB capacities. The RDX cartridge stores data nonsequentially, compressing data at a 2:1 ratio, at a native data transfer rate of up to 45MBps.

As industry storage capacities grow, ProStor plans to introduce higher-capacity cartridges. All current cartridges are, and future cartridges will be, forward- and backward-compatible. The minimum expected service life for cartridges is 5,000 load/unload cycles and for docks is 10,000 load/unload cycles. Recently completed third-party testing has proven an archive shelf life of 30 years for the RDX media, equaling the most optimistic expectations for tape as an archival medium.

ProStor Systems has introduced the InfiniVault, an enterprise-class solution for midmarket businesses that wish to optimize their IT infrastructure and ensure regulatory compliance by deploying an active archiving solution.
The InfiniVault is an intelligent archive appliance that combines regulatory intelligence on the business side with storage intelligence on the technology side.

Regulatory intelligence software provides active archiving for any archive or HSM application, such as those used for email archiving or unstructured data, that uses the standard CIFS or NFS network protocols. Archived files may be accessed over the network whenever needed. The active archive can be set for any length of time. Unlike with other products, the file remains visible in the active archive even after the data has been removed from the active archive and the RDX cartridges have been moved to a safe offsite location. This means it is always easy to retrieve information from an InfiniVault archive.

Storage intelligence software creates independent archives on a single system for common business documents from applications such as Microsoft Office as well as for business records, human resources records, databases, and email messages. The InfiniVault installs as a network mount point or network drive.

The InfiniVault system consists of the following components:

• A system controller with policy-based archive and compliance software

• An intelligent active archive with a standard network interface

• RDX bays and cartridges in removable disk arrays (RDAs) providing for nearline storage with removability for data vaulting

• Browser-based management

The InfiniVault systems all feature some amount of permanent RAID-protected storage for active archiving, providing immediate access to stored information. The systems are also connected to multiple RDX cartridges in RDAs that hold up to 10 removable RDX hard disk drive cartridges. Systems can be configured with multiple RDAs, allowing up to 100 RDX cartridges to be connected to the appliance.
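As a rough worked example of the configuration figures cited in this paper, the sketch below estimates the cartridge-attached capacity of a fully populated system and the time needed to write a given amount of data. It assumes the largest current cartridge (300GB), the stated limits of 10 cartridges per RDA and 100 cartridges per appliance, the nominal 2:1 compression ratio, and the maximum native transfer rate of 45MBps. The function and constant names are illustrative only, and real-world compression and throughput will vary with the data.

```python
# Back-of-the-envelope sizing sketch. The figures come from the paper;
# the names below are hypothetical and not part of any ProStor product.
CARTRIDGES_PER_RDA = 10     # each RDA holds up to 10 RDX cartridges
MAX_CARTRIDGES = 100        # stated upper limit per appliance
CARTRIDGE_GB = 300          # largest cartridge capacity cited (2007)
COMPRESSION_RATIO = 2.0     # nominal 2:1; actual ratio depends on the data
NATIVE_RATE_MBPS = 45.0     # stated maximum native rate, in MB per second


def max_rdas():
    """Number of fully populated RDAs needed to reach the cartridge limit."""
    return MAX_CARTRIDGES // CARTRIDGES_PER_RDA


def native_capacity_gb(cartridges, cartridge_gb=CARTRIDGE_GB):
    """Uncompressed capacity of the given number of cartridges, in GB."""
    return cartridges * cartridge_gb


def compressed_capacity_gb(cartridges, ratio=COMPRESSION_RATIO):
    """Effective capacity assuming the nominal compression ratio holds."""
    return native_capacity_gb(cartridges) * ratio


def hours_to_write(data_gb, rate_mbps=NATIVE_RATE_MBPS):
    """Hours to write data_gb at a sustained rate (decimal units: 1GB = 1000MB)."""
    seconds = data_gb * 1000 / rate_mbps
    return seconds / 3600


if __name__ == "__main__":
    print("RDAs at the cartridge limit:", max_rdas())
    print("Native capacity (GB):", native_capacity_gb(MAX_CARTRIDGES))
    print("Nominal compressed capacity (GB):", compressed_capacity_gb(MAX_CARTRIDGES))
    print("Minutes to write 100GB at the native rate:",
          round(hours_to_write(100) * 60, 1))
```

Under these assumptions, 100 cartridges of 300GB amount to 30TB native, and writing 100GB at the maximum native rate takes roughly 37 minutes, which is consistent with the claim of archiving over 100GB in less than an hour.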
The choice of configurations allows the business to design the best system for its needs, from preservation-level active archiving and infinite capacity to information movement and protection-level offsite archiving using multiple RDX cartridges. Another important feature is that all data being archived to an InfiniVault is written to the active archive and to RDX cartridges simultaneously, meaning that the archived data is always protected.

The InfiniVault and removable RDX hard disk drive cartridge offering provides the most complete solution for archiving. It uses the computer’s native interface and appears on the network as a drive. Because it uses 2.5in. hard disk drives, it is faster than traditional archiving systems. Among its other technical attributes are the following:

• Data exported in standard interchange format for eDiscovery purposes

• As economical as tape, but with greater reliability and longer shelf life

• AES 256 encryption and key management for security

• Content-addressable security using a hash key

• Content index for each stored file and text-level search capabilities

• Retention management and chain of custody for legal issues

• Information audit trail for tracking actions and data locations

• Single-instance storage, which avoids storing duplicate files within a given archive

• Data compression to maximize capacity

• Secure delete using digital shredding for documents whose retention period has expired

• Immutability ensured by WORM cartridge format and WORM-protected active archive

Solution Selection Considerations

Active archiving is becoming an essential business process. Access to documents for business purposes, compliance, auditing, and legal matters is more important than ever, and the speed, accuracy, and reliability of access are often mission-critical requirements. Costs have dropped, and therefore, SMEs now can afford an efficient and effective archiving and backup system.
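For readers evaluating such systems, two of the technical attributes listed earlier, content-addressable storage using a hash key and file-level single-instance storage, can be illustrated with a minimal sketch. This is not ProStor's implementation: the class name, the in-memory dictionaries, and the choice of SHA-256 as the hash are all assumptions made for illustration.

```python
import hashlib


class SingleInstanceArchive:
    """Toy in-memory archive that keeps one copy per unique file content.

    Hypothetical illustration only. SHA-256 is an assumed hash function;
    the paper does not say which hash the product uses.
    """

    def __init__(self):
        self._blobs = {}   # content hash -> file bytes (one copy per hash)
        self._names = {}   # file name -> content hash

    def store(self, name, data):
        """Archive a file and return the hash key that addresses its content."""
        key = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self._blobs.setdefault(key, data)
        self._names[name] = key
        return key

    def retrieve(self, name):
        """Return the archived bytes for a file name."""
        return self._blobs[self._names[name]]

    def verify(self, name):
        """Recompute the hash to confirm the stored content is unchanged."""
        key = self._names[name]
        return hashlib.sha256(self._blobs[key]).hexdigest() == key

    @property
    def stored_copies(self):
        """Number of distinct content blobs actually held."""
        return len(self._blobs)


if __name__ == "__main__":
    archive = SingleInstanceArchive()
    archive.store("q3_report.doc", b"quarterly results")
    archive.store("q3_report_copy.doc", b"quarterly results")  # duplicate content
    print("Distinct copies stored:", archive.stored_copies)
    print("Integrity check passed:", archive.verify("q3_report.doc"))
```

Because the key is derived from the content itself, the same mechanism supports both deduplication and a simple integrity check: any change to the stored bytes would produce a different hash.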
When choosing a solution, SMEs should list all the content types they need to archive and check them against what the active archive system supports. Then they need to answer the following questions:

• Is the active archive system architecture easily integrated into existing IT?

• Will the active archive system architecture scale as the business grows?

• Does the active archive solution provide intelligent support for business policies?

• Does the technology ensure long and reliable data life?

Challenges and Opportunities for ProStor

Challenges for ProStor, as well as for other suppliers offering active archiving solutions, include the following key hurdles, which a combination of technology, processes, and policies can address:

• Many organizations are still confused over the differences between archive and backup. Backup serves as a process to provide for restore and recovery in the event that the primary system or data is corrupted. Backups are typically done periodically, and the media used are rotated, so older backups are destroyed. Conversely, archive is the explicit function of preserving and ensuring retention of specified records for the purpose of regulatory compliance, discovery, and general business use. In addition, the process of archiving data removes it from expensive primary storage, reclaiming storage space and reducing the amount of data that needs to be backed up. The solutions to archive and discover information provide for policies for retention, preservation, and disposition and allow for content-based searching. This technology is different from that used for traditional backup, which provides recovery points from which to restore in the event of a failure or disaster.

• Corporatewide policies for how a firm manages its information, the retention periods it sets, the manner in which it archives various files and records, and the expectation in terms of search, discovery, and retrieval must be established.
These policies must then be documented and instantiated via technology. The challenge many firms face is bringing together the key business and technology stakeholders who can establish the policies. These stakeholders include records, legal, compliance, risk, and IT professionals. Policies are a necessary first step to archiving, but they are often difficult to establish. Once established, they must be consistently enacted, monitored, and verified to mitigate potential legal or regulatory risk.

• Data may still outlive the technology. Many organizations retain critical business or financial records indefinitely or permanently. However, the application that was used to create the data may not be around to support recall of the data if it is required 100 years from now. Moreover, the media on which the data lives 100 years from now may no longer be interoperable with the application on which the data was originally written. The archive industry is still in a state of standards development, and standards are needed to ensure adequate levels of interoperability between the technology current at the time of retrieval and the technology used to create the data 100 years earlier.

Conclusion

Information security threats and attacks, as well as business environment challenges, have placed ever-increasing responsibilities on companies of all sizes. SMEs, unaccustomed to the rigorous information preservation and protection strategies employed by large enterprises, are especially vulnerable but nonetheless accountable. Fortunately, active archiving and backup solutions as comprehensive and sophisticated as those deployed at the enterprise level are now available at reasonable cost to midmarket businesses.

The range of archival and backup solutions spans tape cartridges, DVD-ROMs, auxiliary hard drives, and removable hard disk drive cartridges.
IDC sees significant advantage in the removable hard disk drive cartridge, especially when it is coupled with the intelligent file management/archiving and compliance software value-add.

ProStor Systems has created additional value with its rollout of the InfiniVault, which gives midsize businesses the performance of a RAID-protected active archive coupled with one or more RDAs, each with 10-cartridge capacity, for long-term and vaulted archiving. Predicated on ProStor Systems’ claim of an ROI within one year, additional savings from reclaimed hard disk space, which reduces the need to purchase more storage, and ease of administration, IDC believes the InfiniVault and RDX hard disk drive cartridge active archiving solution is worth serious consideration.

ABOUT THIS PUBLICATION

This publication was produced by IDC Go-to-Market Services. The opinion, analysis, and research results presented herein are drawn from more detailed research and analysis independently conducted and published by IDC, unless specific vendor sponsorship is noted. IDC Go-to-Market Services makes IDC content available in a wide range of formats for distribution by various companies. A license to distribute IDC content does not imply endorsement of or opinion about the licensee.

COPYRIGHT AND RESTRICTIONS

Any IDC information or reference to IDC that is to be used in advertising, press releases, or promotional materials requires prior written approval from IDC. For permission requests, contact the GMS information line at 508-988-7610 or gms@idc.com. Translation and/or localization of this document requires an additional license from IDC.

For more information on IDC, visit www.idc.com. For more information on IDC GMS, visit www.idc.com/gms.

Global Headquarters: 5 Speen Street, Framingham, MA 01701 USA
P. 508.872.8200  F. 508.935.4015  www.idc.com