"At the Core"
This article
➢ examines electronic discovery: its relationship to records management and its future RIM implications
➢ discusses how e-mail, storage, backup, software, and operating systems are evolving
➢ offers predictions for electronic discovery in 2010

34 The Information Management Journal • November/December 2003

Electronic Discovery in 2010

The past 10 years have proved that the escalating costs of data collection and review in discovery, as well as the complexity of the systems themselves, demand a major realignment of how business data is maintained

Deborah H. Juhnke

The year is 2010. Margaret Techway, a highly placed, first-generation, holographic memory engineer, has recently left her company, Innovations Inc., to join market-newcomer 3-D Strategies. Upon her departure, the “data-freeze” provision of Innovations’ e-risk management policy was implemented automatically. A remotely performed, quick forensic review of her primary workstation uncovers suspicious activity during the previous two weeks, which gives Innovations cause to file a lawsuit against Techway and 3-D Strategies for trade secret theft.

The challenges of proving the case, however, are just beginning. Blogs, biometric keys, and blades are only a few of the technological hurdles attorneys will face in developing the case. Because instant messaging (IM) has replaced e-mail as the preferred form of business communication but has not been consistently monitored or saved at Innovations, there are no e-mail archives to search. What files there are had been copied to a removable thumb drive and taken by Techway, leaving little evidence of their removal. Asking for the thumb drive in discovery will be only half the battle, however, because Techway’s thumbprint is necessary to access the drive. 3-D Strategies has adopted blade servers that are configured in a redundant array of inexpensive disks (RAID) format, meaning that Innovations’ attorneys cannot simply ask for “the server” drive.
The increased capacities and more complicated backup models hamper the plaintiff’s attempts to narrow the scope of digital data discovery. Finally, because Techway has participated in an unstructured public weblog (blog) dedicated to the discussion of new technologies (and sanctioned by Innovations), there are some questions regarding whether the trade secrets taken were, in fact, secrets anymore.

This brief vignette illustrates several points:

• Reliance on the “document” paradigm must change. In years past, discovery was comparatively simple. Ask for documents, get paper. But no longer. Much of what constitutes relevant discovery today and in the future will not, cannot, or should not be printed.

• Constant vigilance in understanding new technology as it relates to electronic discovery is required. Remember when there was no such thing as a personal digital assistant (PDA)? Over the past 10 years, fledgling technologies such as cell phones, digital documents, Web cams, and IM have become mainstream, and new sources of digital data present themselves daily. These new technologies offer risks along with rewards. Organizations must accept that both technology and redesigned processes will be required to help manage, search, and produce an increasing variety and volume of data. As volumes increase and sources multiply, it will no longer be possible to gather and review all data.

• Computer-based discovery cannot be treated like paper-based discovery. The quill pen has given way to the digital pen, creating a responsibility to respect and protect this more fragile form of evidence.

When viewed in light of recent corporate scandals, topics such as these are more relevant than ever to records managers, lawyers, and corporate management. The past decade has provided some lessons, but there are many more to learn.
Predictions for Electronic Discovery in 2010

• Indiscriminate conversion and production of data will end.
• The “document” will be replaced by the “dataset.”
• Calculated (not random) sampling will be standard.
• Language used to request and describe electronic discovery will become more specific.
• There will be more use of technology and techniques for filtering, including search-and-review tools based on artificial intelligence models.
• Computer technology will no longer be Microsoft-centric.
• E-mail will give way to other forms of communication as the primary source of data discovery.
• There will be a need for the wider use of experts, consultants, and attorney “specialists.”
• The judiciary will become more educated and experienced in the use and abuse of electronic discovery.

The Document Is Dead

There was a time when documents were described in discovery as “writings of every kind and description that are fixed in any form of physical media.” The problem is that the common legal definition of a document is conceptually misleading in the context of electronic discovery issues. This is particularly true for collection and review of voice, video, databases, and Internet-based communications. When addressing these types of data, the average person’s concept of a document – something that may be printed, read, and held in a person’s hand – begins to blur.

Although expanding the legal definition of a document to include electronic data creates the obligation to produce such data in discovery, it offers no guidance on how that production should be carried out. Consequently, there is significant variation in methods used to produce electronic data for discovery. The assumed intent of production is to provide meaningful information, but there are ways in which this intent may be intentionally or inadvertently circumvented.

With paper documents or even word processing files, the meaning is fairly clear. There is a beginning, an end, and a logical structure. True documents tend to be self-contained, or at worst, refer to other documents in support of their content. This makes fitting digital data into the conceptual framework of a document particularly troublesome.

There have been attempts in the past five years or so to shoehorn digital data generated in discovery into the document paradigm, including printing it to paper, printing it to image, extracting it into file structures, and posting it to the Web for review. As technology advances, however, these techniques will become less suitable. They will fall short in their ability to accommodate all relevant forms of data and must evolve to remain viable. Likewise, forays into electronic discovery that have been limited to the collection and review of e-mail should be made cautiously: the good stuff may be left behind. The case where relevant data is found buried in a single field within a corporate database is only one example.
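The single-field example can be pictured as a search that looks at every field of every record rather than only at e-mail. The following is a minimal sketch in Python, not a description of any actual discovery tool: the function name `find_field_hits`, the record layout, and the sample rows are all hypothetical and exist only for illustration.

```python
def find_field_hits(records, terms):
    """Return (record_id, field_name, term) for each field containing a term.

    `records` is a list of dicts, each with an "id" key (an assumed layout).
    Matching is a plain case-insensitive substring test.
    """
    lowered = [term.lower() for term in terms]
    hits = []
    for record in records:
        for field, value in record.items():
            if field == "id":
                continue  # the identifier itself is not searched
            text = str(value).lower()
            for term in lowered:
                if term in text:
                    hits.append((record["id"], field, term))
    return hits


# Hypothetical rows from a corporate database table.
records = [
    {"id": 1, "customer": "Acme", "status": "open",
     "notes": "Routine renewal."},
    {"id": 2, "customer": "3-D Strategies", "status": "closed",
     "notes": "Sent holographic memory specs per MT."},
]

hits = find_field_hits(records, ["holographic memory"])
for record_id, field, term in hits:
    print(f"record {record_id}: '{term}' found in field '{field}'")
```

In this toy data the only match sits in a free-text notes field, which a review limited to e-mail would never reach. A real collection would also have to deal with field types, encodings, and volumes that this sketch ignores.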
Key Discovery Technologies

Technology: Instant messaging
What it is:
• allows immediate communication via the Internet
• similar to e-mail, but without constraint, tracking, or preservation
Example: AOL, MSN Messenger
Electronic discovery issues:
• enables users to circumvent corporate e-mail
• no record unless saved proactively
• informal

Technology: Alternative e-mail
What it is: e-mail systems that operate outside the corporate environment
Example:
• PocoMail
• ISP-based e-mail such as Yahoo
Electronic discovery issues:
• enables users to circumvent corporate e-mail
• no record unless saved proactively
• informal

Technology: Biometrics
What it is: security based on personal physical characteristics, such as retinal scan and fingerprint
Example: thumbprint access on PDAs or USB port drives
Electronic discovery issues: can confound discovery and data retrieval efforts by making access difficult or impossible

Technology: Filtering software
What it is: filters spam or other messages and files
Example:
• Spam Assassin
• filters embedded in ISP services such as AOL, Earthlink, and MSN
Electronic discovery issues:
• cannot assume that data sent was received
• on subscription services alone (not business e-mail), 11.7 percent of messages requested were never received, according to Information World

Technology: Collaboration software
What it is: enables communication between companies and individuals in remote or Web-based environments
Example:
• Eroom
• WebEx
Electronic discovery issues:
• may be overlooked as source of data
• may be only copy of relevant data
• difficult to monitor for data preservation
• eventually may become part of the operating system itself

Technology: Virtual offices
What it is: business model whereby employees in selected departments work from home
Example: Jet Blue reservation agents
Electronic discovery issues: dispersed data

Technology: Portable storage
What it is: small, removable storage devices holding up to 40 GB and costing only about $400
Example:
• Pocket Drive
• Microdrive (IBM)
• SanDisk CompactFlash
Electronic discovery issues:
• hard to find
• hard to track
• easy to steal data

Technology: Blogs
What it is: Web-based personal or topic-specific bulletin boards
Example: See blogger.com for examples
Electronic discovery issues:
• ad-hoc nature
• difficult to track, collect, or identify
• if found, could be good evidence

Technology: Blade servers
What it is: network-based servers based on “blades” that are added to a chassis, enabling many servers to be housed in a small space and boosting network efficiency
Example: IBM, HP, others
Electronic discovery issues:
• more difficult for the untrained user to see
• holds more data
• more difficult to seize and review, as they are generally formatted as RAID

Technology: Digital files (beyond word processing)
What it is: former analog files that are now digitized, including voice and audio
Example: .wav and .MP3
Electronic discovery issues: an often-forgotten source of relevant data, particularly when used to broadcast corporate information

Technology: Data mining
What it is: programs that enable data from a variety of sources to be viewed in the aggregate and from varying perspectives
Example: generally customized or business-specific, such as for hotel industry or manufacturing
Electronic discovery issues: presumption is that all information is locatable because it is in data warehouse

Technology: World Wide Web
What it is: all content formatted for the Internet or a corporate intranet
Example: any Web site
Electronic discovery issues:
• another overlooked source of evidentiary information
• difficult to track and preserve

Technology: Digital convergence
What it is: bringing together digital data of many types and sources into a single location
Example: cell phones
Electronic discovery issues:
• creates another good source of digital evidence
• hard to monitor and track

Technology: Peer-to-peer networking
What it is: communication protocol that enables PCs to talk directly to one another without sharing access to a centralized server
Example: Groove networks
Electronic discovery issues:
• difficult to track and monitor data maintained in this environment
• poses problems for data preservation efforts because of its decentralized nature

New data types and greater reliance on electronic communication also present a significant records management challenge – one that must be addressed by changes in process and in the technology used to manage that process.

So What’s New?

From the Fortune 50 to the “mom-and-pop,” organizations are increasingly implementing digital technologies. Unfortunately, the impact of new and more ephemeral data sources on records management and litigation is the farthest thing from the minds of those who implement new technology. Collaboration software, data warehouses, ISP-hosted e-mail, and Web-based content all present opportunities for indiscriminate archiving and dissemination of corporate information. Such consequences are often lost, however, in the cost-benefit discussions among IT staff and corporate management.

In 1995, Microsoft Windows was predominant, there were few personal computers, and the PDA had not yet been born. Storage was measured in megabytes, not gigabytes, and only “gear-heads” and professors wandered the Internet. Fast forward to 2003 and consider the current landscape: cyber hacking, computer viruses, the Linux operating system, terabytes and petabytes, Internet cafes, and cell phones that take pictures. What was once the stuff of science fiction and spy movies is now mainstream. So how do these advances impact electronic discovery?

It is increasingly difficult to identify and collect the most appropriate evidence. In this respect, technology is both a blessing and a curse. A curse because each year brings new places where relevant data may lurk and ways to exploit the weaknesses in data management structures; a blessing because as each weakness is identified, inventive companies develop the tools to bolster or eradicate it.

The enormous popularity of do-it-yourself in everything from home repair to self-help is filtering into the field of digital discovery, sometimes with disastrous results. Inadvertent overwriting of data and failure to preserve are two areas in which the do-it-yourselfer risks exposure and sanctions. The days of simply collecting e-mail from an Outlook server and calling discovery done are waning, if not already gone.

Those who find this preposterous should consider the Sarbanes-Oxley Act. Its document retention provisions alone mandate a higher standard of care. When taken in the context of litigation and discovery, however, Sarbanes-Oxley goes well beyond monetary sanctions to the specter of jail time. Thus, where to focus attention becomes increasingly important.

Instant Messaging and E-mail

IM is an immediate issue for most companies. It is ubiquitous, generally unmonitored, and a great way to circumvent restrictive corporate e-mail policies. [Editor’s Note: see “IM: Invaluable New Business Tool or Records Management Nightmare?” on page 27.] According to a study quoted in Information Week, “By 2007, businesses will be supporting 182 million IM users”; PC World estimates IM users will top 250 million. But when misused, IM can be used to leak everything from financial data to source code. For example, consider the possibility of an IM thread about pricing between competing companies and its implications for antitrust violations.

Almost as bad as IM misuse is the fact that commercial Internet service providers such as AOL and Yahoo have introduced more sophisticated encryption options and premium e-mail services that enable customers to store more e-mail in their personal accounts for longer time periods. As cell phones and PDAs converge, they, too, will harbor data that may be subject to both retention and discovery.

The effects of increasing e-mail volume are becoming evident. Last year, as a cost-saving measure and in response to a 100-percent increase in e-mail in two years, EDS asked its employees to save messages in their local Microsoft Outlook inboxes, rather than on the Exchange server. This short-term fix is just one example of how companies react to an immediate problem without considering the long-term impact. Compounding the situation is the fact that many users have not been trained to use their e-mail systems effectively, making it much more difficult to retrieve and isolate relevant e-mail.

The good news is that the new version of Exchange, code-named Titanium, promises to protect messaging from hackers and integrates an automatic backup component that takes regular snapshots of the data. It also will further the centralization of e-mail to fewer servers, facilitating both discovery and data retention.

Storage

Storage has become personal. Corporate servers are no longer the exclusive keepers of corporate data. Thumb drives, flash cards, and micro-drives are now capable of holding gigabytes of data that can be downloaded simply and secretly. Employees can more easily take their work (or anything else) home or to a competitor. Gaining access to such devices in discovery may become critical in cases involving trade-secret theft, for example. Biometrics and hardware-based security can still foil an investigator’s attempts to access the data, however. IBM is placing storage of biometric factors and encryption keys on a dedicated processor on the computer’s motherboard. To gain access, some removable media require fingerprint recognition or putting the device into its host computer.

On the high end, storage area networks (SANs) are replacing the need to add larger hard drives to individual servers. Both SANs and outsourced data warehousing can easily be overlooked as a relevant data cache.

Backup

It appears the industry may also be moving beyond backup tapes into the world of “data protection appliances,” a phrase that is not a euphemism for file cabinets. Tape backup, which is linear, subject to failure, and tedious for data recovery, is being challenged by small computer systems interface (SCSI) devices that keep an initial copy of a protected drive and log changes at intervals as short as 30 seconds. Disk-based backups such as these may soon supplant backup tapes whose only goal is data recovery rather than data archiving. For now, tapes continue to grow and by 2010, super advanced intelligent tape (SAIT) may hold as much as 4 terabytes per tape.

The underlying issue, however, is too much data. As storage becomes less expensive and more data is generated, the temptation is simply to keep it available. If that trend continues, the potential liability and cost of gathering and filtering this data for litigation will be staggering. Consider that the reported average storage capacity of a company’s Windows NT servers is 43 terabytes. To put this number in perspective, if 43 terabytes of documents were printed, they would stack over 800 miles high.

Rapid Restore, a new IBM ThinkPad feature, creates a hidden service partition that backs up the entire system image, from data files to registry settings, with periodic updates. Although not the same as an evidentiary image, this backup will let users locate and restore single files that have been corrupted or deleted. That is good news for discovery but bad news for those trying to maintain tight controls.

Software and Operating Systems

Integrated messaging, version control, audit trail, and event notification are all components of the latest online collaboration tools. Objectively, they are excellent tools for streamlining such business processes as product development, corporate management, and more. When litigation threatens, however, they are just one more place where data may lurk.

Technology is slowly moving away from a Microsoft-centric view of business computing. Linux and other open-source platforms will heighten the variety and complexity of internal data review and storage. Futuristic applications such as visualization and mapping technology, rather than the printed report, may ultimately hold the best evidence. It is therefore critical that corporate managers, attorneys, and records managers understand current and future technologies and their effect on both retention requirements and proactive discovery in litigation. For example, will cell-site data (which antenna towers or wireless facilities a cell phone accesses) or .wav files be important in the company’s next litigation? Probably not, but IM and collaborative software probably will be.

The law is not settled as to form, scope, and cost of electronic discovery. Two recent cases, Zubulake v. UBS Warburg LLC, 2003 ILRWeb (P&F) 2253 [SDNY, 2003], and Rowe Entertainment, Inc. v. The William Morris Agency, Inc. 205 F.R.D. 421 (January 16, 2002), offer guidance but do not acknowledge the coming storm created by the compression of court dockets and the expansion of information and new technologies.

The Future of Electronic Discovery

Some technologies will flourish, some will die, some will just keep hanging on. Predicting which will survive is like predicting the outcome of the next Kentucky Derby. One thing is certain, however: Our computing environments will continue to change and impact discovery in litigation.

Two Roads Diverged

Imagine that a person is carrying five pingpong balls back and forth across the room in her hands. Each time she crosses the room, another ball is added. After only a few trips she starts to drop a ball here and there, and she suddenly realizes that she could put all the pingpong balls into a box to make the task easier. She continues to carry the box back and forth, each trip adding another ball, but now baseballs, basketballs, and footballs are added. The box finally becomes too heavy to carry, however, and she eventually drops all the balls.

This not-so-subtle metaphor helps illustrate how most people have thus far approached computer-based discovery (i.e., continuing to follow the same practices used for paper-based discovery), seeking only to contain the increasing amount and variety of data in a larger container. But take a step back and consider whether carrying all those balls back and forth was really necessary.

The costs and time associated with computer-based discovery can be greatly minimized with a little prior planning. Careful selection of datasets, filtering, and sampling techniques offer ways to focus discovery efforts and limit unnecessary collection. Needless to say, if a comprehensive e-risk management plan is implemented prior to litigation, the amount of data available for review will likely be much smaller. For example, the “2003 E-Mail Rules, Policies and Practices Survey,” co-sponsored by American Management Association, The ePolicy Institute, and Clearswift, revealed a lack of e-mail retention and deletion training and policies in U.S. corporations. According to Nancy Flynn, ePolicy Institute’s executive director, “... only 27 percent of the [1,100 U.S. companies that participated in the survey] are doing any training about retention and deletion of e-mail, and only 34 percent have any retention and deletion policies at all.”

A do-it-yourself trend is beginning to emerge, as lawyers and IT personnel take on more responsibility for managing electronic discovery. Large companies may want to build in-house expertise in electronic discovery. However, they must recognize that they will require significant training and an ongoing program to update them on current tools and technologies.

Clearly, information managers must develop an understanding of hardware and software beyond that gained through personal experience to adequately pursue or defend electronic discovery in litigation. It is likewise easy to take a “been there, done that” attitude toward electronic discovery, but the times are quickly changing. AOL alone now generates 3 terabytes of logs a month. A single advanced server, when clustered, can hold a whopping 11 petabytes (11,000 terabytes) of data.

Emerging best practices for data retention and preservation can help corporate counsel address these issues proactively. They will require rethinking in terms of how to approach discovery. Records and information management systems have not historically been deployed with litigation in mind, but perhaps they should be. The escalating costs of data collection and review in discovery, as well as the complexity of the systems themselves, demand a major realignment of how data is maintained in the ordinary course of business.

Thus, an e-risk management plan has become an imperative. As with most things, a focus on minimizing risk now will yield benefits in the future.

Deborah H. Juhnke is Vice President of Seattle-based Computer Forensics Inc. and is a leader in the field of electronic data recovery. She may be contacted at firstname.lastname@example.org.

References

“Corporate Instant Messaging Ready to Take Off.” Information Week. 2 April 2003.
“E-mail Habits Are Risky Business.” PCWorld.com. 24 June 2003.
“The E-mail Scandal.” Infoworld.com. 25 November 2002.
“How Secure Is Instant Messaging?” PC World. October 2002.
“More Than an In-Box.” InformationWeek.com. 6 May 2002.
“2003 Infoworld Storage Survey.” Infoworld.com. 2003.