									At the Core
This article
➢   examines electronic discovery: its
    relationship to records management
    and its future RIM implications

➢   discusses how e-mail, storage,
    backup, software, and operating
    systems are evolving

➢   offers predictions for electronic
    discovery in 2010

  34     The Information Management Journal   •   November/December 2003
Electronic Discovery
         The past 10 years have proved
   that the escalating costs of data collection
     and review in discovery, as well as the
     complexity of the systems themselves,
      demand a major realignment of how
           business data is maintained

                             Deborah H. Juhnke

   The year is 2010. Margaret Techway, a highly placed, first-generation,
       holographic memory engineer, has recently left her company, Innovations
       Inc., to join market-newcomer 3-D Strategies. Upon her departure, the
       “data-freeze” provision of Innovations’ e-risk management policy was
  implemented automatically. A remotely performed, quick forensic review of
  her primary workstation uncovers suspicious activity during the previous two
  weeks, which gives Innovations cause to file a lawsuit against Techway and
  3-D Strategies for trade secret theft. The challenges of proving the case,
  however, are just beginning. Blogs, biometric keys, and blades are only a few
  of the technological hurdles attorneys will face in developing the case.
    Because instant messaging (IM) has replaced e-mail as the preferred form of
  business communication but has not been consistently monitored or saved at
  Innovations, there are no e-mail archives to search. What files there are had
  been copied to a removable thumb drive and taken by Techway, leaving little
  evidence of their removal. Asking for the thumb drive in discovery will be
  only half the battle, however, because Techway’s thumbprint is necessary
  to access the drive. 3-D Strategies has adopted blade servers that are
  configured with a redundant array of inexpensive disks (RAID) format, meaning that
  Innovations’ attorneys cannot simply ask for “the server” drive. The increased
  capacities and more complicated backup models hamper the plaintiff’s
  attempts to narrow the scope of digital data discovery.
    Finally, because Techway has participated in an unstructured public weblog
  (blog) dedicated to the discussion of new technologies (and sanctioned by
  Innovations), there are some questions regarding whether the trade secrets
  taken were, in fact, secrets anymore.

   This brief vignette illustrates several points:
• Reliance on the “document” paradigm must change. In years past, discovery was comparatively simple. Ask for documents, get paper. But no longer. Much of what constitutes relevant discovery today and in the future will not, cannot, or should not be printed.
• Constant vigilance in understanding new technology as it relates to electronic discovery is required. Remember when there was no such thing as a personal digital assistant (PDA)? Over the past 10 years, fledgling technologies such as cell phones, digital documents, Web cams, and IM have become mainstream, and new sources of digital data present themselves daily. These new technologies offer risks along with rewards.
   Organizations must accept that both technology and redesigned processes will be required to help manage, search, and produce an increasing variety and volume of data. As volumes increase and sources multiply, it will no longer be possible to gather and review all data.
• Computer-based discovery cannot be treated like paper-based discovery. The quill pen has given way to the digital pen, creating a responsibility to respect and protect this more fragile form of evidence.
   When viewed in light of recent corporate scandals, topics such as these are more relevant than ever to records managers, lawyers, and corporate management. The past decade has provided some lessons, but there are many more to learn.

Predictions for Electronic Discovery in 2010
• Indiscriminate conversion and production of data will end.
• The “document” will be replaced by the “dataset.”
• Calculated (not random) sampling will be standard.
• Language used to request and describe electronic discovery will become more specific.
• There will be more use of technology and techniques for filtering, including search-and-review tools based on artificial intelligence models.
• Computer technology will no longer be Microsoft-centric.
• E-mail will give way to other forms of communication as the primary source of data discovery.
• There will be a need for the wider use of experts, consultants, and attorney “specialists.”
• The judiciary will become more educated and experienced in the use and abuse of electronic discovery.

The Document Is Dead
   There was a time when documents were described in discovery as “writings of every kind and description that are fixed in any form of physical media.” The problem is that the common legal definition of a document is conceptually misleading in the context of electronic discovery issues. This is particularly true for collection and review of voice, video, databases, and Internet-based communications. When addressing these types of data, the average person’s concept of a document – something that may be printed, read, and held in a person’s hand – begins to blur.
   Although expanding the legal definition of a document to include electronic data creates the obligation to produce such data in discovery, it offers no guidance on how that production should be carried out. Consequently, there is significant variation in methods used to produce electronic data for discovery. The assumed intent of production is to provide meaningful information, but there are ways in which this intent may be intentionally or inadvertently circumvented.
   With paper documents or even word processing files, the meaning is fairly clear. There is a beginning, an end, and a logical structure. True documents tend to be self-contained, or at worst, refer to other documents in support of their content. This makes fitting digital data into the conceptual framework of a document particularly troublesome.
   There have been attempts in the past five years or so to shoehorn digital data generated in discovery into the document paradigm, including printing it to paper, printing it to image, extracting it into file structures, and posting it to the Web for review. As technology advances, however, these techniques will become less suitable. They will fall short in their ability to accommodate all relevant forms of data and must evolve to remain viable. Likewise, forays into electronic discovery that have been limited to the collection and review of e-mail should be made cautiously: the good stuff may be left behind. The case where relevant data is found buried in a single field within a corporate database is only one example.
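
   To make the point about data buried in a database field concrete, here is a minimal sketch of a term search across every column of a table, the kind of check an e-mail-only collection would never perform. The table, column names, and rows are hypothetical, invented purely for illustration; an actual review would run against the organization's own systems, on a forensically sound copy.

```python
import sqlite3


def find_term_in_fields(conn, table, term):
    """Return (rowid, column, value) for every field whose text contains term.

    The table name is interpolated into the SQL, so it must come from a
    trusted source (here, a hypothetical demo table).
    """
    cur = conn.cursor()
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in cur.execute(f'PRAGMA table_info("{table}")')]
    hits = []
    for col in columns:
        query = (
            f'SELECT rowid, "{col}" FROM "{table}" '
            f'WHERE CAST("{col}" AS TEXT) LIKE ?'
        )
        for rowid, value in cur.execute(query, (f"%{term}%",)):
            hits.append((rowid, col, value))
    return hits


def build_demo_db():
    """Create an in-memory database with made-up order records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, notes TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, notes) VALUES (?, ?)",
        [
            ("Acme", "standard terms"),
            ("Globex", "match competitor pricing per phone call"),
        ],
    )
    return conn


conn = build_demo_db()
hits = find_term_in_fields(conn, "orders", "pricing")
for rowid, column, value in hits:
    print(f"row {rowid}, field '{column}': {value}")
```

   In this toy data the only hit sits in a free-text notes field, which is exactly the sort of location that a request framed around "documents" or e-mail would overlook.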

                                    Key Discovery Technologies
 Technology                      What It Is                                    Example                      Electronic Discovery Issues

 Instant             • allows immediate communication via        AOL, MSN Messenger                     • enables users to circumvent
 messaging             the Internet                                                                       corporate e-mail
                     • similar to e-mail, but without con-                                              • no record unless saved
                       straint, tracking, or preservation                                                 proactively
                                                                                                        • informal
 Alternative         e-mail systems that operate outside the     • PocoMail                             • enables users to circumvent
 e-mail              corporate environment                       • ISP-based e-mail such as Yahoo         corporate e-mail
                                                                                                        • no record unless saved proactively
                                                                                                        • informal
 Biometrics          security based on personal physical         thumbprint access on PDAs or USB       can confound discovery and data
                     characteristics, such as retinal scan and   port drives                            retrieval efforts by making access diffi-
                     fingerprint                                                                        cult or impossible
 Filtering           filters spam or other messages              • Spam Assassin                        • cannot assume that data sent was
 software            and files                                   • filters embedded in ISP services       received
                                                                   such as AOL, Earthlink, and MSN      • on subscription services alone (not busi-
                                                                                                          ness e-mail), 11.7 percent of messages
                                                                                                          requested were never received, accord-
                                                                                                          ing to Information World
 Collaboration       enables communication between com-          • Eroom                                •   may be overlooked as source of data
 software            panies and individuals in remote or         • WebEx                                •   may be only copy of relevant data
                     Web-based environments                                                             •   difficult to monitor for data preservation
                                                                                                        •   eventually may become part of the
                                                                                                            operating system itself
 Virtual offices     business model whereby employees in         Jet Blue reservation agents            dispersed data
                     selected departments work from home
 Portable            small, removable storage devices holding    • Pocket Drive                         • hard to find
 storage             up to 40 GB and costing only about $400     • Microdrive (IBM)                     • hard to track
                                                                 • SanDisk CompactFlash                 • easy to steal data

 Blogs               Web-based personal or topic-specific        See for examples           • ad-hoc nature
                     bulletin boards                                                                    • difficult to track, collect, or identify
                                                                                                        • if found, could be good evidence

 Blade servers       network-based servers based on “blades”     IBM, HP, others                        • more difficult for the untrained user to see
                     that are added to a chassis, enabling                                              • holds more data
                     many servers to be housed in a small                                               • more difficult to seize and review, as
                     space and boosting network efficiency                                                they are generally formatted as RAID
 Digital files       former analog files that are now digi-      .wav and .MP3                          an often-forgotten source of relevant
 (beyond word        tized, including voice and audio                                                   data, particularly when used to
 processing)                                                                                            broadcast corporate information

 Data mining         programs that enable data from a variety    generally customized or                presumption is that all information is
                     of sources to be viewed in the aggregate    business-specific, such as for         locatable because it is in data
                     and from varying perspectives               hotel industry or manufacturing        warehouse

 World Wide          all content formatted for the Internet      any Web site                           • another overlooked source of evi-
 Web                 or a corporate intranet                                                              dentiary information
                                                                                                        • difficult to track and preserve
 Digital             bringing together digital data of many      cell phones                            • creates another good source
 convergence         types and sources into a single location                                             of digital evidence
                                                                                                        • hard to monitor and track

 Peer-to-peer        communication protocol that enables        Groove networks                         • difficult to track and monitor data
 networking          PCs to talk directly to one another with-                                            maintained in this environment
                     out sharing access to a centralized server                                         • poses problems for data preserva-
                                                                                                          tion efforts because of its decentral-
                                                                                                          ized nature
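
   The table's remark that RAID-formatted servers are more difficult to seize and review can be illustrated with a small sketch. It assumes a simple striped layout (RAID 0 style) and a tiny, hypothetical stripe size of four bytes; real arrays use far larger stripes and often add mirroring or parity, so this is an illustration of the idea and not a model of any vendor's product.

```python
from typing import List


def stripe(data: bytes, num_disks: int, stripe_size: int) -> List[bytes]:
    """Distribute data across disks in round-robin stripes (RAID 0 style)."""
    disks = [bytearray() for _ in range(num_disks)]
    for start in range(0, len(data), stripe_size):
        disk_index = (start // stripe_size) % num_disks
        disks[disk_index] += data[start:start + stripe_size]
    return [bytes(d) for d in disks]


def reassemble(disks: List[bytes], stripe_size: int) -> bytes:
    """Rebuild the original bytes; requires every disk and the stripe size."""
    out = bytearray()
    offsets = [0] * len(disks)
    disk = 0
    # Walk the disks in the same round-robin order used when striping.
    while any(offsets[k] < len(disks[k]) for k in range(len(disks))):
        out += disks[disk][offsets[disk]:offsets[disk] + stripe_size]
        offsets[disk] += stripe_size
        disk = (disk + 1) % len(disks)
    return bytes(out)


# Hypothetical file contents, invented for this example.
document = b"PRICING MEMO: CONFIDENTIAL DRAFT"
disks = stripe(document, num_disks=3, stripe_size=4)

single_disk_view = disks[0]   # what one seized drive holds on its own
restored = reassemble(disks, stripe_size=4)

print(single_disk_view)       # fragments only, not the readable memo
print(restored == document)
```

   Under these assumptions, any one drive holds only interleaved fragments of the file, which is why a request for “the server” drive is not enough: the whole array, along with its configuration, has to be preserved and collected together.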

   New data types and greater reliance on electronic communication also present a significant records management challenge – one that must be addressed by changes in process and in the technology used to manage that process.

So What’s New?
   From the Fortune 50 to the “mom-and-pop,” organizations are increasingly implementing digital technologies. Unfortunately, the impact of new and more ephemeral data sources on records management and litigation is the farthest thing from the minds of those who implement new technology. Collaboration software, data warehouses, ISP-hosted e-mail, and Web-based content all present opportunities for indiscriminate archiving and dissemination of corporate information. Such consequences are often lost, however, in the cost-benefit discussions among IT staff and corporate management.
   In 1995, Microsoft Windows was predominant, there were few personal computers, and the PDA had not yet been born. Storage was measured in megabytes, not gigabytes, and only “gear-heads” and professors wandered the Internet. Fast forward to 2003 and consider the current landscape: cyber hacking, computer viruses, the Linux operating system, terabytes and petabytes, Internet cafes, and cell phones that take pictures. What was once the stuff of science fiction and spy movies is now mainstream. So how do these advances impact electronic discovery?
   It is increasingly difficult to identify and collect the most appropriate evidence. In this respect, technology is both a blessing and a curse. A curse because each year brings new places where relevant data may lurk and ways to exploit the weaknesses in data management structures; a blessing because as each weakness is identified, inventive companies develop the tools to bolster or eradicate it.
   The enormous popularity of do-it-yourself in everything from home repair to self-help is filtering into the field of digital discovery, sometimes with disastrous results. Inadvertent overwriting of data and failure to preserve are two areas in which the do-it-yourselfer risks exposure and sanctions. The days of simply collecting e-mail from an Exchange server and calling discovery done are waning, if not already gone.
   Those who find this preposterous should consider the Sarbanes-Oxley Act. Its document retention provisions alone mandate a higher standard of care. When taken in the context of litigation and discovery, however, Sarbanes-Oxley goes well beyond monetary sanctions to the specter of jail time. Thus, where to focus attention becomes increasingly important.

Instant Messaging and E-mail
   IM is an immediate issue for most companies. It is ubiquitous, generally unmonitored, and a great way to circumvent restrictive corporate e-mail policies. [Editor’s Note: see “IM:

Invaluable New Business Tool or Records Management Nightmare?” on page 27.] According to a study quoted in Information Week, “By 2007, businesses will be supporting 182 million IM users”; PC World estimates IM users will top 250 million. But when misused, IM can leak everything from financial data to source code. For example, consider the possibility of an IM thread about pricing between competing companies and its implications for antitrust violations.
   Almost as bad as IM misuse is the fact that commercial Internet service providers such as AOL and Yahoo have introduced more sophisticated encryption options and premium e-mail services that enable customers to store more e-mail in their personal accounts for longer time periods. As cell phones and PDAs converge, they, too, will harbor data that may be subject to both retention and discovery.
   The effects of increasing e-mail volume are becoming evident. Last year, as a cost-saving measure and in response to a 100-percent increase in e-mail in two years, EDS asked its employees to save messages in their local Microsoft Outlook inboxes, rather than on the Exchange server. This short-term fix is just one example of how companies react to an immediate problem without considering the long-term impact. Compounding the situation is the fact that many users have not been trained to use their e-mail systems effectively, making it much more difficult to retrieve and isolate relevant e-mail.
   The good news is that the new version of Exchange, code-named Titanium, promises to protect messaging from hackers and integrates an automatic backup component that takes regular snapshots of the data. It also will further the centralization of e-mail to fewer servers, facilitating both discovery and data retention.

Storage
   Storage has become personal. Corporate servers are no longer the exclusive keepers of corporate data. Thumb drives, flash cards, and micro-drives are now capable of holding gigabytes of data that can be downloaded simply and secretly. Employees can more easily take their work (or anything else) home or to a competitor. Gaining access to such devices in discovery may become critical in cases involving trade-secret theft, for example. Biometrics and hardware-based security can still foil an investigator’s attempts to access the data, however. IBM is placing storage of biometric factors and encryption keys on a dedicated processor on the computer’s motherboard. To gain access, some removable media require fingerprint recognition or putting the device into its host computer.
   On the high end, storage area networks (SANs) are replacing the need to add larger hard drives to individual servers. Both SANs and outsourced data warehousing can easily be overlooked as a relevant data cache.

Backup
   It appears the industry may also be moving beyond backup tapes into the world of “data protection appliances,” a phrase that is not a euphemism for file cabinets. Tape backup, which is linear, subject to failure, and tedious for data recovery, is being challenged by small computer system interface (SCSI) devices that keep an initial copy of a protected drive and log changes at intervals as short as 30 seconds. Disk-based backups such as these may soon supplant backup tapes whose only goal is data recovery rather than data archiving. For now, tapes continue to grow and by 2010, super advanced intelligent tape (SAIT) may hold as much as 4 terabytes per tape.
   The underlying issue, however, is too much data. As storage becomes less expensive and more data is generated, the temptation is simply to keep it available. If that trend continues, the potential liability and cost of gathering and filtering this data for litigation will be staggering. Consider that the reported average storage capacity of a company’s Windows NT servers is 43 terabytes. To put this number in perspective, if 43 terabytes of documents were printed, they would stack over 800 miles high.
   Rapid Restore, a new IBM ThinkPad feature, creates a hidden service partition that backs up the entire system image, from data files to registry settings, with periodic updates. Although not the same as an evidentiary image, this backup will let users locate and restore single files that have been corrupted or deleted. That is good news for discovery but bad news for those trying to maintain tight controls.

Software and Operating Systems
   Integrated messaging, version control, audit trail, and event notification are all components of the latest online collaboration tools. Objectively, they are excellent tools for streamlining such business processes as product development, corporate management, and more. When litigation threatens, however, they are just one more place where data may lurk.
   Technology is slowly moving away from a Microsoft-centric view of business computing. Linux and other open-source platforms will heighten the variety and complexity of internal data review and storage. Futuristic applications such as visualization and mapping technology, rather than the printed report, may ultimately hold the best evidence. It is therefore critical that corporate managers, attorneys, and records managers understand

current and future technologies and their effect on both retention requirements and proactive discovery in litigation. For example, will cell-site data (which antenna towers or wireless facilities a cell phone accesses) or .wav files be important in the company’s next litigation? Probably not, but IM and collaborative software probably will be.

Two Roads Diverged
   Imagine that a person is carrying five pingpong balls back and forth across the room in her hands. Each time she crosses the room, another ball is added. After only a few trips she starts to drop a ball here and there, and she suddenly realizes that she could put all the pingpong balls into a box to make the task easier. She continues to carry the box back and forth, each trip adding another ball, but now baseballs, basketballs, and footballs are added. The box finally becomes too heavy to carry, however, and she eventually drops all the balls.
   This not-so-subtle metaphor helps illustrate how most people have thus far approached computer-based discovery (i.e., continuing to follow the same practices used for paper-based discovery), seeking only to contain the increasing amount and variety of data in a larger container. But take a step back and consider whether carrying all those balls back and forth was really necessary.
   The costs and time associated with computer-based discovery can be greatly reduced with a little prior planning. Careful selection of datasets, filtering, and sampling techniques offer ways to focus discovery efforts and limit unnecessary collection. Needless to say, if a comprehensive e-risk management plan is implemented prior to litigation, the amount of data available for review will likely be much smaller. For example, the “2003 E-Mail Rules, Policies and Practices Survey,” co-sponsored by American Management Association, The ePolicy Institute, and Clearswift, revealed a lack of e-mail retention and deletion training and policies in U.S. corporations. According to Nancy Flynn, ePolicy Institute’s executive director, “... only 27 percent of the [1,100 U.S. companies that participated in the survey] are doing any training about retention and deletion of e-mail, and only 34 percent have any retention and deletion policies at all.”
   A do-it-yourself trend is beginning to emerge, as lawyers and IT personnel take on more responsibility for managing electronic discovery. Large companies may want to build in-house expertise in electronic discovery. However, they must recognize that they will require significant training and an ongoing program to update them on current tools and technologies.
   The law is not settled as to form, scope, and cost of electronic discovery. Two recent cases, Zubulake v. UBS Warburg LLC, 2003 ILRWeb (P&F) 2253 [SDNY, 2003], and Rowe Entertainment, Inc. v. The William Morris Agency, Inc., 205 F.R.D. 421 (January 16, 2002), offer guidance but do not acknowledge the coming storm created by the compression of court dockets and the expansion of information and new technologies.

The Future of Electronic Discovery
   Some technologies will flourish, some will die, some will just keep hanging on. Predicting which will survive is like predicting the outcome of the next Kentucky Derby. One thing is certain, however: Our computing environments will continue to change and impact discovery in litigation.
   Clearly, information managers must develop an understanding of hardware and software beyond that gained through personal experience to adequately pursue or defend electronic discovery in litigation. It is likewise easy to take a “been there, done that” attitude toward electronic discovery, but the times are quickly changing. AOL alone now generates 3 terabytes of logs a month. A single advanced server, when clustered, can hold a whopping 11 petabytes (11,000 terabytes) of data.
   Emerging best practices for data retention and preservation can help corporate counsel address these issues proactively. They will require rethinking in terms of how to approach discovery. Records and information management systems have not historically been deployed with litigation in mind, but perhaps they should be. The escalating costs of data collection and review in discovery, as well as the complexity of the systems themselves, demand a major realignment of how data is maintained in the ordinary course of business.
   Thus, an e-risk management plan has become an imperative. As with most things, a focus on minimizing risk now will yield benefits in the future.

Deborah H. Juhnke is Vice President of Seattle-based Computer Forensics Inc. and is a leader in the field of electronic data recovery. She may be contacted at

References
“Corporate Instant Messaging Ready to Take Off.” Information Week. 2 April 2003.
“E-mail Habits Are Risky Business.” 24 June 2003.
“The E-mail Scandal.” 25 November 2002.
“How Secure Is Instant Messaging?” PC World. October 2002.
“More Than an In-Box.”, 6 May 2002.
“2003 Infoworld Storage Survey.” 2003.
