Future Imperfect by IEEEComputerSociety


									Internet Predictions

                                        Future Imperfect

Vinton G. Cerf
Google

As the second decade of the 21st century dawns, predictions of global Internet digital transmissions reach as high as 667 exabytes (1 exabyte is 10^18 bytes; http://en.wikipedia.org/wiki/SI_prefix#List_of_SI_prefixes) per year by 2013 (see http://telephonyonline.com/global/news/cisco-ip-traffic-0609/). Based on this prediction, traffic levels might easily exceed many zettabytes (1 zettabyte is 10^21 bytes, or 1,000 exabytes) by the end of the decade. Setting aside the challenge of somehow transporting all that traffic and wondering about the sources and sinks of it all, we might also focus on the nature of the information being transferred, how it’s encoded, whether it’s stored for future use, and whether it will always be possible to interpret as intended.

Storage Media

Without exaggerating, it seems fair to say that storage technology costs have dropped dramatically over time. A 10-Mbyte disk drive, the size of a shoe box, cost US$1,000 in 1979. In 2010, a 1.5-Tbyte disk drive costs about $120 retail. That translates into about 10^4 bytes/$ in 1979 and more than 10^10 bytes/$ in 2010. If storage technology continues to increase in density and decrease in cost per Mbyte, we might anticipate consumer storage costs dropping by at least a factor of 100 in the next 10 years, suggesting petabyte (10^15 bytes) disk drives costing between $100 and $1,000. Of course, the rate at which data can be transferred to and from such drives will be a major factor in their utility. Solid-state storage is faster but also more expensive, at least at present. A 150-Gbyte solid-state drive was available for $460 in late 2009. At that price point, a 1.5-Tbyte drive would cost about $4,600. These prices are for low-end consumer products. Larger-scale systems holding petabyte- to exabyte-range content are commensurately more expensive in absolute terms but possibly cheaper per Mbyte. As larger-scale systems are contemplated, operational costs, including housing, electricity, operators, and the like, contribute increasing percentages to the annual cost of maintaining large-scale storage systems.

The point of these observations is simply that it is both possible and likely that the amount of digital content stored by 2020 will be extremely large, integrating over government, enterprise, and consumer storage systems. The question this article addresses is whether we’ll be able to persistently and reliably retrieve and interpret the vast quantities of digital material stored away in various places.
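The cost figures quoted in this section can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope illustration, not part of the original analysis: it assumes decimal prefixes (1 Mbyte = 10^6 bytes), the variable names are invented for the example, and the solid-state drive capacity is not taken as given but derived from the stated scaling from $460 to about $4,600 for 1.5 Tbytes.

```python
# Back-of-the-envelope check of the storage cost figures quoted in the text.
# Assumption: decimal prefixes (1 Mbyte = 10**6 bytes). All names are illustrative.

MBYTE = 10**6
GBYTE = 10**9
TBYTE = 10**12
PBYTE = 10**15


def bytes_per_dollar(capacity_bytes, price_dollars):
    """Return how many bytes one dollar buys for a given drive."""
    return capacity_bytes / price_dollars


# Disk drives: 10 Mbytes for US$1,000 (1979) and 1.5 Tbytes for about $120 (2010).
disk_1979 = bytes_per_dollar(10 * MBYTE, 1_000)
disk_2010 = bytes_per_dollar(1.5 * TBYTE, 120)
improvement = disk_2010 / disk_1979

# If cost per byte drops by a further factor of 100 over the next 10 years,
# what would a petabyte drive cost?
projected_2020 = disk_2010 * 100
petabyte_cost_2020 = PBYTE / projected_2020

# Solid state: $460 in late 2009 scales to about $4,600 for 1.5 Tbytes.
# The drive capacity implied by that tenfold scaling:
implied_ssd_capacity = 1.5 * TBYTE * 460 / 4_600
ssd_bytes_per_dollar = bytes_per_dollar(implied_ssd_capacity, 460)

print(f"1979 disk: {disk_1979:.3g} bytes/$")
print(f"2010 disk: {disk_2010:.3g} bytes/$ ({improvement:.3g}x improvement)")
print(f"Projected petabyte drive: ${petabyte_cost_2020:,.0f}")
print(f"Implied SSD capacity: {implied_ssd_capacity / GBYTE:.0f} Gbytes")
print(f"Disk vs. SSD bytes/$: {disk_2010 / ssd_bytes_per_dollar:.1f}x")
```

Under these assumptions, the projected petabyte price falls inside the $100 to $1,000 range given in the text, and the implied solid-state capacity works out to 150 Gbytes.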

Published by the IEEE Computer Society    1089-7801/10/$26.00 © 2010 IEEE    IEEE INTERNET COMPUTING

Storage media have finite lifetimes. How many 7-track tapes can still be read, even if you can find a 7-track tape drive to read them? What about punched paper tape? CD-ROM, DVD, and other polycarbonate media have uncertain lifetimes, and even when we can rely on them to be readable for many years, the equipment that can read these media might not have a comparable lifetime. Digital storage media such as thumb drives or memory sticks have migrated from Personal Computer Memory Card International Association (PCMCIA) formats to USB and USB 2.0 connectors, and older devices might not interconnect to newer computers, desktops, and laptops. Where can you find a computer today that can read 8” Wang word processing disks, or 5 1/4” or 3 1/2” floppies? Most likely in a museum or perhaps in a specialty digital archive.

Digital Formats

The digital objects we store are remarkably diverse and range from simple text to complex spreadsheets, encoded digital images and video, and a wide range of text formats suitable for editing, printing, or display, among many other application-specific formats. Anyone who has used local or remote computing services, and who has stored information away for a period of years, has encountered problems with properly interpreting the stored information. Trivial examples arise as new formats of digital images are invented and older formats are abandoned. Unless you have access to comprehensive conversion tools or the applications you’re using continue to be supported by new operating system versions, it’s entirely possible to lose the ability to interpret older file formats. Not all applications maintain backward compatibility with their own versions, to say nothing of the ability to convert into and from a wide range of formats other than their own. Conversion often isn’t capable of 100 percent fidelity, as anyone who has moved from one email application to another has discovered. The same can be said for various word processing formats, spreadsheets, and other common applications.

How can we increase the likelihood that data generated in 2010 or earlier will still be accessible in useful form in 2020 and later? To demonstrate that this isn’t a trivial exercise, consider that the providers of applications (whether open source or proprietary) are free to evolve, adapt, and abandon support for earlier versions. The same can be said for operating system providers. Applications are often bound to specific operating system versions and must be “upgraded” to deal with changes in the operating environment. In extreme cases, we might have to convert file formats as a consequence of application or operating system changes.

If we don’t find suitable solutions to this problem, we face a future in which our digital information, even if preserved at the bit and byte level, will “rot” and become uninterpretable.

Solution Spaces

Among the more vexing problems is the evolution of application and operating system software or migration from one operating system to another. In some cases, older versions of applications don’t work with new operating system releases or aren’t available on the operating system platform of choice. Application providers might choose not to support further evolution of the software, including upgrades to operate on newer versions of the underlying operating system. Or, the application provider might choose to cease supporting certain application features and formats.

If users of digital objects can maintain the older applications or operating environments, they might be able to continue to use them, but sometimes this isn’t a choice that a user can make. I maintained two operational Apple IIe systems with their 5 1/4” floppy drives for more than 10 years but ultimately acquired a Macintosh that had a special Apple IIe emulator and I/O systems that could support the older disk drives. Eventually, I copied everything onto newer disk drives and relied on conversion software to map the older file formats. This worked for some but not all of the digital objects I’d created in the preceding decade. Word processing documents were transferable, but the formatting conventions weren’t directly transformable between the older and newer word processing applications. Although special-purpose converters might have been available or could have been written (and in some cases were written), this isn’t something we can always rely on.

If the rights holder to the application or operating system in question were to permit third parties to offer remote access in a cloud-based computing environment, it might be possible to run applications or operating systems that developers no longer supported. This kind of arrangement would plainly require creative licensing and access controls, especially for proprietary software. If a software supplier goes out of business, we might wonder about provisions for access to source code to allow for support in the future, if anyone is willing to provide it, or acquisition by those depending on the software for interpretation of files of data created with it. Open source software might be somewhat easier to manage from the intellectual property perspective.

Digital Vellum

Among the most reliable and survivable formats for text and imagery preservation is vellum (calf, goat, or sheep skin). Manuscripts prepared more than a thousand years ago on this writing material can be read today and are often as beautiful and colorful as they were when first written. We have only to look at some of the illuminated manuscripts or codices dating from the 10th century to appreciate this. What steps might we take to create a kind of digital vellum that could last as long as this or longer?

Adobe Systems has made one interesting attempt with its PDF archive format (PDF/A-1; www.digitalpreservation.gov/formats/fdd/fdd000125.shtml) that the ISO has standardized as ISO 19005-1. Widespread use of this format and continued support for it throughout Adobe’s releases of new PDF versions have created at least one instance of an intended long-term digital archival format. In this case, a company has made a commitment to the notion of long-term archiving. It remains an open question, of course, as to the longevity of the company itself and access to its software. All the issues raised in the preceding section are relevant to this example.

Various other attempts at open document formats exist, such as OpenDocument format 1.2 (and further versions) developed by OASIS (see www.oasis-open.org). The Joint Photographic Experts Group has developed standards for still imagery (JPEG; www.jpeg.org), and the Moving Picture Experts Group has developed them for motion pictures and video (MPEG; www.mpeg.org). Indeed, standards in general play a major role in helping reduce the number of distinct formats that might require support, but even these standards evolve with time, and transformations from older to newer ones might not always be feasible or easily implemented. The World Wide Web application on the Internet uses HTML to describe Web page layouts. The W3C is just reaching closure on its HTML5 specification (http://dev.w3.org/html5/spec/Overview.html). Browsers have had to adapt to interpreting older and newer formats. XML (www.w3.org/XML/) is a data description language. High-level language text (such as Java or JavaScript; see www.java.com/en/ and www.javascript.com) embedded in Web pages adds to the mix of conventions that need to be supported. Anyone exploring this space will find hundreds if not thousands of formats in use.

Finding Objects on the Internet

Related to the format of digital objects is the ability to identify and find them. It’s common on the Internet today to reference Web pages using Uniform Resource Identifiers (URIs), which come in two flavors: Uniform Resource Locators (URLs) and Uniform Resource Names (URNs). The URL is the most common, and many examples of these appear in this article. Embedded in most URLs is a domain name (such as www.google.com). Domain names aren’t necessarily stable because they exist only as long as the domain name holder (also called the registrant) continues to pay the annual fee to keep the name registered and resolvable (that is, translatable from the name to an Internet address). If the registrant loses the registration or the domain name registry fails, the associated URLs might no longer resolve, and access to the associated Web page is lost. URNs are generally not dependent on specific domain names but still need to be translated into Internet addresses before we can access the objects.

An interesting foray into this problem area is called the Digital Object Identifier (DOI; www.doi.org), which is based on earlier work
at the Corporation for National Research Initiatives (www.cnri.reston.va.us) on digital libraries and the Handle System (www.cnri.reston.va.us/doa.html) in particular. Objects are given unique digital identifiers that we can look up in a directory intended to be accessible far into the future. The directory entries point to object repositories where the digital objects are stored and can be retrieved via the Internet. The system can use but doesn’t depend on the Internet’s Domain Name System and includes metadata describing the object, its ownership, formats, access modes, and a wide range of other salient facts.

As we look toward a future filled with an increasingly large store of digital objects, it’s vital that we solve the problems of long-term storage, retrieval, and interpretation of our digital treasures. Absent such attention, we’ll preside over an increasingly large store of rotting bits whose meaning has leached away with time. We can hope that the motivation to circumvent such a future will spur creative solutions and the means to implement them.

Vinton G. Cerf is vice president and chief Internet evangelist at Google. His research interests include computer networking, space communications, inter-cloud communications, and security. Cerf has a PhD in computer science from the University of California, Los Angeles. Contact him at vint@google.com.
