LONG TERM STORAGE OF VIDEO IN THE DIGITAL WORLD: NEW WAYS TO MIGRATE VIDEO
Live Presentation Remarks, June 26, 2004

Jim Lindner: We're going to be talking in this session about video as data. My session focuses not so much on the "how" but the "what" of storage. Jim Wheeler and Ian Gilmore will be talking more about the "how." I'll be talking about the results of some research that Media Matters did for the Dance Heritage Coalition. This project was a digital video preservation reformatting project. We did some research on compression formats, and that's what my presentation today is about. I want to thank the Mellon Foundation, which funded the Dance Heritage Coalition, thus enabling us to do this research as well as the preservation work itself. I also want to thank Carl Fleischauer, who is in the audience, and who is a continual inspiration. If it weren't for him, we wouldn't have been involved in the first place. And Carl is the one who helps me out whenever I get in a pickle. I'd like to talk first a little bit about the impact of compression on video. There are all sorts of technical issues relating to compression, but what's really important from an archival point of view is that the artifacts that are created in compression actually become part of the piece. It's something people don't really talk about much. When we're thinking about compression, usually we're thinking about bandwidth or bit rate -- those sorts of things -- and we don't talk about some of the larger issues. One doesn't normally think of dance as being on the cutting edge of technology; dance companies are usually just struggling to survive. The reason we started this work with the Dance Heritage Coalition is that their archives are mostly on motion picture and video -- moving image media. There are various forms of notation for dance, but the notation doesn't relate very much to performance. The critical documentation is recorded on film and video.
And so for this "financially challenged" dance community, how we preserve dance is very important. People in the dance community are struggling; they're desperate for answers -- "What do we do? Do we put everything on DV?" They came to us with questions about formats and conversions and compression and cost factors. And we found we didn't have any real information, any research that specified one method or format, or suggested the proper terms for a decision-making process. Every format has strengths and weaknesses, and there is usually the issue of compression in the reformatting scenario. No matter what type of compression you're talking about, there is no free lunch -- it's fair to acknowledge that up front. As with anything else, compression offers only trade-offs. In order to address the dance community's questions, we needed to examine those trade-offs, to determine what was acceptable, to in some sense quantify them in order to be able to apply that knowledge to a decision-making process. We needed to look at those trade-offs from the archival point of view. We wanted to understand what would be, on one hand, minimally acceptable and, on the other hand, what would offer the best possible solution. We went into many issues in great depth, and ultimately, what I'm going to present today, which is based on that research, is a case study for determining how to select a video preservation file format. I'm not going to be talking much about file wrappers, but I can tell you that we determined from the beginning that, by definition, we were looking at AAF and MXF. They both capture metadata satisfactorily, so we consider that the file wrapper format is not a significant issue. That being the case, the significant issue becomes the quality of the data itself. I have been thinking recently about some of the AAF and MXF features and extensions that allow you to edit extensively inside the file. I don't know anyone who has.
For instance, it would be interesting to hear about someone who's made an AAF file, done a thousand edits in that file, and attempted to reconstitute the original essence -- to compare and assess whether there have been any changes. I don't know whether that's been done. But for the Dance Heritage Coalition project, we did not test that at all. So what are the objectives of our test? We looked at three basic areas: quality, usability, and preservability. I want to give you an idea first of the kind of material we were working with. [Ref: demonstration image] You'll notice that there are a lot of lines in the image -- that's video interlace, not a compression artifact. That's an artifact of actually showing this here today. We are really looking at quality from a technical point of view, and we are looking at real-world examples, not test charts or signals. We are looking at the characteristics of picture and sound quality, including resolution, chroma bandwidth, luminance -- the different criteria we use to make a before-and-after comparison. What did it look like before? What did it look like after? We decided that a copy will pass the quality test if the measurement of these elements shows little or no diminishment or degradation when compared to the measurement of the original. The quality objective, if you will, is to make sure that what we end up with shares the same characteristics -- technically speaking, the same resolution, chroma bandwidth, and luminance -- as what we start out with. And the techniques we use to effect this transfer have to be affordable. For instance, there are all sorts of esoteric encoding schemes that may be very effective in one context or another, but that are not affordable for a dance community. We decided that it had to be possible to edit the new copy, and that the new copy had to retain any innate information that supports any kind of search engine.
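[Editor's illustration] A before-and-after comparison of the kind described above can be sketched in a few lines of code. This is not the analysis tool used in the study; it is a minimal, generic illustration using peak signal-to-noise ratio (PSNR) as a stand-in fidelity metric, and it assumes a frame is just a flat sequence of 8-bit luma samples.

```python
import math

def psnr(original, processed, peak=255):
    """Peak signal-to-noise ratio between two equal-length
    sequences of 8-bit luma samples (higher is better)."""
    if len(original) != len(processed):
        raise ValueError("frames must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)
    if mse == 0:
        return float("inf")  # identical samples: a mathematically lossless copy
    return 10 * math.log10(peak ** 2 / mse)

# Toy example: a "frame" of 8 luma samples and a lightly altered copy.
before = [16, 32, 64, 128, 180, 200, 220, 235]
after = [16, 33, 63, 128, 181, 199, 220, 234]
print(round(psnr(before, after), 1))  # small errors give a high PSNR
```

A copy "passes" in the sense above when the measured difference from the original stays at or near zero; a lossless copy returns infinity here because the mean squared error is exactly zero.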
Since we believe that the new copy will have to work in an environment where HD will be ubiquitous, we decided that one of the objectives is that the new copy has to be able to be output to HD. It has to have "upward mobility," in the sense that it should be migratable to a higher-resolution context. And the new copy must permit tape-to-film transfer. Some of this material will be used again -- projected -- in the context of a live dance performance. So the ability for dance to use material in performance again is very important, and we don't want to limit its utility in those possible contexts. Finally, we specified that the new copy should have the characteristic of preservability. That's a word we don't have in the dictionary yet. The idea of preservability is that the end product, the new copy, must be migratable and must avoid technical protection measures such as encryption. The format must be open source, public, and well documented, and it should require little or nothing in the way of license fees (which gets us back to "affordable"). The result of this research I am presenting today is a 150-page report, so you're only getting a very superficial look at the data we produced in the short period we have here today. I'm going to move through this very quickly, but you'll be able to read the report and get into all of this more deeply. So what was the essence of our research project? In conjunction with the Dance Heritage Coalition, we selected twenty-two clips for analysis. The clips were generally in excess of one thousand frames, and were selected for both technical and aesthetic criteria. We prepared these clips from the original tapes for analysis. This slide shows the process. [REF: PowerPoint slide] You have the original tapes on different formats -- Betacam SP, Hi-8, DV, and so on. DV was interesting because we did cross-transcoding, which is a nightmare.
We then took the product of that exercise and tried to calibrate it using some of the Samma analysis tools, to try to find equivalencies in terms of video levels. For instance, we didn't want a video level registering one hundred ten percent, because that would necessarily cause a codec problem. So we of necessity had to impose a certain equivalence, to modulate the video playback in the transfer process so that it would fall within legal video range. These twenty-two samples were thus transferred to Betacam SP. We did not put them on Digital Betacam because we didn't want to compound the compression error -- that is, we needed to avoid combining the compression programs being tested in playback with the native, proprietary compression system of Digital Betacam. We then converted the clips to uncompressed AVI files, and we compressed each clip using six different codecs. Finally, we played back the migrated, digitized, compressed data files of these twenty-two clips and analyzed each of those variations using fourteen different metrics. This testing process thus required no fewer than 1,848 specific measurements (twenty-two clips, six codecs, fourteen metrics), and it took more than a year to complete. Let's talk about the clips. I'm not going to go over them in detail, but I'll show you a few samples to give you an idea of the range and types of material that were analyzed. Perhaps the most important thing is to look at the original formats. The first is Hi-8. Then DVCam. Then 3/4" U-matic. [REF: images from each format] We did not try to bias the test in terms of format -- say, only demonstrating the up-market alternatives such as Betacam SP or Digital Betacam. We tried to get a wide range of quality, and we even included VHS. In terms of compression, we applied six different types. For example, MOV files were submitted to Sorenson Video 3. I can't go into all the details here, but you can consult the report and learn about the full complement. Here are the criteria for MPEG IV.
You'll notice the bit rate is what we believe to be reasonable considering the task at hand. We wanted to keep things relatively equivalent, insofar as that was possible. Windows Media, Real Media. And as you probably know, there's a relationship between Real Media, Windows Media, and MPEG IV. There's a lot of politics involved in all this, but MPEG IV and Windows Media have a particular relationship with one another. MPEG II at twenty megabits. We also did MPEG II at fifteen megabits. And JPEG 2000. JPEG 2000 was our lossless compression type for this test. One of the main things that we wanted to test with JPEG 2000 lossless was whether it was in fact lossless, because we didn't know. We'd been told it was lossless, but those of you who work with Avid editing systems have been told that too, and in fact that system is not lossless. So we wanted to verify that what has been sold as lossless is in fact lossless. And indeed JPEG 2000 was lossless; I'm not going to show you any of the graphs on that test because the results are the identity set. What went into compression is what came out of compression. And because of this, the JPEG 2000 lossless samples actually proved to be a challenge for our analysis tools. There are devices available -- one made by Tektronix, for instance -- that will allow you to do an analysis of a system: encoder, decoder, VTR, etc. And you can test material from tape or hard disk or whatever source you need to analyze by passing a standard reference -- a test pattern or other standard analytical content -- through each of the components of the system and measuring the signal. We felt these systems didn't speak to the issues at hand, the issues of real-world material in an archival context. We wanted to work on two levels: the technical level and the perceptual quality level. We chose one tool, which was the only analytical software tool for video we were able to find, made by [get data from PowerPoint slide or JL], a small Japanese company.
Their system has a number of different metrics. Probably the most important metric is the MOS -- the mean opinion score. The MOS is a simulation based upon real-world testing of different criteria, weighted to model the responses of a subjective audience. Data for the MOS is generated by showing audiences different sample material and asking them to rate the samples with slider switches. The compiled information was reduced to an algorithm that models audience response. The MOS is a sort of mean opinion of overall perceptual quality. So MOS was perhaps the preeminent metric of the fourteen characteristics we measured. We looked at metrics that take into account the image content. We looked at blurriness and blocking. We looked at video quality metrics. [REF: PowerPoint slide] In terms of technical metrics, the analyses can be grouped into two categories: spatial and temporal. Spatial metrics include effects such as blockiness. Temporal metrics are mainly directed at the instabilities produced by incorrect interframe movements, what we call "jerkiness." To complete the spatial-temporal metrics of our test program, we looked at several million individual frames. There are advantages to both referenced and non-referenced metrics, but in this study we concentrated on non-referenced metrics because we needed to measure the real-world test footage against itself -- before and after compression. We looked at fidelity metrics, and at measurements that quantify the mathematical differences between samples. We looked at the spatial-temporal metrics related to ANSI and its standards matrix. Here are some of the individual criteria we looked at, and these were the ones that I found most telling. Simpler aspects of the image -- jerkiness, blockiness -- stand out assertively when you're looking at different compression schemes. Some of the other defects we tracked include blur, noise, ringing, and color saturation.
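[Editor's illustration] To make the idea of a non-referenced spatial metric concrete, here is a toy blockiness measure. This is not the commercial tool described in the talk; it is a sketch under the assumption that the codec works on an 8x8 block grid, so block edges show up as unusually large luma jumps at every eighth pixel column.

```python
def blockiness(frame, block=8):
    """Toy non-referenced blockiness score for one luma frame
    (a list of pixel rows): mean absolute luma jump across vertical
    block boundaries, divided by the mean jump everywhere else.
    Values well above 1.0 suggest visible block edges."""
    boundary, interior = [], []
    for row in frame:
        for x in range(1, len(row)):
            step = abs(row[x] - row[x - 1])
            (boundary if x % block == 0 else interior).append(step)
    interior_mean = (sum(interior) / len(interior)) or 1e-6  # avoid /0 on flat frames
    return (sum(boundary) / len(boundary)) / interior_mean

# Two synthetic 4 x 16 "frames": flat 8-pixel blocks vs. a smooth ramp.
blocky = [[100] * 8 + [140] * 8 for _ in range(4)]
smooth = [[10 * i for i in range(16)] for _ in range(4)]
print(blockiness(blocky) > blockiness(smooth))  # the blocky frame scores far higher
```

Because it compares the frame only against itself, a measure like this can be run on real-world footage where no pristine reference exists, which is exactly the situation described above.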
MOS was extremely revealing as an analytical tool, as I mentioned. I'm going to now show you that clip again, and you've seen it once before, so I'm going to ask you to concentrate on it. Understand that you're looking at it through PowerPoint. [REF: clip from PowerPoint] Here's what you're about to see. Beginning with an AVI file, the clip was compressed by the Windows Media 9 encoder using a middle setting. It's playing back on my laptop through this data projector, so a lot of the artifacts you're seeing relate not to the compression but to the system that we're playing it through. I'd like you to look at a couple of things in particular. Look at details such as edges along the humans as they are moving quickly. These are characteristic areas where compression algorithms fail. Notice that in the middle of the clip, we go to a close-up. You can see two windows on each side where they are actually using the video in the dance production. Those of you who work with compression will know from a technical point of view why we would pick a clip like this: an overall dark area and a blown-out region in the middle where all of the action is taking place. The codec has a lot of work to do to try to keep up with all that. Let's look at some of the results. Let's look at MOS first. And here we're comparing two different codecs. This is Sorenson V-III versus MPEG IV. [REF: PowerPoint slide] In this particular graph, higher is better, and five would be the best quality. You can see MPEG IV is lower than that. I'd like to draw your attention to one particular segment: the close-up. You'll notice that the Sorenson codec didn't compress this segment much. The image quality oscillated back and forth within a rather narrow band, and the results were not particularly good. But if you look at MPEG IV, you'll see that there are several spikes along here. Those are probably the base frames. The interpolated frames have generally lower quality than the base frames.
The MOS analysis of the Sorenson compression indicates consistently better performance in a tighter range than the MPEG IV clip. However, while averaging lower overall, at times the MPEG IV produced moments of extremely high quality. By contrast, the Sorenson delivered a more even and better level of quality, although clearly the results were not outstanding. Moments of greatness don't necessarily equate with an overall enhanced viewing experience. In fact, the inconsistency produced by this compression can be quite distracting. You have a base level of quality which improves and reverts in a pattern which draws attention to itself, and to the lower end of the quality spectrum when it appears on screen. One of the things we learned from this study is that it's very difficult to forecast how program content will respond to compression. In the same clip, all along we have blockiness. The problem occurs in MPEG II as well, perhaps less seriously. This is MPEG II versus MPEG IV. [REF: PowerPoint clip] All of a sudden, the quality level of MPEG II at the close-up takes a dive. In this particular graph, we've tracked percent of distortion. MPEG II performs better, and Sorenson actually got dramatically worse. The MOV was looking consistently good; you don't see too much oscillation. Pretty consistent results all of the way across, and a very nice tight band. And then all of a sudden everything sort of goes to hell up here. Okay. And by contrast, the MPEG II actually got better. I can't tell you why that happened, but I can tell you that it did happen. What did we learn from that? It's almost impossible to predict codec performance in advance. There was little consistency in codec performance from clip to clip or within a clip. Sometimes performance is divergent and other times the performance is uniformly bad. Here is Real Media versus Windows Media. Now these are two arch-rival companies. And here you can see Windows Media and Real Media almost shadow each other exactly.
And when performance drops off, it is a very substantial decline. Neither of them is a "stellar" performer. So sometimes you have a codec which performs well until it encounters an anomaly in the datastream that causes the compression algorithm to perform very sub-optimally, and then it returns to normal performance when the signal returns to what it recognizes as a stable source. There is no way to forecast that. I call that "smooth sailing followed by disaster." This chart is for the second example [REF: PowerPoint slide], a different clip from the first demonstration. This represents Windows Media versus MPEG IV, which is interesting because they are variations of the same basic compression engine. And you can see that at the beginning they were comparable -- clearly neither codec was doing a very good job at the beginning of the clip. I suspect that's because of the quality of the initial frames. But as the clip progresses, you can also see that both of them totally fall apart towards the end of the clip. We had those kinds of results in many, many, many of the clips. We also had better-quality clips with large variations in quality; we felt this variation was actually worse, in the sense of being more distracting, than a lower-scored but more even performance. In this set of examples [REF: PowerPoint slide], you see wild oscillations in quality versus consistent, lower-quality performance. And whether the eye is drawn to these wild oscillations or not, whether subjective viewing finds the inconsistent or the lower-quality performer easier to watch -- in either case, it seems clear from an archival point of view that this kind of performance is not acceptable. There was no clear leader over a wide variety of material. Each codec had its own problems, and we found that bit rate was not a good overall predictor of quality.
That surprised me, because I assumed bit rate would be a good indicator -- that's the "real world" feeling we all have -- that you'll get better performance out of a 20 megabit encoder than out of a 15 megabit encoder. But our results indicate that is not the case. It depends upon the material that's going in and on the specific algorithm at the heart of the codec. You can't generalize that higher bit rate equals superior results. You probably could make the generalization that one and a half megabits is worse than fifteen, but within a reasonable scale, you can't assume that higher bit rate equals superior results. Artifacts can be generated by any codec at any bit rate. Some are perceptually significant and others aren't. Business issues and marketing dynamics notwithstanding, there was no clear performance leader even in the match-ups such as Windows Media v. MPEG IV or Windows Media v. Real Media, where very similar systems were closely compared. In some ways, these are disappointing results. Our opinion, from an archival point of view, is that lossy compression is unacceptable -- any type, at any rate. The reason is that there is no way to reliably predict performance over a wide spectrum of material unless you can do scene-to-scene compression, which we reject as being economically unfeasible. For example, such a solution would be prohibitive for the dance community. Those of you who have been involved with mastering a DVD can attest to what I've suggested here this morning. Unless you're going section by section and closely analyzing the scenic content, it's very hard to identify the algorithm that will give the best performance for a given video source. No one algorithm or combination of algorithms showed outstanding performance in our test. As a result, we believe that lossless compression is the only viable and acceptable option for video preservation. In terms of lossless compression, there is a standard.
JPEG 2000 is a compression scheme based on a mathematically lossless algorithm, and it is an open standard. Mathematically lossless compression offers many advantages. There are no artifacts due to the compression process. Frames are available as discrete units -- this is with JPEG 2000 -- which is very important. MPEG compression, for example, does not retain all the frames as discrete units. We think a three-to-one compression factor is feasible, and possibly more with certain types of material. With JPEG 2000, each frame can compress differently because each frame is an individual unit. Now, the quality of the compression program is the most important factor, but certainly not the only factor in the viability of any mass conversion effort. If compression is going to work well in an overall program of archival migration, it needs to be real-time. Analog Devices has introduced JPEG 2000 lossless codecs that work in real time and aren't hardware based. We have a prototype in our lab, and we are very encouraged by it. We expect that JPEG 2000 lossless will be able to run on cost-effective hardware -- that is, workstations that cost less than ten thousand dollars. And I'm hoping that if the media-sector market responds positively to JPEG 2000, there will be an increase in production and a concomitant price drop in media-enhanced chips, so that it will become very cost-effective to run lossless codecs on relatively inexpensive hardware. Even compressed data needs to exist somewhere in storage, and fortunately, we find storage technology trending cheaper as well. Just over the course of this study we've seen a substantial and characteristic drop in the cost of data storage. [REF: PowerPoint slide] In 1998, the cost per gigabyte was $57.97. When we started the project, data storage was about a dollar a gigabyte for raw storage -- not for a higher-level system such as a RAID system, but just for the basic storage function.
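[Editor's illustration] The storage footprint behind these cost figures can be checked with back-of-the-envelope arithmetic. The sketch below assumes uncompressed 8-bit 4:2:2 standard-definition NTSC video (720 x 486 pixels at 29.97 frames per second) -- figures not stated in the talk -- together with the three-to-one lossless factor mentioned above.

```python
# Back-of-the-envelope storage footprint for one hour of SD video.
# Assumes 8-bit 4:2:2 sampling (2 bytes per pixel on average) at
# 720 x 486, 29.97 fps -- typical NTSC figures, not from the talk.
BYTES_PER_FRAME = 720 * 486 * 2
FPS = 29.97
SECONDS_PER_HOUR = 3600

raw_gb = BYTES_PER_FRAME * FPS * SECONDS_PER_HOUR / 1e9  # about 75 GB/hour
lossless_gb = raw_gb / 3  # the three-to-one JPEG 2000 factor above

def cost_per_hour(dollars_per_gb):
    """Raw storage cost for one hour of 3:1 lossless SD video."""
    return lossless_gb * dollars_per_gb

print(f"raw: {raw_gb:.1f} GB/hr, 3:1 lossless: {lossless_gb:.1f} GB/hr")
print(f"at $0.79/GB: ${cost_per_hour(0.79):.2f} per hour")
```

Under these assumptions, at the six cents per gigabyte projected for 2010 this works out to roughly a dollar and a half per hour, which is consistent with the per-hour figure quoted in the talk.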
The cost of a gigabyte is down to seventy-nine cents as of May 28. And based upon this curve, which should look very familiar to many of you, we believe that by 2010 the cost will be somewhere around six cents per gigabyte. So that being the case, the need for high data compression is mitigated, and it becomes more economically feasible to deploy archivally acceptable lossless compression, even though it uses more space than other forms of compression. There are many cost components involved in the migration of video to data files, including facility overhead. As the economics of processors and storage evolve, the additional cost of lossless compression (as compared to lossy) becomes smaller relative to overall project cost. By 2010, an hour of content is going to cost about $1.50 US in 2004 dollars for raw data storage. This is roughly where audio is now in terms of unit cost. And you may recall that the audio archivists had the same conversation we are having now several years ago. The price of storage began to drop, acceptable compression tools became available, and now one hardly ever hears about these factors as significant impediments to conversion. We believe video is following that curve. At this point, I want to hand this over to Jim Wheeler, who is going to talk about the emerging medium for archival storage of this migrating data. Thank you.