
EXAMINATION OF THE SAMVIQ METHODOLOGY FOR THE SUBJECTIVE ASSESSMENT OF MULTIMEDIA QUALITY

Quan Huynh-Thu (a,b), Matthew Brotherton (c), David Hands (c), Kjell Brunnström (d), Mohammed Ghanbari (b)

(a) Psytechnics Ltd, 23 Museum Street, Ipswich, IP1 1HN, UK
(b) University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
(c) British Telecommunications plc, Martlesham Heath, Ipswich, IP5 3RE, UK
(d) Acreo AB, Electrum 236, SE-164 40, Kista, Sweden

(The main author is also currently a PhD student at the University of Essex.)


ABSTRACT

The Video Quality Experts Group (VQEG) is preparing a large number of co-ordinated subjective multimedia quality tests. These tests will be performed by laboratories located in Europe, Asia and North America. The subjective data will be used to evaluate the performance of competing objective multimedia quality assessment algorithms. The reliability of the subjective test data is therefore of great importance for VQEG's task. Although standards were produced on the basis of the best available evidence obtained from related fields, relatively few multimedia subjective quality studies have been performed to address specifically the most appropriate and reliable assessment method for multimedia. There is a need for more substantive research evidence to be published in order to validate and improve existing multimedia quality assessment standards. ITU-T SG9, ITU-R WP6Q and VQEG are supporting activities on developing new methods for assessing the subjective quality of multimedia systems. Recently, a new multimedia subjective quality assessment method, named SAMVIQ, has been proposed for standardisation within ITU-R WP6Q. SAMVIQ was originally an EBU methodology and has been designed specifically for assessing the perceptual quality of video delivered as part of a multimedia service. In this paper, two psychovisual experiments investigating the suitability of SAMVIQ are presented. The results of these experiments are discussed within the context of the VQEG multimedia testing programme.

1. INTRODUCTION

Multimedia services are being deployed worldwide across various transmission platforms (e.g. ADSL, cable, mobile). Service providers and network operators alike are interested in ensuring that the quality of the media meets customer expectations. To deliver high quality services, providers must invest in their network infrastructure in order to transmit data reliably, with subjectively acceptable levels of delay and data loss. The multimedia content should be available at a high quality at source, and any encoding and decoding of the data should be performed with minimal impact on the reproduction quality of the media. When faults occur in the encoding, transmission or decoding of multimedia services, there should exist sufficient quality monitoring to identify them rapidly. Ideally, fault detection should allow problems to be resolved before they affect the user experience.

For the purpose of this discussion, multimedia video is considered to be the video component within multimedia services such as Internet streaming, mobile streaming and video-conferencing. To detect problems in multimedia services, a number of quality measurement options are available to operators. For measuring the integrity of data transported across a network, network performance indicators such as latency, throughput and packet loss rate can be accurately measured. If the network performance falls below some critical threshold value, the provider can be informed and a decision can be made to act. Unfortunately, network performance indicators are not useful measurements of the quality perceived by the end-user, as the impact of packet loss or delay is the result of a complex interaction between loss distribution, scene content, the encoding process and the error concealment mechanism used during the decoding process. Methods exist for calculating the pure objective quality of media. For video, peak signal-to-noise ratio (PSNR) measurements can be performed [1]. However, studies have shown that pure objective calculations, such as PSNR, correlate poorly with human perception of quality [2].
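For illustration, PSNR reduces to a single formula over the mean squared error between the reference and processed frames. The sketch below is a minimal NumPy implementation, assuming 8-bit frames already loaded as arrays; the function name and conventions are ours, not tooling from this study:

```python
import numpy as np

def psnr(reference: np.ndarray, processed: np.ndarray) -> float:
    """PSNR in dB between two 8-bit frames (or stacked sequences) of equal shape."""
    ref = reference.astype(np.float64)
    proc = processed.astype(np.float64)
    mse = np.mean((ref - proc) ** 2)
    if mse == 0.0:
        return float("inf")  # identical content
    peak = 255.0  # maximum signal value for 8-bit data
    return 10.0 * np.log10(peak ** 2 / mse)
```

A sequence-level figure is commonly reported as the average of per-frame PSNR values; as the studies cited above indicate, even a correctly computed PSNR may track perceived quality poorly.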
Given that neither network performance measurements nor pure objective quality calculations provide reliable measures of perceptual quality, alternative measurement methods are required. For this reason, objective perceptual quality metrics, incorporating models of the human visual system, have been developed.
The performance of these perceptual computational methods has proven sufficiently accurate at predicting subjective quality opinions for international standards to be agreed. Video standards exist for standard definition television delivered over terrestrial broadcast and cable transmission systems [3, 4]. The standardisation of perceptual video quality metrics has been possible thanks to experts working across both the ITU-R and ITU-T, as well as contributions from several other international organisations (e.g. VQEG, ISO/IEC and ANSI). Following the success of this work, experts from audio, voice and video perceptual quality measurement have been working together in VQEG to evaluate new competing objective perceptual video quality metrics for multimedia applications. VQEG is currently conducting an extensive programme of subjective tests across several countries in order to collect sufficient subjective data to evaluate the performance of these objective models.

2. SUBJECTIVE MULTIMEDIA QUALITY ASSESSMENT

In order to obtain consistent and reliable data from subjective quality tests, international standards have been produced for broadcast television [5]. These standards were defined after many years of preliminary tests in order to determine best practice. International standards for multimedia quality assessment have also been set [6, 7]. ITU-T P.910 describes procedures for performing subjective assessment of the video component of a one-way (non-interactive) multimedia service. Although standards were produced on the basis of the best available evidence obtained from related fields, relatively few multimedia subjective quality studies have been performed to address specifically the most appropriate and reliable assessment method for multimedia. The existence of these standards has been useful in providing methodological guidelines to researchers working on multimedia quality. However, there is a need for more substantive research evidence to be published in order to validate and improve existing multimedia quality assessment standards.

The VQEG Multimedia project has been focusing on evaluating objective quality metrics for measuring the perceptual quality of multimedia services. The performance of objective models will be evaluated against subjective data. For the work of VQEG to be successful, reliable and appropriate methods for performing subjective quality assessment of multimedia video content must be available. The initial phase of the VQEG multimedia activity will concentrate solely on video, but this is the first step towards considering the suitability of methodologies for true multimedia (i.e. combined audio, video and possibly other media such as text or graphics) subjective quality assessment. For VQEG's multimedia work to be successful, the subjective assessment methodology must be precisely defined and must produce reliable and repeatable subjective quality data. The VQEG Multimedia task has agreed on a subjective test methodology [8] that broadly follows ITU-T P.910. This choice was the result of extensive investigation and of running preliminary tests [9, 10, 11, 12]. The VQEG tests will use the single-stimulus ACR scale methodology, with inclusion of a hidden reference, for collecting quality ratings from subjects. Single-stimulus methods have been used extensively in audio, voice and video subjective quality tests. The advantage of single-stimulus testing is that it allows many more test conditions to be presented for assessment than the alternative double-stimulus methodology. One study examining subjective quality assessment for multimedia did not find any significant difference in ratings for video clips viewed under single- and double-stimulus presentation methods [9]. A further study using the single-stimulus ACR method indicated that this method provides reliable data [12].

Recently, a new methodology called SAMVIQ was proposed for standardisation within ITU-R WP6Q. The rationale for the SAMVIQ methodology lies in significant differences between the broadcast television and multimedia domains. Multimedia presents extensive choices and options compared to the fixed architecture of the television domain. Whereas the television domain is relatively specific, the multimedia domain offers a variety of choices of codecs, image formats, frame rates and display types. Additionally, for television quality assessment, judgements are almost exclusively based on the spatial aspects of images. The temporal refresh rate (frame rate) of multimedia video can be variable, and consequently observers are required to perceive both spatial and temporal errors. These factors, combined with a range of possible viewing distances and some more practical aspects of testing, have led to the development of assessment methods dedicated to multimedia. The SAMVIQ methodology is currently being standardised within ITU-R. However, there is still a need to investigate and define best practice for applying the SAMVIQ method.

The authors have performed two experiments using the SAMVIQ method to study its reliability, merits and limitations. The results of these experiments are discussed within the context of the VQEG multimedia testing programme.

3. THE SAMVIQ METHODOLOGY

The SAMVIQ methodology uses a multi-stimulus, random-access approach and differs significantly from traditional subjective test methods in several respects. The interface presents subjects with a single scene, available at a variety of quality levels (including the reference and hidden reference conditions), for quality assessment. Each scene is therefore presented such that the viewer can compare all processed versions of the scene, as well as compare them against the reference, adjusting the quality rating for each video
sequence accordingly. Each sequence is presented on its own. The viewer can access each video sequence and adjust and amend its quality rating multiple times. When all the sequences of the same scene have been rated by the viewer, a new scene is presented.
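The recommendation does not prescribe an implementation for this rating flow, but it can be pictured as a small state model in which a scene is released only once every one of its sequences carries a vote. The sketch below is purely illustrative; the class and method names are our own, not part of SAMVIQ or of the software used here:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class SceneSession:
    """Votes for one scene: hidden reference and PVSs behind letter labels."""
    labels: List[str]  # e.g. ["A", "B", ..., "K"]
    ratings: Dict[str, Optional[int]] = field(init=False)

    def __post_init__(self) -> None:
        self.ratings = {label: None for label in self.labels}

    def rate(self, label: str, score: int) -> None:
        """Record or amend a vote; SAMVIQ allows revisiting sequences freely."""
        if not 0 <= score <= 100:
            raise ValueError("SAMVIQ uses a continuous 0-100 scale")
        self.ratings[label] = score

    def complete(self) -> bool:
        """A new scene is presented only once every sequence has been rated."""
        return all(score is not None for score in self.ratings.values())
```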




Fig. 1. SAMVIQ user interface. [figure not reproduced]

Figure 1 shows the interface used in our experiments. The interface presents a series of buttons that allow a collection of sequences to be viewed one at a time in the video window. STOP and PLAY buttons allow the viewer to stop or restart the video. In addition, NEXT and PREV buttons allow assessors to move to the next or previous scene. Quality ratings are made using the interactive slide-bar located on the right-hand side of the video window. The slider is a 0-100 continuous scale annotated with adjectival markers (Excellent, Good, Fair, Poor, Bad) graded at intervals of 20. Moving the slider updates the quality rating for the sequence currently selected. Access to each of these items on the interface is subject to specific restrictions bound to the SAMVIQ method.

Fig. 2. SAMVIQ sequence random access. [figure not reproduced]

Figure 2 shows the typical SAMVIQ test organisation. Each scene is presented with the following conditions: an explicit reference, a hidden reference and 10 processed video sequences (PVSs). A PVS is obtained by processing a reference video through an error condition, as explained in Section 4. The button labelled REF clearly identifies the explicit reference sequence. Buttons with letter labels A to K give access to either the hidden reference or one of the processed sequences. The letter labels identifying each test sequence within each scene were randomized across viewers, as was the presentation order of the scenes. Randomization in both cases is crucial to reduce contextual effects that could be introduced by presentation and/or letter identifier order [13]. Each PVS must be viewed once in full before the assessor can score it or view another clip. Consequently, all buttons, including the sliding rating scale, are disabled on the first viewing of a sequence until playback is complete. In addition, all sequences of the current scene must be scored before the assessor can proceed to the next scene or revisit the previous one. However, viewers are free to adjust these ratings once an initial vote has been entered. Cycling between scenes using the NEXT and PREV buttons recalls all the previous ratings.
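The randomization just described is straightforward to implement. The sketch below illustrates one way to draw a per-viewer scene order and per-scene letter assignment; the function name, seeding scheme and data layout are our own assumptions, not the actual test software:

```python
import random

def randomize_session(scenes, n_conditions=11, viewer_seed=0):
    """Per-viewer shuffle of the scene order and of the letter labels A..K
    hiding the 11 conditions (hidden reference + 10 PVSs) of each scene."""
    rng = random.Random(viewer_seed)  # reproducible per-viewer randomization
    letters = [chr(ord("A") + i) for i in range(n_conditions)]  # "A".."K"
    scene_order = list(scenes)
    rng.shuffle(scene_order)
    label_map = {}
    for scene in scene_order:
        conditions = list(range(n_conditions))
        rng.shuffle(conditions)
        label_map[scene] = dict(zip(letters, conditions))
    return scene_order, label_map
```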
4. EXPERIMENTS

Subjective tests were performed by two laboratories, Lab A (BT) and Lab B (Psytechnics), both based in the UK. BT and Psytechnics have expertise in subjective assessment and objective perceptual measurement of speech, audio, video and multimedia. Both organisations are active participants in VQEG Multimedia. Both experiments were conducted in soundproof test rooms with controlled lighting conforming to international recommendations [6].

Both SAMVIQ tests were identical in procedure and used the same software interface. The two test laboratories used different computers and displays, but these were of similar specifications. The PC driving the experiment was placed outside the test room in order to avoid annoyance due to noise generated by the computer. The computer specifications for Lab A and Lab B were chosen so that the video played back correctly. Our SAMVIQ software used a Windows Media plug-in for video playback. A test sequence was used to verify that the refresh rate of the monitor was synchronized with the refresh rate of the video. A computer LCD monitor was used to display the video sequences. The monitor response time was less than 16 ms. User interaction with the SAMVIQ interface was performed using a computer mouse.

A total of 15 non-expert viewers, 2 females and 13 males, took part in the subjective experiment conducted by Lab A. Most of the subjects were experienced in using a computer and worked in the telecommunications industry, but none had worked in fields related to picture quality or video coding. None of the subjects had taken part in any subjective testing for at least 6 months. A total of 15 non-expert viewers, 10 females and 5 males, participated in the experiment conducted by Lab B. These viewers were recruited from the public and came from diverse professional backgrounds. All viewers reported having normal vision.

Before beginning the test, viewers were given written instructions and had to run a practice trial consisting of one unique reference scene, a hidden reference and 10 PVSs. The practice trial was representative of the quality range in the actual test. It ensured that viewers were familiar with the range of quality included in the test and, more importantly, familiarised them with the software and procedure. Each test lasted approximately 45 minutes to 1 hour, although this was dependent on the viewer and how many times they viewed each sequence. Viewers were allowed to take a break after viewing half of the scenes.

Viewing distance was not fixed. Subjects were allowed to adjust their position to their most comfortable viewing distance, although they were initially asked to keep their back in contact with the chair, which was placed at a distance of 6H from the screen (H = physical height of the picture).

Source material consisted of 9 scenes (SRC), each of 8 seconds in duration. Content was selected to include several categories of content as specified in the VQEG multimedia test plan: sports, movie trailers, news/documentary, music video, advert, animation and talking-head. Content was also selected to span a wide range of spatial (SI) and temporal (TI) complexities. SI and TI values were estimated using the approach described in ITU-T P.910. Reference sequences were at CIF resolution (352 x 288 pixels) and were derived from standard- or high-definition original content by spatially re-sampling the original video to CIF. The test sequences did not contain any audio track.
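For reference, ITU-T P.910 defines SI as the maximum over time of the spatial standard deviation of Sobel-filtered frames, and TI as the maximum over time of the standard deviation of successive frame differences. A minimal sketch of this estimation on luminance frames, assuming SciPy's Sobel filter (the actual tooling used in this study is not described here):

```python
import numpy as np
from scipy import ndimage

def si_ti(frames):
    """SI/TI per ITU-T P.910 for an iterable of luminance frames (H x W arrays)."""
    si_values, ti_values = [], []
    previous = None
    for frame in frames:
        frame = frame.astype(np.float64)
        # Sobel gradient magnitude, then spatial std dev -> SI contribution
        gx = ndimage.sobel(frame, axis=1)
        gy = ndimage.sobel(frame, axis=0)
        si_values.append(np.hypot(gx, gy).std())
        # Pixel-wise difference with the previous frame -> TI contribution
        if previous is not None:
            ti_values.append((frame - previous).std())
        previous = frame
    return max(si_values), max(ti_values)
```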
Fig. 3. Processing of test video sequences: a source video (SRC) undergoes frame-rate reduction, video encoding, network transmission errors and video decoding (together forming the HRC) to produce the processed video sequence (PVS). [diagram not reproduced]

Each reference sequence (SRC) was processed through a total of 20 hypothetical reference circuits (HRCs) to generate processed video sequences (PVSs) spanning a wide quality range. Table 1 lists the experimental conditions. Note that HRC1 is the hidden reference condition in the SAMVIQ method. Encoding and transmission error parameters were selected to produce a wide distribution of quality, in order to exercise the entire quality scale. Packet loss was the main type of transmission error considered in this study. Figure 3 shows the procedure used to generate the PVSs. All video sequences were stored in uncompressed AVI-RGB24 format.

Table 1. List of error conditions (HRCs); frame rates in frames/s, bit rates in kbit/s.

HRC   codec    frame rate   bit rate   other
 1    none        25          none
 2    none        12.5        none
 3    none         5          none
 4    RV10        25          512
 5    RV10        25          320
 6    RV10        12.5        128
 7    RV10        12.5         56
 8    RV10        25          320     QCIF input to encoder
 9    RV10        12.5         56     QCIF input to encoder
10    RV10        25          128     frame freezes/drops
11    MPEG4       25          704
12    MPEG4       12.5        320
13    MPEG4       25          704     PLR=3% bursty
14    MPEG4       12.5        320     PLR=3% bursty
15    MPEG4        5           94
16    H.264       25          512
17    H.264       12.5        320
18    H.264       12.5        128
19    WMV9        12.5        320
20    WMV9        12.5        128

The SAMVIQ methodology is limited to a maximum of 10 error conditions per scene, excluding the explicit and hidden references. The HRCs were therefore divided between the two test laboratories in order to assess a wide range of video qualities and content. Lab A used HRCs 3, 4, 7, 8, 12, 14, 15, 16, 17 and 20; Lab B used HRCs 2, 5, 6, 9, 10, 11, 13, 15, 18 and 19. The HRCs were divided such that both tests covered the entire quality range and each test contained a balanced representation of qualities. Note that HRC15 was used as a common (duplicate) HRC in both tests.

5. DATA ANALYSIS

From the SAMVIQ experiments, MOS values were obtained for each PVS. By processing every SRC through every HRC, a total of 180 PVSs were obtained. The SAMVIQ methodology requires each scene to be presented with all of its quality conditions, and a maximum of 10 conditions may be created and tested per scene. Also, because of time constraints, it is not practical to run many more than 90 PVSs in any single SAMVIQ test. For these reasons, Lab A presented 90 PVSs for subjective assessment and Lab B presented the remaining 90 PVSs.

Analysis of the MOS suggested suspicious scoring from one viewer in each of the experiments. These viewers consistently gave a low MOS to all explicit reference videos. The results of these subjects were discarded from the data analysis. The average MOS standard deviation and confidence interval for the Lab A experiment were 14.36 and 7.67 respectively; for the Lab B experiment they were 15.10 and 8.07 respectively.
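The statistics above follow the usual per-PVS pattern: the MOS is the mean vote over viewers, and the confidence interval is derived from the standard deviation of the votes. The sketch below is a minimal version under our own assumptions (a viewers-by-PVSs array, a normal-approximation 95% interval, and a simple threshold rule for flagging viewers who score the explicit references low; the paper does not spell out its exact conventions):

```python
import numpy as np

def pvs_statistics(votes: np.ndarray):
    """votes: 2-D array, rows = viewers, columns = PVSs, values on the 0-100 scale."""
    mos = votes.mean(axis=0)                 # MOS per PVS
    std = votes.std(axis=0, ddof=1)          # sample standard deviation per PVS
    n_viewers = votes.shape[0]
    ci95 = 1.96 * std / np.sqrt(n_viewers)   # 95% CI half-width (normal approximation)
    return mos, std, ci95

def flag_suspicious_viewers(votes: np.ndarray, reference_cols, threshold=50.0):
    """Indices of viewers who give a low score to every explicit reference."""
    ref_votes = votes[:, reference_cols]
    return np.where((ref_votes < threshold).all(axis=1))[0]
```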
When SAMVIQ ratings are scaled down to the 5-point categorical MOS scale, the standard deviation values are similar to those typically obtained when assessing video quality subjectively using the single-stimulus ACR method [12]. The distribution of votes for both experiments is shown in Figure 4 and indicates that the experiments included a well-distributed spread of qualities, with the Lab B experiment including a higher percentage of high quality conditions.
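The exact mapping used to scale the 0-100 SAMVIQ ratings down to the 5-point scale is not detailed here; a straightforward linear rescaling, given purely as an assumed illustration, would be:

```python
def to_five_point(score: float) -> float:
    """Assumed linear mapping of a 0-100 SAMVIQ rating onto the 1-5 ACR MOS scale."""
    return 1.0 + 4.0 * score / 100.0
```

Under this assumed mapping, a standard deviation of 14-15 on the 0-100 scale corresponds to roughly 0.6 on the 5-point scale.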




Fig. 4. Distribution of votes for (a) Lab A and (b) Lab B. [figure not reproduced]

Table 2 shows the correlation between each subject's individual scores and the average of all other subjects' scores, both per file and per error condition. The results show that, in both experiments, viewers agreed highly with the other viewers' opinions of quality.
Table 2. Inter-subject correlation, per file and condition.

                    Lab A            Lab B
Subject Nr      File    Cond     File    Cond
     1          0.85    0.95     0.91    0.97
     2          0.87    0.97     0.91    0.99
     3          0.91    0.98     0.84    0.97
     4          0.83    0.93     0.91    0.97
     5          0.86    0.98     0.85    0.96
     6          0.92    0.98     0.89    0.99
     7          0.92    0.99     0.82    0.96
     8          0.94    0.98     0.92    0.98
     9          0.94    0.98     0.91    0.98
    10          0.93    0.96     0.92    0.98
    11          0.92    0.99     0.82    0.98
    12          0.94    0.99     0.85    0.91
    13          0.95    0.99     0.88    0.98
    14          0.90    0.97     0.94    0.97
   mean         0.90    0.97     0.88    0.97
   min          0.83    0.93     0.82    0.91
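The per-file figures in Table 2 can be reproduced by correlating each viewer's votes against the leave-one-out mean of the remaining viewers. A minimal sketch, assuming the same viewers-by-PVSs votes array as in the earlier snippets:

```python
import numpy as np

def inter_subject_correlation(votes: np.ndarray) -> np.ndarray:
    """Pearson correlation of each viewer against the mean of all other viewers."""
    n_viewers = votes.shape[0]
    correlations = np.empty(n_viewers)
    for i in range(n_viewers):
        others_mean = np.delete(votes, i, axis=0).mean(axis=0)  # leave-one-out mean
        correlations[i] = np.corrcoef(votes[i], others_mean)[0, 1]
    return correlations
```

The per-condition column applies the same computation after first averaging each viewer's votes over the scenes of each error condition.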


The presence of the explicit and hidden reference videos was also examined. The distribution of votes for all the explicit reference videos is shown in Figure 5. Most of the explicit reference videos obtained a very high quality score. Furthermore, 6 of the 14 viewers (i.e. 42%) in the Lab A experiment marked all 9 explicit reference videos with the maximum score of 100. Additionally, 2 viewers marked 100 for 8 of the 9 references and a score above 90 for the remaining one. In the Lab B experiment, 5 of the 14 viewers (i.e. 36%) also gave the maximum score to all explicit reference videos. A comparison between each explicit reference and its corresponding hidden reference indicates that, in the Lab A experiment, 89% of the scores given to the explicit references were higher than those given to the corresponding hidden references. This value increases to 91% for the Lab B experiment. These findings suggest that the term “explicit” reference has some influence on the rating of the unprocessed video. By identifying the reference explicitly, it appears that subjective ratings are positively biased.

Fig. 5. Distribution of votes for explicit reference videos for (a) Lab A and (b) Lab B. [figure not reproduced]

In addition to the experiments using novice viewers, both test labs also conducted a pilot experiment using 6 expert viewers each. These subjects had knowledge of image coding and most of them worked in a field related to image quality. Using expert viewers, the average MOS standard deviation and confidence interval were 11.74 and 9.59 respectively for Lab A, and 9.99 and 8.15 respectively for Lab B. The MOS standard deviation for experts was therefore lower than for non-experts in both experiments, i.e. experts agreed more closely with each other's opinion of quality than novice viewers did. The higher confidence intervals for the expert viewers are explained by the fact that only 6 expert viewers were used, compared with 14 non-expert viewers. Figure 6 compares MOS between the two groups of viewers for each lab. The graphs show that MOS values from expert and novice viewers correlate highly: the correlation between expert and novice viewers was 0.97 for Lab A and 0.96 for Lab B. However, experts tend to be more critical overall in their quality ratings than non-experts. This is most visible in the Lab B results.

Fig. 6. Scatter plots of expert vs non-expert viewers for (a) Lab A and (b) Lab B. [figure not reproduced]
In addition to the quality assessment task, all (novice) viewers in the Lab B experiment were asked at the end of the test to answer 5 questions concerning the usage of the SAMVIQ interface. Table 3 reports the answers to the survey. Most subjects needed to view the explicit reference only once to rate the quality of the other sequences. The presence of the explicit reference did not help most viewers in scoring the other sequences. On the other hand, the capability of amending previous votes was used. Finally, although the multi-stimulus access of the SAMVIQ interface allows viewers to compare and rank-order test sequences according to their perceived quality, most viewers did not use this capability (possibly because the task of ranking would take too long to complete).

Table 3. Survey about the usage of the SAMVIQ user interface.

                                                  1x     2x     3x    More
How many times did you watch the REF
for each scene?                                  56%    36%     3%     6%

                                                  Yes           No
Did you use the REF to compare against
each of the clips?                               36%           64%
Did you replay the clips in the scene in
order to change your vote?                       64%           36%
Did you go back to previous scenes to
double-check them?                               60%           40%
Did you put the clips in a rank order
from best to worst?                              39%           61%

6. CONCLUSION

Experiments were performed at two laboratories examining the suitability of an alternative subjective multimedia quality assessment methodology. Our experimental results using the SAMVIQ methodology indicate that it provides results comparable to those of other existing methods such as the single-stimulus ACR methodology. The SAMVIQ method provides subjects with the ability to review test sequences multiple times. For some types of error conditions, the ability to review test content might produce more accurate or reliable data. SAMVIQ may enable subjects to arrive at more appropriate quality ratings for content that they find difficult to judge on a single viewing. However, this review capability does increase the artificiality of the method: in real viewing situations, observers do not normally review content. The main drawback of the SAMVIQ methodology is that it only allows a limited number of error conditions to be tested per content item. Further investigation is required to determine whether there is some methodological advantage gained by applying the SAMVIQ technique to multimedia content, since it is significantly more time-consuming than a single-stimulus (or double-stimulus) approach.

7. REFERENCES

[1] ANSI, “T1.TR.74-2001: Objective Video Quality Measurement Using a Peak-Signal-to-Noise-Ratio (PSNR) Full Reference Technique,” 2001.

[2] I.E.G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, John Wiley and Sons, 2004.

[3] ITU-T, “Rec. J.144: Objective Perceptual Video Quality Measurement Techniques for Digital Cable Television in the Presence of a Full Reference,” March 2004.

[4] ITU-R, “Rec. BT.1683: Objective Perceptual Video Quality Measurement Techniques for Standard Definition Digital Broadcast Television in the Presence of a Full Reference,” June 2004.

[5] ITU-R, “Rec. BT.500-11: Methodology for the Subjective Assessment of the Quality of Television Pictures,” 2002.

[6] ITU-T, “Rec. P.910: Subjective Video Quality Assessment Methods for Multimedia Applications,” Sep 1999.

[7] ITU-T, “Rec. P.911: Subjective Audiovisual Quality Assessment Methods for Multimedia Applications,” Dec 1998.

[8] VQEG, “VQEG Multimedia Test Plan,” Oct 2006. Available from www.vqeg.org.

[9] Q. Huynh-Thu and M. Ghanbari, “A Comparison of Subjective Video Quality Assessment Methods for Low-bit Rate and Low-Resolution Video,” in Proc. IASTED Intl Conf. on Signal and Image Processing, August 2005, vol. 479, pp. 70–76, ACTA Press.

[10] T. Jeong, J. Choe, J. Lim, H. Choi, E. Lee, and C. Lee, “Video Quality Comparison on LCD Monitors,” in Proc. SPIE Image Quality and System Performance II, San Jose, Jan 2005, vol. 5668, pp. 196–203.

[11] M.D. Brotherton, D.S. Hands, K. Brunnström, J. Jonsson, and O.A. Soysuren, “Stabilising Viewing Distances in Subjective Assessment of Mobile Video,” in Proc. SPIE Human Vision and Electronic Imaging XI, San Jose, Jan 2006, vol. 6057, pp. 268–275.

[12] Q. Huynh-Thu, M. Ghanbari, D.S. Hands, and M.D. Brotherton, “Subjective Video Quality Evaluation for Multimedia Applications,” in Proc. SPIE Human Vision and Electronic Imaging XI, San Jose, Jan 2006, vol. 6057, pp. 464–474.

[13] P. Corriveau, C. Gojmerac, B. Hughes, and L. Stelmach, “All Subjective Scales Are Not Created Equal: The Effects of Context on Different Scales,” Signal Processing, vol. 77, no. 1, pp. 1–9, Aug 1999.
								