                         Taking Advertising Pre-Testing Online

                             by Toby Hill and Stephen Spencer
                                Whetstone Online Research
                              a division of The Leading Edge
                                        August 2002


Abstract

If ad pre-testing online could be shown to yield valid findings, this would be a boon
for the ad pre-testing industry. The benefits of online research (cost, timeliness and the
use of multimedia) could be put to good use in ad pre-testing. We have conducted two
self-sponsored pieces of research aimed at understanding the effects of moving ad
pre-testing online. The first was a parallel ad pre-test, with one test conducted online
and the other face to face. The second was an exploration of the effect of the format the
ad is seen in: full TV format, or the reduced size (and quality) typical of ads downloaded
online. This test was conducted offline. The results show that, with some caution, online
pre-testing represents a great opportunity for the advertising research industry.


1. Overview

Using the Internet as a medium to pre-test television commercials has enormous prima
facie appeal. Conventional pre-testing is often expensive, relatively slow, and reliant on
quite small samples. In contrast, some of the key benefits of online research are lower
survey costs, speed of turnaround and the ability to draw much larger samples in a cost
and time effective way.

Were it to be proven valid, pre-testing advertising online would have great potential.

For advertisers, this potential lies in faster response, in being able to afford to test
more often (on more of their ads, or more times during the development of an ad), and in
greater reliability through larger samples.

So, is it valid?

To clarify the question: are there differences between ad pre-testing conducted online and
offline, and, if so, what are the implications?

What we’ve found:

   -   Yes, there are differences.

   1) Verbatim responses are more expansive online.

   2) Ratings of the ad or brand being advertised are very similar between methods.

   3) The smaller the format in which the ad is viewed, the more likely that scores on
      ad likeability will tend towards the mean. In other words, in a typical small
      window, online environment, it is likely that liked ads are less well liked than on
      TV and disliked ads are less disliked than on TV.

On the whole we believe the outcome is very positive for online ad pre-testing.

Findings have been drawn from two self-sponsored studies conducted during 2001 and
2002. The first was a parallel online and offline ad test in the fast food category, with
samples of n=171 online and n=150 offline (face to face). The second study was conducted
entirely offline, and compared two matched cells on their ratings of three ads.
Cell sizes of n=125 and n=117 were achieved. A further online/offline parallel study is
quoted to expand on our findings on the open ended response issue.

2. Hypotheses

It had been suggested by several clients and research colleagues that in an online setting
we would lose the richness of open ended responses that is so important in ad testing
(at least to some pre-testing approaches), and that online pre-testing would therefore be
inferior. It was felt that interviewer probing was the best way of eliciting response. This
issue became the first area for investigation.

We also wondered whether there might be some systematic difference in attitude to
advertising amongst those online. Here the issue was one of sample validity, or
differences in the populations. Would the online world react, on the whole, more
positively to advertising, perhaps because they were part of a survey panel, or for
whatever reason? Alternatively, would the online world react, on the whole, more
negatively to advertising, perhaps because they were more world-weary, astute or cynical?
In short, how would we answer someone who says “I don’t believe people who are
frequent Net users think the same way about advertising as everyone else”?

Finally, we speculated that the size and quality of the image and sound would affect the
ratings given to an ad. When streaming an ad to someone online, the image is generally
served in a reduced size (around 5 cm by 4 cm) and at a reduced frame rate (e.g. we
reduced an original video file of 30 frames per second to 6 frames per second). How
would ratings of an ad shown in this way differ from ratings of the same ad shown on a
full television screen?

Through a series of projects, two of them self-sponsored, we sought to address these three
issues:

   1) open ended responses
   2) online/offline comparisons to an ad pre-test
   3) format effects

3. Open Ended Responses

To evaluate the differences between responses to open ended questions online and in an
offline environment, we used two measures. To evaluate the extent of response we used the
average number of words per response (a measure that has been used before for such
comparisons), and as a measure of “richness” of response we used the total number of
unique words per defined number of respondents.

It is worth noting that whilst other authors have commented on the issue of length of
response between online and offline open ended questions, we are not aware of this being
done with specific reference to advertising testing, nor are we aware of our approach to
calculating “richness” of response through the number of unique words across the sample
having been used before.

The results were as follows:

Table 1: Open Ended Comparison – Likes about Ad

“What do you like about the Ad”

                                              Offline               Online
                                           (face to face)
                                              (n=150)               (n=171)
Average number of words per response             9                     9
Number of different words per 100
respondents (bootstrapping with 10
iterations of 100 respondents each)             192                265 (+38%)

Here we found that the extent of response was the same for the online and the face to
face interviews. However, the richness of response, as estimated by the number of unique
words across the sample, was higher in the online sample. We take this to be a good
thing, as most would argue that in open ended questions they are looking for richness of
response, or something along those lines.
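
The two measures reported in Table 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the paper does not say how words were tokenised, whether the bootstrap samples were drawn with replacement, or how the iterations were combined, so the whitespace tokenisation, the with-replacement draws, the averaging across iterations and all function names here are hypothetical choices.

```python
import random


def average_word_count(responses):
    """Mean number of whitespace-separated words per response."""
    return sum(len(r.split()) for r in responses) / len(responses)


def unique_words(responses):
    """Number of distinct words across a set of responses.

    Assumption: words are lower-cased and stripped of surrounding
    punctuation before being counted.
    """
    words = {w.strip('.,;:!?"\'()').lower() for r in responses for w in r.split()}
    words.discard("")  # tokens that were punctuation only
    return len(words)


def bootstrap_unique_words(responses, sample_size=100, iterations=10, seed=0):
    """Average unique-word count over repeated random samples of respondents.

    Assumption: each iteration draws `sample_size` responses with
    replacement, and the iterations are combined with a simple mean.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(iterations):
        sample = [rng.choice(responses) for _ in range(sample_size)]
        counts.append(unique_words(sample))
    return sum(counts) / iterations


if __name__ == "__main__":
    # Invented placeholder answers, not data from the study.
    demo = ["I liked the music", "Funny", "The music was catchy and fun"]
    print(average_word_count(demo))
    print(bootstrap_unique_words(demo, sample_size=3, iterations=10))
```

Fixing the number of respondents per sample is what makes the unique-word count comparable between two samples of different sizes, such as n=150 and n=171.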

To expand further on this issue we analysed data from another parallel study we were
conducting, this time comparing CATI (computer-assisted telephone interviewing) and online
research. Here the question is not about advertising, and the product category is very
different. However, as an indication of the relative differences between online, CATI and
face to face interviewing, it is interesting.

Table 2: Open Ended Comparison – Alternative Category and CATI Comparison

                                              Offline               Online
                                              (CATI)
                                              (n=300)               (n=81)
Average number of words per response            14                 21 (+50%)
Number of different words per 81
respondents (bootstrapping with 20
iterations of 81 respondents each)              404                633 (+57%)

Here we find that online provides substantially longer responses than CATI, in addition to
the previous finding of greater “complexity” or “richness” of response.

In summary, as several previous authors have found with regard to open ended
responses in general, we find that online provides longer responses when compared with
CATI. When compared with face to face interviewing we find responses of the same
length; however, the “complexity” or “richness” of response is greater online,
whether compared with CATI or face to face.
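
The percentage uplifts quoted in Tables 1 and 2 are consistent with a simple relative difference taken against the offline figure. The short check below assumes that definition (the paper does not state it explicitly); `percent_uplift` is a name introduced here for illustration.

```python
def percent_uplift(baseline, comparison):
    """Relative difference of `comparison` over `baseline`, in percent."""
    return (comparison - baseline) / baseline * 100


# (offline, online) pairs as reported in Tables 1 and 2.
reported = {
    "Table 1, different words per 100 respondents": (192, 265),
    "Table 2, average number of words": (14, 21),
    "Table 2, different words per 81 respondents": (404, 633),
}

for label, (offline, online) in reported.items():
    print(f"{label}: {percent_uplift(offline, online):+.0f}%")
```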

4. Online/Offline Comparisons to Ad Pre-test

The following tables show the online and offline responses to the three key scaled
measures in the previously discussed online/offline parallel ad test:

Table 3: Overall Ad Likeability

                                                        Offline             Online
                                                        (n=150)            (n=171)
                                                           %                  %
I thought it was excellent                                 4                  3
I liked it a lot                                          21                 29
I liked it a little                                       27                 27
I neither liked it nor disliked it                        30                 25
I disliked it a little                                     9                 11
I disliked it a lot                                        9                  4
(no significant difference at 95% confidence level)

Table 4: Overall Opinion of the Brand after Seeing the Ad

                                                        Offline             Online
                                                        (n=150)            (n=171)
                                                           %                  %
Excellent                                                  9                  8
Very good                                                 36                 36
Good                                                      36                 37
Fair                                                      13                 17
Poor                                                       6                  2
(no significant difference at 95% confidence level)

Table 5: Change in Opinion after Seeing the Ad

                                                        Offline             Online
                                                        (n=150)            (n=171)
                                                           %                  %
Better opinion than before                                 8                 10
Opinion is unchanged                                      87                 87
Worse opinion than before                                  5                  3
(no significant difference at 95% confidence level)

As can be seen from eyeballing these tables, the results are extremely similar between the
two media, and certainly not supportive of the notion that the internet community will
substantially “over-rate” or “under-rate” advertising.
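
The paper does not say which significance test lies behind the "no significant difference" notes. One common choice for comparing two response distributions is a Pearson chi-square test of homogeneity, sketched below for Table 3. Everything here is an assumption made for illustration: the choice of test, the function name, and the counts, which are rebuilt from the rounded percentages and so are only approximate.

```python
def chi_square_statistic(col_a, col_b):
    """Pearson chi-square statistic for a two-column table of counts."""
    total_a, total_b = sum(col_a), sum(col_b)
    grand_total = total_a + total_b
    stat = 0.0
    for a, b in zip(col_a, col_b):
        row_total = a + b
        if row_total == 0:
            continue  # an empty row contributes nothing
        for observed, col_total in ((a, total_a), (b, total_b)):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Table 3 percentages, from "excellent" down to "disliked it a lot".
offline_pct = [4, 21, 27, 30, 9, 9]
online_pct = [3, 29, 27, 25, 11, 4]

# Approximate counts rebuilt from rounded percentages (assumption).
offline_counts = [p * 150 / 100 for p in offline_pct]
online_counts = [p * 171 / 100 for p in online_pct]

# Critical value of chi-square with (6 - 1) * (2 - 1) = 5 degrees of
# freedom at the 5% level.
CRITICAL_5DF_95 = 11.07

stat = chi_square_statistic(offline_counts, online_counts)
print(f"chi-square = {stat:.2f}, significant at 95%: {stat > CRITICAL_5DF_95}")
```

Under these assumptions the statistic falls well short of the critical value, which agrees with the note beneath Table 3.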

This is good news so far; however, it is just one ad and just one test. We know that the
two audiences saw different things. The online sample saw a compressed ad with a
number of frames deleted, presented in a window of about 5 cm by 4 cm. What we wanted
to do now was directly measure the effect of this format difference.
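
To give a sense of what "a number of frames deleted" means at the frame rates quoted earlier (30 frames per second reduced to 6), the sketch below assumes the simplest scheme, keeping every fifth frame. The paper does not describe how the frames were actually selected, and `decimated_frame_indices` is a hypothetical helper.

```python
def decimated_frame_indices(total_frames, source_fps=30, target_fps=6):
    """Indices of the frames kept when reducing the frame rate uniformly.

    Assumption: frames are dropped at a fixed interval, which requires the
    source rate to be a whole multiple of the target rate.
    """
    if source_fps % target_fps != 0:
        raise ValueError("source_fps must be a multiple of target_fps")
    step = source_fps // target_fps
    return list(range(0, total_frames, step))


# One second of 30 fps video keeps 6 frames; a 30-second ad keeps 180 of 900.
print(decimated_frame_indices(30))
print(len(decimated_frame_indices(900)))
```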

5. Format Effects – The Quality of the Experience

We now wanted to concentrate, in greater detail, on how the quality of the experience
affects the ratings of ads. In order to eliminate potentially intervening variables, such
as time of filling in the questionnaire, sampling method and type of incentive, we decided
to conduct this stage entirely offline. A central location method was chosen (intercept
recruitment at two Sydney shopping centres) with two treatments: in one cell respondents
saw ads on a full screen television, and in the other cell respondents saw ads on a
computer screen with a size and compression reflective of current best practice online.
Three ads were chosen from the same fast food product category and all three were shown:
one from the USA, one from the UK and one from South Africa, none of which had been shown
in Australia. We selected ads that had different levels of complexity, and tried to avoid
ads that respondents would react to similarly (we wanted variability in response).

If we turn first to the likeability of the ads, we find that the reduced PC format and the
full TV format produce similar results. Ads A and B are preferred to Ad C, and there is no
significant difference between Ads A and B.
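
Likeability in this section is summarised as a "top two box" score, and disliking as a "bottom two box" score: the combined percentage choosing the two most favourable (or least favourable) scale points. Because the PC and TV values appear only in chart form, the sketch below applies the definition to the Table 3 percentages instead. The function names are introduced here for illustration, and the scale is assumed to be listed from most to least favourable.

```python
def top_two_box(percentages):
    """Combined share of the two most favourable scale points."""
    return sum(percentages[:2])


def bottom_two_box(percentages):
    """Combined share of the two least favourable scale points."""
    return sum(percentages[-2:])


# Table 3 percentages, from "excellent" down to "disliked it a lot".
offline = [4, 21, 27, 30, 9, 9]
online = [3, 29, 27, 25, 11, 4]

print("Top two box:   ", top_two_box(offline), top_two_box(online))
print("Bottom two box:", bottom_two_box(offline), bottom_two_box(online))
```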

Table 6: Ad Likeability (top two box scores) for Reduced Format PC View Versus Full
TV Format View

[Bar chart: "Top two box - Likeability of Ads". Vertical axis: 0% to 50%. Horizontal
axis: Ads A, B and C. Series: PC and TV.]

If we turn to a series of diagnostics used to evaluate each ad, we again find strong
consistency between the two approaches. Below we show the results for Ad A and
Ad C:

Tables 7 & 8: % Agreeing with Statements About Ads A & C – Comparison of Reduced
Format and Full Format Views

[Bar chart: "Diagnostic Evaluation - Ad A". Vertical axis: % agree, 0% to 100%.
Horizontal axis: diagnostic statements about the ad. Series: PC and TV.]

Above and below, we see very consistent results between the two media.


[Bar chart: "Diagnostic Ad C". Vertical axis: 0% to 100%. Horizontal axis: diagnostic
statements about the ad. Series: PC and TV.]

Now, if we overlay both sets of data on one chart we can see that the differences between
the PC format and TV format are minor in comparison to the differences between the two
Ads. In other words, we would be telling the client the same story regardless of which
method we were using.

Table 9: % Agreeing With Statements About Ads A & C, Both Reduced Format And Full
Format Views

[Chart: "Comparison of Ad A and Ad C". Vertical axis: 0% to 100%. Horizontal axis:
diagnostic statements about the ad. Series labelled Ad A and Ad C.]

Ad A is far more attention grabbing, “different”, memorable, and less dull than Ad C, and
this holds regardless of the method of data capture.

So the news is good.

However, we had one last area to investigate. There was a consistent, albeit small, pattern
of the TV viewing sample showing a larger proportion liking the ad, regardless of which
ad. What happens at the other end of the scale? Here, too, the TV sample is the more
extreme: disliking is higher when the ad is viewed on television rather than in the PC format.

Table 10: Dislikeability (bottom two box score) For Ads – Comparing Reduced Format
and Full Format Views

[Chart: "Bottom Two Box - Dislikeability of Ads". Vertical axis: 0% to 50%. Horizontal
axis: Ads A, B and C. Series: PC and TV. Values are not recoverable from the source.]




Here, we hypothesise that with the loss of fidelity in the online presentation of the ad,
there was also a reduction in the ‘boldness of response’. In other words, the good is less
good online and the bad is less bad online.
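To make the ‘boldness of response’ idea concrete, the sketch below computes box scores from a response distribution on the six-point likeability scale. The counts are invented for illustration and are not the study's data. The function name `box_scores` is hypothetical, and the top two box is assumed to be defined as the mirror image of the bottom two box.

```python
# Six-point likeability scale, ordered from most negative to most positive.
SCALE = [
    "I disliked it a lot",
    "I disliked it a little",
    "I neither liked nor disliked it",
    "I liked it a little",
    "I liked it a lot",
    "I thought it was excellent",
]


def box_scores(counts):
    """Return (bottom_two_box, top_two_box) as proportions of all responses."""
    if len(counts) != len(SCALE):
        raise ValueError("expected one count per scale point")
    total = sum(counts)
    if total == 0:
        raise ValueError("no responses")
    bottom_two = (counts[0] + counts[1]) / total
    top_two = (counts[-2] + counts[-1]) / total
    return bottom_two, top_two


# Hypothetical response counts (NOT the study's data), 100 respondents per cell.
tv_counts = [10, 15, 15, 20, 25, 15]
pc_counts = [5, 12, 23, 30, 20, 10]

tv_bottom, tv_top = box_scores(tv_counts)
pc_bottom, pc_top = box_scores(pc_counts)
print(f"TV: bottom two box {tv_bottom:.0%}, top two box {tv_top:.0%}")
print(f"PC: bottom two box {pc_bottom:.0%}, top two box {pc_top:.0%}")
```

In this invented example the PC cell has fewer responses at both ends of the scale and more in the middle, so both of its box scores are lower than those of the TV cell. That is the shape the hypothesis predicts.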

This can be seen perhaps more clearly looking at the full response distributions to the
likeability question:

Tables 11-13: Distribution of Ad Likeability Scores for the 3 Ads, Comparing Reduced
Format and Full Format Views

[Three charts: "Likeability of Ad A", "Likeability of Ad B" and "Likeability of Ad C".
Vertical axis in each: 0% to 50%. Horizontal axis in each, left to right: "I disliked it a
lot", "I disliked it a little", "I neither liked nor disliked it", "I liked it a little",
"I liked it a lot", "I thought it was excellent". Series in each: PC and TV. Values are not
recoverable from the source.]



In each case the PC format responses tend to the middle. This pattern could also be
referred to as regression to the mean: a pattern seen when we compare two sets of data
and one is an imperfect replica of the other. As a digression: Francis Galton was
the first to discuss the notion of regression to the mean, and observed it in relation to
the height of children as compared to the average height of their two parents. What he
found was that tall parents had, on average, less tall children, and short parents had, on
average, less short (taller) children. The offspring tended, or regressed, to the mean.

Table 14: Galton’s Demonstration of Regression to the Mean
Source: Forrest, D. W. (1974) Francis Galton: The Life and Work of a Victorian
Genius, p. 189
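Galton's observation can be reproduced with a small simulation. The sketch below is illustrative only and uses no data from Galton or from this study. It assumes that parent and child heights share one inherited component and that each also has its own independent variation. The population mean, the two standard deviations and the 8 cm cut-off are arbitrary assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

POPULATION_MEAN = 170.0  # assumed mean height in cm (illustrative)
SHARED_SD = 5.0          # assumed spread of the component parents and children share
NOISE_SD = 5.0           # assumed spread of the component unique to each person


def simulate_family():
    """Return (mid_parent_height, child_height) for one hypothetical family."""
    inherited = random.gauss(0.0, SHARED_SD)
    parent = POPULATION_MEAN + inherited + random.gauss(0.0, NOISE_SD)
    child = POPULATION_MEAN + inherited + random.gauss(0.0, NOISE_SD)
    return parent, child


families = [simulate_family() for _ in range(20000)]

# Select families by the parents' height only, then look at the children.
tall = [(p, c) for p, c in families if p > POPULATION_MEAN + 8]
short = [(p, c) for p, c in families if p < POPULATION_MEAN - 8]

for label, group in (("tall", tall), ("short", short)):
    parent_mean = statistics.mean(p for p, _ in group)
    child_mean = statistics.mean(c for _, c in group)
    print(f"{label} parents: mean {parent_mean:.1f} cm, "
          f"their children: mean {child_mean:.1f} cm")
```

Under these assumptions the children of the tall group are, on average, taller than the population but shorter than their parents, and the reverse holds for the short group. The child is an imperfect replica of the parent, which is the sense in which a reduced-format viewing might be an imperfect replica of a full-format one.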

So, what we would expect to find, if we were to measure the mean likeability of ads
drawn from small format experiences mapped against those from full (TV) format, is the
following pattern:

Table 15: How the Mean Scores of Ad Likeability Across Many Ads Might Look if Such
Regression to Mean Were Operating – Dummy Data

[Chart: "Hypothetical Comparisons". Vertical axis: Average Likeability on Small PC Format,
0 to 5. Horizontal axis: Average Likeability on TV Format, 0 to 5. Plotted points are not
recoverable from the source.]


This is mocked-up data demonstrating a tendency to the mean that could become apparent
if we were to conduct many more online/offline comparisons across a wide range of
ads. If the hypothesis is correct, then it poses a number of interesting issues:

1) We would expect the same phenomenon at all levels of increased format and
   fidelity. For example, large, flat screen TVs will lead to viewers reacting more
   strongly to advertising (one way or the other), and even more so when we move to
   cinema. Also, when it comes to likeability of internet advertising, we would
   expect a “dulling of effect”.
2) The same effect may be found when testing animatics or unfinished ads: the
   likeable ones are less liked and the dislikeable ones less disliked.
3) When testing online we should aim to get a higher quality, larger format file
   “down the line”, even if that means greater incentives for the respondent, and
   more effort mastering file compression.
4) Building the data from online tests into large banks of norms drawn from the
   offline world of research may cause problems.
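If many paired online and offline tests were available, the hypothesis could be checked by fitting a straight line to the paired mean scores. A slope below 1 would indicate the tendency to the mean described above. The sketch below does this on dummy data only. The scale midpoint, the shrink factor and the noise level are assumptions chosen for illustration, and `least_squares` is a hypothetical helper, not a method used in the study.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

SCALE_MIDPOINT = 3.0  # assumed centre that PC scores are pulled towards
SHRINK = 0.7          # assumed share of the TV deviation that survives on PC


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Dummy mean likeability scores for 40 hypothetical ads (not real test results).
tv_means = [random.uniform(1.5, 4.5) for _ in range(40)]
pc_means = [
    SCALE_MIDPOINT + SHRINK * (tv - SCALE_MIDPOINT) + random.gauss(0.0, 0.1)
    for tv in tv_means
]

slope, intercept = least_squares(tv_means, pc_means)
print(f"fitted line: PC mean = {slope:.2f} * TV mean + {intercept:.2f}")
```

A fitted slope below 1 means that an ad scoring well above the midpoint on TV scores closer to the midpoint on PC, and likewise for an ad scoring well below it. This is why placing online scores directly alongside offline norms, as in point 4, could mislead.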

Conclusions

We believe that online ad testing offers the advertising industry a new and efficient
methodology that can breathe new life into the advertising research process.

On the issue of open-ended response, several authors have shown that online provides as
high or higher quality than offline for general questions, and we have now shown the same
specifically for responses to advertising questions, with face to face interviewing as
the point of comparison.

We have also shown that the diagnostic measures correlate strongly across the two
methodologies.

The key difference in the findings produced in the two methodologies is the
‘homogenising’ of response to likeability questions. We believe that this homogenising
can be dealt with in a number of ways: 1) by being aware of it, especially when
comparing data from an online tested ad to offline norms, 2) by using larger video file
sizes and increasing incentives to ensure respondents are willing to download larger sized
files, 3) by investigating better compression software solutions, and 4) by investigating
testing ads amongst the (ever increasing) subset of Internet users who have high speed
connections. We should also realise that developments in technology will mean that this
factor will become less and less important as full screen ads loom.

Assuming this issue is satisfactorily dealt with, the huge cost, time and sample size
benefits afforded by Internet interviewing may create a new dawn for quantitative
advertising pre-testing.


References

Dipnall, J. and Jeavons, A. (2000). Opinions in Web Surveys: Promoting Extremism.
MRSA National Conference 2000.

MacElroy, B., Milucki, J., and McDowell, P. (2002). A Comparison of Quality in Open-end
Responses and Response Rates Between Web-based and Paper and Pencil Survey
Modes. Journal of Online Research, 2002.

Forrest, D. W. (1974). Francis Galton: The Life and Work of a Victorian Genius.
Taplinger Publishing Co., New York.