

How to Design Smart Business Experiments

Managers now have the tools to conduct small-scale tests and gain real insight. But too many “experiments” don’t prove much of anything.

by Thomas H. Davenport

Illustration: Katy Lemay

hbr.org | February 2009 | Harvard Business Review 69

EVERY DAY, managers in your organization take steps to implement new ideas without having any real evidence to back them up. They fiddle with offerings, try out distribution approaches, and alter how work gets done, usually acting on little more than gut feel or seeming common sense – “I’ll bet this” or “I think that.” Even more disturbing, some wrap their decisions in the language of science, creating an illusion of evidence. Their so-called experiments aren’t worthy of the name, because they lack investigative rigor. It’s likely that the resulting guesses will be wrong and, worst of all, that very little will have been learned in the process.

Take the example of a major retail bank that set the goal of improving customer service. It embarked on a program hailed as scientific: Some branches were labeled “laboratories”; the new approaches being tried were known as “experiments.” Unfortunately, however, the methodology wasn’t as rigorous as the rhetoric implied. Eager to try out a variety of ideas, the bank changed many things at once in its “labs,” making it difficult if not impossible to determine what was really driving any improved results. Branches undergoing interventions weren’t matched to control sites for the most part, so no one could say for sure that the outcomes noted wouldn’t have happened anyway. Anxious to head off criticism, managers did provide a control in one test, which was designed to see if placing video screens showing television news over waiting lines would shorten customers’ perceived waiting time. But rather than looking at control and test groups, they compared just one control site with one test site. That wasn’t enough to ensure statistically valid results. Perceived waiting time did drop in the test branch, but it went up substantially in the control branch, despite no changes there. Those confounding data kept the test from being at all conclusive – but that’s not how the findings were presented to top management.

It doesn’t have to be this way. Thanks to new, broadly available software and given some straightforward investments to build capabilities, managers can now base consequential decisions on scientifically valid experiments. Of course, the scientific method is not new, nor is its application in business. The R&D centers of firms ranging from biscuit bakers to drug makers have always relied on it, as have direct-mail marketers tracking response rates to different permutations of their pitches. To apply it outside such settings, however, has until recently been a major undertaking. Any foray into the randomized testing of management ideas – that is, the random assignment of subjects to test and control groups – meant employing or engaging a PhD in statistics or perhaps a “design of experiments” expert (sometimes seen in advanced TQM programs). Now, a quantitatively trained MBA can oversee the process, assisted by software that will help determine what kind of samples are necessary, which sites to use for testing and controls, and whether any changes resulting from experiments are statistically significant.

Consumer-facing companies rich in transaction data are already routinely testing innovations well outside the realm of product R&D. They include banks such as PNC, Toronto-Dominion, and Wells Fargo; retailers such as CKE Restaurants, Famous Footwear, Food Lion, Sears, and Subway; and online firms such as Amazon, eBay, and Google. As randomized testing becomes standard procedure in certain settings – website analysis, for instance – firms build the capabilities to apply it in other circumstances as well. (See the sidebar “Stop Wondering” for a sampling of tests conducted recently.) To be sure, there remain many business situations where it is not easy or practical to structure a scientifically valid experiment. But while the “test and learn” approach might not always be appropriate (no management method is), it will doubtless gain ground over time. Will it do so in your organization? If it’s like many companies I have studied, an investment in software and training will yield quick returns of the low-hanging-fruit variety. The real payoff, however, will happen when the organization as a whole shifts to a test-and-learn mind-set.

IDEA IN BRIEF

» Too many business innovations are launched on a wing and a prayer – despite the fact that it’s now reasonable to expect truly valid tests.

» With a small investment in training, readily available software, and the right encouragement, an organization can build a “test and learn” capability.

» Companies that equip managers to perform small-scale yet rigorous experiments don’t only save themselves from expensive mistakes – they also make it more likely that great ideas will see the light of day.

IDEA IN PRACTICE

YOU OR SOMEONE on your team is suggesting a change that just might work. But why act on a hunch when you can hold out for evidence? According to the author, the best way to support decision making on potential innovations is to…

» Design an experiment.
Start with a hypothesis about how the change will help the business. If it’s a good one, you’ll learn as much by disproving it as you would by proving it. Put it to the test by measuring what happens in a test group versus a control group. From the outset, be clear on what you need to measure to produce a decisive result – and whether that’s a metric you even have the capability to track.

» Act on the facts.
Nothing but a success in a testing environment should be rolled out more broadly. But neither should failures simply be scrapped. Refine the hypothesis on the basis of the results, and consider testing a variation. Most important, capture what’s been learned, and make it available to others in the organization through a “learning library,” so resources aren’t wasted proving the same thing again.

EXAMPLE Marketers at the Subway restaurant chain wanted to drum up business by putting foot-long subs on sale for only $5, but franchise owners worried that the promotion would lure existing customers away from higher-priced menu items. An experiment pitting test sites against control sites proved that the promotion would pay off – which it subsequently did.

» Make testing the norm.
Create the training and infrastructure that will enable nonexperts in statistics to oversee rigorous experiments. Off-the-shelf software can walk them through the steps and help them analyze results. A core group of experts can lend resources and expertise and maintain the learning library. Leadership must cultivate a test-and-learn culture, in part by penalizing those who act without sufficient evidence.

As your managers become more comfortable with testing, they’ll discover that it paves the way for, rather than throwing up barriers to, promising new ideas.

When Testing Makes Sense

Formalized testing can provide a level of understanding about what really works that puts more intuitive approaches to shame. In theory, it makes sense for any part of the business in which variation can lead to differential results. In practice, however, there are times when a test is impossible or unnecessary. Some new offerings simply can’t be tested on a small scale. When Best Buy, for example, explored partnering with Paul McCartney on an exclusively marketed CD and a sponsored concert tour, neither component of the promotion could be tested on a small scale, so the company’s managers went with their intuition. At Toronto-Dominion, one of the largest and most profitable banks in Canada, testing is so well established that occasionally managers are reminded that, in the interests of speed, they can make the call without a test when they have a great deal of experience in the relevant business domain.

Generally speaking, the triumphs of testing occur in strategy execution, not strategy formulation. Whether in marketing, store or branch location analysis, or website design, the most reliable insights relate to the potential impact and value of tactical changes: a new store format, for example, or marketing promotion or service process. Scientific method is not well suited to assessing a major change in business models, a large merger or acquisition, or some other game-changing decision.

Capital One’s experience hints at the natural limits of experimental testing in a business. The company has been one of the world’s most aggressive testers since 1988, when its CEO and cofounder, Rich Fairbank, joined its predecessor firm, Signet Bank. You could even say the firm was founded on the concept. One thing that appealed to Fairbank about the credit card industry was its “ability to turn a business into a scientific laboratory where every decision about product design, marketing, channels of communication, credit lines, customer selection, collection policies and cross-selling decisions could be subjected to systematic testing using thousands of experiments.”¹ Capital One adopted what Fairbank calls an information-based strategy, and it paid off: The company became the fifth-largest provider of credit cards in the United States.

Yet when it came time to make the largest decision the company had faced in recent years, Capital One’s management concluded that testing would not be useful. Realizing that the business would need other sources of capital to remain independent, the team considered acquiring some regional banks in order to transform itself from a monoline credit provider into a full-service bank. The decision was not tested for a couple of important reasons. First, the nature of the opportunity made it imperative to move quickly; no time was available for even a small-scale test. Second, and more critical, it was impossible to design an experiment that could reliably predict the outcomes of such a major change in business direction. Still, after making the acquisitions, Capital One reaffirmed its commitment to information-based strategy. Its managers immediately set about translating that ethos into the full-service banking context, which required pushing the method further, into tests involving customer service and employee behavior. As one employee told me, “It’s much easier to do randomized testing with direct-mail envelopes than with branch bankers.”

Sears Holdings provides another example of what can reasonably be tested and what can’t. Interestingly, this is another business with a heritage of testing. Robert E. Wood, who originally moved Sears out of the catalog business and into retail stores, said his favorite book was the Statistical Abstract of the United States. When he opened Sears’s first free-standing retail stores, in 1928, he placed two in Chicago. Asked why he needed
two in one city, Wood said it was to reduce the risk of choosing a wrong location or store manager.

Today Sears Holdings has embarked upon a new era: Its primary owner, financier Edward Lampert, who has been its chairman since Kmart acquired Sears, is exploring alternative ways to combine the two troubled chains. To my knowledge, Lampert didn’t test the idea of combining the retailers. That would have been difficult if not impossible to do (and the jury is still out on whether the acquisition was a good decision). However, he’s a strong advocate of testing at the tactical level. He wrote in a 2006 letter to shareholders, “One of the great advantages of having approximately 2,300 large-format stores at Sears Holdings is that we can test concepts in a few stores before undertaking the risk and capital associated with rolling out the concept to a larger number of stores or to the entire chain.” The retailer has tested, for example, various formats for including Sears merchandise in Kmart stores, and vice versa, as well as other formats, such as the arrangement of merchandise in Sears stores by rooms in a consumer’s home (kitchen, laundry room, bedroom, and so on).

Beyond using the tactical-versus-strategic criterion, there are other ways to decide whether formal testing makes sense. For instance, it is useful only in situations where desired outcomes are defined and measurable. A new sales training program might be proposed, but before you can test its efficacy, you’ll need to identify a goal (such as “We want to increase cross-selling”), and you must be able to measure that change (do you even track cross-selling?). Sales and conversion-rate changes are frequently used as dependent variables in tests because they are already measured reliably for other purposes. Other outcomes, such as customer satisfaction and employee engagement, may require more effort and invasiveness to measure.

Tests are most reliable where many roughly equivalent settings can be observed. This might mean physical sites, as with Sears’s stores, or it might mean more ephemeral settings, such as alternative website versions. Among the earliest and most extensive users of testing are retail and restaurant chains. Because so much is held constant among their multitudinous sites, it is easy to designate which ones will serve as experiments and which will serve as controls and to attribute cause to effect. By the same token, workplace design changes are most readily tested in companies that have offices in many cities. Drawing statistical inferences from small numbers of test sites is much more difficult and represents the leading edge of the test-and-learn approach.

Finally, formal testing makes sense only if a logical hypothesis has been formulated about how a proposed intervention will affect a business. Although it’s possible to just make a change and then sit back and observe what happens, that process will inevitably lead to a hypothesis – and often the realization that it could have been formulated in advance and tested more precisely.

The Process of Testing

To begin incorporating more scientific management into your business, you’ll need to acquaint managers at all levels with your organization’s process of testing. It is probably simple to grasp (a typical depiction is shown in the exhibit “Put Your Ideas to the Test”), but it must be communicated in the same terms to people across the organization. Having a shared understanding of what constitutes a valid test enables the innovators to deliver on it and the senior executives to demand it.

Exhibit: Put Your Ideas to the Test
Adapted from Applied Predictive Technologies’ “Test and Learn” Wheel

1. CREATE OR REFINE HYPOTHESIS
■ ASCERTAIN that the hypothesized relationships haven’t already been tested and measured – and that they can be.
■ MAKE sure the hypothesis could generate substantial economic value.
■ DETERMINE whether it suggests an actual decision or action. (If not, go no further.)

2. DESIGN TEST
■ ENSURE that the number of test and control sites is sufficient for statistical significance.
■ USE simulation to explore multiple strategies for creating control groups (for instance, they may be nearly identical but different on one key variable).
■ ASSESS whether control group strategies previously used for similar tests will suffice; they usually do.
■ CONDUCT statistical analysis to minimize the number of test cells needed.
■ EXTEND testing period if key metrics are highly variable.

3. EXECUTE TEST
■ MEET with test and control site managers and analytical experts to discuss what might go wrong and what would constitute test-confounding events.
■ INSTRUCT field personnel to report abnormal events.
■ REMOVE sites from test if test-confounding events occur.
■ ADJUST evaluation and compensation plans for managers so that they are not negatively affected by …

4. ANALYZE TEST
■ ENSURE that “lift” from interventions is statistically significant.
■ USE software to analyze results and manage complex data from multiple test and control sites.
■ DETERMINE need for further testing.
■ EXAMINE as many site attributes as possible to see how key variables interact.

5. PLAN ROLLOUT
■ STUDY attributes of test sites to determine whether rollout should be universal or differentiated.
■ BALANCE complexity of rollout with ease of implementation and management.

6. ROLLOUT
■ STAGGER the rollout and view it as a test in itself. (Are early-adopting sites yielding the desired result? If not, modify the approach in later-adopting sites.)
■ ENCOURAGE site managers to share rollout strategies and tactics.

LEARNING LIBRARY
■ DEVELOP a summary of each test: hypotheses, test dimensions, key results, interactions, and rollout strategies and results.
■ EMPLOY standard business taxonomy to allow easy searching of library.
■ MAKE library widely accessible to employees; publicize tests and results of important studies to encourage a test-and-learn culture.

The process always begins with the creation of a testable hypothesis. (It should be possible to pass or fail the test based on the measured goals of the hypothesis.) Then the details of the test are designed, which means identifying sites or units
to be tested, selecting the control groups, and defining the test and control situations. After the test is carried out for the specified period – which sometimes can take several months but is usually done in less time – the data are analyzed to determine the results and appropriate actions. The results are ideally put into some sort of “learning library” (although, unfortunately, many organizations skip this step). They might lead to a wider rollout of the experiment or further testing of a revised hypothesis.

More broadly, managers must understand how the testing process fits in with other business processes. They conduct tests in the context of, for example, order management, or site selection, or website development, and the testing feeds into various subprocesses. At CKE Restaurants, which includes the Hardee’s and Carl’s Jr. quick-service restaurant chains, the process for new product introduction calls for rigorous testing at a certain stage. It starts with brainstorming, in which several cross-functional groups develop a variety of new product ideas. Only some of them make it past the next phase, judgmental screening, during which a group of marketing, product development, and operations people will evaluate ideas based on experience and intuition. Those that make the cut are actually developed and then tested in stores, with well-defined measures and control groups. At that point, executives decide whether to roll out a product systemwide, modify it for retesting, or kill the whole idea.

CKE has attained an enviable hit rate in new product introductions – about one in four new products is successful, versus
one in 50 or 60 for consumer products – and executives say that their rigorous testing process is part of the reason why. If you have had occasion to enjoy a Monster Thickburger at Hardee’s, or a Philly Cheesesteak Burger or a Pastrami Burger at Carl’s Jr., you’ve been the beneficiary of CKE’s efforts. These are just three of the successful new products that were rolled out after testing proved they would sell well.

At eBay, there is an overarching process for making website changes, and randomized testing is a key component. Like other online businesses, eBay benefits greatly from the fact that it is relatively easy to perform randomized tests of website variations. Its managers have conducted thousands of experiments with different aspects of its website, and because the site garners over a billion page views per day, they are able to conduct multiple experiments concurrently and not run out of treatment and control groups. Simple A/B experiments (comparing two versions of a website) can be structured within a few days, and they typically last at least a week so that they cover full auction periods for selected items. Larger, multivariate experiments may run for more than a month.

Online testing at eBay follows a well-defined process that consists of the following steps:
■ Hypothesis development
■ Design of the experiment: determining test samples, experimental treatments, and other factors
■ Setup of the experiment: assessing costs, determining how to prototype, ensuring fit with the site’s performance (for example, making sure the testing doesn’t slow down user response time)
■ Launch of the experiment: figuring out how long to run …

… design research and eye-tracking studies as well as diary studies to see how users feel about potential changes. No significant change to the website is made without extensive study and testing. This meticulous process is clearly one reason why eBay is able to introduce most changes with no backlash from its potentially fractious seller community. The online retailer now averages more than 113 million items for sale in more than 50,000 categories at any given time.

EBay performed extensive online and offline testing, for example, in 2007 and 2008, when it changed its page for viewing items on sale. The page had not been redesigned since 2003, and both customers and eBay designers felt it lacked organization, had inadequate photographs of items, and suffered from haphazard item placement and redundant functionality. After going through all the testing steps, eBay adopted a new site design. It posted photos 200% larger than those in the previous design, added a countdown timer for auctions with 24 hours or less to go, made more prominent the item condition and return policy, and included tabs to make shipping and payment fields easier to navigate. It also included new security features to prevent unauthorized changes in site content. Each new feature and function was tested independently with control pages. Measures of page views and bid counts suggest that the redesign was very successful.

Building a Testing Capability

Establishing a standard process is the first step toward building an organizational test-and-learn capability, but it isn’t sufficient unto itself. Companies that want testing to be a reli-
it, serving the treatment to users                                   able, effective element of their decision making need to create
    ■ Tracking and monitoring                                        an infrastructure to make that happen. They need training
    ■ Analysis and results                                           programs to hone competencies, software to structure and
    The company has also built its own application, called the       analyze the tests, a means of capturing learning, a process for
eBay Experimentation Platform, to lead testers through the           deciding when to repeat tests, and a central organization to
process and keep track of what’s being tested at what times          provide expert support for all the above.
on what pages.                                                           Managerial training. At the very least, managers should
    As with CKE’s new product introductions, however, this on-       learn what constitutes a randomized test and when to employ
line testing is only part of the overall change process for eBay’s   it. Capital One, for example, offers a professional education
website. Extensive offline testing also takes place, including       program on testing and experiment design through its inter-
lab studies, home visits, participatory design sessions, focus       nal training function known as Capital One University. One
groups, and trade-off analysis of website features – all with        benefit of hosting a program like this, rather than sending
customers. The company also conducts quantitative visual-            managers outside for training, is the greater emphasis on how

the testing connects to upstream and downstream activities in the business.

Stop Wondering
Testing is used to make tactical decisions in a range of business settings, from banks to retailers to dot-coms. Here are some questions various companies are examining:
    ■ Do lobster tanks increase lobster sales at Food Lion supermarkets?
    ■ Does a Kmart with a Sears store inside sell more than an all-Kmart format?
    ■ Do eBay users bid higher in auctions when they can pay by credit card?
    ■ What’s the optimum number of loose checks for a Wells Fargo ATM to accept?
    ■ Do Subway promotions on low-fat sandwiches increase sandwich sales?
    ■ Does a Famous Footwear store sell fewer shoes when there is a competitor in the same mall?
    ■ Does a Toronto-Dominion branch get significantly more deposits when open 60 hours a week compared with 40?
    ■ Which promotional offers will most efficiently drive checking account acquisition at PNC Bank?
    As a result of their testing, these organizations are finding out whether supposedly better ways of doing business are actually better. Once they learn from their tests, they can spread confirmed better practices throughout their business.

    Test-and-learn software. Some firms, such as Capital One and eBay, have built their own software for managing experiments, but several off-the-shelf options exist – the most common ones being broad statistical packages and analytical tools like SAS. With every passing year, these tools make it more possible for numerate – but not statistically expert – users to conduct truly defensible experiments. Ease of design and analysis has been a particular focus at Applied Predictive Technologies, whose product leads users through the test-and-learn process, keeps track of test and control groups, and provides a repository for findings to be usefully accessed in the future.
    Some software tools are tailored to particular problems or industries. Several packaged tools, for example, are available for the analysis of manufacturing-quality experiments. Likewise, highly specialized tools exist for online-usage testing, such as the web analytics software sold by Omniture and WebTrends and the free tools provided by Google Analytics. As of yet, unfortunately, no single software tool can help organizations with all testing types and contexts.
    Learning capture. If a firm does a substantial amount of testing, it will generate a substantial amount of learning about what works and what doesn’t. Ideally employees throughout the company would share that knowledge and use it to guide future initiatives. But that happens at few organizations. The head of testing at one online firm admitted, “All of that knowledge is in my head, and we’d be in tough shape if I were hit by a bus.” One bank executive justified a lack of shared learning, commenting, “We should probably do more, but we’ve found that people need to learn from doing the test themselves, even if we’ve done it before many times.” People do learn through personal experience, but one would hope that it’s not the only possible way.
    Some organizations, however, have begun to address the issue. Capital One captures the learning from its thousands of tests in an online knowledge management system and has experimented with an even more ambitious system that would use such learning to guide product managers as they develop new offerings. Famous Footwear takes a “billboard” approach; for each test, it captures the results in a one-page document, circulates that throughout the organization, and posts it on the wall outside the testing office.
    Regular revisiting. One tricky aspect of establishing a long-term testing approach is determining when to retest. There is no way to know for sure when a test has become obsolete; an experienced analyst needs to assess whether enough factors have changed in the environment to make previous results suspect. Famous Footwear executives feel that the retail store location context – their primary application area for testing – changes enough to merit retesting after about a year. Netflix concluded in 2006 that its five-year-old customer tests needed to be redone; the user base had evolved in that time from internet pioneers to mainstream society members. CKE Restaurants has difficulty deciding whether to retest pricing, particularly in times when commodity prices are increasing fast. Ironically, it is human intuition, not testing or analytics, that must be applied to determine the need for retesting.
    Core resource group. Most of the firms that do extensive testing have established a small, somewhat centralized organization to supervise it. The group either actually does the testing, as at PNC Bank, Subway, and Famous Footwear, or – if testing is employed throughout the organization – serves as a resource for methodological and statistical questions, as at Capital One. At PNC Bank, the test-and-learn group (part of the bank’s knowledge management function, which reports to Marketing) views the promotion of its own services around the bank as a priority. It tries to build relationships and trust with key executives so that no major initiatives are undertaken without testing. Without a central coordination point, testing methods may not be sufficiently rigorous, and test and control groups across multiple experiments may confound one another. That said, it’s not always easy to influence or coordinate testing even when a central group exists.
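The article notes that eBay runs many experiments concurrently and that, without coordination, test and control groups across multiple experiments may confound one another. It does not say how eBay’s platform or any vendor tool assigns users or scores results, so the Python below is only a minimal sketch under stated assumptions: users are split by hashing the user id together with an experiment id (so each concurrent test gets its own pseudo-random split), and a simple yes/no outcome, such as whether a session placed a bid, is compared with a pooled two-proportion z-test. The function names `assign_arm` and `two_proportion_z_test`, the experiment id, and all counts are hypothetical.

```python
import hashlib
import math
from statistics import NormalDist


def assign_arm(user_id: str, experiment_id: str, treatment_share: float = 0.5) -> str:
    """Deterministically assign a user to "treatment" or "control".

    Hypothetical helper. Hashing the experiment id together with the user id
    gives each experiment its own pseudo-random split, so a user's arm in one
    test is effectively unrelated to their arm in another concurrent test
    (assuming SHA-256 output behaves like random bits).
    """
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode("utf-8")).hexdigest()
    # Turn the first 32 bits of the hash into a number in [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "treatment" if bucket < treatment_share else "control"


def two_proportion_z_test(successes_t: int, n_t: int, successes_c: int, n_c: int):
    """Compare success rates in two arms with a pooled two-proportion z-test.

    Returns (difference in rates, z statistic, two-sided p-value).
    """
    p_t = successes_t / n_t
    p_c = successes_c / n_c
    # Under the null hypothesis both arms share one rate; estimate it from all data.
    pooled = (successes_t + successes_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_t - p_c, z, p_value


if __name__ == "__main__":
    # Hypothetical split of 20,000 users for a hypothetical experiment id.
    arms = [assign_arm(f"user-{i}", "item-page-redesign") for i in range(20_000)]
    print("treatment share:", arms.count("treatment") / len(arms))

    # Hypothetical outcome counts: sessions that placed at least one bid.
    diff, z, p = two_proportion_z_test(540, 10_000, 480, 10_000)
    print(f"difference={diff:.4f} z={z:.2f} p={p:.3f}")
```

Hashing rather than storing a random draw per user keeps assignment repeatable across visits without a lookup table; a central testing group would still need to track which experiments touch the same pages, since independent splits alone do not stop two treatments from interacting on one page.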


Creating a Testing Mind-Set
In addition to making the requisite changes in process, technology, and infrastructure, organizations also need to establish a testing culture. Testing costs money (though not as much as widespread rollouts of new tactics that don’t work), and it takes time. Senior managers have to become accustomed to, and even passionate about, the idea that no major change in tactics should be adopted without being tested by people who understand testing.
    Ask for evidence. CEOs who firmly believe in testing can change their entire organization’s perspective on the issue. When people claim that testing has confirmed the wisdom of their idea, have them walk you through the process they used, and demand at least the level of rigor outlined in the exhibit “Put Your Ideas to the Test.”
    Give it teeth. Gary Loveman at Harrah’s Entertainment has said that “not using a control group” is sufficient rationale for termination at the company. Jeff Bezos of Amazon reportedly fired a group of web designers for changing the website without testing. Toronto-Dominion has a culture in which managers insist on tests for every major initiative involving customers or branches. The CEO, Ed Clark, is a PhD economist who once noted that although the bank might not be perfect, “nobody ever criticizes us for not running the numbers.”
    Sponsor tests yourself. The best management teams in this regard have institutionalized the process of doing and reviewing tests. At Famous Footwear, Joe Wood and his senior management team meet with the testing head every two weeks to discuss past tests, upcoming tests, and preliminary and final results. Wood says that the company has made testing a part of management’s dialogue and the organization’s culture.
•••
Testing may not be appropriate for every business initiative, but it works for most tactical endeavors. And it just isn’t that difficult anymore. It needs to come out of the laboratory and into the boardroom. The key challenges are no longer technological or analytical; they have more to do with simply making managers familiar with the concepts and the process. Testing, and learning from testing, should become central to any organization’s decision making. The principles of the scientific method work as well in business as in any other sector of life. It’s time to replace “I’ll bet” with “I know.”

1. “Capital One Financial Corporation,” HBS case no. 9-700-124.

Thomas H. Davenport (tdavenport@babson.edu) is the President’s Distinguished Professor of Information Technology and Management at Babson College in Babson Park, Massachusetts. His newest book is Competing on Analytics: The New Science of Winning, with Jeanne G. Harris (Harvard Business Press, 2007).

Reprint R0902E    To order, see page 111.

Cartoon by P. Vey: “I’m here to restore confidence in the unrealistic expectations we all had.”
