Replication, Replication
Gary King, Harvard University
PS: Political Science & Politics, September 1995, Symposium

Political science is a community enterprise; the community of empirical political scientists needs access to the body of data necessary to replicate existing studies to understand, evaluate, and especially build on this work. Unfortunately, the norms we have in place now do not encourage, or in some cases even permit, this aim. Following are suggestions that would facilitate replication and are easy to implement by teachers, students, dissertation writers, graduate programs, authors, reviewers, funding agencies, and journal and book editors.

Problems in Empirical Political Science

As virtually every good methodology text explains, the only way to understand and evaluate an empirical analysis fully is to know the exact process by which the data were generated and the analysis produced. Without adequate documentation, scholars often have trouble replicating their own results months later. Since sufficient information is usually lacking in political science, trying to replicate the results of others, even with their help, is often impossible.

For quantitative and qualitative analyses alike, we need the answers to questions such as these: How were the respondents selected? Who did the interviewing? What was the question order? How did you decide which informants to interview or villages to visit? How long did you spend in each community? Did you speak to people in their language or through an interpreter? Which version of the ICPSR file did you extract information from? How knowledgeable were the coders? How frequently did the coders agree? Exactly what codes were originally generated and what were all the recodes performed? Precisely which measure of unemployment was used? What were the exact rules used for conducting the content analysis? When did the time series begin and end? What countries were included in your study and how were they chosen? What statistical procedures were used? What method of numerical optimization did you choose? Which computer program was used? How did you fill in or delete missing data?

Producing a comprehensive list of such questions for every author to address, or deciding ex ante which questions will prove consequential, is virtually impossible. For this reason, quantitative analysts in most disciplines have almost uniformly adopted the same method of ascertaining whether enough information exists in a published work. The replication standard holds that sufficient information exists with which to understand, evaluate, and build upon a prior work if a third party could replicate the results without any additional information from the author. The replication standard does not actually require anyone to replicate the results of an article or book. It only requires sufficient information to be provided (in the article or book or in some other publicly accessible form) so that the results could in principle be replicated. Since many believe that research standards should be applied equally to quantitative and qualitative analyses (King, Keohane, and Verba 1994), the replication standard is also appropriate for qualitative research, although the rich complexity of the data often makes it more difficult.²

The process of reducing real-world phenomena to published work involves two phases: the representation of the real world by essentially descriptive quantitative and qualitative data, and the analysis of these data. Both phases are important components of the replication standard. Future scholars, with only your publication and other information you provide, ought to be able to start from the real world and arrive at the same substantive conclusions. In many types of research this is not possible, but it should always be attempted. In principle, the replication standard can sometimes be met even without making public the data used in the analysis, provided that one's description of both phases of the analysis is sufficiently detailed. However, providing this level of detail without the data is difficult if not impossible for the author and much less helpful to future researchers. Moreover, it may not be possible to replicate the data collection phase, inasmuch as the world may have changed by the time a future researcher undertakes the duplication effort.

An excellent example of a recent study of adherence to the replication standard is by Dewald, Thursby, and Anderson (1986). One of the authors was the editor of the Journal of Money, Credit, and Banking. After accepting a year's worth of articles, they received an NSF grant to replicate the results from all the articles accepted. Their work is a revealing (and disconcerting) report of their extensive but largely failed attempts to replicate each of these articles. Their findings (pp. 587-88) "suggest that inadvertent errors in published empirical articles are a commonplace rather than [a] rare occurrence." Even when they found no errors, replication was often impossible even with the help of the original author, and help from the authors often was not provided. More important, when the editors started requiring authors to meet the replication standard, they (p. 589) "found that the very process of authors compiling their programs and data for submission reveals to them ambiguities, errors, and oversights which otherwise would be undetected." Since political scientists collect far more original data, rather than following the economists' practice of relying primarily on existing data from government sources, the benefits of a replication policy in our discipline should be even more substantial than indicated in Dewald, Thursby, and Anderson's conclusions.³

As this rather striking example demonstrates, the widespread failure to adhere to the replication standard poses serious problems for any discipline. At its most fundamental, if the empirical basis for an article or book cannot be reproduced, of what use to the discipline are the conclusions? What purpose does an article like this serve? At a minimum, some protection should be afforded to keep researchers from wasting their time reading these works. At worst, vastly more time can be wasted in ultimately fruitless efforts to expand, extend, and build on a body of work that has no empirical foundation.

More generally, the replication standard enables scholars to better understand and evaluate existing research, and to select more discriminatingly among this body of work in developing their own research agendas. Without complete information about where data come from and how we measured the real world and abstracted from it, we cannot truly understand a set of empirical results.⁴ Evaluation likewise requires at least as much information. Thus, reviewers and journal and book editors should be privy to sufficient information to replicate work submitted to them for publication. Perhaps most importantly, the replication standard is extremely important to the further development of the discipline. The most common and scientifically productive method of building on existing research is to replicate an existing finding: to follow the precise path taken by a previous researcher, and then improve on the data or methodology in one way or another. This procedure ensures that the second researcher will receive all the benefits of the first researcher's hard work. After all, this is why academics refer to articles and books as "scholarly contributions," and such contributions are recognized with citations, acknowledgments, promotions, and raises. Such contributions are considerably more valuable when the cost of building thereon is as small as possible.

Reproducing and then extending high-quality existing research is also an extremely useful pedagogical tool, albeit one that political science students have been able to exploit only infrequently given the discipline's limited adherence to the replication standard. Moreover, apart from these altruistic reasons to support the replication standard, there is an additional, more self-interested motivation: an article that cannot be replicated will generally be read less often, cited less frequently, and researched less thoroughly by other scholars. Few events in academic life are more frustrating than investing enormous amounts of time, effort, and pride in an article or book, only to have it ignored by the profession, not followed up by other researchers, not used to build upon for succeeding research, or not explored in other contexts. Being ignored is very damaging to a career, but being applauded, cited favorably, criticized, or even attacked are all equally strong evidence that you are being taken seriously for your contributions to the scholarly debate (see Feigenbaum and Levy 1993, citing Diamond 1988, and Leimer and Lesnoy 1982). Unfortunately, a recent study indicates that the modal number of citations to articles in political science is zero: 90.1% of our articles are never cited (Hamilton 1991; Pendlebury 1994)! An even smaller fraction of articles stimulates active investigation by other researchers.

This problem greatly limits our collective knowledge of government and politics. Academia is a social enterprise that is usually most successful when individual researchers compete and collaborate in contributing toward common goals. In contrast, when we work in isolation on unrelated problems, ignoring work that has come before, we lose the benefits of evaluating each other's work, analyzing the same problem from different perspectives, improving measurement techniques and methods, and, most important, building on existing work rather than repeatedly reinventing the wheel.

Proposed Solutions

Solutions to many existing problems in empirical political science are best implemented by individual authors. However, experience in many disciplines has shown that some formal rules are also needed. Academics, administrators, reviewers, and editors can play an important part in encouraging or requiring adherence to the replication standard.

Authors

If individual authors wish to increase the probability that their work will be read, understood, and taken seriously in future research, following the replication standard is a very important step. (It is also an effective means of ensuring that researchers will be able to follow up on their own work after the methodological details have faded from memory.)

In practice, following the replication standard might involve putting more information in articles, books, or dissertations about the precise process by which information was extracted or data collected, coded, analyzed, and reported. Unfortunately, journals and books generally will not provide sufficient space to do this properly. Moreover, much of the material necessary is best communicated in electronic form rather than on paper. Fortunately, two of the discipline's best digital archives, described in more detail below, can readily be used to satisfy the replication standard: the collection of the Public Affairs Video Archive at Purdue University and the Inter-University Consortium for Political and Social Research at the University of Michigan.

The first step in implementing the replication standard is to create a replication data set. Replication data sets include all information necessary to replicate empirical results. For quantitative researchers, these might include original data, specialized computer programs, sets of computer program recodes, extracts of existing publicly available data (or very clear directions for how to obtain exactly the same ones you used), and an explanatory note (usually in the form of a "read-me" file) that describes what is included and explains how to reproduce the numerical results in the article. One need not provide all information available, only the subset of variables and observations from a data set actually used to produce the published results.

The replication standard can be applied analogously vis-à-vis most qualitative research. A replication data set for qualitative projects should include detailed descriptions of decision rules followed, interviews conducted, and information collected. Transcripts of interviews, photographs, or audio tapes can readily be digitized and included in a replication data set. Adhering to the replication standard is more difficult in qualitative research and sometimes cannot be completely followed. But because rich, detailed qualitative data is very informative, not adhering to the replication standard when it is not possible is still well worth the cost. It would also be worthwhile for qualitative researchers to begin to discuss collectively the appropriate applications or modifications of the replication standard (see Griffin and Ragin 1994).

Once a replication data set has been created, it should be made publicly available and reference to it made in the original publication (usually in the first footnote). One approach is to make the information available on request, but this can be inconvenient to requestors and authors alike. Moreover, academics are rarely professional archivists. Their comparatively high degree of mobility also complicates self-distribution, in that affiliations indicated on earlier published articles will not remain accurate.

These problems are resolved by using professional data archives. Professional archiving entails routine backups on site, off-site duplicates in a different building or city, continual verification of the digital medium on which the replication data set is stored, a commitment to permanent storage (which involves changing the storage medium as technology changes), frequent advertising so the author's contribution will remain widely known, fast and efficient methods for distributing this information to anyone who asks, and sufficient funding to perform each of these functions for the indefinite future.

As noted, two archives, the "Social Science Research Archive" of the Public Affairs Video Archive (PAVA) at Purdue University and the "Publication-Related Archive" of the Inter-University Consortium for Political and Social Research (ICPSR) at the University of Michigan, will now accept replication data sets. PAVA is the more technically up-to-date archive. Staff will make data available within hours of submission. Replication data sets are instantly available via Internet through such servers as "gopher," "anonymous FTP" (file transfer protocol), and "Mosaic." Anyone, anywhere in the world, with an Internet account has free, unlimited access to these data.

The ICPSR is the older and better known of the two archives. Its staff will also keep and distribute data, and it is presently able to distribute publication-related data via FTP or through the mail to other scholars. The ICPSR also offers other classes in which to deposit data, if the submitter is willing to provide additional documentation; for these, the ICPSR will provide various levels of data checking and additional advertising. Thus, PAVA has some technological advantages over the ICPSR, but the ICPSR is still the better known institution and also offers more options. Moreover, submission of data sets is free and relatively easy in both cases. There is little cost in submitting data to both institutions (as is my current practice).

Replication data sets can be submitted to either archive via disk or tape, mailed to PAVA (Director, Public Affairs Video Archive, Purdue University, 1000 Liberal Arts Building, West Lafayette, IN 47907-1000) and/or the ICPSR (Director, User Support, ICPSR, P.O. Box 1248, Ann Arbor, MI 48106). An easier and quicker approach is to put data in a self-extracting archive file (with a utility such as PKZIP for the DOS operating system, TAR for Unix, or StuffIt for the Macintosh) and submit the data via anonymous FTP; the file name, the article, book, or dissertation citation, and a brief paragraph describing the contents should also be included in an accompanying electronic mail message. To send to PAVA, FTP to pava.purdue.edu in directory pub/incoming and send electronic mail to info@pava.purdue.edu. To submit to the ICPSR, FTP to ftp.icpsr.umich.edu in directory pub/incoming and send electronic mail to jan@tdis.icpsr.umich.edu.

Once a replication data set is submitted and made available by the archive, it will be advertised by PAVA and ICPSR through their regular publications and catalogues and on the Internet. To maximize visibility, citations to the publication and corresponding replication data set will also appear in several newsletters distributed to the membership of the American Political Science Association (as described later).

Tenure and Promotion Review Committees

Tenure and promotion review committees are in the business of judging candidates for promotion on their contributions to the scholarly community. Adherence to the replication standard should be part of this judgment. Those who follow this standard are more likely to enjoy a wider scholarly audience and have their research better understood and extended; thus, they will be taken more seriously by their scholarly peers. In addition, however, candidates for tenure and promotion who submit their data to a national archive should be recognized for this contribution to the discipline. I recommend, for example, that scholars add an extra section to their curriculum vitae for "replication data sets archived." This important contribution would then be more clearly noticed, and should be given substantial weight in committee, departmental, and university decisions about promotion and tenure. Outside letter-writers should also make note of these significant contributions to the scholarly community.

Graduate Programs

The design of graduate programs and specific research-oriented courses can also encourage adherence to the replication standard, which in turn can strengthen students' ability to learn the basics of academic research and ultimately conduct their own original research. The first professional "publication" for most political scientists is the Ph.D. dissertation. This is intended to be an original contribution to knowledge and to the scholarly community. To maximize the impact of thesis work, students are well advised to submit replication data sets for their dissertations. Graduate programs can also adopt rules that require dissertation students to submit replication data sets when appropriate. In doing so, graduate programs will further socialize and professionalize students into the standards of the discipline.

PAVA will accept replication data sets for dissertations and embargo them for a period of your choosing. In the Department of Government at Harvard University, students doing quantitative and, when applicable, qualitative dissertations must submit replication data sets as a requirement of the Ph.D. degree. (It is important that the student create and submit the replication data set when the work is fresh in his or her mind.) We embargo the data for up to five years, as determined by the student and his or her advisor, to give the student a head start at publication. In most cases since our policy has been adopted, students have opted for a short embargo or none at all.

As noted, having students replicate the results of existing articles has proven to be an effective teaching tool. Many economics graduate programs even require Ph.D. students to replicate a published article for their second-year paper. This practice will be useful for students in political science as replication data sets become more widely available.⁵

Editors and Reviewers of Books and Journals

Editors of journals and university and commercial presses work hard to publish scholarship that makes important contributions to the political science discipline and has maximum impact in the profession. For the reasons described above, publications by authors who adhere to the replication standard are more likely to meet these criteria. Thus, editors can maximize the influence of their journal or book series by requiring adherence to a replication standard.

Possibly the simplest approach is to require authors to add a footnote to each publication indicating in which public archive they will deposit the information necessary to replicate their numerical results, and the date when it will be available. For some authors, a statement explaining the inappropriateness of this rule (or of indeterminate periods of embargo of the data or portions of it) could substitute for the requirement. In this case peer reviewers would be asked to assess the statement as part of the general evaluative process and to advise the editor accordingly. I believe we should give authors maximum flexibility in order to respect their right of first publication, the confidentiality of their informants, and other concerns that are discussed below. However, these exceptions do not apply to the vast majority of articles and books in political science.

This policy is very easy to implement, because editors or their staffs would be responsible only for the existence of the footnote, not for confirming that the data set has been submitted or for checking whether the results actually can be replicated. Any verification or confirmation of replication claims can and should be left to future researchers. For the convenience of editors and editorial boards considering adopting a policy like this, the following is a sample text for such a policy:

  Authors of quantitative articles in this journal [or books at this press] must indicate in their first footnote in which public archive they will deposit the information necessary to replicate their numerical results, and the date when it will be submitted. The information deposited should include items such as original data, specialized computer programs, lists of computer program recodes, extracts of existing data files, and an explanatory file that describes what is included and explains how to reproduce the exact numerical results in the published work. Authors may find the "Social Science Research Archive" of the Public Affairs Video Archive (PAVA) at Purdue University or the "Publication-Related Archive" of the Inter-University Consortium for Political and Social Research (ICPSR) at the University of Michigan convenient places to deposit their data. Statements explaining the inappropriateness of sharing data for a specific work (or of indeterminate periods of embargo of the data or portions of it) may fulfill the requirement. Peer reviewers will be asked to assess this statement as part of the general evaluative process, and to advise the editor accordingly. Authors of works relying upon qualitative data are encouraged (but not required) to submit a comparable footnote that would facilitate replication where feasible. As always, authors are advised to remove information from their data sets that must remain confidential, such as the names of survey respondents.

Some journals may wish to adopt stricter requirements. (Although these may be appropriate in some cases, I believe they are not usually necessary or desirable.) For example, some journals now verify that the data were actually deposited in a public archive. For authors who request embargoes, some journals might wish to require submission at the time of publication and have the archive do the embargoing, so that the replication data set will be prepared when it is fresh in the mind of the investigator. The APSA Political Methodology Section has proposed a maximum allowable embargo period of five years. The Committee on National Statistics of the National Academy of Sciences even recommends that data be made publicly available during the review process.

Finally, some journals might wish to experiment with asking an extra reviewer or perhaps a graduate student (acting as an editorial intern to the journal) to replicate analyses for accepted but not yet published articles. Reviewers of the replication data sets could then make suggestions to the authors that could be incorporated before publication to make replication easier or clearer for future scholars. These kinds of experiments would be very useful to journals, authors, readers, and future scholars.

The exact requirement should be left to the needs of individual journals and presses, although in political science the less restrictive version above will be more than adequate in implementing the replication standard. Moreover, it probably fits better with the norms of the discipline.⁶

Important Exceptions

Although the presumption should be that authors will provide free access to replication data, the editor, in combination with the author, will always have final say about applying general policies to particular instances. Exceptions are essential when confidentiality considerations are important, to guarantee authors' rights of first publication, and for a variety of other reasons. Important as these exceptions are, they will probably not apply in the vast majority of scholarly works in the discipline.

Confidentiality

To maintain confidentiality, survey organizations do not normally release the names and addresses of respondents. In these and related cases, authors relying on such information can comply with the replication requirement by releasing a subset of their data, stripped of identifying information. However, in some instances, providing any data would be inappropriate. For example, in a recent article on graduate admissions, my coauthors and I used data from Harvard's admissions process (King, Bruce, and Gilligan 1993). Clearly, the grades and GRE scores of students in the program cannot be released. In fact, we even withheld regression coefficients in the paper, since we felt it would be inappropriate for prospective students to be able to calculate expected grades in our program. Publishing sufficient information to enable students to calculate the expected probability of admission would also have been an unpopular decision at the Harvard general counsel's office! However, cases such as these are the exception rather than the rule.

In some rare situations, confidentiality could not be protected if any data were made publicly available. For example, studies based on elite interviews among a small population, or other surveys based on a very large proportion of the relevant population, potentially pose this problem. In a subset of these cases, the author might be able to make the data available to individual scholars willing to restrict their use in very specific ways. (Analogously, specific data analysis rules have been adopted for U.S. Census data to avoid revealing individual identities. For example, cross-tabulations with fewer than 15 people per cell are not permitted.) These are important exceptions, but they too cover comparatively few published works in political science.

In some situations, data used in a published work cannot be distributed because they are proprietary, such as survey data from the Roper Center. However, most of these organizations allow data to be redistributed by other authors if they are modified in some way, such as by making extracts of the variables used or doing recodes. Wholly proprietary data are rare in political science.

Rights of First Publication

As indicated previously, to guarantee the right of first publication, it is appropriate to submit data to a public archive and request an embargo for some specified period of time. However, embargoes like this should be relatively rare, in that


the replication standard obligates one to provide only the data actually used in a publication. For example, if you conducted your own survey with 300 variables and used only 10 for an article, you need to provide only those 10. If you have a five-category variable and use only two of the categories, you could provide just the recoded variable with only the two categories. If you have 1,500 observations and use only 1,000 of them in the article (perhaps by dropping the Southern states), you also need to submit only the 1,000 cases used in your analysis. Then you can save the remaining information for your future publications. You certainly could provide the rest of the data, which would probably make your work more valuable to the scholarly community, but the decision to do so would remain with you.

In some cases, authors might wish to embargo the subset of data used in an article in order to clean, document, and then publicly archive the larger data set from which it was extracted. This would be more convenient for the investigator, and might also benefit future researchers by encouraging them to wait for the more comprehensive version of the data. (In these cases, investigators should retain an old version of the data, or fully document any changes in the data since the publication of the article.)

Broadly speaking, the basic point of this proposal is to change authors' expectations, from the current situation of rarely taking any measure to ensure that their work can be replicated, to usually taking some steps in this direction. Exceptions are important, but they would not apply to the vast majority of articles published.

Support for These Policies

Replication Policy Adoptions

Formal support for these policies appears to be growing. Beginning last year, the American Journal of Political Science, under the editorship of Kenneth Meier, and Political Analysis (the discipline's methods journal), under the editorship of John Freeman, now require footnotes about replication data sets to be included with all articles. The editors have encountered no resistance from authors, and the policy has required very little time and effort to implement. (Kenneth Meier also reports that 70% of the empirical articles he has accepted use original data collected by the author.) The British Journal of Political Science (David Sanders and Albert Weale, editors) and the Policy Studies Journal (edited by Uday Desai and Mack C. Shelley II) have adopted similar policies, and International Interactions, under the editorship of Harvey Starr, is in the process of doing so.

The new policy of the University of Michigan Press in political science and law is "to expect routinely that all authors have a footnote in their books indicating where their replication data set is archived (and when it will be available, if appropriate)," although it is not an absolute requirement of publication. The Cambridge University Press series "Political Economy of Institutions and Decisions," under the editorship of James E. Alt and Douglass North, has recently adopted a version of this policy. The Free Press and HarperCollins have done the same. Many other editors and editorial boards in political science have indicated support for a replication policy but are still in the process of considering the specific form of the policy they will adopt.

The National Science Foundation (NSF) Political Science Program's policy regarding replication data sets is clearly stated in its award letters: "All data sets produced with the assistance of this award shall be archived at a data library approved by the cognizant program officer, no later than one year after expiration of the grant." To enforce this rule, the NSF Political Science Program recently adopted several new policies. First, all journal and book publications that were prepared with the help of NSF funds must include a statement indicating in which public archive they will deposit the information necessary to replicate their numerical results and the date that it will be submitted (or an explanation if it cannot be). Second, when a grant is awarded, the political science program officer will ask the prospective investigator to verify that he or she has allocated sufficient funds to fulfill the program's data-archiving requirement. Third, within a year of a grant's expiration date, principal investigators must inform the political science program officer where their data have been deposited. Finally, NSF program officials will consider conformity with their data-archiving policy as an important additional criterion in judging applicants for renewals and new awards.

Anyone receiving funds from the National Institute of Justice (NIJ) must now "deliver to NIJ, upon project completion, computer-readable copies and adequate documentation of all data bases and programs developed or acquired in connection with the research" (Sieber 1991, 9). The Committee on National Statistics of the National Academy of Sciences has also recommended policies similar to those suggested here (see Fienberg et al. 1985). Although there are many national differences in norms of data sharing, related policies and recommendations have been adopted or addressed by national governments, international organizations, academic and professional societies, granting agencies, other disciplines, and scholarly journals (see Boruch and Cordray 1985).

To help provide additional visibility for authors of replication data sets, The Political Methodologist, the newsletter of the APSA Political Methodology Section (edited by R. Michael Alvarez and Nathaniel Beck), has announced that scholars' data-set references, a citation to the associated article, and a brief abstract will all be highlighted in subsequent issues. Similar citations will appear in other newsletters if the data are relevant to their substantive focuses; these newsletters include Policy Currents (edited by Laura Brown), Law and Courts (edited by Lee Epstein), Urban Politics Newsletter (Arnold Vedlitz, editor), the Computer and Multimedia Section Newsletter (edited by Bob Brookshire), The Caucus for a


New Political Science Newsletter (John C. Berg, editor), Clio: The Newsletter of Politics and History (Dave Robertson, editor), and VOX POP, the newsletter of the APSA section on political parties and interest groups (edited by John Green).

Replication Policy Discussions

Replication policies have been widely discussed in the political science community in recent years. Among political methodologists, support is enthusiastic and appears to be unanimous. Lengthy formal and informal discussions were held at the three most recent annual Political Methodology Group summer meetings (University of Wisconsin, Madison, 1994; University of Florida, 1993; and Harvard University, 1992). Well-attended panels at the last two meetings of the American Political Science Association have also been devoted in part (in 1993) or in full (in 1994) to this issue. The APSA Political Methodology Section unanimously passed a resolution in 1994 asking all journal editors in the discipline to require footnotes indicating where replication data sets are stored.

The APSA Comparative Politics Section held a discussion of this issue at the 1994 annual convention. After considerable debate at the convention focusing on the special concerns comparativists have about confidentiality and distributing "de-contextualized" data, the Section's executive committee endorsed the idea in general terms. The committee subsequently wrote a proposed policy statement that reflects the special concerns of comparativists while still requiring the replication footnote. This proposal is now being distributed through the Section newsletter for general comment from the membership.

Wide-ranging discussions have also been held in meetings of many of our discipline's editorial boards and in section meetings of the American Political Science Association, and, judging from the response in these forums, support for the replication standard is strong throughout the discipline. Many insightful questions and issues have been raised about the specific methods for implementing replication policies. The proposals discussed in this paper have been greatly improved as a result.

Questions and Answers

In the course of numerous conversations about these issues, several questions have been raised and discussed. I list some of these here, along with the most common resolutions.

Will a replication standard reduce incentives for individual investigators to collect large and difficult data sets?

Investigators receive recognition for collecting data and making them available to the scholarly community. This recognition is in the form of citations to the data and to the author's articles, acknowledgments for the author's help, and promotion, tenure, and raises. Scholarly work is judged by its contribution, so making an article more important by contributing a replication data set can only enhance the recognition that the author receives. The risk is not having one's ideas stolen, a familiar but largely unfounded fear most of us have experienced while writing dissertations; the much larger risk (indeed, a risk with a high probability) is having one's publications ignored. Submitting a replication data set can significantly decrease that probability.

Moreover, as discussed above, information gathered but ultimately not used in the article need not be included in the replication data set; only those variables and observations necessary to replicate published results need be submitted. If there happens to be new information in the variables and observations submitted, the author will have a substantial head start in extracting such information. In most cases, nearly two years elapse from completion of the article to final publication. If this is not sufficient lead time in a given instance, the author can still submit the data set and choose an embargo for a specified period, or even commit to submitting it at a specified future date.

Implementing the replication standard will make much more data available through public archives. Won't an unintended consequence of this proposal be that future scholars will spend most of their time analyzing existing data rather than collecting new data, spending time in the computer lab rather than in the field?

Experience suggests just the opposite. When the ICPSR was founded, and later expanded, the amount of publicly available data increased dramatically. However, content analyses indicate that many more articles containing original data were published during this time (King 1991). Hence, it appears that increasing the availability of original data inspires other scholars to collect original data themselves. In fact, one learns so much by replicating existing research that collecting new data, by following the successful procedures developed in past research, should be made much easier.

Wouldn't it be better if all journals and book series adopted exactly the same policy at the same time?

There might be some advantages to coordination, but the reason we have different journals in the first instance argues against waiting. Each journal has a different constituency, follows different style manuals, has different quality standards, different editorial boards, different editors, different reviewers, different methodological styles, different copyeditors, and encourages a different mix of substantive articles. It should not be surprising or troubling if different journals adopted slightly different policies regarding replication, or adopted them in accordance with different timetables.

If our journal requires adherence to the replication standard, won't authors send work elsewhere or not publish articles and save their work until a book manuscript is ready?

This may be true for some authors, but it has not been the experience at the journals that have already adopted this policy. Moreover, many book presses are adopting the same policy, and no one can recall a scholar turning down NSF


funds to avoid this rule. Once the replication policy is adequately communicated and explained to authors, they will likely understand that it is in their interest. It is also clearly in the interest of journals to have their articles cited, and thus to follow the replication standard. Moreover, failing to follow this standard would be far more unfair to potential readers, and more damaging to the profession.

If I give you my data, isn't there a chance that you will find out that I'm wrong and tell everyone?

Yes. The way science moves forward is by making ourselves vulnerable to being wrong. Ultimately, we are all pursuing a common goal: a deeper understanding of government and politics. Thus, we must give others the opportunity to prove us wrong. Although being criticized is not always pleasant, it is unambiguous evidence of being taken seriously and making a difference. Again, being ignored (the fate of over 90% of all political science publications) is the much more serious risk.

Shouldn't editors collect replication data sets to guarantee that they have been submitted?

This is a possibility, but editors are no more professional archivists than are authors. Editors might as well avail themselves of PAVA or the ICPSR. I also do not think verification is necessary, since any discrepancies in the public record will be corrected by future researchers.

If everyone starts submitting replication data sets, won't archives rapidly begin to be filled with junk?

This is extremely unlikely. Unlike most data sets submitted to public archives, replication data sets will have been filtered by the peer review process, and will likely be verified by future researchers. Thus, public archives can harness the regular scientific process to build value into their collections.

Moreover, the average size of a replication data set in political science is under a megabyte. Even if as many as 100 replication data sets were submitted to an archive each year, approximately $600 for a gigabyte of hard disk space and a connection to the Internet would easily accommodate all submissions for well over a decade.

If submitting replication data sets is in the interest of individual investigators, why do we need journals and book presses to require their submission?

We shouldn't need laws when custom will do, but experience in our discipline and most others indicates that this collective goods problem cannot be solved in practice without some policy change. See Dewald, Thursby, and Anderson 1986; Boruch and Cordray 1985, 209-10; and Fienberg et al. 1985 for more detailed justifications.

Why are we worrying ourselves with what might be called "duplication" of existing research? Isn't the more important question actual replication, where the same measurements are applied to new substantive areas, countries, or time periods?

Good science requires that we be able to reproduce existing numerical results, and that other scholars be able to show how substantive findings change as we apply the same methods in new contexts. The latter is more interesting, but it does not reduce the necessity of the former. In fact, we can encourage scholars to pursue replication in new contexts if they can be more certain of present results. Better knowledge of existing results, through the distribution of replication data sets, will also enable easier adaptation of existing research methods and procedures to new contexts. Moreover, a standard practice in estimating causal effects is to make one change at a time so we are able to relate individual changes to specific effects and judge each effect in isolation.

Notes

1. This paper has benefited immeasurably from innumerable conversations I have had with many groups and individuals. For many helpful comments on previous versions of this paper, I am especially grateful to Jim Alt, Neal Beck, Robert X. Browning, John DiIulio, John Green, Matthew Holden, Gary Klass, David Laitin, Malcolm Litchfield, Ken Meier, Jonathan Nagler, Bob Putnam, Richard Rockwell, Phil Schrodt, and Sid Verba. I also thank the National Science Foundation for grants SBR-9321212 and SBR-9223637, and the John Simon Guggenheim Memorial Foundation for a fellowship.

2. In some cases, the replication standard refers to running the same analyses on the same data to get the same result, which should probably be called "duplication" or perhaps "confirmation." For other articles, the replication standard actually involves what is more popularly called "replication": going back to the world from which the data came and administering the same measurements, such as survey instruments. Since this involves different numerical results, due to a change in time, place, or subjects, we would not expect to duplicate the published results exactly; however, this procedure confers the scientific benefit of verifying whether the substantive conclusions are systematic features of the world or idiosyncratic characteristics of the last author's measurement. In this article, I follow the common current practice in the social sciences of referring to all of these procedures as "replication."

3. For other work on replication, from the perspectives of other social sciences, ethical considerations, the advantages to science, incentives of investigators, and other concerns, see Sieber 1991, Ceci and Walker 1983, Neuliep 1991, Fienberg et al. 1985, and Feigenbaum and Levy 1993.

4. It is worth mentioning that I doubt fraud is much of a problem in political science research. It probably exists to some degree, as it does in every other discipline and area of human endeavor, but I see no evidence that it is anything but extremely rare.

5. At present, before the replication standard has been widely adopted in the discipline, replicating published articles is frequently difficult or impossible. However, other procedures can be used in the interim by teachers of quantitative methods classes. For example, I try to have students submit a draft of their term papers about halfway through the semester (usually with data analyses but few written pages), along with a disk containing a replication data set. These are then given randomly to other students in the class. The next week's assignment is to replicate their classmate's project. In most cases, the replicator and the original author learn a lot about the data, methods, and process of research.

6. Another occasion when the replication standard can be implemented is during the peer review process. Reviewers of journal and book manuscripts also should verify the existence of a footnote indicating in which archive a replication data set has been deposited. Since the footnote affects the magnitude of the contribution of the scholarly work, commenting on it is probably the reviewer's responsibility. Moreover, suggesting in reviews that authors include this footnote and deposit their data in a public archive will help remind authors, and perhaps editors, of this useful method of scholarly contribution. Journals also could include requests to evaluate the footnote on replication when they send out their requests for review.
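The subsetting rule described above under Rights of First Publication (submit only the variables and cases actually analyzed, and only the recoded version of a variable if that is all the analysis used) can be made concrete with a short sketch. This is a minimal Python illustration under stated assumptions, not a procedure prescribed by the author or by any archive: the record layout, the variable names (`region`, `vote`, `turnout`, `education`, `income`), and the mapping of a five-category education variable onto two categories are all hypothetical.

```python
from typing import Dict, Iterable, List

# A survey record: variable name -> value (hypothetical layout).
Record = Dict[str, object]

# Hypothetical: the only variables the published analysis used.
USED_VARIABLES = ("vote", "turnout", "education")

# Hypothetical recode of a five-category education variable into the
# two categories the analysis actually used.
EDUCATION_RECODE = {
    1: "no college",
    2: "no college",
    3: "no college",
    4: "college",
    5: "college",
}


def make_replication_subset(records: Iterable[Record]) -> List[Record]:
    """Keep only the cases and variables needed to reproduce published results."""
    subset: List[Record] = []
    for row in records:
        # Keep only the observations used in the article; here,
        # hypothetically, the Southern states were dropped.
        if row["region"] == "South":
            continue
        out = {name: row[name] for name in USED_VARIABLES}
        # Release only the recoded, two-category version of the variable.
        out["education"] = EDUCATION_RECODE[row["education"]]
        subset.append(out)
    return subset


if __name__ == "__main__":
    survey = [
        {"region": "South", "vote": 1, "turnout": 0.61, "education": 2, "income": 3},
        {"region": "West", "vote": 0, "turnout": 0.55, "education": 5, "income": 4},
    ]
    # Only the non-Southern case survives, without "region" or "income".
    print(make_replication_subset(survey))
```

Everything not returned by the function (unused variables, dropped cases, the original five-category codes) stays with the investigator for future work, which is the point the text makes about saving the remaining information.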
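The storage estimate in the answer about archives filling with junk can be checked with back-of-the-envelope arithmetic. The sketch below treats the stated figures (100 replication data sets a year, each under a megabyte) as upper bounds and assumes, for round numbers, 1,000 megabytes to the gigabyte:

```latex
\[
100~\frac{\text{data sets}}{\text{year}} \times 1~\frac{\text{MB}}{\text{data set}}
  = 100~\frac{\text{MB}}{\text{year}},
\qquad
\frac{1{,}000~\text{MB}}{100~\text{MB/year}} = 10~\text{years}.
\]
```

Because each data set is smaller than a megabyte, yearly growth stays below 100 MB, so a single gigabyte lasts longer than the ten-year bound computed here.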
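The classroom exercise in note 5 requires giving each student a classmate's replication data set at random, and never the student's own. One simple way to do this, sketched here in Python as an illustration rather than the author's actual procedure, is to shuffle the class list and have each student replicate the next student's project in the shuffled order, wrapping around at the end; with two or more students this guarantees that nobody receives their own paper. The function name, the `seed` parameter, and the student names are hypothetical, and names are assumed to be unique.

```python
import random
from typing import Dict, List, Optional


def assign_replications(students: List[str], seed: Optional[int] = None) -> Dict[str, str]:
    """Map each student (the replicator) to the classmate whose project they replicate.

    Assumes student names are unique.
    """
    if len(students) < 2:
        raise ValueError("need at least two students")
    order = list(students)
    random.Random(seed).shuffle(order)
    # Each student replicates the next student's project in the shuffled
    # order, wrapping around, so the assignments form one cycle and no
    # student is ever assigned their own project.
    return {order[i]: order[(i + 1) % len(order)] for i in range(len(order))}


if __name__ == "__main__":
    class_list = ["Ana", "Ben", "Chen", "Dee"]
    for replicator, author in assign_replications(class_list, seed=1).items():
        print(f"{replicator} replicates {author}'s project")
```

This draws only single-cycle assignments rather than sampling uniformly from every assignment in which no one gets their own paper; if that matters, reshuffling until no student draws their own project is a simple alternative.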


References

Boruch, Robert F., and David S. Cordray. 1985. "Professional Codes and Guidelines in Data Sharing." In Sharing Research Data, Stephen E. Fienberg, Margaret E. Martin, and Miron L. Straf, eds., 199-223. Washington, DC: National Academy Press.
Ceci, Stephen, and Elaine Walker. 1983. "Private Archives and Public Needs." American Psychologist 38 (April):414-23.
Dewald, William G., Jerry G. Thursby, and Richard G. Anderson. 1986. "Replication in Empirical Economics: The Journal of Money, Credit, and Banking Project." American Economic Review 76, 4 (September):587-603.
Diamond, A. M., Jr. 1988. "The Polywater Episode and the Appraisal of Theories." In Scrutinizing Science, A. Donovan et al., eds. New York: Kluwer Academic Publishers.
Feigenbaum, Susan, and David M. Levy. 1993. "The Market for (Ir)reproducible Econometrics (with comments and response)." Social Epistemology 7, 3 (July-September):215-92.
Fienberg, Stephen E., Margaret E. Martin, and Miron L. Straf, eds. 1985. Sharing Research Data. Washington, DC: National Academy Press.
Griffin, Larry, and Charles C. Ragin. 1994. "Formal Methods of Qualitative Analysis." A special issue of Sociological Methods and Research 23, 1.
Hamilton, David P. 1991. "Research Papers: Who's Uncited Now?" Science, 4 January, p. 25.
King, Gary. 1991. "On Political Methodology." Political Analysis 2:1-30.
King, Gary, John M. Bruce, and Michael Gilligan. 1993. "The Science of Political Science Graduate Admissions." PS: Political Science & Politics 26, 4 (December):772-78.
King, Gary, Robert O. Keohane, and Sidney Verba. 1994. Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton: Princeton University Press.
Leimer, D. R., and S. D. Lesnoy. 1982. "Social Security and Private Saving." Journal of Political Economy 90:606-29.
Neuliep, James W., ed. 1991. Replication Research in the Social Sciences. Newbury Park, CA: Sage Publications.
Pendlebury, David. 1994. (Institute for Scientific Information.) Telephone conversation. 8 September.
Sieber, Joan E., ed. 1991. Sharing Social Science Data: Advantages and Challenges. Newbury Park, CA: Sage Publications.

About the Author

Gary King is professor of government in the department of government at Harvard University. He is currently finishing a book entitled A Solution to the Ecological Inference Problem and, with Brad Palmquist, is completing (and will deposit with the ICPSR!) a large collection of merged precinct-level election and census data. King can be reached at Littauer Center, North Yard, Cambridge, Massachusetts 02138; e-mail: gking@harvard.edu; phone (617) 495-2027. The current version of this manuscript is available via gopher or anonymous FTP from haavelmo.harvard.edu.

				