Lamberton The Dryad data repository wiki by alicejenny


Research epidemiologists’ views

Dr Poppy Lamberton (and colleagues)
Imperial College London

In general, more benefits for data users, modellers etc. than for data producers:

• Increased availability of data
    • Mathematical modellers
    • Meta-analyses
    • Re-analysis may locate errors => more accurate analysis

• Increased information, e.g. control lines, missing data
    • But is the data set fully understood?

• Speed of obtaining data if no permission required

• Negative results
    • Rarely published; availability could reduce repetition
    • Animal ethics, e.g. number of deaths rarely published

• Data collection is costly in both money and time
              Possible disadvantages
• Misuse of data
    • Simply getting it wrong
        • Confusion over exact methods, controls, margins of error, missing data etc.
    • Ethics: ethical clearance is often very project specific

• Delayed publication until other papers are ready to submit

• Possible reduction in funding for data collection, as more data
  is already available from previous projects

• Fear that in the long run data collectors feel undervalued, leading
  to a reduction in that type of research and therefore a loss of
  empirical data overall

• Reduction in collaborations
         What’s already out there, e.g.
         schistosomes and other NTDs
• SCORE – want everything available immediately, even if
  before publication
• CONTRAST – open access with key partners, and with others
  if requested
• SCI – nothing official, but in theory after results
• NHM – grant application for a repository for any
  schistosome (snail, adult worm, larvae; epidemiological
  to molecular) data to be available (associated with
• APOC – little data widely available
                       General concerns
• Will people need any form of permission to use the data?

• Data sets are often huge – how feasible is this?

• ‘Timely’ release of data to the repository – proposed as up to one year
  post publication
    • Influenza epidemic: required continuous data release
    • Long-term control programmes:
        • E.g. SCI, CONTRAST, APOC/OCP etc.
            • Historical data (from 3-4 years up to 25-30 years old)
            • Additional longitudinal data, new techniques: when is analysis complete?

• Premature publication through fear of being beaten to it

• If you didn’t want to publish your data, would this have a negative
  impact on your chance of research publication?
           Ethics and data ownership
• Ethical clearance is often very project specific, not generic
• ‘Dryad is not designed to host data that should never be publicly
  exposed, such as patient records’
    • What constitutes such data?
        • Sex and age: too personal for publication in some journals without written
          consent from every individual for that particular investigation
    • ‘Personal data’ is often vital in the analysis

• Control programmes
    • Who owns the data?
    • Views on data release vary from country to country
        • Must protect the interests of developing countries
• Multicentre studies
    • Happy to release data to some institutes but not others
    • Or data often owned by authors, but permission needed from every
      author to release
• Will they release at all if going to be published with open access?
• Financial
    • Proposed to be covered by the journals, but is this assured, or
      will some of the costs be covered by increased publication

• Presentation of data – takes time
    • Tidying of data set
    • Detailed variable explanations
    • Missing data

• Ethics
    • Whose responsibility is it to police use?
           SCI M&E indicators for longitudinal
• Demographic: age, gender, weight, height
• Parasitological exams: egg counts (multiple days)
• Ultrasound exams
• Clinical exams
• Self-reported symptoms through personal
• Blood tests: haematuria, Hb counts
• Miracidia stored from multiple stool/urine samples
  over multiple days 30++
Number of persons treated (millions) for schistosomiasis
and STH in SCI-supported countries from 2003-2008.
Cumulative treatments delivered = 44.644 million

Year               Uganda     Burkina Faso   Niger      Mali     Tanzania   Zambia   Total by year
2003               0.433      0              0          0        0.100      0         0.533
2004               1.230      1.027          0.672      0        0.442      0         3.371
2005               2.988      2.296          2.010      2.598    2.952      0        12.844
2006               1.511      2.819          1.560      2.175    0.384      0.556     9.005
2007               1.812      0.750          2.066      0.647    2.650      0.245     8.170
2008               1.497 *    2.697          5.284 *    0 *      1.243      0        10.721
Total by country   9.47       9.59           11.59      5.42     7.77       0.80     44.64

* Treatment incorporated into the new integrated NTD control programme
** Burkina Faso was the first country in the WHO African Region to achieve nationwide coverage with anthelminthic drugs against
three major so-called neglected tropical diseases (NTDs), namely lymphatic filariasis, schistosomiasis and STH.

Fenwick, A., Webster, J.P., Bosque-Oliva, E. et al. (2009) Parasitology
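
The row and column totals in the table above can be cross-checked with a few lines of Python. This is only a sketch: the figures are transcribed from the table, while the names `COUNTRIES`, `TREATMENTS`, `totals_by_year` and `totals_by_country` are illustrative and do not come from the original slides.

```python
# Treatments delivered (millions), transcribed from the table above.
# Column order follows COUNTRIES; all names here are illustrative.
COUNTRIES = ["Uganda", "Burkina Faso", "Niger", "Mali", "Tanzania", "Zambia"]
TREATMENTS = {
    2003: [0.433, 0.0, 0.0, 0.0, 0.100, 0.0],
    2004: [1.230, 1.027, 0.672, 0.0, 0.442, 0.0],
    2005: [2.988, 2.296, 2.010, 2.598, 2.952, 0.0],
    2006: [1.511, 2.819, 1.560, 2.175, 0.384, 0.556],
    2007: [1.812, 0.750, 2.066, 0.647, 2.650, 0.245],
    2008: [1.497, 2.697, 5.284, 0.0, 1.243, 0.0],
}


def totals_by_year(table):
    """Sum each row (one year across all countries)."""
    return {year: round(sum(row), 3) for year, row in table.items()}


def totals_by_country(table, countries):
    """Sum each column (one country across all years)."""
    columns = zip(*table.values())
    return {name: round(sum(col), 3) for name, col in zip(countries, columns)}


year_totals = totals_by_year(TREATMENTS)
country_totals = totals_by_country(TREATMENTS, COUNTRIES)
cumulative = round(sum(year_totals.values()), 3)

for year, total in year_totals.items():
    print(year, f"{total:.3f}")
for name, total in country_totals.items():
    print(name, f"{total:.2f}")
print("Cumulative", f"{cumulative:.3f}")
```

Summing the rows and the columns separately gives two independent routes to the cumulative figure, which is a quick way to catch transcription errors when a table like this is deposited in a repository.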
• Modify the repository to be more of an archive of what
  kind of data is available, including geographical
  locations, collection dates, methods and sample sizes
    • Example / subsample of raw data
    • Authors’ contact details (2 or more)
        • But permission for use needed from how many, and from whom?

• Need to counterbalance benefits for data users with
  incentives for data producers: increase in H index?
• Must leave some control with the people who have
  recorded the data, particularly the countries from
  which it is collected
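
The archive-style entry suggested above (what data exist, where and when they were collected, how, how many samples, and at least two author contacts) could be sketched as a small record type. This is a hypothetical illustration only: the class `DatasetRecord`, its field names and the placeholder values are assumptions and are not part of Dryad or of the original proposal.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry describing a data set without releasing it in full."""

    title: str
    locations: List[str]      # geographical locations
    collected_from: date      # start of collection period
    collected_to: date        # end of collection period
    methods: List[str]
    sample_size: int
    contacts: List[str]       # author contact details (2 or more)
    sample_rows: List[Dict[str, str]] = field(default_factory=list)  # example / subsample of raw data

    def __post_init__(self) -> None:
        # Enforce the constraints stated in the bullets above.
        if len(self.contacts) < 2:
            raise ValueError("at least two author contacts are required")
        if self.collected_to < self.collected_from:
            raise ValueError("collection end date precedes start date")
        if self.sample_size <= 0:
            raise ValueError("sample size must be positive")


# Placeholder values only; they do not describe any real study.
record = DatasetRecord(
    title="Placeholder survey",
    locations=["Placeholder district"],
    collected_from=date(2000, 1, 1),
    collected_to=date(2000, 12, 31),
    methods=["placeholder method"],
    sample_size=100,
    contacts=["author.one@example.org", "author.two@example.org"],
)
print(record.title, len(record.contacts))
```

A record like this leaves the full data set with its collectors while still letting potential users discover it. The open question from the slide, namely whose permission is needed for reuse, is not answered by the structure itself.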
Thank you
