Lamberton The Dryad data repository wiki by alicejenny


									Research epidemiologists’ views

     Dr Poppy Lamberton
      (and Colleagues)
   Imperial College London
                           Advantages
In general, more benefits for data users (modellers etc.) than for data producers:

 Increased availability of data
    Mathematical modellers
    Meta-analyses
    Re-analysis: may locate errors => more accurate analysis

 Increased information, e.g. control lines, missing data
    But is the data set fully understood?

 Speed of obtaining data if no permission required

 Negative results
    Rarely published, could reduce repetition
    Animal ethics, e.g. number of deaths is rarely published

 Data collection is costly in both money and time
              Possible disadvantages
 Misuse of data
   Simply getting it wrong
        Confusion over exact methods, controls, margins of error, missing data etc.
    Ethics: ethical clearance often very project specific.

 Delayed publication until other papers ready to submit.

 Possible reduction in funding for data collection as more data
  already available from previous projects

 Fear that, in the long run, data collectors feel undervalued, leading
  to a reduction in that type of research and therefore a loss of
  empirical data overall

 Reduction in collaborations.
         What’s already out there, e.g.
         schistosomes and other NTDs
 SCORE – Want everything available immediately, even
   before publication
 CONTRAST – Open access with key partners, and with others
   on request
 SCI – Nothing official, but in theory after results are
   published
 NHM – Grant application for a repository making any
   schistosome data (snail, adult worm, larvae; epidemiological
   to molecular) available (associated with SCORE)
 APOC – Little data widely available
                       General concerns
 Will people need any form of permission to use the data?

 Data sets often huge – how feasible is this?

 ‘Timely’ release of data to repository – proposed as up to one year
  post publication
    Influenza epidemic: required continuous data release
    Long term control programmes:
        E.g. SCI, CONTRAST, APOC/OCP etc.
            Historical data (from 3-4 years up to 25-30 years old)
            Additional longitudinal data, new techniques: when is analysis complete?

 Premature publication through fear of being beaten to it

 If you didn’t want to publish your data, would this have a negative
  impact on your chance of research publication?
           Ethics and data ownership
 Ethical clearance often very project-specific, not generic
 ‘Dryad is not designed to host data that should never be publicly
  exposed, such as patient records’
         What constitutes such data?
            Sex and age: too personal for publication in some journals without written
             consent from every individual for that particular investigation.
         ‘Personal data’ often vital in the analysis

 Control programmes
    Who owns the data?
    Views on data release vary from country to country
         Must protect the interest of developing countries
 Multicentre studies
    Happy to release data to some institutes but not others
    Or data often owned by the authors, but permission needed from every
     author to release it
 Will they release at all if it is going to be published with open access?
                  Responsibilities?
 Financial
    Proposed to be covered by the journals, but is this assured or
     will some of the costs be covered by increased publication
     costs?

 Presentation of data – takes time
    Tidying of data set
    Detailed variable explanations
    Missing data

 Ethics
    Whose responsibility is it to police use?
             SCI M&E indicators for longitudinal studies
• Demographic: age, gender, weight, height
• Parasitological exams: egg counts (multiple days)
• Ultrasound exams
• Clinical exams
• Self reported symptoms through personal
  questionnaires
• Blood tests: haematuria, Hb counts
• Miracidia stored from multiple stool/urine samples
  over multiple days 30++
    Number of persons treated (millions) for schistosomiasis
     and STH in SCI-supported countries from 2003 to 2008
    Cumulative treatments delivered = 44.644 million

    Year              Uganda     Burkina Faso **   Niger      Mali     Tanzania   Zambia   Total by year
    2003              0.433      0                 0          0        0.100      0        0.533
    2004              1.230      1.027             0.672      0        0.442      0        3.371
    2005              2.988      2.296             2.010      2.598    2.952      0        12.844
    2006              1.511      2.819             1.560      2.175    0.384      0.556    9.005
    2007              1.812      0.750             2.066      0.647    2.650      0.245    8.170
    2008              1.497 *    2.697             5.284 *    0 *      1.243      0        10.721
    Total by country  9.47       9.59              11.59      5.42     7.77       0.80     44.64

* Treatment incorporated into the new integrated NTD control programme
** Burkina Faso was the first country in the WHO African Region to achieve nationwide coverage with anthelminthic drugs against
three major so-called neglected tropical diseases (NTDs), namely lymphatic filariasis, schistosomiasis and STH.

Fenwick, A., Webster, J.P., Bosque-Oliva, E. et al. (2009) Parasitology
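
Re-analysis of shared data can be as simple as re-adding a published table. The sketch below is only an illustration, not part of the SCI analysis: it transcribes the figures from the table above (in millions of persons treated) and recomputes the totals by year, the totals by country and the cumulative total. The names COUNTRIES, TREATED, totals_by_year and totals_by_country are made up for this example.

```python
# Treatments delivered (millions), transcribed from the table above.
# Column order: Uganda, Burkina Faso, Niger, Mali, Tanzania, Zambia.
COUNTRIES = ["Uganda", "Burkina Faso", "Niger", "Mali", "Tanzania", "Zambia"]
TREATED = {
    2003: [0.433, 0.0, 0.0, 0.0, 0.100, 0.0],
    2004: [1.230, 1.027, 0.672, 0.0, 0.442, 0.0],
    2005: [2.988, 2.296, 2.010, 2.598, 2.952, 0.0],
    2006: [1.511, 2.819, 1.560, 2.175, 0.384, 0.556],
    2007: [1.812, 0.750, 2.066, 0.647, 2.650, 0.245],
    2008: [1.497, 2.697, 5.284, 0.0, 1.243, 0.0],
}


def totals_by_year(treated):
    """Sum each row, rounded to the three decimals used in the table."""
    return {year: round(sum(row), 3) for year, row in treated.items()}


def totals_by_country(treated, countries):
    """Sum each column by transposing the rows with zip."""
    columns = zip(*treated.values())
    return {name: round(sum(col), 3) for name, col in zip(countries, columns)}


by_year = totals_by_year(TREATED)
by_country = totals_by_country(TREATED, COUNTRIES)
cumulative = round(sum(by_year.values()), 3)

for year, total in by_year.items():
    print(year, total)
for name, total in by_country.items():
    print(name, total)
print("Cumulative:", cumulative)
```

The recomputed sums agree with the reported totals by year, with the totals by country once rounded to two decimals, and with the cumulative figure of 44.644 million.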
                       Suggestions
 Modify the repository to be more of an archive of what
 kind of data is available, including geographical
 locations, collection dates, methods and sample sizes
       Example / subsample of raw data
       Authors’ contact details (2 or more)
         But permission for use needed from how many, and from whom?

 Need to counterbalance benefits for data users with
  incentives for data producers, e.g. an increase in h-index?
 Must leave some control with the people who have
  recorded the data, particularly the countries from
  which it is collected
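
One way to picture the archive-style entry suggested above is as a small structured record. The sketch below is a hypothetical illustration, not a Dryad schema or API: the class DatasetRecord, its field names and the problems() check are all assumptions made for this example, and the values in the usage example are placeholders, not real data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry describing a data set without hosting it."""
    title: str
    locations: List[str]        # geographical locations
    collected_from: date        # start of data collection
    collected_to: date          # end of data collection
    methods: List[str]          # collection methods
    sample_size: int
    contacts: List[str]         # authors' contact details (2 or more)
    raw_data_sample: List[Dict[str, str]] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return a list of reasons why the entry is incomplete."""
        issues = []
        if not self.locations:
            issues.append("no geographical location given")
        if self.collected_to < self.collected_from:
            issues.append("collection end date precedes start date")
        if not self.methods:
            issues.append("no collection method given")
        if self.sample_size <= 0:
            issues.append("sample size must be positive")
        if len(self.contacts) < 2:
            issues.append("need at least two author contacts")
        return issues


# Placeholder values only, to show the shape of an entry.
example = DatasetRecord(
    title="Example longitudinal survey",
    locations=["Example district"],
    collected_from=date(2003, 1, 1),
    collected_to=date(2008, 12, 31),
    methods=["egg counts (multiple days)"],
    sample_size=100,
    contacts=["author.one@example.org", "author.two@example.org"],
)
print(example.problems())
```

An entry like this records what exists and who to ask, while the question of whose permission is needed stays with the data producers.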
Thank you

								