INTRODUCTION TO BIOSTATISTICS

 Dr. S. Shaffi Ahamed
 Asst. Professor
 Dept. of Family and Comm. Medicine
 KKUH
This session covers:
  Background and the need to know Biostatistics
  Origin and development of Biostatistics
  Definition of Statistics and Biostatistics
  Types of data
  Graphical representation of data
  Frequency distribution of data
 “Statistics is the science which deals
 with collection, classification and
 tabulation of numerical facts as the
 basis for explanation, description
 and comparison of phenomena”.

            ------ Lovitt
“BIOSTATISTICS”
 (1) Statistics arising out of biological
  sciences, particularly from the fields of
  Medicine and public health.
 (2) The methods used in dealing with
  statistics in the fields of medicine, biology
  and public health for planning,
  conducting and analyzing data which
  arise in investigations of these branches.
Origin and development of
statistics in Medical Research
 In 1929, a large paper on the application of
  statistics was published in Physiology
  Journal by Dunn.
 In 1937, 15 articles on statistical methods
  by Austin Bradford Hill were published in
  book form.
 In 1948, an RCT of streptomycin for
  pulmonary tuberculosis was published, on which
  Bradford Hill had a key influence.
 The use of statistics in medicine then
  grew 8-fold between 1952 and 1982.
[Pictures: C.R. Rao, Douglas Altman, Ronald Fisher, Karl Pearson; Gauss - Basis]
Sources of Medical
Uncertainties
 1. Intrinsic due to biological,
    environmental and sampling factors
 2. Natural variation among methods,
    observers, instruments etc.
 3. Errors in measurement or assessment
    or errors in knowledge
 4. Incomplete knowledge
Intrinsic variation as a
source of medical
uncertainties
  Biological: due to age, gender, heredity, parity, height,
   weight, etc., and to variation in anatomical,
   physiological and biochemical parameters
  Environmental: due to nutrition, smoking, pollution,
   water and sanitation facilities, road traffic, legislation,
   stress and strain, etc.
  Sampling fluctuations: the entire world cannot
   be studied, and future cases can never be
   included
  Chance variation: due to factors that are unknown or too
   complex to comprehend
Natural variation despite
best care as a source of
uncertainties
  In assessment of any medical parameter
  Due to partial compliance by the patients
  Due to incomplete information in
   conditions such as the patient in coma
Medical Errors that cause
Uncertainties
  Carelessness of the providers such as physicians,
   surgeons, nursing staff, radiographers and
   pharmacists.
  Errors in methods such as in using incorrect quantity or
   quality of chemicals and reagents, misinterpretation of
   ECG, using inappropriate diagnostic tools,
   misrecording of information etc.
  Instrument error due to use of non-standardized or
   faulty instrument and improper use of a right
   instrument.
  Not collecting full information
  Inconsistent response by the patients or other subjects
   under evaluation
Incomplete knowledge as a
source of Uncertainties
  Diagnostic, therapeutic and prognostic
   uncertainties due to lack of knowledge
  Predictive uncertainties such as in
   survival duration of a patient of cancer
  Other uncertainties such as how to
   measure positive health
Biostatistics is the
science that helps in
managing medical
uncertainties
Reasons to know about
biostatistics:
  Medicine is becoming increasingly
   quantitative.
  The planning, conduct and interpretation
   of much of medical research are
   becoming increasingly reliant on the
   statistical methodology.
  Statistics pervades the medical literature.
CLINICAL MEDICINE

  Documentation of medical history of
   diseases.
  Planning and conduct of clinical studies.
  Evaluating the merits of different
   procedures.
  In providing methods for definition of
   “normal” and “abnormal”.
Role of Biostatistics in
patient care
 In increasing awareness of diagnostic,
  therapeutic and prognostic uncertainties and
  providing rules of probability to delineate those
  uncertainties
 In providing methods to integrate chances with value
  judgments in the way most beneficial to the patient
 In providing measures such as sensitivity, specificity
  and predictive values that help choose valid tests for
  patient assessment
 In providing tools such as scoring systems and expert
  systems that can help reduce epistemic uncertainties
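As a rough illustration of the test-validity measures mentioned above, the sketch below computes sensitivity, specificity and the two predictive values from a 2x2 table of test result against true disease status. The counts are invented purely for illustration, and the function name diagnostic_metrics is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 table counts.

    tp: diseased, test positive        fp: not diseased, test positive
    fn: diseased, test negative        tn: not diseased, test negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # diseased subjects correctly detected
        "specificity": tn / (tn + fp),  # non-diseased subjects correctly cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts, not taken from any study
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=170)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that the predictive values depend on how common the disease is in the group tested, so they would change if the same test were applied in a population with a different prevalence.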
PREVENTIVE MEDICINE

 To provide the magnitude of any health
  problem in the community.
 To find out the basic factors underlying
  the ill-health.
 To evaluate the health programs which
  were introduced in the community
  (success/failure).
 To introduce and promote health
  legislation.
Role of Biostatistics in Health
Planning and Evaluation
  In carrying out a valid and reliable health
   situation analysis, including in proper
   summarization and interpretation of data.

  In proper evaluation of the achievements
   and failures of a health programme
Role of Biostatistics in
Medical Research
  In developing a research design that can
   minimize the impact of uncertainties
  In assessing reliability and validity of
   tools and instruments to collect the
   information
  In proper analysis of data
Example: Evaluation of Penicillin (treatment
A) vs Penicillin & Chloramphenicol
(treatment B) for treating bacterial
pneumonia in children < 2 yrs.
 What sample size is needed to demonstrate a significant
  difference between the two groups?
 Is treatment A better than treatment B, or vice versa?
 If so, how much better?
 What is the normal variation in the clinical measurement (mild,
  moderate & severe)?
 How reliable and valid is the measurement (clinical &
  radiological)?
 What is the magnitude and effect of laboratory and technical
  error?
 How does one interpret abnormal values?
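The sample-size question can be sketched with the usual normal-approximation formula for comparing two proportions, n per group = (z_alpha + z_beta)^2 [p1(1 - p1) + p2(1 - p2)] / (p1 - p2)^2. This formula is a common textbook choice, not one stated in these slides, and the cure rates below are hypothetical; the function name sample_size_two_proportions is likewise an assumption.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to compare two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical cure rates: 80% on treatment A, 90% on treatment B
n_per_group = sample_size_two_proportions(0.80, 0.90)
print(n_per_group)
```

A smaller expected difference between the treatments, or a higher required power, increases the number of children needed in each group.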
WHAT DOES STATISTICS
COVER?
        Planning
        Design
        Execution (Data collection)
        Data Processing
        Data analysis
        Presentation
        Interpretation
        Publication
 BASIC CONCEPTS
  Data : Set of values of one or more variables recorded
  on one or more observational units

  Sources of data    1. Routinely kept records
                     2. Surveys (census)
                     3. Experiments
                     4. External source
Categories of data
 1. Primary data: observation, questionnaire, record form,
    interviews, survey
 2. Secondary data: census, medical record, registry
TYPES OF DATA

 QUALITATIVE DATA
 DISCRETE QUANTITATIVE
 CONTINUOUS QUANTITATIVE
QUALITATIVE

Nominal
 Example: Sex ( M, F)
         Exam result (P, F)
         Blood Group (A,B, O or AB)
         Color of Eyes (blue, green,
                       brown, black)
ORDINAL
  Example:
       Response to treatment
        (poor, fair, good)
       Severity of disease
        (mild, moderate, severe)
       Income status (low, middle,
         high)
QUANTITATIVE (DISCRETE)

 Example: The no. of family members
         The no. of heart beats
         The no. of admissions in a day

QUANTITATIVE (CONTINUOUS)

 Example: Height, Weight, Age, BP,
         Serum Cholesterol and BMI
Discrete data -- Gaps between possible values
[Figure: Number of Children]

Continuous data -- Theoretically, no gaps between possible values
[Figure: Hb]

CONTINUOUS DATA grouped into QUALITATIVE DATA
 wt. (in Kg.): under wt, normal & over wt.
 Ht. (in cm.): short, medium & tall
Table 1 Distribution of blunt injured patients
according to hospital length of stay
 Hospital length of stay   Number      Percent
     1 – 3 days              5891        43.3
     4 – 7 days              3489        25.6
     2 weeks                 2449        18.0
     3 weeks                  813         6.0
     1 month                  417         3.1
     More than 1 month        545         4.0
     Total                  13604       100.0
 Mean = 7.85 SE = 0.10
Scale of measurement
 Qualitative variable:
 A categorical variable

 Nominal (classificatory) scale
   - gender, marital status, race

 Ordinal (ranking) scale
    - severity scale, good/better/best
Scale of measurement
Quantitative variable:
A numerical variable: discrete; continuous

Interval scale:
Data are placed in meaningful, ordered intervals. The units of
measurement are arbitrary.

- Temperature: equal differences are equal (37º C - 36º C and
  38º C - 37º C are the same), but there is no implication of
  ratio (30º C is not twice as hot as 15º C)
Ratio scale:
Data are presented in a frequency distribution in
 logical order. A meaningful ratio exists.

 - Age, weight, height, pulse rate
 - A pulse rate of 120 is twice as fast as 60
 - A person weighing 80 kg is twice as heavy
   as one weighing 40 kg.
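The difference between the two scales can be checked numerically. The sketch below is only an illustration: it uses the kelvin conversion (an addition not in the slides) to show that a Celsius ratio has no physical meaning, while a weight ratio stays the same in any unit.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to kelvin (a scale with a true zero)."""
    return celsius + 273.15


# Differences are meaningful on an interval scale: both gaps are 1 degree
print(37 - 36, 38 - 37)

# Ratios are not: 30/15 is 2 in Celsius, but the absolute temperatures differ by about 5%
naive_ratio = 30 / 15
kelvin_ratio = celsius_to_kelvin(30) / celsius_to_kelvin(15)
print(naive_ratio, round(kelvin_ratio, 3))

# On a ratio scale such as weight, the ratio does not depend on the unit
kg_ratio = 80 / 40
lb_ratio = (80 * 2.20462) / (40 * 2.20462)  # approximate kg-to-lb factor
print(kg_ratio, round(lb_ratio, 6))
```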
Scales of Measure

    Nominal – qualitative classification of
     equal value: gender, race, color, city
    Ordinal - qualitative classification
     which can be rank ordered:
     socioeconomic status of families
    Interval - Numerical or quantitative
     data: can be rank ordered and sizes
     compared : temperature
    Ratio - Quantitative interval data along
     with ratio: time, age.
CLINIMETRICS
 Clinimetrics is a science in which
   qualities are converted to meaningful
   quantities by using a scoring system.

 Examples: (1) Apgar score, based on
   appearance, pulse, grimace, activity and
   respiration, is used for neonatal prognosis.
 (2) Smoking Index: no. of cigarettes, duration,
   filter or not, whether pipe, cigar, etc.
 (3) APACHE (Acute Physiology and Chronic
   Health Evaluation) score: quantifies the
   severity of a patient's condition
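A minimal sketch of how a scoring system turns qualities into a quantity, using the Apgar score: each of the five components is rated 0, 1 or 2 and the ratings are added to give a total from 0 to 10. The newborn's ratings below are hypothetical, and the function name apgar_total is an assumption made for this example.

```python
APGAR_COMPONENTS = ("appearance", "pulse", "grimace", "activity", "respiration")


def apgar_total(scores):
    """Sum the five Apgar components, each of which is scored 0, 1 or 2."""
    for name in APGAR_COMPONENTS:
        if scores[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be scored 0, 1 or 2")
    return sum(scores[name] for name in APGAR_COMPONENTS)


# Hypothetical newborn, for illustration only
newborn = {"appearance": 1, "pulse": 2, "grimace": 2, "activity": 1, "respiration": 2}
total = apgar_total(newborn)
print(total)
```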
INVESTIGATION
  Data Collection
    Data Presentation
      Tabulation
      Diagrams
      Graphs
    Descriptive Statistics
      Measures of Location
      Measures of Dispersion
      Measures of Skewness & Kurtosis
    Inferential Statistics
      Estimation
        Point estimate
        Interval estimate
      Hypothesis Testing
        Univariate analysis
        Multivariate analysis
Frequency Distributions

   data distribution – pattern of
    variability.
     the center of a distribution
     the ranges
     the shapes
   simple frequency distributions
   grouped frequency distributions
     midpoint
Tabulate the hemoglobin values of 30 adult
male patients listed below

      Patient  Hb       Patient  Hb       Patient  Hb
      No       (g/dl)   No       (g/dl)   No       (g/dl)
      1        12.0     11       11.2     21       14.9
      2        11.9     12       13.6     22       12.2
      3        11.5     13       10.8     23       12.2
      4        14.2     14       12.3     24       11.4
      5        12.3     15       12.3     25       10.7
      6        13.0     16       15.7     26       12.5
      7        10.5     17       12.6     27       11.8
      8        12.8     18       9.1      28       15.1
      9        13.2     19       12.9     29       13.4
      10       11.2     20       14.6     30       13.1
Steps for making a table
Step1   Find Minimum (9.1) & Maximum (15.7)

Step2   Calculate difference 15.7 – 9.1 = 6.6

Step3   Decide the number and width of
        the classes (7 class intervals): 9.0 – 9.9, 10.0 – 10.9, ...

Step4   Prepare dummy table –
        Hb (g/dl), Tally marks, No. patients
DUMMY TABLE
Hb (g/dl)      Tally marks   No. patients
 9.0 – 9.9
10.0 – 10.9
11.0 – 11.9
12.0 – 12.9
13.0 – 13.9
14.0 – 14.9
15.0 – 15.9
Total

Tally Marks TABLE
Hb (g/dl)      Tally marks   No. patients
 9.0 – 9.9     l              1
10.0 – 10.9    lll            3
11.0 – 11.9    llll l         6
12.0 – 12.9    llll llll     10
13.0 – 13.9    llll           5
14.0 – 14.9    lll            3
15.0 – 15.9    ll             2
Total          -             30
Table Frequency distribution of 30 adult male
patients by Hb
             Hb (g/dl)      No. of patients
             9.0 – 9.9         1
            10.0 – 10.9        3
            11.0 – 11.9        6
            12.0 – 12.9       10
            13.0 – 13.9        5
            14.0 – 14.9        3
            15.0 – 15.9        2
               Total          30
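The four steps for making a table can be sketched in a few lines. This is only an illustration of the tallying procedure applied to the 30 Hb values listed earlier; taking the whole-number part of each value as the lower class limit is a shortcut that works here because the classes are 1 g/dl wide.

```python
from collections import Counter
from math import floor

# Hb values (g/dl) of the 30 adult male patients, in patient order
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]

# Steps 1 and 2: minimum, maximum and their difference
lowest, highest = min(hb), max(hb)
print(lowest, highest, round(highest - lowest, 1))

# Step 3: classes of width 1 g/dl, keyed by lower limit (9 means 9.0 - 9.9)
counts = Counter(floor(value) for value in hb)

# Step 4: print the frequency table
for lower in range(floor(lowest), floor(highest) + 1):
    print(f"{lower}.0 - {lower}.9   {counts[lower]:>2}")
print(f"Total         {sum(counts.values()):>2}")
```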
Table Frequency distribution of adult patients by
Hb and gender:
         Hb (g/dl)         Gender           Total
                       Male     Female
           <9.0          0         2           2
         9.0 – 9.9       1         3           4
        10.0 – 10.9      3         5           8
        11.0 – 11.9      6         8          14
        12.0 – 12.9     10         6          16
        13.0 – 13.9      5         4           9
        14.0 – 14.9      3         2           5
        15.0 – 15.9      2         0           2
          Total         30        30          60
Elements of a Table
An ideal table should have:
  Number
  Title
  Column headings
  Foot-notes

Number -            Table number for identification in a report

Title, place,       Describe the body of the table, variables
time period -       (what, how classified, where and when)

Column heading -    Variable name, No., Percentages (%), etc.

Foot-note(s) -      To describe some column/row headings,
                    special cells, source, etc.
Table II. Distribution of 120 (Madras) Corporation divisions
according to annual death rate based on registered deaths in
1975 and 1976

      Death rate (/1000 per annum)    No. of divisions
           7.0 - 7.9                      4 (3.3)
           8.0 - 8.9                     13 (10.8)
           9.0 - 9.9                     20 (16.7)
          10.0 - 10.9                    27 (22.5)
          11.0 - 11.9                    18 (15.0)
          12.0 - 12.9                    11 (9.2)
          13.0 - 13.9                    11 (9.2)
          14.0 - 14.9                     6 (5.0)
          15.0 - 15.9                     2 (1.7)
          16.0 - 16.9                     4 (3.3)
          17.0 - 18.9                     3 (2.5)
            19.0 +                        1 (0.8)
             Total                      120 (100.0)

             Figures in parentheses indicate percentages
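The percentages in parentheses are each class count divided by the total number of divisions. A short sketch that recomputes them from the counts in Table II, which is a convenient way to check a table before publishing it:

```python
# Number of divisions in each death-rate class, as listed in Table II
divisions = {
    "7.0 - 7.9": 4, "8.0 - 8.9": 13, "9.0 - 9.9": 20, "10.0 - 10.9": 27,
    "11.0 - 11.9": 18, "12.0 - 12.9": 11, "13.0 - 13.9": 11, "14.0 - 14.9": 6,
    "15.0 - 15.9": 2, "16.0 - 16.9": 4, "17.0 - 18.9": 3, "19.0 +": 1,
}

total = sum(divisions.values())
percentages = {label: 100 * count / total for label, count in divisions.items()}

for label, count in divisions.items():
    print(f"{label:<12} {count:>3} ({percentages[label]:.1f})")
print(f"{'Total':<12} {total:>3} (100.0)")
```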
DIAGRAMS/GRAPHS

Discrete data
   --- Bar charts (one or two groups)

Continuous data
  --- Histogram
  --- Frequency polygon (curve)
  --- Stem-and-leaf plot
  --- Box-and-whisker plot
Example data

  68   63   42   27   30   36   28   32
  79   27   22   28   24   25   44   65
  43   25   74   51   36   42   28   31
  28   25   45   12   57   51   12   32
  49   38   42   27   31   50   38   21
  16   24   64   47   23   22   43   27
  49   28   23   19   11   52   46   31
  30   43   49   12
Histogram
[Figure 1: Histogram of ages of 60 subjects. x-axis: Age (class midpoints 11.5 to 71.5); y-axis: Frequency (0 to 20)]
Polygon
[Figure: frequency polygon. x-axis: Age (class midpoints 11.5 to 71.5); y-axis: Frequency (0 to 20)]
Stem and leaf plot
Stem-and-leaf of Age     N = 60
Leaf Unit = 1.0

    6   1 122269
   19   2 1223344555777788888
  (11)  3 00111226688
   13   4 2223334567999
    5   5 01127
    4   6 3458
    2   7 49
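A stem-and-leaf plot is simple to build by hand or in code: sort the values, use the tens digit as the stem and the units digit as the leaf. The sketch below does this for the 60 ages of the example data; the first printed column is the number of leaves in each row.

```python
from collections import defaultdict

# Ages of the 60 subjects from the example data
ages = [68, 63, 42, 27, 30, 36, 28, 32,
        79, 27, 22, 28, 24, 25, 44, 65,
        43, 25, 74, 51, 36, 42, 28, 31,
        28, 25, 45, 12, 57, 51, 12, 32,
        49, 38, 42, 27, 31, 50, 38, 21,
        16, 24, 64, 47, 23, 22, 43, 27,
        49, 28, 23, 19, 11, 52, 46, 31,
        30, 43, 49, 12]

# Stem = tens digit, leaf = units digit (leaf unit 1.0)
leaves = defaultdict(list)
for age in sorted(ages):
    stem, leaf = divmod(age, 10)
    leaves[stem].append(str(leaf))

for stem in sorted(leaves):
    row = "".join(leaves[stem])
    print(f"{len(row):>3}  {stem} {row}")
```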
Box plot
[Figure: box plot of Age; y-axis from 10 to 80]
Descriptive statistics report:
Boxplot
  - minimum score
  - maximum score
  - lower quartile
  - upper quartile
  - median
  - mean



 - the skew of the distribution:
    positive skew: mean > median & high-score whisker is longer
    negative skew: mean < median & low-score whisker is longer
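The quantities a box plot reports can be computed directly. The sketch below uses Python's statistics module on the 60 example ages. Quartile values depend on the interpolation method, and the default "exclusive" method of statistics.quantiles is an assumption here, so other software may give slightly different quartiles.

```python
from statistics import mean, median, quantiles

# Ages of the 60 subjects from the example data
ages = [68, 63, 42, 27, 30, 36, 28, 32,
        79, 27, 22, 28, 24, 25, 44, 65,
        43, 25, 74, 51, 36, 42, 28, 31,
        28, 25, 45, 12, 57, 51, 12, 32,
        49, 38, 42, 27, 31, 50, 38, 21,
        16, 24, 64, 47, 23, 22, 43, 27,
        49, 28, 23, 19, 11, 52, 46, 31,
        30, 43, 49, 12]

lower_quartile, _, upper_quartile = quantiles(ages, n=4)  # default method: "exclusive"

summary = {
    "minimum": min(ages),
    "lower quartile": lower_quartile,
    "median": median(ages),
    "upper quartile": upper_quartile,
    "maximum": max(ages),
    "mean": mean(ages),
}
for name, value in summary.items():
    print(f"{name}: {value}")

# Direction of skew from the mean-median comparison
skew = "positive" if summary["mean"] > summary["median"] else "negative or none"
print("skew:", skew)
```

For these ages the mean is larger than the median, which is the pattern described above for a positively skewed distribution.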
Pie Chart
  Circular diagram; the total is 100%
  Divided into segments, each
   representing a category
  Decide adjacent category
  The amount for each category is
   proportional to its slice of the pie
[Figure: pie chart with slices of 70%, 20% and 10%; legend: Mild, Moderate, Severe. Caption: The prevalence of different degrees of hypertension in the population]
Bar Graphs
  The height of each bar indicates
   frequency
  Frequency is on the Y axis
   and the categories of the variable
   are on the X axis
  The bars should be of equal
   width and should not touch
   one another
[Figure: bar graph. x-axis: Risk factor (Smo, Alc, Chol, DM, HTN, No Exer, F-H); y-axis: Number (0 to 25); bar labels in the extracted text, left to right: 9, 12, 20, 16, 12, 8, 20. Caption: The distribution of risk factors among cases with cardiovascular diseases]
HIV cases enrollment in
USA by gender
[Figure: bar chart. x-axis: Year (1986 to 1992); y-axis: Enrollment (hundred), 0 to 12; legend: Men, Women]
HIV cases enrollment
in USA by gender
[Figure: stacked bar chart. x-axis: Year (1986 to 1992); y-axis: Enrollment (thousands), 0 to 18; legend: Women, Men]
Graphic Presentation of
Data
   the frequency polygon
   (quantitative data)



   the histogram
   (quantitative data)



   the bar graph
   (qualitative data)
General rules for designing
graphs
  A graph should have a self-explanatory
   legend
  A graph should help the reader to understand
   the data
  Axes should be labeled and units of measurement
   indicated
  Scales are important. Start with zero (otherwise
   show a // break)
  Avoid graphs with a three-dimensional
   impression; they may be misleading (the reader
   visualizes them less easily)
Any Questions?