

									MATH 2441                     Probability and Statistics for Biological Sciences

                                               Types of Data
Statistics deals with the organization and summarization of experimental observations or measurements,
and with the analysis of their implications. How these operations are carried out, and what sort of
mathematical methods are appropriate, depends on the nature of the observations. Although most of what is
mentioned in this document is "common sense", it is important to be aware of the issues.

Whenever we make a measurement or an observation, we are measuring or observing something. In
statistics, that "something" is called a variable, because different measurements or observations of its
"value" may be different -- they may vary over a set or range of possibilities.

There are at least three important ways to classify different types of data in statistics.

First, there is the distinction between qualitative data and quantitative data:
             the term qualitative comes from the word "quality", indicating a property, characteristic,
              feature or attribute. Qualitative data consists of words or names describing a characteristic.
              Examples of qualitative variables (which have qualitative "values") are the flavor of ice cream,
              the color of a person's eyes or hair, the species of a selected life form, the brand of potato chip
              selected by a customer, the presence or absence of a particular genetic feature, etc.
             the term quantitative comes from the word "quantity", indicating amount, measure, number,
              size, etc. Quantitative data is always a list of numerical values where the numbers are more
              than just names, but actually represent measured numerical values. Examples of quantitative
              variables that might be considered in studying the population of BCIT students are the height
              of a student, the age of a student, the number of apples the student ate in the past week.
              However, the student ID number is a qualitative variable rather than a quantitative variable,
              since it is in some way equivalent to a name for that student. The numerical digits in a student
              number are not intended to indicate the measure or size or amount of something that that
              student has.

         Sometimes numerical digits are used to represent qualitative values. Thus, the players on a sports
         team often have numbers on their shirts, but these numbers are qualitative labels, not quantitative
         values. Similarly, statisticians sometimes code qualitative values with numerical digits -- for
         example, letting the numerical digits 0 and 1 stand for the qualities "male" and "female", respectively.

         Grey areas can arise. For example, when we use a scale of 1 to 5 to represent the range of
         responses to questions from "strongly disagree" to "strongly agree" on survey-type questionnaires,
         one could regard the result as qualitative (i.e., one of the list "strongly disagree", "disagree", "no
         opinion", "agree" or "strongly agree") or as quantitative (the values 1, 2, 3, 4, and 5 measuring the
         degree of agreement with the statement given).

         Arithmetic operations often make sense with quantitative data, but do not make sense with
         qualitative data.
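The difference between quantitative values and qualitative labels shows up directly when data is stored for analysis. The short Python sketch below uses made-up heights, eye colors and student ID numbers (none of these values come from real records) to show that averaging a measured quantity is meaningful, that counting categories is the natural operation on qualitative data, and that keeping an ID as a text label prevents accidental arithmetic on it.

```python
from collections import Counter
from statistics import mean

# Quantitative: heights in cm are measured amounts, so averaging them makes sense.
heights_cm = [172.0, 165.5, 180.2, 158.9]          # made-up values

# Qualitative: stored as text labels, even when the label is made of digits.
student_ids = ["10023456", "10098765", "10055511", "10022233"]  # hypothetical IDs
eye_colors = ["brown", "blue", "brown", "green"]                # made-up values

average_height = mean(heights_cm)                  # a meaningful average
color_counts = Counter(eye_colors)                 # counting categories is meaningful

try:
    sum(student_ids)                               # "adding" labels is not meaningful
except TypeError:
    ids_are_labels = True                          # Python refuses to add text labels
else:
    ids_are_labels = False

print(average_height, color_counts.most_common(1), ids_are_labels)
```

Had the IDs been stored as integers, `sum` would have run without complaint and produced a meaningless number, which is why coding a qualitative variable as text is the safer choice.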

Secondly, there is the notion of scale, of which statisticians distinguish four kinds:
             nominal scales: the observation of the variable results in one of a set of characteristics or
              attributes, rather than a numerical value. The word "nominal" comes from the word "name",
              meaning that the observations will be names rather than numerical values. Nominal scales
              result in qualitative data. Examples of nominal scales are:
                       flavor (for example, the choice of ice cream purchased by a randomly selected
                        customer might be chocolate, vanilla, strawberry, etc. There is no numerical or
                        quantitative relationship between these flavors: we can't say that vanilla is twice as
                        much as chocolate or that vanilla is five more than chocolate, etc.)
                       gender (possibilities usually are just "male" or "female")

David W. Sabo (1999)                            Types of Data                                         Page 1 of 3
                      species or variety (if we're talking about, say, lettuce plants, the observed variety
                       might be romaine, buttercrunch, iceberg, leaf, etc.)
                      genetic phenotypes

                  Although people sometimes use numerical codes to label such attributes (for example, we
                  might record ice cream flavors as 1 = chocolate, 2 = vanilla, 3 = strawberry, etc.), these
                  numerical codes are still just names, not values. We know this is so, because in this
                  particular example, it makes no sense to talk about flavor 2.5 being halfway between
                  vanilla and strawberry, for instance. Nominal scales have no natural ordering from least to
                  greatest, or smallest to biggest, or even some intrinsic notion of first to last.

             ordinal scales: the possible observations of the variable form a set which has a natural order
              -- the observations can be ranked in some order. Ordinal scales can result in either qualitative
              or quantitative data. Purely ordinal scales are not as common in technical applications as the
              other three because usually, the natural order is a result of numerical value, and so the data
              really belongs to the last two types described below. However, a couple of simple examples of
              an ordinal scale are:
                       alphabetic order
                       sets of levels (for example, a school student is classified as being in grade 1, or grade
                        2, or grade 3, etc. There's no implication that a student in grade 2 knows twice as
                        much stuff as a student in grade 1 (though it is true that a student completing grade 2
                        has completed one grade more than a person completing grade 1, and so in this
                        sense, the grade level completed can be regarded as an interval scale). However,
                        there is a notion of increasing knowledge and skill as one progresses from one grade
                        to the next through the system.)
                       numerical categories (for example, rather than record the actual weight gain of mice
                        on a particular diet -- which would result in a ratio scale, see below -- we might
                        simply categorize the observations as weight loss, no change, small gain, moderate
                        gain, and large gain. These five possibilities represent an increasing amount of weight
                        gain, but only indicate relative ranking, not precise relative size.)
                       rating scales (you see these in surveys where you are asked to select responses to a
                        sentence from the set of strongly disagree, disagree, no opinion, agree, strongly
                        agree, etc.)

                  Ordinal scales are most often used in biological applications when it is not possible or
                  feasible to work with either an interval or a ratio scale, but the data reflects some sort of
                  ordering or size property.

             interval scales form the first of two distinctly numerical or quantitative scales. In an interval
              scale, differences between observed values have significance, but their ratio does not.
              Another way of saying this is that interval scales do not have a true zero. Examples of interval
              scales are:
                       the Celsius (or Fahrenheit) temperature scales. A temperature difference between 40°
                        and 20° is the same as the temperature difference between 70° and 50° (for example,
                        it would take as much heat to raise the temperature of some water from 20° to 40° as
                        it would to raise the temperature of that water from 50° to 70°). However, it does not
                        make sense to speak of 40°C as being twice as hot as 20°C. Nor does it make sense
                        to talk of a temperature of 0°C indicating the absence of temperature or the absence
                        of heat.
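                       time scales, such as calendar dates and clock times, are interval scales, since the
                        choice of zero point is arbitrary. (Elapsed times, by contrast, have a true zero and
                        form a ratio scale.)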

             ratio scales are interval scales that also have a natural zero, so that ratios of values (and not
              just differences between values) are meaningful. Examples are:
                        concentrations (a 2 M solution is twice as concentrated as a 1 M solution. A 0 M
                         solution indicates a solution containing no solute.)
                        measurements of size relative to some standard (for example, measurements of
                         length in meters. A plant 1.5 m tall is twice as tall as a plant which is 0.75 m tall. A
                         mouse which weighs 36 g is twice as heavy as a mouse which weighs just 18 g.)

                  Like interval scales, ratio scales are always numerical.
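One way to see the four scales side by side is to ask which statements survive an arbitrary change of coding or units. The Python sketch below does this with the ice cream, mouse weight-change, temperature and plant-height examples from the text; the individual customer choices and mouse observations are made up for illustration, and the variable names are hypothetical.

```python
from statistics import mean, mode

# Nominal: made-up ice cream choices, with two equally arbitrary numerical codings.
choices = ["chocolate", "vanilla", "vanilla", "strawberry",
           "chocolate", "vanilla", "strawberry", "vanilla"]
coding_a = {"chocolate": 1, "vanilla": 2, "strawberry": 3}
coding_b = {"chocolate": 3, "vanilla": 1, "strawberry": 2}
mean_code_a = mean([coding_a[c] for c in choices])   # changes with the coding chosen
mean_code_b = mean([coding_b[c] for c in choices])
most_common = mode(choices)                          # needs no coding at all

# Ordinal: the mouse weight-change categories, listed from least to greatest.
LEVELS = ["weight loss", "no change", "small gain", "moderate gain", "large gain"]
RANK = {level: i for i, level in enumerate(LEVELS)}
observed = ["small gain", "no change", "large gain", "small gain", "moderate gain",
            "weight loss", "small gain", "moderate gain", "no change"]  # made-up data
ranked = sorted(observed, key=RANK.__getitem__)      # ordering is meaningful
median_category = ranked[len(ranked) // 2]           # odd count: the middle observation

# Interval: Celsius and Fahrenheit agree on equal differences but not on ratios.
def c_to_f(c):
    """Convert a Celsius temperature to Fahrenheit."""
    return c * 9 / 5 + 32

diffs_f = (c_to_f(40) - c_to_f(20), c_to_f(70) - c_to_f(50))  # equal differences
ratio_c = 40 / 20                                              # 2.0 in Celsius...
ratio_f = c_to_f(40) / c_to_f(20)                              # ...but not in Fahrenheit

# Ratio: changing units (metres to centimetres) leaves ratios unchanged.
tall_m, short_m = 1.5, 0.75
ratio_m = tall_m / short_m
ratio_cm = (tall_m * 100) / (short_m * 100)

print(mean_code_a, mean_code_b, most_common)
print(median_category)
print(diffs_f, ratio_c, ratio_f)
print(ratio_m, ratio_cm)
```

For the ordinal data, the median category uses only the ordering; averaging the ranks 0 to 4 instead would quietly assume the five categories are equally spaced, which an ordinal scale does not promise.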

         In brief summary, we can say:

                       a nominal scale consists of an unordered set of qualitative "values"
                       an ordinal scale looks like a nominal scale, but with the possible "values" having a
                        meaningful or natural ordering from first to last, or least to greatest, etc.
                       an interval scale looks like an ordinal scale (has ordering), but with the differences
                        between possible values also being meaningful
                       a ratio scale looks like an interval scale, but with the ratios of possible values also
                        being meaningful.
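Because each scale keeps every property of the one before it and adds one more, the summary can be written as an ordered type. The Python sketch below is one possible encoding (the class name `Scale`, the dictionary `ADDED_PROPERTY` and the function `meaningful_properties` are hypothetical, chosen only for illustration).

```python
from enum import IntEnum

class Scale(IntEnum):
    """Scales of measurement, ordered so that each has the properties of the previous one."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# The property each scale adds to the scale before it.
ADDED_PROPERTY = {
    Scale.NOMINAL: "categories (same or different)",
    Scale.ORDINAL: "ordering",
    Scale.INTERVAL: "meaningful differences",
    Scale.RATIO: "meaningful ratios",
}

def meaningful_properties(scale):
    """Return every property that is meaningful for data on the given scale."""
    return [ADDED_PROPERTY[s] for s in Scale if s <= scale]

for scale in Scale:
    print(scale.name, "->", meaningful_properties(scale))
```

Using `IntEnum` makes the nesting explicit: a comparison such as `s <= scale` collects exactly the properties inherited from the lower scales.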

Thirdly, when dealing with quantitative data or variables, it will be necessary to distinguish between
discrete and continuous data, and their corresponding variables.
            the possible values of discrete variables form a set of distinct, isolated quantities.
             Observations that result from counting objects or items give discrete data, since only whole
             number values can arise. Thus, the number of "heads" observed when a coin is flipped four
             times is a discrete quantity, because the only possible values that can arise are 0, 1, 2, 3, or 4.
             The number of mice in a sample of six which have a certain genetic mutation will be a discrete
             value, since the only values that can arise are 0, 1, 2, 3, 4, 5, or 6.
            the possible values of a continuous variable form an unbroken set of decimal values, with at
             most a finite number of distinct gaps. Continuous variables usually result from measurements
             made relative to a standard scale of size: for example, length, mass, time, temperature, etc.
             Thus, the mass of a mouse selected at random is a value from a continuous scale, since in
             principle, any value between 0 g (a very light mouse!) and some maximum value could occur.
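The coin and mouse examples can be checked by listing every possible outcome. The Python sketch below enumerates all sequences of four coin flips and the possible counts of mutant mice in a sample of six; the two mouse masses used for the continuous contrast are made up.

```python
from itertools import product

# Every possible sequence of four coin flips, e.g. ('H', 'T', 'T', 'H').
outcomes = list(product("HT", repeat=4))
heads_values = sorted({seq.count("H") for seq in outcomes})   # only whole numbers arise

# Number of mutant mice in a sample of six: any whole number from 0 to 6.
mutant_values = list(range(0, 6 + 1))

# Continuous contrast: between any two possible masses lies another possible mass.
mass_a, mass_b = 18.0, 18.1          # grams, made-up values
between = (mass_a + mass_b) / 2

print(len(outcomes), heads_values, mutant_values, between)
```

The discrete counts form a short, finite list, while for the masses the midpoint step could be repeated indefinitely (up to the precision of the measuring instrument, or of the computer's floating-point numbers).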

         The distinction between discrete and continuous variables is quite important from a methodological
         point of view. Methods for solving problems involving continuous variables are almost always
         based on concepts from calculus, whereas methods for solving problems involving discrete
         variables often just involve simple arithmetic or algebra. Both discrete and continuous variables
         arise in biological sciences applications, though continuous variables are quite a bit more common.
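The methodological difference can be illustrated with two probability calculations. In the Python sketch below, the discrete probability of exactly 2 heads in 4 fair flips comes from counting outcomes, while for a continuous variable a probability is an area under a density curve. The normal model for mouse mass (mean 25 g, standard deviation 3 g) is a hypothetical assumption chosen only for illustration, and the integral is approximated numerically with the midpoint rule.

```python
import math

# Discrete: P(exactly 2 heads in 4 fair flips) by counting outcomes.
p_two_heads = math.comb(4, 2) / 2**4      # 6 favorable sequences out of 16

# Continuous: hypothetical model of mouse mass, normal with mean 25 g and sd 3 g.
MU, SIGMA = 25.0, 3.0

def density(x):
    """Normal probability density function with mean MU and standard deviation SIGMA."""
    z = (x - MU) / SIGMA
    return math.exp(-z * z / 2) / (SIGMA * math.sqrt(2 * math.pi))

def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# P(22 g < mass < 28 g) is an area under the density curve, not a count.
p_interval = integrate(density, 22.0, 28.0)
closed_form = math.erf(1 / math.sqrt(2))  # same probability via the error function

print(p_two_heads, p_interval, closed_form)
```

The discrete answer needs only a binomial coefficient and a division, while the continuous answer requires an integral (here approximated numerically and checked against the closed form, about 0.683).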

