Data Analysis with SPSS

									Statistics – Spring 2008
                                                 Lab #2 – Descriptives

Descriptive analysis involves examining the characteristics of individual variables, whereas inferential
statistics examine the relationships between variables.

There are two types of variables -- categorical and continuous -- and the characteristics of interest for each variable
are different. For categorical variables, you are interested in the “count”, such as demographic characteristics of your
study (e.g., 50 males and 56 females). For continuous variables, there are many different characteristics to examine,
such as mean, median, mode, range, variability, etc., but the mean is typically the most useful descriptor.

This document explains how to examine characteristics of categorical and continuous variables. Descriptive analysis
involves the same SPSS commands as for Data Screening (e.g., Explore, Frequencies), so you are already intimately
familiar with how to conduct descriptive analysis.

This document also explains how to create composites by averaging together individual variables into new
composited variables. Compositing can involve a few different tasks, such as reverse coding items, averaging items
with different scale ranges, and conducting reliability analysis to determine if from a statistical point of view the
individual items “should” be averaged together. All those tasks are described below.

This document also explains some new skills related to descriptive analysis that you may want to learn, such as how
to transform a continuous variable into a categorical variable, how to create a new variable based upon the
combination of two or more variables, and how to use syntax.

1. Descriptive Statistics
    Your two options for descriptive analysis are the “Frequencies” command and the “Explore” command.
      Both provide much of the same information, except that:
          a. Frequencies -- groups together the descriptive information into one grid; and displays histograms with
              a normal curve (whereas Explore displays histograms but without normal curves)
          b. Explore -- displays descriptive information for each variable separately; and displays boxplots
    Frequencies:
      1. Select Analyze --> Descriptive Statistics --> Frequencies
      2. Move all variables into the “Variable(s)” window.
      3. Click “Statistics” and put a checkmark next to every descriptive statistic you are interested in viewing.
      4. Click “Charts” and put a checkmark next to the chart type you are interested in viewing.
      5. Click OK.
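Any dialog can be pasted as syntax (see the SYNTAX section), so the Frequencies steps above should produce a command along these lines. Treat this as a sketch rather than the exact pasted text. It assumes you ticked the statistics listed and chose histograms with a normal curve, and it uses the sex, age, and edu variables named in this handout.

```spss
* Frequency tables, descriptive statistics, and histograms with a normal curve.
FREQUENCIES VARIABLES=sex age edu
  /STATISTICS=MEAN MEDIAN MODE STDDEV RANGE MINIMUM MAXIMUM SKEWNESS KURTOSIS
  /HISTOGRAM NORMAL
  /ORDER=ANALYSIS.
```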
    Output below is for the first four “demographic” questions.
    “Statistics” box provides a grid format of the descriptive statistics for each variable.




   After the Statistics box, the frequency distribution for each variable is displayed. Below is the frequency
    distribution for gender:




   Next come the histograms. Below are the histograms for age and gender. I chose to show you these two
    histograms because they illustrate how the “Frequencies” histogram is useful for displaying both categorical and
    continuous variables. Also, notice that neither histogram is normally distributed. Not every variable needs
    to be normally distributed. Plus, categorical variables with few answer choices (e.g., 2, 3, 4, 5, 6) will rarely
    conform to a normal curve. Finally, in the age histogram, notice the sharp drop-off below the “20” line. This is
    because we restricted participation in the study to people who were aged 18 or older.




   If you double-click on the histogram in the SPSS output viewer, it opens a new window containing the
    histogram with many new drop-down options to manipulate the histogram. There are too many options to
    explain them all, so feel free to try each one, and if you have specific questions, please let me know. However,
    one option I wanted to present to you was the ability to change the scale range on the histogram axis. For
    example, if you double-click on the “age” histogram, it opens a new window. Then, double-click on the
    horizontal axis, which opens the “Properties” window. Then, click on “scale”, and change the scale range from
    18 to 72 (which is the minimum and maximum in our sample), and change the increment value to 1. Click
    “Apply”. The new histogram for age is displayed below. Notice how much more information is displayed.




   Now let's use the Explore command:
    1. Select Analyze --> Descriptive Statistics --> Explore
    2. Move all variables into the “Variable(s)” window.
    3. Click “Plots” and uncheck “Stem-and-leaf”
    4. Click “Options” and click “Exclude cases pairwise”
    5. Click OK.
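The same Explore steps can be written as an EXAMINE command. This is a sketch that assumes the four “system” items. Listing only BOXPLOT on /PLOT corresponds to unchecking “Stem-and-leaf”, and /MISSING PAIRWISE corresponds to “Exclude cases pairwise”.

```spss
* Explore: descriptives and boxplots for the four system items.
* Listing only BOXPLOT on /PLOT leaves out the stem-and-leaf plot.
EXAMINE VARIABLES=system1 system2 system3 system4
  /PLOT BOXPLOT
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING PAIRWISE.
```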
   Output below is for only the four “system” variables in our dataset because copy/pasting the output for all
    variables in our dataset would take up too much space in this document.
   “Case Processing Summary” shows the number of cases that are valid, missing, and total.




   “Descriptives” shows the same information as the “Frequencies” command, but now each variable is
    displayed separately.




   Next, the boxplot for each variable is displayed. Below is the boxplot for “edu” because I want to show you a
    boxplot that contains both mild outliers (round dots) and extreme outliers (stars).
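SPSS's documentation for Explore boxplots defines the two symbols in terms of box lengths. One box length is the interquartile range, and the box runs roughly from the 25th percentile (Q1) to the 75th percentile (Q3). The rule can be sketched as follows. Note that SPSS draws the box edges using Tukey's hinges, which can differ slightly from quartiles calculated by other methods.

```latex
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 \quad \text{(one box length)}\\
\text{outlier (circle)} &:\; Q_3 + 1.5\,\mathrm{IQR} < x \le Q_3 + 3\,\mathrm{IQR}
  \ \text{ or }\ Q_1 - 3\,\mathrm{IQR} \le x < Q_1 - 1.5\,\mathrm{IQR}\\
\text{extreme (star)} &:\; x > Q_3 + 3\,\mathrm{IQR}
  \ \text{ or }\ x < Q_1 - 3\,\mathrm{IQR}
\end{aligned}
```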




   What if you want descriptive statistics within groups? For example, imagine a study that manipulated the
    presence or absence of a weapon during a crime, and the Dependent Variable was measuring the level of
    emotional reaction to the crime. In addition to looking for descriptive statistics of your DV within the entire
    study (so collapsing across both groups), you may also want descriptive statistics for your DV within each
    group. Another example of when you would want descriptive statistics within groups is when your study
    involves a verdict choice. Typically, you not only report the percentage of guilty/not-guilty verdicts across the
    entire study, but you also want to report the percentage of guilty/not-guilty verdicts within each group in your
    study. I present an example of this situation on the next page, and how to present this data in a Figure.
   How to conduct descriptive statistics within groups:
   In our dataset about “Legal Beliefs”, let’s treat gender as the grouping variable because sometimes you also
    want to present the gender split amongst your variables:
    1. Select Analyze --> Descriptive Statistics --> Explore
    2. Move all variables into the “Variable(s)” window.
       Move “sex” into the “Factor List”
    3. Click “Statistics”, and click “Outliers”
    4. Click “Plots”, and uncheck “Stem-and-leaf”
    5. Click OK.
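In syntax, putting a variable in the “Factor List” corresponds to the keyword BY in the EXAMINE command. This sketch again uses the four “system” items and “sex”. EXTREME is the syntax counterpart of the “Outliers” checkbox.

```spss
* Explore within groups: separate descriptives and boxplots for each value of sex.
* EXTREME lists the five highest and five lowest values of each variable.
EXAMINE VARIABLES=system1 system2 system3 system4 BY sex
  /PLOT BOXPLOT
  /STATISTICS DESCRIPTIVES EXTREME
  /CINTERVAL 95
  /MISSING LISTWISE.
```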
   Output on next page is for “system1”



   “Descriptives” box tells you descriptive statistics about the variable. Notice that information for “males” and
    “females” is displayed separately.




   WRITE-UP: You typically discuss the characteristics of demographics at the beginning of the Method section,
    not the Results section, and you also typically present data only for gender (see below). If you want to discuss
    more than just gender, such as age, education, political affiliation, income, etc., then you would create a
    Figure to display all the data. For descriptive statistics other than demographics, you would present that data in
    the Results section. If there are only a few descriptive statistics, you discuss them in the text of the Results
    section (see below). If there are many descriptive statistics, you present them in a Figure, and then discuss
    only the most pertinent information from the Figure when you are writing the Results/Discussion section.
        a. Here is a sample write-up for gender: “The sample consisted of 327 participants, with many more
            females (n = 248) than males (n = 76), and three participants who did not indicate gender.”
        b. Here is a sample write-up for how you would discuss descriptive statistics in the Results section:
            “When asked what percentage of people brought to trial did in fact commit the crime, the average
            response was 78%.”
        c. Here is a Figure (from another paper I wrote a few years ago):
            (FYI – see http://www.docstyles.com/apa15.htm for how to format Figures and Tables in APA format)

       Verdict Choices for Death Qualified and Excludable Jurors

                              Witherspoon    Witherspoon    Witt          Witt
       Verdict Choice         Excludable     Includable     Excludable    Includable

       1. Guilty              29.2%          77.9%          31.2%         71.8%
       2. Not Guilty          70.8%          22.1%          68.8%         28.2%



      EVALUATION: Since “evaluating” descriptive statistics in Results sections or Figures is simply reading the
       descriptive statistics that are reported, my only advice for evaluating descriptive statistics is to notice
       whether any descriptive statistics that you would find helpful were not reported, and that you would
       want the author to include in the paper.


2. Other graphs
    SPSS has the ability to create other types of graphs beyond histograms and boxplots, but they provide little
      information beyond the information provided by histograms and boxplots. The other charts are:
          a. Bar charts
          b. 3D bar charts
          c. Line charts
          d. Area charts
          e. Pie charts
    To access these charts:
      1. Select Graphs --> choose either “Chart Builder” or “Legacy Charts”
      2. Move chosen variables into the appropriate open spaces
      3. Click OK.
    “Legacy Charts” are the old way that SPSS builds charts. Each chart has a separate command window, each
      with its own unique options and characteristics. The options and characteristics are very straightforward and
      easy to use.
    “Chart Builder” is new to SPSS. It reportedly has more functionality, but it is also complex and sometimes
      difficult to manipulate. I would suggest first using the Legacy Charts to get a better understanding of each type
      of chart.
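The Legacy Charts dialogs paste the GRAPH command. Here is a sketch of two of the charts listed above. It assumes the handout's sex and rel_category variables and plots the number of participants in each category.

```spss
* Legacy bar chart: number of participants of each sex.
GRAPH
  /BAR(SIMPLE)=COUNT BY sex.

* Legacy pie chart: number of participants in each religious category.
GRAPH
  /PIE=COUNT BY rel_category.
```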


3. Composites – averaging items together
    Why do we create composites? The rule of thumb in statistics is “the more, the better”. In terms of measuring
      constructs, this means that you typically want to ask many questions about the same construct in order to
      adequately tap into the entire construct of interest. For example, in a study about happiness, asking, “how
      happy are you right now” perfectly maps onto the construct of “how happy you are right now”. But, if your
      intended construct is “happiness”, you need to ask more questions to tap the entire theoretical construct, such
      as “how happy do you feel” and “how happy are you with your life in general”. Thus, for every construct,
      researchers ask many questions, either by using established scales on the topic or by creating their own
      measures to tap all the facets of the construct. Suppose you asked 10 such questions. When you analyze the
      data, you start by conducting descriptive analysis of each individual question. Then, you composite all 10
      questions into 1 variable by averaging them together. Researchers are typically more interested in that 1
      composite variable than in the 10 individual items (unless the 10 questions uniquely tap different sub-parts of
      the entire construct, and the researchers are interested in each sub-part). So, after first conducting descriptive
      analysis of each item, you then conduct descriptive analysis of the 1 composite variable.
    How do you create a composite?
      1. Select Transform --> Compute Variable
      2. Type a new name for your composite in the “Target Variable” box.
      3. In the “Function group” list, click “Statistical”, then drag “Mean” from the “Functions and Special
         Variables” list into the “Numeric Expression” box above
      4. Replace the question marks (?) with each item to be composited, separated by a comma (,)
      5. Click OK.
    The newly created composite will appear at the end of the data file.
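In syntax, the Compute Variable steps become a COMPUTE command. This sketch uses the three “prosecutor” items from this handout, and the target names are hypothetical. MEAN averages whichever items are not missing, so a case that skipped a question still gets a composite score. The MEAN.n form lets you require a minimum number of answered items instead.

```spss
* Composite of the three prosecutor items (target names are hypothetical).
* MEAN averages whatever items are non-missing for each case.
COMPUTE prosecutor_comp = MEAN(prosecutor1, prosecutor2, prosecutor3).
* MEAN.2 requires at least 2 valid items; cases with fewer get a missing value.
COMPUTE prosecutor_comp2 = MEAN.2(prosecutor1, prosecutor2, prosecutor3).
EXECUTE.
```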
    Is it appropriate to create a composite with my questions? We described above how to create a composite, but
      another question is whether it's appropriate to create the composite given the questions and data in your study.
      You can answer that question from a theoretical point of view and from a statistical point of view. I describe
      both points of view below:
   From a theoretical point of view…
        a. From a theoretical point of view, it is possible your questions do not measure the same construct, and
             thus it is inappropriate to average them together. For example, the face content of each item may
             measure different concepts. Imagine questions about your political group orientation. A question about
             whether you “think” of yourself as a Republican or Democrat may tap a different construct than a
             question about whether you “feel” like a Republican or Democrat. You need to examine your questions and
             decide whether it is appropriate to average the items together.
        b. Another option is to create separate composites, one for each concept that is measured. For example,
             maybe you composite together all the questions about how you “feel” about your political group
             membership, and create another composite of the questions about how you “think” of your political
             group membership. After creating the separate composites, you can then also merge all the questions
             together (so merge all the separate composites together) into 1 big composite. In this case, you would
             call the separate composites you merged together the “sub-parts” or “sub-factors” of the 1 big
             composite. Also, from a theoretical point of view you need to decide how to label or characterize this
             big composite.
        c. It is acceptable to create composites from a theoretical point of view even if it is not appropriate from a
             statistical point of view. I discuss next the benchmarks for deciding whether it is statistically
             appropriate to merge items together into a composite, but even if those benchmarks are not met in
             your data, it is still appropriate to merge items together from a purely theoretical point of view.
             However, you must state in your paper that the statistical benchmarks were not met, and then explain
             the theoretical basis for why you are still merging the items together. (FYI – if the statistical
             benchmarks are met, then you rarely see researchers explain the theoretical basis for why the items
             were merged together.)
   From a statistical point of view…
        a. From a statistical point of view, it is possible your questions do not measure the same construct, and
             thus it is inappropriate to average them together. For example, you can use “Factor Analysis” to
             determine if the items fall into 1 big composite (called a “factor”), or if they fall into separate sub-
             factors. I will explain Factor Analysis at the end of the semester, but only if you request it. Factor
             analysis is not one of the more typical statistical tests. Instead, researchers decide how the items group
             together from a theoretical point of view, and then test their judgment by conducting
             “Reliability Analysis”, which provides a benchmark for determining whether the items group
             together. In other words, Reliability Analysis is called a “confirmatory” test because it confirms
             your decisions, whereas Factor Analysis is considered an “exploratory” test because it is used to explore
             which, if any, of the items group together into which set of factors or sub-factors.
        b. Reliability Analysis is rather straightforward to conduct:
    1. Select Analyze --> Scale --> Reliability Analysis
    2. Move the items you want to composite into the “Items” window.
    3. Click “Statistics” and put a checkmark next to “Item” and “Scale if item deleted”
    4. Click OK.
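The equivalent RELIABILITY command is sketched below for the three “prosecutor” items. /STATISTICS=DESCRIPTIVE corresponds to the “Item” checkbox, and /SUMMARY=TOTAL corresponds to “Scale if item deleted”. The scale label in /SCALE is arbitrary.

```spss
* Cronbach's alpha for the three prosecutor items.
RELIABILITY
  /VARIABLES=prosecutor1 prosecutor2 prosecutor3
  /SCALE('Prosecutor items') ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE
  /SUMMARY=TOTAL.
```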
   “Reliability Statistics” gives you the “Alpha” number, which indicates whether the items group together
    from a statistical point of view. Alpha usually ranges from 0 to 1 (it can come out negative when items are
    negatively correlated, for example when an item that needs reverse coding has not been reversed), and the
    higher the number, the more strongly the items group together statistically. Output below is for the three
    “prosecutor” questions. Alphas above .9 are great, above .8 are good, above .7 are ok, above .6 are borderline.
    In this case, Alpha=.68, which is acceptable to merge the three items together into a composite. Also, the
    smaller the sample, the more likely you will find smaller Alpha levels because there is less data to identify
    intercorrelations. In smaller samples, smaller Alpha levels are acceptable to create composites.




      The other output from the analysis helps you interpret your data. “Case Processing Summary” tells you the
       number of valid cases included in the analysis. Notice that only listwise deletion is possible. “Item Statistics”
       gives you descriptive information about each item. “Item-Total Statistics” tells you what Alpha would be if
       each item were removed. Notice that Alpha improves to .78 if we remove “prosecutor3”. In this case, because
       there are so few items (only 3), I would suggest not removing “prosecutor3”, even though it improves Alpha,
       because only 2 items is not much of a composite. If we were analyzing many items (e.g., 4+), then it would be
       more appropriate to exclude items.




      WRITE-UP: “The three items measuring attitudes toward prosecutors formed a reliable composite (α = .68).”
      EVALUATION: For each composite in the paper, the author(s) need to report the alpha level, which is the
       statistic that tells you whether or not the items group together statistically. Alpha is determined by the strength
       of the bivariate relationships amongst all the items in the composite. The higher the internal consistency
       amongst items, the higher the Alpha level. Alphas above .9 are great, above .8 are good, above .7 are ok,
       above .6 are borderline. Also, the smaller the sample, the more likely you will find smaller Alpha levels
       because there is less data to identify intercorrelations.
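The standard textbook formula for Cronbach's alpha makes the link to item intercorrelations concrete. In the formulas below, k is the number of items, s_i^2 is the variance of item i, and s_X^2 is the variance of the summed scale. The second form, standardized alpha, depends only on k and the average inter-item correlation r̄.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^{2}}{s_X^{2}}\right),
\qquad
\alpha_{\text{std}} = \frac{k\,\bar{r}}{1 + (k-1)\,\bar{r}}
```

Here is an example with illustrative numbers (not from our dataset). Three items with r̄ = .50 give α_std = 1.5 / 2.0 = .75, while six items with the same r̄ give 3.0 / 3.5 ≈ .86. This is one reason longer scales tend to have higher alphas.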


4. Items with different scale ranges
     If you are going to composite together multiple items, all the items need to have the same scale range.
     For example, let's say we ask two happiness questions: (1) “How happy are you right now?” on a 1-7 scale,
       and (2) “How happy do you feel?” on a -3 to 3 scale. Notice that the two questions are about the same
       construct (so theoretically you can merge them together), and also notice that both scales have 7 points,
       BUT the scale ranges cover different values. Compositing involves averaging items together. If we average
       these two items together, the resulting average will not be interpretable because of the different scale
       ranges. For example, a “1” on the first item is the lowest possible answer choice, but a “1” on the second
       item is above the neutral midpoint of 0. The solution is to transform both scale ranges
       into a common metric. This is accomplished by first “standardizing” both items. Then, we composite the
       newly transformed items.
     Before we get to how to standardize items, I want to point out why I included in the example a scale that
       ranged from a negative number (-3) to a positive number (3). Sometimes when you are measuring constructs,
       there is a natural mid-point or neutral point, such as with “happiness” where you could have “0” happiness at
       the moment. In this situation, it can be beneficial to include an answer choice that is neutral or “0”. Notice that
       if we asked the same question on a 1-7 scale, and you wanted to indicate you are feeling zero happiness at
       the moment, your only answer choice would be “1”, which you may not feel indicates your absence of
       happiness. Another reason to use a scale that ranges from negative to positive is that your construct also
       ranges from negative to positive. For example, imagine a question that asked about your feelings about the
       death penalty. You could have a negative view or a positive view of the death penalty, so to tap that
       construct, the scale range needs answer choices that reflect both positive and negative views. Another way
       to reflect both positive and negative views in a scale is with the labels. For example, you could ask the same
       question about your feelings toward the death penalty on a 1-7 scale, but label “1” as strongly oppose,
       “4” as neutral, and “7” as strongly support.
      I also want to point out that standardizing your items to transform items to a common metric is necessary
       when any of the scale ranges differ, not just with negative versus positive items, as in the example above. For
       example, you may ask questions about the death penalty that are so similar that you want to vary the scale
       ranges of the items so that you tap into more information (and also force the subjects to pay more attention to
       the items because all items with the same scale range may allow lazy subjects to answer the same way on
       similar questions without thinking carefully about their answers).
      To Standardize items:
       1. Select Analyze --> Descriptive Statistics --> Descriptives
       2. Move all variables into the “Variable(s)” window.
       3. Put a checkmark next to “Save standardized values as variables”
       4. Click OK.
      The newly standardized variables are listed at the end of the data file, each in a separate column. SPSS names
       each one by adding a “Z” to the front of the original variable name (e.g., “system1” becomes “Zsystem1”).
       You can then analyze the new standardized variables as you would any other variable in your
       data set, including averaging them together to create a composite.
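The following sketch shows how to standardize and then composite in syntax. It assumes two hypothetical variables from the example above: happy_now (the 1-7 item) and happy_feel (the -3 to 3 item). DESCRIPTIVES with /SAVE writes each z-score, z = (x - mean) / SD, to a new Z-prefixed variable, which COMPUTE then averages.

```spss
* Hypothetical items: happy_now (1 to 7) and happy_feel (-3 to 3).
* /SAVE adds z-scores, z = (x - mean) / SD, named Zhappy_now and Zhappy_feel.
DESCRIPTIVES VARIABLES=happy_now happy_feel
  /SAVE.
* Average the standardized items into the composite.
COMPUTE happy_comp = MEAN(Zhappy_now, Zhappy_feel).
EXECUTE.
```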


5. Reverse coding items
    If you are going to composite together multiple items, all the items need to be “in the same direction”. This
      means that a higher (or lower) response on each item must correspond conceptually to a higher (or lower)
      response on the other items you want to composite together.
    For example, let's say we ask two happiness questions: (1) “How happy are you right now?” on a 1-7 scale,
      and (2) “How unhappy are you right now?” on a 1-7 scale. Notice that the two questions are about the same
      construct (so theoretically you can merge them together), and also notice that both scales have 7 points,
      BUT conceptually, answering higher (or lower) on one item is the same as answering lower (or higher) on the
      other item. Before we can composite them together, we need to transform the items so that they are “in the
      same direction”. Thus, we could reverse code the scale range for either the first item or the second item (but
      obviously not both items). Composites typically contain multiple items, so you typically have to reverse code
      multiple items. Also, when choosing which set of items to reverse code (e.g., either the items that are in the
      positive direction, or the items that are in the negative direction), you should think ahead to the statistical
      analyses you want to conduct and how you want the output from those analyses (or the relationship between
      those variables) to be conceptualized. For example, imagine a study testing the relationship between
      happiness and income. If your hypothesis is that more income is correlated with more happiness, then
      conceptually we want our “happiness” composite to be coded in the positive direction (so that higher on the
      scale means more happiness) so that the outcome is easier to interpret. Notice that if we code the happiness
      composite in the opposite direction (so that lower means more happiness), we will still get the same
      conceptual outcome as with the positively coded composite (that more happiness is correlated with more
      income), but the outcome will be more difficult to interpret because we will get a negative correlation
      between the variables (because lower on the happiness scale means more happiness, and more happiness is
      correlated with higher income). Thus, think ahead to your intended results and code all the items in the
      appropriate direction.


      To reverse code items:
       1. Select Transform --> Recode into different variables
       2. Move one item into the “Input Window”
       3. Type a name for the new variable in the “Output Variable” box.
          (I like to use the same name as the original variable, but labeled with “_rev”, such as “system1_rev”)
       4. Click “Change”
       5. Click “Old and New Values”
       6. Enter the “old” value and the “new” value and click “Add”
          (If reverse coding a 1-7 scale, then old=1, new=7; old=2, new=6; old=3, new=5, and so on. Include
          old=4, new=4 as well: any value you do not list becomes system-missing in the new variable.)
       7. Click Continue
       8. Click OK.
      The newly reverse coded variable is listed at the end of the data file.
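A compact alternative to listing every old/new pair uses the rule new = (lowest + highest) - old. For a 1-7 item, that means new = 8 - old. This sketch uses system1 from the handout. The happy_feel line applies the same rule to the hypothetical -3 to 3 item from the previous section.

```spss
* Reverse code a 1-to-7 item: new = (1 + 7) - old = 8 - old.
* A missing system1 gives a missing system1_rev.
COMPUTE system1_rev = 8 - system1.
* For a -3 to 3 item the same rule gives (-3 + 3) - old, which is just -old.
COMPUTE happy_feel_rev = -1 * happy_feel.
EXECUTE.
```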
      Notice that instead of “Recode into different variables”, there is an option for “Recode into same variables”. I
       do not use this function because I like to leave the old variable intact as a permanent record of each
       variable. With the same-variable option, you may forget you already reverse coded an item and reverse code
       it again, and a reverse coding mistake can't be undone once the old variable has been replaced.


6. SYNTAX
    Up to this point, we have learned that SPSS has two windows: Data Editor (grid of data) and Viewer (output).
      SPSS has a third window: Syntax.
    What is syntax? When you point-and-click in the Data Editor for SPSS to calculate a mean, or outlier, or
      correlation, or whatever, SPSS is calculating the statistical formulas for those tests. SPSS is basically a big
      calculator that can perform many different calculations. When you point-and-click in the Data Editor for
      SPSS, you are telling SPSS how to perform those calculations, such as include “Kurtosis”, or “exclude cases
      pairwise”, or “run correlations on these three specific variables, and not the other variables”. Another way to
      tell SPSS to perform those same operations is to use programming language. In the syntax window, you can
      type programming language, then hit the “run” button, and SPSS will perform the calculations. This process is
      analogous to how a website designer writes computer code to design a website, but you don’t see the code,
      only the website design. Similarly, the point-and-click functionality in SPSS is analogous to the website
      design you see, and the syntax functionality in SPSS is analogous to the background computer code that you
      typically don’t see.
    Why use syntax? The “point-and-click” interface is very easy to use. You don’t need to learn the syntax
      programming language, which can sometimes get overwhelming and difficult to understand. However, there
      are some advantages to syntax. For one, you can perform multiple operations more easily than with the
      point-and-click interface. For example, in the previous section about reverse coding items, you can only reverse
      code 1 item at a time. If you want to reverse code multiple items, you have to repeat the same steps over and
      over. Syntax makes that repetition less time-consuming. I present an example below about reverse coding, but I
      want to point out that you can use syntax for any point-and-click command. For example, for every command
      in SPSS, instead of clicking “OK” as the last step, you can click “PASTE” instead as the last step, and it will
      display the syntax.
    To reverse code items:
      1. Select Transform --> Recode into different variables
      2. Move one item into the “Input Window”
      3. Type a name for the new variable in the “Output Variable” box.
         (I like to use the same name as the original variable, but labeled with “_rev”, such as “system1_rev”)
      4. Click “Change”
      5. Click “Old and New Values”
      6. Enter the “old” value and the “new” value and click “Add”
         (If reverse coding a 1-7 scale, then old=1, new=7; old=2, new=6; old=3, new=5, and so on. Include
         old=4, new=4 as well: any value you do not list becomes system-missing in the new variable.)
      7. Click Continue
      8. Click PASTE
      Notice that the last action is “PASTE”, not OK.
      The syntax window will open, and the command you just initiated is displayed using syntax code.
      I have pasted below the syntax for our example. “RECODE” is the command to perform. Notice that the old
       variable name and new variable name are in the command line. Notice that it ends with “EXECUTE.”. If we
       wanted to “run” this command, we would highlight the entire syntax, and click the arrow button: ►
                    RECODE system1 (1=7) (2=6) (3=5) (4=4) (5=3) (6=2) (7=1) INTO system1_rev.
                    EXECUTE.

      We are using this example to show how syntax can speed up repetitive actions. If we copy/paste the
       syntax over and over, we can then type in the other variables we need to reverse code. Then, we highlight all
       the syntax, and click the arrow button to run the syntax.
                   RECODE system1 (1=7) (2=6) (3=5) (4=4) (5=3) (6=2) (7=1) INTO system1_rev.
                   EXECUTE.
                   RECODE system2 (1=7) (2=6) (3=5) (4=4) (5=3) (6=2) (7=1) INTO system2_rev.
                   EXECUTE.
                   RECODE system3 (1=7) (2=6) (3=5) (4=4) (5=3) (6=2) (7=1) INTO system3_rev.
                   EXECUTE.
                   RECODE system4 (1=7) (2=6) (3=5) (4=4) (5=3) (6=2) (7=1) INTO system4_rev.
                   EXECUTE.
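A single RECODE command can also handle several items at once. It pairs the variables before INTO with the variables after INTO in order. This sketch is equivalent to the four commands above, and you only need one EXECUTE at the end.

```spss
* Reverse code all four system items in one command.
* The Nth variable before INTO is written to the Nth variable after INTO.
RECODE system1 system2 system3 system4
  (1=7) (2=6) (3=5) (4=4) (5=3) (6=2) (7=1)
  INTO system1_rev system2_rev system3_rev system4_rev.
EXECUTE.
```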

      Another use for syntax is to keep a record of your statistical analyses, because the syntax indicates not only
       which statistical analysis was performed, but also how you performed it and which options you chose.
       The Output window provides that record by displaying the syntax for every analysis that is conducted.


6. Transforming continuous variables into categorical variables
              (and categorical variables into different categorical variables)
    It is possible to transform continuous variables into categorical variables. For example, imagine a study about
      happiness where your happiness item (or composite) ranges from 1 to 7. You might be interested in
      categorizing the subjects as either high happiness (4 through 7 on the scale) or low happiness (1 through 4 on
      the scale). This is called “dichotomizing” the variable because you are creating a new variable that has only
      two options.
    Another reason to transform a continuous variable into a categorical variable is that some of the answer
      choices on the continuous variable may have received only a few responses. For example, imagine a scale
      ranging from 1 to 11 in which answer choice 4 and/or answer choice 9 received only 1 response each. One
      response is not enough data for meaningful interpretation, so you may want to collapse the 11-point scale
      into 3 or 4 categories. As another example, look at the “rel_category” variable in our dataset, which measures
      the religious category memberships of the subjects. The frequency distribution is listed on the next page. Hindu
      received only 6 responses, and Jewish received only 9 responses. You may want to merge those responses into
      “other” and/or merge all the data into “Christian” versus “other”. Notice that the new categorical variable
      answers a different research question than the original categorical variable.




   Transforming variables in this way uses the same SPSS command as for reverse coding items.
   To transform the variables:
    1. Select Transform --> Recode into different variables
    2. Move one item into the “Input Window”
    3. Type a name for the new variable.
       (I like to use the same name as the original variable, but labeled with “_cat”, such as “system1_cat”)
    4. Click “Change”
    5. Click “Old and New Values”
    6. Click “Range” and enter the range of values of the “old” variable, and assign a number for the new variable.
       (e.g., 1-3.999 becomes a “1”, and 4.001-7 becomes a “2”)
    7. Click Continue
    8. Click OK.
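The “Range” option in step 6 can be sketched as a function that returns the new code for a score, or None when the score falls in no range; None stands in for the system-missing value that RECODE ... INTO assigns to unmatched values. The cut points are the ones from the example; the function name is hypothetical:

```python
def recode_range(value):
    """Dichotomize a 1-7 score using the ranges 1-3.999 -> 1 and 4.001-7 -> 2."""
    if 1 <= value <= 3.999:
        return 1
    if 4.001 <= value <= 7:
        return 2
    return None  # no range matched: left missing, like system-missing in SPSS

scores = [1, 3.5, 4, 4.25, 7]
print([recode_range(s) for s in scores])  # [1, 1, None, 2, 2]
```

A score of exactly 4 matches neither range, so it ends up missing on the new variable; that gap between 3.999 and 4.001 is the small degree of separation described in this section.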
   The newly transformed variable is listed at the end of the data file. I suggest then going into the
    “Variable View” and assigning value labels in the “Values” column that reflect how you cut the variable. For
    example, if you just created a new categorical variable where 1-3.999 becomes a “1” and 4.001-7 becomes a
    “2”, then assign 1=“1-3.999” and 2=“4.001-7”. That way, you keep a record of what the “1” and “2” mean.
   How do I decide where to split up the variable? This is a complex question with a complex answer.
   If you are dichotomizing a variable, you can split at the midpoint of the scale from a theoretical point of view,
    because that is conceptually the middle response. Plus, researchers sometimes choose a scale with an odd number
    of points precisely because it has a true midpoint. However, what if your dataset has more subjects at the
    high or low end of the scale? Splitting at the midpoint of the scale might then create a very unequal
    distribution when you dichotomize the variable. What if, for example, splitting at the midpoint of the scale puts
    70-80% of the subjects at one end and 20-30% at the other? You are already losing valuable information by
    reducing a continuous variable to a categorical variable, and if you have unbalanced categories, you are
    losing even more information. In this case, you could choose to split at the median, even if the median is not
    the midpoint of the scale. From a theoretical point of view, the median is a good choice for splitting the
    variable because it is the midpoint of that sample. Samples are not always normally distributed. Research is
    about discovering empirical reality, so sometimes reality dictates how subjects respond to the question, and
    assuming that the midpoint of the scale is the true midpoint of the construct may be inaccurate. Plus, from a
    statistical point of view, the median truly splits the sample into halves. However, what if the median in your
    dataset is a very high or low number on the scale range? For example, what if on a 1-7 point scale the median
    is a 2 or a 6? In this situation, half of the scores are bunched into a small range (e.g., 2 points in this example),
    whereas the other half are spread across a larger range (e.g., 5 points in this example). Once
    again, you are losing valuable information by dichotomizing in this way. In summary, both theoretical and
    statistical considerations matter when dichotomizing variables. One solution is to dichotomize in both ways and
    analyze the data using both variables.
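To see how far apart the midpoint split and the median split can be, here is a small Python sketch using `statistics.median` on invented scores that lean toward the top of a 1-7 scale. It counts how many subjects fall below and above each cut point; scores exactly equal to the cut are left out, matching the small separation used in the recode ranges:

```python
from statistics import median

# Hypothetical responses on a 1-7 scale, skewed toward the high end
scores = [2, 3, 4, 5, 5, 6, 6, 6, 7, 7]

def split_counts(values, cut):
    """Count subjects below and above a cut point; values equal to it are dropped."""
    low = sum(1 for v in values if v < cut)
    high = sum(1 for v in values if v > cut)
    return low, high

midpoint = (1 + 7) / 2          # theoretical middle of the scale: 4.0
sample_median = median(scores)  # empirical middle of this sample: 5.5

print("midpoint", midpoint, split_counts(scores, midpoint))       # midpoint 4.0 (2, 7)
print("median", sample_median, split_counts(scores, sample_median))  # median 5.5 (5, 5)
```

With these numbers the midpoint split yields 2 versus 7 subjects (the subject who answered exactly 4 drops out), while the median split yields 5 versus 5; running both splits, as suggested above, makes the trade-off visible.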
   The same theoretical and practical considerations come into play when you decide to split a variable
    in other ways. You may decide, for example, to cut the continuous variable into thirds, fourths, or fifths.
    Sometimes when you cut the variable into thirds, your new categorical variable includes only the top and
    bottom thirds. Sometimes you are interested only in the more polarized judgments. Sometimes you can
    strengthen the relationship between your variables by including only the polarized judgments. From a
    theoretical point of view, it can make sense to drop the middle third because those subjects are
    somewhat undecided about the construct. Plus, think about why dichotomizing continuous variables results in
    reduced information and reduced statistical power. After you dichotomize, subjects who were near the
    middle of the continuous variable are treated the same as subjects near the top or bottom. On a 0-100 point
    scale, for example, subjects who respond 49 and 51 are treated the same as subjects who respond 0 and
    100, respectively. Thus, you reduce your ability to detect true relationships in the study, because the
    subjects close to the middle may mask relationships among your variables by diluting the strength of
    the high/low categories in the variable. Eliminating the middle third when you cut the continuous variable into
    thirds is one way to create a categorical variable while minimizing your loss of power.
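Keeping only the top and bottom thirds can be sketched by ranking the cases and labeling the lowest third 1, the highest third 2, and leaving the middle third missing (None). The scores are invented, and the sketch assumes the number of cases is a multiple of 3 with no tied scores; with real data, ties at the cut points need a decision of their own:

```python
def polarized_thirds(values):
    """Return 1 for the bottom third, 2 for the top third, None for the middle third.

    Assumes len(values) is a multiple of 3 and the values have no ties, so every
    case falls cleanly into one third (a simplification for illustration).
    """
    n = len(values)
    third = n // 3
    order = sorted(range(n), key=lambda i: values[i])  # case indices, lowest score first
    groups = [None] * n
    for rank, i in enumerate(order):
        if rank < third:
            groups[i] = 1  # bottom third
        elif rank >= n - third:
            groups[i] = 2  # top third
    return groups

# Hypothetical scores on a 0-100 scale
scores = [12, 95, 49, 51, 3, 70, 30, 88, 60]
print(polarized_thirds(scores))  # [1, 2, None, None, 1, 2, 1, 2, None]
```

Here the subjects who answered 49 and 51 land in the dropped middle third instead of being forced into the high and low groups.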

      From a practical point of view, if you are dichotomizing a variable, you don’t truly cut it in half, because if
       you cut a 1-7 point scale into 1-4 and 4-7, for example, a subject who answered “4” is technically in both
       categories. Thus, you typically create a small degree of separation, such as 1-3.999 and 4.001-7, so that a
       subject who answered exactly 4 falls into neither category and is left missing on the new variable.
      When splitting a continuous variable into two groups, another question is whether you want to have
       equal N for just that variable, or equal N across that variable AND another variable. For
       example, I conducted a study about how Republicans and Democrats identify with their political party. Let's say
       I want to dichotomize my measure of “identification”. If you split the identification variable down the middle,
       you might have many more Republicans in the low or high identification group, and vice versa for Democrats.
       On the other hand, you could split the identification variable separately for Republicans and for Democrats,
       and then combine the results, so that you have equal N across both variables. I believe both options are
       defensible. In my opinion, the first option (grand median or midpoint) is the best, because then the high and
       low groups will have equivalent psychological meaning across party affiliation. In other words, “high” and
       “low” mean the same thing for both Republicans and Democrats, even if cell sizes are unequal.
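The two options can be compared with a short Python sketch. The party labels ("R", "D") and identification scores are invented: option 1 splits everyone at the grand median, and option 2 splits each party at its own median:

```python
from statistics import median

# Hypothetical subjects: (party, identification score on a 1-7 scale)
subjects = [("R", 5), ("R", 6), ("R", 7), ("R", 7),
            ("D", 2), ("D", 3), ("D", 4), ("D", 6)]

def high_low(score, cut):
    # "high" above the cut, "low" below it, None if exactly at the cut
    if score > cut:
        return "high"
    if score < cut:
        return "low"
    return None

# Option 1: one grand median for everyone (5.5 here)
grand = median(score for _, score in subjects)
option1 = [(party, high_low(score, grand)) for party, score in subjects]

# Option 2: a separate median within each party (R: 6.5, D: 3.5 here)
party_medians = {p: median(s for q, s in subjects if q == p) for p in ("R", "D")}
option2 = [(party, high_low(score, party_medians[party])) for party, score in subjects]

print(grand, option1)
print(party_medians, option2)
```

Option 1 gives 3 high and 1 low Republicans but 1 high and 3 low Democrats. Option 2 equalizes the cells, but a Democrat who scores 4 becomes “high” while a Republican who scores 6 becomes “low”, which is the loss of equivalent meaning described above.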


7. Creating new variables based upon the combination of two or more variables
    Sometimes you want to create a new variable that is a combination of two or more other variables. For
      example, I conducted a study about how Republicans and Democrats identify with their political party. For each
      subject, I asked for their political party affiliation and how much they identify with that political party.
      Let's say I want a new variable containing only highly identified Republicans and weakly identified Democrats.
      In this case, I want to create a new variable that is a combination of my two questions. Here is another example:
      assume that when I asked my first question about political party affiliation, there were four options:
      Republican, Democrat, other, none. If I wanted to create a new variable of only highly identified Republicans
      and Democrats, I can’t simply cut the “identification” question in half, because the top half will contain more
      than just Democrats and Republicans; it will also contain those who responded “none” or “other”. In this
      situation, I need a way to create a new variable that takes into account different option choices.
    How do I create a new variable based upon the combination of two or more variables? Below, I explain the
      steps for using the “Compute variable” command. However, I want to first explain conceptually what the task
      entails. In essence, we are going to tell SPSS to create a new variable that is labeled as “1” if it satisfies certain
      criteria (such as high on variable 1, but low on variable 2), and then labeled as “2” if it satisfies other criteria
      (such as high on variable 2, but low on variable 1). In other words, we can specify a long combination of
      criteria, and have subjects who meet those criteria labeled as 1 or 2 (or 3 or 4, depending on how many
      categories you want in your new variable). As an example, we could create a new variable that has subjects
      listed as “1” if they are Republicans and high identifiers, and subjects listed as “2” if they are Democrats and
      high identifiers. Thus, the new categorical variable will have two categories that are a combination of my two
      questions about political party affiliation and identification with that political party. You create each new
      category separately. Thus, in our example about creating a new variable that contains only highly identified
      Republicans and highly identified Democrats, we first use the “Compute Variable” command to assign a “1” to
      highly identified Republicans. Then, we repeat the process, using the “Compute Variable” command to
      assign a “2” to highly identified Democrats.
    To create the new variable:
      1. Select Transform --> Compute Variable
      2. Type a name for your new variable in the “Target Variable” box.
      3. In the “Numeric Expression” box, type the number of a category
         (e.g., let’s start by assigning category “1”)
      4. Click the “If” button, and click “Include if case satisfies condition”.
      5. Move the old variable into the open box, and specify the restriction.
         (e.g., if identification is the “identify” variable and political party affiliation is the “party” variable,
         then I need to specify only those subjects who are highly identified (e.g., greater than 4 on the “identify”
         variable) and who are simultaneously Republicans (e.g., Republicans are labeled “1” on the “party”
         variable). So, I would type identify>4 & party=1 into the box.)
      6. Click Continue
      7. Click OK.
   THEN WE REPEAT TO CREATE THE SECOND CATEGORY
    1. Select Transform --> Compute Variable
    2. Type the SAME name in the “Target Variable” box as you did the first time.
    3. In the “Numeric Expression” box, type the number of a category
       (e.g., “2”)
    4. Click the “If” button, and click “Include if case satisfies condition”.
    5. Move the old variable into the open box, and specify the restriction.
       (e.g., if identification is the “identify” variable and political party affiliation is the “party” variable,
       then I need to specify only those subjects who are highly identified (e.g., greater than 4 on the “identify”
       variable) and who are simultaneously DEMOCRATS (e.g., Democrats are labeled “2” on the “party”
       variable). So, I would type identify>4 & party=2 into the box.)
    6. Click Continue
    7. Click OK.
   To summarize: the “Numeric Expression” box holds the number we want to assign in the new category (1 or 2),
    the “If” box specifies the criteria for who is assigned that number, and the “Target Variable” box holds the
    name of the new categorical variable.
   The new variable will appear at the end of the data file. Subjects who meet neither set of criteria are left
    system-missing on the new variable.
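The two Compute Variable passes can be sketched in Python to make the logic explicit. The variable names identify and party, the cutoff identify>4, and the codes party=1 (Republican) and party=2 (Democrat) come from the example; the target name high_id and the subject records are hypothetical. Each pass changes only the cases that satisfy its condition, and cases that meet neither condition stay None, which plays the role of system-missing:

```python
# Hypothetical subjects; party codes follow the example: 1 = Republican, 2 = Democrat,
# and any other code stands for "other" or "none"
subjects = [
    {"identify": 6, "party": 1},
    {"identify": 3, "party": 1},
    {"identify": 5, "party": 2},
    {"identify": 7, "party": 3},
    {"identify": 4, "party": 2},
]

# New target variable starts out missing for everyone
for s in subjects:
    s["high_id"] = None

# First pass: Numeric Expression 1, If identify>4 & party=1
for s in subjects:
    if s["identify"] > 4 and s["party"] == 1:
        s["high_id"] = 1

# Second pass: same Target Variable, Numeric Expression 2, If identify>4 & party=2
for s in subjects:
    if s["identify"] > 4 and s["party"] == 2:
        s["high_id"] = 2

print([s["high_id"] for s in subjects])  # [1, None, 2, None, None]
```

The fourth subject (party code 3) is highly identified but stays missing, which is exactly why cutting the identification question in half alone was not enough; the last subject scores exactly 4, so identify>4 excludes them.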



