
									                                                     UNICEF Workshop on Global Study
                                                             18th to 28th August 2008

          Introduction to SPSS 15 and Exemplar Datasets

                            Tuesday 19 August 2008

This session covers:
   •   A brief introduction to SPSS
   •   An introduction to the DHS datasets
   •   Transforming, recoding and labeling variables
   •   Running frequencies and cross-tabulations


   -   Open the SPSS sample data file ‘Ghana sample data – session1’. You can
       download this data from the course website at:
       www.southampton.ac.uk/socsci/ghp3/course/material.html
       Note that SPSS uses different file extensions (the letters after the dot) for each
       file type: a working data file is ‘.sav’, an output (results) file is ‘.spo’ and a
       syntax file is ‘.sps’.


SPSS File Types – Database Files (.sav)

A database file is like an Excel file: data are entered and managed in the database
file. There are two types of view: the ‘Data View’ and the ‘Variable View’.

Centre for Global Health, Population, Poverty and Policy (GHP3)                               1
Variable View – Database Files (.sav)

Variable characteristics are entered and managed in the ‘Variable View’.

SPSS File Types – Output Files (.spo)

Results (and errors) are displayed in the output files. Tables can be copied from here
into Excel or PowerPoint to create charts.

SPSS File Types – Syntax Files (.sps)

Commands can be written or pasted into syntax files. This allows analyses to be repeated
or amended much more quickly. Syntax files also provide a history of the analyses, which
is useful when many analyses are being performed or when there is a long break between
analyses.

The DHS Datasets
DHS datasets: Registration (through MEASURE DHS at www.measuredhs.com) is required for
access to DHS data. Most survey datasets are available in ASCII as well as in SPSS, SAS
and Stata system formats for the following files:
   •   Individual (women) file
   •   Household file
   •   Male recode
   •   Couple recode
   •   Births/child recode
   •   Household member recode
Other special datasets, including GIS and HIV data, are available for selected countries.
To view the available datasets:
http://www.measuredhs.com/accesssurveys/search/start.cfm


Guide to DHS Statistics: A reference guide is available to help users who work with
DHS survey indicators and datasets to better understand indicator definitions and the
calculations used to generate the data.
http://www.measuredhs.com/pubs/pdf/DHSG1/Guide_DHS_Statistics.pdf




DHS recode manual: DHS datasets contain standardized data variables. The DHS
Recode Manual provides a reference guide for understanding and working with these
variables. The recode manuals are available at:
http://www.measuredhs.com/pubs/pub_details.cfm?ID=739




This session
This introductory practice session is based on an extract from the Ghana 2003 DHS
Individual (woman-level) recode data file. We shall use the sample data to practise how to
transform/recode and label variables, and to carry out descriptive analysis involving
frequency distributions and cross-tabulation. (Note that later computing sessions will be
based on the child-level file, with child well-being/poverty indicators such as
malnutrition.)


Sample research questions for this session:
   • What proportion of women in Ghana had a given number of births in the five years
      before the 2003 Ghana DHS?
   • What is the distribution of the number of births in the past five years by household
      wealth index?
   • How is household wealth related to child mortality experience?


The analysis will be carried out in the following steps:
   a) First, obtain the frequency distribution of the number of births in the past five
       years. This will be useful for determining how best to classify the number of
       births for subsequent cross-tabulation analysis
   b) Then, recode the number of births into suitable categories and label the new
       variable and its categories
   c) Finally, obtain cross-tabulations of the number of births in the past five years and
       child mortality experience by wealth index to establish:
           (i) the distribution of the number of births in the past five years by
               household wealth index; and
           (ii) the relationship between household wealth and child mortality
               experience





(a) Frequency distribution of births in the past five years

Left-click on the ‘Analyze’ menu on the toolbar. This will reveal a list of possible
commands.
   •   Rest the cursor on the ‘Descriptive Statistics’ option.
   •   This will bring up a smaller list of options; click on ‘Frequencies’.

This will bring up the ‘Frequencies’ dialogue box. Select the variable you wish to explore
from the list in the left-hand box (e.g. select ‘v208’). Then click the arrow in the middle
to move the variable into the ‘Variable(s)’ box on the right-hand side.



Frequency distribution

Finally, click OK to run the analysis, OR click Paste to copy the syntax into a syntax
file.

SPSS Output

                    Births in last five years

                  Frequency   Percent   Valid Percent   Cumulative Percent
 Valid    0            2914      51.2            51.2                 51.2
          1            1812      31.8            31.8                 83.0
          2             875      15.4            15.4                 98.4
          3              79       1.4             1.4                 99.8
          4              10        .2              .2                100.0
          5               1        .0              .0                100.0
          Total        5691     100.0           100.0
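The Percent and Cumulative Percent columns are simple arithmetic on the frequencies. As an illustration outside SPSS, the minimal Python sketch below rebuilds them from the counts in the table above. It assumes there are no missing values, so Percent and Valid Percent are identical; the function name `frequency_table` is illustrative only.

```python
# Frequencies of v208 (births in last five years), copied from the SPSS output above
V208_COUNTS = {0: 2914, 1: 1812, 2: 875, 3: 79, 4: 10, 5: 1}


def frequency_table(counts):
    """Return (rows, total), where each row is (value, frequency, percent,
    cumulative percent), with percentages rounded to one decimal place.
    Assumes no missing values, so Percent equals Valid Percent."""
    total = sum(counts.values())
    rows = []
    running_total = 0
    for value in sorted(counts):
        frequency = counts[value]
        running_total += frequency
        percent = round(100 * frequency / total, 1)
        cumulative = round(100 * running_total / total, 1)
        rows.append((value, frequency, percent, cumulative))
    return rows, total


rows, total = frequency_table(V208_COUNTS)
for value, frequency, percent, cumulative in rows:
    print(f"{value:>5} {frequency:>9} {percent:>8.1f} {cumulative:>10.1f}")
print(f"{'Total':>5} {total:>9} {100.0:>8.1f}")
```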


Note: Only 90 women (about 1.6%) had more than two births in the past five years. Hence,
we will classify the number of births into three categories, ‘none’, ‘one’ and ‘two or
more’, for meaningful cross-tabulation analysis.




(b) Transform and recode variables

Follow the steps described below to create a new variable named ‘births5’ with three
categories: ‘none’, ‘one’ or ‘two or more’ births in the past five years. This new variable
will have three valid values (0, 1 and 2).



Recoding variables

Click on ‘Transform’ on the toolbar, then scroll down to ‘Recode into Different
Variables’.

 • Ask yourself what would happen if you had selected ‘Recode into Same Variables’.
 • Advice: ‘play safe’ when recoding. Always archive the original data and check
    frequencies before and after recoding.

Recoding variables

Click on the variable you wish to recode in the variable list on the left-hand side (here
‘v208’). Then click on the arrow button to transfer ‘v208’ to the ‘input variable’ box.

Type a name for the new variable, ‘births5’, in the ‘output variable’ box. Then label the
output variable, i.e. ‘births in past 5 years recoded’.

After entering and labeling the output variable, click on ‘Change’. Then click on ‘Old and
New Values’.


More on transform and recode

The ‘Old Value’ options are on the left-hand side of the screen and the ‘New Value’
options are on the right-hand side. Enter ‘0’ as the old value and ‘0’ as the new value,
then click on ‘Add’; do the same for ‘1’.

For the last category, click on ‘Range’ and enter 2 through 5. Then click on the ‘New
Value’ box (right-hand side of the screen), enter ‘2’ and click on ‘Add’.

Each instruction is pasted into the ‘Old --> New’ window. Then click on ‘Continue’.

Click on ‘OK’ or ‘Paste’ to create the new variable.
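The three recoding rules entered in the Old and New Values dialogue can be written out as a small lookup. The Python sketch below is an illustration only, not SPSS code; it assumes that any value not covered by a rule (outside 0 to 5) should end up missing, represented here by `None`. The sample values in `v208` are hypothetical.

```python
from typing import List, Optional, Tuple

# (low, high) ranges of old values, and the new value each range maps to
RECODE_RULES: List[Tuple[Tuple[int, int], int]] = [
    ((0, 0), 0),  # 0 births           -> 0 'none'
    ((1, 1), 1),  # 1 birth            -> 1 'one'
    ((2, 5), 2),  # 2 through 5 births -> 2 'two or more'
]


def recode_births5(v208: int) -> Optional[int]:
    """Apply the rules in order; a value no rule covers becomes None (missing)."""
    for (low, high), new_value in RECODE_RULES:
        if low <= v208 <= high:
            return new_value
    return None


# Hypothetical values of v208 for six women
v208 = [0, 2, 1, 5, 3, 0]
# Recoding into a *different* variable: the v208 list itself is left unchanged
births5 = [recode_births5(value) for value in v208]
print(births5)
```

Keeping `v208` intact is the point of recoding into a different variable: if a rule is entered wrongly, the original values are still there to check against.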






Defining value labels for the newly created variable

Go to ‘Variable View’. Click on the box with three dots to define value labels for the
newly created variable ‘births5’.

Type the value and label for each of the three categories (e.g. label ‘2’ as ‘two or more’
after labeling ‘0’ and ‘1’), clicking on ‘Add’ to transfer each label to the box. Finally,
click on ‘OK’ to create the labels.




Checking your creation

•   Go to your Data View window and scroll across to the last column.
•   Then go to your Variable View window and check the last variable listed.
•   Run frequency distributions for ‘v208’ and ‘births5’.
•   Do they agree?
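If the recode worked, the frequencies for ‘births5’ should equal the ‘v208’ frequencies collapsed by the same rules. Below is a minimal Python sketch of that check, using the v208 counts from the earlier output. `VALUE_LABELS` mirrors the labels defined in Variable View, and the function name is illustrative. The expected births5 frequencies are 2914, 1812 and 965 (875 + 79 + 10 + 1).

```python
from collections import Counter

V208_COUNTS = {0: 2914, 1: 1812, 2: 875, 3: 79, 4: 10, 5: 1}
VALUE_LABELS = {0: "none", 1: "one", 2: "two or more"}


def expected_births5_counts(v208_counts):
    """Collapse v208 frequencies with the births5 rules (0->0, 1->1, 2-5->2)."""
    expected = Counter()
    for value, frequency in v208_counts.items():
        if value in (0, 1):
            expected[value] += frequency
        elif 2 <= value <= 5:
            expected[2] += frequency
    return expected


expected = expected_births5_counts(V208_COUNTS)
for code, label in VALUE_LABELS.items():
    print(f"{code}  {label:<12} {expected[code]:>5}")

# Nothing should be lost: both variables must have the same total
assert sum(expected.values()) == sum(V208_COUNTS.values())
```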


We want to see whether the number of births in the past five years varies by wealth index.
We are going to use the ‘Crosstabs’ command to obtain a cross-tabulation of ‘births5’ by
wealth index (v190). The first variable defines the rows of the table and the second
variable defines the columns.


Here are the commands: Analyze – Descriptive Statistics – Crosstabs
(This takes you to the Crosstabs window):
    -   Transfer ‘births5’ to the Row(s) box and ‘v190’ to the Column(s) box.
    -   Click on the ‘Cells’ button and tick ‘Row’ percentages, then click ‘Continue’.
    -   Ignore the other buttons for the time being (Exact, Statistics etc.) and produce the
        output by clicking ‘OK’, or click ‘Paste’ and then run the syntax.
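Row percentages divide each cell count by the total of its row, so each row of the table sums to 100. Here is a minimal Python sketch of that calculation, assuming each case is given as a (births5, v190) pair. The cases below are made up for illustration and do not come from the Ghana file.

```python
from collections import Counter, defaultdict


def row_percentages(cases):
    """cases: iterable of (row_value, column_value) pairs, one per case.
    Returns {row_value: {column_value: percent of that row}}."""
    counts = defaultdict(Counter)
    for row_value, column_value in cases:
        counts[row_value][column_value] += 1
    table = {}
    for row_value, row_counts in counts.items():
        row_total = sum(row_counts.values())
        table[row_value] = {
            column_value: 100 * n / row_total
            for column_value, n in row_counts.items()
        }
    return table


# Hypothetical (births5, v190 wealth quintile) pairs
cases = [(0, 1), (0, 1), (1, 1), (2, 1), (0, 5), (0, 5), (0, 5), (1, 5)]
for births, row in sorted(row_percentages(cases).items()):
    print(births, {q: round(p, 1) for q, p in sorted(row.items())})
```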


What can you say about the relationship between the number of births in the past five
years and household wealth?


Task: Carry out a cross-tabulation analysis to explore the relationship between wealth
index and experience of child deaths. (Note: it is advisable to recode the number of
children who have died (‘deaths’) into appropriate categories before running the
cross-tabulation.)


NB: The table of percentages can be copied into Excel to produce graphs.



MICS Data
The next part of the workshop uses the other exemplar dataset, which will be used
throughout the workshop: the Multiple Indicator Cluster Survey (MICS3) from Tajikistan.




Please download the Tajik MICS file from the course website at
www.southampton.ac.uk/socsci/ghp3/course/material.html

The MICS is a household survey programme developed by UNICEF to assist countries in
filling data gaps for monitoring the situation of children and women. It is capable of
producing statistically sound, internationally comparable estimates of these indicators.
The latest round of the MICS (MICS3) collects data on 21 of the 48 Millennium Development
Goal indicators, making the MICS the largest single source of data for MDG monitoring.


The MICS survey consists of three questionnaires – a household questionnaire, a
questionnaire for women aged 15 to 49 and a questionnaire for children under the age of
5.


The data that you have for Tajikistan is from 2005. The dataset is a combination of
different standard files, with a row for each person living in the sampled households.


The aim of this section of the workshop is to give you further experience of working with
SPSS and for you to become acquainted with the MICS data from Tajikistan.


Filtering and Weighting Data
The MICS dataset that you have opened contains one line for each person in the selected
households. However, this can lead to problems with analysis and interpretation. This
section explains how to analyse data that are organised in this manner.


One of the variables included in the dataset is the ‘Mother Tongue of the Head’. Obtain
the frequencies for each of the different languages (Hint: Go to Analyze – Descriptive
Statistics – Frequencies). How many people live in the selected households?


You will see that the table indicates that 28,053 people, or 69.5%, have a head of
household who speaks Tajik. However, this is not the percentage of households with a
Tajik-speaking head! There is only one head of each household, but the dataset includes a
line for each resident of the household, with the mother tongue of the head replicated on
each one. Thus households with many residents will be over-represented in these results.
For example, take two households. The first has one resident, so counts once in the table.
The second household has 10 people living there, each with the head’s mother tongue
recorded. Each of these 10 residents is counted in the table, even though there is only
one head of the household. Therefore we need to filter the data so that only one person
from each household is represented in the table.


To do this we will use ‘Line Number’. This is a number given to each member of the
household. All households therefore have a person who is designated as line number ‘1’.
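The two-household example above, and the fix of keeping only line number 1, can be reproduced in a few lines of Python. This is a sketch with made-up data: each row is a hypothetical (household id, line number, mother tongue of head) tuple, and the languages are chosen only for illustration.

```python
from collections import Counter

# Household 1 has one resident; household 2 has ten residents, and the
# head's mother tongue is repeated on each of their rows
people = [(1, 1, "Uzbek")] + [(2, line, "Tajik") for line in range(1, 11)]


def percent_tajik(rows):
    """Percentage of rows on which the head's mother tongue is Tajik."""
    counts = Counter(tongue for _, _, tongue in rows)
    return 100 * counts["Tajik"] / sum(counts.values())


# Person level: the large household is counted ten times
print(f"All rows:           {percent_tajik(people):.1f}% Tajik")

# Household level: keep only line number 1 (one row per household)
heads = [row for row in people if row[1] == 1]
print(f"Line number 1 only: {percent_tajik(heads):.1f}% Tajik")
```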


Click on ‘Data’, then click on ‘Select Cases…’.

The ‘Select Cases’ dialogue box will appear.








Click on the radio button beside ‘If condition is satisfied’, then click on ‘If…’.


A further box will appear. We need to select those records which have a line number of 1.
Do this by completing the box as follows:




Find line number in the variable list on the left and click the ► button to move it to the
right-hand side, then complete the condition so that it reads line number = 1.





Click ‘Continue’ and then ‘OK’.


Re-run your frequencies of ‘Mother Tongue of Head’. What do you notice? What changes
have happened?


However, these are not the final results either. Because of the way households were
selected into the sample, not every household had an equal chance of selection. To
counteract this, survey weights should be applied in the analysis. The survey weight for
households (i.e. only to be used when we are dealing with households) is called hhweight.


First, it is useful to study this variable further. Go to Analyze – Descriptive Statistics –
Descriptives.

Find hhweight in the box on the left and transfer it to the right using the ► button.

Click ‘OK’ and study the results in the Output window.


The Descriptives command gives the minimum, maximum, mean and standard deviation of the
selected variable (you can change these options using the ‘Options…’ button).


You will see that the mean for this variable is ‘1.000’. This is exactly what we need – the
mean of the weight variable should usually be 1. So now we should apply this weight to
work out the percentage of households with different languages for the head of the
household.
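Applying a weight means each household contributes its weight, rather than a count of 1, to the table. Here is a minimal Python sketch of a weighted percentage, together with the mean-of-weights check described above. The hhweight values and languages are invented for illustration, and the data are assumed to be already filtered to line number 1.

```python
from collections import defaultdict

# Hypothetical household heads (line number 1 only) with made-up hhweight values
households = [("Tajik", 1.2), ("Tajik", 0.9), ("Uzbek", 0.6), ("Russian", 1.3)]


def weighted_percentages(records):
    """records: iterable of (category, weight) pairs.
    Returns {category: percent}, i.e. the sum of weights in each category
    divided by the sum of all weights."""
    weight_sums = defaultdict(float)
    for category, weight in records:
        weight_sums[category] += weight
    total_weight = sum(weight_sums.values())
    return {category: 100 * w / total_weight for category, w in weight_sums.items()}


weights = [weight for _, weight in households]
mean_weight = sum(weights) / len(weights)
# A mean of 1 keeps the weighted total equal to the number of households
print(f"Mean weight: {mean_weight:.3f}")

unweighted = 100 * sum(1 for tongue, _ in households if tongue == "Tajik") / len(households)
weighted = weighted_percentages(households)["Tajik"]
print(f"Tajik-speaking heads: {unweighted:.1f}% unweighted, {weighted:.1f}% weighted")
```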







Click on ‘Data’, then click on ‘Weight Cases…’.

In the next dialogue box click on ‘Weight cases by’. Then find hhweight in the left-hand
box and transfer it to the right-hand side using the ► button.

Click ‘OK’.


You will notice that ‘Filter On’ and ‘Weight On’ are now displayed at the bottom right-hand
side of the data window. You should get into the habit of checking this area to ensure
that a filter or weight is on when you need it, and that it is off when you don’t!


Obtain the frequencies for the Mother Tongue of the Household Head again. What has
changed?




Now the percentage of households with a Tajik speaker as the head is 71.1%. This has
changed from the original 69.5% before filtering and weighting. This is not a big change,
but may be important in final analyses. If possible, you should always weight your
analyses.


Cross-Tabulations and Obtaining Proportions
You have conducted cross-tabulations previously on the DHS dataset. In this section you
will conduct some more cross-tabulations, but this time in order to get proportions
within different categories.


You have already been introduced to wealth quintiles. For this exercise you will see
within each wealth quintile the ownership of certain assets. For example, what is the
proportion of those who are in the poorest wealth quintile who own a phone, or a
refrigerator, or an animal drawn cart, and how does this compare to the richest quintile?


To conduct the cross-tabulation:
   -   Go to Analyze – Descriptive Statistics – Crosstabs…
   -   Place ‘Wealth Index Quintiles’ into the Row(s) box (N.B. This variable is at the
       bottom of the list of variables!)
   -   Find the variables ‘Mobile Phone’, ‘Non-Mobile Phone’, ‘Refrigerator’ and
       ‘Animal Drawn Cart’ and place into the Column(s) box in the usual manner.

   -   Click on ‘Cells…’ and the Cell Display box will be presented. Click on ‘Rows’. This
       will give us the row percentages, which we want! Click ‘Continue’, and then ‘OK’.




In the output window there should be four tables. Look through each of them to see the
percentage of households in each wealth quintile with the different assets.


You will see that few households in the poorest quintiles have mobile phones or
non-mobile phones, and it is only in the richest quintile that households own them in any
great number – the disparities between rich and poor are great. Refrigerator ownership is
low in the bottom quintiles, but begins to rise in the middle quintile. The animal-drawn
cart shows the opposite pattern: the rich hardly own this asset, yet almost a quarter
(24%) of households in the poorest quintile have one.


There is a different method to calculate these proportions/percentages, which involves
recoding the data. This may seem unnecessary at the moment, but the skills involved are
important for future workshops, even though we will get the same answers as above!


First of all, we need to recode the asset variables so that the variables are coded 0 (for
non-ownership of the asset) and 1 (for ownership). At the moment they are not coded in
this way – those that do not own the asset are coded as a ‘2’.


   -   Go to Transform – Recode – Into Same Variables
   -   Please note that as stated above it is standard to recode into different variables so
       that the original is not lost. However, in this case it is fine to recode into the same
       variable.




Transfer the four assets needing to be recoded into the right-hand box.




   -   Click on ‘Old and New Values’.
   -   All those that do not have the asset (currently coded as ‘2’) need to be recoded
       as ‘0’. Enter ‘2’ as the old value and ‘0’ as the new value, click ‘Add’, then
       ‘Continue’, and then ‘OK’.
   -   The four assets will now be coded as ‘0’ and ‘1’. Check this using the
       Frequencies command.


Now we can work out the mean of each of these variables. To do this we will use the
Means command in SPSS.
   -   Go to Analyze – Compare Means – Means…
   -   Transfer the four assets to the ‘Dependent List’ box.
   -   Put the Wealth Index into the ‘Independent List’ box and click OK.



You will have one table with all the variables in it. Because each asset is now coded 0/1,
its mean is the proportion of households owning the asset, so the mean for each wealth
quintile should equal the row percentage seen in the cross-tabulations above divided by
100. Using this method it is simple to compare between quintiles and between assets,
something which was more difficult to do using the Crosstabs command.
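The two methods agree because, for a variable coded 0 and 1, the mean is the number of 1s divided by the number of cases, which is exactly the proportion owning the asset. Here is a minimal Python sketch with invented data, assuming the original coding described above (1 = owns, 2 = does not own). Treating any other code as missing is an assumption made here, not a rule taken from the dataset.

```python
from collections import defaultdict


def recode_asset(code):
    """1 (owns) -> 1, 2 (does not own) -> 0; any other code -> None (missing)."""
    return {1: 1, 2: 0}.get(code)


def mean_by_group(rows):
    """rows: iterable of (wealth quintile, 0/1 value or None).
    Returns {quintile: mean}, skipping missing values."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for quintile, value in rows:
        if value is None:
            continue
        sums[quintile] += value
        counts[quintile] += 1
    return {quintile: sums[quintile] / counts[quintile] for quintile in counts}


# Hypothetical households: (wealth quintile, original refrigerator code)
raw = [(1, 2), (1, 2), (1, 1), (1, 2), (5, 1), (5, 1), (5, 2), (5, 1)]
recoded = [(quintile, recode_asset(code)) for quintile, code in raw]

for quintile, mean in sorted(mean_by_group(recoded).items()):
    # mean x 100 is the row percentage owning the asset in that quintile
    print(f"Quintile {quintile}: mean {mean:.2f} = {100 * mean:.0f}% own a refrigerator")
```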


Exercises
A number of participants already have experience of SPSS, and for them the worksheet above
will be elementary. The exercises below are designed both to ensure that the techniques
above are practised and to give those with more experience of SPSS the chance to apply
their skills.


The three exercises all focus on wealth quintiles and analysing differentials between
wealth quintiles:
    1. Study the relationship between wealth and construction type – floors, roof and
        walls – by household. What trends do you observe?
                •   Remember to select the correct sample – only one person from each
                    household is needed in the analysis.
                •   Remember to weight the data.
    2. For this exercise, use all adults aged over 21. What is the relationship between
        ‘Highest Level of School Attended’ and wealth?
                •   You will need to recode ‘Highest Level of School Attended’. Recode those
                    missing into one category, and combine the other categories into
                    ‘Primary’, ‘Secondary’ and ‘Higher’. You should therefore have four
                    categories.
                •   There is no weight for household members, so do not use one in this
                    analysis.
    3. The final exercise studies women. Again, what is the relationship between wealth
        and highest level of school?
                •   Select the women in the survey who completed the women’s interview.
                •   Recode education in the same way as in question 2.
                •   Weight the data using ‘wmweight’. Check that you have selected the
                    correct sample before you apply this weight.


Monica A. Magadi, Andrew ‘Amos’ Channon, August 2008.

