					   Introduction to Housing and the Local Environment Using SPSS 16

There are four sections in this worksheet. Work through the sections that interest
you, bearing in mind your level of experience with using large-scale survey
documentation and data, Nesstar and SPSS 16. See below for recommendations
about which sections to do.

    1. Exploring documentation online
           -Uses the English Housing Survey (EHS) 2008-2009 household data
           -For those unfamiliar with the EHS documentation

    2. Using Nesstar to explore data online
           -Uses the Scottish Social Attitudes Survey 2009
           -For those unfamiliar with Nesstar

    3. Working with data using SPSS 16
          -Uses the EHS 2008-2009 household data
          -Involves: weighting the data, running frequency tables and cross-tabs,
          and recoding data.
          -Suitable for basic SPSS users.

    4. Linking EHS datasets in SPSS 16
           -Uses the EHS 2008-2009 household data
           -Suitable for those with some knowledge of SPSS (or who have done
           Section 3 of this worksheet).

The English Housing Survey
The English Housing Survey (EHS) is a continuous national survey commissioned by
the Department for Communities and Local Government (DCLG) from 2008 onwards
that collects information about people's housing circumstances and the condition
and energy efficiency of housing in England. It replaces the Survey of English
Housing (SEH) and the English House Condition Survey (EHCS).

We use documentation and data from the English Housing Survey 2008-2009,
household data in Sections 1, 3 and 4 of this worksheet.

The Scottish Social Attitudes Survey
The Scottish Social Attitudes (SSA) survey has been designed as an annual Scottish
sister survey to the British Social Attitudes survey. Like the British Social Attitudes
series, the survey aims to chart and interpret attitudes on a range of social, political,
economic and moral issues.

Section 2 of this worksheet uses the Scottish Social Attitudes Survey 2009.

1. Exploring documentation online

This section examines the documentation and other useful features on the ESDS
Government website for the English Housing Survey 2008-2009.

In order to be a critical researcher it is important to understand where the data
come from. The documentation that comes with the dataset will help a little, but it
will also help to look at how the original dataset was created. Each ESDS dataset has
a UK Data Archive catalogue entry and may also have a fuller web page. Through
these, you can access data and the documentation.

1.1 Finding the web-pages on the ESDS site
Start by opening the main ESDS page. The easiest way to access
the pages about ESDS Government data from this page is to follow the major studies
link on the left of the page – see below:

This leads you to a list of the main ESDS Government datasets. Choose the link for
the data you are interested in: here, the English Housing Survey. Then follow the link for
the English Housing Survey 2008-2009: Household Data – see below:

1.2 Survey summary information and documentation page
You should open a page that looks like this:


ESDS provides a page of basic information for each survey with the same format.
The information on this page includes the Principal Investigator(s), Main Topics,
Coverage and Methodology as well as lots of other information. Scroll down the EHS
2008-2009 page to see what is available.

At the bottom of the page there are links to user guide(s) and/or questionnaire(s)
and other information provided by the data supplier. These give more detailed
information you need to understand the data: for example about each of the
variables, the questionnaires used, how the sampling was done and which weights to use.

If you are registered with ESDS, you can click on the link near the
top on the right of this page to download the EHS 2008-2009 (but don’t do this now
as we will provide you with the data to use in Sections 3 and 4).

1.3 Information about survey series and variable search and other useful links
In addition, surveys supported by ESDS Government - the large-scale government-
commissioned surveys - have a survey series page with links that give advice about
FAQs (Frequently Asked Questions) for that survey and Starting Analysis, links to
publications using that survey and a Variables Search facility.

From the EHS 2008-2009 page you are on now, scroll to the top and follow the link
to the ESDS Government English Housing web-pages in the blue box on the right of
the page. You should reach this page:

Note the useful links on the left including:
    Datasets: links to the EHS datasets
    Starting Analysis: advice on using the EHS for analysis
    Variables Search: allows users to do a word search for variables in all EHS
       datasets or in all the ESDS Government datasets.
    FAQs: Frequently asked questions about the EHS

To move back to the EHS 2008-2009 page, either click the back button in your
browser or you can use the Datasets link.

1.4 Exercise
Now using the pages you have just looked at, find the following information about
the EHS 2008-2009 household survey:

Look at the EHS 2008-2009: Household data page:
Who are the Principal Investigator(s) and Data Collector(s) for this survey?

What geographical area variables are included in the data? (look under Coverage)

What was the sample size in 2008-2009 for the household data (look at the
Methodology section)?

Go to the documentation at the bottom of this page, open the user guide and search
it – you can do a word search either using control-F or by using the search box
provided. Then:

How many questions are there about tenure? What are the names of the variables?
And for household composition?

Search for ‘derived file’ and read what the user guide says about these files. What
are the names of the two files which contain the most commonly used derived variables?

What is the name of the household weight variable?

Look at the general EHS survey series page.
What do the FAQs say about how data about properties are collected?

Use the Variables Search link on the left of the webpage (and click the EHS box by the
search box to search only the EHS).
Are there any questions in the EHS about the following? (At the start of each search,
click on your browser's back arrow to return to the search page that allows you to
search only the EHS.)
- internet use in the household?
- toilets?
- disability?

2. Using Nesstar to explore data online

This section demonstrates Nesstar, an online data discovery and exploration tool
used by a number of data archives including the UK Data Archive.

2.1 Accessing the survey data online (freely without registering)
Nesstar allows you to explore the data online before registering. It is used to provide
access to some datasets but not the English Housing Survey. This section uses the
data from the Scottish Social Attitudes (SSA) Survey 2009.

First find the ESDS web-pages for the Scottish Social Attitudes Surveys from the ESDS
home page by following the Major Studies link on the left of the
page and finding the Scottish Social Attitudes Survey in the list of Government
Surveys. This should lead you to the list of SSA surveys.

Open Nesstar by clicking on the Nesstar icon, found either from the
webpage with the list of Scottish Social Attitudes Surveys (click on the link by the SSA
2009) or at the top right of the SSA 2009 survey webpage shown below:

Survey list:
Survey page:

When Nesstar opens, you should see the following page:

The left window contains a list of surveys, and the right window contains general
information about the Scottish Social Attitudes Survey.

2.2 Variable information in Nesstar

In the left window, clicking on the expand sign next to an item opens a new list. For
example, if you click on the sign by Variable Description under the Scottish Social
Attitudes Survey, 2009, you will see a list of variable categories for the SSA 2009.

Click on Sustainable Places to find a list of statements or questions.

Click on the first in the list: ‘Things people might think might make somewhere a
good place to live: choose one Q235’. You will see information about that variable in
the right hand window as shown below:

2.3 Further things to do in Nesstar once registered with ESDS
Nesstar can run cross-tabulations of variables and conduct simple analyses. To use
these features, you must be registered with ESDS. For this workshop, we have
provided you with a temporary username and password which you will be asked for
when you try to add variables to the table.

Click on the ‘TABULATION’ tab at the top of the screen. To populate the table, click
on a variable in the left window and choose ‘add to row’ or ‘add to column’. For
example, you can click on ‘Things people might think might make somewhere a good
place to live: choose one Q235’ and select ‘Add to row’.

You will be asked for your username and password at this point. Log in via the ‘UK
Federation’ and then select ‘UK Data Archive’ from the list in the drop-down box to
use the temporary login we have provided for this workshop. (In real life, you will
need your own personal login which you obtain by registering with ESDS).

When you are logged in, the table will appear with the variable you selected in it.

Try adding the ‘Sex of respondent’ as a column. You can find this variable in the
Household Grid list of variables. This is what you should see:

Do men and women have the same ideas about what makes a place good to live in?
Are these results what you would expect to see?

Using weights in Nesstar
Along the top on the right are a number of icons. If you hover over each with the
mouse, you can see what they are for. These icons allow you to apply a weight from
the data set to a tabulation or analysis, to filter the data, to send the tabulation to an
Excel spreadsheet and to create a graph of the data, amongst others.

The frequencies displayed in the table you created are the raw statistics for these
variables but the SSA should be weighted to ensure the results are unbiased.

Click on the scales symbol on the top right hand side of the screen.

This brings up a screen which lists all weighting variables associated with the dataset.
Select Interview weight and move it to the right hand window using the arrow, and
press OK. Note that the words “Weight is on” now appear in red under the table.

Can you spot any differences between the percentages given in this table and those
you produced earlier without weights?

Now that the data are weighted, you are able to report the percentages (and the
counts on which they were based) in your research.

Exporting data to Excel
You can also easily export the table to Excel:

        Click on the Excel icon on the top right hand side of the screen.
        Click on Open to open the file – or click on Save, and then save the file with
         an appropriate name to an appropriate location.

You can also amend the Excel file to produce a suitable table or chart for inclusion in
your research.

Graphs in Nesstar

You can produce graphs in Nesstar by clicking on the chart button and choosing
from the five types of chart to produce a graph of the information in your table (see
the screenshot below for an example of a graph in Nesstar).

Downloading data or a subset of the data from Nesstar
If you are a registered user you are able to download all, or a selected subset of, the
data. Nesstar can save data in formats suitable for SPSS, STATA, SAS, Statistica, DIF
(suitable for use in Excel), Dbase and NSDStat. You could start to do this by
clicking on the disk icon, but don’t do this now as we are going to use the
English Housing Survey for the exercises.

3. Working with data in SPSS

This section uses household data from the English Housing Survey (EHS) 2008-2009
and involves:
     creating a frequency table
     weighting the data
     creating a cross-tabulation

Follow the instructions to see what SPSS 16 looks like and use the drop-down menus
to look at descriptions of the data and to do some simple analyses.

3.1 About SPSS 16
Open SPSS version 16. Then open the data (using the standard windows open
button): the file we will use is called all_derived.sav.

There are two ways to view the data: in the Data View or the Variable View tabs.
The tabs to switch between the two are at the bottom left of the screen.

Ensure the Variable View tab (bottom left of screen) is clicked. In Variable View:
    Each row represents something that varies between respondents (known as a
       variable) and each column provides information about the variable including
       the name, label and coding information in the Values column.

Click on the Data View tab (bottom left of screen). In Data View:
     Each column represents a variable in the survey. This is often a response to a
        question or derived from answers to a question or several questions.
     Each row represents an individual respondent (in the EHS, each row
        represents a household)
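
The same information can also be reached through SPSS syntax rather than the two
view tabs. The lines below are a minimal sketch: the drive letter F: is an assumption
(borrowed from the syntax in Section 4), so change the path to wherever
all_derived.sav is held on your machine. The variable names are the ones used later
in this section.

```spss
* Open the data file (the F: path is an assumption, adjust as needed).
GET FILE='F:\all_derived.sav'.

* List the name, label and value labels for selected variables.
* This shows the same information as the Variable View tab.
DISPLAY DICTIONARY
  /VARIABLES=bedstdx sexhrp agehrp2x.
```

The dictionary listing appears in the Output window rather than in the Data Editor.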

In both views, you can use the drop-down menus at the top of the page (i.e. File,
Edit, View, Data, Analyze, Graphs etc.) to manipulate the data and to do analyses.
The menus used in this section are:
     Analyze: Descriptive Statistics: Frequencies…
     Data: Weight cases…
     Analyze: Descriptive Statistics: Crosstabs…

When you conduct an analysis, you see the results in the Output window. You can move
between the Output window and the data in Data View or Variable View windows,
by clicking on the tab in the Taskbar (the bar that is always at the bottom of the
screen).

3.2 To create a one-way frequency table of a variable
From the menu bar at the top of the page, use:
Analyze> Descriptive Statistics> Frequencies

We will look at a measure of occupational density or overcrowding: the bedroom
standard. The term is explained below, from the Communities and Local
Government (CLG) website:

Use the Analyze drop-down menu to open the Frequencies dialogue box. When you
get to the Frequencies screen: Select the variable for Bedroom Standard: bedstdx
from the list and move it into the right hand box. See below for how this should
look. When you have finished, click on ‘OK’.
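
If you prefer syntax, the menu steps above should correspond roughly to the single
command below. This is a sketch of the equivalent command, not the exact text that
the Paste button in the dialogue box would produce (pasted syntax usually carries
extra default subcommands).

```spss
* One-way frequency table of the bedroom standard variable.
FREQUENCIES VARIABLES=bedstdx.
```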

Another Output window will now open. You should see the frequency table shown
below displayed in this window:

3.3 To weight the data
Use: Data> Weight Cases…

Using the drop-down menus, go to the Weight Cases screen and add the weight
aagfh08, then click ‘OK’.

Redo the frequency table for bedstdx and you will see that the numbers are much
larger than before and the percentages have changed a bit. The previous frequency
table showed the numbers of households sampled for the survey. Using the weight
has given you population estimates (for households in England). The weights also
take account of the fact that the sample varied from the population in terms of sex
and age structure and aim to minimise bias due to the way that the data were
collected. The weighted results are the ones that you would report in your research.
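
As a sketch, the weighting step and the repeated frequency table could be written in
syntax as follows. It assumes that the household weight is named aagfh08, the
spelling used in Section 4.3 and consistent with the 08 suffix of the EHS file names;
check the name in Variable View before running it.

```spss
* Switch on the household weight (variable name assumed to be aagfh08).
WEIGHT BY aagfh08.

* Re-run the frequency table; counts are now population estimates.
FREQUENCIES VARIABLES=bedstdx.

* To return to unweighted sample counts at any point, use the next line.
* WEIGHT OFF.
```

The weight stays on for every later analysis until it is switched off, which is why the
status bar at the bottom of the SPSS window is worth checking.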

3.4 To create a cross-tabulation
Use: Analyze> Descriptive Statistics> Crosstabs…

We will look at overcrowding by sex of the head of the household (HRP). To create
the cross-tabulation using the drop-down menus:
     Click on: Analyze, Descriptive Statistics, Crosstabs…
     Select bedstdx (Bedroom standard) as your row variable and sexhrp (the sex
       of the head of household (HRP)) as your column variable (see below for how
       the Crosstabs box should look)
     Click on the Cells… button and select row percentages (you can leave
       observed ticked too, to keep frequencies)
     Click Continue to return to the Crosstabs dialogue box and press OK to run
       the cross-tabulation
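
The steps above can be sketched in syntax as a single CROSSTABS command. The
keywords COUNT and ROW request the observed frequencies and the row percentages
chosen in the Cells… dialogue box.

```spss
* Bedroom standard (rows) by sex of the HRP (columns).
* COUNT gives observed frequencies, ROW gives row percentages.
CROSSTABS
  /TABLES=bedstdx BY sexhrp
  /CELLS=COUNT ROW.
```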

You should see the following cross-tab in the output window:

We can interpret the first cell in the second column of the table as meaning that
50.8% of households with two or more bedrooms below the Bedroom standard have a
female head of household (a proportion very similar to those headed by men).
However, only 31.7% of households with two or more bedrooms above the Bedroom
standard are headed by women.

3.5 Exercise
Now try the following using the same menus and dialogue boxes as above:

Question: Are households with heads over 60 years less crowded than households
with younger heads?

In effect we want to look at the relationship between overcrowding (Bedroom
standard) and the age of the head of household (age>60 and under 60).
    1) First remove the weight. Use the Data drop-down box.
    2) Then look at the variables bedstdx and agehrp2x. They are both ordinal or
        grouped variables so we can look at a frequency table for each. Via the
        Analyze drop-down menu, select both of the variables then click on OK. How
        many households included in this survey have a head aged over 60?
    3) Now add a weight to the data (via the Data menu as before). Redo the
        frequency tables. What proportion of households has a head over 60?
    4) Create a cross-tab of bedstdx and agehrp2x via the Analyze menu. Put
        bedstdx in the rows and agehrp2x in the columns. What does the cross-tab
        tell you about the relationship between these variables?
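
For those who would rather work in syntax, the four steps of this exercise might be
sketched as below. The weight name aagfh08 is assumed (see Section 4.3), and the
choice of column percentages in the last command is one option for comparing the
two age groups, not something the exercise prescribes.

```spss
* Step 1: remove the weight so that tables show sample counts.
WEIGHT OFF.

* Step 2: unweighted frequency tables for both variables.
FREQUENCIES VARIABLES=bedstdx agehrp2x.

* Step 3: switch the weight back on and repeat the tables.
WEIGHT BY aagfh08.
FREQUENCIES VARIABLES=bedstdx agehrp2x.

* Step 4: cross-tab with bedstdx in rows and agehrp2x in columns.
CROSSTABS
  /TABLES=bedstdx BY agehrp2x
  /CELLS=COUNT COLUMN.
```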

3.6 Filtering: Selecting parts of the data
Use: Data> Select Cases…

If we are interested in a research question that refers to only a part of the
population, we can filter out all the parts of the data that we are not interested in.
This means that our analyses use responses by some respondents and not others.
An example might be that we only want to look at people who live in the North East
of England, so we filter out everyone who lives elsewhere. Filtering does not
delete data from the data set; it just excludes them from analyses while the filter is on.

!!! Don’t forget to remove the filter when you want to use all the data again !!!

To select only people in the North East of England, we use the drop-down menu:
     Click on: Data, Select Cases… to see the following box:

        Select If condition is satisfied and click on the button If…
        Then click on gorEHCS and use the arrow to move it to the box at the top,
         then type in =1 (see below for how this looks), and press Continue. This
         selects only those cases for which the Government Office Region is 1, which
         is the North East.
        Press OK

You can see that the filtering has worked by looking at the data in Data View (see
below). The rows with lines through the numbers will not be included in analyses. In
this case, the data are sorted by region so all the first rows shown above are filtered
out.

You should also do a frequency table of gorEHCS to check that the filtering is doing
what you think it is doing.

!!! Now remove the filter via Data> Select Cases… and choosing Select all. !!!
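
The whole filtering cycle (switch the filter on, check it, switch it off again) can be
sketched in syntax as follows. It follows the same pattern as the age filter shown at
the end of Section 4; filter_$ is the name SPSS normally gives to the temporary filter
variable, and the code 1 for the North East is taken from the instructions above.

```spss
* Select only households in the North East (gorEHCS = 1).
USE ALL.
COMPUTE filter_$=(gorEHCS=1).
FILTER BY filter_$.
EXECUTE.

* Check the filter: only the North East should appear in this table.
FREQUENCIES VARIABLES=gorEHCS.

* Remove the filter so that later analyses use all the data again.
FILTER OFF.
USE ALL.
EXECUTE.
```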

4. Linking EHS datasets in SPSS 16

Most datasets are provided in a single file so there is no need to combine datasets to
use them. However, some of the large-scale government data are provided in
multiple files. Sometimes the files contain different levels of responses: e.g.
individual level or household level. This kind of data is called hierarchical data.

Sometimes the data you are interested in are supplied in different files but are not at
different levels. This is the case for the EHS Household Data 2008-2009 because the
weight is only on one of the data sets, and all the household interview data are on
the other. Section 4.2 shows how to link two household level data sets in SPSS 16.

Most of the EHS data is at the household level, but there is also a file at the
individual level. Section 4.3 looks at how to combine these with household level files
to obtain individual level data in SPSS 16.

There is more general information about linking and matching files in SPSS and Stata
in the ESDS Government guide to working with survey files.

4.1 File structures and matching variables
When you download the EHS 2008-2009 household files from ESDS, you will see that
the data are in two folders called Interview and Derived. Interview contains a
number of files based on the raw data from the interviews. The data you are more
likely to want to use are the derived data in the Derived folder. If you were to
download the data from ESDS and open the Derived folder, you would find two files
with names:

        Generalfs08.sav contains information about household tenure, the
         Government Office Region, deprivation scales and the household weight.

        Interviewfs08.sav contains results of the household interviews with
         information such as household composition, age, gender and employment
         status of the HRP and partner, the number of bedrooms etc.

To use the interview data with the weight, you must combine the two files. The files
are at the same level (both household) so this is relatively easily done. The matching
variable is the household identifier, which is called aacode in both files.

4.2 Combining two EHS household-level data files
Data> Merge Files> Add Variables

The following uses the drop-down menus but if you are more familiar with SPSS, you
can use syntax. The syntax file to link the files is shown at the end of this section.

The matching variable is the household identifier aacode. Before merging data sets,
the matching variable must be ordered in the same way in both data sets. The data
in these EHS files are sorted by aacode already so there is no need to sort the files.
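
If you want to confirm the sort order and the uniqueness of the key yourself, the
following sketch can be run on each household file before merging. The flag name
first_case is a hypothetical name introduced here for illustration; it is not an EHS
variable.

```spss
* Sort the active data set in ascending order of the household identifier.
SORT CASES BY aacode (A).

* Flag the first case for each value of aacode (first_case is a made-up name).
MATCH FILES
  /FILE=*
  /BY aacode
  /FIRST=first_case.

* In a household-level file every case should have first_case equal to 1.
FREQUENCIES VARIABLES=first_case.
```

Any case with first_case equal to 0 would indicate a repeated household identifier,
which would need investigating before the file is used as a keyed table.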

Start by opening generalfs08.sav
Look at how many variables there are in this file. Look at the data in Data View to
see whether the data are sorted on aacode (they are). If in doubt, right-click on the
title of the aacode column and select, Sort Ascending (see below):

To merge the interviewfs08.sav data with this file, use the menu: Data> Merge Files>
Add Variables to get to the following screen:

Browse to find second data set: interviewfs08.sav then press Continue. (This data
set is also already sorted by aacode.)

In the new window, tick Match cases on key variables in sorted files, and Non active
dataset is a keyed table. Then click on aacode in the left-hand window and click on
the arrow to move it to the Key Variable box as shown below:

Then press OK to merge the files. The new merged data set is now displayed: you
should see that there were 19 variables in total before the merge, and now there are
126 variables. The new data set should be saved under a new name on the desktop:
all derived2.sav.

The syntax for this (where the datasets are held in a drive called F, and
generalfs08.sav is the open, active data set) is:
MATCH FILES /FILE=*
  /TABLE='F:\interviewfs08.sav'
  /BY aacode.
EXECUTE.

4.3 Using data at the individual level
Data> Merge Files> Add Variables

The derived household data in the EHS are at the household level. Sometimes, we
may want to analyse the data at the individual level: For example, if we have the
following research question:

Research question: How many children under 16 live in over-crowded housing?

To measure overcrowding, we will use bedroom standard as in Section 3 of these
worksheets (the definition of the Bedroom standard is introduced in Section 3.2).

Start with the new data set created in the first part of this section: all derived2.sav

To this we are going to add a data set which is at the individual level: people.sav. As
in all these data sets, it has already been sorted in ascending order by the matching
variable aacode.

As before, use: Data> Merge Files> Add Variables, and select the data set: people.sav
to add. In the new window, tick Match cases on key variables in sorted files, but this
time choose Active data set is keyed table. Then click on aacode in the left-hand
window and click on the arrow to move it to the Key Variable box as shown below:

Press OK. Look at the new data in Data View. The variable aacode now contains
multiple rows with the same value representing different people in the same
household.

Now save the data under a new name.
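
As a sketch, the merged file can be checked and saved in syntax as below. Both
n_people (a count of person records per household) and the output file name are
hypothetical names chosen for illustration, and the F: drive is assumed as in the
syntax elsewhere in this section.

```spss
* Add a count of person records per household to every row.
* n_people is a made-up variable name.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=aacode
  /n_people=N.

* Households with more than one person should now appear on several rows.
FREQUENCIES VARIABLES=n_people.

* Save the individual-level file under a new (hypothetical) name.
SAVE OUTFILE='F:\all derived people.sav'.
```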

To answer the research question, first weight the data using the household weight:

Then run a frequency table of the variable for the bedroom standard: bedstdx. Do
the total numbers look reasonable for the population of England (approx 50 million)?
If not, then you need to weight the data (Data, Weight Cases… and the weighting
variable is called aagfh08)

Now you can filter out anyone over the age of 15 years to look at children only
(Data> Select Cases, then Select if age<16). Run a frequency table of ‘age’ to check
that the filtering worked as you expected. Finally, re-run the frequency table for
bedstdx.

Q: How many children under 16 in England live in households where there are two or
more bedrooms below the standard?
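(Merging the general file with the interview file, and then with the people file,
where the data are in drive F:)
GET FILE='F:\generalfs08.sav'.
MATCH FILES /FILE=*
  /TABLE='F:\interviewfs08.sav'
  /BY aacode.
SAVE OUTFILE='F:\all derived hh.sav'.
MATCH FILES /FILE='F:\people.sav'
  /TABLE=*
  /BY aacode.
EXECUTE.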


(Filtering out everyone aged 16 or over, so that only children under 16 remain)
USE ALL.
COMPUTE filter_$=(age<16).
VARIABLE LABEL filter_$ 'age<16 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.



