									                                                  Intermediate Level Module I3

Module I3: Analysing data - Trainers guide


This guide is for trainers of all or part of the SADC Module I3. Overall information for
trainers is in the file:

SADC Statistics Course – trainers Guide

This guide assumes you are familiar with the overview of Module I3 as given in the guide:

Intermediate Level Module I3: Analysing data

This module requires students to have access to computers. Ideally the lecture sessions
will have a computer projector, so that PowerPoint presentations can be used and
demonstrations given and discussed within the class. However, the module can also be run
with access to computers limited largely to the practical sessions.

Software Installation

The software should be installed prior to the start of the training, if at all possible. It is
assumed that Excel is already installed. The other software packages needed are as for
Module B2, as follows:

   • The CAST electronic textbook. (Otherwise, it can be run directly from the DVD.)
   • The SSC-Stat add-in to Excel.
   • A simple statistics package, which this module also introduces. The package used in
     the notes is called Instat and is supplied on the DVD. Another statistics package
     could be used instead, with the notes for Sessions 13 to 16 adjusted accordingly.

The datasets and relevant case studies must also be made easily available. This could be
done by supplying the DVD, or by copying the DVD image to a central server, or even to
each machine. Alternatively, only the individual files needed for particular sessions could
be copied, though that is likely to be more error-prone, for both staff and students.

SADC Course in Statistics                                   Trainers guide - Module I3 – Page 1

Computing and statistical computing

Students who have done earlier modules, particularly B2 and I2, should already have
sufficient computing and statistical computing experience.

Some students may have been exempted from earlier modules, and they will need to be
prepared to spend time on the computing skills that are assumed. Two aspects in particular
are:

   • The electronic textbook CAST was introduced and used in Module B2. The SADC
     Intermediate book in CAST is used extensively in this module, and it is assumed that
     students are familiar with the mechanics of using CAST.
   • Excel with SSC-Stat is used for most of the analyses and practical exercises.
     Students who need a “refresher” could review the materials from Modules B2 and I2,
     or the relevant guides on Excel and SSC-Stat that are on the DVD.

It is also assumed that students are familiar with a word-processing package, such as Word,
and with a presentation package, such as PowerPoint. Familiarity with screen-capture
software is also desirable, so that students can produce enhanced reports and presentations.


Students are assumed to have done Module B2, or to have equivalent knowledge. The
materials for Module B2 should be available, so that students can refresh their memory or,
if they were exempted, study the materials for the first time.

The datasets are assumed to be organised using the ideas in Module I2. If this module is
being taken separately, some sessions may require materials from Module I2; these are
described in the information on the individual sessions below.

Training style

Self-study and group work are included in a similar way to Module B2, but to a greater
extent.

Session 1: Review of concepts from the Basic level

This session revises the key ideas from Module B2. It can also serve as a review for
trainers who are supporting Module I3 but have not taught Module B2 themselves.

Sessions 2 and 3: Graphical summaries for quantitative data

Sessions 2 to 9 are all essentially concerned with measuring and interpreting variability.
These two sessions start the process in a “gentle” way by describing histograms and
boxplots. Students are usually already familiar with histograms, but boxplots may be new
to them. Boxplots were mentioned briefly in Module B2, when the Excel add-in SSC-Stat was
introduced.

If time runs short, then trainers should concentrate more on boxplots than on histograms.
They are very useful in their own right, and also introduce the idea of quartiles, and hence
the “quartile deviation”, which is a measure of variation that is easy to interpret.

CAST covers both topics well.
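The course itself uses CAST and Excel; for trainers who want to show the arithmetic behind a
boxplot and the quartile deviation in another way, a minimal Python sketch follows. The data
are hypothetical. Quartiles come from Python's statistics.quantiles with method="inclusive",
which we believe matches Excel's QUARTILE.INC; other packages use slightly different quartile
rules, so their values may differ a little.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum), the values a basic boxplot shows."""
    # method="inclusive" interpolates between ordered values; assumed here to agree
    # with Excel's QUARTILE.INC, but check against your own Excel version.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def quartile_deviation(values):
    """Half the interquartile range: (Q3 - Q1) / 2."""
    _, q1, _, q3, _ = five_number_summary(values)
    return (q3 - q1) / 2

# Hypothetical monthly rainfall totals (mm), for illustration only.
rainfall = [12, 15, 9, 22, 30, 18, 25, 11, 16, 20, 14]
print(five_number_summary(rainfall))  # (9, 13.0, 16.0, 21.0, 30)
print(quartile_deviation(rainfall))   # 4.0
```

A quartile deviation of 4 mm says that the middle half of the values lie within a range of
2 × 4 = 8 mm, which is the easy interpretation referred to above.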

Sessions 4 and 5: Numerical summaries for quantitative data

Many students do not understand the standard deviation, and are therefore unable to
interpret and use it. These two sessions are designed to ensure that students who have
taken this module are not in this position. Both CAST and Excel are used.

Within CAST, I still find Section 3.2.1, “How close are the values to k”, unhelpful for
students. The author of CAST maintains that it is an effective start to the topic; we leave
you to decide. The remainder of the section is good, particularly the 70-95-100
“rule-of-thumb”.
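The 70-95-100 rule of thumb can be checked on any dataset by counting the values within one,
two and three standard deviations of the mean. The sketch below does this in Python on
simulated, roughly bell-shaped data (hypothetical values generated with a fixed seed), for
which the proportions should come out close to 70%, 95% and 100%. It assumes the sample
standard deviation, with the n-1 divisor, as used by Python's statistics.stdev and Excel's
STDEV.S.

```python
import random
from statistics import mean, stdev

def proportion_within(values, k):
    """Proportion of values lying within k standard deviations of the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 divisor)
    return sum(abs(x - m) <= k * s for x in values) / len(values)

# Hypothetical simulated data: 1000 values with mean 50 and sd 10.
# A fixed seed makes the output repeatable.
rng = random.Random(2024)
data = [rng.gauss(50, 10) for _ in range(1000)]

for k in (1, 2, 3):
    print(f"within {k} sd of the mean: {proportion_within(data, k):.0%}")
```

Running the same function on a strongly skewed variable usually gives noticeably different
proportions, which can be a useful discussion point.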

Sessions 6 and 7. Processing single and multiple variables

The idea of Analysis of Variance is introduced in these sessions, purely as a descriptive
tool. This will surprise some trainers, but it is consistent with the idea that statistics
is designed to explain variability. It also shows why the variance is a natural
summary statistic here, because sums of squares “add up”. This, in turn, provides a good
explanation of why the standard deviation (the square root of the variance) is so
important, even though it is harder to interpret than the mean deviation or the quartile
deviation.

CAST used to introduce ANOVA much later in the books, but has now added a section to
Chapter 3 of the Intermediate-level book that covers the ideas of these sessions.
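The statement that sums of squares “add up” can be demonstrated numerically. This Python
sketch splits the total sum of squares of some yields, grouped by district, into
between-group and within-group parts. The district names and yields are hypothetical,
invented for illustration only.

```python
from statistics import mean

def anova_sums_of_squares(groups):
    """One-way layout: return (total SS, between-group SS, within-group SS)."""
    values = [x for g in groups.values() for x in g]
    grand_mean = mean(values)
    group_means = {name: mean(g) for name, g in groups.items()}
    total = sum((x - grand_mean) ** 2 for x in values)
    # Between: squared distance of each group mean from the grand mean, times group size.
    between = sum(len(g) * (group_means[name] - grand_mean) ** 2
                  for name, g in groups.items())
    # Within: squared distance of each value from its own group mean.
    within = sum((x - group_means[name]) ** 2
                 for name, g in groups.items() for x in g)
    return total, between, within

# Hypothetical maize yields (t/ha) in three districts.
yields = {
    "District A": [2.1, 2.5, 2.3, 2.7],
    "District B": [3.0, 3.4, 3.1],
    "District C": [1.8, 2.0, 2.2, 1.9, 2.1],
}
total, between, within = anova_sums_of_squares(yields)
print(f"total SS {total:.4f} = between {between:.4f} + within {within:.4f}")

# The variance is the total SS divided by its degrees of freedom, n - 1.
n = sum(len(g) for g in yields.values())
print(f"variance {total / (n - 1):.4f}")
```

No such exact split exists, in general, for sums of absolute deviations, which is why the
variance, rather than the mean deviation, is the natural summary here.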

Sessions 8 and 9. Risk and return periods

This is a good section in CAST. Two main ideas are introduced in these sessions. The first
is the whole frequency distribution, which generalises the boxplot, where only the quartiles
were described. The second is how variability can best be explained to a non-statistical
audience, which introduces the different ways of interpreting risks and return periods.
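A minimal Python sketch of the arithmetic behind risks and return periods follows, assuming
the common definitions: the chance, p, of exceeding a threshold in a year is estimated by the
proportion of years in which it was exceeded; the return period is 1/p years; and, if years
are treated as independent, the risk of at least one exceedance in n years is 1 - (1 - p)^n.
The rainfall values are hypothetical.

```python
def exceedance_probability(annual_values, threshold):
    """Proportion of years in which the value exceeded the threshold."""
    return sum(v > threshold for v in annual_values) / len(annual_values)

def return_period(p):
    """Average number of years between exceedances, T = 1 / p."""
    return float("inf") if p == 0 else 1 / p

def risk_within(years, p):
    """Chance of at least one exceedance in the given number of years,
    assuming every year is independent with the same probability p."""
    return 1 - (1 - p) ** years

# Hypothetical annual maximum daily rainfall (mm) over 20 years.
annual_max = [45, 62, 38, 71, 55, 49, 83, 40, 58, 66,
              52, 47, 90, 61, 44, 57, 69, 50, 42, 75]

p = exceedance_probability(annual_max, 70)
print(f"P(exceed 70 mm in a year) = {p:.2f}")                   # 0.20
print(f"return period = {return_period(p):.0f} years")          # 5 years
print(f"risk in the next 10 years = {risk_within(10, p):.0%}")  # 89%
```

The last line is often the most useful for a non-statistical audience: a “1-in-5-year”
event is quite likely to happen at least once in a decade.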

Sessions 10 to 12. Tables and graphs for frequencies and other statistics

Module B2 examined simple tables of counts and percentages. The complications of
percentages were avoided by looking only at one-way tables.

Here multi-way tables are described, and students need to be clear about which percentage
is appropriate for a given set of objectives. The sequence of three sessions starts with a
Flash presentation that generalises a short version shown in Module B2 (and reviewed in
Session 1 of this module). This is used to remind students of the importance of choosing
tables and graphs that satisfy stated objectives.

The emphasis here is mainly on tables, with presentation graphs largely left for Module I4.

The second new topic is tables of summary statistics, such as means, as well as tables of
frequencies.
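The choice between row and column percentages can be shown with a small two-way table. The
Python sketch below uses hypothetical household records. Row percentages answer “within each
region, what proportion of households have piped water?”, while column percentages answer
“of the households with piped water, what proportion are in each region?”. The last part
builds a table of a summary statistic (mean household size) instead of counts.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical household records: (region, has piped water, household size).
records = [
    ("North", "yes", 5), ("North", "no", 7), ("North", "no", 6), ("North", "yes", 4),
    ("South", "yes", 4), ("South", "yes", 3), ("South", "no", 6), ("South", "yes", 5),
]

counts = Counter((region, water) for region, water, _ in records)
regions = sorted({region for region, _, _ in records})
answers = sorted({water for _, water, _ in records})

def row_percentages(region):
    """Percentages within one region: each row sums to 100."""
    row_total = sum(counts[(region, a)] for a in answers)
    return {a: 100 * counts[(region, a)] / row_total for a in answers}

def column_percentages(answer):
    """Percentages within one answer: each column sums to 100."""
    col_total = sum(counts[(r, answer)] for r in regions)
    return {r: 100 * counts[(r, answer)] / col_total for r in regions}

# A table of a summary statistic (mean household size) rather than of counts.
sizes = defaultdict(list)
for region, _, size in records:
    sizes[region].append(size)
mean_size = {region: mean(values) for region, values in sizes.items()}

for r in regions:
    print(r, row_percentages(r), "mean size", mean_size[r])
print("households with piped water, by region:", column_percentages("yes"))
```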

Session 13. Introducing a statistics package

A simple statistics package is introduced. This is justified because the following three
sessions consider common complications in the analysis of survey data, including the need
for weighted tables and the analysis of multiple response questions. These
complications are straightforward with many of the standard statistics packages, but more
difficult with Excel.

The case is made that statistics packages have become easy to use, so learning one is not a
major step. The statistics packages all communicate with Excel, and so provide an extra tool
that can be used alongside Excel.

Most statistics packages are used in similar ways, so once students are familiar with one,
it is even easier to learn another, should the need arise.

We decided to introduce a simple statistics package called Instat. It is produced by the
SSC, Reading, so we are able to supply the software within SADC at no cost to users. It is
also designed to be:

   • simple to use
   • easy to use for teaching
   • equipped with the facilities needed for the complications described here

The Higher-level modules (H1 to H8) also assume the use of a statistics package, though
introducing a particular package is assumed to be outside the teaching of those modules.
Two “obvious” packages at the higher level are SPSS, because of its popularity, and Stata,
because of its value for money and its power as a comprehensive statistics package.

Trainers may therefore wish to introduce a more powerful statistics package immediately, or
take the view, assumed here, that it could be an advantage for students to see more than
one package (more than two, if Excel with SSC-Stat is counted as the first statistics
package). They could then gain the confidence that further statistics packages could easily
be mastered later, should the need arise.

Sessions 14 to 16. Common complications when analysing survey data

Real data always seem to bring practical complications that beginners find difficult when
they first start their analyses. These modules have tried to prepare students for the
“real world” by using realistically sized datasets from the outset.

Here this process is taken a step further by describing some common complications that
arise when processing survey data. These complications also justify the introduction of a
statistics package, because it provides one way of “taming” them.

The two main complications covered are processing multiple response data and producing
weighted tables. These are in Sessions 15 and 16. In Session 16, weights are shown to be one
way of adjusting for missing values.

Session 14 introduces the common complication of zeros (or other special values) in the
data. Until now, students have thought of variables as being either numeric or categorical,
but some variables combine both aspects. One example is the number of hours of sunshine
between 9am and 1pm. This can be analysed as “just numbers”, but a more perceptive
analysis might first distinguish between days with no sun (zero values), days with all sun
(4 hours) and days with some sun. This categorical analysis is then followed by an analysis
of the data for the days with some sun, treated as a numeric variable.

This is a general idea: the data are split into categories and then analysed in two stages.
The first stage is the frequency of being in each category, and the second is the analysis
within one (or more) categories, conditional on being in that category. Once the data are
split in this way, students should realise that they already have all the tools they need
from their analyses so far. What they have to avoid is analysing the data in an unthinking
way.
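The two-stage idea can be sketched in Python for the sunshine example. The daily values are
hypothetical, and FULL_SUN is an assumed constant for the 4 hours available between 9am and
1pm. Stage 1 counts the days in each category; stage 2 summarises only the days with some
(but not all) sun.

```python
from statistics import mean, median

FULL_SUN = 4.0  # assumed: hours of sunshine possible between 9am and 1pm

def split_sunshine(hours):
    """Stage 1: put each day into 'none', 'some' or 'all' (sun)."""
    groups = {"none": [], "some": [], "all": []}
    for h in hours:
        if h == 0:
            groups["none"].append(h)
        elif h >= FULL_SUN:
            groups["all"].append(h)
        else:
            groups["some"].append(h)
    return groups

# Hypothetical morning sunshine hours for 12 days.
hours = [0, 4, 2.5, 0, 4, 1.0, 3.5, 0, 4, 2.0, 0.5, 4]
groups = split_sunshine(hours)

# Stage 1: frequency of each category.
for name, days in groups.items():
    print(f"{name:4}: {len(days)} days ({len(days) / len(hours):.0%})")

# Stage 2: a numeric summary, conditional on having some (but not all) sun.
some = groups["some"]
print(f"days with some sun: mean {mean(some):.2f} h, median {median(some)} h")
print(f"naive mean of all days: {mean(hours):.2f} h")
```

Comparing the conditional summary with the naive mean of all days shows how a single number
can hide the fact that a third of the days are at the maximum and a quarter have no sun.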

Sessions 17 and 18. Producing a product portfolio

These two sessions involve group work using data that are usually supplied locally. The
sessions could be interspersed with some of the previous work, to give students and trainers
time to work together on the analyses. We suggest a survey in which different groups each
work on a defined theme, for example education, health, water and poverty as four themes
from the same survey.

One aim is for students to practise all the skills they have learned so far, in a realistic
way. A second is for them to practise working in groups, and to recognise that being a
“good team player” is a valuable skill in its own right.

Session 19. Portfolio presentations

This session combines the presentations with practice for students in the constructive
evaluation of their peers. The evaluation forms are designed to prevent students from
resorting to over-general statements such as “very good”. Students should again recognise
that helping colleagues to improve is an important general skill.

Session 20. Review

On the statistical side, students should now be able to process standard sets of data. More
generally, they should recognise the varied skills, including “soft skills”, that are needed
for data to be processed effectively.
