Report of the Quality Assurance Review


                                Quality Assurance Review

                                    Summary Report

                              Don Royce (Chairperson)
                               Louis Marc Ducharme
                                   Claude Julien
                                  Maryanne Webber
                                   Karen Wilson

                               February 28, 2007

Source: Statistics Canada
                                  Executive Summary

This Summary Report provides an overview of the findings of a Quality Assurance
Review that was conducted for nine key statistical programs during the period September
2006 to February 2007. The review was commissioned by Statistics Canada’s Policy
Committee in order to assess the soundness of quality assurance processes for these nine
programs and to propose improvements where needed. The Summary Report describes
the principal themes that recur frequently throughout these programs, as well as providing
guidance for future reviews of this type.

The main conclusions and recommendations are the following:

      •   Program managers are well aware of the risk factors for their programs and are
          managing them well, within the circumstances and resources available to them.

      •   The programs are well-positioned to identify the improvements that are needed to
          reduce the risk of errors. Based on the results of this review, programs should be
          asked to develop proposals for Corporate Planning Committee consideration.

      •   While the risk of errors is well managed across all programs reviewed, some
          would benefit more than others from additional investment to further lower the
          risk of errors. The Consumer Price Index, Balance of Payments and International
          Trade programs should receive a higher priority for investment to further reduce
          the risk of errors. We note that as of this writing, funds have been allocated to
          strengthen the CPI.

      •   Human resources concerns dominate all other risk factors. Investments should
          first be targeted at addressing the human resources issues affecting the
          mission-critical programs, to ensure sufficient, well-trained staff to support them.

      •   The existence of a research and analysis capacity separate from the production
          operation is a key factor in assuring quality. All mission-critical programs should
          have a strong and explicit research and analysis capacity, independent from the
          production operation, whose role is to challenge the data and to conduct research
          into the particular subject matter of the program.

      •   There are numerous “best practices” in all programs that can usefully be shared
          within the Agency. Many of these best practices are small things, but taken
          together they add up to an effective quality assurance program. These best
          practices should be the basis of the training program on quality assurance that is
          currently being developed.

      •   Statistics Canada should develop a formal Quality Incident Response Plan
          containing standard procedures for dealing with data quality incidents, whether
          before or after data release. Training would be a part of this initiative.

      •   Proposals to further increase the timeliness of the programs should be regarded
          with extreme caution, especially if there is a possibility that the time would be
          taken from the certification or data release steps.

      •   This review of quality assurance practices has been widely regarded by
          participants as very useful. Statistics Canada should establish an ongoing program
          of Quality Assurance Reviews, tied to the Integrated Program Reporting cycle,
          and based on teams of Assistant Directors reporting to a Steering Committee of
          Directors General.

1. Introduction

In September 2006, in response to a small number of unfortunate errors in data released
by Statistics Canada in the preceding eighteen months, Policy Committee decided to
undertake a review of the quality assurance practices of nine key statistical programs. An
internal management task force was set up for this purpose and conducted the review
during the September 2006 to February 2007 time period.

This Summary Report provides an overview of the findings of the Quality Assurance
Review. It describes the principal themes that recur frequently throughout many
programs, as well as providing guidance for future reviews of this type. The report is not
intended to provide a comprehensive list of specific risks, best practices and
recommendations for each of the nine programs; these are contained in the individual
reports on each program.

Section 2 briefly describes the objectives, scope and organization of the review. Section 3
presents our principal conclusions. Section 4 summarizes the best practices which we
found, while Section 5 describes the principal risks we found, along with
recommendations for ways in which these risks might be mitigated. Section 6 provides an
evaluation of the review process itself, and Section 7 concludes with some
recommendations for the next steps.

Specific recommendations are interspersed throughout the text, but for ease of reference
are numbered and italicized.

The Steering Committee wishes to thank the staff of the programs, all of whom
cooperated fully with the review. We also wish to thank all of the review team members
(listed in Appendix 1) whose dedicated work forms the basis of this report. Special thanks
are due to Claude Julien for his invaluable support in developing review materials,
coordinating the work of the review teams, and in filling in wherever help was needed.

2. Objectives, scope and organization of the review

The nine programs that were chosen to be the subject of the review are the following:

      •   Monthly Consumer Price Index (CPI)
      •   Monthly Labour Force Survey (LFS)
      •   Monthly GDP by Industry (GDP)
      •   Monthly Retail Trade Survey (MRTS)
      •   Monthly Survey of Manufacturing (MSM)
      •   Monthly International Trade (IT)
      •   Quarterly Income and Expenditure Accounts (IEA)
      •   Quarterly Balance of Payments (BOP)
      •   Quarterly Labour Productivity (LP)

These programs were selected, not because it was believed that they were particularly
prone to errors, but because they represent the most “mission critical” sub-annual
programs of Statistics Canada, providing current information on economic conditions.
The quality of these “key indicators” is therefore of utmost importance to a wide range of
users.

There was considerable variety in the characteristics of the programs reviewed. Six of the
programs are monthly, while the other three are quarterly. Four of the programs
(Consumer Price Index, Labour Force Survey, Retail Trade Survey and Monthly Survey
of Manufacturing) are survey-based programs, one (International Trade) is an
administrative records-based program, three (Monthly GDP, Income and Expenditure
Accounts and Labour Productivity) are derived statistics programs, and Balance of
Payments incorporates characteristics of both surveys and derived statistics. This variety
made the review challenging, but it also meant that the results of the review would have
wide applicability.

To oversee the review, a Steering Committee was formed consisting of the Director
General (DG) of the Methodology Branch, who chaired the Committee, the DG of
Economy-wide Statistics, the DG of the System of National Accounts, and the DG of
Labour and Household Surveys. The Steering Committee invited the head of the Quality
Secretariat, also located in the Methodology Branch, to become part of the Committee as
an assistant-coordinator. The Steering Committee met on a weekly basis to design and
monitor the review.

With direction from Policy Committee, the Steering Committee established its objectives
and developed an approach to review these programs within the very tight time schedule
prescribed. The specific objectives of the review were twofold:

  i.   to identify any specific weaknesses where action is needed, and
 ii.   to identify “best practices” that should be promoted to other programs.

This balanced approach, looking at both positive and negative aspects, was a key
principle of our work.

The focus of the review was on the implementation of the programs, not on their design.
We were specifically interested in factors affecting the accuracy of data, rather than the
other five dimensions of quality (relevance, timeliness, coherence, interpretability and
accessibility), unless the latter happen to affect the risk of producing inaccurate results.
Particular attention was paid to the certification of the outputs, deemed the last check
on the accuracy of the information, and on the data release process, where this
information is communicated to the public.

Ten managers, primarily at the Assistant Director level, were recruited from across the
Bureau to form the review teams. One team was formed for each of the nine programs
listed above, with a separate team formed to review the data release activities of the
Dissemination Division and Communications and Library Services Division (hereafter

referred to as DCD). With the exception of the latter team, each team consisted of three
managers, so that each manager was involved in the review of three different programs.
This interlocking assignment of reviewers to review teams was designed specifically to
expose each reviewer to several programs and thereby achieve a more uniform approach.
For the same reason, a member of the Steering Committee was assigned to be a member
of each review team. For the DCD review, the team consisted of an Assistant Director
and all of the members of the Steering Committee.

A lead reviewer (not one of the Steering Committee members) was assigned for each
program and was generally responsible for organizing the review, collating all of the
materials and preparing a report on the program. In order to avoid any conflict of interest,
care was taken to ensure that the lead reviewer did not work in the program being
reviewed. In some cases one of the other team members who was familiar with a
specific program was assigned to be a member of that review team, in order to provide
insight into the program. Appendix 1 shows a complete list of the teams for each
program.
During the review process, all the review team members and Steering Committee
members met every month to review progress and to share findings. This also helped to
keep the reviews on track and fostered a uniform approach.

The Steering Committee developed a semi-structured questionnaire to be used in the
main interviews with the programs (see Appendix 2). The questionnaire was not designed
to be self-completed by programs, but rather was intended to provide the review teams
and the programs with suggested lines of questioning, while not being so structured as to
confine the review teams to asking pre-defined questions. The questionnaire first
collected information on the various steps in the production and the process flows. For
each step, the questionnaire then asked about the various kinds of checks in place,
indicators of quality at each step, and the various risk factors for each step. The third
section of the questionnaire concentrated on data certification (for example, what data
confrontation or internal checks are done to certify that the data are accurate). The fourth
part of the questionnaire asked about the release process and what
checks were in place to ensure that there were no errors introduced at this step. The fifth
part asked the program about its experiences (if any) where incorrect information had
been released in the past, what the reasons were, and what had been done since then to
prevent errors in future. The sixth part of the questionnaire asked about various factors
that might cause checks to fail more frequently. Finally, the questionnaire asked about
how changes to the program, whether internal or imposed from outside, are managed.
The questionnaire was sent to the programs in advance of the first meeting to familiarize
them with the content of the review. A slightly modified questionnaire (not included in
this report) was developed for the Dissemination and Communications program.

The first meeting with each program was intended to explain the review process to them
and to obtain information and documentation on the program’s operations (i.e. part 1 of
the questionnaire). The meeting was normally with the assistant-coordinator and the lead
reviewer, and the Director and/or Assistant Director for the program. Based on the results

of these initial meetings, the Steering Committee developed a list of seven standard steps
which were common to all programs. These were:

       i.   Preparation for certification
      ii.   Data collection
     iii.   Editing/transformation of data
      iv.   Imputation and estimation
       v.   Certification
      vi.   Release
     vii.   Post-release

While the details of these steps varied across programs, these activities were common to
all nine programs and this framework proved to be a very useful way of organizing the
remainder of the review.

Following the development of this framework, a series of meetings was held between the
full review team and the program to cover the remainder of the topics in the
questionnaire. The program was typically represented by the Director, Assistant Director
and/or production manager. Although the initial estimate was that one two-hour meeting
would be enough, it quickly became evident that at least three such meetings would be
required to cover all the material. Because the meetings had to be scheduled around the
production schedules for these monthly and quarterly programs, it was often difficult to
find meeting times and as a result the interviews took about four weeks longer to
complete than originally planned.

While the review teams were conducting the interviews in November and December
2006, the Steering Committee developed a standard format for the reports on each
program (see Appendix 3). Following a standard introduction describing the background
to the review, each report was to describe the program, enumerate the various quality
assurance checks at each of the seven standardized steps in the process, provide a
summary, describe other considerations (optional) and finish with Appendices. As a
guideline, the Steering Committee suggested to the review teams that each report should
be approximately 15 pages, excluding any Appendices.

Once each team had drafted its report, the reports were reviewed with the management of
the programs. The purpose of this was not to ensure that the program management agreed
with all the findings and recommendations, but to confirm that the review teams had
accurately reflected what they had been told about the program. Following this process the reports
were finalized and passed to the Steering Committee. The Steering Committee reviewed
the reports and in some cases asked that further follow-ups or clarification be made.

The final step of the review teams’ involvement was an all-day debriefing session in early
February, where each team presented a summary of its findings and answered questions
from the Steering Committee and the other reviewers. As well, the review process was
discussed and suggestions were solicited for improvements to future reviews of this type.
The Steering Committee then used the reports and the all-day debriefing session to

prepare this Summary Report.

3. Overall conclusions

In this section we present our high-level conclusions from the review. More detailed
descriptions of best practices and risks are covered in Sections 4 and 5 respectively.

Conclusion 1: Program managers are well aware of the risk factors for their
programs and are managing them well, within the circumstances and resources
available to them.

Throughout all of the review meetings, it became evident that the Directors, Assistant
Directors and Production Managers had a very good understanding of the risks of their
programs with respect to the accuracy of the data they publish. Furthermore, it is our
conclusion that they are managing their programs quite well in this respect, within the
constraints that they face. For example, all of the programs have and use very detailed
schedules, checklists and similar types of tools to manage their operations. Program
managers also have a very good idea of what improvements would be best for reducing
the risks of producing erroneous data. In many cases they are pursuing these
improvements on their own, to the extent that resources permit.

This finding suggests two things. First, we believe that it is the programs themselves that
are in the best position to identify the improvements that are needed to reduce the risks of
releasing incorrect data. This leads us to Recommendation 1.

Recommendation 1: Each program that was reviewed should develop a set of costed
proposals for Corporate Planning Committee consideration that are designed to reduce
the risks of producing erroneous data.

Second, we believe that training simply to raise awareness of quality assurance is not a
major part of the solution. Among the program managers we spoke to, there is already a
keen awareness of quality issues. In some cases, taking staff away from already-tight
production schedules to attend training courses could even be counter-productive.
Instead, training courses on quality assurance should focus on disseminating the many
useful best practices that we found.

We do acknowledge that our review teams spoke only to the program managers, so it is
possible that this conclusion was influenced by that limited perspective. Nevertheless, we
were impressed by the depth of knowledge about data quality issues shown by those we
interviewed, and we feel confident that this conclusion is correct.

Conclusion 2: While the risk of errors is well-managed across all programs
reviewed, some would benefit more than others from additional investment to
further lower the risk of errors.

It became evident through the review that some of the programs are more at risk than
others. The Quarterly Income and Expenditure Accounts and the Labour Force Survey
have relatively low levels of risk and in many respects can serve as models of exemplary
practice for other programs. While they still need to be vigilant, there are no pressing
issues that need to be dealt with. Three other programs, namely the Consumer Price
Index, Balance of Payments and the International Trade program, are under somewhat
more pressure and would benefit more from investment to reduce the risks of error. Of
these three, we would rate the CPI as being most in need of assistance (we note that as of
this writing, the necessary resources have been allocated to the CPI). The remaining four
programs (MSM, MRTS, Monthly GDP and Labour Productivity¹), as well as the DCD
program, are in the middle. Our second recommendation is therefore:

Recommendation 2: Three programs should receive the highest priority for investment to
further reduce the risk of errors: the Consumer Price Index, the Balance of Payments,
and the International Trade program.

Note that this categorization is not a judgment of the quality of the management of these
programs. In many cases the risk factors are largely beyond the immediate control of the
program managers, for example dependence on data from other organizations or legacy
processing systems. As well, some programs are inherently more complex and difficult
than others. This should be kept in mind when reading the rest of this report.

Conclusion 3: Human resources are the dominant risk factor.

The number one risk factor identified in virtually all programs relates to human
resources. There are a number of dimensions to this, including significant staff shortages,
the need for staff with specialized skills, over-dependence on specific individuals,
difficulty in recruiting staff to work on monthly programs, succession issues and retention
issues. Thus our third recommendation is:

Recommendation 3: Investments should first be targeted at addressing the human
resources issues affecting the programs, by ensuring sufficient, appropriately-trained
staff to support the mission critical programs.

Section 5 on Principal Risks provides more detailed descriptions of the various human
resources issues and associated recommendations.

Conclusion 4: The existence of a research and analysis capacity separate from the
production operation is a key factor in assuring quality.

One factor that emerged as being central to good quality assurance is the existence of a
research and analysis capacity, separate from the production operation, which can
“challenge” the data through subject matter analysis. This capacity can take a number of
forms.

____________
¹ In the case of Labour Productivity, we would have rated this program in the higher risk
group until the recent decision to move the production operation into IEAD. We feel that
this change will significantly reduce the risk.

In some programs, such as the MSM, the challenge function is played by another
division (the Industry Accounts Division), who feed back to the MSM suspicious
estimates that they find in the data and ask for an explanation. In the case of the Income
and Expenditure Accounts, this challenge role is played before release by bringing
together the various industry analysts within the division to discuss the data, and after
release by having post-mortems with key users. In some other cases the program has its
own group of analysts who are separate from the production staff but work alongside
them. In other cases, such as in the CPI, this capacity used to exist but is no longer in
place.

In the case of the Labour Productivity program, the reverse situation has occurred.
The program has evolved from a research program to a production program, but the
analysts have continued to be focused more on research and analysis than on production,
making the production operation somewhat unstable.

As well as simply analyzing the data, it is also important to have a research capacity that
is directed at maintaining the quality of the program. The economy is continually
evolving, and programs must keep up with what is happening in the marketplace through
good research. Such research can help to identify future problems or trends that need to
be taken into account in the methods used to collect and process the data.

With a research and analysis capacity in place, the mindset is different. In a sense, the
analysis function challenges the data to prove that they are correct when they do not agree
with other information about what is happening in the economy. This mindset leads to
better detection of errors. Without a research and analysis capacity, the production
operation has a tendency to rationalize otherwise questionable findings.

Our fourth recommendation is therefore:

Recommendation 4: All mission critical programs should have a strong and explicit
research and analysis capacity, separate from the production operation, whose role is to
challenge the data and to conduct research into the particular subject matter of the
program. The most suitable form of this capacity may vary across programs.

Conclusion 5: There are numerous “best practices” in all programs that can usefully
be shared.

During the review we were encouraged to see just how many best practices exist, even
for the programs which we judged were at higher risk. Many of these practices are small
things, but taken together they add up to an effective quality assurance program.
Furthermore, we found that the whole was more than the sum of its parts – having a good
set of quality assurance practices which have been in place for a long time, as in the LFS
and IEA programs, seems to create a culture of quality where employees become fiercely
proud of the reputation of their program. As one interviewee put it, “people are ready to
walk on coals for the survey.”

Section 4 of this report describes these best practices in more detail, so that they can be
disseminated across programs.

Our fifth recommendation is therefore:

Recommendation 5: The best practices found in this review should be disseminated as
widely as possible and should form the basis for the training course on quality assurance
that is currently under development.

Section 7 provides some further suggestions on how this could be done.

Conclusion 6: Statistics Canada should have a standard protocol to deal with data
quality incidents, whether erroneous data are caught prior to release or afterwards.

In the review of the programs we were struck by just how much data are released for
these nine programs. The sheer volume makes it impossible to check all of the data that
are released. We believe that prudent planning should assume that despite good quality
assurance practices, errors similar to those that prompted this review can happen again,
and that we should be prepared to respond to them.

In reviewing the errors that did occur, we noted that much of the problem lay in
communicating the issue within the organization and in deciding how to inform users. On
the other hand, we did observe an exemplary practice in the case of the LFS, which
discovered an error in the course of doing a historical revision. The LFS adopted an
approach of full disclosure with its users, informing them of the existence and impact of
the error. This well-managed approach resulted in very good cooperation from the user
community.

Our sixth recommendation is therefore:

Recommendation 6: Statistics Canada should develop a formal Quality Incident
Response Plan (QIRP) containing standard protocols for recognizing and dealing with
data quality incidents, whether before or after data release. A useful model for this is a
plan developed at the Australian Bureau of Statistics, which could be supplemented with
our own best practices. Training on the QIRP would be an important part of this
initiative.

Conclusion 7: The interdependency among the programs is an important factor in
the quality of the results. Risks often increase when data flow across organizational
boundaries and these flows should be explicitly managed.

Another fact that struck us was how interdependent the various programs are. While the
flow of data from programs into the National Accounts is already well known, there are
many other flows of data from the SNA back to programs, and between other programs,
that are less well understood. As one example, Income and Expenditure Accounts
Division (IEAD) provides deflated (constant dollar) estimates of retail sales that are

published by Distributive Trades Division. We found that DTD does not have a complete
understanding of how these estimates are calculated, nor does IEAD take complete
ownership, since they produce them only for DTD use and publish a different set of
estimates themselves. While there have been no incidents of errors in published data, we
feel that this increases the risk of publishing inaccurate data. Another example is the rent
data produced by the LFS and used by the CPI and IEAD; it is not apparent who takes the
ownership of these data.

In many cases the flow of data across programs is well managed, involving formal
agreements and regular meetings to review the data. In the case of DTD and IEAD
described above, the two divisions have recently realized this problem and have
developed a communications protocol. However, we feel that it is desirable to identify and
formalize the arrangements between divisions whenever data flow from one division to
another.

Recommendation 7: Divisions should identify all data flows into and out of their division
and ensure that formal arrangements are in place to ensure sufficient understanding of
how the data were produced and how they will be used by the recipient division.

Conclusion 8: The review of quality assurance practices was very useful and should
become an ongoing program.

The review had a number of benefits to the organization. First, it surfaced numerous best
practices that can be shared among all the programs of the agency, not just those that
were under review. Second, it gave the programs under study a chance to take stock of
their quality assurance practices and to submit them to an independent assessment by an
outside review team. In all cases the programs cooperated fully with the exercise because
they felt they had something to learn. In some cases the review surfaced issues that
needed to be addressed, while in other cases it confirmed the best practices that were
already in place. Third, reviewing nine programs simultaneously provided us with a good
idea of the range of both best practices and risks across the nine programs, which will aid
the Agency in setting priorities for investment. Fourth, it was a very beneficial learning
experience for the members of the review team and the Steering Committee members. An
ongoing program would be an excellent learning tool for the Assistant Director level, the
results of which they could apply to improve the quality assurance practices in their own
divisions. Fifth, it provided a “road test” of the Quality Management Assessment
program that was already under development jointly amongst the Methodology Branch,
Standards Division and Internal Audit Division. The experience gained in conducting the
review was invaluable in shaping the future of such a program. Finally, the existence of
the review raised awareness of quality assurance as a key issue for Statistics Canada
among employees at large.

Recommendation 8: Statistics Canada should establish an ongoing program of quality
assurance reviews, based on teams of Assistant Directors.

More specific suggestions on the form of such a program are given in Section 7.

4. Best Practices

As mentioned above, many “best practices” were discovered over the course of the
program reviews. This section outlines the major best practices in order of the process
steps analysed. These seven standard steps were outlined above in Section 2 of the
report. Many less important best practices are described in the individual program
reports.

4.1 Preparation for certification.

This step in the production cycle can be characterized as a research and fact-finding stage
whereby analysts follow current events related to their respective data programs.

Two important best practices emerged from the review.

First, a daily summary of economic reports in the media is prepared by Industry
Accounts Division, which classifies articles and events by NAICS code. A Daily
Economic Brief (DEB) is sent to many analysts for use in preparing for the
production cycle. The briefs cover strikes, plant openings/closures, announcements of
major contracts or projects, etc. A similar database is prepared by Balance of Payments
Division on major international transactions. Analysts use these media databases in the
production of estimates to verify that the survey instruments capture important events or
to explain extraordinary movements in the data. This practice should be encouraged for
all major sub-annual data releases, and the DEB and other such databases should be
shared across programs.

Recommendation 9: Media databases such as the Daily Economic Brief should be
disseminated more widely to ensure their receipt by analysts in all key sub-annual
economic programs.

Second, most analysts in the programs reviewed attend the monthly Daily Theme
Analysis Panel. This meeting takes place at the beginning of every month; trends to date
in major economic releases are summarized, along with a review of major events and
economic issues for the month ahead. This promotes the exchange of analytical
information across programs and helps analysts make the appropriate links to other data
sources. This is a relatively new analytic function but is seen as very fruitful and should
be continued.

4.2 Data Collection/Acquisition

In the case of the survey programs, all of them make use of the corporate collection
systems. Some are totally reliant on them, while others do some partial collection within
the division itself.

Several best practices were identified at this stage:

Prior to each collection period, the LFS conducts end-to-end testing of systems using a
pre-specified (fictitious) data set for which the expected output is known in advance. This
ensures that any updates or changes to date ranges or specifications in the data system’s
programs are functioning according to expectations.
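
This practice can be sketched in general terms. The pipeline, record layout and
expected tallies below are invented for illustration (they are not the LFS systems): a
fixed test deck whose output is known in advance is pushed through the production code
before each cycle, and any divergence from the expected result signals a specification
or date-range problem.

```python
# Sketch of pre-collection end-to-end testing: a fictitious input file with a
# known expected output is run through the production pipeline each period.
# `process_records` is a hypothetical stand-in for the real production system.

def process_records(records, reference_week):
    """Toy stand-in for a production pipeline: keep in-scope records
    for the reference week and tally them by coded status."""
    tallies = {"employed": 0, "unemployed": 0}
    for r in records:
        if r["week"] != reference_week:      # date-range specification check
            continue
        tallies[r["status"]] += 1
    return tallies

def end_to_end_test(pipeline, test_records, reference_week, expected):
    """Run the fixed test deck through the pipeline and compare against the
    output known in advance; any mismatch fails the test."""
    actual = pipeline(test_records, reference_week)
    return actual == expected, actual

# Fictitious test deck whose correct output is known in advance.
test_deck = [
    {"week": "2007-02", "status": "employed"},
    {"week": "2007-02", "status": "unemployed"},
    {"week": "2007-01", "status": "employed"},   # out of scope: prior week
]
ok, actual = end_to_end_test(process_records, test_deck, "2007-02",
                             expected={"employed": 1, "unemployed": 1})
```

A failed comparison would block the production run until the change in specifications
or date ranges is reconciled with the expected output.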

In programs where collection is compressed into a short period, as is the case for
monthly surveys, daily operational reports, including completion rates or response rates
by interviewer, are extremely useful for monitoring the collection process. They can
trigger follow-up or extra effort by collection staff.

While the collection systems are extremely efficient in terms of resource allocation, there
is a risk associated with collection staff not having sufficient subject matter expertise for
the various surveys for which they work on collection. A communications network with
collection staff has been developed in some cases including visits by subject matter staff
to follow up on issues identified in the field and to inform collection staff of changes or
issues to be watchful of at the collection stage. This practice is very successful in
engaging collection staff in the process of data quality management.

Recommendation 10: Collection operations should include, as a matter of course, good
practices such as end-to-end testing, monitoring, and direct communications between
subject matter and collection staff.

In the case of derived statistics programs, a best practice emerged whereby the program
establishes written data delivery contracts with supplying divisions one year in advance.
These contracts specify the variables, level of detail, delivery dates and revision process
so that the expectations on both sides of the contract are known. They are particularly
helpful when there is staff turnover in either the supplying or receiving division, since
the arrangements are codified and no time is lost during the transition period. Since
many interdependencies were identified between the various programs covered in this
review, this becomes a very important practice in maintaining ongoing quality assurance,
both for the recipient and the supplier of the data. This point is addressed by
Recommendation 7 above.

4.3 Edit and data transformation

A variety of editing practices ranging from manual edits to completely automated
systems were identified. Programs are encouraged to use corporate tools to automate the
identification and correction of non-response, erroneous responses, outliers, etc. A
particular best practice is used by the Labour Force Survey in the form of consistency
edits, whereby the coding of responses is verified for consistency with other variables.

Consistency verification is also a best practice used in most derived statistics programs,
whereby variables can be analysed from different sources for consistency.
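
A consistency edit of this kind can be sketched as a set of predicates evaluated against
each record; the rules and variable names below are hypothetical illustrations, not
actual survey edit specifications.

```python
# Sketch of consistency editing: each rule checks that one coded variable is
# consistent with others on the same record. Rules and variables are
# hypothetical examples, not real edit specifications.

RULES = [
    ("age_vs_labour_status",
     lambda r: not (r["age"] < 15 and r["labour_status"] == "employed")),
    ("hours_vs_labour_status",
     lambda r: not (r["labour_status"] == "unemployed" and r["hours_worked"] > 0)),
]

def consistency_edit(record):
    """Return the names of all rules the record violates (empty if clean)."""
    return [name for name, rule in RULES if not rule(record)]

clean = {"age": 34, "labour_status": "employed", "hours_worked": 37}
suspect = {"age": 12, "labour_status": "employed", "hours_worked": 0}
```

Records flagged by one or more rules would be routed to follow-up or to automated
correction rather than passed through to estimation.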

4.4 Imputation and Estimation

A variety of imputation methods were identified including the use of donor response,
historical imputation or the use of auxiliary variables. Corporate tools (generalized
systems such as Banff and GES) are available to support the various methods and
programs are encouraged to use them.

The survey systems by and large depend on centralized methodology expertise in the
estimation process, including seasonal adjustment and benchmarking. This best practice
promotes consistently high quality statistical methodology across programs. Programs
which do not yet benefit from these resources should be encouraged to use them.

Recommendation 11: Programs should make use of corporate generalized systems and
centralized methodology services as a way of reducing the risk during the editing,
transformation, imputation and estimation steps.

4.5 Certification

The certification step of any process involves a variety of analytic techniques to test the
validity of the data. Two particularly important best practices were identified in this
area.

Those programs which had access to easily usable time series analysis tools showed
regular attention to time series consistency and were capable of in-depth coherence
analysis. The availability of such tools on analysts’ desktops facilitated repetitive
analysis functions (e.g., ratios, percentage changes) and also gave the capability to add
ad hoc analysis in instances where the data presented some unexpected change which
needed to be explained or verified before release. Access to such tools would benefit
many of the programs.
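
The repetitive analysis functions mentioned (ratios, percentage changes) reduce to
simple transformations that flag unexpected movements for verification before release.
The sketch below assumes an invented 5% tolerance; real thresholds would be
program-specific.

```python
# Sketch of routine time series certification checks: compute period-over-period
# percentage changes and flag movements beyond a tolerance for analyst review.
# The 5% threshold is an illustrative assumption, not an actual standard.

def pct_changes(series):
    """Period-over-period percentage change for a list of levels."""
    return [100.0 * (b - a) / a for a, b in zip(series, series[1:])]

def flag_movements(series, threshold=5.0):
    """Return the indices (of the later period) whose movement exceeds the
    threshold and therefore needs to be explained or verified before release."""
    return [i + 1 for i, c in enumerate(pct_changes(series)) if abs(c) > threshold]

levels = [100.0, 101.0, 102.0, 95.0, 96.0]   # the drop in period 3 is unusual
```

A flagged period is not necessarily an error; it simply directs the analyst's attention
to a movement that must be explained before the data are certified.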

Some programs use a canned package of analytic tables and charts to review not only
their own data, but also related indicators published elsewhere in the statistical system.
This is particularly useful for coherence analysis.

Recommendation 12: Analytic tools such as those described should be made widely
available to analysts to assist in the certification process.

Another best practice used in certain programs is the practice of calling upon independent
analysts who have not been involved in the production process to do high level analysis.
These analysts are able to spot unusual movements or inconsistencies that were not
observed earlier in the production cycle. Such analysts are often responsible for
providing the analytic content of the text which will be used in the subsequent release of
the data in The Daily. These same analysts are also available to follow up on specific
issues identified during one round of production, following the developments through
subsequent months or quarters. Their insights can lead to adjustments in the collection,
editing or estimation phases in future rounds. The function may also include more in-
depth analysis to be published outside of the regular production cycle and
communications with key external users to get feedback on quality issues.

Another fruitful practice is the review of the data at an early stage by staff of related
programs. One example is the verification of import and export data by three programs
each month (International Trade Division, Income and Expenditure Accounts Division
and Balance of Payments Division). Another is the review of manufacturing shipments
data by the Monthly GDP program for conversion to real estimates (price change effects
removed). All programs involved view the data from different perspectives, which can
be enlightening in the verification process.

The best practices in the two preceding paragraphs are examples of the analytic and
research capacity referred to in Recommendation 4.

A final stage of this process is the briefing of senior management on the highlights of the
upcoming release. A dry run of this briefing, before the presentation to Policy Committee,
helps ensure that all bases have been covered in the verification process.

4.6 Release step

At this stage, once the data have been certified, databases are prepared for loading onto
CANSIM, and Daily releases, as well as electronic and printed publications, are prepared.

Again, a number of best practices were noted.

First, Divisions must organise a system of verifications against internal databases and
products. This could include a team that does last minute spot checks of the various
products, including The Daily text, charts and graphs, before release.

Second, the use of corporate systems such as “smart publishing” to automatically generate
graphs and tables for publications from CANSIM helps mitigate the possibility of errors
across different product lines.

Recommendation 13: The use of CANSIM and Smart Publishing should be expanded, as
a way of reducing the risk of errors.

Third, setting up good co-ordination and communications with the Dissemination and
Communications Divisions of Statistics Canada is a must. Periodic meetings to discuss
roles and responsibilities or any proposed changes in product line or standards for
publication are useful.

4.7 Post Release

The post release phase largely consists of follow up with partners in the production
process such as collection or methodology services or data suppliers, as well as following
the media coverage and requests from other users of the data.

This phase is particularly important for derived statistics programs which rely entirely on
others to supply their source data. The best practice here is regular and frequent meetings
to discuss emerging trends and/or inconsistencies which may signal changes in data
quality. This practice is followed regularly for some parts of the program and on a more
ad hoc basis for others. Given the interdependencies already noted above among the nine
programs studied, this practice should be formalized and regular.

Another interesting practice is the “Post Mortem Group” established by the Quarterly
Income and Expenditure Accounts Program including Balance of Payments and Monthly
GDP. A large group of users of the derived statistics programs are invited to a meeting
the day following the release. The programs give a briefing on the emerging economic
trends and the users pose various questions about the data and are encouraged to express
any concerns related to data quality. This function helps ensure that the data quality
concerns of the users are top of mind for the data producers as well and that the analytic
needs of the users are well understood. It is also a useful forum for Statistics Canada to
inform users of upcoming changes to the program including data quality improvements or
risks. It helps users make better informed decisions when using the data.

A similar best practice was found in the Labour Force Survey, which works closely with
the major media to ensure that the data are reported and interpreted correctly, even going
so far as to check the numbers that are published by (for example) the Canadian Press.

Recommendation 14: Mission critical programs should have explicit programs of
outreach and support to major users and the media as part of, and immediately
following, each release.

4.8 Other best practices

A key aspect of quality assurance is the management of change. All on-going surveys and
programs undergo periodic review or re-design. In all of the cases studied, where the
introduction of change was well managed, it involved significant testing of the new
systems and overlap estimation periods to ensure smooth transition from the old to the
new estimates. It was clear that the best practice in management of change was building
these two key elements into the redesign plan.

Recommendation 15: Plans and budgets for redesigned programs should include
provisions for the thorough testing of new systems and a sufficiently long parallel run, so
that problems can be detected and corrected before switching to the new series.

5. Principal Risks

As noted in Section 2, one of the important conclusions that can be drawn from the
Quality Assurance Review is the fact that all the managers interviewed know about the
main risks that their program faces in the short and medium term. The review also
confirmed that the programs were well managed and that the program managers tried to
produce and publish the best detailed information while minimizing the risks to their
program. Nevertheless, some programs are more at risk than others. The Consumer Price
Index, the Balance of Payments, and the International Trade programs are overall more at
risk than others and should receive more immediate attention. At the other end of the
spectrum, the Labour Force Survey and the Quarterly Income and Expenditure Accounts
have a low overall risk. The remaining programs present moderate overall risks and
could be addressed through the adoption of best practices and specific measures directed
to the particular needs of the program.

Examining the programs in detail, the review teams categorized the risks into five
groups:

   i.   risks related to people (e.g. shortage of staff, expertise, succession issues);
  ii.   risks related to the timeframe (for certification, last minute changes, etc.);
 iii.   risks related to the systems (e.g. complexity, insufficient testing or
        documentation, etc.);
 iv.    risks related to the service areas (e.g. lack of control over turnover, reduction
        of services); and
  v.    risks related to input data (e.g. administrative data).

Obviously not all risks have the same importance or the same impact for each program.
Some of the risks need more attention in the short term in order to reduce the potential of
errors. Others need to be addressed in the medium term as they do not immediately affect
the program, but could potentially increase the risk of errors if nothing is done. Finally,
there are those risks that can be managed and minimized on an ongoing basis.

What follows is a summary of the main types of risk identified during the review.

5.1 Risks related to people

As mentioned in Section 3, the most frequently mentioned risks were related to human
resources issues. Within this general category there are a number of sub-issues.

5.1.1 General staff shortage

One issue is simply an overall shortage of staff in the Agency. Program managers, with
the single exception of IEA, all seem to be short of the people needed to do the work. As
one example, the MSM section of Manufacturing, Construction and Energy Division
nominally has ten employees: one ES-6 Chief, one ES-5 Head of analysis, two ES-4
analysts, three SI-03 or SI-04 subject matter specialists, an SI-05 Production Head, an
SI-03 production officer, and a CR-04 operations clerk. In actual fact, however, the ES-05
is on full-time language training, one of the ES analysts is a recruit who changes every
eight months, one of the three SI subject matter specialists is loaned to the Business
Condition Survey once every quarter, and the SI-05 Production Head position is vacant.

Recommendation 16: At the time of the drafting of this report, efforts were being made at
the Agency level to increase the intake of new employees. Nevertheless, it will take some
time to alleviate the pressure of staff shortages on programs. In the meantime, mission
critical programs should receive priority in staffing, and work should be prioritized in
order to identify what activities can be suspended in the short term.

5.1.2 Stress of monthly production programs

A second issue is the difficulty in finding people to work on monthly production
programs. Most of the monthly programs mentioned that they had difficulty in attracting
people to work on a monthly survey because of a perception that such positions are
gruelling and have a poor work-life balance. This issue was particularly acute for the
LFS, GDP, IT and MRTS programs. In addition to the difficulty in attracting resources,
these programs also have difficulty in retaining their staff as there is a perception that
their effort is not recognized by the organization and that these employees do not perform
well in generic competitions because they have little time to prepare.

Recommendation 17: More recognition and resources should be given to the monthly
production programs. Recruits should be encouraged to spend some time in a monthly
production program to change the perception. The benefits of working on a regular
production survey should be better disseminated.

5.1.3 Shortage of specialized staff

A third issue is the shortage of specialized staff. While for most programs there is a good
balance in terms of the mix of people running the various steps of data production,
programs such as LP, GDP, BOP, IT and the CPI have a greater need for highly
specialized personnel. This is due in part to the complexity of the program and the type of
research and analysis necessary to certify the data. For these programs the certification
and validation often rely heavily on the expertise and experience of the analyst. In the
case of the BOP, there is a need to understand how transactions are made on the
international capital markets and who are the main actors involved. The review found that
there is relatively little documentation in BOP explaining how certification is done and
therefore the program relies mainly on the expertise of its specialized staff. This type of
staff is difficult to replace and often has to be trained on the job. The CPI and IT
programs are also confronted with similar issues, as there are very few university
programs that teach index theory as a discipline and recruits have to be trained in-
house. In the case of the IT program, its prices program is dependent on one senior
analyst.

In addition to the shortage of specialized staff, their numbers in some of these
programs is simply not sufficient to fully research and keep abreast of what is happening
in the economy. In the case of the Labour Productivity program, the total number of
people dedicated to the program has decreased from 10 to 7, leaving the program without
any breathing space in case of illness or emergency. In the case of the BOP, the program
does not have the specialized resources to investigate increasingly complex transactions
on the international market. The same remarks apply to the CPI. The world has become
more complex and the demands on many programs such as the CPI, IT, BOP and LP
have increased substantially over time and there are not enough analysts to review the
concepts, methodologies and data sources and at the same time certify the monthly or
quarterly analysis. This situation could explain in part the delay in finding the error in the
Traveler Accommodation Price Index.

Recommendation 18: We recommend the expanded use of communities of practice, asset
qualifications and other mechanisms in our staffing processes in order to ensure an
adequate supply of specialists for programs such as the CPI, BOP, LP and IT.

5.1.4 Mix of people dedicated to production, analysis and certification

A fourth issue is related to the mix of people dedicated to production, analysis and
certification. The review teams found that good quality assurance of a program
depends on the existence of a research and analytical capacity independent of the
production operations. People dedicated to production operations do not necessarily have
the preparation or the time to question the data they are processing. That function is better
served by an independent team of analysts with the mandate of challenging the numbers
and comparing them with other series, keeping abreast of the market intelligence and
understanding the research and policy questions related to the statistics produced. The
review has identified a variety of good models along these lines in IEA, LP and LFS.
However, it also found that there was little or no analytical capacity in the CPI and IT. In
the case of the CPI, the lack of comparable external indicators at the certification stage
prior to data release limits the scope for reasonableness checks. This is a case where
additional analytical capacity would allow more research on external sources and better
use of its own data. The same comments apply to the IT program where there is a need to
ensure that a thorough review of data for exports is done on a monthly basis.

This issue is addressed by Recommendation 4 above.

5.1.5 Succession planning

Each time a key resource leaves, there is a loss of knowledge. Given the shortage of
resources mentioned previously, it has become difficult for most programs to prepare
succession plans. In addition, some programs experience difficulties in finding
experienced analysts, which puts them increasingly at risk. One example is the Chief
position in the GDP program, which is in charge of a highly visible monthly program that
covers the whole economy. The experience needed for such a position can only be
acquired after many years of training and experience in the division.

(Footnote: As of the drafting of this report, an additional resource has been re-allocated
to the prices program of IT.)

Even if the issue of succession is generalized throughout the Agency, the risk remains
moderate, with the exception of three of the programs identified earlier (IT, BOP and
CPI), where the risk is higher.

Recommendation 19: In order to reduce risks associated with the departure of
experienced staff, each program should develop a specific succession plan for key
functions in the program. The plan should include training, rotation of staff within the
community of practices and some job shadowing when retirements can be identified.

5.2 Risks related to the timeframe of the program

5.2.1 Reduced time for certification and publication of the data

Another major issue encountered in all programs is the very short time frame available to
staff to certify the data to be published. The time allocated to certify the data varies
from a couple of days in some programs to five days in others. In the case of the DCD
operations, the window is often 24 hours, which leaves very little time to correct
mistakes. With the increasing difficulty of obtaining responses from respondents,
survey programs such as MSM and MRTS have sometimes extended their collection cut-
off dates in order to increase the rate of response, but at the expense of the time
allocated to certification. This also has repercussions for programs in the
National Accounts which depend on information from the survey areas. In the case
of the IT program, the pressure comes from the fact that it must align its publication
dates with those of its American counterpart. Last year the Census Bureau improved the
timeliness of its publication by six days. This has meant six fewer days for
certification in the IT program, as the data coming from the Canada Border Services
Agency (CBSA) do not arrive any earlier.

There is a real trade-off between timeliness and the other dimensions of quality,
particularly accuracy. Statistics Canada’s main indicators are generally published a bit
later than those of some other OECD countries, but it publishes more detail and its
revisions are much smaller. The review team is of the opinion that this trade-off is a real
issue and that any further attempt to improve timeliness should give serious consideration
to the effect on risks to other dimensions of the quality of the estimate.

Recommendation 20: Under current conditions, proposals to further increase the
timeliness of the programs should be regarded with extreme caution, especially if there is
a risk that the time would be taken from the certification or data release steps.

5.2.2 Last minute changes

Another issue related to the timeframe in which programs operate is the frequency with
which programs have to make some last minute changes before publication. These
changes are often due to the late arrival of information from the survey or from external
input data files, corrections to the data or system problems. Most programs we reviewed
experienced some last minute changes on a fairly regular basis. Although the risk is
moderate for most programs, it is higher for the IT, CPI and MRTS programs. These last
minute changes have a particularly acute effect on the Dissemination and
Communications operations, due to the number of places in which changes are required
(The Daily, CANSIM, IMDB, publications, etc.).

While some last minute changes are unavoidable for the reasons given above, and while
programs are generally aware that last minute changes significantly increase the risk of
errors, we believe that many programs could do a better job of having contingency plans
in place to deal with the situation when it arises. This could include such things as contact
lists, a list of cutoff times, alternative dissemination plans, and so on. We do note that
Dissemination and Communications does produce a contact list for its various services as
part of its Business Continuity Plan, but there was some concern that the subject matter
divisions were not as well-trained as they should be on how to use it.

Recommendation 21: All programs should have contingency plans, developed in concert
with Dissemination and Communications, to deal with unplanned last minute changes.

5.3 Risks related to systems

5.3.1 Risks due to the complexity of systems

One issue that has surfaced in some programs is the complexity of the interactions
between systems. For the BOP, CPI and IT programs there are often more than two or
three systems that need to interact to produce the output of the program. This was not
created by design, but because of the evolution of the program towards more complex
operations as an answer to a more complex world.

Each time data moves from one platform to another there is a risk. For example,
Dissemination and Communications has identified some 19 different interdependencies
in its operations. One example is the fact that the text of The Daily is produced
by a team in the Communications Division, while the graphics appearing in the same
Daily article are prepared by the Dissemination Division. When a last minute change
occurs, the only safeguard is the great care taken in the chain of communications. We do
note that Dissemination and Communications is aware of this issue and has begun to
document and control the interdependencies.

In addition to the multiple systems involved in the production of data, there is often only
a very small technical team that understands how they operate. For instance, there is only
a small group of staff in ITD with experience in using the FAME application. This

limited expertise restricts the division’s ability to manage the monthly production process
and to cope with unplanned events. The same comments apply to the MSM program
which depends on one person for knowledge of their systems and to SMART publishing
in DCD which depends on a handful of individuals.

Recommendation 22: Each program should review its interdependencies among systems
and prepare a plan, with associated costs, to reduce them. It should be noted that the
present mainframe migration is also an occasion to reduce the interdependencies of
systems.
5.3.2 Risks due to manual processing

A related issue is the fact that in many programs there are still significant manual
operations. For instance, CPI production requires considerable manual intervention
and is fairly paper-based. For US exports, one employee makes all the updates
directly to the database using paper listings. Similarly, the main LFS numbers in
CANSIM are verified manually. These situations increase the chances of errors and limit
the ability to detect errors efficiently. Programs should be encouraged to examine their
manual operations and prepare proposals for converting these to automated operations
where feasible (see Recommendation 1 above).

5.3.3 Risks due to insufficient testing and documentation

Finally, under the system-related risks, the review team identified cases where there is
insufficient testing and documentation. Proper testing of the Matrix Processing System
(MPS), which was used to calculate the Traveler Accommodation Price Index, would have
avoided the incident with the CPI. Lack of documentation is also a risk factor when
programs are running their production on legacy systems. This is a higher risk for the IT
and the MSM programs.

By contrast, the LFS demonstrated some excellent testing procedures, including monthly
end-to-end testing, use of dummy data with known results, and deliberate redundancy in
software development (e.g., having two programmers code the same application and
compare the results).
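
The deliberate-redundancy practice can be sketched as two independently written
implementations of the same specification whose outputs are compared on shared test
cases; everything below, including the specification itself (a weighted total rounded to
thousands), is an illustrative assumption rather than actual LFS code.

```python
# Sketch of deliberate redundancy in development: two programmers implement the
# same specification independently; disagreement on any test input signals a
# defect in at least one implementation.

def weighted_total_a(values, weights):
    """Implementation A: explicit accumulation loop."""
    total = 0.0
    for v, w in zip(values, weights):
        total += v * w
    return round(total, -3)          # round to the nearest thousand

def weighted_total_b(values, weights):
    """Implementation B: written independently using sum()."""
    return round(sum(v * w for v, w in zip(values, weights)), -3)

def compare(impl_a, impl_b, cases):
    """Run both implementations on every test case; return the mismatches."""
    return [c for c in cases if impl_a(*c) != impl_b(*c)]

cases = [([1200.0, 800.0], [2.5, 3.0]), ([10.0], [100.0])]
```

An empty mismatch list does not prove both implementations correct, but it makes a
coding error in the shared specification much less likely to pass unnoticed.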

Recommendation 23: Programs should ensure that they conduct thorough testing and
prepare complete systems documentation.

5.4 Risks related to service areas

The programs expressed some concerns related to their dependencies on service areas.
We do note that the review teams did not have sufficient time to explore these issues with
the service areas themselves.

One issue identified during the review was the lack of control of the programs over the
turnover that takes place in the service areas. There are two aspects to this issue.

The first one has to do with the rotation of systems staff or of methodologists. While
subject matter programs recognized the benefit of rotating staff, it must be done in an
orderly manner and should also take into account the particular HR situation of the
program. Given that there is always a learning curve for new systems people or
methodologists assigned to a monthly program, transitions should be carefully planned.
Given the staff shortage this is even more important.

The other aspect of the issue has to do with the fact that monthly programs never have a
large span of free time between production cycles and cannot go unattended. It is not
uncommon for the use of leave with income averaging in service divisions to have ripple
effects on the subject matter program. While reviewing the LFS, it was noted that during
the summer months it is difficult to obtain the required level of service, and this can put
the program at risk.

Recommendation 24: Service areas should ensure that better planning and
communication protocols are in place to avoid leaving programs short-staffed. It is also
suggested that leave periods be rotated throughout the summer.

5.5 Risks related to provision of input data from other parties

During the review, some programs indicated that they have less time for certification and
validation because they have to compensate for the reduction in service in OID and SOD.
For these programs and in particular for the MRTS and LFS, it meant that they had to
spend more time on the collection step of the process and less time on certification.
Similarly, the programs of the National Accounts have noticed that the survey programs
seem to have less time or no time at all to examine the data from a macro perspective
before sending it to the National Accounts. The impact of this is that the programs in the
Accounts have to spend more time certifying the input data and have less time to do the
coherence analysis and the balancing. None of this has been identified as a high risk, but
if it is not managed it could generate errors.

In the case where programs depend on external input data, there has also been some risk
associated with the lack of control on what is sent to the program. In the case of the IT
program, the dependency on external data sources is significant, and changes in policies
and practices by the Canada Border Services Agency (CBSA) could lead to data quality
problems. The error in 2004 with the International Trade figures occurred because of the
lack of control and monitoring of the input data provided by the CBSA. Similar situations
exist with the BOP program with regards to information on international transactions and
the MRTS program with regards to GST data from Revenue Canada.

Recommendation 25: Formal protocols should exist at a senior enough level to ensure
that data that are critical to programs are provided on a timely basis and that changes at
the source of the data are communicated.

6. Evaluation of the review process

In this section we present an evaluation of the review process itself (described in Section
2) and make recommendations for future reviews of this nature. This evaluation is based
on discussions with the review teams during the all-day debriefing in early February, a
short questionnaire completed by the reviewers, and our own impressions.

6.1 Steering Committee

The Steering Committee consisted of four Directors General, plus the head of the Quality
Secretariat in the Methodology Branch (an Assistant Director) as an assistant-coordinator.
The varied experiences brought to the project meshed well and made for an effective
team. In particular, the assistant-coordinator’s knowledge of quality assessments and
audits in other organizations proved very useful. His assistance in developing materials
and in being a day-to-day interface between the review teams and the Steering Committee
was especially valuable.

The Committee found that it was necessary to meet on a weekly basis throughout most of
the review period, in order to develop a plan, schedule, questionnaire, review templates
and other materials, as well as to monitor the work of the review teams. The Steering
Committee members were also called on to give numerous presentations and progress
reports on the project, both internally and to external advisory groups. If such a review is
conducted in future based on the same materials, such frequent meetings may not be
necessary.

Having a Steering Committee member on each of the review teams, while time-
consuming, was beneficial in three ways. First, it helped to standardize the process, since
Steering Committee members could give guidance to the review teams. Second, it gave
the review teams increased legitimacy with the programs by having someone at a senior
level present for the interviews. Third, it gave direct exposure and feedback on the
programs to the Steering Committee members. This is an important aspect of the
approach that should be retained.

6.2 Review teams

The ten review team members consisted primarily of Assistant Directors. Given the short
amount of time available to organize the review, most of the reviewers were identified
and “volunteered” by members of the Steering Committee. Despite this, all of the
reviewers were enthusiastic and quickly dedicated themselves to the project. They clearly
recognized the importance of the project to the organization and were interested in
participating, even though they had to fit this work into their already busy schedules.
They expected that this would take a considerable amount of time and they were correct:
by the end of the process they estimated that they had each spent eight full working days
on the project for which they were the lead reviewer. When one adds the time spent in
meetings and reviewing reports for the other two reviews in which they participated,
preparation for and participation in the one-day debriefing session, and subsequent
editing of the reports, it is not unreasonable to assume that the average commitment
approached the equivalent of three weeks full-time.

Despite some initial uncertainty, the review process – essentially peer review teams
conducting a structured exploration of a program – was an amazing learning experience.
The reviewers had the breadth of experience and depth of knowledge to effectively
engage program managers. The fact that the review teams were not immersed in the
programs in question meant they had to ask questions as “outsiders”. The process was
iterative and structured, but with enough flexibility to make it meaningful across a
diverse set of programs.

Having a fairly large number of interviewers, each working on separate teams, created an
atmosphere of learning together and making adjustments "as we go" that offset most of
the drawbacks of conducting the reviews simultaneously. It alleviated the pressure on any
one individual reviewer and provided a sense of not being alone on this important project.

The review teams were interpenetrating – a feature designed for expediency that turned
into an enormous benefit. The fact that each reviewer saw three programs and that all
reviewers periodically sat around the same table was tremendously useful in dealing with
difficult issues and potential unevenness in evaluation standards. It also meant that
reviewers were learning about best practices and risks in other programs. At the same
time, by assigning a “lead reviewer” for each program, it was clear who was responsible
for preparing the final report on each program.

The Steering Committee and review team members met on a monthly basis to report on
progress, share experiences and receive further instructions as needed. The reviewers
found these meetings quite useful and of appropriate frequency. They particularly valued
the direct communication with the Steering Committee members, either at these monthly
meetings, the interviews with programs, or on more informal occasions.

Finally, all reviewers were very happy to have participated in this project (see Table 1).
All of the reviewers found the project interesting and a good learning experience. Despite
the extra workload, almost all were interested in repeating the experience (!) and nine out
of ten highly recommended such a project to colleagues. The reviewers also felt that this
type of evaluation should be applied to other programs.

Table 1: Responses of the ten reviewers to an evaluation questionnaire

Aspect of project                                          Not at all   A little   A lot   Totally
Project was interesting                                    0            0          5       5
Project was a good learning experience                     0            0          4       6
Have a better knowledge of risks that can affect quality   0            1          6       3
Glad to have participated                                  0            0          4       6
Interested again in a year or two                          0            2          6       2
Would recommend a similar project to colleagues            0            1          7       2
Should apply similar evaluation to other programs          0            1          3       6

Based on these results, we highly recommend this type of project as an excellent learning
experience for the Assistant Director level (see Recommendation 8). Statistics Canada
could build on this exercise by recognizing the intense interest of the Assistant Director
level in comparing QA practices across programs, with the intention of
importing/exporting them. Although this was an ad hoc exercise, it proved beyond a
doubt that these managers recognize the benefits of this type of peer review process and
will willingly engage in it.

6.3 Program staff

The managers and senior staff of the programs under review also found the review useful.
It was an opportunity to take stock and identify real risks. Some long-standing practices
received kudos; other areas of their operations were viewed under a different lens. Overall, the
programs were very cooperative and took the review seriously. All ten reviewers rated
the degree of cooperation as either “a lot” or “totally.”

A first meeting with the Director of the program, the lead interviewer and the assistant-
coordinator took place in the latter half of October to introduce the review process to the
program and to introduce the program to the lead interviewer. The cooperation was such
that the discussion often started getting into the actual review and had to be cut short until
the full review team was available.

We found that the amount of preparation by the programs did vary. Some programs came
very well prepared, with well documented processes, while others were more reactive.
Having the initial meeting with the programs was very useful in equalizing this by letting
programs know what was expected of them. For future reviews of this type, it would be
useful to spend more time in defining what is expected from programs in terms of
documentation. Examples from the individual program reports can serve as a useful guide
in this respect.

The initial meeting was also very valuable in collecting enough information about the
processes that we were able to categorize the activities for all programs into the seven
standard steps listed in Section 2. This served as an excellent way of standardizing the
remaining steps in the review (e.g., interviews, report writing) and could probably be
applied to future reviews without change.

One of the limitations of the exercise was that the review teams did not have time to
speak to the various service areas involved in supporting these programs. The conclusions
of the reports are based primarily on information and documentation obtained during a
series of interviews with the management of the programs. The one exception is the
support provided by the Dissemination Division and the Communications and Library
Services Division. Their role was important enough that it was felt vital to interview
them. In future, if time permits, the interviews for a specific program should extend to the
other service areas that support the programs, such as Methodology, Systems
Development, Business Register and Collection.

Recommendation 26: Future reviews should include direct interviews with all significant
service areas, time permitting.

6.4 Schedule

To plan the project, the Steering Committee developed a simple schedule of activities
(see Table 2). With the exception of the interview phase of the project, the project was
generally successful in sticking to the schedule. The interview stage took about four
weeks longer than initially thought, and a one-month extension for the project had to be
sought from Policy Committee.

                                Table 2: Planned and actual schedule

Milestone                                                       Planned date            Actual date
Presentation of initial plan to Policy Committee                Sep 15                  Sep 15
Recruitment and briefing session for reviewers                  Oct 11                  Oct 11
Questionnaire design completed                                  Oct 18                  Oct 25
Interviews with program managers completed                      Nov 30                  Dec 22
Assessment reports drafted and vetted with programs             Dec 22                  Feb 7
Debriefing session with reviewers                               Early Jan               Feb 7
Summary report completed                                        End Jan                 Feb 28
Presentation to Policy Committee                                                        Mar 21

There were two reasons why the interviews took longer than expected. First, more
interview meetings were required than anticipated: on average, each review took 3.2
meetings (not counting the initial meeting), and the average total time per review was 6.1
hours, rather than the single two-hour meeting planned.

The second reason was that the programs under review are monthly or quarterly
production operations and therefore had certain periods when they were simply not
available to meet with the review teams. Seven of the ten reviewers reported difficulties
in scheduling meetings around the programs’ production schedules.

The interviews also ran into the pre-Christmas holiday period and in a few cases into
January. The majority of reviewers mentioned that the holiday break hindered the
momentum of the review and a few felt that it had a serious negative effect on the
schedule. Overall, half of the reviewers felt that the project deadlines were at least
somewhat unreasonable.

Note: In a small number of cases the interviews went into early January. The presentation
to Policy Committee was constrained by its availability and could have been a week earlier.

In addition to the interviews, finalizing the reports also took somewhat longer than
planned. There was no explicit step built into the schedule for the Steering Committee
members to review and comment on the reports, as there should have been. When the
Steering Committee members did review the reports, we realized that some needed to be
reworked: in some cases too much focus was put on one aspect of the program, and in
others the authors had not provided the necessary arguments to support their conclusions
and recommendations. In future reviews, approximately three weeks should be built into
the schedule for this step. Now that we have some good examples of reports, we can be
better prepared to specify what is wanted and to provide examples.

Table 3 shows a breakdown of the average number of hours spent by each reviewer on
their program. The time estimates are approximate, based on self-reporting by the
reviewers, so they should probably be interpreted only in relative terms. Somewhat
surprisingly, additional research on the program took almost twice as long as the
interviews themselves: the lead reviewers spent an average of almost 11 hours
researching and consulting other materials to complete their reports. The most relevant
documents were those provided by the program, the integrated program reports, the
Integrated Metadata Base record and official releases by the program. The reviewers also
had to contact the programs on occasion to clarify points discussed in the interviews. As
expected, drafting the report was the most time-consuming activity.

     Table 3: Time taken for selected major activities (hours per lead reviewer)

Activity                                                              Hours
Interviews with programs (excluding introductory meeting)             6.1
Conducting additional research on the program                         10.9
Writing first draft of report                                         21.8
Vetting and finalization of report                                    4.6
Review and commenting on other reports                                6.9

6.5 Documents developed for the review

The Steering Committee developed a number of documents for the review process. The
purpose of these was to standardize the review process as much as possible across the
nine programs plus the DCD program, while maintaining sufficient flexibility for the
review teams to adapt their review to the specifics of the programs.

Aside from the schedule (described in Section 6.4), the main materials were an initial
briefing to the review teams, the questionnaire, the report template, and an agenda and
instructions for the one-day debriefing session. The briefing seemed to work well, as nine
out of ten of the reviewers found the mandate clear. Our impression is that the one-day
debriefing session also went well and was appreciated as a good way to “wrap-up” the
reviewers’ involvement.

The questionnaire (see Appendix 2) was a semi-structured document designed to suggest
lines of questioning or prompts during the interviews. It was
generally successful, although it became clear during the interviews that it contained
much more detail than could be covered in a few meetings. The reviewers thus had to do
considerable additional research to complete their reports. Nevertheless, all of the
reviewers reported that the materials provided were “a lot” or “totally” useful, and nine of
them said the same about the materials being sufficient.

The report template (see Appendix 3) was designed to standardize the format of the
reports. It provided a general outline, but did not specify particular charts or tables that
should be included. While several of the reports did follow the format, others did not. The
template came somewhat late in the process, after some of the reviewers may have
already started to formulate the outline of their reports. In other cases (e.g., DCD) the
content simply did not lend itself to the format we had developed.

A number of the reviewers came up with excellent ways of summarizing information in
the form of graphics and summary tables. We would recommend that future reviews
examine these reports and try to develop a more detailed template that would improve the
standardization of the reports.

6.6 Summary

In our view there were three key factors that resulted in the project being a success. The
most important of these was having a knowledgeable, experienced and dedicated group of
reviewers at our disposal. We highly recommend that the Assistant Director level be used
for future reviews of this type. The second factor was the excellent cooperation from the
programs under review, who viewed this as an opportunity to learn. The third key factor
was good communications among all the players: the Steering Committee, the review
teams, the programs, senior management, and advisory groups.

7. Future considerations

In this section we discuss some potential next steps following the review.

7.1 Dissemination of the results

Within Statistics Canada, the divisions directly involved in the QA review are fully aware
of the results pertaining to their own program, but not necessarily of those of other
programs. Also, many divisions not directly involved in the QA Review are very keen to
learn about the best practices – and the indicators of risk – that have emerged. It is thus
very important to ensure that the program-specific reports, as well as the Summary
Report, are readily accessible within the Agency. This can best be achieved by translating
the reports and putting them on the Internal Communications Network (ICN). The
program-specific reports currently have the status of internal working documents,
available upon request.

Recommendation 27: The Summary Report and the individual program reports should be
made available on the ICN.

Secondly, the review teams have amassed a variety of production reports, checklists and
process flow diagrams that should be recognized as best practices and made accessible
to other managers. Some, but not all, of these are suitable for the ICN. Although the solution is
not readily apparent, some means for sharing these reports – which in many instances
would not be meaningful to another program area without help in their interpretation – is
worth consideration. One option is a course where managers with high-quality production
reports are invited into a classroom setting to “walk through” the report. Another might
be to ask program areas to give seminars on their production processes, pointing out both
the positive and negative aspects.

Third, the program-specific reports (and indeed the Summary Report) will soon become
dated. A mechanism needs to be established to monitor progress on the recommendations
made in these reports.

Recommendation 28: The nine programs should report on activities or modifications
related to the QA Review in their next integrated program report (IPR). This will ensure
that any changes since the QA Review are documented and publicly available.

Fourth, some external advisory groups, including the National Statistics Council, have
shown considerable interest in the review and have suggested that some external review
would be worthwhile as well.

Recommendation 29: Statistics Canada should present the results of the review to the
National Statistics Council and other external advisory committees for their review.

7.2 Suggestions for improvements to the review process in future

The Steering Committee believes firmly in the value of conducting similar reviews in the
future. The experience is very positive and valuable to all participants. How could the
review process be improved, based on this first experience?

First, while the current human resource environment is particularly stressful, it is likely
that “human resources” will always emerge as a major risk factor in assessments of this
type. It would be useful to develop an evaluation framework that helps reviewers to probe
and pinpoint more specifically what the problems and remedies are. For example, staff
shortages due to blockages in staffing processes require different solutions from an
abundance of staff without the proper skills. Fundamentally, we need to push the analysis
of human resource issues further. It needs structure and finer granularity to ensure that
responses will actually solve the problem.

Second, we offer a cautionary note: do not over-structure or over-complicate the
process. The lack of time forced us into certain expediencies that, in the end, probably
made the process better. The amount of latitude in the interview guide is one example;
the emphasis on having review teams meet frequently is another.

7.3 Recommendation for an ongoing program of quality reviews

We recommend an on-going program of quality reviews (Recommendation 8), with
results reported to Policy Committee. The quality reviews would best be embedded in the
IPR; this would promote a careful weighing of the benefits of improving the quality of
what we already do versus doing new things (or doing what we do faster). Any
shortcomings identified in quality reviews should trigger mitigation measures to
strengthen the program in question. Unlike other parts of the IPR process, however, the
quality reviews would not be conducted by the program itself.

An ongoing quality review program could be based very much on the model used in this
exercise, with a Committee of Directors General to oversee the reviews, together with
“volunteer” Assistant Directors interested in taking part in the exercise. The approach
used in this exercise worked very well and has proven its value in contributing to the
quality assurance practices of the Agency.

Appendix 1
              Interviewer Assignments for Review of Nine Programs

    Program         Program       Lead        Member       Member        Steering
                    Manager    Interviewer      #2           #3        Committee
Consumer Price      George     Arthur        Richard      Lenka        Claude
Index               Beelen     Berger        Dupuy        Mach         Julien
Labour Force        Peter      Joe           Geoff        Tarek        Karen
Survey              Morrison   Wilkinson     Bowlby       Harchaoui    Wilson
Monthly GDP         Michel     Geoff         Arthur       Kevin        Maryanne
                    Girard     Bowlby        Berger       Roberts      Webber
Retail Trade        Richard    Lenka Mach    Tarek        Martin St-   Karen
Survey              Evans                    Harchaoui    Yves         Wilson
Monthly Survey      Andy       Johane        Kevin        Arthur       Louis Marc
of Manufacturing    Kohut      Dufour        Roberts      Berger       Ducharme
International       Craig      Kevin         Johane       Joe          Don Royce
Trade               Kuntz      Roberts       Dufour       Wilkinson
Quarterly Income    Roger      Tarek         Joe          Richard      Claude
and Expenditure     Jullion    Harchaoui     Wilkinson    Dupuy        Julien
Quarterly Balance   Art        Martin St-    Lenka        Geoff        Louis Marc
of Payments         Ridgeway   Yves          Mach         Bowlby       Ducharme
Labour              John       Richard       Martin St-   Johane       Maryanne
Productivity        Baldwin    Dupuy         Yves         Dufour       Webber
Dissemination       Vicki      France                                  Don Royce
Division and        Crompton   Corriveau                               Karen
Communications      Louis                                              Wilson
and Library         Boucher                                            Louis Marc
Services Division   François                                           Ducharme
                    Bordé                                              Maryanne Webber

                               Appendix 2: Questionnaire

                         Review of Quality Assurance Practices

                       Program: __________________________


This review is being carried out at the request of Policy Committee. It is aimed at
assessing the quality assurance practices in the production of nine key economic
indicators. The focus of the review is on the measures that programs take to prevent
erroneous data being released to the public (i.e., to assure the accuracy of the data). It is
not a review of the overall design of the program, as this is assumed to be sound. Nor is it
a review of the other dimensions of quality (relevance, timeliness, coherence,
interpretability and accessibility), unless these happen to affect the risk of producing
inaccurate results.

The results will be used to identify areas where action is needed to reduce the risk of
erroneous data being released, and to identify “best practices” that should be promoted to
other programs in Statistics Canada. The review should cover at least the last two years,
but could go beyond that period to highlight particularly good practices or address
particularly problematic situations that are still affecting the Program or have recently
been resolved.

For the purpose of this review, the "Program" is the organisation responsible for
designing, implementing, executing and assessing a series of steps towards the release of
data to the public. The Program includes all divisions or agencies that are involved in the
various steps on a regular basis (e.g., Methodology, Survey Operations, Dissemination, etc.).

This form serves as a guide to a structured interview that will be conducted for each
program. It contains seven sections or topic areas:
- In Section 1, you will briefly describe the production cycle of the program by listing
  the key steps. This will provide important background information to the review team,
  as well as a point of reference for the other questions.
- In Section 2, you will describe the use and effectiveness of checks and indicators in
  the steps leading up to the point where the data to be released are available for
  certification.
- In Section 3, you will briefly describe the validation/confrontation/certification step
  and its effectiveness at detecting suspicious or potentially erroneous data prior to
  data release.
- In Section 4, you will briefly describe the data release step and the checks applied to
  assure the release of accurate data.
- In Section 5, you will describe incidents where erroneous data may have been detected
  after data release.
- In Section 6, you will briefly describe the reasons or factors that are causing checks
  to fail and the actions taken to deal with these factors, to reduce the likelihood of
  check failures or to mitigate the impact of errors.
- In Section 7, you will briefly describe how changes introduced to the production cycle
  are managed.

Most of the information will be collected in an interview between the person(s)
responsible for the Program and the review team. We expect that one or two two-hour
interviews will be required to obtain your answers to the questions provided on this form.

The questions ask you to describe certain aspects of the production cycle. They are
followed by suggested sub-questions for you to consider in preparing your response. You
do not have to prepare an answer to each sub-question, nor limit your answers to them.
The goal is to understand what quality assurance practices are currently in place, to assess
how effective they are in preventing mistakes, and to generate ideas for improving how
Statistics Canada manages quality. Furthermore, the review team welcomes comments or
observations on this topic that are not covered by the questionnaire. The objective is to
learn from this review what can be done to reduce our risk of publishing erroneous data.

Section 1: Production steps and process flows

Question 1.1
Briefly describe the production cycle by listing the main steps of the Program leading to
the release of data. For the purpose of this review, a step is a set of methods, systems and
operations that transforms inputs, from a previous step or from an external source, into
outputs for a subsequent step. For example, the usual steps of a survey statistics program
are: frame creation, sample selection, data collection, etc. A step is not an individual
operation, e.g. data collection is a step and mailing out questionnaires is an operation
within that step.

This information will be collected during a short meeting with the person(s) responsible
for the Program, the head of the review team and the review coordinator. The purpose of
the meeting is to avoid having too many steps and to ensure that the review team has a
good understanding of what they are.

You can list as many steps as you see fit, as long as they fully cover the whole
production cycle. You do not have to describe the steps in much detail.

Step 1.

Step 2.

Step 3.

Step 4.

Step 5.

Step 6.

Step 7.

Step 8.

Step 9.

Step 10.

Step 11.

Step 12.

Step 13.

Step 14.

Step 15.

Question 1.2
Describe the existence and use of documentation about the production cycle or process flow.
- Is the production process or are the production steps documented?
- What kind of flowchart or other documentation does the program have that describes the
  production process? How detailed is it?
- Who uses it and how (e.g., training new staff, checklist for production, etc.)?
- What are the responsibilities of other divisions or agencies that are regularly involved
  in the production steps? Are responsibilities clear to everyone?
- What kinds of flowcharts or other documentation do these divisions or agencies have for
  their parts of the step? If none or don't know, why (e.g., no control over them, have
  never asked)?

Section 2: Checks, indicators and risk factors

The use of checks and indicators to ensure that steps are correctly carried out is an
integral part of any production cycle. They include information on the execution of the
step (e.g., checklist of operations, output of expected number of records, etc.), as well as
on the outcome of the step (i.e., the quality of the output). This section reviews how checks
are used, applied and maintained, how effective they are at flagging potential errors, how
the Program reacts when they do, and the risk factors that affect the likelihood that the
checks fail.

For the purpose of this review, the responses to this section should cover all steps leading
up to the production of data ready to be certified. The
validation/confrontation/certification and data release steps are reviewed in more detail in
sections 3 and 4.

Question 2.1
Describe the use of checks or checklists during production.
- How are handoffs of data from one division or agency to the next monitored and
  controlled? What checks of the data are done at these points?
- What types of checklists exist of things to be done or checked at each stage of the
  production cycle?
- Who uses the checklist?
- How complete is the checklist? Does it cover all steps or just certain ones?
- What kind of review is done to make sure the checklist stays complete and up to date?
  How are new checks added or old ones deleted?
- Does the checklist cover all divisions or agencies involved in the production step?
  (If not, which ones are excluded and why?)
- How do you ensure that all checks are done each month/quarter?
- Who is involved in making sure the checks are done?
- Are all checks always done? If some checks are not always done, what are the reasons
  (e.g., not enough resources, deadlines too tight, etc.)?

Question 2.2
Describe the effectiveness of the checks at detecting potentially erroneous data.
- Do the checks ever fail (i.e., identify potential errors)?
- What happens when a check identifies a potential error? How and to whom is this
  reported?
- Who is involved in confirming the presence of errors and the reasons causing them? Are
  the outcomes of this investigation and any required resolution documented? If not, why
  not?

Note: Checks are verifications that are done at key points within a step or between steps.
The outcome of the verification is usually pass or fail. Failure of the check requires an
action, which could go as far as halting the production flow. Indicators are measures
taken at various steps to monitor the status of production in terms of quality (accuracy
and timeliness) and costs. They provide readings, as frequently as on a daily basis, to
assess whether production is progressing as planned. Only in extreme cases do they signal
a need for action. They are usually analyzed at the end of production as input to the next
cycle.

Question 2.3
Describe the use of indicators in the production cycle.
    Is there regular monitoring of quality indicators such as response rates, slippage
       rates, imputation rates, CVs, etc.?
    Is there a regular review or “post mortem” of how the production went each
       month/quarter? If not, why not? If so, who is involved and how useful is it?
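The distinction between checks (pass/fail gates) and indicators (monitored readings) described above can be illustrated with a minimal sketch. All function names, counts and thresholds here are invented for illustration; they do not describe any actual Statistics Canada system.

```python
def check_record_counts(input_count: int, output_count: int) -> bool:
    """A check: pass or fail. Failure requires action, possibly halting production."""
    return output_count == input_count

def quality_indicators(responses: int, sample_size: int, imputed: int) -> dict:
    """Indicators: readings monitored over the cycle, not hard gates."""
    return {
        "response_rate": responses / sample_size,
        "imputation_rate": imputed / responses if responses else 0.0,
    }

# A passing check lets production proceed; a failing one would halt the flow.
assert check_record_counts(input_count=1200, output_count=1200)

# Indicators are simply read and tracked, e.g. for the post-mortem review.
ind = quality_indicators(responses=1080, sample_size=1200, imputed=54)
print(f"response rate {ind['response_rate']:.1%}, "
      f"imputation rate {ind['imputation_rate']:.1%}")
```

The key design point the definitions above make is that a failed check forces an action, while an indicator only signals a need for action in extreme cases.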

Section 3: Validation/confrontation/certification step

The validation/confrontation/certification step is the last step where erroneous data can
be detected and corrected prior to data release. It is, by itself, the ultimate check applied
to the data prior to release. This section reviews this important step in more detail. It asks
you to describe the step, its effectiveness at detecting potentially erroneous data and the
risks that may prevent it from detecting erroneous data.

Question 3.1
Describe the methods and operations of the validation/confrontation/certification step in
the Program.
     What data confrontation is done (e.g., with other programs, with other iterations
       of the same program, administrative data, with external events, etc.)? If not, why
       not (e.g. no other sources, no time, etc.)?
     What methods and/or tools are used in data confrontation (e.g., time series,
       graphical, coherence with other datasets)?
     How is internal coherence of the data checked (e.g., calculating financial ratios,
       examining trends, etc.)?
     Are large or influential (micro-level) units or (macro-level) estimates examined
       individually? If so what kinds of checks are done?
     What data quality indicators are examined (e.g., response rates, imputation rates,
       CVs, etc.)? What do you look for?
     What other regular checks of the data are done?
     How is the certification/validation included in the process? Is it done at a variety
       of places in the steps or is it done at the end? If the process becomes time
       constrained is the certification activity a place where time can be squeezed or can
       this step be skipped to expedite the process?
     How is it determined whether the data pass or fail the certification?
     Are the operations documented or are there written procedures?
     Are data reviewed by Statistics Canada experts outside the program (e.g., SNA
       experts)? If so, how useful is this in catching errors?
     Are data reviewed by experts outside Statistics Canada, i.e. under a “work in
       progress” arrangement? If so, how useful is this in catching errors?
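One temporal-coherence check of the kind asked about above can be sketched as follows: flag series whose movement from the previous cycle exceeds a tolerance, so that large or influential estimates are examined individually before release. The series names, values and the 10% tolerance are invented for illustration.

```python
def flag_large_movements(previous: dict, current: dict,
                         tolerance: float = 0.10) -> list:
    """Return (series, relative change) pairs whose movement from the
    previous cycle exceeds the tolerance."""
    flagged = []
    for series, prev_value in previous.items():
        curr_value = current.get(series)
        if curr_value is None or prev_value == 0:
            continue  # no basis for a relative comparison
        change = abs(curr_value - prev_value) / abs(prev_value)
        if change > tolerance:
            flagged.append((series, change))
    return flagged

prev = {"exports": 41.2, "imports": 39.8}
curr = {"exports": 41.9, "imports": 47.0}
# Only "imports" (about an 18% movement) would be flagged for examination.
print(flag_large_movements(prev, curr))
```

A flagged series is not necessarily wrong; as the questions above suggest, it triggers an individual examination, possibly confronting the estimate with other sources or external events.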

Question 3.2
Describe the effectiveness of the validation/confrontation/certification step at detecting
potentially erroneous data and, in the case of a confirmed serious error, preventing any
embarrassing situations (i.e. near misses).
    Has the validation/confrontation/certification step detected potentially erroneous
       data in recent years? How often? More often in recent years? Less often?
    What happens when a validation/confrontation/certification check detects
       potentially erroneous data? How and to whom is this communicated?
    Who is involved in determining the presence of erroneous data and the reasons
       causing them? Are the outcomes of this investigation and any required resolution
       documented? If not, why not?
    Has the validation/confrontation/certification step detected serious errors which,
       had they not been caught, could have been embarrassing? What would have been
       the consequences had they not been caught?
    What were the reasons for these near misses? Has this happened often? Are there
       any common features causing this to happen at this late point in the production
       cycle?

 This is the step where estimates are checked for internal coherence (e.g. reasonable ratios or trends),
external coherence (e.g. confronted to other sources or events) and temporal coherence (e.g. trend analysis).

Section 4: Data Release step

This ultimate step includes the preparation of documentation for Policy Committee, the
transmission of data and accompanying material to the Dissemination and
Communication divisions, the loading of the data onto CANSIM, and the release of the
data in The Daily.

In preparing your responses, refer to the concepts defined in the Introduction, definitions
and instructions section and consider the suggested aspects, as well as any other aspects
that are relevant to ensuring the accuracy of the data released.

Question 4.1
Describe how the certified data is presented to Policy Committee.
    What documentation is produced for Policy Committee review?
    Are there guidelines or a template for what is produced for Policy Committee
       review each month/quarter?
    Who presents to Policy Committee and what is their subject-matter expertise?

Question 4.2
Describe how data and accompanying material is prepared and transmitted to
Dissemination and Communication divisions.
    Who prepares the text, tables and graphics for The Daily?
    Are there standards or guidelines (specific to the program) for what goes into The
       Daily? Please describe.
    What checks are made that the text, tables and graphics for The Daily are correct
       and have the right data?
    How are these checks done (e.g., automated, manual)?
    Who makes these checks?
    What checks are made that the data are loaded correctly onto CANSIM?
    How are these checks done (e.g., automated, manual)?
    Who makes these checks?
    Is this process documented?
    Are changes in methodology communicated to users effectively to ensure that
       there are no misconceptions which could be perceived as errors or large revisions?
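An automated transmission check of the kind the questions above ask about can be sketched as a comparison of the figures carried into release material against the certified source figures. The table keys and values here are invented for illustration; they are not actual CANSIM or The Daily structures.

```python
def transmission_errors(certified: dict, released: dict) -> list:
    """Return the keys whose released figure differs from the certified one."""
    return [key for key in certified if released.get(key) != certified[key]]

# Certified data versus the figures loaded into the release table.
certified = {"cpi_all_items": 130.4, "cpi_food": 142.1}
daily_table = {"cpi_all_items": 130.4, "cpi_food": 142.1}

errors = transmission_errors(certified, daily_table)
assert not errors  # the released figures match the certified data
```

Such a check is a pass/fail gate in the sense of Section 2: any discrepancy would stop the release until the affected figures were investigated and corrected.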

Question 4.3
Describe the effectiveness of the checks at detecting any transmission errors or other
release of erroneous data and, in the case of a confirmed serious error, preventing any
embarrassing situations (i.e. near misses).
     Have the data release checks failed in recent years (i.e. identified a potential error
        in transmitting the data)? How often? More often in recent years? Less often?
     What happens when a check identifies a potential error? How and to whom is this
       communicated?
     Who is involved in determining the presence of an error and the reasons causing
        it? Are the outcomes of this investigation and any required resolution
        documented? If not, why not?

   Have the data release checks detected serious errors which, had they not been
    caught, could have been embarrassing? What would have been the consequences
    had they not been caught?
    What were the reasons for these near misses? Has this happened often? Are there
     any common features causing this to happen at this late point in the production
     cycle?

Section 5: Release of erroneous data

In spite of the numerous checks and indicators used in the production cycle, there have
been incidents when erroneous data were released or data were not released at the right
time. This section asks about such incidents in the program.

Question 5.1
Describe situations where suspicious or potentially erroneous data were detected after
data release (i.e. possible misses).
    Has the accuracy of the Program’s data been questioned or challenged by external
       users in recent years? Who is involved in investigating these challenges?
    Has the release of erroneous data (i.e. misses) ever been confirmed in recent
       years? Were the errors serious or significant?
    Who detected the erroneous data?
    Who was informed, internally and externally, about the release of erroneous data?
    Were the erroneous data corrected? How?
    How frequently has erroneous data been released?
    Are these incidents documented? How?

Question 5.2 Describe why erroneous data were released.
   Is it due to inadequate checking prior to validation/confrontation/certification (e.g.
      insufficient checks, ineffective checks or the incomplete application of checks)?
   Is it due to other factors prior to validation/confrontation/certification (e.g. factors
      related to financial resources, time, staff, methods, systems, operations or quality
      of input data)? Which ones? Are there any common factors?
   Why were the erroneous data not caught in validation/confrontation/certification?
      Is it due to inadequate checking? Other factors?
   Why were the erroneous data not caught in data release? Is it due to inadequate
      checking? Other factors?
   What follow-up actions were taken to prevent a re-occurrence (e.g., training, new
      checks, systems changes)?

Section 6: Factors causing checks to fail

As mentioned in Section 2, the use of checks and indicators is an integral part of any
production cycle. It is normal that they fail on certain occasions, otherwise they wouldn’t
be necessary. This section investigates why checks fail and the measures taken by the
Program to deal with the underlying factors or reasons. The review is particularly
interested in identifying good practices that have significantly reduced (or even
eliminated) the risk of errors, or kept the risk of errors under control in spite of the
presence of risk factors. The review is also interested in identifying issues where risk of
errors have appeared or have increased significantly in recent years.

In preparing your response consider the following factors, or any other, that could have
an impact on the likelihood of errors in the production cycle.
     Financial: insufficient budget, too much change (lack of stability), diminishing
       resources
     Time: lack of time to check, to test changes, to correct the sources of errors (root
       causes), diminishing deadlines
     Staff: lack of staff, lack of experience, lack of knowledge of subject-matter, lack
       of knowledge in quality assurance practices, lack of backup for key staff, too
       much turnover
     Methods: too complex, numerous minor changes, major change(s), not well
       understood, lack of documentation, not enough methodology (too much left to
       judgment)
     Systems: too old (subject to strains or crashes), new (lack of testing), too
       complex, numerous minor changes, major change(s), lack of documentation, lack
       of support, inflexible / unable to change easily
     Operations: too complex (numerous steps), lack of automation (numerous
       manual interventions), numerous minor changes, major changes
     Data: poor accuracy of input data (requires a lot of clean-up), poor timeliness of
       input data (failed deadlines), variable quality of input data (problems keep
       changing), complex flow of data between steps (e.g. different platforms,
       numerous file transfers, etc.)
     External: changing user needs, more complex user needs, more
       knowledgeable/demanding users, changing environment, more complex
       environments, less predictable environment, new phenomena, etc.

Question 6.1 Describe where and why new checks have been introduced in recent years.
   At which step(s)?
   What were the factors causing the errors and raising the need for new checks?
   What other measures have been put in place to deal with these factors, to reduce
      the likelihood of errors or to mitigate the impact of errors?

Question 6.2 Describe where and why the incidence of errors has significantly increased
in recent years.
     At what step(s)?
     What factors are causing the errors?

      What measures have been put in place to deal with these factors, to reduce the
       likelihood of errors or to mitigate the impact of errors?

Question 6.3 Describe where, why and how the incidence of errors has significantly
decreased or been eliminated in recent years.
    At what step?
    What were the factors causing the errors?
    What measures were put in place to deal with these factors or to reduce the
       likelihood of errors?

Question 6.4 Describe where, why and how the incidence of errors has remained under
control in spite of the challenges brought on by certain factors in recent years.
    At what step(s)?
    What were the factors putting additional pressure on the Program?
    What measures have been put in place to deal with these factors, to reduce the
        likelihood of errors or to mitigate the impact of errors?

Section 7: Management of change

A common factor that can affect the quality of data released is change. Changes in
resources, deadlines or personnel, minor changes in methods, systems or operations
needed to support the program, or major redesigns in methods or systems increase the
likelihood of releasing erroneous data. This last section investigates this specific risk
factor, how it has affected the Program in recent years and how the program has
managed it.

Question 7.1
Describe the changes that affect the likelihood of releasing erroneous data.
    What kinds of changes come from external sources (e.g., changes of source data,
       user demands, imposed changes of systems, budget cuts, tighter deadlines on the
       program, etc)?
    What kinds of changes come from internal sources (e.g., staff turnover, need to
       improve methods, operations or systems, opportunities to improve)?
    How frequently does change happen? How much of this is major change and how
       much is minor?
    Have there been any major changes recently in methods, operations, personnel, or
       other factors?

Question 7.2
Describe how these changes are managed.
    Who can make changes to production systems? Is authorization required?
    How are changes initiated? How formal is the change initiation step (e.g., formal
       change requests, tracking system)?
    How are changes tested before being introduced into production? (e.g., not at all,
       parallel run, etc)?
    Who checks that the changes have been made correctly (e.g., same person that
       made the changes, someone else)?
    How are changes documented?

                          Appendix 3: Template for Reports

1.     The Quality Review Process

Provide an overview of the review process: why it was carried out, what was the
approach, what was the scope. This text would be the same for each report and be written
by the Steering Committee. Note: given the importance of this report and its brevity, an
executive summary is not needed.

Suggested length: 0.5 to 1.0 page

2.     The Program

Briefly describe what the program is about, what makes it a key indicator. Probably
extract something from the IMDB.

Suggested length: 0.5 page

3.     Quality Assurance Practices

One subsection for each of the basic steps of the production process:
           Preparation of certification
           Data collection
           Editing and transformation
           Imputation and estimation
           Certification
           Release
           Post-release

In each subsection (i.e. for each basic step):
     Describe the step, be very brief on the typical aspects, highlight the features
       needed to understand and appreciate the best practices or issues described later
     Checks in the execution of the program
           o Are they present?
           o Are they sufficient?
           o Are they applied systematically at every occasion?
           o Are they effective?
           o What are the particularly good practices worth sharing with other
               programs, i.e. practices that are not present in most programs?
     Risk factors
           o Which risk factors are particularly present at this step?
           o Which risk factors are particularly well managed by the program? How?
               (highlight the good practices)
           o Which ones actually contributed to near-misses or misses? What did the
               program do about it?

            o Which ones are still putting the program at risk of having near-misses or
              misses? What is the program doing about it?
        Recommendations
            o Should the program improve its checks? How?
            o Can the program further prevent near-misses or misses? How?

    The text in this section could become a bit repetitive, especially if the same risk
       factor(s) is (are) present and problematic at different steps.

Suggested length: 8-12 pages

4.       Summary

Is the overall production process typical? Is it similar to other surveys of the same type,
i.e. a survey statistics program or a derived statistics program? Are there notable
differences? Do these differences reduce or augment the risks of errors?

What are the strengths of the program in terms of assuring the quality of its production
process? What are the major risk factors actually or potentially putting the program at
risk of having near-misses and misses? Do some of these risks cut across several steps of
the program?

What are the main weaknesses of the program in terms of assuring the quality of its
production process? What should the program do to prevent errors and to detect them
earlier in the production cycle, thus further avoiding near-misses and misses?

Suggested length: 2-4 pages

5.       Other considerations

This section is optional. The review team could use it to report on aspects that may not be
in-scope for this review, but of pivotal importance to assuring the quality of our products,
e.g. design issues.

Suggested length: 0 to 1 page

6.       Appendices

Any document that the review team and the program team wish to include in support of
good practices or issues related to the scope of the review. We do not expect the program
team to produce any documentation for the sole purpose of this review.

Suggested length: none

