New Policy and Procedures Governing the Release of Microdata Derived from ONS Social Surveys

    Carole Abrahams and Kieron Mahony

    Abstract


    Microdata derived from Social Surveys provide a valuable research tool for a wide range of users
    and uses. The Statistics and Registration Service Act 2007, which fully came into force on 1 April
    2008, includes provisions that control the release of microdata. Data which are personal
    information must not be released except to an Approved Researcher or through another relevant
    gateway in the 2007 Act. This paper includes guidance for designing microdata which are not
    personal information and may therefore be lodged at the UK Data Archive under the End-User
    Licence. It also describes how a user may apply to become an Approved Researcher and
    therefore be permitted to access personal information.


1. Introduction

      Social survey microdata are files consisting of individual records for each respondent. The
      records include demographic details about the respondent together with variables specific to
      the survey. These data provide a valuable research tool for a wide range of users and uses.
      Microdata files of different degrees of disclosiveness are therefore made available to
      government departments and to academic and other researchers.

      The Statistics and Registration Service Act 2007 (the 2007 Act) defines personal information,
      and makes it a criminal offence to release such data, unless there is a legal exemption. As
      such there was a need for standards and guidance to be available on how to prepare
      microdata which were not personal information. As these standards were developed they
      became part of a new policy: “ONS disclosure control policy for the release of microdata
      derived from social surveys”.1 In addition, work was undertaken to establish criteria and
      procedures for the provision of data to Approved Researchers, one of the exemptions in the
      2007 Act by which personal information can be released.

      This paper describes the development of a policy on microdata release for social surveys and
      of the procedures for becoming an Approved Researcher. Section 2 gives background
      information about social survey microdata which are available to researchers. Section 3
      describes ONS procedures for releasing microdata. Section 4 describes the 2007 Act. Section
      5 introduces the new policy. Sections 6, 7 and 8 summarise guidance for producing microdata
      which are not personal information. Section 9 describes the procedures associated with
      Approved Researchers.

2. Background

      Although social survey microdata are a very valuable resource, it is essential that the
      confidentiality of survey respondents is respected. This requirement is underwritten by the
      National Statistics Code of Practice (see 3.2), as well as by the survey pledge which is given to
      respondents. The 2007 Act has added a legal dimension which is discussed below (see 4).
      Removing direct identifiers, such as name and address, from the data is not sufficient. It still
      may be possible to identify an individual based on other indirect identifiers such as age, sex,
      occupation, geography etc. In addition one could combine anonymised datasets with other
      sources of data to identify individuals in the data, with a high probability that the identification is
      correct. Once an individual has been identified, other information about him or her may then be
      revealed by the survey microdata.

1  For the rest of this paper it will be referred to as the “new policy”.

   There are two ways to manage the risk of such disclosure. The first is to modify the data, in
   order to make it non-disclosive, by using Statistical Disclosure Control (SDC) techniques and
   tools. This is discussed further in sections 6, 7 and 8. The second is to restrict access to the
   data. The ONS uses combinations of these two approaches.

   More particularly, microdata derived from social surveys are made available to users in the
   following ways:

2.1. Non-personal information

  Datasets are lodged at the UK Data Archive (UKDA) for secondary research access under an
  End-User Licence (EUL), or provided to customers on direct application to ONS under an
  equivalent arrangement. The new policy described in this paper applies particularly to these
  data, being based on Section 39 of the 2007 Act: Confidentiality of personal information. As
  discussed further in sections 4 and 5, these data must not be personal information.

  Some social survey data are also provided to Eurostat for the use of researchers and analysts
  across the European Union under equivalent arrangements to the EUL. The new policy also
  applies to these releases.

2.2. Personal information

  ONS recognises that some valuable research will need to make use of more detailed data than
  the EUL microdata. Less SDC is therefore applied to the data but stricter access controls are in
  place. More detailed datasets are deposited at the UKDA under a Special Licence, which
  requires a user to observe strict security and confidentiality rules. These are personal
  information as defined by the 2007 Act and may therefore only be supplied to users under the
  exemptions listed in Section 39 (4) of the Act. In this paper, the particular exemption considered
  is Section 39 (4) (i), which permits the release of personal information to users who are
  Approved Researchers. The processes relevant to Approved Researchers are described in
  section 9.

  Anonymised datasets may be provided to other parts of the Government Statistical Service
  (GSS). Outputs derived from these data are subject to the GSS Disclosure Control Policy for
  Tables Produced from Surveys. Recipients sign a data access agreement.

3. Current ONS procedures for releasing microdata

3.1. The Microdata Release Panel

  The procedure for releasing microdata is facilitated and controlled by the Microdata Release
  Panel (MRP) and the Microdata Release Unit (MRU). A data provider wishing to release
  microdata outside of ONS submits an application to the MRP describing the data, the purpose
  for which it is being released and any related data security and confidentiality procedures.

  Applications to release datasets which are not personal information, such as EUL microdata,
  must include a risk assessment from the Statistical Disclosure Control (SDC) team, in ONS
  Methodology. The MRU is authorised by the MRP to approve releases of non-personal
  microdata, subject to a satisfactory assessment by SDC. However applications to release
  datasets which are personal information must be submitted to the Panel for approval, and
  recipients of personal information must sign a data access agreement or the UKDA Special
  Licence.


    The new policy includes detailed guidance for the data provider on how to ensure that data are
    not personal information. This is summarised in sections 7 and 8. The SDC team can also
    provide additional support. However it is the responsibility of the data provider to ensure that,
    when microdata are made available to the UKDA under the EUL, SDC advice has been fully
    implemented. It is also the data provider’s responsibility to ensure that the data released
    correspond to the data specification provided in the MRP application.

3.2. The Code of Practice (CoP)

    The National Statistics Code of Practice (CoP) sets out the professional principles and
    standards which official statisticians are expected to follow and uphold. It is supported by twelve
    Protocols which describe how those principles and standards are to be implemented in practice,
    one of these being the Protocol for Data Access and Confidentiality. This states that National
    Statistics should be modified by statistical disclosure control methods which will be judged
    sufficient when the guarantee of confidentiality can be maintained, taking account of information
    likely to be available to third parties, either from other sources or as previously released
    National Statistics outputs, against the following standard: it would take a disproportionate
    amount of time, effort and expertise2 for an intruder to identify a statistical unit to others, or to
    reveal information about that unit not already in the public domain. In the case of Social Survey
    data, the statistical units are generally individual respondents and households.

4. The Statistics and Registration Service Act 2007

    Section 39 (2) of the 2007 Act defines personal information as information which relates to and
    identifies a particular person. It specifies what constitutes a disclosure of information and the
    sanctions that may apply for any breach of confidentiality.

    Disclosure of personal information takes place when the identity of a particular person is
    revealed. This can take place through being specified in the information, by being deduced
    from the information, or by being deduced from the information when taken together with any
    other published information (section 39 (3)).

    The 2007 Act states that personal information must not be disclosed unless through an
    exemption as specified in section 39 (4), such as release to an Approved Researcher.
    Therefore, in order to be able to provide research access to EUL level data, the data must not
    reveal or have the potential to reveal the identity of an individual.

    More generally, section 23 of the 2007 Act gives the UK Statistics Authority the function of
    promoting and assisting statistical research, in particular by providing access to data held by it.
    The new Approved Researcher gateway adds significant legal powers to assist research
    through access to data. Where an existing gateway is not otherwise available, a researcher
    can now obtain lawful access to personal information provided safeguards are met.

    The 2007 Act has thus had two main implications for the release of social survey microdata.

    (1) Previously data to be released under an EUL or similar (see 2.1) had to be non-disclosive,
    meaning that an individual could not be identified either directly from the data or by use of other
    data which might be available; this included private databases to which an intruder may have
    access. This was in accordance with the National Statistics Protocol, see 3.2. However the
    2007 Act’s definition of personal information implies that private databases do not have to be
    taken into consideration, only data which have been published. It was therefore necessary to
    develop a new policy for EUL data. This is described in the following four sections.

    (2) Section 39 of the 2007 Act means that personal information may be released to an
    Approved Researcher, so that procedures for granting this status have had to be put in place.
    These are described in section 9.

2  The designer should allow for the intruder to have access to powerful data processing software and hardware
   equivalent in standard to that available in ONS, to have some statistical and mathematical expertise equivalent in
   standard to those found in an ONS Statistical Officer and to be prepared to dedicate a number of hours of their time to
   the task of identifying an individual.

5. The new policy for releasing microdata

5.1. Developing the policy

    The SDC team undertook a review of the social survey microdata which were held at the UKDA
    in 2006, and the new policy was developed with the help of a Working Group made up of
    internal and external stakeholders. The group concentrated on standards for EUL datasets
    which would comply with the 2007 Act. It was agreed that when preparing EUL microdata from
    a social survey there are three distinct criteria:

    (1) The microdata must not be personal information as defined by the 2007 Act. This means
    that an intruder must not be able to identify an individual or a household either directly from
    the data or by using published information.

    (2) The microdata need to include enough detail to meet the requirements of the majority of
    users.

    (3) The disclosure protection process must not impose an unreasonable burden on the
    business areas.

    The new policy includes an appendix giving detailed guidance on the preparation of microdata
    to satisfy these three criteria. The document has been placed in the ONS Standards and
    Guidance business database. The policy was approved by the ONS Statistical Policy
    Committee on 23 April 2008.

5.2. Different types of microdata

    The policy allows for three tiers of microdata derived from social surveys as follows:

    (1) Personal information

    Data are classified as personal information if they may enable an individual to be identified
    either directly from the data or by the auxiliary use of published information. Examples of
    personal information derived from ONS social surveys are identified data, which are made
    available via the Virtual Microdata Laboratory (VML), GSS datasets which are anonymised
    data supplied to other Government departments, and Special Licence datasets (see 2.2)
    which have had some degree of protection applied, and which may be made available to
    Approved Researchers.

    (2) End-User Licence datasets.

    These are not personal information and are lodged at the UKDA under the End-User Licence
    (EUL), see 2.1. Some data which are not personal information may also be supplied to
    Eurostat. They may also be provided to customers on direct application to ONS.

    Disclosure protection is applied to these datasets, in accordance with the guidance included in
    the new policy. Such data may not be released without a satisfactory risk assessment from the
    SDC team.

    (3) Public use data

    The new policy makes a clear distinction between EUL microdata and public use data. Current
    ONS Corporate Policy for Micro-data Access and Release requires data providers to take into
    account the possibility that an intruder might have access to a private database or to privileged
    information which could be matched with ONS microdata to enable the identification of an
    individual. The new policy states that if microdata were to be released for public use it would
    have to satisfy the Protocol for Data Access and Confidentiality (see 3.2).

5.3. Summary of implications for microdata release

    The new policy states that, for EUL microdata, data providers need only take into account the
    possibility that an intruder might make use of published information (as defined in the 2007
    Act). The more restrictive risk assessment, which takes into account that an intruder might
    have access to private databases, and other sources, should only apply if public use
    microdata were to be released. This means that some EUL microdata will be less restrictive
    than was previously the case.

    The implication is that EUL microdata could be published without breaking the law. However,
    there remains a possibility that publishing the EUL microdata could lead to the identification of
    an individual by use of private or privileged information. This could compromise ONS by
    damaging its reputation or by breaking the terms of the Survey Respondent Pledge. It is
    considered that the End-User Licence provides adequate management of these risks since the
    conditions of the EUL provide some control over the use of data3.

    The new policy includes guidance on preparing EUL microdata, and the following three
    sections give a summary of this guidance; section 6 discusses the disclosure risk posed by
    microdata, section 7 explains how this risk may be assessed, and section 8 describes how
    data can be modified to ensure that it is not personal information.


6. Disclosure risk for microdata - an overview

    For microdata, disclosure occurs where there is a possibility that an individual can be re-
    identified by an intruder using information contained in the file, and when on the basis of that,
    confidential information is obtained. For the purposes of this discussion, an intruder is defined
    as someone who deliberately or inadvertently determines confidential information about a
    respondent from a dataset, or attempts to do so. Microdata are released only after taking out
    directly identifying variables, such as names and addresses. However, other variables in the
    microdata can be used as indirect identifying variables. For individual microdata these are
    variables such as gender, age, occupation, place of residence, country of birth and family
    structure. These (indirect) identifying variables are mainly publicly available variables or
    variables that are present in public databases.

    To assess the disclosure risk, we first need to make realistic assumptions about what an
    intruder might know about respondents and what information might be available to him/her to
    match against the microdata and potentially make an identification and disclosure. These
    assumptions are known as disclosure risk scenarios or intruder scenarios. Based on the
    disclosure risk scenario, the identifying variables (also known as key variables) are determined.
    The other variables in the file are confidential or sensitive variables and represent the data not
    to be disclosed. The aim of statistical disclosure control (SDC) is to prevent such identification
    or disclosure from being possible, at least by introducing uncertainty as to whether the
    identification and disclosure are correct.

3  The UKDA end-user licence includes the following restrictions:
   - The user must preserve the confidentiality of individuals and households in the data.
   - The user must not attempt to derive information from the data relating to an individual, nor claim to have
     done so.
   - The data must be kept secure and may only be shared with other registered users of the UKDA.
   - The data may only be used for research or educational purposes, and may not be used for commercial
     purposes without permission.

   Thus SDC needs to consider to what other data an intruder may have access in order to
   perform such matching. The 2007 Act allows data to be released under an end-user licence
   provided it is not personal information, and this needs only to take account of other published
   information which may be available to an intruder. However SDC also needs to take account of
   private data sources to which an intruder may have access, if there is a risk of compromising
   ONS reputation or breaking survey pledges by using that information to identify an individual.

   To summarise, EUL datasets should neither enable the identification of a survey respondent
   directly from the microdata nor do so by deduction from the microdata taken together with other
   published information. Many Social Surveys are based on households, and the microdata will
   include individual records for each member of the household, which may be regrouped to allow
   household-based analysis. Therefore the identification of a household also needs to be
   prevented.

7. Guidance for preparing EUL microdata - assessing disclosure risk

   This section summarises the guidance in the new policy on assessing the disclosure risk posed
   by social survey microdata and the various factors which need to be considered.

7.1.       Scenarios

       When assessing the disclosure risk of microdata, it is necessary to first consider the intruder
       scenarios. A range of different scenarios applicable to social survey data were developed by
       Professor Angela Dale and Dr Mark Elliot (Elliot and Dale, 1998; Elliot and Dale, 1999). The
       consideration of scenarios indicates some of the variables which are likely to be used by an
       intruder. For EUL microdata there are two main scenarios to be considered: use of published
       datasets and spontaneous recognition. Other scenarios may also need to be considered,
       either because of particular characteristics of the survey, or as we become aware of more
       publicly available data.

7.1.1. Scenario 1 – use of published datasets

       Examples of these are the Electoral Register and Commercial Datasets, such as consumer
       profile databases, which may be purchased at a reasonable cost by any member of the public.
       The Confidentiality and Privacy Research Issues Group (CAPRI), at the University of
       Manchester, undertook a Scoping Study for a Data Environment Analysis Service (DEAS)
       (Purdam and Elliot, 2006). This found that a large number of databases are held by commercial
       data companies, which combine public records, such as the electoral register, with other
       sources, such as lifestyle surveys, to produce large datasets which are then made available
       commercially to the public. As the number of such datasets increases, potential intruders will
       be able to make use of multiple datasets and thus enhance the level of information derived
       from them.

       A typical commercial dataset examined by Purdam and Elliot included the following variables:
       name, address, postcode, age, sex, ethnicity, number of cars, number of children, given in 5
       year age-groups (e.g. 1 child aged 0-4, 2 children aged 5-10), size of household, tenure, house
       type, number of rooms, occupation (high-level), income (banded), qualifications. In addition
       hierarchical datasets were available, for example for every household included in the dataset
       there is a record for each adult in the household, and each of these records contains variables
       which give details, such as age group and sex, of every child in the household, so that records
       can be grouped into households.

       Thus Scenario 1 assumes that an intruder would link published information with EUL microdata
       using key variables as listed above. For matched records, they would then have the direct
       identifiers, name and address, linked with all the other information in the EUL dataset. There
       would be a high probability of these being correct matches.
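
       The linkage assumed in Scenario 1 can be sketched in Python as follows. This is an
       illustration only and not part of the guidance: the key variables (age_band, sex, region,
       occupation), the field names and all records are hypothetical. The sketch pairs a published
       record with a microdata record only when the key occurs exactly once in each file.

```python
from collections import defaultdict

# Hypothetical key variables shared by both files (an assumption for illustration).
KEY = ("age_band", "sex", "region", "occupation")


def key_of(record):
    """Return the tuple of key-variable values for one record."""
    return tuple(record[name] for name in KEY)


def link_unique_matches(published, microdata):
    """Pair records whose key is unique in both the published file and the microdata."""
    by_key_pub = defaultdict(list)
    for rec in published:
        by_key_pub[key_of(rec)].append(rec)
    by_key_micro = defaultdict(list)
    for rec in microdata:
        by_key_micro[key_of(rec)].append(rec)

    links = []
    for key, micro_recs in by_key_micro.items():
        pub_recs = by_key_pub.get(key, [])
        if len(micro_recs) == 1 and len(pub_recs) == 1:
            links.append((pub_recs[0], micro_recs[0]))
    return links


# Invented records: the published file carries names, the microdata carry survey variables.
published = [
    {"name": "A. Example", "age_band": "40-44", "sex": "F", "region": "North East", "occupation": "Teacher"},
    {"name": "B. Example", "age_band": "30-34", "sex": "M", "region": "London", "occupation": "Clerk"},
    {"name": "C. Example", "age_band": "30-34", "sex": "M", "region": "London", "occupation": "Clerk"},
]
microdata = [
    {"age_band": "40-44", "sex": "F", "region": "North East", "occupation": "Teacher", "income_band": "C"},
    {"age_band": "30-34", "sex": "M", "region": "London", "occupation": "Clerk", "income_band": "B"},
]

links = link_unique_matches(published, microdata)
```

       In this invented example only the teacher record is linked; the two clerks share a key in
       the published file, so the match is ambiguous. Coarsening the key variables increases the
       number of such ambiguous matches.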

7.1.2. Scenario 2 – spontaneous recognition

    An intruder may spontaneously recognise an individual in the microdata by inadvertently
    recognising characteristics of an individual who is either known to them or in the public eye (for
    whom published information may be available). It is generally assumed that the intruder does
    not know that the individual or household is in the data set we want to protect. Spontaneous
    recognition can occur for instance when a respondent is a politician, an entertainer or a very
    successful business person. An example is the “Rich List” which publishes annual salaries of
    high-earning individuals.

    The key variables for this scenario include: name, age, sex, marital status, income, occupation
    – job title, which may be equivalent to 4-digit occupation and industry codings.

7.2. Key variables

    Taking into account these scenarios, and others if appropriate, we can identify which variables
    an intruder is likely to combine into a key which could be used to attempt to match the
    microdata with other published data to which the intruder has access, in order to identify one
    or more records as referring to particular individuals. The disclosure risk comes from
    individuals within the microdata that are both sample uniques and population uniques on the
    key since this increases the probability of a correct match. Provided that the disclosure risk is
    reasonably small, the data may be considered not to be personal information; this takes
    account of the “disproportionate time and effort” rule (see 3.2).
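
    As a first step in such an assessment, the sample uniques on a chosen key can be counted.
    The following Python sketch is illustrative only; the key variables and records are
    hypothetical. Sample uniqueness can be computed from the microdata alone, whereas
    population uniqueness has to be estimated or judged using other information.

```python
from collections import Counter


def sample_uniques(records, key_vars):
    """Return the records whose combination of key-variable values occurs once in the sample."""
    counts = Counter(tuple(r[v] for v in key_vars) for r in records)
    return [r for r in records if counts[tuple(r[v] for v in key_vars)] == 1]


# Invented records for illustration.
sample = [
    {"age": 34, "sex": "F", "occupation": "Nurse"},
    {"age": 34, "sex": "F", "occupation": "Nurse"},
    {"age": 71, "sex": "M", "occupation": "Judge"},
]
uniques = sample_uniques(sample, ["age", "sex", "occupation"])
```

    Here the only sample unique is the third record. Whether it is also a population unique
    cannot be determined from the sample.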

    It follows that the variables which are most likely to need protection include demographic
    indicators, such as geography, household composition, ethnicity, occupation etc. Salaries and
    household income are also key variables. Work in the USA (Winkler, 1999) showed that the
    level of matching between datasets is improved when income data are used as part of a
    matching key, due to the availability of administrative tax records. Although tax data are not
    publicly available in the UK, some income-related data are in the public domain. Examples are
    the salaries of company directors, and very high salaries and bonuses, which are all
    published.

7.3. Sample size

    Another factor which needs to be taken into account is the size of the sample. Microdata
    based on a larger sample will have a greater absolute disclosure risk than those based on a
    smaller sample, since the number of re-identifications is likely to be greater. In addition, data
    from a larger sample are more likely to be of interest to an intruder. Therefore microdata based
    on larger samples should be treated as more risky.

    The guidance in the new policy recommends the following basis for varying disclosure control
    according to sample size:

    (1) Small samples

    If the sample is less than 1% of the population, then most key variables may not need to be
    protected. However there will always be some, such as geography, which need treatment, as
    discussed below. Surveys such as the Expenditure and Food Survey, General Household
    Survey and Labour Force Survey fall into this group.

    (2) Medium samples

    If the sample size is between 1% and 3% of the population, then it is likely that several key
    variables will need to be protected. The Annual Population Survey has a medium size sample,
    and the Integrated Household Survey is also likely to be of medium size.

    (3) Large samples

    If the sample size is greater than 3% of the population, then further protection may be
    necessary. No social surveys, whether current or envisaged, belong to this category.
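
    The three bands above can be expressed as a simple classification. The Python sketch below
    is illustrative; the function name is hypothetical, and the treatment of samples of exactly
    1% or exactly 3% as "medium" is an assumption, since the guidance does not specify the
    boundary cases.

```python
def sample_size_tier(sample_size, population_size):
    """Classify a sample as 'small', 'medium' or 'large' by its share of the population."""
    if sample_size < 0 or population_size <= 0:
        raise ValueError("sample_size must be >= 0 and population_size must be > 0")
    # Integer arithmetic avoids floating-point rounding at the 1% and 3% boundaries.
    if 100 * sample_size < population_size:
        return "small"  # less than 1% of the population
    if 100 * sample_size <= 3 * population_size:
        return "medium"  # between 1% and 3% (boundaries included by assumption)
    return "large"  # greater than 3%
```

    For example, with an invented population of 60,000,000, a sample of 50,000 is classified
    as small and a sample of 1,200,000 as medium.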

7.4. Household surveys

    Many social surveys are household-based, examples being the Family Resources Survey
    (FRS) and the Labour Force Survey (LFS). Microdata from these surveys are hierarchical, as
    they include a record for each individual in the household as well as variables which allow the
    individuals’ records to be linked. This enables an intruder to enhance identification keys, for
    example by combining age, sex, marital status and the relationship of each individual in the
    household. Such keys increase the likelihood of households being identified. Thus the
    disclosure risk has to be assessed at the individual and household level. This is particularly
    important when considering large households.
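
    The way in which a hierarchical file allows an enhanced key to be built can be sketched in
    Python. The field names (household_id, age, sex, marital_status, relationship) and the
    records are hypothetical; the sketch simply combines the members' characteristics into one
    key per household.

```python
from collections import defaultdict


def household_keys(person_records):
    """Build one composite key per household from its members' characteristics."""
    members = defaultdict(list)
    for rec in person_records:
        members[rec["household_id"]].append(
            (rec["age"], rec["sex"], rec["marital_status"], rec["relationship"])
        )
    # Sort so that the key does not depend on the order of records within the household.
    return {hh: tuple(sorted(people)) for hh, people in members.items()}


# Invented person-level records for illustration.
persons = [
    {"household_id": 1, "age": 41, "sex": "F", "marital_status": "married", "relationship": "head"},
    {"household_id": 1, "age": 43, "sex": "M", "marital_status": "married", "relationship": "spouse"},
    {"household_id": 1, "age": 9, "sex": "F", "marital_status": "single", "relationship": "child"},
    {"household_id": 2, "age": 67, "sex": "M", "marital_status": "widowed", "relationship": "head"},
]
keys = household_keys(persons)
```

    The key for household 1 contains twelve values rather than four, which is why a household
    key is far more likely to be unique than the key of any one member.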

7.5. Large households

    The presence of large households increases disclosure risk. It has been demonstrated that
    households of size eight and above are intrinsically disclosive, independent of the size of the
    sample (Elliot, 2005). Work on the 2001 SAR noted that, for private households of size 6 and
    above in England, 88% were population uniques for age-sex structure (Bycroft et al, 2005).
    The guidance states that, for households of size 10 and above, all records pertaining to that
    household should be suppressed in EUL datasets.
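
    The suppression rule for households of size 10 and above can be sketched as follows. The
    Python code is illustrative; it assumes a hypothetical household_id field and that household
    size equals the number of person records sharing that identifier.

```python
from collections import Counter

# Households of size 10 and above are suppressed, so 9 is the largest size retained.
MAX_HOUSEHOLD_SIZE = 9


def suppress_large_households(person_records, max_size=MAX_HOUSEHOLD_SIZE):
    """Drop every record belonging to a household with more than max_size members."""
    sizes = Counter(rec["household_id"] for rec in person_records)
    return [rec for rec in person_records if sizes[rec["household_id"]] <= max_size]


# Invented data: household "A" has 10 members, household "B" has 3.
records = [{"household_id": "A", "person": i} for i in range(10)]
records += [{"household_id": "B", "person": i} for i in range(3)]
kept = suppress_large_households(records)
```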

8. Guidance for preparing EUL microdata – protecting the data

   This section summarises the guidance provided by the new policy for modifying social survey
   microdata to obtain a dataset which is not personal information. Having assessed the
   disclosure risk by considering the likely intruder scenarios and identifying the risk factors, as
   described in section 7, the process of preparing microdata which is not personal information
   can be divided into three steps:

   (1) Anonymise the data
   (2) Apply disclosure control to key variables
   (3) Deal with large households

8.1. Anonymising the data

    This means removing all direct identifiers, including name, address, post-code, NI number and
    NHS number. If the data contain any other direct identifiers, such as Passport Number, then
    these must also be removed. In addition date of birth must be removed, and all similar
    variables such as year of birth and month of birth.
    Anonymised data can still be personal information.
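
    A minimal Python sketch of this step follows. The field names are hypothetical and would
    need to be matched to the variable names used in the survey file concerned.

```python
# Hypothetical field names for the direct identifiers and date-of-birth variables listed above.
DIRECT_IDENTIFIERS = frozenset({
    "name", "address", "postcode", "ni_number", "nhs_number", "passport_number",
    "date_of_birth", "year_of_birth", "month_of_birth",
})


def anonymise(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of the record without any of the listed identifier fields."""
    return {field: value for field, value in record.items() if field not in identifiers}


# Invented record for illustration.
raw = {"name": "A. Example", "postcode": "ZZ1 1ZZ", "date_of_birth": "1970-01-01", "age": 38, "sex": "F"}
clean = anonymise(raw)
```

    The result retains only age and sex. As noted above, such a record may still be personal
    information if the remaining variables allow identification.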

8.2. Applying disclosure control to key variables

    As discussed above, the data provider should consider the relevant intruder scenarios, what
    key variables are included in the data and the size of the sample. The guidance is based on
    experience gained from disclosure risk assessments which SDC has carried out on various
    sets of social survey microdata, and in particular work carried out on the Sample of
    Anonymised Records from Census 2001 (Gross et al, 2005). The advice may be changed for
    future microdata releases if SDC becomes aware of relevant published information which
    could be used by an intruder.


    Some examples of the advice for particular variables follow. It is emphasised that these are
    suggestions and not rules. The method of disclosure control chosen should be appropriate for
    the survey and the sample size. Users’ requirements should always be borne in mind; if a
    variable is needed at a lower level than advised, then another variable should be protected at
    a higher level. For example, if exact rather than banded age is required, then salary and
    income variables could be banded instead, or occupation and industry variables provided at a
    higher level.


8.3. Geography

    Geography variables are the primary candidates for protection, as removing low levels of
    geography introduces extra uncertainty into a possible identification. For most EUL microdata,
    the lowest level of geography is therefore Government Office Region (GOR). One of the main
    reasons for setting up the Special Licence, see 2.2, was to give researchers access to data
    with lower geographical details, such as local authority. Some variables, such as the
    urban/rural indicator, are based on postcode, so their inclusion may reveal a lower level of
    geography.

    Care also needs to be taken with variables based on Council Tax. If the data includes both
    Council Tax band and the amount of Council Tax paid by a household, or other variables
    based on this amount, then the local authority may be deduced. This is because Local
    Authorities publish their council tax rates. Thus, in order that EUL data do not include
    geographic levels below GOR, these variables may need to be treated, e.g. by taking
    averages over groups of local authorities with similar council tax rates.
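
    The deduction described above can be illustrated with a short Python sketch. The authority
    names and annual charges are invented; the point is only that a band and an amount, taken
    together with published rates, may narrow the possibilities to a single local authority.

```python
# Hypothetical published annual charges (in pounds) by local authority and band.
PUBLISHED_RATES = {
    "Authority A": {"C": 1400, "D": 1575},
    "Authority B": {"C": 1320, "D": 1485},
    "Authority C": {"C": 1400, "D": 1600},
}


def candidate_authorities(band, amount, rates=PUBLISHED_RATES):
    """List the authorities whose published charge for the band equals the amount."""
    return sorted(la for la, by_band in rates.items() if by_band.get(band) == amount)
```

    With these invented figures, a band D household paying 1485 can only be in Authority B,
    whereas a band C household paying 1400 could be in Authority A or Authority C. Replacing
    amounts by averages over groups of authorities with similar rates produces the second kind
    of ambiguity deliberately.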

8.4. Age

    Age is one of the primary variables which are used to match datasets. Therefore it is advised
    that, except for small samples (see 7.3), age in EUL datasets should be banded, for example
    into 5-year groups.
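
    Banding into 5-year groups might be sketched as follows. The function name `band_age` and
    the label format are assumptions made for illustration; the guidance does not prescribe
    either.

```python
def band_age(age, width=5):
    """Map an exact age in years to a band label such as '35-39'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = (age // width) * width  # lower bound of the band containing `age`
    return f"{lower}-{lower + width - 1}"


print(band_age(37))  # 35-39
```

    A wider band can be obtained by changing `width`, for example when other variables in the
    dataset are released at a more detailed level.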

8.5. Marital status

    Civil Partnerships are discoverable data, and are therefore considered to be in the public
    domain. They have introduced possible new values to the marital status variables. These
    include: "in a legally-recognised Civil Partnership"; "separated from his/her civil partner";
    "formerly a civil partner, the Civil Partnership now legally dissolved"; and "surviving civil
    partner: his/her partner having since died". Social survey samples will include very small
    numbers of records where some of these values apply, making the corresponding records more
    likely to be identifiable. Advice is therefore that such values should be grouped together,
    to give a new value for marital status of "civil partner or former civil partner".
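
    The grouping advised above could be applied with a simple recode, sketched below. The exact
    category labels held in a given survey file are assumptions here and would need to match
    the survey's own coding frame.

```python
# Assumed labels for the civil partnership categories; real files may use codes.
CIVIL_PARTNERSHIP_VALUES = {
    "in a legally-recognised Civil Partnership",
    "separated from civil partner",
    "formerly a civil partner, Civil Partnership now legally dissolved",
    "surviving civil partner",
}

GROUPED_VALUE = "civil partner or former civil partner"


def group_marital_status(value):
    """Collapse the rare civil partnership categories into one grouped value."""
    return GROUPED_VALUE if value in CIVIL_PARTNERSHIP_VALUES else value


print(group_marital_status("surviving civil partner"))
print(group_marital_status("married"))
```

    Values outside the listed set pass through unchanged, so the recode affects only the
    categories with very small counts.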

8.6. Salary and income

    As discussed in 7.2, very high salaries, bonuses, incomes and related variables (such as
    weekly or hourly rates, gross/net household income, and amount of income tax) may allow
    individuals to be identified by spontaneous recognition. Advice is therefore that these variables
    should be protected by top-coding at an appropriate level, for example by recoding the values
    for the top 4% of earners. Thus if 4% of the respondents to a survey earned £200,000 per
    annum or more, and a particular respondent earned £1,250,000, then that respondent's salary
    would be recoded to £200,000 and all corresponding variables would be similarly recoded.
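
    The worked example above can be reproduced with a minimal sketch. The function `top_code`
    and the sample data are illustrative assumptions; in particular, taking the threshold as the
    lowest value within the top 4% of records is one simple choice, and the sketch does not
    recode the corresponding related variables.

```python
def top_code(values, top_percent=4):
    """Cap every value at the lowest value found in the top `top_percent` of records."""
    if not values:
        return []
    n = len(values)
    k = max(1, n * top_percent // 100)   # number of records in the top group
    threshold = sorted(values)[n - k]    # lowest value within that top group
    return [min(v, threshold) for v in values]


# Invented sample: 96 respondents on 30,000 and 4 earning 200,000 or more.
salaries = [30_000] * 96 + [200_000, 250_000, 500_000, 1_250_000]
recoded = top_code(salaries)
print(max(recoded))  # 200000
```

    Here the respondent earning £1,250,000 is recoded to £200,000, matching the example in the
    text, while salaries below the threshold are left unchanged.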

8.7. Numbers of rooms, cars etc

    These variables are often included in commercially available datasets, see 7.1.1. It is therefore
    recommended that they are top-coded.

9. Approved Researchers

    This section describes the procedures by which a person may become an Approved
    Researcher and be given access to personal information.

9.1. Becoming an Approved Researcher

    The UK Statistics Authority has developed and published criteria for accreditation as an
    Approved Researcher. The 2007 Act requires the Authority to establish whether an individual
    is a 'fit and proper person' to receive personal information and whether the purpose of the
    research is compatible with the Authority's statistical principles. Researchers are required to
    complete a standard application form and, subject to successful application, are also required
    to complete a Declaration indicating that they understand the requirements of the arrangements.

    The initial criteria against which an application is assessed consist of, but are not necessarily
    limited to, the following:

9.2. Person Criteria

    A researcher is deemed 'Fit and Proper' when he/she is able to demonstrate, to the
    satisfaction of the National Statistician, that he/she:
    (1) Has the appropriate knowledge and experience necessary for handling potentially
    disclosive personal information;

    (2) Has provided satisfactory evidence supporting their application that illustrates their
    professionalism and technical competence to carry out the research proposal;

    (3) Demonstrates a commitment to protecting and maintaining the confidentiality of the data
    during the creation of outputs and publications that arise during the proposal.

9.3. Project Criteria

    A research project is deemed suitable when, in the opinion of the National Statistician, it
    serves one of the following public benefits:

    (1) Supports the formulation and development of public policy or public service delivery.

    (2) Forms part of the programme of research covered by the National Data Strategy or
    otherwise supported directly or indirectly by the Economic and Social Research Council.

    (3) Supports an obligation of public law (e.g. Local Development Plans).

    (4) Explores new statistical methods that can be used to produce statistics that serve the
    public good.

9.4. Procedures associated with Approved Researchers

    Prior to the 2007 Act, access to potentially disclosive data was made possible upon consent,
    or via statutory gateways found in the Census Act 1920, Statistics of Trade Act 1947,
    European Community Regulations, and some non-statistical legislation. However, these
    existing gateways did not allow complete or consistent access.

    The advantage of the new gateway is that it is suitable for all the different combinations of
    researcher, source data and research that are not already provided for in law. This makes it
    hard for the criteria to be prescriptive. Thus the criteria leave room for the supply of suitable
    evidence with which the National Statistician can exercise her professional judgement. The
    criteria above provide a basic framework that has been implemented at the beginning of the
    scheme with the potential for development with experience. To assist the applicants there are
    detailed completion notes and a simple form for gathering evidence.

    In terms of administration, the current National Statistics Code of Practice includes a protocol
    on data access and confidentiality, see 3.2. The protocol establishes minimum standards for
    the governance of research access to data. ONS has a Micro-data Access and Release Policy
    that establishes that research access to data is a risk management activity, not a risk
    avoidance activity. Risk management is achieved by consideration of 'safe projects', 'safe
    people', and 'safe environment'.

    The National Statistician currently authorises all research access to personal information held
    by ONS. Operationally, this is carried out in her name by the MRP chaired at Deputy Director
    level in ONS. The Approved Researcher authorisation fits well into the Code and current
    governance arrangements and it has been agreed that the MRP manages the Approved
    Researcher arrangements on behalf of the Authority.

10. Conclusion

   As a result of the 2007 Act, the level of data which may be provided under an end-user licence
   has changed, as only data in the public domain now need to be taken into account when
   assessing disclosure risk. However, where microdata are to be published, it is still necessary to
   consider private data sources which may be available to an intruder.

   The need for guidance to be provided to business areas was recognised, and the new policy
   document includes this. The SDC team continues to work closely with business areas,
   providing further guidance and advice where disclosive or sensitive variables are concerned.

   The Microdata Release Panel (MRP) is required to approve all releases of microdata. As
   described in section 3.1, for applications to release non-personal data, including EUL datasets,
   approval is delegated to the Microdata Release Unit (MRU), subject to a satisfactory risk
   assessment by the SDC team. Personal data may only be released via a gateway in the 2007
   Act. In the case of data deposited at the UKDA as Special Licence data, requests for access as
   an Approved Researcher are processed by the MRU.

   Methods of risk assessment are continually being reviewed and developed by the SDC team.
   There is ongoing work to monitor what data are publicly available, and this will inform the
   intruder scenarios which need to be considered when carrying out disclosure risk assessments.
   Research is also being undertaken into quantitative methods of risk assessment. The
   interpretation of the 2007 Act, and specifically of Section 39, may be revised in the future. The
   guidance may therefore be subject to future amendments, in particular to ensure that it
   complies with the new Code of Practice (CoP) drawn up by the UK Statistics Authority.

   This paper refers only to microdata derived from ONS social surveys. Guidance for other types
   of microdata from other sources will be developed in the future.




References

Elliot, M. J., and Dale, A. (1998) Disclosure risk for microdata: Workpackage DM1.1 What is a key
variable? Report to the European Union ESP/204 62/DG III




Elliot, M. J., and Dale, A. (1999) Scenarios of attack: The data intruder’s perspective on statistical
disclosure risk. Netherlands Official Statistics. Vol 14, Spring 1999, 6-10.


Purdam, K., Elliot, M. (2006) Data Environment Analysis Service Scoping Study Final Report.


Elliot, M. (2006) Assessment of disclosure risk for hierarchical microdata files.


Bycroft, C., Clift-Matthews, M., Spicer, K., Jackson, P. J. (2005) 2001 Household Sample of
Anonymised Records (SAR), a report to the ONS Data Stewardship Working Group.


Winkler, W. (1999) Re-identification methods for evaluating the confidentiality of analytically valid
microdata, Research in Official Statistics, 1(2), 87-104.


National Statistics Code of Practice
http://www.statistics.gov.uk/about/national_statistics/cop/default.asp


ONS Corporate Policy for Micro-data Access and Release
http://www.knowledgenetwork.gsi.gov.uk/statnet/statnet.nsf/ac54bd08f13d0c8780256b2200530646/7ef578c682e2418e802571880038bac6/$FILE/Micro-data%20Access%20and%20release%20policy%20paper.doc


Gross, B., Guiblin, P., Merrett, K. (2004) Risk Assessment of the Individual Sample of
Anonymised Records (SAR) from the 2001 Census.



