									PAPER 6

     The HESA Unique Student Identifier (HUSID) and HIN Processing

Identification by HUSID

Since 1994/95 HESA has collected an individualised student record from all UK
publicly-funded HE institutions. The HESA Unique Student Identifier (or HUSID) is a 13-digit
numeric code that identifies each individual student within the record. For UCAS entrants, the
HUSID consists of the 9-digit UCAS number prefixed by four zeros, whereas for direct entrants
the HUSID is compiled from three parts: the year of entry into the (first) HE institution, the
HE institution identifier (INSTID) + 1000, and a 6-digit reference number allocated internally
by the institution. Both methods of HUSID construction include a check digit.
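
The two constructions can be sketched in Python as follows. This is an illustration only, and
several details are assumptions not stated in this paper: that the year of entry occupies two
digits, that INSTID + 1000 occupies four digits, that the check digit is the thirteenth digit,
and in particular the weighted mod-10 scheme in check_digit, which stands in for whatever
algorithm is actually used. The function names are hypothetical.

```python
def check_digit(first12: str) -> str:
    """Illustrative weighted mod-10 check digit (assumed scheme, not from the paper)."""
    weights = (1, 3, 7, 9) * 3
    total = sum(int(d) * w for d, w in zip(first12, weights))
    return str((10 - total % 10) % 10)


def husid_direct(year_of_entry: int, instid: int, ref: int) -> str:
    """Direct entrant: 2-digit year, INSTID + 1000, 6-digit reference, check digit."""
    body = f"{year_of_entry % 100:02d}{instid + 1000:04d}{ref:06d}"
    return body + check_digit(body)


def husid_ucas(ucas_number: str) -> str:
    """UCAS entrant: the 9-digit UCAS number prefixed by four zeros."""
    if len(ucas_number) != 9 or not ucas_number.isdigit():
        raise ValueError("expected a 9-digit UCAS number")
    return "0000" + ucas_number


print(husid_direct(1996, 123, 456))
print(husid_ucas("123456789"))
```

Either way the result is a 13-character string of digits, so the same field can hold both kinds
of identifier.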

Retaining the HUSID allocated at first entry to HE is essential for tracking purposes. Once a
number has been allocated it should never be re-used, although there is evidence of some
contrary practice. Neither transfer to another institution nor deferred entry requires a new
HUSID. To assist receiving institutions in finding the correct existing HUSID for transferring
students, HESA provides a HUSID look-up service, employing simple ‘fuzzy’ matching
techniques.

HESA’s data collection system (known affectionately as ‘Aardvark’) employs several data
validation checks for the HUSID field: it must not be empty, it must pass the checksum test,
the INSTID part must be valid (for a non-UCAS entrant) and the field must not be all zeros.
Where a student is following two programmes of study, the same HUSID appears twice; a
further check then fails the data submission if the gender and birth date are inconsistent
between the two records.
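
A minimal sketch of these checks is given below. It is not the Aardvark implementation: the
function and field names are hypothetical, the checksum test is passed in as a function because
its algorithm is not given here, and the sketch assumes that a leading "0000" identifies a
UCAS-based HUSID and that the INSTID part sits in digits 3 to 6.

```python
from collections import defaultdict


def validate_husid(husid, valid_instids, checksum_ok):
    """Return a list of validation failures for one HUSID (empty list = pass)."""
    if not husid:
        return ["HUSID is empty"]
    if len(husid) != 13 or not husid.isdigit():
        # Format guard added for this sketch; the paper does not list it.
        return ["HUSID is not 13 digits"]
    errors = []
    if set(husid) == {"0"}:
        errors.append("HUSID is all zeros")
    if not checksum_ok(husid):
        errors.append("HUSID fails the checksum test")
    # Assumption: a leading "0000" marks a UCAS-based HUSID, which has no INSTID part.
    if not husid.startswith("0000") and int(husid[2:6]) - 1000 not in valid_instids:
        errors.append("INSTID part is not valid")
    return errors


def inconsistent_duplicates(records):
    """HUSIDs that occur with more than one (gender, birth date) combination."""
    seen = defaultdict(set)
    for rec in records:
        seen[rec["husid"]].add((rec["gender"], rec["birthdate"]))
    return sorted(h for h, combos in seen.items() if len(combos) > 1)
```

A submission would be failed if inconsistent_duplicates returned anything, since the same
student cannot have two genders or two birth dates.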

The concept of HIN

The definition of the basic unit of coverage of the HESA student record is ‘a student on a
programme of study leading to a qualification aim’. The need for a further field to uniquely
define this basic unit becomes apparent when a student follows two programmes of study at
the same institution. The obvious approach would be to use the QUALAIM field to help define
unique records, but on examination the entries in this field reveal problems. For example, one
student may be concurrently studying for both a Certificate in Philosophy and a Certificate in
Sociology, where the QUALAIM code for both of these programmes is the same. Additionally,
a qualification aim may evolve within an individual’s programme of study, for example MPhil
to PhD or Cert HE to Dip HE to Degree.
So, within the student record an additional field, the Student Instance Number, or NUMHUS,
is used to complement the HUSID in order to uniquely define the basic unit. NUMHUS is an
alphanumeric code of up to 20 characters and is institution defined, thus allowing use of any
internally held identifier that the institution uses. As the HUSID cannot be relied upon to
contain the identifier of the current institution of study (or of any institution, for UCAS
entrants), the INSTID is also considered important for identification purposes.

Thus these three fields, HUSID + INSTID + NUMHUS, together form a unique identifier
(known as the HIN) for each instance of a student on a programme of study.

HIN Rules

The data collection system validation process rejects data containing duplicate HIN values.
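
As an illustration, the HIN can be modelled as a three-part key and duplicates found by
counting. The record layout (a dictionary with husid, instid and numhus keys) and the function
names are assumptions made for this sketch, not part of the HESA system.

```python
from collections import Counter
from typing import NamedTuple


class HIN(NamedTuple):
    husid: str
    instid: str
    numhus: str


def hin_of(record: dict) -> HIN:
    """Build the HIN (HUSID + INSTID + NUMHUS) for one student-instance record."""
    return HIN(record["husid"], record["instid"], record["numhus"])


def duplicate_hins(records) -> list:
    """HIN values that occur more than once in a submission."""
    counts = Counter(hin_of(r) for r in records)
    return [hin for hin, n in counts.items() if n > 1]
```

Two records for the same student at the same institution are acceptable provided their
NUMHUS values differ; only an exact repeat of all three fields is rejected.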

Once a record has been returned for one HESA reporting year, a record with the same HIN
should be returned in subsequent years until either:
 - a record is returned with Reason for Leaving and Date Left completed, or
 - a record is returned with Suspension of Active Studies completed, or
 - a record is returned with Mode of Study completed indicating ‘Dormant’.
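
The continuation rule above can be expressed as a single predicate. The field names and the
"DORMANT" value are hypothetical placeholders for the corresponding fields and codes of the
student record.

```python
def must_return_next_year(record: dict) -> bool:
    """True if a record with the same HIN is expected in the next reporting year."""
    left = bool(record.get("reason_for_leaving")) and bool(record.get("date_left"))
    suspended = bool(record.get("suspension_of_active_studies"))
    dormant = record.get("mode_of_study") == "DORMANT"  # placeholder code value
    return not (left or suspended or dormant)
```

Note that, following the first rule, both Reason for Leaving and Date Left must be completed
before the record counts as closed in this sketch.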

HIN & Target List Registers

The maintenance by institutions of HIN between successive reporting years is policed by the
HIN processing system using a Target List Register (or TLR). The TLR is a dataset that holds a
selection of critical fields from previous returns for each unique HIN received. Each record
includes two derived fields that between them determine: the state of the record when it was
last returned (one of Live, Dormant or Left), the state of the record in the previous collection,
and the year of collection in which the last active record was received. The TLR is maintained
such that records marked Dormant or Left are archived after a significant period of time.

To link records across successive years, the HIN processing system searches TLR for a match
by HIN for every record in each new annual submission of data. If a match is made, TLR can
then be updated with the latest information. New starter records are added to TLR, in
preparation for the following year’s processing. The system can then start from TLR and
search the new submission for records that were expected, i.e. those last known as ‘Live’ (that
is, students who are continuing their studies).
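
The two searches can be sketched as set-style look-ups in each direction. This is a simplified
model under stated assumptions: TLR and the incoming submission are both represented as
dictionaries keyed by HIN, each TLR entry carries a state field, and the function does not
update TLR itself.

```python
def search_both_ways(tlr: dict, incoming: dict):
    """Classify HINs by searching incoming -> TLR, then TLR -> incoming."""
    # First search: look up every incoming HIN on TLR.
    matched = [hin for hin in incoming if hin in tlr]
    new_starters = [hin for hin in incoming if hin not in tlr]
    # Second search: look for every expected ('Live') TLR entry in the submission.
    gone_missing = [
        hin for hin, entry in tlr.items()
        if entry["state"] == "Live" and hin not in incoming
    ]
    return matched, new_starters, gone_missing
```

Entries in the first list are candidates for the cross-year comparisons, those in the second
would be added to TLR, and those in the third are reported back to the institution.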

These two searches produce some very useful reports. Once a link across years is identified,
the data supplied with each record can be compared, and reports of inconsistencies are
provided. For example: has the date of birth changed between years, has the postcode on entry
changed (many do, incorrectly), has the year of study been incremented, and has the
commencement of study date changed (very important in progression statistics)? Currently
some twenty-five such checks are reported to institutions. Whilst HESA points out these
inconsistencies, only the institutions can decide which year’s data is correct.
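
A sketch of four of these comparisons follows. The field names are hypothetical, and the
sketch assumes that year of study is held as an integer and is expected to rise by exactly one.
The result is a report, not a rejection, since either year’s data may be the correct one.

```python
# Fields that should not differ between successive returns for the same HIN.
UNCHANGING = ("birthdate", "postcode_on_entry", "commencement_date")


def inconsistencies(previous: dict, current: dict) -> list:
    """List the cross-year checks that a linked pair of records fails."""
    issues = [f"{field} changed" for field in UNCHANGING
              if previous[field] != current[field]]
    if current["year_of_study"] != previous["year_of_study"] + 1:
        issues.append("year_of_study not incremented")
    return issues
```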

From the second search institutions are provided with a list of students (on programmes of
study) that have simply ‘gone missing’. This is possible because the records held on the TLR
from last year implied that the student was continuing their studies, yet no record of them is
found in the incoming dataset. These cases are of great importance to institutions because in
some analyses it is assumed these students have dropped out.

In addition to the two searches based on HIN already mentioned, HESA also provides extra
matching and reporting for other ‘suspect records’. There is a reasonable chance that ‘gone
missing’ figures are artificially high because the HIN link has been broken, e.g. through the
HUSID or NUMHUS being changed (incorrectly), so that the culprit records are disguised as
‘new starters’. In such cases, starting from the unmatched ‘Live’ TLR records, a search is made
of incoming data using Date of Birth + Postcode + Level of Study. Matches are presented as
‘did you really mean this?’ and, if appropriate, institutions are encouraged to restore the HIN
link and re-submit their data.

As well as ‘gone missing’ records on TLR, there are often cases where an incoming record
appears to be a new starter, but the Date of Commencement of Studies suggests that it is not
the first year of study. This time a match is made from these records back to TLR, again using
Date of Birth + Postcode + Level of Study, with the same encouragement to restore the HIN
link given where appropriate.

A final check matches apparently normal ‘new starter’ records against properly ‘closed-off’
records in TLR, in an effort to expose cases where a record is closed-off but then continued in
the next return under a new HIN, thus giving the appearance of two separate entities when
only one exists in reality. Again, there is similar reporting and encouragement to amend the
incoming record.
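
The Date of Birth + Postcode + Level of Study search can be sketched as an index look-up. The
field names are hypothetical, and normalising the postcode (removing spaces, upper-casing) is
an assumption of this sketch. The same function serves in either direction by swapping which
set of records is indexed.

```python
from collections import defaultdict


def match_key(record: dict) -> tuple:
    """Secondary matching key: Date of Birth + Postcode + Level of Study."""
    postcode = record["postcode"].replace(" ", "").upper()
    return (record["birthdate"], postcode, record["level_of_study"])


def suggest_links(unmatched: dict, candidates: dict) -> dict:
    """Map each unmatched HIN to the candidate HINs sharing its secondary key."""
    index = defaultdict(list)
    for hin, rec in candidates.items():
        index[match_key(rec)].append(hin)
    return {
        hin: index[match_key(rec)]
        for hin, rec in unmatched.items()
        if match_key(rec) in index
    }
```

Each suggestion is only a ‘did you really mean this?’ prompt; two different students can share
a birth date, postcode and level of study, so the institution makes the final decision.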

A further spin-off of TLR is that it holds a list of all ‘Live’ instances for the next year, which
can be reported to institutions as a list of all the records that are expected in the following
year’s data collection.

Although a new process, the HIN system has already produced a large amount of useful
management information to assist institutions in providing consistent and accurate data, and
the indications are that a number of institutions are acting on this information. It is hoped that
further analysis of the data will provide clues as to the patterns of behaviour in record
collections, and eventually allow us to provide improved advice and guidance to institutions
(perhaps through ‘teach-in’ seminars) on how best to avoid the more common errors.

This subject is yet to be fully explored and the systems involved are certainly challenging, but
the effects and implications of what has already been achieved are likely to be far-reaching in
terms of data quality and accurate progression statistics. However there is still much for us to

Chris Scammell
Software Development Manager
Higher Education Statistics Agency

