VOL. 20 NO. 2 (2008)
TONY DAVIES COLUMN
Back to basics: qualitative analysis, introduction
A.M.C. Davies
Norwich Near Infrared Consultancy, 75 Intwood Road, Cringleford, Norwich NR4 6AA, UK. E-mail: td@nnirc.co.uk
Tom Fearn
Department of Statistical Science, University College London, Gower Street, London WC1E 6BT, UK.
E-mail: tom@stats.ucl.ac.uk
More “Back-to-basics”
In December 2004 I made the decision that this column should make a return visit to topics in quantitative analysis that had been covered (or sometimes just mentioned) in previous columns. Within seconds of that decision I realised that we would need to treat qualitative analysis to the same revision. The quantitative aspects turned out to be a three-year marathon* journey but we have at last arrived at the start of what was conceived as “Part 2”.

* Sorry! Races and currently marathons are uppermost in my mind because I have a place in the London Marathon to run for the homelessness charity “Shelter” on 13th April. Would you like to sponsor me? You can at: www.justgiving.com/tonydavies1

I have been working on problems in qualitative analysis for 40 years! That’s before chemometrics as a topic began and I regard them as being much more demanding than quantitative analysis. There are several reasons for this, some more obvious than others:
■ qualitative analysis is not a single problem,
■ some humans are very good at looking at spectra and making qualitative decisions,
■ solutions require more statistics than are needed for quantitative analysis.

Problems in qualitative analysis
From the classical point of view, qualitative analysis is divided into supervised or unsupervised methods but the number of different objects is also very important. The question “Is this sample compound A or compound B?” is different to the question “Is this sample compound A, or B, or C, or ..., or Z?” and very different to the request “Identify this sample”.

Human skills
Spectroscopists have been looking at spectra, and giving answers to all three types of questions listed above, for a very long time. I do not know of any spectroscopists who would claim to be able to look at a spectrum and estimate the percentage of ingredient x, but computers can, so qualitative analysis must be a less difficult problem!

A recent query from one of our readers (always welcome!) resulted in an e-mail discussion with some of the world experts on IR qualitative analysis. Their view is that qualitative analysis is too difficult to trust to a computer! As Peter Griffiths points out in his recent second edition of Fourier Transform Infrared Spectrometry, “... a library search cannot identify an unknown unless the unknown is present in the library”.¹

Statistics in qualitative analysis
In quantitative analysis if we have the RMSEP then we have all the statistics we need (some others may be useful). In qualitative analysis we need to know standard errors but we also need to know about distance measures, decision boundaries, prior probabilities, misclassification costs, ... . Luckily for me I have had Tom Fearn to guide and advise me for the last 25 years and most of what follows is Tom’s work, much of it previously published in the “Chemometric Space” in NIR news² or in our frequently referenced book.³
Tony Davies

Supervised and unsupervised classification
Statistical classification has a number of interesting applications in spectroscopy. For NIR data in particular, it has been used in a number of scientific publications and practical applications.

There is an important distinction between two different types of classification: so-called unsupervised and supervised classification. The former of these usually goes under the name of cluster analysis and relates to situations with little or no prior information about group structures in the data. The goal of the techniques in this class of methods is to find or identify tendencies of samples to cluster in sub-groups without the use of any prior information. This is a type of analysis that is often used at an early stage of an investigation, to explore, for example, whether there may be samples from different sub-populations in the dataset, for instance different varieties of a grain or samples of chemicals from different suppliers. In this sense, cluster analysis has similarities with the problem of identifying outliers in a quantitative data set.
www.spectroscopyeurope.com SPECTROSCOPYEUROPE 15
Cluster analysis can be performed using very simple visual techniques such as PCA, but it can be done more formally, for instance by one of the hierarchical methods. These are techniques that use distances between objects to identify samples that are close to each other. The hierarchical methods lead to so-called dendrograms, which are visual aids for deciding when to stop a clustering process.

The other type of classification, supervised classification, is also known under the name of discriminant analysis. This is a class of methods primarily used to build classification rules for a number of pre-specified sub-groups. These rules are later used for allocating new and unknown samples to the most probable sub-group. Another important application of discriminant analysis is to help in interpreting differences between groups. Discriminant analysis can be looked upon as a kind of qualitative calibration, where the quantity to be calibrated for is not a continuous measurement value, but a categorical group variable. Discriminant analysis can be done in many different ways; some of these will be described in following columns. Some of the methods are quite model orientated, while others are very flexible and can be used regardless of structures of the sub-groups.

Some of the material in earlier columns on quantitative analysis is also relevant to classification. Topics and techniques such as collinearity, data compression, scatter correction, validation, sample selection, outliers and spectral correction are all as important for this area as they are for quantitative calibration.

Distance measurements used in classification
It seems a good idea before we begin a discussion of techniques to describe some of the ways of measuring distance that we will be using. The message is that there are some very simple though perhaps non-obvious relationships between some of these measures.

Spectra as vectors
A spectrum $x = (x_1, x_2, \dots, x_p)$ measured at p wavelengths can be thought of as a point in p-dimensional space by taking each of the p measurements as the coordinate in one of the dimensions. We may equally well think of the spectra as vectors, by joining the point representation of the spectrum to the origin with a line. As usual, the trick to understanding the maths is to consider the case p = 3, for which it is easy to draw the picture. Figure 1 shows two vectors in a three-dimensional space.

Figure 1. Two spectra as vectors x and z in a three-dimensional space.

Euclidean distance
Euclidean distance, D, is the “natural measurement” of distance between two objects.

Geometrically, D is the length of the line joining the ends of the two vectors in the figure. For the multi-dimensional case it is defined as:

$$D^2 = (x_1 - z_1)^2 + (x_2 - z_2)^2 + \dots + (x_p - z_p)^2 = \sum_i (x_i - z_i)^2$$

which expands to:

$$D^2 = \sum_i x_i^2 + \sum_i z_i^2 - 2 \sum_i x_i z_i$$

Angles between vectors
Geometrically, we can just measure the angle θ between the two vectors in Figure 1. If the vectors represent spectra, then we can call this the angle between the spectra. It is clear from the picture that the more similar the two spectra, the closer together will be the two points and the smaller will be the angle between the corresponding vectors. Of course it is usually preferable to use a formula to compute the angle between $x = (x_1, x_2, \dots, x_p)$ and $z = (z_1, z_2, \dots, z_p)$ directly from the measurements. The relevant formula is the one that relates the so-called dot product of the two vectors

$$x \cdot z = x_1 z_1 + x_2 z_2 + \dots + x_p z_p = \sum_i x_i z_i$$

to their lengths |x| and |z| and the angle θ between them. The formula is

$$x \cdot z = |x|\,|z| \cos\theta \qquad (1)$$

where

$$|x|^2 = x_1^2 + x_2^2 + \dots + x_p^2 = \sum_i x_i^2$$

and

$$|z|^2 = z_1^2 + z_2^2 + \dots + z_p^2 = \sum_i z_i^2$$

Thus, to compute the angle we compute the dot product and the two lengths, and then use Equation (1) to find cos θ, and hence θ.

Standardising the length
If we are going to be computing a lot of these angles, it makes sense to standardise all the spectra so that each has length 1. This is achieved for x by dividing each $x_i$ by |x|.

In the picture, the vectors keep their direction but are rescaled in length to lie on a sphere of radius 1. Then |x| = |z| = 1 and Equation (1) reduces to

$$x \cdot z = \cos\theta \qquad (2)$$

Now the angle and the dot product are equivalent measures of distance in the sense that each can be calculated simply from the other. Note though that the maximum dot product, 1, corresponds to the minimum angle, 0, whilst a dot product of 0 corresponds to an angle of π/2 = 90°. This equivalence means that we could equally well define a region of similarity around x as all spectra that have a dot product with x exceeding d, or as all spectra that make an angle of less than $\cos^{-1} d$ with x.

Relation with Euclidean distance
Using standardised spectra, there is a fairly simple relation between these two measures and the Euclidean distance D. If

$$D^2 = \sum_i x_i^2 + \sum_i z_i^2 - 2 \sum_i x_i z_i$$

then when the vectors are standardised and the first two terms are each 1, we have

$$D^2 = 2(1 - x \cdot z) = 2(1 - \cos\theta)$$
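The relations between dot product, angle and Euclidean distance can be checked numerically. The following is a minimal sketch in plain Python, not code from the column: the function names and the two three-point "spectra" are hypothetical and chosen only for illustration.

```python
import math

def dot(x, z):
    """Dot product x.z = sum of x_i * z_i."""
    return sum(xi * zi for xi, zi in zip(x, z))

def length(x):
    """Vector length |x| = sqrt(sum of x_i^2)."""
    return math.sqrt(dot(x, x))

def standardise(x):
    """Rescale x to length 1 by dividing each element by |x|."""
    lx = length(x)
    return [xi / lx for xi in x]

def angle(x, z):
    """Angle between x and z from Equation (1): cos(theta) = x.z / (|x||z|)."""
    c = dot(x, z) / (length(x) * length(z))
    # Clamp to [-1, 1] to guard against rounding just outside the valid range.
    return math.acos(max(-1.0, min(1.0, c)))

def squared_distance(x, z):
    """Squared Euclidean distance D^2 = sum of (x_i - z_i)^2."""
    return sum((xi - zi) ** 2 for xi, zi in zip(x, z))

# Two made-up spectra measured at p = 3 wavelengths (illustrative values only).
x = [1.0, 2.0, 2.0]
z = [2.0, 1.0, 2.0]

xs, zs = standardise(x), standardise(z)
cos_theta = dot(xs, zs)          # Equation (2): dot product of unit vectors
d2 = squared_distance(xs, zs)    # should equal 2 * (1 - cos_theta)

print(cos_theta, math.degrees(angle(x, z)), d2, 2 * (1 - cos_theta))
```

Standardising does not change the angle, so `angle(x, z)` and `angle(xs, zs)` agree; only the dot product and the Euclidean distance depend on the rescaling.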
Thus, for standardised spectra, the dot product, angle and Euclidean distance are all three equivalent measures of distance. A region of similarity defined by any of the three would be all spectra that lie within a circle around x on the surface of the sphere.

The dot product is easily the quickest to calculate, so would be the preferred measure from a computational point of view. For non-standardised spectra the three measures would, of course, all be different.

Relation with correlation
Another measure sometimes used to compare spectra is the correlation coefficient between them. To relate this to the distance measures above we need to centre as well as scale the spectra. Suppose we transform from x to x*, where the ith element $x_i^*$ of x* is given by

$$x_i^* = (x_i - m_x) / l_x \qquad (3)$$

where

$$m_x = \sum_i x_i / p$$

is the mean of the elements in x and

$$l_x^2 = \sum_i (x_i - m_x)^2$$

is the squared length of x after it has been centred. Then the dot product between x* and the similarly centred and scaled z* is

$$x^* \cdot z^* = \frac{\sum_i (x_i - m_x)(z_i - m_z)}{\sqrt{\sum_i (x_i - m_x)^2 \, \sum_i (z_i - m_z)^2}}$$

which, by definition, is the correlation coefficient between x and z. Thus we have yet another equivalence: the correlation is the same as the dot product if we centre and scale the spectra before computing the latter.

The transformation in Equation (3) looks rather similar to the well-known SNV standardisation.⁴,⁵ The only difference is that SNV would normally use $s_x$ as a divisor rather than $l_x$, where

$$s_x^2 = l_x^2 / (p - 1)$$

The only difference this would make is that the dot product now becomes p − 1 times the correlation. This does not change the fact that the two are equivalent; it just introduces a scale factor into the equation relating them. Thus, in this sense, using the correlation coefficient (or its square) as a distance measure is essentially the same as pretreating with SNV and using either the angle or the dot product between the spectra as the distance measure.

References
1. P.R. Griffiths and J.A. de Haseth, Fourier Transform Infrared Spectrometry, 2nd Edn. John Wiley & Sons, Inc. Hoboken, NJ, USA (2007).
2. T. Fearn, NIR news 14(2), 6–7 (2003).
3. T. Næs, T. Isaksson, T. Fearn and T. Davies, A User-Friendly Guide to Multivariate Calibration and Classification. NIR Publications, Chichester, UK (2002).
4. J. Barnes, M.S. Dhanoa and S.J. Lister, Appl. Spectrosc. 43, 772 (1989).
5. A.M.C. Davies and T. Fearn, Spectrosc. Europe 19(6), 15 (2007).
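As a numerical check of the relations described under “Relation with correlation”, the sketch below compares the dot product of centred and scaled spectra with a correlation coefficient computed separately, and then repeats the dot product after an SNV-style scaling. It is an illustration under stated assumptions rather than code from the column: the function names and the four-point spectra are hypothetical.

```python
import math

def dot(x, z):
    """Dot product x.z = sum of x_i * z_i."""
    return sum(xi * zi for xi, zi in zip(x, z))

def centre_and_scale(x):
    """Equation (3): subtract the mean m_x, divide by the centred length l_x."""
    m = sum(x) / len(x)
    l = math.sqrt(sum((xi - m) ** 2 for xi in x))
    return [(xi - m) / l for xi in x]

def snv(x):
    """SNV-style scaling: subtract the mean, divide by s_x, where
    s_x^2 = l_x^2 / (p - 1)."""
    p = len(x)
    m = sum(x) / p
    s = math.sqrt(sum((xi - m) ** 2 for xi in x) / (p - 1))
    return [(xi - m) / s for xi in x]

def correlation(x, z):
    """Correlation coefficient computed as covariance over the product of
    standard deviations, independently of centre_and_scale."""
    p = len(x)
    mx, mz = sum(x) / p, sum(z) / p
    cov = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / (p - 1)
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (p - 1))
    sz = math.sqrt(sum((zi - mz) ** 2 for zi in z) / (p - 1))
    return cov / (sx * sz)

# Two made-up spectra measured at p = 4 wavelengths (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0]
z = [2.0, 1.0, 4.0, 3.0]
p = len(x)

r = correlation(x, z)
dot_scaled = dot(centre_and_scale(x), centre_and_scale(z))  # equals r
dot_snv = dot(snv(x), snv(z))                               # equals (p - 1) * r

print(r, dot_scaled, dot_snv, (p - 1) * r)
```

Because the SNV dot product differs from the correlation only by the constant factor p − 1, ranking candidate spectra by either quantity gives the same order.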