# Definition and overview of chemometrics

```
Definition and overview of chemometrics
Chairperson NIR Nord

Unit of Biomass Technology and Chemistry
Swedish University of Agricultural Sciences
Umeå
Technobothnia
Vasa

Project geography

Chemometrics

Mathematics, statistics, and computer science in chemistry

Similar fields

•   Biometrics ±1900
•   Psychometrics ±1930
•   Econometrics ±1950
•   Technometrics ±1960

Chemometrics

•   Design of Experiments (DOE)
•   Exploratory Data Analysis
•   Classification
•   Regression and Calibration

Design of Experiments

•   Most important where possible
•   Uses:
    •   ANOVA
    •   F-test
    •   t-test
    •   Plots
    •   Response Surfaces

Design of Experiments

y = b_0 + b_1 x_1 + b_2 x_2 + ... + b_K x_K
    + b_{11} x_1^2 + b_{22} x_2^2 + ... + b_{KK} x_K^2
    + b_{12} x_1 x_2 + ... + e

Factors x_1, x_2, ..., x_K changed systematically

Response y measured and modeled

Exploratory Data Analysis

•   Design not possible
•   Sampling situations
•   Find structure
•   Find groupings
•   Find outliers

Classification

•   Check for groupings = UNSUPERVISED
•   Existing groupings = SUPERVISED
•   Visualize groupings
•   Classify
•   Test

Regression / Calibration

•   Two types of variables X / y
•   Relationship linear / nonlinear
•   Model
•   Diagnostics
•   Residual

[Figure: plot of y against x]

Multivariate Data Analysis

•   Sampled data and design with too many responses:
    •   Mining
    •   Hospitals
    •   Agriculture
    •   Food industry
    •   More

Nomenclature

• Samples are objects
• What is measured on the object is a variable

[Figure: data table with samples 1 to I as rows and variables 1 to K as columns; labels: 34.92, Spectrum]

Vectors

A vector is a collection of numbers. It is always a column vector.

    12
    3.6
    11.1
    5.9
    34
    0.5
    1.4
    17

The transpose of a vector is a row vector.

    12  3.6  11.1  5.9  34  0.5  1.4  17

Symbols for transpose are ’ and T: a’ or a^T.

Particle size, 1 sample

[Figure: Particle size, 1 sample]

Small particles, 35 samples

[Figure: Small particles, 35 samples]

The Data Matrix

A data matrix is a vector of vectors.

[Figure: matrix with I rows and K columns]

Size histograms, all samples

[Figure: Size histograms, all samples; x-axis: particle area]

Times in batch reaction

[Figure: Times in batch reaction; x-axis: NIR wavelengths]

Geometry of multivariate space

Problem

I and K can be large
Correlation
Univariate statistics does not apply

3 variables: blood oxygen, iron, hemoglobin

I patients

[Figure series: plots with axes Hb, Fe, and O2]

Properties of multivariate space

Rotation: vector lengths unchanged / distances unchanged
Translation: vectors changed / distances unchanged
Rescaling / change of units: everything changes

Consequences

• We can move the coordinate system around
• The relative distances between objects do not change
• We can rotate the coordinate system
• Scale changes are important
• Move coordinate system to center of data
• Scale properly

Vectors (physics)

x = [ x_1, x_2, x_3 ]

|| x || = ( x_1^2 + x_2^2 + x_3^2 )^{1/2}

Geometry

c^2 = a^2 + b^2

[Figure: right triangle with legs a and b and hypotenuse c]

Vectors (K dimensions)

x = [ x_1, x_2, ..., x_K ]

|| x || = ( x_1^2 + x_2^2 + ... + x_K^2 )^{1/2}

Problem

We cannot see in more than 3 dimensions

Paper, computer screen: 2-2.5 dimensions

[Figure: plots with axes Hb, Fe, and O2]

Projection

2D plane (screen, paper)
Many projections possible
Find a good one
Find a few good ones

What is good?

```
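
The slides on the geometry of multivariate space state that rotation and translation leave distances between objects unchanged, while rescaling changes everything, and that the data should therefore be centered and scaled properly. The following is a minimal sketch of those statements in plain Python. The patient values, the helper names (`norm`, `distance`, `rotate_first_two`, `autoscale`), and the choice of autoscaling (mean-centering plus division by the standard deviation) are assumptions made for illustration and do not come from the slides.

```python
import math

def norm(x):
    """Euclidean length ||x|| = (x_1^2 + ... + x_K^2)^(1/2)."""
    return math.sqrt(sum(v * v for v in x))

def distance(a, b):
    """Euclidean distance between two objects (two rows of the data matrix)."""
    return norm([p - q for p, q in zip(a, b)])

def translate(x, shift):
    """Move the origin: add the same shift to every object."""
    return [v + s for v, s in zip(x, shift)]

def rotate_first_two(x, angle):
    """Rotate in the plane of the first two variables; the rest is untouched."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]] + list(x[2:])

def rescale(x, factors):
    """Change units: multiply each variable by its own factor."""
    return [v * f for v, f in zip(x, factors)]

def autoscale(rows):
    """Center each column on its mean and divide by its standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical patients measured on (Hb, Fe, O2); values are made up.
a = [14.0, 90.0, 97.0]
b = [11.0, 60.0, 93.0]
c = [16.0, 120.0, 98.0]

shift = [-12.0, -75.0, -95.0]
units = [10.0, 1.0, 1.0]  # e.g. Hb reported in a different unit

d_original = distance(a, b)
d_rotated = distance(rotate_first_two(a, 0.7), rotate_first_two(b, 0.7))
d_translated = distance(translate(a, shift), translate(b, shift))
d_rescaled = distance(rescale(a, units), rescale(b, units))

print("original   ", d_original)
print("rotated    ", d_rotated)     # same as original
print("translated ", d_translated)  # same as original
print("rescaled   ", d_rescaled)    # different

scaled = autoscale([a, b, c])
print("autoscaled column means:", [sum(col) / 3 for col in zip(*scaled)])
```

Rotation keeps both the length of each vector and the distances between objects. Translation changes the vectors but not the distances. A change of units alters the distances, which is why the scaling step matters before any distance-based analysis.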
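
The Design of Experiments slide gives a quadratic model in which the factors are changed systematically and the response y is measured and modeled. As a rough sketch of what fitting that model can look like for K = 2 factors, the example below builds a three-level full factorial design in coded units and estimates the coefficients by ordinary least squares through the normal equations. The design, the "true" coefficients, the zero noise term, and the function names are all assumptions for illustration; the slides do not specify an estimation method.

```python
from itertools import product

def model_row(x1, x2):
    """Terms of the quadratic model for K = 2: 1, x1, x2, x1^2, x2^2, x1*x2."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]

def solve(A, rhs):
    """Solve A b = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    b = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * b[j] for j in range(i + 1, n))
        b[i] = (M[i][n] - s) / M[i][i]
    return b

def fit_least_squares(X, y):
    """Least-squares estimate of b from the normal equations (X^T X) b = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

# Factors changed systematically: 3 levels x 2 factors = 9 runs (coded units).
design = list(product([-1.0, 0.0, 1.0], repeat=2))
X = [model_row(x1, x2) for x1, x2 in design]

# Hypothetical coefficients b0, b1, b2, b11, b22, b12; the error e is set to 0
# so that the fit should recover them.
true_b = [10.0, 2.0, -1.0, 0.5, -0.3, 0.8]
y = [sum(t * b for t, b in zip(row, true_b)) for row in X]

b_hat = fit_least_squares(X, y)
for name, est in zip(["b0", "b1", "b2", "b11", "b22", "b12"], b_hat):
    print(name, est)
```

With real measurements the error term is not zero, so the estimates would differ from the underlying coefficients, and the ANOVA, F-test, and t-test tools listed on the slide would be used to judge which terms matter.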