The H-principle of mathematical modelling

Agnar Höskuldsson, IPL, DTU, Bldg 358, 2800 Kgs. Lyngby
Ph: +45 4525 5643, Fax +45 4593 1577
Email: ah@ipl.dtu.dk
Home page: www.predict.dk
Modelling industrial data: background and basic issues
Industry requires:
• Simple models. They must be handled by different people, with different models for different situations and graphic tools for calibration and testing.
• Stable predictions. Small uncertainties associated with predictions derived from new samples.
• Robustness. Small measurement errors have small effects on the predictions; abnormalities are detected; there are easy or simple procedures to 'calibrate instruments'.
• Reliable data. The data used adequately reflect the measurement situation (operating conditions) and the modelling task.
 Empirical approach to data or model analysis

[Schematic: a weight vector w applied to the data block X produces the score vector t, which is used to describe Y.]
Score vector: t = Xw = w_1 x_1 + … + w_K x_K
Requirements on t:
• Explain Y. Projection of Y onto t: (Y^T t)/(t^T t) t.
• Make t as large as possible (advice from experimental design).
• Reflect the features in X that are important in describing Y.
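A minimal numerical sketch of these two quantities, with illustrative random data standing in for a centred X and Y; the variable names are not from the original text.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))   # N=50 samples, K=10 x-variables (centred)
Y = rng.standard_normal((50, 2))    # M=2 response variables (centred)

w = np.ones(10) / np.sqrt(10)       # some weight vector with |w| = 1
t = X @ w                           # score vector t = Xw = w1*x1 + ... + wK*xK

# Projection of each column of Y onto t: (Y't)/(t't) * t
proj_Y = np.outer(t, (Y.T @ t) / (t @ t))
print(proj_Y.shape)                 # (50, 2): the part of Y explained by t
```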
Results from the empirical approach to data analysis
[Schematic: X (N × K) is compressed to the score matrix T (N × A), which is used to model Y (N × M).]
Typical sizes: M ≤ 5, A < 10; K can be large, e.g. around 1050 for IR/NIR spectra.
Examples:
NIR: K = 926, A = 3
IR: K = 1050, A = 4
Furnace: K = 21, A = 5
Process: K = 12, A = 4
T is the latent structure in X. Regression is based on T. Features in T are related to X.
The H-principle, formulation
The H-principle is a recommendation for how the modelling procedure should be carried out for any mathematical model:
1. Carry out the modelling in steps. At each step you specify how you want to look at the data by formulating how the weights are computed.
2. At each step compute expressions for
   a) the improvement in fit, ΔFit
   b) the associated precision of the prediction, ΔPrecision.
3. Compute the solution that maximizes the product ΔFit × ΔPrecision.
4. If the computed solution improves the prediction ability of the model, the solution is accepted. If the solution does not provide this improvement, we stop modelling.
5. The data are adjusted for what has been selected, and the procedure starts again at 1).
Ref: Prediction Methods in Science and Technology, Vol. 1. Thor Publishing, 1996, 405 pp. + disk. ISBN 87-985941-0-9
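A minimal sketch of this stepwise loop for the single-response PLS case, assuming centred data. The stopping check `improves` (e.g. a cross-validation-based test of step 4) and all names are placeholders, not part of the original text.

```python
import numpy as np

def h_principle_pls(X, y, improves, max_steps=10):
    """Stepwise modelling following the H-principle, with the PLS choice
    of weights.  `improves(components)` is any user-supplied check that
    returns True if the candidate component improves prediction ability."""
    Xa, ya = X.copy(), y.copy()
    components = []
    for _ in range(max_steps):
        # 1) specify how the weights are computed; here w = X'y / |X'y|
        w = Xa.T @ ya
        w /= np.linalg.norm(w)
        t = Xa @ w                              # score vector t = Xw
        # 2)-3) this w maximizes (y't)^2, i.e. the product DeltaFit * DeltaPrecision
        # 4) accept the component only if prediction ability improves
        if not improves(components + [(w, t)]):
            break
        components.append((w, t))
        # 5) adjust the data for what has been selected and start again at 1)
        p = Xa.T @ t / (t @ t)
        Xa = Xa - np.outer(t, p)
        ya = ya - t * (ya @ t) / (t @ t)
    return components
```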
The task of modelling data
In industry the primary focus is on the prediction associated with new samples. Under the standard assumptions of linear regression, the variance of the estimated response value, y(x_0), for a new sample x_0 is
    Var(y(x_0)) = s^2 x_0^T (X^T X)^{-1} x_0,
where
    s^2 = y^T (I - X(X^T X)^{-1} X^T) y / (N - A).
The task of modelling is therefore
a) a good fit, i.e. a small value of s^2 or of y^T (I - X(X^T X)^{-1} X^T) y, and
b) a low value of the model variation, x_0^T (X^T X)^{-1} x_0.
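A minimal sketch of these two quantities for a new sample, assuming a centred X with full column rank; the function name is illustrative.

```python
import numpy as np

def prediction_variance(X, y, x0):
    """Var(y(x0)) = s^2 * x0'(X'X)^{-1}x0 under the standard linear model.
    N samples, A columns of X used in the model."""
    N, A = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    H = X @ XtX_inv @ X.T                     # hat matrix
    s2 = y @ (np.eye(N) - H) @ y / (N - A)    # error of fit
    model_variation = x0 @ XtX_inv @ x0       # x0'(X'X)^{-1} x0
    return s2 * model_variation
```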
Application to linear regression
Multivariate regression, Y ~ N(XB, Σ). In step 1) we look for a score vector t, computed as t = Xw, that gives a good description of the data for the response variables Y. There are two aspects of this modelling task, step 2):
a) improvement in fit: |Y^T t|^2 / (t^T t)
b) associated prediction, in terms of variance: |Σ| / (t^T t)
Step 3) suggests maximizing the product
    {|Y^T t|^2 / (t^T t)} × {1 / [|Σ| / (t^T t)]} = |Y^T t|^2 / |Σ|.
Assuming |Σ| constant, we maximize
    |Y^T t|^2 = w^T (X^T Y Y^T X) w,    for |w| = 1.
Solution: w is the eigenvector of the leading eigenvalue of
    (X^T Y Y^T X) w = λ w.    (PLS regression)
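A minimal numerical sketch of this solution, assuming centred X and Y; the function name is illustrative.

```python
import numpy as np

def pls_weight(X, Y):
    """Weight vector maximizing |Y't|^2 with t = Xw, |w| = 1:
    the leading eigenvector of X'YY'X (equivalently the first left
    singular vector of X'Y)."""
    M = X.T @ Y @ Y.T @ X
    eigvals, eigvecs = np.linalg.eigh(M)      # symmetric matrix: use eigh
    w = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue
    return w / np.linalg.norm(w)
```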
Background for the H-principle. 1
The measure of error of fit,
    Y^T Y - Y^T X (X^T X)^{-1} X^T Y = Y^T (I - X (X^T X)^{-1} X^T) Y,
and the measure of precision,
    (X^T X)^{-1},
are stochastically independent.
This means that knowledge of the measure of error of fit does not provide any information about the precision.
(The estimate of the mean of a normal distribution does not provide any information about the variance.)
Background for the H-principle. 2
Significance testing:
The F-test of significance of a regression coefficient is equivalent to testing the significance of the correlation coefficient between the current residual response values and the score vector of the coefficient. The correlation coefficient is invariant to the size of the score vector, t_{A+1}, in question.

Significance testing therefore gives no information on the precision of the obtained model. The predictions derived from a 'significant model' can be good or bad.
(Statistical significance testing of industrial data with many variables tends to lead to severe overfitting.)
Background for the H-principle. 3
Assume a standard linear regression model, y ~ N(Xβ, σ^2 I). The response value of a new sample, x_0, is estimated as y(x_0) = b_1^T x_0. The variance of y(x_0) after A components is given by
    Var(y(x_0))_A = σ^2 (1 + x_0^T (X_1^T X_1)^+ x_0),    with
    σ^2 ≈ s_A^2 = [(y^T y) - {(y^T t_1)^2/(t_1^T t_1) + … + (y^T t_A)^2/(t_A^T t_A)}] / (N - A - 1)
    x_0^T (X_1^T X_1)^+ x_0 = (x_0^T r_1)^2/(t_1^T t_1) + … + (x_0^T r_A)^2/(t_A^T t_A)
Result:
    Var(y(x_0))_{A+1} < Var(y(x_0))_A
if and only if
    (y^T t_{A+1})^2 > f / [g/(x_0^T r_{A+1})^2 + 1/(t_{A+1}^T t_{A+1})].

Conclusion: maximize (y^T t_{A+1})^2 and check whether the prediction variance has been reduced for all or most samples.
Background for the H-principle. 4
Assume a standard linear regression model, y ~ N(Xβ, σ^2 I). The variance of the estimated response, y(x_0) = b_1^T x_0, associated with a new sample x_0 consists of two parts:
the error of fit, (y^T y) - y^T X_1 (X_1^T X_1)^+ X_1^T y, always decreases, and
the model variation, 1 + x_0^T (X_1^T X_1)^+ x_0, always increases,
when the model is expanded. In terms of orthogonal components:
    Decrease in error of fit:        (y^T t_A)^2 / (t_A^T t_A)
    Increase in model variation:     (x_0^T r_A)^2 / (t_A^T t_A) = [(y^T t_A)^2 / (t_A^T t_A)] × (x_0^T r_A)^2 / (y^T t_A)^2

Conclusion: balancing these two terms, we seek to maximize (y^T t_A)^2.
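The two per-component terms can be computed directly. A minimal sketch, where r is the causal vector of the component (t = Xr); names are illustrative.

```python
import numpy as np

def component_terms(y, t, r, x0):
    """For one orthogonal component (score vector t, causal vector r, t = X r):
    the decrease in error of fit and the increase in model variation for x0."""
    fit_decrease = (y @ t) ** 2 / (t @ t)              # (y't)^2 / (t't)
    model_variation_increase = (x0 @ r) ** 2 / (t @ t) # (x0'r)^2 / (t't)
    return fit_decrease, model_variation_increase
```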
Schematic interpretation of the optimization task
[Two schematics. Left: a weight vector w gives t = Xw and q = Y^T t. Right: weight vectors w and q give t = Xw and v = Yq.]
Left: find a weight vector w that maximizes |q|^2.
Right: find weight vectors w and q that maximize (t^T v).
Common sizes for optical instruments: X is 50 × 1050, Y is 50 × 5.
Ref: PLS Regression Methods, Journal of Chemometrics, 2 (1988) 211-228.
Mathematical expansions
Data: simultaneous decomposition of X and X^+:
    X   = d_1 t_1 p_1^T + d_2 t_2 p_2^T + … + d_A t_A p_A^T + … + d_K t_K p_K^T
    X^+ = d_1 r_1 s_1^T + d_2 r_2 s_2^T + … + d_A r_A s_A^T + … + d_K r_K s_K^T

Covariance matrix: simultaneous decomposition of S and S^+:
    S   = d_1 p_1 p_1^T + d_2 p_2 p_2^T + … + d_A p_A p_A^T + … + d_K p_K p_K^T
    S^+ = d_1 r_1 r_1^T + d_2 r_2 r_2^T + … + d_A r_A r_A^T + … + d_K r_K r_K^T

Ref: Data analysis, matrix decompositions and generalized inverse, SIAM J. Sci. Comput., 15 (1994) 239-262.
Mean squared error

The squared bias, J_A = |X_1 b_1 - Xβ|^2, has the mean value
    E(J_A)/σ^2 = A + [β_{A+1}^2 (t_{A+1}^T t_{A+1}) + … + β_K^2 (t_K^T t_K)]/σ^2
               = A + β_2^T D_2 β_2 / σ^2
The task of modelling is to find a balance between values, as low as possible, of:
1) the dimension of the model, A
2) the relative squared bias, β_2^T D_2 β_2 / σ^2
Mallows' Cp value is used to measure this balance.
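A minimal sketch of how Mallows' Cp could be evaluated over the model dimension A. The convention used here (Cp = RSS_A/s^2 - N + 2A, with s^2 estimated from the full model and no separate intercept term) is an assumption, not taken from the slide.

```python
import numpy as np

def mallows_cp(X, y, max_A):
    """Mallows' Cp for models using the first A columns of X,
    Cp = RSS_A / s2 - N + 2*A, with s2 from the full model."""
    N, K = X.shape
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    s2 = np.sum((y - X @ beta_full) ** 2) / (N - K)
    cps = []
    for A in range(1, max_A + 1):
        XA = X[:, :A]
        bA, *_ = np.linalg.lstsq(XA, y, rcond=None)
        rss = np.sum((y - XA @ bA) ** 2)
        cps.append(rss / s2 - N + 2 * A)
    return cps
```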
        McMaster Process Control Consortium,
           McMaster University, Canada

In charge: Prof. John MacGregor.
Participation of 16 large companies in the USA and Canada.
Success stories of implementing chemometric methods on the factory floor.
John MacGregor has received many international honours for the results obtained at the consortium.


http://www.chemeng.mcmaster.ca/macc/MACC.HTM
Process data
[Plot of the x-variables; horizontal axis: variable number 1-12, vertical axis roughly 0-800.]
12 x-variables, 289 samples (time points); Y = output variables. Variables 3 and 8 have large variance (variation); some variables have small variance.
Process data, squared correlations
[Plot: squared correlation of each x-variable with the response; horizontal axis: variable number 1-12, vertical axis 0-1.]
Squared correlation coefficients between the x's and the response variable. Variable no. 3 has a high value; four variables have low values.
Process data. Analysis, 1.
PLS regression. Explained variation (%):

No   X (%)     X cum. (%)   y (%)     y cum. (%)
 1   11.8432   11.8432      88.9710   88.9710
 2   14.3286   26.1718       6.5648   95.5358
 3   16.5351   42.7070       1.6504   97.1862
 4   15.0060   57.7129       0.5906   97.7768

Four components explain 97.78% of the Y-variation, using 57.71% of the total variation in X.
Process data. Analysis, 2.
[Plot: estimated Y1 versus observed Y1, both ranging roughly from -0.2 to 0.2.]
Computed y-values according to the model versus observed y-values.
Process data. Analysis, 3.
[Four plots: U1 vs score vector 1, U2 vs score vector 2, U3 vs score vector 3, U4 vs score vector 4.]
Reduced y-values, the u's (u1 = y), versus the score vectors.
Process data. Analysis, 4.
[Four plots: score vector 2 vs score vector 1, score vector 3 vs score vector 1, score vector 4 vs score vector 1, score vector 3 vs score vector 2.]
Scatter plots for the first four score vectors.
Process data. Analysis, 5.
[Plot: y-values (roughly 250-750) versus time (samples 0-300).]
Plot of the y-values versus time.
Process data. Analysis, 6.
[Four plots: loading vector 2 vs loading vector 1, loading vector 3 vs loading vector 1, loading vector 4 vs loading vector 1, loading vector 3 vs loading vector 2, with the variable numbers 1-12 as plotting symbols.]
Scatter plot of the first four loading vectors.
Process data. Analysis, 7.
[Four plots: causal vector 2 vs causal vector 1, causal vector 3 vs causal vector 1, causal vector 4 vs causal vector 1, causal vector 3 vs causal vector 2, with the variable numbers 1-12 as plotting symbols.]
Scatter plot of the first four causal vectors, the r's, t_i = X r_i.
Foss-Electric, Hillerød, Denmark

Produces IR instruments for the chemical industry. Yearly sales of around 200 million euros; around 1600 employees.
An instrument is calibrated against the concrete measurement situation, e.g. the quality of milk (fat, etc.).
The instruments are calibrated by chemometric methods. A certain number of IR spectra (samples, say 50) are used to identify the model that should be used in the measurement situation. The model is then developed using chemometric methods.

http://www.foss-electric.dk
IR (Infra-Red) data (dairy production)
[Plot: the IR spectra over the 1050 x-variables, values roughly -1.5 to 2.]
IR data, 1050 x-variables; Y = quality parameter. Some variables have small variance, others large.
IR (Infra-Red) data, squared correlations
[Plot: squared correlation of each of the 1050 x-variables with Y, values up to about 0.7.]
Squared correlation coefficients between the 1050 x-variables and Y. Six intervals of variables show high values.
Ref: Variable and subset selection in PLS regression. Chemometrics and Intelligent Laboratory Systems, 55 (2001) 23-38.
The CovProc procedures

Search for intervals having high covariance:
a) Restrict X to the columns showing high covariance.
b) Sort the variables according to the size of the covariance,
   1) for the original X, or
   2) selecting intervals of X at each step.

Ref: Covariance procedures for subset selection in regression analysis. Journal of Chemometrics, submitted.
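A hedged sketch of one step of variant a): variables with the largest absolute covariance with y are added to the weight vector as long as the fit of the score vector improves, and the remaining weights are zeroed. All names are illustrative, and centred data are assumed.

```python
import numpy as np

def covproc_step(X, y):
    """One CovProc step: sort variables by |cov(x_j, y)| and keep adding the
    highest-ranked variables to the weight vector while the fit of t = Xw
    improves; the other weights stay zero."""
    cov = X.T @ y                       # proportional to covariances (centred data)
    order = np.argsort(-np.abs(cov))
    best_fit, best_w = -np.inf, None
    w = np.zeros(X.shape[1])
    for j in order:
        w[j] = cov[j]                   # include this variable
        w_try = w / np.linalg.norm(w)
        t = X @ w_try
        fit = (y @ t) ** 2 / (t @ t)    # fit of y by the score vector t
        if fit <= best_fit:
            w[j] = 0.0                  # no improvement: zero it and stop
            break
        best_fit, best_w = fit, w_try
    return best_w
```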
Furnace data. Analysis, 1.
Percentage variation extracted of X, Y:

No   X (%)     X cum. (%)   Y (%)     Y cum. (%)
 1   78.7264   78.7264      57.4918   57.4918
 2   10.4655   89.1920      41.0052   98.4970
 3    5.9258   95.1178       0.4263   98.9233
 4    2.4266   97.5444       0.2015   99.1248

99.12% of the variation in Y is extracted using 97.54% of X.
Furnace data. Analysis, 2.
[Four plots: U1 vs score vector 1, U2 vs score vector 2, U3 vs score vector 3, U4 vs score vector 4.]
Plots of the reduced y's, the u's, versus the score vectors.
Furnace data. Analysis, 3.
[Four plots: score vector 2 vs score vector 1, score vector 3 vs score vector 1, score vector 4 vs score vector 1, score vector 3 vs score vector 2.]
Scatter plots of the first four score vectors.
Furnace data. Analysis, 4.
[Four plots: loading vector 2 vs loading vector 1, loading vector 3 vs loading vector 1, loading vector 4 vs loading vector 1, loading vector 3 vs loading vector 2, with the 21 variable numbers as plotting symbols.]
Scatter plots of the first four causal vectors.
Beer data. Analysis, 1.
Percentage variation extracted of X, Y and B:

No   X (%)     X cum. (%)   Y (%)     Y cum. (%)
 1   86.7429   86.7429      89.1011   89.1011
 2    4.5285   91.2714       9.7844   98.8855
 3    0.6658   91.9372       0.3781   99.2636
 4    0.7473   92.6845       0.0973   99.3609

Three components extract 99.26% of the variation in Y, using 91.94% of X.
Beer data. Analysis, 2.
[Four plots: U1 vs score vector 1, U2 vs score vector 2, U3 vs score vector 3, U4 vs score vector 4.]
Plot of the first four reduced y's, the u's, versus the score vectors.
Comparison of methods
Cross-validation procedure:
1. Leave out 10% of the samples. Use the remaining 90% to estimate the parameters of the model. Use the estimates to compute the response values of the 10% left-out samples. Register the average squared difference between observed and computed response values of the 10%, Σ_i (y_i - ŷ_{-i,j})^2 / I.
2. Repeat step 1 several times, J times, each time selecting 10% of the samples. Register the average total squared difference,
        Σ_j [Σ_i (y_i - ŷ_{-i,j})^2 / I] / J
Comparison of methods
CovProc. The weight vector is computed as w = X^T y / |X^T y|. The absolute values of w are sorted, and only the largest values are used, as long as the fit of the score vector is improved; the others are set to zero.
PLS regression. The weight vector w is chosen as w = X^T y / |X^T y|.
Forward regression. The weight vector is w = (0, 0, …, 1, 0, 0, …), where the index of the 1 corresponds to the variable that gives the largest value of (y^T x_i)^2 / (x_i^T x_i). Here x_i is the i-th column of the reduced X.
Principal Component Regression (PCR). The regression is based on the first components (those having the largest eigenvalues). The weight vector w is the corresponding eigenvector of X^T X.
PCR with sorted components. The same as the previous method, except that the components are sorted according to the value of the squared covariance between y and t, (y^T t)^2.
Ridge Regression (RR). The ridge constant is estimated by a leave-one-out procedure, i.e. it is the constant that gives the smallest value of Σ_N (y_i - ŷ_{-i})^2, where ŷ_{-i} is the estimated response value for the i-th sample when it is excluded from the analysis.
RR with sorted components. Same as standard RR, except that the components are sorted according to the value of the squared covariance between y and t, (y^T t)^2.
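A minimal sketch of one of the listed variants, PCR with sorted components, assuming centred X and y; the function name is illustrative.

```python
import numpy as np

def pcr_sorted(X, y, A):
    """PCR with sorted components: the principal components of X are
    re-ordered by their squared covariance with y, (y't)^2, before regression."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    T = U * d                           # score matrix, columns t_i = X v_i
    order = np.argsort(-(y @ T) ** 2)   # sort components by (y't)^2
    T_sel, V_sel = T[:, order[:A]], Vt[order[:A]].T
    q = (T_sel.T @ y) / np.sum(T_sel ** 2, axis=0)  # regression on orthogonal scores
    b = V_sel @ q                       # coefficients in the original x-space
    return b
```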
Comparison of methods

Dimension   CovProc   PLS      Var Sel   PCR      PCR, sorted   RR       RR, sorted
    1       0.0263    0.1162   0.0276    0.1271   0.1271        0.1271   0.1271
    2       0.0081    0.0134   0.0208    0.0136   0.0136        0.0139   0.0139
    3       0.0065    0.0143   0.0162    0.0142   0.0122        0.0142   0.0154
    4       0.0061    0.0148   0.0150    0.0142   0.0105        0.0142   0.0175
    5       0.0061    0.0165   0.0135    0.0146   0.0093        0.0145   0.0195
    6       0.0062    0.0181   0.0130    0.0150   0.0083        0.0147   0.0204
    7       0.0062    0.0191   0.0126    0.0154   0.0080        0.0151   0.0205
    8       0.0064    0.0194   0.0126    0.0136   0.0072        0.0144   0.0207
    9       0.0065    0.0194   0.0128    0.0141   0.0065        0.0138   0.0204
   10       0.0067    0.0191   0.0129    0.0140   0.0060        0.0137   0.0197

The values are the square root of the cross-validation measure, {Σ_j [Σ_i (y_i - ŷ_{-i,j})^2 / I] / J}^{1/2}.
Variation of the solution vector
[Plot: the solution coefficients b_i shown as curves against the model dimension (0-25); the dimensions suggested by prediction, by significance and by the numerical precision in the data are marked.]
Solutions considered as curves.
Spurious significance in industrial data, an example

A too detailed model often leads to spurious significance in the case of low-rank industrial data.
Illustration: Principal Component Regression of the Furnace data.
X is 119 × 21. SVD decomposition of X: X = USV^T.
Matrix of score vectors: T = US, the same size as X.
Score vector no. 17, t_17, is the seventeenth column of T.
It has no correlation with the response variable, but it has a correlation of 0.55 with the reduced response variable, when the effects of the preceding score vectors have been subtracted from the response variable.
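A minimal sketch of this kind of check, assuming centred data; the function name and the use of NumPy's SVD are illustrative.

```python
import numpy as np

def spurious_check(X, y, a=17):
    """Compare the correlation of score vector t_a with the raw response and
    with the reduced response (the response with the effects of the preceding
    score vectors removed)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    T = U * d                                 # T = US, same size as X
    t_a = T[:, a - 1]                         # the a-th score vector
    T_prev = T[:, :a - 1]                     # the preceding score vectors
    y_reduced = y - T_prev @ (T_prev.T @ y / np.sum(T_prev ** 2, axis=0))
    corr = lambda u, v: (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return corr(t_a, y), corr(t_a, y_reduced)
```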
Scatter plot of the response variable versus the 17th score vector:
[Plot: response variable vs score vector no. 17.]
Scatter plot of the reduced response variable versus the 17th score vector:
[Plot: reduced response variable vs score vector no. 17.]
Note that all score values except one lie between ±0.01.
New methodologies
• Weighing procedures. At each step appropriate weights are found for variables and samples (use all data, but 'zoom in' on important parts).
• Multi-way data analysis. Analysis of matrices (two-way data, x_ij) is extended to multi-way data (data with many indices, x_ijklmn). This includes weighing schemes, multiplication, inverses, precision and other concepts. It differs from the usual analysis of matrices because linearity, orthogonality and similar concepts are only valid within each mode.
• Path modelling. The concept of regression analysis, X → Y, is extended to a large network of data blocks, where there can be many input (X) and output (Y) data blocks.
• Non-linear modelling. There are two approaches. One is to extend the linear model to a low-dimensional surface; this accounts for many situations in industry, where non-linearity appears as weak curvature. The other is to replace the exact solutions at the iterations (or adjusted ones, like Ridge and others) by low-rank, stable solutions.
• Dynamic models. There are typically three types of objectives: good fit, small changes in the solution vectors over time, and target values. Stable solutions are found that meet these objectives.
• Stable solutions to statistical models. Significance testing becomes more efficient when it is based on stable solutions.
Weighing schemes
[Schematic: a weight vector w on the variables of X gives the score vector t = Xw; a weight vector v on the samples gives the loading vector p = X^T v.]
Find w such that the score vector t is good.
Find v such that the loading vector p better meets the objective.
Ref: Weighing schemes in Multivariate Data Analysis, Journal of Chemometrics, 15 (2001) 371-396.
Multi-way data analysis
[Schematic: for a three-way array X, one weight vector gives the score matrix T, two weight vectors give a score vector t, and three weight vectors give a single value.]
X = (x_ijk). One weight vector generates T, two weight vectors generate t, and three weight vectors generate one value.
Ref: Multi-way data analysis, Journal of Chemometrics, submitted.
Three-way regression analysis
[Schematic: weight vectors w1 and w2 applied to the three-way block X give the score vector t; weight vectors v1 and v2 applied to Y give the score vector u.]
Find weight vectors w1 and w2 from X and weight vectors v1 and v2 from Y such that the score vectors show maximal covariance:
    maximize (t^T u), subject to |w1| = |w2| = |v1| = |v2| = 1.
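A minimal sketch of how the score vectors are formed once the weight vectors are given, assuming three-way blocks X (I × J × K) and Y (I × L × M); finding the weight vectors themselves would typically be done by alternating over the modes, which is not shown here.

```python
import numpy as np

def threeway_scores(X, Y, w1, w2, v1, v2):
    """Score vectors for three-way blocks:
    t_i = sum_jk X[i,j,k] w1[j] w2[k],  u_i = sum_lm Y[i,l,m] v1[l] v2[m].
    The unit-length weight vectors are chosen to maximize t'u."""
    t = np.einsum('ijk,j,k->i', X, w1, w2)
    u = np.einsum('ilm,l,m->i', Y, v1, v2)
    return t, u
```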
Path modelling, types

Paths and their interpretation:
1. X: one block. We get PCA or PCA-type solutions.
2. X1 → X2: two blocks. We get linear regression.
3. X1 → X2 → X3: multi-block extensions of linear regression.
4. X1, X2 (blocks stacked): multi-block extensions of PCA-type solutions, with the roles of variables and samples exchanged.
5. X2 → X3, with X1: here we want the regression to be done with components generated from X1.
6. X1 → X2, with X3: here we want to study possible changes in the response values even if we have not been able to observe or measure the corresponding X-values.
7. X1 → X2 ← X3: here two 'sources', X1 and X3, influence the data block X2.

Ref: Causal and Path Modelling, Chemometrics and Intelligent Laboratory Systems, 58 (2001) 287-311.
Path modelling, three data blocks
[Schematic: data blocks X → Y → Z; a new sample block X0 with unknown corresponding Y- and Z-values.]
When new X-samples, X0, become available, we want to know the estimated Y-samples and how the estimated Y-samples project onto Z.
X is projected onto Y, and the projection is projected further onto Z.
Path modelling, two input and two output blocks
[Schematic: weight vectors w1 and w2 give score vectors t1 = X1 w1 and t2 = X2 w2; these are combined through Y (score t3, loading p_y) and related to Z1 and Z2 with loadings p_{z,1} and p_{z,2}.]
X1 = chemical measurements, X2 = physical measurements.
Z1 = quality variables, Z2 = technical variables.
Non-linear modelling, illustration
[Four plots: U1 vs score vector 1, U2 vs score vector 2, U3 vs score vector 3, U4 vs score vector 4.]
Adjusted response variable versus the first four score vectors. The first plot shows quadratic curvature; the third and fourth show third-order curvature.
Ref: The Heisenberg Modelling Procedure and Application to non-linear modelling, Chemometrics and Intelligent Laboratory Systems, 44 (1998) 15-30.
Non-linear modelling, quadratic surface

    y = α^T x + x^T F x + …                              Transformations:
      = α^T P R^T x + x^T R P^T F P R^T x + …            Variables:  t = R^T x
      = (P^T α)^T (R^T x) + (R^T x)^T (P^T F P)(R^T x) + …   Parameters: β = P^T α
      = β^T t + t^T G t + …                                          G = P^T F P

A quadratic function in x is transformed into a quadratic function in t. The parameters β and G are identified by the estimation procedure. When a new sample x0 is available, the associated score vector is computed as t0 = R^T x0, and the values of the score vector are used in computing the corresponding response value.
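A minimal sketch of this prediction step; the names R, beta and G follow the notation above, and the function name is illustrative.

```python
import numpy as np

def predict_quadratic(x0, R, beta, G):
    """Response of a new sample under the quadratic model in the scores:
    t0 = R^T x0,  y0 = beta^T t0 + t0^T G t0."""
    t0 = R.T @ x0
    return beta @ t0 + t0 @ G @ t0
```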
Summary
• The H-principle provides (almost) optimal results with respect to prediction.
• It is applicable to most mathematical modelling of data.
• It offers advanced graphic facilities for mathematical models.
• It gives 'safe play' modelling of data, with no risk of spurious significance.
• It can be used to validate/judge the predictive ability of specific algorithms.
• The criteria of the H-principle extend to different mathematical areas and generate new methodologies/mathematics.
• The numerical procedures are very fast (on the level of ordinary regression analysis).
• It can efficiently handle linear/non-linear models with thousands of variables. Models can include a network of (multi-way) data blocks.
• Automatic/online/remote procedures can be established.

								