Least Square Method

Shared by: P2UCs4
Posted: 5/27/2012

Correlation and Regression Analysis
•   Many engineering design and analysis problems involve factors that are
    interrelated and dependent. E.g., (1) runoff volume, rainfall; (2) evaporation,
    temperature, wind speed; (3) peak discharge, drainage area, rainfall intensity;
    (4) crop yield, irrigated water, fertilizer.
•   Because of the inherent complexity of system behavior and an incomplete
    understanding of the processes involved, the relationships among the relevant
    factors or variables are established empirically or semi-empirically.
•   Regression analysis is a useful and widely used statistical tool for
    investigating the relationship between two or more variables that are related
    in a non-deterministic fashion.
•   Suppose a variable Y is related to several variables X1, X2, …, XK, and the
    relationship can be expressed in general as
                                Y = g(X1, X2, …, XK)
    where g(.) = general expression for a function;
           Y = dependent (or response) variable;
           X1, X2, …, XK = independent (or explanatory) variables.
                               Correlation
•   When a problem involves two dependent random variables, the degree of
    linear dependence between them can be measured by the correlation
    coefficient r(X,Y), defined as
        $$ r(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \, \sigma_Y} $$
    where σX and σY are the standard deviations of X and Y, and Cov(X,Y) is the
    covariance between the random variables X and Y, defined as
        $$ \mathrm{Cov}(X,Y) = E\left[(X-\mu_X)(Y-\mu_Y)\right] = E[XY] - \mu_X \mu_Y $$
    with -∞ < Cov(X,Y) < ∞ and -1 ≤ r(X,Y) ≤ 1.

•   Various correlation coefficients have been developed in statistics for
    measuring the degree of association between random variables. The one defined
    above is called the Pearson product-moment correlation coefficient, or simply
    the correlation coefficient.

•   If the two random variables X and Y are independent, then r(X,Y) =
    Cov(X,Y) = 0. However, the converse is not necessarily true: zero correlation
    does not imply independence.
Cases of Correlation
[Figure: four scatter plots, labeled (1) perfectly linearly correlated in opposite direction; (2) uncorrelated in linear fashion; (3) strongly and positively correlated in linear fashion; (4) perfectly correlated in nonlinear fashion, but uncorrelated linearly.]
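    Calculation of Correlation Coefficient
• Given a set of n paired sample observations (xi, yi) of two random variables,
  the sample correlation coefficient r can be calculated as
      $$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{(n-1)\, s_x s_y} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \, \sum_{i=1}^{n}(y_i-\bar{y})^2}} $$
  where x̄ and ȳ are the sample means and sx and sy are the sample standard
  deviations (computed with divisor n - 1).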
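As a minimal sketch of the sample formula, assuming paired observations stored as plain Python lists, the function below (the name `sample_correlation` is hypothetical) computes r from the centered sums; the 1/(n - 1) factors of the covariance and the two standard deviations cancel. The three made-up datasets mirror three of the correlation cases above: perfect positive, perfect opposite-direction, and perfect nonlinear dependence with zero linear correlation.

```python
import math


def sample_correlation(x, y):
    """Sample (Pearson) correlation coefficient r for paired data (x_i, y_i).

    Uses the centered sums; the common 1/(n - 1) factors of the covariance
    and of the two standard deviations cancel out.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative data
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(sample_correlation(x, [2 * xi + 1 for xi in x]))  # 1.0  (perfectly linear, positive)
print(sample_correlation(x, [-xi for xi in x]))         # -1.0 (perfectly linear, opposite direction)
print(sample_correlation(x, [xi ** 2 for xi in x]))     # 0.0  (y = x^2: dependent, yet r = 0)
```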
                           Auto-correlation
•   Consider the following daily streamflows (in 1000 m³) in June 2001 at Chung
    Mei Upper Station (610 ha), located upstream on a river feeding Plover Cove
    Reservoir. Determine the 1-day auto-correlation coefficient, i.e., r(Qt, Qt+1).

                 Day (t) Flow Q(t)   Day (t) Flow Q(t)   Day (t) Flow Q(t)
                      1      8.35       11     313.89       21      20.06
                      2      6.78       12     480.88       22      17.52
                      3      6.32       13     151.28       23     116.13
                      4     17.36       14      83.92       24      68.25
                      5    191.62       15      44.58       25     280.22
                      6     82.33       16      36.58       26     347.53
                      7    524.45       17      33.65       27     771.30
                      8    196.77       18      26.39       28     124.20
                      9    785.09       19      22.98       29      58.00
                    10     562.05       20      21.92       30      44.08


•   29 pairs: {(Qt, Qt+1)} = {(Q1, Q2), (Q2, Q3), …, (Q29, Q30)};
    relevant sample statistics (n = 29):
        $$ \bar{Q}_t = 186.22;\quad S_{Q_t} = 230.06;\quad \bar{Q}_{t+1} = 187.45;\quad S_{Q_{t+1}} = 229.17 $$

    The 1-day auto-correlation is 0.439.
[Figure: Chung Mei Upper Daily Flow. Left panel: flow (1000 m³) versus day, June 2001. Right panel: scatter plot of Q(t+1) versus Q(t), both in 1000 m³.]

[Figure: Autocorrelation for June 2001 Daily Flows at Chung Mei Upper, HK. Autocorrelation (axis from -1.0 to 1.0) versus time lag, 1 to 5 days.]
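The lagged-pair calculation can be sketched in Python as follows, assuming the 30 daily flows from the table and treating r(Qt, Qt+k) as the ordinary sample correlation of the n - k pairs (Qt, Qt+k). The helper names are hypothetical. The sketch reproduces the sample means quoted above. Because it divides the covariance and both standard deviations by the same n - 1, its lag-1 value may differ from the quoted 0.439 in the second decimal place; dividing the covariance by n = 29 instead of 28 appears to account for the difference. The plotted correlogram may use another estimator (for example one based on the mean and variance of the full series), so the lag 2 to 5 values here are not claimed to match it.

```python
import math

# June 2001 daily flows Q(t), t = 1..30, in 1000 m^3 (from the table above)
flows = [8.35, 6.78, 6.32, 17.36, 191.62, 82.33, 524.45, 196.77, 785.09, 562.05,
         313.89, 480.88, 151.28, 83.92, 44.58, 36.58, 33.65, 26.39, 22.98, 21.92,
         20.06, 17.52, 116.13, 68.25, 280.22, 347.53, 771.30, 124.20, 58.00, 44.08]


def mean_sd(v):
    """Sample mean and sample standard deviation (divisor n - 1)."""
    n = len(v)
    m = sum(v) / n
    return m, math.sqrt(sum((vi - m) ** 2 for vi in v) / (n - 1))


def lag_correlation(q, k):
    """r(Q_t, Q_t+k): sample correlation of the n - k pairs (q[t], q[t + k])."""
    lead, lagged = q[:-k], q[k:]
    m1, s1 = mean_sd(lead)
    m2, s2 = mean_sd(lagged)
    cov = sum((a - m1) * (b - m2) for a, b in zip(lead, lagged)) / (len(lead) - 1)
    return cov / (s1 * s2)


m_t, s_t = mean_sd(flows[:-1])   # Q_1 .. Q_29
m_t1, s_t1 = mean_sd(flows[1:])  # Q_2 .. Q_30
print(f"Q_t:   mean = {m_t:.2f}, S = {s_t:.2f}")
print(f"Q_t+1: mean = {m_t1:.2f}, S = {s_t1:.2f}")
for k in range(1, 6):
    print(f"lag {k} day(s): r = {lag_correlation(flows, k):.3f}")
```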
                      Regression Models
• Because of the presence of uncertainties, a deterministic functional
  relationship is generally not very appropriate or realistic.
• The deterministic model form can be modified to account for
  uncertainties in the model as
                  Y = g(X1, X2, …, XK) + ε
  where ε = model error term with E(ε) = 0 and Var(ε) = σ².

• In engineering applications, functional forms commonly used for
  establishing empirical relationships are
   – Additive: $$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K + \varepsilon $$
   – Multiplicative: $$ Y = \beta_0 \, X_1^{\beta_1} X_2^{\beta_2} \cdots X_K^{\beta_K} \, \varepsilon $$
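Taking logarithms turns the multiplicative form into an additive one, ln Y = ln β0 + β1 ln X1 + … + βK ln XK + ln ε, so the LS machinery for additive models applies to the transformed data. The sketch below does this for a single predictor. The function name `fit_power_law` is hypothetical, the data are synthetic and noise-free, and it assumes all x, y > 0 and a log-error ln ε with mean zero.

```python
import math


def fit_power_law(x, y):
    """Fit Y = b0 * X**b1 * eps by LS on ln Y = ln b0 + b1 ln X + ln eps.

    Assumes every x_i, y_i > 0 and that ln(eps) has mean zero.
    """
    u = [math.log(xi) for xi in x]
    v = [math.log(yi) for yi in y]
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    b1 = (sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))
          / sum((ui - ubar) ** 2 for ui in u))
    ln_b0 = vbar - b1 * ubar
    return math.exp(ln_b0), b1


# Synthetic, noise-free data from Y = 2 X^1.5 (illustration only)
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 * xi ** 1.5 for xi in x]
b0, b1 = fit_power_law(x, y)
print(round(b0, 6), round(b1, 6))  # 2.0 1.5
```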
                      Least Squares Method
Suppose that there are n pairs of data, {(xi, yi)}, i = 1, 2, …, n, and a plot of
   these data appears as follows:
[Figure: scatter plot of the data, y versus x]
What is a plausible mathematical model describing the relation between x and y?
                      Least Squares Method

Consider an arbitrary straight line, y = b0 + b1x, to be fitted through these
  data points. The question is: "Which line is the most representative?"

[Figure: data points with a fitted line ŷ = b0 + b1x (intercept b0, slope b1). At xi, the residual ei = yi – ŷi (error) is the vertical distance between the observed yi and the fitted ŷi.]
                 Least Squares Criterion
• What are the values of b0 and b1 such that the resulting line "best" fits
  the data points?

• But wait! What goodness-of-fit criterion should be used to choose among
  all possible combinations of b0 and b1?

• The least squares (LS) criterion states that the sum of the squares of the
  errors (or residuals, deviations) is minimized. Mathematically, the LS
  criterion can be written as:
      $$ \min_{b_0,\, b_1} D = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2 $$

• Are there any other criteria that could be used?
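To make the criterion concrete, the sketch below evaluates D(b0, b1) for a small made-up dataset and then scans a coarse grid of candidate lines; the function name `sum_squared_residuals` and the grid bounds are illustrative assumptions. Because D is a convex quadratic in (b0, b1), and for this data the LS line (b0 = 1.0, b1 = 1.9) lies exactly on the grid, the scan recovers it.

```python
def sum_squared_residuals(b0, b1, x, y):
    """LS criterion D(b0, b1) = sum over i of (y_i - b0 - b1 * x_i) ** 2."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 6.0, 9.0]  # made-up data

# A few hand-picked candidate lines
for b0, b1 in [(2.0, 1.5), (0.5, 2.0), (1.0, 1.9)]:
    print(f"b0={b0}, b1={b1}: D={sum_squared_residuals(b0, b1, x, y):.3f}")

# Brute-force scan: b0 in [-1.0, 3.0], b1 in [0.0, 3.0], step 0.1
best = min(
    (sum_squared_residuals(i / 10, j / 10, x, y), i / 10, j / 10)
    for i in range(-10, 31)
    for j in range(0, 31)
)
print(f"smallest D on grid: {best[0]:.3f} at b0={best[1]}, b1={best[2]}")
```

A grid scan is only for illustration; the normal equations give the minimizing b0 and b1 directly.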
      Normal Equations for LS Criterion
• The necessary conditions for the minimum values of D are:
$$ \frac{\partial D}{\partial b_0} = 0 \quad \text{and} \quad \frac{\partial D}{\partial b_1} = 0 $$
$$ \frac{\partial D}{\partial b_0} = 2\sum_{i=1}^{n}\left(y_i - b_0 - b_1 x_i\right)(-1) = 0 \;\Longrightarrow\; \sum_{i=1}^{n}\left(y_i - b_0 - b_1 x_i\right) = 0 $$
$$ \frac{\partial D}{\partial b_1} = 2\sum_{i=1}^{n}\left(y_i - b_0 - b_1 x_i\right)(-x_i) = 0 \;\Longrightarrow\; \sum_{i=1}^{n} x_i\left(y_i - b_0 - b_1 x_i\right) = 0 $$

•   Expanding the above equations:
$$ \sum_{i=1}^{n} y_i - n b_0 - b_1 \sum_{i=1}^{n} x_i = 0 $$
$$ \sum_{i=1}^{n} x_i y_i - b_0 \sum_{i=1}^{n} x_i - b_1 \sum_{i=1}^{n} x_i^2 = 0 $$

• Normal equations:
$$ \begin{aligned} n b_0 + \left(\sum_{i=1}^{n} x_i\right) b_1 &= \sum_{i=1}^{n} y_i \\ \left(\sum_{i=1}^{n} x_i\right) b_0 + \left(\sum_{i=1}^{n} x_i^2\right) b_1 &= \sum_{i=1}^{n} x_i y_i \end{aligned} $$
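The two normal equations form a 2 × 2 linear system in b0 and b1. A minimal sketch, assuming the data fit in memory as Python lists, accumulates the four sums and solves the system by Cramer's rule (the function name is hypothetical); for the small made-up dataset used here it returns b0 = 1.0 and b1 = 1.9.

```python
def solve_normal_equations(x, y):
    """Solve  n*b0 + Sx*b1 = Sy  and  Sx*b0 + Sxx*b1 = Sxy  by Cramer's rule."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # zero only if all x_i are equal
    if det == 0:
        raise ValueError("all x_i are equal; the slope is not identifiable")
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 6.0, 9.0]  # made-up data
print(solve_normal_equations(x, y))  # (1.0, 1.9)
```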
       LS Solution (2 Unknowns)

$$ \hat{b}_0 = \frac{\sum_{i=1}^{n} y_i}{n} - \left(\frac{\sum_{i=1}^{n} x_i}{n}\right)\hat{b}_1 = \bar{y} - \bar{x}\,\hat{b}_1 $$

$$ \hat{b}_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} $$
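The closed-form solution translates directly into code. The sketch below (hypothetical function name) uses the algebraically equivalent centered form, b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)², which avoids subtracting two large, nearly equal sums when the xi are large. It then cross-checks against `statistics.linear_regression`, which is available in Python 3.10 and later.

```python
import statistics


def ls_line(x, y):
    """Return (b0_hat, b1_hat) for y = b0 + b1 x using the centered LS formulas."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # = sum x_i y_i - n xbar ybar
    sxx = sum((xi - xbar) ** 2 for xi in x)                        # = sum x_i^2 - n xbar^2
    b1 = sxy / sxx
    b0 = ybar - xbar * b1
    return b0, b1


x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 6.0, 9.0]  # made-up data
b0, b1 = ls_line(x, y)
print(b0, b1)

# Cross-check with the standard library (Python 3.10+ only)
if hasattr(statistics, "linear_regression"):
    fit = statistics.linear_regression(x, y)
    print(fit.intercept, fit.slope)
```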
Fitting a Polynomial Eq. By LS Method
$$ y_i = b_0 + b_1 x_i + b_2 x_i^2 + \cdots + b_k x_i^k + e_i, \quad i = 1, 2, \ldots, n $$
LS criterion:
$$ \min_{b_0, b_1, \ldots, b_k} D = \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i - b_2 x_i^2 - \cdots - b_k x_i^k\right)^2 $$
Set
$$ \frac{\partial D}{\partial b_j} = 0, \quad j = 0, 1, 2, \ldots, k $$
Normal equations are (all sums over i = 1, …, n):
$$ \begin{aligned}
b_0 n + b_1 \sum x_i + \cdots + b_k \sum x_i^k &= \sum y_i \\
b_0 \sum x_i + b_1 \sum x_i^2 + \cdots + b_k \sum x_i^{k+1} &= \sum y_i x_i \\
&\;\;\vdots \\
b_0 \sum x_i^k + b_1 \sum x_i^{k+1} + \cdots + b_k \sum x_i^{2k} &= \sum y_i x_i^k
\end{aligned} $$
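A minimal pure-Python sketch of the polynomial normal equations follows. It builds the (k + 1) × (k + 1) matrix of power sums Σ xi^(i+j) and the right-hand side Σ yi xi^i, then solves the system by Gaussian elimination with partial pivoting. The function names and the noise-free test data are illustrative assumptions. For high degrees these normal equations become badly conditioned, so library routines typically rely on QR or SVD instead.

```python
def solve_linear_system(a, b):
    """Solve a @ sol = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular normal-equation matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = acc / m[r][r]
    return sol


def fit_polynomial(x, y, k):
    """LS coefficients [b0, ..., bk] of a degree-k polynomial via the normal equations."""
    s = [sum(xi ** p for xi in x) for p in range(2 * k + 1)]  # power sums S_p = sum x_i^p
    a = [[s[i + j] for j in range(k + 1)] for i in range(k + 1)]
    rhs = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(k + 1)]
    return solve_linear_system(a, rhs)


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0 + 2.0 * xi - 0.5 * xi ** 2 for xi in x]  # exact quadratic, no noise
print([round(b, 6) for b in fit_polynomial(x, y, 2)])  # [1.0, 2.0, -0.5]
```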
Fitting a Linear Function of Several Variables
$$ y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k + e $$
 LS criterion:
$$ \min_{\mathbf{b} = (b_0, b_1, \ldots, b_k)} D = \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_{i1} - b_2 x_{i2} - \cdots - b_k x_{ik}\right)^2 $$
 Set
$$ \frac{\partial D}{\partial b_j} = 0, \quad j = 0, 1, 2, \ldots, k $$

Normal equations (all sums over i = 1, …, n):
$$ \begin{aligned}
b_0 n + b_1 \sum x_{i1} + \cdots + b_k \sum x_{ik} &= \sum y_i \\
b_0 \sum x_{i1} + b_1 \sum x_{i1}^2 + \cdots + b_k \sum x_{i1} x_{ik} &= \sum y_i x_{i1} \\
&\;\;\vdots \\
b_0 \sum x_{ik} + b_1 \sum x_{ik} x_{i1} + \cdots + b_k \sum x_{ik}^2 &= \sum y_i x_{ik}
\end{aligned} $$
Matrix Form of Multiple Regression by LS
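$$ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} $$

      (Note: xij = ith observation of the jth independent variable)

 or         y = Xβ + ε                    in short

 LS criterion is:
$$ \min_{\boldsymbol{\beta}} D = \sum_{i=1}^{n} \varepsilon_i^2 = \boldsymbol{\varepsilon}'\boldsymbol{\varepsilon} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) $$
 Set ∂D/∂β = 0, which results in:
$$ \mathbf{X}'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0} $$
 The LS solutions are:
$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} $$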
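The matrix solution can be sketched without external libraries as below, assuming the predictors arrive as rows (xi1, …, xik). Rather than forming (X′X)⁻¹ explicitly, it solves (X′X)β̂ = X′y, which yields the same β̂ with less work; for ill-conditioned X, QR-based least-squares solvers are generally preferred. Helper names and the noise-free data (generated from y = 1 + 2x1 - 3x2) are illustrative assumptions.

```python
def solve_linear_system(a, b):
    """Solve a @ sol = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("X'X is singular (collinear columns?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = acc / m[r][r]
    return sol


def ls_coefficients(rows, y):
    """beta_hat from (X'X) beta = X'y, with a leading column of ones for beta_0."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    cols = list(zip(*X))  # columns of X = rows of X'
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * yi for a, yi in zip(ci, y)) for ci in cols]
    return solve_linear_system(xtx, xty)


rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]  # (x_i1, x_i2)
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]              # synthetic, noise-free
print([round(b, 6) for b in ls_coefficients(rows, y)])   # [1.0, 2.0, -3.0]
```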
         Measure of Goodness-of-Fit

R² = Coefficient of Determination
$$ R^2 = 1 - \frac{\sum_{i=1}^{n} \varepsilon_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $$

  = 1 – (fraction of the variation in the dependent variable, y, left unexplained
    by the regression equation)
  = fraction of the variation in the dependent variable, y, explained by the
    regression equation.
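A short sketch of the R² formula, assuming observed values y and fitted values ŷ from an LS fit that includes the intercept b0 (the "explained variation" reading relies on that intercept). The function name is hypothetical, and the small made-up data come with their LS line ŷ = 1.0 + 1.9x.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - ybar) ** 2 for yi in y)                # total variation about the mean
    return 1.0 - sse / sst


x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 6.0, 9.0]              # made-up data
y_hat = [1.0 + 1.9 * xi for xi in x]  # LS line for these data
print(round(r_squared(y, y_hat), 4))  # 0.9627
```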
Example 1 (LS Method)
LS Example
LS Example (Matrix Approach)
LS Example (by Minitab, with b0)
LS Example (by Minitab, without b0)
LS Example (Output Plots)

						