Embed
Email

fried

Document Sample
fried
Shared by: HC11113018140
Categories
Tags
Stats
views:
0
posted:
11/30/2011
language:
English
pages:
58
Robustness for High-dimensional Data RobHD 2004, Vorau, May 5th-8th 2004







Signal Extraction from Time Series



Roland Fried Ursula Gather

Universidad Carlos III Universität Dortmund

de Madrid, Spain Germany

Hemodynamic variables

of a critically ill patient

arterial pressures,

300

pulmonary artery pressures,

central venous pressure,

heart rate, pulse,

250

temperature

200







150







100







50







0



0 500 1000 1500 2000 2500 3000 3500



Time [Minutes]

Goals of Clinical Data Analysis



• Intelligent Alarm Systems

• Clinical Decision Support









• Signal extraction

• Outlier detection / classification

• Level shift / trend detection

• Dimension reduction

• Fast, automatic algorithms

• Good clinical interpretability

• Signal extraction from univariate (yt)



Signal + noise model y t   t  ut  v t





Signal Observational noise Spiky noise

smooth with a symmetric measurement

few sudden shifts mean zero artifacts …



Moving window (yt-m, …, yt, … ,yt+m) of width n=2m+1

for approximation of  t



Choice of m: bias, time delay admissible

variance, robustness, computation time

Location based filtering



Heart rate , running mean and running median (length 31)



85





80





75





70





65





60







Problems: 0 50 100 150

Time [min]

200 250







• Running mean not robust

• Running median not smooth

• Robust Linear Regression



Extract local linear trend μt  jβt from y t m,, y t m 





– Least median of

squares: argminμt ,βt  :

 med j y t  j  μt  jβ t 2 



 

– Repeated

 y t  j  y t i 

median: μt  medj y t  j  jβt , β t  med j  medi j

 



 ji 

– L1 Regression: 

 

argminμt ,β t  :  y t  j  μt  jβ t 



 j 



Simulations: MSE for Level Approximation





different log  MSERM

MSELMS 



– numbers of

outliers and

– sizes of

• outliers

• level shifts

log  MSEL1

MSELMS 



• trends

L1 best

RM best

LMS best

(Davies, Fried, Gather 2004)

Application: heart rate



85

85

80

80

75

75

70

70

65

65

60

60









0

0 50

50 100

100 150

150 200

200 250

250

Time

Time









LMS not stable, Repeated Median and L1 similar

• Modified Double Window (DW) Filters

Double Window Modified Trimmed Mean Filter (Lee, Kassam, 1985)

 t  medy t k ,, y t k , k  m

~

1

MTM ( y t )   y t i s t  MADy t k ,, y t k 

~

Jt iJt

J  i  m,, m : y    2s 

t t i t

~ t

~



Regression based analogues

1) Apply RM to (yt-k,…,yt+k)

2) Estimate st from regression residuals

3) Trim all yt+i with large regression residuals in (yt-m,…,yt+m)

4) Apply least squares or repeated median





TRM Filter MRM Filter

Removal of spikes



Number of spikes which can be dampened

Med RM MTM TRM MRM

m m-1 k k-1 k-1





Number of spikes which can be removed completely if ut  0

Med RM MTM TRM MRM

Steady state m m-1 k k-1 k-1

Trend period 0 m-1 0 k-1 k-1



Length of outlier patches important for choice of inner window

Efficiency for Gaussian Noise, m=10

100









Location based: Median

MTM, k=10

80









MTM, k= 5

60









Regression: RM

MRM, k= 7

40









TRM, k= 7

20

0









0.0 0.1 0.2 0.3 0.4 0.5 slope





Efficiency of modified filters high

Shift Preservation

Max. MSE for outliers at the end of the window

(max. for outliers of sizes 1, … ,10)

Med

MTM

5









5

RM

4









4

3









3

DW:

2









2

MTM

MRM

1









1

TRM

0









0

number of

2 4 6 8 10 2 4 6 8 10

outliers

Slope b= 0.0 b= -0.5



Location based filters blur shifts during trends,

double window filters reduce blurring of shifts

Series with outliers, shifts and trends







15

10

5

0

-5









0 50 100 150 200 250 300 Time

RM smooth, MTM good at shifts, TRM compromise

• Hybrid Filter

FIR-Median Hybrid (FMH) Filter (Heinonen and Neuvo, 1987, 1988)



FMH( y t )  med1( y t ),, M( y t )



Predictive FMH Filter, M=3

m m

1( y t )   w h y t h 2 ( y t )  y t 3 ( y t )   w h y t h

h1 h1



Combined FMH Filter, M=5

1 m 1 m

 4 ( y t )   y t h 5 ( y t )   y t  h

m h1 m h1

Predictive / Combined Repeated Median Hybrid (RMH) Filter

Half-window averages half-window medians

One-sided linear predictors one-sided repeated medians

Removal of spikes



Number of spikes which can be dampened

SM RM PFMH CFMH PRMH CRMH

m m-1 1 1 m / 2 m / 2  1



Number of spikes which can be removed completely if ut  0

SM RM PFMH CFMH PRMH CRMH

Steady state m m-1 1 1 m / 2 m / 2

Trend period 0 m-1 1 0 m / 2 m / 2



Hybrid filters more limited than DW filters

RMH filters more robust than FMH filters

100 Efficiency for Gaussian Noise, m=10





Location based: Median

Regression: RM

80









FMH: PFMH

60









CFMH

RMH: PRMH

40









CRMH

20

0









0.0 0.1 0.2 0.3 0.4 0.5 slope





Hybrid filters less efficient than DW filters

RMH filter almost as efficient as FMH filter

Shift Preservation

Max. MSE for outliers at the end of the window

(max. for outliers of sizes 1, … ,10)

Med









5

5

RM

4









4

FMH:

PFMH

3









3

CFMH









2

2









RMH:

1









1

PRMH

number of

0









0

CRMH 2 4 6 8 10 2 4 6 8 10

outliers

Slope b= 0.0 b= -0.5





Combined hybrid filters blur shifts in trends slightly,

but even less than DW filters

Series with outliers, trends, shifts and extremes







20

15

10

5

0

-5









0 50 100 150 200 250 300 Time



RM smooth, PFMH preserves extremes, PRMH more robust

• Signal extraction from d-variate (Yt)

(e.g. Peña & Box, 1987)



Factor Model: Yt  Λft  ε t , t  ...

ft  : Process of r latent variables Λ  (d  r ) - Matrix of loadings

125





Results 100







10 vital parameters: 7



5

5



0



25



300 0



-25



250 -50



-75

50 20 30 50

10 30 40

0

0 1 0 20 40 50

50 00 50

00 50 00

50 00 50

00

200

i

Z e t





150

Series of extracted factors

125



100 100



75



50 50



25



0 0





050

1 0 20 40 50

0 00 50 00

10 30 40

50 00 50

20 30 50 0

50 00 500 -25



-50



timei

Z e t

-75

50 20 30 50

10 30 40

0

0 1 0 20 40 50

50 00 50

00 50 00

50 00 50

00









Series of “model errors’’

i

Z e t

Conclusion



• Extraction of time-varying mean from

contaminated time series by robust regression



• LMS very robust, but slow and unstable

Repeated median robust, fast and stable



• Double window and hybrid filters improve

preservation of shifts and extremes



• Adaptive window width selection

Full online version

Multivariate signal extraction

References

Bernholt, T., Fried, R. (2003).

Computing the update of the repeated median regression line in linear time.

Information Processing Letters 88, 111-117.

Davies, P.L., Fried, R., Gather, U. (2004).

Robust signal extraction for on-line monitoring data.

J. Statistical Planning and Inference, 122, 65-78.

Fried, R. (2004).

Robust filtering of time series with trends.

J. Nonparametric Statistics, to appear.

Fried, R., Bernholt, T., Gather, U. (2004).

Repeated median and hybrid filters.

Technical Report 10/2004, SFB 475, University of Dortmund, Germany.

Gather, U., Fried, R. (2004a).

Robust scale estimation for local linear temporal trends.

Tatra Mountains Mathematical Publications, 26, 87-101.

Gather, U., Fried, R. (2004b).

Methods and algorithms for robust filtering.

Proceedings of COMPSTAT 2004, to appear.

Statistical demands



Analysis must Procedure needs



• work automatically unique solution



• work online low computation time



• resist measurement artifacts high breakdown point

(many data situations possible) low bias curves



• attenuate observational noise satisfactory efficiency





No claim of optimality, compromise needed

Desirable properties



• Noise attenuation: efficiency

• Stability: continuity

• Removal of spikes: exact fit, robustness

• Preservation of shifts and extremes:

exact fit, robustness

• Trend tracking: invariance

• Online analysis: fast algorithms

Computation time (millisec.)



window width n=2m+1

m=10 m=15

– Least median of

squares: O(n2) 5.40 11.15

– Repeated

median: O(n2) 2.60 4.50

online: O(n) 0.62 0.83

(Bernholt, Fried 2003)



– L1 Regression:

O(n log n) 2.40 4.70

Finite Sample Replacement Breakdown Point



Define Uk ( yt )  {z  (z1,..., zn ) : #{i : zi  y t i}  k}





k* / n  min k / n : sup{|| T(z)  T(y t ) ||, z  Uk (y t )}  



k* TL2 TL1 TRM TLMS

n=21 1 7 10 10

n=31 1 10 15 15



k*: smallest number of contaminated observations

which can cause a spike of any size in the extracted signal

Robustness

Smallest number k* of contaminated observations

which can cause a spike of any size in the extracted signal





k*  min k : sup{|| T(z)  T(yt ) ||, z  Uk (yt )}  



where Uk ( yt )  {z  (z1,..., zn ) : #{i : zi  y t i}  k}



k* L2 L1 RM LMS

n=21 1 7 10 10

n=31 1 10 15 15

Finite-sample efficiency relatively to L2 (MSE)

80

%

70





60





50





40



Med

30

L1

20

RM

10 LMS

Slope 0.00 0.10 0.00 0.10 0.00 0.10 0.00 0.10

0.05 0.05 0.05 0.05

Width n=11 n=21 n=31 n=61

Rep. median and L1 regression never much worse than median

Finite-sample Efficiency w.r.t. L2 (MSE)

80

%

70





60





50





40



Med

30

L1

20

RM

10 LMS

Slope 0.0 0.1 0.0 0.1 0.0 0.1

Width n=21 n=31 n=61



Rep. median and L1 regression never much worse than median

Performance when outliers present

MSE for  t

Level shift of 10s









size number of outliers



LMS d1 = 0 : Trimming

d0 = d1 > 0 : Winsorization

• Online Outlier Replacement for RM

~ ~

Outlier region centered at y t m1   t  (m  1)  bt

ˆ



Robust scale estimation

e.g. Rousseeuw and Croux‘ Q ,   

~ 

s t  c t  | ri r j|, m  i  j  m(h) very good for



 

shifts and inliers

m1

h 2

~ ~

r j  y t  j  t  j bt , j  m,...,m residuals



~

Replace y t m1 by y t m1 if | y t m1  y t m1 |  dO  st

ˆ ˆ

e.g. dO  3



(Gather, Fried, 2004a, Fried, 2004)

Scale Approximation

~

~  j b , j  m,..., m

Residuals r j y t  j   t t







MAD

sMAD  c1, t  med | r m |,..., | rm |

~

t



Length of the shortest half

~

t 

sLSH  c 2, t  min | r( j m )  r ( j) |, j  m,...,0 

Rousseeuw and Croux‘ Q

~ 

sQN  c3, t  | ri r j|,m  i  j  m

t (h )

, h  

m 1

2



Nested scale statistic

~

sSN  c4, t  med i med ji | ri r j|

t

(Gather, Fried, 2003, Fried, 2003)

• Outlier replacement

~

~  j b , j  m,...,m

Residuals r j y t  j  t t



MAD

sMAD  c1,t  med| rm |,...,| rm |

~

t works fine



Length of the shortest half

~

t 

sLSH  c 2,t  min | r( jm) r ( j) |, j  m,...,0 better worst case



e.g. with robust scale estimation

Rousseeuw and Croux‘ Q

~

t 

sQN  c 3,t  | ri r j|, m  i  j  m (h) very good for

shifts and inliers

h  

m1

2





(Gather, Fried, 2004a, Fried, 2004)

Application: Heart Rate



Heart rate , LMS and RM with outlier replacement



85





80





75





70





65





60





0 50 100 150 200 250

time

Sinusoid, 10% patchy additive N(0,9s) outliers









Trimming with Q LMS

• Level Shift Detection



EWMA, CUSUM, Runs etc. not robust against outliers



Robust majority rule:



~ ~

~ b s

Compute t t t 

Detect positive LS at t+j0 if # j  1,...,m : rj  dLS  s t    

~ m

2

 

j0  min j  1,..., m : rj  dLS  s t 

~





dLS : clinically relevant threshold, e.g. dLS=2

Simulated Time Series with Shifts

LMS and RM with outlier replacement and shift detection

18



16



14



12



10



8



6



4



2



0



-2

0 50 100 150 200 250 300 350 400 450 500

time



time

Trend invariance

Replacing yt-m, …,y0, … ,yt+m by yt-m - mb, …, y0, … ,yt+m + mb

does not change the extracted signal



Invariant: RM, TRM, MRM, PFMH, PRMH

Not invariant: SM, MTM, CFMH, CRMH





Lipschitz continuity



SM RM PFMH CFMH PRMH CRMH

Const. 1 2k+1 4/(k-1) 4/(k-1) 2k+1 2k+1



Not continuous: MTM, TRM, MRM

Efficiency for Gaussian noise







100









100

80









80

60









60

efficiency









efficiency

40

40









20

20









0

0









0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5

slope slope

100









100

80









80

60









60

efficiency









=0.6

efficiency

40









40

20









20

0









0









0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5

slope slope

Efficiency for Gaussian Noise









100

100

Autocor.

=0.0









80

80

60









60

Med Med









40

40







RM RM









20

20









MTM FMH:









0

0









0.0 0.1 0.2 0.3 0.4 0.5 PFMH 0.0 0.1 0.2 0.3 0.4 0.5









100

100









DW: CFMH

MTM RMH:









80

80









MRM PRMH









60

60









TRM CRMH







40

40









20

20









=0.6

0

0









0.0 0.1 0.2 0.3 0.4 0.5 slope 0.0 0.1 0.2 0.3 0.4 0.5





Efficiency of repeated median high, of hybrid filter low

Efficiency for Gaussian Noise

Modified filter Hybrid filter









100

100









80

80









60

60









40

40









20

20









slope









0

0









0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5



Med, MTM RM Med, RM

DW: MTM MRM TRM FMH: PFMH, CFMH

RMH: PRMH, CRMH



Efficiency of modified filter high, of hybrid filter low

Shift Preservation



5









5

4









4

3









3

RMSE









RMSE

2









2

1









1

0









0

2 4 6 8 10 2 4 6 8 10



number of outliers number of outliers









5

5









4

4









3

3









RMSE

RMSE









2

2









1

1









0

0









2 4 6 8 10 2 4 6 8 10

Shift Preservation

Max. RMSE for increasing number of outliers at the end

Slope









5

b= 0.0 5

4









4

3









3

Med Med

2









2

MTM RM









1

1









RM FMH:









0

0









2 4 6 8 10 PFMH 2 4 6 8 10

DW: CFMH

5









5

MTM RMH:

4









4

MRM PRMH

3









3

TRM CRMH

2









2

1









1

number of

b=-0.5

0









0

2 4 6 8 10 outliers 2 4 6 8 10





Double window and hybrid filter reduce blurring of shifts

Removal of spikes



Number of spikes which can be removed competely



SM RM MTM TRM MRM PFMH CFMH PRMH CRMH

Steady m m-1 k k-1 k-1 1 1 m / 2 m / 2

Trend 0 m-1 0 k-1 k-1 1 0 m / 2 m / 2



Number of spikes which can be dampened

SM RM MTM TRM MRM PFMH CFMH PRMH CRMH

m m-1 k k-1 k-1 1 1 m / 2 m / 2  1

Outlier patch in the center

10









10

8









8

6









6

RMSE









RMSE

4









4

2









2

0









0

2 4 6 8 10 2 4 6 8 10



number of outliers number of outliers

10









10

8









8

6









6

RMSE









RMSE

4









4

2









2

0









0









2 4 6 8 10

2 4 6 8 10

Outlier patch in the center

Max. RMSE for increasing number of outliers at the end

Slope

10









10

b= 0.0 8









8

6









6

Med Med

4









4

RM RM

2









2

MTM FMH:

0









0

2 4 6 8 10 PFMH 2 4 6 8 10

DW: CFMH









10

10









MTM RMH:









8

8









MRM PRMH









6

6









TRM CRMH







4

4









2

2









number of

b=-0.5

0

0









2 4 6 8 10 outliers 2 4 6 8 10



Double window and hybrid: info about length of patches needed

140

120









140

100

mu[1:nt, 19]









120

100

mu[1:nt, 19]

80









80

60

60









40

20

40









0 50 100 150 200 250 300



Index

20









0 50 100 150 200 250 300









140

Index



Green: SM







120

Blue: RM



100

mu[1:nt, 19]



80

RED: MRMS

60

Purple: MRM 40





Orange: MTM

20







0 50 100 150 200 250 300



Index





Yellow: DWMTM

15









15

10









10

mu[1:nt, 19]









mu[1:nt, 19]

5









5

0









0

-5









-5

0 50 100 150 200 250 300

0 50 100 150 200 250 300

Index









15

Index









Green: SM Online estimates 10



Blue: RM Blue RM

mu[1:nt, 19]









RED: TRMS Yellow MRM

Purple: MRM Green TRM

5









Orange: MTM

Yellow: DWMTM

0

Hybrid filters in practice



Med

RM

20

FMH: 15







PFMH

CFMH

10

mu[1:nt, 15]









RMH:

PRMH

5









CRMH

0

-5









0 50 100 150 200 250 300



Index

• Robust Window Width Selection



Assumption of a constant slope only locally valid

Example: Smoothing of a maximum by a linear fit







Adjust window width

{

{

{



5 11 5 when slope changes









Short window width: Large window width:

+ small time delay + better noise attenuation

+ preservation of extremes + robustness

Basic Algorithm for Window Width Selection

Consider sign of the residuals:

St  #  ri  0, i   (mt  2) / 2 ,,  mt 

where nt = 2mt+1 window width at time point t





ESt   mt  1 / 2

for noise with symmetric d.f.





Choose 0
if St  d  mt  1 / 2 or St  2  d  mt  1 / 2 reduce mt ,

set mt+1 = mt + 1 otherwise

Gather, Fried (2004b)

Sawtooth signal

overlaid by Gaussian noise and additive outliers







10

5

0

-5









0 50 100 150 200 250 300

time

Repeated median with adaptive window width, d=0.7, 4 < mt < 16

Sawtooth signal

overlaid by Gaussian noise and additive outliers







10

5

0









Repeated

median,

-5









m=10

0 50 100 150 200 250 300

time

Repeated median with adaptive window width, d=0.7, 5  mt  15

Application: Pulmonary Artery Pressure





50

45

40

35

30

25

20









0 500 1000 1500 2000 2500

time

RM with adaptive width, outlier replacement and shift detection

Loadings of rotated factors constant?

Observations 1-1000 APD .0001 .4552 .0421 -.2488

APS .0091 .6104 -.0505 .0021

APM -.0158 .6296 .0073 .1582

PAPD -.5070 -.0291 .0149 -.1960

PAPS -.5424 -.0694 -.0562 -.1093

PAPM -.4846 .0370 .0129 .1385

CVP -.4619 .0667 .0276 .1559

HR .0080 -.0774 .6914 .0511

Puls -.0107 .0725 .7173 -.0533

Temp -.0200 -.0092 .0054 .9015





APD -.0770 .5166 .0641 -.1394

Observations 1000-2000 APS .0528 .5939 -.0313 .0171

APM .0339 .5975 -.0160 .1227

PAPD .1163 .0449 -.1732 -.5932

PAPS .4554 .0625 .0122 -.1696

PAPM .5633 .0002 .0501 -.0546

CVP .6405 -.0447 -.0480 .1777

HR -.1078 .0766 .7011 -.0517

Puls .1618 -.0667 .6755 .0870

Temp .0388 .0621 -.1360 .7321





But: Outliers, artifacts, ...

Loadings for filtered time series

APD -.0088 .4385 .0656 -.3643

Observations 1-1000 APS -.0205 .6015 -.0561 -.0201

APM .0344 .6374 -.0025 .1387

PAPD .5240 -.0473 -.0138 -.1597

PAPS .5436 -.0787 -.0512 -.1338

PAPM .4717 .0576 .0199 .1539

CVP .4532 .0732 .0421 .1530

HR -.0014 -.1016 .6854 .0197

Puls .0097 .0926 .7194 -.0273

Temp .0073 .0572 .0122 .8694





APD -.0788 .5796 .0335 -.2393

Observations 1001-2000 APS .0512 .5721 -.0409 .0572

APM .0243 .5668 .0133 .1345

PAPD .4059 .0561 -.1097 -.3168

PAPS .4536 .0412 .0664 -.0993

PAPM .5150 -.0186 .0528 -.0014

CVP .5756 -.0340 -.0606 .2636

HR -.0817 .0392 .7255 -.0360

Puls .1282 -.0431 .6689 .0481

Temp -.0134 .0746 -.0111 .8590


Related docs
Other docs by HC11113018140
Chapter 2 The U S and Global Economies
Views: 0  |  Downloads: 0
Minutes
Views: 4  |  Downloads: 0
Work
Views: 0  |  Downloads: 0
Public Sector Act 2009
Views: 0  |  Downloads: 0
PowerPoint Presentation
Views: 22  |  Downloads: 0
standard and quality report 2008 2009
Views: 0  |  Downloads: 0
Text
Views: 8  |  Downloads: 0
Please read this before using presentation
Views: 4  |  Downloads: 0
INDICE
Views: 1  |  Downloads: 0
By registering with docstoc.com you agree to our
privacy policy

You are almost ready to download!

You are almost ready to download!