Robustness for High-dimensional Data RobHD 2004, Vorau, May 5th-8th 2004
Signal Extraction from Time Series
Roland Fried Ursula Gather
Universidad Carlos III Universität Dortmund
de Madrid, Spain Germany
Hemodynamic variables
of a critically ill patient
arterial pressures,
300
pulmonary artery pressures,
central venous pressure,
heart rate, pulse,
250
temperature
200
150
100
50
0
0 500 1000 1500 2000 2500 3000 3500
Time [Minutes]
Goals of Clinical Data Analysis
• Intelligent Alarm Systems
• Clinical Decision Support
• Signal extraction
• Outlier detection / classification
• Level shift / trend detection
• Dimension reduction
• Fast, automatic algorithms
• Good clinical interpretability
• Signal extraction from univariate (yt)
Signal + noise model y t t ut v t
Signal Observational noise Spiky noise
smooth with a symmetric measurement
few sudden shifts mean zero artifacts …
Moving window (yt-m, …, yt, … ,yt+m) of width n=2m+1
for approximation of t
Choice of m: bias, time delay admissible
variance, robustness, computation time
Location based filtering
Heart rate , running mean and running median (length 31)
85
80
75
70
65
60
Problems: 0 50 100 150
Time [min]
200 250
• Running mean not robust
• Running median not smooth
• Robust Linear Regression
Extract local linear trend μt jβt from y t m,, y t m
– Least median of
squares: argminμt ,βt :
med j y t j μt jβ t 2
– Repeated
y t j y t i
median: μt medj y t j jβt , β t med j medi j
ji
– L1 Regression:
argminμt ,β t : y t j μt jβ t
j
Simulations: MSE for Level Approximation
different log MSERM
MSELMS
– numbers of
outliers and
– sizes of
• outliers
• level shifts
log MSEL1
MSELMS
• trends
L1 best
RM best
LMS best
(Davies, Fried, Gather 2004)
Application: heart rate
85
85
80
80
75
75
70
70
65
65
60
60
0
0 50
50 100
100 150
150 200
200 250
250
Time
Time
LMS not stable, Repeated Median and L1 similar
• Modified Double Window (DW) Filters
Double Window Modified Trimmed Mean Filter (Lee, Kassam, 1985)
t medy t k ,, y t k , k m
~
1
MTM ( y t ) y t i s t MADy t k ,, y t k
~
Jt iJt
J i m,, m : y 2s
t t i t
~ t
~
Regression based analogues
1) Apply RM to (yt-k,…,yt+k)
2) Estimate st from regression residuals
3) Trim all yt+i with large regression residuals in (yt-m,…,yt+m)
4) Apply least squares or repeated median
TRM Filter MRM Filter
Removal of spikes
Number of spikes which can be dampened
Med RM MTM TRM MRM
m m-1 k k-1 k-1
Number of spikes which can be removed completely if ut 0
Med RM MTM TRM MRM
Steady state m m-1 k k-1 k-1
Trend period 0 m-1 0 k-1 k-1
Length of outlier patches important for choice of inner window
Efficiency for Gaussian Noise, m=10
100
Location based: Median
MTM, k=10
80
MTM, k= 5
60
Regression: RM
MRM, k= 7
40
TRM, k= 7
20
0
0.0 0.1 0.2 0.3 0.4 0.5 slope
Efficiency of modified filters high
Shift Preservation
Max. MSE for outliers at the end of the window
(max. for outliers of sizes 1, … ,10)
Med
MTM
5
5
RM
4
4
3
3
DW:
2
2
MTM
MRM
1
1
TRM
0
0
number of
2 4 6 8 10 2 4 6 8 10
outliers
Slope b= 0.0 b= -0.5
Location based filters blur shifts during trends,
double window filters reduce blurring of shifts
Series with outliers, shifts and trends
15
10
5
0
-5
0 50 100 150 200 250 300 Time
RM smooth, MTM good at shifts, TRM compromise
• Hybrid Filter
FIR-Median Hybrid (FMH) Filter (Heinonen and Neuvo, 1987, 1988)
FMH( y t ) med1( y t ),, M( y t )
Predictive FMH Filter, M=3
m m
1( y t ) w h y t h 2 ( y t ) y t 3 ( y t ) w h y t h
h1 h1
Combined FMH Filter, M=5
1 m 1 m
4 ( y t ) y t h 5 ( y t ) y t h
m h1 m h1
Predictive / Combined Repeated Median Hybrid (RMH) Filter
Half-window averages half-window medians
One-sided linear predictors one-sided repeated medians
Removal of spikes
Number of spikes which can be dampened
SM RM PFMH CFMH PRMH CRMH
m m-1 1 1 m / 2 m / 2 1
Number of spikes which can be removed completely if ut 0
SM RM PFMH CFMH PRMH CRMH
Steady state m m-1 1 1 m / 2 m / 2
Trend period 0 m-1 1 0 m / 2 m / 2
Hybrid filters more limited than DW filters
RMH filters more robust than FMH filters
100 Efficiency for Gaussian Noise, m=10
Location based: Median
Regression: RM
80
FMH: PFMH
60
CFMH
RMH: PRMH
40
CRMH
20
0
0.0 0.1 0.2 0.3 0.4 0.5 slope
Hybrid filters less efficient than DW filters
RMH filter almost as efficient as FMH filter
Shift Preservation
Max. MSE for outliers at the end of the window
(max. for outliers of sizes 1, … ,10)
Med
5
5
RM
4
4
FMH:
PFMH
3
3
CFMH
2
2
RMH:
1
1
PRMH
number of
0
0
CRMH 2 4 6 8 10 2 4 6 8 10
outliers
Slope b= 0.0 b= -0.5
Combined hybrid filters blur shifts in trends slightly,
but even less than DW filters
Series with outliers, trends, shifts and extremes
20
15
10
5
0
-5
0 50 100 150 200 250 300 Time
RM smooth, PFMH preserves extremes, PRMH more robust
• Signal extraction from d-variate (Yt)
(e.g. Peña & Box, 1987)
Factor Model: Yt Λft ε t , t ...
ft : Process of r latent variables Λ (d r ) - Matrix of loadings
125
Results 100
10 vital parameters: 7
5
5
0
25
300 0
-25
250 -50
-75
50 20 30 50
10 30 40
0
0 1 0 20 40 50
50 00 50
00 50 00
50 00 50
00
200
i
Z e t
150
Series of extracted factors
125
100 100
75
50 50
25
0 0
050
1 0 20 40 50
0 00 50 00
10 30 40
50 00 50
20 30 50 0
50 00 500 -25
-50
timei
Z e t
-75
50 20 30 50
10 30 40
0
0 1 0 20 40 50
50 00 50
00 50 00
50 00 50
00
Series of “model errors’’
i
Z e t
Conclusion
• Extraction of time-varying mean from
contaminated time series by robust regression
• LMS very robust, but slow and unstable
Repeated median robust, fast and stable
• Double window and hybrid filters improve
preservation of shifts and extremes
• Adaptive window width selection
Full online version
Multivariate signal extraction
References
Bernholt, T., Fried, R. (2003).
Computing the update of the repeated median regression line in linear time.
Information Processing Letters 88, 111-117.
Davies, P.L., Fried, R., Gather, U. (2004).
Robust signal extraction for on-line monitoring data.
J. Statistical Planning and Inference, 122, 65-78.
Fried, R. (2004).
Robust filtering of time series with trends.
J. Nonparametric Statistics, to appear.
Fried, R., Bernholt, T., Gather, U. (2004).
Repeated median and hybrid filters.
Technical Report 10/2004, SFB 475, University of Dortmund, Germany.
Gather, U., Fried, R. (2004a).
Robust scale estimation for local linear temporal trends.
Tatra Mountains Mathematical Publications, 26, 87-101.
Gather, U., Fried, R. (2004b).
Methods and algorithms for robust filtering.
Proceedings of COMPSTAT 2004, to appear.
Statistical demands
Analysis must Procedure needs
• work automatically unique solution
• work online low computation time
• resist measurement artifacts high breakdown point
(many data situations possible) low bias curves
• attenuate observational noise satisfactory efficiency
No claim of optimality, compromise needed
Desirable properties
• Noise attenuation: efficiency
• Stability: continuity
• Removal of spikes: exact fit, robustness
• Preservation of shifts and extremes:
exact fit, robustness
• Trend tracking: invariance
• Online analysis: fast algorithms
Computation time (millisec.)
window width n=2m+1
m=10 m=15
– Least median of
squares: O(n2) 5.40 11.15
– Repeated
median: O(n2) 2.60 4.50
online: O(n) 0.62 0.83
(Bernholt, Fried 2003)
– L1 Regression:
O(n log n) 2.40 4.70
Finite Sample Replacement Breakdown Point
Define Uk ( yt ) {z (z1,..., zn ) : #{i : zi y t i} k}
k* / n min k / n : sup{|| T(z) T(y t ) ||, z Uk (y t )}
k* TL2 TL1 TRM TLMS
n=21 1 7 10 10
n=31 1 10 15 15
k*: smallest number of contaminated observations
which can cause a spike of any size in the extracted signal
Robustness
Smallest number k* of contaminated observations
which can cause a spike of any size in the extracted signal
k* min k : sup{|| T(z) T(yt ) ||, z Uk (yt )}
where Uk ( yt ) {z (z1,..., zn ) : #{i : zi y t i} k}
k* L2 L1 RM LMS
n=21 1 7 10 10
n=31 1 10 15 15
Finite-sample efficiency relatively to L2 (MSE)
80
%
70
60
50
40
Med
30
L1
20
RM
10 LMS
Slope 0.00 0.10 0.00 0.10 0.00 0.10 0.00 0.10
0.05 0.05 0.05 0.05
Width n=11 n=21 n=31 n=61
Rep. median and L1 regression never much worse than median
Finite-sample Efficiency w.r.t. L2 (MSE)
80
%
70
60
50
40
Med
30
L1
20
RM
10 LMS
Slope 0.0 0.1 0.0 0.1 0.0 0.1
Width n=21 n=31 n=61
Rep. median and L1 regression never much worse than median
Performance when outliers present
MSE for t
Level shift of 10s
size number of outliers
LMS d1 = 0 : Trimming
d0 = d1 > 0 : Winsorization
• Online Outlier Replacement for RM
~ ~
Outlier region centered at y t m1 t (m 1) bt
ˆ
Robust scale estimation
e.g. Rousseeuw and Croux‘ Q ,
~
s t c t | ri r j|, m i j m(h) very good for
shifts and inliers
m1
h 2
~ ~
r j y t j t j bt , j m,...,m residuals
~
Replace y t m1 by y t m1 if | y t m1 y t m1 | dO st
ˆ ˆ
e.g. dO 3
(Gather, Fried, 2004a, Fried, 2004)
Scale Approximation
~
~ j b , j m,..., m
Residuals r j y t j t t
MAD
sMAD c1, t med | r m |,..., | rm |
~
t
Length of the shortest half
~
t
sLSH c 2, t min | r( j m ) r ( j) |, j m,...,0
Rousseeuw and Croux‘ Q
~
sQN c3, t | ri r j|,m i j m
t (h )
, h
m 1
2
Nested scale statistic
~
sSN c4, t med i med ji | ri r j|
t
(Gather, Fried, 2003, Fried, 2003)
• Outlier replacement
~
~ j b , j m,...,m
Residuals r j y t j t t
MAD
sMAD c1,t med| rm |,...,| rm |
~
t works fine
Length of the shortest half
~
t
sLSH c 2,t min | r( jm) r ( j) |, j m,...,0 better worst case
e.g. with robust scale estimation
Rousseeuw and Croux‘ Q
~
t
sQN c 3,t | ri r j|, m i j m (h) very good for
shifts and inliers
h
m1
2
(Gather, Fried, 2004a, Fried, 2004)
Application: Heart Rate
Heart rate , LMS and RM with outlier replacement
85
80
75
70
65
60
0 50 100 150 200 250
time
Sinusoid, 10% patchy additive N(0,9s) outliers
Trimming with Q LMS
• Level Shift Detection
EWMA, CUSUM, Runs etc. not robust against outliers
Robust majority rule:
~ ~
~ b s
Compute t t t
Detect positive LS at t+j0 if # j 1,...,m : rj dLS s t
~ m
2
j0 min j 1,..., m : rj dLS s t
~
dLS : clinically relevant threshold, e.g. dLS=2
Simulated Time Series with Shifts
LMS and RM with outlier replacement and shift detection
18
16
14
12
10
8
6
4
2
0
-2
0 50 100 150 200 250 300 350 400 450 500
time
time
Trend invariance
Replacing yt-m, …,y0, … ,yt+m by yt-m - mb, …, y0, … ,yt+m + mb
does not change the extracted signal
Invariant: RM, TRM, MRM, PFMH, PRMH
Not invariant: SM, MTM, CFMH, CRMH
Lipschitz continuity
SM RM PFMH CFMH PRMH CRMH
Const. 1 2k+1 4/(k-1) 4/(k-1) 2k+1 2k+1
Not continuous: MTM, TRM, MRM
Efficiency for Gaussian noise
100
100
80
80
60
60
efficiency
efficiency
40
40
20
20
0
0
0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5
slope slope
100
100
80
80
60
60
efficiency
=0.6
efficiency
40
40
20
20
0
0
0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5
slope slope
Efficiency for Gaussian Noise
100
100
Autocor.
=0.0
80
80
60
60
Med Med
40
40
RM RM
20
20
MTM FMH:
0
0
0.0 0.1 0.2 0.3 0.4 0.5 PFMH 0.0 0.1 0.2 0.3 0.4 0.5
100
100
DW: CFMH
MTM RMH:
80
80
MRM PRMH
60
60
TRM CRMH
40
40
20
20
=0.6
0
0
0.0 0.1 0.2 0.3 0.4 0.5 slope 0.0 0.1 0.2 0.3 0.4 0.5
Efficiency of repeated median high, of hybrid filter low
Efficiency for Gaussian Noise
Modified filter Hybrid filter
100
100
80
80
60
60
40
40
20
20
slope
0
0
0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5
Med, MTM RM Med, RM
DW: MTM MRM TRM FMH: PFMH, CFMH
RMH: PRMH, CRMH
Efficiency of modified filter high, of hybrid filter low
Shift Preservation
5
5
4
4
3
3
RMSE
RMSE
2
2
1
1
0
0
2 4 6 8 10 2 4 6 8 10
number of outliers number of outliers
5
5
4
4
3
3
RMSE
RMSE
2
2
1
1
0
0
2 4 6 8 10 2 4 6 8 10
Shift Preservation
Max. RMSE for increasing number of outliers at the end
Slope
5
b= 0.0 5
4
4
3
3
Med Med
2
2
MTM RM
1
1
RM FMH:
0
0
2 4 6 8 10 PFMH 2 4 6 8 10
DW: CFMH
5
5
MTM RMH:
4
4
MRM PRMH
3
3
TRM CRMH
2
2
1
1
number of
b=-0.5
0
0
2 4 6 8 10 outliers 2 4 6 8 10
Double window and hybrid filter reduce blurring of shifts
Removal of spikes
Number of spikes which can be removed competely
SM RM MTM TRM MRM PFMH CFMH PRMH CRMH
Steady m m-1 k k-1 k-1 1 1 m / 2 m / 2
Trend 0 m-1 0 k-1 k-1 1 0 m / 2 m / 2
Number of spikes which can be dampened
SM RM MTM TRM MRM PFMH CFMH PRMH CRMH
m m-1 k k-1 k-1 1 1 m / 2 m / 2 1
Outlier patch in the center
10
10
8
8
6
6
RMSE
RMSE
4
4
2
2
0
0
2 4 6 8 10 2 4 6 8 10
number of outliers number of outliers
10
10
8
8
6
6
RMSE
RMSE
4
4
2
2
0
0
2 4 6 8 10
2 4 6 8 10
Outlier patch in the center
Max. RMSE for increasing number of outliers at the end
Slope
10
10
b= 0.0 8
8
6
6
Med Med
4
4
RM RM
2
2
MTM FMH:
0
0
2 4 6 8 10 PFMH 2 4 6 8 10
DW: CFMH
10
10
MTM RMH:
8
8
MRM PRMH
6
6
TRM CRMH
4
4
2
2
number of
b=-0.5
0
0
2 4 6 8 10 outliers 2 4 6 8 10
Double window and hybrid: info about length of patches needed
140
120
140
100
mu[1:nt, 19]
120
100
mu[1:nt, 19]
80
80
60
60
40
20
40
0 50 100 150 200 250 300
Index
20
0 50 100 150 200 250 300
140
Index
Green: SM
120
Blue: RM
100
mu[1:nt, 19]
80
RED: MRMS
60
Purple: MRM 40
Orange: MTM
20
0 50 100 150 200 250 300
Index
Yellow: DWMTM
15
15
10
10
mu[1:nt, 19]
mu[1:nt, 19]
5
5
0
0
-5
-5
0 50 100 150 200 250 300
0 50 100 150 200 250 300
Index
15
Index
Green: SM Online estimates 10
Blue: RM Blue RM
mu[1:nt, 19]
RED: TRMS Yellow MRM
Purple: MRM Green TRM
5
Orange: MTM
Yellow: DWMTM
0
Hybrid filters in practice
Med
RM
20
FMH: 15
PFMH
CFMH
10
mu[1:nt, 15]
RMH:
PRMH
5
CRMH
0
-5
0 50 100 150 200 250 300
Index
• Robust Window Width Selection
Assumption of a constant slope only locally valid
Example: Smoothing of a maximum by a linear fit
Adjust window width
{
{
{
5 11 5 when slope changes
Short window width: Large window width:
+ small time delay + better noise attenuation
+ preservation of extremes + robustness
Basic Algorithm for Window Width Selection
Consider sign of the residuals:
St # ri 0, i (mt 2) / 2 ,, mt
where nt = 2mt+1 window width at time point t
ESt mt 1 / 2
for noise with symmetric d.f.
Choose 0
if St d mt 1 / 2 or St 2 d mt 1 / 2 reduce mt ,
set mt+1 = mt + 1 otherwise
Gather, Fried (2004b)
Sawtooth signal
overlaid by Gaussian noise and additive outliers
10
5
0
-5
0 50 100 150 200 250 300
time
Repeated median with adaptive window width, d=0.7, 4 < mt < 16
Sawtooth signal
overlaid by Gaussian noise and additive outliers
10
5
0
Repeated
median,
-5
m=10
0 50 100 150 200 250 300
time
Repeated median with adaptive window width, d=0.7, 5 mt 15
Application: Pulmonary Artery Pressure
50
45
40
35
30
25
20
0 500 1000 1500 2000 2500
time
RM with adaptive width, outlier replacement and shift detection
Loadings of rotated factors constant?
Observations 1-1000 APD .0001 .4552 .0421 -.2488
APS .0091 .6104 -.0505 .0021
APM -.0158 .6296 .0073 .1582
PAPD -.5070 -.0291 .0149 -.1960
PAPS -.5424 -.0694 -.0562 -.1093
PAPM -.4846 .0370 .0129 .1385
CVP -.4619 .0667 .0276 .1559
HR .0080 -.0774 .6914 .0511
Puls -.0107 .0725 .7173 -.0533
Temp -.0200 -.0092 .0054 .9015
APD -.0770 .5166 .0641 -.1394
Observations 1000-2000 APS .0528 .5939 -.0313 .0171
APM .0339 .5975 -.0160 .1227
PAPD .1163 .0449 -.1732 -.5932
PAPS .4554 .0625 .0122 -.1696
PAPM .5633 .0002 .0501 -.0546
CVP .6405 -.0447 -.0480 .1777
HR -.1078 .0766 .7011 -.0517
Puls .1618 -.0667 .6755 .0870
Temp .0388 .0621 -.1360 .7321
But: Outliers, artifacts, ...
Loadings for filtered time series
APD -.0088 .4385 .0656 -.3643
Observations 1-1000 APS -.0205 .6015 -.0561 -.0201
APM .0344 .6374 -.0025 .1387
PAPD .5240 -.0473 -.0138 -.1597
PAPS .5436 -.0787 -.0512 -.1338
PAPM .4717 .0576 .0199 .1539
CVP .4532 .0732 .0421 .1530
HR -.0014 -.1016 .6854 .0197
Puls .0097 .0926 .7194 -.0273
Temp .0073 .0572 .0122 .8694
APD -.0788 .5796 .0335 -.2393
Observations 1001-2000 APS .0512 .5721 -.0409 .0572
APM .0243 .5668 .0133 .1345
PAPD .4059 .0561 -.1097 -.3168
PAPS .4536 .0412 .0664 -.0993
PAPM .5150 -.0186 .0528 -.0014
CVP .5756 -.0340 -.0606 .2636
HR -.0817 .0392 .7255 -.0360
Puls .1282 -.0431 .6689 .0481
Temp -.0134 .0746 -.0111 .8590