Sensitivity of PCA for Traffic Anomaly Detection
Evaluating the robustness of current best practices
Haakon Ringberg (1), Augustin Soule (2), Jennifer Rexford (1), Christophe Diot (2)
(1) Princeton University, (2) Thomson Research
Outline
Background and motivation
Traffic anomaly detection
PCA and subspace approach
Problems with methodology
Conclusion & future directions
A network in the Internet
[Figure: diagram of a network and four autonomous systems (AS)]
Network anomalies
[Figure: many computers and a network; labels: March Madness, BOTNET, VIAGRA]
We want to be able to detect these anomalies!
Network anomaly detectors
[Figure: network anomaly detector reporting "We're good!"]
Monitor health of network
Real-time reporting of anomalies
Principal Components Analysis (PCA) Benefits
Finds correlations across multiple links
Network-wide analysis [Lakhina SIGCOMM'04]
Demonstrated ability to detect wide variety of anomalies [Lakhina IMC'04]
Subspace methodology
We use same software
[Figure: anomaly detector running PCA over a network; labels: AS, Network, Victim]
Principal Components Analysis (PCA)
PCA transforms data into new coordinate system
Principal components (new bases) ordered by captured variance
The first k tend to capture periodic trends
normal subspace vs. anomalous subspace
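A minimal sketch of this decomposition, assuming NumPy and a hypothetical traffic matrix X with one row per time bin and one column per link. The function name pca_subspaces, the synthetic data, and the choice topk=1 are illustrative assumptions, not the software used in the study.

```python
import numpy as np

def pca_subspaces(X, topk):
    """Return centered data, per-component variance, and the two bases."""
    Xc = X - X.mean(axis=0)                      # center each link's time series
    # Rows of Vt are the principal components, ordered by captured variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variance = s ** 2 / (X.shape[0] - 1)
    normal_basis = Vt[:topk].T                   # first topk PCs: normal subspace
    anomalous_basis = Vt[topk:].T                # remaining PCs: anomalous subspace
    return Xc, variance, normal_basis, anomalous_basis

# Synthetic data (assumed): 288 time bins, 8 links sharing one periodic trend.
rng = np.random.default_rng(0)
t = np.arange(288)
trend = np.sin(2 * np.pi * t / 144)
X = np.outer(trend, np.linspace(1, 5, 8)) + 0.1 * rng.standard_normal((288, 8))

Xc, variance, normal_basis, anomalous_basis = pca_subspaces(X, topk=1)
explained = variance / variance.sum()            # fraction of variance per PC
```

On data like this, where one periodic trend dominates, the first entry of explained should be close to 1; real traffic need not behave this way.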
Pictorial overview of subspace methodology
1. Training: separate normal & anomalous traffic patterns
2. Detection: find spikes
3. Identification: find original spatial location that caused spike (e.g. router, flow)
[Figure: PCA splits the signal into normal and anomalous parts; network with nodes A and B]
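The training and detection steps can be sketched as follows, assuming NumPy and synthetic data. The function name squared_prediction_error, the injected spike, and the mean-plus-three-standard-deviations threshold are all assumptions for illustration; the threshold in particular is a simple stand-in, not the one used by the original software.

```python
import numpy as np

def squared_prediction_error(X, topk):
    """Energy of each time bin's state vector in the anomalous subspace."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:topk].T                              # normal-subspace basis
    residual = Xc - Xc @ P @ P.T                 # remove the normal part
    return (residual ** 2).sum(axis=1), residual

# Synthetic data (assumed): periodic traffic on 8 links plus one injected spike.
rng = np.random.default_rng(1)
t = np.arange(288)
trend = np.sin(2 * np.pi * t / 144)
X = np.outer(trend, np.linspace(1, 5, 8)) + 0.1 * rng.standard_normal((288, 8))
X[100, 2] += 3.0                                 # hypothetical anomaly: bin 100, link 2

spe, residual = squared_prediction_error(X, topk=1)   # step 1: training
threshold = spe.mean() + 3 * spe.std()                # assumed threshold rule
detected = np.flatnonzero(spe > threshold)            # step 2: detection
```

Step 3 (identification) needs a rule for mapping a flagged time bin back to links or flows, which this sketch leaves out.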
Pictorial overview of problems with subspace methodology
Defining normalcy can be challenging
Tunable knobs
Contamination
PCA's coordinate remapping makes it difficult to identify the original location of an anomaly
[Figure: PCA with the topk knob splits the signal into normal and anomalous parts; network with nodes A and B]
Data used
Géant and Abilene networks
IP flow traces
21/11 through 28/11 2005
Anomalies were manually verified
Outline
Background and motivation
Problems with approach
Sensitivity to its parameters
Contamination of normalcy
Identifying the location of detected anomalies
Conclusion & future directions
Sensitivity to topk
PCA separates normal from anomalous traffic patterns
Works because top PCs tend to capture periodic trends
And large fraction of variance
[Figure: PCA splits the signal into normal and anomalous parts]
Sensitivity to topk
Where is the line drawn between normal and anomalous?
What is too anomalous?
[Figure: PCA with the topk knob splits the signal into normal and anomalous parts]
Sensitivity to topk
Very sensitive to number of
principal components included!
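One way to probe this sensitivity on a dataset is to sweep topk and count flagged time bins. The sketch below assumes NumPy, synthetic data, and a simple mean-plus-n-sigma threshold; the function name detections_per_topk and the two injected anomalies are hypothetical. It shows only how such a sweep might be run, not the results reported in the talk.

```python
import numpy as np

def detections_per_topk(X, ks, n_sigma=3.0):
    """Count time bins flagged as anomalous for each candidate topk."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    counts = {}
    for k in ks:
        P = Vt[:k].T                             # normal subspace for this k
        residual = Xc - Xc @ P @ P.T
        spe = (residual ** 2).sum(axis=1)
        threshold = spe.mean() + n_sigma * spe.std()   # assumed threshold rule
        counts[k] = int((spe > threshold).sum())
    return counts

# Synthetic data (assumed): periodic traffic with two injected anomalies.
rng = np.random.default_rng(2)
t = np.arange(288)
trend = np.sin(2 * np.pi * t / 144)
X = np.outer(trend, np.linspace(1, 5, 8)) + 0.1 * rng.standard_normal((288, 8))
X[100, 2] += 3.0
X[200, :6] -= 2.0

counts = detections_per_topk(X, ks=range(1, 7))
```

Comparing counts across k against a set of manually verified anomalies would show how much the detector's output moves with the choice of topk.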
Sensitivity to topk
Sensitivity wouldn’t be
an issue if we could
tune topk parameter
We’ve tried many
different methods
3σ deviation heuristic
Cattell’s Scree Test
Humphrey-Ilgen
Kaiser’s Criterion
None are reliable
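Two of the listed heuristics are simple enough to sketch. Both functions below rest on assumptions about details the slides do not spell out. topk_kaiser uses the common textbook form of Kaiser's criterion (keep components of the correlation matrix whose eigenvalue exceeds 1). topk_three_sigma is one reading of the 3σ deviation heuristic: walk the PCs in order and stop at the first one whose projection deviates more than 3σ from its mean. Function names and data are hypothetical.

```python
import numpy as np

def topk_kaiser(X):
    """Number of correlation-matrix eigenvalues greater than 1."""
    corr = np.corrcoef(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)
    return int((eigenvalues > 1.0).sum())

def topk_three_sigma(X, n_sigma=3.0):
    """Index of the first PC whose projection has an n_sigma deviation."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    projections = U * s                          # column i: data projected on PC i
    for i in range(projections.shape[1]):
        u = projections[:, i]
        if np.any(np.abs(u - u.mean()) > n_sigma * u.std()):
            return i                             # PCs before i are kept as normal
    return projections.shape[1]

# Synthetic data (assumed): one periodic trend shared by 8 links, plus noise.
rng = np.random.default_rng(6)
t = np.arange(288)
trend = np.sin(2 * np.pi * t / 144)
X = np.outer(trend, np.linspace(1, 5, 8)) + 0.1 * rng.standard_normal((288, 8))

k_kaiser = topk_kaiser(X)
k_sigma = topk_three_sigma(X)
```

The two rules answer different questions (how much variance versus how spiky a projection is), so they need not agree on the same data.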
Contamination of normalcy
What happens to large anomalies?
They capture a large fraction of variance
Therefore they are included among top PCs
Invalidates assumption that top PCs need to be periodic
Pollutes definition of normal
In our study, the outage to the left affected 75/77 links
Only detected on a handful!
[Figure: the outage; PCA splits the signal into normal and anomalous parts]
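The contamination effect can be reproduced on toy data. The sketch below assumes NumPy and a synthetic outage that drops traffic on 7 of 8 links for 11 time bins; the sizes, the function name spe_for, and the choice of topk values are illustrative assumptions, not the Géant or Abilene data.

```python
import numpy as np

def spe_for(X, topk):
    """Per-time-bin energy left in the anomalous subspace."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:topk].T
    residual = Xc - Xc @ P @ P.T
    return (residual ** 2).sum(axis=1)

# Synthetic data (assumed): periodic traffic plus a large multi-link outage.
rng = np.random.default_rng(3)
t = np.arange(288)
X = np.outer(np.sin(2 * np.pi * t / 144), np.linspace(1, 5, 8))
X += 0.1 * rng.standard_normal((288, 8))
outage = slice(100, 111)
X[outage, :7] -= 6.0                             # large drop on 7 of 8 links

spe_k1 = spe_for(X, topk=1)
spe_k2 = spe_for(X, topk=2)
outage_energy_k1 = spe_k1[outage].mean()         # outage visible in the residual
outage_energy_k2 = spe_k2[outage].mean()         # outage absorbed by the 2nd PC
```

With topk=2 the outage carries enough variance to claim a top PC, so it is treated as "normal" and nearly vanishes from the residual that the detector inspects.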
Identifying anomaly locations
Spikes when state vector projected on anomaly subspace
But network operators don't care about this
They want to know where it happened!
How do we find the original location of the anomaly?
[Figure: state vector projected onto the anomaly subspace]
Identifying anomaly locations
Previous work used a simple heuristic
Associate detected spike
with k flows with the
largest contribution to the
anomaly subspace
state vector v
No clear a priori reason for this association
[Figure: network with nodes A and B]
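One reading of this heuristic, sketched with NumPy: take the residual of the flagged time bin (the state vector's projection onto the anomaly subspace) and rank flows by the magnitude of their entries. The function name top_contributors, the ranking rule, and the synthetic data are assumptions for illustration; the slides do not specify the exact computation.

```python
import numpy as np

def top_contributors(residual_row, k):
    """Indices of the k flows with the largest magnitude in the residual."""
    return np.argsort(np.abs(residual_row))[::-1][:k]

# Synthetic data (assumed): periodic traffic with a spike on flow 2 at bin 100.
rng = np.random.default_rng(4)
t = np.arange(288)
trend = np.sin(2 * np.pi * t / 144)
X = np.outer(trend, np.linspace(1, 5, 8)) + 0.1 * rng.standard_normal((288, 8))
X[100, 2] += 3.0

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:1].T                                     # normal subspace with topk = 1
residual = Xc - Xc @ P @ P.T
suspects = top_contributors(residual[100], k=2)  # flagged bin -> candidate flows
```

The toy case works because the anomaly sits on a single flow. Each residual entry is a linear combination of all flows, so nothing guarantees that the ranking points at the true source in general.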
Outline
Background and motivation
Problems with approach
Conclusion & future directions
Defining normalcy
Identifying the location of an anomaly
Defining normalcy
Large anomalies can
cause a spike in first
few PCs
Diminishes effectiveness
But we can presumably
smooth these out (WMA)
But first PCs aren’t
always periodic
whichk instead of topk?
Initial results suggest this
might be challenging also
Fundamental disconnect
between objective functions
PCA is optimal at
finding orthogonal
vectors ordered by
captured variance
But variance need not
correspond to normalcy
(i.e. periodicity)
When do they
coincide?
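One way to make the disconnect concrete is to score a component's time series for periodicity separately from the variance it captures. The score below (share of spectral power in the strongest single frequency) is a hypothetical measure chosen for illustration, not something proposed in the talk; it assumes NumPy.

```python
import numpy as np

def periodicity_score(series):
    """Fraction of non-DC spectral power in the single strongest frequency."""
    centered = series - series.mean()
    power = np.abs(np.fft.rfft(centered)) ** 2
    power = power[1:]                            # drop the zero-frequency bin
    return power.max() / power.sum()

# Synthetic series (assumed): a clean periodic trend and white noise.
rng = np.random.default_rng(5)
t = np.arange(288)
periodic = np.sin(2 * np.pi * t / 144)
noise = rng.standard_normal(288)

score_periodic = periodicity_score(periodic)     # close to 1 for a pure sinusoid
score_noise = periodicity_score(noise)           # much lower for white noise
```

Applying such a score to each PC's projection and comparing the ranking with the variance ordering would show where the two objectives agree and where they diverge.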
Identifying anomaly locations
PCA is very effective at finding correlations
But is accomplished by remapping all data to new coordinate system
Strength in detection becomes weakness in identification
Inherent limitation
[Figure: networks and autonomous systems (AS) with a Victim]
Conclusion
PCA is sensitive to its parameters
More robust methodology required
Training: defining normalcy (topk, whichk)
Detection: tuning threshold
Identification: better heuristic
Disconnect between objective functions
PCA finds variance
We seek periodicity
PCA’s strengths can be weaknesses
Transformation good at detecting correlations
Causes difficulty in identifying anomaly location
Thanks!
Questions?
Haakon Ringberg
Princeton University Computer Science
http://www.cs.princeton.edu/~hlarsen/
Outline
Background and motivation
Problems with approach
Future directions
Conclusion
Addressable problems, versus
Fundamental problems
Conclusion: addressable
PCA is sensitive to its parameters
More robust methodology required
Training: defining normalcy (topk, whichk)
Detection: tuning threshold
Identification: better heuristic
Previous work used same data and optimized
parameter settings as Lakhina et al.
But these concerns might be addressable
Conclusion: fundamental
We don’t know what “normal” is
Disconnect between objective functions
PCA finds variance
We seek periodicity
PCA’s strengths can be weaknesses
Transformation good at detecting correlations
Causes difficulty in identifying anomaly location
Are other methods more appropriate?
We require a standardized evaluation framework