Methodology by RCMFernando


									Broadband Traffic Profiler


Port Mirroring

         Port mirroring, also known as a roving analysis port, is a method of monitoring network traffic that
forwards a copy of each incoming and outgoing packet from one port of a network switch to another port where the
packet can be studied.

The first step is to mirror the default-gateway port of a switch/router in the desired network so that its traffic is
exported to the monitoring station's database. When choosing where to mirror, we consider the most suitable place to
be inside the NAT (network address translation) boundary, because there every connected host in the network can be
distinctly identified by its own internal IP address.

         Inside the NAT, firewalls are not a concern, so the complete data flow is mirrored to the flow collector
without any barriers. This means that a firewall in a future BTP deployment environment will not affect the accuracy
of the BTP tool, because the BTP tool comes with more pattern matching …


Analyzing Data Packets

         After the mirrored packets are collected, the second step is to analyze them. Using packet-matching
algorithms, the captured packets are divided into two segments:

        Non-encrypted data (deep packet inspection, DPI).
                  Mostly unsecured data that travels through the network without any encryption. These packets can
         be analyzed easily by inspecting the packet payload.

        Encrypted/statistical data.
                  Encapsulated or encrypted data packets [1]. Examples include proxies, VPNs, tunneling, and
         applications that use a different protocol to exchange data. Encapsulation changes the pattern of the
         original application-level protocol, while encryption of the packet payload renders identification
         mechanisms based on payload inspection ineffective, which is why this segment is classified from flow
         statistics instead.
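The two-way split above can be sketched as follows. This is a minimal illustration, assuming that "packet matching" means checking the payload against known plaintext protocol prefixes; the signature table `SIGNATURES` and the functions `dpi_label` and `split_packets` are hypothetical names, not part of the BTP tool, and a real DPI engine would use a far larger rule set.

```python
from typing import Optional

# Hypothetical signature table: payload prefixes of a few well-known plaintext
# protocols. Production DPI engines use much larger, regularly updated rule sets.
SIGNATURES = {
    b"\x13BitTorrent protocol": "BitTorrent",  # BitTorrent peer handshake
    b"GET ": "HTTP",
    b"POST ": "HTTP",
    b"HTTP/1.": "HTTP",                        # HTTP response status line
    b"EHLO ": "SMTP",
    b"HELO ": "SMTP",
}

def dpi_label(payload: bytes) -> Optional[str]:
    """Return an application label if the payload starts with a known signature."""
    for prefix, app in SIGNATURES.items():
        if payload.startswith(prefix):
            return app
    return None

def split_packets(payloads):
    """Divide payloads into a DPI-identified segment and a statistical segment."""
    identified, statistical = [], []
    for payload in payloads:
        app = dpi_label(payload)
        if app is None:
            # Encrypted, encapsulated or unknown: left for flow-statistics analysis.
            statistical.append(payload)
        else:
            identified.append((payload, app))
    return identified, statistical

payloads = [
    b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n",
    b"\x16\x03\x01\x02\x00",                   # start of a TLS record (encrypted)
    b"\x13BitTorrent protocol" + bytes(8),
]
identified, statistical = split_packets(payloads)
print([app for _, app in identified])  # ['HTTP', 'BitTorrent']
print(len(statistical))                # 1
```

Prefix matching only on the first payload bytes keeps the check cheap; anything that fails every rule falls through to the statistical path rather than being discarded.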

         First, the system captures all packets passing through it and aggregates them into traffic flows according
to the 5-tuple (i.e., source and destination address, source and destination port, and protocol). Second, the header
information of each packet is collected and stored in the corresponding flow database.
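A minimal sketch of 5-tuple flow aggregation, assuming packet headers have already been parsed into records; the `PacketHeader` and `FiveTuple` types and the in-memory dict standing in for the flow database are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import NamedTuple

class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str

@dataclass
class PacketHeader:
    timestamp: float  # capture time in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str     # e.g. "TCP" or "UDP"
    length: int       # packet length in bytes

def aggregate_flows(headers):
    """Group packet headers into flows keyed by their 5-tuple.

    The returned dict stands in for the flow database.
    """
    flows = defaultdict(list)
    for h in headers:
        key = FiveTuple(h.src_ip, h.dst_ip, h.src_port, h.dst_port, h.protocol)
        flows[key].append(h)
    return dict(flows)

headers = [
    PacketHeader(0.00, "192.168.1.10", "93.184.216.34", 51000, 80, "TCP", 60),
    PacketHeader(0.05, "192.168.1.10", "93.184.216.34", 51000, 80, "TCP", 1500),
    PacketHeader(0.10, "192.168.1.11", "8.8.8.8", 53000, 53, "UDP", 72),
]
flows = aggregate_flows(headers)
print(len(flows))  # 2
```

As written, the key is direction-sensitive, so the two directions of one conversation become two flows; a bidirectional flow table would first normalize the key, for example by ordering the two endpoints.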

         Flow statistics are then computed for each feature to build the statistics database. Third, we sample the
data to form the training and test sets for the specific classification task. Once the datasets are ready, the system
can carry out feature selection to eliminate redundant and irrelevant features and yield the best feature subset.
Feature selection reduces the number of features, lowering computational complexity while maintaining or improving
classification accuracy [2].
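The statistics and feature-selection steps can be sketched as below. This assumes each flow is a time-ordered list of `(timestamp, length)` pairs; the six features and the filter rule (drop constant features as irrelevant, then drop any feature whose absolute Pearson correlation with an already kept feature exceeds a hypothetical `threshold` of 0.95 as redundant) are illustrative choices, not the selection method used in [2].

```python
import math
from statistics import mean, pstdev

def flow_features(packets):
    """Per-flow statistics from a time-ordered list of (timestamp, length) pairs."""
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]
    return {
        "pkt_count": len(packets),
        "total_bytes": sum(sizes),
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "duration": times[-1] - times[0],
        "mean_iat": mean(gaps),  # mean inter-arrival time
    }

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(rows, threshold=0.95):
    """Drop constant features, then features highly correlated with a kept one."""
    names = list(rows[0])
    columns = {n: [r[n] for r in rows] for n in names}
    kept = []
    for n in names:
        if pstdev(columns[n]) == 0:
            continue  # irrelevant: same value for every flow
        if any(abs(pearson(columns[n], columns[k])) > threshold for k in kept):
            continue  # redundant with a feature already kept
        kept.append(n)
    return kept

flows = [
    [(0.0, 60), (0.1, 1500), (0.2, 1500)],
    [(0.0, 80), (0.5, 80)],
    [(0.0, 60), (0.02, 1200), (0.04, 1200), (0.06, 40)],
]
rows = [flow_features(f) for f in flows]
print(select_features(rows))
```

A greedy filter like this is order-dependent (earlier features win ties), which is one reason wrapper or information-theoretic methods are often preferred on real traces.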


Data Classification

         After the data packets are analyzed, the next step is to classify them. Rather than obtaining traces the usual
way, where they are labeled for calibration simply by payload identification or by port characteristics, we set up a
local experimental network behind NAT with many hosts and generate the traffic manually. Each host runs a specific
application (HTTP, SMTP, POP3, FTP, P2P, etc.) during the same period, and the resulting traces are collected and
stored in the database.

          Since the application running on each host is predetermined, the locally captured traffic flows can easily
be classified and categorized by IP address. By imitating the behavior of each application on the Internet, realistic
traces can be acquired. The main applications in our experiments include Bios, DNS, HTTP, POP3, FTP, streaming,
BitTorrent, eMule, games, PPLive, MSN, QQ, KaZaA, Gnutella, and Skype. The traces are then processed for the
specific usages [2, 3].
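Because each lab host runs one known application, ground-truth labels follow directly from the IP address. A minimal sketch, assuming a hypothetical host-to-application table `HOST_APPLICATION` and flows given as `(src_ip, dst_ip, features)` tuples:

```python
# Hypothetical testbed plan: the single application each lab host was told to run.
HOST_APPLICATION = {
    "192.168.1.10": "HTTP",
    "192.168.1.11": "FTP",
    "192.168.1.12": "BitTorrent",
    "192.168.1.13": "Skype",
}

def label_flow(src_ip, dst_ip):
    """Return the application assigned to whichever endpoint is a lab host."""
    for ip in (src_ip, dst_ip):
        if ip in HOST_APPLICATION:
            return HOST_APPLICATION[ip]
    return None  # not produced by a testbed host

def build_labeled_traces(flows):
    """flows: iterable of (src_ip, dst_ip, features); returns (features, label) pairs."""
    labeled = []
    for src_ip, dst_ip, features in flows:
        app = label_flow(src_ip, dst_ip)
        if app is not None:  # drop background traffic with no known ground truth
            labeled.append((features, app))
    return labeled

flows = [
    ("192.168.1.10", "93.184.216.34", {"pkt_count": 12}),
    ("203.0.113.7", "192.168.1.12", {"pkt_count": 340}),
    ("192.168.1.99", "8.8.8.8", {"pkt_count": 2}),
]
print(build_labeled_traces(flows))
# [({'pkt_count': 12}, 'HTTP'), ({'pkt_count': 340}, 'BitTorrent')]
```

Checking both endpoints lets inbound flows (for example, remote BitTorrent peers connecting to a lab host) receive the same label as outbound ones.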

         Classifying an object more accurately requires collecting more information (entropy) for classification.
However, collecting more information, for example more packets or a larger number of features, may introduce higher
latency and higher cost in both computation and memory usage. For the traffic classification system to operate in
near real time with considerable throughput, we should capture a small, appropriate number of features from a small
number of packets within a limited duration, rather than from a complete flow. In other words, to gain real-time
performance, some information from the complete-flow objects has to be sacrificed, which in theory results in some
degradation in accuracy [1].
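The trade-off can be illustrated by classifying a flow from only its first few packets. This sketch assumes a hypothetical budget `N_PACKETS = 5`, uses packet sizes as the only feature, and swaps in a deliberately simple nearest-centroid classifier; it is not the machine-learning method of [1], and the class names `bulk` and `interactive` are illustrative.

```python
from math import dist
from statistics import mean

N_PACKETS = 5  # hypothetical budget: only the first 5 packets of a flow are inspected

def early_features(packet_sizes, n=N_PACKETS):
    """Feature vector from the first n packet sizes, zero-padded for short flows."""
    head = list(packet_sizes[:n])
    return head + [0] * (n - len(head))

def train_centroids(samples):
    """samples: list of (packet_sizes, label). Returns one mean vector per label."""
    by_label = {}
    for sizes, label in samples:
        by_label.setdefault(label, []).append(early_features(sizes))
    return {label: [mean(col) for col in zip(*vecs)] for label, vecs in by_label.items()}

def classify(packet_sizes, centroids):
    """Assign the label whose centroid is nearest (Euclidean) to the early features."""
    x = early_features(packet_sizes)
    return min(centroids, key=lambda label: dist(x, centroids[label]))

training = [
    ([1500, 1500, 1500, 1500, 1500, 1500], "bulk"),  # e.g. file transfer
    ([1460, 1500, 1500, 1400, 1500], "bulk"),
    ([120, 110, 130, 120, 115, 125], "interactive"),  # e.g. voice or chat
    ([100, 140, 120, 110, 130], "interactive"),
]
centroids = train_centroids(training)
print(classify([1500, 1480, 1500, 1500, 1490], centroids))  # bulk
print(classify([118, 122, 125, 119, 121], centroids))       # interactive
```

The sixth packet of the longer training flows is simply ignored, which is exactly the information sacrificed for a bounded, per-flow decision latency.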


User Profiling

         After the data packets are classified, the next step is to profile users according to their bandwidth usage.
The generated statistical reports help network administrators and ISPs to use peak-hour bandwidth for prioritized
traffic by shifting non-time-critical traffic to off-peak hours, and to introduce promotional packages that make use of
off-peak bandwidth.

        Application-Based Reports

                   This report details each user's application-usage habits and distinguishes between prohibited
         and non-prohibited applications.

        Bandwidth Usage Reports

                   This report shows how much network bandwidth each user consumed during peak and off-peak
         periods.

        Downstat Report

                   This report identifies users who use their network bandwidth at critical times for low-priority
         work. It is most effective in networks where contention exists, and it also helps ISPs introduce new
         packages.
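The three reports can be sketched as simple aggregations over classified flows. This is a minimal illustration under stated assumptions: the `ClassifiedFlow` record, the peak window `PEAK_HOURS` (18:00 to 22:59) and the set `LOW_PRIORITY_APPS` are hypothetical policy values that a real deployment would take from the ISP.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical policy values; a real deployment would take these from the ISP.
PEAK_HOURS = range(18, 23)  # 18:00-22:59 treated as peak
LOW_PRIORITY_APPS = {"BitTorrent", "eMule", "KaZaA", "Gnutella"}

@dataclass
class ClassifiedFlow:
    user: str       # internal IP address of the host behind the NAT
    hour: int       # hour of day (0-23) when the flow was seen
    app: str        # label from the classification step
    num_bytes: int

def application_report(flows):
    """Bytes per application for each user."""
    usage = defaultdict(lambda: defaultdict(int))
    for f in flows:
        usage[f.user][f.app] += f.num_bytes
    return {user: dict(apps) for user, apps in usage.items()}

def bandwidth_report(flows):
    """Bytes per user, split into peak and off-peak periods."""
    report = defaultdict(lambda: {"peak": 0, "off_peak": 0})
    for f in flows:
        period = "peak" if f.hour in PEAK_HOURS else "off_peak"
        report[f.user][period] += f.num_bytes
    return dict(report)

def downstat_report(flows):
    """Users moving low-priority traffic during peak hours, with the bytes involved."""
    offenders = defaultdict(int)
    for f in flows:
        if f.hour in PEAK_HOURS and f.app in LOW_PRIORITY_APPS:
            offenders[f.user] += f.num_bytes
    return dict(offenders)

flows = [
    ClassifiedFlow("192.168.1.10", 20, "BitTorrent", 500_000_000),
    ClassifiedFlow("192.168.1.10", 3, "BitTorrent", 800_000_000),
    ClassifiedFlow("192.168.1.11", 19, "HTTP", 40_000_000),
]
print(bandwidth_report(flows))
# {'192.168.1.10': {'peak': 500000000, 'off_peak': 800000000},
#  '192.168.1.11': {'peak': 40000000, 'off_peak': 0}}
print(downstat_report(flows))  # {'192.168.1.10': 500000000}
```

Keying every report on the internal IP address relies on mirroring inside the NAT, where each host is still individually visible.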

All of these statistical reports help to improve QoS. They also support real-time feedback as well as off-line analysis.



[1] Wei Li and Andrew W. Moore. “A Machine Learning Approach for Efficient Traffic Classification.” Internet:, [Jan. 20, 2011].

[2] Jun Li, Shunyi Zhang, Yanqing Lu and Junrong Yan. “Real-time P2P Traffic Identification.”
   Internet:, [Jan. 20, 2011].

[3] Jeffrey Erman, Anirban Mahanti, Martin Arlitt, Ira Cohen and Carey Williamson. “Offline/Realtime Traffic
   Classification Using Semi-Supervised Learning.” Internet:
   121.html, July 13, 2007 [Feb. 05, 2011].

