Design and Implementation
of TWAREN Hybrid Network
Management System
National Center for High-Performance Computing
Speakers: Ming-Chang Liang & Li-Chi Ku
Outline
Introduction
Motivation
Issues
Design
Implementation
Future work
INTRODUCTION
About TWAREN
TWAREN (TaiWan Advanced Research & Education Network) was completed at the end of 2003 and began operation and service at the beginning of 2004.
In its initial phase, IP routing was the main service provided.
Network management relied on the programs that came with the purchased network equipment, including CIC, Webtop, CW2K, HP OpenView, HP NNM and other solutions.
Initial phase of TWAREN
[Figure: network topology of the initial phase of TWAREN.
Core locations: Taipei, Hsinchu, Taichung, Tainan.
Connected sites: MOECC, NTU, NCCU, ASCC, NDHU, NCU, CCU, NHLTC, NCTU, NTHU, NCHU, NTTU, NYSU.
Equipment labels: GSR, C6509, C7609.
Link labels: 10GE, STM-64/OC-192, STM-16/OC-48, GE, EBT.]
Initial phase of NMS
[Figure: architecture of the initial-phase NMS.
Components: Remedy Help Desk, WebTop, Notification Gateway, ISM, Cisco Info Center, Probe, CW2K (DFM), NNM, CTM, NAM.
Interfaces and flows: API, CLI, SMTP, HTTP, DNS, FTP, Trap, PING, Polling.
Managed devices: 12416, 7609, 3750, 2522, 2600, 15454, 15600.]
Phase 2 of TWAREN
At the end of 2006, TWAREN was upgraded with more protection methods and better availability; this is called TWAREN phase 2.
Tens of optical switches and hundreds of lightpaths then served as the foundation of the layer 2 VLAN services and the layer 3 IP routing services.
In 2008, tens of VPLS switches were added to provide an additional multipoint VPLS VPN service.
Layer 1 lightpaths are protected by SNCP, layer 2 VLANs by spanning tree recalculation, and layer 2 VPLS by fast reroute.
These improvements make TWAREN phase 2 a true hybrid network that provides multiple layers of service with high availability.
Architecture of TWAREN phase 2
[Figure: architecture of TWAREN phase 2.
Core locations: Taipei, Hsinchu, Taichung, Tainan (NCHC).
Connected sites: NTU, ASCC, NCCU, NIU, NDHU, NCU, MOEcc, NCNU, NHLTC, NCTU, NTHU, NCHU, NTTU, NSYSU, NCKU, CCU.
Equipment labels: 15454, 15600, 12816, 7609, 7609C, 6509, 3750.
Link legend: STM64, STM16, 10GE, GE.]
MOTIVATION
Why do we need a new NMS?
The architecture of TWAREN phase 2 became more and more complicated.
Because TWAREN phase 2 has more protection methods, a single hardware or circuit failure will not interrupt the service provided to end users.
The initial-phase NMS was no longer adequate for the hybrid network, because with it the correlation between failures and affected services is hard to determine and predict.
Requirements for the new NMS
Automatically determine the correlation between failures, affected services, affected customers and severity level on this highly protected network.
Provide a single integrated visual user interface.
Use integrated databases, logs, message flows and exchange protocols.
After several surveys, we decided to develop a new NMS suitable for monitoring all services provided by TWAREN phase 2.
ISSUES
Uncertainty of SNMP implementation
SNMP trap and MIB implementations differ even among equipment of the same brand.
SNMP OIDs or return values may change after an OS upgrade on the same equipment, and such changes are usually hard to discover beforehand.
Therefore, the system must be designed so that these changes can be accommodated with minimal modifications.
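One way to keep such changes local is to hold the OIDs and the value decoding in a table keyed by model and OS version, so that an upgrade means editing data rather than code. The sketch below is an assumption about how that could look, not the system's actual design. The OS versions, the OIDs under the made-up subtree 1.3.6.1.4.1.99999 and the metric name cpu_load are all hypothetical.

```python
# Hypothetical profile table: (model, OS version) -> metric -> (OID, decoder).
# All OIDs below are placeholders under a made-up enterprise subtree.
PROFILES = {
    ("7609", "12.2"): {"cpu_load": ("1.3.6.1.4.1.99999.1.1", int)},
    # Assumed change after an OS upgrade: new OID, value now in tenths of a percent.
    ("7609", "15.1"): {"cpu_load": ("1.3.6.1.4.1.99999.2.7", lambda raw: int(raw) / 10)},
}


def resolve(model, os_version, metric):
    """Return (oid, decoder) for a metric, or raise KeyError if unmapped."""
    return PROFILES[(model, os_version)][metric]


def read_metric(model, os_version, metric, snmp_get):
    """Fetch and decode one metric; snmp_get is any callable taking an OID."""
    oid, decode = resolve(model, os_version, metric)
    return decode(snmp_get(oid))


# Stand-in for a real SNMP GET, so the sketch runs without a network.
fake_agent = {"1.3.6.1.4.1.99999.1.1": "37", "1.3.6.1.4.1.99999.2.7": "370"}
old = read_metric("7609", "12.2", "cpu_load", fake_agent.get)
new = read_metric("7609", "15.1", "cpu_load", fake_agent.get)
print(old, new)
```

With this layout the collector code calls read_metric with a logical metric name only, and a changed OID or unit after an upgrade is handled by adding one table entry.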
The lack of skilled programmers
Our programmers are the same people as the members of the operations team.
We are not professional programmers and do not share a common programming language.
The system must be partially available and operational during the early phase of its development so that it can evolve along with real needs.
So a unified standard of communication between different modules is necessary.
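As an illustration of what a unified standard between modules could mean, the sketch below passes every event as one JSON object per line with a fixed set of required fields, which modules written in different languages can all produce and parse. The field names (source, kind, target, value, time) and the module name are assumptions for this example, not the format the system actually uses.

```python
import json

REQUIRED_FIELDS = ("source", "kind", "target", "value", "time")


def encode(message):
    """Serialize a message dict to a single JSON line, checking required fields."""
    missing = [f for f in REQUIRED_FIELDS if f not in message]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return json.dumps(message, sort_keys=True)


def decode(line):
    """Parse one JSON line back into a dict, applying the same field check."""
    message = json.loads(line)
    missing = [f for f in REQUIRED_FIELDS if f not in message]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return message


event = {
    "source": "trap_collector",   # hypothetical module name
    "kind": "link_down",
    "target": "TN7609P/Slot_1/Port_9",
    "value": 1,
    "time": 1200000000,           # Unix timestamp, arbitrary example value
}
line = encode(event)
roundtrip = decode(line)
```

Checking the same required fields on both sides means a module that is still under development fails loudly on a malformed message instead of silently dropping it.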
Huge historical data and computing
To minimize the false positive and false negative rates, baseline thresholds are of much better quality when they are dynamically generated from historical data.
Therefore, we need to store sufficiently large historical data sets and to retrieve the data very efficiently when calculating those thresholds.
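A minimal sketch of a dynamically generated baseline follows. It assumes the threshold is the historical mean plus or minus k standard deviations, which is one common choice and not necessarily the method used in this system. The sample values and k = 3 are made up.

```python
from statistics import mean, pstdev


def baseline(history, k=3.0):
    """Return (low, high) bounds as mean -/+ k population standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    center = mean(history)
    spread = pstdev(history)
    return center - k * spread, center + k * spread


def is_anomalous(value, history, k=3.0):
    """True when value falls outside the baseline derived from history."""
    low, high = baseline(history, k)
    return value < low or value > high


# Made-up hourly counts of routes received from a BGP peer.
routes = [1000, 1010, 990, 1005, 995, 1000, 1002, 998]
low, high = baseline(routes)
```

Because the bounds are recomputed from stored samples, the quality of the threshold depends directly on how much history is kept and how quickly it can be read back.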
Automatically determine affected services and customers
TWAREN phase 2 inherently guards against a single hardware or circuit failure, so such a failure is less likely to affect actual service provisioning.
An intelligent management system that can determine which services a failure actually affects will reduce the management cost.
DESIGN
1st Stage System Architecture
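[Figure: first-stage system architecture.
Data sources: Traps, MIBs, Syslogs, Net flows, Telnet/SSH, TL1, Mirror (labelled Interactive and Passive).
Processing modules: Data Collectors, Fault Detection, Fault Location, Auto Action, Threshold Analyzer, Report System.
Databases: Current Status DB, Threshold DB, Long Term DB, Case/Action DB.
Front end: Monitor Objs, GUI & Control API, Ticket System.]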
Relationship of Data Tables
Basic Data Tables: Component, People, Location, Unit, Vendor, etc.
Relationship Tables: Circuit, VLAN Services, VPLS Services, ONS Light Path, ONS Cross Connection, etc.
Basic Data Tables
Component Data Table
Component_ID  Parent_C_ID  Name
1             0            TN7609P
12            1            Slot_1
2             0            TP15454
16            2            Slot_3
135           12           Port_9

Vendor Data Table
ID  Name
1   CHT
2   APBT
3   RingLine

People Data Table
ID  Name  Phone       Address  Service_Time  Service_WeekDay
1   John  0939123123  xxxxxxx  8-17          1,3,5
2   Mary  0958123123  xxxxxxx  ALL           ALL

Location Data Table
ID  Name   Address
1   MOEcc  xxxxx
2   NTU    xxxxx

Unit Data Table
ID  Name
1   NCKU
18  THU
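The Parent_C_ID column makes the Component table a tree: a port points to its slot and a slot to its chassis, with 0 apparently marking a top-level device. The sketch below walks that chain using the sample rows from the slide. The function name and the "/" path separator are choices made for this example only.

```python
# Sample rows from the Component Data Table: id -> (parent_id, name).
COMPONENTS = {
    1: (0, "TN7609P"),
    12: (1, "Slot_1"),
    2: (0, "TP15454"),
    16: (2, "Slot_3"),
    135: (12, "Port_9"),
}


def component_path(component_id, table=COMPONENTS):
    """Return the names from the top-level device down to the given component."""
    names = []
    seen = set()
    current = component_id
    while current != 0:               # 0 is read here as "no parent"
        if current in seen:
            raise ValueError("cycle in Parent_C_ID chain")
        seen.add(current)
        parent, name = table[current]
        names.append(name)
        current = parent
    return "/".join(reversed(names))


print(component_path(135))  # TN7609P/Slot_1/Port_9
```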
Relationship Data Tables
Circuit Data Table
ID  Name                 Vendor  Identify  From_CID  To_CID  Bandwidth
1   Taipei_Tainan_STM64  1       8D543267  13        35      STM64
2   NCHU_NCNU_10GE       2       ST16987   23        67      10GE

ONS Topology Link Table
NodeA  NodeB  PortA  PortB
12     45     1467   2346
16     32     2312   3421

ONS Light Path Table
LP  PortFrom  PortTo  SNCP_LP  CRS_Trace        Size
2   2312      2345    0        359,556,522,475  4
98  3434      4455    99       482,541,335      16
99  3434      4455    98       482,469,541,335  16

ONS Cross Connection Table
CRS  PortA  PortB  SNCP_CRS  ChannelA  ChannelB  Size
482  1744   1756   0         5         13        4
21   3321   3343   24        17        33        16
24   3546   4534   21        1         17        16
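These tables carry enough information to estimate whether a failed cross connection actually interrupts a lightpath. The sketch below is one possible reading, not the system's documented logic: it assumes SNCP_LP names the partner lightpath that protects a given lightpath, that 0 means no partner, and that a lightpath survives when its partner does not traverse the failed cross connection. It uses the sample rows from the ONS Light Path Table.

```python
# Sample rows from the ONS Light Path Table: LP -> (SNCP_LP, CRS_Trace).
LIGHT_PATHS = {
    2: (0, [359, 556, 522, 475]),
    98: (99, [482, 541, 335]),
    99: (98, [482, 469, 541, 335]),
}


def impact(failed_crs, paths=LIGHT_PATHS):
    """Classify each lightpath that traverses failed_crs as 'protected' or 'down'.

    Assumption: SNCP_LP names the partner lightpath and 0 means no partner.
    """
    hit = {lp for lp, (_, trace) in paths.items() if failed_crs in trace}
    result = {}
    for lp in sorted(hit):
        partner = paths[lp][0]
        survives = partner != 0 and partner in paths and partner not in hit
        result[lp] = "protected" if survives else "down"
    return result


print(impact(469))  # {99: 'protected'}
print(impact(482))  # {98: 'down', 99: 'down'}
print(impact(359))  # {2: 'down'}
```

Under these assumptions, a failure of cross connection 469 touches only one of the two paired lightpaths, while 482 is shared by both and therefore defeats the protection.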
IMPLEMENTATION
Current monitor objects
Trap monitor: used interfaces, BGP, etc.
Environment of equipment room: temperature (auto threshold), voltage
Status of equipment: temperature, CPU, RAM, fans, power supply
BGP peering with other networks: status, number of exchanged routes (auto threshold), utilization analysis
Performance monitor: end-to-end RTT (auto threshold), end-to-end packet loss rate (auto threshold), end-to-end availability
Throughput: backbone (auto threshold), designated interfaces
Top N: bytes, flows, packets
Routes monitor: customer routes (exact comparison)
VPLS VPN: throughput of CE side, MACs of VPN
Optical network: current topology of lightpaths
VLAN: current topology of VLAN
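The "exact comparison" for customer routes can be read as comparing the set of prefixes currently seen against an expected set and reporting any difference. The sketch below takes that reading, which is an assumption. The prefixes are documentation address ranges and the function name is hypothetical.

```python
import ipaddress


def compare_routes(expected, observed):
    """Return (missing, unexpected) prefixes after normalizing both lists."""
    want = {ipaddress.ip_network(p) for p in expected}
    have = {ipaddress.ip_network(p) for p in observed}
    missing = sorted(want - have, key=str)
    unexpected = sorted(have - want, key=str)
    return missing, unexpected


expected = ["192.0.2.0/24", "198.51.100.0/24"]
observed = ["192.0.2.0/24", "203.0.113.0/24"]
missing, unexpected = compare_routes(expected, observed)
```

Unlike the auto-threshold monitors, this check has no tolerance: any missing or unexpected prefix is reported.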
Future work
Combine all developed monitor objects into a single integrated visual user interface.
Enhance the monitoring of optical, VPLS and VLAN networks.
Automatically determine the fault location, root cause and affected scope.
Minimize the false positive and false negative rates.