Henry Starzynski
Network Operations Support
January 2011
Global Network Mgmt Centre, Bell Canada

Henry Starzynski – Manager, Global Network Management Centre
• Graduated from the University of Waterloo in 1982 with a Bachelor of Mathematics (Computer Science)
• Post graduation, worked for a computer time sharing company called Datacrown, which became CSG, then SHL-Systemhouse
• I’ve been with Bell almost 27 years!
• Started out working on network design tools for Datapac and Megastream services
• Moved to our network management centre taking care of Datapac, managing the 7x24 console, then Frame Relay (Hyperstream) support
• Today, I continue with legacy network support, bring in new business for our centre, and support our computers
• I have a life outside of Bell too! I’m involved in the local community with Scouts Canada and Parent Council at my son’s high school. So, when you are free of University life, don’t forget to be involved in your community as well! You have lots of energy and knowledge that can help make local communities, wherever you end up, much better!
• Don’t forget, when you leave Carleton, learning never ever stops!
Keep your brains active; technology is continually changing.

Bell Canada’s GNMC
• GNMC = Global Network Management Centre
• One of the world’s first Data Network Management Centres
• Operating locally in Ottawa, serving Bell Canada customers globally

Bell Canada GNMC
A bit about who we are …
• Involved in managing data networks in Canada since 1974, globally since 1992
• Originally the National Data Network Control (NDNC) for domestic (Canada only) core data networks: Dataroute, Datapac (packet switching), Megastream (Pt-Pt T1), Hyperstream (frame relay), Canadian ATM Gateway networks
• Expanded to include private networks (Loto-Québec) and VPN clouds
• Started internationally with the Financial Networks Associates (FNA – consortium of 8 countries) network in 1991 (Alcatel based network)
• Evolved into Global Network Management (GNMC) at the individual customer circuit level
• Today, we serve as International Help Desk/SPOC (single point of contact) for international data circuit troubles
  – Provide proactive fault management, provisioning, change and performance management

Bell Canada GNMC
Main Focus Areas:
• Core Network Management (WAN) of legacy data networks (Datapac = packet switching, Frame Relay, Mega services = T1 point-to-point services)
• Single Point of Contact (SPOC) for international customer data circuits
• VPN Managed Services (MPLS) and support of private or virtual private network clouds and routers (LAN)
• Technical Support and Maintenance Engineering on existing legacy networks
GNMC is involved in four major processes of Network Management:
• Fault Management
• Configuration Management (Provisioning)
• Performance Management
• Change Management

Network Management
• Like any industry, we toss around lots of BUZZ WORDS
• What do all those terms mean?
  – WANs
  – Clouds
  – OSSs
  – Network Management
  – SPOC
• Why do we do network management & customer management?
• Why is it important?
• What the heck is a network, anyway?
WELL let’s start … WHAT IS A NETWORK?
What is a Network?
A network means something different to everyone. For example, a ‘network’ can be:
• Local Area Network - those within a building, office, floor, etc.
• Point to Point network - connecting two sites regardless of distance
• The ‘CLOUD’ - the service provider’s network, the infrastructure, sometimes termed the Public Network
• The ‘Net - the ubiquitous network
• The PSTN – Public Switched Telephone Network
• Wireless network
• Home Network
• A VPN – a Virtual Private Network
• A ‘social’ network!
A NETWORK MEANS DIFFERENT THINGS TO DIFFERENT PEOPLE, BUT whatever your definition, all networks do the same thing!

What is a Network?
• A standard definition of a ‘network’ we will use is the following:
  – A set of elements linked together to provide paths to transmit information (data, voice, video) from one location to another.
• A critical tool which allows businesses to operate and people to communicate
• When it is all boiled down, all information is ‘data’, and it travels over a network.
• Successful networks are managed

Examples of Data Networks
• Transport Networks (SONET, DS3, DS1, Fibre, MPLS core) – the BIG infrastructures
• Circuit Switched (Public Switched Telephone Network)
• Dedicated (Point to point)
• Packet/Frame/Cell (legacy services)
• IP (Internet/Intranet)
• Local Area Networks, in the home, office, or around the campus
• Private (TV, Radio, Financial, Lottery) or Virtual Private Networks (VPNs)
• Wireless

Network Characteristics
• The common characteristic of all networks is the transmission of DATA (information, etc.)
• Some type of information (i.e., data) is being transmitted from one person/computer/location to another, for business, pleasure, research, etc.
• In today’s world, we take data communications over networks for granted - it is there, reliable, fault tolerant, and it NEVER fails.
• We use it every day, it is part of our daily routines, part of our ‘life’! We expect connectivity!
What then is Network Management, and why is it important?
• All types of networks transmit data in some form
• Network management has 5 main processes:
  – Fault Management
  – Configuration Management (Provisioning)
  – Accounting Management
  – Performance Management (including Change Management)
  – Security Management

Bruce Deachman, The Ottawa Citizen, Sunday, March 20, 2005
In 1994, Nicholas Negroponte, founder of MIT's Media Lab, predicted one billion people would be using the Internet by the year 2000. What he failed to point out, was that most of them would be trying to get U2 tickets. At least that's how it must have felt for countless fans who were unable to snag tickets to the Bono-led, Irish rock band's Nov. 25 Corel Centre show yesterday morning, as technology failed to keep pace with overwhelming demand, leaving old-fashioned overnight campers the happiest of all

Question! What is the latest estimate of the number of internet users in the world?

Anyone remember this??
ROOT CAUSES OF BLACKOUTS AND THEIR REMEDY
The electric power transmission system of the United States is seriously deficient. Experts generally agree that fixing this system to an adequate level would take many years and cost of tens of billions of dollars. But the root causes of the recent “Blackout of 2003” can be solved in a relatively short time and at a much more reasonable cost. The root causes of the present problems are:
• A totally outdated reliability philosophy; and
• Inadequate real time monitoring of the transmission grid.
Isn’t the power grid a network too? Of course! Electricity is just a form of ‘data’!

Why ‘Network Management’?
From a network provider’s viewpoint …
• Manage network resources equitably to ensure users can establish communications quickly & reliably
• Ensure information is transferred securely, with its original quality and integrity
• Operate a high performance, reliable, cost effective network that meets customer/business/organizational needs and requirements
• Plan and implement measures to prevent or mitigate interruptions or degradation of service
• Make $$$$$ for the network provider and its shareholders
• Gain market share for the network provider
• At Bell Canada, networks are the building blocks of our own business – they are why we exist!

Why ‘Network Management’?
From the customer’s viewpoint …
• Ensure information is transferred securely, with its original quality and integrity
• Obtain service at the best cost/service/value combination
• Ensure a customer’s business operates with minimum downtime, in order to meet the requirements of its customers
• Meet regulatory, legal, safety requirements
• For a customer, networks are critical
  – For businesses, for their operations
  – For the general public, so we can communicate, get money, etc.

Network Management Poses Endless Challenges
by Willie Schatz
If network managers are in accord about anything, it’s that they have a lot more tasks to do than resources to handle them. The fundamental roles of a network administrator are to provide network connections for computer equipment and to ensure availability and performance of network communications. But that’s only the beginning. The administrator must set up and manage hardware and software solutions, enabling servers, clients, printers and other peripherals to communicate. He or she also is responsible for providing users the highest quality server functionality, which means uninterrupted, optimum network availability and performance. This same individual also must plan so any changes required in the network conform with changes in the larger enterprise system.
“People really think network management is easier than it really is”.

Network Management Processes
There are five processes involved in network management

Configuration Management = Provisioning
• Programming network elements to communicate with each other and with user equipment
• User datafill to make their service functional
• Copying critical (non default) network provisioning parameters to storage in offline databases
• Ensuring billable parameters/features are updated in related billing systems
• Providing ‘dumps’, downloads, or application program interfaces (APIs) to other downstream systems
Why is Configuration/Provisioning management important?
• Users want their service when it is ordered (on due date)
• Users want to get the options they pay for
• The network provider needs to ensure their service is billed

Network Management Processes
Fault Management = Service Assurance
• Surveillance - proactive - alarms/traps from the network that indicate major problems
• Isolating problems - reactive - when users have troubles
• Having clearly defined escalation procedures - how to prioritize troubles
• Providing customers with timely and honest status on problems - when will it be fixed?
• Performing analysis on failures for trends, root cause
Service Assurance is REAL-TIME surveillance, control, and analysis of a network, with the objective of ensuring maximum use of network resources, particularly when it is under stress due to traffic overload or failure conditions.

Network Management Processes
Performance Management
• Performance measures can be internal (for the provider), regulated (CRTC), or to assist the customer (how is my network performing?)
• Network performance measures (mean time to repair, network availability) are standard metrics used in the industry, and are often the basis for ‘service level agreements’
• Customers may require information on their traffic patterns - are they paying for bandwidth they don’t require, or is their network overloaded?
• Many customers want guarantees of performance – a Service Level Agreement (SLA) – in order to ensure they are getting the performance they pay for.
• An SLA may include the following:
  – Network Availability
  – Frame/Cell/Packet delivery
  – Mean time to Repair
  – Penalty clauses for non-performance
  – Delay metrics

Network Management Processes
Change Management
• Scheduling downtime / maintenance activities (new software, network upgrades) with users (notification, release or emergency)
• Ensuring software levels are compatible with all network components
• Keeping the customer informed of planned service interruptions is critical
Networks are in need of periodic maintenance for software or hardware upgrades, etc. In a 7x24 world, unscheduled downtime can mean
• loss of revenue
• legal liability
• threats to public safety.

FROM: CHANGE MANAGEMENT
PLANNED OUTAGE
Foreign-Tel COMMUNICATIONS
Dept.: GNMC
Phone: 1-555-868-7883
Fax: 1-555-868-7822
Please respond to the following Email: firstname.lastname@example.org
ForeignTel Communications would like to inform you that the Change Management activity will be performed as indicated below:
_____________________________________________________________________
Outage #: POM041793 / POT356369
Your ref. #:
Description: DISREGARD OUTAGE NOTICE//THIS IS NOT SERVICE AFFECTING//WE ARE ADDING BACKBONE CAPACITY: PORTLAND-SANTA CLARA DURING THIS PERIOD, NETWORK WILL BE IN HAZARDOUS CONDITION. WALL NOC WILL CLOSELY MONITOR THE NETWORK AND ANY ALARMS ON IT
Scheduled Planned Start Date (UTC): February 16, 2009 15:00:00
Scheduled Planned End Date (UTC): February 24, 2009 03:00:00

Related Network Management Activities
• Co-ordination with other Carriers and Agencies. No one carrier can route traffic everywhere on the planet. Strategic alliances and co-operation amongst carriers are essential.
• Dynamic Controls. Can traffic be rerouted around failures or congestion? Is this automatic or manual?
• Disaster recovery planning.
Could it happen to you? What would you do in the event of a ‘disaster’?
• Security. Who has access to the network infrastructure? Can it be ‘hacked’? Ensuring one customer’s data does not go to another customer.

Security Management
• The goal of security management is to control access to network resources according to local guidelines so that the network cannot be sabotaged (intentionally or unintentionally) and sensitive information cannot be accessed by those without appropriate authorization.
• Security management subsystems work by partitioning network resources into authorized and unauthorized areas.
  – They identify sensitive network resources (including systems, files, and other entities) and determine mappings between sensitive network resources and user sets.
  – They also monitor access points to sensitive network resources and log inappropriate access to sensitive network resources.

AT&T Customer Info Hacked
By TSC Staff, 8/29/2006 9:05 PM EDT
AT&T late Tuesday said that hackers broke into a computer system and accessed personal data, including credit card information, from thousands of customers who had purchased DSL equipment from the company's Web store.

Kaspersky says Web hack 'should not have happened'
02/09/2009
It's the worst thing that can happen to a computer security vendor: This weekend, Moscow's Kaspersky Lab was hacked. A hacker, who identified himself only as Unu, said that he was able to break into a section of the company's brand-new U.S. support Web site by taking advantage of a flaw in the site's programming.

Network Management Centre Functions
• 7 x 24 operation - it’s more than a buzzword.
• Operations Support Systems for provisioning, change management, surveillance, trouble tracking, customer records
• Subject experts/access to engineering support personnel or labs
• Multiple & diverse communications channels
• Situation (War) room
• Secure and Independent Power Supply
• Access to Information Databases
• Contact information for support resources (level 1, 2, 3 support, vendor support)
• Secure location
• Fully redundant backup location

When Disaster strikes!
• If something can go wrong, it will
• Ice Storm of 1998/Hurricane Katrina & other natural disasters
• Toronto Simcoe Central Office fire, July 1999
• Power plant failures
• Hackers and viruses (SQL Worm)
• September 11/terrorist attacks
• All of these test the plans of a network provider.
• Are contingency plans in place? Have they been tested, or have they gathered dust for 5 years?
• Is there an escalation chain of command?
• Are there agreements with other suppliers/vendors/competitors?
• What contingencies are in place to get critical services restored as quickly as possible?
• When service is lost, the prime objective, after immediate human safety, is the restoration of service

From July 1999 …
TORONTO - Phones stopped ringing in several major cities in Canada on Friday after an explosion caused a major system failure at a Bell Canada building in Toronto. The failure knocked out phone lines, most cell phones, internet services and bank machines in downtown Toronto. Cantel and digital cell phones appear to be working. Police report 911 emergency systems are working, but the police are urging people to use these systems only for real emergencies. The failure was caused by an explosion on the fourth floor at the downtown Bell centre at around 8:00 am. One person was reportedly injured. Immediately after the explosion, battery powered backup systems kicked in. But they ran out of power a few hours later.
The Toronto Stock Exchange is back up and running after it suspended trading briefly but brokerages are having trouble communicating. Phone systems in Ottawa and Montreal and as far away as Halifax and Vancouver have also been affected as calls that normally routed through Toronto are rerouted through other cities. Bell Canada says it hopes to have services restored by midafternoon.

ATLANTA (CNN) -- A series of cyber-attacks Tuesday left some of the Web's most high-profile sites staggering under the weight of tens of thousands of bogus messages. The targets included retail giant Amazon.com, electronic auction house eBay, discount retailer Buy.com and CNN Interactive.

DISASTERS CAN HAPPEN? How will your network provider handle the trouble?
• Another aspect of Network Management is Planning
• A carrier will have a plan for a disaster situation, as well as anticipating potential issues
• Examples of planning for potential issues include
  – Y2K
  – more recently, the change in dates for Daylight Saving Time
  – various other clock rollover issues
• A carrier may also do periodic disaster simulations to test the response of various groups as well as procedures

SPOC Function
What is a SPOC?
• In Bell Canada, the GNMC is the Single Point of Contact (SPOC) for all Fault Management and Change Management between Canadian Help Desks and Test Centres and all the global carriers that Bell uses to provide international reach for our customer circuits
• SPOC for all other carriers to get their issues fixed within Canada
• One door for all trouble management into or out of Canada
• Avoids having many different groups learn the processes for dealing with each of the carriers, or the carriers having to learn about all the various ops centres within Canada
• Provides flexibility to move quickly and customize for customer reasons, with centralized expertise
• As a SPOC, we get to compare service levels provided by different global carriers and use this info to get better performance

Operational Support Systems
• Successful network management uses standardized protocols or vendor-specific mechanisms to transmit alarms and commands (e.g. Simple Network Management Protocol)
• Operational control data can be transmitted over conventional data networks, over the same network (inband), or over another network (out of band).
• The systems which receive alarms and allow for network configuration, troubleshooting, and control are commonly called Operational Support Systems (OSS).
• OSS may be more than 10 times the cost of the network infrastructure!
• OSSs may consist of workstations, databases, network elements, scripts, provisioning systems, security systems, offline databases and billing systems.
• Without a good OSS structure, a great network infrastructure will fail. The network objectives cannot be met without this.

Operational Support Systems
• No one OSS does it all - in fact, many OSSs are required, and these must interact with each other. This is typically via Application Program Interfaces (APIs) or some standard format for information exchange.
• The interaction can be simple - or complex. Often, simple format changes in one OSS will impact many other ‘downstream’ OSSs.
• Remember where the money is spent - not on the network infrastructure, but on the systems that make the network run.
• The following diagram shows a SAMPLE interaction between various systems.

Sample Operational Support Systems
[Diagram. Labelled components: customer orders / customer service; order entry system; network provisioning system; assignment system; telco local assignment system; network elements (customer gets service); call detail/usage OSS; billing OSS and billing system (customer receives bill for service/usage); stats data collection system; fault management OSS; trouble ticket system; alert display; surveillance centres; Test Centres and NDNC (fault management/troubleshooting OSSs); change management. Labelled flows: provisioning order info and provisioning records; billing records and billing files; SNMP traps and alerts; customer and assignment dumps (feed other OSSs).]

Metrics
• Each network needs some means of measuring its success, and of seeing where improvement can be made. Public networks may be regulated. Metrics may be stipulated in Service Level Agreements (SLAs) between provider and customer
• To the end user/customer, the most critical metrics are the following:
  – Mean time to repair (MTTR)
  – Network Availability = (total available time - total downtime) / (total available time)
  – Quality of Service (QOS)
  – round trip delay
  – Network congestion/blocking
  – frame/packet/cell loss
  – repeat failures
• To the network provider, the following are important metrics:
  – Network Availability
  – EBITDA (Earnings Before Interest, Taxes, Depreciation & Amortization)
  – Cost / Revenue (return on investment)
  – Market Share
  – Network capacity

Metrics
• To the shareholder, the following are important:
  – Dividend
  – Share price
  – Return on Investment

Summary
• Networks can be simple, or extremely complex and mission critical
• Network quality, reliability, diversity, and low cost are essential
• The operation of a high quality, reliable, cost effective network requires effective Network Management Centre(s), along with skilled people and good support tools (operational support systems)
• As networks continue to evolve, customers will manage more and more of their own networks.
• Challenges for the future include global coverage, scaling for growth, new technologies, telco mergers, acquisitions, failures - an industry always in flux.
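
Worked example: availability and MTTR
The Network Availability and MTTR metrics from the Metrics slides can be worked through with a few lines of Python. This is only a sketch under stated assumptions: the Outage record, the function names, the 30-day reporting period, the two outages and the 99.9% target are all made up for illustration and do not come from Bell Canada or from any real SLA.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """One service interruption (hypothetical record layout)."""
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def network_availability(period: timedelta, outages: list) -> float:
    """(total available time - total downtime) / (total available time)."""
    if period <= timedelta(0):
        raise ValueError("reporting period must be positive")
    downtime = sum((o.duration for o in outages), timedelta(0))
    return (period - downtime) / period


def mean_time_to_repair(outages: list) -> timedelta:
    """Average outage duration; zero when there were no outages."""
    if not outages:
        return timedelta(0)
    return sum((o.duration for o in outages), timedelta(0)) / len(outages)


# Illustrative data only: a 30-day month with a 2-hour and a 4-hour outage.
period = timedelta(days=30)
outages = [
    Outage(datetime(2011, 1, 5, 8, 0), datetime(2011, 1, 5, 10, 0)),
    Outage(datetime(2011, 1, 20, 22, 0), datetime(2011, 1, 21, 2, 0)),
]
SLA_TARGET = 0.999  # hypothetical availability target, not from the slides

availability = network_availability(period, outages)
mttr = mean_time_to_repair(outages)
sla_met = availability >= SLA_TARGET

print(f"Availability: {availability:.4%}")
print(f"MTTR: {mttr}")
print(f"SLA target met: {sla_met}")
```

With these made-up numbers, 6 hours of downtime in a 720-hour month gives an availability of 714/720, or about 99.17%, which would miss the assumed 99.9% target even though each outage was repaired within a few hours. That is why an SLA usually states availability and mean time to repair as separate measures.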