Telcomm-NET-DR-Drill-06-25-08
					              Telecommunications Services - Network Engineering
                          Disaster Recovery Drill
                                       Dave Bouhl, Deputy Director
                                       Telecommunications Services

                                     Michael Shelton, Deputy Director
                                       Network Engineering Team

On June 25th, the Information Technology department’s Telecommunications Services group and
Network Engineering Team held a joint, unannounced disaster recovery drill. The purpose of the drill
was to develop competencies, to share possible corrective activities, to gain a greater understanding
of and explore the available resources, and overall to develop critical thinking processes in an
unexpected emergency scenario.

Drill Participants: Mike Smart, Beth Dallas, Terry Hill, Jim Bourland, Jeff Goh, John Daly, Jerry Richards,
Gi Vania, Scott Smith, Aaron Ragusa, Mike Robinson, Kishore Thakur, Tim Davis, Scott McBride, Dave
Bouhl and Michael Shelton.

A meeting was scheduled for June 25th at 8:15 AM in the Wham conference room. The drill participants
were told the meeting was to discuss the upcoming phone switch upgrade. Instead, a PowerPoint
presentation was displayed that stated the true purpose of the meeting. A brief 10-minute introduction
was given to describe the nature of the disaster, the conditions of the drill and its goals. The
introduction stated that there would be a one-hour time limit for the drill, in which the first
20-minute segment would represent day one of the disaster, the second 20-minute segment would
represent the first week of the disaster and the last 10-minute segment would represent the
subsequent two months.

The introduction started at 8:15 AM. The drill started at 8:25 and concluded at 9:15. An additional hour
was used to review the plan and activities developed by both groups and to discuss the drill itself.
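
The timing described above can be checked with a short sketch. The segment lengths and the 8:15 AM start come from this report. Treating the one-hour limit as including the 10-minute introduction is an assumption, since the report does not say so explicitly, and the names SEGMENTS and build_timeline are illustrative only.

```python
from datetime import datetime, timedelta

# Segment lengths in minutes, as described in the report.
# Assumption: the one-hour limit includes the 10-minute introduction.
SEGMENTS = [
    ("Introduction", 10),
    ("Day one", 20),
    ("Week one", 20),
    ("Subsequent two months", 10),
]


def build_timeline(start, segments):
    """Return (label, start, end) tuples for back-to-back segments."""
    timeline = []
    cursor = start
    for label, minutes in segments:
        end = cursor + timedelta(minutes=minutes)
        timeline.append((label, cursor, end))
        cursor = end
    return timeline


# Only the time of day matters here; strptime fills in a placeholder date.
start = datetime.strptime("08:15", "%H:%M")
timeline = build_timeline(start, SEGMENTS)
total_minutes = sum(minutes for _, minutes in SEGMENTS)

for label, seg_start, seg_end in timeline:
    print(f"{seg_start:%H:%M}-{seg_end:%H:%M}  {label}")
print(f"Total: {total_minutes} minutes")
```

Under that assumption the segments fill the hour from 8:15 to 9:15, with the drill proper running the 50 minutes from 8:25 to 9:15.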

The following disaster scenario was presented to the drill participants.
     At 10:15 AM on June 24th, a 24” potable water pipe burst in the basement of the Student
        Center. PSO was made aware of this problem at 1:37 AM when a maintenance worker
        attempted to enter the basement level and discovered about two feet of water in the stairwell.
        PSO was unable to locate the appropriate shut-off valve until after 4:00 AM, but did contact
        Mike Smart at about 2:15 AM. The Network Control Center contacted John Daly at about 3:45
        AM to report equipment outages in the Student Center and no Internet connectivity.
     The Student Center basement is flooded with about four feet of water, meaning that everything
        on Mike Smart’s desk is submerged.
     All power to the Student Center, including the emergency generator, has been turned off.
     PSO is working to pump the water from the basement.
     The phone switch is not functioning.
     The Internet connection is not functioning.
     The Student Center basement level will be off-limits for one week and not habitable for two
        months due to chemical and biological contamination.
      All Telecommunications and Network Engineering employees are available, except Dave Bouhl
       (vacation-backpacking in the Brazilian rain forest) and Michael Shelton (long-term
       hospitalization with a furuncle).
      Cellular communications are working normally.
      It’s 4:15AM. What are you going to do?

This scenario was specifically designed to remove the leadership of both groups in order to create a
greater challenge. Both Dave Bouhl and Michael Shelton were present in the room during the drill and
answered questions about the drill and about the disaster scenario. When necessary, Dave and Michael
acted in the capacity of the IT administration, the campus administration, as ICN representatives,
Verizon representatives, etc.

Both groups were instructed to record their plan and activities on paper. Those notes were collected
and are recorded here.

From the Telecommunications Services Group
DR Drill - Day One
    Immediate Assessment
    Notify telecom staff and get appropriate staff on campus – use warehouse location as command
    Notify carriers/service providers – Verizon, CMS, AT&T, Nortel, Shared Tech.
    Advise media spokespeople and disaster notification group
    Coordinate temporary voice communications for health services, DPS, and media groups
    Discuss and establish priorities for reconnecting voice communications to university
    Begin to establish NE needs and time frames
    Order whatever materials can be identified as needed
    Coordinate requirements for temporary cell towers, switch on trailer, etc.
    Start repressurizing cable.

DR Drill - Week One
    Air pressure work completed by technicians
             o PSO resources available
    Work with vendors to establish list of materials needed to make connections for temp. services
         from communications building
    Media to keep communications flowing
    Terry Hill to order supplies/materials that may not be on hand
             o Receive within 24 hrs
    Order call pilot chassis - replacement working from communications within 24 hours of receiving
    Make needed connections at Quigley manhole
    Services available with some limitations
    Start working with vendors to have damaged equipment replaced
             o Ordered and delivered to campus
      SL100 delivered within 72 hrs. to 7 days
      Work with media for news releases
      Work with telecom staff
      ACDs – contact centers
      Student center contact businesses and staff
       If needed
      Dunn Richmond has fiber
      Warehouse – Telecom meeting place
      Emergency radios within disaster response team
      Wham – IT network team

DR Drill – Two Months
    Access the Student Center switch room
    Remove ruined equipment
    Clean up
    Coordinate recovery efforts w/PSO, CEHS, Central Receiving, etc.
    Compile the required reports and documentation
    Install additional facilities between RSC’s as required
    Continue to re-establish campus communications until complete

From the Network Engineering Team
DR Drill – Day One
    Notify all pertinent agencies:
            o Network Engineering
            o Telecommunications
            o Illinois Century Network
            o Verizon
            o IT – Lowell
    Determine exactly which locations are not connected via the CAN.
            o Reconnect to closest available routed network
            o Assist displaced departments with connectivity needs
    Verizon, Illinois Century Network, Telecommunications:
            o Determine physical location for ICN, and physically connect.
    Determine what routers (or other equipment) we have in inventory to connect to ICN
            o We have a need for BGP and could use an ACL as a firewall
            o If we don’t have a capable router, we would borrow one from Cisco, ICN, or other as a
                 short term solution
    Gi will find out about satellite uplink possibilities.
            o IEMA, temp.
    Cisco 6500 can do BGP, if we add memory.
            o What is available?
            o If needed, we can get ICN to BGP advertise the SIUC Class B address space
    We have a firewall available in inventory
    Work to reestablish the three campus VPN tunnels. The firewall in inventory can terminate VPN
       Work to reestablish dial-up networking. We have a spare AS5300 in inventory.
       Contact Jeff Duke, who has unique data related needs. When a location for his data equipment
        is determined, work to reestablish connectivity for those systems.
       Outdoor wireless, reroute fiber to the Engineering core switch.
       Continuous reprioritization as needed.
       Continuous contact with University Communications or other to communicate network status
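
The "Class B address space" mentioned in the Day One notes refers to the historical classful scheme, in which a Class B network corresponds to a /16 prefix. The sketch below illustrates only that arithmetic. The address 172.16.5.10 is a private placeholder and not the campus address space, and the function names are hypothetical.

```python
import ipaddress


def classful_class(address):
    """Return the historical classful class letter (A-E) of an IPv4 address."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:
        return "A"
    if first_octet < 192:
        return "B"
    if first_octet < 224:
        return "C"
    if first_octet < 240:
        return "D"
    return "E"


def class_b_network(address):
    """Return the /16 network that contains a Class B address."""
    if classful_class(address) != "B":
        raise ValueError(f"{address} is not a Class B address")
    # strict=False masks off the host bits instead of raising an error.
    return ipaddress.ip_network(f"{address}/16", strict=False)


# Placeholder address from the private 172.16.0.0/12 range (an assumption,
# used only for illustration).
net = class_b_network("172.16.5.10")
print(net, net.num_addresses)
```

A single /16 prefix covers 65,536 addresses, so one BGP advertisement from an upstream provider would restore reachability for the whole block.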

DR Drill – Week One
    Follow up and complete all day one tasks
    We must wait for Telecommunications to establish off-campus connectivity
    Work on usual tasks and offer assistance to others if we have time available

DR Drill – Two Months
    Remove old, damaged equipment from the Student Center
    Replace all equipment in Student Center
             o Obtain prices, part numbers, create purchase requisitions
    Coordinate cutover (moving back to the Student Center) with ICN, Telecommunications and
    Be prepared to act on any possible contingency
    Hire additional help as needed

Upon expiration of the one-hour time limit, a member of each group stated and explained the day one,
week one and subsequent two-month activities that were recorded on paper, during which time questions
were asked concerning the hows and whys of the various activities. All participants were given the
opportunity to comment on both the drill and the disaster scenario. Lastly, Dave Bouhl and Michael
Shelton gave a brief evaluation of the drill and thanked everyone for their efforts.

A videotape recording was made of the entire drill, which will be available on the Network Engineering
website at drill-06-25-08/video. The PowerPoint presentation for the drill
and this document can be found at the same URL.

What did we learn? How did we benefit from this drill?
  1. Leadership: Mike Smart and John Daly were singled out via the disaster scenario as being the
       first people contacted in each of their respective groups. As such, this unofficially put the onus
       on each to direct the activities of their group during the drill. I was pleased to see each accept
       that role.
  2. Cooperation: Comment from Dave Bouhl - Overall I thought we had a very good drill this
       morning. It got everyone thinking and provided an opportunity to get the two work groups
       together as one group. I was enthused with the interaction between everyone with no turf
       battles, or attitudes, just a group of professionals working together to complete a task.
  3. Input and Creative Thinking: Many, and possibly all, participants had some input.
  4. Established Knowledge: Many of the plans and activities demonstrated that the drill
       participants already know:
            a. Who they are going to contact for assistance.
                     i. Internal to campus
                    ii. External (vendors, service providers, IEMA, etc.)
              b. Campus facilities and their potential uses in a disaster
              c. What external resources are available (mobile cellular trucks)
              d. The importance of good communications
                        i. Between groups
                       ii. With IT and campus administration
                      iii. With our customers
                      iv. With news media and University Communications
   5.    Alternate Connectivity Options: I had heard that fiber exists from the Verizon central office,
         down Oakland Street, through the Communications building TSC to the Student Center TSC. This
         information was confirmed and I also learned that:
              a. A manhole near Quigley Hall would be a good place to intercept the fiber and copper
                  from the Verizon central office for a short term solution.
              b. Copper cabling exists from DPS at Washington Square A to the Verizon central office
                  that will aid in keeping DPS communications functioning.
              c. That fiber is available from the Verizon “hut” to the TWC in the east end of the
                  Dunn/Richmond Center.
         All of the above are locations where connectivity to Verizon could be reestablished without an
         insurmountable expense and effort.
   6.    Alternate Communications Needs: Disasters typically result in all phone (land lines and cellular)
         circuits being busy. The drill included a discussion of the potential use of two-way radios and
         their availability from PSO. More discussion on this topic would be prudent.
   7.    Understanding: Everyone should have a greater understanding of the challenges of disasters in
         general and specifically an incident that severs the campus connection to the outside world.
   8.    Determination of Campus Critical Needs: Both groups discussed the campus needs during this
         disaster and how each group could work towards addressing those needs. Groups mentioned
              a. Department of Public Safety – Connectivity to the Jackson County 911 system was
                  discussed and
              b. Student Center Business Office
   9.    Unanswered Questions: Telecommunications Services has identified the warehouse on
         McLafferty Road as a disaster rally point where employees should initially meet when a disaster
         strikes. Network Engineering does not have a primary or alternate rally point identified. IT
         should have the same but that location or locations are not known to me at this time.
   10.   Take-Away Items:
              a. The emergency cooling equipment is stored in the basement of the Student Center.
                  Several of those units should be stored in either the east or west switching center.
              b. NET needs established rally points.
              c. Determine if IT has identified a rally point(s).
              d. Investigate other institutions concerning how they have dealt with media organizations.
              e. Cell phones and landlines may not be reliable. Investigate two-way radio options.
              f. Investigate what University Communications is doing in terms of an off-campus web

Overall, this was a good exercise. Given more time, the plans and activities would be more complete and
better documented. It is my opinion that both groups have a good foundation to work from in the event
of a disaster. A goal has been set to have at least two joint Telecommunications-NET disaster recovery
drills annually.
