SLAC Networking

                        Les Cottrell
               for the SLAC network group
                          SLAC

                     <cottrell@slac.stanford.edu>


 Presented by Charley Granieri at the SLAC Computing External Review, June 1999
   6/24/99                                                                        1
                   Outline of talk
•   LAN - architecture, assets, monitoring
•   Residential access to SLAC
•   WAN - connectivity and monitoring
•   Email - servers, spam, majordomo etc.
•   Other network services such as News ...
•   Advanced technology pilots
•   Summary - challenges etc.




                      Mission etc.
• Provide leadership and support in data
  communications to the Laboratory as a whole and to
  physics research in particular.
  – Network engineering & management - 4.3 FTEs, 1 open
    slot
  – Network monitoring:
     • LAN 1.5 FTEs
     • WAN 2.7 FTEs
  – Network services (email, news, VMS etc): 2.5 FTEs
  – NetOps: 3 FTEs + 1 open slot
• Telecommunications also under same hat (helps
  coordination and convergence):
  – 2.5 FTEs + contractor
              Network Drivers
• Deployment of computers to new areas/farms/people
• Faster interfaces, more capable, easier to use
  computers
• New applications (BaBar, BSD, multimedia, VoIP
  …)
• Increased reliance
• Increased security
• World wide collaborations - distance independent
• New technologies (media, interfaces, protocols,
  applications)
          Growth of SLAC LAN




               Principles for LAN design
• Simplicity
  – Ethernet 10/100/1000 Mbps, phase out FDDI, LocalTalk etc.
  – Reduce number of protocols in core to IP only, limit
    bridging, keep smart stuff at edges
• Stay away from edges of performance envelope
  – over-provision, double aggregate every 18 months
     • shared to switched 10 Mbps => 100 Mbps for desktops
     • 100 Mbps switched => 1 Gbps for core & high-volume servers
• Provide high availability
  – redundancy of core components so can schedule outages
  – UPS
• Invest in network management tools
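As a rough illustration of the "double aggregate every 18 months" rule above, the sketch below projects aggregate capacity over time. The function name and the 100 Mbps starting figure are hypothetical and chosen only for the example; they are not taken from the slides.

```python
def projected_capacity(initial_mbps: float, months: float,
                       doubling_months: float = 18.0) -> float:
    """Aggregate capacity after `months`, assuming it doubles
    every `doubling_months` (18 by default, as on the slide)."""
    return initial_mbps * 2 ** (months / doubling_months)


if __name__ == "__main__":
    # Hypothetical starting point of 100 Mbps aggregate capacity.
    for months in (0, 18, 36, 54):
        print(months, "months:", projected_capacity(100.0, months), "Mbps")
```

Under this assumption, three doubling periods (4.5 years) require eight times the starting capacity, which is one way to read the move from 100 Mbps to 1 Gbps in the core.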
               Network Architecture
• Structured wiring started 1995, complete outside
  radiation fence this fiscal year, i.e. 90% completed
• Increasingly switched network (from shared media)
  – Based on mass market Ethernet
  – improved error isolation, ability to know where assets
    are, and security
  – scalable




                   SLAC Switched LAN Summer 1999
[Network diagram. Labeled elements: ESA, SSRL, Old Servers, Legacy, FDDI Ring, Modems, ISDN, xDSL, Internet, DMZ, Routers, Switches, Core, BSD, 4 Farms, 3 Servers, IR2, 16 Building switches, BaBar, MCC1, MCC2, MCC3.
 Legend, link types: 10BaseT, FDDI/CDDI, 100BaseFL, 100BaseT, 1Gbit FL, 4Gbit FL.
 Legend, device types: Concentrator, Gigaswitch, Router, Switch, Hub.]
              Current state - availability
• Switched segmentation reduces impact of many
  problems, simplifies identification
• UPS for core components
• Redundant core devices
• Redundant power supplies on core switches &
  routers
• Redundant trunks
• Cisco Hot Standby Routing Protocol

               Current state - performance
• Just been through major upgrade, switch fabric
  occupancy ~ 60%, 1000Mbps in core + high
  performance servers
  – 46% of available bandwidth in 1000Mbps links, 47% in
    100Mbps links
  – 2.6 hosts / collision domain (down from 3.6 at last
    review)
• Close collaboration with BaBar & systems to
  improve/optimize performance for trigger farms and
  data collection

                      BaBar
• Make sure network is not the bottleneck
• Measured > 400 Mbps (UDP, or TCP with
  extended windows) between Gbps interfaces
• Measured ~ 400 Mbps aggregate from Gbps to 4 x
  100 Mbps between CC & IR2
• Provide real-time web-accessible monitoring page
  showing throughput for various components, with
  drill-down



Real-time BaBar throughput monitoring




             LAN assets inventory
• Oracle Database of network equipment, linked to
  property control, phone etc.
• Much of network info gleaned automatically and
  entered into DB:
  – connectivity from router ARP tables, from bridge/switch
    CAM tables, from CDP
     • gives MAC level addresses etc
     • create “model” of router/switch/hub & host connections
  – MIBs in nodes provide make & model, S/N, software/hardware
    revision level, port type, speed
• Other info is entered manually:
  – when a host is registered it gets property control number, IP
    address, owner, admin
  – DNS info is entered into the DB, which then automatically
    updates the DNS tables
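The automatic "model" of host connections described above amounts to joining two tables on the MAC address: router ARP tables (IP to MAC) and switch CAM tables (MAC to switch port). A minimal sketch follows, assuming both tables have already been read into dictionaries. The `HostLocation` type, the function names and the sample device names are hypothetical, and the real system stores its results in Oracle rather than in memory.

```python
from typing import Dict, List, NamedTuple, Tuple


class HostLocation(NamedTuple):
    ip: str
    mac: str
    switch: str
    port: str


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC address and strip common separators so that
    entries written in different vendor formats compare equal."""
    return mac.lower().replace(":", "").replace("-", "").replace(".", "")


def build_model(arp: Dict[str, str],
                cam: Dict[str, Tuple[str, str]]) -> List[HostLocation]:
    """Join ARP entries (IP -> MAC) with CAM entries
    (MAC -> (switch, port)) to locate each IP on a switch port."""
    cam_by_mac = {normalize_mac(m): loc for m, loc in cam.items()}
    model = []
    for ip, mac in sorted(arp.items()):
        key = normalize_mac(mac)
        location = cam_by_mac.get(key)
        if location is not None:  # skip hosts not seen by any switch
            switch, port = location
            model.append(HostLocation(ip, key, switch, port))
    return model


if __name__ == "__main__":
    # Hypothetical sample data.
    arp = {"10.0.0.5": "00:A0:C9:14:C8:29"}
    cam = {"00a0.c914.c829": ("sw-bldg50", "3/12")}
    print(build_model(arp, cam))
```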
        LAN performance monitoring
• Read MIBs from routers & switches & plot:
  – octets, errors
  – generate alerts (outside thresholds, e.g. heavy
    multi/broadcast activity, heavy utilization, high error
    rates)
  – graphical Web reports using Java & other (MRTG) tools
    with history for baselines
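The step from raw MIB octet counters to a utilization figure and a threshold alert can be sketched as below. This assumes 32-bit octet counters (such as the standard `ifInOctets`) sampled at a known interval. The function names, the 50% threshold and the sample numbers are hypothetical, not the values used at SLAC.

```python
COUNTER32_MODULUS = 2 ** 32


def octet_delta(prev: int, curr: int) -> int:
    """Octets transferred between two readings of a 32-bit counter,
    allowing for at most one counter wrap between the readings."""
    return (curr - prev) % COUNTER32_MODULUS


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_s: float, link_bps: float) -> float:
    """Average link utilization over the sampling interval, in percent."""
    bits = octet_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_s * link_bps)


def is_heavy(utilization: float, threshold_percent: float = 50.0) -> bool:
    """True if utilization exceeds the (hypothetical) alert threshold."""
    return utilization > threshold_percent


if __name__ == "__main__":
    # 37.5 Mbytes in 5 minutes on a 10 Mbps link.
    u = utilization_percent(0, 37_500_000, 300, 10_000_000)
    print(f"{u:.1f}% utilized, alert={is_heavy(u)}")
```

On fast links a 32-bit counter can wrap more than once between polls, so the polling interval has to be short enough for the single-wrap assumption to hold.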




                DMZ monitoring
• FDDI probe monitors traffic coming in via ESnet,
  data is read out at intervals (typically once/hour) and
  logged to database.
• Reports are generated daily.
  – Report on common protocol utilization and suspicious
    use
  – top 20 nodes, conversation pairs, reports by domain,
    complete list of conversations
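A daily "top 20 nodes" and "conversation pairs" report of the kind listed above can be built by summing bytes per node and per pair. The sketch below assumes the probe data has been reduced to (source, destination, bytes) records; that record layout and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Conversation = Tuple[str, str, int]  # (source, destination, bytes)


def top_nodes(conversations: Iterable[Conversation],
              n: int = 20) -> List[Tuple[str, int]]:
    """Nodes ranked by total bytes sent plus received."""
    totals: Counter = Counter()
    for src, dst, nbytes in conversations:
        totals[src] += nbytes
        totals[dst] += nbytes
    return totals.most_common(n)


def top_pairs(conversations: Iterable[Conversation],
              n: int = 20) -> List[Tuple[Tuple[str, str], int]]:
    """Conversation pairs ranked by bytes, ignoring direction."""
    totals: Counter = Counter()
    for src, dst, nbytes in conversations:
        a, b = sorted((src, dst))  # a->b and b->a count as one pair
        totals[(a, b)] += nbytes
    return totals.most_common(n)


if __name__ == "__main__":
    sample = [("a", "b", 100), ("b", "a", 50), ("a", "c", 10)]
    print(top_nodes(sample, 2))
    print(top_pairs(sample, 1))
```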



     Residential & dialup services
                      Dialup/ARA   Dialup/PPP   ISDN         DSL-Covad                 DSL-PBI
Max speed             33kbps       56kbps       128 kbps     144k, 384k, 1.5M / 384k   384k / 128k, 1.5M / 384k
Inside SLAC Firewall  Yes          Yes          Yes          No                        No
Clients               Mac          Any          Any          Any                       Any
Location              Opt. Local   Opt. Local   Opt. Local   ~80% BayArea              ~60% BayArea
Users                 70           150          80           14                        2
Ports                 12           46=>69                    Campus                    ISP
• Use PPTP VPN for security, have NT, Win98,
  Mac clients, also useful for travelers
                             Utilization




Tracking use, keeping logs for more detailed auditing



               WAN Challenges
• No single management responsible for Internet
• Exponential growth
• HEP critically dependent on WAN for
  collaborations
• HEP/Research & Education competing with
  commodity usage in many cases
• Internet extremely complex, changing rapidly,
  internal behavior hard to predict
• HEP use is very diverse, collaborators, vendors,
  services
                   Connectivity
• Use of ESnet link (43Mbps) up by factor of 2 in last
  4 months
  – 5-minute averages of up to 10-15 Mbps per direction are
    fairly typical each day
  – In process of upgrading to 155 Mbps
• 40Gbytes/day IP traffic, roughly 50% TCP, 50%
  UDP
  – FTP, AFS, ssh, http, xwin are top protocols
• Campus link just upgraded from 10 Mbps to 155
  Mbps
• Working to get NTON reconnected
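To relate the daily volume above to the link speeds, the small sketch below converts Gbytes/day into an average rate in Mbps. It assumes decimal units (1 Gbyte = 10^9 bytes); the slides do not say whether the 40 Gbytes/day counts one or both directions, so the result is only a rough daily average.

```python
SECONDS_PER_DAY = 86_400


def average_mbps(gbytes_per_day: float) -> float:
    """Average rate in Mbps for a given daily volume,
    assuming 1 Gbyte = 1e9 bytes and 1 Mbps = 1e6 bits/s."""
    bits_per_day = gbytes_per_day * 1e9 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    print(f"{average_mbps(40):.1f} Mbps average for 40 Gbytes/day")
```

Under these assumptions 40 Gbytes/day averages out to roughly 3.7 Mbps, well below the 10-15 Mbps five-minute peaks, which is the usual gap between daily averages and busy-period load.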
     Internet End-to-end Monitoring
       www.slac.stanford.edu/comp/net/wan-mon/tutorial.html
• Within ESnet connectivity excellent, Internet2
  good, after that only acceptable to poor
• Monitor to set “user” expectations, help with
  problem detection, get planning information &
  trends, identify problem areas, optimize routes
• Collaborative effort to provide HEP-wide & ESnet
  wide monitoring requested by ICFA, ESnet
  – Partially funded by DOE/MICS FWP
  – Involves many HEP sites, led by SLAC & HEPNRC

               Main tool (PingER) currently
                      uses Ping
• Treats Internet as black box
• Provides useful real world measures of network
  round trip response time, loss, reachability, jitter
• Low cost/lightweight tool
  – ping “universally available”, easy to understand
      • no software for clients to install
      • no special privileges needed for monitor sites
  – resources: 100bps/link, ~600kBytes/month/link
• Agrees well with more complex measurements
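The measures listed above can all be derived from a series of ping results. The sketch below is not the PingER code: it assumes each probe yields either a round trip time in milliseconds or `None` for a lost packet, and it takes jitter to be the mean absolute difference between successive round trip times, which is only one of several possible definitions.

```python
from statistics import median
from typing import Dict, List, Optional


def summarize_pings(rtts_ms: List[Optional[float]]) -> Dict[str, object]:
    """Summarize one set of ping probes.

    rtts_ms holds one entry per probe sent: the round trip time in
    milliseconds, or None if no reply was received.
    """
    if not rtts_ms:
        raise ValueError("at least one probe is required")
    replies = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    summary: Dict[str, object] = {
        "loss_percent": 100.0 * (sent - len(replies)) / sent,
        "reachable": bool(replies),  # any reply at all
    }
    if replies:
        summary["median_rtt_ms"] = median(replies)
        # Assumed jitter definition: mean absolute change between
        # successive replies.
        diffs = [abs(b - a) for a, b in zip(replies, replies[1:])]
        summary["jitter_ms"] = sum(diffs) / len(diffs) if diffs else 0.0
    return summary


if __name__ == "__main__":
    print(summarize_pings([100.0, None, 110.0, 90.0]))
```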
Extent of measurements
• 18 Monitoring sites - 7 in US (5 ESnet, 2
  vBNS), 2 in Canada, 7 in Europe (ch, de,
  dk, hu, it, uk(2)), 2 in Asia (jp, tw)
• 1261 monitoring-remote-site pairs
• 379 unique hosts, 272 sites
• 50 beacon sites, 27 countries
• Metrics include response, jitter, loss,
  reachability
• Data goes back > 4 years
• 1 Million probes of Internet / day
[Pie chart: PingER pair distribution by global area. Legible slices: Europe 38%, Edu 33%, Gov 7%, Com 2%. The remaining slices (Russian Fed, South America, Japan, China, Canada, Australasia, Asia, Mil, Org) are each 5% or less.]
                                      Results 1/2
[Chart: Comparison of median packet loss for Mar-99 for various communities. Y axis: % median monthly packet loss (log scale, 0.01 to 10). X axis: Community, with pair counts: ESnet - ESnet (31), vBNS - vBNS (18), XIWT - XIWT (140), ELab - ELab (14). Markers show the 25%, median and 75% values.]
                                             Results 2/2
                             TCP bandwidth < (1470/RTT) * (1/sqrt(loss))
[Chart: Bandwidth in kbytes/sec (log scale, 10 to 10000) versus date, Jun-94 to Dec-99. Series: Canada (18 pairs), Edu/US (138 pairs), ESnet (31 pairs), Japan (12 pairs), Europe (95 pairs), each with an exponential fit, plus a reference line for 100% improvement / year.]
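The TCP bandwidth bound on the Results 2/2 slide can be evaluated directly from a measured round trip time and loss rate. In the sketch below the 1470 is read as a segment size in bytes, RTT is in seconds and loss is a fraction between 0 and 1; these unit choices and the function name are assumptions, since the slide does not state them.

```python
from math import sqrt


def tcp_bandwidth_bound(rtt_s: float, loss: float,
                        segment_bytes: float = 1470.0) -> float:
    """Upper bound on TCP throughput in bytes/sec:
    (segment_bytes / RTT) * (1 / sqrt(loss)).

    rtt_s is the round trip time in seconds and loss is the packet
    loss as a fraction (0.01 means 1%).
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss <= 1:
        raise ValueError("loss must be in (0, 1]")
    return (segment_bytes / rtt_s) / sqrt(loss)


if __name__ == "__main__":
    # 100 ms round trip time with 1% loss.
    bound = tcp_bandwidth_bound(0.1, 0.01)
    print(f"{bound / 1000:.0f} kbytes/sec")
```

Because loss enters under a square root, halving the round trip time doubles the bound, while halving the loss raises it by only about 41%.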
                              Email
• Gateway processes about 40K msg/day (growing
  25% / year, doubled since last review)
  – monitor & alerts (email, pager) on exceptions
  – 95% trivial email delivered in < 1 min
• ~ 2700 email users
  –   Support generic addresses fname.lname or userid@slac.stanford.edu
  –   700 POP users, 30 IMAP, Quickmail gone, VM gone
  –   separated IMAP & POP servers
  –   dedicated internal SMTP server
  –   IMAP pilot - Netscape & pine most popular clients
• See www.slac.stanford.edu/comp/net/email/futures.html
                      Current Email system
[Diagram. Labeled elements: Offsite Mail gateways, Screening Router, Redundant mail servers (Serv01..2), VAX clusters (SLACAX, SSRL, SLD, SLC), Listserv, SMTPserv, POPserv, IMAPserv, NFS server, Backup, Unix Email users (NFS mode), PC/Mac Email users, Unix IMAP users. Client labels: Eudora, Netscape, Outlook, Pine.
 Link labels: SMTP, NFS, POP & IMAP, Non-authenticated relay.
 Problem areas in red: cleartext passwords, NFS mounted spool, non-authenticated SMTP.]
                 Proposed Email System
[Diagram. Labeled elements: Offsite Mail gateways, Screening Router, Redundant mail servers, VAX clusters (SLACAX, SSRL, SLD, SLC), Listserv, SMTPserv, POPserv, IMAPserv, NFS server, Unix Email users (NFS mode), PC/Mac Email users, Unix IMAP users. Client labels: Eudora, Netscape, Outlook, Pine.
 Link labels: SMTP, Authenticated SMTP, NFS, POP & IMAP, SSL.]
          Mail list server (majordomo)
• 215 lists (up from 155 last review)
• On a separate server
• Have web forms for requesting lists, maintaining
  subscriptions and querying the lists




                  Spam / Viruses
• Actively provide anti-spam support:
  – at last review spam was growing (factor of 16 in 9 months), up to
    40 spam actions/week
  – now stable ~ 10 actions/week
  – ~ 2100 sites blocked (was 90 two years ago)
  – prepared to restore domain upon user request
• Since Melissa, remove any Excel or Word
  attachment containing a macro from incoming SLAC email
• Also strip out well known viruses / worms (e.g.
  happy99, explore.zip.exe)
 Dynamic Host Configuration Protocol
• Provide DHCP for fixed hosts & roamers
• Tension between easy walk-up use & security
  – require registration for accountability
     • this is for connection inside the site firewall
     • an issue is whether to provide anonymous DHCP outside
       firewall (i.e. what you are using today)
  – seek guidance on how to strike the balance




                           DHCP
• Is in production but barely
• Web forms for adding to DHCP database
  – needs to allow editing, deleting, more restrictive
    availability, better integration with Enterprise DB
• Work in progress or queued
  –   automate log file pruning, restrict who can register hosts
  –   convert to use Enterprise DB as master
  –   convert DHCP server from SunOS to Solaris
  –   increase information logged about user, location etc.
• Needs resources (i.e. part of new hire) to focus on it
  and fix current problems
               Lightweight Directory
                  Access Protocol
• Microsoft is embracing LDAP in Windows 2000
• Email vendors are migrating towards password DBs
  in LDAP
• Have an LDAP-v3 server
  – loaded with the SLAC user directory information,
  – read only at the moment
• Starting to coordinate with other HEP labs (e.g.
  CERN), there is a HEP LDAP email list

              News, NTP, DNS
• News down to 20 groups, out-sourced to campus
• NTP: driven from GPS on-site
• DNS: driven from Oracle network database of hosts




              VMS central support
• Driven by SLD, has SLACVX for SLD offline
  – AlphaServer 8400 + 10 smaller Alphas + 6 VAXes (for
    X support hosts & legacy code)
  – ~ 6000 SPECint92
  – 500 Gbytes disk, RAID controller, STK connection
  – HSM, Oracle etc.
  – Software & hardware basically stable
  – Supported by SCS staff (~0.5FTE)
  – Support folks autopaged

    Advanced technology exploration
• ESnet IPv6 collaborator
• NGI proposal (Particle Physics Data Grid) and high
  performance WAN networking (China Clipper)
• NTON project (480 Mbps disk to application
  SLAC<>LBNL)
• Internet monitoring (IEPM)
• VoIP pilot with CERN, FNAL, DESY, ESnet/LBNL


                 Major challenges
•   Tracking topology & configuration
•   Monitoring a switched network
•   Staying at right point in technology curve
•   Constraining complexity:
    – phase out of legacies: AppleTalk, Macs, DECnet IV,
      FDDI (user resistance)
    – embracing new needs: e.g. VPNs, xDSL, IPv6, video,
      VoIP, IMAP, DHCP, QoS, new routing protocols



                Major challenges (continued)
• Balancing security vs. usability & simplicity
• Increasing purposes for and dependence on the net
  – video, VoIP, multicast
  – outages hard to schedule, upgrades hard to do
• Finding & keeping staff




                    Summary
• LAN: well positioned, architecture scales, follows
  industry practices, will need continued growth
• WAN: little control, yet must understand, track,
  monitor and collaborate with others inside & outside
  HEP, nationally & internationally
• VMS: central support reducing, stable, goes away
  with SLD
• Network services, technologies & protocols keep
  emerging
• People / skills resources are major gating factor