    Site Validation Session Report

    Piotr Nyczyk, CERN IT/GD
    Leigh Grundhoefer, IU / OSG
    Notes from Judy Novak
    WLCG-OSG-EGEE Workshop
    CERN, June 19-20th 2006
  Service Availability Monitoring
  (SAM) - “extension” of SFT:
   • generalized framework to monitor all
     LCG/EGEE services and not only CE: BDII,
     RB, LFC, FTS, etc.
   • most of the sensors run remotely (from
     central machine)
   • no installation needed on service machines
   • moved from MySQL to Oracle, optimized
     data schema
Available at:
•   SAM sensors:
    – currently: BDII (Taiwan), RB (RAL), CE, SRM, LFC, FTS, SE
•   release updates + SAM (SFT)
    – certifying current tests with each new release
    – create or update tests as necessary
    – CA cert. releases are special
• Availability views
    – current, daily, weekly, monthly
    – For CE, SE, SRM, siteBDII
    – displayed with GridView
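
The availability views above can be illustrated with a minimal sketch. It assumes availability is simply the fraction of test runs that passed in each calendar day; the slides do not describe the metric SAM or GridView actually uses, and the function name and input shape are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def availability_by_day(results):
    """Return {date: fraction of passed tests} for one service.

    `results` is an iterable of (timestamp, passed) pairs, where
    `timestamp` is a datetime and `passed` is a bool. This input
    shape is an assumption made for illustration only.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for timestamp, ok in results:
        day = timestamp.date()
        total[day] += 1
        if ok:
            passed[day] += 1
    # Only days with at least one test run appear in the result.
    return {day: passed[day] / total[day] for day in total}


if __name__ == "__main__":
    sample = [
        (datetime(2006, 6, 19, 8, 0), True),
        (datetime(2006, 6, 19, 9, 0), False),
        (datetime(2006, 6, 20, 8, 0), True),
    ]
    for day, value in sorted(availability_by_day(sample).items()):
        print(day, value)
```

Weekly and monthly views could be derived the same way by grouping on a coarser key than the calendar day.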

      OSG Validation services
• CE/SE Validation aggregation : VORS - site scanner, BDII info
• OSG VO’s VOMS validation
• GridEX - application validation ( pilot job submissions )
• Site Policy template and publication
• GIP Validation
• Monitoring validation : MonALISA client status (VO Jobs I/O)
• GridCat and the MIS-CI client
   – Production instance
   – Client software:
• It seems to be impossible to avoid
  cross-monitoring (OSG monitoring
  doesn't include LCG-specific services,
  and the other way around)
• We should synchronize on VO level, but
  LCG/EGEE is also using regional

   OSG and EGEE Validation
• Site discovery - using discovered sites using
  – Ops VO - supported only on OSG sites which are
    interoperable. (fully deployed in July)
  – How can we determine if an EGEE site is
    interoperable? Review certain BDII information
• Cross installation of necessary tools and
  libraries for site validation
  – LCG tools - added as optionally installed package
    for OSG sites
  – OSG environment variables - ? (GIP)
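
One way to picture the BDII review mentioned above is a filter over published CE records. This is only a sketch: it assumes the records have already been fetched and parsed into dictionaries, that VO support is published in a GLUE attribute named GlueCEAccessControlBaseRule with values of the form "VO:ops", and that each record carries a hypothetical "site" key. None of these details come from the slides.

```python
def supports_vo(ce_entry, vo="ops"):
    """True if the CE record lists the given VO in its access rules.

    The attribute name and the "VO:<name>" value format are assumptions
    about what the information system publishes.
    """
    rules = ce_entry.get("GlueCEAccessControlBaseRule", [])
    return "VO:" + vo in rules


def interoperable_sites(ce_entries, vo="ops"):
    """Return the sorted names of sites with at least one CE supporting `vo`."""
    return sorted({entry["site"] for entry in ce_entries if supports_vo(entry, vo)})


if __name__ == "__main__":
    # Hypothetical records, already parsed from an information system query.
    entries = [
        {"site": "SITE-A", "GlueCEAccessControlBaseRule": ["VO:ops", "VO:cms"]},
        {"site": "SITE-B", "GlueCEAccessControlBaseRule": ["VO:atlas"]},
        {"site": "SITE-A", "GlueCEAccessControlBaseRule": ["VO:ops"]},
    ]
    print(interoperable_sites(entries))
```

Supporting the Ops VO is used here as the only criterion; a real check would likely review further attributes.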
   OSG and EGEE Validation
     Interoperability (cont)
• Use of existing GGUS - OSG GOC ticket exchange
  for error reporting
   – SAM database to use contact information for OSG
• Issue of coordinating scheduled downtime
   – OSG GOC will maintain a web page with
• Propose review of effort to add OSG specific
  validations to SAM framework.
• Testing and iterative development will be
  accomplished using Pre-Production sites and OSG

   DB monitoring in SAM for Tier 1’s (Dirk Duellmann)
• Jobs are connecting to the DB with either http (VO lib) or direct
  Oracle (instant client)
• Should be completed by October when experiments will start
  using DBs
• CMS + ALICE don't need them, but only 'squid'
• existing DB monitoring is too detailed for SAM/SFT, but SAM
  could provide high-level monitoring of DB service
• some DB services (like LFC) are already tested by SAM, BUT
  only the functionality is tested, not the DB! The test could be:
   – threshold for connection between T0 -> T1
   – user access (squid)
   – client latency (?)
• Oracle client will be installed on the Worker Nodes
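
A high-level DB test of the kind proposed above (a connection threshold) might look like the following sketch. The `connect` callable stands in for whatever client is used (HTTP through squid or the Oracle instant client); the function name, the status labels and the threshold are hypothetical and not part of SAM.

```python
import time


def check_connection(connect, threshold_s):
    """Time one connection attempt and compare it to a threshold.

    `connect` is any zero-argument callable that opens a connection and
    raises an exception on failure. Returns (status, elapsed_seconds, detail),
    where status is "ok", "warn" (slower than threshold) or "error".
    """
    start = time.perf_counter()
    try:
        connect()
    except Exception as exc:  # any failure to connect counts as an error
        return ("error", None, str(exc))
    elapsed = time.perf_counter() - start
    status = "ok" if elapsed <= threshold_s else "warn"
    return (status, elapsed, "")


if __name__ == "__main__":
    def fake_connect():
        # Placeholder for a real T0 -> T1 connection attempt.
        time.sleep(0.05)

    print(check_connection(fake_connect, threshold_s=1.0))
```

Passing the connection step in as a callable keeps the check independent of the access method, so the same test could cover both the squid and the direct Oracle path.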
