GridPP3 goal: To provide UK computing for the Large Hadron Collider

Project structure
1.1 ATLAS; 1.2 LHCb; 1.3 CMS; 1.4 Other experiments
2 Grid services: 2.1 Operations; 2.2 Security; 2.3 Network; 2.4 Data and storage management; 2.5 Middleware support
3 Tier-1: 3.1 Front end systems; 3.2 Resource delivery; 3.3 Hardware procurement & deployment; 3.4 Storage systems
4 Tier-2: 4.1 LondonGrid; 4.2 ScotGrid; 4.3 SouthGrid; 4.4 NorthGrid
5 Management: 5.1 Planning; 5.2 Execution & tracking
6 External: 6.1 Outreach & engagement; 6.2 National Grid Infrastructure; 6.3 LCG; 6.4 EGEE

Status summary, date 5/10/2011
210 Metric OK
18 Metric close to target
8 Metric not OK
2 Not able to be measured
118 Milestone achieved
2 Milestone underway
0 Milestone overdue
2 Milestone not due / metric n/a
12 Suspended
0 Awaiting input
372 Total

[Status map: one colour-coded cell per metric and milestone number, grouped by area 1.1 to 6.4; the colours are not recoverable in this copy. In this map, 6.2 is labelled NGI and 6.4 EGI.]
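As a quick consistency check on the status summary, the ten category counts can be summed and compared with the stated total of 372. The sketch below simply transcribes the legend into a Python dictionary (the variable and function names are illustrative, not part of the report) and prints each category's share of all tracked items.

```python
# Category counts transcribed from the 5/10/2011 status summary.
STATUS_COUNTS = {
    "Metric OK": 210,
    "Metric close to target": 18,
    "Metric not OK": 8,
    "Not able to be measured": 2,
    "Milestone achieved": 118,
    "Milestone underway": 2,
    "Milestone overdue": 0,
    "Milestone not due / metric n/a": 2,
    "Suspended": 12,
    "Awaiting input": 0,
}
STATED_TOTAL = 372


def summarise(counts):
    """Return the summed count and each category's share of it (in percent)."""
    total = sum(counts.values())
    shares = {name: 100.0 * n / total for name, n in counts.items()}
    return total, shares


total, shares = summarise(STATUS_COUNTS)
print(f"Sum of categories: {total} (stated total: {STATED_TOTAL})")
for name, pct in shares.items():
    print(f"{name:<32}{STATUS_COUNTS[name]:>4}  {pct:5.1f}%")
```

The categories sum to exactly 372, so the legend is internally consistent.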
1.1 ATLAS (owner: Roger Jones)
Metric no. | Description | Source | Owner | Target | Q111 | Comment
1.1.1 | Tier 1 - Available kSI for reconstruction | PANGEA/BDII | Roger Jones
1.1.2 | Tier 1 - Available kSI for group analysis | PANGEA/BDII | Roger Jones
1.1.3 | Tier 1 - Available kSI for MC production | PANGEA/BDII | Roger Jones
1.1.4 | Tier 1 - Job success rates in batch system | Matt | Roger Jones | 95% | 94.90% | Note: rate is now from new dashboard
1.1.5 | Tier 1 - Available storage in usable service classes | DQ2/BDII | Roger Jones | UB allocation
1.1.6 | Tier 1 - Data reading rates from storage system to batch farm | Job logs | Roger Jones | Planning document
1.1.7 | Tier 1 - Rates of data movement from tape to disk for reprocessing | | Roger Jones | Planning document
1.1.8 | Tier 1 - Data availability in storage system | DQ2/BDII | Roger Jones | 99% | 91.70% | SAM metrics used
1.1.9 | Tier 1 - Data loss per quarter (when not recoverable) | DQ2 | Roger Jones | 0.10%
1.1.10 | Tier 1 - Data acceptance from CERN, Tier 1s, Tier 2s | DQ2 | Roger Jones | 99% | 79.00%
1.1.11 | Tier 1 - MoU service levels | GGUS and ATLAS UK Operations Savannah portal and mailing list | Roger Jones | MoU
1.1.12 | Tier 2 - Data acceptance from Tier 1 | DQ2 | Roger Jones | 95% | 91.70% | Estimated from SAM storage capacity
1.1.13 | Tier 2 - Available simulation slots | BDII | Roger Jones | >20% capacity
1.1.14 | Tier 2 - Available analysis slots | BDII | Roger Jones | >70% capacity
1.1.15 | Tier 2 - Job success rates in batch system | | Roger Jones | 95% | 92.40% | Note: rate is now from new dashboard
1.1.16 | Tier 2 - Available storage in production space | DQ2 | Roger Jones | UB allocation
1.1.17 | Tier 2 - Available storage in user space | DQ2 | Roger Jones | UB allocation
1.1.18 | Tier 2 - Data availability in storage system | ATLAS UK Operations Savannah portal and mailing list | Roger Jones | 95% | 91.70% | Estimated from SAM x storage capacity
1.1.19 | Tier 2 - Data loss from storage system | Sites | Roger Jones | 1% | 4.3 | Server lost at Glasgow, recovered; fraction estimated by data volume
1.1.20 | Tier 2 - MoU service levels (acknowledge, diagnose and prognosis; fix) | GGUS and ATLAS UK Operations Savannah portal and mailing list | Roger Jones | MoU
1.1.21 | Contribute to GangaAtlas bug fixes and new functionality | | Mark Slater | Close at least 4 Savannah tickets (bug reports/feature requests) per quarter | 5
1.1.22 | Make significant contributions to user support | | Mark Slater | Dedicate one week per month to being a support shifter and answer >90% of Ganga-related queries | 4
1.1.23 | Contribute to the preparation of material for, and delivery of, the Ganga tutorials held at CERN and in the UK | | Mark Slater | Contribute to at least 2 CERN and 2 UK tutorials per year
1.1.24 | Run a complete and comprehensive check of current GangaAtlas functionality with the latest versions of Athena and DQ2 | | Mark Slater | Run at least one site test per week

Milestone no. | Description | Owner | Due date | Date complete | Evidence | Comment
1.1.25 | Spend one month as Release Manager for Ganga | Mark Slater | Oct-08 | 20/08/08 | The release manager list is posted at https://twiki.cern.ch/twiki/bin/view/ArdaGrid/Ga | This helped me to understand the whole Ganga package a lot more.

1.2 LHCb (owner: Glenn Patrick)
Metric no. | Description | Owner | Source | Target | Q111 | Comment
1.2.1 | UK share of LHCb production computing needs | Raja Nandakumar | http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php | 18% | 28.30%
1.2.2 | MC production (generation) efficiency in the UK | Raja Nandakumar | https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots | 95% | 98.00%
1.2.3 | T1 MC production (reconstruction, stripping) efficiency | Raja Nandakumar | https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots | 90% | 94% | Very little Monte Carlo processing activity at Tier-1
1.2.4 | T1 MC/Event user analysis - UK share / efficiency | Raja Nandakumar | https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots | 15% share; 70% efficiency | 7.7% share; 90.6% efficiency
1.2.5 | T2 data transfer - T2->RAL | Raja Nandakumar | https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/dataOperation | 50 MB/s | 2.5 MB/s | Low activity
1.2.6 | T2 data transfer - T2->others (failover?) | Raja Nandakumar | https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/dataOperation | 10 MB/s | 50 MB/s
1.2.7 | T1 data transfer - Incoming | Raja Nandakumar | https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/dataOperation | 100 MB/s | 85 MB/s
1.2.8 | T1 data transfer - Outgoing | Raja Nandakumar | https://lhcbweb.pic.es/DIRAC/LHCb-Production/lhcb_prod/systems/accountingPlots/dataOperation | 20 MB/s | 192 MB/s
1.2.9 | T1 data storage: Tape (Allocated/UK fraction) | Raja Nandakumar | http://www.gridpp.rl.ac.uk/stats/; https://sls.cern.ch/sls/service.php?id=LHCb-Storage | MoU/17 % | 446
1.2.10 | T1 data storage: Disk (Allocated/UK fraction) | Raja Nandakumar | http://www.gridpp.rl.ac.uk/stats/; https://sls.cern.ch/sls/service.php?id=LHCb-Storage | MoU/17 % | 741
1.2.11 | LHCb SAM tests uptime T1 | Raja Nandakumar | http://lcg-sam.cern.ch:8080/reports/site_avail.xsql | 98% | 79% | Various downtimes due to upgrades, SRM interventions, network interruptions and batch system issues
1.2.12 | LHCb SAM tests uptime T2 | Raja Nandakumar | http://lcg-sam.cern.ch:8080/reports/t2/site_avail.xsql | 80% | 87%
1.2.13 | LHCb Castor storage reliability | Glenn Patrick | LHCb SAM tests | >85% | 93%
1.2.14 | % analysis jobs using UK T1 | Raja Nandakumar | DIRAC | >5% | 7.70%
1.2.15 | % analysis jobs using all UK resources | Raja Nandakumar | DIRAC | >10% | 13.30%
1.2.16 | >90% T1 liaison and Castor meetings attended when at RAL | Glenn Patrick | Minutes/reporting | Yes | Yes | RAL diskserver issues.
1.2.20 | Unit testing of code within area of responsibility | Stuart Paterson | http://ganga.web.cern.ch/ganga/release/5.5.2/reports/html/coverage/summary/GangaLHCb/index.htm | Above 85% of code covered
1.2.21 | Classification of raised tickets for bugs | Stuart Paterson | https://savannah.cern.ch/projects/ganga/ | Above 80% within 2 days
1.2.22 | Resolving bugs | Stuart Paterson | https://savannah.cern.ch/projects/ganga/ | Above 50% within next minor release | | The uneven amount of time available during this cycle period led to some delays.
1.2.23 | Keep user training material updated | Stuart Paterson | https://twiki.cern.ch/twiki/bin/view/LHCb/GangaTutorial1 | Review every 3 months
1.2.35 | Review new use cases | Stuart Paterson | Ganga Savannah list of feature requests | Review every 3 months | | Use cases have been refined following the review in Sep 2010.
1.2.36 | DAST team: support and develop the Distributed Analysis Support Team (DAST) in LHCb | Stuart Paterson | | Fully trained shift cover at all times

Milestone no. | Description | Owner | Evidence | Due date | Date complete | Comment
1.2.17 | Successful UK participation in FEST tests | Glenn Patrick | Two FEST runs in Q1 2009. Successful transfer of data from CERN to RAL and between Tier-1s, followed by processing at RAL. No issues raised against RAL. | Mar-09 | Mar-09 | FEST operations have been extended and will continue for a few more months.
1.2.18 | Successful reconstruction & stripping of real data at UK T1, sustained for > 1 month | Glenn Patrick | Stripping is not yet successful enough to allow this to go green. Revisit in September 2010. | Jun-10 | Mar-11 | In a sense, we have achieved the milestone, since over the last year of running we have had sustained reconstruction and stripping at the UK T1. However, the recent load problems on the Tier 1 seem to have been caused by workflows consisting of simultaneous reconstruction, merging and user analysis. The recent steps taken to beef up the disk servers and SRMs hopefully mean that the T1 is more robust, but we won't really know this until we re-process this year's data.
1.2.19 | GridPP4 plan for future support beyond March 2011 | Glenn Patrick | GridPP4 proposal submitted. Outcome received August 2010. | Dec-09 | Submitted Feb 10
1.2.24 | Ensure transfer of users to DIRAC3 WMS | Mike Williams | The old DIRAC2 WMS was turned off in mid January, having achieved a full transfer to DIRAC3. | Jan-09 | Dec-08 | This work was done in collaboration with PG student Will Reece.
1.2.25 | Enhance DIRAC3 backend through use of native DIRAC3 API | Mike Williams | Prototype up and running. | Apr-09 | Jun-09 | As the project has become more ambitious, it will take slightly longer. See new milestone below.
1.2.26 | Modify Ganga-to-DIRAC communication to work through a client-server process | Mike Williams | Release done and working | Jul-09 | Sep-09 | All functionality is now available.
1.2.27 | Support new use cases arising from analysis of data coming from the detector | Mike Williams | On track | Dec-09 | Dec-09 | This is ongoing. It will move in as a milestone.
1.2.28 | Integrate better with LHCb s/w installation process | Ulrik Egede | Default release process now | Dec-09 | Feb-10 | This should make user support easier.
1.2.29 | Improved configuration of ROOT application | Mike Williams | | Feb-10 | Mar-10 | The part that is within LHCb responsibility has been implemented.
1.2.30 | Extensive testing of new job repository before it becomes default | Mike Williams | The default repository now | Dec-09 | Feb-10
1.2.31 | Introduce timestamps in Ganga | Mike Williams | | Feb-10 | Feb-10 | Integration performed following initial work by summer student.
1.2.32 | Make dataset definitions more flexible | Mike Williams | | Mar-10 | Mar-10
1.2.33 | Integrate XML job summary from DIRAC into Ganga | Stuart Paterson | | Jun-10 | Aug-10 | Introduced in release 5.5.12 of Ganga.
1.2.34 | Fully support the use of ETC datafiles on the Grid | Stuart Paterson | | Aug-10 | | Suspended, as LHCb has realised that these files are not well supported by the whole analysis chain.

1.3 CMS (owner: Dave Colling)

Overall comment: Not the UK's best quarter at supporting CMS, as lots of minor things went wrong and did not paint the UK, especially the T1, in a great light. These are not properly reflected in these outdated metrics. However, as always seems to be the case, the UK managed to have all the things that went wrong happen when CMS was not taking data, so it did not really matter. The data-taking period has been very good indeed.

Metric no. | Description | Source | Owner | Target | Q111 | Comment
1.3.1 | T0 - RAL T1 transfer rate | PhEDEx | Dave Colling | CCRC targets | OK
1.3.2 | T0 - RAL T1 transfer quality | PhEDEx | Dave Colling | CCRC targets | OK
1.3.3 | T1 - T1 (where RAL is one of the T1s) transfer rate | PhEDEx | Dave Colling | CCRC targets | OK
1.3.4 | T1 - T1 (where RAL is one of the T1s) transfer quality | PhEDEx | Dave Colling | CCRC targets | OK
1.3.5 | Transfer rates to the UK Tier 2s | PhEDEx | Dave Colling | CCRC targets | OK | Some issues with Brunel's network; however, an upgrade plan is in progress.
1.3.6 | Time between data arriving at RAL and being processed | Monitoring at Tier-1 | Dave Colling | 1 day | OK
1.3.7 | Tape migration rate | Monitoring at RAL | Dave Colling | 250 MB/s | OK
1.3.8 | Data loss at Tier-1 | | Dave Colling | <0.1% over the year | OK
1.3.9 | Data availability at Tier-1 | Failed jobs from Dashboard | Dave Colling | >99% | Close to target | Actually ~90%, but second best in CMS, so we cannot really complain.
1.3.10 | CMS SAM and availability tests at Tier-1 | Central CMS monitoring | Dave Colling | >80% | OK | OK, but should really now be the readiness
1.3.11 | CMS SAM and availability tests at Tier-2s | Central CMS monitoring | Dave Colling | >80% | OK | Mixed bag: generally OK, but with some problems at sites.
1.3.12 | Data loss at Tier-2s | | Dave Colling | <1% over the year | Not OK | Not really OK: some unique data was lost at Imperial. Steps have now been put in place to make sure that it does not happen again.
1.3.13 | Failed analysis jobs | Dashboard | Dave Colling | <10% | OK (or very near to OK at all UK sites)
1.3.14 | Failed production jobs | Dashboard | Dave Colling | <10% | OK | As above
1.3.15 | Provide an appropriate level of resources to CMS | CRB | Dave Colling | 1 Tier-1 and 2 Tier-2s | | As in the previous quarter. Very happy generally.

1.4 Other experiments (owner: Glenn Patrick)
Metric no. | Description | Source | Owner | Target | Q111 | Comment
1.4.1 | CPU efficiency for "other" experiments | Tier 1 efficiency statistics | Glenn Patrick | 75% | 71.9% | Dominated by ALICE (68.8%), and then H1 (91%), Pheno (98.1%) and T2K (77.2%)
1.4.2 | Fraction of CPU time used by "other" experiments | Tier 1 from UB schedule; UK Grid from EGEE accounting | Glenn Patrick | T1 >5%; Grid >5% | T1 = 15.1% CPU used; Grid = 9.5% of norm. CPU | ALICE and other LHC experiments were not taking data for most of this quarter.
1.4.3 | % of jobs reported by any experiment as failing due to Grid | Experiment-UB reporting; survey for GridPP26 talk | Experiments | | <50% in part of quarter (Pheno comment) | No MINOS numbers available (effort). Anecdotal report by Pheno of efficiency < 50%.
1.4.4 | Number of new user groups using the T1/UK Grid | UB/T1 statistics | Glenn Patrick | 1 per year | 0 | H1 is the largest user of the T1 outside the LHC experiments.
1.4.5 | Number of issues dealt with by user support posts | Janusz M., Stephen B. | Glenn Patrick | >1 per quarter | 2+ | Stephen B.: GGUS tickets and emails (esp. capacity publishing); some consultancy on GLUE schema for EMI; papers submitted from CHEP talks. Janusz M.: support to SuperNemo LFC; NA62: ported MC program to a CentOS5 machine at IC, enabled NA62 VO at IC, working on Grid job script to install NA62 s/w.

Milestone no. | Description | Evidence | Owner | Due date | Date complete | Comment
1.4.6 | User support questionnaire 2008/2009 | Report published on UB website. | Glenn Patrick | Apr-10 | 30.04.2009 | Combined with T1 VO Support Survey (2008-2009).
1.4.7 | Satisfaction questionnaire for other experiments 2008/2009 | Report published on UB website. | Glenn Patrick | Apr-10 | 30.04.2009
1.4.8 | User support questionnaire 2009/2010 | Report published on UB website and GridPP news item released. | Glenn Patrick | Dec-10 | 10.09.2010 | Combined with T1 VO Support Survey (2009-2010). The delay was due to waiting for responses from GPD experiments.
1.4.9 | Satisfaction questionnaire for other experiments 2009/2010 | Report published on UB website and GridPP news item released. | Glenn Patrick | Dec-10 | 10.09.2010
1.4.10 | User support questionnaire 2010/2011 | | Glenn Patrick | Mar-11 | Suspended to GridPP4.
1.4.11 | Satisfaction questionnaire for other experiments 2010/2011 | | Glenn Patrick | Mar-11 | Suspended to GridPP4.
Comment on 1.4.10 and 1.4.11: Given the above delays, assume these won't appear until later in 2011 (i.e. likely GridPP4).

2.1 Operations (owner: Jeremy Coles)
Metric no. | Description | Source | Owner | Target | Q111 | Comment
2.1.1 | Fraction of UK sites in Production | Manual - information available from GOCDB | Jeremy Coles | Number of GridPP sites in certified status. Require 100% | 100%
2.1.2 | Number of supported VOs | Manual - Tier-2 quarterly reports and Tier-1 | Jeremy Coles | Sum of unique VOs supported across GridPP sites - target 4 LHC + 5 other | 50 | Glasgow has added new VOs (such as dames.org.uk and enroller.org.uk) to those it supports.
2.1.3 | Fraction of HEPSPEC06 used | From APEL accounting | Jeremy Coles | Jobslots used / jobslots provided - 80% | 56% | The walltime figure this month is 64%.
2.1.4 | GridPP CPU available | Manual sum over entries in quarterly reports | Jeremy Coles | Total GridPP KSI2K nominally available at the end of the last quarter. Target WLCG pledge | 195507 | Total HEPSPEC06 is above pledged amounts for each Tier-2.
2.1.5 | GridPP disk storage available | Manual sum over entries in quarterly reports | Jeremy Coles | Total TB of disk storage nominally available from GridPP at the end of the last quarter. Target WLCG pledge | 15203 | Each Tier-2 meets its WLCG pledge.
2.1.6 | Job success rates | Based on the user jobs submitted by Steve (http://pprc.qmul.ac.uk/~lloyd/gridpp/uktest.html) | Jeremy Coles | 95% | 96% | Bristol no longer supports the jobs from SL. Both Bristol and Birmingham are therefore not included in this calculation.
2.1.7 | UK contribution to LHC experiments | Manual sum over data provided by accounting portal | Jeremy Coles | Percentage of jobs run by LHC VOs processed in UK in last quarter. Target is as pledged shares (11%?) | 17.50% | Figures still do not account for US contribution. The experiment-derived figures should be used. This number is for all Grids accounted in APEL.
2.1.8 | Deployment team meetings | Manual review of UK agenda page | Jeremy Coles | Deployment team meetings take place on average biweekly | OK
2.1.9 | UK-wide deployment support active | Manual review of UK agenda page | Jeremy Coles | 10 per year | OK
2.1.10 | GridPP deployment web-pages up to date | Web-page time stamps indicate updates | Jeremy Coles | Static deployment web-pages updated within last 6 months | OK | Some pages, such as the security area, links and top-level descriptive pages, have been updated. A review of the wiki content and user areas has yet to be done. New areas for the ops team need to be populated to cover the team tasks.
2.1.11 | Training needs addressed | | Jeremy Coles | Sysman meetings and training events held: 2 per year | Currently OK
2.1.12 | GridPP site response to tickets | | Jeremy Coles | Number of flagged tickets <3 | 0
2.1.13 | Number of sites on VO blacklists | | Jeremy Coles | Less than 10 site days per quarter | <10% | The figure is below 10% but, for the reasons mentioned in Q4, is now inaccurate as a measure.

2.2 Security (owner: Tony Doyle)
Metric no. | Description | Source | Owner | Target | Q111 | Comment
2.2.1 | Number of Tier 2 security incidents in the last quarter | | Mingchao Ma | <3 | 0 | No UK incident was reported in this quarter; two incidents were reported to EGI CSIRT, but no UK site was affected.
2.2.2 | Average quality of response to Tier 2 security incidents in last quarter | | Mingchao Ma | 2 | n/a | No incident was reported in this quarter.
2.2.7 | Handling of security vulnerabilities reported in the last quarter | e-mail 15/08/08 | Linda Cornwall | 2 | 8 | 8 new Grid middleware vulnerabilities reported.
2.2.19 | Proportion of requests for new features or changes to GridSite met | | Andrew McNab | 50% | 92.00%
2.2.20 | Number of non-GridPP projects using GridSite | | Andrew McNab | 1 | 16
2.2.21 | Number of GridSite downloads | | Andrew McNab | 100 | 2484
Milestone no. | Description | Evidence | Owner | Due date | Date complete | Comment
2.2.3 | Complete SSC3 at UK Tier 2s | https://www.gridpp.ac.uk/security/ssc/ssc3/index.html | Mingchao Ma | 31-Nov-08 | 24-Dec-08 | (1) Due to the widespread security incident in August 2008 and the delay of the SSC3 toolkit, SSC3 at UKI was delayed by about one month. (2) The results and scores, together with the final report (draft), are available at https://www.gridpp.ac.uk/security/ssc/ssc3/index.html
2.2.4 | Security review of Tier 2 sites | | Mingchao Ma | 30-Sep-09 | 29-Jan-10 | A security questionnaire (http://www.gridpp.ac.uk/security/document/Site_Security_Review.pdf) was sent to all Tier 2 sites on 8 Dec. 2009. By now, most sites have completed the questionnaire; the rest (2 sites) will complete it today.
2.2.5 | GridPP security workshop | http://hepwww.rl.ac.uk/sysman/June2009/agenda.html | Mingchao Ma | 31-Jul-09 | 1-Jul-09 | The workshop was held at RAL on 1 July and was attended by about 40-50 system admins. Two external speakers, one from Oxford University CSIRT and another from JANET CSIRT, were invited to give talks on incident handling.
2.2.6 | GridPP Tier 2 security survey and report | | Mingchao Ma | 31-Jul-11 | | It has been agreed that this milestone is merged with milestone 2.2.12.
2.2.8 | Security Assessment plan | https://edms.cern.ch/document/929864/1 | Linda Cornwall | 31-Jul-08 | 8/21/2008 | Recommended for AMB approval
2.2.9 | Security Vulnerability and risk management strategy | https://edms.cern.ch/document/988573/1 | Linda Cornwall | 31-May-09 | 7/15/2009 | Currently in review (updated 15/07/09: review complete).
2.2.10 | Security Policy integration | https://edms.cern.ch/document/1055025/1 | Dave Kelsey | 31-Jan-10 | 2/10/2010 | EGEE Milestone MSA1.11 final draft completed on 21/12/2009. The final report from the reviewer was received on 10/2/2010, so the milestone was completed then.
2.2.11 | Review of GridPP Security | https://edms.cern.ch/document/1058629/2 | Dave Kelsey | 30-Jun-10 | 5/4/2010 | This EGEE deliverable was a complete review of all operational, policy and middleware security in EGEE (and therefore GridPP too). Linda Cornwall, Mingchao Ma and David Kelsey all contributed to this review.
2.2.12 | Security recommendations for the future | http://www.gridpp.ac.uk/docs/GridPP-PMB-153-SecurityRecommendations.pdf | Dave Kelsey | 31-Mar-11 | 3/31/2011 | The milestone was completed on time (the recommendations were in place and used in the planning of GridPP4 and our involvement in EGI), but the production of this document was delayed by two months (until 31 March 2011) because of the funding uncertainties following CSR2010. We were waiting for notification from STFC of the full GridPP4 award (this happened on 11 March 2011) and also hoping that more information about future funding for the UK NGS beyond October 2011 would be forthcoming (this did not happen and is now not expected before May 2011).
2.2.13 | GridSite software report | https://www.gridsite.org/reports/gridsite-09.pdf | Andrew McNab | 30-Mar-09 | 6/30/2009
2.2.14 | Security report | https://www.gridsite.org/reports/voms-sec-09.pdf | Sergey Dolgobrodov | 30-Mar-09 | 6/30/2009
2.2.15 | GridSite software report | https://www.gridsite.org/reports/gridsite-2010.pdf | Andrew McNab | 30-Mar-10 | 03/31/10
2.2.16 | Security report | https://www.gridsite.org/reports/voms-sec-2010.pdf | Sergey Dolgobrodov | 30-Mar-10 | 03/31/10
2.2.17 | GridSite software report | https://www.gridsite.org/reports/gridsite-2011.pdf | Andrew McNab | 30-Mar-11 | 03/30/11
2.2.18 | Security report | https://www.gridsite.org/reports/voms-sec-2011.pdf | Sergey Dolgobrodov | 30-Mar-11 | 03/31/11
2.2.22 | Document produced: EGI PM3 | https://documents.egi.eu/secure/ShowDocument?docid=47 | Linda Cornwall | 31-Jul-01 | 7/31/2010 | EGI operational security milestone defining the EGI vulnerability issue handling process

2.3 Network (owner: Tony Doyle)
Metric no. | Description | Source | Owner | Target | Q111 | Comment
2.3.1 | Represent GridPP interests for all LHCOPN activities | | Robin Tasker | 80% of meetings attended | | Attend LHCOPN meetings and LHCONE discussions
2.3.2 | Act as Chairman of the RAL Network Technical Design Authority | | Robin Tasker | All meetings attended | | Next meeting scheduled for May
2.3.3 | Represent GridPP interests nationally, for example with JANET(UK), at the JCN, the JANET Technical Design Authority and the JANET Optical Steering Group, and internationally as appropriate | | Robin Tasker, Peter Clarke | 80% of meetings attended (2 meetings per year)
2.3.4 | Provide "network consultancy" to GridPP as required | | Robin Tasker | Reports on each activity
2.3.5 | Maintenance of Gridmon nodes and associated software | | Robin Tasker | 6-monthly status reports | | The move of Gridmon to Glasgow has not been completed because of other work pressures. This will finish within the next 2 months.
2.3.6 | Up-to-date network plans available at each site | | Robin Tasker | Plans reviewed each year
2.3.7 | GridPP annual networking forward look available | | Peter Clarke | Plans reviewed each year | | Document delivered

2.4 Storage and data management (owner: Tony Doyle)
Metric no. | Description | Source | Owner | Target | Q111 | Comment
2.4.1 | Correctly defined space tokens as per experiments' requirements | | Brian Davies | 90% | 100.00% | New space allocation at T2s in Feb, but no new space tokens. In fact we got rid of one, MCDISK.
2.4.2 | Success rates for experiment data transfers | | Matt Hodges | 70% | 86.80%
2.4.3 | Rate of data deletion from site for VOs | | Andrew Elwell | 1 Hz
2.4.4 | Data transfer rates from T2s satisfy experiment requirements | | Matt Hodges | As defined by LHC VOs | ATLAS 151.99 MB/s; CMS 184.58 MB/s; LHCb 5.46 MB/s; T2K.org 9.80 MB/s; Total 351.88 MB/s
2.4.5 | CHEP and/or AHM paper produced each year describing GridPP developments and innovations in the data management and storage area | | Jens Jensen, Greig Cowan, Andrew Elwell, Graeme Stewart, Peter Love, Shaun DeWitt, Chris Kruk, Brian Davies et al. | One per year | | New year... reset counter.
2.4.12 | Engage with storage and data management experts within WLCG to reinforce GridPP's recognised competence in this area (talks given) | | Jens Jensen, Greig Cowan, Andrew Elwell, Graeme Stewart, Peter Love, Shaun DeWitt, Chris Kruk, Brian Davies et al. | Number of talks given = 2 per quarter | | Brian presented to GDB in Jan. Lots of talks given at GridPP27 and the storage workshop.
2.4.13 | Engage with storage and data management experts within WLCG to reinforce GridPP's recognised competence in this area (meetings attended) | | Jens Jensen, Greig Cowan, Andrew Elwell, Graeme Stewart, Peter Love, Shaun DeWitt, Chris Kruk, Brian Davies et al. | Number of meetings attended = 2 per quarter | 2 | Wahid met DPM developers in March and discussed collaboration. Brian attended the GDB in Jan (remotely).
2.4.14 | Work with experiment data management experts on relevant problems for LHC experiments (good contacts with DDM team for ATLAS; need to work on other VOs) | | Jens Jensen, Greig Cowan, Andrew Elwell, Graeme Stewart, Peter Love, Shaun DeWitt, Chris Kruk, Brian Davies et al. | Report back from each experiment on work conducted | | Brian worked with CMS on WAN tuning of their data transfers. He also worked with ATLAS on their data cleanup. Sam has worked with NeISS, as they are a new VO and need support (but of course they are not WLCG).
2.4.15 | Blog about work with experiment data management experts on relevant problems for LHC experiments | | Jens Jensen, Greig Cowan, Andrew Elwell, Graeme Stewart, Peter Love, Shaun DeWitt, Chris Kruk, Brian Davies et al. | 6 blog entries per quarter | 6 | Brian: 4 (mostly Dave but also ATLAS data); Jens: 2, mostly cloud storage related.
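Metric 2.4.3 sets a 1 Hz target for data deletion, and milestone 2.4.8 calls for a tool to test bulk deletion rates. A minimal sketch of such a measurement follows, assuming the site wraps its own storage client (for example an SRM or DPM command) in a Python callable. `measure_deletion_rate`, `fake_delete` and the file paths are hypothetical illustrations; no real storage API is called.

```python
import time
from typing import Callable, Iterable


def measure_deletion_rate(paths: Iterable[str],
                          delete: Callable[[str], None]) -> float:
    """Delete each path with `delete` and return the achieved rate in files/s.

    `delete` is a hypothetical hook standing in for the site's storage client;
    it should raise an exception if a deletion fails.
    """
    count = 0
    start = time.monotonic()
    for path in paths:
        delete(path)
        count += 1
    elapsed = time.monotonic() - start
    # Guard against a zero interval on very fast (e.g. simulated) deletions.
    return count / elapsed if elapsed > 0 else float("inf")


def fake_delete(path: str) -> None:
    """Simulated deletion taking roughly 10 ms per file, for illustration only."""
    time.sleep(0.01)


if __name__ == "__main__":
    TARGET_HZ = 1.0  # target from metric 2.4.3
    test_files = [f"/hypothetical/vo/deletion-test/file{i:03d}" for i in range(20)]
    rate = measure_deletion_rate(test_files, fake_delete)
    verdict = "meets" if rate >= TARGET_HZ else "is below"
    print(f"{len(test_files)} files deleted at {rate:.1f} Hz, which {verdict} "
          f"the {TARGET_HZ} Hz target")
```

Timing the whole batch with a monotonic clock, rather than each call, keeps the measurement close to what a bulk-deletion test would observe from the client side.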
Milestone no. | Description | Evidence | Owner | Due date | Date complete | Comment
2.4.6 | Provide input into GLUE 2.0 schema discussions about useful and practical information which can be delivered from storage systems | | Jens Jensen | 30-Jun-08 | | 08Q2: http://storage.esc.rl.ac.uk/standards/ogf23-srm.doc
2.4.7 | Deliver a tool for sites to report on storage usage per user in a VO for DPM | See talk given at Storage workshop July 2009 | Greig Cowan | 30-Sep-08 | | Part of DPM toolkit, managed by Sam
2.4.8 | Develop a tool to test bulk deletion rates | | Brian Davies (was Andrew Elwell) | 30-Sep-08
2.4.9 | Study integration of experiment data transfer monitoring with SAM and Nagios alarm systems on site | See talk given at Storage workshop July 2009 | Andrew Elwell | 30-Mar-09 | | Several studies have been done, and we look at it after most larger challenges, but the most complete study so far was done by Kashif Mohammad at Oxford: see workshop.
2.4.10 | Deliver a tool for sites to report on storage usage per user in a VO for dCache | See report | Greig Cowan | 30-Mar-09 | | This is completed (by DESY) but only works for dCache installations that have migrated to Chimera.
2.4.11 | Help develop a toolkit for discovering files orphaned from the storage system namespace | Toolkit - now maintained by Sam | Greig Cowan | 30-Sep-09 | | Completed - part of the toolkit
2.4.16 | Study data access patterns once LHC data is live to suggest improvements in data placement policies across the grid (building on OptorSim work) | http://www.gridpp.ac.uk/wiki/FileAccessPatterns (and links from there) | Andrew Elwell, Graeme Stewart | Talk at the end of the first run | | See also http://www.gridpp.ac.uk/gridpp24/Filesystems-Wahid.pptx

2.5 Middleware support (owner: Tony Doyle)
Metric no. | Description | Source | Owner | Target | Q310 | Comment
2.5.7 | Quarterly reports on R-GMA bugs and their classification | | Steve Fisher | Report each quarter | 1 | See last sheet
2.5.8 | Number of software packages using R-GMA | | Steve Fisher | 1 | 2 | APEL (still) and Grid Ireland's Intrusion Detection work
Description Evidence Owner Due date complete Comment The indicated completion date is Middleware releases produced in the first year and report on status of multi- https://edms.cern.ch/d when the EGEE Activity 2.5.5 platform support ocument/992930/ Steve Fisher Mar-09 May-09 Management Board approved it The indicated completion date is Middleware releases produced in the second year and update on operation https://edms.cern.ch/d when the EGEE Activity 2.5.6 and multi-platform support ocument/1072430/ Steve Fisher Apr-10 May-10 Management Board approved it 3.1 Front end systems Owner Andrew Sansum Number Tier-1 no. Description Source Owner Target Q111 Comment 3.1.1 3.1.1 Availability of LFC service Catalin Condurache 99% 99% 3.1.2 3.1.2 Availability of LHCb LFC service Catalin Condurache 99% 99% 3.1.3 3.2.1 Availability of WMS service Catalin Condurache 99% 100% Catalin Condurache 98%? 100% 3.3.1, Availability of LHCb, ALICE and CMS VO 3.1.5 3.3.2, 3.3.3 boxes 3.1.6 3.4.1 Availability of R-GMA Registry service Catalin Condurache 99% N/A No longer relevant 3.1.7 3.5.1 Availability of RGMA service Catalin Condurache 95% N/A [2010Q4]Now decommissioned 3.1.8 3.6.1 Availability of CE service Derek Ross 99% 99% 3.1.9 3.10.3 SAM availability of FTS service Matt Hodges 99% 99% 3.1.10 3.11.1 SAM availability of MyProxy service Matt Hodges 99% 100% 3.1.11 3.12.1 Availability of UI service Matt Hodges 95% 100% 3.1.12 3.13.1 SAM availability of site BDII service Matt Hodges 99% 100% 3.1.13 3.13.2 SAM availability of toplevel BDII service Matt Hodges 99% 100% 3.1.15 ? 3D service for ATLAS and LHCb availability 100% Date MilestonesTier-1 no. Description Evidence Owner Due date complete Comment 3.1.20 3.2.3 Resilient national WMS Service in place Catalin Condurache 31-Jul-08 27-Aug-08 on-target 3.1.21 4.1.8 Provide site dashboard for experiments. Gareth Smith 3-Sep-09 3-Sep-09 1-Nov-09 [2009Q3]RAL networking had problems making the system work through the site firewall. 
Progress is LHC Monitoring infrastructure operational at slowly being made but not all functionality is in place 3.1.22 7.4.4 RAL Robin Tasker 1-Sep-08 yet. 3.2 Resource delivery Owner Andrew Sansum Number Tier-1 no. Description Source Owner Target Q111 Comment Extractable from Andrew Sansum 93% 97% [2010Q3]5% addition applied to August data to correct for Oxford Nagios WLCG Service Availability http://lcg.web.cern.ch/LCG/ problem Target (set lower by WLCG MB/availability/site_reliability [2009Q2]Impacted by R89 migration, scheduled CASTOR development work 3.2.1 1.2.2 than MoU) .pdf (migration of core RAID hardware) and extended network outages. Meet WLCG MoU target Andrew Sansum 95% 99% response time for operational problems (2 hours in prime shift and 12-48 hours outside [2009Q3]We do not monitor 12 hour response - we only record how we meet 3.2.2 1.2.3 prime shift) our own target response time (2 hours day or night). Fraction of WLCG MoU John Gordon 100% 113% 3.2.3 1.2.7 commitment for CPU Metric 0.106 or 0.107 and 63% Fraction of available T1 GOC accounting for LCG. 3.2.4 ? KSI2K used in quarter CPU time 20-95% Andrew Sansum 2/year 0 3.2.5 1.4.1 Number of Security Incidents % Time [weighted by Jeremy Coles 1% resource share] on VO 3.2.6 1.4.2 blacklists Andrew Sansum 1 0 Number of Incidents reaching level 3 in the disaster 3.2.7 1.4.3 management system Andrew Sansum 0 0 Number of Incidents reaching level 4 in the disaster [2010Q1]No incident was formally logged as level 4, but February's SAN 3.2.8 1.4.4 management system problems should have been and this report includes these in the assessment. Matt Hodges 100% 101% [2009Q3]Does not include September (UB allocation not set) also see Q2 % met of normalised UB note. [2009Q2]UB allocations did not fully reflect resources reported as 3.2.9 1.6.3 allocation for CPU being available; it was impossible to meet the allocations.
3.2.10 1.6.4 Job Efficiency (CPU/Wall) Matt Hodges 85% 89% Metric 0.106 or 0.107 and Matt Hodges 80% 77% GOC accounting for LCG. 3.2.11 1.6.5 Farm Occupancy Wall clock time [2009Q3]Waiting on UB allocation - no September data Percentage of GRIDPP3 Andrew Sansum 93% 98% 3.2.12 2.1.3 Staff in Post Andrew Sansum 100% 3-May Quarterly milestones/metrics report to GRIDPP available 3.2.13 2.1.4 within 1 month of quarter end Number of GGUS tickets Matt Hodges <30/month 19 [2009Q3]Because of helpdesk changes, we no longer receive enough 3.2.14 4.1.1 handled information to track this. Need to review what to do about it. Number of GGUS tickets not Matt Hodges Two hours 3 responded to within 2 working hours (any time for alarm 3.2.15 4.1.2 tickets) [2008Q4]GGUS is not producing this data Date Milestones Tier-1 no. Description Evidence Owner Due date complete Comment General Incident Response Document version 1.5 System exercised to level 3 Disaster and Business (of 4 levels) with PMB 3.2.16 1.4.1 Continuity Plan Available involvement. Andrew Sansum 30-Apr-08 14-Dec-09 [2009Q3]Major incident contingency plans presented to the GRIDPP review 3.2.17 1.4.4 Ready for 2008 running Andrew Sansum 31-Aug-08 31-Aug-08 [2008Q3]The service was ready for data taking. DM system is fully operational as presented at Disaster Plan fully Cambridge GRIDPP 3.2.18 1.4.5 implemented meeting Andrew Sansum 30-Jan-09 14-Dec-10 [2009Q3]This milestone WAS successfully completed at the end of September following certification of the Tier-1 during STEP and the end of our development cycle at the end of September. Subsequently, after the end of the quarter, new operational problems emerged; however, all the work required in the project plan for this milestone was completed. Propose 3.2.19 1.4.7 Ready for 2009 running Andrew Sansum 1-Aug-09 30-Sep-09 closing it as complete.
3.2.20 1.4.9 Ready for 2010 running Andrew Sansum 16-Mar-10 Delete as n/a - no separate 2010 running Recruitment of additional [2009Q2]Just 0.5 FTE of effort remains to be recruited. interviews have been 3.2.21 2.1.1 GridPP3 posts Complete Andrew Sansum 30-Mar-08 5-Oct-09 completed and an offer is being prepared Assign experiment coordinator (depends on PPD recruitments, or existing staff 5-Feb-09 Matt Hodges assigned to this role. 22-jul-08 Still pending on PPD if to be done prior to second recruitment of experiment support staff. I’d like to see who the team are 3.2.22 4.1.3 phase of CCRC08). Andrew Sansum 30-Jun-08 1-Nov-08 before nominating a coordinator 22-jul-08 Have some information from Atlas, which is not supposed to be used outside Atlas as it may change. Am talking to CMS UK to get Experiment requirements for information out of central CMS. Have received information from LHCb 3.2.23 4.1.5 first data-taking understood. Derek Ross 30-Jun-08 7-Aug-08 04-jul JG to raise with GDB and agree new date Ensure well-defined experiment contacts in place 22-jul-08 Will try and do Tier-1 side this week; suggest changing expected (at Tier-1 and experiment data to August 1 3.2.24 4.1.7 ends). Matt Hodges 30-Jun-08 27-Oct-08 2-jul-08 On agenda for quarterly UB meeting on 2008-06-24. Review of overall effectiveness of experiment support in conjunction with the experiments. Expect to [2009Q1]Review underway - waiting for late experiment response. Expect 3.2.25 4.1.4 do this via the User Board. Matt Hodges 31-Dec-08 30-Apr-09 write-up to be completed by May. We have been unable to capture the experiments' requirements in a formal document as they are only partially defined and continue to evolve. However, we have a regular weekly meeting with the experiments where they are able to raise requirements; we are also taking input from the WLCG MB and Experiment requirements for GDB. As far as we are aware our published program of work meets stated 3.2.26 4.1.9 2009 running.
Catalin Condurache 1-May-09 30-Jul-09 needs. 3.3 Hardware procurement and deployment Owner Andrew Sansum Milestones Tier-1 no. Description Evidence Owner Due date Date complete Comment Tier-1 able to meet 2008 WLCG MoU 3.3.1 1.2.4 resource commitment John Gordon 1-Apr-08 30-Apr-08 Disc and CPU (only) 2009 Capacity Tape media purchase and robot purchase defined separately 3.3.2 1.2.5 Procurements Started Andrew Sansum 26-May-08 26-May-08 in 7.3 Closed partially 1-Feb-10 completed. GridPP did not intend to fully meet the Tape MoU commitment owing to the reduced tape [2009Q4]Able to meet disk commitments - however GRIDPP requirements caused never planned to meet tape MoU commitment and we will be Tier-1 able to meet 2009 WLCG MoU by the changed LHC unable to do so until T10KB service is operational. Should be 3.3.3 1.2.6 resource commitment schedule. Andrew Sansum 31-Aug-09 able to meet 2010 commitment. 3.3.4 1.2.7 2010 capacity procurements Started Andrew Sansum 1-Jun-09 20-Aug-09 Tier-1 able to meet 2010 WLCG MoU 3.3.5 1.2.8 resource commitment Andrew Sansum 1-Jun-10 1-Jul-10 WLCG changed this deadline to be June. We are on track 3.3.6 1.2.9 2011 capacity procurement Started Andrew Sansum 1-Jun-10 15-May-10 7-Apr-11 [2011Q1]CPU was a few days late following a deployment problem. GRIDPP plan to meet tape MoU using buy on demand. [2010Q4]Owing to late availability of T10KC hardware we are unlikely to be able to reach tape MoU commitments until August. However, actual usage is well Tier-1 able to meet 2011 WLCG MoU below current available capacity and this is unlikely to be a 3.3.7 1.2.10 resource commitment Andrew Sansum 30-Mar-11 problem. 22-jul-08 GRIDPP have made a proposal to STFC. We therefore have a draft financial plan and could start to construct a spend plan. I may be able to hit the expected date for this as all relevant info is now available.
Main 3.3.8 2.1.2 2008 purchasing plan available Andrew Sansum 31-Jul-08 28-Oct-08 constraint is a little effort to collate input 3.3.9 6.3.2 2008 Disk Tender Started Martin Bly 31-May-08 31-May-08 15-May-10 [2009Q4]Needs to be started by 1 May 2010 for December 3.3.10 7.1.2 2010 Disk Tender Started Martin Bly 1-May-10 delivery 31-Jan-11 3.3.11 7.1.3 2010 Disk hardware Accepted and bill paid Martin Bly 1-Dec-10 [2010Q4]Hardware delivered - acceptance tests running 14-Sep-09 90% of capacity is now in SL5. Although the SL4 service has 3.3.12 7.1.7 Migration to 64-bit Martin Bly 30-Sep-09 not yet been closed - this milestone is effectively complete. 3.3.13 7.1.9 2008 CPU Hardware Received Martin Bly 1-Sep-08 15-May-09 Delayed pending R89 availability. 3.3.14 7.1.12 2008 Disk Hardware Received Martin Bly 1-Sep-08 15-May-09 Delayed pending R89 availability. [2009Q3]Problems with acceptance have delayed this. Not 3.3.15 7.1.13 2008 Disk hardware Accepted and bill paid Martin Bly 31-Aug-09 12-Feb-10 likely to be complete before December 2009. 3.3.16 7.1.14 2009 Disk Tender Started Martin Bly 1-May-09 30-Jul-09 Rescheduled to reflect WLCG High level milestones 5th Nov 2010 [2010Q1]Ongoing problems but expected to pass acceptance 5th November [2009Q4]Expect to complete this in April [2009Q2]In light of plan for phased delivery this cannot be completed until June 2010 3.3.17 7.1.16 2009 Disk hardware Accepted and bill paid Martin Bly 30-Jun-10 [2009Q1]Propose we change this to be 31-Feb10 3.3.18 7.2.1 2008 CPU Tender Started Martin Bly 31-May-08 31-May-08 20-Aug-09 Rescheduled to commence once we have cleared the previous tender from the system. Leads to an early January 3.3.19 7.2.2 2009 CPU Tender Started Martin Bly 1-May-09 delivery. [2009Q2]Not likely to be complete until April/May following a February delivery. Delay was caused by late agreement of 3.3.20 7.2.3 2009 CPU hardware Accepted and bill paid Martin Bly 31-Feb-10 1-Jun-10 financial plan with STFC.
15-May-10 [2009Q4]Needs to be started by 1 May 2010 for December 3.3.21 7.2.4 2010 CPU Tender Started Martin Bly 1-May-10 delivery 3.3.22 7.2.9 R89 Available for Installation Martin Bly 1-Dec-08 24-Apr-09 3.3.23 7.2.13 2008 CPU hardware Accepted and bill paid Martin Bly 31-Aug-09 31-Aug-09 Rescheduled to reflect WLCG High Level Milestones FY08/09 Tape media capacity procurement 3.3.24 7.3.5 started Tim Folkes 1-Aug-08 1-Nov-08 [2008Q3]Waiting for GRIDPP approval of spend plan FY08/09 Tape robot capacity procurement 3.3.25 7.3.6 tender out David Corney 1-Aug-08 30-Sep-08 Added 4-jul-08 30-Oct-10 We have not wished to migrate the remaining critical servers Atlas centre phased out other than for to the UPS room until the UPS issue was resolved. Only a 3.3.26 8.2.1 emergency backup servers Martin Bly 31-Mar-10 few services remain in the ATLAS centre. 11-Mar-11 3.3.27 7.2.5 2010 CPU hardware Accepted and bill paid Martin Bly 1-Dec-10 [2010Q4]Hardware delivered - acceptance tests running 3.3.28 8.1.1 Tier1 fully operational in R89 Martin Bly 31-Mar-09 6-Jul-09 Expected by July 2009 [2008Q4]The migration plan was discussed at the PMB. [2008Q3]Plan to agree this in first half of November [2008Q2]Ongoing Need to 02-jul-08 Ongoing reconcile 22-apr-08 Possible inconsistencies with PO overall migration with MROG plan. Need to be investigated and clarified. Keep under close 3.3.29 8.1.3 Migration plan Agreed by GridPP Martin Bly Gantt observation. 3.3.30 ? Network upgraded Document is available on request and has been distributed to 3.3.31 R89 migration document available Andrew Sansum 30-Oct-09 1-Mar-10 the PMB.
Fraction of WLCG MoU John Gordon 100% 121% 3.4.2 1.2.8 commitment for Disk [2009Q4]Tagged green as actually at 100% since February Matt Hodges 100% 99% [2010Q3]Usage much lower than allocation. Allocation problem resolved in September 3.4.3 1.6.1 % met of UB Allocation for Tape [2009Q3]Waiting on UB allocation - no September data Matt Hodges 100% 114% [2009Q3]September allocations not set. Allocations do not match available disk. [2009Q2]UB allocations did not fully reflect resources 3.4.4 1.6.2 % met of UB Allocation for Disk reported as being available; it was impossible to meet the allocations. Fraction of available T1 Disk used 53% 3.4.5 ? in quarter 20-95% 3.4.6 3.10.1 Data imported to Tier-1 via FTS. Matt Hodges <1500TB/month 399 Data exported from Tier-1 via Matt Hodges <1500TB/month 370 3.4.7 3.10.2 FTS. Bonny Strong 99% each 99% [2010Q3] Poor August availability was corrected by 5% to allow for Oxford Nagios problem [2009Q4] Caused by major downtime at end of January (SAN problem). [2009Q3]Impacted by machine room upgrade, major cooling system failure and problems after CASTOR nameserver upgrade. [2009Q2]Impacted by CASTOR developments, unscheduled network outages and R89 migration. [2009Q1]Gradual improvements as we continue to resolve 3.4.8 5.1.1-5.1.4 CASTOR SAM tests: LHC VOs outstanding problems with CASTOR 2.1.7 David Corney <= 6 incidents/month 0.00 level severe or 3.4.9 5.1.3 CASTOR Incidents reported higher; [2009Q3]No September data Number of File-system corruptions Martin Bly 1 per month 1.00 [2010Q4]We are now gathering this data and will have a metric next 3.4.13 7.1.2 per month quarter Number of Damaged GRIDPP Tim Folkes <1 in 500 0.7 3.4.14 7.3.1 Tapes, leading to data loss [2009Q3]Missing September data. [2009Q2]Missing June data Tim Folkes >99% 100% [2009Q3]Missing September data.
Water leak and handbot problems 3.4.15 7.3.4 Reliability (Robot up-time) led to downtime. [2009Q2]Missing June data Actual data volume for LHC (tape- Tim Folkes 2792.3 3.4.16 7.3.5 based view) [2009Q3]Missing September data. [2009Q2]Missing June data 3.4.17 ? Data rates to tape >200 125 Date Milestones Tier-1 no. Description Evidence Owner Due date complete Comment Set up CASTOR Gen instance for 3.4.18 5.1.4 small experiments Bonny Strong 1-Jun-08 30-Jun-08 Waiting for diskservers from Tier1 CASTOR certification testbed 3.4.19 5.1.9 ready to use Chris Kruk 1-Aug-08 22-Jul-08 New milestone at jul-08 http://www.gridp [2008Q3]Tape service expected to end in December but disk service p.rl.ac.uk/blog/20 will probably have to continue until March 2009 22-jul-08 Ongoing, 09/02/06/so-long- LHCb and CMS migrated, Atlas have tape files remaining only, Minos and-thanks-for- soon to begin migration - this should allow us to shut down dcache- 3.4.20 5.3.1 dCache Service Ends all-the-files/ Derek Ross 31-Dec-08 1-Feb-09 tape.gridpp.rl.ac.uk and the dCache ADS interface [2010Q1]Situation reviewed. Service remains available (and free) over GRIDPP3 but propose moving PP data off (or deleting it more likely) by the end of GRIDPP3. [2009Q4]Major users migrated. It is not obvious there is significant benefit from pursuing the minor users. [2009Q3]Tape reclaim from major users has commenced. [2009Q2]This has simply not been a priority in a period of major activity. [2009Q1]A plan has been provided to the UB. Closure process has started. Service is read only. Expected to terminate read access by September 2009. 22-jul-08. Agreed with AS. Also to terminate ADS service (effectively) allowing occasional future access on a case-by-case basis. Otherwise 3.4.21 5.4.1 General ADS Service Ends David Corney 31-Mar-11 experiments migrate into CASTOR. 4.1 LondonGrid Owner Jeremy Coles Metric no.
Description Owner Target Q111 Comments 4.1.1 % of promised (by that time) disk available to GridPP Duncan Rand 100% 145% QMUL low on disk - new disk on site and racked 4.1.2 % of promised (by that time) CPU available Duncan Rand 100% 221% RHUL new CPU on site and racked Average SAM (SLL page) availability performance 4.1.3 over the last quarter Duncan Rand 95% averaged over sites in Tier-2 91% QMUL and RHUL poor Average SAM (SLL page) reliability performance 4.1.4 over the last quarter Duncan Rand 95% averaged over sites in Tier-2 96% Average SLL untargeted ATLAS test performance 4.1.5 (UK test) Duncan Rand 95% averaged over sites in Tier-2 71% UCL-HEP and RHUL low performance brings the average down 4.1.6 Average SLL SE test performance Duncan Rand 95% averaged over sites in Tier-2 90% 4.1.7 Approx. CPU utilisation (wall clock time) Duncan Rand 50% 64% 4.1.8 Approx. CPU utilisation (CPU time) Duncan Rand 50% 50% 4.1.9 Percentage of disk used Duncan Rand 20% 50% 4.1.10 Number of technical meetings held in last year Duncan Rand 8 12 Meetings held monthly 4.1.11 Number of management meetings held in last year Duncan Rand 4 0 Last management meeting Sept 2009 Tier-2 responded to all LCG problems covered by the GridPP MoU in the 4.1.12 Tier-2 meeting LCG MoU service levels Duncan Rand agreed times over the last quarter Yes Tier-2 quarterly reports available by 4.1.13 Quarterly operational performance review Duncan Rand one month after the end of the quarter Yes 100% of sites in Tier-2 upgrade to the 4.1.14 Middleware upgrading Duncan Rand timetable agreed by the DB Yes RHUL has a WAN of 0.3-0.45 Gbps (so far not a 4.1.15 Network is OK at all sites Duncan Rand SLL tests 30MB/s Yes problem but very low) 4.2 ScotGrid Owner Jeremy Coles Metric no.
Description Owner Target Q111 Comments 4.2.1 % of promised (by that time) disk available to GridPP Graeme Stewart 100% 109% 4.2.2 % of promised (by that time) CPU available Graeme Stewart 100% 172% Average SAM (SLL page) availability performance 4.2.3 over the last quarter Graeme Stewart 95% averaged over sites in Tier-2 97% Continuing improvement at Durham Average SAM (SLL page) reliability performance over 4.2.4 the last quarter Graeme Stewart 95% averaged over sites in Tier-2 98% Average SLL untargeted ATLAS test performance Glasgow LCG-CE decommission amended against 4.2.5 (UK test) Graeme Stewart 95% averaged over sites in Tier-2 100% figures supplied. 4.2.6 Average SLL SE test performance Graeme Stewart 95% averaged over sites in Tier-2 95% 4.2.7 Approx. CPU utilisation (wall clock time) Graeme Stewart 50% 67% 4.2.8 Approx. CPU utilisation (CPU time) Graeme Stewart 50% 59% Recent deployment of new disk at Glasgow has 4.2.9 Percentage of disk used Graeme Stewart 20% 25% reduced fractional usage. 4.2.10 Number of technical meetings held in last year Graeme Stewart 8 22 4.2.11 Number of management meetings held in last year Graeme Stewart 4 4 On target for year. Tier-2 responded to all LCG problems covered by the GridPP MoU in the 4.2.12 Tier-2 meeting LCG MoU service levels Graeme Stewart agreed times over the last quarter Yes Tier-2 quarterly reports available by 4.2.13 Quarterly operational performance review Graeme Stewart one month after the end of the quarter Yes 100% of sites in Tier-2 upgrade to the 4.2.14 Middleware upgrading Graeme Stewart timetable agreed by the DB Yes Durham have not provided a CREAM CE as an LCG-CE replacement. 4.2.15 Network is OK at all sites Graeme Stewart 30 MB/s Yes 4.3 SouthGrid Owner Jeremy Coles Metric no.
Description Owner Target Q111 Comments 4.3.1 % of promised (by that time) disk available to GridPP Pete Gronbech 100% 192% 4.3.2 % of promised (by that time) CPU available Pete Gronbech 100% 143% Average SAM (SLL page) availability performance 4.3.3 over the last quarter Pete Gronbech 95% averaged over sites in Tier-2 93% Average SAM (SLL page) reliability performance over 4.3.4 the last quarter Pete Gronbech 95% averaged over sites in Tier-2 96% Average SLL untargeted ATLAS test performance 4.3.5 (UK test) Pete Gronbech 95% averaged over sites in Tier-2 91% 4.3.6 Average SLL SE test performance Pete Gronbech 95% averaged over sites in Tier-2 93% Accounting problem at Cambridge being worked on, 4.3.7 Approx. CPU utilisation (wall clock time) Pete Gronbech 50% 59% pulling average down. 4.3.8 Approx. CPU utilisation (CPU time) Pete Gronbech 50% 54% As above 4.3.9 Percentage of disk used Pete Gronbech 20% 41% 4.3.10 Number of technical meetings held in last year Pete Gronbech 8 9 4.3.11 Number of management meetings held in last year Pete Gronbech 4 3 Tier-2 responded to all LCG problems covered by the GridPP MoU in the 4.3.12 Tier-2 meeting LCG MoU service levels Pete Gronbech agreed times over the last quarter Yes Tier-2 quarterly reports available by 4.3.13 Quarterly operational performance review Pete Gronbech one month after the end of the quarter Yes 100% of sites in Tier-2 upgrade to the 4.3.14 Middleware upgrading Pete Gronbech timetable agreed by the DB Yes 4.3.15 Network is OK at all sites Pete Gronbech TBD Yes 4.4 NorthGrid Owner Jeremy Coles Metric no.
Description Owner Target Q111 Comments 4.4.1 % of promised (by that time) disk available to GridPP Alessandra Forti 100% 179% 4.4.2 % of promised (by that time) CPU available Alessandra Forti 100% 502% Average SAM (SLL page) availability performance 4.4.3 over the last quarter Alessandra Forti 95% averaged over sites in Tier-2 96% Average SAM (SLL page) reliability performance 4.4.4 over the last quarter Alessandra Forti 95% averaged over sites in Tier-2 97% Average SLL untargeted ATLAS test performance 4.4.5 (UK test) Alessandra Forti 95% averaged over sites in Tier-2 93% 4.4.6 Average SLL SE test performance Alessandra Forti 95% averaged over sites in Tier-2 95% 4.4.7 Approx. CPU utilisation (wall clock time) Alessandra Forti 50% 57% 4.4.8 Approx. CPU utilisation (CPU time) Alessandra Forti 50% 54% 4.4.9 Percentage of disk used Alessandra Forti 20% 32% 4.4.10 Number of technical meetings held in last year Alessandra Forti 8 (2 per quarter) 6 4.4.11 Number of management meetings held in last year Alessandra Forti 4 3 Tier-2 responded to all LCG problems covered by the GridPP MoU in the 4.4.12 Tier-2 meeting LCG MoU service levels Alessandra Forti agreed times over the last quarter Yes Tier-2 quarterly reports available by 4.4.13 Quarterly operational performance review Alessandra Forti one month after the end of the quarter Yes 100% of sites in Tier-2 upgrade to the 4.4.14 Middleware upgrading Alessandra Forti timetable agreed by the DB Yes 4.4.15 Network is OK at all sites Alessandra Forti 30MB/s Yes 5.1 Planning Owner Sarah Pearce Date Milestones Description Evidence Owner Due date complete Comment 5.1.1 Financial plan for GridPP3 established Dave Britton Draft ProjectMap for GridPP3. GridPP3 ProjectMap exists with 5.1.2 >50% of areas defined Sarah Pearce 10-Mar-08 10-Mar-08 Complete in time for GridPP20 Final project map for GridPP3. 
More than 90% of areas 5.1.3 defined Sarah Pearce 30-Jun-08 1-Jun-08 5.1.4 Quarterly reporting system agreed for Tier-1 Sarah Pearce 30-Jun-08 Jun-08 5.1.5 Quarterly reporting system agreed for other areas PMB minutes Sarah Pearce 30-Jun-08 Jun-08 5.1.6 Hardware allocation made by UB for next period http://www.gridpp.ac.uk/eb/ Glenn Patrick Now replaced by T1 resources meeting SAM metrics review. Re-evaluate SAM availability and reliability metrics based on performance of other countries. For revising 5.1.7 MoU 1-Dec-09 Reviewed by phone with SL and JC 5.1.8 Post-GridPP planning initiated Dave Britton 1-Jan-10 11-Nov Started Nov 2009 5.1.9 Allocations calculated for round 2 of Tier-2 hardware grants Steve Lloyd 31-Oct-09 Sent to CB and agreed 5.1.10 FY09 grants for Tier-2 hardware issued Sarah Pearce 31-Mar-10 5.1.11 FY10 grants for Tier-2 hardware issued Sarah Pearce 30-Apr-10 16 July 2010 issued All 5.2 Execution Owner Sarah Pearce Metric no. Description Source Owner Target Q111 Comment >90% of GridPP 5.2.1 GridPP staff in post Quarterly reports Sarah Pearce posts are appointed <90% All posts now in place No vacant posts but staff turnover 5.2.2 Number of vacant posts (Only applies after 1-Oct-08) <=4 at T2s UK represented on the main CERN committees, providing 5.2.3 Participation in LCG management LCG web pages Sarah Pearce feedback to the UK 100% of quarterly reports received by Project Manager within 2 5.2.4 months of quarter end. 100% 5.2.5 ProjectMap is updated within 3 months of end of each quarter 3 months Financial spreadsheets reflect any changes within 1 5.2.6 Financial model updated month http://www.gridpp.ac.uk/php/pmb/ >= 38 weekly PMB 5.2.7 Weekly video/phone PMB meetings minutes.php meetings per year http://www.gridpp.ac.uk/php/pmb/ 5.2.8 Face-to-face PMB meetings minutes.php >= 3 per year 5.2.9 CB meetings http://www.gridpp.ac.uk/cb/ 1 per year 2 Last meeting May '09. Role of UB has been revised for GridPP4.
Functions taken over by T1 5.2.10 UB meetings http://www.gridpp.ac.uk/eb/ 3 per year resources meeting. 5.2.11 DB meetings http://www.gridpp.ac.uk/db/ 2 per year 5.2.12 Collaboration meetings http://www.gridpp.ac.uk/cb/ 2 per year 6.1 Outreach & Engagement Owner Sarah Pearce Metric no. Description Source Owner Target Q111 Comment 6.1.1 Number of events GridPP attends with stand/posters Neasan O'Neill 4 per year 42 Since GridPP3 start 6.1.2 Number of GridPP talks published on website Neasan O'Neill 1 per month 146 Since GridPP3 start 6.1.3 Number of GridPP publications Neasan O'Neill 10 per year 47 Since GridPP3 start One every 2 6.1.4 Number of news items on GridPP web site Neasan O'Neill weeks 74 Since GridPP3 start 6.1.5 Number of GridPP press releases Neasan O'Neill 2 per year 7 Since GridPP3 start 6.1.6 Number of press articles about GridPP Neasan O'Neill 10 per year 51 Since GridPP3 start Number of non-GridPP-funded groups who use GridPP One per 6.1.7 hardware Neasan O'Neill year 26 Number of e-Science projects in other disciplines with 6.1.8 GridPP input Neasan O'Neill 5 11 GridPP people in leadership roles in a personal capacity 6.1.9 on other projects (EGEE, e-sci centres) Neasan O'Neill 5 12 6.2 NGI Owner Robin Middleton Metric no. Description Source Owner Target Q111 Comment 90% by No change GridPP resources at sites with NGS Associate or http://www.gridpp.ac.uk/w end 6.2.8 Affiliate status iki/Working_with_NGS GridPP3 72% 1 extra No change 6.2.9 Number of services shared per year 3 Number of NGS representatives on GridPP Geddes represents NGS interests at PMB. 6.2.10 committees 1 1 1 at each Britton & Geddes (Middleton away) at NGS Management 6.2.11 GridPP attendance at NGS committee meetings meeting 2 Board Date Description Milestone no.
Evidence Owner Due date complete Comment Comments sent on transition plan - formal response not 6.2.1 Formal GridPP response to EGI proposal Robin Middleton Jul-08 Jul-08 required Minutes of the first EGI Council Meeting, Early Jul- 6.2.2 UK NGI signs EGI MoU Amsterdam, 9th July 2009 Robin Middleton Jun-09 09 Signed by JISC for the UK 6.2.3 EGI Transition Planning for GridPP Robin Middleton Mar-10 Jun-10 Used as input for negotiation on EGI needs in GridPP4 Minutes of 3rd EGI Council Meeting, 6.2.4 UK NGI pays first subscription to EGI Barcelona, Sept09 Robin Middleton Oct-09 Sep-09 Paid jointly by JISC and STFC. JC produced documents and a Venn diagram explaining the allocation of tasks across NGS and GridPP. He met Agreement with NGS/NGI on partition of services with NGS to agree this and presented the results at the 6.2.5 between GridPP and NGS/NGI Robin Middleton Nov-09 Mar-11 PMB F2F meeting at Sussex. Agreement on SLA(s)/SLD(s) for NGI services Relies on SLA(s) being developed between NGIs and EGI - external to GridPP on which GridPP depends which are under development; however operational 6.2.6 Robin Middleton Mar-10 agreement on this already established. 6.2.7 End/hand over to NGI Robin Middleton Apr-11 6.3 LCG Owner Tony Cass Metric no. 
Description Source Owner Target Q111 Comment Twice per 6.3.6 Timely reports to the C-RRB (April and October/Nov) year http://lcg.web.cern.ch/LCG 100% of /planning/planning.html#re WLCG 6.3.7 GridPP resources available compared to pledge s pledge 100 6.3.8 GridPP staff with ad-personam roles in LCG Personal knowledge 1 3 GridPP and related staff invited to speak at LCG Review of GDB and LCG 1 per 6.3.9 meetings/workshops workshop sites quarter 3 Milestone no.Description Evidence Owner Due date Date complete Comment http://indico.cern.ch/confer enceOtherViews.py?view= 6.3.1 Readiness for data taking standard&confId=23563 Jun-08 Jun-08 http://indico.cern.ch/materi alDisplay.py?contribId=9& or sessionId=0&materialId=sli http://indico.cern.ch/contributionDisplay.py?contribId=0&sessi 6.3.2 Readiness for data taking des&confId=51936 May-09 Jul-09 onId=0&confId=56581 6.3.3 Readiness for data taking May-10 6.3.4 Report on performance during LHC operations Workshop at IC Jun-10 6.3.5 Report on performance during LHC operations Jan-11 5/3/2011 GridPP OC documents 6.4 EGEE Owner Robin Middleton Metric no. Description Source Owner Target Q410 Comment UK NGI quarterly reports Timesheets are principal content of the 6.4.1 submitted on time Claire Devereux Robin Middleton report UK Integration metrics sent to SA1 reporting 6.4.2 EGI NGI quarterly report Robin Middleton UK CPU and storage delivered No Problems 6.4.3 to EGI NGI quarterly report Robin Middleton No Problems Monthly timesheets complete by 6.4.4 10th of each month Claire Devereux Robin Middleton GridPP staff PM delivered as No Problems 6.4.5 required Claire Devereux Robin Middleton UKI Attendance at SA1 6.4.6 Coordination Meetings. Minutes of meetings. John Gordon 90% 100% No more than 1 No cost claim this quarter. 6.4.7 UK JRU Costs Claims Project Office John Gordon partner late 0 6.4.8 SA1 QPR submitted SA1 Coordinator John Gordon 1 1 4 hours scheduled downtime. 
3 to migrate 6.4.9 GOCDB Availability EGEE Broadcasts John Gordon 95% 95.65% GOCDB3->4 Central repository went down over Xmas holidays; 2 days of this fell in Q4 although 6.4.10 APEL Availability Gridview John Gordon 95% 98% not back up fully until 4th Jan. 6.4.11 UKI APEL Publishing APEL Portal John Gordon 19 19 Milestone no. Description Evidence Owner Due date Date complete Comment Signed EGEE Consortium 6.4.12 EGEE III Negotiations Agreement John Gordon 31/4/2008 26/06/2008 Signed JRU Consortium 6.4.13 Set up JRU in UK Agreement John Gordon 5/30/2008 8/26/2008 6.4.14 Host SA1 Week in UK Agenda in Indico John Gordon 30/04/2009 05/12/2008 This plan is part of the EGEE Operations Automation Team (OAT) workplan for year 2 of EGEE III. It was presented to an OAT meeting in February - see agenda at http://indico.cern.ch/conferenceDisplay.py?confId=52954 Plan will be the subject of 6.4.15 GOCDB Development Plan Plan presented to EGEE. John Gordon 28/02/2009 12/02/2009 OTAG meeting in Jan 11 Plan is a composite of EMI and EGI SA1 and JRA1 plans/roadmaps. Messaging plan was discussed at EMI F2F in Nov 10. Bigger plan will be signed off at EGI 6.4.16 APEL Development Plan Plan presented to EGEE. John Gordon 28/02/2009 12/02/2009 Meetings in Jan 11. Progress documented at http://goc.grid.sinica.edu.tw/goc 6.4.17 GOCDB WS Interface wiki/GOCDB4_development John Gordon 31/03/2009 10/03/2009 GOCDB 4 went into full production with 6.4.18 GOCDB4 in production John Gordon 10/31/2009 30/01/2010 end of service of GOCDB3 in October 10. End of life for RGMA did not meet end of APEL Infrastructure using 2010 target. Currently delayed until end of 6.4.19 ActiveMQ John Gordon 12/31/2009 09/06/2010 Jan 11 but may slip another month. Delayed from 1/3/2011. A series of intermediate milestones culminating in a release of the regionalised repository in 6.4.20 APEL Ready for distributed use John Gordon 4/30/2010 June.