					                         HPCx Service Report
                                  March 2005

1 Introduction
This report covers the period from 1 March 2005 at 0800 to 1 April 2005 at 0800.
Taking into account the start of summer time, this is a service month of 743
hours.
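
The 743-hour figure can be checked with a short calculation. The sketch below is illustrative only: it takes the wall-clock difference between the two period boundaries and subtracts one hour, on the assumption (stated in the text above) that the start of summer time falls inside the period. The variable names are not taken from the report.

```python
from datetime import datetime, timedelta

# Reporting period boundaries as wall-clock (local) times.
start = datetime(2005, 3, 1, 8, 0)
end = datetime(2005, 4, 1, 8, 0)

# Naive wall-clock difference: 31 days of 24 hours each.
wall_clock_hours = (end - start) / timedelta(hours=1)

# Assumption: exactly one hour is lost when clocks go forward
# at the start of summer time.
SUMMER_TIME_SHIFT_HOURS = 1
service_hours = wall_clock_hours - SUMMER_TIME_SHIFT_HOURS

print(service_hours)  # 743.0
```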

Overall utilisation has continued to increase, reaching nearly 78%, the highest
value since April 2004. This month we delivered more than 3.5 million AUs to
users, the highest value so far.


1.1 Availability

Incidents

During this month, there were 18 incidents, only one of which was at SEV 1. The
following table indicates the severity levels of the incidents, where SEV 1 is
defined as a Failure (in contractual terms). The definitions used for severity levels
can be found in Appendix A.

                                 Severity Number
                                        1      1
                                        2      3
                                        3     14
                                        4      0


The attributions for the SEV 1 incident were as follows:

                            SEV1     Incidents     MTBF
                            IBM            0.0        ∞
                            Site           0.0        ∞
                            External       1.0      732
                            Overall        1.0      732

This is the third month running with no failures attributed to IBM or the site.




ccdc8943-8d09-4bc6-bd79-caf968ef3524.doc                                                Page 1 of
The following table gives more details on the Severity 1 incident:

Failure    Site    IBM    External    Reason
05.017       0%     0%        100%    External network loss


Serviceability

There was a total of 15.8 hours of scheduled downtime this month.

 Attribution     UDT   Serviceability
 IBM              0:00         100.0
 Site             0:00         100.0
 External         0:40           99.9
 Overall          0:40           99.9


1.2 CPU Usage by Consortium

The PIs and titles for the various consortia are listed in Appendix B.

   Consortium        CPU Hours       CPU Hours      AUs        %age
                      (Parallel)      (Other)     charged
e01                          39841              0 154084         4.4%
e02                        127183             641 494359        14.1%
e03                          50427             41 195186         5.5%
e04                          52350            322 203711         5.8%
e05                        132206              75 511595        14.5%
e06                        189483             314 734040        20.9%
e07                              381            0     1474       0.0%
e08                           5055              0    19549       0.6%
e10                           3250              0    12571       0.4%
e11                           2315              0     8955       0.3%
e14                                0          163      629       0.0%
e15                                0            1        3       0.0%
e16                                0            0        2       0.0%
e18                           1641              0     6348       0.2%
e19                          20747              0    80240       2.3%
e20                           9621            813    40353       1.1%
e24                              269            1     1044       0.0%
e25                                0            0        0       0.0%
z09                           2301              0     8898       0.3%
EPSRC Total                637071          2371 2473041         70.3%



n01                     88545           11      342492  9.7%
n02                     38293           10      148135  4.2%
n03                     32925          140      127878  3.6%
n04                     28491           17      110254  3.1%
NERC Total             188254          178      728759 20.7%

p01                       850              9      3321   0.1%
PPARC Total               850              9      3321   0.1%

c01                     50896          258      197839   5.6%
CCLRC Total             50896          258      197839   5.6%

b05                      3368              1     11691   0.3%
b06                        99              0       382   0.0%
BBSRC Total              3467              1     12073   0.3%

x01                       356              33     1502   0.0%
x02                     18583               0    71869   2.0%
External Total          18938              33    73371   2.1%

z001                     7341              20    28460   0.8%
z002                        0               0        0   0.0%
z06                        25               1      100   0.0%
HPCx Total               7366              21    28560   0.8%




1.3 CPU Usage by Job Type

The figures for Raw AUs given here show the number of AUs actually supplied
by the system to users’ jobs. They use the AU conversion rate that
corresponds to the results of the Linpack benchmark running on the new
platform; that is, 1 CPU hour = 3.8675 AUs.
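
As a minimal sketch of this conversion, the function below multiplies CPU hours by the stated rate. The function and constant names are illustrative. The example applies it to the e02 row of the table in section 1.2; treating "AUs charged" as total CPU hours (parallel plus other) times this rate is an inference from that table rather than something the report states.

```python
# Conversion rate quoted in the report: 1 CPU hour = 3.8675 AUs.
AU_PER_CPU_HOUR = 3.8675


def cpu_hours_to_aus(cpu_hours: float) -> float:
    """Convert CPU hours to allocation units (AUs) at the stated rate."""
    return cpu_hours * AU_PER_CPU_HOUR


# e02 row from section 1.2: 127183 parallel + 641 other CPU hours.
e02_cpu_hours = 127183 + 641
e02_aus = round(cpu_hours_to_aus(e02_cpu_hours))

print(e02_aus)  # 494359, the value listed for e02
```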

Number of        Raw AUs     %age   Number of
Processors                              Jobs
≤32              447292    12.8%        5693
33–64            304679     8.7%         758
65–128          1096462    31.3%         960
129–256          767694    21.9%         520
257–512          550672    15.7%         147
513–1024         221295     6.3%           73
>1024            119118     3.4%            5

The system is divided into three regions.

Development Region (9 frames, jobs using ≤64 CPUs): a total of 751971 raw AUs
were used; that is 90.9% of the total available in this region.

Production Region (40 frames, jobs using >64 CPUs): a total of 2755241 raw AUs
were used; that is 74.9% of the total available in this region.

The remaining frame is reserved for interactive parallel jobs.
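
The regional totals quoted above follow directly from the per-range figures in the job-type table. The sketch below re-derives them; the dictionary keys and variable names are illustrative, and the only assumption is the split stated in the text (jobs using ≤64 CPUs run in the Development Region, larger jobs in the Production Region).

```python
# Raw AUs per processor range, copied from the table in section 1.3.
raw_aus = {
    "<=32": 447292,
    "33-64": 304679,
    "65-128": 1096462,
    "129-256": 767694,
    "257-512": 550672,
    "513-1024": 221295,
    ">1024": 119118,
}

# Ranges that fit the Development Region (jobs using <=64 CPUs).
development_ranges = {"<=32", "33-64"}

dev_total = sum(v for k, v in raw_aus.items() if k in development_ranges)
prod_total = sum(v for k, v in raw_aus.items() if k not in development_ranges)
grand_total = dev_total + prod_total

# Percentage share of each range, rounded as in the table.
shares = {k: round(100 * v / grand_total, 1) for k, v in raw_aus.items()}

print(dev_total, prod_total, grand_total)
```

The two sums match the 751971 and 2755241 raw AUs quoted for the two regions, and the overall total is consistent with the "more than 3.5 million AUs" reported in the introduction.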




1.4 Slowdown and Job Wait Times

Slowdown

Slowdown is a widely used measure of the relative wait times of different classes
of jobs. It is defined as:

              Slowdown = (job run time + job wait time) / (job run time)
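
A minimal sketch of this definition is given below. The function name and the choice of hours as the unit are illustrative. The example values are taken from the par1280_12 row of the wait-time table later in this section (a 12-hour job with an average wait of 63.6 hours).

```python
def slowdown(run_time: float, wait_time: float) -> float:
    """Slowdown = (run time + wait time) / run time, both in the same unit."""
    if run_time <= 0:
        raise ValueError("run_time must be positive")
    return (run_time + wait_time) / run_time


# A job that starts immediately has a slowdown of exactly 1.
no_wait = slowdown(run_time=6.0, wait_time=0.0)

# 12-hour job that waited 63.6 hours (par1280_12 average wait).
large_job = slowdown(run_time=12.0, wait_time=63.6)

print(f"{no_wait:.2f} {large_job:.2f}")  # 1.00 6.30
```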

Slowdowns of less than around 10 are usually regarded as reasonable. The
graph below plots slowdown against run-time (ignoring jobs of less than 5
minutes duration). Despite the continuing increase in utilisation the pattern of
slowdowns continues to be satisfactory.


[Figure: slowdown (vertical axis, 0 to 8) against run time in hours (horizontal axis, 1 to >12)]




The graph below plots slowdown against the number of processors used,
ignoring development jobs of less than 1 hour.




[Figure: slowdown (vertical axis, 0 to 8) against number of processors (≤32, 33–64, 65–128, 129–256, 257–512, 513–1024, >1024)]



Job wait times

The following table and graph show the average wait time (in hours) for each
class of job. These are also satisfactory in general. The most prominent spike
corresponds to two 1280-processor 12-hour jobs submitted by the same user;
because of their size, a substantial wait time is inevitable. Both jobs did indeed
run for 12 hours, so the user in question used the entire production region for a
full day.

    Job Class               Category         Maximum        Maximum Job     Average      Number
                                           Number of CPUs      length       wait time    of Jobs

par32_1                 parallel                       32               1          1.0      4884
par32_3                 parallel                       32               3         21.3       127
par32_6                 parallel                       32               6         16.6       682
par64_1                 parallel                       64               1          1.8       447
par64_3                 parallel                       64               3         24.7        43
par64_6                 parallel                       64               6         18.1       268
par128_1                parallel                      128               1          1.0       401
par128_3                parallel                      128               3          2.8       113
par128_6                parallel                      128               6          4.0       173
par128_9                parallel                      128               9          0.8        14
par128_12               parallel                      128              12          8.6       259
par256_1                parallel                      256               1          1.0       208
par256_3                parallel                      256               3         12.9       172
par256_6                parallel                      256               6          8.7        72
par256_9                parallel                      256               9          5.3        13
par256_12               parallel                      256              12          2.7        55
par512_1                parallel                      512               1          2.4        80
par512_3                parallel                      512               3          4.5        11
par512_6                parallel                      512               6          3.5        22
par512_9                parallel                      512               9          4.6         1
par512_12               parallel                      512              12          2.4        33
par1024_1               parallel                     1024               1         10.0        57
par1024_3               parallel                     1024               3         14.9         2
par1024_6               parallel                     1024               6         12.4         8
par1024_9               parallel                     1024               9          1.1         1
par1024_12              parallel                     1024              12         13.2         5
par1280_1               parallel                     1280               1         11.8         3
par1280_3               parallel                     1280               3          0.0         0
par1280_6               parallel                     1280               6          0.0         0
par1280_9               parallel                     1280               9          0.0         0
par1280_12              parallel                     1280              12         63.6         2
serial_1                serial                          1               1          1.5       593
serial_12               serial                          1               3          0.1       224
serial_3                serial                          1               6          1.3        19
serial_6                serial                          1               9          0.0        21
serial_9                serial                          1              12          0.0        83
inter32_1               interactive                    32               1          0.0      3638
course32_1              parallel                       32               1          0.0         0




[Figure: Job wait times. Average wait time in hours (vertical axis, 0.0 to 70.0) for each job class (horizontal axis, par32_1 to par1280_12)]




1.5 Disk Occupancy

Home Space
Home space is the part of the disk space that is regularly backed up.

                 Consortium    Disc Occupancy       Disc Quota
                                     (Kb)              (Kb)
               b02                    12,550,272        51,200,000
               b03                         4,096        51,200,000
               b04                            64        51,200,000
               b05                    13,710,336        51,200,000
               b06                    15,353,152        51,200,000
               c01                    50,109,440        51,200,000
               e01                    48,341,408        50,006,016
               e02                    21,442,496        39,760,896
               e03                    86,335,392       230,412,288
               e04                    71,510,848       102,400,000
               e05                  118,001,504        423,936,000
               e06                  238,781,440        307,200,000
               e07                    14,588,192        20,480,000
               e08                    14,620,160        20,480,000
               e10                     8,425,888        10,240,000
               e11                    51,255,648       102,400,000
               e12                     8,976,640        20,480,000
               e14                     1,365,120       102,400,000
               e15                     3,086,368        51,200,000
               e16                        36,224        20,480,000
               e17                        53,504        51,200,000
               e18                    27,319,840        40,960,000
               e19                           128        40,960,000
               e20                    52,210,752        61,440,000
               e21                            64        51,200,000
               e22                            96        10,240,000
               e23                            64        51,200,000
               e24                       521,920        51,200,000
               e25                         3,264        51,200,000
               e26                    14,048,288        20,480,000
               n01                    33,866,496        51,200,000
               n02                    71,951,072       110,592,000
               n03                    48,754,688        51,202,048
               n04                  150,913,024        307,198,976
               n05                         2,080        10,240,000
               p01                    27,378,752        35,840,000
               x01                    25,797,760        51,200,000
               x02                     7,582,752        20,480,000
               z001                 214,639,744        235,521,024
               z002                   35,047,744        49,153,024
               z003                          256             3,072
               z004                   39,993,024        51,200,000
               z05                     4,237,024        30,720,000
               z06                    48,969,152        51,200,000
               z07                     9,326,944        10,240,000
               z09                       419,264        51,200,000


Workspace

              Consortium   Disc Occupancy      Disc Quota
                                 (Kb)             (Kb)
             b02                      15,104        1,049,600
             b03                   4,937,888      102,400,000
             b04                          64      102,400,000
             b05                   4,885,184      102,400,000
             b06                     638,272      102,400,000
             c01                  67,815,808       71,680,000
             e01              1,108,272,832     1,177,600,000
             e02                   9,157,216       10,240,000
             e03                       9,824          524,288
             e04                582,168,224     2,252,800,000
             e05                101,709,312       162,817,024
             e06                383,993,280       409,600,000
             e07                  47,749,824      102,398,976
             e08                     116,096        1,024,000
             e10                260,183,200       307,200,000
             e11                         160      102,400,000
             e12                     743,584      102,400,000
             e14                         128      102,400,000
             e15                  18,141,824      102,400,000
             e16                         192       61,440,000
             e17                         224      102,400,000
             e18                         160       81,920,000
             e19                172,685,120       204,800,000
             e20                692,570,016     1,024,000,000
             e21                          64      102,400,000
             e22                          96       20,480,000
             e23                          64      102,400,000
             e24                  45,154,080      102,400,000
             e25                         128      102,400,000
             e26                         128       40,960,000
             n01                211,465,888       256,000,000
             n02                840,274,144     1,248,257,024
             n03                      21,632        1,026,048
             n04                377,606,624       511,998,976
             n05                  25,564,480       92,160,000
             p01                     217,312        1,024,000
             x01                  63,835,104      102,400,000
             x02                         128       20,480,000
             z001               170,064,864       307,198,976
             z002                    295,488          788,480
             z003                        192            3,072
             z004                  1,023,776        1,024,000
             z05                         128        1,024,000
             z06                  60,088,160      102,400,000
             z07                       1,376            1,024
             z09                  11,315,840      102,400,000

1.6 Tape Archive

                               Usage        Quota
              Consortium                              Files    Data (Gb)
                              (Tapes)      (Tapes)
           c01                       2            2        8          16
           e01                      38          38    36,675       3,403
           e03                       5            5   14,972         373
           e04                       4          14     1,260         254
           e26                       2            2       72          11
           n01                      49          49     1,845       4,840
           n02                      21          30    68,058       2,115
           n04                       7          20     7,532         746
           z001                      2            2    4,982          32
           z002                      3            4    1,590          11
           z06                       1            3      833          68

Note that a tape is counted in the Usage column even if it is only partly occupied.




2 Support
2.1 Helpdesk

Classifications

Category                     Number             % of all
Administrative                   42               40.8
Technical                        51               49.5
In-depth                         10                 9.7
PMR                               0                 0.0
TOTAL                           103              100.0

The PMR category indicates in-depth queries that result in Problem Management
Reports for IBM.

Service Area                 Number             % of all
Phase 2 platform                 95               92.2
Website                           5                 4.9
Other/general                     3                 2.9
TOTAL                           103              100.0


Performance

All non-in-depth queries           Number           %          Target
Finished within 24 Hours               74         79.6          75%
Finished within 72 Hours               93        100.0          97%
Finished after 72 Hours                 0          0.0

Administrative queries             Number           %          Target
Finished within 48 Hours               41         97.6          97%
Finished after 48 Hours                 1          2.4
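
The percentages in these two tables can be reproduced from the query counts. The sketch below does so and compares each figure with its target; the helper name and data layout are illustrative. The total of 93 non-in-depth queries is derived from the Classifications table (103 queries in total, 10 of them in-depth), and the administrative total of 42 comes from the same table.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


non_in_depth_total = 103 - 10   # all queries minus in-depth queries
administrative_total = 42

# (label, achieved percentage, target percentage)
results = [
    ("non-in-depth within 24 hours", percent(74, non_in_depth_total), 75.0),
    ("non-in-depth within 72 hours", percent(93, non_in_depth_total), 97.0),
    ("administrative within 48 hours", percent(41, administrative_total), 97.0),
]

for label, achieved, target in results:
    status = "met" if achieved >= target else "missed"
    print(f"{label}: {achieved}% (target {target}%) {status}")
```

All three targets are met on these figures.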


Experts Handling Queries

Expert                     Admin   Technical   In-Depth      PMR
epcc.ed.ac.uk                 34         29           4        0
dl.ac.uk                       0          8           4        0
Sysadm                         8         14           2        0
Other people                   0          0           0        0



2.2 Training


Title of Course               Start Date   Length (Days)   Place Days   HPCx User Days   HPCx Staff Days
Shared Memory Programming     30-Mar                   3           78               18                 0




3 Staffing
3.1 Science Support Staffing


Daresbury Laboratory
                         Name            Days
                         Ashworth          10.4
                         Blake              5.3
                         Bush              17.0
                         Guest              5.3
                         Johnstone         10.5
                         Jones              4.2
                         Plummer           21.0
                         Sherwood           2.6
                         Sunderland        20.0
                         Thomas            10.0
                         Pickles            2.0
                         van Dam            1.9
                         Total (Days)     110.0
                         FTEs               6.2

EPCC
                       Name                 Days
                       Simpson               13.7
                       Booth                 14.4
                       Henty                 10.9
                       Smith                 11.9
                       Bull                   4.5
                       Fisher                 8.0
                       Hein                  10.1
                       Jackson, Adrian        5.2
                       Pringle                3.7
                       Reid                  15.1
                       Stratford              1.5
                       Nowell                 5.3
                       Holden                14.3
                       Kartsaklis            14.9
                       Nazarova              13.1
                       Trew                   4.8
                       Total (Days)         151.3
                       FTEs                   8.5


Overall Levels
                                              FTEs
                               DL               6.2
                               EPCC             8.5
                               Total           14.7



3.2 Systems Staffing
                              Name            Days
                              Andrews          14.3
                              Blake             0.0
                              Brown            23.0
                              Fisher           10.0
                              Georgeson        12.4
                              Franks           13.5
                              Jones             0.0
                              Shore            15.0
                              BITD             21.0
                              Total (days)    109.1
                              FTEs              6.1


Note: BITD covers a range of bookings from a support department that provides
approximately 1 FTE to support computer room operations, electrical and
mechanical site services, and networking and security. Roughly a dozen staff
charge time to the project in amounts that vary from month to month. We
believe that reporting these individual bookings adds no value, although a full
listing can be provided annually if required.
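
The staffing totals in Sections 3.1 and 3.2 can be cross-checked against the individual entries. The sketch below sums the listed days and compares them with the printed totals. The listed entries come to within 0.2 days of each printed total, presumably because individual bookings were rounded for display. The report does not state how days are converted to FTEs, so the days-per-FTE ratio computed here is only what the published figures imply, not a documented conversion factor.

```python
# Days per person, copied from the staffing tables above.
DL_DAYS = [10.4, 5.3, 17.0, 5.3, 10.5, 4.2, 21.0, 2.6, 20.0, 10.0, 2.0, 1.9]
EPCC_DAYS = [13.7, 14.4, 10.9, 11.9, 4.5, 8.0, 10.1, 5.2,
             3.7, 15.1, 1.5, 5.3, 14.3, 14.9, 13.1, 4.8]
SYSTEMS_DAYS = [14.3, 0.0, 23.0, 10.0, 12.4, 13.5, 0.0, 15.0, 21.0]

# (printed total days, printed FTEs) for each group.
REPORTED = {
    "DL": (110.0, 6.2),
    "EPCC": (151.3, 8.5),
    "Systems": (109.1, 6.1),
}

LISTED = {"DL": DL_DAYS, "EPCC": EPCC_DAYS, "Systems": SYSTEMS_DAYS}


def check_group(name):
    """Return (sum of listed days, printed total, implied days per FTE)."""
    summed = round(sum(LISTED[name]), 1)
    total, ftes = REPORTED[name]
    # Implied ratio only: the report gives no explicit conversion factor.
    return summed, total, round(total / ftes, 2)


for group in REPORTED:
    summed, total, ratio = check_group(group)
    print(f"{group}: listed {summed}, printed {total}, implied days/FTE {ratio}")
```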




4 Summary of Performance Metrics

Metric                                            TSL       FSL     Monthly Measurement
Technology serviceability                         80%     99.2%                  100.0%
Technology MTBF (hours)                           200       300                       ∞
Number of AV FTEs                                 7.5        10                    14.7
Number of training days per month             22.5/12     30/12                     6/3
Non in-depth queries resolved within 3 days       85%       97%                  100.0%
Number of A&M FTEs                               3.75      5.75                     6.1
A&M serviceability                                80%     99.6%                  100.0%
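
As a rough illustration, the comparison of each monthly measurement with the TSL and FSL columns can be sketched as below. It assumes that a higher value is better for every metric and that a level is reached when the measurement is at or above it. Neither rule is spelled out in this section. The training-days row is left out because its "x/y" notation is not defined here. The function name `level_reached` is hypothetical.

```python
import math

# (metric, TSL, FSL, monthly measurement) from the table above.
# Percentages are stored as plain numbers; an MTBF with no failures is infinite.
METRICS = [
    ("Technology serviceability (%)", 80.0, 99.2, 100.0),
    ("Technology MTBF (hours)", 200.0, 300.0, math.inf),
    ("Number of AV FTEs", 7.5, 10.0, 14.7),
    ("Non in-depth queries resolved within 3 days (%)", 85.0, 97.0, 100.0),
    ("Number of A&M FTEs", 3.75, 5.75, 6.1),
    ("A&M serviceability (%)", 80.0, 99.6, 100.0),
]


def level_reached(tsl, fsl, measured):
    """Classify a measurement against the two service levels.

    Assumption: higher is better, and a level is reached at or above its value.
    """
    if measured >= fsl:
        return "FSL"
    if measured >= tsl:
        return "TSL"
    return "below TSL"


results = {name: level_reached(tsl, fsl, m) for name, tsl, fsl, m in METRICS}

for name, level in results.items():
    print(f"{name}: {level}")
```

Under these assumptions, every metric listed reaches the FSL column this month.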




Appendix A: Incident Severity Levels

SEV 1 ― anything that constitutes a FAILURE as defined in the contract with
EPSRC.

SEV 2 ― NON-FATAL incidents that typically cause immediate termination of a
user application, but not the entire user service.

The service may be so degraded (or liable to collapse completely) that a
controlled, but unplanned (and often very short-notice) shutdown is required or
unplanned downtime subsequent to the next planned reload is necessary.

This category includes unrecovered disc errors where damage to filesystems
may occur if the service were allowed to continue in operation; incidents where,
although the service can continue in operation in a degraded state until the next
reload, downtime at less than 24 hours' notice is required to fix or investigate the
problem; and incidents where the throughput of user work is affected (typically
by the unrecovered disabling of a portion of the system) even though no
subsequent unplanned downtime results.

SEV 3 ― NON-FATAL incidents that typically cause immediate termination of a
user application, but the service is able to continue in operation until the next
planned reload or re-configuration.

SEV 4 ― NON-FATAL recoverable incidents that typically include the loss of a
storage device, or a peripheral component, but the service is able to continue in
operation largely unaffected, and typically the component may be replaced
without any future loss of service.




Appendix B: Projects

B.1 Current Projects

EPSRC Projects

Code Class Title                                         PI
e01    1   UK Turbulence Consortium                      Prof Neil Sandham
e02    1   Ab-initio simulation of covalently bonded     Dr Patrick Briddon
           materials
e03    1   Multi-photon, electron collisions and BEC     Prof Ken Taylor
           HPC consortium
e04    1   Chemreact Computing Consortium                Prof Jonathan Tennyson
e05    1   Materials Chemistry using Terascale           Prof Richard Catlow
           Computing
e06    1   UK Car-Parrinello Consortium                  Prof Paul Madden
e07    2   Turbulent Plasma Transport in Tokamaks        Dr Colin M Roach
e08    2   Organic Solid State                           Prof Sarah Price
e10    1   Reality Grid                                  Prof Peter Coveney
e11    1   Bond making and breaking at surfaces          Prof Sir David A King
e12    1   Parallel programs for the simulation of       Dr Mark R Wilson
           complex fluids
e14    1   Blade and Cavity Noise                        Prof Neil Sandham
e15    2   CSAR/HPCx Collaboration                       Dr Mike Pettipher
e16    1   Cardiac virtual tissues                       Prof Arun V Holden
e17    1   Integrative Biology                           Dr David Gavaghan
e18    1   DARP: Highly swept leading edge               Prof Michael A Leschziner
           separations
e19    1   Edinburgh Soft Matter and Statistical         Prof Michael E Cates
           Physics Group
e20    1   UK Applied Aerodynamics Consortium            Dr Ken Badcock
e21    1   Intrinsic Parameter Fluctuations in           Prof Asen M Asenov
           Decananometer MOSFETs
e22    1   Preconditioners for finite element            Prof David J Silvester
           problems
e23    1   Exploitation of Switched Lightpaths for e-    Prof Peter Clarke
           Science Applications
e24    1   DEISA - Distributed European                  Dr David Henty
           Infrastructure for Supercomputing
           Applications
e25    1   Turbulent vortex motion in stratified flows   Dr Gary Coleman
e26    1   Simulation of Radioprobing                    Dr Charlie Laughton
z09        HECToR Benchmarking                           Dr Edward Smyth


PPARC Projects

Code Class Title                                    PI
p01    1   Atomic Physics and Astrophysics          Prof Alan Hibbert



NERC Projects

Code Class Title                                    PI
n01    1   Large-Scale Long-Term Ocean              Dr David Webb
           Circulation
n02    1   NCAS                                     Prof Alan J Thorpe
n03    1   Computational Mineral Physics            Dr John Brodholt
           Consortium
n04    1   Shelf Seas Consortium                    Dr Roger Proctor
n05    2   Non-linear Wave-particle Instabilities   Dr Mervyn Freeman
           in Plasmas



BBSRC Projects

Code Class Title                                    PI
b02    1   Modelling enzyme catalysis               Dr Adrian J Mulholland
b03    1   Towards a virtual outer membrane         Prof Mark S Sansom
b04    1   Life sciences software development       Dr Jo L Dicks
b05    1   Virtual forced evolution of catalytic    Dr Marcus Durrant
           transition metal complexes
b06    2   Biomolecular computational chemistry     Prof Jonathan D Hirst



CCLRC Projects

Code Class Title                                    PI
c01    1   Daresbury Laboratory Facilities          Dr Richard J Blake
           Agreement Consortium




Externally-funded Projects

Code Title                      PI
x01  HPC-Europa                 Dr J-C Desplat
x02  OHM Ltd                    Mr Mark Westwood



HPCx Projects

Code   Title                    PI
z001   HPCx Support             Dr Alan Simpson
z002   Systems and Operations   Mr Mike Brown
z003   Test Project             Dr Denis Nicole
z004   HPCx Training            Dr David Henty
z05    Outreach Projects        Dr Richard Blake
z06    Application Porting      Dr David Henty
z07    Package Installation     Dr Mike Ashworth



B.2 Former Projects

Code Class Title                                   PI
b01    2   Quantum Chemistry Studies of the        Prof Samar Hasnain
           Rusticyanin Protein Crystal
e09    2   Molecular Properties and their          Prof Peter Taylor
           Geometry
e13    1   TeraGyroid project                      Dr Richard J Blake



