Running CCSM2.0


									Running CCSM

            Tony Craig
  CCSM Software Engineering Group


 •   General review of CCSM
 •   Setting up and running a simple case
 •   Datasets
 •   Production
 •   Modifying source code
 •   Errors
 •   Tools
 •   Performance

Review of CCSM
 • Five components / Ten models
   – Atmosphere(3) : atm, datm, latm
   – Ocean(2) : ocn, docn
   – Land(2) : lnd, dlnd
   – Ice(2+) : ice, ice (prescribed mode), ice (mixed
     layer ocean mode), dice
   – Coupler(1) : cpl
 • Communication via MPI between
   components and coupler only
 • Each component runs on multiple processors
   via MPI, OpenMP, MPI/OpenMP
Component parallelization
 •   atm : MPI, OpenMP, or MPI/OpenMP
 •   lnd : MPI, OpenMP, or MPI/OpenMP
 •   ice : MPI only
 •   ocn : MPI only
 •   cpl : OpenMP only
 •   The data models, datm, docn, dice, dlnd, and
     latm : serial only, 1 processor

 •   A = datm, dlnd, docn, dice, cpl
 •   B = atm, lnd, ocn, ice, cpl
 •   C = datm, dlnd, ocn, dice, cpl
 •   D = datm, dlnd, docn, ice, cpl
 •   F = atm, lnd, docn, ice (prescribed mode), cpl
 •   G = latm, dlnd, ocn, ice, cpl
 •   H = atm, dlnd, docn, dice, cpl
 •   I = datm, lnd, docn, dice, cpl
 •   K = atm, lnd, docn, dice, cpl
 •   M = latm, dlnd, docn, ice (ml ocn mode), cpl

 •   atm/lnd/datm/dlnd = T42, T31
 •   ocn/ice/docn/dice = gx1v3, gx3, gx3v4
 •   latm = T62
 •   Scientifically validated combinations
     – B, T42_gx1v3 = b20.007 control run
       (test.a1 case)
     – B, T31_gx3v4 = paleo control run
       (test.a2 case)

“Available” configurations

               A  B  C  D  F  G  H  I  K  M

   T42_gx1v3   *  *  *  *  *     *  *  *
   T31_gx3     *  *  *  *  *     *     *
   T31_gx3v4      *
   T62_gx1v3                  *           *
   T62_gx3                    *           *

   * = supported (subject to change)
   B, T42_gx1v3 = b20.007 control
   B, T31_gx3v4 = paleo control

 • IBM
 • SGI
 • Compaq*

Review of scripts

 • Main script (
   – Sets primary ccsm environment variables
   – Calls $model.setup.csh
      • Gets input datasets
      • Builds components
   – Runs model
   – Archives
   – Harvests

Setting up a simple case

 • Use the GUI !!
   – The GUI modifies the scripts and creates a new
     case for you
   – Input resolution
   – Input configuration (A-M)
   – Sets processor layout based on configuration (first
   – Sets some batch environment variables
   – Works well in the NCAR environment, other sites
     require post script-generation tuning

Setting up a simple case, without GUI

 • Create new case directory under
   scripts, copy over test.a1 files
 • Rename file to $
   – Edit batch environment parameters
   – Edit $GRID
   – Edit $SETUPS
   – Edit $NTASKS, $NTHRDS

 • $NTASKS is the total number of MPI tasks
   for each component
 • $NTHRDS is the number of OpenMP
   threads per MPI task
 • $NTASKS*$NTHRDS = total number of
   processors for each component
 • Tuning is required to get optimal load balance
 • Batch parameters should match the processors
   used; consistency is important
 • task_geometry (LoadLeveler) is very powerful

Component parallelization
 •   atm : MPI, OpenMP, or MPI/OpenMP
 •   lnd : MPI, OpenMP, or MPI/OpenMP
 •   ice : MPI only, NTHRDS=1
 •   ocn : MPI only, NTHRDS=1
 •   cpl : OpenMP only, NTASKS=1
 •   The data models, datm, docn, dice, dlnd, and
     latm : serial only, 1 processor, NTASKS=1,
     NTHRDS=1
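
The parallelization rules on this slide can be turned into a quick consistency check before a job is submitted. The following is only a sketch: the function name and the dictionary layout are illustrative and are not part of the CCSM scripts, and the sample values are the B case task and thread counts from the configuration summary.

```python
# Sketch only: the rules mirror the slide; names here are illustrative.
DATA_MODELS = {"datm", "docn", "dice", "dlnd", "latm"}


def check_layout(setup, ntasks, nthrds):
    """Return a list of problems for one component's task/thread counts."""
    problems = []
    if ntasks < 1 or nthrds < 1:
        problems.append(f"{setup}: NTASKS and NTHRDS must be at least 1")
    if setup in DATA_MODELS and (ntasks != 1 or nthrds != 1):
        problems.append(f"{setup}: data models are serial (NTASKS=1, NTHRDS=1)")
    if setup in {"ice", "ocn"} and nthrds != 1:
        problems.append(f"{setup}: MPI only, so NTHRDS must be 1")
    if setup == "cpl" and ntasks != 1:
        problems.append(f"{setup}: OpenMP only, so NTASKS must be 1")
    return problems


# B case values: setup name -> (NTASKS, NTHRDS)
layout = {"atm": (8, 4), "lnd": (2, 4), "ocn": (40, 1), "ice": (8, 1), "cpl": (1, 4)}
all_problems = []
for setup, (ntasks, nthrds) in layout.items():
    all_problems.extend(check_layout(setup, ntasks, nthrds))
print(all_problems or "layout is consistent with the parallelization rules")
```

A threaded ocean, for example `check_layout("ocn", 40, 2)`, would be reported as a problem.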

Main script configuration summary

 •  B case
 MODELS ( atm    lnd    ocn    ice    cpl )
 SETUPS ( atm    lnd    ocn    ice    cpl )
 NTASKS ( 8      2      40     8      1   )
 NTHRDS ( 4      4      1      1      4   )

 •  datm/dlnd/ocn/ice case
 MODELS ( atm    lnd    ocn    ice    cpl )
 SETUPS ( datm   dlnd   ocn    ice    cpl )
 NTASKS ( 1      1      64     16     1   )
 NTHRDS ( 1      1      1      1      4   )
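
Since $NTASKS*$NTHRDS gives the processors used by each component, the batch request for a case can be cross-checked with a few lines of arithmetic. This is a minimal sketch using the two layouts above; the helper name is made up for illustration.

```python
def processors(ntasks, nthrds):
    """Processors per component = MPI tasks * OpenMP threads per task."""
    return {name: ntasks[name] * nthrds[name] for name in ntasks}


# B case
b_procs = processors(
    {"atm": 8, "lnd": 2, "ocn": 40, "ice": 8, "cpl": 1},
    {"atm": 4, "lnd": 4, "ocn": 1, "ice": 1, "cpl": 4},
)

# datm/dlnd/ocn/ice case
c_procs = processors(
    {"datm": 1, "dlnd": 1, "ocn": 64, "ice": 16, "cpl": 1},
    {"datm": 1, "dlnd": 1, "ocn": 1, "ice": 1, "cpl": 4},
)

print(b_procs, sum(b_procs.values()))
print(c_procs, sum(c_procs.values()))
```

With these values the B case needs 92 processors and the datm/dlnd/ocn/ice case needs 86, which is what the batch parameters should request.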

• Startup - initial startup of model using arbitrary
   – set $CASE, $BASEDATE
• Continue - continuation of case, bit-for-bit
  guaranteed, uses model restart files
   – set $CASE
• Branch - start new case as a bit-for-bit continuation of
  another case, uses model restart files, requires
  continuous date
• Hybrid - start new case, not bit-for-bit continuation,
  uses model initial files in atm and land, can change
  starting date
Coupler namelist
 • stop_option : ndays, nmonths, newmonth, halfyear,
   newyear, newdecade
 • stop_n : integer (ndays, nmonths)

 • rest_freq : ndays, monthly, quarterly, halfyear, yearly
 • rest_n : integer (ndays)

 • diag_freq : daily, weekly, biweekly, monthly,
   quarterly, yearly, ndays
 • diag_n : integer (ndays)

 • info_bcheck : integer
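
The option lists above lend themselves to a simple sanity check before they are written into the coupler namelist. The sketch below is illustrative only: the keys are the option names from this slide in lowercase, the function is hypothetical, and the rule that the companion integer must be positive is an assumption rather than something the slide states.

```python
# Allowed values as listed on the slide.
ALLOWED = {
    "stop_option": {"ndays", "nmonths", "newmonth", "halfyear", "newyear", "newdecade"},
    "rest_freq": {"ndays", "monthly", "quarterly", "halfyear", "yearly"},
    "diag_freq": {"daily", "weekly", "biweekly", "monthly", "quarterly", "yearly", "ndays"},
}

# Option values that use the companion integer (stop_n, rest_n, diag_n).
NEEDS_COUNT = {
    "stop_option": ("stop_n", {"ndays", "nmonths"}),
    "rest_freq": ("rest_n", {"ndays"}),
    "diag_freq": ("diag_n", {"ndays"}),
}


def check_coupler_settings(settings):
    """Return a list of problems found in a dict of coupler settings."""
    problems = []
    for key, allowed in ALLOWED.items():
        value = settings.get(key)
        if value not in allowed:
            problems.append(f"{key}={value!r} is not one of {sorted(allowed)}")
            continue
        count_key, counted = NEEDS_COUNT[key]
        if value in counted:
            count = settings.get(count_key)
            # Assumption: a count is only meaningful if it is a positive integer.
            if not isinstance(count, int) or count < 1:
                problems.append(f"{count_key} must be a positive integer when {key}={value!r}")
    return problems


example = {
    "stop_option": "ndays", "stop_n": 5,
    "rest_freq": "monthly",
    "diag_freq": "ndays", "diag_n": 10,
}
example_problems = check_coupler_settings(example)
print(example_problems or "settings look consistent")
```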
Data Sets

 • Types
   – Grid files, binary
   – Namelist input, ascii
   – Initial datasets, binary/netcdf
   – Restart datasets, binary
   – History datasets, netcdf
   – Log files, ascii
 • inputdata directory
   – This is usually pointed to by $CSMDATA

Data Flow, Input
 • Everything is copied to $EXEROOT
 • Tools and scripts attempt to automate most of the
   “get input files”
 • Main script variables include $CSMDATA, $LFSINP,


[Diagram: input data flow; labels: $CSMDATA = inputdata, Mass Store]
Data Flow, Output
 • Output files are moved out of $EXEROOT
 • Harvesting is a separate process
 • Writing of restart files coordinated by the coupler
 • Writing of history files is not coordinated between
   components, monthly average is default
 • Main script variables include $LMSOUT, $MACOUT,

[Diagram: output data flow; labels: $ARCROOT, harvesting, Mass Store]

Log Files

 • Each component produces a log file,
 • $LID is a system date stamp
 • Date stamps are the same on all log files for a run
 • Log files are written into the $EXEROOT/$model
   directories during execution
 • Log files are copied to $SCRIPTS/logs at the end of a
   run
 • There are separate stdout and stderr that sometimes
   contain output information

Archiving, ccsm_archive
 • Means moving model output to a separate
   area on a local disk, ccsm_archive
 • Local disk area is set by $ARCROOT in the
   main script
 • Benefits
   – Allows separation of running and harvesting
   – Mass storage availability does not prevent
     continued execution of the model
   – Allows users to run in volatile temporary space
   – Supports simple harvesting in a clustered
     machine environment (like nirvana)

Harvesting, $CASE.har
 • Means copying model output to the local mass store
 • Separate script in scripts/$CASE, $CASE.har
 • Typically submitted in batch, can also be run
   interactively
 • Submitted by main script after model run, off by
   default
 • Sources ccsm_joe for important environment
   variables
 • Harvests all files in $ARCROOT/{atm,lnd,ocn,ice,cpl}
 • Verifies accurate copy on mass store before
 • Can scp files to remote machines
Exact Restart

 • CCSM can stop and restart exactly
 • The coupler controls the frequency of
   restart file writes
 • Restart files guarantee bit-for-bit
   continuity at a checkpoint boundary
 • rpointer files are updated in the
   scripts/$CASE directory after each run

Restart file management (1)
 • ccsm_archive
   – In scripts/$CASE
   – Called from main script after model run is
     complete, commented out by default
   – $ARCROOT/restart contains the latest full set of
     restart files
   – ccsm_archive copies full set of restart datasets
     into $ARCROOT/restart after each run
   – ccsm_archive then tars up that restart set into the
     $ARCROOT/restart.tars directory
   – These tar files can be large; clean up regularly

Restart file management (2)
 • ccsm_getrestart
   – In scripts/tools
   – Called from main script before model run starts,
     commented out by default
   – Copies the latest set of restart files from
     $ARCROOT/restart to the appropriate directories
 • To “backup” model run to previous model
   – Assumes both ccsm_archive and ccsm_getrestart
     have been active in the main script
   – Delete all files in $ARCROOT/restart
   – Untar an $ARCROOT/restart.tars file into
   – Resubmit

 • RESUBMIT file in scripts/$CASE
   – contains a single integer
   – If the integer is >0, main script resubmits
     itself and decrements the integer
 • Runaway jobs
   – FIRST! set value in RESUBMIT file to 0
   – Attempt to kill running jobs

 • Modify coupler namelist in cpl.setup.csh, set
   run length and restart frequency, turn down
   diagnostic frequency, set info_bcheck to 0.
 • Run a startup, hybrid, or branch case
 • Transition to continue $RUNTYPE
 • Turn on archiving, harvesting, and
 • Edit RESUBMIT file to initiate auto-

Monitoring a run

 • Monitor the batch jobs using llq, bjobs, qstat
 • Verify that runs complete successfully, check
   for timing information at the end of a log file
 • tail -f $EXEROOT/cpl/cpl.log*
 • If runs are not succeeding,
    – tail each log file
    – grep for ENDRUN in atm and lnd log files
    – Check stdout and stderr files for component
      messages or system messages
    – Look for core files in $EXEROOT/$model
    – Look for zero length files in $EXEROOT/$model
    – Check email
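
Several of the checks in the list above (ENDRUN in atm and lnd logs, core files, zero-length files) are easy to automate. The sketch below is one possible helper, not a CCSM tool: the function name is made up, and it assumes log files have ".log" in their names and core files have names starting with "core".

```python
import tempfile
from pathlib import Path


def scan_run_directory(exeroot, models=("atm", "lnd", "ocn", "ice", "cpl")):
    """Report core files, zero-length files, and ENDRUN messages under exeroot/model."""
    findings = []
    for model in models:
        model_dir = Path(exeroot) / model
        if not model_dir.is_dir():
            continue
        for path in sorted(model_dir.iterdir()):
            if not path.is_file():
                continue
            if path.name.startswith("core"):
                findings.append(f"core file: {path}")
            elif path.stat().st_size == 0:
                findings.append(f"zero-length file: {path}")
            elif ".log" in path.name and model in ("atm", "lnd"):
                # Assumption: logs are plain text; undecodable bytes are replaced.
                if "ENDRUN" in path.read_text(errors="replace"):
                    findings.append(f"ENDRUN found in {path}")
    return findings


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "atm").mkdir()
    (Path(tmp) / "ocn").mkdir()
    (Path(tmp) / "atm" / "atm.log.0001").write_text("ENDRUN: example failure\n")
    (Path(tmp) / "ocn" / "restart.example").write_text("")
    findings = scan_run_directory(tmp)

for finding in findings:
    print(finding)
```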
Modifying source code
 • Modifying files in the ccsm models directory is
   not recommended
 • Create directories under scripts/$CASE
   – src.atm, src.lnd, src.ocn, src.ice, src.cpl
   – Copy subset of model source code to these
     directories and modify it
   – Has highest priority with respect to build
 • Benefits include
   – Release source code remains unmodified and
   – Allows implementation of case dependent code

Multiple Machine Support
 • Should run on blackforest, babyblue, and ute
   “out of the box”
 • “Other” machines include seaborg, nirvana,
   eagle, falcon, cheetah
 • Supported platforms are indicated in $OS,
   $SITE, $MACH, $ARCH environment
   variables in the main script
 • See also scripts/tools/test.a1.mods.$MACH
   for suggested changes for “other” machines.

Running on a “New” Machine
 • Main script
   –   Set batch queue commands
   –   Add new $OS, $SITE, $MACH, $ARCH options
   –   Set standard CCSM path names, $CSMROOT, …
   –   Harvester submission issues
   –   Set data movement variables, $LMSINP, …
 • Harvester script
   – May require modification
 • Tools
   – May need to modify ccsm_msread, ccsm_mswrite
 • Build
   – Modify models/bld/Macros.$OS file

ccsm_joe
 • Created by main script
 • Updated every time the main script runs
 • Case dependent
 • Records important ccsm environment
   variables
 • Can be “sourced” by other scripts to
   inherit ccsm environment variables

Interactive/Batch Issues
 • Can run main script interactively
 • Typically used to build and pre-stage initial
 • Uncomment “exit” command in main script to
   stop the script before script starts ccsm
 • Batch environment highly site dependent
   –   NQS
   –   Loadleveler
   –   LSF
   –   PBS
Common Errors (1)
 • Model won’t build
   – Try rebuilding clean
   – Remove all obj directories; these are
     $OBJROOT/$model/obj, which is normally
     equivalent to $EXEROOT/$model/obj
   – When rebuilding, make sure $SETBLD is true in
     main script
 • Model won’t continue due to restart problem
   – Determine cause of problem; quota, hardware,
     script, zero length files, rpointer problems
   – Fix if possible
   – Back up to latest “good” restart dataset
   – Rerun
Common Errors (2)
 • Ice model stops due to mp transport error
   –   Double ndte in ice.setup.csh ice model namelist
   –   Back up to latest “good” restart dataset
   –   Run past previous stop date
   –   Reset ndte value
 • Ocean model non-convergence
   – Add about 10% to the number of model
     timesteps/hour in ocn.setup.csh, DT_COUNT
   – Back up to latest “good” restart dataset
   – Run past previous stop date
   – Reset DT_COUNT
   – Non-convergence on first timestep is special case
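
The two temporary adjustments above are simple arithmetic. The sketch below shows one way to compute them; the helper names are hypothetical, rounding the 10% increase up to a whole number is an assumption, and the sample values are placeholders rather than CCSM defaults.

```python
def bump_dt_count(dt_count):
    """Increase DT_COUNT by about 10%, rounded up (integer arithmetic only)."""
    return (dt_count * 11 + 9) // 10


def double_ndte(ndte):
    """Double the ice model ndte value."""
    return 2 * ndte


original_dt_count = 23   # placeholder value
original_ndte = 120      # placeholder value
temporary_dt_count = bump_dt_count(original_dt_count)
temporary_ndte = double_ndte(original_ndte)
print(temporary_dt_count, temporary_ndte)
```

Keeping the original values on hand makes the final "reset" step straightforward once the run is past the previous stop date.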

Tools
 • Under scripts/tools
   – ccsm_getfile : hierarchical search for file
   – ccsm_getinput : hierarchical search for input file
   – ccsm_msread : copies a file from local mass store
   – ccsm_mswrite : copies a file to local mass store
   – ccsm_checkenvs : echo ccsm environment
     variables, used to create ccsm_joe
   – ccsm_getrestart : copies restart files from
     $ARCROOT/restart to appropriate $EXEROOT
     and scripts/$CASE directories

Performance
 • This is complicated!
 • Issues
   – Performance of components and system as a
     function of resolution and configuration
   – Scalability of individual components, scaling
     efficiency of individual components
   – Task/Thread counts
   – Components sharing nodes, overloading nodes
     with multiple components, overloading threads,
     overloading tasks
   – Load balance of coupled system

Component Timings

 [Figure: component timings for atm and ocn, in seconds per
  simulated day, versus number of processors (4 to 64)]

CCSM Load Balancing

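  [Figure: load-balance diagram for a 104-processor layout;
   timings in seconds per day]

  Component   Processors   Timings shown in figure
  ocean           40       8.6, 40.4
  atm             32       6.2, 15.0
  ice             16       9.4, 3.0
  land            12       10.0, 10.0
  cpl              4       5, 3, 2
  total          104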
Component/Hardware layout
 • Machine, set of nodes
 • Nodes, group of processors that share memory
 • Processors, individual computing elements
 • General rules
   – Do not oversubscribe processors, place only 1
     MPI task or 1 thread on each processor
   – Minimize the number of nodes used for a given
     component and processor requirement
   – Multiple components can share a node as long as
     there is no oversubscription of processors
   – Test several decompositions, layouts, task/thread
     combinations to try to optimize performance
 • CCSM is a complicated multi-executable climate
   model, expect there to be “spin-up” time
 • CCSM is a scientific research code
 • There are many possible components,
   configurations, platforms, and resolutions; we are
   unable to test everything
 • Users are responsible for validating their science
 • NCAR can help with software/configuration problems,
 • Please report bugs, fixes, improvements, and ports to
   new hardware, so we can incorporate those changes!

