Cray SMW 4.0.UP02 Software Release Announcement

The Cray SMW 4.0.UP02 update package is available for all Cray System
Management Workstations (SMWs).

PRODUCT DESCRIPTION
-------------------
The Cray SMW 4.0 release adds support for Cray XT5m systems. Support
continues for Cray XT3, Cray XT4, Cray XT5, Cray XT5h and Cray XMT
systems.

PLEASE REVIEW THE INSTRUCTIONS FOR EVERY UPDATE listed in this README
since your last update.

Significant changes occurred in:

- 4.0.UP02
  o Cray XMT 1.4 TTE library and diagnostics update.
  o CMS log performance improvements. Performance improvements have been
    made in the area of data copying, checking the database, comparing
    of duplicate messages, dropping of ec_console_logs (if backed up),
    parsing and reducing number of checks.
  o Speed up of xtdumpsys in the area of eventlog and console analysis.
  o xtnetwatch will not terminate if it does not receive a response from
    one or more modules that it expects to respond during system boot.
  o Cabinet EPOs now appear in the event log.
  o Fix to xtdimminfo where a correctable memory error was not corrected
    by the hardware and the kernel would panic. The error will now be
    listed as UNCORRECTABLE.

- 4.0.UP01
  o Support for six-core AMD Opteron processors (code-named Istanbul) on
    Cray XT5 and Cray XT5m systems. Sites are required to upgrade to the
    CLE 2.2 release.
  o Enhancement for multi-socket hardware check. Inter-node and
    intra-node CPU and memory consistency checks were added and are
    displayed during xtbounce.
  o Fix for xtbootsys handling of user-directed choice to stop waiting
    for background process (Bug 748216).
  o xtnetwatch, x2netwatch and xtwm now reconnect when erd restarts.

- 4.0.UP00 (GA)
  o Mazama name change - The Mazama system administration software
    included with the SMW 4.0 release has been renamed Cray Management
    Services (CMS), which more accurately describes the component
    functionality. A new document, S-2484-40 Using Cray Management
    Services (CMS), is provided.
    Note: Documentation associated with this release may use the terms
    Mazama and Cray Management Services (CMS) interchangeably; however,
    command names did not change.
  o CMS Enhancements
    - CMS System State Service. The CMS system state service optimizes
      the collection of system data and provides the infrastructure for
      future enhancements. For SMW 4.0, the components consist of a pair
      of daemons (mzsd and mzsd-client); an application programming
      interface (API) to allow applications to send and receive system
      information; and the new mz2attr command, which provides a way to
      display system data kept by the state daemon.
    - CMS Log Enhancements and Job History. The CMS log manager has been
      optimized through the use of log aggregation; message buffering;
      filtering of repetitive messages; and table enhancements that
      reduce the size of tables and increase the speed of searching,
      inserting, and deleting data.
  o Graceful Shutdown for Cray XT CNL Compute Nodes. The new xtcli
    shutdown command attempts to gracefully shut down the specified
    compute nodes. The new xtcli shutdown -f argument, which forces a
    shutdown in an emergency situation, is also provided.
  o SEDC Warning and Control System (WACS). Through the
    WACS/Environmental Monitoring feature, you can set upper and lower
    limits for all measurable scan IDs by modifying the
    /opt/cray/etc/sedc_srv.ini file. SEDC issues a warning notification
    if a monitored scan ID value falls outside the specified limits.
  o Cray XT5m Support. The SMW 4.0 release adds support for Cray XT5m
    systems.
  o Virtual Channel 2 (VC2) Now Default Setting. Using the second
    virtual channel class that is present within the SeaStar network
    (VC2) has been shown to improve performance of a Cray XT system that
    has dual-core or quad-core processors when it is under a heavy
    communication load. As a result, beginning with the SMW 4.0 release,
    VC2 is the default setting. This variable setting is not applicable
    to Cray X2 compute nodes.
    Also, use of VC2 on Cray XMT systems is not permitted.
  o SLES 9 No Longer Supported. Cray SMWs running the SLES 9 SP2 base
    operating system are no longer supported. The Cray System Management
    Workstation (SMW) Software Installation Guide, publication
    S-2480-40, provides the procedures to upgrade your SMW 3.1 software
    to the Cray SMW 4.0 software and to SLES 10 SP1.
  o The SMW 4.0 release is the final SMW release supporting xtwm and
    XTGUI, which includes the xtgui and xtguiview commands.

CRITICAL/URGENT CUSTOMER BUGS
-----------------------------
The following critical/urgent customer problems have been resolved with
this release:

  o Bug 751366  xtnetwatch dies during system boot.
  o Bug 751409  cabinet EPO not reported in event logs
  o Bug 749061  xtdumpsys processing of eventlogs takes too long
  o Bug 746678  xtdumpsys on a large XT4 is too slow and recommended
                dump procedure is unclear

RELEASE INFORMATION
-------------------
Distribution for this release has changed. The SMW 4.0 release package
must be ordered through the Cray Software Distribution Center in any of
the following ways:

  E-mail:                            orderdsk@cray.com
  CrayPort (for subscribers):        crayport.cray.com
  Telephone (inside U.S., Canada):   1-800-284-2729 (BUG CRAY), then 6059100
  Telephone (outside U.S., Canada):  +1-651-605-9100
  Fax:                               +1-651-605-9001

This change was needed to assure that upgrade customers sign or
incorporate additional MySQL license terms. Processing of MySQL license
terms may take several days to resolve; plan accordingly. After the
MySQL Pro license Addendum has been signed, the new terms are in force
for both SMW 4.0 and CLE 2.2, including subsequent upgrades.

This release package includes:

- This README file, which includes:
  - Installation instructions (with documentation changes) for SMW 4.0 GA
- All necessary SMW 4.0 RPMs are included in the ISO images,
  specifically hss 4.0.0-1.0400.23092.312, mazama 4.0.0-4.0400.2366.0,
  system diagnostics 4.0.0-1.0000.64195.0 and XT5 online diagnostics
  4.0.0-1.64195.0 RPMs
- CRAYSMWinstall.sh script is included in the ISO images
- Man pages are included in the related RPMs
- SMW 4.0.UP00 Errata
- Cray System Management Workstation (SMW) 4.0 Software Installation
  Guide, S-2480-40 (PDF)
- Cray System Management Workstation (SMW) 4.0 Software Release
  Overview, S-2482-40 (PDF)
- Using Cray Management Services (CMS), S-2484-40

The SMW 4.0.UP02 update has been tested with the Cray CLE 2.1.UP02 and
the Cray CLE 2.2.UP01 releases. CLE 2.2 requires that the SMW be
installed with a version of SMW 4.0. The SMW 4.0 release can be run with
prior versions of Cray CLE 2.1 releases.

At this time Cray XMT 1.3 is not supported with this SMW release. Check
your Cray XMT release information for the required Cray SMW and CLE
versions.

For more detailed information about the SMW 4.0 release, please refer to
the Cray System Management Workstation (SMW) 4.0 Software Release
Overview, S-2482-40.

Security Patches for Cray SMW systems
-------------------------------------
During the packaging of SMW 4.0 GA, SUSE announced security issues in
various packages and released fixed versions. Cray has combined these
RPMs in getfix packages as listed below. These security FNs and CVEs are
included in the SMW 4.0 base.

  FN #    Update         Getfix Package
  -----   ------------   --------------
  5571    SMW 4.0.UP00   3114
  5573    SMW 4.0.UP00   3116
  5577    SMW 4.0.UP00   3120
  5579    SMW 4.0.UP00   3141
  5580    SMW 4.0.UP00   3142
  5583b   SMW 4.0.UP00   3151
  5588    SMW 4.0.UP00   3154
  5591    SMW 4.0.UP01   3156
  5593    SMW 4.0.UP01   3157
  5597    SMW 4.0.UP02   3161
  5603    SMW 4.0.UP02   3165
  5604    SMW 4.0.UP02   3163
  5606    SMW 4.0.UP02   3179
  5611a   SMW 4.0.UP02   3183

As newer RPMs are delivered in FNs, beyond those included in an update,
mismatches may be reported by the installation script. If the installed
RPM is newer, no action is required.

Installation Instructions for 4.0.UP02
--------------------------------------
See the Cray System Management Workstation (SMW) Software Installation
Guide, S-2480-40 for detailed installation instructions.

- INITIAL installation:
  To do an initial installation of the SMW 4.0 release, follow the
  procedures provided in the Cray System Management Workstation (SMW)
  Software Installation Guide, S-2480-40, Chapter 2.

- UPGRADE (SLES 9 SP2) installation:
  SMW 4.0 no longer supports SLES 9 SP2. To upgrade from SMW 3.1 based
  on SLES 9 SP2 to SMW 4.0 based on SLES 10 SP1, follow the procedures
  provided in the Cray System Management Workstation (SMW) Software
  Installation Guide, S-2480-40, Chapter 5.

- UPGRADE (SLES 10 SP1) installation:
  To do an upgrade installation of the SMW 4.0 release from SMW 3.1
  running SLES 10 SP1, follow the procedures provided in the Cray System
  Management Workstation (SMW) Software Installation Guide, S-2480-40,
  Chapter 4.

- UPDATE (SMW 4.0.UP00 to SMW 4.0.UP02) installation:
  To do an update installation of the SMW 4.0 release, follow the
  procedures provided in the Cray System Management Workstation (SMW)
  Software Installation Guide, S-2480-40, Chapter 6.

Additional Information for Installation/Configuration
-----------------------------------------------------

After this upgrade, SEDC will be turned on by default
-----------------------------------------------------
Before every SMW upgrade the local Cray support staff should check the
Customer Service Best Practices Wiki/XT/CRMS,
http://service-new.us.cray.com/wiki/CRMS_(Cray_RAS_and_Management_System)
to ensure that the current system hardware PICs are at a revision that
will support running SEDC. There is a possibility that a cabinet EPO or
component power off could occur if PICs are not up to date and SEDC is
automatically started after the upgrade. Please refer to FN 5572 for
more information about Cray XT systems component power-off problems.

Upgrades (excluding Cray XT5h with X2 compute nodes) - mzwatcher
----------------------------------------------------------------
All upgrades, except those on Cray XT5h systems with X2 compute nodes,
need to disable mzwatcher from restarting the mzinitd and mzsmd daemons.
These daemons are only to be started on Cray XT5h systems with X2
compute nodes.

After Chapter 4 "Upgrading From Cray SMW 3.1 (SLES 10 SP1) Software",
section 4.4 "Installing an SMW 4.0 Upgrade Package", Step 10 "Enable and
restart these CMS daemons", complete the following:

Disable mzwatcher from restarting the mzinitd and mzsmd daemons:

  smw# vi /opt/mazama/etc/mzwatcher.conf

Comment out these two lines by adding # to the beginning of each line:

  # /var/run/mazama/mzsmd.pid:/etc/init.d/cray-mzsmd
  # /var/run/mazama/mzinitd.pid:/etc/init.d/cray-mzinitd

Stop the running mzinitd and mzsmd daemons:

  smw# /etc/init.d/cray-mzsmd stop
  smw# /etc/init.d/cray-mzinitd stop

Also, Step 11 needs a clarification. Running some CMS commands, such as
mzlslog, will produce the following error message.

  smw:~ # mzlslog -e1 -FtHCPm -v
  dbDBSimpleOpen] mysql_real_connect failed: Host 'smw' is blocked
  because of many connection errors; unblock with 'mysqladmin flush-hosts'

Clear this MySQL error condition by executing the following command and
entering your mysql root password.

  smw# mysqladmin flush-hosts -uroot -p
  Enter password: ********

Clarification of SMW 3.1 to 4.0 upgrade step
--------------------------------------------
In the Cray System Management Workstation (SMW) 4.0 Software
Installation Guide (S-2480-40) section 4.4, you must complete step 11,
"Clear this MySQL error condition", whether or not you experience the
error condition.

Changing the default MySQL password on the SDB
----------------------------------------------
If you set a site-specific password for the mazama MySQL account on your
SDB node, make sure the mazama MySQL account on your SMW matches it. The
following procedure is included as steps of Procedure 51, "Changing the
default MySQL passwords on the SDB" in "Cray XT System Management,"
S-2393-22, provided with the CLE 2.2 GA release package.

1. To change the mazama account password for the MySQL accounts on the
   SMW, type these MySQL commands.

   smw# mysql -u root -p
   mysql> set password for 'mazama'@'%' = password('newpassword');
   mysql> set password for 'mazama'@'localhost' = password('newpassword');
   mysql> set password for 'mazama'@'smw' = password('newpassword');
   mysql> exit

2. Update /etc/sysconfig/mazama on the SMW.

   a. Make the following change:

      smw# vi /etc/sysconfig/mazama

      ## Type: string
      ## Default: mazama
      ## Config: ""
      #
      # Default password for mazama user in the mazama database
      #
      passwd=newpassword

   b. Make the following additional change, unless you are using a
      remote MySQL server for CMS logs.

      ## Type: string
      ## Default: mazama
      # Default password for mazama user in the mazama Log database
      #
      log_passwd=newpassword

New xtcdr_generator core type for AMD Opteron Processor (Istanbul)
------------------------------------------------------------------
The xtcdr_generator -C subtype_str option has a new IB12 core type that
supports the six-core AMD Opteron processors (code-named Istanbul) for
Cray XT5 and Cray XT5m systems.

Additional System Administration Information
--------------------------------------------

Must stop mzsd before booting XT (bug 749433)
---------------------------------------------
A problem existed when mzsd was not stopped on the SMW before a Cray XT
system reboot. Restarting mzsd on the SMW after the Cray XT system
reboot was not sufficient to restart the daemon. This problem has been
corrected. You can safely ignore the second note in section 2.7.1,
Restarting Daemons on the SMW, in Using Cray Management Services (CMS),
publication S-2484-40. Section 2.7.3, Restarting Daemons Not Stopped
Before a System Reboot, is also no longer necessary and can be ignored.

Changing the Time Zone for L0 and L1 Controllers
------------------------------------------------
The "Cray System Management Workstation (SMW) 4.0 Software Installation
Guide," S-2480-40, Appendix B, "Updating the Time Zone," Procedure 58,
"Changing the time zone for L0 and L1 controllers," should include the
following warning at the beginning of the procedure:

  Warning: Do not flash L0 and L1 controllers while the Cray XT system
  is booted; perform this procedure while the Cray XT system is shut
  down.

xtbounce consistency checks
---------------------------
xtbounce now includes consistency checking for CPU model, speed, NB
speed, and memory configuration in response to bugs 746602 and 745699.
This ensures that all sockets of multi-socket nodes and also all nodes
on the same module are of the same configuration. This checking happens
just after the coldstart phase of xtbounce.

Consistency checks are done as follows:

  o CPU model - checks all sockets/dies on a node against each other and
    also checks all nodes against each other.
  o CPU speed - same checking.
  o CPU NB speed - same checking.
  o Memory configuration - intra-node checks are not done, allowing for
    non-uniform memory configurations amongst sockets within a node; all
    nodes must have the same overall memory configuration, however.

In the event that inconsistencies are found, xtbounce will show warnings
such as the following:

  WARNING: c0-0c0s1n0 334 CPU Model Mismatch
  WARNING: c0-0c0s2n1 335 CPU Speed Mismatch
  WARNING: c0-0c0s3n2 336 Memory Speed Mismatch
  WARNING: c0-0c0s6n3 337 Memory Configuration Mismatch

In these cases, the offending component can be found by looking at the
last lines of the /var/log/coldstart..0 files on the pertinent L0. In
the following example, node 0 has a CPU model mismatch between sockets
on the same node, and node 3's NB speed mismatches the other nodes' NB
speed.

Note that in the coldstart.X.0 output, the term 'die' is used. On
single-socket nodes, there will be one 'die' entry; on FR2 modules there
are two 'die' entries. Any future product that has multiple dies in a
socket will show the total number of dies for the node.

/var/tmp/coldstart.0.0:
  Jun 9 20:28:42 CPUinfo signature found, retaining core/die info.
  Jun 9 20:28:42 numdie: 2 cores: 4
  Jun 9 20:28:42 die[0]: cpuid: 0x00100f23, model: B3, cpumhz: 2300, nbmhz: 2000, memgb: 8
  Jun 9 20:28:42 die[1]: cpuid: 0x00100f24, model: B4, cpumhz: 2400, nbmhz: 2000, memgb: 8

/var/tmp/coldstart.1.0:
  Jun 9 20:28:42 CPUinfo signature found, retaining core/die info.
  Jun 9 20:28:42 numdie: 2 cores: 4
  Jun 9 20:28:42 die[0]: cpuid: 0x00100f23, model: B3, cpumhz: 2300, nbmhz: 2000, memgb: 8
  Jun 9 20:28:42 die[1]: cpuid: 0x00100f23, model: B3, cpumhz: 2300, nbmhz: 2000, memgb: 8

/var/tmp/coldstart.2.0:
  Jun 9 20:28:42 CPUinfo signature found, retaining core/die info.
  Jun 9 20:28:42 numdie: 2 cores: 4
  Jun 9 20:28:42 die[0]: cpuid: 0x00100f23, model: B3, cpumhz: 2300, nbmhz: 2000, memgb: 8
  Jun 9 20:28:42 die[1]: cpuid: 0x00100f23, model: B3, cpumhz: 2300, nbmhz: 2000, memgb: 8

/var/tmp/coldstart.3.0:
  Jun 9 20:28:42 CPUinfo signature found, retaining core/die info.
  Jun 9 20:28:42 numdie: 2 cores: 4
  Jun 9 20:28:42 die[0]: cpuid: 0x00100f23, model: B3, cpumhz: 2300, nbmhz: 2100, memgb: 8
  Jun 9 20:28:42 die[1]: cpuid: 0x00100f23, model: B3, cpumhz: 2300, nbmhz: 2100, memgb: 8

When running xtbounce, if there are any of the new coldstart warnings
described above, the module can be disabled and replaced at the next PM
period.

Differences between 4.0.UP01 and 4.0.UP02
-----------------------------------------

HSS:

Bug: 751366 xtnetwatch dies during system boot
Description: xtnetwatch now keeps running if initial ec_lcb_monitor_rsp
not received.
  xtnetwatch, when starting up, sends out an ec_lcb_monitor event
  informing all pertinent L0's that they should begin monitoring. It
  also passes things like LCB masks, sampling frequency, etc. xtnetwatch
  expects a response from each L0 it sent the original event to.
  Previously, if one or more L0's didn't respond, xtnetwatch would print
  an error message and exit. This mod changes that to a warning, and
  xtnetwatch continues to run. Not receiving this event does not
  preclude the L0 from performing network monitoring -- it just may be
  that the controller was busy, or ERD was busy, and the controller
  didn't respond in time. This warning is informational only, and
  informs the operator that something may be wrong with the
  controller(s) specified in the warning message.
Revision: hss:22896

Bug: 750961 rca notifies of ec_node_unavailable 7 minutes too late
Description: Make event handler registration idempotent to avoid
redundancy.
  To avoid the problem of multiple identical handlers being called with
  the same user data for the same ERD client, this mod ensures that
  multiple calls to cray_event_add_handler() for the same event ticket,
  with the same function pointer and user data, just return success
  without adding yet another handler to the list for that subscription.
Revision: hss:22898

Bug: 749101 more specific events needed from xtprocadmin
Description: Add ec_software_node_info type.
  The ec_software_node_info type is used to send informational messages
  from XT system software about various changes that the software has
  made to node state. Format is string encoding ##.
Revision: hss:22904, hss:22905

Bug: 751039 SMW daemons seem to get confused after an OS shutdown
Description: Mod to state manager to improve the performance of response
event processing.
  State manager subscribes to all response events, but has entries in
  its state transition tables for only a small subset of these. This
  results in state manager wasting time going through the state
  transition tables looking for a match on event type. This mod changes
  state manager so that it subscribes to only those response events that
  have entries in the state transition tables. This will have the effect
  of reducing wasted CPU cycles and should result in 1) fewer events
  being queued up for state manager and 2) faster processing of events
  sent to state manager.
Revision: hss:22952

Bug: 751409 cabinet EPO not reported in event logs
Description: Pass EPO message from L1 to the SMW.
  Along with printing the EPO error message to the L1 log file, pass the
  error along with the ec_l1_failed event to the SMW so it appears in
  the eventlog.
Revision: hss:22965

Bug: 748264 coldstart needs to set additional bits to properly support
Linux booting
Description: Coldstart mods to support booting linux 2.6.27.
  Changes to conform to a more standard boot loader (i.e. LILO) and
  support the Linux boot protocol described in:
  linux/kernel/Documentation/i386/boot.txt.
  This mod allows for the removal of several hacks to the Cray linux
  kernel. (Mod provided by Eric Jones.) For 'version tracking', the bug
  number (748264) has been added to prints in linuxcmdline/main.c and
  wait4boot/main.c. (Bug 748264)
Revision: hss:22971

Bug: 752320 l0sysd shouldn't call check_nodes for XMT modules, causes
l0sysd seg fault
Description: Fix for opteron specific code and XMTs.
  Code was added in an update to the 4.0 software that is incompatible
  with XMT. This mod checks for the node arch type and disallows the
  processing of check_nodes when the target arch is ThreadStorm.
Revision: hss:23034

Bug: 751300 xtdimminfo improperly characterizes error as correctable
Description: Fix xtdimminfo to catch an uncorrectable case.
  Fix a case in xtdimminfo where the memory error is a correctable error
  (xtopteronmca - CECC = 1 = Correctable ECC Error) BUT the hardware is
  unable to correct the error (xtopteronmca - UC = 1 = Error NOT
  Corrected by HW). Without the fix, the kernel would panic but
  xtdimminfo would output that the error was correctable.
Revision: hss:23049

Bug: 746678 xtdumpsys on a large XT4 is too slow and recommended dump
procedure is unclear
Description: Speed up xtdumpsys eventlog analysis and console analysis.
  The xtdumpsys eventlog analysis stage can take a very long time to
  process eventlogs on large systems. The problem has been traced to
  TCL/expect regular expression performance and the use of linear lists
  instead of hash tables. This mod replaces the eventlog analysis stage
  in TCL with an implementation in Perl that greatly speeds up
  processing time. The Perl script is called from the xtdumpsys script
  and all eventlog analysis variables are passed from stdin and back
  through stdout.
  The xtdumpsys console analysis stage can take a very long time to
  process on large systems, especially if there are multiple duplicate
  MCA errors in the console output, each of which requires a lookup to
  determine the MCA status value with xtopteronmca.
  This mod speeds up the console analysis stage by changing how the TCL
  console regular expressions are processed to work around TCL's slow
  regexp performance. It also adds a cache for xtopteronmca data so that
  multiple error messages about the same component only require one
  actual execution of the command.
Revision: hss:23028, hss:23070

Bug: 752704 Need to reliably log cabinet health register to SMW after an
EPO
Description: Partial fix for BUG 752704 - log cabinet Health register
after EPO.
  During an EPO event, the only text message to the eventlog was the
  fact that an EPO had occurred. The Health status register bit that
  indicates the cause of the EPO was not given in the eventlog. So,
  after an EPO, the reason for the EPO could be unknown. This mod adds
  the text message (from the Health Status bit set) to the eventlog
  after an EPO. Bug 752704 also asks for the VFD status and the VFD
  fault history to be logged to the SMW, which is not fixed at this
  time.
Revision: hss:23080

Bug: 753099 XMT's needs for SMW 4.0UP02
Description: Update TTE library and diags for Cray XMT.
  TTE has undergone a couple of changes and this updates the version
  from 5.18.0 to 5.18.1. XMT bug 749185 (sbe's during coldstart may lead
  to early OS hang, and lack of scanout) is resolved, as is a build
  problem introduced in xmt:104092, which caused the L0 version of gdb
  to fail to build.
Revision: hss:23088

CMS:

Bug: 751158 mzpartition has a segmentation fault in X2 1.2.15a10
Description: mzpartition segfaults if only base address is set when the
nodes are assigned.
  The update:

    smw:~> mzpartition --add --partition HSN --component r
    Segmentation fault

  The fix enforces the requirement of changing the base address, netmask
  and gateway as a set.

  NOTE: Here is how to set up the alternate base address. When the XT
  changes.

  1. Power off the system to ensure stargates get initialized (XT and X2
     are talking on the same base networks)
       mzpower --all off
  2. Run mzsysinit on the X2 (after any reboot of the XT).
       mzsysinit
  3. Ensure that the /etc/xt.conf is set up on both the SMW and the boot
     node.
  4. mzupdate -i -p
  5. Create an image:
       mzimage -c -i 429
       mzimage -p -i
  6. Create a partition:
       mzpartition -c -p
       mzpartition -s -i
       mzpartition -A -C r*
  7. Boot the system
       mzboot -p
Revision: mazama:2293, mazama:2294

Bug: 749032 mzinit problem using "script" in SLES10 - workaround the
getlogin issue
Description: Work around the getlogin problem for script and console.
  This is to work around the getlogin problem in script konsole and
  other shells.
Revision: mazama:2295

Bug: 749340 mzsd init script sourcing xt_boot_functions in wrong
location on obs root
Description: Check if we are using OBS paths or non-OBS paths.
Revision: mazama:2296

Bug: 750096 Using CMS API causes hangs
Bug: 751081 ALPS alarm triggered on connect to CMS
Description: This mod adds timeout support to TCP socket connect call
for CMS APIs.
  Remove existing connect with timeout used by the daemon. All socket
  connect calls use a timeout of 6 sec.
Revision: mazama:2301

Bug: 747257 mzsd does not see state transitions
Description: Support for reading/parsing ec_software_nodeinfo events.
  Fix event data when getting it through SMW interface. In order to get
  correct ev_data, we need to get service ids and increment the event
  data pointer past these if we are getting the event from the SMW (i.e.
  mzsd).
Revision: mazama:2304, mazama:2305

Bug: 752014 Need changes to handle large system boot
Description: Changes to handle large system boot and reading of config.
  At XT re-boot, CMS re-reads the system config. Modifying the config
  read based on boot and boot rsp events is needed to minimize the
  number of reads.
Revision: mazama:2310

Bug: 747987 mz2attr summary changes
Description: This mod changes the mz2attr summary function to not
display network ids and places the network revision next to the network
type.
Revision: mazama:2317

Bug: 751218 mzsd displays some error messages as if it was mzsmd
Description: This mod removes the wording of 'mzsmd' from the exit
message. It also fixes some message severity levels and improves some
messages.
Revision: mazama:2318

Bug: None
Description: Performance improvements for CMS logging.
  Performance improvement in the area of data copying, checking the
  database, comparing of duplicate messages, dropping of ec_console_logs
  if backed up, more efficient parsing and reducing number of checks and
  copying of data. Code simplification and use of new routines and data
  structures is implemented. mzlogmanagerd nice value is now 1, so it is
  nice to others besides the nanosleeps.
Revision: mazama:2341, mazama:2342, mazama:2343, mazama:2344

Bug: 751069 mzboot does not timeout after more than 10 hours waiting for
X2 nodes to boot
Description: The default boot timeout value in mzboot was 0, which means
no timeout. This mod changes the default timeout to be 30 minutes.
Revision: mazama:2346

Bug: 751038 mzinit error handling of bad command lines
Description: Fixes a mzinit error message if password is bad. Also fixes
the options to be the same as the usage and man page: the mzinit option
that was -passwd is now --password, as documented.
Revision: mazama:2349

Bug: 752764 mzsd hogs CPU
Bug: 752698 CMS errors when creating reservation on nodes that have
existing reservation
Description: Avoid event router loops when mzsd gets partition
information.
  Change to use the parse command line versus event to get partition
  information. Add boot lock to handle boot timing; this uses existing
  node discovery instead of additional retries. Simplify sleep(delay)
  for node discovery. Add a 'default' partition info when partition info
  not retrieved. Add locking to buffered nid states. Make the
  MZE_send_attr ced independent of mzsd. This is probably the main issue
  in the CPU hog for mzsd. Remove alarm/timer from boot rsp processing.
Revision: mazama:2352, mazama:2359, mazama:2354, mazama:2363

---------------------------------------------------------------------------

Refer to the README file for the cumulative list of instructions
contained in this fix package.
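
Illustrative sketch: xtbounce consistency rules
-----------------------------------------------
The rules listed under "xtbounce consistency checks" can be sketched in
Python as shown below. This is a minimal illustration, not the xtbounce
implementation. Only the die[...] line format is taken from the
coldstart example in this README. The names parse_dies, check_module and
majority, the majority-vote rule used to pick the odd node out, and the
"NB Speed Mismatch" label are assumptions made for the example.

```python
import re
from collections import Counter

# Matches the die[...] lines shown in the coldstart example in this README.
DIE_RE = re.compile(
    r"die\[\d+\]: cpuid: 0x[0-9a-fA-F]+, model: (\w+), "
    r"cpumhz: (\d+), nbmhz: (\d+), memgb: (\d+)"
)


def parse_dies(log_text):
    """Return one dict per die[...] line found in a coldstart log."""
    return [
        {"model": model, "cpumhz": int(cpumhz),
         "nbmhz": int(nbmhz), "memgb": int(memgb)}
        for model, cpumhz, nbmhz, memgb in DIE_RE.findall(log_text)
    ]


def majority(values):
    """Most common value, or None when there are no values (assumed rule)."""
    counts = Counter(values)
    return counts.most_common(1)[0][0] if counts else None


def check_module(nodes):
    """nodes maps node number -> list of die dicts; returns (node, warning)."""
    warnings = []
    # CPU model, CPU speed, NB speed: checked within a node and across nodes.
    for field, label in (("model", "CPU Model"),
                         ("cpumhz", "CPU Speed"),
                         ("nbmhz", "NB Speed")):
        per_node = {n: {d[field] for d in dies} for n, dies in nodes.items()}
        expected = majority(
            next(iter(v)) for v in per_node.values() if len(v) == 1
        )
        for n in sorted(per_node):
            if len(per_node[n]) != 1 or expected not in per_node[n]:
                warnings.append((n, label + " Mismatch"))
    # Memory: no intra-node check; only the per-node total must agree.
    totals = {n: sum(d["memgb"] for d in dies) for n, dies in nodes.items()}
    expected = majority(totals.values())
    for n in sorted(totals):
        if totals[n] != expected:
            warnings.append((n, "Memory Configuration Mismatch"))
    return warnings


def _die(i, cpuid, model, mhz, nb):
    """Build one log line in the format of the README example."""
    return ("Jun 9 20:28:42 die[%d]: cpuid: %s, model: %s, cpumhz: %d, "
            "nbmhz: %d, memgb: 8\n" % (i, cpuid, model, mhz, nb))


# Values copied from the four coldstart listings in this README.
SAMPLE_LOGS = {
    0: _die(0, "0x00100f23", "B3", 2300, 2000)
       + _die(1, "0x00100f24", "B4", 2400, 2000),
    1: _die(0, "0x00100f23", "B3", 2300, 2000)
       + _die(1, "0x00100f23", "B3", 2300, 2000),
    2: _die(0, "0x00100f23", "B3", 2300, 2000)
       + _die(1, "0x00100f23", "B3", 2300, 2000),
    3: _die(0, "0x00100f23", "B3", 2300, 2100)
       + _die(1, "0x00100f23", "B3", 2300, 2100),
}

warnings = check_module({n: parse_dies(t) for n, t in SAMPLE_LOGS.items()})
for node, message in warnings:
    print("node %d: %s" % (node, message))
```

With the sample values, the sketch flags node 0 (its two dies differ in
both model and CPU speed) and node 3 (its NB speed differs from the
other nodes). Memory is compared only as a per-node total, which mirrors
the rule that non-uniform memory between sockets within a node is
allowed.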
