DMP Field Guide
There are various ways of providing high availability for business-critical applications and data, such as high availability clustering and data redundancy through RAID. High availability of applications is achieved by connecting several computers through an interconnect and clustering software that enables transparent failover between systems in case of a system failure. High availability of storage can be achieved by implementing redundant disks using RAID-1.
Mirroring of disks (RAID-1) provides high availability only against disk failure. RAID-1 does not protect against failure of any other component in the data path (HBA, cable, switch or switch port, etc.). Having multiple paths from server to storage can provide not only high availability but also higher performance, depending on the type of array.
DYNAMIC MULTIPATHING (DMP)
The DMP feature of VERITAS Storage Foundation intelligently improves I/O performance and enhances system reliability when a data path fails due to an HBA, switch, or cable failure. On a path failure, DMP automatically and dynamically selects the next available I/O path for I/O requests, without administrator intervention. DMP also proactively takes corrective action (path restore) automatically and transparently once failed or disabled paths are repaired. DMP improves I/O performance by spreading or balancing I/O requests across the available paths between server and storage, depending on the type of storage array. Further, DMP can enhance availability in an HA clustering environment by avoiding unnecessary node failover due to an I/O path failure in a multipathing environment. DMP provides high availability and improves performance on a wide variety of storage arrays of different types, such as Active/Active and Active/Passive. Integrating the logical volume manager (or storage virtualization software) with the I/O multipathing software provides great benefit, as customers can manage logical volumes as well as I/O multipathing across various storage arrays/platforms centrally. The same code path can provide a performance advantage and a unified error detection and correction mechanism.
Types of Arrays supported by DMP
The following built-in array types are supported with DMP in SF 4.0:

Active/Active arrays (A/A): A/A arrays support simultaneous I/O on all paths. An A/A array has multiple active interfaces to a LUN or logical device, and the LUN can be accessed through any of the available paths during normal operation without causing a failure. If a path fails, an A/A array continues to operate without any path failover or temporary interruption. Examples of this type of array include the EMC Symmetrix, HDS 99xx series, and IBM ESS series (Shark).

Active/Passive arrays (A/P): A/P arrays have only one controller actively processing I/O, with the other controller in standby mode. A/P arrays in auto-trespass mode allow I/O on a single primary (active) path; the secondary (passive) path is used in the event of failure of the primary path. Failover occurs when I/O is issued on the secondary path. If multiple primary paths are configured, the additional primary paths are used in preference to the secondary path. Examples: EMC CLARiiON Cx600 and Cx700; HDS 95xx and 92xx; IBM FAStT; T3/T4. Some A/P arrays can additionally be configured to work as A/P-C, A/PF, A/PF-C, A/PG, or A/PG-C.

Active/Passive Concurrent arrays (A/P-C): A/P-C arrays are A/P arrays in auto-trespass mode that allow I/O on multiple primary (active) paths; the secondary (passive) path is used in the event of failure of all primary paths. Failover occurs when I/O is issued on the secondary path. This array type allows load balancing of I/O across multiple primary paths, and was introduced in VxVM 4.0. In an A/P-C array, LUNs or logical devices can be assigned to multiple controllers, but I/O occurs through only one assigned controller at a time. For optimal distribution of I/O, LUN assignment is spread across the controllers, e.g. odd-numbered LUNs to controller A and even-numbered LUNs to controller B.
In case of a failure of a path to a controller, or a controller failure, all LUNs assigned to that controller are reassigned (trespassed) to the surviving controller. Examples: EMC CLARiiON, HDS 95xxV series, IBM FAStT, T3/T4. Almost all A/P arrays support A/P-C.

Active/Passive arrays in explicit failover mode (A/PF): A/PF arrays are A/P arrays in which the multipathing software initiates the ownership change (trespass) of the LUNs/devices from the active path to the passive path in a failover scenario, by issuing a special command to the array. Currently explicit A/P support is available only with T3 arrays (on Solaris only). This support is not yet available on CLARiiON, as EMC has not provided the APIs or commands to do this (needs verification, though). Example: T3/T4 can be configured both as A/PF and as A/P.

Active/Passive Concurrent arrays in explicit failover mode (A/PF-C): A/PF-C arrays are A/P-C arrays where failover requires a special command to be issued to the array. This array type allows load balancing of I/O across multiple primary paths, and was introduced in VxVM 4.0. Currently no ASL is available to support this array type, with the exception of the CLARiiON Cx series, which is supported through an APM.

Active/Passive arrays with LUN group failover (A/PG): A/PG arrays treat a group of LUNs connected through a controller as a single failover entity. Failover occurs at the controller level, not at the LUN level (as is the case for A/P arrays in auto-trespass mode). The primary and secondary controllers are each connected to a separate group of LUNs. If a single LUN in the primary controller's LUN group fails, all LUNs in that group fail over to the secondary controller's passive LUN group. Example: HDS 9200.

Active/Passive Concurrent arrays with LUN group failover (A/PG-C): A/PG-C arrays treat a group of LUNs connected through a controller as a single failover entity, as for A/PG, but additionally allow load balancing of I/O across multiple primary paths. This array type was introduced in VxVM 4.0. Currently no ASL is available to support this array type.

This set of array types can be extended by creating an Array Policy Module (APM) in conjunction with an ASL. An APM consists of a set of kernel procedures in a dynamically loadable kernel module. The new array type is declared by setting the ATYPE attribute in the associated ASL.
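The difference in failover granularity between the A/P and A/PG types can be illustrated with a short Python sketch (a hypothetical model with made-up names, not DMP code): A/P trespasses the single affected LUN, while A/PG trespasses the whole LUN group owned by the failed controller.

```python
# Illustrative model (not DMP source code) of failover granularity:
# A/P fails over one LUN at a time; A/PG fails over the entire group
# of LUNs owned by the controller whose path failed.

def failover(array_type, failed_lun, lun_groups):
    """Return the set of LUNs that trespass to the other controller.

    lun_groups maps controller name -> list of LUNs it owns.
    """
    if array_type == "A/P":
        return {failed_lun}                 # LUN-level failover
    if array_type == "A/PG":
        owner = next(c for c, luns in lun_groups.items()
                     if failed_lun in luns)
        return set(lun_groups[owner])       # controller/group-level failover
    raise ValueError("unknown array type")

groups = {"ctlrA": ["lun0", "lun2"], "ctlrB": ["lun1", "lun3"]}
assert failover("A/P", "lun0", groups) == {"lun0"}
assert failover("A/PG", "lun0", groups) == {"lun0", "lun2"}
```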
Different Types of Multipathing Configurations

Configuration 1:
This is a simple configuration with two HBAs connected to the two controllers of an array, or to a JBOD with 10 disks.
Number of Disks: 10
Number of Paths: 2
Number of devices the OS sees: 20 (10x2)
Number of devices/disks DMP displays to VxVM: 10
Configuration 2:
This is a simple configuration with two HBAs connected to two ports of an FC switch. Two other ports of the switch are connected to the controllers of an array, or to a JBOD with 10 disks.
Number of Disks: 10
Number of Paths: 4
Number of devices the OS sees: 40 (10x4)
Number of devices/disks DMP displays to VxVM: 10
Configuration 3:
This is a configuration with two HBAs, each connected to a port of a distinct FC switch. A port of each FC switch is connected to a controller of an array, or to a JBOD with 10 disks.
Number of Disks: 10
Number of Paths: 2
Number of devices the OS sees: 20 (10x2)
Number of devices/disks DMP displays to VxVM: 10
Configuration 4:
This is a configuration of an A/A array with each LUN assigned to a port on each of two different controllers. The two HBAs on the host are each connected to a port in two distinct FC switches. Each switch is connected to two different ports of the two controllers.
Number of Disks/LUNs: 2
Number of Paths: 2
Number of devices the OS sees: 4
Number of devices/disks DMP displays to VxVM: 2
Configuration 5:
This is a configuration of an A/P array with each LUN assigned as primary or secondary to a port on different controllers/processors. The two HBAs on the host are each connected to a port in two distinct FC switches. Each switch is connected to two different ports of the two controllers.
Number of Disks/LUNs: 2
Number of Paths: 2
Number of devices the OS sees: 4
Number of devices/disks DMP displays to VxVM: 2 (a total of 4 paths, 2 active and 2 passive)
I/O Balancing Policies
Round Robin: The round-robin I/O policy shares I/O equally between the paths in a round-robin sequence. For example, if there are three paths, the first I/O request uses one path, the second uses a different path, the third is sent down the remaining path, the fourth goes down the first path again, and so on. No further configuration is possible, as this policy is managed automatically by DMP. This is the default and recommended policy for A/P-C configurations with multiple active paths per controller.

Balanced Path: Balanced path is the default policy for A/A arrays, and optimizes the use of caching in disk drives and RAID controllers. It is a special type of round-robin policy that schedules I/O across all the available paths of an A/A array. During normal operation, LUNs are logically divided into a number of regions (or partitions), and I/O to/from a given region is sent on only one of the active paths. The size of this region is configurable using the dmp_pathswitch_blks_shift or partitionsize attribute (default 2048 sectors); all I/Os starting in a given region go through the same path. This policy is an optimization of the round-robin policy that deals better with sequential data and data that is close together in the address space. On a path failure the workload is automatically redistributed across the remaining paths. The balanced path policy can be advantageous with JBODs, as it utilizes the drives' track cache during load balancing; JBODs generally do not have a shared cache like A/A arrays.

Minimum Queue: This policy sends I/O on the path that has the minimum number of outstanding I/O requests queued for the LUN. No further configuration is possible, as DMP automatically determines the path with the shortest queue. This policy should be used mainly to balance I/O across controllers where a controller bottleneck is reducing performance.
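The minimum-queue idea reduces to picking the least-loaded path. A minimal Python sketch (illustrative only; the path names and queue counts are made up):

```python
# Illustrative minimum-queue path selection: choose the path with the
# fewest outstanding I/O requests. Not DMP internals.

def pick_path(outstanding):
    """outstanding: dict mapping path name -> number of queued I/Os."""
    return min(outstanding, key=outstanding.get)

# c2t0d0 has the shortest queue, so it is chosen for the next I/O.
assert pick_path({"c1t0d0": 4, "c2t0d0": 1, "c3t0d0": 7}) == "c2t0d0"
```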
Adaptive: The adaptive I/O policy attempts to maximize overall I/O throughput to/from the disks by dynamically scheduling I/O on the paths. It keeps track of I/O statistics to find the path with the highest expected throughput, and assigns path priorities accordingly. It is suggested for use where I/O loads vary over time. For example, I/O to/from a database may exhibit both long transfers (table scans) and short transfers (random lookups). The policy is also useful in a SAN environment where different paths may have different numbers of hops.

Priority: This policy is useful when the paths in a SAN have unequal performance and users want to enforce load balancing manually. Users assign a priority to each path based on their knowledge of the configuration and performance characteristics of the available paths, and of other aspects of the system.

Single Active: The single active I/O policy routes I/O down one designated active path. This is the default policy for A/P arrays with one active (or preferred) path and one or more failover (secondary or backup) paths. If configured for A/A arrays, there is no load balancing across the paths, and the alternate paths are used only to provide high availability (HA) through failover. On an A/P array, simultaneous access to a LUN via multiple paths would cause LUN ownership to shift back and forth between the array's controllers, causing a "ping-pong" effect and degrading performance. The single active policy for an A/P or A/P-C array uses the primary path as long as it is accessible; if the primary path fails, I/O is shifted to the secondary path. Performance of A/P arrays can be optimized by intelligent assignment of LUNs to controllers, e.g. odd-numbered LUNs to controller A and even-numbered LUNs to controller B. This way all the controllers can process I/O during normal operation.
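The single-active selection for an A/P LUN can be sketched in a few lines (a hypothetical model, not DMP internals): stay on the primary while it is healthy, shift to a secondary only on failure.

```python
# Illustrative single-active selection for an A/P LUN: use the primary
# path while it is healthy; otherwise fail over to the first healthy
# secondary. Names and the health callback are made up for illustration.

def select_path(primary, secondaries, healthy):
    if healthy(primary):
        return primary
    for s in secondaries:
        if healthy(s):
            return s            # path failover to a secondary
    raise IOError("all paths failed")

# Primary is down, so the secondary is chosen.
up = {"p1": False, "s1": True}
assert select_path("p1", ["s1"], lambda p: up[p]) == "s1"
```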
DDL, ASL, APM
Prior to release 3.2 of VERITAS Volume Manager (VxVM), the procedure for adding support for a third-party vendor array required modification of the existing Dynamic Multipathing (DMP) code and DMP driver. VERITAS or the vendor then tested the modification for incorporation into a later VxVM release. To reduce the cost of providing such support for new arrays and bringing it to market, VERITAS introduced the Device Discovery Layer (DDL). DDL allows vendors who meet certain VERITAS prerequisites for array support to qualify their arrays with an Array Support Library (ASL). The ASL development kit also allows array vendors to implement Array Policy Modules (APMs). An APM consists of a set of kernel procedures that DMP uses to handle I/O scheduling, I/O failover, and error handling policies for a set of disks.
Device Discovery Layer

The Device Discovery Layer (DDL) is a component of VxVM that provides a device discovery service to the VxVM configuration daemon (vxconfigd). DDL is responsible for discovering the multipathing attributes of disks and disk arrays that are connected to a host system. It also discovers any available enclosure information from the disk arrays. The enclosure attribute is an important factor in the data and/or metadata placement algorithms that VxVM uses to provide robustness. DDL uses SCSI commands to discover disk array attributes. As the procedure for discovering such attributes differs between disk arrays, DDL is able to use a different dynamically loaded library for each specific type of disk array. These Array Support Libraries (ASLs) form the core modules for disk array support in VxVM. After DDL has used the ASLs to gather the attributes of all disks connected to a host, it configures VxVM (specifically, the vxdmp driver) to enable multipathing of the disks. DDL allows ASLs to be added to or removed from a running VxVM system. This means that DMP support for a particular disk array type can be dynamically added to or removed from VxVM without stopping VxVM or rebooting the system. If the affected disks contain active volumes, they remain available during and after the addition or removal of support. Note: in VxVM 3.2 and later releases, DMP support for new disk arrays can be added or removed dynamically without changing the VxVM package. DDL loads ASLs at runtime. For each disk that is visible to the host operating system, DDL issues the SCSI inquiry command to obtain vendor identification (VID) and product identification (PID) information.

Array Support Library

A disk array may require the implementation of a new algorithm for its discovery.
The Device Discovery Layer (DDL) supports integration of the discovery mechanism for new types of disk array by dynamically loading an Array Support Library (ASL) that contains the required routines for a specific type of array. Each ASL is responsible for discovering the attributes that are needed by VxVM and DMP. The DDL passes these attributes back to the VxVM configuration daemon, vxconfigd, during the device discovery phase. They are then used to reconfigure the kernel component of DMP. For major Tier-1 arrays such as EMC, HDS, and IBM, VERITAS (SWIFT or iLab) writes the Array Support Libraries. For other arrays, such as LSI and 3Par, the vendor is responsible for writing the ASL.

Array Policy Modules

Array Policy Modules are sets of kernel procedures that DMP uses to handle I/O scheduling, I/O failover, and error handling policies for a set of disks. Taken together, an APM forms a set of procedures that define an array type. VERITAS or an array vendor can develop an APM. The APM is then associated with an array type and claimed by the ASL for the array.
Operating System commands for device discovery
devfsadm (Solaris): devfsadm maintains the /dev and /devices namespaces. The default operation is to attempt to load every driver in the system and attach to all possible device instances. devfsadm then creates device special files in /devices and logical links in /dev.

cfgmgr (AIX): The cfgmgr command configures devices and optionally installs device software into the system.

ioscan and insf -e (HP-UX): ioscan scans system hardware, usable I/O system devices, or kernel I/O system data structures as appropriate, and lists the results. The insf command installs special files in the devices directory, normally /dev. If required, insf creates any subdirectories that are defined for the resulting special file. The -e option reinstalls the special files for pseudo-drivers and existing devices.

MAKEDEV (Red Hat Linux): MAKEDEV creates the devices in /dev used to interface with drivers in the kernel.
VERITAS Volume Manager Commands and Daemons
vxconfigd: The Volume Manager configuration daemon (vxconfigd) maintains disk configurations and disk groups in VxVM. vxconfigd takes requests from other utilities for configuration changes, communicates those changes to the kernel, and modifies configuration information stored on disk. vxconfigd also initializes VxVM when the system is booted.

vxdctl enable: vxdctl enable causes vxconfigd to scan for any disks that were newly added since vxconfigd was last started. In this manner, disks can be dynamically configured on the system and then recognized by VxVM. If multipathing support is available, this option also causes vxconfigd to rebuild the DMP internal database to reflect the new state of the system after addition of the disk devices. The new disk devices detected by vxconfigd are added to the DMP database with their associated subpaths and parent DMP device. Also, if an ASL is added to the host/system after VxVM has already recognized the devices, vxdctl enable converts the devices to array-specific devices.

vxdisk scandisks: vxdisk scandisks initiates rescanning of devices in the OS device tree by VxVM. If necessary, DMP reconfiguration is triggered. This allows VxVM to configure and multipath disks dynamically. The command also supports options such as new, fabric, ctlr, and pctlr; more information can be obtained from the man page.
How DMP does Disk/Device Discovery:
Consider a disk with a single path. The OS (Solaris for this example) will have created a device handle for this path in /dev/[r]dsk. At discovery time, DMP creates a metanode (pseudo device) for this disk in /dev/vx/[r]dmp. When I/Os are sent to this metanode, they are passed to the underlying OS device. The I/O stack looks something like:

    /dev/vx/[r]dmp/c1t0d0s0
              |
    /dev/[r]dsk/c1t0d0s0
              |
            disk

Now let us consider a disk that has two paths. Since the OS is not "multipath aware", it creates a device handle for each path to that disk. As far as the OS is concerned, each of these paths is a disk in its own right. At discovery time, DMP detects that each of these devices is in fact a separate path to the same disk. It then builds the DMP node with subpaths to each of the OS devices. When an I/O is sent to the metanode, DMP selects the appropriate path:

             /dev/vx/[r]dmp/c1t0d0s0
                  |               |
    /dev/[r]dsk/c1t0d0s0   /dev/[r]dsk/c2t0d0s0
                  |               |
                  -----------------
                          |
                        disk

In the above example, vxdisk list would show only the metanode, c1t0d0. The OS would see two devices for this disk (format, etc.). A DMP metanode is the representation of a disk with ALL of its paths. More detailed technical information on DMP path and device discovery can be obtained from the following link.
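The grouping step described above can be sketched in Python. This is an illustrative model only: the inquiry data is a hand-built stand-in for the disk identity DMP obtains via SCSI inquiry, not a real SCSI call.

```python
# Illustrative sketch of metanode construction: the OS exposes one
# device handle per *path*; DMP groups handles that report the same
# disk identity (e.g. a serial number) under one metanode.

from collections import defaultdict

def build_metanodes(os_devices, inquiry):
    """inquiry: dict mapping OS device handle -> unique disk serial."""
    nodes = defaultdict(list)
    for dev in os_devices:
        nodes[inquiry[dev]].append(dev)   # subpaths per physical disk
    return dict(nodes)

# Two OS handles answer with the same serial: one metanode, two subpaths.
inq = {"c1t0d0": "SER-42", "c2t0d0": "SER-42", "c1t1d0": "SER-43"}
nodes = build_metanodes(["c1t0d0", "c2t0d0", "c1t1d0"], inq)
assert nodes["SER-42"] == ["c1t0d0", "c2t0d0"]
assert len(nodes) == 2
```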
http://library.veritas.com/docs/268035

Enable/Disable/Restore/Failover
Path Failure and/or Failover: Path failure can occur due to failure, damage, or a loose connection of a component (cable, GBIC, MIA, etc.) in the I/O path. On a path failure, DMP initiates a path failover after determining whether it is really a path failure or a disk/device failure:
- I/O is sent via the failed path.
- The OS returns a failed I/O status to DMP.
- DMP now has to determine whether it is a path failure or a disk failure.
- DMP issues a SCSI inquiry to distinguish a transient failure (path unavailable for a short period of time) from a disk failure.
- If the SCSI inquiry fails, a path failover is initiated and I/Os are rerouted via another valid path (active in the case of A/A, passive in the case of A/P).
- (Limitation) This process happens per device. DMP cannot correlate this information with the other devices on the same path, so on a path failure only the path for the affected device is failed over to an alternate path, and NOT the paths for all the devices/LUNs on that path (for both A/A and A/P).
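The decision steps above can be sketched as follows (hypothetical helper names, not DMP code): an I/O error triggers a SCSI inquiry, and only an inquiry failure triggers path failover.

```python
# Illustrative sketch of the failover decision: on an I/O error, a SCSI
# inquiry distinguishes a transient failure (retry the same path) from a
# path failure (fail over to an alternate path). If no path answers the
# inquiry, it is treated as a disk/device failure.

def handle_io_error(path, alternates, scsi_inquiry_ok):
    if scsi_inquiry_ok(path):
        return path                  # transient failure: keep the path
    for alt in alternates:
        if scsi_inquiry_ok(alt):
            return alt               # path failover to a valid path
    return None                      # no valid path: device failure

# p1 fails inquiry, p2 answers: I/O is rerouted to p2.
assert handle_io_error("p1", ["p2"], lambda p: p == "p2") == "p2"
```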
Processes and/or Daemons affecting Path Failover:

Error Daemon (dmperrd): On a path or disk failure (i.e. an I/O failure), DMP passes the failed I/O request to the DMP error queue (dmperrQ). The DMP error daemon (dmperrd) then takes over processing of the I/Os sitting in the dmperrQ: it issues a SCSI inquiry (possibly multiple times; see the tunables section) to find a valid path, and on finding one it sends all the I/O requests sitting in the dmperrQ through that path. dmperrd exists as a kernel thread, so it will not be listed in the process table (ps -aef).
Solaris: dmperrd starts when the vxdmp driver is loaded.
HP-UX: dmperrd starts when the first DMP ioctl is executed by vold (the volume management daemon).
DMP_RETRY_COUNT (tunable): discussed in the next section.

Restore Daemon (restored): After a path failover, the restore daemon (restored) keeps checking the health of the failed path. If the health of the path is restored, the restore daemon re-opens the path for I/O, updating the DMP database during the restore process. By default the restore daemon checks the paths every 300 seconds. This is configurable using vxdmpadm; however, it should be done intelligently, as frequent checking may consume a lot of resources. restored exists as a kernel thread, so it will not be listed in the process table (ps -aef).
Platform Specific Discussion for Factors affecting DMP
OS and DMP Tunables, Parameters
DMP_RETRY_COUNT (tunable): Whenever there is an I/O failure, DMP initiates a SCSI inquiry to the device. If the SCSI inquiry is successful, the I/O is sent again on the same path. If the I/O still fails, DMP continues retrying up to DMP_RETRY_COUNT times (default 5) before failing the I/O and returning an I/O failure to the application. A SCSI inquiry to a device can succeed while the I/O fails if, for example, a write operation is performed on a write-protected device, or if one disk in a multi-disk LUN has failed. The SCSI inquiry is repeated only while it keeps succeeding; if the inquiry fails, the path/device has failed. Gotcha: DMP_RETRY_COUNT can and should be lowered, as the SCSI protocol also retries on its own; this helps reduce the accumulation of I/O retries in the queue. The default value of this parameter is 5 on all platforms except HP-UX (where it is 30); the recommended value is 2.
Platform   Changeable?          How to change?
Solaris    Y, defaults to 5     /kernel/drv/vxdmp.conf
HPUX       N, coded to 30       only via adb!!
AIX        Y, defaults to 5     smit vxvm
Linux      N, coded to 5        n/a
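The retry behaviour that DMP_RETRY_COUNT controls can be sketched as follows (illustrative only; the callbacks are made up, not DMP internals):

```python
# Illustrative sketch of DMP_RETRY_COUNT: retry a failed I/O while the
# SCSI inquiry keeps succeeding, up to the retry limit, then fail the
# I/O back to the application.

DMP_RETRY_COUNT = 5   # default on most platforms; 30 on HP-UX

def issue_io_with_retries(do_io, inquiry_ok, retries=DMP_RETRY_COUNT):
    for _ in range(retries):
        if do_io():
            return True            # I/O succeeded
        if not inquiry_ok():
            break                  # path/device failed: stop retrying
    return False                   # failure returned to the application

attempts = []
def flaky_io():
    attempts.append(1)
    return len(attempts) == 3      # succeeds on the third try
assert issue_io_with_retries(flaky_io, lambda: True) is True
assert len(attempts) == 3
```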
DMP_PATHSWITCH_BLKS_SHIFT (all platforms): When using the balanced path I/O policy to take advantage of caching in disk drives and/or RAID controllers, the LUNs/devices are logically divided into a number of regions (or partitions), and I/O to/from a given region is sent on only one of the active paths. The size of this region is configurable using the dmp_pathswitch_blks_shift attribute (default 1 MB); all I/Os starting in a given region go through the same path. If the array has port-specific caches, it is recommended to increase the value of this parameter beyond 1 MB. The value of this tunable can be changed in the /kernel/drv/vxdmp.conf file (on Solaris).
Platform   Changeable?          How to change?
Solaris    Y, defaults 1 MB     /kernel/drv/vxdmp.conf
HPUX       Y, defaults 1 MB     only via adb!!
AIX        Y, defaults 1 MB     smit vxvm
Linux      Y, defaults 1 MB     n/a
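The region arithmetic behind this tunable can be sketched as follows. A shift of 11 gives 2^11 = 2048 blocks of 512 bytes = 1 MB, matching the defaults quoted above; the function and path names are illustrative, not DMP code.

```python
# Illustrative sketch of balanced-path region selection: all I/Os whose
# starting block falls in the same region travel the same path; regions
# are assigned to paths round-robin.

DMP_PATHSWITCH_BLKS_SHIFT = 11            # 2**11 = 2048 blocks = 1 MB

def path_for_block(start_block, paths, shift=DMP_PATHSWITCH_BLKS_SHIFT):
    region = start_block >> shift         # which 1 MB region?
    return paths[region % len(paths)]     # one path per region

paths = ["c1t0d0", "c2t0d0"]
# Blocks 0 and 2047 share a region (and a path); 2048 starts a new one.
assert path_for_block(0, paths) == path_for_block(2047, paths)
assert path_for_block(0, paths) != path_for_block(2048, paths)
```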
DMP_FAILED_IO_THRESHOLD: Whenever DMP issues an I/O to a device, it gets a return code (whether successful or failed). In some abnormal conditions DMP does not get this return code from the device for an extremely long time, and in turn DMP/VxVM cannot return a code to the application. By default dmp_failed_io_threshold is set to 24 hours. Setting it to an appropriate value avoids this issue, though there are pros and cons to tuning it. Moreover, while DMP is waiting for the return code, various other timeouts (SCSI timeouts, application I/O timeouts, etc.) may expire. On Solaris the SCSI timeout is 60 seconds and can be set via sd_io_timeout in /etc/system (this is an OS tunable).
Platform   Changeable?           How to change?
Solaris    Y, defaults 60 sec    /etc/system (sd_io_timeout)
HPUX       Y                     only via adb!!
AIX        Y                     smit vxvm

AIX Specific Tunables
Throttling: Whenever there is an I/O failure, a SCSI inquiry is initiated, and DMP keeps sending I/Os to the SCSI layer (the failed device) until it detects that there is a failure; each I/O is retried until it reaches dmp_retry_count. If the number of I/Os queued to the failed device is huge, this can be an issue, as all of those I/Os will fail once DMP determines that the device has failed. Limiting this build-up is called throttling. This is specific to the AIX platform, where it is controlled by a tunable called DMP_QUEUE_DEPTH.

DMP_QUEUE_DEPTH (AIX specific): Whenever there is a huge influx of I/Os to a failed device, DMP tries sending only as many I/Os to the SCSI layer as specified by the dmp_queue_depth parameter. DMP tries that many I/Os before aborting all other pending I/Os to the device. The default value of this parameter is 32. The value of this parameter can be changed to match that of the array, or vice versa.
Changeable? & Default: Y, default is 32
How to change? smit vxvm
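The effect of DMP_QUEUE_DEPTH can be sketched as follows (illustrative; the function names are assumptions, and the depth value is the default quoted above):

```python
# Illustrative sketch of throttling via DMP_QUEUE_DEPTH: when a device
# has failed, only up to queue_depth of the queued I/Os are still
# attempted; the rest are aborted without further retries.

def drain_failed_device(pending_ios, queue_depth=32):
    tried = pending_ios[:queue_depth]        # I/Os DMP still attempts
    aborted = pending_ios[queue_depth:]      # aborted immediately
    return tried, aborted

tried, aborted = drain_failed_device(list(range(100)), queue_depth=32)
assert len(tried) == 32 and len(aborted) == 68
```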
DMP_PATH_IOD (AIX specific): Once DMP determines that a device has failed, the dmp_path_iod flag is set and a new thread is initiated that clears all the pending I/O.
Solaris Specific Tunables: None
HP-UX Specific Tunables: None
Linux Specific Tunables: None
HBA settings (QLogic, Emulex)
In the HBA configuration files, certain parameters related to path failover can be set to modify its behavior:
link_down_timeout = timeout before which the link will be marked as down and an RSCN generated on the SAN
Link Retry Count = number of times the HBA will retry logging into the fabric before declaring the link down
The HBA failover timeout usually defaults to 30 seconds.
Fabric/Switch Settings (Brocade, McData, Cisco)
Refer to the switch vendor documentation for any settings that need to be made, as the settings can vary depending on the SAN architecture. Some examples:
- Set the switch in interoperability mode if the fabric contains multi-vendor switches.
- Buffer credits and timeout values depend on the distances in the SAN architecture, so look for recommendations for such values in the vendor-specific documentation.
Interoperability with other Multipathing Software
Like DMP, PowerPath is I/O multipathing software, widely used by EMC customers. In almost all PowerPath installations the arrays are EMC Symmetrix/DMX or CLARiiON. There are several instances where users have already installed, or may install, VxVM (and therefore DMP) and PowerPath simultaneously, and there may be several issues in this type of configuration. Some background: PowerPath sits below the DMP layer when they coexist on the same host. EMC PowerPath presents the LUNs/devices in one of two ways:
1) Pseudo devices, or power names (e.g. emcpower1c): PowerPath masks all the paths to a device into one name and presents it to the OS.
Whenever PowerPath is installed and the devices are configured as pseudo devices, PowerPath presents only one path to the upper layer, i.e. DMP if installed. In this scenario PowerPath handles all the I/O multipathing, including path failover and I/O balancing; DMP just passes the I/Os through. The pseudo devices (emcpowerX) are added as type 'simple' (not sliced) when added by the powervxvm utility.
In a clustering environment, especially when using I/O fencing, VxVM via DMP uses SCSI-3 PGR interfaces to register keys on the disks. This mechanism assumes that DMP has created metanodes for all the devices. But when using DMP along with PowerPath in a pseudo-device environment, DMP does not create metanodes for the pseudo devices. Because of this, DMP cannot issue SCSI-3 ioctl() calls, i.e. it cannot use the SCSI-3 PGR interface to register keys on the pseudo devices. In a nutshell, when devices are configured using PowerPath pseudo names, DMP cannot do SCSI-3 PGR, and hence VCS 4.0 and SF-RAC are not supported in this scenario.
2) Native devices (CxTxDx): When using native devices, PowerPath presents all available paths in the form /dev/rdsk/cxtxdx.
PowerPath Native mode
Whenever PowerPath is installed and the devices are configured as native devices, PowerPath presents all available paths to the upper layer, i.e. DMP if installed. In this scenario both DMP and PowerPath perform I/O multipathing, including path failover and I/O balancing. This has performance implications and causes some issues in error detection and recovery.
There are two ways to ensure smooth functioning of multipathing in this case: a) uninstall PowerPath from the host and let DMP do all the I/O multipathing, or b) suppress DMP from doing multipathing (vxdiskadm). Best practice: if two products are installed, one product should be 'passive', i.e. only one product should do the multipathing.
How to remove PowerPath from the server and enable DMP to do multipathing.
As discussed above, DMP and PowerPath can coexist. However, depending on the naming mode used in PowerPath, DMP will either pass I/Os through to the devices or do multipathing on them. PowerPath may need to be removed from the host to let DMP do the multipathing. The steps to remove PowerPath are described in the PowerPath installation and administration guide. Once PowerPath is completely removed, the system needs a reboot. During the reboot, DDL does the device discovery and DMP creates metanodes for the multipathed devices. VxVM then reads the configuration information to start all the disk groups and volumes.
VxVM DMP and MPxIO
MPxIO is a multipathing driver from Sun which, like VERITAS DMP, manages multiple paths to a disk. MPxIO exports a new device name while suppressing the underlying paths to the disk. The new disk name, exported in the c#tWWNd#s# format, can be recognized using the format command. MPxIO is a host-based multipathing solution, and there is a facility to enable and disable it; either enabling or disabling the driver requires a REBOOT of the host. Enabling and disabling is global to the host: if you enable it, new devices are created for all eligible disks; if you disable it, none are created. To ENABLE it, make the following setting in the /kernel/drv/scsi_vhci.conf file:
mpxio-disable="no";
To DISABLE it, make the following setting in the /kernel/drv/scsi_vhci.conf file:
mpxio-disable="yes";
DMP can coexist with or without MPxIO software on the machine. For example, if you have 4 devices in a disk group that was created while MPxIO was disabled, you can still use the same disk group with MPxIO enabled, without any data loss.
IBM Subsystem Device Driver (SDD)
SDD has its own naming space and names devices vpath01, vpath02, etc. DMP and SDD can coexist on the same host. By default SDD presents one path to VxVM and the actual multipathing is done by SDD. If DMP is to do the multipathing instead, the SDD/vpath ASL needs to be excluded:
# vxdmpadm exclude array libname=libvx-vpath.so
# vxconfigd -r
# vxdctl enable
SDD is supported only on Shark arrays and the IBM SAN Volume Controller.
Array Specific Discussion
The compatibility table here was flattened during extraction, and its columns (Type of Array, Recommended Policy, ASL needed, and supported products per platform) can no longer be reliably re-aligned. The recoverable information follows.
Arrays covered: EMC DMX 1000, EMC Symmetrix 5, EMC Symmetrix 4, EMC Clariion CX600, HDS 9960, HDS 9980, HDS 9900, HDS 9500, HDS 7700, HP XP128, IBM ESS (2105-800). The array types listed were A/A, A/P and R/R; the supported products were SF/HA and, for most arrays, SF-RAC; ASLs were flagged as needed on AIX for several entries.
Notes recovered from the table:
- Host mode AIX required; HBA WWNs must be bound to LUNs. Host type is selected when defining WWNs.
- IBM FAStT: (1) IBM FAStT arrays are OEM'd from LSI; the hardware is the same. (2) An NVRAM flash can change the array from an LSI to an IBM. (3) The vendor provides a script to be executed on the array. (4) IBM embeds LSI's RDAC with the OS; RDAC is used to control failover, and VxDMP just coexists. (5) Use host type "solaris (veritas DMP)"; you should not have to run a script to enable this.
- Host mode OF-AIX (AIX), 00-Standard (Linux); Trespass mode (AIX, SF); Standard mode (AIX, SF).
Configuring DMP with T3/T4 arrays
The T3 is an A/P array and can be configured in two different modes:
1. rw - implicit LUN failover mode
2. mpxio - explicit LUN failover mode
When the T3 is connected in a single-host configuration, "rw" is recommended, as it takes care of LUN failover when an I/O arrives on the passive path. This mode should not be used while multiple hosts access the array, as it can cause a ping-pong effect (LUN ownership bouncing between controllers) when accessing data on its disks. To eliminate the ping-pong effect in multi-host configurations, use "mpxio" mode on the array.
Sample listing of a T3 configuration:
purple:/:<2>sys list
blocksize     : 64k
cache         : auto
mirror        : auto
mp_support    : rw
rd_ahead      : on
recon_rate    : med
sys memsize   : 32 MBytes
cache memsize : 256 MBytes
DMP supports both T3 modes in a single-host configuration. In a multi-host configuration, "mpxio" is the only supported mode.
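Switching a T3 from implicit to explicit failover for a multi-host setup is done from the array console by changing the mp_support parameter. A sketch, assuming the standard T3 sys command syntax (the prompt follows the listing above):

```
purple:/:<3>sys mp_support mpxio
purple:/:<4>sys list
...
mp_support    : mpxio
...
```

Consult the T3 administration guide for whether a controller reset is needed for the change to take effect.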
How to remove stale DMP nodes?
In 4.0 DMP removes them on its own at boot time, as it now uses tmpfs. To suppress/stop DMP, use option 17 of vxdiskadm.
DMP is not supported with SUN Cluster
From what I can gather, SC fencing will work on a Solaris device (/dev/[r]dsk/...) but will not work on a DMP metanode (/dev/vx/rdmp) that has multiple paths. With a single-path device, DMP acts as a pass-through driver, so if SC applies a fencing ioctl, DMP passes it through to the only disk. In a multipathed environment this may not be true: SC applies the ioctl to DMP, but it is unclear what DMP then does with it, whether it passes it on to one path or to all of them. I guess this is where the issue lies.
Effect of persistent binding of LUNs to specific controllers (for avoiding reconfig boots) when using DMP:
No issues. VxVM sees and uses what the sd driver presents (cXtXdX), so persistent binding done below the sd driver layer (e.g. in lpfc) does not affect VxVM.
Can the CX600 ASL co-exist with PowerPath? (Document Id: 268287)
There is a hardware configuration issue: DMP requires the CX600 array to be configured for auto-trespass, whereas PowerPath requires the array to be configured for explicit failover. In summary, if PowerPath controls the CX600, DMP cannot support it even with the CX600 ASL. This applies to all active/passive EMC arrays using PowerPath. With both PowerPath and the CX600 ASL installed, do not try to configure the system using VxVM: both DMP and PowerPath will try to do multipathing, and this degrades system performance significantly.
So the resolution is not to install the ASL and to exclude the devices from DMP.
Why does "Prevent multipathing of" have limited practical use?
If any of the "Prevent multipathing" options in vxdiskadm are used, DMP will not set up multiple paths for the devices specified. The side effect is that each path then shows up as a disk in its own right, and VxVM will have issues with this because it detects duplicate disk IDs: each path is a disk with the same disk ID. This option is best avoided.
JBOD classification:
There may be instances where an ASL is not available for an array. If the array meets the following criteria, it can be supported under the JBOD (Disk) A/A enclosure:
1. The array is of type A/A.
2. The disks have a unique serial number that is the same down all paths to the disk.
3. The serial number can be obtained from a known location.
The "vxddladm addjbod" command adds the disks to this category by the vendor ID (VID) and product ID (PID) of the device as seen in a SCSI inquiry. By default the serial number is located at page offset 36; if it is at a different location, those values (opcode, pagecode and pageoffset) can also be specified.
How to configure load balancing across multiple primaries (A/P): http://support.veritas.com/docs/266269
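The JBOD classification step can be sketched as follows. The VID/PID values here are placeholders, and the command is only echoed on hosts without VxVM so the sketch stays runnable:

```shell
#!/bin/sh
# Sketch: add an unlisted A/A array to DMP's JBOD (Disk) category by
# vendor ID and product ID. ACME/FASTDISK are placeholder values; read
# the real ones from a SCSI inquiry (e.g. /etc/vx/diag.d/vxdmpinq).
vid="ACME"
pid="FASTDISK"
cmd="vxddladm addjbod vid=$vid pid=$pid"
# If the serial number is not at the default page offset 36, also supply
# the opcode/pagecode/pageoffset values, per the vxddladm documentation.

if command -v vxddladm >/dev/null 2>&1; then
    $cmd
else
    echo "would run: $cmd"    # not a VxVM host; just show the command
fi
```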
DMP utilities:
/etc/vx/diag.d/vxdmpinq - displays SCSI inquiry info from a disk
/etc/vx/diag.d/vxasldebug - collects ASL info
/etc/vx/diag.d/vxdmpdebug - collects DMP info (uses vxconfigd -k)
ftp://ftp.veritas.com/pub/support/vxexplore.tar.Z