POSTED ON: 4/24/2011
Section 1 - Storage

VMFS Filesystem & Metadata

Objectives

Knowledge
Describe the VMFS file system
o Metadata
o Multi-access and locking
o Tree structure and files
o Applicability to clustered environment
o Journaling

Skills & Abilities
Manage VMFS file systems using command-line tools

The VMFS (Virtual Machine File System) is a proprietary filesystem created by VMware. It was created with the following goals:

Optimized for (very) large files, up to 2TB
Be cluster aware - that is, allow multiple hosts concurrent read/write access to the volume
Be a journaled filesystem
Provide a directory structure
Provide sub-block allocation
Use an LVM (Logical Volume Manager) to do cool stuff
Amongst other things...

VMFS is a cluster-aware filesystem: you can have multiple hosts with concurrent read/write access to a volume. One of the mechanisms for this is the file-level locking implemented by VMFS. Traditional filesystems (think NTFS/EXT3) use a volume locking mechanism: when a host accesses a volume it *locks* the volume, and once a volume is locked it can only be accessed by the host holding the lock. This locking is normally implemented by way of SCSI-2 reservations (Exclusive Logical Unit Reservation).

As VMFS leverages file-level locking, generally no one server will lock a volume, but rather the files located on that volume. This ensures that a file (think a VM - being just a bunch of files) can only be opened (think run) by a single host at a time.

An integral part of this file-level locking mechanism is VMFS metadata. Metadata contains certain filesystem descriptors such as:

Block Size
Number of Extents
Volume Capacity
VMFS Version
Volume Label
VMFS UUID

Another crucial part of metadata is the file locks. File locks must be obtained when:

A file is opened (e.g. powering on a VM)
Creating a file (e.g.
new VM/template)
Deleting a file
Changes to file ownership
Access/Modification timestamps
A file is grown (think thin disks & snaps)
Creating/deleting a VMFS volume
Expanding a VMFS volume
Resignaturing
Using vdf
And more...

Whenever metadata on a VMFS volume has to be updated, the VMFS volume must be reserved. When an ESX host locks a VMFS volume it must obtain a SCSI-2 reservation on the actual lun/disk hosting the VMFS volume. This reservation provides exclusive read/write access to the volume, ensuring that only 1 ESX host may update metadata at a time (and reducing the likelihood of metadata corruption). The (hopefully obvious) ramification of this is that other ESX hosts with access to the disk lose the ability to send I/O to the disk while it is locked. In the event that a lun is locked and a host attempts to send I/O to it, the following messages will be logged in the vmkernel log and the I/O will be retried up to 80 times.

Excerpt from /var/log/vmkernel:

Apr 24 15:59:53 esx35-1 vmkernel: 5:14:57:01.939 cpu0:1083)StorageMonitor: 196: vmhba1:0:3:0 status = 24/0 0x0 0x0 0x0
Apr 24 15:59:53 esx35-1 vmkernel: 5:14:57:01.939 cpu0:1041)SCSI: vm 1041: 109: Sync CR at 64
Apr 24 15:59:56 esx35-1 vmkernel: 5:14:57:04.982 cpu0:1151)StorageMonitor: 196: vmhba1:0:3:0 status = 24/0 0x0 0x0 0x0
Apr 24 15:59:56 esx35-1 vmkernel: 5:14:57:04.982 cpu3:1041)SCSI: vm 1041: 109: Sync CR at 16
Apr 24 15:59:56 mel-esx-02 vmkernel: 5:14:57:05.050 cpu0:1161)StorageMonitor: 196: vmhba1:0:3:0 status = 24/0 0x0 0x0 0x0
Apr 24 15:59:57 esx35-1 vmkernel: 5:14:57:06.047 cpu3:1041)SCSI: vm 1041: 109: Sync CR at 0
Apr 24 15:59:57 esx35-1 vmkernel: 5:14:57:06.047 cpu3:1041)WARNING: SCSI: 119: Failing I/O due to too many reservation conflicts

I have filtered a lot of repeats. What is good to look at above is:

24/0 0x0 0x0 0x0

This is the SCSI code for 'Reservation conflict: Device reserved by another host'. Then the next line:

Sync CR at 64

This is the countdown from the 80 retries, and you see
that the conflicts continue to occur and the count goes down until the sad end:

Sync CR at 0
Failing I/O due to too many reservation conflicts

OK - so this is a little dramatic, but it shows a few things: that when a lun is locked the host will retry a few (80) times to send the I/Os, and that if the lock does not get released in time - bad things happen! It is very normal for reservation conflicts to happen. As your VI grows - you add more hosts, run more VMs, have more snapshots, VMotion more, and so on - these things happen. The time it *should* take to update metadata on a VMFS is in the vicinity of ~10 microseconds, so we are not talking about huge periods of time that the lun is locked. My example above was caused by a failure in my array where a controller failed (but still appeared online) and the other controller did not trespass the luns... nasty stuff (though quite uncommon).

It is possible to administratively lock VMFS volumes using vmkfstools:

[root@esx35-1 root]# vmkfstools -L reserve /vmfs/devices/disks/vmhba2:1:11:1

[root@esx35-1 root]# vmkfstools -B /vmfs/devices/disks/vmhba2:1:11:1
Successfully broke LVM device lock for /vmfs/devices/disks/vmhba2:1:11:1
[root@esx35-1 root]#

As the lock is broken, we see the following message in the vmkernel log:

Sep 6 07:45:49 esx35-1 vmkernel: 0:00:06:55.314 cpu1:1033)LVM: 7433: Device lock for <(vml.02000b0000600a0b80000f9e2b000019ca4a68fb3c313732322d36:1, 21047328), 4a9e0381-cfc9b39f-cfe5-00151772654d> released

Using the esxcfg-info command we can see which luns are reserved. Even when we direct esxcfg-info to display only storage-related information there is still a fair amount of output; grepping this can help.
[root@esx35-1 root]# esxcfg-info -s | grep -i Reserved
|----Is Reserved............................false
|----Is Reserved............................false
|----Is Reserved............................false
|----Is Reserved............................false
|----Is Reserved........................................false
|----Is Reserved........................................false
|----Is Reserved........................................false
|----Is Reserved........................................false

Querying Metadata

Metadata can be queried with the 'vmkfstools' command.

[root@esx35-1 root]# vmkfstools -P -h /vmfs/volumes/NewVMFS
VMFS-3.31 file system spanning 1 partitions.
File system label (if any): NewVMFS
Mode: public
Capacity 1.8G, 1.5G available, file block size 1.0M
UUID: 4a7724b1-06761321-6bcd-000c29b6325a
Partitions spanned (on "lvm"):
vmhba0:1:0:1

The -P switch queries metadata, and the -h provides human-readable output. Metadata may also be seen as files stored on each VMFS volume:

[root@esx35-1 root]# ls -la /vmfs/volumes/NewVMFS/
total 290816
drwxr-xr-t 1 root root       980 Aug 4 03:56 .
drwxr-xr-x 1 root root       512 Aug 4 07:17 ..
-r-------- 1 root root     98304 Aug 4 03:56 .fbb.sf
-r-------- 1 root root  22708224 Aug 4 03:56 .fdc.sf
-r-------- 1 root root   6520832 Aug 4 03:56 .pbc.sf
-r-------- 1 root root 260374528 Aug 4 03:56 .sbc.sf
-r-------- 1 root root   4194304 Aug 4 03:56 .vh.sf

The metadata is stored as .sf files at the root of the volume:

.fdc.sf - file descriptor system file
.sbc.sf - sub-block system file
.fbb.sf - file block system file
.pbc.sf - pointer block system file
.vh.sf - volume header system file

Another important part of metadata is a region of the disk called the Heartbeat Region. This is an area of the VMFS volume that ESX hosts accessing the volume will write their signatures to, as a way of keeping file locks valid, and informing other ESX hosts also sharing the volume that they have access to the volume.
If a host's signature is not updated within a certain period, or during an HA event, other hosts can age the locks on the files in order to lock the files (and open them) themselves.

New VMFS

Objectives

Knowledge
Explain the process used to align VMFS partitions
Describe the VMFS file system
o Extents

Skills & Abilities
Manage VMFS file systems using command-line tools

Manually creating an aligned VMFS partition

When you create a new VMFS volume using the VI Client it is created aligned by default; however, if you want to use the Service Console to create your volumes, you must take a few additional steps to ensure they are aligned.

Background

X86 systems require an MBR (Master Boot Record) at the beginning of any disk they use. The MBR consumes the first 63 sectors of the disk, and the usable space of the disk starts from there. The result is that, from the OS perspective, where the partition starts is actually in the middle of an array block. This is a misaligned partition. In an aligned layout, the start of the OS volume lines up with the array blocks. It is worth noting that this is not a VMware thing - it is an x86 thing. It affects Windows, Linux, <insert fav x86 OS here>.

When you have a misaligned partition, it can greatly affect performance; VMware has published numbers on the performance effect that misalignment can have. What you generally see is that as the I/O increases, the performance degrades... What also amazes me is that when I teach about this issue in my DSA/FT classes, or in the storage classes I teach, this is news to most people. That such a serious problem (which is VERY prevalent) is not widely known is an issue. It is not just storage nerds that need to know this... anyway - on with the show.

Identify the disk to create the new volume on

You need to identify the disk you are going to partition. fdisk is a cool partitioning tool, but it is so easy to kill the wrong partition.
If it is a new disk, using fdisk is the easy way: run 'fdisk -l' and look for the disk without a partition table.

[root@esx35-1 root]# fdisk -l

Disk /dev/sda: 10.7 GB, 10737418240 bytes
64 heads, 32 sectors/track, 10240 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1   *         1       100    102384   83  Linux
/dev/sda2           101      2597   2556928   fb  Unknown
/dev/sda3          2598      7597   5120000   83  Linux
/dev/sda4          7598     10240   2706432    f  Win95 Ext'd (LBA)
/dev/sda5          7598      8141    557040   82  Linux swap
/dev/sda6          8142     10141   2047984   83  Linux
/dev/sda7         10142     10240    101360   fc  Unknown

Disk /dev/sdb: 2147 MB, 2147483648 bytes
67 heads, 62 sectors/track, 1009 cylinders
Units = cylinders of 4154 * 512 = 2126848 bytes

Disk /dev/sdb doesn't contain a valid partition table

You could also use 'esxcfg-vmhbadevs':

[root@esx35-1 root]# esxcfg-vmhbadevs -a
vmhba0:0:0    /dev/sda
vmhba0:1:0    /dev/sdb

The '-a' switch shows all devices, whether they have a console device or not. Compare this with the output of 'esxcfg-vmhbadevs -m':

[root@esx35-1 root]# esxcfg-vmhbadevs -m
vmhba0:0:0:2  /dev/sda2  4a0bdfe2-c789e0df-d96e-000c29b6325a

The useful part of 'esxcfg-vmhbadevs' is that it displays the vmhba address and the service console device. If you use the '-m' switch it will also display the UUID of the vmfs volume. We can determine that there is no vmfs volume on /dev/sdb, as it was not listed when we ran 'esxcfg-vmhbadevs -m'.

Using fdisk to create an aligned partition

If you create your VMFS volumes using the VI client, they are automatically aligned. If you create them via the service console, you need to manually align them.

[root@esx35-1 root]# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1009, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-1009, default 1009):
Using default value 1009

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fb
Changed system type of partition 1 to fb (Unknown)

Command (m for help): x

Expert command (m for help): b
Partition number (1-4): 1
New beginning of data (62-4191385, default 62): 128

Expert command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@esx35-1 root]#

Formatting the partition as VMFS

Now that we have created the partition, we need to format it. 'vmkfstools' is the tool to create vmfs filesystems (amongst other things).

[root@esx35-1 root]# vmkfstools -C vmfs3 -b1m -S NewVMFS vmhba0:1:0:1
Creating vmfs3 file system on "vmhba0:1:0:1" with blockSize 1048576 and volume label "NewVMFS".
Successfully created new volume: 4a7724b1-06761321-6bcd-000c29b6325a

The '-C vmfs3' switch specifies to create a vmfs3 volume, '-b1m' formats the volume with a 1m block size, '-S NewVMFS' defines the volume name, and 'vmhba0:1:0:1' is the device to create the volume on.

Creating aligned vmdk files

Even though we have created an aligned VMFS filesystem, it is advantageous to also align any virtual disks you place on it; otherwise the guest partitions can still have a misalignment issue. To create an aligned virtual disk (this assumes Windows): once the disk has been added to the VM, open Disk Management and select 'Rescan Disks' to pick up the new SCSI device. Once you can see the new disk, initialize it: right-click the disk and select 'Initialize Disk'. Now that the disk is initialized, open a command prompt and use the tool 'diskpart'.
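The arithmetic behind the offsets used above is worth spelling out. A quick sketch (Python), assuming a 64 KB array block size - actual block sizes vary by array vendor and model:

```python
SECTOR = 512              # bytes per sector on these disks
ARRAY_BLOCK = 64 * 1024   # assumed array block size; varies by array

def is_aligned(start_sector, block=ARRAY_BLOCK):
    """True if the partition's first byte sits on an array block boundary."""
    return (start_sector * SECTOR) % block == 0

# The MBR default: data starts at sector 63 -> 32,256 bytes in. Misaligned.
print(63 * SECTOR, is_aligned(63))    # 32256 False
# The offset we gave fdisk: sector 128 -> 65,536 bytes in. Aligned.
print(128 * SECTOR, is_aligned(128))  # 65536 True
```

The same arithmetic explains diskpart's 'align = 64' further down: that is 64 KB expressed as a byte offset rather than a sector count.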
You could also use the tool 'diskpar' (but the syntax is different: diskpar uses a block offset, rather than the byte offset 'diskpart' uses).

C:\Documents and Settings\Administrator>diskpart

Microsoft DiskPart version 5.2.3790.3959
Copyright (C) 1999-2001 Microsoft Corporation.
On computer: VIMGMT

DISKPART> list disk

  Disk ###  Status      Size     Free     Dyn  Gpt
  --------  ----------  -------  -------  ---  ---
  Disk 0    Online        20 GB      0 B
  Disk 1    Online      2039 MB  2039 MB

DISKPART> select disk 1

Disk 1 is now the selected disk.

DISKPART> create partition primary align = 64

DiskPart succeeded in creating the specified partition.

DISKPART>

Creating Extended VMFS volumes

The maximum size of a LUN that ESX may use is 2TB. However, you can create a single VMFS volume of up to 64TB by spanning (extending) a VMFS volume across multiple LUNs. A VMFS volume may have up to 32 extents, and these extents do not have to be the same size; for example, you can extend a 1TB VMFS volume by adding an extent of 100GB. This can be done via the GUI, but as this is VCDX stuff, we use the cmd line. The tool you will use is 'vmkfstools':

[root@esx35-1 root]# vmkfstools -Z "vmhba2:0:12:1" "vmhba2:0:1:1"
VMware ESX Server Question:
All data on vmhba2:0:12:1 will be lost. Continue and format? 0) Yes 1) No
Please choose a number [0-1]: 0

[root@esx35-1 root]# vmkfstools -P -h /vmfs/volumes/MyLUN/
VMFS-3.31 file system spanning 2 partitions.
File system label (if any): MyLUN
Mode: public
Capacity 29G, 29G available, file block size 1.0M
UUID: 4a81775b-21e6f204-684f-00151772654d
Partitions spanned (on "lvm"):
vmhba2:0:1:1
vmhba2:0:11:1

References:
Recommendations for Aligning VMFS Partitions
www.vmware.com/pdf/esx3_partition_align.pdf
Storage Block Alignment with VMware Virtual Infrastructure
http://communities.vmware.com/servlet/JiveServlet/download/2409-117516-821464-4435/NetappDisk+Alignment+on+Virtuals+-+3593.pdf
VMFS - Best Practices, and counter-FUD
http://virtualgeek.typepad.com/virtual_geek/2009/03/vmfs-best-practices-and-counter-fud.html

Storage troubleshooting

Objectives

Knowledge
Identify storage related events and log entries
Analyze storage events to determine related issues

Skills & Abilities
Verify storage configuration using CLI, VI client and server log entries
Troubleshoot storage connection issues using CLI, VI Client and logs
o Rescan events
o Failover events
Interpret log entries for configuration validation and predictive analysis
Troubleshoot file system errors using logs and CLI

Troubleshooting storage related issues is primarily done through analysis of the vmkernel log file. The format of messages in the vmkernel log file is as follows:

Jun 19 09:12:54 pisa vmkernel: 14:22:31:50.009 cpu3:1033) scsi-qla0: Scheduling SCAN for new luns....

<System Time> <host> <msg source> <uptime> <CPU:world id> <device> <message>

Most of the fields above should be pretty self-explanatory, but the CPU:world id bit could do with more explanation. Each VM is contained within a World, and all of the VM's associated processes are contained within that world. A world itself is a schedulable entity owned by the vmkernel - similar to a process, but more like a managed group of processes. A VM's world contains multiple processes to handle the running of the VM itself and the operation of its virtual devices.
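That field layout lends itself to mechanical parsing, which is handy when you are sifting a large vmkernel log. A rough sketch (Python) - the regular expression is my own approximation of the 3.x message format described above, not an official grammar:

```python
import re

# <System Time> <host> <msg source> <uptime> <cpu:world id> <message>
VMK_LINE = re.compile(
    r'^(?P<time>\w{3}\s+\d+ [\d:]+) '   # e.g. 'Jun 19 09:12:54'
    r'(?P<host>\S+) '                    # e.g. 'pisa'
    r'(?P<source>\w+): '                 # e.g. 'vmkernel'
    r'(?P<uptime>[\d:.]+) '              # e.g. '14:22:31:50.009'
    r'cpu(?P<cpu>\d+):(?P<world>\d+)\)'  # e.g. 'cpu3:1033)'
    r'\s*(?P<message>.*)$'               # device + message text
)

line = ('Jun 19 09:12:54 pisa vmkernel: 14:22:31:50.009 '
        'cpu3:1033) scsi-qla0: Scheduling SCAN for new luns....')
m = VMK_LINE.match(line)
print(m.group('host'), m.group('world'))   # pisa 1033
```

With the world id pulled out like this, it is a one-liner to filter a log down to the entries belonging to a single VM's world.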
You can use the vm-support.pl script to determine the world id of each VM:

[root@esx35-1 root]# vm-support -x
VMware ESX Server Support Script 1.29
Available worlds to debug:
vmid=1115    vc-fs01
vmid=1133    ad1-lab

This piece of information can make it easier to decipher messages within the vmkernel log file, which can be quite trying at times...

The vmkernel log itself (/var/log/vmkernel) contains all messages logged by the vmkernel and is controlled by syslog (/etc/syslog.conf). It is the primary log file for all things vmkernel related - which is to say, all things ESX. From a storage troubleshooting perspective it is the first place you should look. In the VMFS notes we looked at detecting SCSI reservation issues, which caused timeouts in accessing a VMFS volume. Below, we will look at path failover events, which may be relatively benign or may demonstrate serious issues in your storage environment.

Let's look at some common storage events and how they appear in the vmkernel log. First we will look at a rescan event.

[root@esx35-1 /]# esxcfg-rescan vmhba2
Rescanning vmhba2...done.
On scsi2, removing: 0:1 0:11 0:12 0:24 0:25 0:31.
On scsi2, adding: 0:1 0:11 0:12 0:24 0:25 0:31.

So we discover a few luns as you see above, and below you see how this is shown in the vmkernel log. I have removed a bit of repeat stuff, but most of it is here...

Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.136 cpu1:1034)scsi-qla1: Scheduling SCAN for new luns....
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.136 cpu1:1034)<6>scsi-qla1: Scheduling SCAN for new luns....
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.137 cpu1:1035)<6>scsi(1) : Non NPIV Fabric, Capability 0x100
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.137 cpu1:1035)SCSI: 861: GetInfo for adapter vmhba2, [0x3f041c80], max_vports=64, vports_inuse=0, linktype=0, state=0, failreason=2, rv=0, sts=0
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.146 cpu1:1035)ScsiScan: 395: Path 'vmhba2:C0:T0:L1': Vendor: 'IBM ' Model: '1722-600 ' Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.146 cpu1:1035)ScsiScan: 396: Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.146 cpu1:1035)ScsiScan: 395: Path 'vmhba2:C0:T0:L11': Vendor: 'IBM ' Model: '1722-600 ' Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.146 cpu1:1035)ScsiScan: 396: Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.147 cpu1:1035)ScsiScan: 395: Path 'vmhba2:C0:T0:L12': Vendor: 'IBM ' Model: '1722-600 ' Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.147 cpu1:1035)ScsiScan: 396: Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.148 cpu1:1035)ScsiScan: 395: Path 'vmhba2:C0:T0:L24': Vendor: 'IBM ' Model: '1722-600 ' Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.148 cpu1:1035)ScsiScan: 396: Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.149 cpu1:1035)ScsiScan: 395: Path 'vmhba2:C0:T0:L25': Vendor: 'IBM ' Model: '1722-600 ' Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.149 cpu1:1035)ScsiScan: 396: Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.150 cpu1:1035)ScsiScan: 395: Path 'vmhba2:C0:T0:L31': Vendor: 'IBM ' Model: 'Universal Xport ' Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.150 cpu1:1035)ScsiScan: 396: Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.159 cpu1:1035)ScsiScan: 395: Path 'vmhba2:C0:T1:L1': Vendor: 'IBM ' Model: '1722-600 ' Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.159 cpu1:1035)ScsiScan: 396: Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.160 cpu1:1035)ScsiScan: 395: Path 'vmhba2:C0:T1:L11': Vendor: 'IBM ' Model: '1722-600 ' Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.160 cpu1:1035)ScsiScan: 396: Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.161 cpu1:1035)ScsiScan: 395: Path 'vmhba2:C0:T1:L12': Vendor: 'IBM ' Model: '1722-600 ' Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.161 cpu1:1035)ScsiScan: 396: Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.162 cpu1:1035)ScsiScan: 395: Path 'vmhba2:C0:T1:L24': Vendor: 'IBM ' Model: '1722-600 ' Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.162 cpu1:1035)ScsiScan: 396: Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.163 cpu1:1035)ScsiScan: 395: Path 'vmhba2:C0:T1:L25': Vendor: 'IBM ' Model: '1722-600 ' Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.163 cpu1:1035)ScsiScan: 396: Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.164 cpu1:1035)ScsiScan: 395: Path 'vmhba2:C0:T1:L31': Vendor: 'IBM ' Model: 'Universal Xport ' Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.164 cpu1:1035)ScsiScan: 396: Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.220 cpu1:1033)<6>scsi(1) : Non NPIV Fabric, Capability 0x100
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.220 cpu1:1033)SCSI: 861: GetInfo for adapter vmhba2, [0x3f041c80], max_vports=64, vports_inuse=0, linktype=0, state=0, failreason=2, rv=0, sts=0
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.244 cpu1:1034)<6>scsi(1) : Non NPIV Fabric, Capability 0x100
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.244 cpu1:1034)SCSI: 861: GetInfo for adapter vmhba2, [0x3f041c80], max_vports=64, vports_inuse=0, linktype=0, state=0, failreason=2, rv=0, sts=0
Sep 7 09:15:06 esx35-1 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba2:C0:T0:L1 : 0x0 0x80 0x83 0x85 0xc0 0xc1 0xc2 0xc3 0xc4 0xc5 0xc7 0xc8 0xc9 0xca 0xd0
Sep 7 09:15:06 esx35-1 vmkernel: VMWARE SCSI Id:
Device id info for vmhba2:C0:T0:L1: 0x1 0x3 0x0 0x10 0x60 0xa 0xb 0x80 0x0 0xf 0x9e 0x2b 0x0 0x0 0x17 0x53 0x49 0xd5 0xa9 0xb6 0x1 0x93 0x0 0x8 0x20 0x6 0x0 0xa0 0xb8 0xf 0x9e 0x2c 0x1 0x94 0x0 0x4 0x0 0x0 0x0 0x1
Sep 7 09:15:06 esx35-1 vmkernel: VMWARE SCSI Id: Id for vmhba2:C0:T0:L1 0x60 0x0a 0x0b 0x80 0x00 0x0f 0x9e 0x2b 0x00 0x00 0x17 0x53 0x49 0xd5 0xa9 0xb6 0x31 0x37 0x32 0x32 0x2d 0x36
Sep 7 09:15:06 esx35-1 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba2:C0:T0:L11 : 0x0 0x80 0x83 0x85 0xc0 0xc1 0xc2 0xc3 0xc4 0xc5 0xc7 0xc8 0xc9 0xca 0xd0
Sep 7 09:15:06 esx35-1 vmkernel: VMWARE SCSI Id: Device id info for vmhba2:C0:T0:L11: 0x1 0x3 0x0 0x10 0x60 0xa 0xb 0x80 0x0 0xf 0x9e 0x2b 0x0 0x0 0x19 0xca 0x4a 0x68 0xfb 0x3c 0x1 0x93 0x0 0x8 0x20 0x6 0x0 0xa0 0xb8 0xf 0x9e 0x2c 0x1 0x94 0x0 0x4 0x0 0x0 0x0 0x1
Sep 7 09:15:06 esx35-1 vmkernel: VMWARE SCSI Id: Id for vmhba2:C0:T0:L11 0x60 0x0a 0x0b 0x80 0x00 0x0f 0x9e 0x2b 0x00 0x00 0x19 0xca 0x4a 0x68 0xfb 0x3c 0x31 0x37 0x32 0x32 0x2d 0x36

So there is a bit to digest there. Let's focus on the VPD lines at the bottom. VPD stands for Vital Product Data and describes the capabilities of the device. The 'Supported VPD pages' entry lists what information will be described (or can be queried). The 0xc0-0xd0 pages are vendor specific, while the standard pages provide information like: 0x0 "Supported VPD pages", 0x80 "Unit serial number", 0x83 "Device identification", 0x85 "Management network addresses".

Now let's look at a lun trespass. A trespass is an array-side event where lun ownership changes from one controller to another.

Sep 9 18:00:50 essx35-1 vmkernel: 0:13:29:43.214 cpu2:1026)StorageMonitor: 196: vmhba2:1:1:0 status = 2/0 0x2 0x4 0x3
Sep 9 18:00:50 essx35-1 vmkernel: 0:13:29:43.214 cpu2:1026)SCSI: 5273: vml.0200010000600601602be019000840a437da99de11524149442035: Cmd failed. Blocking device during path failover.
Sep 9 18:00:50 essx35-1 vmkernel: 0:13:29:43.216 cpu0:1053)WARNING: SCSI: 4562: Manual switchover to path vmhba1:0:1 begins.
Sep 9 18:00:50 essx35-1 vmkernel: 0:13:29:43.216 cpu0:1053)SCSI: 3744: Path vmhba1:0:1 is already active. No action required.
Sep 9 18:00:50 essx35-1 vmkernel: 0:13:29:43.216 cpu0:1053)WARNING: SCSI: 4614: Manual switchover to vmhba1:0:1 completed successfully.
Sep 9 18:00:50 essx35-1 vmkernel: 0:13:29:43.216 cpu1:1025)StorageMonitor: 196: vmhba2:0:1:0 status = 2/0 0x6 0x29 0x0

The first message translates to 'NOT READY: The LUN addressed cannot be accessed'. The kernel then initiates a failover FROM hba2 TO hba1. During events such as this you may also see the message 'Retry (unit attn)'; this means that the lun has outstanding I/Os that will be lost.

OK - now below is an example of some bad stuff.

Apr 24 16:07:01 esx35-1 vmkernel: 5:15:04:10.057 cpu0:1099)StorageMonitor: 196: vmhba1:0:3:0 status = 24/0 0x0 0x0 0x0
Apr 24 16:07:01 esx35-1 vmkernel: 5:15:04:10.057 cpu2:1040)SCSI: vm 1040: 109: Sync CR at 0
Apr 24 16:07:01 esx35-1 vmkernel: 5:15:04:10.057 cpu2:1040)WARNING: SCSI: 119: Failing I/O due to too many reservation conflicts
Apr 24 16:07:01 esx35-1 vmkernel: 5:15:04:10.057 cpu2:1040)WARNING: Vol3: 611: Couldn't read volume header from vmhba1:0:3:1: SCSI reservation conflict
Apr 24 16:07:01 esx35-1 vmkernel: 5:15:04:10.057 cpu2:1040)FSS: 390: Failed with status SCSI reservation conflict for f530 28 1 493dc0c4 cc70d928 1f009457 e21d5c29 0 0 0 0 0 0 0
Apr 24 16:07:01 esx35-1 vmkernel: 5:15:04:10.057 cpu2:1040)WARNING: Fil3: 1791: Failed to reserve volume f530 28 1 493dc0c4 cc70d928 1f009457 e21d5c29 0 0 0 0 0 0 0
Apr 24 16:07:01 esx35-1 vmkernel: 5:15:04:10.057 cpu2:1040)FSS: 390: Failed with status SCSI reservation conflict for f530 28 2 493dc0c4 cc70d928 1f009457 e21d5c29 4 1 0 0 0 0 0

I have truncated a lot, but as we saw in the VMFS section, when a volume is reserved other hosts will attempt up to 80 times to get to the disk.
Here we see that this host has reached that limit and its I/Os have failed. Generally, reservation issues result from many concurrent metadata updates: too many snapshots, multiple power on/off operations, bad scripts (like overuse of vdf), or - even worse - a shonky array.

In the event that you need more verbose (!) information than is contained within the vmkernel log when troubleshooting SCSI events, you can use the storageMonitor tool.

[root@esx35-1 root]# /usr/lib/vmware/bin/storageMonitor
Timestamp        World  adapter  id   lun  Command         Error Message
=========        =====  =======  ===  ===  =======         =============
0:00:00:37.166   01024  vmhba2   000  031  [0x28]READ(10)  [0x5:0x21:0x0]ILLEGAL REQUEST: Logical block address out of range[0x 0]
0:00:00:38.095   01024  vmhba2   000  031  [0x8 ]READ(06)  [0x5:0x21:0x0]ILLEGAL REQUEST: Logical block address out of range[0x 0]

Messages in the storageMonitor output are formatted in the following manner:

Mode:worldId:adapterName:id:lun:senseKey:additionalSenseCode:AdditionalSenseQualifier:

storageMonitor is configured by its default config file, /etc/vmware/storagemonitor.conf.

References:
http://www.vmprofessional.com/index.php?content=resources

Storage VMotion

Objectives

Knowledge
Describe Storage VMotion operation
Explain implementation process for Storage VMotion
Identify Storage VMotion use cases
Understand performance implications for Storage VMotion

Skills and Abilities
Use Remote CLI to perform Storage VMotion operations
o Interactive mode
o Non-interactive mode
Implement Storage VMotion based on various use cases
o Migration of all virtual disks to target storage location
o Migration of virtual disks to independent target storage locations

Storage VMotion (SVmotion) is a feature which allows you to move a VM's files from one datastore to another - without any downtime on the VM itself.
Without SVmotion you would need to power off a VM in order to relocate its constituent files, which, depending on the size of the VM and your underlying storage infrastructure, could take many hours...

In ESX 3.5, SVmotion requires a valid VMotion license and is only supported between iSCSI & FC datastores. During the SVmotion process there are temporary requirements for additional memory (double the VM's current allocation of physical memory, for the self VMotion) and additional disk space - both for the snapshots created and for the temporary doubling of the vmdk space on the datastores. The virtual disks themselves must be eligible for snapshots (i.e. not persistent or physical compatibility RDMs). Up to 4 concurrent SVmotions are supported per datastore at any one time, where the datastore is either the source or destination.

SVmotion in ESX 3.5 leverages snapshot technology to copy the VM's disks. All I/O for the SVmotion process is contained within the vmkernel, and the kernel uses the NFC (Network File Copier) to move the data. SVmotion copies a VM's working directory: the .vmx file and log files, the vswp file (if stored in the working directory), and any other files that may reside in the working directory. Virtual disks may be copied to specific datastores, i.e. you may move 1.vmdk to datastore1 and 2.vmdk to datastore2. SVmotion is not supported in the VI client, and must be implemented via either the RCLI or the VIMA appliance.

SVmotion is typically leveraged for the following use cases:

Changing storage tiers for a particular VM
o A VM on tier 2 storage may not be performing acceptably, or a VM on tier 1 storage may not require or utilize the benefits of the faster disk.
Implementing a new array
o Easy, zero-downtime migration to the new storage
When performing storage maintenance
o Migrate to unaffected LUNs

The SVmotion process works as follows:

1. Copies the VM's working directory (Home folder) to the destination datastore
2.
Performs a self VMotion to the new working directory
3. Creates snapshot(s) of the VM's disk(s)
4. Copies the base vmdks to the new location
5. Re-parents the snapshots to the relocated vmdks on the destination datastore
6. Merges the snapshots into the destination vmdks
7. Deletes the source data

The process itself is reasonably straightforward. The only point I will cover further is the re-parenting of the snap: effectively, the VM is updated to change the snapshot's base image (the vmdk) from the source vmdk to the destination vmdk.

C:\Program Files\VMware\VMware VI Remote CLI\bin>SVmotion.pl --interactive
Entering interactive mode. All other options and environment variables will be ignored.

Enter the VirtualCenter service url you wish to connect to (e.g. https://myvc.mycorp.com/sdk, or just myvc.mycorp.com): vc-lab.vmware.lab
Enter your username: administrator
Enter your password:
Attempting to connect to https://vc-lab.vmware.lab/sdk.
Connected to server.
Enter the name of the datacenter: MyDataCenter
Enter the datastore path of the virtual machine (e.g. [datastore1] myvm/myvm.vmx): [VMclones] vc-fs01/vc-fs01.vmx
Enter the name of the destination datastore: LAB_VMs
You can also move disks independently of the virtual machine. If you want the disks to stay with the virtual machine, then skip this step.
Would you like to individually place the disks (yes/no)? yes
Enter the datastore path of the disk you wish to place (e.g. [datastore1] myvm/myvm.vmdk): [VMclones] vc-fs01/vc-fs01.vmdk
Enter the name of the destination datastore: LAB_VMs
Would you like to place another disk (yes/no)? yes
Enter the datastore path of the disk you wish to place (e.g. [datastore1] myvm/myvm.vmdk): [VMclones] vc-fs01/vc-fs01_1.vmdk
Enter the name of the destination datastore: ISO_Library
Would you like to place another disk (yes/no)? no
Performing Storage VMotion.
0% |----------------------------------------------------------------------------------------------------| 100%
####################################################################################################
Storage VMotion completed successfully.
Disconnecting.

C:\Program Files\VMware\VMware VI Remote CLI\bin>

Above is an example of running SVmotion.pl in interactive mode; as you see, you are guided through the process. Below is an example of using the command in non-interactive mode. In non-interactive mode the command does not provide the progress bar we saw before, nor does it provide any output when it completes successfully. The %errorlevel% variable in Windows should return 0 if the command completed successfully.

C:\Program Files\VMware\VMware VI Remote CLI\bin>SVmotion.pl --url=https://vc-lab.vmware.lab/sdk --username=administrator --password=yea_right! --datacenter="MyDatacenter" --vm="[LAB_VMs] vc-fs01/vc-fs01.vmx:VMclones"

C:\Program Files\VMware\VMware VI Remote CLI\bin>

References:
http://www.sun.com/storage/white-papers/wmware-storage-vmotion.pdf

Snapshot LUNs

Objectives

Knowledge
Identify tools and steps necessary to manage replicated VMFS volumes
o Resignaturing
o Snapshot LUNs

Skills & Abilities
Manage RDMs in a replicated environment
o Virtual compatibility mode
o Physical compatibility mode
Use esxcfg-advcfg
o Set Resignaturing and Snapshot LUN options

A snapshot LUN, from an ESX server's perspective, is one whose volume signature (Signature = LUN id + Disk Serial Number), stored in the VMFS header of the disk, does not match the characteristics of the lun that is being mounted. For example, you have a mission critical lun in your array. You decide to create an array-based Clone of this lun (a clone is an identical block-level copy of a lun) in the event that the primary lun dies. I shouldn't have said that - the lun just died! But that's OK, right? I have a clone...
So I go ahead and mask the clone lun out to my esx server expecting things to pick up where they left off but no…it just doesn't appear. Because the lun is a block level copy of the original, all the metadata, including the volume signature, is copied. And as it is a different physical lun, the disk identifier and the signature don't match up - it's a snapshot lun. If you attempt to mount a snapshot lun, you will see the following (or very similar) in /var/log/vmkernel:

Apr 24 18:07:55 esx35-1 vmkernel: 0:00:06:39.408 cpu0:3616)ALERT: LVM: 4482: vmhba1:0:5:1 may be snapshot: disabling access. See resignaturing section in SAN config guide.

There are a couple of ways to address this:

LVM.DisallowSnapshotLUN - the default setting for this is 1

[root@esx35-1 /]# esxcfg-advcfg -g /LVM/DisallowSnapshotLUN
Value of DisallowSnapshotLun is 1

What this means is that when ESX encounters a snapshot lun, it will refuse to mount it. If we change the value to 0, ESX will ignore the fact that the lun has an invalid signature and mount it anyway. The name is a double negative: read DisallowSnapshotLUN=1 as 'snapshot luns are not allowed', and DisallowSnapshotLUN=0 as 'snapshot luns are allowed'.

[root@esx35-1 /]# esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLUN
Value of DisallowSnapshotLun is 0

LVM.EnableResignature - the default value for this is 0

[root@esx35-1 /]# esxcfg-advcfg -g /LVM/EnableResignature
Value of EnableResignature is 0

EnableResignature does exactly what it says…it will write a new signature to the disk. By default it is off, as above. To enable it, set it to 1 as below.

[root@esx35-1 /]# esxcfg-advcfg -s 1 /LVM/EnableResignature
Value of EnableResignature is 1

Note: When you turn on LVM.EnableResignature, the value of LVM.DisallowSnapshotLUN is ignored - whatever it may be.

When you allow a snapshot lun to be mounted, any VMs on this lun which are part of your VC inventory will not need to be re-added to the inventory. However, if you resignature, then any VMs on this lun will need to be re-added to the inventory.
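The interaction between the two options can be summed up in a tiny decision function. This is only a sketch of the behaviour described above - the option names are real, but the function is mine, not a VMware tool:

```shell
#!/bin/sh
# Sketch of how an ESX 3.x host treats a LUN it has flagged as a snapshot,
# based on the two LVM advanced options described above. Illustration only.
#   $1 = value of LVM.EnableResignature   (default 0)
#   $2 = value of LVM.DisallowSnapshotLUN (default 1)
snapshot_lun_action() {
    if [ "$1" -eq 1 ]; then
        # EnableResignature wins: DisallowSnapshotLUN is ignored entirely
        echo "write-new-signature"
    elif [ "$2" -eq 0 ]; then
        # Remember the double negative: disallow=0 means snaps ARE allowed
        echo "mount-with-existing-signature"
    else
        echo "refuse-to-mount"
    fi
}

snapshot_lun_action 0 1   # the defaults -> refuse-to-mount
snapshot_lun_action 0 0   # allow snapshot luns -> mount-with-existing-signature
snapshot_lun_action 1 1   # resignature overrides disallow -> write-new-signature
```

Note the third call: with EnableResignature on, the DisallowSnapshotLUN value makes no difference.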
When you resignature a lun, the VMFS label of the lun is changed to 'snap-XXXXXX-<OLD_NAME>', where XXXXXX is a system generated number.

So when is it right to do one or the other? The general rule is this: if the original lun is NEVER coming back, then you can safely set DisallowSnapshotLUN to 0 (which allows snap luns), whereas if the original lun is coming back (some scenarios soon), then you're best to write a new signature via EnableResignature. I should also say that it is generally not a good idea to allow snapshot luns as a permanent solution. Get your LUNs back by allowing snaps, then SVMotion them over to another lun. Should you need to enable resignaturing, turn it on, let it do its thing, then turn it off. Remember that these settings (both allow snaps and resignature) are set on a per host basis, and they are either on or off.

Let's think of some scenarios:

1. Use of array based snapshots as a backup policy. You have a need to recover some data from a point in time captured by the snapshot. When you mount the snap on an ESX host, the host will still see the original lun. If you allowed the snap to be mounted without resignaturing, then you would have 2 instances of the lun - confusion and corruption anyone?

2. Use of array based replication to a DR site. You fail over to the DR site. If these ESX servers are isolated, from the storage perspective, from the prod ESX hosts, you could enable the snap lun, as from the DR hosts' perspective the original luns are not going to appear anytime soon.

3. An array firmware upgrade, or something dodgy on your array, causes the LUN ids to change. Sometimes, something on the array will change that causes the LUN ids to change, and the luns are perceived as snapshot luns.

Even more stuff on snaps. There is a new setting in 3.5 called SCSI.CompareLUNNumber. If you set this to 0, then ESX will only look at the disk serial (vs lun id + serial) when making snap decisions.
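The detection rule, including the SCSI.CompareLUNNumber=0 case, can be sketched like this. This is an illustration of the rule as described above, not actual vmkernel code, and the signature fields are simplified to plain strings:

```shell
#!/bin/sh
# Sketch of snapshot detection: the stored signature is LUN id + disk serial.
# With SCSI.CompareLUNNumber set to 0, only the serial is compared.
# Illustrative only -- not actual vmkernel code.
#   $1/$2 = LUN id / serial stored in the VMFS header
#   $3/$4 = LUN id / serial of the device actually being mounted
#   $5    = value of SCSI.CompareLUNNumber (1 is the default)
is_snapshot_lun() {
    if [ "$5" -eq 1 ]; then
        [ "$1:$2" != "$3:$4" ]    # compare LUN id + serial
    else
        [ "$2" != "$4" ]          # compare serial only
    fi
}

# Same disk serial, but the host sees the lun under a different LUN id:
is_snapshot_lun 5 SER123 7 SER123 1 && echo "treated as snapshot"
is_snapshot_lun 5 SER123 7 SER123 0 || echo "not a snapshot"
```

The two calls show exactly the case the setting exists for: the same physical disk presented under a different LUN id is a "snapshot" with the default comparison, but not when only the serial is compared.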
If your array uses NAA (Network Address Authority) identifiers, then the lun id is not used to identify the lun; the NAA is. This gets you around circumstances where you have presented a lun to multiple hosts, and each host sees the lun with a different LUN id.

Managing Replicated RDMs

Dealing with the above snap lun issues is somewhat simpler for RDMs. An RDM is essentially a .vmdk that acts as a pointer to a physical SAN lun, so when you replicate a VMFS volume containing a guest that has an RDM, the details contained within that RDM will be invalid. Dealing with replicated RDMs is therefore as simple as removing the RDM from the guest, and re-adding the RDM pointing to the replicated lun.

References:
VMFS Volume Management
http://www.vmware.com/files/pdf/vmfs_resig.pdf
A Few Technical Threads - Part 2: VMFS Resignaturing
http://virtualgeek.typepad.com/virtual_geek/2008/08/a-few-technic-1.html

Multipathing

Objectives
Knowledge
Explain the use cases for round-robin load balancing
Skills & Abilities
Perform advanced multi-pathing configuration
o Configure multi-pathing policy
o Configure round-robin behavior using command-line tools
o Manage active and inactive paths

Multipathing in ESX is primarily designed for availability, rather than for increasing throughput to individual LUNs. ESX 3.5 supports 3 multipathing policies: Fixed, Most Recently Used (MRU) & Round Robin (RR). I will briefly describe Fixed & MRU, then spend a little more time on RR.

Fixed
Fixed is used with Active/Active (AA) arrays. ESX classes an AA array as one that does not have the concept of lun ownership, that is, any controller can send IO to any lun. When using the fixed policy, one path is defined as the preferred path, which will always be used to send IO to the given lun while it is available. If the preferred path fails, another path will be chosen and used until the preferred path becomes available again, at which point ESX will fail back to it.

MRU
Most Recently Used is used against Active/Passive (AP) arrays.
An AP array is one that has the concept of LUN ownership. That is, controllers in the array 'own' specific luns, and only the owning controller may send IO to that LUN. MRU does not have the concept of preferred paths, only active paths. IO to a given lun will always be sent through the active path, and if that path fails another will be chosen. This path then becomes the active path and is used for all IO until it fails.

When using Fixed or MRU, ESX can only use a single path to send IO to a lun. Throughput issues are generally addressed by 'poor man's load balancing'. OK - let's imagine that this is an AA array, and so we are using Fixed. For each lun, we have 4 paths. Let's also imagine that the IO characteristics of each lun are the same.

vmhba1:0:1  vmhba1:0:2  vmhba1:0:3  vmhba1:0:4
vmhba2:0:1  vmhba2:0:2  vmhba2:0:3  vmhba2:0:4
vmhba1:1:1  vmhba1:1:2  vmhba1:1:3  vmhba1:1:4
vmhba2:1:1  vmhba2:1:2  vmhba2:1:3  vmhba2:1:4

If we set the preferred paths to vmhba1:0:1 (LUN 1), vmhba2:0:2 (LUN 2), vmhba1:1:3 (LUN 3) and vmhba2:1:4 (LUN 4), we have split the load across both HBAs and both controllers. Pretty simple? Not very real world though…You are really trying to ensure that you are distributing your IO across as many physical paths and array controllers as possible, to reduce bottlenecks at any one point in the infrastructure. Remember that all I/O basically goes from one queue to another: from the guest queue (virtual scsi adapter), to the lun queue (vmfs volume), to the hba queue, across the fabric (or whatever medium), to the array port queue (hopefully then cache), array backplane queues, then finally disk queues….

If the array were AP, and we were using the MRU policy, things change a little. As MRU has no concept of preferred paths, the kernel will use whichever path it happens to discover the luns on first. This is a problem: when the kernel boots and discovers storage, it is typical for ALL luns to be discovered via one hba before the other, and so you get all your luns being accessed via a single hba.
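One way to express the 'poor man's load balancing' just described is as a simple assignment rule: as the LUN number increases, alternate HBAs every LUN and storage processors every two LUNs, so every physical component carries some of the load. The helper below is hypothetical (not an ESX tool), just the selection logic:

```shell
#!/bin/sh
# Hypothetical helper illustrating "poor man's load balancing": given a LUN
# number, pick a preferred path that alternates across both HBAs and both
# storage processors, so I/O is spread over every physical component.
preferred_path() {
    lun=$1
    hba=$(( ((lun + 1) % 2) + 1 ))   # LUN 1 -> vmhba1, LUN 2 -> vmhba2, ...
    sp=$((  ((lun - 1) / 2) % 2 ))   # LUNs 1-2 -> SP 0, LUNs 3-4 -> SP 1
    echo "vmhba${hba}:${sp}:${lun}"
}

for lun in 1 2 3 4; do
    preferred_path "$lun"
done
# -> vmhba1:0:1, vmhba2:0:2, vmhba1:1:3, vmhba2:1:4
```

Each of the four LUNs ends up preferring a different HBA/SP combination, which is exactly the spread the manual path selection aims for.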
One way to address this is to configure your multipathing policy via the command line, and use the vml identifiers for luns rather than their runtime addresses. VML addresses for luns are persistent, that is, they do not change across reboots, whereas runtime addresses may change. Runtime addresses (also known as canonical addresses) take the following format: vmhba2:0:13, where vmhba2 refers to the hba accessing the lun, :0 is the storage processor (controller) in the array the lun is accessed via, and :13 is the lun id. VML addresses may be found under /vmfs/devices/disks:

[root@esx35-1 /]# ls -l /vmfs/devices/disks/
total 993127116
-rw------- 1 root root 36270243840 Sep 7 04:18 vmhba1:0:0:0
-rw------- 1 root root 213825024 Sep 7 04:18 vmhba1:0:0:1
-rw------- 1 root root 5371107840 Sep 7 04:18 vmhba1:0:0:2
-rw------- 1 root root 4293596160 Sep 7 04:18 vmhba1:0:0:3
-rw------- 1 root root 26386698240 Sep 7 04:18 vmhba1:0:0:4
-rw------- 1 root root 2146765824 Sep 7 04:18 vmhba1:0:0:5
-rw------- 1 root root 567512064 Sep 7 04:18 vmhba1:0:0:6
-rw------- 1 root root 23565394944 Sep 7 04:18 vmhba1:0:0:7
-rw------- 1 root root 106896384 Sep 7 04:18 vmhba1:0:0:8
lrwxrwxrwx 1 root root 58 Sep 7 04:18 vmhba2:0:1:0 -> vml.0200010000600a0b80000f9e2b0000175349d5a9b6313732322d36
lrwxrwxrwx 1 root root 58 Sep 7 04:18 vmhba2:0:11:0 -> vml.02000b0000600a0b80000f9e2b000019ca4a68fb3c313732322d36
lrwxrwxrwx 1 root root 60 Sep 7 04:18 vmhba2:0:11:1 -> vml.02000b0000600a0b80000f9e2b000019ca4a68fb3c313732322d36:1
lrwxrwxrwx 1 root root 58 Sep 7 04:18 vmhba2:0:12:0 -> vml.02000c0000600a0b80000f9e2b000019d04a68fc58313732322d36
lrwxrwxrwx 1 root root 58 Sep 7 04:18 vmhba2:0:24:0 -> vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36
lrwxrwxrwx 1 root root 60 Sep 7 04:18 vmhba2:0:24:1 -> vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36:1
lrwxrwxrwx 1 root root 58 Sep 7 04:18 vmhba2:0:25:0 -> vml.0200190000600a0b80000f9e2b0000144648fd15b4313732322d36
lrwxrwxrwx 1 root root 60 Sep 7 04:18 vmhba2:0:25:1 -> vml.0200190000600a0b80000f9e2b0000144648fd15b4313732322d36:1
lrwxrwxrwx 1 root root 58 Sep 7 04:18 vmhba2:0:31:0 -> vml.02001f0000600a0b80000f9e2b0000000000000000556e69766572
lrwxrwxrwx 1 root root 58 Sep 7 04:18 vmhba2:1:1:0 -> vml.0200010000600a0b80000f9e2b0000175349d5a9b6313732322d36
lrwxrwxrwx 1 root root 58 Sep 7 04:18 vmhba2:1:11:0 -> vml.02000b0000600a0b80000f9e2b000019ca4a68fb3c313732322d36
lrwxrwxrwx 1 root root 60 Sep 7 04:18 vmhba2:1:11:1 -> vml.02000b0000600a0b80000f9e2b000019ca4a68fb3c313732322d36:1
lrwxrwxrwx 1 root root 58 Sep 7 04:18 vmhba2:1:12:0 -> vml.02000c0000600a0b80000f9e2b000019d04a68fc58313732322d36
lrwxrwxrwx 1 root root 58 Sep 7 04:18 vmhba2:1:24:0 -> vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36
lrwxrwxrwx 1 root root 60 Sep 7 04:18 vmhba2:1:24:1 -> vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36:1
lrwxrwxrwx 1 root root 58 Sep 7 04:18 vmhba2:1:25:0 -> vml.0200190000600a0b80000f9e2b0000144648fd15b4313732322d36
lrwxrwxrwx 1 root root 60 Sep 7 04:18 vmhba2:1:25:1 -> vml.0200190000600a0b80000f9e2b0000144648fd15b4313732322d36:1
lrwxrwxrwx 1 root root 58 Sep 7 04:18 vmhba2:1:31:0 -> vml.02001f0000600a0b80000f9e2b0000000000000000556e69766572
-rw------- 1 root root 16106127360 Sep 7 04:18 vml.0200010000600a0b80000f9e2b0000175349d5a9b6313732322d36
-rw------- 1 root root 16106127360 Sep 7 04:18 vml.02000b0000600a0b80000f9e2b000019ca4a68fb3c313732322d36
-rw------- 1 root root 10733924864 Sep 7 04:18 vml.02000b0000600a0b80000f9e2b000019ca4a68fb3c313732322d36:1
-rw------- 1 root root 16106127360 Sep 7 04:18 vml.02000c0000600a0b80000f9e2b000019d04a68fc58313732322d36
-rw------- 1 root root 214748364800 Sep 7 04:18 vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36
-rw------- 1 root root 214745544704 Sep 7 04:18 vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36:1
-rw------- 1 root root 214748364800 Sep 7 04:18 vml.0200190000600a0b80000f9e2b0000144648fd15b4313732322d36
-rw------- 1 root root 214745544704 Sep 7 04:18 vml.0200190000600a0b80000f9e2b0000144648fd15b4313732322d36:1
-rw------- 1 root root 0 Sep 7 04:18 vml.02001f0000600a0b80000f9e2b0000000000000000556e69766572
[root@esx35-1 /]#

To identify the vml address for a particular lun, follow the link from the runtime address to the vml address. For example, vmhba2:0:24 maps to vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36. VML addresses may also be discerned via 'esxcfg-mpath -lv'. When you use these vml addresses with esxcfg-mpath, the policies you define will persist across reboots, ensuring that your paths stay split as you defined.

Round Robin Load Balancing

Round Robin (RR) is provided to allow administrators to create policies that will split I/O to a particular lun over multiple paths. One of the limitations of Fixed and MRU is that they will only utilize a single path to send I/O to a lun, whereas with RR the admin can create a policy to send the I/Os over multiple hbas to multiple array ports. RR can be quite effective, especially against A/A arrays when used correctly. It is most effective when used across the board for all luns: if some luns are not configured to switch paths, their I/Os will always go into the same HBA queue, and any RR I/Os sharing that queue will sit behind them, which can increase the time those RR I/Os take to traverse the queue.

In the below example, we will configure RR for vmhba2:0:24, set the policy to switch after 100 I/Os, and to switch to any available hba.
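(An aside before the example: the LUN id appears to be encoded in the vml name itself. In the listing earlier, the four hex digits after the leading 'vml.02' match the LUN id every time - 0018 hex is 24, 001f hex is 31. I say 'appears' deliberately: this is a pattern observed in the output, not something documented, but it makes the long names much easier to eyeball.)

```shell
#!/bin/sh
# Pull the apparent LUN id out of a vml identifier. In the ls -l listing
# earlier, the four hex characters after "vml.02" track the LUN id
# (e.g. 0018 -> 24). Observed pattern only -- treat it as a convenience,
# not a documented guarantee.
vml_lun_id() {
    hexlun=$(echo "$1" | cut -c7-10)   # characters 7-10 of the name
    echo $(( 0x$hexlun ))
}

vml_lun_id vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36   # -> 24
vml_lun_id vml.02001f0000600a0b80000f9e2b0000000000000000556e69766572   # -> 31
```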
[root@esx35-1 /]# esxcfg-mpath -p custom --lun=vmhba2:0:24
Setting vmhba2:0:24 policy to custom
[root@esx35-1 /]# esxcfg-mpath -C 100 --lun=vmhba2:0:24
Setting Custom policy values
[root@esx35-1 /]# esxcfg-mpath -H any --lun=vmhba2:0:24
Setting Custom policy values
[root@esx35-1 /]# esxcfg-mpath -q --lun=vmhba2:0:24
Disk vmhba2:0:24 /dev/sde (204800MB) has 2 paths and policy of Custom: maxCmds=100 maxBlks=2048 hbaPolicy=any targetPolicy=mru
 FC 16:0.0 2100001b32108fc5<->200600a0b80f9e2c vmhba2:0:24 On preferred
 FC 16:0.0 2100001b32108fc5<->200600a0b80f9e2d vmhba2:1:24 On active
[root@esx35-1 /]#

The options you have for configuring a RR policy are:
When to switch paths: based on the number of I/Os (using -C, default 50) or the max number of blocks (using -B, default 2048)
Which HBA to select: (using -H) may be preferred, mru, any or minq. minq selects the hba with the least outstanding I/Os
Which Target to select: (using -T) may be mru, any or preferred

When configuring RR, you may choose the default policy (done with esxcfg-mpath -p rr --lun=vmhba2:0:24), which is effectively '-H any -T mru'.

There are also 2 advanced options relevant to RR:
/Disk/SPCmdsToSwitch - the number of I/Os to split on, defaults to 50
/Disk/SPBlksToSwitch - the number of blocks to split on, defaults to 2048

References:
http://www.vmware.com/pdf/vi3_35_25_roundrobin.pdf

Using IP Storage

Objectives
Skills & Abilities
Configure NFS datastores using command-line tools
Configure iSCSI hardware and software initiators using command-line tools
Configure storage network segmentation
o FC Zoning
o iSCSI/NFS VLAN
Configure iSCSI/NFS security options

NFS Datastores
One of the forms of shared storage supported by ESX is NFS. ESX supports NFS v3 over TCP. Adding NFS mounts to ESX is quite simple.
[root@esx35-1 root]# esxcfg-nas -a -o vnfs -s /iso MyNFSDatastore
Connecting to NAS volume: MyNFSDatastore
MyNFSDatastore created and connected.
[root@esx35-1 root]#

The main prerequisite is that the NFS mount must be exported with the 'no_root_squash' option. The default behaviour of NFS is to squash the root user (map it to an unprivileged user), which would block ESX, as it accesses NFS mounts as root. There is an experimental option available to change the delegate user ESX uses to access NFS mounts. It can be changed (when in maintenance mode) via the 'Security Profile' tab.

The number of NFS mounts ESX supports by default is 8. This value can be changed by adjusting NFS.MaxVolumes (limit of 32).

NFS storage traffic should always be separated from regular LAN traffic. A dedicated IP storage network should be implemented where possible, for both performance/throughput and improved security. NFS traffic is not encrypted, and as such an attacker could use data sniffed from the network to reconstruct files as they traverse the network. If it is not possible to dedicate a physical network to NFS, at the very least a VLAN should be leveraged to segregate the traffic at layer 2 from regular LAN traffic. Another mechanism to protect data on NFS datastores is to mount the datastore as Read Only. This is achieved when adding an NFS mount via the GUI by checking the 'Read Only' box.

iSCSI Datastores

iSCSI is one of the supported forms of shared storage that can be used by ESX hosts. Unlike NFS, iSCSI is block level storage, and volumes are typically formatted as VMFS (unless being used as RDMs). iSCSI storage networks may be accessed by ESX hosts via a Hardware Initiator or a Software Initiator.

A Hardware Initiator acts as a SCSI device to the host. The host issues SCSI commands directly to the HBA, which then encapsulates them into iSCSI packets and sends them to the targets (arrays).
A Software Initiator is a driver that runs in both the kernel memory space and the Service Console. This driver takes the SCSI commands the kernel issues, creates iSCSI packets from them, and sends them out via a vmkernel (vmk) interface. The easy way to sum up the differences is that a Hardware Initiator costs more $ but saves CPU cycles, while the Software Initiator is free but costs you CPU cycles. The other notable difference is that ESX can only perform boot from SAN via the hardware initiator. The general consensus on software vs hardware initiators is that you should generally only go for hardware if you require boot from SAN. With the speed of modern processors, losing cycles to storage traffic is generally not a problem.

How much CPU will you need for your iSCSI traffic? I get asked this question all the time, and you always find yourself saying - well, how much I/O have you got :) I then suggest that on a busy server using software iSCSI, you would want to make sure you had a core available for the cycles the software initiator will need. Obviously, if you start enabling header digest checksums or (gasp!) data digest checksums, you will need MANY more cycles to handle this traffic.

In terms of designing an iSCSI solution, you should remember that iSCSI is not just another TCP protocol. You are running your storage I/O over an IP network, so ensuring that you always have as much bandwidth (with low latency) as possible is a good idea :) As a best practice, we would always recommend that you implement a dedicated IP network for your storage traffic - you wouldn't think twice about buying fibre switches when you implement FC storage, right? And GbE switches don't cost as much as FC switches…But if you have a (very) tight budget, segregating the traffic with a VLAN would suffice. Try not to use cheap switches where you can avoid it; get switches with good port buffers, and enable flow control.
Remember that iSCSI isn't really a WAN technology, as the latency will really affect the overall performance. Remember that for ESX, iSCSI may only be run over GbE (not 10GbE), nor are Jumbo Frames supported for it. People normally think that iSCSI is the poor cousin of FC - and while it is true that you can't get the same raw throughput - do you really need a gazillion IOPS? A well designed iSCSI solution, across multiple switches, with multiple uplinks, against multiple targets, is a solid solution that should meet pretty much all requirements.

ESX supports up to 2 Hardware Initiators, and only 1 instance of the Software Initiator (vmhba32 generally). ESX does not support mixing Hardware & Software Initiators in the same server. The hardware initiator can be configured using the 'esxcfg-hwiscsi' tool, which is used mainly to enable jumbo frames (MTU 9000). Additional parameters of the hardware initiator may be configured/queried using the 'vmkiscsi-tool' command. The software initiator is configured using the 'esxcfg-swiscsi' tool.

Below, we will go through the steps to enable and configure the iSCSI software initiator: creating the vmkernel port, enabling the initiator and setting targets.

[root@esx35-1 root]# esxcfg-vswitch -a vSwitch2
[root@esx35-1 root]# esxcfg-vswitch -L vmnic3 vSwitch2
[root@esx35-1 root]# esxcfg-vswitch -L vmnic4 vSwitch2
[root@esx35-1 root]# esxcfg-vswitch -A iSCSI vSwitch2
[root@esx35-1 root]# esxcfg-vswitch -A iSCSI_SC vSwitch2
[root@esx35-1 root]# esxcfg-vswif -a -i 192.168.61.101 -n 255.255.255.0 -p iSCSI_SC vswif1
[root@esx35-1 root]# esxcfg-vswitch -v 50 -p iSCSI vSwitch2
[root@esx35-1 root]# esxcfg-vmknic -a -i 192.168.61.201 -n 255.255.255.0 iSCSI
[root@esx35-1 root]# esxcfg-swiscsi -e
Allowing software iSCSI traffic through firewall...
Enabling software iSCSI...
Using /usr/lib/vmware/vmkmod/iscsi_mod.o
Module load of iscsi_mod succeeded.
[root@esx35-1 root]# esxcfg-firewall -q swISCSIClient
Service swISCSIClient is enabled.
[root@esx35-1 root]# vmkiscsi-tool -D -a 192.168.61.16 vmhba32
IPAddr 192.168.61.16,Port=0
[root@esx35-1 root]# esxcfg-rescan vmhba32
Doing iSCSI discovery. This can take a few seconds ...
Rescanning vmhba32...done.
On scsi4, removing:.
On scsi4, adding: 0:1.

OK - let's step through it. First we create a vSwitch and link a few vmnics to it. Then we create the port group for our vmkernel port, and then a service console port. Then we create the vmk interface on our new port group iSCSI, and the vswif interface on iSCSI_SC. Once we have the vmk & SC ports, we enable the adapter using the 'esxcfg-swiscsi -e' command. Next we check that the SC firewall has been updated to allow iSCSI traffic. Finally we add the address of the iSCSI array and rescan the adapter.

We can use vmkiscsi-tool to query which targets (arrays) we are connected to, and which luns are available:

[root@esx35-1 root]# vmkiscsi-tool -T -l vmhba32
-------------------------------------------
NAME                            : iqn.1998-01.com.vmware:iscsi.example.crider
ALIAS                           :
DISCOVERY METHOD FLAGS          : 0
SEND TARGETS DISCOVERY SETTABLE : 0
SEND TARGETS DISCOVERY ENABLED  : 0
Portal 0                        : 192.168.61.16:3260
-------------------------------------------
[root@esx35-1 root]# vmkiscsi-tool -L -l vmhba32
Target iqn.1998-01.com.vmware:iscsi.example.crider:
-------------------------------------------
OS DEVICE NAME : vmhba32:0:1
BUS NUMBER     : 0
TARGET ID      : 0
LUN ID         : 1
-------------------------------------------
[root@esx35-1 root]#

We can also query the /proc filesystem for information about which LUNs are visible to the host:

[root@esx35-1 root]# cat /proc/scsi/vmkiscsi/*
# iSCSI driver version: 22.214.171.124 variant (27-Jun-2005)
#
# SCSI: iSCSI:
# Bus Tgt LUN IP address Port TargetName
 0 0 1 192.168.61.16 3260 iqn.1998-01.com.vmware:iscsi.example.crider
[root@esx35-1 root]#

Before we go into things a little deeper, let's go over a few pre-reqs.
Aside from requiring a vmk port, you also require Service Console connectivity (a vswif) to the array. The Service Console runs an iSCSI daemon that is responsible primarily for authenticating iSCSI sessions from ESX.

There are 2 methods for target discovery: Send Targets (dynamic) and Static Discovery. When using the software initiator, the only supported discovery method is Send Targets. Send Targets is the simplest to configure, as all that is required is the IP address (and TCP port) of the array. When the initiator connects to the target, it queries for a list of targets and then attempts to mount them. Static Discovery is used with the hardware initiator: you supply the IP/port of the array and the iSCSI Qualified Name (IQN) of the target you are accessing. More on IQNs later.

CHAP (Challenge Handshake Authentication Protocol) is a protocol used to secure iSCSI connections. iSCSI traffic is not encrypted, and as such an attacker could listen to the iSCSI traffic, then pose as the ESX host to the iSCSI array and gain access to any luns presented to the ESX host. ESX (v3) only supports uni-directional CHAP, and can only store one set of CHAP credentials. To configure CHAP at the command line, edit /etc/vmkiscsi.conf and enter the following lines:

OutgoingUsername="iqn.1998-01.com.vmware:esx35-1-186fd4c7"
OutgoingPassword="mypassword"

IQNs are in the following format:
iqn.1998-01.com.vmware:esx35-1-186fd4c7
iqn.<year>-<month>.<reverse dns name>:<servername>-<random number>

The IQN and alias of your host may be queried with the following commands:

[root@esx35-1 root]# vmkiscsi-tool -I -l vmhba32
iSCSI Node Name: iqn.1998-01.com.vmware:esx35-1-186fd4c7
[root@esx35-1 root]# vmkiscsi-tool -k -l vmhba32
iSCSI Node Alias: esx35-1.vmware.lab
[root@esx35-1 root]#

Multipathing for iSCSI is actually pretty easy. You cannot *configure* multipathing for iSCSI luns - IP just takes care of it.
It is similar to fibre channel, with one subtle exception: you only have 1 path (in this case a TCP session) to a TARGET. If your array is configured to host multiple iSCSI luns behind a single IP address, all traffic to that target will go through 1 TCP session. So how do you make this work for you? You configure your array to host the luns across as many IP interfaces as it will support, i.e. lun1 behind 192.168.61.11, lun2 behind 192.168.61.12, and so on. This will allow the kernel to leverage multiple vmnics on the vSwitch (which we will see in the load balancing section in Networking). Really, you just need to remember that multipathing in iSCSI is about failover, and when failures occur in the network, it is IP (routing) that ensures the traffic gets to the array. Just as we saw in FC multipathing (poor man's), try to distribute the load across as many interfaces as you can; specifically for iSCSI, this is about the array side as much as (if not more than) the ESX side. Make sure you have multiple uplinks on your vSwitches, and that your iSCSI luns are being accessed via different target addresses (array IPs).

References:
The mother of all ESX iSCSI lessons...
http://virtualgeek.typepad.com/virtual_geek/2009/01/a-multivendor-post-to-help-our-mutual-iscsi-customers-using-vmware.html

Zoning & Masking

Objectives
Skills & Abilities
Configure LUN masking
o Storage device
o Host
Set ESX Server host-side disk options

While an in-depth discussion of FC topologies is not required for the VCDX, solid knowledge of FC concepts such as zoning & masking is required. I also have to say, deep knowledge of storage concepts helps immensely for VI admins. I generally tell my students: if nothing else will make or break the performance of your VI, the storage will. It doesn't really matter how much memory or how many cores you have if your storage cannot handle the IOPS required by all the guests...but anyway - on with the show.
FC Zoning

Zones in a FC fabric are, on the surface, similar to VLANs in a network environment. A zone describes who can communicate with whom at the FC-2 layer (kinda layer 2 if you try to OSI it). There are (generally) 2 types of zones, hard & soft, but for our purposes they both behave the same.

Below I have thrown up a simple Redundant Switched Fabric environment. I have 2 servers, 'Windows' and 'ESX', both with dual FC HBAs, and both are connected to different FC switches. Also connected is a small FC array (I have used an EMC Clariion mainly as I had the shape handy). A fabric is really a single FC network; for SMEs it is generally as above, i.e. a single FC switch, but for larger enterprises it could be many FC switches cascaded together, or a director. In my example we have 2 fabrics, Fabric 1 & Fabric 2. Zoning is configured at the fabric (switch) level and is used to control which ports (and the device(s) behind them) can talk to which other ports.

In Fabric 2 I would probably configure the zoning as follows:

Zone 1 (SPB0_WINDOWS_HBA2): Port 0, Port 12
Zone 2 (SPA0_WINDOWS_HBA2): Port 0, Port 15
Zone 3 (SPB0_ESX_VMHBA2): Port 3, Port 12
Zone 4 (SPA0_ESX_VMHBA2): Port 3, Port 15

What I have done here is create 4 zones, where each zone provides access from a specific HBA to a specific array port. Another way I could have achieved this is:

Zone 1 (ARRAY_WINDOWS_HBA2): Port 0, Port 12, Port 15
Zone 2 (ARRAY_ESX_VMHBA2): Port 3, Port 12, Port 15

The difference here is that while each HBA is still zoned to see both array ports, as in the previous example, the two array ports are now also zoned to each other. If you think about it, do servers' HBAs need to see each other? In the same light, array ports (Front End Ports) do not generally need to see each other. Therefore they do not need to be zoned together. If something doesn't need access to something else, you do not give it access, right? From a simplistic standpoint, zoning describes what you can see on the fabric.
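The single-initiator zoning above can be modelled as plain membership lists: two ports can talk only if some zone contains both of them. The sketch below is a toy model using the zone definitions from the Fabric 2 example (it is not a switch tool, just the visibility rule):

```shell
#!/bin/sh
# Toy model of the single-initiator zoning example above. A zone is a list
# of switch port numbers, and two ports may communicate only if some zone
# contains both of them. Port numbers are from the Fabric 2 example.
ZONES="0 12|0 15|3 12|3 15"

can_talk() {   # $1 and $2 are port numbers
    old_ifs=$IFS; IFS='|'
    for zone in $ZONES; do
        case " $zone " in
            *" $1 "*)
                case " $zone " in
                    *" $2 "*) IFS=$old_ifs; return 0 ;;
                esac ;;
        esac
    done
    IFS=$old_ifs
    return 1
}

can_talk 0 12  && echo "Windows HBA2 (port 0) can reach SPB0 (port 12)"
can_talk 0 3   || echo "the two servers' HBAs cannot see each other"
can_talk 12 15 || echo "the two array ports cannot see each other"
```

The last two calls show the point made above: with single-initiator zones, neither the server HBAs nor the array front end ports are ever zoned to each other.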
Troubleshooting zoning from the VI admin's point of view is easy. Can you see the array? If yes (and specifically all ports of the array), then your problem is most likely not zoning. If you cannot see the array, then zoning could well be the issue.

LUN Masking

LUN Masking, aka Selective Storage Presentation, is an array level concept that describes what a given initiator (server, or really the HBA in the server) can access on the array. Zoning lets you see the array; masking defines what on the array you can see. Looking at my picture above, we have four LUNs: LUNs 1, 2, 3 & 4. We also have 2 servers, a Windows box and an ESX box. If we didn't have masking, then our Windows server would have access to the VMFS luns - which is generally a no-no. Imagine what would happen if this Windows box were to write a disk signature to the VMFS volumes...

So, on the array we would configure LUN masking to provide each server with access to only the luns it requires. You could imagine the masking configuration on the array looking like this:

WINDOWS: LUN 1, LUN 3
ESX: LUN 2, LUN 4

Every array vendor implements masking in different ways, but the end result is the same: masking defines which luns on the array are presented to which hosts. Who has access to what. If we expand our example out to a slightly more real world scenario, we will likely have multiple ESX hosts that require access to our VMFS volumes. The only thing that changes is that we will mask the luns out to multiple hosts. That is, configure the array to present the luns to multiple hosts - aka clustering.

Now, when most people think about masking, the definition is limited to the array. However, LUN masking may also be configured at the host level. Imagine that you have configured your array to present a LUN out to your 5 ESX hosts, but you do not want 1 of the 5 hosts to have access to this lun. You could configure masking at the host level to prevent that ESX host from 'seeing' the LUN in question.
We use the advanced option Disk.MaskLUNs. This can be set via the GUI: 'Advanced Settings' -> 'Disk' -> MaskLUNs. But as this is VCDX, we do things the command line way :) Setting advanced options is done via the 'esxcfg-advcfg' command. In the below example, I will mask LUN 31 from my host esx35-1.

[root@esx35-1 config]# esxcfg-rescan vmhba2
Rescanning vmhba2...done.
On scsi2, removing: 0:1 0:11 0:12 0:24 0:25 0:31.
On scsi2, adding: 0:1 0:11 0:12 0:24 0:25 0:31.
[root@esx35-1 config]# esxcfg-advcfg -g /Disk/MaskLUNs
Value of MaskLUNs is
[root@esx35-1 config]# esxcfg-advcfg -s "vmhba2:1:31; vmhba2:0:31;" /Disk/MaskLUNs
Value of MaskLUNs is vmhba2:1:31; vmhba2:0:31;
[root@esx35-1 config]# esxcfg-rescan vmhba2
Rescanning vmhba2...done.
On scsi2, removing: 0:1 0:11 0:12 0:24 0:25 0:31.
On scsi2, adding: 0:1 0:11 0:12 0:24 0:25.
[root@esx35-1 config]#

First, we do a rescan on vmhba2. Then, using 'esxcfg-advcfg -g', we query the value of /Disk/MaskLUNs, which is not set. We then set the value using the '-s' flag. Once the value is set, we rescan again, and see that LUN 31 is no longer visible to the host.

Modifying Kernel Module Settings

Objectives
Skills & Abilities
Use esxcfg-module
o Modify storage adapter settings
o Identify and load/unload modules
o Get module status
Use proc nodes to identify driver configuration and options

First of all, let me start by saying that you generally modify vmkernel parameters at your own risk! I always tell my students that you don't just read some blog and go away and tinker with the vmkernel module params! Kind of like how M$ tells you "you modify the registry at your own risk" - well, we really mean it here! You have been warned :)

vmkernel parameters can be seen in a few ways: some can be seen by querying the /proc filesystem, others by using the esxcfg-module command.
Kernel module parameters are stored in /etc/vmware/esx.conf. Below, we query the /proc filesystem to obtain information on one of the attached Fibre Channel interfaces of the ESX host by reading the driver's proc node for that adapter instance.

[root@esx35-1 qla2300]# pwd
/proc/scsi/qla2300
[root@esx07 qla2300]# cat 2
QLogic PCI to Fibre Channel Host Adapter for QLA2340:
Firmware version: 3.03.19, Driver version 7.08-vm33.1
Boot Code Version: BIOS : v1.25, Fcode : v0.00, EFI : v0.00
Entry address = 0x8c44f4
HBA: QLA2312 , Serial# J06083
Request Queue = 0x41610000, Response Queue = 0x41630000
Request Queue count= 512, Response Queue count= 512
Total number of active commands = 0
Total number of interrupts = 22704892
Total number of active IP commands = 0
Total number of IOCBs (used/max) = (0/600)
Total number of queued commands = 0
Device queue depth = 0x20
Number of free request entries = 328
Number of mailbox timeouts = 0
Number of ISP aborts = 0
Number of loop resyncs = 489
Number of retries for empty slots = 0
Number of reqs in pending_q= 0, retry_q= 0, done_q= 0, scsi_retry_q= 0
Host adapter:loop state= <READY>, flags= 0x860813
Dpc flags = 0x0
MBX flags = 0x0
SRB Free Count = 4096
Link down Timeout = 030
Port down retry = 015
Login retry count = 030
Loop down time = 120
Loop down abort time = 090
Commands retried with dropped frame(s) = 0
Configured characteristic impedence: 50 ohms
Configured data rate: 1-2 Gb/sec auto-negotiate
NPIV Supported : No

SCSI Device Information:
scsi-qla0-adapter-node=200000e08b0d63d3;
scsi-qla0-adapter-port=210000e08b0d63d3;

FC Port Information:
scsi-qla0-port-0=200600a0b80f9e2b:200600a0b80f9e2c:010100:81;
scsi-qla0-port-1=200600a0b80f9e2b:200600a0b80f9e2d:010600:84;

SCSI LUN Information:
(Id:Lun) * - indicates lun is not registered with the OS.
( 0: 0): Total reqs 2, Pending reqs 0, flags 0x0*, 0:0:81,
( 0: 1): Total reqs 15571570, Pending reqs 0, flags 0x0, 0:0:81,
( 0:10): Total reqs 7203676, Pending reqs 0, flags 0x0, 0:0:81,
( 0:31): Total reqs 10685, Pending reqs 0, flags 0x0, 0:0:81,
( 1: 0): Total reqs 2, Pending reqs 0, flags 0x0*, 0:0:84,
( 1: 1): Total reqs 21271, Pending reqs 0, flags 0x0, 0:0:84,
( 1:10): Total reqs 21273, Pending reqs 0, flags 0x0, 0:0:84,
( 1:31): Total reqs 10642, Pending reqs 0, flags 0x0, 0:0:84,
Bus:Function = 0x6:0x8
[root@esx35-1 qla2300]#

We are going to use the good ol' queue depth parameters as the primary example. Note the 'Device queue depth = 0x20' line in the above output: 0x20 hex is 32 decimal.

Using the esxcfg-module command, we can see which modules the kernel has loaded. In the below example, the qla2300_707_vmw module is used for our QLogic HBA.

[root@esx35-1]# esxcfg-module -l
Device Driver Modules
Module          Enabled Loaded
vmklinux        true    true
tg3             true    true
aacraid_esx30   true    true
qla2300_707_vmw true    true
e1000           true    true
lvmdriver       true    true
vmfs3           true    true
etherswitch     true    true
shaper          true    true
tcpip           true    true
cosShadow       true    true
migration       true    true
nfsclient       true    true
deltadisk       true    true
vmfs2           true    true
[root@esx35-1]# esxcfg-module -g qla2300_707_vmw
qla2300_707_vmw enabled = 1 options = ''
[root@esx35-1]#

We can use the -g switch to query for any set module options. The output of the above command shows us that we do not have any options set. To set a kernel module parameter, use the -s switch.

[root@esx35-1 root]#
[root@esx35-1 root]# esxcfg-module -s ql2xmaxqdepth=64 qla2300_707_vmw
[root@esx35-1 root]# esxcfg-module -g qla2300_707_vmw
qla2300_707_vmw enabled = 1 options = 'ql2xmaxqdepth=64'

What you see above demonstrates a few things. In the first command, we set a kernel parameter for the QLogic module that increases the max queue depth to 64. We then query the module for options and see that the value is indeed set to 64.
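Rather than eyeballing the proc node, the interesting fields can be pulled out with standard shell tools. This is a minimal sketch against a saved snapshot of the output above; the field layout of the scsi-qla0-port lines (node name : port name : port ID : loop ID) is an assumption based on the output shown:

```shell
#!/bin/sh
# Sketch: extracting fields from a saved copy of the qla2300 proc node.
# The here-doc holds a few lines copied from the output above; on a live
# ESX 3.x host you would read /proc/scsi/qla2300/<instance> instead.
snapshot=$(cat <<'EOF'
Device queue depth = 0x20
Port down retry = 015
scsi-qla0-port-0=200600a0b80f9e2b:200600a0b80f9e2c:010100:81;
scsi-qla0-port-1=200600a0b80f9e2b:200600a0b80f9e2d:010600:84;
EOF
)

# Queue depth is reported in hex; printf's %d accepts a 0x-prefixed
# argument and prints it as decimal (0x20 -> 32).
qd_hex=$(printf '%s\n' "$snapshot" | sed -n 's/^Device queue depth = //p')
qd_dec=$(printf '%d' "$qd_hex")
echo "queue depth: $qd_hex hex = $qd_dec decimal"

# Assumed layout of the FC Port Information lines after the '=':
#   <target node name>:<target port name>:<port id>:<loop id>
# Pull out the target WWPN (second field).
printf '%s\n' "$snapshot" | sed -n 's/^scsi-qla0-port-[0-9]*=[^:]*:\([^:]*\):.*/target WWPN: \1/p'
```

The same approach works for any other field of the proc node - it is just line-oriented text.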
[root@esx35-1 root]# esxcfg-module -s qlport_down_retry=60 qla2300_707_vmw
[root@esx35-1 root]# esxcfg-module -g qla2300_707_vmw
qla2300_707_vmw enabled = 1 options = 'qlport_down_retry=60'

Next, we set another parameter, which increases the port-down retry time from 30secs to 60secs. When we query the module options, we see that we have lost the ql2xmaxqdepth value. Module parameters are not added cumulatively. When you need to set multiple parameters, they must be set in the same command, as seen in the following commands.

[root@esx35-1 root]# esxcfg-module -s "ql2xmaxqdepth=64 qlport_down_retry=60" qla2300_707_vmw
[root@esx35-1 root]# esxcfg-module -g qla2300_707_vmw
qla2300_707_vmw enabled = 1 options = 'ql2xmaxqdepth=64 qlport_down_retry=60'

We see that the values we have set are written into the esx.conf file.

[root@esx35-1 root]# grep -i qla /etc/vmware/esx.conf
/device/006:01.0/name = "QLogic Corp QLA231x/2340 (rev 02)"
/vmkernel/module/qla2300_707_vmw.o/options = "ql2xmaxqdepth=64 qlport_down_retry=60"

Now notice that when we check the runtime settings of the HBA, these new values are not yet applied.

[root@esx35-1 root]# cat /proc/scsi/qla2300/2 | grep -i device
Device queue depth = 0x20
[root@esx35-1 root]# cat /proc/scsi/qla2300/2 | grep -i "down retry"
Port down retry = 015

OK - side note here. According to the VMware doco and a few other sources, you need to run esxcfg-boot to update the initrd used by the host in order for the kernel settings to take effect. I did not rebuild the initrd, and found that the settings were indeed applied anyway, as you can see from the below commands run after a reboot.
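Because '-s' replaces the whole option string rather than appending to it, it can help to merge a new parameter into the current option set before applying it. The sketch below shows that merge logic only - merge_opts is a hypothetical helper name, and the esxcfg-module call is shown commented out so the sketch runs anywhere:

```shell
#!/bin/sh
# Sketch: esxcfg-module -s overwrites the whole option string, so merge
# any new parameter into the existing options first. merge_opts keeps
# existing key=value pairs and replaces any pair whose key matches the
# parameter being set.
merge_opts() {
    existing=$1
    new=$2
    key=${new%%=*}
    out=$new
    for opt in $existing; do
        # keep options whose key differs from the one being set
        [ "${opt%%=*}" = "$key" ] || out="$out $opt"
    done
    echo "$out"
}

merge_opts "ql2xmaxqdepth=64" "qlport_down_retry=60"
# prints: qlport_down_retry=60 ql2xmaxqdepth=64

# On a real host you would then apply the merged string, e.g.:
#   esxcfg-module -s "$(merge_opts "$current_opts" qlport_down_retry=60)" qla2300_707_vmw
```

Setting both values in one command, as shown in the article, achieves the same end result; the helper just guards against accidentally dropping an option you set earlier.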
[root@esx07 root]# esxcfg-module -g qla2300_707_vmw
qla2300_707_vmw enabled = 1 options = 'ql2xmaxqdepth=64 qlport_down_retry=60'
[root@esx35-1 root]# grep -i qla /etc/vmware/esx.conf
/device/006:01.0/name = "QLogic Corp QLA231x/2340 (rev 02)"
/vmkernel/module/qla2300_707_vmw.o/options = "ql2xmaxqdepth=64 qlport_down_retry=60"
[root@esx35-1 root]# cat /proc/scsi/qla2300/2 | grep -i device
Device queue depth = 0x40
[root@esx35-1 root]# cat /proc/scsi/qla2300/2 | grep -i "down retry"
Port down retry = 060

However, for completeness we will go with the *correct* method and update the initrd. First, back up your old initrd files in /boot (both the standard initrd and the debug-mode initrd) in the event that you need to roll back, then run the 'esxcfg-boot -b' command.

[root@esx35-1 root]# cd /boot
[root@esx35-1 boot]# cp /boot/initrd-2.4.21-57.ELvmnix.img-dbg /boot/OLD_initrd-2.4.21-57.ELvmnix.img-dbg
[root@esx35-1 boot]# cp /boot/initrd-2.4.21-57.ELvmnix.img /boot/OLD_initrd-2.4.21-57.ELvmnix.img
[root@esx35-1 boot]# ls -l
total 38717
-rw-r--r-- 1 root root   26806 Aug 13  2008 config-2.4.21-57.ELvmnix
drwxr-xr-x 2 root root    1024 Jun 30 21:40 grub
-rw-r--r-- 1 root root 7708363 Sep  3 10:55 initrd-2.4.21-57.ELvmnix.img
-rw-r--r-- 1 root root 8950139 Sep  3 10:55 initrd-2.4.21-57.ELvmnix.img-dbg
-rw-r--r-- 1 root root  666567 Jun 30 21:34 initrd-2.4.21-57.ELvmnix.img-sc
-rw-r--r-- 1 root root     615 Jun 30 21:53 kernel.h
drwx------ 2 root root   12288 Jun 30 21:27 lost+found
-rw-r--r-- 1 root root 7708363 Sep  3 10:56 OLD_initrd-2.4.21-57.ELvmnix.img
-rw-r--r-- 1 root root 8950139 Sep  3 10:56 OLD_initrd-2.4.21-57.ELvmnix.img-dbg
lrwxrwxrwx 1 root root      28 Jun 30 21:53 System.map -> System.map-2.4.21-57.ELvmnix
-rw-r--r-- 1 root root  621361 Aug 13  2008 System.map-2.4.21-57.ELvmnix
-rwxr-xr-x 1 root root 3452300 Aug 13  2008 vmlinux-2.4.21-57.ELvmnix
-rw-r--r-- 1 root root 1374902 Aug 13  2008 vmlinuz-2.4.21-57.ELvmnix
[root@esx35-1 boot]# esxcfg-boot -b
[root@esx35-1 boot]# ls -l
total 38717
-rw-r--r-- 1 root root   26806 Aug 13  2008 config-2.4.21-57.ELvmnix
drwxr-xr-x 2 root root    1024 Jun 30 21:40 grub
-rw-r--r-- 1 root root 7708337 Sep  3 10:57 initrd-2.4.21-57.ELvmnix.img
-rw-r--r-- 1 root root 8950126 Sep  3 10:57 initrd-2.4.21-57.ELvmnix.img-dbg
-rw-r--r-- 1 root root  666567 Jun 30 21:34 initrd-2.4.21-57.ELvmnix.img-sc
-rw-r--r-- 1 root root     615 Jun 30 21:53 kernel.h
drwx------ 2 root root   12288 Jun 30 21:27 lost+found
-rw-r--r-- 1 root root 7708363 Sep  3 10:56 OLD_initrd-2.4.21-57.ELvmnix.img
-rw-r--r-- 1 root root 8950139 Sep  3 10:56 OLD_initrd-2.4.21-57.ELvmnix.img-dbg
lrwxrwxrwx 1 root root      28 Jun 30 21:53 System.map -> System.map-2.4.21-57.ELvmnix
-rw-r--r-- 1 root root  621361 Aug 13  2008 System.map-2.4.21-57.ELvmnix
-rwxr-xr-x 1 root root 3452300 Aug 13  2008 vmlinux-2.4.21-57.ELvmnix
-rw-r--r-- 1 root root 1374902 Aug 13  2008 vmlinuz-2.4.21-57.ELvmnix
[root@esx35-1 root]# shutdown -r now

We can also use esxcfg-module to enable or disable specific modules. In the below example, we will disable the vmfs2 module (which is generally not required) so that the kernel no longer loads it.

[root@esx35-1 boot]# esxcfg-module -d vmfs2
[root@esx35-1 boot]# esxcfg-module -l
Device Driver Modules
Module          Enabled Loaded
vmklinux        true    true
tg3             true    true
aacraid_esx30   true    true
qla2300_707_vmw true    true
e1000           true    true
lvmdriver       true    true
vmfs3           true    true
etherswitch     true    true
shaper          true    true
tcpip           true    true
cosShadow       true    true
migration       true    true
nfsclient       true    true
deltadisk       true    true
vmfs2           false   true
iscsi_mod       true    true

Reboot your server and the module will not be loaded.

NPIV

Objectives

Skills & Abilities
Configure and use NPIV HBAs

N_Port ID Virtualization (NPIV) is a technique whereby a server can instantiate multiple World Wide Names (WWNs) on a single HBA. More specifically, it allows multiple N_PORTs to share a physical N_PORT. ESX supports the use of NPIV only for VMs with RDMs, either physical or virtual compatibility mode. NPIV may not be used for VM traffic to a VMFS filesystem (i.e.
vmdk files). In order to use NPIV, the FC switches must support NPIV, and the FC HBAs in the ESX server must support NPIV. Most 4Gb QLogic HBAs support NPIV; Emulex HBAs must be running NPIV-capable firmware.

When a VM configured to use NPIV is powered on, ESX will create a virtual port (VPORT) on each HBA (max 4), which will be used for all of that VM's I/O to any attached RDMs. A VPORT is perceived by the fabric as being an HBA - it has its own WWN. ESX will attempt to discover storage using each VPORT created for the VM. Any LUNs presented to VMs must be masked not only to the virtual WWNs created for the VM, but also to the WWNs of the HBAs in the ESX server(s).

Once you have attached an RDM to your VM, you can then use the VI client to add WWNs for the VM that will be used for all traffic to the RDM:

Right-click VM -> 'Edit Settings' -> Options -> Fibre Channel NPIV
Uncheck (if checked) 'Temporarily Disable NPIV for this VM' and then select 'Generate New WWNs'

These settings can only be changed while the VM is powered off. VirtualCenter will then generate a new WWNN and multiple (4) WWPNs for the VM. Once configured for a VM, you can regenerate or remove the WWNs for that VM. Obviously, this may cause the VM to lose access to the LUN.

Below is an excerpt from the .vmx file of a VM that has been configured for NPIV.

scsi0:1.present = "TRUE"
scsi0:1.fileName = "myvm_1.vmdk"
scsi0:1.mode = "independent-persistent"
scsi0:1.deviceType = "scsi-hardDisk"
wwn.node = "2839000c29000003"
wwn.port = "2839000c29000004,2839000c29000005,2839000c29000006,2839000c29000007"
wwn.type = "vc"

References:
http://pubs.vmware.com/vi35/fibre_san_config/esx_san_cfg_manage.8.17.html
http://www.vmware.com/pdf/vi3_35_25_npiv_config.pdf

.-. (-_-) (\x/) (-.-)
"Storage - DOC"