Section 1 - Storage


VMFS Filesystem & Metadata

Objectives
Knowledge
    Describe the VMFS file system
           o Metadata
           o Multi-access and locking
           o Tree structure and files
           o Applicability to clustered environment
           o Journaling

Skills & Abilities
      Manage VMFS file systems using command-line tools


VMFS (Virtual Machine File System) is a proprietary filesystem created by VMware. It was designed
with the following goals:

           Be optimized for (very) large files, up to 2TB
           Be cluster aware - that is, allow multiple hosts concurrent read/write access to the volume
           Be a journaled filesystem
           Provide a directory structure
           Provide sub-block allocation
           Use an LVM (Logical Volume Manager) to do cool stuff
           Amongst other things…..


VMFS is a cluster-aware filesystem: multiple hosts can have concurrent read/write access to a
volume. One of the mechanisms enabling this is the file-level locking implemented by VMFS.
Traditional filesystems (think NTFS/EXT3) use a volume locking mechanism - when a host accesses a
volume it *locks* the whole volume, and once locked, the volume can only be accessed by the host
that holds the lock. This locking is normally implemented by way of SCSI-2 reservations (Exclusive
Logical Unit Reservation).
Because VMFS uses file-level locking, no single server will normally lock a whole volume, but rather
the individual files located on it. This ensures that a file (think a VM - being just a bunch of
files) can only be opened (think run) by a single host at a time.
An integral part of this file level locking mechanism is VMFS metadata. Metadata contains certain
filesystem descriptors such as:

           Block Size
           Number of Extents
           Volume Capacity
           VMFS Version
           Volume Label
           VMFS UUID

Another crucial part of metadata are the file locks. File locks must be obtained when:

           A file is opened (e.g. powering on a VM)
           Creating a file (e.g. new VM/template)
           Deleting a file
           Changes to file ownership
           Access/Modification timestamps
           A file is grown (think thin disks & snaps)
           Creating/Deleting VMFS Volume
           Expanding a VMFS volume
           Resignaturing
           Using vdf (see the example after this list)
           And more….
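
vdf is simply the VMFS-aware version of df, and (per the list above) it takes locks as it runs because it reports on every mounted VMFS volume - which is why scripted overuse of it crops up again in the troubleshooting section. The output below is illustrative only (based on the NewVMFS volume used later in these notes):

[root@esx35-1 root]# vdf -h
Filesystem            Size  Used Avail Use% Mounted on
<local filesystems trimmed>
/vmfs/volumes/NewVMFS 1.8G  300M  1.5G  17% /vmfs/volumes/NewVMFS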


Whenever metadata on a VMFS volume has to be updated, the VMFS volume must be reserved.
When an ESX host locks a VMFS volume it must obtain a SCSI-2 reservation on the actual lun/disk
hosting the VMFS volume. This reservation provides exclusive read/write access to the volume,
ensuring that only one ESX host may update metadata at a time (and reducing the likelihood of
metadata corruption). The (hopefully obvious) ramification is that other ESX hosts with access to
the disk lose the ability to send I/O to it while it is locked. If a host attempts to send I/O to a
locked lun, the following message is logged in the vmkernel log and the I/O is retried up to 80
times.

Excerpt from /var/log/vmkernel

Apr 24 15:59:53 esx35-1 vmkernel: 5:14:57:01.939 cpu0:1083)StorageMonitor:
196: vmhba1:0:3:0 status = 24/0 0x0 0x0 0x0
Apr 24 15:59:53 esx35-1 vmkernel: 5:14:57:01.939 cpu0:1041)SCSI: vm 1041:
109: Sync CR at 64
Apr 24 15:59:56 esx35-1 vmkernel: 5:14:57:04.982 cpu0:1151)StorageMonitor:
196: vmhba1:0:3:0 status = 24/0 0x0 0x0 0x0
Apr 24 15:59:56 esx35-1 vmkernel: 5:14:57:04.982 cpu3:1041)SCSI: vm 1041:
109: Sync CR at 16
Apr 24 15:59:56 mel-esx-02 vmkernel: 5:14:57:05.050
cpu0:1161)StorageMonitor: 196: vmhba1:0:3:0 status = 24/0 0x0 0x0 0x0
Apr 24 15:59:57 esx35-1 vmkernel: 5:14:57:06.047 cpu3:1041)SCSI: vm 1041:
109: Sync CR at 0
Apr 24 15:59:57 esx35-1 vmkernel: 5:14:57:06.047 cpu3:1041)WARNING: SCSI:
119: Failing I/O due to too many reservation conflicts

I have filtered a lot of repeats. What is worth looking at above is:

24/0 0x0 0x0 0x0 - this is the SCSI code for 'Reservation conflict: Device reserved by another
host'.

Then the next line:

Sync CR at 64 - this is the countdown from the 80 retries. The conflicts continue to occur and the
count goes down until the sad end:

Sync CR at 0
Failing I/O due to too many reservation conflicts

OK - so this is a little dramatic, but it shows a few things: when a lun is locked the host will retry
a few (80) times to send the I/Os, and if the lock does not get released in time - bad things happen!

It is quite normal for reservation conflicts to happen. As your VI grows - you add more hosts, run
more VMs, have more snapshots, VMotion more, etc. - these things happen. The time it *should*
take to update metadata on a VMFS is in the vicinity of ~10 microseconds, so we are not talking
about huge periods of time that the lun is locked. My example above was caused by a failure in my
array where a controller failed (but still appeared online) and the other controller did not trespass
the luns…nasty stuff (though quite uncommon).

It is possible to administratively lock VMFS volumes using vmkfstools

[root@esx35-1 root]# vmkfstools -L reserve
/vmfs/devices/disks/vmhba2:1:11:1
[root@esx35-1 root]# vmkfstools -B /vmfs/devices/disks/vmhba2:1:11:1
Successfully broke LVM device lock for /vmfs/devices/disks/vmhba2:1:11:1
[root@esx35-1 root]#

As the lock is broken, we see the following message in the vmkernel log.

Sep 6 07:45:49 esx35-1 vmkernel: 0:00:06:55.314 cpu1:1033)LVM: 7433:
Device lock for
<(vml.02000b0000600a0b80000f9e2b000019ca4a68fb3c313732322d36:1, 21047328),
4a9e0381-cfc9b39f-cfe5-00151772654d> released
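
Breaking the lock with -B is the brute-force option. If I remember rightly, the same -L switch that took the reservation can also release it cleanly from the host that holds it - treat the exact keyword as an assumption and check the vmkfstools usage output on your build:

[root@esx35-1 root]# vmkfstools -L release /vmfs/devices/disks/vmhba2:1:11:1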

Using the esxcfg-info command we can see which luns are reserved. Even when we direct esxcfg-info
to display storage related information there is still a fair amount of output, grepping this can help.

[root@esx35-1 root]# esxcfg-info -s | grep -i Reserved
                        |----Is Reserved............................false
                        |----Is Reserved............................false
                        |----Is Reserved............................false
                        |----Is Reserved............................false
            |----Is Reserved........................................false
            |----Is Reserved........................................false
            |----Is Reserved........................................false
            |----Is Reserved........................................false




Querying Metadata

Metadata can be queried with the 'vmkfstools' command.

[root@esx35-1 root]# vmkfstools -P -h /vmfs/volumes/NewVMFS
VMFS-3.31 file system spanning 1 partitions.
File system label (if any): NewVMFS
Mode: public
Capacity 1.8G, 1.5G available, file block size 1.0M
UUID: 4a7724b1-06761321-6bcd-000c29b6325a
Partitions spanned (on "lvm"):
        vmhba0:1:0:1

The -P switch queries metadata, and the -h provides human-readable output.

Metadata may also be seen as files stored on each VMFS volume

[root@esx35-1 root]# ls -la /vmfs/volumes/NewVMFS/
total 290816
drwxr-xr-t    1 root     root           980 Aug  4 03:56 .
drwxr-xr-x    1 root     root           512 Aug  4 07:17 ..
-r--------    1 root     root         98304 Aug  4 03:56 .fbb.sf
-r--------    1 root     root      22708224 Aug  4 03:56 .fdc.sf
-r--------    1 root     root       6520832 Aug  4 03:56 .pbc.sf
-r--------    1 root     root     260374528 Aug  4 03:56 .sbc.sf
-r--------    1 root     root       4194304 Aug  4 03:56 .vh.sf


The metadata is stored as .sf files at the root of the volume.

.fdc.sf - file descriptor system file
.sbc.sf - sub-block system file
.fbb.sf - file block system file
.pbc.sf - pointer block system file
.vh.sf - volume header system file

Another important part of the metadata is a region of the disk called the Heartbeat Region. This is
an area of the VMFS volume that ESX hosts accessing the volume write their signatures to, as a way
of keeping their file locks valid and informing the other ESX hosts sharing the volume that they still
have access to it. If a host's signature is not updated within a certain period, or during an HA
event, other hosts can age the locks on its files in order to lock (and open) those files
themselves.



New VMFS

Objectives

Knowledge
    Explain the process used to align VMFS partitions
    Describe the VMFS file system
          o Extents

Skills & Abilities
      Manage VMFS file systems using command-line tools

Manually creating an aligned VMFS partition

When you create a new VMFS volume using the VI Client it is aligned by default; however, if you
want to use the Service Console to create your volumes you must take a few additional steps to
ensure the partition is aligned.

Background
X86 systems require an MBR (Master Boot Record) at the beginning of any disk they use. The MBR
consumes the first 63 sectors of the disk, and the usable space of the disk starts from there.
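
The arithmetic behind this is simple. Assuming (purely for illustration) an array that uses 64KB chunks: a partition starting at the default sector 63 begins 32,256 bytes (31.5KB) into the first chunk, so I/Os are constantly split across chunk boundaries. Starting the partition at sector 128 instead - as we do with fdisk further down - gives a 64KB offset that lines up exactly:

[root@esx35-1 root]# echo $((63 * 512))
32256
[root@esx35-1 root]# echo $((128 * 512))
65536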




The picture above is a little stretched, but you can see what I mean: from the OS perspective, the
partition starts in the middle of an array block. This is a misaligned partition.




In this picture, we see that the OS volume is aligned with the array blocks.

It is worth noting that this is not a vmware thing - it is an X86 thing. This affects Windows, Linux,
<insert fav x86 OS here>.

When you have a misaligned partition, it can greatly affect performance. The table below shows
VMware's numbers on the performance effect that misalignment can have. What you generally see
is that as the I/O increases, the performance degrades...




What also amazes me is that when I cover this issue in my DSA/FT classes, or in the storage classes I
teach, it is news to most people. That such a serious (and VERY prevalent) problem is not widely
known is an issue in itself. It is not just storage nerds that need to know this….anyway - on with
the show.

Identify the disk to create the new volume on.

You need to identify the disk you are going to partition. fdisk is a cool partitioning tool, but it is
all too easy to kill the wrong partition with it.

If it is a new disk, using fdisk is the easy way: run 'fdisk -l' and look for the disk without a
partition table.

[root@esx35-1 root]# fdisk -l

Disk /dev/sda: 10.7 GB, 10737418240 bytes
64 heads, 32 sectors/track, 10240 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot            Start            End       Blocks        Id    System
/dev/sda1   *                 1            100       102384        83    Linux
/dev/sda2                   101           2597      2556928        fb    Unknown
/dev/sda3                  2598           7597      5120000        83    Linux
/dev/sda4                  7598          10240      2706432         f    Win95 Ext'd (LBA)
/dev/sda5                  7598           8141       557040        82    Linux swap
/dev/sda6                  8142          10141      2047984        83    Linux
/dev/sda7                 10142          10240       101360        fc    Unknown
Disk /dev/sdb: 2147 MB, 2147483648 bytes
67 heads, 62 sectors/track, 1009 cylinders
Units = cylinders of 4154 * 512 = 2126848 bytes
Disk /dev/sdb doesn't contain a valid partition table




You could also use 'esxcfg-vmhbadevs'

[root@esx35-1 root]# esxcfg-vmhbadevs -a
vmhba0:0:0     /dev/sda
vmhba0:1:0     /dev/sdb

The '-a' switch shows all devices, whether they have a console device or not.

Compare this with the output of 'esxcfg-vmhbadevs -m':

[root@esx35-1 root]# esxcfg-vmhbadevs -m
vmhba0:0:0:2    /dev/sda2                                         4a0bdfe2-c789e0df-d96e-
000c29b6325a

The useful part of 'esxcfg-vmhbadevs' is that it displays the vmhba address and the Service Console
device. If you use the '-m' switch it will also display the UUID of the VMFS volume.

We can determine that there is no VMFS volume on /dev/sdb, as it was not listed when we ran
'esxcfg-vmhbadevs -m'.



Using fdisk to create aligned partition

If you create your VMFS volumes using the VI client, they are automatically aligned.
If you create them via the service console, you need to manually align them.

[root@esx35-1 root]# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF
disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by
w(rite)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1009, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-1009, default 1009):
Using default value 1009

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fb
Changed system type of partition 1 to fb (Unknown)

Command (m for help): x
Expert command (m for help): b
Partition number (1-4): 1
New beginning of data (62-4191385, default 62): 128

Expert command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@esx35-1 root]#
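
To double-check the result you can ask fdisk to print the table in sectors rather than cylinders with the -u flag; the new partition should show a start sector of 128 (output trimmed and illustrative):

[root@esx35-1 root]# fdisk -lu /dev/sdb

Disk /dev/sdb: 2147 MB, 2147483648 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1             128     4191385     2095629   fb  Unknown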


Formatting the partition as VMFS

Now that we have created the partition, we need to format it. 'vmkfstools' is the tool used to
create VMFS filesystems (amongst other things):

[root@esx35-1 root]# vmkfstools -C vmfs3 -b1m -S NewVMFS vmhba0:1:0:1
Creating vmfs3 file system on "vmhba0:1:0:1" with blockSize 1048576 and
volume label "NewVMFS".
Successfully created new volume: 4a7724b1-06761321-6bcd-000c29b6325a

The '-C vmfs3' switch specifies that a vmfs3 volume should be created, '-b1m' formats the volume
with a 1MB block size, '-S NewVMFS' defines the volume name, and 'vmhba0:1:0:1' is the device to
create the volume on.


Creating aligned vmdk files

Even though we have created an aligned VMFS filesystem, it is advantageous to also align any virtual
disks stored on it. As you can see below, we still have a misalignment issue...




To create an aligned virtual disk once the disk has been added to the VM (this assumes Windows),
open Disk Management and select 'Rescan Disks' to pick up the new SCSI device. Once you can see
the new disk, initialize it: right-click the disk and select 'Initialize Disk'. Now that the disk is
initialized, open a command prompt and use the tool 'diskpart'. You could also use the tool 'diskpar'
(but the syntax is different, diskpar uses a block offset, rather than the byte offset 'diskpart' uses).


C:\Documents and Settings\Administrator>diskpart
Microsoft DiskPart version 5.2.3790.3959
Copyright (C) 1999-2001 Microsoft Corporation.
On computer: VIMGMT

DISKPART> list disk
  Disk ###       Status            Size          Free          Dyn    Gpt
  --------       ----------        -------       -------       ---    ---
  Disk 0         Online              20 GB            0 B
  Disk 1         Online            2039 MB       2039 MB

DISKPART> select disk 1

Disk 1 is now the selected disk.
DISKPART> create partition primary align = 64

DiskPart succeeded in creating the specified partition.

DISKPART>




Creating Extended VMFS volumes

The maximum size of a LUN that ESX may use is 2TB. However, you can create a single VMFS volume
of up to 64TB by spanning (extending) a VMFS volume across multiple LUNs. A VMFS volume may
have up to 32 extents, and these extents do not have to be the same size - for example, you can
extend a 1TB VMFS volume by adding a 100GB extent.

This can be done via the GUI, but as this is VCDX stuff, we use the command line. The tool you will
use is 'vmkfstools'.




[root@esx35-1 root]# vmkfstools -Z "vmhba2:0:12:1" "vmhba2:0:1:1"


VMware ESX Server Question:
All data on vmhba2:0:12:1 will be lost. Continue and format?
0) Yes
1) No

Please choose a number [0-1]: 0


[root@esx35-1 root]# vmkfstools -P -h /vmfs/volumes/MyLUN/
VMFS-3.31 file system spanning 2 partitions.
File system label (if any): MyLUN
Mode: public
Capacity 29G, 29G available, file block size 1.0M
UUID: 4a81775b-21e6f204-684f-00151772654d
Partitions spanned (on "lvm"):
        vmhba2:0:1:1
        vmhba2:0:11:1




References:

Recommendations for Aligning VMFS Partitions
www.vmware.com/pdf/esx3_partition_align.pdf

Storage Block Alignment with VMware Virtual Infrastructure
http://communities.vmware.com/servlet/JiveServlet/download/2409-117516-821464-
4435/NetappDisk+Alignment+on+Virtuals+-+3593.pdf


VMFS - Best Practices, and counter-FUD
http://virtualgeek.typepad.com/virtual_geek/2009/03/vmfs-best-practices-and-counter-fud.html



Storage troubleshooting

Objectives
Knowledge
      Identify storage related events and log entries
      Analyze storage events to determine related issues
Skills & Abilities
      Verify storage configuration using CLI, VI client and server log entries
      Troubleshoot storage connection issues using CLI , VI Client and logs
         o Rescan events
         o Failover events
      Interpret log entries for configuration validation and predictive analysis
      Troubleshoot file system errors using logs and CLI


Troubleshooting storage related issues is primarily done through analysis of the vmkernel log file.

The format of messages in the vmkernel log file is as follows:

Jun 19 09:12:54 pisa vmkernel:              14:22:31:50.009 cpu3:1033) scsi-qla0:
Scheduling SCAN for new luns....
<System Time> <host> <msg source>             <uptime>         <CPU:world id> <device>
<message>

Most of the fields above should be pretty self-explanatory, but the CPU:world id part deserves a
little more. Each VM is contained within a World, and all of the VM's associated processes are
contained within that world. A world itself is a schedulable entity owned by the vmkernel - similar
to a process, but more like a managed group of processes. A VM's world contains multiple processes
to handle the running of the VM itself and the operation of its virtual devices.

You can use the vm-support.pl script to determine the world id of each vm

[root@esx35-1 root]# vm-support -x
VMware ESX Server Support Script 1.29


Available worlds to debug:

vmid=1115             vc-fs01
vmid=1133             ad1-lab


This piece of information can make it easier to decipher messages within the vmkernel log file, which
can be quite trying at times...

The vmkernel log itself (/var/log/vmkernel) contains all messages logged by the vmkernel and is
controlled by syslog (/etc/syslog.conf). It is the primary log file for all things vmkernel related -
which is to say, pretty much all things ESX - and from a storage troubleshooting perspective it is the
first place you should look. In the VMFS notes we looked at detecting SCSI reservation issues, which
caused timeouts when accessing a VMFS volume. Below, we will look at path failover events, which
may be relatively benign or may demonstrate serious issues in your storage environment.
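
When you are chasing a particular event it is often quickest to follow the log live, or to grep it for the strings shown in the examples that follow, for instance:

[root@esx35-1 /]# tail -f /var/log/vmkernel
[root@esx35-1 /]# grep -i "reservation conflict" /var/log/vmkernel | tail -5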

Let's look at some common storage events and how they appear in the vmkernel log.

First we will look at a rescan event.
[root@esx35-1 /]# esxcfg-rescan vmhba2
Rescanning vmhba2...done.
On scsi2, removing: 0:1 0:11 0:12 0:24 0:25 0:31.
On scsi2, adding: 0:1 0:11 0:12 0:24 0:25 0:31.

So we discover a few luns, as you can see above. Below, you can see how this rescan appears in the
vmkernel log. I have removed some of the repeated entries, but most of it is here...

Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.136 cpu1:1034)scsi-qla1:
Scheduling SCAN for new luns....
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.136 cpu1:1034)<6>scsi-qla1:
Scheduling SCAN for new luns....
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.137 cpu1:1035)<6>scsi(1) :
Non NPIV Fabric, Capability 0x100
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.137 cpu1:1035)SCSI: 861:
GetInfo for adapter vmhba2, [0x3f041c80], max_vports=64, vports_inuse=0,
linktype=0, state=0, failreason=2, rv=0, sts=0
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.146 cpu1:1035)ScsiScan: 395:
Path 'vmhba2:C0:T0:L1': Vendor: 'IBM     ' Model: '1722-600         '
Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.146 cpu1:1035)ScsiScan: 396:
Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.146 cpu1:1035)ScsiScan: 395:
Path 'vmhba2:C0:T0:L11': Vendor: 'IBM     ' Model: '1722-600          '
Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.146 cpu1:1035)ScsiScan: 396:
Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.147 cpu1:1035)ScsiScan: 395:
Path 'vmhba2:C0:T0:L12': Vendor: 'IBM     ' Model: '1722-600          '
Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.147 cpu1:1035)ScsiScan: 396:
Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.148 cpu1:1035)ScsiScan: 395:
Path 'vmhba2:C0:T0:L24': Vendor: 'IBM     ' Model: '1722-600          '
Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.148 cpu1:1035)ScsiScan: 396:
Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.149 cpu1:1035)ScsiScan: 395:
Path 'vmhba2:C0:T0:L25': Vendor: 'IBM     ' Model: '1722-600          '
Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.149 cpu1:1035)ScsiScan: 396:
Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.150 cpu1:1035)ScsiScan: 395:
Path 'vmhba2:C0:T0:L31': Vendor: 'IBM     ' Model: 'Universal Xport '
Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.150 cpu1:1035)ScsiScan: 396:
Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.159 cpu1:1035)ScsiScan: 395:
Path 'vmhba2:C0:T1:L1': Vendor: 'IBM     ' Model: '1722-600         '
Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.159 cpu1:1035)ScsiScan: 396:
Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.160 cpu1:1035)ScsiScan: 395:
Path 'vmhba2:C0:T1:L11': Vendor: 'IBM     ' Model: '1722-600          '
Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.160 cpu1:1035)ScsiScan: 396:
Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.161 cpu1:1035)ScsiScan: 395:
Path 'vmhba2:C0:T1:L12': Vendor: 'IBM     ' Model: '1722-600         '
Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.161 cpu1:1035)ScsiScan: 396:
Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.162 cpu1:1035)ScsiScan: 395:
Path 'vmhba2:C0:T1:L24': Vendor: 'IBM     ' Model: '1722-600         '
Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.162 cpu1:1035)ScsiScan: 396:
Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.163 cpu1:1035)ScsiScan: 395:
Path 'vmhba2:C0:T1:L25': Vendor: 'IBM     ' Model: '1722-600         '
Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.163 cpu1:1035)ScsiScan: 396:
Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.164 cpu1:1035)ScsiScan: 395:
Path 'vmhba2:C0:T1:L31': Vendor: 'IBM     ' Model: 'Universal Xport '
Rev: '0914'
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.164 cpu1:1035)ScsiScan: 396:
Type: 0x0, ANSI rev: 5
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.220 cpu1:1033)<6>scsi(1) :
Non NPIV Fabric, Capability 0x100
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.220 cpu1:1033)SCSI: 861:
GetInfo for adapter vmhba2, [0x3f041c80], max_vports=64, vports_inuse=0,
linktype=0, state=0, failreason=2, rv=0, sts=0
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.244 cpu1:1034)<6>scsi(1) :
Non NPIV Fabric, Capability 0x100
Sep 7 09:15:06 esx35-1 vmkernel: 0:06:35:13.244 cpu1:1034)SCSI: 861:
GetInfo for adapter vmhba2, [0x3f041c80], max_vports=64, vports_inuse=0,
linktype=0, state=0, failreason=2, rv=0, sts=0
Sep 7 09:15:06 esx35-1 vmkernel: VMWARE SCSI Id: Supported VPD pages for
vmhba2:C0:T0:L1 : 0x0 0x80 0x83 0x85 0xc0 0xc1 0xc2 0xc3 0xc4 0xc5 0xc7
0xc8 0xc9 0xca 0xd0
Sep 7 09:15:06 esx35-1 vmkernel: VMWARE SCSI Id: Device id info for
vmhba2:C0:T0:L1: 0x1 0x3 0x0 0x10 0x60 0xa 0xb 0x80 0x0 0xf 0x9e 0x2b 0x0
0x0 0x17 0x53 0x49 0xd5 0xa9 0xb6 0x1 0x93 0x0 0x8 0x20 0x6 0x0 0xa0 0xb8
0xf 0x9e 0x2c 0x1 0x94 0x0 0x4 0x0 0x0 0x0 0x1
Sep 7 09:15:06 esx35-1 vmkernel: VMWARE SCSI Id: Id for vmhba2:C0:T0:L1
0x60 0x0a 0x0b 0x80 0x00 0x0f 0x9e 0x2b 0x00 0x00 0x17 0x53 0x49 0xd5 0xa9
0xb6 0x31 0x37 0x32 0x32 0x2d 0x36
Sep 7 09:15:06 esx35-1 vmkernel: VMWARE SCSI Id: Supported VPD pages for
vmhba2:C0:T0:L11 : 0x0 0x80 0x83 0x85 0xc0 0xc1 0xc2 0xc3 0xc4 0xc5 0xc7
0xc8 0xc9 0xca 0xd0
Sep 7 09:15:06 esx35-1 vmkernel: VMWARE SCSI Id: Device id info for
vmhba2:C0:T0:L11: 0x1 0x3 0x0 0x10 0x60 0xa 0xb 0x80 0x0 0xf 0x9e 0x2b 0x0
0x0 0x19 0xca 0x4a 0x68 0xfb 0x3c 0x1 0x93 0x0 0x8 0x20 0x6 0x0 0xa0 0xb8
0xf 0x9e 0x2c 0x1 0x94 0x0 0x4 0x0 0x0 0x0 0x1
Sep 7 09:15:06 esx35-1 vmkernel: VMWARE SCSI Id: Id for vmhba2:C0:T0:L11
0x60 0x0a 0x0b 0x80 0x00 0x0f 0x9e 0x2b 0x00 0x00 0x19 0xca 0x4a 0x68 0xfb
0x3c 0x31 0x37 0x32 0x32 0x2d 0x36

So there is a bit to digest there. Let's focus on the bottom part. VPD stands for Vital Product Data
and describes the capabilities of the device. The 'Supported VPD pages' line lists what information
can be queried. The pages in the 0xc0-0xd0 range are vendor specific, while the standard pages
provide information like:
 0x0 "Supported VPD pages", 0x80 "Unit serial number", 0x83 "Device identification", 0x85
"Management network addresses"


Now let's look at a lun trespass. A trespass is an array-side event where lun ownership changes from
one controller to another.
Sep 9 18:00:50 essx35-1 vmkernel: 0:13:29:43.214
cpu2:1026)StorageMonitor: 196: vmhba2:1:1:0 status = 2/0 0x2 0x4 0x3
Sep 9 18:00:50 essx35-1 vmkernel: 0:13:29:43.214 cpu2:1026)SCSI: 5273:
vml.0200010000600601602be019000840a437da99de11524149442035: Cmd failed.
Blocking device during path failover.
Sep 9 18:00:50 essx35-1 vmkernel: 0:13:29:43.216 cpu0:1053)WARNING: SCSI:
4562: Manual switchover to path vmhba1:0:1 begins.
Sep 9 18:00:50 essx35-1 vmkernel: 0:13:29:43.216 cpu0:1053)SCSI: 3744:
Path vmhba1:0:1 is already active. No action required.
Sep 9 18:00:50 essx35-1 vmkernel: 0:13:29:43.216 cpu0:1053)WARNING: SCSI:
4614: Manual switchover to vmhba1:0:1 completed successfully.
Sep 9 18:00:50 essx35-1 vmkernel: 0:13:29:43.216
cpu1:1025)StorageMonitor: 196: vmhba2:0:1:0 status = 2/0 0x6 0x29 0x0

The first message translates to 'NOT READY: The LUN addressed cannot be accessed'. The kernel
then initiates a failover FROM hba2 TO hba1.

During events such as this you may also see the message 'Retry (unit attn)', which means that the
lun has outstanding I/Os that will be lost.

OK - now below is an example of some bad stuff.

Apr 24 16:07:01 esx35-1 vmkernel: 5:15:04:10.057 cpu0:1099)StorageMonitor:
196: vmhba1:0:3:0 status = 24/0 0x0 0x0 0x0
Apr 24 16:07:01 esx35-1 vmkernel: 5:15:04:10.057 cpu2:1040)SCSI: vm 1040:
109: Sync CR at 0
Apr 24 16:07:01 esx35-1 vmkernel: 5:15:04:10.057 cpu2:1040)WARNING: SCSI:
119: Failing I/O due to too many reservation conflicts
Apr 24 16:07:01 esx35-1 vmkernel: 5:15:04:10.057 cpu2:1040)WARNING: Vol3:
611: Couldn't read volume header from vmhba1:0:3:1: SCSI reservation
conflict
Apr 24 16:07:01 esx35-1 vmkernel: 5:15:04:10.057 cpu2:1040)FSS: 390:
Failed with status SCSI reservation conflict for f530 28 1 493dc0c4
cc70d928 1f009457 e21d5c29 0 0 0 0 0 0 0
Apr 24 16:07:01 esx35-1 vmkernel: 5:15:04:10.057 cpu2:1040)WARNING: Fil3:
1791: Failed to reserve volume f530 28 1 493dc0c4 cc70d928 1f009457
e21d5c29 0 0 0 0 0 0 0
Apr 24 16:07:01 esx35-1 vmkernel: 5:15:04:10.057 cpu2:1040)FSS: 390:
Failed with status SCSI reservation conflict for f530 28 2 493dc0c4
cc70d928 1f009457 e21d5c29 4 1 0 0 0 0 0

I have truncated a lot, but as we saw in the VMFS section, when a volume is reserved other hosts
will attempt up to 80 times to get to the disk. Here we see that this host has reached that limit and
its I/Os have failed. Generally reservation issues result from many concurrent metadata updates:
too many snapshots, multiple power on/off operations, bad scripts (like overuse of vdf) or, even
worse, a shonky array.


In the event that you need more verbose (!) information than is contained within the vmkernel log
when troubleshooting SCSI events, you can use the storagemonitor tool.


[root@esx35-1 root]# /usr/lib/vmware/bin/storageMonitor
Timestamp       World adapter    id   lun Command                    Error
Message
=========       ===== =======    === === =======
=============
0:00:00:37.166 01024 vmhba2      000 031 [0x28]READ(10)
[0x5:0x21:0x0]ILLEGAL REQUEST: Logical block address out of range[0x 0]
0:00:00:38.095 01024 vmhba2      000 031 [0x8 ]READ(06)
[0x5:0x21:0x0]ILLEGAL REQUEST: Logical block address out of range[0x 0]

Messages in the storageMonitor output are formatted in the following manner:

Mode:worldId:adapterName:id:lun:senseKey:additionalSenseCode:AdditionalSenseQualifier:

Storagemonitor is configured by its default config file /etc/vmware/storagemonitor.conf




References:

http://www.vmprofessional.com/index.php?content=resources



Storage VMotion

Objectives
Knowledge
      Describe Storage VMotion operation
      Explain implementation process for Storage VMotion
      Identify Storage VMotion use cases
      Understand performance implications for Storage VMotion
Skills and Abilities
      Use Remote CLI to perform Storage VMotion operations
             o Interactive mode
             o Non-interactive mode
      Implement Storage VMotion based on various use cases
             o Migration of all virtual disks to target storage location
             o Migration of virtual disks to independent target storage locations


Storage VMotion (SVmotion) is a feature which allows you to move a VM's files from one datastore
to another - without any downtime for the VM itself. Without SVmotion you would need to power
off a VM in order to relocate its constituent files, which, depending on the size of the VM and your
underlying storage infrastructure, could take many hours...

In ESX 3.5 SVmotion requires a valid VMotion license and is only supported between iSCSI and FC
datastores. During the SVmotion process there are temporary requirements for additional memory
(double the VM's current allocation of physical memory, for the self-VMotion) and additional disk
space, for both the snapshots created and the temporary doubling of the vmdk space on the
datastores. The virtual disks themselves must be eligible for snapshots (i.e. not persistent disks or
physical compatibility RDMs). Up to 4 concurrent SVmotions are supported per datastore at any one
time, where the datastore is either the source or the destination.

SVmotion in ESX 3.5 leverages snapshot technology to copy the VM's disks. All I/O for the SVmotion
process is contained within the vmkernel, and the kernel uses the NFC (Network File Copier) to
move the data.
SVmotion copies a VM's working directory: the .vmx file, the log files, the vswp file (if stored in the
working directory) and any other files that may reside in the working directory. Virtual disks may be
copied to specific datastores, i.e. you may move 1.vmdk to datastore1 and 2.vmdk to datastore2.

SVmotion is not supported in the VI client, and must be implemented via either the RCLI or the VIMA
appliance.

SVmotion is typically leveraged for the following use cases:

         Changing storage tiers for a particular VM
             o A VM on tier 2 storage may not be performing acceptably, or a VM on tier 1 storage
                 may not require or utilize the benefits of the faster disk.
         Implementing a new array
              o Easy, zero-downtime migration to the new storage
         When performing storage maintenance
             o Migrate to unaffected LUNS


The SVmotion process works as follows:

 1.       Copy the VM's working directory (home folder) to the destination datastore
 2.       Perform a self-VMotion to the new working directory
 3.       Create a snapshot of the VM's disk(s)
 4.       Copy the base vmdk's to the new location
 5.       Re-parent the snapshots to the relocated vmdk's on the destination datastore
 6.       Merge the snapshots into the destination vmdk's
 7.       Delete the source data

The process itself is reasonably straightforward. The only point I will cover further is the
re-parenting of the snapshot: effectively the VM is updated to change the snapshot's base image
(the vmdk) from the source vmdk to the destination vmdk.



C:\Program Files\VMware\VMware VI Remote CLI\bin>SVmotion.pl --interactive

Entering interactive mode.             All other options and environment variables
will be ignored.

Enter the VirtualCenter service url you wish to connect to (e.g.
https://myvc.mycorp.com/sdk, or jus
t myvc.mycorp.com): vc-lab.vmware.lab
Enter your username: administrator
Enter your password:

Attempting to connect to https://vc-lab.vmware.lab/sdk.
Connected to server.

Enter the name of the datacenter: MyDataCenter
Enter the datastore path of the virtual machine (e.g. [datastore1]
myvm/myvm.vmx): [VMclones] vc-fs01/vc-fs01.vmx
Enter the name of the destination datastore: LAB_VMs
You can also move disks independently of the virtual machine. If you want
the disks to stay with the virtual machine, then skip this step..
Would you like to individually place the disks (yes/no)? yes
Enter the datastore path of the disk you wish you place (e.g. [datastore1]
myvm/myvm.vmdk): [VMclones] vc-fs01/vc-fs01.vmdk
Enter the name of the destination datastore: LAB_VMs
Would you like to place another disk (yes/no)? yes
Enter the datastore path of the disk you wish you place (e.g. [datastore1]
myvm/myvm.vmdk): [VMclones] vc-fs01/vc-fs01_1.vmdk
Enter the name of the destination datastore: ISO_Library
Would you like to place another disk (yes/no)? no

Performing Storage VMotion.
0% |----------------------------------------------------------------------
--------------------------
----| 100%

##########################################################################
##########################
Storage VMotion completed successfully.

Disconnecting.

C:\Program Files\VMware\VMware VI Remote CLI\bin>

Above is an example of running SVmotion.pl in interactive mode. As you can see, you are guided
through the process.
Below is an example of using the command in non-interactive mode. In non-interactive mode the
command does not provide the progress bar we saw before, nor does it provide any output when it
completes successfully. The %errorlevel% variable in Windows should return 0 if the command
completed successfully.

C:\Program Files\VMware\VMware VI Remote CLI\bin>SVmotion.pl -url=https://vc-lab.vmware.lab/sdk -username=administrator -password=yea_right! -datacenter="MyDatacenter" --vm="[LAB_VMs] vc-fs01/vc-fs01.vmx:VMclones"

C:\Program Files\VMware\VMware VI Remote CLI\bin>




References: http://www.sun.com/storage/white-papers/wmware-storage-vmotion.pdf




Snapshot LUNs

Objectives
Knowledge
      Identify tools and steps necessary to manage replicated VMFS volumes
         o Resignaturing
         o Snapshot LUNs
Skills & Abilities
      Manage RDMs in a replicated environment
              o Virtual compatibility mode
              o Physical compatibility mode
      Use esxcfg-advcfg
              o Set Resignaturing and Snapshot LUN options

A snapshot LUN, from an ESX server's perspective, is one whose volume signature (signature =
LUN id + disk serial number), which is stored in the VMFS header on the disk, does not match the
characteristics of the lun actually being mounted.

For example, you have a mission-critical lun in your array. You decide to create an array-based clone
of this lun (a clone is an identical block-level copy of a lun) in case the primary lun dies. I
shouldn’t have said that - the lun just died! But that’s OK, right? I have a clone... So I go ahead and
mask the clone lun out to my ESX server, expecting things to pick up where they left off, but no…it
just doesn’t appear.

Because the clone is a block-level copy of the original lun, all the metadata, including the volume
signature, was copied. And as it is a different physical lun, the disk identifier and the signature
don’t match up - it’s a snapshot lun.

If you attempt to mount a snapshot lun, you will see the following (or something very similar) in
/var/log/vmkernel:

Apr 24 18:07:55 esx35-1 vmkernel: 0:00:06:39.408 cpu0:3616)ALERT: LVM:
4482: vmhba1:0:5:1 may be snapshot: disabling access. See resignaturing
section in SAN config guide.

There are a couple of ways to address this:

LVM.DisallowSnapshotLUN - the default setting for this is 1

[root@esx35-1 /]# esxcfg-advcfg -g /LVM/DisallowSnapshotLUN
Value of DisallowSnapshotLun is 1

What this means is that when ESX encounters a snapshot lun, it will refuse to mount it. If we change
the value to 0 then it will ignore the fact that the lun has an invalid signature and mount it anyway.
The name is a bit of a double negative - think of setting it to 0 as 'do not allow snaps: disabled',
which is 'allow snaps'.

[root@esx35-1 /]# esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLUN
Value of DisallowSnapshotLun is 0

LVM.EnableResignature - default value for this is 0

[root@esx35-1 /]# esxcfg-advcfg -g /LVM/EnableResignature
Value of EnableResignature is 0

EnableResignature does exactly what it says - it will write a new signature to the disk. By default it
is off, as above. To enable it, set it to 1 as below.

[root@esx35-1 /]# esxcfg-advcfg -s 1 /LVM/EnableResignature
Value of EnableResignature is 1

Note: when you turn on LVM.EnableResignature, the value of LVM.DisallowSnapshotLUN is ignored -
whatever it may be.
When you allow a snapshot lun to be mounted, any VMs on that lun which are already part of your
VC inventory will not need to be re-added to the inventory. However, if you resignature, then any
VMs on the lun will need to be re-added to the inventory.

When you resignature a lun, the VMFS label of the lun is changed to 'snap-XXXXXX-<OLD_NAME>',
where XXXXXX is a system-generated number.

So when is it right to do one or the other? The general rule is this: if the original lun is NEVER
coming back, then you can safely set DisallowSnapshotLUN to 0 (which allows snapshot luns),
whereas if the original lun is coming back (some scenarios follow shortly) you are best to write a
new signature via EnableResignature. I should also say that allowing snapshot luns is generally not a
good idea as a permanent solution - get your LUNs back by allowing snaps, then SVmotion the VMs
over to another lun. Should you need to resignature, turn the option on, let it do its thing, then turn
it off. Remember that these settings (both allow-snaps and resignature) are set on a per-host basis,
and they are either on or off.
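
Putting that into practice on a single host, the on / rescan / off sequence looks like this (all three commands appear elsewhere in these notes - adjust the vmhba to suit):

[root@esx35-1 /]# esxcfg-advcfg -s 1 /LVM/EnableResignature
Value of EnableResignature is 1
[root@esx35-1 /]# esxcfg-rescan vmhba2
Rescanning vmhba2...done.
[root@esx35-1 /]# esxcfg-advcfg -s 0 /LVM/EnableResignature
Value of EnableResignature is 0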

Let's think of some scenarios:

 1.    Use of array-based snapshots as a backup policy. You need to recover some data from a
    point in time captured by the snapshot.
          When you mount the snap on an ESX host, the host will still see the original lun. If you
          allowed the snap to be mounted without resignaturing, you would have 2 instances of the
          lun - confusion and corruption anyone?
 2.    Use of array-based replication to a DR site.
          You fail over to the DR site. If the DR ESX servers are isolated, from a storage perspective,
          from the prod ESX hosts, you could enable the snap lun, as from the DR hosts' perspective
          the original luns are not going to appear anytime soon.
 3.    An array firmware upgrade, or something dodgy on your array, causes the LUN ids to change.
          Sometimes something on the array will change that causes the LUN ids to change, and the
          luns are then perceived as snapshot luns.



Even more stuff on snaps.

There is a new setting in 3.5 called SCSI.CompareLUNNumber. If you set this to 0 then ESX will only
look at the disk serial number (vs lun id + serial) when making snapshot decisions.

If your array uses NAA (Network Address Authority) identifiers then the lun id is not used to identify
the lun, but the NAA. This gets you around circumstances where you have presented a lun to
multiple hosts and each host sees the lun with a different LUN id.



Managing Replicated RDMs

The snapshot lun issue described above is somewhat simpler when it comes to RDMs. An RDM is
essentially a .vmdk that acts as a pointer to a physical SAN lun, so when you replicate a VMFS
volume containing a guest that has an RDM, the details contained within that RDM will be invalid.
Dealing with replicated RDMs is therefore as simple as removing the RDM from the guest and
re-adding an RDM that points to the replicated lun.



References:

VMFS Volume Management
http://www.vmware.com/files/pdf/vmfs_resig.pdf

A Few Technical Threads - Part 2: VMFS Resignaturing
http://virtualgeek.typepad.com/virtual_geek/2008/08/a-few-technic-1.html



Multipathing

Objectives
Knowledge
      Explain the use cases for round-robin load balancing
Skills & Abilities
      Perform advanced multi-pathing configuration
              o Configure multi-pathing policy
      Configure round-robin behavior using command-line tools
              o Manage active and inactive paths


Multipathing in ESX is primarily designed for availability, rather than for increasing throughput to
individual LUNs. ESX 3.5 supports 3 multipathing policies: Fixed, Most Recently Used (MRU) and
Round Robin (RR). I will briefly describe Fixed and MRU, then spend a little more time on RR.

Fixed
Fixed is used with Active/Active (AA) arrays. ESX classes an AA array as one that does not have the
concept of lun ownership - that is, any controller can send I/O to any lun. When using the fixed
policy, one path is defined as the preferred path, which will always be used to send I/O to the given
lun while it is available. If the preferred path fails, another path will be chosen and used until the
preferred path becomes available again, at which point ESX will fail back to it.

MRU
Most Recently Used is used against Active/Passive (AP) arrays. An AP array is one that has the
concept of LUN ownership - that is, controllers in the array 'own' specific luns and only the owning
controller may send I/O to that LUN. MRU does not have the concept of preferred paths, only active
paths. I/O to a given lun will always be sent through the active path, and if that path fails another
will be chosen. That path then becomes the active path and will be used for all I/O until it fails.

When using Fixed or MRU, ESX will only use a single path at a time to send I/O to a lun. Throughput
issues are generally addressed by 'poor man's load balancing'.
OK - let's imagine that this is an AA array, so we are using Fixed. For each lun we have 4 paths.
Let's also imagine that the I/O characteristics of each lun are the same.

vmhba1:0:1                 vmhba1:0:2                 vmhba1:0:3                  vmhba1:0:4
vmhba2:0:1                 vmhba2:0:2                 vmhba2:0:3                  vmhba2:0:4
vmhba1:1:1                 vmhba1:1:2                 vmhba1:1:3                  vmhba1:1:4
vmhba2:1:1                 vmhba2:1:2                 vmhba2:1:3                  vmhba2:1:4

If we set the active (preferred) paths so that they alternate across both HBAs and both controllers -
for example vmhba1:0:1 for lun 1, vmhba2:0:2 for lun 2, vmhba1:1:3 for lun 3 and vmhba2:1:4 for
lun 4 - we have split the load across both HBAs and both controllers. Pretty simple? Not very real
world though…. You are really trying to distribute your I/O across as many physical paths, and array
controllers, as possible, to reduce bottlenecks at any one point in the infrastructure. Remember that
all I/O basically goes from one queue to another: from the guest queue (virtual SCSI adapter), to the
lun queue (VMFS volume), to the HBA queue, across the fabric (or whatever medium), to the array
port queue (hopefully then cached), the array backplane queues, then finally the disk queues….

If the array were AP and we were using the MRU policy, things change a little. As MRU has no
concept of preferred paths, the kernel will use whichever path it happens to discover the lun on
first. This is a problem, because when the kernel boots and discovers storage it is typical for ALL luns
to be discovered via one HBA before the other, so you end up with all your luns being accessed via a
single HBA.

One way to address this is to configure your multipathing policy via the command line, using the
vml identifiers for the luns rather than their runtime addresses. VML addresses for luns are
persistent - that is, they do not change across reboots - whereas runtime addresses may change.

Runtime addresses (also known as canonical addresses) take the following format:
vmhba2:0:13, where vmhba2 refers to the HBA the lun is accessed through, :0 is the storage
processor (controller) in the array the lun is reached via, and :13 is the lun id.

VML addresses may be found under /vmfs/devices/disks

[root@esx35-1 /]# ls -l /vmfs/devices/disks/
total 993127116
-rw-------    1 root     root     36270243840 Sep 7 04:18 vmhba1:0:0:0
-rw-------    1 root     root     213825024 Sep 7 04:18 vmhba1:0:0:1
-rw-------    1 root     root     5371107840 Sep 7 04:18 vmhba1:0:0:2
-rw-------    1 root     root     4293596160 Sep 7 04:18 vmhba1:0:0:3
-rw-------    1 root     root     26386698240 Sep 7 04:18 vmhba1:0:0:4
-rw-------    1 root     root     2146765824 Sep 7 04:18 vmhba1:0:0:5
-rw-------    1 root     root     567512064 Sep 7 04:18 vmhba1:0:0:6
-rw-------    1 root     root     23565394944 Sep 7 04:18 vmhba1:0:0:7
-rw-------    1 root     root     106896384 Sep 7 04:18 vmhba1:0:0:8
lrwxrwxrwx    1 root     root           58 Sep 7 04:18 vmhba2:0:1:0 ->
vml.0200010000600a0b80000f9e2b0000175349d5a9b6313732322d36
lrwxrwxrwx    1 root     root           58 Sep 7 04:18 vmhba2:0:11:0 ->
vml.02000b0000600a0b80000f9e2b000019ca4a68fb3c313732322d36
lrwxrwxrwx    1 root     root           60 Sep 7 04:18 vmhba2:0:11:1 ->
vml.02000b0000600a0b80000f9e2b000019ca4a68fb3c313732322d36:1
lrwxrwxrwx    1 root     root           58 Sep 7 04:18 vmhba2:0:12:0 ->
vml.02000c0000600a0b80000f9e2b000019d04a68fc58313732322d36
lrwxrwxrwx    1 root     root           58 Sep 7 04:18 vmhba2:0:24:0 ->
vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36
lrwxrwxrwx    1 root     root           60 Sep 7 04:18 vmhba2:0:24:1 ->
vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36:1
lrwxrwxrwx    1 root     root           58 Sep 7 04:18 vmhba2:0:25:0 ->
vml.0200190000600a0b80000f9e2b0000144648fd15b4313732322d36
lrwxrwxrwx    1 root     root           60 Sep 7 04:18 vmhba2:0:25:1 ->
vml.0200190000600a0b80000f9e2b0000144648fd15b4313732322d36:1
lrwxrwxrwx    1 root     root           58 Sep 7 04:18 vmhba2:0:31:0 ->
vml.02001f0000600a0b80000f9e2b0000000000000000556e69766572
lrwxrwxrwx    1 root     root           58 Sep 7 04:18 vmhba2:1:1:0 ->
vml.0200010000600a0b80000f9e2b0000175349d5a9b6313732322d36
lrwxrwxrwx    1 root     root           58 Sep 7 04:18 vmhba2:1:11:0 ->
vml.02000b0000600a0b80000f9e2b000019ca4a68fb3c313732322d36
lrwxrwxrwx    1 root     root           60 Sep 7 04:18 vmhba2:1:11:1 ->
vml.02000b0000600a0b80000f9e2b000019ca4a68fb3c313732322d36:1
lrwxrwxrwx    1 root     root           58 Sep 7 04:18 vmhba2:1:12:0 ->
vml.02000c0000600a0b80000f9e2b000019d04a68fc58313732322d36
lrwxrwxrwx    1 root     root           58 Sep 7 04:18 vmhba2:1:24:0 ->
vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36
lrwxrwxrwx    1 root     root           60 Sep 7 04:18 vmhba2:1:24:1 ->
vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36:1
lrwxrwxrwx    1 root     root           58 Sep 7 04:18 vmhba2:1:25:0 ->
vml.0200190000600a0b80000f9e2b0000144648fd15b4313732322d36
lrwxrwxrwx    1 root     root           60 Sep 7 04:18 vmhba2:1:25:1 ->
vml.0200190000600a0b80000f9e2b0000144648fd15b4313732322d36:1
lrwxrwxrwx    1 root     root           58 Sep 7 04:18 vmhba2:1:31:0 ->
vml.02001f0000600a0b80000f9e2b0000000000000000556e69766572
-rw-------    1 root     root     16106127360 Sep 7 04:18
vml.0200010000600a0b80000f9e2b0000175349d5a9b6313732322d36
-rw-------    1 root     root     16106127360 Sep 7 04:18
vml.02000b0000600a0b80000f9e2b000019ca4a68fb3c313732322d36
-rw-------    1 root     root     10733924864 Sep 7 04:18
vml.02000b0000600a0b80000f9e2b000019ca4a68fb3c313732322d36:1
-rw-------    1 root     root     16106127360 Sep 7 04:18
vml.02000c0000600a0b80000f9e2b000019d04a68fc58313732322d36
-rw-------    1 root     root     214748364800 Sep 7 04:18
vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36
-rw-------    1 root     root     214745544704 Sep 7 04:18
vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36:1
-rw-------    1 root     root     214748364800 Sep 7 04:18
vml.0200190000600a0b80000f9e2b0000144648fd15b4313732322d36
-rw-------    1 root     root     214745544704 Sep 7 04:18
vml.0200190000600a0b80000f9e2b0000144648fd15b4313732322d36:1
-rw-------    1 root     root            0 Sep 7 04:18
vml.02001f0000600a0b80000f9e2b0000000000000000556e69766572
[root@esx35-1 /]#

To identify the vml address for a particular lun, map the link of the runtime address to the vml
address. For example, the runtime address vmhba2:0:24 maps to
vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36

VML addresses may also be discerned via esxcfg-mpath -lv

When you use these vml addresses with esxcfg-mpath the policies you define will persist across
reboots, and will ensure that your paths will be split as you defined.
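
As a worked (and from-memory, so treat the flag names as an assumption and verify them against the esxcfg-mpath usage output) example of pinning a lun to a fixed, preferred path using its vml address:

[root@esx35-1 /]# esxcfg-mpath --policy=fixed --lun=vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36
[root@esx35-1 /]# esxcfg-mpath --preferred --path=vmhba2:1:24 --lun=vml.0200180000600a0b80000fbf820000165d4a2fc16f313732322d36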

Round Robin Load Balancing

Round Robin (RR) is provided to allow administrators to create policies that will split I/O to a
particular lun over multiple paths. One of the limitations of Fixed and MRU is that they will only
utilize a single path to send I/O to a lun, whereas with RR the admin can create a policy to send the
I/Os over multiple HBAs to multiple array ports. RR can be quite effective, especially against A/A
arrays, when used correctly. It is most effective when it is used across the board for all luns: if some
luns are not configured to switch paths, their I/Os will always land in the same HBA queue,
competing with the round-robin I/Os and increasing the time it takes those I/Os to traverse that
queue.

In the below example, we will configure RR for vmhba2:0:24, set the policy to switch after 100 I/Os
and to switch to any available hba.

[root@esx35-1 /]# esxcfg-mpath -p custom --lun=vmhba2:0:24
Setting vmhba2:0:24 policy to custom
[root@esx35-1 /]# esxcfg-mpath -C 100 --lun=vmhba2:0:24
Setting Custom policy values
[root@esx35-1 /]# esxcfg-mpath -H any --lun=vmhba2:0:24
Setting Custom policy values
[root@esx35-1 /]# esxcfg-mpath -q --lun=vmhba2:0:24
Disk vmhba2:0:24 /dev/sde (204800MB) has 2 paths and policy of Custom:
maxCmds=100 maxBlks=2048 hbaPolicy=any targetPolicy=mru
 FC 16:0.0 2100001b32108fc5<->200600a0b80f9e2c vmhba2:0:24 On preferred
 FC 16:0.0 2100001b32108fc5<->200600a0b80f9e2d vmhba2:1:24 On active

[root@esx35-1 /]#

The options you have for configuring an RR policy are:

When to switch paths: this can be based upon the number of I/Os (using -C, default 50) or the
maximum number of blocks (using -B, default 2048).
Which HBA to select: (using -H) may be preferred, mru, any or minq. minq selects the HBA with the
least outstanding I/Os.
Which target to select: (using -T) chooses which target to use; may be mru, any or preferred.

When configuring RR you may also choose the default policy (done with esxcfg-mpath -p rr --
lun=vmhba2:0:24), which is effectively '-H any -T mru'.

There are also 2 advanced options relevant to RR (see the example after this list):
 /Disk/SPCmdsToSwitch - the default value for the number of I/Os to split on, defaults to 50
 /Disk/SPBlksToSwitch - the default value for the number of blocks to split on, defaults to 2048
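
Both can be read and changed with esxcfg-advcfg, in the same way as the LVM options earlier (the echoed value lines below are what I would expect, so verify on your own host):

[root@esx35-1 /]# esxcfg-advcfg -g /Disk/SPCmdsToSwitch
Value of SPCmdsToSwitch is 50
[root@esx35-1 /]# esxcfg-advcfg -s 100 /Disk/SPCmdsToSwitch
Value of SPCmdsToSwitch is 100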



 References:

 http://www.vmware.com/pdf/vi3_35_25_roundrobin.pdf




Using IP Storage

Objectives

Skills & Abilities
      Configure NFS datastores using command-line tools
      Configure iSCSI hardware and software initiators using command-line tools
      Configure storage network segmentation
              o FC Zoning
              o iSCSI/NFS VLAN
      Configure iSCSI/NFS security options



NFS Datastores

One of the forms of shared storage supported by ESX is NFS. ESX supports NFS v3 over TCP.

Adding NFS mounts to ESX is quite simple.

[root@esx35-1 root]# esxcfg-nas -a -o vnfs -s /iso MyNFSDatastore
Connecting to NAS volume: MyNFSDatastore
MyNFSDatastore created and connected.
[root@esx35-1 root]#

The main prerequisite is that the NFS export must allow the 'no_root_squash' option, as the default
behaviour of NFS is to deny the root user access to its exports. There is an experimental option
available to change the delegate user ESX uses to access NFS mounts; it can be changed (when the
host is in maintenance mode) via the 'Security Profile' tab.
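
For reference, on a Linux-based NFS server the export would look something like the line below (the path and network are examples only) - the key part for ESX is no_root_squash:

# /etc/exports on the NFS server
/iso    192.168.61.0/24(rw,no_root_squash,sync)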

The number of NFS mounts ESX supports by default is 8. This value can be changed by adjusting
NFS.MaxVolumes (to a limit of 32).
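
Being an advanced option, it can also be changed from the Service Console. The option path below follows the usual esxcfg-advcfg naming, but treat it as an assumption and verify it against the Advanced Settings dialog:

[root@esx35-1 root]# esxcfg-advcfg -g /NFS/MaxVolumes
Value of MaxVolumes is 8
[root@esx35-1 root]# esxcfg-advcfg -s 32 /NFS/MaxVolumes
Value of MaxVolumes is 32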

NFS storage traffic should always be separated from regular LAN traffic. A dedicated IP storage
network should be implemented where possible, for both performance/throughput and improved
security. NFS traffic is not encrypted, so an attacker could use data sniffed from the network to
reconstruct it as it traverses the wire. If it is not possible to dedicate a physical network to NFS,
then at the very least a VLAN should be used to segregate the layer 2 traffic from regular LAN
traffic.
Another mechanism to protect data on NFS datastores is to mount the datastore as read-only. This
is achieved when adding an NFS mount via the GUI by checking the 'Read Only' box.



iSCSI Datastores

<I have updated it a little to include more cli goodness...>
<Also bit more best practice fluff>

iSCSI is one of the supported forms of shared storage that can be used by ESX hosts. Unlike NFS,
iSCSI is block-level storage, and volumes are typically formatted as VMFS (unless being used as
RDMs).
iSCSI storage networks may be accessed by ESX hosts via a Hardware Initiator or a Software
Initiator. A Hardware Initiator acts as a SCSI device to the host: the host issues SCSI commands
directly to the HBA, which then encapsulates them into iSCSI packets and sends them to the targets
(arrays). A Software Initiator is a driver that runs in both the kernel memory space and the Service
Console. This driver takes the SCSI commands the kernel issues, creates iSCSI packets, and sends
them out via a vmk interface.

The easy way to sum up the differences is that a Hardware Initiator costs more $, but saves CPU
cycles; the Software Initiator is free, but costs you CPU cycles.
The other notable difference is that ESX can only boot from SAN via the hardware initiator.

The general consensus on software vs hardware initiators is that you should only go for hardware if
you require boot from SAN. With the speed of modern processors, losing cycles to storage traffic is
generally not a problem. How much CPU will you need for your iSCSI traffic? I get asked this
question all the time, and you always find yourself saying - well, how much I/O have you got? :) I
then suggest that on a busy server using software iSCSI you would want to make sure you had a
core available for the cycles the software initiator will need. Obviously if you start enabling header
digest checksums or (gasp) data digest checksums you will need MANY more cycles to handle this
traffic.

In terms of designing an iSCSI solution, you should remember that iSCSI is not just another TCP
protocol. You are running your storage I/O over an IP network, so ensuring that you always have as
much bandwidth (with as little latency) as possible is a good idea :) As a best practice we would
always recommend that you implement a dedicated IP network for your storage traffic - you
wouldn’t think twice about buying fibre switches when you implement FC storage, right? And GbE
switches don’t cost as much as FC switches…. But if you have a (very) tight budget, segregating the
traffic with a VLAN would suffice. Avoid cheap switches where you can, get switches with good port
buffers, and enable flow control. Remember that iSCSI isn't really a WAN technology, as the latency
will really affect the overall performance.

Remember that for ESX, iSCSI may only be run over GbE, and not 10GbE, nor do we support Jumbo
Frames.

People often think that iSCSI is the poor cousin of FC - and while it is true that you can't get the
same raw throughput, do you really need a gazillion IOPS? A well designed iSCSI solution, across
multiple switches, with multiple uplinks, against multiple targets, is a solid solution that should
meet pretty much all requirements.
ESX supports up to 2 Hardware Initiators, and only 1 instance of the Software Initiator (generally
vmhba32). ESX does not support mixing Hardware and Software Initiators in the same server.

The hardware initiator can be configured using the 'esxcfg-hwiscsi' tool, which is used mainly to enable
jumbo frames (MTU 9000). Additional parameters of the hardware initiator may be
configured/queried using the 'vmkiscsi-tool' command, as sketched below.
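As a rough sketch of what querying a hardware initiator might look like - treat the esxcfg-hwiscsi flag as
an assumption and check the usage output on your build, and note that vmhba2 is just a hypothetical
hardware initiator here:

# show the current settings of the hardware initiator (assumed -l listing flag)
esxcfg-hwiscsi -l vmhba2
# query initiator properties such as its IQN via vmkiscsi-tool
vmkiscsi-tool -I -l vmhba2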

The software initiator is configured using the tool 'esxcfg-swiscsi'.

Below, we will go through the steps to enable and configure the iSCSI software initiator: from
creating the vmkernel port, through enabling the initiator, to setting targets.

[root@esx35-1 root]# esxcfg-vswitch -a vSwitch2
[root@esx35-1 root]# esxcfg-vswitch -L vmnic3 vSwitch2
[root@esx35-1 root]# esxcfg-vswitch -L vmnic4 vSwitch2
[root@esx35-1 root]# esxcfg-vswitch -A iSCSI vSwitch2
[root@esx35-1 root]# esxcfg-vswitch -A iSCSI_SC vSwitch2
[root@esx35-1 root]# esxcfg-vswif -a -i 192.168.61.101 -n 255.255.255.0 -p
iSCSI_SC vswif1
[root@esx35-1 root]# esxcfg-vswitch -v 50 -p iSCSI vSwitch2
[root@esx35-1 root]# esxcfg-vmknic -a -i 192.168.61.201 -n 255.255.255.0
iSCSI
[root@esx35-1 root]# esxcfg-swiscsi -e
Allowing software iSCSI traffic through firewall...
Enabling software iSCSI...
Using /usr/lib/vmware/vmkmod/iscsi_mod.o
Module load of iscsi_mod succeeded.
[root@esx35-1 root]# esxcfg-firewall -q swISCSIClient
Service swISCSIClient is enabled.
[root@esx35-1 root]# vmkiscsi-tool -D -a 192.168.61.16 vmhba32
IPAddr 192.168.61.16,Port=0
[root@esx35-1 root]# esxcfg-rescan vmhba32
Doing iSCSI discovery. This can take a few seconds ...
Rescanning vmhba32...done.
On scsi4, removing:.
On scsi4, adding: 0:1.

OK - let's step through it. First we create a vSwitch and link a few vmnics to it. Then we create a
port group for our vmkernel port and a Service Console port. Then we create the vmk interface
on our new port group iSCSI and the vswif interface on iSCSI_SC. Once we have the vmk & SC ports we
enable the adapter using the 'esxcfg-swiscsi -e' command. Next we check that the SC firewall
has been updated to allow iSCSI traffic.
Finally we add the address of the iSCSI array and rescan the adapter.

We can use vmkiscsi-tool to query which targets (arrays) we are connected to, and which LUNs are
available.

[root@esx35-1 root]# vmkiscsi-tool -T -l vmhba32
-------------------------------------------
NAME                               : iqn.1998-
01.com.vmware:iscsi.example.crider
ALIAS                              :
DISCOVERY METHOD FLAGS             : 0
SEND TARGETS DISCOVERY SETTABLE    : 0
SEND TARGETS DISCOVERY ENABLED     : 0
Portal 0                           : 192.168.61.16:3260
-------------------------------------------
[root@esx35-1 root]# vmkiscsi-tool -L -l vmhba32
Target iqn.1998-01.com.vmware:iscsi.example.crider:
-------------------------------------------
OS DEVICE NAME    : vmhba32:0:1
BUS NUMBER        : 0
TARGET ID         : 0
LUN ID            : 1
-------------------------------------------

[root@esx35-1 root]#

We can also query the /proc filesystem for information about which LUNs are visible to the host

[root@esx35-1 root]# cat /proc/scsi/vmkiscsi/*
# iSCSI driver version: 3.6.3.0 variant (27-Jun-2005)
#
# SCSI:               iSCSI:
# Bus Tgt LUN         IP address   Port TargetName
    0   0   1      192.168.61.16   3260 iqn.1998-
01.com.vmware:iscsi.example.crider
[root@esx35-1 root]#


Before we go into things a little deeper, let's go over a few pre-reqs. Aside from requiring a vmk port,
you also require Service Console connectivity (vswif) to the array. The Service Console runs an iSCSI
daemon that is primarily responsible for authenticating iSCSI sessions from the ESX host.

There are 2 methods for target discovery: Send Targets (Dynamic) and Static Discovery. When using
the software initiator the only supported discovery method is Send Targets. Send Targets is the
simplest to configure, as all that is required is the IP address (and TCP port) of the array. When the
initiator connects to the array, it queries for a list of targets and then attempts to log in to them.
Static Discovery is used with the hardware initiator: you supply the IP/Port of the array and the
iSCSI Qualified Name (IQN) of the target you are accessing. More on IQNs later.
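As a sketch, the Send Targets discovery addresses on the software initiator can be managed with
vmkiscsi-tool. The -D -a form is the one used earlier in this section; the -D -l listing form is an
assumption on my part, so verify it against your build:

# add a Send Targets discovery address (the array's IP), as shown earlier
vmkiscsi-tool -D -a 192.168.61.16 vmhba32
# list the discovery addresses currently configured (assumed flag combination)
vmkiscsi-tool -D -l vmhba32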

CHAP (Challenge Handshake Authentication Protocol) is a protocol used to secure iSCSI connections.
iSCSI traffic is not encrypted, and as such an attacker could listen to the iSCSI traffic, and then pose
as the ESX host to the iSCSI array and gain access to any luns presented to the ESX host.

ESX (v3) only supports uni-directional CHAP, and can only store one set of CHAP credentials. To
configure CHAP at the command line, you need to edit /etc/vmkiscsi.conf and enter the following lines:

OutgoingUsername="iqn.1998-01.com.vmware:esx35-1-186fd4c7"
OutgoingPassword="mypassword"
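After editing /etc/vmkiscsi.conf you will want the initiator to pick the credentials up. A minimal sketch -
depending on the build you may need to disable and re-enable the software initiator (esxcfg-swiscsi -d
then -e), or even reboot, for CHAP changes to take effect:

# confirm the software initiator is enabled
esxcfg-swiscsi -q
# rescan the software initiator for targets/LUNs
esxcfg-swiscsi -s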

IQNs are in the following format:

iqn.1998-01.com.vmware:esx35-1-186fd4c7
iqn.<year>-<month>.<reverse dns name>:<servername>-<random number>

The IQN and alias of your host may be queried with the following command:

[root@esx35-1 root]# vmkiscsi-tool -I -l vmhba32
iSCSI Node Name: iqn.1998-01.com.vmware:esx35-1-186fd4c7
[root@esx35-1 root]# vmkiscsi-tool -k -l vmhba32
iSCSI Node Alias: esx35-1.vmware.lab
[root@esx35-1 root]#
Multipathing for iSCSI is actually pretty easy. You cannot *configure* multipathing for iSCSI LUNs - IP
just takes care of it. It is similar to fibre channel, with one subtle exception: you only have 1 path (in
this case a TCP session) to a TARGET. If your array is configured to host multiple iSCSI LUNs behind a
single IP address, all traffic to that target will go through 1 TCP session. So how do you make this
work for you? You configure your array to host the LUNs across as many IP interfaces as it will
support, i.e. lun1 behind 192.168.61.11, lun2 behind 192.168.61.12, etc. This allows the kernel
to leverage multiple vmnics on the vSwitch (which we will see in the load balancing section in
Networking). Really, you just need to remember that multipathing in iSCSI is about failover, and
when failures occur in the network it is IP (routing) that ensures the traffic still gets to the array.
Just as we saw with (poor man's) FC multipathing, try to distribute the load across as many interfaces
as you can; specifically for iSCSI it is about the array side as much as (more than?) the ESX side. Make
sure you have multiple uplinks on your vSwitches and that your iSCSI LUNs are being hit via different
target addresses (array IPs).
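To see how the iSCSI LUNs and their paths end up looking from the host side, the same path-listing
command we use for FC works here too. A quick sketch (output omitted, vmhba32 being the software
initiator as above):

# list all LUNs and the paths to them (FC and iSCSI alike)
esxcfg-mpath -l
# filter the output down to the software iSCSI adapter
esxcfg-mpath -l | grep -i vmhba32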


References:

The mother of all ESX iSCSI lessons...
http://virtualgeek.typepad.com/virtual_geek/2009/01/a-multivendor-post-to-help-our-mutual-iscsi-customers-using-vmware.html


Zoning & Masking

Objectives

Skills & Abilities
      Configure LUN masking
              o Storage device
              o Host
      Set ESX Server host-side disk options

While an in-depth discussion of FC topologies is not required for the VCDX, solid knowledge of FC
concepts such as zoning & masking is required. I also have to say, deep knowledge of storage
concepts helps immensely for VI admins. I generally tell my students: if nothing else will make or
break the performance of your VI, the storage will. It doesn't really matter how much memory or
how many cores you have if your storage cannot handle the IOPS required by all the guests... but
anyway - on with the show.


FC Zoning

Zones in an FC Fabric are, on the surface, similar to VLANs in a network environment. A Zone describes
which devices can communicate with which others at the FC-2 layer (kinda layer 2, if you try to OSI it).
There are (generally) 2 types of zones, Hard & Soft, but for our purposes they both behave the same.

Below I have thrown up a simple Redundant Switched Fabric environment. I have 2 servers,
'Windows' and 'ESX', both with dual FC HBAs, and both are connected to different FC switches. Also
connected is a small FC array (I have used an EMC Clariion mainly because I had the shape handy).
A fabric is really a single FC network; for SMEs it is generally as above, i.e. a single FC switch, but for
larger enterprises it could be many FC switches cascaded together, or a director. In my example we
have 2 fabrics, Fabric 1 & Fabric 2.

Zoning is configured at the Fabric (switch) level and is used to control which ports (and the
device(s) behind them) can talk to which other ports. In Fabric 2 I would probably configure the
zoning as follows:

Zone 1 (SPB0_WINDOWS_HBA2): Port 0, Port 12
Zone 2 (SPA0_WINDOWS_HBA2): Port 0, Port 15
Zone 3 (SPB0_ESX_VMHBA2): Port 3, Port 12
Zone 4 (SPA0_ESX_VMHBA2): Port 3, Port 15

What I have done here is create 4 zones, and each zone provides access from a specific HBA to a
specific array port. Another way I could have achieved this is:

Zone 1 (ARRAY_WINDOWS_HBA2): Port 0, Port 12, Port 15
Zone 2 (ARRAY_ESX_VMHBA2): Port 3, Port 12, Port 15

The difference here is that while each HBA is still zoned to see both array ports, as in the previous
example, the two array ports are now also zoned to each other.
If you think about it, do servers' HBAs need to see each other? In the same light, array ports
(Front End Ports) do not generally need to see each other, therefore they do not need to be zoned
together. If something doesn't need access to something else, you don't give it access, right?

From a simplistic standpoint, zoning describes what you can see on the fabric. Troubleshooting
zoning from the VI admin's point of view is easy. Can you see the array? If yes (and specifically all
ports of the array) then your problem is most likely not zoning. If you cannot see the array then
zoning could well be the issue.
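From the ESX side, a quick way to answer 'can I see the array ports?' is to look at the FC Port
Information in the HBA's proc node - the qla2300 path below matches the driver output shown later in
this document, so adjust it for your HBA driver:

# show the fabric (target) ports this QLogic HBA has discovered
cat /proc/scsi/qla2300/* | grep -i "port-"
# then rescan to pick up any newly zoned devices
esxcfg-rescan vmhba2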



LUN Masking

LUN Masking, aka Selective Storage Presentation, is an array level concept that describes what a
given initiator (server, or really the HBA in the server) can access on the array. Zoning lets you see
the array; Masking defines what on the array you can see. Looking at my picture above, we have
four LUNs: LUNs 1, 2, 3 & 4. We also have 2 servers, a Windows box and an ESX box.
If we didn't have masking, then our Windows server would have access to the VMFS LUNs - which
is generally a no-no. Imagine what would happen if this Windows box were to write a disk signature
to the VMFS volumes...
So, on the array we would configure LUN Masking to provide each server with access to only the luns
they require. You could imagine the masking configuration on the array to look like this:

WINDOWS: LUN 1, LUN 3
ESX: LUN 2, LUN 4

Every array vendor implements masking in different ways, but the end result is the same: Masking
defines which LUNs on the array are presented to which hosts. Who has access to what.

If we expand our example out to a slightly more real-world scenario, we will likely have multiple ESX
hosts that require access to our VMFS volumes. The only thing that changes is that we will mask the
LUNs out to multiple hosts. That is, configure the array to present the LUNs to multiple hosts - aka
clustering.

Now, when most people think about masking, the definition is limited to the array. However, LUN
masking may also be configured at the host level. Imagine that you have configured your array to
present a LUN out to your 5 ESX hosts, but you do not want 1 of the 5 ESX hosts to have access
to this LUN. You could configure masking at the host level to prevent that ESX host from 'seeing' the
LUN in question.

We use the advanced option Disk.MaskLUNs. This can be set via the GUI: 'Advanced Settings' -> 'Disk'
-> MaskLUNs.
As this is VCDX, we do things the cmd line way :) Setting advanced options is done via the 'esxcfg-
advcfg' command. In the below example, I will mask LUN 31 from my host esx35-1.

[root@esx35-1 config]# esxcfg-rescan vmhba2
Rescanning vmhba2...done.
On scsi2, removing: 0:1 0:11 0:12 0:24 0:25 0:31.
On scsi2, adding: 0:1 0:11 0:12 0:24 0:25 0:31.
[root@esx35-1 config]# esxcfg-advcfg -g /Disk/MaskLUNs
Value of MaskLUNs is
[root@esx35-1 config]# esxcfg-advcfg -s "vmhba2:1:31; vmhba2:0:31;"
/Disk/MaskLUNs
Value of MaskLUNs is vmhba2:1:31; vmhba2:0:31;
[root@esx35-1 config]# esxcfg-rescan vmhba2
Rescanning vmhba2...done.
On scsi2, removing: 0:1 0:11 0:12 0:24 0:25 0:31.
On scsi2, adding: 0:1 0:11 0:12 0:24 0:25.
[root@esx35-1 config]#

First, we do a rescan on vmhba2, then using 'esxcfg-advcfg -g' we query the value of /Disk/MaskLUNs,
which is not set. We then set the value using the '-s' flag. Once the value is set, we rescan again and
see that LUN 31 is no longer visible to the host.
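To undo the mask, clear the option and rescan. A minimal sketch - if your build does not accept an
empty value here, clear it via the GUI instead:

# clear the LUN mask again
esxcfg-advcfg -s "" /Disk/MaskLUNs
# rescan so the previously masked LUN reappears
esxcfg-rescan vmhba2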



Modifying Kernel Module Settings

Objectives

Skills & Abilities
      Use esxcfg-module
              o Modify storage adapter settings
              o Identify and load/unload modules
              o Get module status
      Use proc nodes to identify driver configuration and options


First of all, let me start by saying that you generally modify vmkernel parameters at your own risk! I
always tell my students that you don't just read some blog and go away and tinker with the
vmkernel module params! Kind of like how M$ tells you "you modify the registry at your own risk" -
well, we really mean it here! You have been warned :)

vmkernel parameters can be seen in a few ways: some can be seen by querying the /proc filesystem,
others by using the esxcfg-module command. Kernel module parameters are stored in
/etc/vmware/esx.conf.

Below, we are querying the /proc filesystem to obtain information on one of the attached fibre
channel interfaces of the ESX host by reading the module's proc node.

[root@esx35-1 qla2300]# pwd
/proc/scsi/qla2300
[root@esx07 qla2300]# cat 2
QLogic PCI to Fibre Channel Host Adapter for QLA2340:
        Firmware version: 3.03.19, Driver version 7.08-vm33.1

Boot Code Version:
        BIOS : v1.25, Fcode : v0.00, EFI : v0.00

Entry address = 0x8c44f4
HBA: QLA2312 , Serial# J06083
Request Queue = 0x41610000, Response Queue = 0x41630000
Request Queue count= 512, Response Queue count= 512
Total number of active commands = 0
Total number of interrupts = 22704892
Total number of active IP commands = 0
Total number of IOCBs (used/max) = (0/600)
Total number of queued commands = 0
    Device queue depth = 0x20
Number of free request entries = 328
Number of mailbox timeouts = 0
Number of ISP aborts = 0
Number of loop resyncs = 489
Number of retries for empty slots = 0
Number of reqs in pending_q= 0, retry_q= 0, done_q= 0, scsi_retry_q= 0
Host adapter:loop state= <READY>, flags= 0x860813
Dpc flags = 0x0
MBX flags = 0x0
SRB Free Count = 4096
Link down Timeout = 030
Port down retry = 015
Login retry count = 030
Loop down time = 120
Loop down abort time = 090
Commands retried with dropped frame(s) = 0
Configured characteristic impedence: 50 ohms
Configured data rate: 1-2 Gb/sec auto-negotiate
NPIV Supported : No


SCSI Device Information:
scsi-qla0-adapter-node=200000e08b0d63d3;
scsi-qla0-adapter-port=210000e08b0d63d3;

FC Port Information:
scsi-qla0-port-0=200600a0b80f9e2b:200600a0b80f9e2c:010100:81;
scsi-qla0-port-1=200600a0b80f9e2b:200600a0b80f9e2d:010600:84;
SCSI LUN Information:
(Id:Lun) * - indicates lun is not registered with the OS.
( 0: 0): Total reqs 2, Pending reqs 0, flags 0x0*, 0:0:81,
( 0: 1): Total reqs 15571570, Pending reqs 0, flags 0x0, 0:0:81,
( 0:10): Total reqs 7203676, Pending reqs 0, flags 0x0, 0:0:81,
( 0:31): Total reqs 10685, Pending reqs 0, flags 0x0, 0:0:81,
( 1: 0): Total reqs 2, Pending reqs 0, flags 0x0*, 0:0:84,
( 1: 1): Total reqs 21271, Pending reqs 0, flags 0x0, 0:0:84,
( 1:10): Total reqs 21273, Pending reqs 0, flags 0x0, 0:0:84,
( 1:31): Total reqs 10642, Pending reqs 0, flags 0x0, 0:0:84,
Bus:Function = 0x6:0x8
[root@esx35-1 qla2300]#


We are going to use the good ol' queue depth parameters as the primary example. Note the 'Device
queue depth = 0x20' line in the above output - 0x20 hex is 32 decimal.
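If you ever want to double-check a hex value like this from the Service Console, a one-liner will do it:

# convert the hex queue depth to decimal (prints 32)
printf "%d\n" 0x20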

Using the esxcfg-module command, we can see which modules the kernel has loaded. In the below
example, the qla2300_707_vmw module is used for our Qlogic HBA.

[root@esx35-1]# esxcfg-module -l
Device Driver Modules
Module          Enabled Loaded
vmklinux        true    true
tg3             true    true
aacraid_esx30   true    true
qla2300_707_vmw true    true
e1000           true    true
lvmdriver       true    true
vmfs3           true    true
etherswitch     true    true
shaper          true    true
tcpip           true    true
cosShadow       true    true
migration       true    true
nfsclient       true    true
deltadisk       true    true
vmfs2           true    true
[root@esx35-1]# esxcfg-module -g qla2300_707_vmw
qla2300_707_vmw enabled = 1 options = ''
[root@esx35-1]#
We can use the -g switch to query for any set module options. The output of the above command
shows us that we do not have any options set.

To set a kernel module parameter use the -s switch.

[root@esx35-1 root]#
[root@esx35-1 root]# esxcfg-module -s ql2xmaxqdepth=64 qla2300_707_vmw
[root@esx35-1 root]# esxcfg-module -g qla2300_707_vmw
qla2300_707_vmw enabled = 1 options = 'ql2xmaxqdepth=64'


What you see above demonstrates a few things. In the first command, we set a kernel parameter for
the Qlogic module that increases the max queue depth to 64. We then query the module for options
and see that the value is indeed set to 64.

[root@esx35-1 root]# esxcfg-module -s qlport_down_retry=60 qla2300_707_vmw
[root@esx35-1 root]# esxcfg-module -g qla2300_707_vmw
qla2300_707_vmw enabled = 1 options = 'qlport_down_retry=60'

Next we set another parameter, which increases the port-down retry time from 30 secs to 60 secs. When
we query the module options we see that we have lost the ql2xmaxqdepth value. Module parameters
are not added cumulatively. When you need to set multiple parameters they must be set in the
same command, as seen in the following commands.

[root@esx35-1 root]# esxcfg-module -s "ql2xmaxqdepth=64
qlport_down_retry=60" qla2300_707_vmw
[root@esx35-1 root]# esxcfg-module -g qla2300_707_vmw
qla2300_707_vmw enabled = 1 options = 'ql2xmaxqdepth=64
qlport_down_retry=60'

The values we have set are written into the esx.conf file.

[root@esx35-1 root]# grep -i qla /etc/vmware/esx.conf
/device/006:01.0/name = "QLogic Corp QLA231x/2340 (rev 02)"
/vmkernel/module/qla2300_707_vmw.o/options = "ql2xmaxqdepth=64
qlport_down_retry=60"

Now notice that when we check the runtime settings of the HBA, these new values have not been
applied.
[root@esx35-1 root]# cat /proc/scsi/qla2300/2 | grep -i device
    Device queue depth = 0x20
[root@esx35-1 root]# cat /proc/scsi/qla2300/2 | grep -i "down retry"
Port down retry = 015

OK - side note here. According to the VMware doco and a few other sources, you need to run esxcfg-boot
to update the initrd used by the host in order for the kernel settings to take effect. I did not rebuild the
initrd, I simply bounced my host, and found that the settings were indeed applied, as you can see from
the below commands run after the reboot.


[root@esx07 root]# esxcfg-module -g qla2300_707_vmw
qla2300_707_vmw enabled = 1 options = 'ql2xmaxqdepth=64
qlport_down_retry=60'

[root@esx35-1 root]# grep -i qla /etc/vmware/esx.conf
/device/006:01.0/name = "QLogic Corp QLA231x/2340 (rev 02)"
/vmkernel/module/qla2300_707_vmw.o/options = "ql2xmaxqdepth=64
qlport_down_retry=60"

[root@esx35-1 root]# cat /proc/scsi/qla2300/2 | grep -i device
    Device queue depth = 0x40
[root@esx35-1 root]# cat /proc/scsi/qla2300/2 | grep -i "down retry"
Port down retry = 060

However, for completeness we will go with the *correct* method and update the initrd. First back up
your old initrd in /boot (both the standard initrd and the debug mode initrd) in the event that you
need to roll back, then run the 'esxcfg-boot -b' command.

[root@esx35-1 root]# cd /boot
[root@esx35-1 boot]# cp /boot/initrd-2.4.21-57.ELvmnix.img-dbg
/boot/OLD_initrd-2.4.21-57.ELvmnix.img-dbg
[root@esx35-1 boot]# cp /boot/initrd-2.4.21-57.ELvmnix.img
/boot/OLD_initrd-2.4.21-57.ELvmnix.img
[root@esx35-1 boot]# ls -l
total 38717
-rw-r--r--    1 root     root        26806 Aug 13 2008 config-2.4.21-
57.ELvmnix
drwxr-xr-x     2 root    root         1024 Jun 30 21:40 grub
-rw-r--r--    1 root     root      7708363 Sep 3 10:55 initrd-2.4.21-
57.ELvmnix.img
-rw-r--r--    1 root     root      8950139 Sep 3 10:55 initrd-2.4.21-
57.ELvmnix.img-dbg
-rw-r--r--    1 root     root       666567 Jun 30 21:34 initrd-2.4.21-
57.ELvmnix.img-sc
-rw-r--r--    1 root     root          615 Jun 30 21:53 kernel.h
drwx------    2 root     root        12288 Jun 30 21:27 lost+found
-rw-r--r--    1 root     root      7708363 Sep 3 10:56 OLD_initrd-2.4.21-
57.ELvmnix.img
-rw-r--r--    1 root     root      8950139 Sep 3 10:56 OLD_initrd-2.4.21-
57.ELvmnix.img-dbg
lrwxrwxrwx     1 root    root           28 Jun 30 21:53 System.map ->
System.map-2.4.21-57.ELvmnix
-rw-r--r--    1 root     root       621361 Aug 13 2008 System.map-2.4.21-
57.ELvmnix
-rwxr-xr-x     1 root    root      3452300 Aug 13 2008 vmlinux-2.4.21-
57.ELvmnix
-rw-r--r--    1 root     root      1374902 Aug 13 2008 vmlinuz-2.4.21-
57.ELvmnix
[root@esx35-1 boot]# esxcfg-boot -b
[root@esx35-1 boot]# ls -l
total 38717
-rw-r--r--    1 root     root        26806 Aug 13 2008 config-2.4.21-
57.ELvmnix
drwxr-xr-x     2 root    root         1024 Jun 30 21:40 grub
-rw-r--r--    1 root     root      7708337 Sep 3 10:57 initrd-2.4.21-
57.ELvmnix.img
-rw-r--r--    1 root     root      8950126 Sep 3 10:57 initrd-2.4.21-
57.ELvmnix.img-dbg
-rw-r--r--    1 root     root       666567 Jun 30 21:34 initrd-2.4.21-
57.ELvmnix.img-sc
-rw-r--r--    1 root     root          615 Jun 30 21:53 kernel.h
drwx------    2 root     root        12288 Jun 30 21:27 lost+found
-rw-r--r--    1 root     root      7708363 Sep 3 10:56 OLD_initrd-2.4.21-
57.ELvmnix.img
-rw-r--r--    1 root     root      8950139 Sep 3 10:56 OLD_initrd-2.4.21-
57.ELvmnix.img-dbg
lrwxrwxrwx    1 root     root           28 Jun 30 21:53 System.map ->
System.map-2.4.21-57.ELvmnix
-rw-r--r--    1 root     root       621361 Aug 13 2008 System.map-2.4.21-
57.ELvmnix
-rwxr-xr-x     1 root    root      3452300 Aug 13 2008 vmlinux-2.4.21-
57.ELvmnix
-rw-r--r--    1 root     root      1374902 Aug 13 2008 vmlinuz-2.4.21-
57.ELvmnix
[root@esx35-1 root]# shutdown -r now


We can also use esxcfg-module to load or unload specific modules. In the below example we will
disable the vmfs2 module (which is generally not required) so that it is not loaded at the next boot.

[root@esx35-1 boot]# esxcfg-module -d vmfs2
[root@esx35-1 boot]# esxcfg-module -l
Device Driver Modules
Module         Enabled Loaded
vmklinux       true    true
tg3            true    true
aacraid_esx30   true    true
qla2300_707_vmw true    true
e1000          true    true
lvmdriver      true    true
vmfs3          true    true
etherswitch    true    true
shaper         true    true
tcpip          true    true
cosShadow      true    true
migration      true    true
nfsclient      true    true
deltadisk      true    true
vmfs2          false   true
iscsi_mod      true    true

Reboot your server and the module will not be loaded.
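Should you change your mind, re-enabling the module is simply the reverse operation:

# re-enable the vmfs2 module so it is loaded again at the next boot
esxcfg-module -e vmfs2
# confirm the Enabled column is back to true
esxcfg-module -l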




NPIV

Objectives

Skills & Abilities
      Configure and use NPIV HBAs

N_Port ID Virtualization (NPIV) is a technique whereby a server can instantiate multiple World Wide
Names (WWNs) on an HBA. More specifically, it allows multiple N_PORT IDs to share a single physical
N_PORT.

ESX supports the use of NPIV only for VMs with RDMs, in either Physical or Virtual compatibility mode.
NPIV may not be used for VM traffic to a VMFS filesystem (i.e. vmdk files).

In order to use NPIV the Fibre switches must support NPIV, and the FC HBAs in the ESX server must
support NPIV (most 4Gb HBAs support NPIV)

When a VM configured to use NPIV is powered on, ESX will create a virtual port (VPORT) on each
HBA (max 4), which will be used for all of that VM's I/O to any attached RDMs. ESX will attempt
to discover storage using each VPORT created for the VM.

As noted above, both your HBAs and your FC switches must support NPIV. Most 4Gb QLogic HBAs
support NPIV; Emulex HBAs must be running NPIV-capable firmware. Any LUNs presented to VMs must
be masked not only to the virtual WWNs created for the VM, but also to the WWNs of the physical HBAs
in the ESX server(s).
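You can confirm whether a given QLogic HBA (and its driver/firmware) reports NPIV support from its
proc node - the same output we looked at in the kernel module section:

# check the NPIV capability reported by the qla2300 driver for this HBA
cat /proc/scsi/qla2300/2 | grep -i npiv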

Once you have attached an RDM to your VM, you can then use the VI client to add WWNs for the
VM; these will be used for all traffic to the RDM.

Right click VM -> 'Edit Settings' -> Options -> Fibre Channel NPIV

Uncheck (if checked) Temporarily Disable NPIV for this VM and then select 'Generate New WWNs'

These settings can only be changed while the VM is powered off. VirtualCenter will then generate a
new WWNN and multiple (4) WWPNs for the VM.
ESX instantiates a VPORT (a VPORT is perceived by the fabric as an HBA - it has its own WWN)
on each HBA (max 4).

Once configured for a VM, you can regenerate or remove WWNs for a VM. Obviously this may cause
the VM to lose access to the LUN.

Below is an excerpt from the .vmx file of a vm that has been configured for NPIV.

scsi0:1.present = "TRUE"
scsi0:1.fileName = "myvm_1.vmdk"
scsi0:1.mode = "independent-persistent"
scsi0:1.deviceType = "scsi-hardDisk"
wwn.node = "2839000c29000003"
wwn.port =
"2839000c29000004,2839000c29000005,2839000c29000006,2839000c29000007"
wwn.type = "vc"




References:

http://pubs.vmware.com/vi35/fibre_san_config/esx_san_cfg_manage.8.17.html

http://www.vmware.com/pdf/vi3_35_25_npiv_config.pdf




 .-.
(-_-)
(\x/)
(-.-)

								