Veritas NetBackup™
Backup Planning and
Performance Tuning Guide

UNIX, Windows, and Linux


Release 6.0




N281842


November 27, 2007
Veritas NetBackup
Backup Planning and Performance Tuning Guide
          Copyright © 2003 - 2007 Symantec Corporation. All rights reserved.

          NetBackup 6.5

          Symantec, the Symantec logo, and NetBackup are trademarks or registered trademarks of
          Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be
          trademarks of their respective owners.

          Portions of this software are derived from the RSA Data Security, Inc. MD5
          Message-Digest Algorithm. Copyright 1991-92, RSA Data Security, Inc. Created 1991. All
          rights reserved.

          The product described in this document is distributed under licenses restricting its use,
          copying, distribution, and decompilation/reverse engineering. No part of this document
          may be reproduced in any form by any means without prior written authorization of
          Symantec Corporation and its licensors, if any.
          THIS DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED
          CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED
          WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR
          NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH
          DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. SYMANTEC CORPORATION SHALL
          NOT BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES IN CONNECTION
          WITH THE FURNISHING, PERFORMANCE, OR USE OF THIS DOCUMENTATION. THE
          INFORMATION CONTAINED IN THIS DOCUMENTATION IS SUBJECT TO CHANGE
          WITHOUT NOTICE.

          The Licensed Software and Documentation are deemed to be “commercial computer
          software” and “commercial computer software documentation” as defined in FAR
          Section 12.212 and DFARS Section 227.7202.

          Symantec Corporation
          20330 Stevens Creek Blvd.
          Cupertino, CA 95014
          www.symantec.com

          Printed in the United States of America.
Third-party legal notices

                 Third-party software may be recommended, distributed, embedded, or bundled
                 with this Veritas product. Such third-party software is licensed separately by its
                 copyright holder. All third-party copyrights associated with this product are
                 listed in the accompanying release notes.
                 AIX is a registered trademark of IBM Corporation.
                 HP-UX is a registered trademark of Hewlett-Packard Development Company, L.P.
                 Linux is a registered trademark of Linus Torvalds.
                 Solaris is a trademark of Sun Microsystems, Inc.
                 Windows is a registered trademark of Microsoft Corporation.
                 Oracle is a registered trademark of Oracle Corporation.

Licensing and registration
                 Veritas NetBackup is a licensed product. See the NetBackup Installation Guide
                 for license installation instructions.

Technical support
                 For technical assistance, visit http://entsupport.symantec.com and select phone
                 or email support. Use the Knowledge Base search feature to access resources
                 such as TechNotes, product alerts, software downloads, hardware compatibility
                 lists, and our customer email notification service.
Contents



Section I   Backup planning and configuration guidelines
Chapter 1   NetBackup capacity planning
            Purpose ..................................................................................................................12
            Introduction ..........................................................................................................13
            Analyzing your backup requirements ..............................................................14
            Designing your backup system ..........................................................................16
                Calculate the required data transfer rate for your backups ..................17
                Calculate how long it will take to back up to tape ...................................18
                Calculate how many tape drives are needed ............................................20
                Calculate the required data transfer rate for your network(s) .............21
                Calculate the size of your NetBackup catalog .........................................22
                Calculate the size of the EMM server ........................................................23
                Calculate media needed for full and incremental backups ...................25
                Calculate the size of the tape library needed to store your backups ...26
                Design your master backup server based on your previous findings ..27
                Estimate the number of master servers needed ......................................29
                Design your media server ...........................................................................31
                Estimate the number of media servers needed .......................................32
                Design your NOM server .............................................................................33
                Summary .......................................................................................................37
            Questionnaire for capacity planning ................................................................38

Chapter 2   Master server configuration guidelines
            Managing NetBackup job scheduling ................................................................40
                Delays in starting jobs .................................................................................40
                Delays in running queued jobs ...................................................................40
                Job delays caused by unavailable media ...................................................41
                Delays after removing a media server ......................................................41
                Limiting factors for job scheduling ...........................................................41
                Adjusting the server’s network connection options ...............................42
                Using NOM to monitor jobs ........................................................................43
                Disaster recovery testing and job scheduling ..........................................43
            Miscellaneous considerations ............................................................................44
                Processing of storage units ........................................................................44




                    Disk staging .................................................................................................. 44
                    File system capacity .................................................................................... 45
                NetBackup catalog strategies ............................................................................ 45
                    Catalog backup types .................................................................................. 46
                    Guidelines for managing the catalog ........................................................ 46
                    Catalog backup not finishing in the available window .......................... 47
                    Catalog compression ................................................................................... 48
                Merging/splitting/moving servers ................................................................... 48
                    Moving the EMM server .............................................................................. 49
                Guidelines for policies ........................................................................................ 49
                    Include and exclude lists ............................................................................ 49
                    Critical policies ............................................................................................. 50
                    Schedule frequency ..................................................................................... 50
                Managing logs ...................................................................................................... 50
                    Optimizing the performance of vxlogview .............................................. 50
                    Interpreting legacy error logs .................................................................... 51

    Chapter 3   Media server configuration guidelines
                Network and SCSI/FC bus bandwidth ............................................................... 54
                How to change the threshold for media errors ............................................... 54
                    Adjusting media_error_threshold ............................................................. 55
                How to reload the st driver without rebooting Solaris .................................. 57
                Media Manager drive selection ......................................................................... 58
                Robot types and NetBackup port configuration ............................................. 58

    Chapter 4   Media configuration guidelines
                Dedicated or shared backup environment ....................................................... 60
                Pooling ................................................................................................................... 60
                Disk versus tape ................................................................................................... 60

    Chapter 5   Database backup guidelines
                Introduction ......................................................................................................... 64
                Considerations for database backups ............................................................... 64

    Chapter 6   Best practices
                Best practices: new tape drive technologies .................................................... 66
                Best practices: tape drive cleaning ................................................................... 66
                Best practices: storing tape cartridges ............................................................. 68
                Best practices: recoverability ............................................................................. 68
                    Suggestions for data recovery planning .................................................. 69
                Best practices: naming conventions ................................................................. 71




                    Policy names .................................................................................................71
                    Schedule names ............................................................................................71
                    Storage unit/storage group names ............................................................72


Section II   Performance tuning
Chapter 7    Measuring performance
             Overview ................................................................................................................76
             Controlling system variables for consistent testing conditions ...................76
                 Server variables ............................................................................................76
                 Network variables ........................................................................................77
                 Client variables .............................................................................................78
                 Data variables ...............................................................................................78
             Evaluating performance .....................................................................................79
             Evaluating UNIX system components ..............................................................84
                 Monitoring CPU load ...................................................................................84
                 Measuring performance independent of tape or disk output ...............84
             Evaluating Windows system components .......................................................85
                 Monitoring CPU load ...................................................................................86
                 Monitoring memory use .............................................................................87
                 Monitoring disk load ...................................................................................87

Chapter 8    Tuning the NetBackup data transfer path
             Overview ................................................................................................................90
             The data transfer path ........................................................................................90
             Basic tuning suggestions for the data path .....................................................91
             NetBackup client performance ..........................................................................95
             NetBackup network performance .....................................................................96
                 Network interface settings .........................................................................96
                 Network load .................................................................................................97
                 NetBackup media server network buffer size ..........................................97
                 NetBackup client communications buffer size ........................................99
                 The NOSHM file .........................................................................................100
                 Using multiple interfaces .........................................................................101
             NetBackup server performance .......................................................................102
                 Shared memory (number and size of data buffers) ..............................102
                 Parent/child delay values .........................................................................108
                 Using NetBackup wait and delay counters ............................................108
                 Fragment size and NetBackup restores ..................................................119
                 Other restore performance issues ...........................................................122
             NetBackup storage device performance .........................................................126




    Chapter 9    Tuning other NetBackup components
                 Multiplexing and multi-streaming ................................................................. 130
                     When to use multiplexing and multi-streaming ................................... 130
                     Effects of multiple data streams on backup/restore ............................ 132
                 Encryption .......................................................................................................... 133
                 Compression ....................................................................................................... 133
                     How to enable compression ..................................................................... 133
                 Using encryption and compression ................................................................ 134
                 NetBackup Java .................................................................................................. 134
                 Vault .................................................................................................................... 134
                 Fast recovery with bare metal restore ............................................................ 135
                 Backing up many small files ............................................................................ 135
                     FlashBackup ............................................................................................... 136
                 NetBackup Operations Manager (NOM) ......................................................... 137
                     Adjusting the NOM server heap size ...................................................... 137
                     Adjusting the NOM web server heap size .............................................. 138
                     Adjusting the Sybase cache size .............................................................. 138
                     Saving NOM databases and database log files on separate physical hard
                           disks ..................................................................................................... 140
                     Defragment NOM databases .................................................................... 142
                     Purge data periodically ............................................................................. 143

    Chapter 10   Tuning disk I/O performance
                 Hardware performance hierarchy .................................................................. 146
                     Performance hierarchy level 1 ................................................................ 148
                     Performance hierarchy level 2 ................................................................ 148
                     Performance hierarchy level 3 ................................................................ 149
                     Performance hierarchy level 4 ................................................................ 150
                     Performance hierarchy level 5 ................................................................ 151
                     General notes on performance hierarchies ........................................... 151
                 Hardware configuration examples ................................................................. 153
                 Tuning software for better performance ....................................................... 154

    Chapter 11   OS-related tuning factors
                 Kernel tuning (UNIX) ........................................................................................ 158
                     Kernel parameters on Solaris 8 and 9 .................................................... 158
                     Kernel parameters in Solaris 10 .............................................................. 160
                     Message queue and shared memory parameters on HP-UX ............... 161
                     Kernel parameters on Linux .................................................................... 163
                 Adjusting data buffer size (Windows) ............................................................ 163
                 Other Windows issues ....................................................................................... 165




Appendix A   Additional resources
                Performance tuning information at Vision online ...............................167
                Performance monitoring utilities ............................................................167
                Freeware tools for bottleneck detection .................................................167
                Mailing list resources ................................................................................168

Index                                                                                                               169
                  Section I


Backup planning and
configuration guidelines

      Section I helps you lay the foundation for good backup performance through
      planning and configuring your NetBackup installation. Section I also includes
      some best practices.
      Section I includes these chapters:
      ■   NetBackup capacity planning
      ■   Master server configuration guidelines
      ■   Media server configuration guidelines
      ■   Media configuration guidelines
      ■   Database backup guidelines
      ■   Best practices


      Note: For a discussion of tuning factors and general recommendations that may
      be applied to an existing installation, see Section II.
Chapter 1

NetBackup capacity planning
      This chapter explains how to design your backup system as a foundation for
      good performance.
      This chapter includes the following sections:
      ■   “Introduction” on page 13
      ■   “Analyzing your backup requirements” on page 14
      ■   “Designing your backup system” on page 16
      ■   “Questionnaire for capacity planning” on page 38

   Purpose
                          Veritas NetBackup is a high-performance data protection application. Its
                          architecture is designed for large and complex distributed computing
                          environments. NetBackup provides a scalable storage management server that
                          can be configured for network backup, recovery, archival, and file migration
                          services.
                           This manual is for administrators who want to analyze, evaluate, and tune
                           NetBackup performance. This manual is intended to answer questions such as
                           the following: How big should the backup server be? How can the NetBackup
                           server be tuned for maximum performance? How many CPUs and tape drives
                           are needed? How can backups be configured to run as fast as possible? How can
                           recovery times be improved? What tools can characterize or measure how
                           NetBackup handles data?


                           Note: The most critical performance factors are hardware-related rather than
                           software-related. Hardware selection and configuration have roughly four times
                           the weight that software has in determining performance. Although this guide
                           provides some hardware configuration assistance, it is assumed for the most
                           part that your devices are configured correctly.




   Disclaimer
                          It is assumed you are familiar with NetBackup and your applications, operating
                          systems, and hardware. The information in this manual is advisory only,
                          presented in the form of guidelines. Changes to an installation undertaken as a
                          result of the information contained herein should be verified in advance for
                          appropriateness and accuracy. Some of the information contained herein may
                          apply only to certain hardware or operating system architectures.


                          Note: The information in this manual is subject to change.

Introduction
          The first step toward accurately estimating your backup requirements is a
          complete understanding of your environment. Many performance issues can be
          traced to hardware or environmental issues. A basic understanding of the entire
          backup data path is important in determining the maximum performance you
          can expect from your installation.
          Every backup environment has a bottleneck. It may be a fast bottleneck, but it
          will determine the maximum performance obtainable with your system.

          Example:
           Consider the configuration illustrated below. In this environment, backups run
           slowly (that is, they do not complete in the scheduled backup window). Total
           throughput is 8 to 10 megabytes per second.
          What makes the backups run slowly? How can NetBackup or the environment be
          configured to increase backup performance in this situation?

          Figure 1-1       Dedicated NetBackup server




           The explanation is that the LAN, having a speed of 100 megabits per second, has
          a theoretical throughput of 12.5 megabytes per second. In practice, 100BaseT
          throughput is unlikely to exceed 70% utilization. Therefore, the best delivered
          data rate is about 8 megabytes per second to the NetBackup server. The
          throughput can be even lower than this, when TCP/IP packet headers,
          TCP-window size constraints, router hops (packet latency for ACK packets
          delays the sending of the next data packet), host CPU utilization, filesystem
          overhead, and other LAN users’ activity are considered. Since the LAN is the
          slowest element in the backup path, it is the first place to look in order to
          increase backup performance in this configuration.
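
The arithmetic above is easy to script. The following is a minimal sketch written for this discussion (not taken from the guide); the 70% utilization figure is the rule of thumb cited above:

    # Convert a rated network speed into a realistic delivered backup rate.
    def delivered_rate_mb_per_sec(megabits_per_sec, utilization=0.70):
        theoretical = megabits_per_sec / 8.0   # 8 bits per byte
        return theoretical * utilization

    # 100BaseT: 12.5 megabytes/second theoretical, roughly 8-9 delivered
    print(delivered_rate_mb_per_sec(100))      # 8.75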

   Analyzing your backup requirements
                           Many elements influence your backup strategy. You must analyze and compare
                           these factors and then make backup decisions according to your site’s priorities.
                           When you plan your installation’s NetBackup capacity, ask yourself the
                           following questions:
                           ■   Which systems need to be backed up?
                               It is important that you identify all systems that need to be backed up and
                               then list each system separately so that you can identify any that require
                               more resources to back up. Document which machines have local tape
                               drives or libraries attached and be sure to write down the model type of
                               each tape drive or library. In addition, record each host name, operating
                               system and version, database type and version, network technology (for
                               example, ATM or 100BaseT), and location.
                           ■   How much data will be backed up?
                               Calculate how much data you need to back up. Include the total disk space
                               on each individual system, including that for databases. Remember to add
                               the space on mirrored disks only once.
                               By calculating the total size for all disks, you can design a system that takes
                               future growth into account. You should also consider the future by
                               estimating how much data you will need to back up in six months to a few
                               years from now.
                               ■    Do you plan to back up databases or raw partitions?
                                     If you are planning to back up databases, you need to identify the
                                    database engines, their version numbers, and the method that you will
                                    use to back them up. NetBackup can back up several database engines
                                    and raw file systems, and databases can be backed up while they are
                                    online or offline. To back up any database while it is online, you need a
                                    NetBackup database agent for your particular database engine.
                                    If you use NetBackup Advanced Client to back up databases using raw
                                    partitions, you are actually backing up as much data as the total size of
                                    your raw partition. Also, remember to add the size of your database
                                    backups to your final calculations when figuring out how much data
                                    you need to back up.
                               ■    Will you be backing up specialty servers like MS-Exchange, Lotus
                                    Notes, etc.?
                                    If you are planning on backing up any specialty servers, you will need
                                    to identify their types and application release numbers. As previously
                                    mentioned, you may need a special NetBackup agent to properly back
                                    up your particular servers.
                           ■   What types of backups are needed and how often should they take place?

    The frequency of your backups has a direct impact on your:
    ■   Tape requirements
    ■   Data transfer rate considerations
     ■   Restore opportunities
    To properly size your backup system, you must decide on the type and
    frequency of your backups. Will you perform daily incremental and weekly
    full backups? Monthly or bi-weekly full backups?
■   How much time is available to run each backup?
    It is important to know the window of time that is available for each backup.
     The length of a window dictates several aspects of your backup strategy. For
     example, you may want a larger window of time to back up multiple,
     high-capacity servers. Or you may consider the use of advanced NetBackup
    features such as synthetic backups, a local snapshot method, or
    FlashBackup.
■   How long should backups be retained?
    An important factor while designing your backup strategy is to consider
    your policy for backup expiration. The amount of time a backup is kept is
    also known as the “retention period.” A fairly common policy is to expire
    your incremental backups after one month and your full backups after six
    months. With this policy, you can restore any daily file change from the
    previous month and restore data from full backups for the previous six
    months. The length of the retention period depends on your own unique
    requirements and business needs, and perhaps regulatory requirements.
    However, keep in mind that the length of your retention period has a
    directly proportional effect on the number of tapes you will need and the
    size of your NetBackup catalog database. Your NetBackup catalog database
    keeps track of all the information on all your tapes. The catalog size is
     tightly tied to your retention period and the frequency of your backups.
    Also, database management daemons and services may become bottlenecks.
■   If backups are sent off site, how long must they remain off site?
    If you plan to send tapes to an off site location as a disaster recovery option,
    you must identify which tapes to send off site and how long they remain off
    site. You might decide to duplicate all your full backups, or only a select few.
    You might also decide to duplicate certain systems and exclude others. As
    tapes are sent off site, you will need to buy new tapes to replace them until
    they are recycled back from off site storage. If you forget this simple detail,
    you will run out of tapes when you most need them.
■   What is your network technology?
    If you are planning on backing up any system over a network, note the
    network types that you will be using. The next section, Designing your

                                  backup system, explains how to calculate the amount of data you can
                                  transfer over those networks in a given time.
                                  Depending on the amount of data that you want to back up and the
                                  frequency of those backups, you might want to consider installing a private
                                  network just for backups.
                          ■       What new systems will be added to your site in the next six months?
                                   It is important to plan for future growth when designing your backup
                                   system. By analyzing the potential growth of your current and future
                                   systems, you can ensure that your backup solution accommodates the
                                   kind of environment that you will have in the future. Remember to add
                                   any resulting growth factor to your total backup solution.
                          ■       Will user-directed backups or restores be allowed?
                                  Allowing users to do their own backups and restores can reduce the time it
                                  takes to initiate certain operations. However, user-directed operations can
                                  also result in higher support costs and the loss of some flexibility.
                                  User-directed operations can monopolize media and tape drives when you
                                  most need them. They can also generate more support calls and training
                                  issues while the users become familiar with the new backup system. You
                                  will need to decide whether allowing user access to some of your backup
                                  systems’ functions is worth the potential costs.
                          Other factors to consider when planning your backup capacity include:
                          ■       Data type: What are the types of data: text, graphics, database? How
                                  compressible is the data? How many files are involved? Will the data be
                                  encrypted? (Note that encrypted backups may run slower. See “Encryption”
                                  on page 133 for more information.)
                          ■       Data location: Is the data local or remote? What are the characteristics of
                                  the storage subsystem? What is the exact data path? How busy is the storage
                                  subsystem?
                          ■       Change management: Because hardware and software infrastructure will
                                  change over time, is it worth the cost to create an independent test-backup
                                  environment to ensure your production environment will work with the
                                  changed components?



   Designing your backup system
                          Following an analysis of your backup requirements, you can begin designing
                          your backup system. Use the following subsections in the order shown below.

              Note: The ideas and examples that follow are based on standard and ideal
              calculations. Your numbers will differ based on your particular environment,
              data, and compression rates.

              ■   “Calculate the required data transfer rate for your backups” on page 17
              ■   “Calculate how long it will take to back up to tape” on page 18
              ■   “Calculate how many tape drives are needed” on page 20
              ■   “Calculate the required data transfer rate for your network(s)” on page 21
              ■   “Calculate the size of your NetBackup catalog” on page 22
              ■   “Calculate the size of the EMM server” on page 23
              ■   “Calculate media needed for full and incremental backups” on page 25
              ■   “Calculate the size of the tape library needed to store your backups” on
                  page 26
              ■   “Design your master backup server based on your previous findings” on
                  page 27
              ■   “Estimate the number of master servers needed” on page 29
              ■   “Design your media server” on page 31
              ■   “Estimate the number of media servers needed” on page 32
              ■   “Design your NOM server” on page 33
              ■   “Summary” on page 37


Calculate the required data transfer rate for your backups
              This is the rate of transfer your system must achieve to complete a backup of all
              your data in the allowed time window. Use the following formula to calculate
              your ideal data transfer rate for full and incremental backups:
              Ideal data transfer rate = (Amount of data to back up) / (Backup window)
              On average, the daily change in data for many systems is between 10 and 20
              percent. Calculating a change of 20% in the (Amount of data to back up) and
              dividing it by the (Backup window) will give you the backup data rate for
              incremental backups.
              If you are running cumulative-incremental backups, you need to take into
              account which data is changing, since that affects the size of your backups. For
              example, if the same 20% of the data is changing daily, your
              cumulative-incremental backup will be much smaller than if a completely
              different 20% changes every day.

                          Example: Calculating your ideal data transfer rate during the week
                          Assumptions:
                              Amount of data to back up during a full backup = 500 gigabytes
                               Amount of data to back up during an incremental backup = 20% of a full backup
                               Daily backup window = 8 hours
                          Solution 1:
                              Full backup = 500 gigabytes
                              Ideal data transfer rate = 500 gigabytes/8 hours = 62.5 gigabytes/hour
                          Solution 2:
                              Incremental backup = 100 gigabytes
                              Ideal data transfer rate = 100 gigabytes/8 hours = 12.5 gigabytes/hour
                          To calculate your ideal data transfer rate during the weekends, divide the
                          amount of data that needs to be backed up by the length of the weekend backup
                          window.
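
As a quick sanity check, the ideal-rate formula can be expressed in a few lines of Python. This is an illustrative sketch; the variable names and the 20% daily change rate are the assumptions from the example above:

    # Ideal data transfer rate = (amount of data to back up) / (backup window)
    def ideal_rate_gb_per_hour(data_gb, window_hours):
        return data_gb / window_hours

    full_gb = 500.0              # full backup size
    incr_gb = 0.20 * full_gb     # incremental = 20% of a full backup
    window = 8.0                 # daily backup window, in hours

    print(ideal_rate_gb_per_hour(full_gb, window))   # 62.5 gigabytes/hour
    print(ideal_rate_gb_per_hour(incr_gb, window))   # 12.5 gigabytes/hour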


   Calculate how long it will take to back up to tape
                          Once you know what your ideal data transfer rates are for backups, you can
                          figure out what kind of tape drive technology will meet your needs. Because you
                          also know the length of your available backup windows and the amount of data
                          that needs to be backed up, you can also calculate how many tape drives you will
                          need.
                          Table 1-1 lists the transfer rates for several tape drive technologies. The values
                          listed are those published by their individual manufacturers and those observed
                          in real-life situations. Keep in mind that device manufacturers list optimum
                          rates for their devices. In reality, it is quite rare to achieve those values when a
                          system has to deal with the overhead of the operating system, CPU loads, bus
                          architecture, data types, and other hardware and software issues.
                          The typical gigabytes/hour values from Table 1-1 represent a range of real-life
                          transfer rates for several devices, with and without compression. When you
                          design your backup system, consider the nature of both your data and your
                          environment. It is generally wise to estimate on the conservative side when
                          planning capacity. For instance, use the low end of the typical gigabytes/hour
                          range for your planning unless you have specific reasons to use the higher
                          numbers.
                           To calculate how long a backup will take with a particular tape drive, use the
                           formula:
                               Backup time = (Amount of data to back up)/((Number of drives)
                               * (Tape drive transfer rate))

Table 1-1         Tape drive data transfer rates

Drive               Theoretical        Theoretical                Typical
                    gigabytes/hour (no gigabytes/hour (2:1        gigabytes/hour
                    compression)       compression)

LTO gen 1           54                    108                     37-65

LTO gen 2           108                   216                     75-130

LTO gen 3           288                   576                     200-345

SDLT 320            57                    115                     40-70

SDLT 600            129                   259                     90-155

STK 9940B           108                   252 (2.33:1)            75-100


Example: Calculating the actual data transfer rate required
Assumptions:
    Amount of data to back up during a full backup = 500 gigabytes
    Daily backup window = 8 hours
    Ideal transfer rate (data/(backup window)) = 500 gigabytes/8 hours = 62.5
    gigabytes/hour
Solution 1:
    Tape drive = 1 drive, LTO gen 1
    Tape drive transfer rate = 37 gigabytes/hour
    Backup time = 500 gigabytes/((1 drive) * (37 gigabytes/hour)) = 13.51 hours
With a data transfer rate of 37 gigabytes/hour, a single LTO gen 1 tape drive will
take 13.51 hours to perform a 500 gigabyte backup. A single LTO gen 1 tape
drive will not be able to perform your backup in eight hours. You will need a
faster tape drive or another LTO gen 1 tape drive.
Solution 2:
    Tape drive = 1 drive, LTO gen 2
    Tape drive transfer rate = 75 gigabytes/hour
    Backup time = 500 gigabytes/((1 drive) * (75 gigabytes/hour)) = 6.67 hours
    With a data transfer rate of 75 gigabytes/hour, a single LTO gen 2 tape drive
    will take 6.67 hours to perform a 500 gigabyte backup.
Depending on the several factors that can influence your tape drives' transfer
rates, you may obtain higher or lower rates in practice. The solutions in the
examples above are approximations of what you can expect.

                          Note also that a backup of encrypted data may take more time. See “Encryption”
                          on page 133 for more information.
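
Both solutions above follow directly from the backup-time formula. A minimal Python sketch (illustrative only; the figures are the typical rates from Table 1-1):

    # Backup time = data / ((number of drives) * (per-drive transfer rate))
    def backup_hours(data_gb, num_drives, drive_gb_per_hour):
        return data_gb / (num_drives * drive_gb_per_hour)

    print(round(backup_hours(500, 1, 37), 2))   # 13.51 hours: one LTO gen 1
    print(round(backup_hours(500, 1, 75), 2))   # 6.67 hours: one LTO gen 2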


   Calculate how many tape drives are needed
                          To calculate how many tape drives you will need to perform your backups, use
                          the formula below and the typical gigabytes/hour transfer rates from Table 1-1
                          “Tape drive data transfer rates” on page 19.
                               Number of drives = (Amount of data to back up) /((Backup window) * (Tape
                               drive transfer rate))

                          Example: Calculating the number of tape drives needed to perform a
                          backup
                          Assumptions:
                              Amount of data to back up = 500 gigabytes
                              Backup window = 8 hours
                          Solution 1:
                              Tape drive type = SDLT 320
                              Tape drive transfer rate = 40 gigabytes/hour
                              Number of drives = 500 gigabytes/ ((8 hours) * (40 gigabytes/hour)) = 1.56 =
                              2 drives
                          Solution 2:
                              Tape drive type = SDLT 600
                              Tape drive transfer rate = 90 gigabytes/hour
                              Number of drives = 500 gigabytes/((8 hours) * (90 gigabytes/hour)) = 0.69 =
                              1 drive
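
Note that fractional results are always rounded up to a whole drive, as in Solution 1. A minimal sketch of the calculation (illustrative, using the typical rates assumed above):

    import math

    # Number of drives = data / (window * per-drive rate), rounded up
    def drives_needed(data_gb, window_hours, drive_gb_per_hour):
        return math.ceil(data_gb / (window_hours * drive_gb_per_hour))

    print(drives_needed(500, 8, 40))   # 2 drives (SDLT 320 at 40 GB/hour)
    print(drives_needed(500, 8, 90))   # 1 drive  (SDLT 600 at 90 GB/hour)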
                          Although it is quite straightforward to calculate the number of drives needed to
                          perform a backup, it is difficult to spread the data streams evenly across all
                          drives. To effectively spread your data, you have to experiment with various
                          backup schedules, NetBackup policies, and your hardware configuration. See
                          “Basic tuning suggestions for the data path” on page 91 to determine your
                          options.
                          Another important aspect of calculating how many tape devices you will need is
                          calculating how many tape devices you can attach to a drive controller.
                          When calculating the maximum number of tape drives that you can attach to a
                          controller, you must know the drive and controller maximum transfer rates as
                          published by their manufacturers. Failure to use maximum transfer rates for
                          your calculations can result in saturated controllers and unpredictable results.
                          Table 1-2 displays the transfer rates for several drive controllers. In practice,
                          your transfer rates might be slower because of the inherent overhead of several

               variables including your file system layout, system CPU load, and memory
               usage.
               Table 1-2           Drive controller data transfer rates

               Drive Controller                 Theoretical                 Theoretical
                                                megabytes/second            gigabytes/hour

               ATA-5 (ATA/ATAPI-5)              66                          237.6

               Wide Ultra 2 SCSI                80                          288

               iSCSI                            100                         360

               1 Gigabit Fibre Channel          100                         360

               SATA/150                         150                         540

               Ultra-3 SCSI                     160                         576

               2 Gigabit Fibre Channel          200                         720

               SATA/300                         300                         1080

               Ultra320 SCSI                    320                         1152

               4 Gigabit Fibre Channel          400                         1440



Calculate the required data transfer rate for your network(s)
               When designing your backup system to perform backups over a network, you
               need to move data from your client(s) to your backup server(s) at a fast enough
               rate to finish your backups within your allotted backup window. Using the
               typical gigabytes/hour transfer rates from Table 1-3, you can find out the
               typical transfer rates of some fairly common network technologies. To calculate
               the required data transfer rate, use the formula below:
                    Required network data transfer rate = (Amount of data to back up) / (Backup
                    window)

               Table 1-3           Network data transfer rates

               Network Technology        Theoretical gigabytes/hour          Typical gigabytes/hour

               10BaseT (switched)        3.6                                 2.7

               100BaseT (switched)       36                                  32

               1000BaseT (switched)      360                                 320

                          Note: For additional information on the importance of matching network
                          bandwidth to your tape drives, see “Network and SCSI/FC bus bandwidth” on
                          page 54.


                          Example: Calculating network transfer rates
                          Assumptions:
                              Amount of data to back up = 500 gigabytes
                              Backup window = 8 hours
                               Required network transfer rate = 500 gigabytes/8 hours = 62.5 gigabytes/hour
                          Solution 1: Network Technology = 10BaseT (switched)
                          Typical transfer rate = 2.7 gigabytes/hour
                          Using the values from Table 1-3, a single 10BaseT network has a transfer rate of
                          2.7 gigabytes/hour. This network will not handle your required data transfer
                          rate of 62.5 gigabytes/hour. In this case, you would have to explore some other
                          options, such as:
                          ■       Backing up your data over a faster network (1000BaseT)
                          ■       Backing up large servers to dedicated tape drives (media servers)
                          ■       Performing your backups during a longer time window
                          ■       Performing your backups over faster dedicated private networks.
                          Solution 2: Network Technology = 1000BaseT (switched)
                          Typical transfer rate = 320 gigabytes/hour
                          Using the values from Table 1-3, a single 1000BaseT network has a transfer rate
                          of 320 gigabytes/hour. This network technology will be able to handle your
                          backups with room to spare.
                          Calculating the data transfer rates for your networks can help you identify your
                          potential bottlenecks by looking at the transfer rates of your slowest networks.
                          “Basic tuning suggestions for the data path” on page 91 provides several
                          solutions for dealing with multiple networks and bottlenecks.
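
The comparison made in both solutions can be automated. A minimal sketch (illustrative; the rates are the typical values from Table 1-3):

    # Compare the required backup rate against each network's typical rate.
    required_gb_per_hour = 500.0 / 8.0        # 62.5 gigabytes/hour

    typical_gb_per_hour = {
        "10BaseT (switched)": 2.7,
        "100BaseT (switched)": 32.0,
        "1000BaseT (switched)": 320.0,
    }
    for network, rate in typical_gb_per_hour.items():
        verdict = "sufficient" if rate >= required_gb_per_hour else "too slow"
        print(f"{network}: {rate} GB/hour -> {verdict}")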


   Calculate the size of your NetBackup catalog
                          An important factor when designing your backup system is to calculate how
                          much disk space you need to store your NetBackup catalog. Your catalog keeps
           track of all the files that have been backed up. The catalog’s size is directly
           tied to several variables, including the frequency of your backups, the number of
                          files being backed up, the path length for each file being backed up, and your
                          retention periods. On average, the size of your catalog can be between 1% to 2%
                          (or higher) of the total data being tracked.

              To calculate your NetBackup catalog size, you need to know how much data you
              will be backing up for full and incremental backups, how often these backups
              will be performed, and for how long they will be retained. Here are two simple
              formulas to calculate these values:
                   Data being tracked = (Amount of data to back up) * (Number of backups) *
                   (Retention period)
                    NetBackup catalog size = 120 bytes * (number of files)


              Note: If you select NetBackup’s True Image Restore option, your catalog will be
              twice as large as a catalog without this option selected. True Image Restore
              collects the information required to restore directories to their contents at the
              time of any selected full or incremental backup. Because the additional
              information that NetBackup collects for incremental backups is the same as that
              of a full backup, incremental backups take much more disk space when you
              collect True Image Restore information.


              Example: Calculating the size of your NetBackup catalog
              Assumptions:
                  Amount of data to back up = 100 gigabytes
                  Incremental backups = 20% of all data
                  Full backups per month = 4
                  Retention period for full backups = 6 months
                  Incremental backups per month = 30
                  Retention period for incremental backups = 1 month
              Solution:
                  Size of full backups = 100 gigabytes * 4 * 6 months = 2.4 terabytes
                  Size of incremental backups = (20% of 100 gigabytes) * 30 * 1 month = 600
                  gigabytes
                  Total data tracked = 2.4 terabytes + 600 gigabytes = 3 terabytes
                  NetBackup catalog size = 2% of 3 terabytes= 60 gigabytes
              Based on the previous assumptions, it will take 60 gigabytes of disk space to hold
               the catalog. Compression can reduce the size of your catalog to one-sixth or less
               of its uncompressed size. When the catalog is decompressed, only the images for
               the particular system and time period that you need to restore are decompressed.
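
The same computation in Python (a sketch using the assumptions of the example above, with the 2% rule of thumb):

    # Catalog sizing: track what each backup adds over its retention period.
    full_gb = 100.0
    incr_gb = 0.20 * full_gb      # incrementals = 20% of all data
    fulls_tracked = 4 * 6         # 4 fulls per month, retained 6 months
    incrs_tracked = 30 * 1        # 30 incrementals per month, retained 1 month

    tracked_gb = full_gb * fulls_tracked + incr_gb * incrs_tracked
    catalog_gb = 0.02 * tracked_gb   # 2% rule of thumb

    print(tracked_gb)   # 3000.0 gigabytes tracked (3 terabytes)
    print(catalog_gb)   # 60.0 gigabytes of catalog disk space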


Calculate the size of the EMM server
              By default, the EMM server resides on the NetBackup master server. The
              amount of space needed for the EMM server is determined by the size of the
              NetBackup database (NBDB), as explained below.

                          Note: This space must be included when determining size requirements for a
                          master or media server, depending on where the EMM server is installed.

                          Space for the NBDB on the EMM server is required in the following two
                          locations:
                               UNIX:
                                  /usr/openv/db/data
                                  /usr/openv/db/staging
                               Windows:
                                  install_path\NetBackupDB\data
                                  install_path\NetBackupDB\staging
                          Calculate the required space for the NBDB in each of the two directories, as
                          follows:
                               60 MB + (2 KB * number of volumes configured for EMM)
                          where EMM is the Enterprise Media Manager, and volumes are NetBackup
                          (EMM) media volumes. Note that 60 MB is the default amount of space needed
                          for the NBDB database used by the EMM server. It includes pre-allocated space
                          for configuration information for devices and storage units.


                          Note: During NetBackup installation, the install script looks for 60 MB of free
                          space in the above /data directory; if there is insufficient space, the installation
                          fails. The space in /staging is only required when a hot catalog backup is run.


                          Example: Calculating the space needed for the EMM server
                          Assuming there are 1000 EMM volumes to back up, the total space needed for
                          the EMM server in /usr/openv/db/data is:
                              60 MB + (2 KB * 1000 volumes) = 62 MB
                          The same amount of space is required in /usr/openv/db/staging. The
                          amount of space required may grow over time as the NBDB database increases in
                          size.


                          Note: The above 60 MB of space is pre-allocated, and is derived from the
                          following separate databases that are consolidated into the EMM database in
                          NetBackup 6.0: globDB, ltidevs, robotic_def, namespace.chksum, ruleDB,
                          poolDB, volDB, mediaDB, storage_units, stunit_groups, SSOhosts, and media
                          errors database. See the NetBackup Release Notes, in the section titled
                          “Enterprise Media Manager Databases,” for additional details on files and
                          database information included in the EMM database.
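
                          For illustration, here is a minimal Python sketch of the NBDB space
                          formula above; remember that the same amount is required in both the
                          data and staging directories (names are illustrative only):

                              # 60 MB pre-allocated base + 2 KB per configured EMM volume.
                              def nbdb_space_mb(volumes):
                                  return 60 + (2 * volumes) / 1024.0  # KB -> MB

                              print(nbdb_space_mb(1000))  # ~62 MB, as in the example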



Calculate media needed for full and incremental backups
              As part of planning your backup strategy, calculate how many tapes will be
              needed to store and retrieve your backups. The number of tapes that you will
              need depends on:
              ■   The amount of data that you are backing up
              ■   The frequency of your backups
              ■   The planned retention periods
              ■   The capacity of the media used to store your backups
              If you expect your site's workload to increase over time, you can ease the pain of
              future upgrades by planning for expansion. Design your initial backup
              architecture so it can evolve to support more clients and servers. Invest in the
              faster, higher-capacity components that will serve your needs beyond the
              present.
              A simple formula for calculating your tape needs is shown here:
                  Number of tapes = (Amount of data to back up) / (Tape capacity)
              To calculate how many tapes will be needed based on all your requirements, the
              above formula can be expanded to
                  Number of tapes = ((Amount of data to back up) * (Frequency of backups) *
                  (Retention period)) / (Tape capacity)

              Table 1-4         Tape capacities

              Drive                        Theoretical gigabytes      Theoretical gigabytes
                                           (no compression)           (2:1 compression)

              LTO gen 1                    100                        200

              LTO gen 2                    200                        400

              LTO gen 3                    400                        800

              SDLT 320                     160                        320

              SDLT 600                     300                        600

              STK 9940B                    200                        400


              Example: Calculating how many tapes are needed to store all your
              backups
              Preliminary calculations:
                   Size of full backups = 500 gigabytes * 4 (per month) * 6 months = 12 terabytes
                   Size of incremental backups = (20% of 500 gigabytes) * 30 * 1 month = 3 terabytes
                   Total data tracked = 12 terabytes + 3 terabytes = 15 terabytes
                          Solution 1:
                              Tape drive type = LTO gen 1
                              Tape capacity without compression = 100 gigabytes
                              Tape capacity with compression = 200 gigabytes
                              Without compression:
                              Tapes needed for full backups = 12 terabytes/100 gigabytes = 120
                              Tapes needed for incremental backups = 3 terabytes/100 gigabytes = 30
                              Total tapes needed = 120 + 30 = 150 tapes
                              With 2:1 compression:
                              Tapes needed for full backups = 12 terabytes/200 gigabytes = 60
                              Tapes needed for incremental backups = 3 terabytes/200 gigabytes = 15
                              Total tapes needed = 60 + 15 = 75 tapes
                          Solution 2:
                              Tape drive type = LTO gen 3
                              Tape capacity without compression = 400 gigabytes
                              Tape capacity with compression = 800 gigabytes
                              Without compression:
                              Tapes needed for full backups = 12 terabytes/400 gigabytes = 30
                              Tapes needed for incremental backups = 3 terabytes/400 gigabytes = 7.5 ~=
                              8
                              Total tapes needed = 30 + 8 = 38 tapes
                              With 2:1 compression:
                              Tapes needed for full backups = 12 terabytes/800 gigabytes = 15
                              Tapes needed for incremental backups = 3 terabytes/800 gigabytes = 3.75
                              ~= 4
                              Total tapes needed = 15 + 4 = 19 tapes
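
                          For illustration, the following minimal Python sketch reproduces both
                          solutions above from the expanded tape formula, using drive capacities
                          from Table 1-4 (the names are illustrative only):

                              import math

                              def tapes_needed(data_gb, capacity_gb):
                                  # Round up: a partially filled tape still occupies a tape.
                                  return math.ceil(data_gb / capacity_gb)

                              full_gb, incr_gb = 12000, 3000  # from the preliminary calculations
                              for drive, native, compressed in (("LTO gen 1", 100, 200),
                                                                ("LTO gen 3", 400, 800)):
                                  print(drive,
                                        tapes_needed(full_gb, native) + tapes_needed(incr_gb, native),
                                        tapes_needed(full_gb, compressed) + tapes_needed(incr_gb, compressed))
                              # LTO gen 1: 150 tapes native, 75 with 2:1 compression
                              # LTO gen 3: 38 tapes native, 19 with 2:1 compression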


   Calculate the size of the tape library needed to store your backups
                          To calculate how many robotic library tape slots are needed to store all your
                          backups, take the number of tapes for backup calculated in “Calculate media
                          needed for full and incremental backups” on page 25 and add tapes for catalog
                          backup and cleaning:
                              Tape slots needed = (Number of tapes needed for backups) + (Number of
                              tapes needed for catalog backups) + 1 (for a cleaning tape)
                          A typical example of tapes needed for catalog backup is 2.
                          Additional tapes may be needed for the following:
                          ■       If you plan to duplicate tapes or to reserve some media for special
                                  (non-backup) use, add those tapes to the above formula.



              ■   Add tapes needed for future data growth. Make sure your system has a viable
                  upgrade path as new tape drives become available.
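
                          To make the slot arithmetic concrete, here is a minimal Python sketch
                          of the formula above (the values are taken from Solution 2 and the
                          typical catalog-tape allowance, and are illustrative only):

                              backup_tapes = 38    # e.g. LTO gen 3, no compression (Solution 2)
                              catalog_tapes = 2    # a typical allowance, per the text
                              cleaning_tapes = 1
                              print(backup_tapes + catalog_tapes + cleaning_tapes)
                              # 41 slots, before duplication or growth allowances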


Design your master backup server based on your previous findings
              To design and configure a master backup server, you must:
              ■   Perform an initial backup requirements analysis, as outlined in the section
                  “Analyzing your backup requirements” on page 14.
              ■   Perform the calculations outlined in the previous steps of the current
                  section.
              Designing a backup server becomes a simple task once the basic design
              constraints are known:
              ■   Amount of data to back up
              ■   Size of the NetBackup catalog
              ■   Number of tape drives needed
              ■   Number of networks needed
              Given the above, a simple approach to designing your backup server can be
              outlined as follows:
              ■   Acquire a dedicated server
              ■   Add tape drives and controllers (for saving your backups)
              ■   Add disk drives and controllers (for OS and NetBackup catalog)
              ■   Add network cards
              ■   Add memory
               ■   Add CPUs



                          Figure 1-2        Backup server hardware component




                          In some cases, it may not be practical to design a generic server to back up all of
                          your systems. You might have one or several large servers that cannot be backed
                          up over a network within your backup window. In such cases, it is best to back up
                          those servers using their own locally-attached tape drives. Although this section
                          discusses how to design a master backup server, you can still use its information
                          to properly add the necessary tape drives and components to your other servers.
                          The next example shows how to configure a master server using the design
                          elements gathered from the previous sections.

                          Example: Designing your master backup server
                          Assumptions:
                              Amount of data to back up during full backups = 500 gigabytes
                              Amount of data to back up during incremental backups = 100 gigabytes
                              Tape drive type = SDLT 600
                              Tape drives needed = 1
                              Network technology = 100BaseT
                              Network cards needed = 1
                              Size of NetBackup catalog after 6 months = 60 gigabytes (from “Example:
                              Calculating the size of your NetBackup catalog” on page 23)
                          Solution (the following values are based on Table 1-6 “CPUs needed per
                          master/media server component” and Table 1-7 “Memory needed per
                          master/media server component”):
                              CPUs needed for network cards = 1
                              CPUs needed for tape drives = 1



                  CPUs needed for OS = 1
                  Total CPUs needed = 1 + 1 + 1 = 3
                  Memory needed for network cards = 16 megabytes
                  Memory needed for tape drives = 128 megabytes
                  Memory needed for OS and NetBackup = 1 gigabyte
                   Total memory needed = 16 megabytes + 128 megabytes + 1 gigabyte = 1.144 gigabytes
              Based on the above, your master server needs 3 CPUs and 1.144 gigabytes of
              memory. In addition, you need 60 gigabytes of disk space to store your
              NetBackup catalog, along with the necessary disks and drive controllers to
              install your operating system and NetBackup (2 gigabytes should be ample for
               most installations). This server also requires one SCSI card (or another,
               faster adapter) for the tape drive and robot arm, and a single 100BaseT card
               for network backups.
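
               For illustration, the same sizing can be expressed as a short Python
               sketch; the component counts are the example's assumptions and the
               variable names are not part of NetBackup:

                   # Per Tables 1-6 and 1-7: 1 network card, 1 SDLT 600 drive.
                   cpus_needed = 1 + 1 + 1        # network card + tape drive + OS/NetBackup
                   memory_mb = 16 + 128 + 1000    # network card + SDLT 600 + OS/NetBackup
                   print(cpus_needed, memory_mb)  # 3 CPUs, 1144 MB (~1.144 GB)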
              When designing your backup server solution, begin with a dedicated server for
              optimum performance. In addition, consult with your server’s hardware
              manufacturer to ensure that the server can handle your other components. In
              most cases, servers have specific restrictions on the number and mixture of
              hardware components that can be supported concurrently. Overlooking this last
              detail can cripple even the best of plans.


Estimate the number of master servers needed
              One of the key elements in designing your backup solution is estimating how
              many master servers are needed. As a rule, the number of master servers is
              proportional to the number of media servers. To determine how many master
              servers are required, consider the following:
              ■   The master server must be able to periodically communicate with all its
                  media servers. If there are too many media servers, master server
                  processing may be overloaded.
              ■   Consider business-related requirements. For example, if an installation has
                  different applications which require different backup windows, a single
                  master may have to run backups continually, leaving no spare time for
                  catalog cleaning, catalog backup, or maintenance.
              ■   If at all possible, design your configuration with one master server per
                  firewall domain. In addition, do not share robotic tape libraries between
                  firewall domains.
              ■   As a rule, the number of clients (separate physical hosts) per master server
                  is not a critical factor for NetBackup. Ordinary backup processing performed
                  by each client has little or no impact on the NetBackup server, unless, for
                  instance, the clients all have database extensions or are trying to run
                  ALL_LOCAL_DRIVES at the same time.



                          ■       Plan your configuration so that it contains no single point of failure. Provide
                                  sufficient redundancy to ensure high availability of the backup process.
                                  Having more tape drives or media may reduce the number of media servers
                                  needed per master server.
                          ■       Consider limiting the number of media servers handled by a master to the
                                  lower end of the estimates in Table 1-5.
                                  Although a well-managed NetBackup environment can handle more media
                                  servers than the numbers listed in this table, you may find your backup
                                  operations more efficient and manageable with fewer but larger media
                                  servers. The variation in the number of media servers per master server for
                                  each scenario in the table depends on the number of jobs submitted,
                                  multiplexing, multi-streaming, and network capacity.
                                  For information on designing a master server, refer to “Design your master
                                  backup server based on your previous findings” on page 27.


                          Note: This table provides a rough estimate only, as a guideline for initial
                          planning. Note also that the RAM amounts shown below are for a base
                          NetBackup installation; RAM requirements vary depending on the NetBackup
                          features, options, and agents being used.


   Table 1-5         Number of media servers supported by a master server

    Master         RAM              Number of      Master         Media Server Media               Number of
    Server Type                     Processors     Backups        Backups      Configuration       Media
                                                                                                   Servers Per
                                                                                                   Master
                                                                                                   Server

    Solaris        2 gigabytes      4              Not backing    Media server    10 - 20 tape     25 - 40
                                                   up clients     backing up      drives in not
                                                                  itself only     more than 2
                                                                                  libraries

    Solaris        4 gigabytes      4              Not backing    Media server    10 - 20 tape     35 - 50
                                                   up clients     backing up      drives in not
                                                                  itself only     more than 2
                                                                                  libraries

    Solaris        8+ gigabytes     4              Not backing    Media server    20 - 40 tape     50 - 70
                                                   up clients     backing up      drives in not
                                                                  network         more than 2
                                                                  clients         libraries

Windows         2 gigabytes    4                Not backing    Media server   15 - 30 tape      10+
                                                up clients     backing up     drives in not
                                                               itself only    more than 2
                                                                              libraries

Windows         4 gigabytes    4                Not backing    Media server   20 - 40 tape      20+
                                                up clients     backing up     drives in not
                                                               itself only    more than 2
                                                                              libraries

Windows         8+ gigabytes   4                Not backing    Media server   40 - 128 tape     50+
                                                up clients     backing up     drives in not
                                                               network        more than 2
                                                               clients        libraries



Design your media server
                       You can use a media server not only to back up itself, but also to back up other
                       systems and reduce or balance the load on your master server. With NetBackup,
                       the robotic control of a library can be on either the master server or the media
                       server.

Table 1-6         CPUs needed per master/media server component

 Component                How many and what kind of component                  Number of CPUs per
                                                                               component

Network cards            2-3 100BaseT cards                                   1

                         5-7 10BaseT cards                                    1

                         1 ATM card                                           1

                         1-2 Gigabit Ethernet cards with coprocessor          1

Tape drives              2 LTO gen 3 drives                                   1

                         2-3 SDLT 600 drives                                  1

                         2-3 LTO gen 2 drives                                 1

                         3-4 LTO gen 1 drives                                 1

    OS and NetBackup                                                          1


                           Table 1-7           Memory needed per master/media server component

                           Component                  Type of component    Memory per component

                           Network cards                                   16 megabytes

                           Tape drives                LTO gen 3 drive      256 megabytes

                                                      SDLT 600 drive       128 megabytes

                                                      LTO gen 2 drive      128 megabytes

                                                      LTO gen 1 drive      64 megabytes

                           OS and NetBackup                                1 gigabyte

                           OS, NetBackup, and NOM                          1 or more gigabytes

                           NetBackup multiplexing                          8 megabytes * (# streams) * (# drives)


                           The information in the above tables is a rough estimate only, intended as a
                           guideline for initial planning.
                           In addition to the above media server components, you must also add the
                           necessary disk drives to store the NetBackup catalog and your operating system.
                           The size of the disks needed to store your catalog depends on the calculations
                           explained earlier under “Calculate the size of your NetBackup catalog” on
                           page 22.


   Estimate the number of media servers needed
                           Here are some guidelines for estimating the number of media servers needed:
                           ■      I/O performance is generally more important than CPU performance.
                           ■      Consider CPU, I/O, and memory expandability when choosing a server.
                           ■      Consider how many CPUs are needed (see “CPUs needed per master/media
                                  server component” on page 31). Here are some general guidelines:
                                  Experiments (with Sun Microsystems) have shown that a useful,
                                  conservative estimate is 5MHz of CPU capacity per 1MB/second of data
                                  movement in and out of the NetBackup media server. Keep in mind that the
                                  operating system and other applications also use the CPU. This estimate is
                                  for the power available to NetBackup itself.



                  Example:
                     A system backing up clients over the network to a local tape drive at a
                     rate of 10 MB/second would need 100 MHz of available CPU power:
                     50 MHz to move data from the network to the NetBackup server
                     50 MHz to move data from the NetBackup server to tape
                     (A worked sketch of this rule appears after this list.)
             ■   Consider how much memory is needed (see “Memory needed per
                 master/media server component” on page 32).
                 At least 512 megabytes of RAM is recommended if the server is running a
                 Java GUI. NetBackup uses shared memory for local backups. NetBackup
                 buffer usage will affect how much memory is needed. See the “Tuning the
                 NetBackup data transfer path” chapter for more information on NetBackup
                 buffers.
                 Keep in mind that non-NetBackup processes need memory in addition to
                 what NetBackup needs.
                 A media server moves data from disk (on relevant clients) to storage
                 (usually disk or tape). The server must be carefully sized to maximize
                 throughput. Maximum throughput is attained when the server keeps its
                 tape devices streaming. (For an explanation of streaming, see “Tape
                 streaming” on page 126.)
             Media server factors to consider for sizing include:
                 ■    Disk storage access time
                 ■    Adapter (for example, SCSI) speed
                 ■    Bus (for example, PCI) speed
                 ■    Tape device speed
                 ■    Network interface (for example, 100BaseT) speed
                 ■    Amount of system RAM
                 ■    Other applications, if the host is non-dedicated
             The platform chosen must be able to drive all network interfaces and keep all
             tape devices streaming.
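
              The following minimal Python sketch, offered for illustration only,
              captures the 5 MHz-per-MB/second rule described in the list above and the
              multiplexing memory formula from Table 1-7; the stream and drive counts
              are hypothetical:

                  def cpu_mhz_needed(throughput_mb_per_s):
                      # Data crosses the media server twice (network in, tape out),
                      # at roughly 5 MHz of CPU per MB/second for each leg.
                      return 5 * throughput_mb_per_s * 2

                  def mpx_buffer_memory_mb(streams, drives):
                      # Multiplexing memory from Table 1-7: 8 MB * streams * drives.
                      return 8 * streams * drives

                  print(cpu_mhz_needed(10))          # 100 MHz, matching the example above
                  print(mpx_buffer_memory_mb(4, 2))  # 4 streams x 2 drives -> 64 MB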


Design your NOM server
             Before setting up a NetBackup Operations Manager (NOM) server, review the
             recommendations and requirements listed in the installation chapter of the
             NetBackup Operations Manager Guide. Some of the considerations are the
             following:
              ■   The NOM server should be configured as a fixed host with a static IP address.
             ■   Symantec recommends that you not install the NOM server software on the
                 same server as NetBackup master or media server software. Installing NOM
                 on a master server may impact security and performance.




                          Note: To use NOM to monitor jobs, see “Using NOM to monitor jobs” on page 43.



                          Sizing considerations
                          The size of your NOM server depends largely on the number of NetBackup
                          objects that NOM manages. The NetBackup objects that determine the NOM
                          server size are the following:
                          ■       Number of master servers to manage
                          ■       Number of policies
                          ■       Number of jobs run per day
                          ■       Number of media
                          Based on the above factors, the following NOM server components should be
                          sized accordingly.

                           NOM server components

                           Disk space (for installed NOM binary + NOM database, described below)

                           Type and number of CPUs

                           RAM


                          The next section describes the NOM database and how it affects disk space
                          requirements, followed by a description of sizing guidelines for NOM.


                          NOM database
                          The Sybase database used by NOM is similar to that used by NetBackup and is
                          installed as part of the NOM server installation.
                          ■       Once you configure and add master servers to NOM, the disk space occupied
                                  by NOM depends on the volume of data initially loaded on the NOM server
                                  from the managed NetBackup servers.
                                  The initial data load on the NOM server is in turn dependent on the
                                  following data in the managed master servers:
                                  ■   Number of policy data records
                                  ■   Number of job data records
                                  ■   Number of media data records
                          ■       The rate of NOM database growth depends on the quantity of managed data.
                                  This data can be policy data, job data, or media data.



For optimal performance and scalability, it is recommended that you manage
approximately a month of historical data.
To adjust database values for better NOM performance, see the topics under
“NetBackup Operations Manager (NOM)” on page 137.


Sizing guidelines
The following guidelines are presented in groups based on the number of objects
that your NOM server manages. The guidelines are intended for basic planning
purposes, and do not represent fixed recommendations or restrictions.
It is assumed that your NOM server is a standalone host (the host is not acting as
a NetBackup master server).


Note: Installation of NOM server software on the NetBackup master server is not
recommended.



Sizing Guidelines for NOM 6.0 MP5
If you are installing NOM 6.0 MP5, see Table 1-8 to choose the NetBackup
installation category that matches your site. Each category is based on the
number of master servers that NOM server manages, number of jobs per day
across all master servers, and so forth. Based on your NetBackup installation
category, you can determine the minimum hardware requirements for installing
NOM 6.0 MP5.

Table 1-8          NetBackup installation categories

NetBackup      Maximum   Maximum    Maximum       Maximum    Maximum   Maximum
installation   master    jobs per   jobs in the   policies   alerts    media
category       servers   day        database

A              3         1000       100000        5000       1000      10000

B              10        10000      500000        50000      10000     300000

C              40        75000      1000000       50000      200000    300000



Note: If your installation is larger than those listed here (regarding number of
NetBackup master servers, number of jobs per day, and so forth), NOM behavior
is unpredictable. In that case, Symantec recommends using multiple NOM
servers.



                          With your NetBackup installation category (A, B, or C), use Table 1-9 to find the
                          minimum hardware requirements and recommended settings for NOM 6.0 MP5.
   Table 1-9         Minimum hardware requirements and recommended settings for NOM 6.0 MP5

      NetBackup OS                 CPU type      Number RAM Average       Recommended Recommended
      installation                               of CPUs    database      cache size for heap size (Web
      category                                              growth rate   Sybase         server and NOM
                                                            per day                      Server)

      A             Windows        Pentium III     1    2 GB     3 MB        512 MB           512 MB
                    2000/2003      or higher/
                                   Xeon

                    Solaris        Sun SPARC       1    2 GB     3 MB        512 MB           512 MB
                    8/9/10

      B             Windows        Pentium III     2    4 GB     30 MB         1 GB             1 GB
                    2000/2003      or higher/
                                   Xeon

                    Solaris        Sun SPARC       2    4 GB     30 MB         1 GB             1 GB
                    8/9/10

      C             Windows        Pentium III     4    8 GB    225 MB         2 GB             2 GB
                    2000/2003      or higher/
                                   Xeon

                    Solaris        Sun SPARC       4    8 GB    225 MB         2 GB             2 GB
                    8/9/10


                          For example, if your NetBackup setup falls in installation category B (Windows
                          environment), your NOM system must meet the following minimum hardware
                          requirements:
                          ■       CPU Type: Pentium III or higher, Xeon
                          ■       Number of CPUs required: 2
                          ■       RAM: 4 GB
                          For optimal performance, the average database growth rate per day on your
                          NOM system should be 30 MB per day or lower. The recommended cache size for
                          Sybase is 1 GB. The recommended heap size for the NOM server and Web server
                          is 1 GB.
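                          For planning purposes, the table lookup can be expressed as a short
                          Python sketch; the dictionary simply restates Table 1-9's minimums
                          and is not a NetBackup interface:

                              NOM_MINIMUMS = {
                                  # category: (CPUs, RAM in GB, Sybase cache, heap size)
                                  "A": (1, 2, "512 MB", "512 MB"),
                                  "B": (2, 4, "1 GB", "1 GB"),
                                  "C": (4, 8, "2 GB", "2 GB"),
                              }
                              cpus, ram_gb, cache, heap = NOM_MINIMUMS["B"]
                              print(cpus, ram_gb, cache, heap)  # 2 CPUs, 4 GB, 1 GB, 1 GB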
                          To adjust the heap size for NOM server and NOM web server, see “Adjusting the
                          NOM server heap size” on page 137 and “Adjusting the NOM web server heap
                          size” on page 138.



          Symantec recommends that you adjust the Sybase cache size after installing
          NOM. After you install NOM, the database size can grow rapidly as you add more
          master servers.
          See “Adjusting the Sybase cache size” on page 138.


Summary
          Using the guidelines provided in this chapter, design a solution that can do a full
          backup and incremental backups of your largest system within your time
          window. The remainder of the backups can happen over successive days.
          Eventually, your site may outgrow its initial backup solution. By following these
          guidelines, you can add more capacity at a future date without having to
          redesign your basic strategy. With proper design and planning, you can create a
          backup strategy that will grow with your environment.
           As outlined in the previous sections, the number and location of the backup
           devices depend on several factors:
           ■   The amount of data on the target systems
           ■   The available backup and restore windows
           ■   The available network bandwidth
           ■   The speed of the backup devices
           If a single drive causes backup window conflicts, a second can be added,
           providing the aggregate throughput of two drives. The trade-off is that the
           second drive imposes extra CPU, memory, and I/O load on the media server.
          If you find that you cannot complete backups in the allocated window, one
          approach is to either increase your backup window or decrease the frequency of
          your full and incremental backups.
          Another approach is to reconfigure your site to speed up overall backup
          performance. Before you make any such change, you should understand what
          determines your current backup performance. List or diagram your site network
          and systems configuration. Note the maximum data transfer rates for all the
          components of your backup configuration and compare these against the rate
          you must meet for your backup window. This will identify the slowest
          components and, consequently, the cause of your bottlenecks. Some likely areas
          for bottlenecks include the networks, tape drives, client OS load, and filesystem
          fragmentation.




   Questionnaire for capacity planning
                              Use the following questionnaire to fill in information about the characteristics
                              of your systems and how they will be used. This data can help determine your
                              NetBackup client configurations and backup requirements.

   Table 1-10             Backup questionnaire

    Question                   Explanation

     System name                Any unique name that identifies the system, such as its hostname.

    Vendor                     The hardware vendor who made the system (for example, Sun, HP, IBM, generic PC)

    Model                      For example: Sun E450, HP K580, Pentium II 300MHZ, HP Proliant 8500

    OS version                 For example: Solaris 9, HP-UX 11i, Windows 2000 DataCenter

    Building / location        Identify physical location by room, building, and/or campus.

    Total storage              Total available internal and external storage capacity.

    Used storage               Total used internal and external storage capacity - if the amount of data to be backed up
                               is substantially different from the amount used, please note that.

    Type of external array     For example: Hitachi, EMC, EMC CLARiiON, STK.

    Network connection         For example, 10/100MB, Gigabit, T1. It is important to know if the LAN is a switched
                               network or not.

    Database (DB)              For example, Oracle 8.1.6, SQLServer 7.

    Hot backup required?       If so, this requires the optional database agent if backing up a database.

    Key application            For example: Exchange server, accounting system, software developer's code repository,
                               NetBackup critical policies.

    Backup window              For example: incrementals run M-F from 11PM to 6AM, Fulls are all day Sunday. This
                               information helps determine where potential bottlenecks will be and how to configure a
                               solution.

    Retention policy           For example: incrementals for 2 weeks, full backups for 13 weeks. This information will
                               help determine how to size the number of slots needed in a library.

    Existing backup media      Type of media currently used for backups.

    Comments?                  Any special situations to be aware of? Any significant patches on the operating system?
                               Will the backups be over a WAN? Do the backups need to go through a firewall?
Chapter 2
Master server configuration guidelines
      This chapter provides guidelines and recommendations for better performance
      on the NetBackup master server.
      This chapter includes the following sections:
      ■   “Managing NetBackup job scheduling” on page 40
      ■   “Miscellaneous considerations” on page 44
      ■   “Merging/splitting/moving servers” on page 48
      ■   “Guidelines for policies” on page 49
      ■   “Managing logs” on page 50




   Managing NetBackup job scheduling
                            This section discusses issues related to NetBackup job scheduling.


   Delays in starting jobs
                            The NetBackup Policy Execution Manager (nbpem) may not begin a backup at
                            exactly the time a backup policy's schedule window opens. This can happen
                            when you define a schedule or modify an existing schedule with a window start
                            time close to the current time.
                            For instance, suppose you create a schedule at 5:50 PM, specifying that backups
                            should start at 6:00 PM. You complete the policy definition at 5:55 PM. At 6:00
                            PM, you expect to see a backup job for the policy start, but it does not. Instead,
                            the job takes another several minutes to start.
                            The explanation is that NetBackup receives and queues policy change events as
                            they happen, but processes them periodically as configured in the Policy Update
                            Interval setting under Host Properties > Master Server > Properties > Global
                            Settings (the default is 10 minutes). The backup does not start until the first
                            time NetBackup processes policy changes after the policy definition is
                            completed at 5:55 PM. NetBackup may not process the changes until 6:05 PM.
                            For each policy change, NetBackup determines what needs to be done and
                            updates its work list accordingly.
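
                             For illustration, the following Python sketch computes the worst-case
                             start delay described above (the dates are arbitrary; 10 minutes is
                             the default Policy Update Interval):

                                 from datetime import datetime, timedelta

                                 policy_update_interval = timedelta(minutes=10)  # default Global Setting
                                 policy_saved = datetime(2007, 1, 1, 17, 55)     # policy completed at 5:55 PM
                                 window_opens = datetime(2007, 1, 1, 18, 0)      # schedule starts at 6:00 PM

                                 # Worst case: the job starts one interval after the policy change.
                                 latest_start = max(window_opens, policy_saved + policy_update_interval)
                                 print(latest_start.strftime("%I:%M %p"))        # 06:05 PM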


   Delays in running queued jobs
                            If jobs remain in the queue and only one job runs at a time, make sure the
                            following attributes are set to allow jobs to run simultaneously:
                            ■    Host Properties > Master Server > Properties > Global Attributes >
                                 Maximum jobs per client (should be greater than 1).
                            ■    Host Properties > Master Server > Properties > Client Attributes setting for
                                 Maximum data streams (should be greater than 1).
                            ■    Policy attribute Limit jobs per policy (should be greater than 1).
                            ■    Policy schedule attribute Media multiplexing (should be greater than 1).
                            ■    Check the storage unit properties:
                                 ■    Is the storage unit enabled to use multiple drives (Maximum concurrent
                                      write drives)? If you want to increase this value, remember to set it to
                                      fewer than the number of drives available to this storage unit.
                                      Otherwise, restores and other non-backup activities will not be able to
                                      run while backups to the storage unit are running.



                   ■    Is the storage unit enabled for multiplexing (Maximum streams per
                        drive)? You can write a maximum of 32 jobs to one tape at the same
                        time.


Job delays caused by unavailable media
                If the media in a storage unit are not configured or are unusable (for
                example, the media are expired, have exceeded the maximum-mounts setting, or
                belong to the wrong volume pool), the job fails if no other storage units are
                usable. If media are unavailable, new media must be added, or the media
                configuration must be changed to make media available (for example, by
                changing the volume pool or the maximum-mounts setting).
               If the media in a storage unit are usable but are currently busy, the job will be
               queued. The NetBackup Activity Monitor should display the reason for the job
               queuing, such as “media are in use.” If the media are in use, the media will
               eventually stop being used and the job will run.


Delays after removing a media server
               A job may be queued by the NetBackup Job Manager (nbjm) if the media server is
               not available. This is not because of communication time-outs, but because EMM
               knows the media server is down and the NetBackup Resource Broker (nbrb)
               queues the request to be retried later.
               If a media server is configured in EMM but has been physically removed,
               powered off, or disconnected from the network, or if the network is down for any
               reason, the media and device selection logic of EMM will queue the job if no
               other media servers are available. The Activity Monitor should display the
               reason for the job queuing, such as “media server is offline.” Once the media
               server is online again in EMM, the job will start. In the meantime, if other media
               servers are available, the job will run on another media server.
               If a media server is not configured in EMM (removed from the configuration),
               regardless of the physical state of the media server, EMM will not select that
               media server for use. If no other media servers are available, the job will fail.


Limiting factors for job scheduling
               For every backup submitted, there may be one bprd process for the duration of
               the job. When many requests are submitted to NetBackup simultaneously,
               NetBackup will increase its use of memory and may eventually impact the
               overall performance of the system. This type of performance degradation is
               associated with the way a given operating system handles memory requests. It
               may affect the functioning of all applications running on the system in question,
               not just NetBackup.




                            Note: The Activity Monitor may not update if there are thousands of jobs to
                            view. If this happens, you may need to change the memory setting using the
                            NetBackup Java command jnbSA with the -mx option. Refer to the
                            “INITIAL_MEMORY, MAX_MEMORY” subsection in the NetBackup System
                            Administrator’s Guide for UNIX and Linux, Volume I. Note that this situation
                            does not affect NetBackup's ability to continue running jobs.



   Adjusting the server’s network connection options
                            When running many simultaneous jobs, the CPU utilization of the master server
                            may become very high. To reduce utilization and improve performance, adjust
                             the network connection options for the local machine on the Host Properties >
                             Master Server > Master Server Properties > Firewall display in the NetBackup
                             Administration Console, or add the following bp.conf entry on the UNIX
                             master server:
                                 CONNECT_OPTIONS = localhost 1 0 2
                            For an explanation of the CONNECT_OPTIONS values, refer to the NetBackup
                            System Administrator’s Guide for UNIX and Linux, Volume II.




                            The NetBackup Troubleshooting Guide also provides information on network
                            connectivity issues.



Using NOM to monitor jobs
               NetBackup Operations Manager (NOM) can be used to monitor the performance
              of NetBackup jobs. NOM can also manage and monitor dozens of NetBackup
              installations spread across multiple locations. Some of the features provided by
              NOM are the following:
              ■   Web-based interface for efficient, remote administration across multiple
                  NetBackup servers from a single, centralized console.
              ■   Policy-based alert notification, using predefined alert conditions to specify
                  typical issues or thresholds within NetBackup.
              ■   Operational reporting, on issues such as backup performance, media
                  utilization, and rates of job success.
              ■   Consolidated job and job policy views per server (or group of servers), for
                  filtering and sorting job activity.
              For more information on the capabilities of NOM, click Help from the title bar of
              the NOM console. Or see the NetBackup Operations Manager Guide.
              To design your NOM server and view the NOM sizing guidelines, see “Design
              your NOM server” on page 33.
              Information is also available on adjusting NOM performance. See the topics
              under “NetBackup Operations Manager (NOM)” on page 137.


Disaster recovery testing and job scheduling
              The following techniques may help in your disaster recovery testing.
              ■   Prevent the expiration of empty media.
                  a    Go to the following directory:
                       UNIX
                                cd /usr/openv/netbackup/bin
                       Windows
                                install_path\NetBackup\bin
                  b    Enter the following:
                                mkdir bpsched.d
                                cd bpsched.d
                                echo 0 > CHECK_EXPIRED_MEDIA_INTERVAL
              ■   Prevent the expiration of images
                  a    Go to the following directory:
                       UNIX
                                cd /usr/openv/netbackup
                       Windows
                                cd install_path\NetBackup



                                 b    Enter the following:
                                      UNIX
                                               touch NOexpire
                                      Windows
                                               echo 0 > NOexpire
                            ■    Prevent backups from starting by shutting down bprd (NetBackup Request
                                 Manager). This will suspend scheduling of new jobs by nbpem. To shut down
                                 bprd, you can use the Activity Monitor in the NetBackup Administration
                                 Console.
                                 Restart bprd to resume scheduling.



   Miscellaneous considerations
                            Consider the following issues when planning for or troubleshooting NetBackup.


   Processing of storage units
                             NetBackup storage units are processed in alphabetical order. Because of this,
                             you can influence which storage units are selected, and therefore which media
                             servers are used, through the names you give each storage unit. You can also
                             gain some control over load balancing by using storage unit groups.
                            Storage unit groups contain a list of storage units that are available for that
                            policy to use. A storage unit group can be configured to use storage units in any
                            of three ways, in the New Storage Unit Group dialog of the NetBackup
                            Administration Console.
                            ■    Use storage units in the order in which they are listed in the group.
                            ■    Choose the least recently selected storage unit in the group.
                            ■    Configure the storage unit group as a failover group. This means the first
                                 storage unit in the group will be the only storage unit used. If the storage
                                 unit is busy, then backups will queue. The second storage unit will only be
                                 used if the first storage unit is down.


   Disk staging
                            With disk staging, images can be created on disk initially, then copied later to
                            another media type (as determined in the disk staging schedule). The media type
                            for the final destination is typically tape, but could be disk. This two-stage
                            process leverages the advantages of disk-based backups in the near term, while
                            preserving the advantages of tape-based backups for long term.
                            Note that disk staging can be used to increase backup speed. For more
                            information, refer to the NetBackup System Administrator’s Guide, Volume I.



File system capacity
              There must be ample file system space for NetBackup to record its logging
              and/or catalog entries on each master server, media server, and client. If logging
              or catalog entries should exhaust available file system space, NetBackup will
              cease to function. Having the ability to increase the size of the file system via
              volume management is recommended. The disk containing the NetBackup
              master catalog should be protected with mirroring or RAID hardware or
              software technology.



NetBackup catalog strategies
              The NetBackup catalog resides on the disk of the NetBackup master server. The
              catalog consists of the following parts:
              ■   Image database: The image database contains information about what has
                  been backed up. It is by far the largest part of the catalog.
              ■   NetBackup data stored in relational databases: This includes the media and
                  volume data describing media usage and volume information which is used
                  during the backups.
              ■   NetBackup configuration files: Policy, schedule and other flat files used by
                  NetBackup.
              For more information on the catalog, refer to “Catalog Maintenance and
              Performance Optimization” in the NetBackup Administrator's Guide Volume 1.
              The NetBackup catalogs on the master server tend to grow large over time and
              eventually fail to fit on a single tape. Here is the layout of the first few directory
              levels of the NetBackup catalogs on the master server:



                             Figure 2-3        Directory layout on the master server (text representation)

                             /usr/openv/
                                 db/data/             Relational database files:
                                                      NBDB.db, EMM_DATA.db, EMM_INDEX.db, NBDB.log,
                                                      BMRDB.db, BMR_DATA.db, BMR_INDEX.db, BMRDB.log,
                                                      vxdbms.conf
                                 netbackup/db/        Image database and related catalogs:
                                                      /class, /class_template, /client, /config,
                                                      /error, /failure_history, /jobs, /media,
                                                      /vault, and /images (with subdirectories per
                                                      host: /Master, /Media_server,
                                                      /client_1 ... /client_n)
                                 netbackup/vault/     Vault data
                                 var/                 License key and authentication information
                                 var/global/          Configuration files: server.conf, databases.conf

   Catalog backup types
                            In addition to the existing cold catalog backups (which require that no jobs be
                            running), NetBackup 6.0 introduces online “hot” catalog backups. These hot
                            catalog backups can be performed while other jobs are running.


                            Note: For NetBackup release 6.0 and beyond, it is recommended that you use
                            schedule-based, incremental hot catalog backups with periodic full backups as
                            your preferred catalog backup method.



   Guidelines for managing the catalog
                            ■       NetBackup catalog pathnames (cold catalog backups)
                  When defining the file list, use absolute pathnames for the NetBackup
                  and Media Manager catalog locations, and include the server name in
                  each path in case the media server that performs the backup changes.
              ■   Back up the catalog using online, hot catalog backup
                  This type of catalog backup is for highly active NetBackup environments in
                  which continual backup activity is occurring. It is considered an online, hot
                  method because it can be performed while regular backup activity is taking
                  place. This type of catalog backup is policy-based and can span more than one tape.
                  It also allows for incremental backups, which can significantly reduce
                  catalog backup times for large catalogs.
              ■   Store the catalog on a separate file system
                  The NetBackup catalog can grow quickly depending on backup frequency,
                  retention periods, and the number of files being backed up. Storing the
                  NetBackup catalog on its own file system ensures that catalog growth
                  does not impact other disk resources, root file systems, or the
                  operating system. For information on how to move the catalog, refer to
                  “Catalog compression” on page 48.
              ■   Change the location of the NetBackup relational database files
                  The location of the NetBackup relational database files can be changed
                  and/or split into multiple directories for better performance. For example,
                  by placing the transaction log file, NBDB.log, on a physically separate drive,
                  you gain better protection against disk failure and increased efficiency in
                  writing to the log file. Refer to the procedure in the section “Moving NBDB
                  Database Files After Installation” in the “NetBackup Relational Database”
                  appendix of the NetBackup System Administrator’s Guide, Volume I.
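                  That procedure is built around the nbdb_move utility. As a minimal
                  sketch only, assuming the UNIX default installation paths and a
                  hypothetical directory /logdisk/nbdb on a physically separate drive
                  (verify the exact options in the NetBackup Commands Guide for your
                  release), the transaction log could be relocated as follows:
                      /usr/openv/db/bin/nbdb_move -tlog /logdisk/nbdb   # /logdisk/nbdb is a hypothetical target directory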
              ■   Delay to compress catalog
                  The default value for this parameter is 0, which means that NetBackup does
                  not compress the catalog. As your catalog increases in size, you may want to
                  use a value between 10 and 30 days for this parameter. When you restore
                  old backups, which requires looking at catalog files that have been
                  compressed, NetBackup automatically uncompresses the files as needed,
                  with minimal performance impact. For information on how to compress the
                  catalog, refer to “Catalog compression” on page 48.


Catalog backup not finishing in the available window
              If your cold catalog backups are not finishing in the backup window, or hot
              catalog backups are running a long time, here are some possible solutions:
              ■   Use catalog archiving. Catalog archiving reduces the size of online catalog
                  data by relocating the large catalog .f files to secondary storage. NetBackup
                                 administration will continue to require regularly scheduled catalog backups,
                                 but without the large amount of online catalog data, the backups will be
                                 faster.
                             ■    Offload some policies, clients, and backup images from the current master
                                 server to a new, additional master, so that each master has a window large
                                 enough to allow its catalog backup to finish. Since a media server can be
                                 connected to one master server only, additional media servers may be
                                 needed. For assistance in adding another master server to lighten the
                                 workload of the existing master, contact Symantec Consulting.
                            ■    Determine whether most of the catalog backup time is being used in
                                 expiring backup images. If this is the case, make sure the master's primary
                                 DNS server is available by running nslookup. The command should respond
                                  quickly. Also, investigate whether any media servers no longer exist.
                                  If such media servers were not removed from the NetBackup
                                  configuration correctly, the image cleanup operation repeatedly times
                                  out while trying to expire fragments on them.
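                                  For example (the media server name below is hypothetical), the
                                  following lookup should return an answer almost immediately; a slow
                                  response points to a DNS problem:
                                       nslookup mediaserver1.example.com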


   Catalog compression
                            When the NetBackup image catalog becomes too large for the available disk
                            space, there are two ways to manage this situation:
                            ■    Compress the image catalog
                            ■    Move the image catalog.
                            For details, refer to “Moving the Image Catalog” and “Compressing and
                            Uncompressing the Image Catalog” in the NetBackup System Administrator’s
                            Guide, Volume I.
                             Note that NetBackup compresses images after each backup session,
                             regardless of whether any backups succeeded. Compression occurs just
                             before execution of the session_notify script and the backup of the
                             catalog, and the backup session is extended until compression completes.



   Merging/splitting/moving servers
                            A master server schedules and maintains backup information for a given set of
                            systems. The Enterprise Media Manager (EMM) server and its database maintain
                            centralized device and media related information used on all servers that are
                            part of the configuration. By default, the EMM server and the NetBackup
                            Relational Database (NBDB) that contains the EMM data are located on the
                            master server. A large and dynamic data center can expect to periodically
                            reconfigure the number and organization of its backup servers.
               Centralized management, reporting, and maintenance are the benefits of
               working in a centralized NetBackup environment. Once a master server has
               been established, it is possible to merge its databases with another master
               server, giving control over its set of server backups to the new master server.
               Conversely, if the backup load on a master server has grown to the point where
               backups are not finishing in the backup window, it may be desirable to split that
               master server into two master servers.
               It is possible to merge or split NetBackup master servers or EMM servers. It is
               also possible to convert a media server to a master server or a master server to a
               media server. However, the procedures to accomplish this are complex and
               require a detailed knowledge of NetBackup database interactions. Merging or
               splitting NetBackup, Media Manager and EMM databases to another server is
               not recommended without involving a Symantec consultant to determine the
               changes needed, based on your specific configuration and requirements.


Moving the EMM server
               The EMM server can be moved from the master server to another server, to
               create a “remote” EMM server. In some cases, moving the EMM server off the
               master may improve capacity and performance for the NetBackup master server
               and for the EMM server. For assistance, refer to “Moving the NetBackup
               Database from One Host to Another” in Appendix A of the NetBackup System
               Administrator's Guide, Volume I.



Guidelines for policies
               The following items may have performance implications.


Include and exclude lists
               ■   Do not use excessive wildcards in file lists.
                   When wildcards are used, NetBackup compares every file name against
                   the wildcards, which decreases NetBackup performance. Instead of
                   placing /tmp/* (UNIX) or C:\Temp\* (Windows) in an include or exclude
                   list, use /tmp/ or C:\Temp.
               ■   Use exclude lists to omit large, unneeded files.
                   Reduce the size of your backups by using exclude lists for the files your
                   installation does not need to preserve. For instance, you may decide to
                   exclude temporary files. Use absolute paths for your exclude list entries, so
                   that valuable files are not inadvertently excluded. Before adding files to the
                   exclude list, confirm with the affected users that their files can be safely
                                 excluded. Should disaster (or user error) strike, not being able to recover
                                 files costs much more than backing up extra data.
                                 When a policy specifies that all local drives be backed up
                                 (ALL_LOCAL_DRIVES), nbpem initiates a parent job (nbgenjob) that
                                 connects to the client and runs bpmount -i to get a list of mount points.
                                  Then nbpem initiates a job with its own unique job identification
                                  number for each mount point, and the client's bpbkar process starts a
                                  stream for each job. Only then does NetBackup read the exclude list.
                                  When the entire
                                 job is excluded, bpbkar exits with a status 0, stating that it sent 0 of 0 files
                                 to backup. The resulting image files are treated just as any other successful
                                 backup's image files. They expire in the normal fashion when the expiration
                                 date in the image header files specifies they are to expire.
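                                  As an illustration, on a UNIX client the exclude list is the flat file
                                  /usr/openv/netbackup/exclude_list, with one entry per line. The
                                  entries below are examples only; choose absolute paths appropriate to
                                  your site:
                                       /tmp/
                                       /var/tmp/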


   Critical policies
                            For online, hot catalog backups (a new feature in NetBackup 6.0), make sure to
                            identify those policies that are crucial to recovering your site in the event of a
                            disaster. For more information on hot catalog backup and critical policies, refer
                            to the NetBackup System Administrator’s Guide, Volume I.


   Schedule frequency
                            To minimize the number of times you back up files that have not changed,
                            and to minimize your consumption of bandwidth, media, and other
                            resources, consider limiting full backups to monthly or even quarterly,
                            supplemented by weekly cumulative incremental backups and daily
                            incremental backups.



   Managing logs
   Optimizing the performance of vxlogview
                            As explained in the NetBackup Troubleshooting Guide, the vxlogview command
                            is used for viewing logs created by unified logging (VxUL). The vxlogview
                            command will deliver optimum performance when a file ID is specified in the
                            query.
                            For example, when viewing messages logged by the NetBackup Resource Broker
                            (nbrb) for a given day, you can filter out the library messages while viewing the
                            nbrb logs. To achieve this, run vxlogview as follows:
                                 vxlogview -o nbrb -i nbrb -n 0
                            Note that -i nbrb specifies the file ID for nbrb. Specifying the file ID improves
                            the performance, because the search is confined to a smaller set of files.
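                            To narrow the search further, vxlogview also accepts a begin and end time
                            through its -b and -e options. The timestamps below are illustrative only;
                            check the vxlogview description in the NetBackup Commands Guide for the
                            exact date format on your system:
                                 vxlogview -o nbrb -i nbrb -n 0 -b "11/27/2007 00:00:00" -e "11/27/2007 23:59:59"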
                                                                         Master server configuration guidelines   51
                                                                                                Managing logs



Interpreting legacy error logs
                 This section describes the fields in the legacy log files written to the
                 /usr/openv/netbackup/db/error directory on UNIX (the
                 install_path\NetBackup\db\error folder on Windows). On UNIX, there is
                 a link to the most current file in the error directory; the link is called
                 daily_messages.log. Note that the information in these logs provides the
                 basis for the NetBackup ALL LOG ENTRIES report. For more information on
                 legacy logging and unified logging (VxUL), refer to the NetBackup
                 Troubleshooting Guide.
                 Here is a sample message from an error log:
                 1021419793 1 2 4 nabob 0 0 0 *NULL* bpjobd TERMINATED bpjobd
                 The meaning of the various fields in this message (the fields are delimited by
                 blanks) is defined in Table 2-11. Table 2-12 lists the values for the message type,
                 which is the third field in the log message.

Table 2-11   Meaning of daily_messages log fields

Field              Definition                                        Value

1                  Time this event occurred (ctime)                  1021419793 (= number of seconds since
                                                                     1970)

2                  Error database entry version                      1

3                  Type of message                                   2

4                  Severity of error:                                4
                   1: Unknown
                   2: Debug
                   4: Informational
                   8: Warning
                   16: Error
                   32: Critical

5                  Server on which error was reported                nabob

6                  Job ID (included if pertinent to the log entry)   0

7                  (optional entry)                                  0

8                  (optional entry)                                  0

9                  Client on which error occurred, if applicable,    *NULL*
                   otherwise *NULL*

10                 Process which generated the error message         bpjobd
11                 Text of error message                            TERMINATED bpjobd




                            Table 2-12         Message types

                             Type Value                   Definition of this Message Type

                             1                            Unknown

                             2                            General

                             4                            Backup

                             8                            Archive

                             16                           Retrieve

                             32                           Security

                             64                           Backup status

                             128                          Media device
Chapter 3
Media server configuration guidelines
      This chapter provides configuration guidelines for the media server along with
      related background information.
      This chapter includes the following sections:
      ■   “Network and SCSI/FC bus bandwidth” on page 54
      ■   “How to change the threshold for media errors” on page 54
      ■   “How to reload the st driver without rebooting Solaris” on page 57
      ■   “Media Manager drive selection” on page 58
      ■   “Robot types and NetBackup port configuration” on page 58

   Network and SCSI/FC bus bandwidth
                            Configure no more than two high-performance tape drives per
                            SCSI/fibre-channel connection. A SCSI/fibre-channel configuration should be
                             able to handle both drives at maximum rated compression. Tape drive wear and
                             tear is much reduced, and efficiency is increased, if the data stream is
                             sustained at a rate that matches the tape drive's capacity.


                            Note: Make sure that both your inbound network connection and your SCSI/FC
                            bus have enough bandwidth to feed all of your tape drives.

                             Example:
                                iSCSI bus (360 GB/hour)
                                Two LTO gen 3 drives, each rated at approximately 300 GB/hour (2:1
                                compression)
                             In this example, the two tape drives together require about 600 GB/hour,
                             well above the 360 GB/hour the iSCSI bus can deliver, so only one tape
                             drive will stream in this configuration. The solution is to add a second
                             iSCSI bus, or to move to a connection that is fast enough to feed data
                             efficiently to both tape drives.



   How to change the threshold for media errors
                            Some backup failures can occur because there is no media available. If you see
                            this kind of error, you can execute the following script and then run the
                            NetBackup Media List report to check the status of media:
                            UNIX
                                 /usr/openv/netbackup/bin/goodies/available_media
                            Windows
                                 install_path\NetBackup\bin\goodies\available_media
                            The NetBackup Media List report may show that some media is frozen and
                            therefore cannot be used for backups.
                             One reason NetBackup freezes media is recurring I/O errors. The NetBackup
                             Troubleshooting Guide describes the recommended approaches to this issue,
                             for example, under NetBackup error code 96. It is also possible to
                             configure the NetBackup error threshold value, as described in this
                             section.
                            Each time a read, write, or position error occurs, NetBackup records the time,
                            media ID, type of error, and drive index in the EMM database. Then NetBackup
                            scans to see whether that media has had “m” of the same errors within the past
                            “n” hours. The variable “m” is a tunable parameter known as
                            media_error_threshold. The default value of media_error_threshold is 2 errors.
              The variable “n” is known as time_window. The default value of time_window is
              12 hours. If a tape volume has more than media_error_threshold errors,
              NetBackup will take the appropriate action:
              ■   If the volume has not been previously assigned for backups, then
                  NetBackup will:
                  ■   set the volume status to FROZEN
                  ■   select a different volume
                  ■   log an error
              ■   If the volume is in the NetBackup media catalog and has been previously
                  selected for backups, then NetBackup will:
                  ■   set the volume to SUSPENDED
                  ■   abort the current backup
                  ■   log an error


Adjusting media_error_threshold
              To configure the NetBackup media error thresholds, use the nbemmcmd
              command on the media server as follows. NetBackup freezes a tape volume or
              downs a drive for which these values are exceeded. For more detail on the
              nbemmcmd command, refer to the man page or to the NetBackup Commands
              Guide.
              UNIX
                  /usr/openv/netbackup/bin/admincmd/nbemmcmd -changesetting
                  -time_window unsigned integer -machinename string
                  -media_error_threshold unsigned integer -drive_error_threshold
                  unsigned integer
              Windows
                  <install_path>\NetBackup\bin\admincmd\nbemmcmd.exe
                  -changesetting -time_window unsigned integer -machinename string
                  -media_error_threshold unsigned integer -drive_error_threshold
                  unsigned integer
              For example, if the -drive_error_threshold is set to the default value of 2,
              the drive is downed after 3 errors in 12 hours. If the
              -drive_error_threshold is set to a value of 6, it would take 7 errors in the
              same 12 hour period before the drive would be downed.
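               For example, to allow up to 5 media errors within a 24-hour window on a
               media server named mserver1 (the host name and values are illustrative,
               and the time window is assumed to be expressed in hours):
                   /usr/openv/netbackup/bin/admincmd/nbemmcmd -changesetting -machinename mserver1 -time_window 24 -media_error_threshold 5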

                            Note: The following description has nothing to do with the number of times
                            NetBackup retries a backup/restore that fails. That situation is controlled by the
                            global configuration parameter “Backup Tries” for backups and the bp.conf
                            entry RESTORE_RETRIES for restores. This algorithm merely deals with
                            whether I/O errors on tape should cause media to be frozen or drives to be
                            downed.

                            When a read/write/position error occurs on tape, the error returned by the
                            operating system does not distinguish between whether the error is caused by
                            the tape or the drive. To prevent the failure of all backups in a given timeframe,
                            bptm tries to identify a bad tape volume or drive based on past history, using the
                            following logic:
                            ■    Each time an I/O error occurs on a read/write/position, bptm logs the error
                                 in the file /usr/openv/netbackup/db/media/errors (UNIX) or
                                 install_path\NetBackup\db\media\errors (Windows). The error
                                 message includes the time of the error, media ID, drive index and type of
                                 error.
                                  Example entries from this file:
                                 07/21/96 04:15:17 A00167 4 WRITE_ERROR
                                 07/26/96 12:37:47 A00168 4 READ_ERROR
                            ■    Each time an entry is made, the past entries are scanned to determine if the
                                 same media ID and/or drive has had this type of error in the past “n” hours.
                                 “n” is known as the time_window. The default time window is 12 hours.
                                 When performing the history search for the time_window entries, EMM
                                 notes past errors that match the media ID, the drive, or both the drive and
                                 the media ID. The purpose of this is to determine the cause of the error. For
                                 example, if a given media ID gets write errors on more than one drive, it is
                                 assumed that the tape volume is bad and NetBackup freezes the volume. If
                                 more than one media ID gets a particular error on the same drive, it is
                                 assumed the drive is bad and the drive goes to a “down” state. If only past
                                 errors are found on the same drive with the same media ID, then EMM
                                 assumes that the volume is bad and freezes it.
                            ■    Freezing or downing does not occur on the first error. There are two other
                                 parameters, media_error_threshold and drive_error_threshold. The default
                                 value of both of these parameters is 2. For a “freeze” or “down” to happen,
                                 more than the threshold number of errors must occur (by default, at least
                                 three errors must occur) in the time window for the same drive/media ID.
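                                  To review this raw history yourself, for example for a particular
                                  media ID (A00167 is taken from the sample entries above), the UNIX
                                  errors file can simply be searched:
                                       grep A00167 /usr/openv/netbackup/db/media/errors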

           Note: If either media_error_threshold or drive_error_threshold is 0, freezing or
           downing occurs the first time any I/O error occurs. media_error_threshold is
           looked at first, so if both values are 0, freezing will override downing. It is not
           recommended that these values be set to 0.

                Changing the default values is not recommended unless there is a good
                reason to do so. One possible change would be to set very large threshold
                values, effectively disabling the mechanism so that a tape is never
                frozen or a drive is never downed.
               Freezing and downing is primarily intended to benefit backups. If read
               errors occur on a restore, freezing media has little effect. NetBackup still
               accesses the tape to perform the restore. In the restore case, downing a bad
               drive may help.



How to reload the st driver without rebooting
Solaris
           The devfsadmd daemon enhances device management in Solaris. This daemon
           is capable of dynamically reconfiguring devices during the boot process and in
           response to kernel event notification.
           The devfsadm located in /usr/sbin is the command form of devfsadmd.
           devfsadm replaces drvconfig (for management of physical device tree
           /devices) and devlinks (for management of logical devices in /dev). devfsadm
           also replaces the commands for specific device class types, such as
           /usr/sbin/tapes.
           Thus, in order to recreate tape devices for NetBackup after changing the
           /kernel/drv/st.conf file without rebooting the server, perform the
           following steps:

           To reload the st driver without rebooting
            1   Shut down the NetBackup and Media Manager daemons.
           2   Obtain the module id for the st driver in kernel:
               /usr/sbin/modinfo | grep SCSI
               The module id is the first field in the line corresponding to the SCSI tape
               driver.
           3   Unload the st driver from the kernel:
               /usr/sbin/modunload -i "module id"
                            4    Use devfsadm to recreate the device nodes in /devices and the device
                                 links in /dev for tape devices by running any one (not all) of the following
                                 commands:
                                      /usr/sbin/devfsadm -i st
                                      /usr/sbin/devfsadm -c tape
                                      /usr/sbin/devfsadm -C -c tape (Use this command to enforce cleanup if
                                          dangling logical links are present in /dev.)
                            5    Reload the st driver:
                                      /usr/sbin/modload st
                            6    Restart the NetBackup and Media Manager daemons.
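                             Steps 2 through 5 can be condensed into a short shell sketch. It assumes
                             that the st driver is the only loaded module whose modinfo description
                             contains "SCSI tape", and that the NetBackup and Media Manager daemons
                             have already been stopped (step 1) and will be restarted afterward
                             (step 6):
                                  MODID=`/usr/sbin/modinfo | awk '/SCSI tape/ {print $1}'`  # module id of st
                                  /usr/sbin/modunload -i "$MODID"      # unload st from the kernel
                                  /usr/sbin/devfsadm -i st             # recreate device nodes and links
                                  /usr/sbin/modload st                 # reload the st driver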



   Media Manager drive selection
                            Once the media and device selection logic (MDS) in the EMM service determines
                            which storage unit to use, MDS attempts to select a drive that matches the
                            storage unit selection criterion, such as media server, robot number, robot type,
                            and density. MDS prefers loaded drives over unloaded drives (a loaded drive
                            removes the overhead of loading a media in a drive). If no loaded drives are
                            available, MDS attempts to select the best usable drive suited for the job. In
                            general, MDS prefers non-shared drives over shared drives, and it attempts to
                            select the least recently used drive.



   Robot types and NetBackup port configuration
                            There is no facility in NetBackup to route ACSLS communications through any
                            server other than the one where the backup or restore is taking place. Unlike the
                            tldd/tldcd design for TLD robotic control which requires one point of robotic
                            control, acsd is a single robotic daemon that runs on each server with ACS
                            drives attached. The same is true for TLM (ADIC SDLC and ADIC DAS controlled
                            libraries) where tlmd runs on each server with TLM drives. Such robot types,
                            which have no single point of robotic control, provide resiliency in case of a
                            NetBackup server failure. However, extra planning may be required to
                            accommodate them given your firewall requirements.
                            ACS has been enhanced to be more firewall friendly. For more information, refer
                            to the “STK Automated Cartridge System (ACS)” appendix of the NetBackup
                            Media Manager System Administrator’s Guide.
Chapter 4
Media configuration guidelines
      This chapter provides guidelines and recommendations for better performance
      with NetBackup media.
      This chapter includes the following sections:
      ■   “Dedicated or shared backup environment” on page 60
      ■   “Pooling” on page 60
      ■   “Disk versus tape” on page 60

   Dedicated or shared backup environment
                          One design decision is whether to make your backup environment dedicated or
                          shared. Dedicated SANs are secure but expensive. Shared environments cost
                          less, but require more work to make them secure. A SAN installation with a
                          database may require the performance of a RAID 1 array. An installation
                          backing up a basic file structure may satisfy its needs with RAID 5 or NAS.



   Pooling
                          Here are some useful conventions for media pools (formerly known as volume
                          pools):
                          ■    Configure a scratch pool for management of scratch tapes. If a scratch pool
                               exists, EMM can move volumes from that pool to other pools that do not
                               have volumes available.
                           ■    Use the available_media script in the goodies directory. You can wrap
                                the available_media report in a script that redirects the report
                                output to a file and emails the file to the administrators daily or
                                weekly (see the sketch after this list). This helps track which tapes
                                are full, frozen, suspended, and so on. A script can also filter the
                                output of the available_media report to generate custom reports.
                               To monitor media, you can also use the NetBackup Operations Manager
                               (NOM). For instance, NOM can be configured to issue an alert if there are
                               fewer than X number of media available, or if more than X% of the media is
                               frozen or suspended.
                          ■    Use the none pool for cleaning tapes.
                          ■    Do not create too many pools. The existence of too many pools causes the
                               library capacity to become fragmented across the pools. Consequently, the
                               library becomes filled with many partially-full tapes.
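                           Here is a minimal sketch of such a wrapper for UNIX. The recipient
                           address is hypothetical, and the script would typically be run daily or
                           weekly from cron:
                                #!/bin/sh
                                # Mail the available_media report to the backup administrators.
                                REPORT=/tmp/available_media.$$
                                /usr/openv/netbackup/bin/goodies/available_media > "$REPORT"
                                mailx -s "NetBackup available media report" backup-admins@example.com < "$REPORT"
                                rm -f "$REPORT"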



   Disk versus tape
                          Disk is becoming more common as a backup medium. Storing backup data on
                          disk generally provides faster restore.
                          Tuning disk-based storage for performance is similar to tuning tape-based
                          storage. The optimal buffer settings for a site can vary according to its
                          configuration. It takes thorough testing to determine these settings.
                          Disk-based backup storage can be useful if you have a lot of incremental backups
                          and the percentage of data change is small. If the volume of data in incremental
copies is insufficient to ensure streaming to tape drives, writing to disk can
speed the backup process and alleviate wear and tear on your tape drives.
Here are some factors to consider when choosing to back up a given dataset to
disk or tape:
■   Short or long retention period
■   Incremental or full backup
■   Intermediate (staging) or long-term storage
■   Delay in recovery time
Here are some benefits of backing up to disk rather than tape:
    ■    No need to multiplex
         Writing to disk does not need to be streamed. This means that
         multiplexing is not necessary.
         Multiplexing is only necessary with tape because the tape must be
         streamed. Multiplexing allows multiple clients and multiple file
         systems to be backed up to the same tape simultaneously, thus
         streaming the drive. However, this functionality slows down the
         restore. (See “Tape streaming” on page 126 for an explanation of
         streaming.)
    ■    Instant access to data
         Most tape drives on the market have a “time to data” of close to two
         minutes. This time includes the amount of time to move the tape from
         its slot, load it into the drive and seek an appropriate place on tape.
          Disk has an effective time to data of zero seconds. To understand the
          significance of eliminating this delay, consider restoring a large file
          system whose backups reside on 30 different tapes: two minutes to load
          and seek each tape, plus comparable time to eject and unload it, adds
          almost two hours to the restore.
    ■    Fewer full backups.
         With tape-based systems, full backups must be done regularly because
         of the instant access to data issue described above. Otherwise, the
         number of tapes required for a restore significantly increases both the
         time to restore and the chance that a single tape will cause the restore
         to fail. Since disk arrays are protected by RAID software, they do not
         have this problem.
Chapter 5
Database backup guidelines
      This chapter gives planning guidelines for database backup.
      This chapter includes the following sections:
      ■   “Introduction” on page 64
      ■   “Considerations for database backups” on page 64

   Introduction
                          Before you create a database, decide how to protect the database against
                          potential failures. Answer the following questions before developing your
                          backup strategy.
                          ■     Is it acceptable to lose any data if a hardware failure damages some of the
                                files that constitute a database?
                           ■     Will you ever need to recover to past points in time?
                          ■     Does the database need to be available at all times (24x7)?
                          For specific information on backing up and restoring your database, refer to the
                          NetBackup administrator’s guide for your database product. In addition, the
                          manufacturer of your database product may provide publications that document
                          backup recommendations and methods.



   Considerations for database backups
                          When planning your database backups, consider the following.
                          ■     Fragmentation and databases
                                Using a smaller fragment size in a backup of a database such as Oracle will
                                not improve backup performance, and may hinder restore performance.
                                Database backups (when not using Advanced Client) are unaffected by
                                fragmentation since there is only one “file” per backup image. There is no
                                advantage in tape positioning with or without fast-locate blocks.
                          ■     Using Advanced Client
                                NetBackup Advanced Client provides snapshot backup technology
                                combined with off-host data movement for local networks and SAN
                                environments. A data snapshot can be created on disk in seconds and then
                                backed up directly to tape. Users can significantly reduce CPU and I/O
                                overhead from application or database servers while eliminating the backup
                                window altogether.
                                 Advanced Client helps reduce the impact on applications that require
                                 continuous (24x7) availability. Advanced Client is available on UNIX and
                                Windows systems, and supports all NetBackup libraries and drives. It can be
                                used with multi-streaming and multiplexing, and with a variety of disk
                                arrays.
Chapter 6
Best practices
       This chapter describes an assortment of best practices, and includes the
       following sections:
       ■   “Best practices: new tape drive technologies” on page 66
       ■   “Best practices: tape drive cleaning” on page 66
       ■   “Best practices: storing tape cartridges” on page 68
       ■   “Best practices: recoverability” on page 68
       ■   “Best practices: naming conventions” on page 71

   Best practices: new tape drive technologies
                             Symantec provides a white paper on best practices for migrating your
                             NetBackup installation to new tape technologies:
                             “Best Practices: Migrating to or Integrating New Tape Drive Technologies in
                             Existing Libraries,” available at www.support.veritas.com.
                              Recent tape drives offer noticeably higher capacity than the previous
                              generation of drives targeted at the open-systems market. Administrators
                              may want to take advantage of these higher-capacity, higher-performance
                              tape drives, but are concerned about integrating them into an existing
                              tape library. The white paper discusses several methods for doing so and
                              the pros and cons of each.



   Best practices: tape drive cleaning
                             This section discusses several ways to clean tape drives. Refer to the NetBackup
                             Media Manager System Administrator’s Guide for details on how to use the
                             methods discussed here.


                             Note: The TapeAlert feature is discussed in detail later in this section.

                             Here are the tape drive cleaning methods that can be used in a NetBackup
                             installation:
                             ■    Frequency-based cleaning
                             ■    On-demand cleaning
                             ■    TapeAlert
                             ■    Robotic cleaning

                             Frequency-based cleaning
                             NetBackup does frequency-based cleaning by tracking the number of hours a
                             drive has been in use. When this time reaches a configurable parameter,
                             NetBackup creates a job that mounts and exercises a cleaning tape. This cleans
                             the drive in a preventive fashion. The advantage of this method is that typically
                             there are no drives unavailable awaiting cleaning. There is also no limitation on
                             platform or robot type. On the downside, cleaning is done more often than
                             necessary. This adds system wear and consumes time that could be used to write
                             to the drive. Another limitation is that this method is hard to tune. When new
                             tapes are used, drive cleaning is needed less frequently; the need for cleaning
                             increases as the tape inventory ages. This increases the amount of tuning
                             administration needed and, consequently, the margin of error.
On-demand cleaning
Refer to the NetBackup Media Manager System Administrator’s Guide for more
information on this topic.

TapeAlert
TapeAlert allows reactive cleaning for most drive types. TapeAlert allows a tape
drive to notify EMM when it needs to be cleaned. EMM then performs the
cleaning. You must have a cleaning tape configured in at least one library slot in
order to utilize this feature. TapeAlert is the recommended cleaning solution if
it can be implemented.
Not all drives, at all firmware levels, support this type of reactive cleaning.
Where reactive cleaning is not supported on a particular drive,
frequency-based cleaning may be substituted. This solution is not vendor or
platform specific. Symantec has not tested specific firmware levels; however,
the drive vendor should be able to confirm that the TapeAlert feature is
supported.
■   How TapeAlert works
    To understand NetBackup's behavior with drive-cleaning TapeAlerts, it is
    important to understand the TapeAlert interface to a drive. The TapeAlert
    interface to a tape drive is via the SCSI bus, based on a Log Sense page,
    which contains 64 alert flags. The conditions that cause a flag to be set and
    cleared are device-specific and are determined by the device vendor.
    The configuration of the Log Sense page is via a Mode Select page. The
    Mode Sense/Select configuration of the TapeAlert interface is compatible
    with the SMART diagnostic standard for disk drives.
    NetBackup reads the TapeAlert Log Sense page at the beginning and end of
    a write/read job. TapeAlert flags 20 to 25 are used for cleaning management
    although some drive vendors’ implementations may vary from this.
    NetBackup uses TapeAlert flag 20 (Clean Now) and TapeAlert flag 21 (Clean
    Periodic) to determine when it needs to clean a drive.
    When a drive is selected by NetBackup for a backup, the Log Sense page is
    reviewed by bptm for status. If one of the clean flags is set, the drive will be
    cleaned before the job starts.
    If a backup is in progress and one of the clean flags is set, the flag is not read
    until a tape is dismounted from the drive.
    If a job spans media and, during the first tape, one of the clean flags is set,
    the cleaning light comes on and the drive will be cleaned before the second
    piece of media is mounted in the drive.
    The implication is that the present job will conclude its ongoing write
    despite a TapeAlert Clean Now or Clean Periodic message. That is, the
    TapeAlert will not require the loss of what has been written to tape so far.
                                  This is true regardless of the number of NetBackup jobs involved in writing
                                  out the rest of the media.
                                  Note that the behavior described here may change in the future.
                                  If a large number of media become FROZEN as a result of having
                                  implemented TapeAlert, there is a strong likelihood of underlying media
                                  and/or tape drive issues.
                             ■    Disabling TapeAlert
                                  To disable TapeAlert, create a touch file called NO_TAPEALERT:
                                  UNIX:
                                      /usr/openv/volmgr/database/NO_TAPEALERT
                                  Windows:
                                      install_path\volmgr\database\NO_TAPEALERT
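                                   For example, on a UNIX media server:
                                        touch /usr/openv/volmgr/database/NO_TAPEALERT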

                             Robotic cleaning
                              Robotic (library-based) cleaning is reactive rather than proactive, and
                              is not subject to the limitations described above. Because cleaning
                              occurs only when needed, unnecessary cleanings are eliminated, frequency
                              tuning is not an issue, and the drive spends more time moving data rather
                              than undergoing maintenance.
                             Library-based cleaning is not supported by EMM for most robots, since robotic
                             library and operating systems vendors have implemented this type of cleaning
                             in many different ways.



   Best practices: storing tape cartridges
                             A couple of issues with tape management are how long and where to store the
                             tapes that you need to keep on site. Typically, a DLT tape can be stored for up to
                             three years. The storage location should be climate-controlled and away from
                             sunlight. In addition, the tapes should always be stored in their plastic boxes.



   Best practices: recoverability
                             Recovering from data loss involves both planning and technology to support
                              your recovery objectives and time frames. Table 6-13 describes how you can use
                             NetBackup and other tools to recover from various mishaps or disaster. The
                             methods and procedures you adopt for your installation should be documented
                             and tested regularly to ensure that your installation can recover from disaster.

   Table 6-13           Methods and procedures for recoverability

    Operational Risk                     Recovery Possible?          Methods and Procedures

    File deleted before backup           No                          None
File deleted after backup           Yes                           Standard NetBackup restore procedures

Backup client failure                Yes                          Data recovery using NetBackup

Media failure                       Yes                           Backup image duplication

Master/media server failure         Yes                           Manual failover to alternate server

Loss of backup database             Yes                           NetBackup database recovery

No NetBackup software               Yes                           If multiplexing was not used, recovery of
                                                                  media without NetBackup, using GNU tar

Complete site disaster              Yes                           Vaulting and off site media storage


                          Additional material may be found in the following books:
                          ■   The Resilient Enterprise, Recovering Information Services from Disasters, by
                              Symantec and industry authors, published by Symantec Software
                              Corporation.
                           ■   Blueprints for High Availability: Designing Resilient Distributed Systems, by
                               Evan Marcus and Hal Stern, published by John Wiley and Sons.
                          ■   Implementing Backup and Recovery: The Readiness Guide for the Enterprise,
                              by David B. Little and David A. Chapa, published by Wiley Technology
                              Publishing.


Suggestions for data recovery planning
                          It is important to have a well-documented and tested plan to recover from a
                          logical error, an operator error, or a site disaster. The following practices have
                          been found effective for recoverability in production environments. Refer also
                          to the NetBackup Troubleshooting Guide and the NetBackup System
                          Administrator's Guide for further information on disaster recovery.
                          ■   Always use a regularly scheduled hot catalog backup
                              Refer to “Catalog Recovery from an Online Backup” in the NetBackup
                              Troubleshooting Guide.
                          ■   Review the disaster recovery plan often
                              Review your site-specific recovery procedures and verify that they are
                              accurate and up-to-date. Also, verify that the more complex systems, such
                              as the NetBackup master and media servers, have current procedures for
                              rebuilding the machines with the latest software.
                             ■      Perform test recoveries on a regular basis
                                    Implement a plan to perform restores of various systems to alternate
                                    locations. This plan should include selecting random production backups
                                    and restoring the data to a non-production system. A checksum can then be
                                    performed on one or many of the restored files and compared to the actual
                                    production data. Be sure to include offsite storage as part of this testing.
                                    The end-user or application administrator can also be involved in
                                    determining the integrity of the restored data.
                             ■      Support NetBackup recoverability:
                                    ■   Back up the NetBackup catalog to two tapes.
                                        The catalog contains information vital for NetBackup recovery. Its loss
                                        could result in hours or days of recovery time through manual
                                        processes. The cost of a single tape is a small price to pay for the added
                                        insurance of rapid recovery in the event of an emergency.
                                    ■   Back up the catalog after each backup.
                                        If a hot catalog backup is used, an incremental catalog backup can be
                                        done after each backup session. Extremely busy backup environments
                                        should also use a scheduled hot catalog backup, since their backup
                                        sessions end infrequently.
                                        In the event of a catastrophic failure, the recovery of images is slowed
                                        by not having all images available. If a manual backup occurs just
                                        before the master server or the drive that contains the backed-up files
                                        crashes, the manual backup must be imported to recover the most
                                        recent version of the files.
                                    ■   Record the IDs of catalog backup tapes.
                                        Record the catalog tapes in the site run book or another public location
                                        to ensure rapid identification in the event of an emergency. If the
                                        catalog tapes are not identified ahead of time, a significant amount of
                                        time may be lost by scanning every tape in a library to find them.
                                        The utility vmphyinv can be used to mount all tapes in a robotic library
                                        and identify the catalog tape(s). The vmphyinv utility will identify cold
                                        catalog tapes.
                                    ■   Designate label prefixes for catalog backups.
                                        Make it easy to identify the NetBackup catalog data in times of
                                        emergency. Label the catalog tapes with a unique prefix such as “DB”
                                        on the tape barcodes, so your operators can find the catalog tapes
                                        without delay.
                                    ■   Place NetBackup catalogs in specific robot slots.
                                        Place a catalog backup tape in the first or last slot of a robot to more
                                        easily identify the tape in an emergency. This also allows for easy tape
                                        movement if manual tape handling is necessary.
                   ■    Put the NetBackup catalog on different online storage than the data
                        being backed up.
                        In the case of a site storage disaster, the catalogs of the backed-up data
                        should not reside on the same disks as production data. The reason
                        behind this is straightforward: you want to avoid the case where, if a
                        disk drive loses production data, it also loses the catalog of the
                        production data, resulting in increased downtime.
                   ■    Regularly confirm the integrity of the NetBackup catalog.
                        On a regular basis, such as quarterly or after major operations or
                        personnel changes, walk through the process of recovering a catalog
                        from tape. This essential part of NetBackup administration can save
                        hours in the event of a catastrophe.



Best practices: naming conventions
               Use a consistent naming convention on all NetBackup master servers. Examples
               are provided below. Use lowercase for all names. In most cases, case does not
               cause problems, but inconsistent case can cause issues when the installation
               includes both UNIX and Windows master and media servers.


Policy names
               One good naming convention for policies is platform_datatype_server(s).
               Example 1: w2k_filesystems_trundle
               This policy name designates a policy for a single Windows server doing file
               system backups.
               Example 2: w2k_sql_servers
               This policy name designates a policy for backing up a set of Windows 2000 SQL
               servers. Several servers may be backed up by this policy. Servers that are
               candidates for being included in a single policy are those running the same
               operating system and with the same backup requirements. Grouping servers
               within a single policy reduces the number of policies and eases the management
               of NetBackup.


Schedule names
               Create a generic scheme for schedule naming. One recommended set of schedule
               names is daily, weekly, and monthly. Another recommended set of names is
               incremental, cumulative, and full. This convention keeps the management of
               NetBackup at a minimum. It also helps with the implementation of Vault, if your
               site uses Vault.



   Storage unit/storage group names
                           A good naming convention for storage units is to name the storage unit after the
                           media server and the type of data being backed up.
                           Two examples: mercury_filesystems and mercury_databases
                           where “mercury” is the name of the media server and “filesystems” and
                           “databases” identify the type of data being backed up.
                Section II


Performance tuning

      Section II explains how to measure your current NetBackup performance, and
      gives general recommendations and examples for tuning NetBackup.
      Section II includes these chapters:
      ■   Measuring performance
      ■   Tuning the NetBackup data transfer path
      ■   Tuning other NetBackup components
      ■   Tuning disk I/O performance
      ■   OS-related tuning factors
      ■   Additional resources
Chapter 7
Measuring performance
      This chapter provides suggestions for measuring NetBackup performance.
      This chapter includes the following sections:
      ■   “Overview” on page 76
      ■   “Controlling system variables for consistent testing conditions” on page 76
      ■   “Evaluating performance” on page 79
      ■   “Evaluating UNIX system components” on page 84
      ■   “Evaluating Windows system components” on page 85




   Overview
                           The final measure of NetBackup performance is the length of time required for
                           backup operations to complete (usually known as the backup window), or the
                           length of time required for a critical restore operation to complete. However,
                           measuring existing performance, and improving future performance based on
                           those measurements, calls for performance metrics that are more reliable and
                           reproducible than simple wall clock time. This chapter discusses these metrics
                           in more detail.
                           After establishing accurate metrics as described here, you can measure the
                           current performance of NetBackup and your system components to compile a
                           baseline performance benchmark. With a baseline, you can apply changes in a
                           controlled way. By measuring performance after each change, you can
                           accurately measure the effect of each change on NetBackup performance.



   Controlling system variables for consistent testing
   conditions
                           For reliable performance evaluation, eliminate as many unpredictable variables
                           as possible in order to create a consistent backup environment. Only a
                           consistent environment will produce reliable and reproducible performance
                           measurements. Some of the variables to consider are described below as they
                           relate to the NetBackup server, the network, the NetBackup client, or the data
                           itself.


   Server variables
                           It is important to eliminate all other NetBackup activity from your environment
                           when you are measuring the performance of a particular NetBackup operation.
                           One area to consider is the automatic scheduling of backup jobs by the
                           NetBackup scheduler.
                           When policies are created, they are usually set up to allow the NetBackup
                            scheduler to initiate the backups. The NetBackup scheduler initiates backups
                            either according to traditional NetBackup frequency-based scheduling or on
                            certain days of the week or month; the latter is called calendar-based
                            scheduling. As part of the backup policy definition, the Start
                           Window is used to indicate when the NetBackup scheduler can start backups
                           using either frequency-based or calendar-based scheduling. When you perform
                           backups for the purpose of performance testing, this setup might interfere since
                           the NetBackup scheduler may initiate backups unexpectedly, especially if the
                           operations you intend to measure run for an extended period of time.
              The simplest way to prevent the NetBackup scheduler from running backup jobs
              during your performance testing is to create a new policy specifically for use in
              performance testing and to leave the Start Window field blank in the schedule
              definition for that policy. This prevents the NetBackup scheduler from initiating
              any backups automatically for that policy. After creating the policy, you can run
              the backup on demand by using the Manual Backup command from the
              NetBackup Administration Console.
              To prevent the NetBackup scheduler from running backup jobs unrelated to the
              performance test, you may want to set all other backup policies to inactive by
              using the Deactivate command from the NetBackup Administration Console. Of
              course, you must reactivate the policies to start running backups again.
              You can use a user-directed backup to run the performance test as well.
               However, using the Manual Backup option for a policy is preferred. With a
               manual backup, the policy contains the entire definition of the backup job,
               including the clients and files that are part of the performance test. Running the
               backup manually, straight from the policy, leaves no doubt about which policy
               is used for the backup, and makes it easier to change and test individual
               backup settings from the policy dialog.
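               If you prefer to work from the command line, a manual backup can also be
               started with the bpbackup command. The following is a minimal sketch using
               a hypothetical policy name and schedule name (on Windows, the command is
               located under install_path\NetBackup\bin):
                   /usr/openv/netbackup/bin/bpbackup -i -p perftest_policy -s full
               The -i option starts an immediate manual backup of the policy, which is
               equivalent to using the Manual Backup command in the NetBackup
               Administration Console.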
              Before you start the performance test, check the Activity Monitor to make sure
              there is no NetBackup processing currently in progress. Similarly, check the
              Activity Monitor after the performance test for unexpected activity (such as an
              unanticipated restore job) that may have occurred during the test.
              Additionally, check for non-NetBackup activity on the server during the
              performance test and try to reduce or eliminate it.


              Note: By default, NetBackup logging is set to a minimum level. To gather more
              logging information, set the legacy and unified logging levels higher and create
              the appropriate legacy logging directories. For details on how to use NetBackup
               logging, refer to the logging chapter of the NetBackup Troubleshooting Guide.
              Keep in mind that higher logging levels will consume more disk space.



Network variables
              Network performance is key to achieving optimum performance with
              NetBackup. Ideally, you would use a completely separate network for
              performance testing to avoid the possibility of skewing the results by
              encountering unrelated network activity during the course of the test.
              In many cases, a separate network is not available. Ensure that non-NetBackup
              activity is kept to an absolute minimum during the time you are evaluating
              performance. If possible, schedule testing for times when backups are not
              active. Even occasional short bursts of network activity may be enough to skew
                             the results during portions of the performance test. If you are sharing the
                             network with production backups occurring for other systems, you must
                             account for this activity during the performance test.
                             Another network variable you must consider is host name resolution.
                             NetBackup depends heavily upon a timely resolution of host names to operate
                             correctly. If you have any delays in host name resolution, including reverse
                             name lookup to identify a server name from an incoming connection from a
                             certain IP address, you may want to eliminate that delay by using the HOSTS
                             (Windows) or /etc/hosts (UNIX) file for host name resolution on systems
                             involved in your performance test environment.
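               For example, entries such as the following (the addresses and host names
               shown are hypothetical) can be added to /etc/hosts on UNIX or to the HOSTS
               file on Windows so that forward and reverse lookups are satisfied locally:
                   192.168.10.5    master1.example.com    master1
                   192.168.10.6    media1.example.com     media1
                   192.168.10.21   client1.example.com    client1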


   Client variables
                             Make sure the client system is in a relatively quiescent state during
                             performance testing. A lot of activity, especially disk-intensive activity such as
                             virus scanning on Windows, will limit the data transfer rate and skew the results
                             of your tests.
                             One possible mistake is to allow another NetBackup server, such as a production
                             backup server, to have access to the client during the course of the test. This
                             may result in NetBackup attempting to back up the same client to two different
                             servers at the same time, which would severely impact the results of a
                             performance test in progress at that time.
                             Different file systems have different performance characteristics. For example,
                             comparing data throughput results from operations on a UNIX VxFS or
                             Windows FAT file system to those from operations on a UNIX NFS or Windows
                             NTFS system may not be valid, even if the systems are otherwise identical. If you
                             do need to make such a comparison, factor the difference between the file
                             systems into your performance evaluation testing, and into any conclusions you
                             may draw from that testing.


   Data variables
                             Monitoring the data you are backing up improves the repeatability of
                             performance testing. If possible, move the data you will use for testing backups
                             to its own drive or logical partition (not a mirrored drive), and defragment the
                             drive before you begin performance testing. For testing restores, start with an
                             empty disk drive or a recently defragmented disk drive with ample empty space.
                             This will reduce the impact of disk fragmentation on the NetBackup
                             performance test and yield more consistent results between tests.
                             Similarly, for testing backups to tape, always start each test run with an empty
                             piece of media. You can do this by expiring existing images for that piece of
                             media through the Catalog node of the NetBackup Administration Console, or by
          running the bpexpdate command. Another approach is to use the bpmedia
          command to freeze any media containing existing backup images so that
          NetBackup selects a new piece of media for the backup operation. This step will
          help reduce the impact of tape positioning on the NetBackup performance test
           and will yield more consistent results between tests. It also reduces the
           mounting and unmounting of media that contains NetBackup catalog images
           and that cannot be used for normal backups.
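           For example, either of the following commands (shown with a hypothetical
           media ID of A00001) ensures that NetBackup does not append test backups to
           previously used media:
               /usr/openv/netbackup/bin/admincmd/bpexpdate -m A00001 -d 0
               /usr/openv/netbackup/bin/admincmd/bpmedia -freeze -m A00001
           The first command expires all images on the media so that the media is
           treated as empty; the second freezes the media so that NetBackup selects a
           different piece of media for new backups.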
          When you test restores from tape, always restore from the same backup image
          on the tape to achieve consistent results between tests.
          In general, using a large data set will generate a more reliable and reproducible
          performance test than a small data set. A performance test using a small data
          set would probably be skewed by startup and shutdown overhead within the
          NetBackup operation. These variables are difficult to keep consistent between
          test runs and are therefore likely to produce inconsistent test results. Using a
           large data set will minimize the effect of startup and shutdown times.
          Design the makeup of the dataset to represent the makeup of the data in the
          intended production environment. For example, if the data set in the production
          environment contains many small files on file servers, then the data set for the
          performance testing should also contain many small files. A representative test
          data set will more accurately predict the NetBackup performance that you can
          reasonably expect in a production environment.
          The type of data can help reveal bottlenecks in the system. Files consisting of
          non-compressible (random) data cause the tape drive to run at its lower rated
          speed. As long as the other components of the data transfer path are keeping up,
          you may identify the tape drive as the bottleneck. On the other hand, files
          consisting of highly-compressible data can be processed at higher rates by the
          tape drive when hardware compression is enabled. This may result in a higher
          overall throughput and possibly expose the network as the bottleneck.
          Many values in NetBackup provide data amounts in kilobytes and rates in
           kilobytes per second. For greater accuracy, divide by 1024 rather than by 1000
           when you convert from kilobytes to megabytes or from kilobytes per second to
           megabytes per second.
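           For example, a backup reported as 513,255 kilobytes at 5,120 kilobytes per
           second is 513,255 / 1024, or approximately 501.2 megabytes, at 5.0 megabytes
           per second; dividing by 1000 instead would overstate these figures as roughly
           513.3 megabytes at 5.1 megabytes per second.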



Evaluating performance
          There are two primary locations from which to obtain NetBackup data
          throughput statistics: the NetBackup Activity Monitor and the NetBackup All
          Log Entries report. The choice of which location to use is determined by the type
          of NetBackup operation you are measuring: non-multiplexed backup, restore, or
          multiplexed backup.
                            You can obtain statistics for all three types of operations from the NetBackup All
                            Log Entries report. You can obtain statistics for non-multiplexed backup or
                            restore operations from the NetBackup Activity Monitor. For multiplexed
                            backup operations, you can obtain the overall statistics from the All Log Entries
                            report after all the individual backup operations which are part of the
                            multiplexed backup are complete. In this case, the statistics available in the
                             Activity Monitor for each of the individual backup operations are relative only
                            to that operation, and do not reflect the actual total data throughput to the tape
                            drive.
                            There may be small differences between the statistics available from these two
                            locations due to slight differences in rounding techniques between the entries in
                            the Activity Monitor and the entries in the All Logs report. For a given type of
                            operation, choose either the Activity Monitor or the All Log Entries report and
                            consistently record your statistics only from that location. In both the Activity
                            Monitor and the All Logs report, the data-streaming speed is reported in
                            kilobytes per second. If a backup or restore is repeated, the reported speed can
                            vary between repetitions depending on many factors, including the availability
                            of system resources and system utilization, but the reported speed can be used
                            to assess the performance of the data-streaming process.
                            The statistics from the NetBackup error logs show the actual amount of time
                            spent reading and writing data to and from tape. This does not include time
                            spent mounting and positioning the tape. Cross-referencing the information
                            from the error logs with data from the bpbkar log on the NetBackup client
                            (showing the end-to-end elapsed time of the entire process) indicates how much
                            time was spent on operations unrelated to reading and writing to and from the
                            tape.

                             To evaluate performance through the NetBackup Activity Monitor
                            1   Run the backup or restore job.
                            2   Open the NetBackup Activity Monitor.
                            3   Verify that the backup or restore job completed successfully.
                                The Status column should contain a zero (0).
                            4   View the log details for the job by selecting the Actions > Details menu
                                option, or by double-clicking on the entry for the job. Select the Detailed
                                Status tab.
                            5   Obtain the NetBackup performance statistics from the following fields in
                                the Activity Monitor:
                                ■    Start Time/End Time: These fields show the time window during which
                                     the backup or restore job took place.
                      ■    Elapsed Time: This field shows the total elapsed time from when the job
                            was initiated to job completion and can be used as an indication of total
                           wall clock time for the operation.
                      ■    KB per Second: This is the data throughput rate.
                       ■    Kilobytes: Compare this value to the amount of data that was backed
                            up. The two values should be comparable, although the NetBackup
                            amount will be slightly higher because of the administrative
                            information, known as metadata, that is saved with the backed-up data.
                           For example, if you display properties for a directory containing 500
                           files, each 1 megabyte in size, the directory shows a size of 500
                           megabytes, or 524,288,000 bytes, which is equal to 512,000 kilobytes.
                           The NetBackup report may show 513,255 kilobytes written, reporting
                           1255 kilobytes more than the file size of the directory. This is true for a
                           flat directory. Subdirectory structures may diverge due to the way the
                           operating system tracks used and available space on the disk. Also, be
                           aware that the operating system may be reporting how much space was
                           allocated for the files in question, not just how much data is actually
                           there. For example, if the allocation block size is 1 kilobyte, 1000 1-byte
                           files will report a total size of 1 megabyte, even though 1 kilobyte of
                           data is all that exists. The greater the number of files, the larger this
                           discrepancy may become.

                   To evaluate performance using the All Log Entries report
                  1   Run the backup or restore job.
                  2   Run the All Log Entries report from the NetBackup reports node in the
                      NetBackup Administrative Console. Be sure that the Date/Time Range that
                      you select covers the time period during which the job was run.
                  3   Verify that the job completed successfully by searching for an entry such as
                      “the requested operation was successfully completed” for a backup, or
                      “successfully read (restore) backup id...” for a restore.
                  4   Obtain the NetBackup performance statistics from the following entries in
                      the report.

                      Note: The messages shown here will vary according to the locale setting of
                      the master server.


Entry                                           Statistic

started backup job for client <name>,           The Date and Time fields for this entry show the time at
policy <name>, schedule <name> on storage       which the backup job started.
unit <name>

    successfully wrote backup id <name>, copy      For a multiplexed backup, this entry shows the size of
    <number>, <number> Kbytes                      the individual backup job and the Date and Time fields
                                                   show the time at which the job finished writing to the
                                                   storage device. The overall statistics for the multiplexed
                                                   backup group, including the data throughput rate to the
                                                   storage device, are found in a subsequent entry below.

    successfully wrote <number> of <number>        For multiplexed backups, this entry shows the overall
    multiplexed backups, total Kbytes <number>     statistics for the multiplexed backup group including
    at Kbytes/sec                                  the data throughput rate.



    successfully wrote backup id <name>, copy      For non-multiplexed backups, this entry essentially
    <number>, fragment <number>, <number> Kbytes   combines the information in the previous two entries
    at <number> Kbytes/sec                         for multiplexed backups into one entry showing the size
                                                   of the backup job, the data throughput rate, and the
                                                   time, in the Date and Time fields, at which the job
                                                   finished writing to the storage device.

    the requested operation was successfully       The Date and Time fields for this entry show the time at
    completed                                      which the backup job completed. This value is later than
                                                   the “successfully wrote” entry above because it includes
                                                   extra processing time at the end of the job for tasks such
                                                   as NetBackup image validation.

    begin reading backup id <name>, (restore),     The Date and Time fields for this entry show the time at
    copy <number>, fragment <number> from media    which the restore job started reading from the storage
    id <name> on drive index <number>              device. (Note that the latter part of the entry is not
                                                   shown for restores from disk, as it does not apply.)

    successfully restored from backup id           For a multiplexed restore (generally speaking, all
    <name>, copy <number>, <number> Kbytes         restores from tape are multiplexed restores as
                                                   non-multiplexed restores require additional action from
                                                   the user), this entry shows the size of the individual
                                                   restore job and the Date and Time fields show the time
                                                   at which the job finished reading from the storage
                                                   device. The overall statistics for the multiplexed restore
                                                   group, including the data throughput rate, are found in
                                                   a subsequent entry below.

    successfully restored <number> of <number>     For multiplexed restores, this entry shows the overall
    requests <name>, read total of <number>        statistics for the multiplexed restore group, including
    Kbytes at <number> Kbytes/sec                  the data throughput rate.

successfully read (restore) backup id media       For non-multiplexed restores (generally speaking, only
<number>, copy <number>, fragment <number>,       restores from disk are treated as non-multiplexed
<number> Kbytes at <number> Kbytes/sec            restores), this entry essentially combines the
                                                  information from the previous two entries for
                                                  multiplexed restores into one entry showing the size of
                                                  the restore job, the data throughput rate, and the time,
                                                  in the Date and Time fields, at which the job finished
                                                  reading from the storage device.


                    Additional information
                    The NetBackup All Log Entries report will also have entries similar to those
                    described above for other NetBackup operations such as image duplication
                    operations used to create additional copies of a backup image. Those entries
                    have a very similar format and may be useful for analyzing the performance of
                    NetBackup for those operations.
                    The bptm debug log file for tape backups (or bpdm log file for disk backups) will
                    contain the entries that are in the All Log Entries report, as well as additional
                    detail about the operation that may be useful for performance analysis. One
                    example of this additional detail is the intermediate data throughput rate
                    message for multiplexed backups, as shown below:
                        ... intermediate after <number> successful, <number> Kbytes at
                        <number> Kbytes/sec
                    This message is generated whenever an individual backup job completes that is
                    part of a multiplexed backup group. In the debug log file for a multiplexed
                    backup group consisting of three individual backup jobs, for example, there
                    could be two intermediate status lines, then the final (overall) throughput rate.
                    For a backup operation, the bpbkar debug log file will also contain additional
                    detail about the operation that may be useful for performance analysis.
                    Keep in mind, however, that writing the debug log files during the NetBackup
                    operation introduces some overhead that would not normally be present in a
                    production environment. Factor that additional overhead into any calculations
                    done on data captures while debug log files are in use.
                    The information in the All Logs report is also found in
                    /usr/openv/netbackup/db/error (UNIX) or
                    install_path\NetBackup\db\error (Windows).
                    See the NetBackup Troubleshooting Guide to learn how to set up NetBackup to
                    write these debug log files.




   Evaluating UNIX system components
                          In addition to evaluating NetBackup’s performance, you should also verify that
                          common system resources are in adequate supply.


   Monitoring CPU load
                           Use the vmstat utility to monitor CPU load and memory use. Add up the “us”
                          columns to get the total CPU load on the system (refer to the vmstat man page
                          for details). The vmstat scan rate indicates the amount of swapping activity
                          taking place.
                          The sar command also provides insight into UNIX memory usage.
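                           For example, the following commands each collect twelve 5-second samples
                           (the available sar options vary by platform):
                               vmstat 5 12
                               sar -u 5 12
                           Add the us and sy columns of the vmstat output to obtain the total CPU load
                           for each interval.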


   Measuring performance independent of tape or disk output
                          It is possible to measure the disk (read) component of NetBackup’s speed
                          independent of the network and tape components. There are two different
                          techniques, described below. The first, using bpbkar, is easier. The second may
                          be helpful in more limited circumstances.
                          In these procedures, the master server is the client.

                          To measure disk I/O using bpbkar
                          1   Turn on the legacy bpbkar log by ensuring that the bpbkar directory exists.
                              UNIX: /usr/openv/netbackup/logs/bpbkar
                              Windows: install_path\NetBackup\logs\bpbkar
                          2   Set logging level to 1.
                          3   Enter the following:
                              UNIX
                              /usr/openv/netbackup/bin/bpbkar -nocont -dt 0 -nofileinfo
                                -nokeepalives filesystem > /dev/null
                              Where filesystem is the path being backed up.
                              Windows
                              install_path\NetBackup\bin\bpbkar32 -nocont X:\ > NUL
                              Where X:\ is the path being backed up.
                          4   Check the time it took NetBackup to move the data from the client disk:
                              UNIX: The start time is the first PrintFile entry in the bpbkar log, the end
                              time is the entry “Client completed sending data for backup,” and the
                              amount of data is given in the entry Total Size.
                              Windows: Check the bpbkar log for the entry Elapsed time.
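                           As a quick cross-check of the log entries on UNIX, you can wrap the bpbkar
                           invocation in the time command, for example (where /data1 is a hypothetical
                           path being backed up):
                               time /usr/openv/netbackup/bin/bpbkar -nocont -dt 0 -nofileinfo
                                 -nokeepalives /data1 > /dev/null
                           Dividing the Total Size reported in the log by the elapsed time gives the
                           effective disk read rate.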



         To measure disk I/O using the bpdm_dev_null touch file (UNIX only)
         For UNIX systems, the procedure below can be useful as a follow-on to the
         bpbkar procedure (above). If the bpbkar procedure shows that the disk read
         performance is not the bottleneck and does not help isolate the problem, then
         the bpdm_dev_null procedure described below may be helpful. If the
         bpdm_dev_null procedure shows poor performance, the bottleneck is
         somewhere in the data transfer between the bpbkar process on the client and
         the bpdm process on the server. The problem may involve the network, or shared
         memory (such as not enough buffers, or buffers that are too small). To change
         shared memory settings, see “Shared memory (number and size of data buffers)”
         on page 102.


          Caution: If not used correctly, the following procedure can lead to data loss.
          Touching the bpdm_dev_null file redirects all disk backups to /dev/null, not
          just the backups using the storage unit created by this procedure. You should
          disable active production policies for the duration of this test and remove the
          bpdm_dev_null touch file as soon as the test is complete.

         1   Enter the following:
                 touch /usr/openv/netbackup/bpdm_dev_null

              Note: The bpdm_dev_null file redirects any backup that uses a disk
              storage unit to /dev/null.

         2   Create a new disk storage unit, using /tmp or some other directory as the
             image directory path.
         3   Create a policy that uses the new disk storage unit.
         4   Run a backup using this policy. NetBackup will create a file in the storage
             unit directory as if this were a real backup to disk. This degenerate image
             file will be zero bytes long.
         5   To remove the zero-length file and clear the NetBackup catalog of a backup
             that cannot be restored, run this command:
         /usr/openv/netbackup/bin/admincmd/bpexpdate -backupid backupid -d 0
             where backupid is the name of the file residing in the storage unit directory.
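          6   When testing is complete, remove the touch file so that disk backups are
              written normally again, and reactivate any production policies that you
              disabled:
                  rm /usr/openv/netbackup/bpdm_dev_null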



Evaluating Windows system components
         In addition to evaluating NetBackup’s performance, you should also verify that
         common system resources are in adequate supply. You may want to use the
         Windows Performance Monitor utility included with Windows. For information
         about using the Performance Monitor, refer to your Microsoft documentation.
                          The Performance Monitor organizes information by object, counter, and
                          instance.
                          An object is a system resource category, such as a processor or physical disk.
                          Properties of an object are counters. Counters for the Processor object include
                          %Processor Time, which is the default counter, and Interrupts/sec. Duplicate
                          counters are handled via instances. For example, to monitor the %Processor
                          Time of a specific CPU on a multiple CPU system, the Processor object is
                          selected, then the %Processor Time counter for that object is selected, followed
                          by the specific CPU instance for the counter.
                          When you use the Performance Monitor, you can view data in real time format
                          or collect the data in a log for future analysis. Specific components to evaluate
                          include CPU load, memory use, and disk load.


                          Note: It is recommended that a remote host be used for monitoring of the test
                          host, to reduce load that might otherwise skew results.



   Monitoring CPU load
                          To determine if the system has enough power to accomplish the requested tasks,
                          monitor the % Processor Time counter for the Processor object to determine
                           how hard the CPU is working, and monitor the Processor Queue Length counter
                          for the System object to determine how many processes are actively waiting for
                          the processor.
                          For % Processor Time, values of 0 to 80 percent are generally considered safe.
                          Values from 80 percent to 90 percent indicate that the system is being pushed
                          hard, while consistent values above 90 percent indicate that the CPU is a
                          bottleneck.
                          Spikes approaching 100 percent are normal and do not necessarily indicate a
                           bottleneck. However, if you observe sustained loads approaching 100 percent,
                           consider tuning the system to decrease the process load, or upgrading to a
                           faster processor.
                          Sustained Processor Queue Lengths greater than two indicate too many threads
                          are waiting to be executed. To correctly monitor the Processor Queue Length
                          counter, the Performance Monitor must be tracking a thread-related counter. If
                          you consistently see a queue length of 0, verify that a non-zero value can be
                          displayed.


                          Note: The default scale for the Processor Queue Length may not be equal to 1. Be
                          sure to read the data correctly. For example, if the default scale is 10x, then a
                          reading of 40 actually means that only 4 processes are waiting.



Monitoring memory use
              Memory is a critical resource for increasing the performance of backup
              operations. When you examine memory usage, view information on:
              ■   Committed Bytes. Committed Bytes displays the size of virtual memory that
                  has been committed, as opposed to reserved. Committed memory must have
                  disk storage available or must not require the disk storage because the main
                  memory is large enough. If the number of Committed Bytes approaches or
                  exceeds the amount of physical memory, you may encounter issues with
                  page swapping.
              ■   Page Faults/sec. Page Faults/sec is a count of the page faults in the
                  processor. A page fault occurs when a process refers to a virtual memory
                  page that is not in its Working Set in main memory. A high Page Fault rate
                  may indicate insufficient memory.
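               On Windows versions that include the typeperf command-line utility, these
               counters (and the processor counters described above) can be sampled
               without opening the Performance Monitor. A minimal sketch, using
               Microsoft's documented counter paths:
                   typeperf "\Processor(_Total)\% Processor Time" ^
                       "\System\Processor Queue Length" ^
                       "\Memory\Committed Bytes" "\Memory\Page Faults/sec" -si 5 -sc 12
               This collects twelve samples at 5-second intervals; use the -o option to write
               the output to a file for later analysis.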


Monitoring disk load
              To use disk performance counters to monitor the disk performance in
              Performance Monitor, you may need to enable those counters. Windows may
              not have enabled the disk performance counters by default for your system.
              For more information about disk performance counters, from a command
              prompt, type:
                  diskperf -help

              To enable these counters and allow disk monitoring


              Note: On a Windows 2000 system, this is set by default.

              1   From a command prompt, type:
                  diskperf -y
              2   Reboot the system.

              To disable these counters and cancel disk monitoring
              1   From a command prompt, type:
                  diskperf -n
              2   Reboot the system.
              When you monitor disk performance, use the %Disk Time counter for the
              PhysicalDisk object to track the percentage of elapsed time that the selected disk
              drive is busy servicing read or write requests.
                           Also monitor the Avg. Disk Queue Length counter. Values greater than 1 that
                           persist for more than one second indicate that multiple processes are waiting
                           for the disk to service their requests.
                          Several techniques may be used to increase disk performance, including:
                          ■   Check the fragmentation level of the data. A highly fragmented disk limits
                              throughput levels. Use a disk maintenance utility to defragment the disk.
                          ■   Consider adding additional disks to the system to increase performance. If
                              multiple processes are attempting to log data simultaneously, dividing the
                              data among multiple physical disks may help.
                          ■   Determine if the data transfer involves a compressed disk. The use of
                              Windows compression to automatically compress the data on the drive adds
                              additional overhead to disk read or write operations, adversely affecting the
                              performance of NetBackup. Use Windows compression only if it is needed
                              to avoid a disk full condition.
                          ■   Consider converting to a system based on a Redundant Array of Inexpensive
                              Disks (RAID). Though more expensive, RAID devices generally offer greater
                               throughput and, depending on the RAID level employed, improved
                               reliability.
                          ■   Determine what type of controller technology is being used to drive the disk.
                              Consider if a different system would yield better results. See Table 1-2
                              “Drive controller data transfer rates” on page 21 for throughput rates for
                              common controllers.
Chapter 8
Tuning the NetBackup
data transfer path
      This chapter provides guidelines and recommendations for improving
      performance in the data transfer path of NetBackup.
      This chapter includes the following sections:
      ■   “Overview” on page 90
      ■   “The data transfer path” on page 90
      ■   “Basic tuning suggestions for the data path” on page 91
      ■   “NetBackup client performance” on page 95
      ■   “NetBackup network performance” on page 96
      ■   “NetBackup server performance” on page 102
      ■   “NetBackup storage device performance” on page 126




   Overview
                           This chapter contains information on ways to optimize NetBackup. This chapter
                           is not intended to provide tuning advice for particular systems. If you would like
                           help fine-tuning your NetBackup installation, please contact Symantec
                           Consulting Services.
                           Before examining the factors that affect backup performance, please note that
                           an important first step is to ensure that your system meets NetBackup’s
                           recommended minimum requirements. Refer to the NetBackup Installation
                           Guide and NetBackup Release Notes for information about these requirements.
                           Additionally, Symantec recommends that you have the most recent NetBackup
                           software patch installed.
                           Many performance issues can be traced to hardware or other environmental
                           issues. A basic understanding of the entire data transfer path is essential in
                           determining the maximum obtainable performance in your environment. Poor
                           performance is often the result of poor planning, which can be based on
                           unrealistic expectations of any particular component of the data transfer path.



   The data transfer path
                           The component that limits the overall performance of NetBackup is of course
                           the slowest component in the backup system. For example, a fast tape drive
                           combined with an overloaded server yields poor performance. Similarly, a fast
                           tape drive on a slow network also yields poor performance.
                           The backup system is referred to as the data transfer path. The path usually
                           starts at the data on the disk and ends with a backup copy on tape or disk.
                           This chapter subdivides the standard NetBackup data transfer path into four
                           basic components: the NetBackup client, the network, the NetBackup server, and
                           the storage device.


                           Note: This chapter discusses NetBackup performance evaluation and
                           improvement from a testing perspective. It describes ways to isolate
                           performance variables in order to get a sense of the effect each variable has on
                           overall system performance, and to optimize NetBackup performance with
                           regard to that variable. It may not be possible to optimize every variable on your
                           production system.

                           The requirements for database backups may not be the same as for file system
                           backups. This information applies to file system backups unless otherwise
                           noted.




Basic tuning suggestions for the data path
           In every backup system there is always room for improvement. Obtaining the
           best performance from a backup infrastructure is not complex, but it requires
           careful review of the many factors that can affect processing. The first step is to
           gain an accurate assessment of each hardware, software, and networking
            component in the backup data path. Many performance problems can be
            resolved in this way before you change any NetBackup parameters.
           NetBackup software offers plenty of resources to help isolate performance
           problems and assess the impact of configuration changes. However, it is
           essential to thoroughly test both backup and restore processes after making any
           changes to the NetBackup configuration parameters.
           This section provides practical ideas to improve your backup system
           performance and avoid bottlenecks. You can find more details on several of the
           topics and solutions described here in the following NetBackup manuals:
           Veritas NetBackup System Administrator’s Guide for UNIX, Volumes I & II
           Veritas NetBackup System Administrator’s Guide for Windows, Volumes I & II
           Veritas NetBackup Troubleshooting Guide (for UNIX and Windows)

           Tuning suggestions:
           ■   Use multiplexing.
                Multiplexing is a NetBackup option that lets you write multiple data
               streams from several clients at once to a single tape drive or several tape
               drives. Multiplexing can be used to improve the backup performance of
               slow clients, multiple slow networks, and many small backups (such as
               incremental backups). Multiplexing reduces the time each job spends
               waiting for a device to become available, thereby making the best use of the
               transfer rate of your storage devices.
               Multiplexing is not recommended when restore speed is of paramount
               interest or when your tape drives are slow. To reduce the impact of
               multiplexing on restore times, you can improve your restore performance
               by reducing the maximum fragment size for the storage units. If the
               fragment size is small, so that the backup image is contained in several
               fragments, a NetBackup restore can quickly skip to the specific fragment
               containing the file to be restored. In contrast, if the fragment size is large
               enough to contain the entire image, the NetBackup restore starts at the very
               beginning of the image and reads through the image until it finds the
               desired file.
               Multiplexed backups can be de-multiplexed to improve restore performance
               by using bpduplicate to move fragmented images to a sequential image
               on a new tape.
                                  Refer to the Veritas NetBackup System Administrator’s Guide for more
                                  information about using multiplexing.
                            ■     Consider striping a disk volume across drives.
                                  A striped set of disks will pull data from all disk drives concurrently,
                                  allowing faster data transfers between disk drives and tape drives.
                            ■     Maximize the use of your backup windows.
                                  You can configure all your incremental backups to happen at the same time
                                  every day and stagger the execution of your full backups across multiple
                                  days. Large systems could be backed up over the weekend while smaller
                                  systems are spread over the week. You can even start full backups earlier
                                  than the incremental backups. They might finish before the incremental
                                  backups and give you back all or most of your backup window to finish the
                                  incremental backups.
                            ■     Convert large clients into media servers to decrease backup times and
                                  reduce network traffic.
                                  Any machine with locally-attached drives can be used as a media server to
                                  back up itself or other systems. By converting large client systems into
                                  media servers, your backup data no longer travels over the network (except
                                  for catalog data), and you get the fastest transfer speeds afforded by
                                  locally-attached devices. Another benefit of media servers is that you can
                                  use them to balance the load of backing up other clients for your NetBackup
                                  master. A media server can back up clients on a network where it has a local
                                  connection, thus saving network traffic for a master that might have to go
                                  over routers to communicate with those clients. A special case of a media
                                  server is a SAN Media Server, which is a NetBackup media server that backs
                                  up itself only and comes at a lower cost than a regular media server.
                            ■     Use dedicated private networks to decrease backup times and network
                                  traffic.
                                  If you configure one or more networks dedicated to backups, you can reduce
                                  the time it takes to back up the systems on those networks and reduce or
                                  eliminate network traffic on your enterprise networks. In addition, you can
convert to faster technologies and even back up your systems at any time
                                  without affecting the enterprise network’s performance (assuming that
                                  users do not mind the system loads while backups take place).
                            ■     Avoid a concentration of servers on one network.
                                  If you have a concentration of large servers that you back up over the same
                                  general network, you might want to convert some of these into media
                                  servers or attach them to private backup networks. Doing either will
                                  decrease backup times and reduce network traffic for your other backups.
                            ■     Use dedicated backup servers to perform your backups.
    When selecting a server to perform your backups, use a system dedicated
    to backups. A server that also runs several applications unrelated to
    backups can severely affect your backup performance and maintenance
    windows.
■   Use drives from tape libraries attached to other systems.
    You can use tape drives from a tape library attached to your master server
    or another media server, or you can dedicate a library to your large servers.
    Systems using these drives become media servers that can back up
    themselves and others through locally-attached drives. The robotic control
    arm of the library can be connected to either the master server or the media
    server.
■   Consider the requirements of backing up your catalog.
    Remember that the NetBackup catalog needs to be backed up. To facilitate
    NetBackup catalog recovery, it is highly recommended that the master
    server have access to a dedicated tape drive, either standalone or within a
    robotic library.
■   Try to level the backup load.
    You can use multiple drives to reduce backup times; however, since you may
    not be able to split data streams evenly, you may need to experiment with
    the configuration of the streams or the configuration of the NetBackup
    policies to spread the load across multiple drives.
■   Bandwidth limiting
    The bandwidth limiting feature lets you restrict the network bandwidth
    consumed by one or more NetBackup clients on a network. The bandwidth
    setting appears under Host Properties > Master Servers > Properties. The
    actual limiting occurs on the client side of the backup connection. This
    feature only restricts bandwidth during backups. Restores are unaffected.
    When a backup starts, NetBackup reads the bandwidth limit configuration
    and then determines the appropriate bandwidth value and passes it to the
    client. As the number of active backups increases or decreases on a subnet,
    NetBackup dynamically adjusts the bandwidth limiting on that subnet. If
    additional backups are started, the NetBackup server instructs the other
    NetBackup clients running on that subnet to decrease their bandwidth
    setting. Similarly, bandwidth per client is increased if the number of clients
    decreases. Changes to the bandwidth value occur on a periodic basis rather
    than as backups stop and start. This characteristic can reduce the number
    of bandwidth value changes. (A configuration sketch appears after this
    list.)
■   Load balancing
    NetBackup provides ways to balance loads between servers, clients, policies,
    and devices. Note that these settings may interact with each other:
                                  compensating for one issue can cause another. The best approach is to use
                                  the defaults unless you anticipate or encounter an issue.
                                  ■    Adjust the backup load on the server.
                                       Change the Limit jobs per policy attribute for one or more of the
                                       policies that the server is backing up. For example, decreasing Limit
                                       jobs per policy reduces the load on a server on a specific subnetwork.
                                       Reconfiguring policies or schedules to use storage units on other
                                       servers also reduces the load. Another possibility is to use bandwidth
                                       limiting on one or more clients.
                                  ■    Adjust the backup load on the server during specific time periods only.
                                       Reconfigure schedules that execute during the time periods of interest,
                                       so they use storage units on servers that can handle the load (assuming
                                       you are using media servers).
                                  ■    Adjust the backup load on the clients.
                                       Change the Maximum jobs per client global attribute. For example,
                                       increasing Maximum jobs per client increases the number of
                                       concurrent jobs that any one client can process and therefore increases
                                       the load.
                                  ■    Reduce the time to back up clients.
                                       Increase the number of jobs that clients can perform concurrently, or
                                       use multiplexing. Another possibility is to increase the number of jobs
                                       that the server can perform concurrently for the policy or policies that
                                       are backing up the clients.
                                  ■    Give preference to a policy.
                                       Increase the Limit jobs per policy attribute value for the preferred
                                       policy relative to other policies. Alternatively, increase the priority for
                                       the policy.
                                  ■    Adjust the load between fast and slow networks.
                                       Increase the values of Limit jobs per policy and Maximum jobs per
                                       client for the policies and clients on a faster network. Decrease these
                                       values for slower networks. Another solution is to use bandwidth
                                       limiting.
                                  ■    Limit the backup load produced by one or more clients.
                                       Use bandwidth limiting to reduce the bandwidth used by the clients.
                                   ■    Maximize the use of devices.
                                       Use multiplexing. Also, allow as many concurrent jobs per storage unit,
                                       policy, and client as possible without causing server, client, or network
                                       performance issues.
                                  ■    Prevent backups from monopolizing devices.
                  Limit the number of devices that NetBackup can use concurrently for
                  each policy or limit the number of drives per storage unit. Another
                  approach is to exclude some of your devices from Media Manager
                  control.
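
           A minimal bp.conf sketch of bandwidth limiting, for the bullet above
           (the addresses and rate are illustrative only; see the NetBackup
           System Administrator's Guide for the exact LIMIT_BANDWIDTH syntax
           supported by your release). Each entry names a range of client IP
           addresses and a limit in kilobytes per second:

               # In the master server's bp.conf:
               # limit clients 10.0.0.2 through 10.0.0.50 to 500 KB/sec each
               LIMIT_BANDWIDTH = 10.0.0.2 10.0.0.50 500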



NetBackup client performance
          This section lists some factors to consider when evaluating the NetBackup client
          component of the NetBackup data transfer path. Examine these conditions to
          identify possible changes that may improve the overall performance of
          NetBackup.
          ■   Disk fragmentation. Fragmentation is a condition where data is scattered
              around the disk in non-contiguous blocks. This condition severely impacts
              the data transfer rate from the disk. Fragmentation can be repaired using
              hard disk management utility software offered by a variety of vendors.
          ■   Number of disks. Consider adding additional disks to the system to increase
              performance. If multiple processes are attempting to log data
              simultaneously, dividing the data among multiple physical disks may help.
          ■   Disk arrays. Consider converting to a system based on a Redundant Array of
               Inexpensive Disks (RAID). Though more expensive, RAID devices generally
               offer greater throughput and, depending on the RAID level employed,
               improved reliability.
           ■   The type of controller technology used to drive the disk. Consider
               whether a different controller would yield better results.
          ■   Virus scanning. If virus scanning is turned on for the system, it may
              severely impact the performance of the NetBackup client during a backup
              or restore operation. This may be especially true for systems such as large
              Windows file servers. You may wish to disable virus scanning during
              backup or restore operations to avoid the impact on performance.
          ■   NetBackup notify scripts. The bpstart_notify.bat and
              bpend_notify.bat scripts are very useful in certain situations, such as
              shutting down a running application to back up its data. However, these
              scripts must be written with care to avoid any unnecessary lengthy delays
              at the start or end of the backup job. If the scripts are not performing tasks
              essential to the backup operation, you may want to remove them.
          ■   NetBackup software location. If the data being backed up is located on the
              same physical disk drive as the NetBackup installation, performance may be
              adversely affected, especially if NetBackup debug log files are being used. If
              they are being used, the extent of the degradation will be greatly influenced
              by the NetBackup verbose setting for the debug logs. If possible, install
                                NetBackup on a separate physical disk drive to avoid this disk drive
                                contention.
                           ■    Snapshots (hardware or software). If snapshots need to be taken before the
                                actual backup of data, the time needed to take the snapshot will affect the
                                overall performance.
                           ■    Job tracker. If the NetBackup Client Job Tracker is running on the client,
                                then NetBackup will gather an estimate of the data to be backed up prior to
                                the start of a backup job. Gathering this estimate will affect the startup
                                time, and therefore the data throughput rate, because no data is being
                                written to the NetBackup server during this estimation phase. You may
                                wish to avoid running the NetBackup Client Job Tracker to avoid this delay.
                           ■    Client location. You may wish to consider adding a locally attached tape
                                device to the client and changing the client to a NetBackup media server if
                                you have a substantial amount of data on the client. For example, backing
                                up 100 gigabytes of data to a locally attached tape drive will generally be
                                more efficient than backing up the same amount of data across a network
                                connection to a NetBackup server. Of course, there are many variables to
                                consider, such as the bandwidth available on the network, that will affect
                                the decision to back up the data to a locally attached tape drive as opposed
                                to moving the data across the network.
                           ■    Determining the theoretical performance of the NetBackup client software.
                                You can use the NetBackup client command bpbkar (UNIX) or bpbkar32
                                (Windows) to determine the speed at which the NetBackup client can read
                                the data to be backed up from the disk drive. This may eliminate data read
                                speed as a possible performance bottleneck. For the procedure, see
                                “Measuring performance independent of tape or disk output” on page 84.



   NetBackup network performance
                           To improve the overall performance of NetBackup, consider the following
                           network components and factors.


   Network interface settings
                           Make sure your network connections are properly installed and configured.
                           Note the following:
                           ■    Network interface cards (NICs) for NetBackup servers and clients must be
                                set to full-duplex.
                           ■    Both ends of each network cable (the NIC card and the switch) must be set
                                identically as to speed and mode (both NIC and switch must be at full
                   duplex). Otherwise, link down, excessive/late collisions, and errors will
                   result.
               ■   If auto-negotiate is being used, make sure that both ends of the connection
                   are set at the same mode and speed. The higher the speed, the better.
               ■   In addition to NICs and switches, all routers must be set to full duplex.
               Consult the operating system documentation for instructions on how to
               determine and change the NIC settings.
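
               For example, on a Linux system you might verify and force the duplex
               setting with ethtool (the interface name eth0 is an assumption; other
               platforms provide equivalent tools, such as ndd on Solaris):

                   ethtool eth0                                        # show current speed and duplex
                   ethtool -s eth0 speed 1000 duplex full autoneg off  # force gigabit full duplex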


               Note: Using AUTOSENSE may cause network problems and performance issues.



Network load
               There are two key considerations to monitor when you evaluate remote backup
               performance:
               ■   The amount of network traffic
               ■   The amount of time that network traffic is high
               Small bursts of high network traffic for short durations will have some negative
               impact on the data throughput rate. However, if the network traffic remains
               consistently high for a significant amount of time during the operation, the
               network component of the NetBackup data transfer path will very likely be the
               bottleneck. Always try to schedule backups during times when network traffic is
               low. If your network is heavily loaded, you may wish to implement a secondary
               network which can be dedicated to backup and restore traffic.


NetBackup media server network buffer size
               The NetBackup media server has a tunable parameter that you can use to adjust
               the size of the network communications buffer used to receive data from the
               network (a backup) or write data to the network (a restore). This parameter
               specifies the value that is used to set the network buffer size for backups and
               restores.
               UNIX
               The default value for this parameter is 32032.
               Windows
               The default value for this parameter is derived from the NetBackup data buffer
               size (see below for more information about the data buffer size) using the
               following formula:
                    For backup jobs: (<data_buffer_size> * 4) + 1024
                    For restore jobs: (<data_buffer_size> * 2) + 1024
                           For tape: because the default value for the NetBackup data buffer size is 65536
                           bytes, this formula results in a default NetBackup network buffer size of 263168
                           bytes for backups and 132096 bytes for restores.
                           For disk: because the default value for the NetBackup data buffer size is 262144
                           bytes, this formula results in a default NetBackup network buffer size of
                           1049600 bytes for backups and 525312 bytes for restores.
                           To set this parameter, create the following files:
                           UNIX
                           /usr/openv/netbackup/NET_BUFFER_SZ
                           /usr/openv/netbackup/NET_BUFFER_SZ_REST
                           Windows
                           install_path\NetBackup\NET_BUFFER_SZ
                           install_path\NetBackup\NET_BUFFER_SZ_REST
                           These files contain a single integer specifying the network buffer size in bytes.
                           For example, to use a network buffer size of 64 Kilobytes, the file would contain
                           65536. If the files contain the integer 0 (zero), the default value for the network
                           buffer size is used.
                           If the NET_BUFFER_SZ file exists, and the NET_BUFFER_SZ_REST file does not
                           exist, the contents of NET_BUFFER_SZ will specify the network buffer size for
                            both backups and restores.
                           If the NET_BUFFER_SZ_REST file exists, its contents will specify the network
                           buffer size for restores.
                           If both files exist, the NET_BUFFER_SZ file will specify the network buffer size
                           for backups, and the NET_BUFFER_SZ_REST file will specify the network buffer
                           size for restores.
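
                            For example, on a UNIX media server you might set a 256-Kilobyte
                            network buffer for backups and a 128-Kilobyte network buffer for
                            restores (the values shown are illustrative, not recommendations;
                            test any change):

                                echo "262144" > /usr/openv/netbackup/NET_BUFFER_SZ
                                echo "131072" > /usr/openv/netbackup/NET_BUFFER_SZ_REST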
                           Because local backup or restore jobs on the media server do not send data over
                           the network, this parameter has no effect on those operations. It is used only by
                           the NetBackup media server processes which read from or write to the network,
                           specifically, the bptm or bpdm processes. It is not used by any other NetBackup
                           processes on a master server, media server, or client.
                           This parameter is the counterpart on the media server to the Communications
                           Buffer Size parameter on the client, which is described below. The network
                           buffer sizes are not required to be the same on all of your NetBackup systems for
                           NetBackup to function properly; however, setting the Network Buffer Size
                           parameter on the media server and the Communications Buffer Size parameter
                           on the client (see below) to the same value has significantly improved the
                           throughput of the network component of the NetBackup data transfer path in
                           some installations.
                           Similarly, the network buffer size does not have a direct relationship with the
                           NetBackup data buffer size (described under “Shared memory (number and size
              of data buffers)” on page 102). They are separately tunable parameters.
              However, setting the network buffer size to a substantially larger value than the
              data buffer has achieved the best performance in many NetBackup installations.


              Synthetic full backups on AIX NetBackup servers
              If synthetic full backups on AIX NetBackup servers are running slowly, increase
              the NET_BUFFER_SZ network buffer to 262144 (256KB). To do this, create a file
              called /usr/openv/netbackup/NET_BUFFER_SZ and change the default
              setting (32032) to 262144. This file is unformatted, and should contain only the
              size in bytes:
                  $ cat /usr/openv/netbackup/NET_BUFFER_SZ
                  262144
                  $
              Changing this value can affect backup and restore operations on the media
              servers. Test backups and restores to ensure that the change you make does not
              negatively impact performance.


NetBackup client communications buffer size
              The NetBackup client has a tunable parameter that you can use to adjust the size
              of the network communications buffer used to write data to the network for
              backups.
              This client parameter is the counterpart on the client to the Network Buffer Size
              parameter on the media server, described above. As mentioned, the network
              buffer sizes are not required to be the same on all of your NetBackup systems for
              NetBackup to function properly. However, setting the Network Buffer Size
              parameter on the media server (see above) and the Communications Buffer Size
              parameter on the client to the same value achieves the best performance in
              some NetBackup installations.

              To set the communications buffer size parameter on UNIX clients
              Create the /usr/openv/netbackup/NET_BUFFER_SZ file.
              As with the media server, it should contain a single integer specifying the
              communications buffer size. Generally, performance is better when the value in
              the NET_BUFFER_SZ file on the client matches the value in the
              NET_BUFFER_SZ file on the media server.
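
               For example, if the media server's NET_BUFFER_SZ file contains 262144,
               the matching file could be created on a UNIX client as follows (the
               value is illustrative):

                   echo "262144" > /usr/openv/netbackup/NET_BUFFER_SZ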


              Note: The NET_BUFFER_SZ_REST file is not used on the client. The value in the
              NET_BUFFER_SZ file is used for both backups and restores.
                            To set the communications buffer size parameter on Windows clients
                            1    From Host Properties in the NetBackup Administration Console, expand
                                 Clients and open the Client Properties > Windows Client > Client Settings
                                 dialog for the Windows client on which the parameter is to be changed.
                            2    Enter the desired value in the Communications buffer field.
                                  This parameter is specified in kilobytes. The default value is
                                 32. An extra kilobyte is added internally for backup operations. Therefore,
                                 the default network buffer size for backups is 33792 bytes. In some
                                 NetBackup installations, this default value is too small. Increasing the value
                                 to 128 improves performance in these installations.
                                 Because local backup jobs on the media server do not send data over the
                                 network, this parameter has no effect on these local operations. This
                                  parameter is used only by the NetBackup client processes that write to the
                                 network, specifically, the bpbkar32 process. It is not used by any other
                                 NetBackup for Windows processes on a master server, media server, or
                                 client.
                            3    If you modify the NetBackup buffer settings, test the performance of
                                 restores with the new settings.


    The NOSHM file
                            When a master or media server backs itself up, NetBackup uses shared memory
                            to speed up the backup. In this case, NetBackup uses shared memory rather than
                            socket communications to transport the data between processes. However,
                            sometimes situations may arise where it is not possible or desirable to use
                            shared memory during a backup. Touching the file NOSHM in the
                            /usr/openv/netbackup (UNIX) directory or the install_path\NetBackup
                            (Windows) directory causes the client and server to use socket communications
                            rather than shared memory to interchange the backup data. (Touching a file
                            means changing the file’s modification and access times.)
                            The file name is NOSHM and should not contain any extension.
                            Each time a backup runs, NetBackup checks for the existence of this file, so no
                            services need to be stopped and started for it to take effect. One example of
                            when it might be necessary to use NOSHM is when the master or media server
                            hosts another application that uses a large amount of shared memory, for
                            instance, Oracle.
                            NOSHM is also useful for testing, both as a workaround while solving a shared
                            memory issue and to verify that an issue is caused by shared memory.
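
                             For example, on a UNIX master or media server (NetBackup checks for
                             the file each time a backup runs, so removing it re-enables shared
                             memory):

                                 touch /usr/openv/netbackup/NOSHM     # force socket communications
                                 rm /usr/openv/netbackup/NOSHM        # revert to shared memory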

              Note: NOSHM only affects backups when it is applied to a system with a
              directly-attached storage unit.

              NOSHM forces a local backup to run as though it were a remote backup. A local
              backup is a backup of a client that has a directly-attached storage unit, such as a
              client that happens to be a master or media server. A remote backup is a backup
              that passes the data across a network connection from the client to a master or
              media server’s storage unit.
              A local backup normally has one or more bpbkar processes that read from the
              disk and write into shared memory, and a bptm process that reads from shared
              memory and writes to the tape. A remote backup has one or more bptm (child)
              processes that read from a socket connection to bpbkar and write into shared
              memory, and a bptm (parent) process that reads from shared memory and
              writes to the tape. NOSHM forces the remote backup model even when the client
              and the media server are the same system.
              For a local backup without NOSHM, shared memory is used between bptm and
              bpbkar. Whether the backup is remote or local, and whether NOSHM exists or
              not, shared memory is always used between bptm (parent) and bptm (child).


              Note: NOSHM does not affect the shared memory used by the bptm process to
              buffer data being written to tape. bptm uses shared memory for any backup,
              local or otherwise.



Using multiple interfaces
              For a master or media server configured with more than one network interface,
              distributing NetBackup traffic over all available network interfaces can improve
              performance. This can be achieved by configuring a unique hostname for the
              server for each network interface and setting up bp.conf entries for these
              hostnames.
              For example, suppose the server is configured with three network interfaces.
              Each of the network interfaces connects to one or more NetBackup clients. The
              following configuration allows NetBackup to use all three network interfaces:
              ■   In the server’s bp.conf file, add one entry for each network interface:
                  SERVER=server-neta
                  SERVER=server-netb
                  SERVER=server-netc
              ■   In each client’s bp.conf, make the following entries:
                  SERVER=server-neta
                  SERVER=server-netb
                  SERVER=server-netc
                                 It is okay for a client to have an entry for a server that is not currently on
                                 the same network.



    NetBackup server performance
                            To improve NetBackup server performance, consider the following factors
                            regarding the data transfer path.
                            ■    Shared memory (number and size of data buffers)
                            ■    Parent/child delay values
                            ■    Using NetBackup wait and delay counters
                            ■    Fragment size and NetBackup restores
                            ■    Other restore performance issues


    Shared memory (number and size of data buffers)
                            The NetBackup media server uses shared memory to buffer data between the
                            network and the tape or disk drive (or between the disk and the tape drive if the
                            NetBackup media server and client are the same system). The number and size
                            of these shared data buffers can be configured on the NetBackup media server.
                            The size and number of the tape and disk buffers may be changed so that
                            NetBackup optimizes its use of shared memory. Changing the default buffer size
                            may result in better throughput for high-performance tape drives. These
                            changes may also improve throughput for other types of drives.
                            Buffer settings are for media servers only and should not be used on a pure
                            master server or client.


                            Note: Restores use the same buffer size that was used to back up the images
                            being restored.


                            Default number of shared data buffers
                            The default number of shared data buffers for various NetBackup operations is
                            shown in Table 8-14.

                             Table 8-14         Default number of shared data buffers

                              NetBackup Operation                  Number of Shared Data Buffers

                                                                   UNIX                 Windows

                              Non-multiplexed backup               8                    16

                              Multiplexed backup                   4                    8

                              Restore of non-multiplexed backup    8                    16

                              Restore of multiplexed backup        12                   12

                              Verify                               8                    16

                              Import                               8                    16

                              Duplicate                            8                    16


Default size of shared data buffers
The default size of shared data buffers for various NetBackup operations is
shown in Table 8-15.

Table 8-15         Default size of shared data buffers

NetBackup Operation             Size of Shared Data Buffers

                                UNIX                        Windows

Non-multiplexed backup          64K (tape), 256K (disk)     64K (tape), 256K (disk)

Multiplexed backup              64K (tape), 256K (disk)     64K (tape), 256K (disk)

Restore, verify, or import      same size as used for the   same size as used for the
                                backup                      backup

Duplicate                       read side: same size as     read side: same size as used for
                                used for the backup;        the backup;
                                write side: 64K (tape),     write side: 64K (tape), 256K
                                256K (disk)                 (disk)


On Windows, a single tape I/O operation is performed for each shared data
buffer. Therefore, this size must not exceed the maximum block size for the tape
device or operating system. For Windows systems, the maximum block size is
generally 64K, although in some cases customers are using a larger value
successfully. For this reason, the terms “tape block size” and “shared data buffer
size” are synonymous in this context.

Amount of shared memory required by NetBackup
Use this formula to calculate the amount of shared memory required by
NetBackup:
                                 (number_data_buffers * size_data_buffers) * number_tape_drives *
                                 max_multiplexing_setting
                            For example, assume that the number of shared data buffers is 16, the size of the
                            shared data buffers is 64 Kilobytes, there are two tape drives, and the maximum
                            multiplexing setting is four. Following the formula above, the amount of shared
                            memory required by NetBackup is:
                                 (16 * 65536) * 2 * 4 = 8 MB
                            Be careful when changing these settings (see the next caution).

                            Changing the number of shared data buffers
                            To change the number of shared data buffers, create the following file(s) on the
                            media server (note that the NUMBER_DATA_BUFFERS_RESTORE file is only
                            needed for restore from tape, not from disk):
                            UNIX
                               For tape
                                 /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS
                                 /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS_RESTORE
                                 For disk
                                 /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS_DISK
                            Windows
                               For tape
                                 <install_path>\NetBackup\db\config\NUMBER_DATA_BUFFERS
                                 <install_path>\NetBackup\db\config\NUMBER_DATA_BUFFERS_RESTORE
                                 For disk
                                 <install_path>\NetBackup\db\config\NUMBER_DATA_BUFFERS_DISK
                            These files contain a single integer specifying the number of shared data
                            buffers NetBackup will use. For backups (the NUMBER_DATA_BUFFERS and
                            NUMBER_DATA_BUFFERS_DISK files), the value must be a power of 2.
                            If the NUMBER_DATA_BUFFERS file exists, its contents will be used to
                            determine the number of shared data buffers to be used for multiplexed and
                            non-multiplexed backups.
                            NUMBER_DATA_BUFFERS_DISK allows for a different value when doing
                            backup to disk instead of tape. If NUMBER_DATA_BUFFERS exists but
                            NUMBER_DATA_BUFFERS_DISK does not, NUMBER_DATA_BUFFERS applies
                            to all backups. If both files exist, NUMBER_DATA_BUFFERS applies to tape
                            backups and NUMBER_DATA_BUFFERS_DISK applies to disk backups. If only
                            NUMBER_DATA_BUFFERS_DISK is present, it applies to disk backups only.
                            If the NUMBER_DATA_BUFFERS_RESTORE file exists, its contents will be used
                            to determine the number of shared data buffers to be used for multiplexed
                            restores from tape.
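
                            For example, the following creates the tape backup and restore files
                            on a UNIX media server (the values are illustrative; remember that
                            backup values must be a power of 2):

                                echo "16" > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS
                                echo "16" > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS_RESTORE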
The NetBackup daemons do not have to be restarted for the new values to be
used. Each time a new job starts, bptm checks the configuration file and adjusts
its behavior.

Changing the size of shared data buffers

Caution: It is critical to perform both backup and restore testing if the shared
data buffer size is changed. If all NetBackup media servers are not running in
the same operating system environment, it is critical to test restores on each of
the NetBackup media servers that may be involved in a restore operation. For
example, if a UNIX NetBackup media server is used to write a backup to tape
with a shared data buffer (block size) of 256 Kilobytes, then it is possible that a
Windows NetBackup media server will not be able to read that tape. In general, it
is strongly recommended you test restore as well as backup operations, to avoid
the potential for data loss. See “Testing changes made to shared memory” on
page 107.

To change the size of the shared data buffers, create the following file on the
media server:
UNIX
   For tape
         /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
    For disk
         /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS_DISK
Windows
   For tape
         install_path\NetBackup\db\config\SIZE_DATA_BUFFERS
    For disk
         install_path\NetBackup\db\config\SIZE_DATA_BUFFERS_DISK
This file contains a single integer specifying the size of each shared data buffer
in bytes. The integer must be a multiple of 1024 (a multiple of 32 kilobytes is
recommended); see Table 8-16 for typical values. For example, to use a shared
data buffer size of 64 Kilobytes, the file would contain the integer 65536.
The NetBackup daemons do not have to be restarted for the parameter values to
be used. Each time a new job starts, bptm checks the configuration file and
adjusts its behavior.
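
For example, to use 64-Kilobyte shared data buffers for tape on a UNIX media
server (see Table 8-16 for other values):

    echo "65536" > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS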
                            Analyze the buffer usage by checking the bptm debug log before and after
                            altering the size of buffer parameters.

                            Table 8-16        Absolute byte value to be entered in SIZE_DATA_BUFFERS

                             Kilobytes per Data Buffer               SIZE_DATA_BUFFER Value

                             32                                      32768

                             64                                      65536

                             96                                      98304

                             128                                     131072

                             160                                     163840

                             192                                     196608

                             224                                     229376

                             256                                     262144


                            IMPORTANT: Because the data buffer size equals the tape I/O size, the value
                            specified in SIZE_DATA_BUFFERS must not exceed the maximum tape I/O size
                            supported by the tape drive or operating system. This is usually 256 or 128
                            Kilobytes. Check your operating system and hardware documentation for the
                            maximum values. Take into consideration the total system resources and the
                            entire network. The Maximum Transmission Unit (MTU) for the LAN network
                            may also have to be changed. NetBackup expects the value for NET_BUFFER_SZ
                            and SIZE_DATA_BUFFERS to be in bytes, so in order to use 32k, use 32768 (32 x
                            1024).


                            Note: Some Windows tape devices are not able to write with block sizes higher
                            than 65536 (64 Kilobytes). Backups created on a UNIX media server with
                            SIZE_DATA_BUFFERS set to more than 65536 cannot be read by some Windows
                            media servers. This means that the Windows media server would not be able to
                            import or restore any images from media that were written with
                            SIZE_DATA_BUFFERS greater than 65536.



                            Note: The size of the shared data buffers used for a restore operation is
                            determined by the size of the shared data buffers in use at the time the backup
                            was written. The SIZE_DATA_BUFFERS file is not consulted during restores.
Recommended shared memory settings
The SIZE_DATA_BUFFERS setting is typically increased to 256 KB and
NUMBER_DATA_BUFFERS to 16. To configure NetBackup to use 16 x 256 KB
data buffers, specify 262144 (256 x 1024) in SIZE_DATA_BUFFERS and 16 in
NUMBER_DATA_BUFFERS.
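
A minimal sketch of this recommended configuration on a UNIX media server (as
always, test both backups and restores after the change):

    echo "262144" > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
    echo "16" > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS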
Note that increasing the size and number of the data buffers will use up more
shared memory, which is a limited system resource. The total amount of shared
memory used for each tape drive is:
    (buffer_size * num_buffers) * drives * MPX
where MPX is the multiplexing factor. For two tape drives, each with an MPX of
4 and with 16 buffers of 256k, the total shared memory usage would be:
    (16 * 262144) * 2 * 4 = 32768 K (32 MB)
If large amounts of memory are to be allocated, the kernel may require
additional tuning to allow enough shared memory to be available for
NetBackup's requirements. For more information, see “Kernel tuning (UNIX)”
on page 158.


Note: AIX media servers do not need shared memory tuning because AIX uses
dynamic memory allocation.

Be cautious if you change these parameters: make your changes incrementally
and monitor for performance changes with each modification. For example,
increasing the tape buffer size can cause some backups to run slower, and
buffer changes have caused restore problems in some installations. After any
changes, be sure to include restores as part of your validation testing.

Testing changes made to shared memory
After making changes, it is vitally important to verify that the following tests
complete successfully:
1   Run a backup.
2   Restore the data from the backup.
3   Restore data from a backup created prior to the changes to
    SIZE_DATA_BUFFERS and NUMBER_DATA_BUFFERS.
    Before and after altering the size or number of data buffers, examine the
    buffer usage information in the bptm debug log file. The values in the log
    should match your buffer settings. The relevant bptm log entries are similar
    to the following:
    12:02:55 [28551] <2> io_init: using 65536 data buffer size
    12:02:55 [28551] <2> io_init: CINDEX 0, sched bytes for
    monitoring = 200
    12:02:55 [28551] <2> io_init: using 8 data buffers
                                 or
                                 15:26:01 [21544] <2> mpx_setup_restore_shm: using 12 data
                                 buffers, buffer size is 65536
                                 When you change these settings, take into consideration the total system
                                 resources and the entire network. The Maximum Transmission Unit (MTU)
                                 for the local area network (LAN) may also have to be changed.


    Parent/child delay values
                            Although rarely changed, it is possible to modify the parent and child delay
                            values for a process. To change these values, create the following files:
                            UNIX
                                 /usr/openv/netbackup/db/config/PARENT_DELAY
                                 /usr/openv/netbackup/db/config/CHILD_DELAY
                            Windows
                                 <install_path>\NetBackup\db\config\PARENT_DELAY
                                 <install_path>\NetBackup\db\config\CHILD_DELAY
                            These files contain a single integer specifying the value in milliseconds to be
                            used for the delay corresponding to the name of the file. For example, to use a
                            parent delay of 50 milliseconds, the PARENT_DELAY file would contain the
                            integer 50.
                            See “Using NetBackup wait and delay counters” below for more information
                            about how to determine if you should change these values.
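
                            For example, to use a parent delay of 50 milliseconds and a child delay
                            of 20 milliseconds on a UNIX media server (both values are illustrative
                            only):

                                echo "50" > /usr/openv/netbackup/db/config/PARENT_DELAY
                                echo "20" > /usr/openv/netbackup/db/config/CHILD_DELAY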


                            Note: The following section refers to the bptm process on the media server
                            during back up and restore operations from a tape storage device. If you are
                            backing up to or restoring from a disk storage device, substitute bpdm for bptm
                            throughout the section. For example, to activate debug logging for a disk storage
                            device, the following directory must be created:
                            /usr/openv/netbackup/logs/bpdm (UNIX) or
                            install_path\NetBackup\logs\bpdm (Windows).



    Using NetBackup wait and delay counters
                            During a backup or restore operation the NetBackup media server uses a set of
                            shared data buffers to isolate the process of communicating with the tape from
                            the process of interacting with the disk or network. Through the use of Wait and
                            Delay counters, you can determine which process on the NetBackup media
                            server has to wait more often: the data producer or the data consumer.
                    Achieving a good balance between the data producer and the data consumer
                    processes is an important factor in achieving optimal performance from the
                    NetBackup server component of the NetBackup data transfer path.
             Figure: Producer - consumer relationship during a backup. Data flows
             from the NetBackup client across the network to a bptm child process
             (the data producer), which places it in the shared buffers; the bptm
             parent process (the data consumer) reads the data from the shared
             buffers and writes it to tape.

                    Understanding the two-part communication process
                    The two-part communication process differs depending on whether the
                    operation is a backup or restore and whether the operation involves a local
                    client or a remote client.

                    Local clients
                    When the NetBackup media server and the NetBackup client are part of the
                    same system, the NetBackup client is referred to as a local client.
                    ■    Backup of local client
                         For a local client, the bpbkar (UNIX) or bpbkar32 (Windows) process reads
                         data from the disk during a backup and places it in the shared buffers. The
                         bptm process reads the data from the shared buffer and writes it to tape.
                    ■    Restore of local client
                         During a restore of a local client, the bptm process reads data from the tape
                         and places it in the shared buffers. The tar (UNIX) or tar32 (Windows)
                         process reads the data from the shared buffers and writes it to disk.
                            Remote clients
                            When the NetBackup media server and the NetBackup client are part of two
                            different systems, the NetBackup client is referred to as a remote client.
                            ■    Backup of remote client
                                 The bpbkar (UNIX) or bpbkar32 (Windows) process on the remote client
                                 reads data from the disk and writes it to the network. Then a child bptm
                                 process on the media server receives data from the network and places it in
                                 the shared buffers. The parent bptm process on the media server reads the
                                 data from the shared buffers and writes it to tape.
                            ■    Restore of remote client
                                 During the restore of the remote client, the parent bptm process reads data
                                 from the tape and places it into the shared buffers. The child bptm process
                                 reads the data from the shared buffers and writes it to the network. The tar
                                 (UNIX) or tar32 (Windows) process on the remote client receives the data
                                 from the network and writes it to disk.

                            Roles of processes during backup and restore operations
                            When a process attempts to use a shared data buffer, it first verifies that the
                            next buffer in order is in a correct state. A data producer needs an empty buffer,
                            while a data consumer needs a full buffer. The following chart provides a
                            mapping of processes and their roles during backup and restore operations:

                             Operation                  Data Producer               Data Consumer

                             Local Backup               bpbkar (UNIX) or bpbkar32   bptm
                                                        (Windows)

                             Remote Backup              bptm (child)                bptm (parent)

                             Local Restore              bptm                        tar (UNIX) or
                                                                                    tar32 (Windows)

                             Remote Restore             bptm (parent)               bptm (child)


                            If a full buffer is needed by the data consumer but is not available, the data
                            consumer increments the Wait and Delay counters to indicate that it had to wait
                            for a full buffer. After a delay, the data consumer will check again for a full
                            buffer. If a full buffer is still not available, the data consumer increments the
                            Delay counter to indicate that it had to delay again while waiting for a full
                            buffer. The data consumer will repeat the delay and full buffer check steps until
                            a full buffer is available.
This sequence is summarized in the following algorithm:
    while (Buffer_Is_Not_Full) {
        ++Wait_Counter;
        while (Buffer_Is_Not_Full) {
            ++Delay_Counter;
            delay (DELAY_DURATION);
        }
    }
If an empty buffer is needed by the data producer but is not available, the data
producer increments the Wait and Delay counter to indicate that it had to wait
for an empty buffer. After a delay, the data producer will check again for an
empty buffer. If an empty buffer is still not available, the data producer
increments the Delay counter to indicate that it had to delay again while waiting
for an empty buffer. The data producer will repeat the delay and empty buffer
check steps until an empty buffer is available.
The algorithm for a data producer has a similar structure:
    while (Buffer_Is_Not_Empty) {
        ++Wait_Counter;
        while (Buffer_Is_Not_Empty) {
            ++Delay_Counter;
            delay (DELAY_DURATION);
        }
    }
Analysis of the Wait and Delay counter values indicates which process, producer
or consumer, has had to wait most often and for how long.
There are four basic Wait and Delay Counter relationships:
■   Data Producer >> Data Consumer. The data producer has substantially
    larger Wait and Delay counter values than the data consumer.
    The data consumer is unable to receive data fast enough to keep the data
    producer busy. Investigate means to improve the performance of the data
    consumer. For a backup operation, check whether the data buffer size is
    appropriate for the tape drive being used (see below).
    If the data consumer still has a substantially large value in this case,
    try increasing the number of shared data buffers to improve performance
    (see below).
■   Data Producer = Data Consumer (large value). The data producer and the
    data consumer have very similar Wait and Delay counter values, but those
    values are relatively large.
                                  This may indicate that the data producer and data consumer are regularly
                                  attempting to use the same shared data buffer. Try increasing the number
                                 of shared data buffers to improve performance (see below).
                            ■    Data Producer = Data Consumer (small value). The data producer and the
                                 data consumer have very similar Wait and Delay counter values, but those
                                 values are relatively small.
                                 This indicates that there is a good balance between the data producer and
                                 data consumer, which should yield good performance from the NetBackup
                                 server component of the NetBackup data transfer path.
                            ■    Data Producer << Data Consumer. The data producer has substantially
                                 smaller Wait and Delay counter values than the data consumer.
                                 The data producer is unable to deliver data fast enough to keep the data
                                 consumer busy. Investigate ways to improve the performance of the data
                                 producer. For a restore operation, check if the data buffer size (see below) is
                                 appropriate for the tape drive being used.
                                 If the data producer still has a relatively large value in this case, try
                                 increasing the number of shared data buffers to improve performance (see
                                 below).
                            The bullets above describe the four basic relationships possible. Of primary
                            concern is the relationship and the size of the values. Information on
                            determining substantial versus trivial values appears on the following pages.
                            The relationship of these values only provides a starting point in the analysis.
                            Additional investigative work may be needed to positively identify the cause of a
                            bottleneck within the NetBackup data transfer path.


                            Determining wait and delay counter values
                            Wait and Delay counter values can be found by creating and reading debug log
                            files on the NetBackup media server.


                            Note: Writing the debug log files introduces some additional overhead and will
                            have a small impact on the overall performance of NetBackup. This impact will
                            be more noticeable for a high verbose level setting. Normally, you should not
                            need to run with debug logging enabled on a production system.


                            To determine wait and delay counter values for a local client backup:
                            1    Activate debug logging by creating these two directories on the media
                                 server:
                                 UNIX
                                 /usr/openv/netbackup/logs/bpbkar
                                 /usr/openv/netbackup/logs/bptm
    Windows
    install_path\NetBackup\logs\bpbkar
    install_path\NetBackup\logs\bptm
2   Execute your backup.
    Look at the log for the data producer (bpbkar on UNIX or bpbkar32 on
    Windows) process in:
    UNIX
    /usr/openv/netbackup/logs/bpbkar
    Windows
    install_path\NetBackup\logs\bpbkar
    The line you are looking for should be similar to the following, and will have
    a timestamp corresponding to the completion time of the backup:
    ... waited 224 times for empty buffer, delayed 254 times
    In this example the Wait counter value is 224 and the Delay counter value is
    254.
3   Look at the log for the data consumer (bptm) process in:
    UNIX
    /usr/openv/netbackup/logs/bptm
    Windows
    install_path\NetBackup\logs\bptm
    The line you are looking for should be similar to the following, and will have
    a timestamp corresponding to the completion time of the backup:
    ... waited for full buffer 1 times, delayed 22 times
    In this example, the Wait counter value is 1 and the Delay counter value is
    22.
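To pull these lines out of the logs quickly, you can search for them with grep.
The following is a minimal sketch; the log.* pattern assumes NetBackup's usual
date-stamped legacy log file names, so adjust the paths and file names for your
installation:
    # Data producer (bpbkar) wait/delay summary line:
    grep "waited" /usr/openv/netbackup/logs/bpbkar/log.*
    # Data consumer (bptm) wait/delay summary line:
    grep "waited for full buffer" /usr/openv/netbackup/logs/bptm/log.*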

To determine wait and delay counter values for a remote client backup:
1   Activate debug logging by creating this directory on the media server:
    UNIX
    /usr/openv/netbackup/logs/bptm
    Windows
    install_path\NetBackup\logs\bptm
2   Execute your backup.
3   Look at the log for the bptm process in:
    UNIX
    /usr/openv/netbackup/logs/bptm
    Windows
    install_path\NetBackup\logs\bptm
4   Delays associated with the data producer (bptm child) process will appear as
    follows:
    ... waited for empty buffer 22 times, delayed 151 times, ...
                                 In this example, the Wait counter value is 22 and the Delay counter value is
                                 151.
                            5    Delays associated with the data consumer (bptm parent) process will appear
                                 as:
                                 ... waited for full buffer 12 times, delayed 69 times
                                 In this example the Wait counter value is 12, and the Delay counter value is
                                 69.

                            To determine wait and delay counter values for a local client restore:
                            1    Activate logging by creating the two directories on the NetBackup media
                                 server:
                                 UNIX
                                 /usr/openv/netbackup/logs/bptm
                                 /usr/openv/netbackup/logs/tar
                                 Windows
                                 install_path\NetBackup\logs\bptm
                                 install_path\NetBackup\logs\tar
                            2    Execute your restore.
                                 Look at the log for the data consumer (tar or tar32) process in the tar log
                                 directory created above.
                                 The line you are looking for should be similar to the following, and will have
                                 a timestamp corresponding to the completion time of the restore:
                                     ... waited for full buffer 27 times, delayed 79 times
                                 In this example, the Wait counter value is 27, and the Delay counter value is
                                 79.
                            3    Look at the log for the data producer (bptm) process in the bptm log
                                 directory created above.
                                 The line you are looking for should be similar to the following, and will have
                                 a timestamp corresponding to the completion time of the restore:
                                 ... waited for empty buffer 1 times, delayed 68 times
                                  In this example, the Wait counter value is 1 and the Delay counter value is
                                  68.

                            To determine wait and delay counter values for a remote client restore:
                            1    Activate debug logging by creating the following directory on the media
                                 server:
                                 UNIX
                                 /usr/openv/netbackup/logs/bptm
                                 Windows
                                 install_path\NetBackup\logs\bptm
                            2    Execute your restore.
3   Look at the log for bptm in the bptm log directory created above.
4   Delays associated with the data consumer (bptm child) process will appear
    as follows:
    ... waited for full buffer 36 times, delayed 139 times
    In this example, the Wait counter value is 36 and the Delay counter value is
    139.
5   Delays associated with the data producer (bptm parent) process will appear
    as follows:
    ... waited for empty buffer 95 times, delayed 513 times
    In this example the Wait counter value is 95 and the Delay counter value is
    513.


Note: When you run multiple tests, you can rename the current log file.
NetBackup will automatically create a new log file, which prevents you from
erroneously reading the wrong set of values.
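For example, to set aside the current bptm log before the next test run (the
date-stamped file name shown is illustrative):
    mv /usr/openv/netbackup/logs/bptm/log.112707 \
       /usr/openv/netbackup/logs/bptm/log.112707.run1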

Deleting the debug log file will not stop NetBackup from generating the debug
logs. You must delete the entire directory. For example, to stop bptm logging,
you must delete the bptm subdirectory. NetBackup will automatically generate
debug logs at the specified verbose setting whenever the directory is detected.



Using wait and delay counter values to analyze issues
You can use the bptm debug log file to verify that the following tunable
parameters have been successfully set to the desired values. You can use these
parameters and the Wait and Delay counter values to analyze issues. These
values include (see the sketch after this list for a quick way to locate them):
■   Data buffer size. The size of each shared data buffer can be found on a line
    similar to:
    ... io_init: using 65536 data buffer size
■   Number of data buffers. The number of shared data buffers may be found on
    a line similar to:
    ... io_init: using 16 data buffers
■   Parent/child delay values. The values in use for the duration of the parent
    and child delays can be found on a line similar to:
    ... io_init: child delay = 20, parent delay = 30 (milliseconds)
■   NetBackup Media Server Network Buffer Size. The values in use for the
    Network Buffer Size parameter on the media server can be found on lines
    similar to these in debug log files:
                                 The receive network buffer is used by the bptm child process to read from
                                 the network during a remote backup.
                                 ...setting receive network buffer to 263168 bytes
                                 The send network buffer is used by the bptm child process to write to the
                                 network during a remote restore.
                                 ...setting send network buffer to 131072 bytes
                                 See “NetBackup media server network buffer size” on page 97 for more
                                 information about the Network Buffer Size parameter on the media server.
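To confirm all of these values at once, you can search the bptm debug log for
the relevant lines. A sketch, again assuming the usual legacy log location and
file names:
    grep "io_init" /usr/openv/netbackup/logs/bptm/log.*
    grep "network buffer" /usr/openv/netbackup/logs/bptm/log.*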
Suppose you want to analyze a local backup that took 30 minutes to transfer
9,000 Megabytes of data, a baseline rate of 5 Megabytes per second. Because a
local backup is involved, if you refer to
                            “Roles of processes during backup and restore operations” on page 110, you can
                            determine that bpbkar (UNIX) or bpbkar32 (Windows) is the data producer
                            and bptm is the data consumer.
                            You would next want to determine the Wait and Delay values for bpbkar (or
                            bpbkar32) and bptm by following the procedures described in “Determining
                            wait and delay counter values” on page 112. For this example, suppose those
                            values were:

Process                      Wait                        Delay

bpbkar (UNIX) /
bpbkar32 (Windows)           29364                       58033

bptm                         95                          105

                            Using these values, you can determine that the bpbkar (or bpbkar32) process
                            is being forced to wait by a bptm process which cannot move data out of the
                            shared buffer fast enough.
                            Next, you can determine time lost due to delays by multiplying the Delay
                            counter value by the parent or child delay value, whichever applies.
                            In this example, the bpbkar (or bpbkar32) process uses the child delay value,
                            while the bptm process uses the parent delay value. (The defaults for these
                            values are 20 for child delay and 30 for parent delay.) The values are specified in
                            milliseconds. See “Parent/child delay values” on page 108 for more information
                            on how to modify these values.
Use the following equations to determine the amount of time lost due to these
delays:

bpbkar (UNIX) /
bpbkar32 (Windows)          = 58033 delays X 0.020 seconds
                            = 1160 seconds
                            = 19 minutes 20 seconds

bptm                        = 105 delays X 0.030 seconds
                            = 3 seconds

This is useful in determining that the delay duration for the bpbkar (or
bpbkar32) process is significant. If this delay were entirely removed, the
resulting transfer time of 10:40 (total transfer time of 30 minutes minus delay of
19 minutes and 20 seconds) would indicate a throughput value of 14
Megabytes/sec, nearly a threefold increase. This type of performance increase
would warrant expending effort to investigate how the tape drive performance
can be improved.
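The same arithmetic is easy to script. The following awk sketch multiplies each
Delay counter by the applicable delay duration, using the values from this
example; substitute your own counters and delay settings:
    awk 'BEGIN {
        child_delay = 0.020; parent_delay = 0.030;   # delay durations in seconds
        printf "bpbkar time lost: %d seconds\n", int(58033 * child_delay);
        printf "bptm time lost:   %d seconds\n", int(105 * parent_delay);
    }'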
The number of delays should be interpreted within the context of how much
data was moved. As the amount of data moved increases, the significance
threshold for counter values increases as well.
Again, using the example of a total of 9,000 Megabytes of data being transferred,
assume a 64-Kilobyte buffer size. You can determine the total number of
buffers to be transferred using the following equation:

Number_Kbytes        = 9,000 X 1024
                     = 9,216,000 Kilobytes

Number_Slots         =9,216,000 / 64
                     =144,000


The Wait counter value can now be expressed as a percentage by dividing it by
the total number of buffers transferred:

bpbkar (UNIX) /
bpbkar32 (Windows)          = 29364 / 144,000
                            = 20.39%

bptm                        = 95 / 144,000
                            = 0.07%


In this example, in the 20 percent of cases where the bpbkar (or bpbkar32)
process needed an empty shared data buffer, that shared data buffer has not yet
been emptied by the bptm process. A value this large indicates a serious issue,
                            and additional investigation would be warranted to determine why the data
                            consumer (bptm) is having issues keeping up.
                            In contrast, the delays experienced by bptm are insignificant for the amount of
                            data transferred.
                            You can also view the Delay and Wait counters as a ratio:

bpbkar (UNIX) /
bpbkar32 (Windows)          = 58033 / 29364
                            = 1.98


                            In this example, on average the bpbkar (or bpbkar32) process had to delay
                            twice for each wait condition that was encountered. If this ratio is substantially
                            large, you may wish to consider increasing the parent or child delay value,
                            whichever one applies, to avoid the unnecessary overhead of checking for a
                            shared data buffer in the correct state too often. Conversely, if this ratio is close
                            to 1, you may wish to consider reducing the applicable delay value to check more
                            often and see if that increases your data throughput performance. Keep in mind
                            that the parent and child delay values are rarely changed in most NetBackup
                            installations.
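These percentages and ratios can also be computed with a short script. A
sketch using the numbers from this example:
    awk 'BEGIN {
        buffers = (9000 * 1024) / 64;          # 144,000 buffer-sized slots
        printf "bpbkar wait percentage:  %.2f%%\n", 29364 / buffers * 100;
        printf "bptm wait percentage:    %.2f%%\n", 95 / buffers * 100;
        printf "bpbkar delay/wait ratio: %.2f\n", 58033 / 29364;
    }'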
                            The preceding information explains how to determine if the values for Wait and
                            Delay counters are substantial enough for concern. The Wait and Delay counters
                            are related to the size of data transfer. A value of 1,000 may be extreme when
                            only 1 Megabyte of data is being moved. The same value may indicate a
                            well-tuned system when gigabytes of data are being moved. The final analysis
                            must determine how these counters affect performance by considering such
                            factors as how much time is being lost and what percentage of time a process is
                            being forced to delay.


                            Correcting issues uncovered by wait and delay counter
                            values
                            The following lists identify ways to correct issues that are uncovered by the
                            Wait and Delay counter values.
                            ■    bptm read waits
                                 The bptm debug log contains messages such as,
                                 ...waited for full buffer 1681 times, delayed 12296 times
                                 The first number in the message is the number of times bptm waited for a
                                 full buffer, which is the number of times bptm write operations waited for
                                 data from the source. If, using the technique described in the section
                                 “Determining wait and delay counter values” on page 112, you determine
                                 that the Wait counter indicates a performance issue, then changing the
                                 number of buffers will not help, but adding multiplexing may help.
             ■   bptm write waits
                 The bptm debug log contains messages such as,
                 ...waited for empty buffer 1883 times, delayed 14645 times
                 The first number in the message is the number of times bptm waited for an
                 empty buffer, which is the number of times bptm experienced data arriving
                 from the source faster than the data could be written to tape. If, using the
                 technique described in the section “Determining wait and delay counter
                 values” on page 112, you determine that the Wait counter indicates a
                 performance issue, then reduce the multiplexing factor if you are using
                 multiplexing. Also, adding more buffers may help.
             ■   bptm delays
                 The bptm debug log contains messages such as,
                 ...waited for empty buffer 1883 times, delayed 14645 times
                 The second number in the message is the number of times bptm waited for
                  an available buffer. If, using the technique described in the section
                  “Determining wait and delay counter values” on page 112, you determine
                  that the Delay counter indicates a performance issue, further investigation
                  is needed. Each delay interval is 30 ms.


Fragment size and NetBackup restores
             Below is a summary of how fragment size affects NetBackup restores for
             non-multiplexed and multiplexed images, followed by a more in-depth
             discussion.
             The fragment size affects where tape markers are placed and how many tape
             markers are used. (The default fragment size is 1 terabyte for tape storage units
             and 512 GB for disk.) As a rule, a larger fragment size results in faster backups,
             but may result in slower restores when recovering a small number of individual
             files.
             The Reduce fragment size to setting on the Storage Unit dialog limits the
             largest fragment size of the image. By limiting the size of the fragment, the size
             of the largest read during restore is minimized, reducing restore time. This is
             especially important when restoring a small number of individual files rather
             than entire directories or file systems.
             For many sites, a fragment size of approximately 10 gigabytes will result in good
             performance for both backup and restore.
             When choosing a fragment size, consider the following:
             ■   Larger fragment sizes usually favor backup performance, especially when
                 backing up large amounts of data. Creating smaller fragments will slow
                 down large backups: each time a new fragment is created, the backup
                 stream is interrupted.
                            ■    Larger fragment sizes do not hinder performance when restoring large
                                 amounts of data. But when restoring a few individual files, larger fragments
                                 may slow down the restore.
                            ■    Larger fragment sizes do not hinder performance when restoring from
                                 non-multiplexed backups. For multiplexed backups, larger fragments may
                                 slow down the restore. In multiplexed backups, blocks from several images
                                 can be mixed together within a single fragment. During restore, NetBackup
                                 positions to the nearest fragment and starts reading the data from there,
                                 until it comes to the desired file. Splitting multiplexed backups into smaller
                                 fragments can improve restore performance.
                            ■    During restores, newer, faster devices can handle large fragments well.
                                 Slower devices, especially if they do not use fast locate block positioning,
                                 will restore individual files faster if fragment size is smaller. (In some cases,
                                 SCSI fast tape positioning can improve restore performance.)


                            Note: Unless you have particular reasons for creating smaller fragments (such
                            as when restoring a few individual files, restoring from multiplexed backups, or
                            restoring from older equipment), larger fragment sizes are likely to yield better
                            overall performance.



                            Restore of a non-multiplexed image
                            bptm positions to the media fragment and the actual tape block containing the
                            first file to be restored. If fast-locate is available, bptm uses that for the
                            positioning. If fast-locate is not available, bptm uses MTFSF/MTFSR (forward
                            space filemark/forward space record) to do the positioning.
                            The first file is then restored.
                            After that, for every subsequent file to be restored, bptm determines where that
                            file is, relative to the current position. If it is faster for bptm to position to that
                            spot rather than to read all the data in between (and if fast locate is available),
                            bptm uses positioning to get to the next file instead of reading all the data in
                            between.
                            If fast-locate is not available, bptm can read the data as quickly as it can position
                            with MTFSR (forward space record).
                            Therefore, fragment sizes for non-multiplexed restores matter if fast-locate is
                            NOT available. In general, given smaller fragments, a restore reads less
                            extraneous data. You can set the maximum fragment size for the storage unit on
                            the Storage Unit dialog in the NetBackup Administration Console (Reduce
                            fragment size to).
Restore of a multiplexed image
bptm positions to the media fragment containing the first file to be restored. If
fast-locate is available, bptm uses that for the positioning. If fast-locate is not
available, bptm uses MTFSF (forward space filemark) for the positioning. The
restore cannot “fine-tune” positioning to get to the block containing the first
file, because of the randomness of how multiplexed images are written. So, the
restore starts reading, throwing away all the data (for this client and other
clients) until it reaches the block that contains the first file.
The first file is then restored.
After that, the logic is the same as that for non-multiplexed restores with one
exception: if the current position and the next file position are in the same
fragment, the restore cannot use positioning, for the same reason that it cannot
use “fine-tune” positioning to get to the first file.
However, if the next file position is in a subsequent fragment further down the
media (or even on a different media), then the restore uses positioning methods
to get to that fragment instead of reading all the data in between.
So, there is an advantage to keeping multiplexed fragments to a smaller size.
The optimal fragment size depends on the site's data and situation. For multi-
gigabyte images, it is probably desirable to keep fragments to 1 gigabyte or less.
Remember that the storage unit attribute to limit fragment size is based on the
total amount of data in the fragment (not the total amount of data for any one
client).
Note that when multiplexed images are being written, each time a client backup
stream starts or ends, by definition, that is a new fragment. A new fragment is
also created when a checkpoint occurs for a backup that has checkpoint restart
enabled. So not all fragments are of the maximum fragment size. Of course,
end-of-media (EOM) also causes new fragment(s).
Some examples may help illustrate when smaller fragments do and do not help
restores.

Example 1:
Assume that you are backing up four streams to a multiplexed tape, that each
stream is a single 1 gigabyte file, and that the default maximum fragment size
of 1 TB has been specified. The resultant backup image logically looks like the
following. ‘TM’
denotes a tape mark, or file mark, that indicates the start of a fragment.
TM <4 gigabytes data> TM
When restoring any one of the 1 gigabyte files, the restore positions to the TM
and then has to read all 4 gigabytes to get the 1 gigabyte file.
If you set the maximum fragment size to 1 gigabyte:
TM <1 gigabyte data> TM <1 gigabyte data> TM <1 gigabyte data> TM <1
gigabyte data> TM
This does not help: because the streams are multiplexed together, the restore
still has to read all four fragments to pull out the 1 gigabyte of the file being
restored.

                            Example 2:
This is the same as Example 1, except that the four streams are each backing up
1 gigabyte worth of /home or C:\. With the maximum fragment size (Reduce
fragment size) left at the default of 1 TB (and assuming all streams perform at
about the same rate), you again end up with:
                            TM <4 gigabytes data> TM
Restoring /home/file1 or C:\file1 and /home/file2 or C:\file2 from one of the
streams will have to read as much of the 4 gigabytes as necessary to restore all
the data. But, if you set Reduce fragment size to 1 gigabyte, the image looks like
this:
TM <1 gigabyte data> TM <1 gigabyte data> TM <1 gigabyte data> TM <1
gigabyte data> TM
In this case, /home/file1 or C:\file1 starts in the second fragment, and bptm
positions to the second fragment to start the restore of /home/file1 or C:\file1
(this has saved reading 1 gigabyte so far). After /home/file1 is done, if
/home/file2 or C:\file2 is in the third or fourth fragment, the restore can position
to the beginning of that fragment before it starts reading as it looks for the data.
                            These examples illustrate that whether fragmentation benefits a restore
                            depends on what the data is, what is being restored, and where in the image the
                            data is. In Example 2, reducing the fragment size from 1 gigabyte to half a
                            gigabyte (512 Megabytes) increases the chance the restore can locate by
                            skipping instead of reading when restoring relatively small amounts of an
                            image.


                            Fragmentation and checkpoint restart
                            If the policy’s Checkpoint Restart feature is enabled, NetBackup creates a new
                            fragment at each checkpoint, based on the Take checkpoints every setting. For
                            more information on Checkpoint Restart, refer to the NetBackup System
                            Administrator’s Guide, Volume I.


    Other restore performance issues
                            Common reasons for restore performance issues are described in the following
                            subsections.
NetBackup catalog performance
The disk subsystem where the NetBackup catalog resides has a large impact on
the overall performance of NetBackup. To improve restore performance,
configure this subsystem for fast reads. The NetBackup binary catalog format
provides scalable and fast catalog access.


NUMBER_DATA_BUFFERS_RESTORE setting
This parameter can help keep other NetBackup processes busy while a
multiplexed tape is positioned during a restore. Increasing this value causes
NetBackup buffers to occupy more physical RAM. This parameter only applies
to multiplexed restores. For more information on this parameter, see “Shared
memory (number and size of data buffers)” on page 102.
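As a sketch of how this parameter is typically set on a UNIX media server (the
touch-file location follows the shared-memory section referenced above; verify
the path and an appropriate value for your configuration):
    # 32 is an illustrative buffer count, not a recommendation.
    echo 32 > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS_RESTORE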


Index performance issues
For information, refer to “Indexing the Catalog for Faster Access to Backups” in
the NetBackup 6.0 System Administrator’s Guide, Volume I.


Search performance with many small backups
To improve search performance when you have many small backup images, run
the following command as root on the master server:
UNIX
/usr/openv/netbackup/bin/admincmd/bpimage -create_image_list
-client client_name
Windows
install_directory\bin\admincmd\bpimage -create_image_list -client
client_name
where client_name is the name of the client with many small backup images.
In the directory:
UNIX
    /usr/openv/netbackup/db/images/client_name
Windows
    install_path\NetBackup\db\images\client_name
the bpimage command creates the following files:

IMAGE_LIST             List of images for this client

IMAGE_INFO             Information about the images for this client

IMAGE_FILES            The file information for small images
                            Do not edit these files, because they contain offsets and byte counts that are
                            used for seeking to and reading the image information.


                            Note: These files increase the size of the client directory.



                            Restore performance in a mixed environment
                            If you encounter restore performance issues in a mixed environment (UNIX and
                            Windows), consider reducing the tcp wait interval parameter,
                            tcp_deferred_ack_interval. Under Solaris 8, the default value of this
                            parameter is 100ms. (Root privileges are required to change this parameter.)
                            The current value of tcp_deferred_ack_interval can be obtained by
                            executing the following command (this example is for Solaris):
                                 /usr/sbin/ndd -get /dev/tcp tcp_deferred_ack_interval
                            The value of tcp_deferred_ack_interval can be changed by executing the
                            command:
                                 /usr/sbin/ndd -set /dev/tcp tcp_deferred_ack_interval value
where value is the number that provides the best performance for the system.
Finding it may take some trial and error, because the optimum value varies
from system to system. A suggested starting value is 20. In any case, the value
must not exceed 500ms, as this may break TCP/IP.
Once the optimum value for the system is found, you can make the setting
permanent by placing the ndd -set command in a script under the directory
/etc/rc2.d so that it executes at boot time.
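A minimal example of such a script follows; the S99ndd_tcp file name is an
illustrative choice, and 20 is only the suggested starting value:
    #!/bin/sh
    # Saved as, for example, /etc/rc2.d/S99ndd_tcp so that it runs at boot.
    /usr/sbin/ndd -set /dev/tcp tcp_deferred_ack_interval 20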


                            Multiplexing set too high
                            If multiplexing is too high, needless tape searching may occur. The ideal setting
                            is the minimum needed to stream the drives.


                            Restores from multiplexed database backups
                            NetBackup can run several restores at the same time from a single multiplexed
                            tape. This is done by means of the MPX_RESTORE_DELAY option, which
                            specifies how long, in seconds, the server waits for additional restore requests of
                            files or raw partitions that are in a set of multiplexed images on the same tape.
                            The restore requests received within this period are executed simultaneously.
                            By default, the delay is 30 seconds.
                            This may be a useful parameter to change if multiple stripes from a large
                            database backup are multiplexed together on the same tape. If the
                            MPX_RESTORE_DELAY option is changed, you do not need to stop and restart
                            the NetBackup processes for the change to take effect.
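For example, to widen the window to 60 seconds (an illustrative value), you
could add the following line to the bp.conf file on the master server:
    MPX_RESTORE_DELAY = 60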
When bprd, the request daemon on the master server, receives the first stream
of a multiplexed restore request, it triggers the MPX_RESTORE_DELAY timer to
start counting the configured amount of time. At this point, bprd watches and
waits for related multiplexed jobs from the same client before starting the
overall job. If another associated stream is received within the timeout period, it
is added to the total job, and the timer is reset to the MPX_RESTORE_DELAY
period. Once the timeout has been reached without an additional stream being
received by bprd, the timeout window closes, all associated restore requests are
sent to bptm, and a tape is mounted. If any associated restore requests are
received after this event, they are queued to wait until the tape that is now “In
Use” is returned to an idle state.
If MPX_RESTORE_DELAY is not set high enough, NetBackup may need to mount
and read the same tape multiple times to collect all of the header information
necessary for the restore. Ideally, NetBackup would read a multiplexed tape,
collecting all of the header information it needs, in a single pass of the tape,
thus minimizing the time to restore.

Example (Oracle):
Suppose that MPX_RESTORE_DELAY is not set in the bp.conf file, so its value is
the default of 30 seconds. Suppose also that you initiate a restore from an Oracle
RMAN backup that was backed up using 4 channels or 4 streams, and you use
the same number of channels to restore.
RMAN passes NetBackup a specific data request, telling NetBackup what
information it needs to start and complete the restore. The first request is
passed and received by NetBackup in 29 seconds, causing the
MPX_RESTORE_DELAY timer to be reset. The next request is passed and
received by NetBackup in 22 seconds, so again the timer is reset. The third
request is received 25 seconds later, resetting the timer a third time, but the
fourth request is received 31 seconds after the third. Since the fourth request
was not received within the restore delay interval, NetBackup only starts three
of the four restores. Instead of reading from the tape once, NetBackup queues
the fourth restore request until the previous three requests are completed. Since
all of the multiplexed images are on the same tape, NetBackup mounts, rewinds,
and reads the entire tape again to collect the multiplexed images for the fourth
restore request.
Note that in addition to NetBackup's reading the tape twice, RMAN waits to
receive all the necessary header information before it begins the restore.
If MPX_RESTORE_DELAY had been larger than 30 seconds, NetBackup would
have received all four restore requests within the restore delay windows and
collected all the necessary header information with one pass of the tape. Oracle
would have started the restore after this one tape pass, improving the restore
performance significantly.
                            MPX_RESTORE_DELAY needs to be set with caution, because it can decrease
                            performance if its value is set too high. Suppose, for instance, that the
                            MPX_RESTORE_DELAY is set to 1800 seconds. When the final associated
                            restore request arrives, NetBackup resets the request delay timer as it did with
                            the previous requests. NetBackup then must wait for the entire 1800-second
                            interval before it can start the restore.
Therefore, try to set the value of MPX_RESTORE_DELAY so that it is neither too
high nor too low.



    NetBackup storage device performance
                            This section looks at storage device functionality in the NetBackup data transfer
                            path. Changes in these areas may improve NetBackup performance.
                            Tape drive wear and tear is much less, and efficiency is greater, if the data
                            stream matches the tape drive capacity and is sustained. Generally speaking,
                            most tape drives have much slower throughput than disk drives. Match the
                            number of drives and the throughput per drive to the speed of the SCSI/FC
                            connection, and/or follow the hardware vendors’ recommendations.
                            These are some of the factors which affect tape drives:

                            Media positioning
                            When a backup or restore is performed, the storage device must position the
                            tape so that the data is over the read/write head. Depending on the location of
                            the data and the overall performance of the media device, this can take a
                            significant amount of time. When you conduct performance analysis with media
                            containing multiple images, it is important to account for the time lag that
                            occurs before the data transfer starts.

                            Tape streaming
                            If a tape device is being used at its most efficient speed, it is said to be streaming
                            the data onto the tape. Generally speaking, if a tape device is streaming, there
                            will be little physical stopping and starting of the media. Instead the media will
                            be constantly spinning within the tape drive. If the tape device is not being used
                            at its most efficient speed, it may continually start and stop the media from
                            spinning. This behavior is the opposite of tape streaming and usually results in a
                            poor data throughput rate.

                            Data compression
                            Most tape devices support some form of data compression within the tape
                            device itself. Compressible data (such as text files) yields a higher data
                            throughput rate than non-compressible data, if the tape device supports
                            hardware data compression.
Tape devices typically come with two performance rates: maximum throughput
and nominal throughput. Maximum throughput is based on how fast
compressible data can be written to the tape drive when hardware compression
is enabled in the drive. Nominal throughput refers to rates achievable with
non-compressible data.


Note: Tape drive data compression cannot be set by NetBackup. Follow the
instructions provided with your OS and tape drive to be sure data compression is
set correctly.

In general, tape drive data compression is preferable to client (software)
compression such as that available in NetBackup. Client compression may be
desirable in some cases, such as for reducing the amount of data transmitted
across the network for a remote client backup. See “Tape versus client
compression” on page 133 for more information.
Chapter 9

Tuning other NetBackup components
      This chapter provides guidelines and recommendations for improving
      performance in certain features or components of NetBackup.
      This chapter includes the following sections:
      ■   “Multiplexing and multi-streaming” on page 130
      ■   “Encryption” on page 133
      ■   “Compression” on page 133
      ■   “Using encryption and compression” on page 134
      ■   “NetBackup Java” on page 134
      ■   “Vault” on page 134
      ■   “Fast recovery with bare metal restore” on page 135
      ■   “Backing up many small files” on page 135
      ■   “NetBackup Operations Manager (NOM)” on page 137
    Multiplexing and multi-streaming
                           Consider the following factors regarding multiplexing and multi-streaming.


    When to use multiplexing and multi-streaming
                           Multiple data streams can reduce the time for large backups. The reduction is
                           achieved by splitting the data to be backed up into multiple streams and then
                           using multiplexing, multiple drives, or a combination of the two for processing
                           the streams concurrently. In addition, configuring the backup so each physical
                           device on the client is backed up by a separate data stream that runs
                           concurrently with streams from other devices can significantly reduce backup
                           times.


                           Note: For best performance, use only one data stream to back up each physical
                           device on the client. Running multiple concurrent streams from a single
                           physical device can adversely affect the time to back up that device because the
                           drive heads must move back and forth between tracks containing the files for
                           the respective streams.

                           Multiplexing is not recommended for database backups, when restore speed is of
                           paramount interest, or when your tape drives are slow.
                           Backing up across a network, unless the network bandwidth is very broad, can
                           nullify the ability to stream. Typically, a single client can send enough data to
                           saturate a single 100BaseT network connection. A gigabit network has the
                           capacity to support network streaming for some clients. Keep in mind that
                           multiple streams use more of the client’s resources than a single stream. We
                           recommend testing to make sure that the client can handle the multiple data
                           streams and that the users are not affected by the high rate of data transfer.
                           Multiplexing and multi-streaming can be powerful tools to ensure that all tape
                           drives are streaming. With NetBackup, both can be used at the same time. It is
                           important to distinguish between the two concepts:
                           ■   Multiplexing writes multiple data streams to a single tape drive.
Figure 9-4              Multiplexing diagram (several clients back up through
                        one server to a single tape drive)
■      Multi-streaming writes multiple data streams, each to its own tape drive,
       unless multiplexing is used.

Figure 9-5              Multistreaming diagram (the server writes multiple
                        streams, each to its own tape drive)
Here are some things to consider with regard to multiplexing:
■   Experiment with different multiplexing factors to find one where the tape
    drive is just streaming, that is, where the writes just fill the maximum
    bandwidth of your drive. This is the optimal multiplexing factor. For
    instance, if you determine that you can get 5 Megabytes/sec from each of
    multiple concurrent read streams, then you would use a multiplexing
    factor of two to get the maximum throughput to a DLT7000 (that is, 10
    Megabytes/sec).
■      Use a higher multiplexing factor for incremental backups.
■      Use a lower multiplexing factor for local backups.
■      Expect the duplication of a multiplexed tape to take a longer period of time
       if it is demultiplexed, because multiple read passes of the source tape must
       be made.
                           ■   When you duplicate a multiplexed backup, demultiplex it.
                               By demultiplexing the backups when they are duplicated, the time for
                               recovery is significantly reduced.
                           Do not use multi-streaming on single mount points. Multi-streaming takes
                           advantage of the ability to stream data from several devices at once. This
                           permits backups to take advantage of Read Ahead on a spindle or set of spindles
                           in RAID environments. Multi-streaming from a single mount point encourages
head thrashing and may result in degraded performance. Only conduct
multi-streamed backups against single mount points if they are mirrored
(RAID 1). However, even this is likely to result in degraded performance.


    Effects of multiple data streams on backup/restore
                           ■   Multiplexing
                               To use multiplexing effectively, you must understand the implications of
                               multiplexing on restore times. Multiplexing may decrease overall backup
                               time when you are backing up large numbers of clients over slow networks,
                               but it does so at the cost of recovery time. Restores from multiplexed tapes
                               must pass over all nonapplicable data. This action increases restore times.
                               When recovery is required, demultiplexing causes delays in the restore
                               process. This is because NetBackup must do more tape searching to
                               accomplish the restore.
                               Restores should be tested, before the need to do a restore arises, to
                               determine the impact of multiplexing on restore performance.
                                When you initially set up a new environment, keep the multiplexing factor
                                low. Typically, a multiplexing factor of four or less does not greatly affect
                                the speed of restores, depending on the type of drive and the type of system.
                               If the backups do not finish within their assigned window, multiplexing can
                               be increased to meet the window. However, increasing the multiplexing
                               factor provides diminishing returns as the number of multiplexing clients
                               increases. The optimum multiplexing factor is the number of clients needed
                               to keep the buffers full for a single tape drive.
                               Set the multiplexing factor to four and do not multistream. Run
                               benchmarks in this environment. Then, if needed, you can begin to change
                               the values involved until both the backup and restore window parameters
                               are met.
                           ■   Multi-streaming
                               The NEW_STREAM directive is useful for fine-tuning streams so that no
                               disk subsystem is under-utilized or over-utilized.
Encryption
             When the NetBackup encryption option is enabled, your backups may run
             slower. How much slower depends on the throttle point in your backup path. If
             the network is the issue, encryption should not hinder performance. If the
             network is not the issue, then encryption may slow down the backup.
Note that some local backups have actually run faster with encryption than
without it. In some field test cases, memory utilization was found to be roughly
the same with and without encryption.



Compression
             Two types of compression can be used with NetBackup, client compression
             (configured in the NetBackup policy) and tape drive compression (handled by
             the device hardware). Some or all of the files may also have been compressed by
             other means prior to the backup.


How to enable compression
             NetBackup client compression can be enabled by selecting the compression
             option in the NetBackup Policy Attributes window.
             How tape drive compression is enabled depends on your operating system and
             the type of tape drive. Check with the operating system and drive vendors, or
             read their documentation to find out how to enable tape compression.
             With UNIX device addressing, these options are frequently part of the device
             name. A single tape drive has multiple names, each with a different
             functionality built into the name. (This is really done with major and minor
device numbers.) So, for instance, on Solaris, if you address /dev/rmt/2cbn, you
get drive 2, hardware-compressed, with the no-rewind option. If you address
/dev/rmt/2n, you get drive 2 uncompressed with the no-rewind option.
The choice of device name determines device behavior.
             If the media server is UNIX, there is no compression when the backup is to a disk
             storage unit. The compression options in this case are limited to client
             compression. If the media server with the disk storage unit is Windows, and the
             directory used by the disk storage unit is compressed, then there will be
             compression on the disk write just as there would be for any file writes to that
             directory by any application.


             Tape versus client compression
             ■   Tape drive compression is almost always preferable to client compression.
             ■   Tape compression offloads the compression task from the client and server.
                           ■   Avoid using both tape compression and client compression, as this can
                               actually increase the amount of backed-up data.
■   Only in rare cases is it beneficial to use client (software) compression. For
    very dense or already compressed data, compression algorithms take a
    long time and often increase the overall size of the image. Where the files
    are already compressed, point the devices to native (non-compressing)
    device drivers. In other cases, turn NetBackup client compression off and
    let the hardware handle the compression.
■   On UNIX: client compression reduces the amount of data sent over the
    network, but increases the load on the client. The NetBackup client
    configuration setting MEGABYTES_OF_MEMORY may help client
    performance. It is undesirable to compress files that are already
    compressed. If you find that this is happening with your backups, refer to
    the NetBackup configuration option COMPRESS_SUFFIX, which you can
    edit through bpsetconfig (see the sketch below).
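As a sketch, assuming bpsetconfig's usual key = value input format, the
following adds suffixes of already-compressed file types (the suffix list shown
is illustrative) so that the client does not try to compress them:
    echo "COMPRESS_SUFFIX = .gz .Z .zip .jpg" | \
        /usr/openv/netbackup/bin/admincmd/bpsetconfig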



    Using encryption and compression
                           If a policy is enabled for both encryption and compression, the client first
                           compresses the backup data and then encrypts it.
                           Client data compression (configured in the policy) is generally not needed,
                           because data compression is handled internally by tape drives. When data is
                           encrypted, it becomes randomized, and is no longer compressible. Therefore,
                           data compression must be performed prior to any data encryption. In
                           considering whether or not to use NetBackup client compression, see
                           “Compression” on page 133.



    NetBackup Java
                           For performance improvement, refer to the following sections in the NetBackup
                           System Administrator’s Guide for UNIX and Linux, Volume I: “Configuring the
                           NetBackup-Java Administration Console,” and the subsection “NetBackup-Java
                           Performance Improvement Hints.” In addition, the NetBackup Release Notes
                           may contain information about NetBackup Java performance.



    Vault
                           Refer to the “Best Practices” chapter of the NetBackup Vault System
                           Administrator’s Guide.
Fast recovery with bare metal restore
           Veritas Bare Metal Restore (BMR) provides a simplified, automated method by
           which to recover an entire system (including the operating system and
           applications). BMR automates the restore process to ensure rapid, error-free
           recovery. This process requires one Bare Metal Restore command and then a
           system boot. BMR guarantees integrity and consistency and is supported for
           both UNIX and Windows systems.


           Note: BMR requires the True image restore option. This option has implications
           on the size of the NetBackup catalog. Refer to “Calculate the size of your
           NetBackup catalog” on page 22 for more details.




Backing up many small files
NetBackup takes longer to back up many small files than a single large file of the same total size. The following may improve performance when backing up many small files.
           ■   Use the FlashBackup (or FlashBackup-Windows) policy type. This is a
               feature of NetBackup Advanced Client. FlashBackup is described in the
               NetBackup Advanced Client System Administrator’s Guide.
               See “FlashBackup” on page 136 of this Tuning guide for a related tuning
               issue.
           ■   On Windows, make sure virus scans are turned off (this may double
               performance).
           ■   Snap a mirror (such as with the FlashSnap method in Advanced Client) and
               back that up as a raw partition. This does not allow individual file restore
               from tape.
           Some specific things to try to improve performance include:
■   Turn off or reduce logging.
    The NetBackup logging facility can impact the performance of backup and recovery processing. Logging is usually enabled only while troubleshooting a NetBackup problem, so that any performance impact is short-lived. The size of the impact depends on how much logging is enabled and on the verbosity level that is set.
           ■   Make sure the NetBackup buffer size is the same size on both the servers
               and clients.
           ■   Consider upgrading NIC drivers as new releases appear.



■   Run the following bpbkar throughput test on a Windows client:
    C:\Veritas\Netbackup\bin\bpbkar32 -nocont path > NUL 2> temp_file
    For example:
    C:\Veritas\Netbackup\bin\bpbkar32 -nocont c:\ > NUL 2> temp.f
                           ■   When initially configuring the Windows server, optimize TCP/IP throughput
                               as opposed to shared file access.
■   Configure Windows to boost background performance rather than foreground performance.
■   Turn off the NetBackup Client Job Tracker if the client is a server system.
                           ■   Regularly review the patch announcements for every server OS. Install
                               patches that affect TCP/IP functions, such as correcting out-of-sequence
                               delivery of packets.


    FlashBackup
                           If using Advanced Client FlashBackup with a copy-on-write snapshot
                           method
                           If you are using the FlashBackup feature of Advanced Client with a
                           copy-on-write method such as nbu_snap, assign the snapshot cache device to a
separate hard drive. This improves performance by reducing disk contention and the potential head thrashing caused by writing the data that maintains the snapshot.

                           Tunable read buffer for Solaris (with nbu_snap method)
                           If the storage unit write speed (either tape or disk) is relatively fast, reading the
                           client disk may become a bottleneck during a FlashBackup raw partition backup.
                           By default, FlashBackup reads the raw partition using fixed 128 KB buffers for
                           full backups and 32 KB buffers for incrementals.
In most cases, the default read buffer size allows FlashBackup to stay ahead of the storage unit write speed. To further minimize the number of iowaits when reading client data, however, you can tune the FlashBackup read buffer size, allowing the nbu_snap driver to read contiguous device blocks up to 1 MB per iowait, depending on the disk driver support. The read buffer size can be adjusted separately for full and incremental backups.
                           In general, a larger buffer yields faster raw partition backup (but see the
                           following note). In the case of VxVM striped volumes, if the read buffer is
                           configured as a multiple of the striping block size, data can be read in parallel
                           from the disks, significantly speeding up raw partition backup.



              How to adjust the FlashBackup read buffer for Solaris clients
              1   Create the following touch file on each Solaris client:
                       /usr/openv/netbackup/FBU_READBLKS
              2   Enter the desired values in the FBU_READBLKS file, as follows.
    On the first line of the file, enter an integer read buffer size in bytes for full backups, optionally followed by one for incremental backups. The default is to read the raw partition in 131072-byte (128 KB) buffers during full backups and in 32768-byte (32 KB) buffers for incremental backups. If changing both values, separate them with a space.
                  For example, to set the full backup read buffer to 256 KB and the
                  incremental read buffer to 64 KB, enter the following on the first line of the
                  file:
                  262144 65536
    You can use the second line of the file to set the tape record write size, also in bytes. The default is the same size as the read buffer. The first entry on the second line sets the full backup write buffer size; the second sets the incremental backup write buffer size.
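               For example, the following command (a sketch, using the values from step 2) creates and populates the touch file on a Solaris client in one step:
                   # full backup read buffer 256 KB, incremental read buffer 64 KB
                   echo "262144 65536" > /usr/openv/netbackup/FBU_READBLKS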


              Note: Resizing the read buffer for incremental backups can result in a faster
              backup in some cases, and a slower backup in others. The result depends on such
              factors as the location of the data to be read, the size of the data to be read
              relative to the size of the read buffer, and the read characteristics of the storage
              device and the I/O stack. Experimentation may be necessary to achieve the best
              setting.




NetBackup Operations Manager (NOM)
              The settings described in this section can be modified to adjust NOM
              performance.
              Information is also available on other NOM topics:
              See “Design your NOM server” on page 33.
              See “Using NOM to monitor jobs” on page 43.


Adjusting the NOM server heap size
              If the NOM server processes are consuming a lot of memory (which may happen
              with large NOM configurations), it may be helpful to increase the NOM server
              heap size.
              For example, to increase the NOM server heap size from 512 MB to 2048 MB, do
              the following.



                          ■    On Solaris servers, open the /opt/VRTSnom/bin/nomsrvctl file and
                               change the value of the MAX_HEAP parameter.
                               To increase the heap size to 2048 MB, edit the parameter as follows:
                               MAX_HEAP=-Xmx2048m
                               Save the nomsrvctl file. Then stop and restart the NOM processes, as
                               follows:
                               /opt/VRTSnom/bin/NOMAdmin -stop_service
                               /opt/VRTSnom/bin/NOMAdmin -start_service
                          ■    On Windows servers, open the Registry Editor and go to the following
                               location:
                 HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\VRTSnomSrvr\Parameters
                                Increase the JVM Option Number 0 value. To increase the heap size to 2048
                                MB, enter -Xmx2048m.
                               Then stop and restart the NOM services, as follows:
        install_path\NetBackup Operations Manager\bin\admincmd\NOMAdmin.bat -stop_service
        install_path\NetBackup Operations Manager\bin\admincmd\NOMAdmin.bat -start_service
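                                The following is a sketch of making the same registry change from a command prompt with reg.exe; the value name is taken from the step above, but the REG_SZ type is an assumption to verify against your installation:
                                reg add "HKLM\SYSTEM\CurrentControlSet\Services\VRTSnomSrvr\Parameters" /v "JVM Option Number 0" /t REG_SZ /d "-Xmx2048m" /f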


    Adjusting the NOM web server heap size
                          If you notice poor performance in the NOM console and restarting the NOM Web
                          service fixes the problem, you should increase the NOM web server heap size.
                          The NOM web server heap size can be increased from 256 MB (the default) up to
                          2048 MB. For example, to change the heap size to 512 MB, do the following:
                          ■    On Solaris servers, run the webgui command as follows:
                               /opt/VRTSweb/bin/webgui maxheap 512
                               Then stop and restart the NOM processes:
                               /opt/VRTSnom/bin/NOMAdmin -stop_service
                               /opt/VRTSnom/bin/NOMAdmin -start_service
                          ■    On Windows servers, run webgui.exe as follows:
                               install_path\VRTSweb\bin\webgui.exe maxheap 512
                               Then stop and restart the NOM services:
        install_path\NetBackup Operations Manager\bin\admincmd\NOMAdmin.bat -stop_service
        install_path\NetBackup Operations Manager\bin\admincmd\NOMAdmin.bat -start_service


    Adjusting the Sybase cache size
                          The amount of memory available for use as a database server cache is one of the
                          important factors controlling NOM performance. Symantec recommends that
                          you adjust the Sybase cache size after installing NOM. After you install NOM,
                          the database size can grow rapidly as you add more master servers.
                          Sybase automatically adjusts the cache size for optimum performance. You can
                          also set the cache size using the -c server option, as follows.



To set the cache size using the -c server option on Solaris servers
1   Open the /opt/VRTSnom/var/global/nomserver.conf file and
    change the value of the -c option.
    For example, you can increase the Sybase cache size to 512 MB by changing
    the nomserver.conf file content in the following manner.
    -n NOM_nom-sol6
    -x tcpip(BROADCASTLISTENER=0;DOBROADCAST=NO;ServerPort=13786;)
    -gp 8192 -ct+ -gd DBA -gk DBA -gl DBA -ti 60 -c 25M -ch 500M -cl
    25M -ud -m
    should be changed to
    -n NOM_nom-sol6
    -x tcpip(BROADCASTLISTENER=0;DOBROADCAST=NO;ServerPort=13786;)
    -gp 8192 -ct+ -gd DBA -gk DBA -gl DBA -ti 60 -c 512M -cs -ud -m

    Note: This example replaced -c 25M -ch 500M -cl 25M with -c 512M
    -cs in the nomserver.conf file to increase the cache size to 512 MB. To
    increase the cache size to 1 GB, replace -c 25M -ch 500M -cl 25M
    with -c 1G -cs.
    The -ch and -cl server options set the maximum and the minimum cache
    size, respectively.
    The -cs option logs cache size changes for the database server.

2   Save the nomserver.conf file.
3   Stop and restart the NOM processes, as follows:
    /opt/VRTSnom/bin/NOMAdmin -stop_service
    /opt/VRTSnom/bin/NOMAdmin -start_service
    The logs for the cache size changes are stored in
    /opt/VRTSnom/logs/nomdbsrv.log.

To set the cache size using the -c server option on Windows servers
1   Open the install_path\db\conf\server.conf file.
    To increase the cache size to 512 MB, add -c 512M -cs to the content of
    server.conf file. This is shown in the following example.
    -n NOM_PUNENOM -x
    tcpip(BROADCASTLISTENER=0;DOBROADCAST=NO;ServerPort=13786) -o
    "install_path\db\log\server.log" -m
    should be changed to
    -n NOM_PUNENOM -x
    tcpip(BROADCASTLISTENER=0;DOBROADCAST=NO;ServerPort=13786) -o
    "install_path\db\log\server.log" -c 512M -cs -m




                                Note: The -cs option logs cache size changes for the database server.
                                To increase the cache size to 1 GB, add -c 1G -cs to the content of the
                                server.conf file.

                          2    Stop and restart the NOM services, as follows:
        install_path\NetBackup Operations Manager\bin\admincmd\NOMAdmin.bat -stop_service
        install_path\NetBackup Operations Manager\bin\admincmd\NOMAdmin.bat -start_service
                               The logs for the cache size changes are stored in
                               install_path\db\log\server.log.


    Saving NOM databases and database log files on separate physical
    hard disks
                          To improve NOM performance, NOM database files and the log files associated
                          with the NOM databases should be stored on separate physical hard disks. For
                          example, you can store the NOM database files on one hard disk and the log files
                          on another hard disk.
                          Symantec also recommends that you not store the database files on the hard
                          disk that contains your operating system files.
                          Use the following procedures to move the NOM database and log files to a
                          different hard disk. The first two procedures are for moving the NOM database
                          files on Windows or Solaris; the last two procedures are for moving the database
                          log files.

                          To move the NOM database to a different hard disk on Windows
                          1    Stop all the NOM services by entering the following command:
             install_path\NetBackup Operations Manager\bin\admincmd\NOMAdmin -stop_service
                          2    Open the databases.conf file with a text editor from the following
                               directory:
                               install_path\NetBackup Operations Manager\db\conf
                               This file has the following contents:
                                "install_path\NetBackup Operations Manager\db\data\vxpmdb.db"
                                "install_path\NetBackup Operations Manager\db\data\vxam.db"
                               These paths specify the default location of the NOM primary database and
                               the alerts database respectively. Enter the new path of the database files in
                               the databases.conf file. If you want to move the database files to a
                               location such as E:\NOM located on a different hard disk, change the
                               content of databases.conf file to the following:
                                "E:\NOM\vxpmdb.db" "E:\NOM\vxam.db"



                Ensure that you specify the paths in double quotes. Also, the directories in
                the specified path should not contain special characters such as %, ~, !,
                @, $, &, ^, or #. For example, do not specify a path such as E:\NOM%.
           3   Save the databases.conf file.
           4   Copy the database files to the new location.
               Copy vxpmdb.db and vxam.db from install_path\NetBackup
               Operations Manager\db\data to a location such as E:\NOM on another
               hard disk.
               You should run and monitor NOM for a certain period after moving the
               database. If NOM works as expected, you can delete vxpmdb.db and
               vxam.db from install_path\NetBackup Operations
               Manager\db\data.
           5   Restart all the NOM services:
install_path\NetBackup Operations Manager\bin\admincmd\NOMAdmin -start_service

           To move the NOM database to a different hard disk on Solaris
           1   Stop all NOM processes by entering the following command:
               /opt/VRTSnom/bin/NOMAdmin -stop_service
           2   Rename /opt/VRTSnom/db/data. For example:
               mv /opt/VRTSnom/db/data /opt/VRTSnom/db/olddata
            3   To move the database to a location such as /usr1/mydata on a different
                hard disk, create a symbolic link named /opt/VRTSnom/db/data that
                points to /usr1/mydata.
                To do this, enter the following command:
                ln -s /usr1/mydata /opt/VRTSnom/db/data
           4   Copy the primary and alerts databases to a location such as /usr1/mydata
               on a different hard disk. Enter the following commands to copy the
               database:
               cp /opt/VRTSnom/db/olddata/vxpmdb.db /usr1/mydata
               cp /opt/VRTSnom/db/olddata/vxam.db /usr1/mydata
               You should run and monitor NOM for a certain period after moving the
               database. If NOM works as expected, you can delete vxpmdb.db and
               vxam.db from /opt/VRTSnom/db/olddata.
           5   Restart all NOM processes:
               /opt/VRTSnom/bin/NOMAdmin -start_service

           To move the database log files to a different hard disk on Windows
           1   Stop all NOM services:
install_path\NetBackup Operations Manager\bin\admincmd\NOMAdmin -stop_service
           2   Navigate to the following location:



                               install_path\NetBackup Operations Manager\db\WIN32
                               Enter the following commands:
                                dblog -t directory_path\vxpmdb.log database_path\vxpmdb.db
                                dblog -t directory_path\vxam.log database_path\vxam.db
                               where directory_path is the path where you want to store the database
                               logs and database_path is the path where your database is located.
                                These commands move the log files associated with the NOM primary and
                                alerts databases to the new directory (directory_path). It is recommended
                                to keep vxpmdb.log and vxam.log as the names of the log files.
                          3    Restart all NOM services:
        install_path\NetBackup Operations Manager\bin\admincmd\NOMAdmin.bat -start_service

                          To move the database log files to a different hard disk on Solaris
                          1    Stop all the NOM processes:
                               /opt/VRTSnom/bin/NOMAdmin -stop_service
                          2    Set the path of the LD_LIBRARY_PATH variable in the following manner:
                               LD_LIBRARY_PATH=/opt/VRTSnom/db/lib
                               export LD_LIBRARY_PATH
                          3    Navigate to the following location:
                               /opt/VRTSnom/db/bin
                               Enter the following commands:
                                ./dblog -t directory_path/vxpmdb.log database_path/vxpmdb.db
                                ./dblog -t directory_path/vxam.log database_path/vxam.db
                               where directory_path is the path where you want to store your database
                               log file and database_path is the path where the NOM database is located.
                                These commands move the log files associated with the NOM primary and
                                alerts databases to the new directory (directory_path). It is recommended
                                to keep vxpmdb.log and vxam.log as the names of the log files.
                          4    Restart all NOM processes:
                               /opt/VRTSnom/bin/NOMAdmin -start_service


    Defragment NOM databases
                          For optimum performance, you should defragment the NOM databases
                          periodically. It is also important to defragment the NOM databases after a purge
                          operation.
                          To defragment the NOM database, you must first export and then import the
                          database. You must run the export and import commands consecutively
                          (without any time gap) to avoid data loss.

                          To defragment the NOM primary and alerts databases in NOM 6.0 MP5
                          1    Start NOMAdmin:



                  Windows:
                  install_path\NetBackup Operations Manager\bin\admincmd\NOMAdmin
                  Solaris:
                  /opt/VRTSnom/bin/NOMAdmin
              2   Enter the following commands:
                  NOMAdmin -export directory_name
                  NOMAdmin -import directory_name
                  The directory location (directory_name) must be the same in both
                  commands.


Purge data periodically
              It is recommended that you purge the NOM data periodically. For optimum
              performance and scalability, Symantec recommends that you manage
              approximately a month of historical data. See the “Database maintenance
              utilities (NOMAdmin)” section in the NetBackup Operations Manager Guide for
              commands to purge the alerts and jobs data.


              Note: The NOM databases should be defragmented after a purge operation.
Chapter 10
Tuning disk I/O performance
      This chapter describes the hardware issues affecting disk performance with
      NetBackup. This information is intended to provide a general approach to disk
      tuning, not specific recommendations for your environment. Based on your
      hardware and other requirements unique to your site, you can use this
      information to adjust your configuration for better performance.
      This chapter includes the following sections:
      ■   “Hardware performance hierarchy” on page 146
      ■   “Hardware configuration examples” on page 153
      ■   “Tuning software for better performance” on page 154


      Note: The critical factors in performance are not software-based. They are
      hardware selection and configuration. Hardware has roughly four times the
      weight that software has in determining performance.




    Hardware performance hierarchy
                             The following diagram shows the key hardware elements, and the
                             interconnections (levels) between them, which affect performance. The diagram
                             shows two disk arrays and a single non-disk device (tape, Ethernet connections,
                             and so forth).

                             Figure 10-6    Performance hierarchy diagram

                             [Diagram: host memory (Level 5) connects through PCI bridges to PCI buses
                             (Level 4); PCI cards on those buses connect over fibre channel (Level 3) to
                             the RAID controllers of two disk arrays (Level 2); within each array, shelf
                             adaptors (Level 1) attach the drives in each shelf. A tape, Ethernet, or other
                             non-disk device attaches through its own PCI card in parallel with the arrays.]
                             Performance hierarchy levels are described in later sections of this chapter.



          In general, all data going to or coming from disk must pass through host
          memory. In the following diagram, a dashed line shows the path that the data
          takes through a media server.

           Figure 10-7    Data stream in NetBackup media server to arrays

           [Diagram: a dashed line traces data moving through host memory: in through an
           Ethernet (or other non-disk) PCI card, across the PCI bus and PCI bridge into
           memory (Level 5), then out through another PCI card and over fibre channel
           (Level 3) to the RAID controller, shelf, and drives of Array 2.]
          The data moves up through the ethernet PCI card at the far right. The card sends
          the data across the PCI bus and through the PCI bridge into host memory.
          NetBackup then writes this data to the appropriate location. In a disk example,
          the data passes through one or more PCI bridges, over one or more PCI buses,
          through one or more PCI cards, across one or more fibre channels, and so on.
          Sending data through more than one PCI card increases bandwidth by breaking
          up the data into large chunks and sending a group of chunks at the same time to
          multiple destinations. For example, a write of 1 MB could be split into 2 chunks
          going to 2 different arrays at the same time. If the path to each array is x
          bandwidth, the aggregate bandwidth will be approximately 2x.
          Each level in the Performance Hierarchy diagram represents the transitions
          over which data will flow. These transitions have bandwidth limits.
          Between each level there are elements that can affect performance as well.



    Performance hierarchy level 1
                              Level 1 is the interconnect within a typical disk array that attaches individual
                              disk drives to the adaptor on each disk shelf. A shelf is a physical entity placed
                              into a rack; shelves usually contain around 15 disk drives. If you use fibre
                              channel drives, the Level 1 interconnect is a 1- or 2-gigabit fibre channel
                              arbitrated loop (FC-AL). When Serial ATA (SATA) drives are used, the Level 1
                              interconnect is the SATA interface.
                              [Diagram: Level 1 detail: within each shelf, the shelf adaptor attaches that
                              shelf's drives; a tape, Ethernet, or other non-disk device is shown alongside.]


                              Level 1 bandwidth potential is determined by the technology used.
                              For FC-AL, the arbitrated loop can be either 1-gigabit or 2-gigabit fibre
                              channel. An arbitrated loop is a shared-access topology, which means that only 2
                              entities on the loop can communicate at one time; for example, one disk drive
                              and the shelf adaptor. So even though a single disk drive might be capable of
                              2-gigabit bursts of data transfer, there is no aggregation of this bandwidth:
                              multiple drives cannot communicate with the shelf adaptor at the same time to
                              achieve a multiple of the individual drive bandwidth.


    Performance hierarchy level 2
                              Level 2 is the interconnect external to the disk shelf. It attaches one or more
                              shelves to the array RAID controller. This is usually FC-AL, even if the drives in
                              the shelf are something other than fibre channel (SATA, for example). This
                              shared-access topology allows only one pair of endpoints to communicate at any
                              given time.
                 [Diagram: Level 2 detail: each array's RAID controller attaching the array's
                 shelves; a tape, Ethernet, or other non-disk device is shown alongside.]



                     Larger disk arrays have more than one internal FC-AL. Shelves may even
                     support two FC-ALs, so that there are two paths between the RAID controller and
                     every shelf, providing redundancy and load balancing.


Performance hierarchy level 3
                    Level 3 is the interconnect external to the disk array and host.
                     [Diagram: Level 3 detail: fibre channel connections between the host and each
                     array.]

                    While this diagram shows a single point-to-point connection between an array
                    and the host, a real-world use more typically includes a SAN fabric (having one
                    or more fibre channel switches). The logical result is the same, in that either is a
                    data path between the array and the host.
                    When these paths are not arbitrated loops (for example, if they were fabric fibre
                    channel), they do not have the shared-access topology limitations. That is, if two
                    arrays are connected to a fibre channel switch and the host has a single fibre
                    channel connection to the switch, the arrays can be communicating at the same
                    time (the switch does the coordination with the host fibre channel connection).
                    However, this does not aggregate bandwidth, since the host is still limited to a
                    single fibre channel connection.
                    Fibre channel is generally 1 or 2 gigabit (both arbitrated loop and fabric
                    topology). Faster speeds are coming on the market. A general rule-of-thumb
                    when considering protocol overhead is that one can divide the gigabit rate by 10
                    to get an approximate megabyte-per-second bandwidth. So, 1-gigabit fibre
                    channel can theoretically achieve approximately 100 MB/second and 2-gigabit
                    fibre channel can theoretically achieve approximately 200 MB/second.
                    Fibre channel is also similar to traditional LANs, in that a given interface can
                    support multiple connection rates. That is, a 2-gigabit fibre channel port will
                    also connect to devices that only support 1 gigabit.



    Performance hierarchy level 4
                           Level 4 is the interconnect within a host for the attachment of PCI cards.

                            [Diagram: Level 4 detail: PCI bridges and PCI buses within the host attaching
                            the PCI cards.]


                           A typical host will support 2 or more PCI buses, with each bus supporting 1 or
                           more PCI cards. A bus has a topology similar to FC-AL in that only 2 endpoints
                           can be communicating at the same time. That is, if there are 4 cards plugged into
                           a PCI bus, only one of them can be communicating with the host at a given
                           instant. Multiple PCI buses are implemented to allow multiple data paths to be
                           communicating at the same time, resulting in aggregate bandwidth gains.
                            PCI buses have 2 key factors involved in bandwidth potential: the width of the
                            bus (32 or 64 bits) and the clock or cycle time of the bus (in Mhz).
                           As a rule of thumb, a 32-bit bus can transfer 4 bytes per clock and a 64-bit bus
                           can transfer 8 bytes per clock. Most modern PCI buses support both 64-bit and
                           32-bit cards. Currently PCI buses are available in 4 clock rates:
                           ■    33 Mhz
                           ■    66 Mhz
                           ■    100 Mhz (sometimes referred to as PCI-X)
                           ■    133 Mhz (sometimes referred to as PCI-X)
                           PCI cards also come in different clock rate capabilities.
                           Backward-compatibility is very common; for example, a bus rated at 100 Mhz
                           will support 100, 66, and 33 Mhz cards.
                           Likewise, a 64-bit bus will support both 32-bit and 64-bit cards.
                           They can also be mixed; for example, a 100-Mhz 64-bit bus can support any mix
                           of clock and width that are at or below those values.




              Note: In a shared-access topology, a slow card can negatively impact the
              performance of other fast cards on the same bus. This is because the bus adjusts
              to the right clock and width for each transfer. One moment it could be doing 100
              Mhz 64 bit to card #2 and at another moment doing 33 Mhz 32 bit to card #3.
              Since the transfer to card #3 will be so much slower, it takes longer to complete.
              The time that is lost may otherwise have been used for moving data faster with
              card #2.

              You should also remember that a PCI bus is a unidirectional bus, which means
              that when it is doing a transfer in one direction, it cannot move data in the other
              direction, even from another card.

              Real-world bandwidth is generally around 80% of the theoretical maximum
              (clock * width). Following are rough estimates for bandwidths that can be
              expected:
              64 bit/ 33 Mhz = approximately 200 MB/second
              64 bit/ 66 Mhz = approximately 400 MB/second
              64 bit/100 Mhz = approximately 600 MB/second
              64 bit/133 Mhz = approximately 800 MB/second
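               As a quick sanity check of these estimates, the rule of thumb (bus width in
               bytes, times clock rate in Mhz, times roughly 80%) can be computed in a shell.
               This is a sketch of the arithmetic only:
                   # 64-bit bus = 8 bytes per clock; at 66 Mhz, derated to ~80%
                   echo $((8 * 66 * 80 / 100))    # prints 422, close to the 400 MB/second above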


Performance hierarchy level 5
              Level 5 is the interconnect within a host between PCI bridge(s) and memory.
              This bandwidth is rarely a limiting factor in performance.


               [Diagram: Level 5 detail: PCI bridges connecting to host memory.]




General notes on performance hierarchies
              The hardware components between interconnect levels can also have an impact
              on bandwidth.
              ■      A drive has sequential access bandwidth and average latency times for seek
                     and rotational delays.
                     Drives perform optimally when doing sequential I/O to disk. Non-sequential
                     I/O forces movement of the disk head (that is, seek and rotational latency).



                                This movement is a huge overhead compared to the amount of data
                                transferred, so the more non-sequential I/O is done, the slower the drive
                                becomes. Reading or writing more than one stream at a time results in a mix
                                of short bursts of sequential I/O with seek and rotational latency in
                                between, which significantly degrades overall throughput. Because
                                different drive types have different seek and rotational latency
                                specifications, the type of drive selected has a large effect on the degree
                                of degradation.
                                From best to worst, such drives are fibre channel, SCSI, and SATA, with
                                SATA drives usually having twice the latency of fibre channel drives.
                                However, SATA drives deliver about 80% of the sequential performance of
                                fibre channel drives.
                            ■   A RAID controller has cache memory of varying sizes. The controller also
                                does the parity calculations for RAID-5. Better controllers perform this
                                calculation (called “XOR”) in hardware, which makes it faster. Without
                                hardware assistance, the controller processor must perform the calculation,
                                and controller processors are not usually high performance.
                           ■   A PCI card can be limited either by the speed supported for the port(s) or the
                               clock rate to the PCI bus.
                           ■   A PCI bridge is usually not an issue because it is sized to handle whatever
                               PCI buses are attached to it.
                            Memory can be a limit if there is other intensive non-I/O activity in the system.
                            Note that the “Performance hierarchy diagram” on page 146 does not show the
                            host CPU(s).
                           While CPU performance is obviously a contributor to all performance, it is
                           generally not the bottleneck in most modern systems for I/O intensive
                           workloads, because there is very little work done at that level. The CPU must
                           execute a read operation and a write operation, but those operations do not take
                           up much bandwidth. An exception is when older gigabit ethernet card(s) are
                           involved, because the CPU has to do more of the overhead of network transfers.




Hardware configuration examples
          The examples below are not intended as particular recommendations for your
          site. They are factors to consider when adjusting hardware for better NetBackup
          performance.

          Example 1
          A general hardware configuration could have dual 2-gigabit fibre channel ports
          on a single PCI card. In such a case, the following is true:
          ■   Potential bandwidth is approximately 400 MB/second.
          ■   For maximum performance, the card must be plugged into at least a 66 Mhz
              PCI slot.
          ■   No other cards on that bus should need to transfer data at the same time.
              That single card will saturate the PCI bus.
          ■   Putting 2 of these cards (4 ports total) onto the same bus and expecting
              them to aggregate to 800 MB/second will never work unless the bus and
              cards are 133 Mhz.

          Example 2
          The following more detailed example shows a pyramid of bandwidth potentials
          with aggregation capabilities at some points. Suppose you have the following
          hardware:
          ■   1x 66 Mhz quad 1 gigabit ethernet
          ■   4x 66 Mhz 2 gigabit fibre channel
          ■   4x disk array with 1 gigabit fibre channel port
          ■   1x Sun V880 server (2x 33 Mhz PCI buses and 1x 66 Mhz PCI bus)
          In this case, for maximum backup and restore throughput with clients on the
          network, the following is one way to assemble the hardware so that no
          constraints limit throughput.
          ■   The quad 1-gigabit ethernet card can do approximately 400 MB/second
              throughput at 66 Mhz.
          ■   It requires at least a 66 Mhz bus, because putting it in a 33 Mhz bus would
              limit throughput to approximately 200 MB/second.
          ■   It will completely saturate the 66 Mhz bus, so do not put any other cards on
              that bus that need significant I/O at the same time.
          Since the disk arrays have only 1-gigabit fibre channel ports, the fibre channel
          cards will degrade to 1 gigabit each.



                             ■   Each card can therefore move approximately 100 MB/second. With four
                                 cards, the total is approximately 400 MB/second.
                              ■   However, you do not have a single PCI bus available that can support that
                                  400 MB/second, since the 66-Mhz bus is already taken by the ethernet card.
                             ■   There are two 33-Mhz buses which can each support approximately 200
                                 MB/second. Therefore, you can put 2 of the fibre channel cards on each of
                                 the 2 buses.
                             This configuration can move approximately 400 MB/second for backup or
                             restore. Real-world results of a configuration like this show approximately 350
                             MB/second.



    Tuning software for better performance
                             Note: The size of individual I/O operations should be scaled such that the
                             overhead is relatively low compared to the amount of data moved. That means
                             the I/O size for a bulk transfer operation (such as a backup) should be relatively
                             large.

                             The optimum size of I/O operations is dependent on many factors and varies
                             greatly depending on the hardware setup.
                             Below is the performance hierarchy diagram, but in this version, each array only
                             has a single shelf.



                        Figure 10-8    Example hierarchy with single shelf per array

                        [Diagram: the same hierarchy as Figure 10-6, but each array contains a single
                        shelf of drives behind its RAID controller.]
                       Note the following:
                       ■       Each shelf in the disk array has 9 drives because it uses a RAID 5 group of
                               8+1 (that is, 8 data disks + 1 parity disk).
                               The RAID controller in the array uses a stripe unit size when performing I/O
                               to these drives. Suppose that you know the stripe unit size to be 64KB. This
                               means that when writing a full stripe (8+1) it will write 64KB to each drive.
                               The amount of non-parity data is 8 * 64KB, or 512KB. So, internal to the
                               array, the optimal I/O size is 512KB. This means that crossing Level 3 to the
                               host PCI card should perform I/O at 512KB.
                       ■       The diagram shows two separate RAID arrays on two separate PCI buses.
                               You want both to be performing I/O transfers at the same time.
                               If each is optimal at 512K, the two arrays together are optimal at 1MB.



                                  You can implement software RAID-0 to make the two independent arrays
                                  look like one logical device. RAID-0 is a plain stripe with no parity. Parity
                                  protects against drive failure, and this configuration already has RAID-5
                                  parity protecting the drives inside the array.
                                  The software RAID-0 is configured with a stripe unit size of 512KB (the I/O
                                  size of each unit) and a stripe width of 2 (one column for each array); see
                                  the sketch after this list.
                                  Since 1MB is the optimum I/O size for the volume (the RAID-0 entity on the
                                  host), that size is used throughout the rest of the I/O stack.
                             ■   If possible, configure the file system mounted over the volume for 1MB. The
                                 application performing I/O to the file system also uses an I/O size of 1MB. In
                                 NetBackup, I/O sizes are set in the configuration touch file
                                 .../db/config/SIZE_DATA_BUFFERS_DISK. See “Changing the size of
                                 shared data buffers” on page 105 for more information.
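                              A minimal sketch of this configuration on a UNIX media server, assuming
                              Veritas Volume Manager and the standard UNIX NetBackup install path; the
                              disk group name, volume name, and volume size are illustrative:
                                  # stripe the two arrays into one logical volume:
                                  # 2 columns, 512KB stripe unit
                                  vxassist -g backupdg make backupvol 500g layout=stripe ncol=2 stripeunit=512k

                                  # have NetBackup use 1MB disk I/O buffers (value in bytes)
                                  echo 1048576 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS_DISK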
Chapter 11
OS-related tuning factors
      This chapter provides OS-related tuning recommendations that can improve
      NetBackup performance.
      This chapter includes the following sections:
      ■   “Kernel tuning (UNIX)” on page 158
      ■   “Adjusting data buffer size (Windows)” on page 163
      ■   “Other Windows issues” on page 165




    Kernel tuning (UNIX)
                                Several kernel tunable parameters can affect NetBackup performance on UNIX.


                                Note: Keep in mind that changing these parameters may affect other
                                applications that use the same parameters. Making sizeable changes to these
                                parameters may result in performance trade-offs. Usually, the best approach is
                                to make small changes and monitor the results.



    Kernel parameters on Solaris 8 and 9
                                The Solaris operating system dynamically builds the operating system kernel
                                with each boot of the system.
                                The parameters below reflect minimum settings for a system dedicated to
                                running Veritas NetBackup software.


                                Note: The parameters described in this section can be used on Solaris 8, 9, and
                                10. However, many of the following parameters are obsolete in Solaris 10. See
                                “Kernel parameters in Solaris 10” on page 160 for a list of the parameters now
                                obsolete in Solaris 10 and for further assistance with Solaris 10 parameters.

                                Below are brief definitions of the message queue, semaphore, and shared
                                memory parameters. The parameter definitions apply to a Solaris system. The
                                values for these parameters can be set in the file /etc/system.
                                ■   Message queues
                                    set msgsys:msginfo_msgmax = maximum message size
                                    set msgsys:msginfo_msgmnb = maximum length of a message queue in
                                    bytes. The length of the message queue is the sum of the lengths of all the
                                    messages in the queue.
                                    set msgsys:msginfo_msgmni = number of message queue identifiers
                                    set msgsys:msginfo_msgtql = maximum number of outstanding messages
                                    system-wide that are waiting to be read across all message queues.
                                ■   Semaphores
                                    set semsys:seminfo_semmap = number of entries in semaphore map
                                    set semsys:seminfo_semmni = maximum number of semaphore identifiers
                                    system-wide
                                    set semsys:seminfo_semmns = number of semaphores system-wide
                                    set semsys:seminfo_semmnu = maximum number of undo structures in
                                    system
                                    set semsys:seminfo_semmsl = maximum number of semaphores per id



    set semsys:seminfo_semopm = maximum number of operations per semop
    call
    set semsys:seminfo_semume = maximum number of undo entries per
    process
■   Shared memory
    set shmsys:shminfo_shmmin = minimum shared memory segment size
    set shmsys:shminfo_shmmax = maximum shared memory segment size
    set shmsys:shminfo_shmseg = maximum number of shared memory
    segments that can be attached to a given process at one time
    set shmsys:shminfo_shmmni = maximum number of shared memory
    identifiers that the system will support
The ipcs -a command displays system resources and their allocation. It is
useful when a process is hanging or sleeping, to check whether the resources it
needs are available.

Example:
The following settings tune the kernel parameters for NetBackup master
servers and media servers on a Solaris 8 or 9 system. Symantec provides this
information only to assist in kernel tuning for NetBackup. For Solaris 10, see
“Kernel parameters in Solaris 10” on page 160.
These are recommended minimum values. If /etc/system already contains
any of these entries, use the larger of the existing setting and the setting
provided here. Before modifying /etc/system, use the command
/usr/sbin/sysdef -i to view the current kernel parameters.
After you have changed the settings in /etc/system, reboot the system to
allow the changed settings to take effect. After rebooting, the sysdef command
will display the new settings.
     *BEGIN NetBackup recommended minimum settings for a Solaris
     /etc/system file
     *Message queues
     set msgsys:msginfo_msgmap=512
     set msgsys:msginfo_msgmax=8192
     set msgsys:msginfo_msgmnb=65536
     set msgsys:msginfo_msgmni=256
     set msgsys:msginfo_msgssz=16
     set msgsys:msginfo_msgtql=512
     set msgsys:msginfo_msgseg=8192
     *Semaphores
     set semsys:seminfo_semmap=64
     set semsys:seminfo_semmni=1024
     set semsys:seminfo_semmns=1024
                                    set semsys:seminfo_semmnu=1024
                                    set semsys:seminfo_semmsl=300
                                    set semsys:seminfo_semopm=32
                                    set semsys:seminfo_semume=64
                                    *Shared memory
                                    set shmsys:shminfo_shmmax=16777216
                                    set shmsys:shminfo_shmmin=1
                                    set shmsys:shminfo_shmmni=220
                                    set shmsys:shminfo_shmseg=100
                                    *END NetBackup recommended minimum settings
                                ■   Socket parameters on Solaris 8 and 9
                                    The TCP_TIME_WAIT_INTERVAL parameter sets how long to wait after a
                                    TCP socket is closed before it can be used again, that is, how long a closed
                                    TCP connection remains in the kernel's table. The default value on most
                                    systems is 240000 milliseconds (240 seconds, or 4 minutes). If your server
                                    is slow because it handles many connections, check the current value of
                                    TCP_TIME_WAIT_INTERVAL and consider reducing it.
                                    For Solaris or HP-UX, use the following command:
                                         ndd -get /dev/tcp tcp_time_wait_interval
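                                    For example, to reduce the interval to 60 seconds (the value is illustrative;
                                    choose one appropriate for your own connection load):
                                         ndd -set /dev/tcp tcp_time_wait_interval 60000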
                                ■   Force load parameters on Solaris 8 and 9
                                    When system memory gets low, Solaris unloads unused drivers from
                                    memory and reloads drivers as needed. Tape drivers are a frequent
                                    candidate for unloading, since they tend to be less heavily used than disk
                                    drivers. Depending on the timing of these unload and reload events for the
                                    st (Sun), sg (Symantec), and Fibre Channel drivers, various issues may
                                    result. These issues can range from devices “disappearing” from a SCSI bus
                                    to system panics.
                                    Symantec recommends adding the following “forceload” statements to the
                                    /etc/system file. These statements prevent the st and sg drivers from
                                    being unloaded from memory:
                                         forceload: drv/st
                                         forceload: drv/sg
                                    Other statements may be necessary for various Fibre Channel drivers, such
                                    as the following example for JNI:
                                         forceload: drv/fcaw


    Kernel parameters in Solaris 10
                                In Solaris 10, all System V IPC facilities are either automatically configured or
              can be controlled by resource controls. Facilities that can be shared are memory,
              message queues, and semaphores. For information on tuning these system
              resources, see Chapter 6, “Resource Controls (Overview),” in the Sun System
              Administration Guide: Solaris Containers-Resource Management and Solaris
              Zones.
             For further assistance with Solaris parameters, refer to the Solaris Tunable
             Parameters Reference Manual, available at:
             http://docs.sun.com/app/docs/doc/819-2724?q=Solaris+Tunable+Parameters
             The following sections of the Solaris Tunable Parameters Reference Manual may
             be of particular interest:
             ■   What’s New in Solaris System Tuning in the Solaris 10 Release?
             ■   System V Message Queues
             ■   System V Semaphores
             ■   System V Shared Memory
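              As an illustration of the resource-control approach, the following sketch shows
              how the System V shared memory limit might be inspected and raised on
              Solaris 10. The project name and the 16 GB limit are illustrative values only,
              not NetBackup recommendations:
                  # Display the current shared memory resource control for this shell:
                  prctl -n project.max-shm-memory -i process $$
                  # Persistently raise the limit for the default project:
                  projmod -s -K "project.max-shm-memory=(privileged,16G,deny)" default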


             Parameters obsolete in Solaris 10
              The following parameters are obsolete in Solaris 10. Although they can still be
              included in the Solaris /etc/system file, where they initialize the default
              resource control values, Sun does not recommend their use.
                 semsys:seminfo_semmns
                 semsys:seminfo_semvmx
                 semsys:seminfo_semmnu
                 semsys:seminfo_semaem
                 semsys:seminfo_semume
                 semsys:seminfo_semusz
                 semsys:seminfo_semmap
                 shmsys:shminfo_shmseg
                 shmsys:shminfo_shmmin
                 msgsys:msginfo_msgmap
                 msgsys:msginfo_msgseg
                 msgsys:msginfo_msgssz
                 msgsys:msginfo_msgmax


Message queue and shared memory parameters on HP-UX
                 The Solaris kernel parameters that deal with message queues, semaphores,
                 and shared memory can be mapped to equivalent parameters on an HP-UX
                 system. Table 11-17 lists the recommended minimum settings for the HP-UX
                 kernel tuning parameters.

              Table 11-17      Kernel tuning parameters for HP-UX

              Name                       Minimum Value

              mesg                       1
              msgmap                     514
              msgmax                     8192
              msgmnb                     65536
              msgssz                     8
              msgseg                     8192
              msgtql                     512
              msgmni                     256
              sema                       1
              semmap                     semmni+2
              semmni                     300
              semmns                     300
              semmnu                     300
              semume                     64
              semvmx                     32767
              shmem                      1
              shmmni                     300
              shmseg                     120
              shmmax                     Calculate shmmax using the formula provided under
                                         “Recommended shared memory settings” on page 107.*


                                         *shmmax = NetBackup shared memory allocation =
                                         (SIZE_DATA_BUFFERS * NUMBER_DATA_BUFFERS) * number of
                                         drives * MPX per drive
                                         SIZE_DATA_BUFFERS and NUMBER_DATA_BUFFERS are also
                                         discussed under “Recommended shared memory settings” on page 107.
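                                         For example, with hypothetical values of SIZE_DATA_BUFFERS =
                                         262144 (256 KB), NUMBER_DATA_BUFFERS = 16, two tape drives,
                                         and a multiplexing (MPX) factor of 4 per drive:
                                              shmmax = (262144 * 16) * 2 * 4 = 33554432 bytes (32 MB)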
                                    To change the above kernel parameters, use the System Administration
                                    Manager (SAM) unless you are very familiar with changing kernel
                                    parameters and rebuilding the kernel from the command line.
                                    From SAM, select Kernel Configuration > Configurable Parameters. Find
                                    the parameter to change and select Actions > Modify Configurable
                                    Parameter, then key in the new value. Repeat this for each parameter you
                                    want to change. Once all the values have been changed, select Actions >
                                    Process New Kernel. A warning appears stating that a reboot is required
                                    to put the new values into place. After the reboot, the sysdef command
                                    can be used to confirm that the correct values are in place.


              Caution: Any change to the kernel parameters requires a reboot to put the new
              kernel into place. Do not change the parameters unless a system reboot can be
              performed; otherwise, the changes will not take effect.



Kernel parameters on Linux
              To modify the Linux kernel tunable parameters, use sysctl. sysctl is used to view,
              set, and automate kernel settings in the /proc/sys directory. Most of these
              parameters can be changed online. To make your changes permanent, edit
              /etc/sysctl.conf. The kernel must have support for the procfs file system
              statically compiled in or dynamically loaded as a module.
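              For example, System V shared memory limits similar to those discussed for
              Solaris and HP-UX can be viewed and set with sysctl (the value shown is
              illustrative only, not a NetBackup recommendation):
                  sysctl kernel.shmmax
                  sysctl -w kernel.shmmax=33554432
              To make the change persistent across reboots, add the equivalent line to
              /etc/sysctl.conf:
                  kernel.shmmax = 33554432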
              The default tape buffer size on Linux is 32 KB. To change it, either rebuild the
              kernel after modifying st_options.h, or pass options to the st driver at boot
              time. An example grub.conf entry is:
                  title Red Hat Linux (2.4.18-24.7.x)
                  root (hd0,0)
                  kernel /vmlinuz-2.4.18-24.7.x ro root=/dev/hda2
                  st=buffer_kbs:256,max_buffers:8
                  initrd /initrd-2.4.18-24.7.x.img
              For further information on setting boot options for st, see
              /usr/src/linux*/drivers/scsi/README.st, subsection BOOT TIME.



Adjusting data buffer size (Windows)
                   The maximum data buffer size possible under Windows is 1024 KB. This is
                   calculated as a multiple of operating system pages (1 page = 4 KB); the
                   maximum is therefore 256 OS pages, counted from 0 to 255 (the hex value
                   0xFF). Setting anything larger defaults back to 64 KB, the default size for
                   the scatter/gather list.
                  The setting of the maximum usable block size is dependent on the Host Bus
                  Adapter (HBA) miniport driver, not the tape driver or the OS. For example,
                  the readme for the QLogic QLA2200 card contains the following:
                  * MaximumSGList
                  Windows includes enhanced scatter/gather list support for doing very large
                  SCSI I/O transfers. Windows supports up to 256 scatter/gather segments of
                  4096 bytes each, allowing transfers up to 1048576 bytes.
164 OS-related tuning factors
    Adjusting data buffer size (Windows)




                                  Note: The OEMSETUP.INF file has been updated to automatically update
                                  the registry to support 65 scatter/gather segments. Normally, no additional
                                  changes will be necessary as this typically results in the best overall
                                  performance.


                             To change the data buffer size, do the following:
                             1    Click Start > Run and open the REGEDT32 program.
                             2    Select HKEY_LOCAL_MACHINE and follow the tree structure down to the
                                  QLogic driver as follows: HKEY_LOCAL_MACHINE > SYSTEM >
                                  CurrentControlSet > Services > Ql2200 > Parameters > Device.
                             3    Double click MaximumSGList:REG_DWORD:0x21
                             4    Enter a value from 16 to 255 (0x10 hex to 0xFF). A value of 255 (0xFF)
                                  enables the maximum 1 Megabyte transfer size. Setting a value higher than
                                  255 reverts to the default of 64-Kilobyte transfers. The default value is 33
                                  (0x21).
                             5    Click OK.
                             6    Exit the Registry Editor, then shut down and reboot the system.
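                                   On systems where the reg command-line utility is available, the same
                                   registry change can be scripted. This is a sketch that assumes the QLA2200
                                   key path used in step 2; adjust the key for your HBA, and reboot afterward
                                   as described in step 6:
                                        reg add "HKLM\SYSTEM\CurrentControlSet\Services\Ql2200\Parameters\Device" /v MaximumSGList /t REG_DWORD /d 255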
                                   The main definition here is the SGList (scatter/gather list): the number of
                                   pages that can be scattered or gathered (that is, read or written) in one
                                   DMA transfer. For the QLA2200, set MaximumSGList to 0xFF (or to 0x40
                                   for 256 KB); you can then set 256-KB buffer sizes for NetBackup. Use
                                   extreme caution when modifying this registry value, and always contact
                                   the vendor of the SCSI or Fibre Channel card first to ascertain the
                                   maximum value that the particular card can support.
                                   The same approach should be possible for other HBAs, especially Fibre
                                   Channel cards. The default for JNI Fibre Channel cards using driver
                                   version 1.16 is 0x80 (512 KB, or 128 pages). The default for the Emulex
                                   LP8000 is 0x81 (513 KB, or 129 pages).
                                   Note that for this approach to work, the HBA must install its own SCSI
                                   miniport driver. If it does not, transfers are limited to 64 KB; this is
                                   typical of legacy cards such as older SCSI cards.
                                   In conclusion, the built-in limit on Windows is 1024 KB, unless you are
                                   using the default Microsoft miniport driver for legacy cards. The
                                   limitations are all related to the HBA drivers and the limits of the physical
                                   devices attached to them.
                                   For example, Quantum DLT7000 drives work best with 128-KB buffers and
                                   StorageTek 9840 drives with 256-KB buffers. If these values are increased
                                   too far, damage could result to the HBA, the tape drives, or any devices in
                                   between (Fibre Channel bridges and switches, for example).



Other Windows issues
          ■   Troubleshooting NetBackup’s use of configuration files on Windows
              systems.
              If you create a configuration file on a Windows system for NetBackup’s use
              (on UNIX systems, such files are called touch files), the file name must
              match the file name that NetBackup is expecting. In particular, make sure
              the file name does not have an extension, such as .txt.
              If, for instance, you create a file called NOexpire to prevent the expiration of
              backup images, this file will not produce the desired effect if the file’s name
              is NOexpire.txt.
              Note also: the file must use a supported type of encoding, such as ANSI.
              Unicode encoding is not supported; if the file is in Unicode, it will not
              produce the desired effect.
              To check the encoding type, open the file using a tool that displays the
              current encoding, such as Notepad. Select File > Save As and check the
              options in the Encoding field. ANSI encoding will work properly.
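               For example, an empty touch file named NOexpire, with no extension and
               no encoding problems, can be created from a Windows command prompt
               (run in the directory where NetBackup expects the touch file; see the
               relevant NetBackup documentation for the expected location):
                   type nul > NOexpire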
           ■   Disable antivirus software when running file system backups on Windows
               2000 or Windows XP. Antivirus applications scan all files backed up by
               NetBackup, which loads down the client's CPU and slows its backups.
               Some antivirus products also modify each file's archive bit as they scan,
               which causes incremental backups based on the archive bit to back up files
               that have not changed. As a workaround, in the Backup, Archive, and
               Restore interface, on the NetBackup Client Properties dialog, General tab,
               clear the checkbox next to Perform incrementals based on archive bit.
Appendix A
Additional resources
               This appendix lists additional sources of information.


Performance tuning information at Vision online
              For additional information on NetBackup tuning, go to http://van.veritas.com
              and click Vision Online 2006, then Data Management. Items S172, S173, and
              S174 are the Veritas NetBackup performance tuning sessions.


Performance monitoring utilities
              ■   Storage Mountain, previously called Backup Central, is a resource for all
                  backup-related issues. It is located at http://www.storagemountain.com.
              ■   The following article discusses how and why to design a scalable data
                  installation: “High-Availability SANs,” Richard Lyford, FC Focus Magazine,
                  April 30, 2002.


Freeware tools for bottleneck detection
              ■   Iperf, for measuring TCP and UDP bandwidth:
                  http://dast.nlanr.net/Projects/Iperf1.1.1/index.htm
              ■   Bonnie, for measuring the performance of UNIX file system operations:
                  http://www.textuality.com/bonnie
              ■   Bonnie++, extends the capabilities of Bonnie:
                  http://www.coker.com.au/bonnie++/readme.html
              ■   Tiobench, for testing I/O performance with multiple running threads:
                  http://sourceforge.net/projects/tiobench/
    Mailing list resources
                           ■   You can find Veritas NetBackup news groups at:
                               http://forums.veritas.com.
                               Search on the keyword “NetBackup” to find threads relevant to NetBackup.
                           ■   The email list Veritas-bu discusses backup-related products such as
                               NetBackup. Archives for Veritas-bu are located at:
                               http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
                           ■   The Usenet news group comp.arch.storage.