Improving Storage Infrastructure Utilization
April 2007

Introduction

Endless capital requests for additional storage resources are the bane of existence for many storage directors and infrastructure VPs. The seemingly unquenchable demand for more space, combined with the nagging feeling that existing space is not being used, has triggered many an internal project aimed at examining storage utilization issues. All too often, these projects attempt to chase the metric of file-level utilization. Unfortunately, this may not be the right approach to solving utilization problems. Especially for large, complex storage environments, attempting to track file utilization on a global basis may be doomed to failure. This white paper proposes a more effective approach to solving the utilization challenge in complex SANs.

Why Underutilization?

What leads application owners to over-request storage? Through experience helping corporations gain control over networked storage assets, Onaro® has observed the following reasons:

· Fear of the 2:00 AM page – It only takes one experience of being paged in the middle of the night due to insufficient database storage for a DBA to forever over-request space.

· Uncertainty of actual demand – Many application teams do not have a clear idea of exactly how much space their application will require. Factor in transient storage demands and the picture gets cloudier.

· Lack of a service level for provisioning time – Without an agreed-upon time frame to provision new storage, application owners face an open-ended schedule for getting new storage in a crisis, and provisioning times are uneven at best. To compensate for this unpredictable lead time, teams over-request space.
· Financial incentive to conservatively estimate storage requirements is outweighed by the career disincentive of running out of space – Without a simple cost allocation system, it is always easier to over-request space.

· Space request multiplication – Space request formulas start with the application team requesting "x" amount of space. Then the system administrator doubles that amount to avoid being awakened in the middle of the night due to insufficient storage. Finally, the storage team adds another 20-30% to prevent having to scramble to add more space due to the application team's underestimation of requirements.

· Uncertainty over actual loads in the SAN fabric and arrays – Without the ability to understand exactly how applications are loading the SAN fabric and arrays, many storage teams hesitate to push the envelope on both port and array utilization. The result is underutilization of switch ports and arrays that are not fully allocated.

The False Promise of File-Level Utilization

To solve the storage utilization challenge, many organizations are attempting to monitor file-level utilization. In Onaro's opinion, once the size of a SAN exceeds a few hundred ports, the time, effort, and cost of tracking file-level utilization across the entire environment make this an impractical and fruitless exercise that fails to solve the core utilization problems.

Why? First, most of the utilization problem is a human behavior and operations challenge. Having reports on file-level utilization does not address the underlying reasons for overestimating storage requirements – i.e., the 2:00 AM page problem, the uncertainty of demand, the slowness of provisioning new space, the lack of financial incentive, or space request multiplication. What file-level utilization does provide is an early warning before a system runs out of space and a feedback mechanism that helps calibrate future storage requests. But what is the cost of globally tracking file-level utilization?
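The compounding effect of the space-request multiplication bullet above is easy to underestimate. The following sketch uses hypothetical numbers (a 2x sysadmin buffer and a 25% storage-team margin, per the behavior described in the text) to show how a modest request balloons:

```python
# Hypothetical illustration of space-request multiplication: the application
# team asks for x, the system administrator doubles it, and the storage team
# adds another 20-30% of safety margin on top.
def provisioned_space(app_request_tb: float, storage_margin: float = 0.25) -> float:
    """Return the space actually provisioned (TB) for a given application request."""
    sysadmin_request = app_request_tb * 2           # sysadmin doubles the ask
    return sysadmin_request * (1 + storage_margin)  # storage team adds 20-30%

if __name__ == "__main__":
    for x in (0.5, 1.0, 4.0):
        total = provisioned_space(x)
        print(f"app asks for {x:.1f} TB -> {total:.2f} TB provisioned "
              f"({total / x:.1f}x the original request)")
```

Under these assumptions every request is provisioned at 2.4x-2.6x the application team's original estimate before a single byte is written.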
Unfortunately, understanding file utilization requires the deployment of operating system agents, along with the supporting infrastructure to manage and interrogate those agents. In the majority of organizations with whom Onaro has worked, the ability to successfully deploy agents drops off dramatically once the host count rises to about 70-80. The complexity of maintaining compatible agents, operating systems, and agent control applications – combined with the cross-functional demands of coordinating agent deployments – makes this a task fit for Sisyphus.

In a typical scenario observed by Onaro, an organization with more than 10,000 ports had several full-time administrators working solely on agent deployment. It took them over 18 months to complete a full agent rollout, and at any given point in time a large percentage of agents were not reporting. The result was a constant inability to accurately report on storage resource consumption by application. Moreover, these agent-based systems required over 50 physical servers to collect and analyze data – adding dramatically to the operating and capital cost requirements.

Some systems attempt to get file utilization information without agents by logging directly into servers' operating systems. However, granting such access to a management system via the Unix secure shell or the Windows management interface creates security risks that most organizations are not willing to accept. The bottom line is that there is no free lunch when attempting to capture file-level utilization information.

Furthermore, replicated space makes the success of any of these approaches even more unlikely. Agent-based file-level utilization programs do not understand the concept of replicated volumes for a particular application. In scenarios where a source volume is replicated 2, 3, or even 10 times, allocating this cost back to the application owner is not easily done without significant manual work.
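The replication point above can be made concrete. A minimal sketch of replica-aware chargeback follows; the function name and the $30/GB figure (derived from the $30,000/TB example later in this paper) are illustrative, not Onaro's actual costing model:

```python
# Hedged sketch of why replicated volumes complicate cost allocation: the
# application owner should be charged for the source volume plus every
# replica, but agent-based file-level tools only see the source volume.
# Names and the $/GB rate are illustrative assumptions.
def chargeback_usd(source_gb: float, replica_count: int,
                   usd_per_gb: float = 30.0) -> float:
    """Capital cost attributable to an application: source plus all replicas."""
    total_volumes = 1 + replica_count        # source + each replica
    return source_gb * total_volumes * usd_per_gb

# A 1,000 GB volume replicated 3 times consumes four volumes' worth of
# capital, not one - the gap an agent-only view fails to capture.
```

An agent-only view would report 1,000 GB consumed, understating the true capital footprint by a factor of four in this example.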
One of the promises of measuring file-level utilization is that space can be reclaimed. While theoretically possible, the cost and downtime required to reallocate underutilized space make reclamation difficult. In most cases, Onaro has found that although low-utilization environments may be uncovered, the space is rarely reclaimed. Instead, organizations can only hope that their storage and application owners, armed with file-level utilization information, will make better decisions in the next application provisioning cycle.

Finally, some organizations that have uncovered significant underutilized allocations are hesitant to attempt to improve utilization without an understanding of overall array loading. That is because increasing utilization can decrease array and switch performance. Without a great deal of confidence in the overall load balance across the storage infrastructure, storage teams risk application brownouts by attempting to increase file-level utilization.

A Better Approach

Given today's prevailing trend toward overestimating space requirements and underutilizing storage resources, VPs of infrastructure face the dual challenge of reining in capital expenditures while optimizing existing storage space. Experience proves that attempting to monitor file-level utilization in a large, complex SAN environment fails to address this challenge. Onaro advocates a far more focused and efficient approach that enables organizations to cost-effectively increase the utilization of overall storage resources.

Start with Global Visibility

Most organizations that attempt to rein in their underutilized storage assets do not have a global, macro-level view of which assets are allocated to each business unit or application, let alone a micro-level view of how much disk space is actually utilized. For environments where replication is used, a macro-level view of allocated resources is even more lacking.
SANscreen® Foundation combined with SANscreen Replication Assurance can provide the macro-level view of exactly which assets an application is using. Since SANscreen is service aware, it understands all the resources required to deliver the necessary service to an application. Since SANscreen is agentless and does not interrogate host applications, a typical 1,000 to 5,000 port datacenter can be up and operating with global visibility in about 8 hours.

Focus Your File Utilization Efforts on the Biggest Offenders

With SANscreen providing global visibility into the storage environment, storage teams can focus their attention on the worst offenders for underutilized space. Storage teams should determine the minimum amount of space they need to recover from an application to make their efforts cost-effective. Is it 1TB? 5TB? 500GB? By understanding the cost to recover storage space and the minimum amount of space needed to deliver the desired ROI, storage teams can then focus on determining file-level utilization for applications that meet these criteria.

Let's assume that recovering a 1TB block of space from an application makes economic sense. 1TB of space costs about $30,000. Factoring in all the labor costs and application downtime, this could be the right amount of space necessary to cost-justify the effort. If the target space utilization is 50% and the assumed utilization is about 20%, then any application that is consuming more than 2TB should be investigated to determine file utilization. To accomplish this, the storage team should work with the system administrators on a quarterly or semi-annual basis to identify the top candidates for reallocation. This focused approach will yield the ROI results that a global "boil-the-ocean with agents" approach will not. Finally, SANscreen Foundation has the change management capabilities to reallocate storage space quickly and safely.
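The screening rule above can be formalized. One way to model it (an assumption, since the paper states only the rule of thumb): recoverable space is what remains after shrinking an allocation so the same data sits at the target utilization. The figures come from the example in the text: 50% target, 20% assumed utilization, 1 TB minimum recovery.

```python
# Hedged sketch of the "biggest offenders" screen. Assumption: an app with
# allocation A at current utilization u can be shrunk to A*u/target,
# freeing the remainder. Thresholds mirror the worked example in the text.
def recoverable_tb(allocated_tb: float, current_util: float,
                   target_util: float) -> float:
    """TB freed by shrinking an allocation to hit the target utilization."""
    used_tb = allocated_tb * current_util
    return allocated_tb - used_tb / target_util

def worth_investigating(allocated_tb: float, min_recovery_tb: float = 1.0,
                        current_util: float = 0.20,
                        target_util: float = 0.50) -> bool:
    """Does this app clear the minimum-recovery bar that cost-justifies the work?"""
    return recoverable_tb(allocated_tb, current_util, target_util) >= min_recovery_tb
```

Under this model an application crosses the 1 TB payback line at roughly 1.7 TB allocated, consistent with the paper's rounder "more than 2TB" rule of thumb; a 2 TB allocation at 20% utilization would free 1.2 TB.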
Provide Basic Cost Allocation Reporting

Based on Onaro's experience, most organizations do not have formalized chargeback or even cost-allocation mechanisms in place. But the lack of a formalized process should not stop storage teams from reporting on exactly how much capital cost each application is consuming. This amount should also include the cost of both the source and target arrays. Starting with this basic costing information puts the organization on the right path to changing storage over-allocation behavior.

Shift Storage Between Tiers

By monitoring traffic by application over time, it is possible to determine which applications have exceptionally low throughput requirements. Using SANscreen Application Insight combined with the path awareness of SANscreen Foundation, storage teams can easily locate candidates for migration from Tier 1 to Tier 2 storage.

Load Balance Across the SAN and Arrays

With all the focus on reclaiming storage space, infrastructure teams often overlook other areas of significant cost savings. Many times storage teams will provision their switching infrastructure to only 50% of available ports out of fear of saturating the fabric. The same holds true for arrays. Using SANscreen Application Insight, storage teams can balance traffic across arrays, switches, and fabrics to maximize the allocation of these hardware assets. Load balancing across the SAN and arrays is the low-hanging fruit of capital cost reduction. But without historical, application-centric traffic information, load balancing is exceptionally difficult.

Investigate Thin Provisioning Technology

Finally, much as virtualization software and hypervisors are the solution to underutilized server CPUs, new technologies that support thin provisioning of storage can help reduce underutilized storage assets. This technology is not available from Onaro, but from vendors such as 3PAR Data.

Conclusion

No one likes underutilized assets of any sort.
But solving the problem of underutilized storage assets is more involved than simply tracking file-level utilization. Onaro advocates starting with a global view of resource allocation and then focusing efforts on the greatest offenders. In addition, taking a tiered approach to storage, implementing thin provisioning technology, and balancing application load across all storage resources can also help optimize storage assets. Using all of these techniques, storage teams can dramatically reduce capital costs without the headaches associated with "boil-the-ocean" agent-based SRM systems.