CITTIO White Paper
April 2008

The Power of Automation
Why Automated Network Monitoring Is Critical to IT and Business Performance

Table of Contents
Introduction
The Explosive Growth in Network Infrastructure
The Business Implications of Network Monitoring
The Need for Automation in Network Monitoring
Why Network Management Is Seldom Fully Deployed
How Automation Revolutionizes Network Monitoring
Introducing the CITTIO WatchTower Automation Stack
  Asset Discovery
  Fault Monitoring
  Performance Monitoring
  Service Level Management
  Performance Data Visualization
  Root Cause Analysis and Alarm Suppression
The CITTIO WatchTower Monitoring Capability Gauge
Three Key Myths of Network Monitoring and Automation
Conclusion

Introduction

Today’s network administrators can relate to the plight of Sisyphus. In ancient Greek mythology, Sisyphus was condemned by the gods to roll a boulder up a hill. Before he reached the top, however, the boulder would thunder back down to the bottom. Sisyphus would then start again, only to have the same vicious cycle repeat itself, into eternity.

Administrators of network monitoring systems face a similar dilemma. The problem is two-fold. One, explosive growth in the number of network devices and applications has outstripped the ability of IT personnel to configure and maintain network monitoring systems with traditional manual approaches. Two, systems management “mega-suites” have grown so bloated with features and functionality that they are virtually impossible to effectively implement and maintain, and typically require an inordinate amount of manual configuration. Combined, these two characteristics are tantamount to the proverbial Sisyphean boulder and can mean a never-ending, no-win battle for network administrators.

The Explosive Growth in Network Infrastructure

A decade ago, manual configuration and maintenance of network monitoring systems was not a problem. The pace of change in data centers and networks was modest.
It wasn’t difficult for IT personnel to manually install proprietary monitoring agents or establish thresholds on devices when only a handful of data center servers were added each quarter.

Now the network extends far beyond the core IT infrastructure of data center servers, storage, applications, and network equipment. IT administrators are challenged to monitor a proliferation of IP-enabled network devices, including geographically dispersed point-of-sale (POS) terminals, environmental monitoring controls, surveillance cameras, voice over IP (VoIP) phones, radio frequency identification (RFID) scanners, and other so-called edge devices. By some expert estimates, IP-enabled devices now outnumber servers by a 20:1 ratio.

[callout quote] Explosive growth in network devices, systems, and applications is translating into a never-ending, no-win battle for IT personnel attempting to manually configure and maintain network monitoring systems.

The core data center infrastructure is in flux, as well. Heterogeneity is on the rise as IT departments seek to support business objectives with best-of-breed hardware and software solutions. Virtualization is changing the way IT deploys and manages servers and storage by enabling systems to be added and removed on the fly. Off-the-shelf appliances are routinely plugged in or repurposed or removed. It’s not uncommon for hundreds of systems to be added or adapted in a single evening as IT strives to meet business demands with technology.

And the pace of application development has accelerated. Mashups and reusable components in service-oriented architectures (SOAs) have greatly simplified development and deployment. Microsoft .NET, Java technology, and the open source LAMP (Linux, Apache, MySQL, and PHP) stack have evolved into high productivity environments that have slashed the time required to build and deploy new applications.
The sheer volume, complexity, and dynamics of the modern network defy most IT departments’ ability to manually configure, maintain, and modify network monitoring systems. This isn’t just a headache for IT personnel and a drain on IT productivity. It’s increasingly a problem that can degrade business performance.

The Business Implications of Network Monitoring

A manual approach to network monitoring can have profoundly negative effects on the business that the system is intended to support. These consequences can include:

- Misinvestment of valuable IT resources. Network administrators burdened with the coal-shoveling chore of managing and monitoring legions of devices, systems, and applications are unable to pursue initiatives that would generate greater business value.
- Poor visibility into network health. The massive scope of manual labor required to monitor large device populations typically means that network monitoring implementation is chronically behind schedule; administrators have a dated and imprecise view of crucial performance characteristics.
- Subpar performance and greater risk of downtime. The time lag between manual configuration and real-time operations can mean that subpar performance for email servers, Web site infrastructure components, ERP applications, and other mission-critical systems goes undetected, introducing risks of deadly downtime that can cost some organizations millions of dollars an hour.
- Inability to make informed IT spending decisions. Without real-time insight into network performance, IT buyers can be left to a guessing game when it comes to deciding on further IT investments. Manual-oriented monitoring systems rarely deliver performance data of the scope and quality required to make informed, metrics-based IT investments.

The inherent limitations of manual network monitoring configuration and maintenance are on a collision course with the business imperative for the real-time data center.
Business increasingly demands real-time process execution to support strategic objectives of greater profitability and growth, and to interoperate seamlessly with growing numbers of customers, suppliers, and partners. The ability to achieve those goals depends heavily on real-time visibility into network systems performance.

The Need for Automation in Network Monitoring

In many organizations, subpar network monitoring is the result of over-reliance on systems management “mega-suites” that have bundled in more features and functionality than are practical in a single platform. The many bells and whistles in systems management software from the largest vendors typically require manual configuration (often with high-priced vendor consulting engagements), followed by ceaseless manual modifications as new devices, systems, and applications are incorporated.

In a sense, customers are collapsing under the weight of these behemoth suites. Shortcomings in automation capabilities in mega-suite offerings and an embarrassment of features often mean that customers buy a mile of functionality but end up using just an inch.

Unsurprisingly, dissatisfaction is high among users of HP, IBM, BMC, and CA products in the $10 billion management software and services market. For instance, IT operations management software providers received an average grade of slightly worse than “C” in a Gartner survey of 637 users of management software.[1] And when asked in which large systems management software vendor they had the most confidence, the majority of users selected “none of the above,” according to Gartner’s study.
In its report, Gartner said, “…it will be three to five years before [the Big Four] vendors engineer management software into integrated, consumable suites, and even longer for customers to deploy them.”

[callout quote] “It will be three to five years before [the Big Four] vendors engineer management software into integrated, consumable suites, and even longer for customers to deploy them.” Gartner Inc.

Why Network Management Is Seldom Fully Deployed

An alternative approach of custom-built network monitoring technology is similarly wanting. Growth in network systems has made it impractical for all but the smallest enterprises to build from scratch the infrastructure required to effectively monitor systems performance.

[graphic Network Computing Reader Poll chart]

A lack of automation capabilities is a key reason why so few network management platforms are ever fully deployed. When asked when they had finished deployment of a network management platform, an alarming 70 percent of 755 respondents to a Network Computing magazine reader survey chose the answer, “Finished? Ha! No one has ever finished a deployment.”[2]

[1] Gartner Inc., “‘Big Four’ Management Software Vendor Face Competitive Threats,” April 13, 2007.
[2] Network Computing, 2007 Reader Survey, November 9, 2006.

Typical manual configuration tasks include:

- Determine operating system and services
- Install proprietary heavy agents on individual nodes
- Establish polling frequency and alert thresholds
- Set up data collection and trap handling
- Devise reporting, graphing, and trending
- Compose alert routing and escalation rules
- Map network topology and root cause analysis

None of these tasks is particularly challenging for a skilled network administrator. But with a network composed of many thousands of hardware and software components, accomplishing all those chores would take a team of IT professionals many months. By then, of course, new devices will have been added, servers will have shifted, and new applications incorporated.
Manually configuring a network monitoring system is only the start. The system also requires ongoing patching, upgrades, analysis, and other maintenance activities that, given a crush of other IT priorities, will often be put on the back burner or never done at all.

How Automation Revolutionizes Network Monitoring

Economies of scale demand automation. From Gutenberg’s 15th-century invention of the printing press to robotic automobile assembly lines, automation has throughout history ushered in revolutionary changes that have transformed the world. In network monitoring, automation arguably offers the single greatest potential of any innovation to improve overall effectiveness and value.

As networks expand at exponential rates, automated network monitoring is not just a nice-to-have; it’s a prerequisite for large and mid-sized enterprises to ensure the health and performance of mission-critical systems. With an automated approach to network monitoring, organizations can realize:

- Rapid time to value: With rapid configuration capabilities, an automated system delivers dividends in days or weeks, not months or years.
- Platform for real-time datacenter operations: Automation eliminates the long lag times required to manually configure network monitoring and helps ensure that systems are monitored as soon as they are added or modified.
- Lower total cost of ownership: Large organizations can avoid hundreds of thousands of dollars in IT configuration and maintenance costs.
- Easy extensibility to IP-enabled edge devices: Automation makes it practical and painless to extend monitoring to such IP-enabled edge devices as POS terminals, environmental sensors, healthcare equipment, surveillance cameras, and others.
- Liberation of IT to pursue value-add initiatives: Network administrators are freed from menial chores and able to focus on IT initiatives that better support strategic business objectives.
- “Single pane of glass” visibility into network performance: An automated system helps ensure a single, consolidated, real-time view across the extended network.

Introducing the CITTIO WatchTower Automation Stack

CITTIO WatchTower is the first network and system monitoring software platform that automates the discovery, configuration, and monitoring of any networked device. From servers, network equipment, applications, and databases to network traffic and IP-enabled edge devices, IT organizations can monitor hundreds to tens of thousands of network nodes in a fraction of the time it would take using traditional or custom-built monitoring tools.

Powered by the innovative CITTIO Automation Stack, WatchTower is a direct response to the ongoing frustrations CITTIO’s founders experienced using traditional network monitoring products that require extensive manual effort to configure every time a device is added to the network or changed.

With CITTIO WatchTower, automation does not mean inflexibility. The software’s Web-based interface provides an easy way for administrators to overwrite or customize template-based automation rules to meet unique needs.

With full support for such industry standards as SNMP, IPFIX, IPMI, NetFlow, sFlow, and others, CITTIO WatchTower Automation Stack capabilities extend across the following key areas: asset discovery, fault monitoring, performance monitoring, service level management, performance data visualization, and root cause analysis and alarm suppression.

Asset Discovery

Asset discovery is the process of determining what nodes (servers or devices), operating systems, applications, and services exist on a network. A manual approach to asset discovery is usually time-consuming and difficult to sustain in the face of rapid network change.
WatchTower automates processes of:

- Node discovery (e.g., servers, routers, switches, storage)
- OS fingerprinting (operating system type and version)
- IP address parenting (determining the multiple IP addresses associated with a device)
- Data collection (e.g., CPU, memory, hard drive characteristics)
- Node services (e.g., HTTP, Oracle, Exchange)

Fault Monitoring

Fault monitoring is a fundamental, reactive means of determining the health and performance of network components. CITTIO WatchTower automatically executes perpetual and non-invasive pings and synthetic transactions that determine both system availability and quality of performance.

- Service polling and testing (e.g., SQL Server running)
- Node quality assessment
- SNMP trap collection and analysis
- Automated alerts to faults

Performance Monitoring

Performance monitoring provides a more proactive examination of network systems to alert administrators to trends and anomalies that could signal performance degradation or an impending failure. End-to-end performance monitoring automation capabilities in CITTIO WatchTower spare administrators from manually devising graphs and analyzing data with:

- Real-time performance graphing
- Service and network latency graphing
- Historic trending for analysis and capacity planning
- NetFlow protocol IP traffic data collection and analysis
- Thresholding and proactive performance notifications

Service Level Management

[screenshot]

Service level management provides a broader or business-oriented context to granular performance data. It enables administrators to view performance information by such service groups or categories as business unit, geographical location, or systems type (e.g., ERP, online storefront Web servers, data warehouse), rather than trying to assemble those views by hand. This higher level view is important in enabling organizations to meet both internal and external service level agreements (SLAs).
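To make the idea of group-level views concrete, the grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not WatchTower’s algorithm: the NodeSample record, the availability_by_group and sla_report functions, and the choice of “percentage of successful polls” as the availability measure are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSample:
    """One polling result for a node (hypothetical record shape)."""
    node: str
    group: str   # e.g. business unit, location, or system type
    up: bool     # True if the poll succeeded


def availability_by_group(samples):
    """Return {group: percentage of successful polls} over all nodes in the group."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for sample in samples:
        totals[sample.group] += 1
        if sample.up:
            successes[sample.group] += 1
    return {group: 100.0 * successes[group] / totals[group] for group in totals}


def sla_report(samples, goals):
    """Compare each group's measured availability with its goal (both in percent)."""
    actuals = availability_by_group(samples)
    report = {}
    for group, goal in goals.items():
        actual = actuals.get(group)  # None when the group has no samples yet
        report[group] = {
            "actual": actual,
            "goal": goal,
            "met": actual is not None and actual >= goal,
        }
    return report
```

Calling sla_report with polling samples and a dictionary of per-group goals returns, for each group, the measured availability, the goal, and whether the goal was met. A group with no samples is reported as not met rather than silently passing, which is a deliberate choice in this sketch.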
WatchTower ships with baseline service level management calculation algorithms that may be readily customized by administrators to meet unique needs and establish specific SLA objectives. It automatically aggregates performance data from disparate systems, devices, and applications to supply:

- Business process performance views across devices, servers, applications
- SLA calculations at service, node, group, and category levels
- Analytics on SLA actuals versus goals
- Service level calculations on mean time to repair, to acknowledge, and between failures

Performance Data Visualization

[screenshot]

With many monitoring solutions, building tables, graphs, port maps, and network topology diagrams is a time-consuming manual task. Too often this grunt work is never performed, and consequently organizations are unable to realize a single pane of glass view of network performance.

CITTIO WatchTower relieves administrators of this burden with dynamically generated and visually rich representations of network infrastructure and performance, including automatic updates when nodes are added to or removed from service. These multidimensional graphic representations may be analyzed from both historic and current perspectives to enable administrators to monitor service level trends, manage change and capacity planning, and better control network operations with:

- Executive dashboards with drill-through analytics
- Network topology diagrams
- Physical facility and port maps
- Performance graphs and tables

Root Cause Analysis and Alarm Suppression

Building on performance data visualization, CITTIO WatchTower features a robust root cause analysis engine to enable IT personnel to zero in on what can often be elusive and insidious failure points or weaknesses in the network infrastructure. The CITTIO solution features SNMP trap de-duplication and host-based root cause analysis to prevent “alarm floods” from multiple devices when in fact only one device is at fault.
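As a rough illustration of trap de-duplication in general, the sketch below forwards the first trap of a given type from a given device and suppresses repeats that arrive within a time window. It is not CITTIO’s implementation: the Trap record, the TrapDeduplicator class, the (source, oid) key, and the 300-second default window are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trap:
    """A received trap, reduced to the fields used here (hypothetical shape)."""
    source: str        # sending device
    oid: str           # identifier of the trap type
    timestamp: float   # arrival time in seconds


class TrapDeduplicator:
    """Suppress repeats of the same (source, oid) pair inside a time window."""

    def __init__(self, window_seconds=300.0):
        self.window_seconds = window_seconds
        # (source, oid) -> timestamp of the last trap that was forwarded
        self._last_forwarded = {}

    def should_forward(self, trap):
        """Return True if the trap should raise an alarm, False if it is a repeat."""
        key = (trap.source, trap.oid)
        last = self._last_forwarded.get(key)
        if last is not None and trap.timestamp - last < self.window_seconds:
            return False  # duplicate within the window: suppress
        self._last_forwarded[key] = trap.timestamp
        return True
```

The window is measured from the last forwarded trap rather than the last received one, so a device that repeats the same trap continuously still produces one alarm per window instead of being silenced indefinitely.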
Meanwhile, network-based root cause analysis enables event correlation and suppression of downstream outages if a switch or a switch port fails or is brought down. WatchTower supplies:

- Suppression to prevent floods of redundant alarms
- Host-based root cause analysis
- Network-based root cause analysis
- SNMP trap de-duplication

[callout box] Customer Success: Monitoring Edge Devices

[graphic PacSun logo]

Pacific Sunwear, a California-based retailer of surfing-oriented merchandise, is typical of the growing number of organizations with pressing business demands to extend network monitoring to IP-enabled edge devices.

Manual configuration was impractical. IT personnel costs would have been staggering to implement and maintain a network monitoring solution across edge devices in PacSun’s 1,200 stores.

CITTIO WatchTower proved an ideal solution. In just 90 days, a full system was deployed to monitor more than 6,100 POS terminals, workstations, surveillance cameras, and other devices. The rapid, high-value deployment leveraged PacSun’s previous success with WatchTower for datacenter monitoring and gave the company new visibility and control over in-store systems essential to revenue and bottom-line growth.

“Our aggressive growth has brought us to the point where we had too much at stake to rely on piecemeal approaches to network monitoring,” said Ron Ehlers, PacSun’s vice president of information services. “CITTIO’s core focus on ease of use and high-end functionality were key elements in our decision to implement WatchTower.”

The CITTIO WatchTower Monitoring Capability Gauge

CITTIO WatchTower supplies a high-level view of network monitoring maturity in the form of a speedometer-like gauge that reflects an organization’s position in seven stages of network monitoring maturity. The technology draws on aggregated performance data to deliver a maturity rating that may be readily understood by non-technical individuals.
- Chaos: No tools, policies, or documentation
- Discovery: Siloed and reactive firefighting with basic fault and availability monitoring
- Performance-Centric: Qualitative performance data with alerting and thresholding
- Partial Visibility: Silos of visibility with <85% of nodes monitored
- Complete Visibility: “Single pane of glass” with 85%+ of nodes monitored
- Service-Centric: Correlation between performance and service delivery
- Business-Centric: Network and business alignment with compliance support
- Autonomic Computing: The intelligent, self-aware network

Three Key Myths of Network Monitoring and Automation

Many years of industry experience with network monitoring systems that must be manually configured and maintained have given rise to certain myths and misconceptions.

Myth #1. Network Monitoring Can Never Be Fully Deployed

The statement would be true if enterprises relied on manual configuration and maintenance, which simply cannot keep pace with rapid changes as data centers evolve and legions of IP-enabled devices are incorporated under the network umbrella.

The Reality: Automation technology eliminates the need to invest hundreds or thousands of hours in building and maintaining network monitoring systems. CITTIO’s more than 200 customers

Myth #2. A Feature-Laden Mega-Suite Must Be the Best

Network monitoring mega-suite vendors are notorious for drawing customers in with large feature sets. Yet large feature sets don’t necessarily translate into value, especially when these systems require high degrees of manual labor.

The Reality: Mega-suite network management offerings fall short in capabilities for monitoring automation; automation capabilities may consist of nothing more than basic node discovery. Many continue to require the installation of heavy proprietary monitoring agents, which becomes impractical as node populations grow to tens of thousands.

Myth #3. Network Monitoring Is a Sunk Cost Without Quantifiable ROI

The high price of traditional monitoring solutions and techniques has led some to believe that network monitoring is an unavoidable cost of doing business. The complexity of these systems and inordinate amounts of manual effort make it difficult if not impossible to quantify ROI or determine overall TCO.

The Reality: A lean, focused network monitoring solution with robust automation capabilities lends itself to ROI and TCO calculations with a single pane of glass view that clearly reflects network performance status, streamlines troubleshooting, and aids in capacity planning.

[callout box] CITTIO Customers

Across a range of industries, CITTIO’s 200+ customers are realizing ROI from automation capabilities in the WatchTower network monitoring platform.

- America Online
- Booz Allen Hamilton
- Boston Celtics
- Blue Cross and Blue Shield of Hawaii
- BP
- British Aerospace
- Capitol Advantage
- Chiron
- Circuit City
- Federal Bureau of Investigation
- First Republic Bank
- Itek Systems
- Mervyns
- National Parks Conservation Association
- NetGear
- New York State Emergency Management Office
- Pacific Sunwear
- Pizza Hut
- T. Rowe Price

Conclusion

In the face of rapid network change, the approach to network monitoring must change as well. Automation is becoming a necessity for organizations that want to decisively and efficiently transition towards the real-time data center and better control fast-growing quantities of servers, devices, and applications.

CITTIO believes in the power of automation to revolutionize network monitoring. The CITTIO Automation Stack is one of the ways that CITTIO WatchTower provides rapid time to value, stays relevant over time, and offers a low TCO. With automation, CITTIO WatchTower delivers “single pane of glass” visibility into IT infrastructure performance and availability so IT personnel can be proactive and focus on the business, not the Sisyphean chores of manual configuration and maintenance.