Effective Data Collection for Enterprise Monitoring
It’s not about “agent v. agentless” anymore

Executive Summary
At the heart of every approach to enterprise management is the need to monitor the IT environment for what is working and what isn’t. Today’s sophisticated monitoring approaches provide data that helps IT operations determine whether a component-level problem is affecting key business services and applications. The most effective products collect performance data efficiently, with minimal setup and overhead, and provide a clear picture of the impact a problem is having (or could have) on a network or on the entire enterprise. The data collection mechanisms that enterprise management products use for monitoring are categorized as either agent-based or agentless. This white paper looks at the pros and cons of the agent and agentless approaches, and describes Indicative’s unique approach, which combines the best of both techniques while limiting the liabilities associated with each.

Management Tools and the Rise of Agents
Agent-based data collection emerged with the decentralized computing model of the 1980s, when networks of servers and powerful desktop computers came into mainstream business use. Compared with today’s enterprise IT environments, those networks were small and fairly simple to manage, particularly when agents were deployed to collect and deliver data about the operational health of each device. An agent is simply a small piece of code written to reside on a given server, network node, desktop or other device.

Such network and system management (NSM) agents are often proprietary to a given NSM vendor’s tools; for example, Hewlett-Packard’s agents only work with HP management tools, IBM’s with IBM tools, and so on. More recently, standards-based agents using the Simple Network Management Protocol (SNMP) have gained wide acceptance for “open systems” (e.g., Unix and Linux servers, network devices, and environmental and security devices), and vendors often include them on their devices to enable monitoring.

Agents have evolved to the point where they are very good at “seeing” and collecting data about the health of the devices on which they reside. After all, they were designed to collect the device-specific information that IT administrators need to track performance and availability and to be alerted when the inevitable problems arise. The downside of today’s agents is their significant cost. The average purchase price of specialized agents ranges from $700 to $1,500 per monitored system or device, and administering agent-based monitoring tools carries a real cost of its own: agents must be installed, configured and maintained on every monitored device (typically hundreds to thousands of them in an enterprise IT environment).
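To put the quoted price range in perspective, here is a minimal back-of-the-envelope sketch. Only the $700 to $1,500 per-device range comes from the text; the function name `agent_license_cost`, the `agents_per_device` parameter and the device counts are hypothetical, and the estimate covers purchase price only, not the administration cost described above.

```python
from typing import Tuple

PRICE_LOW_USD = 700      # low end of the per-device price quoted in the text
PRICE_HIGH_USD = 1_500   # high end of the per-device price quoted in the text


def agent_license_cost(devices: int, agents_per_device: int = 1) -> Tuple[int, int]:
    """Return the (low, high) agent purchase cost in USD.

    `agents_per_device` is a hypothetical knob for servers that carry one
    agent per monitoring tool; it assumes each extra agent is priced in the
    same range. Installation, configuration and upkeep labour are excluded.
    """
    if devices < 0 or agents_per_device < 1:
        raise ValueError("devices must be >= 0 and agents_per_device >= 1")
    units = devices * agents_per_device
    return units * PRICE_LOW_USD, units * PRICE_HIGH_USD


if __name__ == "__main__":
    for n in (100, 1_000):
        low, high = agent_license_cost(n)
        print(f"{n:,} devices: ${low:,} to ${high:,}")
```

For 100 devices this gives $70,000 to $150,000, and for 1,000 devices $700,000 to $1,500,000, before any installation or upkeep labour is counted.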

Standard (SNMP) agents also have limitations. They collect defined, generic sets of performance, usage and availability metrics and expose them to external monitors through a standard management information base (MIB). The standard MIBs typically do not include the metrics needed to analyze the performance and usage of today’s sophisticated application and database servers (e.g., Java platforms such as BEA WebLogic™ and IBM WebSphere, and vendor-proprietary systems such as Oracle Enterprise™ and Citrix). Vendors may add metrics to the standard MIB on their systems, but they typically don’t publish the details of these custom MIBs for external monitoring.

Agents become even more problematic as networks grow (e.g., become linked via the Internet into broader, interdependent enterprise environments), because different tools from different vendors each require their own agent on a device before the tools can be even somewhat effective. A single server, for example, could carry several agents to keep it linked with every monitoring tool the IT department uses. Worse, whenever a management tool is updated, the hundreds or even thousands of devices it monitors often need to be updated as well. This is a major and costly effort, and devices whose agents are not kept up to date can cause even bigger problems.

Because event-based tools depend largely upon agents, they can’t tap into the wealth of valuable built-in self-monitoring data now common in today’s systems and applications. That self-monitoring data is extremely valuable for troubleshooting, capacity planning and performance monitoring, but it is often hidden in system logs, databases and other files that proprietary agents can’t reach or even see. Take Microsoft’s Performance Monitor (Perfmon) utility, for example.
It logs hundreds of Windows status and performance metrics that can be used to track nearly every aspect of Windows system health, yet most conventional (event-based) tools that rely upon agents can’t take advantage of the device health data available via Perfmon.

Another major issue inherent in agent-based monitoring tools is that they cannot translate raw management data from simplistic alerts and logs into valuable, actionable information for IT administrators and operations personnel. The ever-growing volume of raw, unprioritized data, epitomized by event storms, often delays the response to problems, causing confusion and overloading operators who must chase a high volume of trouble tickets. Raw event data originating from device-level agents is not correlated (associated) with its possible impact on key business applications and services, which makes these alerts even more difficult to manage.

In a nutshell, the pros of agent-based monitoring include:
• agents provide a widely used and well-understood method for collecting server and device health data;
• agents are the only source of management data for some legacy systems, devices and applications that are not instrumented for self-monitoring and logging of performance information;
• agents may be required on proprietary systems for in-depth collection of custom troubleshooting metrics, or where security policies limit open access to management data.

The cons include:
• in large environments, installing and maintaining agents is inordinately time-consuming and costly;
• agents impose an unacceptable “footprint” on monitored systems and devices, consuming system resources at a level that affects system performance;
• some devices that do need monitoring can get overlooked;
• agents may be locked into use with specific tools from specific vendors, and are only as useful as the tool they work with;
• agents can consume or tie up local resources, such as host server CPU or device memory;
• agents can deliver too much data, creating event storms; the tools that use them can’t adequately sift and prioritize the meaningful events by their actual or potential impact on the enterprise, network, services, etc.
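The event-storm and missing-correlation problems described above can be illustrated with a minimal sketch: collapse duplicate raw events, attach the business services each device supports, and rank the result by impact. The `Event` type, the `DEVICE_TO_SERVICES` dependency map and the device and service names are all hypothetical, and real correlation engines are far richer than this.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Event:
    device: str
    kind: str  # e.g. "disk_full", "cpu_high"


# Hypothetical dependency map: device -> business services that rely on it.
DEVICE_TO_SERVICES: Dict[str, Set[str]] = {
    "db01": {"order-entry", "billing"},
    "web01": {"order-entry"},
    "printer07": set(),
}


def summarize(events: List[Event]) -> List[Tuple[str, Set[str], int]]:
    """Collapse duplicate events and rank them by business impact.

    Returns one (device:kind, impacted services, raw count) row per distinct
    event, ordered by number of services affected, then by raw count.
    """
    counts = Counter((e.device, e.kind) for e in events)
    rows = [
        (f"{device}:{kind}", DEVICE_TO_SERVICES.get(device, set()), n)
        for (device, kind), n in counts.items()
    ]
    rows.sort(key=lambda row: (len(row[1]), row[2]), reverse=True)
    return rows
```

Fed 56 raw events (50 identical alerts from one printer, 5 from a Web server and a single database fault), the summary shrinks to three rows and puts the database fault, which touches two services, ahead of the noisy but harmless printer.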

Agentless Monitoring
System hardware and software OEMs (most notably Microsoft and Intel) now build generic self-monitoring capabilities into the products they ship, largely because of the cost and complexity of installing and maintaining device-centric agents. As a result, a new breed of “super” or “proxy” data collection mechanisms has been appearing on the market since the late 1990s. These “agentless” products are designed to take advantage of the self-monitoring capabilities now shipped with devices, such as Microsoft’s Performance Monitor (Perfmon) and Windows Management Instrumentation (WMI), which enable remote monitoring over network connections using “agentless” data collectors.

Some enterprise applications have similar capabilities. SAP, for example, provides its own self-monitoring feature set: the SAP Computing Center Management System (CCMS) provides an overwhelming amount of performance data through built-in monitoring and trace tools, and integrates information from the entire SAP environment, including the SAP system, database, servers and the SAP network itself. Likewise, widely used Java application platforms such as BEA WebLogic™ provide open interfaces via Java Management Extensions (JMX) for remote monitoring and management.

The more effective agentless approaches gain access to rich management data by taking full advantage of these built-in monitoring feature sets. Such agentless management products are not really “agentless”; they use a much smaller number of agent-like data collection elements (DCEs) that draw on data from Perfmon and other built-in monitoring standards such as the Common Information Model (CIM) and its implementations, such as Microsoft’s WMI. The advent of IP-based discovery widened the use of agentless monitoring as an option that can be mixed and matched with agent-based approaches to management.
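The agentless architecture described above, one central collector reaching many hosts through whatever interface each exposes, can be sketched as follows. Nothing is installed on the monitored hosts; the collector calls a protocol-specific fetch function per target. The functions `fake_snmp`, `fake_wmi` and `fake_jmx` are hypothetical stubs marking where real SNMP, WMI or JMX queries would go, and the host names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

# A fetcher takes a host name and returns metric name -> value.
Fetcher = Callable[[str], Dict[str, float]]


def fake_snmp(host: str) -> Dict[str, float]:
    """Hypothetical stub standing in for an SNMP GET against `host`."""
    return {"if_in_octets": 1.0e6}


def fake_wmi(host: str) -> Dict[str, float]:
    """Hypothetical stub standing in for a remote WMI/Perfmon query."""
    return {"cpu_percent": 35.0}


def fake_jmx(host: str) -> Dict[str, float]:
    """Hypothetical stub standing in for a JMX MBean attribute read."""
    return {"heap_used_mb": 512.0}


def collect(
    targets: Dict[str, Fetcher], workers: int = 8
) -> Tuple[Dict[str, Dict[str, float]], Dict[str, str]]:
    """Poll every target concurrently from one central collector.

    Returns (results, errors): a host whose fetch raises is recorded in
    `errors` instead of aborting the whole polling sweep.
    """
    results: Dict[str, Dict[str, float]] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {host: pool.submit(fetch, host) for host, fetch in targets.items()}
        for host, future in futures.items():
            try:
                results[host] = future.result()
            except Exception as exc:  # e.g. timeout, auth failure, host down
                errors[host] = repr(exc)
    return results, errors


if __name__ == "__main__":
    targets = {"switch03": fake_snmp, "win-file01": fake_wmi, "wls-app02": fake_jmx}
    metrics, failures = collect(targets)
    print(metrics, failures)
```

Keeping the per-protocol logic behind a single fetcher signature is what lets one collector cover switches, Windows servers and Java platforms; adding a new interface means adding a fetcher, not deploying code to every device.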
While agentless products lift the burden of complexity imposed by agent-based tools, they sacrifice the immediacy of on-device monitoring: retrieving information remotely introduces a delay that can leave gaps in management visibility. Agentless approaches also depend on available network bandwidth, so degraded network conditions can interrupt system and device monitoring.

To summarize, the positives of “agentless” monitoring include:
• data collectors on only a few servers can efficiently monitor an entire enterprise that would otherwise require hundreds of on-device agents, which makes deployment and maintenance easier and reduces overhead;
• multiple options for remotely collecting management data, including on-board agent (SNMP) queries, log file and database queries, and standard application programming interfaces (APIs);
• the ability to leverage increasingly sophisticated built-in monitoring and management interfaces such as Perfmon, WMI and JMX;
• the option to gather only the device health intelligence that is relevant, and to weed out the event storms that agents are prone to creating.

The cons of “agentless” products are as follows:
• they can’t view everything, everywhere (they are limited to open interfaces and accessible log and database files);
• true real-time monitoring isn’t possible; short sampling intervals for management data minimize gaps but can still miss instantaneous fault alerts;
• central data collection elements become a potential single point of failure for the management system, a risk that can be minimized by having the management system monitor itself.
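The trade-off between sampling interval and missed faults can be made concrete with a small simulation. It assumes a collector that polls at t = 0, T, 2T, ... and models each fault as a half-open window [start, end) in seconds; the function names and the example timings are illustrative, not taken from any product.

```python
import math
from typing import List, Optional, Tuple


def first_sample_in(start: float, end: float, period: float) -> Optional[float]:
    """Return the first poll time k*period (k = 0, 1, 2, ...) falling inside the
    fault window [start, end), or None if every poll misses the fault."""
    k = max(0, math.ceil(start / period))
    t = k * period
    return t if t < end else None


def detection_report(
    faults: List[Tuple[float, float]], period: float
) -> List[Tuple[float, float, Optional[float], Optional[float]]]:
    """For each fault window, report (start, end, time seen, detection delay)."""
    report = []
    for start, end in faults:
        seen = first_sample_in(start, end, period)
        delay = None if seen is None else seen - start
        report.append((start, end, seen, delay))
    return report
```

With a 60-second interval, a 58-second fault from t = 61 to t = 119 is never sampled, whereas a 10-second interval catches the same fault at t = 70. More generally, any fault lasting at least one full interval is always seen, a shorter one may be missed, and the detection delay is always less than one interval.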

A New Approach: Autonomous Data Collection
To avoid the issues associated with agent-based monitoring, Indicative’s engineers developed an “agentless” approach in which a single data collection element (DCE) supports both active response testing and passive collection of device health information. It combines the monitoring capabilities of many agents into fewer, more powerful and, perhaps most importantly, autonomous data collection mechanisms that reside on strategically located servers. Through aggregation, performance baselines and intelligent alert thresholds, Indicative translates immense volumes of raw management data into useful, actionable information.

DCEs can be deployed on the local LAN to centrally monitor data center and client IT infrastructure. When DCEs are deployed remotely to regional data centers or business campus LANs, system and network performance data can be collected, aggregated and efficiently uploaded to the management server. The same remote DCE can execute active transaction tests to monitor user experience (application, server and network response) with centrally hosted business applications. For example, a single DCE can monitor a load-balanced set of Web servers for both utilization efficiency and response time. The Indicative solution delivers real-time Web service performance summaries as well as system health history and growth planning statistics.

Performance baselines (normal response times for a given day and time) are automatically calculated, stored and displayed in configurable graphs. Baselines provide the means for setting accurate alerting thresholds as a percentage deviation from “normal” performance rather than as arbitrary static (fixed-point) thresholds. This compensates for the normal performance variations that occur during the business day as service load varies. Baseline trends for individual servers or groups of servers are made available for highly accurate capacity planning. Operations personnel can set intelligent thresholds at different levels for different days and times of day, tailoring alerting limits to usage patterns and staff operating procedures. This eliminates the “event storms” and high trouble-ticket load typical of many agent-based enterprise monitoring systems.

The most effective and efficient approach to monitoring heterogeneous and secure environments is to employ a combination of “agentless” and agent-based techniques. Indicative’s approach uses predominantly agentless monitoring techniques except where security or policy constraints dictate agent-based server monitoring. Active, transaction-based tests literally exercise the technology, emulating actual users at specified intervals, while passive data collectors simply pull diagnostic data (e.g., CPU utilization, disk space) from the network and its components. Open monitoring protocols and interfaces such as Perfmon, SNMP and remote shell are used to enable data collection in virtually any network and system environment. Unlike other solutions, Indicative agents can perform any and all tests, as well as multi-faceted auto-discovery of the devices and applications to be monitored. For organizations that already have agents from other management vendors (e.g., IBM, BMC, HP), Indicative can leverage that data and seamlessly integrate it into its overall monitoring scheme.

To accommodate typical enterprise network security perimeters, firewalls and intrusion detection services, Indicative includes a number of secure communication options. These allow agentless and agent-based solutions to be configured within and across enterprise LAN and WAN environments in compliance with corporate security policies. Indicative’s DCE mechanism allows monitoring of any combination of network services and devices.
Such combinations can include Windows, UNIX and Linux servers; Java application platforms such as BEA WebLogic™ and IBM WebSphere™; Citrix client/server access systems; and distributed database environments such as Oracle. Agentless monitoring of these environments, plus a host of other network and data center elements, is supported from the same management console.
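The baseline-relative thresholds described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Indicative’s documented algorithm: samples are tagged with a (weekday, hour) slot, each slot’s baseline is simply the mean of its history, and an alert fires when a new value exceeds that baseline by more than a percentage, with optional per-slot overrides standing in for thresholds set “at different levels for different days and times.”

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

Slot = Tuple[int, int]  # (weekday 0=Mon..6=Sun, hour 0..23)


def build_baselines(history: Iterable[Tuple[Slot, float]]) -> Dict[Slot, float]:
    """Baseline for each (weekday, hour) slot = mean of its past samples."""
    by_slot: Dict[Slot, List[float]] = defaultdict(list)
    for slot, value in history:
        by_slot[slot].append(value)
    return {slot: mean(values) for slot, values in by_slot.items()}


def is_alert(
    baselines: Dict[Slot, float],
    slot: Slot,
    value: float,
    default_pct: float = 50.0,
    overrides: Optional[Dict[Slot, float]] = None,
) -> bool:
    """True if `value` exceeds the slot's baseline by more than the allowed
    percentage. `overrides` holds per-slot percentages, e.g. a tighter limit
    during business hours."""
    base = baselines.get(slot)
    if base is None:
        return False  # no history for this slot yet, so nothing to compare with
    pct = (overrides or {}).get(slot, default_pct)
    return value > base * (1 + pct / 100)
```

Assuming response times in milliseconds, a Monday 09:00 baseline of 220 ms and a Monday 02:00 baseline of 50 ms, a 260 ms response at 09:00 stays quiet under the 50% default while a 120 ms response at 02:00 raises an alert. A single fixed 250 ms threshold would do exactly the opposite, which is the false-alarm and missed-problem pattern that baselines are meant to avoid.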

The argument over whether agent or agentless monitoring is the better approach is no longer relevant: neither approach can adequately monitor an enterprise on its own. There isn’t really any such thing as truly “agentless” monitoring anyway, but the simplicity afforded by fewer, server-based data collection elements, such as those used by Indicative, eliminates many of the cost, complexity and maintenance issues associated with predominantly agent-based solutions. Indicative’s autonomous data collection approach embraces the best of agentless and agent-based monitoring, providing a cost-effective, easy-to-implement and easy-to-maintain way to gain the visibility needed to detect the problems that matter before they affect applications, services or the business.

For more information on Indicative’s approach to end-to-end service delivery optimization, contact us at or 970.530.0790
