Monitoring The Unknowable:
7 Rules for New IT Architectures

Why modern applications fail differently and how Big Data changes everything

FOR MORE INFORMATION
Tel. US +1-415-500-2180
service@boundary.com

April, 2012

www.boundary.com

TABLE OF CONTENTS
INTRODUCTION
DISTRIBUTED SYSTEMS ARE NETWORKED SYSTEMS
WHAT’S DIFFERENT ABOUT NEW IT ARCHITECTURES
FAILURE HAPPENS
OBSERVE, ORIENT, DECIDE AND ACT
SEVEN RULES FOR NEW IT ARCHITECTURES
CONCLUSION




  Copyright © 2012. Boundary, Inc. All rights reserved.	

INTRODUCTION
Distributed systems don’t fail like they used to—they fail in a distributed manner. Gone are the days
when your applications experienced a black-out—when your systems were either up or down.
Monitoring solutions were pretty good at telling you which component caused the outage, so restoring
service was a fairly straightforward process: find the failed part and fix it.

Now, applications are more likely to falter than quickly fail—like a brown-out. New applications
designed to solve problems like big data analytics are more dynamic, more distributed and far less
likely to experience problems as up or down events.


More often, problems develop as a series of small events that build over time in unpredictable, and often invisible, ways. Developers and operations teams need to see the impact of changes in a timely, meaningful way that traditional monitoring methods cannot provide. Because the network plays a critical role as the fabric that connects all of the components and tiers of an application, it is the first and best place to look to understand application behavior.


DISTRIBUTED SYSTEMS ARE NETWORKED SYSTEMS
What is a distributed system, and how are they different today than a few years ago? Google Code
University defines a distributed system this way:

A distributed system is an application that executes a collection of protocols to coordinate the actions
of multiple processes on a network, such that all components cooperate together to perform a single
or small set of related tasks.

By this definition, almost any application of significance today is a distributed system. Whether an application is running in a datacenter, your own or Amazon’s, chances are it is a distributed system. To further illustrate, consider modern web application architecture. Most websites operating at scale are an amalgamation of networked components. You’re likely to see ten or more networked services powering anything of significance, written in a variety of different languages and runtime platforms. Some of them may even be black box services, hosted on AWS or another public cloud provider.


WHAT’S DIFFERENT ABOUT NEW IT ARCHITECTURES
Today’s distributed systems possess characteristics that require a modern interpretation of Google’s
definition. While the definition still holds true, the core elements are radically amplified and
accelerated to new dimensions by new technologies. The result is distributed applications that are
highly complex and dynamic, supporting new business requirements and customers with high
expectations for quality of service. Here are the game-changers for monitoring modern distributed
systems:





       §   Applications are highly distributed (networked), consisting of many interdependent services crossing multiple tiers. Application tiers and components can run on different cloud servers and virtual machines, and be redistributed in an instant.
       §   Infrastructures are increasingly stateless and dynamic, and must operate at greater scale. Risk and uncertainty are much higher in these environments, making it increasingly difficult to predict future behaviors or see developing problems.
       §   Applications are delivered iteratively and deployed more frequently. Agile methodologies support continuous delivery, whereby code changes can be pushed into production fifty times a day or more.
       §   New technologies aim to create business value by analyzing huge volumes of data in real time across distributed components and services. This new class of “big data applications” places new burdens on developers and operations teams to gather, analyze, and share data faster, and in greater volume, than ever before.

These characteristics define an IT environment that is at best partially understood, and then only a
moment at a time. Any attempt to get the big picture by sampling data from every component—
applications, hardware, and network—is like trying to assemble a puzzle blindfolded while someone
continuously shuffles the pieces: you’re in the dark, never knowing if or when the puzzle is complete.


FAILURE HAPPENS
Designers and developers of distributed systems design and build applications with the expectation of
failure. When he was a senior research scientist at Sun Microsystems, Ken Arnold, a pioneer in
distributed systems said “When you design distributed systems, you have to say, ‘Failure happens all
the time." A decade ago Arnold understood the risk that’s inherent in complex distributed systems
that rely on successful communication across disparate components.

Modern distributed systems, like those we describe here, require the same design philosophy today. While the cost of hardware has gone down and reliability has increased, the complexity of applications has risen significantly, thereby increasing the opportunities for failure. It is the collective of networked components (hardware, software, and network) that we expect to deliver services that satisfy business requirements. Development and operations are not about any single part, category, location, or point in time; they are about managing environments that are increasingly unknowable and ephemeral. Where, then, can you find “truth” in an environment that’s like a house of mirrors without static boundaries or borders?


OBSERVE, ORIENT, DECIDE AND ACT
Organizations have been monitoring computer systems as long as there have been computers. And
while the approaches to monitoring have evolved incrementally with technology advances, monitoring
as we know it today must make a quantum leap forward in order for companies to achieve success with modern application architectures. Whether your definition of modern falls to technology
(Hadoop, Erlang, NoSQL…), public/private cloud, development methodology (agile, DevOps), or big
data—you must rethink your approach to monitoring.

United States Air Force Colonel John Boyd formulated the OODA loop. OODA stands for observe, orient, decide, and act, on the premise that the faster a team can understand what’s happening, orient itself to the situation, decide how to respond, and act, the greater its readiness and speed of response. Boyd’s insight suggests that teams iterating through the loop faster gain a competitive advantage over opponents. I’d suggest that any well-designed monitoring tool can help automate the OODA loop for operations teams. Here are the essential components of monitoring infrastructure that enable fast-paced teams.


SEVEN RULES FOR NEW IT ARCHITECTURES
1. Deep integration

Most open source monitoring tools tackle only one aspect or a subset of the OODA loop. For instance: Graphite and Cacti provide trending (orientation), Nagios provides alerting (decision and action), and StatsD and collectd gather metrics (observation). But integrating these projects is a daunting task that often takes the form of Perl scripts and PHP dashboards. While each of these tools is helpful, it paints only part of the picture. An ideal tool would integrate all four steps of the OODA loop into one harmonious system. Where necessary, one would also expect API endpoints to allow for custom behavior and the flexibility to further automate a team’s actions.

2. Contextual alerting and pattern recognition

Most monitoring tools require the user to predefine all of the conditions on which to alert. For
instance, one would set static thresholds that say, “Notify me when disk usage goes above 90
percent,” or, “Notify me when CPU usage goes above 75 percent.” However, static thresholds are a
poor substitute for pattern recognition, the basis of cognitive decision-making. Setting static thresholds for applications whose load varies throughout the day, week, or month is painful. At any given point, monitoring infrastructure should be able to reflect upon its current state, past state, and forecasts, and ask, “Are current trends sufficiently deviant to warrant action?” If so, it should immediately notify the team with context. What if ops teams could look at a graph and say to the system, “Alert us when something looks (or doesn’t look) like this”?
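The contrast can be sketched in a few lines of Python. The class and parameter names below are hypothetical, and a simple three-standard-deviations rule stands in for whatever pattern recognition a real system would use:

```python
from collections import deque
from statistics import mean, stdev

def static_alert(value, threshold=90.0):
    """The classic approach: fire whenever the value crosses a fixed line."""
    return value > threshold

class BaselineAlert:
    """Flag values that deviate sharply from recent history, so a metric
    with a daily rhythm is not judged against one fixed number."""

    def __init__(self, window=60, sigmas=3.0):
        self.history = deque(maxlen=window)  # rolling window of recent values
        self.sigmas = sigmas

    def check(self, value):
        """Return True if `value` deviates far from the rolling baseline."""
        alert = False
        if len(self.history) >= 2:
            mu, sd = mean(self.history), stdev(self.history)
            alert = sd > 0 and abs(value - mu) > self.sigmas * sd
        self.history.append(value)
        return alert
```

A value of 90 would always trip the static threshold at night and never at peak; the baseline version instead asks whether 90 is unusual given what the metric has been doing lately.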

3. Timeliness

The term “real time” has been watered down, but it carries a specific meaning. Real-time computing
concepts in monitoring systems relate to an intrinsic property of events: they happen on a timeline.
Monitoring systems must be real time, because the timeliness of the data impacts its correctness and
utility. All aspects of a monitoring system must respond immediately to events. The OODA loop is only effective when it is faster than the environment or opponent it is running against. If you’re
operating on assumptions that are a minute old, it’s hard to say much of anything about what’s
happening now.

4. High resolution

The resolution of monitoring systems is critical. With most options offering updates once every one to
five minutes, low-resolution monitoring obscures a world of patterns that are invisible until you’ve
zoomed in. The difference between a one-second graph updated in real time and a one-minute graph
updated every five minutes is the difference between a fluid HD film and a paper flip-book.
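A quick arithmetic sketch shows how averaging hides the pattern; the traffic numbers are invented for illustration:

```python
def downsample(samples, bucket):
    """Average consecutive bucket-sized groups of per-second samples."""
    return [sum(samples[i:i + bucket]) / bucket
            for i in range(0, len(samples) - bucket + 1, bucket)]

# Sixty seconds of steady traffic, except a 10x spike lasting three seconds.
per_second = [100.0] * 60
per_second[30:33] = [1000.0, 1000.0, 1000.0]

# At one-second resolution the spike is obvious; averaged into a single
# one-minute point, the peak of 1000 collapses to 145.
per_minute = downsample(per_second, 60)
```

A three-second burst that saturates a service simply does not exist in the one-minute view.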

5. Dynamic configuration

The fluidity of modern architectures demands monitoring infrastructure that can keep up with the
changes that ops teams require. The rise of virtualized infrastructure combined with dynamic
configuration management systems means that there may be a great deal of host churn. This churn
challenges the concepts of host identity that traditional monitoring tools have built in as fundamental
abstractions.
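One way around hostname-centric identity is to key metrics on a stable role tag rather than on the host itself. This is a hedged sketch; the `MetricStore` class and the role names are invented for illustration:

```python
from collections import defaultdict

class MetricStore:
    """Aggregate metrics by a stable role tag rather than by hostname,
    so the data survives when autoscaling replaces the hosts underneath."""

    def __init__(self):
        self.by_role = defaultdict(list)

    def record(self, host, role, value):
        # The hostname is kept only as an annotation; the role is the key.
        self.by_role[role].append((host, value))

    def role_total(self, role):
        """Sum a metric across every host that has served this role."""
        return sum(v for _, v in self.by_role[role])
```

When a "web" host is terminated and replaced, its successor reports under the same role, and dashboards keyed on the role never notice the churn.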

6. Simple

The complexity of modern distributed systems demands an approach to monitoring that’s simple. The
bigger and more complex the system, the greater the need for monitoring tools that can be
operational in minutes and scaled easily. Performance data should be presented at a high level that provides a holistic view of infrastructure tiers and logical groupings of components. Developers and operations teams should be able to make decisions quickly from dashboards and graphs that speed problem resolution by making it easy to navigate between high-resolution detail and big-picture views.


7. Compatible

New monitoring tools should be compatible with other IT monitoring and application development
tools. Whether through the ability to provide data directly to other tools and applications via API calls, or by “monitoring the monitors,” they should work together, forging a “DevOps toolchain” with a full range of capabilities.
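As a sketch of what API-level compatibility looks like, a monitoring tool might expose its measurements as plain JSON that any other tool in the chain can consume. The endpoint path and field names below are hypothetical, not any vendor’s real schema:

```python
import json

def to_payload(metrics, source="app-01"):
    """Serialize a dict of metrics for a hypothetical
    POST /v1/measurements endpoint (illustrative schema only)."""
    return json.dumps([
        {"source": source, "metric": name, "value": value}
        for name, value in sorted(metrics.items())
    ])
```

The point is less the schema than the habit: emit data in a neutral format so dashboards, alerting systems, and deployment tools can all read from the same stream.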


	
  




CONCLUSION
New IT architectures require monitoring tools that can keep pace with dynamic, unknowable, and
ephemeral IT environments. The traditional approach of collecting OS and application level metrics can
certainly highlight the failure of individual machines and services. However, distributed systems have a
way of failing in a distributed manner—more like a brown-out.

The best way to ascertain the health of a distributed system is to look at the fabric that holds it
together: the network. Any change that occurs, such as new application builds pushed into
production, will show up in the network first. The ability to see the effects of subtle changes, invisible to traditional monitoring tools, gives developers and operations teams the critical insight needed to deliver high-quality service. Data-centric monitoring tools will redefine how companies design, build, and ultimately
get value from distributed systems.

For more information, please visit Boundary at www.boundary.com.



