Six-Step Approach to Identify a Big Data
Problem and Choose the Right Solution
Author: Manju Devadas
VP Solutions and Technology, Bodhtree
Many have heard the dire predictions about the state of information technology with 10x data
growth projections over the coming years. While there is truth to the exploding growth rate of
data and the accompanying complexity of analysis, we have faced similar exponential growth in
data over each of the most recent decades; and every time, technology has risen to the challenge
and delivered needed capacity for business, governments and individual users. For parallels we
need only look to distributed computing in the 1990s and websites in the 2000s.
In the 2010s, big data is a phenomenon nearly everyone comes into contact with, whether they
realize it or not. If you carry a smartphone for work or dump thousands of digital photos on your
home computer, you're already swimming in the Big Data ocean. You just may not have the
tools yet to capture, store and process that massive data flow for better decision-making.
The purpose of this paper is to demystify Big Data and provide a methodology to assess
whether the problems you encounter in your enterprise are Big Data problems. Working with
large companies and start-ups in the Silicon Valley has allowed me to validate this methodology
in diverse business verticals and company sizes. If nothing else, this paper will help you act from
a position of knowledge surrounding Big Data, avoiding the hype and misinformation that
commonly accompany the latest technologies.
First Our Ecosystem
Each time you press a key, rate a product or navigate a GPS map, you generate data in some
form. Of all this stored data, usually only a small portion is being analyzed to find new answers
to challenging questions.
If Samsung launches a new phone, the most unbiased and direct feedback today probably
comes from Facebook rather than traditional customer surveys and support lines. Book a flight
online today at Kayak.com or Expedia.com and switch to your inbox to find emails with Priceline
hotel recommendations in a matter of seconds. How the hell did Priceline know that I wanted a
3.5 star hotel in Monterey for the holiday weekend? With or without my knowledge, I allowed
them to capture my personal preferences and travel plans, analyze current web offerings using
Big Data, then email me recommendations.
The Australian government is using Big Data to analyze seismology patterns and predict
earthquakes precious minutes earlier. Big data analysis has found a role in vehicle
maintenance, predicting part failure; space exploration such as the Rover landings on Mars; and
fraud prevention, often identifying unauthorized purchases before a customer even realizes
their credit card is stolen.
The total computing power provided by an optimized Big Data system is capable of analyzing all
the data on every desktop in your neighborhood in less than 1 second. The digital image of a
hundred-year-old document could be retrieved from a city government’s archive databases in
less time than it takes to pull a book from the shelf. The evolving technologies in the Big Data
world are not only making this kind of analytical power possible, they are democratizing it
through approaches affordable even to the smallest businesses.
The analysis of large volumes of data at lightning speeds is great, but does it actually create real
value for people and businesses? Let’s start with your next promotion, which depends on the
successful launch of a new product line. Big Data becomes your cheat sheet for understanding
customers, allowing you to proactively analyze buying trends and marketing strategies over the
last ten years of similar launches. Or consider biometric monitoring that signals you to go to
the ER before you actually feel any symptoms. Or maybe your goal is enterprise efficiency in a
competitive industry, and you need to identify and eliminate bottlenecks in your supply chain.
Each of these challenges can be addressed with Big Data solutions.
Solving most business problems in large companies involves some form of data analysis. With
the data now being captured in all forms including human, environment, and machine-
generated, it is necessary to identify which problems are Big Data-related and which can be
solved using traditional data analysis techniques. The last thing management wants is to
purchase a new system only to realize existing tools were capable of achieving the same results.
(Remember the popular, if apocryphal, story of NASA spending millions to invent a pen to write in
space when a pencil would have been an adequate solution.) Nothing is more wasteful in business
than a great solution in search of a nonexistent problem.
What exactly is Big Data?
Big Data is simply complex data sets in massive volumes (petabytes) and multiple formats (table
contents, text, audio, video). With the speed and amount of data being generated today, the
corresponding technology demand is driving new ways to analyze the information faster,
cheaper, and with better results.
Three types of data may be present in your enterprise:
a. Large Volumes, e.g. Data stored in Database tables, Excel spreadsheets, Access
b. Unstructured data, e.g. Video, Audio, Facebook, Twitter, Blogs, Customer Reviews,
c. ‘Gray’ data, e.g. web traffic where the exact usage is yet to be determined based on
business needs that may arise
Enterprises with ever increasing data volumes must take measures to better analyze these data
sets to accelerate progress toward company goals and objectives. Even if business seems fine
now, you may be ignoring this data at your peril since your competition could be using it to run
more efficiently, respond faster, and make better business decisions. As is so frequently the
case with technology, if you’re holding still, you’re falling behind.
Enterprises also need to have people who can think about data in new ways – not just
information stored in tables, rows and columns, but also data as blogs, videos, Facebook posts,
GPS coordinates, and traffic sensors. As of today, these ‘Data Scientists’ are difficult to train
internally; and the natural reaction is to look for outside hires. But hiring from the outside can
present its own set of challenges as these transplants may not bring the same understanding of
your business challenges and differentiators. I recommend that you begin with the employees
you already have and apply the methodology outlined below toward creating an effective Big
Data strategy and roadmap.
How do I know if my problem is a Big Data Problem?
Without delving into details about the nature of the business challenge and existing sources of
data, it is difficult for anyone to determine for sure if the problem is a Big Data problem. A
Fortune 100 High Tech company in San Jose, California, paid us to fix what they labeled a Big
Data problem. Following the initial analysis, we concluded the problem was best solved with
traditional data analysis techniques rather than a Big Data implementation. We educated the
customer about the unique characteristics of a Big Data problem, and saved the team
substantial money since their existing tools were adequate to solve the issues. Hence, even
though no general model can substitute for a thorough hands-on analysis, the simple
methodology we outline below has been highly effective at quickly determining whether a
challenge is Big Data-related.
Quite a few companies see Big Data as a concern only for web product companies like Facebook
and Google with petabytes of data to organize and process. However, a 2011 McKinsey Global
Institute study argues otherwise. The McKinsey report found that investment firms averaging
fewer than 1,000 employees have 3.8 petabytes of data stored, a data growth rate of 40 percent
per year and a mix of structured, semi-structured and unstructured data types. Overall,
McKinsey found that companies in 15 of 17 U.S. industry sectors have more data stored per company
than the U.S. Library of Congress (which currently has 235 terabytes of data) and that companies
from all sectors have at least 100 terabytes stored, as shown in Figure 1:
Big Data Solution classification:
There are five data conditions called the “Vs” that assist in defining a Big Data problem:
1. Volume, e.g. multiple petabytes of data
2. Velocity, e.g. results need to be analyzed in seconds or less
3. Variety, e.g. structured and unstructured data like social media posts and video files
4. Variability, e.g. constantly changing like a stock market
5. Value, e.g. you’ve identified the clear business value you plan to derive from the data
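One way to make this screening concrete is to record the five conditions for a given data set and check them together. The short Python sketch below is illustrative only: the DataProfile fields, the one-petabyte and one-second cut-offs, and the rule that all five conditions must hold are assumptions made for this example, not fixed definitions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DataProfile:
    """Hypothetical description of one data set under review."""
    volume_tb: float            # total stored data, in terabytes
    max_latency_seconds: float  # how quickly results are needed
    has_unstructured: bool      # e.g. social media posts, video files
    changes_constantly: bool    # e.g. a stock-market-like feed
    value_identified: bool      # a clear business value has been stated


# Assumed cut-offs for illustration; tune them to your own environment.
VOLUME_THRESHOLD_TB = 1000.0       # roughly one petabyte
VELOCITY_THRESHOLD_SECONDS = 1.0   # "seconds or less"


def screen_five_vs(profile: DataProfile) -> Dict[str, bool]:
    """Report which of the five V conditions the data set meets."""
    return {
        "volume": profile.volume_tb >= VOLUME_THRESHOLD_TB,
        "velocity": profile.max_latency_seconds <= VELOCITY_THRESHOLD_SECONDS,
        "variety": profile.has_unstructured,
        "variability": profile.changes_constantly,
        "value": profile.value_identified,
    }


def is_big_data_candidate(profile: DataProfile) -> bool:
    """Assumption: a problem qualifies only when all five conditions hold."""
    return all(screen_five_vs(profile).values())


# Example: a large but purely structured warehouse with hourly reporting.
warehouse = DataProfile(
    volume_tb=300.0,
    max_latency_seconds=3600.0,
    has_unstructured=False,
    changes_constantly=False,
    value_identified=True,
)
print(screen_five_vs(warehouse))
print(is_big_data_candidate(warehouse))
```

With these assumed cut-offs, the example warehouse does not qualify even though its business value is clear, which suggests that traditional data analysis tools deserve a look first.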
What does all this mean?
The relentless growth of data, new data formats to deal with, and the competitive advantages
achieved from managing large volumes of data all emphasize why Big Data should matter to
you. If you are an IT professional, you already recognize how difficult it can be to find a solution
capable of handling a task as monumental as big data management. Whether you are looking
for growth, profitability or productivity in your organization, you are invariably dealing with
data; and when that data shows the 5 V characteristics, you now need to start thinking of it as a
Big Data problem and approach it differently than you would with traditional solutions.
How do you get started?
Many enterprises fail to implement a Big Data solution because they have not identified
clear business cases for the tools. The common trigger to initiate Big Data development is a
data blast that existing systems can no longer manage. As these datasets continue to grow in
size, enterprises face the problem of managing, storing and processing the data at the
speed required for timely business response.
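To see how quickly this kind of data blast can outrun existing systems, it helps to project the growth forward. The sketch below compounds a starting volume at a fixed annual rate. The 100-terabyte starting point and 40 percent rate echo the McKinsey figures cited earlier, while the 500-terabyte capacity and the function names are hypothetical values chosen for illustration.

```python
def project_volume(current_tb: float, annual_growth: float, years: int) -> float:
    """Return the projected volume in terabytes after `years` of compound growth."""
    return current_tb * (1.0 + annual_growth) ** years


def years_until_exceeds(current_tb: float, annual_growth: float, capacity_tb: float) -> int:
    """Return the number of whole years until the volume first exceeds capacity."""
    if current_tb <= 0 or annual_growth <= 0:
        raise ValueError("current_tb and annual_growth must both be positive")
    years = 0
    volume = current_tb
    while volume <= capacity_tb:
        volume *= 1.0 + annual_growth
        years += 1
    return years


# Hypothetical scenario: 100 TB today, 40 percent annual growth, 500 TB of capacity.
for year in range(6):
    print(year, round(project_volume(100.0, 0.40, year), 1))
print(years_until_exceeds(100.0, 0.40, 500.0))
```

Under these assumptions the stored volume passes the 500-terabyte capacity within five years, so capacity planning belongs in the business case well before systems are overwhelmed.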
Below is Bodhtree’s six-step process to take enterprises from Big Data Problem
Definition to Solution Implementation, a methodology which has been applied with excellent
results at a large Bay Area networking company and several other Bodhtree customer locations:
Bodhtree Six-Step Process, from Big Data problem definition to solution delivery:
Step 1: Understand the Use Case
Depending on where you reside in the organization, the chances are high that you will
first feel a sense of data overload before you can articulate a clear business case to
leverage that data. Often this prompts enterprises to reactively implement a Big Data
solution without deciding in advance what problems it will be used to resolve.
It is critical that you deep dive and understand the business case first before even
thinking along the lines of Big Data. Otherwise, there will be a lack of focus that feels a
little like staring through a microscope at unintelligible detail without ever stepping back
to see what specimen is sitting on the glass. In terms of IT, one Bodhtree client
managing a large warehouse of customer, product and geography information with 100s
of terabytes of data said he had a Big Data use case, but everything he spoke about
involved only structured data, failing the 5 Vs test. Even before worrying about Big
Data, do a litmus test by asking the following questions:
• Business Case – Do I understand the value of solving the problem at hand? Can I
quantify the potential value of a big data solution or at least articulate the
• Dependencies – Have I collected all the relevant information about the customer,
install base etc.?
• Complexity – Have I inventoried the data sources and characteristics to
determine the complexity?
• Lead Time – Have I created a reasonable plan with adequate time to acquire
relevant hardware and data?
• Initiative Alignment – Is the project aligned with corporate objectives and are
project sponsors committed to the end-to-end process?
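The litmus test above can be tracked as a simple checklist so that open questions stay visible before any Big Data discussion starts. The sketch below is one possible way to do that; the key names, the shortened question wording, and the rule that every question needs a "yes" are assumptions for illustration.

```python
from typing import Dict, List

# Shortened paraphrases of the five litmus-test questions (hypothetical keys).
LITMUS_TEST: Dict[str, str] = {
    "business_case": "Do I understand the value of solving the problem?",
    "dependencies": "Have I collected the relevant customer and install base information?",
    "complexity": "Have I inventoried the data sources and their characteristics?",
    "lead_time": "Do I have a plan with adequate time to acquire hardware and data?",
    "initiative_alignment": "Is the project aligned with corporate objectives and sponsors?",
}


def open_items(answers: Dict[str, bool]) -> List[str]:
    """Return the questions not yet answered 'yes', in checklist order."""
    return [
        question
        for key, question in LITMUS_TEST.items()
        if not answers.get(key, False)
    ]


def ready_to_proceed(answers: Dict[str, bool]) -> bool:
    """Assumption: move on to Step 2 only when no question remains open."""
    return not open_items(answers)


# Example: only the first two questions have been answered so far.
answers = {"business_case": True, "dependencies": True}
for question in open_items(answers):
    print("Open:", question)
print(ready_to_proceed(answers))
```

Missing answers are treated as "no", so a question that nobody has looked at yet blocks progress just as an explicit "no" would.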
Step 2: Understand the Current Landscape
• Carefully analyzing the use cases defined in step 1 enables you to identify all data entry
and storage points. Often this review uncovers critical data entry points that were not
recognized initially.
• Map the end-to-end process and data flows for the business capabilities, e.g. How does
the data flow to you from the customer and among internal teams?
• Build a Reference Architecture to highlight the current systems and tools and their
readiness for Big Data. Validate that you have access to the data you plan to analyze.
Step 3: Build a Blueprint
• Define your overarching architectural challenges in doing the Big Data analysis defined in
the use cases, e.g. What architecture will I need to store the customer install base
information along with product information?
• Identify the right high-level Big Data solutions, leveraging technology-agnostic vendors
• Document a clear delta between As-Is & To-Be with the introduction of the Big Data
solutions while addressing the pain points at each transition phase
• Document the Risks & Dependencies that could impact business results, cost or
schedule. Remember, rolling out sophisticated tools does not guarantee success. Watch
out for hidden landmines, e.g. Data Quality.
Step 4: Identify the Big Data Technologies
• Deep dive into the Big Data technology dependencies and the impacts they have on the
system/tools and organizations. For example, you might consider how Hadoop adoption
overlaps with your Business Objects installation to analyze the customer and product
• Determine which users will be consuming the information and analysis. What formats
do these reports need to be in? Do they require mobile interfaces? Current BI reports
and subscribers often provide relevant insight to these questions.
Step 5: Build a Big Data Roadmap
• Avoid the traps of either over-investing or under-investing – have the business cases
drive the solution.
• Plan the roadmap for your Big Data rollout based on such factors as –
• Business priority and management support. Remember, your execs may need to
be educated in order to understand the relative business value offered by each
• Timeframe of expected results and ROI.
• Big Data technology complexities, e.g. apply the right order to ensure a clean
data foundation before conducting analytics.
Step 6: Big Data Solution Rollout
• Formalize the right team, experienced in conducting multiple implementations.
• Divide scope items across multiple phases/releases to track progress and provide
important quality checkpoints.
• Document the Business Requirements, Functional Analysis and Solution Architecture
• Begin user training before the implementation is complete so analysts can immediately
realize business value, building momentum for expanded uses.
Big Data, by its very nature, contains endless possibilities for business insight and improved
operations. But much like venturing into space without a defined mission, the Big Data world
demands that businesses clearly define what they intend to achieve in advance. Otherwise
enterprises can spend substantially on fancy tools that may never happen upon real business
Once those business goals are defined, and you have captured a clear picture of the current
state of your data, apply the 5 Vs screening questions to determine if the problem truly
warrants a Big Data solution. An objective vendor that specializes in a broad cross section of BI
and Big Data solutions can assist in this process and advise solutions that maximize your ROI.
Upon identifying a Big Data problem, carefully proceed through the six steps of the Problem to
Solution Methodology. Realize that the real value of Big Data solutions does not come simply with
implementation but through applying creative and insightful approaches to harvesting business
value from the data. Ensure all dependencies are considered so that your data foundation is
clean, comprehensive and current. Finally, proceed with the implementation, highlighting
“quick wins” to convey business value to execs and analysts, building momentum for the full
If the above methodology is applied correctly, you will save time and energy and
achieve better results by applying the right Big Data prescription for a REAL Big Data
Contributors: Ryan Madsen, Sushanth Reddy
• McKinsey Global Institute study
• Bodhtree Customer Case Studies