Don't Rock the Boat: Managing Data Flow

    POINT OF view

Don’t Rock the Boat:
Managing Data Flow
By: Anand Raman, Commerce Technology Practice Manager, and Arvind Naik,
Technical Architect, SapientNitro

In any e-commerce solution, it is essential to manage every piece of data, whether it’s product, catalog,
or merchandise data. And to drive successful e-commerce, a business must have complete, accurate,
combined data available in a timely manner.

Often, data flow is not a top priority. In fact, it’s a secondary thought at best. And although each
e-commerce project brings its own unique challenges, there are common basic data elements across
the board.

After understanding the challenges and lessons in this paper, technical architects, developers,
and project managers alike will be able to identify data flow design and data availability as a
fundamental aspect of any e-commerce project and prepare a cohesive plan to address their
unique business challenges.

At the onset of an e-commerce project, businesses typically provide little to no specific requirements
around data flow. They might plan to have some catalog systems, search features, and product data
loaded through backend systems, but that’s about it. Typically the focus is on the customer experience
and how to get the pertinent data to those customers and other business users. But without a
comprehensive data flow strategy, there will be setbacks in the future.

Attention must also be paid to the timing of data flow design. During the later part of development,
questions often arise such as: When should I expect my price or promotion to show up? What should
I do if I want to remove a product right away? Unless you’ve thought about those questions early on,
the answers may come too late. For instance, once a project is in its testing cycle, there is rarely
room left to design a solution for data flow issues. Likewise, it’s difficult to have timely, frequent
site refreshes without a comprehensive data flow strategy.

It’s paramount to think about data flow as it pertains to every functional requirement—what kind of
data with what kind of system within what time frame—in order to maintain efficiency and control
throughout every process.

                       IDEA ENGINEERS                                                         © Sapient Corporation 2012


Fig. 1. A typical data flow diagram

Data mapping is a critical part of logical data flow design and this pictorial diagram represents a
typical e-commerce data flow solution. Of course, the process can be much more complicated, but this
representation offers a basic outline of what to expect when planning requirements around data flow.

Mapping data in an accessible way will facilitate the discussion on data flow. Laying out the best-case,
typical, and worst-case Service-Level Agreement (SLA) is paramount in order to arrive at an agreeable set
of service levels. At times, new integration techniques and solutions may need to be identified if none of
the existing ones is sufficient. Be prepared even to change the foundation of the
solution architecture if certain service levels are critical to the existence of the business.

Though each e-commerce system is unique and has special business needs, most share common
basic categories of data. And all e-commerce systems need to handle many types of data, each with its
own source, lifecycle, rules, and criticality. Add in the multiple systems, business logic, workflows, and
processing businesses must go through, and you’ve got a tremendously complex maze on your hands.

The first step in defining a data flow strategy is to identify the data types relevant to you. They include, but
are not limited to:

Product Data
• Product information (e.g., specifications)
• Product lifecycle (e.g., launch date)
• Product images (e.g., various renditions)
• Product rich content (e.g., multimedia)
• Product merchant relationships (e.g., cross-sell, up-sell)
• Product social data (e.g., ratings and reviews)
• Product pricing (e.g., MSRP, sale price)
• Pricing promotions and messages (e.g., discounts, clearance)

Category Data
• Category information (e.g., taxonomies: master, product, sales)
• Category images
• Category attributes


Marketing Data
• Marketing promotions (e.g., order or shipping offers)
• Merchandizing relationships (e.g., personalized recommendations)
• Shipping rate calculations

Inventory Data
• Availability
• Stock-in-hand
• Release/street date
• Backorder/pre-order

Search Index Data
• Searchable attributes
• Facets, keywords, SEO
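
One way to make this inventory concrete is to record each data type alongside its source, refresh frequency, and criticality. The following is a minimal sketch in Python, not a prescribed schema: the `DataEntity` fields, the refresh frequencies, the criticality levels, and the "Inventory feed" source are all illustrative assumptions that would be replaced by what stakeholders actually agree on.

```python
from dataclasses import dataclass
from enum import Enum


class Criticality(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class DataEntity:
    name: str            # e.g. "product pricing"
    category: str        # one of the groupings listed above
    source_system: str   # assumed owner of the data (hypothetical)
    refresh: str         # assumed change frequency (hypothetical)
    criticality: Criticality


# Illustrative entries only; real values come from stakeholder conversations.
CATALOG = [
    DataEntity("product information", "Product Data", "PIM", "weekly", Criticality.MEDIUM),
    DataEntity("product pricing", "Product Data", "Pricing and Sales Management", "daily", Criticality.HIGH),
    DataEntity("category images", "Category Data", "CMS", "monthly", Criticality.LOW),
    DataEntity("availability", "Inventory Data", "Inventory feed", "hourly", Criticality.HIGH),
]


def by_criticality(entities, level):
    """Return the names of entities at or above the given criticality, sorted by name."""
    return sorted(e.name for e in entities if e.criticality.value >= level.value)


print(by_criticality(CATALOG, Criticality.HIGH))
```

Keeping the inventory in one structure like this makes it easy to ask simple questions early, such as which entities are both highly critical and frequently changing.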

Once data types are identified, understand the expectations of the data by engaging in conversations
with business stakeholders, analysts and other experts. Many times, the requirements are unclear,
even for key stakeholders. In such situations, starting with the necessities that are practical and
feasible is often the right approach.

It can also help to articulate relationships and dependencies using an entity-relationship diagram. A
typical diagram may have hundreds of tables and a number of dependencies, which have significant
impact on the SLAs.

                              Fig. 2. An entity-relationship diagram
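
To illustrate why dependencies affect SLAs, the sketch below walks a small dependency graph in topological order and computes the earliest time each entity can be live. The entity names, the dependencies between them, and the load times are hypothetical; the point is only that an entity can never be available sooner than the slowest chain of entities it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependencies: each key can only be published after its values.
DEPENDS_ON = {
    "category": set(),
    "product": {"category"},
    "price": {"product"},
    "inventory": {"product"},
    "promotion": {"product", "price"},
}

# Hypothetical time (in minutes) to load each entity on its own.
LOAD_MINUTES = {"category": 10, "product": 30, "price": 5, "inventory": 5, "promotion": 15}


def earliest_availability(depends_on, load_minutes):
    """Minutes until each entity is live, assuming a load starts as soon as
    all of its dependencies have finished."""
    ready = {}
    for entity in TopologicalSorter(depends_on).static_order():
        start = max((ready[d] for d in depends_on[entity]), default=0)
        ready[entity] = start + load_minutes[entity]
    return ready


for entity, minutes in earliest_availability(DEPENDS_ON, LOAD_MINUTES).items():
    print(f"{entity}: live after {minutes} min")
```

In this made-up example a promotion takes only 15 minutes to load, yet it cannot be live before the 60-minute mark because it waits on category, product, and price.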

Major corporations gather data from multiple sources; e-commerce data does not always originate
from a single source. For each piece of data, you have to consider where the best source for that
data lies. It is important to recognize the benefits and limitations of all sources up front to make the best
possible decision.


Data can originate from a number of systems such as Product Information Management Systems
(PIM), Content Management Systems (CMS), Marketing Categorization Systems, Pricing and Sales
Management Systems, Marketing Promotion Management Systems, Social Network and Ratings
Systems, and Analytics Data Systems.

Each system comes with its own technology, integration options, throughput, data quality, and error
handling methods. Articulation of these system boundaries is critical as there may be a need to invest
time and money to reduce limitations of certain systems in the ecosystem.

Once you’ve chosen the kinds of data and sources you require, you can then choose your data
processing and integration systems. Below is a list of commonly used ones, but it could get much
longer on a real-life project:

• Standard DataStage (e.g., ETL)
• IBM BODL (Business Object Data Loader)
• IBM WebSphere MQ Broker
• Custom Integration Layer
• WCS Stage Propagation Utility
• Secure FTP/MFT

Regardless of what system(s) you choose, you must then optimize them. Optimization, a process of
improving the performance without compromising quality and maintainability, is a critical activity.
Optimization challenges differ based on technology and integration techniques, but these strategies
can help:

1. Tune Structured Query Language (SQL) several times to ensure efficiency.
2. Cache frequently used attributes to avoid unnecessary trips to the database.
3. Use batches to commit and process as much as possible, and to avoid high overheads.
4. Use parallel threads of processing wherever possible.
5. Use persistent MQ queues to protect the messages.
6. Pass only the required data to be updated to avoid unnecessary back-and-forth data.
7. Use smart updates when it’s not feasible to minimize the message payloads.
8. Conduct performance tests to ensure that the end-to-end data flow is optimized.
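
Two of these strategies, committing in batches (3) and passing only the data that changed (6), can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation: it uses Python’s built-in `sqlite3` module in place of a real commerce database, and the `price` table, the `load_prices` and `changed_fields` helpers, and the batch size are all hypothetical.

```python
import sqlite3


def changed_fields(old, new):
    """Return only the attributes whose values differ (strategy 6)."""
    return {key: value for key, value in new.items() if old.get(key) != value}


def load_prices(conn, rows, batch_size=500):
    """Insert or replace (sku, price) rows, committing once per batch
    instead of once per row (strategy 3). Returns the number of commits."""
    commits = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany(
            "INSERT OR REPLACE INTO price (sku, price) VALUES (?, ?)", batch
        )
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
conn.execute("CREATE TABLE price (sku TEXT PRIMARY KEY, price REAL)")

rows = [(f"SKU{i}", 9.99) for i in range(1200)]
print("commits:", load_prices(conn, rows))

# Only the price changed, so only the price needs to travel downstream.
print(changed_fields({"name": "Mug", "price": 9.99}, {"name": "Mug", "price": 7.99}))
```

The right batch size depends on the database and the payload, which is one reason the end-to-end performance tests in item 8 matter.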

There is no shortage of challenges when it comes to data flow. A well-designed data solution requires
that you recognize that:

Time matters. Every content type is different in terms of its lifecycle and frequency of change. A lot of
content is refreshed monthly or weekly, but some content types (e.g., promotions) tend to change much
more quickly, forcing related messages (e.g., promotional merchandizing content) to
change at the same rate. And building any e-commerce system doesn’t happen overnight; it typically
takes years rather than months. Also, each system may be under a different development cycle or
timeline, which can add to the complexity.

Data flow management requires constant attention. Providing a consistent customer experience in
the face of ever-changing business and IT priorities is taxing. Businesses continue to adapt and change,
as do their products and priorities. Combine that with on-going maintenance, integration specifications,
bug fixes, releases, product upgrades … this is no simple endeavor.


Architecture and design choices have an impact. Caching, an integral part of any e-commerce
implementation, can undermine the overall strategy if not attended to during the early stages of
implementation and development, because cached content can keep serving stale data after the source
has changed. It is imperative that all architecture and design decisions take into
account the entire strategy.

These all affect business decision-making and the stability of integration. So how do we build systems
then to meet the ever-changing demands of business? And how do we build data flow around this fluid
environment? The point is that when we think about data flow, all of these challenges (among others)
must become considerations in order to guarantee data availability that’s quality-driven and timely
given that we’re standing on such shaky ground.

When we think about data flow aspects in an e-commerce system, we need to stress several goals for a
successful business solution.

First, identify critical data entities early on and identify the SLA requirements for them. It’s also crucial
to identify how soon a data entity can be made available across the systems because the changes
may need to be reflected in multiple areas, not just at the front end. In addition, be sure to identify
emergency scenarios. You must be prepared for any circumstance that may arise, since it could have a
detrimental impact on legal standing, profitability, customer satisfaction, and overall business
success, to name a few.

Second, understand your technology and system limitations in order to deliver all data in a timely
manner. What can seem sufficient in the beginning can later reveal gaping holes. It’s mandatory that
you and your team are thorough and understand each and every data system critical to the data flow
design, not just in the day-to-day but in extreme situations as well.

Third, set expectations for data availability. When you design a system, there are always limitations and
it’s important to set the expectations up front so the business can plan out solutions well in advance.
Along those lines, too, make sure to understand the impact on the business if an entity is not available
as expected.

And last, proactively determine solutions to improve the data flow and update the SLAs as needed. Doing
this up front gives you the padding necessary to counteract any issues that may arise in the future, such
as strains on budgets and timelines.
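
Once SLAs have been agreed, checking them can be automated. The sketch below is one possible shape for such a check, with assumptions throughout: the SLA values, the entity names, and the `sla_breaches` helper are hypothetical, and the timestamps would in practice come from whatever each system records for its last successful update.

```python
from datetime import datetime, timedelta

# Hypothetical worst-case SLAs: maximum age before data counts as stale.
SLA = {
    "price": timedelta(minutes=15),
    "inventory": timedelta(minutes=30),
    "product attributes": timedelta(hours=24),
}


def sla_breaches(last_updated, now, sla=SLA):
    """Return {entity: how far past its SLA it is} for every stale entity."""
    breaches = {}
    for entity, limit in sla.items():
        age = now - last_updated[entity]
        if age > limit:
            breaches[entity] = age - limit
    return breaches


now = datetime(2012, 1, 1, 12, 0)
last_updated = {
    "price": now - timedelta(minutes=40),
    "inventory": now - timedelta(minutes=10),
    "product attributes": now - timedelta(hours=3),
}

for entity, overdue in sla_breaches(last_updated, now).items():
    print(f"{entity} is {overdue} past its SLA")
```

A report like this gives the business an early signal when an entity is not available as expected, rather than leaving customers to discover it.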


Fig. 3. The SLA consistency map (quadrant labels: Lifecycle Changes; Product Attributes; Promotions, Image; Price, Inventory)


     This is an example of an SLA consistency map we created for one of our clients, which used four
     quadrants to help them visualize and prioritize critical data entities. On the x-axis in this particular
     example we have Availability: the things you need as quickly as possible, in this case entities like
     up-to-date images, inventory, and price data. On the y-axis we have Consistency, which reflects the
     importance of accuracy and precision with attributes like lifecycle changes.

     It is essential to keep in mind that the update frequencies these service levels imply have a significant
     impact on the resources and cost required to architect a data flow management solution.
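
A four-quadrant map of this kind can be approximated by scoring each entity on the two axes and splitting each axis at a threshold. The sketch below is a simplified illustration, not a reconstruction of the client’s map: the scores, the 0.5 threshold, and the quadrant wording are all assumptions.

```python
def quadrant(availability, consistency, threshold=0.5):
    """Place an entity in one of four quadrants from two scores in [0, 1]."""
    fast = availability >= threshold
    strict = consistency >= threshold
    if fast and strict:
        return "high availability, high consistency"
    if fast:
        return "high availability, lower consistency"
    if strict:
        return "lower availability, high consistency"
    return "lower availability, lower consistency"


# Hypothetical scores agreed with stakeholders; not taken from the figure.
SCORES = {
    "price": (0.9, 0.9),
    "inventory": (0.9, 0.6),
    "product images": (0.8, 0.3),
    "lifecycle changes": (0.3, 0.9),
    "product attributes": (0.3, 0.4),
}

for entity, (availability, consistency) in SCORES.items():
    print(f"{entity}: {quadrant(availability, consistency)}")
```

Entities that land in the top quadrant on both axes are the ones whose SLAs tend to drive cost, so they deserve the earliest design attention.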

     The complexity and importance of a high-functioning data flow system should be clear at this point.
     And with so many systems and options available, there are several questions you should be asking:

     1. Do I really need this system? Make sure you’re picking the systems that will allow you to optimize
     the workflow and make data available as quickly and consistently as possible.

     2. Can it handle the required throughput? If the answer is yes, decide which entities it is suitable for.

     3. Can I minimize the systems between the source and the destination? The more steps you take,
     the higher the risk of out-of-sync data, lost data, or increased time to availability.

     4. Can a system be upgraded or replaced with a higher-performing system, and can I
     improve the systems for handling data? Again, these decisions will serve you best if
     you make them up front.
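
The cost of extra hops in question 3 can be shown with simple arithmetic: latencies add up, and the chance that a record arrives intact is the product of the per-hop chances. The sketch below assumes hops fail independently, and every hop name, latency, and reliability figure in it is made up for illustration.

```python
import math

# Hypothetical hops: (name, latency in minutes, probability a record arrives intact)
DIRECT = [("PIM -> commerce database", 10, 0.999)]
VIA_STAGING = [
    ("PIM -> ETL", 10, 0.999),
    ("ETL -> staging", 15, 0.998),
    ("staging -> production", 20, 0.999),
]


def end_to_end(hops):
    """Total latency is the sum of hop latencies; reliability is the product
    of hop reliabilities, assuming hops fail independently."""
    latency = sum(minutes for _, minutes, _ in hops)
    reliability = math.prod(probability for _, _, probability in hops)
    return latency, reliability


for label, hops in (("direct", DIRECT), ("via staging", VIA_STAGING)):
    latency, reliability = end_to_end(hops)
    print(f"{label}: {latency} min, reliability {reliability:.4f}")
```

Even with each hop at 99.8% or better, three hops in this example take 45 minutes instead of 10 and deliver slightly fewer records intact, which is the trade-off to weigh before adding a system to the path.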

     Data flow management is critical to the success of an e-commerce site. It does not end once the
     data entities are identified and reasonable data flow architecture and integration techniques are
     implemented. Constant communication to understand expectations, communicate changes, and
     ensure alignment on an ongoing basis is absolutely essential. Data flow management should not be
     an afterthought but must be a priority that is addressed during the early phases—and every phase
     thereafter—of any e-commerce solution.

About the Authors
Anand Raman
Anand has been involved in the design, implementation, and support of high-volume transactional
applications for the retail and travel industries. Over the past few years he has been involved in the
build and rollout of the platform. Prior to Sapient, Anand worked with one of India’s largest media
houses, putting their popular properties online.

Arvind Naik
Arvind Naik, Technical Architect, has extensive e-commerce site implementation experience at Borders,
David’s Bridal, Agriliance, and Sprint in similar roles. He enjoys large-scale technical problem
solving and working with data flows. He has been instrumental in end-to-end data flow management and
in creating strategy roadmaps for several projects across clients. He is interested in adding Cloud,
PIM, and MDM to his technical portfolio.


About SapientNitro
SapientNitro, part of Sapient®, is a new breed of agency redefining storytelling for an always-on world. We’re changing the way our clients engage today’s connected consumers by uniquely creating integrated, immersive stories across brand communications, digital engagement, and omni-channel commerce. We call it Storyscaping, where art and imagination meet the power and scale of systems thinking. For more information, visit