Content Aggregation &
Access to information remains an indispensable need for organizations in an increasingly
complex and challenging market environment. This need has compelled organizations to adopt
advanced technologies that mine data to derive information, which in turn supports better
business decisions and enhanced business performance.
The online space is already driven by the need to own and govern aggregation and distribution of
content. The inherent nature of e-commerce websites is to integrate information and services.
Hence, they require applications that help them showcase their services in user-friendly formats.
In addition, these applications are expected to facilitate optimum utilization of the available
To deliver superior customer service, organizations are making large investments in technology
that can transform raw data collected from various sources into business intelligence.
Organizations are continuously looking for solutions that help them derive more value from
real-time information by enabling better decision making and superior customer service.
This whitepaper sheds light on the business imperative of content aggregation, content
transformation and publishing. It also gives a technological perspective on the various
aspects involved in content aggregation. Collabera has a proven track record of providing
world-class IT consulting services and solutions to leading organizations across the globe.
The write-up also gives an overview of Collabera’s innovative contribution in the content
Content Aggregation & Transformation – Bridging the Gap
Content aggregation & transformation is a dynamic process that generates new revenue streams
by delivering content over different platforms, thereby extending customer services. It is a
service that helps organizations aggregate content from various sources, transform it to a
common format and publish it to the consumer. The service acts as an information bridge
between the content sources and the end customers.
Where does content aggregation and syndication play an important role?
• Collecting user-subscribed information across various services and products,
which makes it a powerful tool for Financial Services Institutions (FSI) that
provide consolidation services
• Addressing the needs of wireless service providers and operators by
converting existing content from HTML into XML format
• Used as a catalog aggregator across multiple suppliers and vendors of
• Real-time data aggregation and delivery, which has found great utility in the
media domain, where the latest news updates or entertainment-related
content can be syndicated as it happens
• Extending customer services over web, mobiles, handhelds and smart
Collabera: Aggregation & Transformation Services
Collabera has developed a niche for itself in the content aggregation & transformation space,
especially in the online world. Collabera has built an extensive, highly scalable content
aggregation & transformation framework that is provided as an Asset based Service (ABS™)
offering. ABS™ helps address the two main issues in any application development scenario:
predictability of cost and quality.
Within the content aggregation and transformation services (CATS), the software framework does
all the work of aggregating content from various sources, transforming it and storing it. A
human service element is also involved: it fine-tunes the aggregation mechanism, sets up the
aggregation and publishing jobs, monitors the jobs and so on.
The following diagram is a snapshot of various components & benefits of CATS:
[Figure: components and benefits of CATS™. At the center is the CATS™ software framework
(CoreDais + Data Agg + associated services), delivered as a result-oriented service.
Capabilities shown: collating multiple content files; managing updates and large volumes of
data; data collection from disparate sources; syndicating data cost-effectively. Benefits shown:
integrated and parameterized; organized and updated information; performance efficiency;
enabling publishing.]
CATS combines the benefits of a mature software framework with an experienced implementation
approach, mitigating risks associated with cost or quality. The service is useful when an
enterprise has various sources from which content must be aggregated, transformed to a common
format and published to different consumers.
CATS also manages both incremental and full updates, which typically recur on an hourly,
daily, weekly or monthly schedule.
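The difference between the two update modes can be illustrated with a minimal Python sketch. It is not CATS code: the `Record` type, the in-memory `store` dictionary and both function names are hypothetical, and the sketch assumes each source record carries a last-modified timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    """Hypothetical content record with a last-modified timestamp."""
    key: str
    payload: str
    modified: datetime


def full_update(store: Dict[str, Record], source: List[Record]) -> int:
    """Rebuild the store from the complete source snapshot."""
    store.clear()
    store.update({record.key: record for record in source})
    return len(store)


def incremental_update(store: Dict[str, Record], source: List[Record],
                       since: datetime) -> int:
    """Apply only the records modified after the previous run."""
    changed = [record for record in source if record.modified > since]
    for record in changed:
        store[record.key] = record
    return len(changed)
```

In this sketch a full update also drops records that have disappeared from the source, while an incremental update only adds or overwrites. That is one reason a schedule might mix frequent incremental runs with an occasional full run.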
The CATS platform is built on Microsoft .NET technology around the concept of adapters.
Each adapter is built to aggregate a specific type of content, format or
communication protocol. Similarly, adapters for standard protocols manage delivery of the
content. New adapters can be built as per type of content using a standard API defined by the
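The adapter concept can be sketched as follows. CATS itself is built on .NET and its actual API is not shown in this paper, so the Python below is only an illustration of the pattern: the class names, the `fetch` and `deliver` methods and the `run_job` helper are all hypothetical.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, List


class SourceAdapter(ABC):
    """Hypothetical contract for adapters that aggregate one kind of content."""

    @abstractmethod
    def fetch(self) -> List[Dict[str, str]]:
        """Return the aggregated records as plain dictionaries."""


class DeliveryAdapter(ABC):
    """Hypothetical contract for adapters that deliver content."""

    @abstractmethod
    def deliver(self, records: List[Dict[str, str]]) -> None:
        """Hand the records to a consumer."""


class CsvTextSource(SourceAdapter):
    """Example source adapter: parses CSV text with a header row."""

    def __init__(self, text: str) -> None:
        self.text = text

    def fetch(self) -> List[Dict[str, str]]:
        return [dict(row) for row in csv.DictReader(io.StringIO(self.text))]


class InMemoryDelivery(DeliveryAdapter):
    """Example delivery adapter: keeps the records in a list."""

    def __init__(self) -> None:
        self.received: List[Dict[str, str]] = []

    def deliver(self, records: List[Dict[str, str]]) -> None:
        self.received.extend(records)


def run_job(sources: List[SourceAdapter],
            deliveries: List[DeliveryAdapter]) -> int:
    """Aggregate from every source, then publish to every delivery adapter."""
    records = [record for source in sources for record in source.fetch()]
    for delivery in deliveries:
        delivery.deliver(records)
    return len(records)
```

Because the job runner depends only on the two abstract contracts, supporting a new content type or protocol in this sketch means adding one class rather than changing the runner.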
Features of CATS
CATS combines three components: CoreDais, Data Aggregation and Associated Services.
CoreDais is an information aggregation platform that extracts information from various data
sources, such as databases, file systems, websites and portals, and transforms the collected
information, which may be in the form of HTML, into a structured XML format. The result is
delivered to a relational database or network folder for easy use by enterprise applications.
An overview of the CoreDais platform is shown below:
This readily deployable framework, built on J2EE, provides an editor for defining web scraping
rules and web management tools, coupled with enterprise-class availability, scalability and
reliability. The framework can accelerate the process from collection to distribution of content
with minimum manual intervention, giving a definite cost advantage as well as process efficiency.
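To make the idea of rule-driven HTML-to-XML transformation concrete, here is a small Python sketch using only the standard library. It does not reproduce the CoreDais rule editor or its rule format. The rule format used here (a mapping from an HTML `class` attribute to an XML element name) and the names `RuleBasedExtractor` and `html_to_xml` are assumptions made for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class RuleBasedExtractor(HTMLParser):
    """Collects the text of elements whose class attribute matches a rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules      # hypothetical rule format: {html class: xml name}
        self.current = None     # XML field currently being captured, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.rules:
            self.current = self.rules[css_class]
            self.fields[self.current] = ""

    def handle_data(self, data):
        if self.current is not None:
            self.fields[self.current] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the capture, so nested
        # markup inside a matched element is not supported.
        self.current = None


def html_to_xml(html, rules, root_name="record"):
    """Apply the extraction rules to an HTML fragment and return XML text."""
    parser = RuleBasedExtractor(rules)
    parser.feed(html)
    parser.close()
    root = ET.Element(root_name)
    for name, value in parser.fields.items():
        ET.SubElement(root, name).text = value.strip()
    return ET.tostring(root, encoding="unicode")


page = '<div><span class="title">Two-bed flat</span><span class="price"> 250000 </span></div>'
print(html_to_xml(page, {"title": "title", "price": "price"}))
# prints: <record><title>Two-bed flat</title><price>250000</price></record>
```

Keeping the rules in data rather than in code is what allows an editor-style tool to maintain them without redeploying the extractor.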
The other major component of the CATS service is data aggregation. DataAgg is a platform that
aggregates text, images, audio and video from relational databases, network folders and other
locations, and transforms the aggregated content to standard formats defined by business
users. It also delivers content to relational databases, network folders and web services. The
data aggregator platform workflow is depicted as:
[Figure: data aggregation workflow. Supported file formats: XML, PDF, HTML, DBF, XLS.
Supported protocols: HTTP, HTTPS, FTP, DB access. Components: Trace, Mapper, Subscriber.
Steps: define data source configuration; define transformational rules.]
This readily deployable framework is built on .NET and supports various aggregation protocols
and formats. It also serves as a configurable transformation definition tool that business
users can use.
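A configurable transformation definition can be pictured as a table of mapping rules that is applied to each aggregated record. The Python sketch below is an assumption about what such rules might look like, not the DataAgg format: the `RULES` table, the source field names and the `apply_rules` function are hypothetical.

```python
# Hypothetical rule table: target field -> (source field, converter).
RULES = {
    "listing_id": ("ID", int),
    "price_usd": ("PRICE", float),
    "city": ("CITY_NM", str.title),
}


def apply_rules(record, rules):
    """Map one source record to the target format described by the rules.

    Source fields that no rule mentions are dropped, and rules whose
    source field is missing from the record are skipped.
    """
    transformed = {}
    for target, (source, convert) in rules.items():
        if source in record:
            transformed[target] = convert(record[source])
    return transformed


raw = {"ID": "42", "PRICE": "199000.50", "CITY_NM": "new york", "EXTRA": "x"}
print(apply_rules(raw, RULES))
# prints: {'listing_id': 42, 'price_usd': 199000.5, 'city': 'New York'}
```

Since the rules are plain data, they could be edited through a form or spreadsheet-like tool, which is one way a transformation definition becomes usable by business users.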
Associated Services Approach
CATS is a component of our asset pool, which enables us to deliver asset based services (ABS)
that assure clients of the desired outputs within a short delivery cycle. This proprietary asset allows
clients to accelerate to value faster by leveraging the underlying software framework that is highly
The framework was developed from best practices that evolved while delivering this service
to various customers in different scenarios, all of them inherently information intensive.
This experience gives us the confidence to deliver flexible and reliable services, giving our
clients predictability of cost and quality.
Collabera successfully implements CATS in a leading organization
A leading real estate company has been using CATS successfully for its extensive content
aggregation needs. The company had launched mobile services for real estate and wanted a
solution that could aggregate large amounts of data and update the content in real time.
The engine version of CoreDais is deployed on 6-7 workflow nodes (server-class machines) of
the organization's existing data aggregation system. The solution indexes about 23 MB of data
per hour, which corresponds to more than 1,200 websites indexed daily. CoreDais has also
delivered significant time savings for the organization: the solution takes only 0.5 to 1
second per record of 40 elements and approximately 5 KB, depending on hardware.
The DataAgg component of the solution has reduced the time between a listing being aggregated
and its appearance on the CATS client system to about 20 minutes. About 9 million listings
pass through CATS DataAgg every day. DataAgg also helps in map creation, which is an important
feature of the services the company offers to its customers.
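To put the case-study figures in perspective, the short calculation below converts them into per-second and per-hour rates. It is only back-of-the-envelope arithmetic on the numbers quoted above. The assumptions that listings arrive evenly across a 24-hour day, and that a single worker handles records strictly one at a time, are made for illustration and are not stated facts about the deployment.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

# DataAgg: about 9 million listings per day (figure from the case study).
listings_per_day = 9_000_000
# Assumption: the load is spread evenly over the day.
listings_per_second = listings_per_day / SECONDS_PER_DAY

# CoreDais: 0.5 to 1 second per record (figure from the case study).
# Assumption: one worker processing records strictly one after another.
min_records_per_hour = SECONDS_PER_HOUR / 1.0
max_records_per_hour = SECONDS_PER_HOUR / 0.5

print(f"{listings_per_second:.0f} listings per second on average")
print(f"{min_records_per_hour:.0f} to {max_records_per_hour:.0f} records per hour per worker")
```

Under these assumptions the DataAgg figure works out to roughly 104 listings per second, and a single sequential CoreDais worker would handle between 3,600 and 7,200 records per hour.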
The solution has brought significant benefits to the organization. Apart from its positive
impact on overall profitability, CATS™ has also reduced costs and improved operational
performance.
• Availability of organized and updated information
• Enhanced performance efficiency
• Provides a cost effective solution for content aggregation and
• Supports multilingual and multi-format output display
• Enables classification and storage of unstructured data from disparate
• High level of customization through parameterization
• Hassle-free deployment: can be "executed" immediately after
installation, like a standalone "product"
• Provides an end-to-end content lifecycle delivery dashboard