ETL-marketplacereview by ashrafp


									                                  Snapshot review of data architecture tools
The following short report is an incomplete survey of the marketplace for data loading, data analysis and data cleaning tools. This
brief review is being published early so that others may use it to inform their own research. This is in line with the IDMAPS Team’s
strategy of publishing information as soon as we think it might have value to the community. The IDMAPS project will be selecting a
toolkit in this space and will describe our reasons once a final decision has been made.

The Data Architecture field seems to fall between three marketing labels, with vendors deliberately blurring the boundaries to make
their product fit whichever term has value to the customer. Tools in Data Warehousing, Extract Transform Load (ETL), and Business
Intelligence (BI) all appear to have some relevance to data architectures. While Business Intelligence revolves around data
reporting, it tends to be cross-marketed with data loading and transformation tools. Overall, “extract transform load” is the term
most often used in this space, and the marketplace for these tools seems relatively mature. Below is a brief summary table of our
research so far. Products are listed in current order of preference, though this order can and most likely will change as we look
at the tools in more depth. We have deliberately excluded tools that require their own bespoke hardware, as they are likely to be
expensive and overkill for this situation.

                                              ETL toolkits – brief survey of marketplace
Toolkit              url          Comments
Pentaho Data                      Has good demos and a variety of screencasts explaining how it works. Pentaho is
Integration                       reviewed well in a comparison of ETL tools at

                                  A quick 2-hour test drive was somewhat disappointing: we couldn’t find an easy way of
                                  getting it to sanity check the flat file it was importing (it had blank lines at the
                                  start).
Talend                            Talend has a data synchronizer tool and claims to have direct access to SAP. It appears
                                  to be a good match for our current data model (flat files) and the demo looks good. It
                                  is well reviewed by InfoWorld.
Clover ETL                        Looks viable: the ETL tool is free, but a server to run it "with higher performance" is
                                  paid for (price on request) and a GUI designer tool is also paid for (450 USD per head).
Apatar                            Looks a viable tool. The demos on the website are not closely related to our area, and
                                  it seems to focus on integration with packages like Salesforce etc.
Datacleaner                       Mainly focuses on cleaning data and loading it into a database; has good screencasts
                                  on its website.
Redhat                            SOA for building data web services; may not be that good at supporting our legacy of
(Metamatrix)         trix/        file-based bulk transfer.
SAGA.M31                          Terms and conditions are only in German and yet to be translated, hence ruled out. It
Galaxy                            also looked web-service heavy.
Chainbuilder                      ChainBuilder ESB is a Java Business Integration (JBI) compliant product which
                     munity       consists of a set of Eclipse GUI plug-ins, runtime server components and a Web-based
                                  Admin console. Its primary uses are in Service Oriented Architecture (SOA)
                                  environments and Enterprise Application Integration (EAI).
Jitterbit                         Jitterbit is a data & application integration suite available in open source. It was
                     /index.php   developed to provide business users with a quick, cost-effective and simple way to
                                  configure, test, deploy and manage integration solutions. However, it has a limited
                                  number of connectors and none for messaging.
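
To make concrete the kind of sanity check we were looking for in the Pentaho test drive (a flat file with blank lines at the start), here is a minimal Python sketch. It does not show how Pentaho or any other toolkit in the table works; the function name `read_flat_file`, the comma-separated layout and the fixed column count are all assumptions made for illustration.

```python
import csv


def read_flat_file(lines, expected_columns):
    """Skip leading blank lines, then flag records with the wrong column count.

    `lines` is any iterable of text lines (for example an open file).
    Returns (records, problems), where problems holds (line_number, record)
    pairs. Line numbers are 1-based and assume one record per line.
    """
    rows = list(lines)

    # Find the first non-blank line; everything before it is ignored.
    start = 0
    while start < len(rows) and not rows[start].strip():
        start += 1

    records, problems = [], []
    for line_number, record in enumerate(csv.reader(rows[start:]), start=start + 1):
        if len(record) == expected_columns:
            records.append(record)
        else:
            # Hypothetical policy: collect bad rows rather than abort the load.
            problems.append((line_number, record))
    return records, problems


if __name__ == "__main__":
    sample = ["\n", "  \n", "id,name\n", "1,alice\n", "2\n"]
    good, bad = read_flat_file(sample, expected_columns=2)
    print(good)
    print(bad)
```

Collecting the rejected rows instead of stopping at the first one is only one possible choice; a bulk transfer might equally want to reject the whole file.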

Many big commercial vendors also have offerings in this space:

- SAP has the Netweaver Process Integration toolkit, which supports a broad set of ETL and SOA techniques; however, the licensing
  model employed may rule it out of consideration as a toolkit for data loading within our institute.
- Tibco also has a well-regarded toolkit in this space, but again licensing may be an issue.
- IBM has Websphere, particularly the Websphere Transformation Extender.
- Oracle has its data warehousing product.
- There are many more vendors with their own offerings in this space.
