Snapshot review of data architecture tools

The following short report is an incomplete survey of the marketplace for data loading, data analysis and data cleaning tools. This brief review is being published early so others may use it to inform their own research. This is in line with the IDMAPS Team’s strategy of publishing information early: by publishing as soon as we think information might have value to the community, we hope to enhance the benefits to the community. The IDMAPS project will be selecting a toolkit in this space and will describe our reasons once a final decision has been made.

The Data Architecture field seems to fall between three marketing labels, with vendors deliberately blurring the boundaries in order to make their product fit whichever term has value with the customer. Tools in Data Warehousing, Extract Transform Load (ETL), and Business Intelligence (BI) all appear to have some relevance to data architectures. While Business Intelligence revolves around data reporting, it tends to be cross-marketed with data loading and transformation tools. Overall, “extract transform load” is the term most often used in this space, and the marketplace for these tools seems relatively mature.

Below is a brief summary table of our research so far. Products are listed in current order of preference, though this order can and most likely will change as we look at the tools in more depth. We have deliberately excluded tools that require their own bespoke hardware, as they are likely to be expensive and overkill for this situation.

ETL toolkits: brief survey of marketplace

Toolkit | URL | Comments
Pentaho Data Integration (Kettle) | http://www.pentaho.com/ | Has good demos and a variety of screen casts explaining how it works. Pentaho is reviewed well in a comparison of ETL tools at http://www.pentaho.com/docs/informatica_pentaho_etl_tools_comparison.pdf. A quick 2 hour test drive was somewhat disappointing: we couldn’t find a way of easily getting it to sanity check the flat file it was importing (it had blank lines at the start).
Talend | http://uk.talend.com/index.php | Talend has a data synchronizer tool and claims to have direct access to SAP. It appears to be a good match to our current data model (flat files) and the demo looks good. Gets well reviewed by InfoWorld.
Clover ETL | http://www.cloveretl.com/ | Looks viable: the ETL tool is free, but a server to run it "with higher performance" is paid for (price on asking) and a GUI designer tool is also paid for (450 USD per head).
Apatar | http://www.apatar.com/ | Looks a viable tool. Demos on the website are not closely related to our area, and it seems to focus on integration with packages like Salesforce.
Datacleaner | http://datacleaner.eobjects.org/ | Mainly focuses on cleaning data and loading it into a database; has good screen casts on their website.
Redhat (Metamatrix) | http://www.redhat.com/metamatrix/ | SOA for building data webservices; may not be that good at supporting our legacy of file based bulk transfer.
SAGA.M31 Galaxy | http://galaxy.sagadc.com/ | Terms and conditions only in German and yet to be translated, hence ruled out. Looked webservice heavy too.
Chainbuilder | http://www.chainforge.net/community | ChainBuilder ESB is a Java Business Integration (JBI) compliant product which consists of a set of Eclipse GUI plug-ins, runtime server components and a Web-based Admin console. Its primary uses are in Service Oriented Architecture (SOA) environments and Enterprise Application Integration (EAI).
Jitterbit | http://www.jitterbit.com/Product/index.php | Jitterbit is a data & application integration suite available in open source. It was developed to provide business users a quick, cost-effective and simple way to configure, test, deploy and manage integration solutions. However, it has a limited number of connectors and none for messaging.

Many big commercial vendors also have offerings in this space. SAP Netweaver Process Integration (https://www.sdn.sap.com/irj/sdn/nw-pi71) is a toolkit which supports a broad set of ETL and SOA techniques; however, the licensing model employed may rule it out of consideration as a toolkit for data loading within our institute. Tibco (http://www.tibco.com/) also have a well-regarded toolkit in this space, but again licensing may be an issue. IBM has Websphere (http://www-01.ibm.com/software/websphere/), particularly Websphere Transformation Extender. Oracle has its data warehousing product (http://www.oracle.com/technology/products/warehouse/index.html). There are many more vendors with their own offerings in this space.
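
To make concrete the kind of flat-file sanity check we were looking for in the Pentaho test drive (a file with blank lines at the start), here is a minimal Python sketch. It is not taken from any of the surveyed tools; the function name check_flat_file, the comma delimiter and the sample data are all hypothetical and only illustrate the idea.

```python
import csv
import io


def check_flat_file(lines, expected_columns, delimiter=","):
    """Skip leading blank lines, then check each row's column count.

    Hypothetical helper for illustration only. Returns (rows, problems):
    the rows that passed, and (line_number, message) pairs for the rest.
    """
    rows, problems = [], []
    started = False  # becomes True once the first non-blank line is seen
    for line_number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        if not text.strip():
            if started:
                problems.append((line_number, "blank line inside data"))
            continue  # blank lines before the first record are ignored
        started = True
        fields = next(csv.reader([text], delimiter=delimiter))
        if len(fields) != expected_columns:
            problems.append(
                (line_number, f"expected {expected_columns} fields, got {len(fields)}")
            )
        else:
            rows.append(fields)
    return rows, problems


# Made-up sample: two blank lines at the start and one short row.
sample = io.StringIO("\n\nid,name\n1,Smith\n2\n")
rows, problems = check_flat_file(sample, expected_columns=2)
print(rows)
print(problems)
```

Line numbers in the problem list count from the top of the file, including the skipped blank lines, so they can be matched against the original file in an editor.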