ETL Specs


1. ETL tool should provide a graphical interface for creating jobs. There should be no need for writing code.
2. ETL tool should provide parallelism of its own and not depend on the parallelism of the database.
3. ETL tool should have a balanced optimization feature.
4. ETL tool should be able to process data in-stream as it transfers from source to target.
5. ETL tool should be able to process data efficiently directly from flat files, without the need to load them into the database.
6. ETL tool should be able to run on SMP or MPP hardware.
7. ETL tool should support pipeline parallelism.
8. ETL tool should support shared and local containers for re-usability.
9. ETL tool should support data sets which can preserve partitioning.
10. ETL tool should not require co-location of data sets in order to do its work.
11. ETL tool should be able to scale with separate hardware, and scalability should not be dependent on the database.
12. ETL tool should handle partitioning and parallelism independent of the data model, database layout, and source data model architecture.
13. ETL tool should support impact analysis at various levels and should not be limited to schema metadata.
14. ETL tool should provide a common metadata repository, administration and reporting.
15. ETL tool should have the option to create shared and local containers.
16. ETL tool should have parallel job range lookup.
17. ETL tool should have a transformer for slowly changing dimensions out of the box.
18. ETL tool should have developer collaboration. It should open a read-only copy of a job in case the job is locked by a user, and should inform which user has locked the job.
19. ETL tool should have a monitoring tool which has options to show the complete log of job execution, and should have a scheduler to schedule jobs.
20. ETL tool should have an option to create sequence jobs, with options for handling conditions and errors.
21. ETL tool should have more than 400 pre-built functions and routines. This complete set of data transformation capabilities should make it easy to map data from source to target and enrich it along the way.
22. There should be an inbuilt, robust graphical palette that helps developers diagram the flow of data through their environment via simple GUI-driven drag-and-drop design components. Using this tool, the developers should benefit from a versatile scripting language, powerful debugging capabilities, and an open application programming interface (API) for leveraging external code.
23. ETL tool should have an option to deploy a job created on the development environment (fewer processors) to the production environment (more processors) without making any changes to the job.
24. ETL tool should have options to create intermediate data sets which preserve partitioned data.
25. Explain the specification of ETL/ELT/ELTL functions using pre-packaged transformation objects, accessible via an intuitive graphical user interface.
26. Availability of features like splitting data streams/multiple targets, conditional splitting, union, pivoting, de-pivoting, key lookups in memory, key lookups reusable across processes, slowly changing dimensions, and error handling within a job.
27. The data integration suite should be able to solve large-scale business problems through high-performance processing of massive data volumes. By leveraging the parallel processing capabilities of multi-processor hardware platforms, the tool should scale to satisfy the demands of ever-growing data volumes and ever-shrinking batch windows. This can minimize processing-time requirements and, by fully leveraging the parallel processing capabilities, can linearly increase the speed of data throughput for integrating massive amounts of data.
28. ETL tool should provide reporting for the jobs over the web browser. It should provide report templates.
29. ETL tool should have a resource estimator.
30. ETL tool should have a performance analysis tool with options for a static model and a dynamic model.
31. ETL tool should provide a feature to search/find jobs in the tool.
32. ETL tool should have a strong feature to find dependencies of different tables, files, and transformers.
33. ETL tool should have an option to create wrappers.
34. ETL tool should have a transformer for Dynamic RDBMS.
35. ETL tool should have data pipelining and data partitioning.
36. ETL tool should be able to run any application, such as C++ code or Java code, in parallel.
37. ETL tool should have an option for RCP (run-time column propagation).
38. ETL tool should provide multi-processing and not multi-threading. This is to gain unlimited parallelism and free the developers from worrying about thread-safe code.
39. ETL tool should allow the developers to develop jobs sequentially but deploy them in parallel, in order to simplify job development and gain maximum performance at the same time.
40. ETL tool should support large volumes and scalability (30 GB/day load to 400 GB/day load).
41. Change data capture option of the tool should provide the capability to propagate changes in real time without creating any staging area.
42. Change data capture option of the tool should have minimal impact on source systems.
43. Change data capture option of the tool should read the changed data from the database log files of the source (Oracle, Sybase, DB2, etc.) with minimal impact on the performance of the source system.
44. Change data capture option of the tool should provide bi-directional data synchronization capabilities.
45. Change data capture option of the tool should support transactional integrity.
46. Change data capture option of the tool should provide guaranteed delivery.
47. Change data capture option of the tool should have a logical restart point in the case of an interruption.
48. Change data capture should have an option for data translation while reading changes from log files in real time and sending changes to the target.
49. Change data capture should have an option for creating derived fields on the target while replicating data to the target.
50. Change data capture should have a monitoring dashboard and a GUI tool to configure the whole CDC process.
51. Change data capture should support continuous and periodic mirroring.
52. Change data capture should have an option to filter the rows to be replicated from source to target.
53. Change data capture should have minimal impact on the source system.
54. Change data capture should have an option to create audit trails of selected tables for traceability.
55. Change data capture should support user exits and should have options for detecting conflicts.
56. Change data capture should support mapping methods like adaptive apply, summarization, LiveAudit and consolidation.
57. Log reader process of CDC should reside outside the database memory space.
58. Change data capture should have heterogeneous database support.
59. Change data capture should be able to support different versions of the database on source and target.
60. Change data capture should be able to support different hardware platforms on source and target.
61. Change data capture should have basic transformation capabilities.
62. Change data capture should provide a GUI to start and stop processes and to monitor and configure them.
63. Data quality tool should have a probabilistic matching engine.
64. Data quality tool should have a graphical matching specification designer.
65. Data quality tool should have a graphical match designer which displays the match statistics of the data.
66. Data quality tool should have an easy-to-use GUI with an intuitive, point-and-click interface for specifying automated data quality processes: data investigation, standardization, matching, and survivorship.
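
Requirements 9, 24 and 35 ask for key-based data partitioning in which intermediate data sets keep their partitioning. As an illustration only, the idea can be sketched in a few lines of Python. This is a minimal sketch, not how any particular ETL product implements it; the function names, the `customer` and `amount` columns, and the partition count of 3 are all hypothetical.

```python
import zlib
from collections import defaultdict


def partition_by_key(rows, key, num_partitions):
    """Assign each row to a partition using a stable hash of its key column."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 gives the same result in every process, unlike the built-in
        # hash() for strings, which is randomized per interpreter run.
        index = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions


def sum_by_key(rows, key, value):
    """Aggregate one partition (or the whole data set) by key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


# Hypothetical sample data.
rows = [
    {"customer": "A", "amount": 10},
    {"customer": "B", "amount": 5},
    {"customer": "A", "amount": 7},
    {"customer": "C", "amount": 3},
    {"customer": "B", "amount": 1},
]

partitions = partition_by_key(rows, "customer", 3)

# Each partition could be handed to a separate OS process; here they are
# aggregated one after another to keep the sketch short.
partial_totals = [sum_by_key(p, "customer", "amount") for p in partitions]

# Merging is a plain union because no key appears in more than one partition.
merged = {}
for totals in partial_totals:
    merged.update(totals)
```

Because all rows with the same key land in the same partition, a later step keyed on the same column can reuse the partitions without redistributing the data, which is the property the "preserve partitioning" requirements describe.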