ETL Specs

Shared by: koyalsinha
Posted: 11/24/2011
scope of work template
 1   ETL tool should provide a graphical interface for creating jobs. There should be no need for writing code.
 2   ETL tool should provide parallelism of its own and not depend on the parallelism of the database.
 3   ETL tool should have a balanced optimization feature.
 4   ETL tool should be able to process data in-stream as it transfers from source to target.
 5   ETL tool should be able to process data efficiently directly from flat files, without the need to load it into the database.
 6   ETL tool should be able to run on SMP or MPP hardware.
 7   ETL tool should support pipeline parallelism.
 8   ETL tool should support shared and local containers for re-usability.
 9   ETL tool should support data sets which can preserve partitioning.
10   ETL tool should not require co-location of data sets in order to do its work.
11   ETL tool should be able to scale with separate hardware, and scalability should not be dependent on the database.
12   ETL tool should handle partitioning and parallelism independent of the data model, database layout, and source data model architecture.
13   ETL tool should support impact analysis at various levels and should not be limited to schema metadata.
14   ETL tool should provide a common metadata repository, administration and reporting.
15   ETL tool should have the option to create shared and local containers.
16   ETL tool should support range lookup in parallel jobs.
17   ETL tool should have a transformer for slowly changing dimensions out of the box.
18   ETL tool should support developer collaboration. It should open a read-only copy of a job in case the job is locked by a user, and should inform which user has locked the job.
19   ETL tool should have a monitoring tool which has options to show the complete log of job execution, and should have a scheduler to schedule jobs.
20   ETL tool should have an option to create sequence jobs, which should have options for handling conditions and errors.
21   ETL tool should have more than 400 pre-built functions and routines. This complete set of data transformation capabilities should make it easy to map data from source to target and enrich it along the way.
22   There should be an inbuilt, robust graphical palette that can help developers diagram the flow of data through their environment via simple GUI-driven drag-and-drop design components. Using this tool, the developers should benefit from a versatile scripting language, powerful debugging capabilities, and an open application programming interface (API) for leveraging external code.
23   ETL tool should have an option to deploy a job created on the development environment (fewer processors) to the production environment (more processors) without making any changes to the job.
24   ETL tool should have options to create intermediate data sets which preserve partitioned data.
25   Explain the specification of ETL/ELT/ELTL functions using pre-packaged transformation objects, accessible via an intuitive graphical user interface.
26   Availability of features like splitting data streams/multiple targets, conditional splitting, union, pivoting, de-pivoting, key lookups in memory, key lookups reusable across processes, slowly changing dimensions, and error handling within the job.
27   The data integration suite should be able to solve large-scale business problems through high-performance processing of massive data volumes. By leveraging the parallel processing capabilities of multi-processor hardware platforms, the tool should scale to satisfy the demands of ever-growing data volumes and ever-shrinking batch windows. This can minimize processing-time requirements and, by fully leveraging the parallel processing capabilities, can linearly increase the speed of data throughput for integrating massive amounts of data.
28   ETL tool should provide reporting for the jobs over the web browser. It should provide report templates.
29   ETL tool should have a resource estimator.
30   ETL tool should have a performance analysis tool which should offer options for a static model and a dynamic model.
31   ETL tool should provide a feature to search/find jobs in the tool.
32   ETL tool should have a strong feature for finding dependencies among different tables, files, and transformers.
33   ETL tool should have an option to create wrappers.
34   ETL tool should have a transformer for Dynamic RDBMS.
35   ETL tool should have data pipelining and data partitioning.
36   ETL tool should be able to run any application, such as C++ or Java code, in parallel.
37   ETL tool should have an option for RCP (run-time column propagation).
38   ETL tool should provide multi-processing and not multi-threading. This is to gain unlimited parallelism and free the developers from worrying about thread-safe code.
39   ETL tool should allow the developers to develop the jobs in such a way that they develop sequentially but deploy in parallel, in order to simplify the job development and gain maximum performance at the same time.
40   ETL tool should support large volumes and scalability (from a 30 GB/day load to a 400 GB/day load).
41   Change Data Capture option of the tool should provide the capability to propagate changes in real time without creating any staging area.
42   Change Data Capture option of the tool should have minimal impact on source systems.
43   Change Data Capture option of the tool should read the changed data from the database log files of the source (Oracle, Sybase, DB2 etc.) with minimal impact on the performance of the source system.
44   Change Data Capture option of the tool should provide bi-directional data synchronization capabilities.
45   Change Data Capture option of the tool should support transactional integrity.
46   Change Data Capture option of the tool should provide guaranteed delivery.
47   Change Data Capture option of the tool should have a logical restart point in the case of an interruption.
48   Change Data Capture should have an option for data translation while reading changes from log files in real time and sending changes to the target.
49   Change Data Capture should have an option for creating derived fields on the target while replicating data to the target.
50   Change Data Capture should have a monitoring dashboard and a GUI tool to configure the whole CDC process.
51   Change Data Capture should support continuous mirroring and periodic mirroring.
52   Change Data Capture should have an option to filter the rows to be replicated from source to target.
53   Change Data Capture should have minimal impact on the source system.
54   Change Data Capture should have an option to create audit trails of selected tables for traceability.
55   Change Data Capture should support user exits and should have options for detecting conflicts.
56   Change Data Capture should support mapping methods like adaptive apply, summarization, liveaudit and consolidation.
57   Log reader process of CDC should reside outside the database memory space.
58   Change Data Capture should have heterogeneous database support.
59   Change Data Capture should have the capability to support different versions of the database on source and target.
60   Change Data Capture should have the capability to work across different hardware platforms on source and target.
61   Change Data Capture should have basic transformation capabilities.
62   Change Data Capture should provide a GUI to start/stop processes and to monitor and configure them.
63   Data Quality tool should have a probabilistic matching engine.
64   Data Quality tool should have a graphical matching specification designer.
65   Data Quality tool should have a graphical match designer which displays the match statistics of the data.
66   Data Quality tool should have an easy-to-use GUI with an intuitive, point-and-click interface for specifying automated data quality processes - data investigation, standardization, matching, and survivorship.
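To make item 17 (a transformer for slowly changing dimensions) more concrete for evaluators, the following is a minimal Python sketch of Type 2 handling: a changed record is not overwritten, its current version is closed and a new version is added. The function name `apply_scd2`, the row layout, and the column names (`business_key`, `attrs`, `valid_from`, `valid_to`, `is_current`) are assumptions made for this illustration and are not taken from any particular ETL product.

```python
from datetime import date


def apply_scd2(dimension, incoming, load_date):
    """Return a new dimension with Type 2 changes applied.

    dimension: list of dicts in a hypothetical layout with the keys
               business_key, attrs, valid_from, valid_to, is_current.
    incoming:  dict mapping business_key -> attrs from the source extract.
    """
    # Work on copies so the caller's rows are not modified.
    result = [dict(row) for row in dimension]
    current = {row["business_key"]: row for row in result if row["is_current"]}
    for key, attrs in incoming.items():
        existing = current.get(key)
        if existing is not None and existing["attrs"] == attrs:
            continue  # unchanged record: keep the current version as is
        if existing is not None:
            # Changed record: close the old version instead of overwriting it.
            existing["valid_to"] = load_date
            existing["is_current"] = False
        # New or changed record: add a fresh current version.
        result.append({
            "business_key": key,
            "attrs": attrs,
            "valid_from": load_date,
            "valid_to": None,
            "is_current": True,
        })
    return result


# Example data (made up for illustration).
dim = [{"business_key": "C1", "attrs": {"city": "Pune"},
        "valid_from": date(2011, 1, 1), "valid_to": None, "is_current": True}]
updated = apply_scd2(
    dim,
    {"C1": {"city": "Mumbai"}, "C2": {"city": "Delhi"}},
    date(2011, 11, 24),
)
```

In this example `updated` holds three rows: the expired Pune version of C1, a new current Mumbai version of C1, and a new current row for C2. A tool that meets item 17 would be expected to provide this behaviour as a configurable stage rather than as hand-written code.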
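Item 63 asks for a probabilistic matching engine. As a rough illustration of what that means, the Python sketch below scores a pair of records in the Fellegi-Sunter style: each compared field adds log2(m/u) when the values agree and log2((1-m)/(1-u)) when they disagree, and the total is compared against two cutoffs. The field names, the m and u probabilities, and the cutoff values are all hypothetical numbers chosen for this example; a real tool would estimate them from the data.

```python
import math

# Hypothetical per-field probabilities (assumed for this example):
# m = P(field agrees | the two records are a true match)
# u = P(field agrees | the two records are not a match)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "city": (0.80, 0.20),
}


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum the log2 agreement or disagreement weight of every field."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)  # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def classify(weight, lower=0.0, upper=6.0):
    """Map a composite weight to match, clerical review, or non-match."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "review"


# Example records (made up for illustration).
a = {"surname": "Sinha", "birth_year": 1980, "city": "Pune"}
b = {"surname": "Sinha", "birth_year": 1980, "city": "Mumbai"}
c = {"surname": "Sharma", "birth_year": 1975, "city": "Pune"}
```

With these assumed numbers, `classify(match_weight(a, b))` gives "match" because agreement on two discriminating fields outweighs the differing city, while `classify(match_weight(a, c))` gives "non-match". The match statistics mentioned in item 65 would typically expose the per-field weights and the distribution of composite scores so the cutoffs can be tuned.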

						