ETL Solutions
1. Where do we use connected and unconnected lookups?

If only one return port is needed, go for an unconnected lookup; more than one return port is not possible with an unconnected lookup. If more than one return port is needed, go for a connected lookup.

2. What are the test procedures used to check whether the data is loaded in the backend, the performance of the mapping, and the quality of the data loaded, in Informatica?

The best procedure is to use the Debugger, where we can monitor each step of the mapping and watch how the data is loaded, using breakpoints on conditions.

3. What is the difference between ETL tools and OLAP tools?

An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some data cleansing along the way. Examples: Informatica, DataStage, etc. An OLAP tool is meant for reporting: in OLAP the data is available in a multidimensional model, so you can write simple queries to extract data from the database. Examples: Business Objects, Cognos, etc. In short, ETL tools are used to extract, transform, and load the data into the data warehouse / data mart from different sources, while OLAP tools are used to analyze that data and to create cubes and reports for business analysis from the data warehouse / data mart.

4. What is ODS (operational data store)?

ODS stands for Operational Data Store. The ODS sits between the staging area and the data warehouse, and the data in the ODS is at a low level of granularity. Once data has been populated in the ODS, aggregated data is loaded into the EDW from the ODS. The ODS holds operational (transactional) data and is a source for the warehouse: data from the ODS is staged, transformed, and then moved to the data warehouse.

An updateable set of integrated operational data used for enterprise-wide tactical decision-making. It contains live data, not snapshots, and retains minimal history.

Can we look up a table from a Source Qualifier transformation, i.e. an unconnected lookup?

You cannot look up from a Source Qualifier directly. However, you can override the SQL in the Source Qualifier to join with the lookup table and perform the lookup that way.
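For instance, a minimal sketch of such a Source Qualifier SQL override, assuming hypothetical ORDERS and CUSTOMERS source tables where CUSTOMERS plays the role of the lookup table:

-- SQL override placed in the Source Qualifier's SQL Query property.
-- CUSTOMERS is joined in directly instead of using a Lookup transformation.
SELECT o.order_id,
       o.cust_id,
       o.order_amount,
       c.cust_name          -- the "looked-up" value
FROM   orders o
LEFT OUTER JOIN customers c
       ON c.cust_id = o.cust_id;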

5. What are the different lookup methods used in Informatica?

The Lookup transformation comes in two types: 1) connected and 2) unconnected.

Connected lookup: 1) receives input values directly from the pipeline; 2) can use either a dynamic or a static cache; 3) can return multiple columns; 4) supports user-defined default values.

Unconnected lookup: 1) receives input values from a :LKP expression in another transformation; 2) can use only a static cache; 3) returns a single value; 4) does not support user-defined default values.

6. What is a mapping, session, worklet, workflow, mapplet?

Mapping: represents the flow of data from sources to targets. Session: a set of instructions that tells the Informatica Server how and when to move data from sources to targets. Worklet: an object that represents a set of tasks. Workflow: the pipeline that passes or flows the data from source to target. Mapplet: a set of transformations built for reusability; it is a complete piece of logic.

7. What is the difference between Power Center & Power Mart?

Power Mart is designed for: low-range warehouses; local repositories only; mainly a desktop environment. With Power Mart we can connect to only a single repository.

Power Center is designed for: high-end warehouses; global as well as local repositories; ERP support. With Power Center we can connect to single and multiple repositories, and it is generally used in big enterprises.
8. What are the various tools? Name a few.

The various ETL tools are: Informatica, DataStage, Business Objects Data Integrator. OLAP tools include: Cognos, Business Objects.

9. What are snapshots? What are materialized views?

Materialized view: Answer 1. A materialized view is a view whose data is also physically stored. With an ordinary view the database stores only the query, and the data is fetched from the base tables each time the view is referenced; with a materialized view the result set itself is stored in a table of its own. Answer 2. A materialized view stores precalculated data; it is a physical representation and it occupies space.
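As an illustration, a minimal Oracle-style sketch, assuming a hypothetical SALES table:

-- Stores the precalculated aggregate physically; queries against the
-- materialized view read the stored rows instead of re-running the query.
CREATE MATERIALIZED VIEW mv_sales_by_region
AS
SELECT region, SUM(sale_amount) AS total_sales
FROM   sales
GROUP BY region;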

Snapshot: Answer 1. A snapshot is a table that contains the results of a query of one or more tables or views, often located on a remote database. Answer 2. A snapshot is data captured at a specific interval.

10. What is partitioning? What are the types of partitioning?

Partitioning is part of physical data warehouse design, carried out to improve performance and simplify stored-data management. Partitioning breaks a large table into smaller, independently manageable components because it: 1. reduces the work involved in adding new data; 2. reduces the work involved in purging old data. The two types of partitioning are: 1. horizontal partitioning; 2. vertical partitioning (which reduces efficiency in the context of a data warehouse). (A DDL sketch of horizontal partitioning appears after question 13 below.)

11. What are the modules in Power Mart?

1. Power Mart Designer 2. Server 3. Server Manager 4. Repository 5. Repository Manager

12. What is a staging area? Do we need it? What is the purpose of a staging area?

A staging area is a place where you hold temporary tables on the data warehouse server. Staging tables are connected to the work area or fact tables. We need a staging area to hold the data and to perform data cleansing and merging before loading the data into the warehouse. In the absence of a staging area, the data load would have to go from the OLTP system to the OLAP system directly, which would severely hamper the performance of the OLTP system. This is the primary reason for the existence of a staging area; in addition, it offers a platform for carrying out data cleansing. Depending on the complexity of the business rules, we may require a staging area; its basic purpose is to clean the OLTP source data and gather it in one place.

13. How to determine what records to extract?

The data modeler will give the ETL developer the tables that are to be extracted from the various sources. When addressing a table, some dimension key must reflect the need for a record to be extracted. Mostly it will be from the time dimension (e.g. date >= 1st of current month) or a transaction flag (e.g. Order Invoiced Status). A foolproof approach is to add an archive flag to each record, which gets reset when the record changes. For slowly changing dimensions, draw the inference based on the Type 1, 2, or 3 tables defined.
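Returning to question 10, a minimal Oracle-style sketch of horizontal (range) partitioning, assuming a hypothetical SALES table partitioned by date:

-- Each partition holds one date range; old data can be purged by
-- dropping a partition, and new data added by creating a new one.
CREATE TABLE sales (
    sale_id     NUMBER,
    sale_date   DATE,
    sale_amount NUMBER
)
PARTITION BY RANGE (sale_date) (
    PARTITION p_2008 VALUES LESS THAN (DATE '2009-01-01'),
    PARTITION p_2009 VALUES LESS THAN (DATE '2010-01-01')
);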

14. What are the various transformations available?

Transformations play an important role in a data warehouse. A transformation is applied when data is moved from source to destination, and which transformations are used depends on the criteria. The various types of transformation in Informatica are: Source Qualifier, Aggregator, Sequence Generator, Sorter, Router, Filter, Lookup, Update Strategy, Joiner, Normalizer, Expression, Rank, Stored Procedure.

15. What is a three-tier data warehouse?

A three-tier data warehouse contains three tiers: bottom tier, middle tier, and top tier. The bottom tier deals with retrieving related data from various information repositories by using SQL. The middle tier contains two types of servers: 1. ROLAP server; 2. MOLAP server. The top tier deals with presentation or visualization of the results. Put another way, the three tiers are: 1. data tier (bottom) - the database; 2. application tier (middle) - the analytical server; 3. presentation tier - the tier that interacts with the end user.

16. How can we use mapping variables in Informatica? Where do we use them?

After creating a variable, we can use it in any expression in a mapping or a mapplet. Variables can also be used in Source Qualifier filters, user-defined joins, and extract overrides, and in the expression editor of reusable transformations. Their values can change automatically between sessions.

17. What are the various methods of getting incremental records or delta records from the source systems?

Getting incremental records from source systems to the target can be done by using the incremental aggregation feature. One foolproof method is to maintain a field called 'Last Extraction Date' and then impose a condition in the code saying 'current_extraction_date > last_extraction_date'. Using mapping parameters and variables (or a Type 1 approach) we can easily define where the parameter starts and how the variable changes as deltas arrive from the OLTP systems.

18. Can we use procedural logic inside Informatica? If yes, how? If not, how can we use external procedural logic in Informatica?

We can use the External Procedure transformation to call external procedures. Both COM and Informatica procedures are supported by the External Procedure transformation.

Can we override a native SQL query within Informatica? Where do we do it? How do we do it?

We can override a SQL query in the SQL override property of a Source Qualifier.

19. What is the latest version of Power Center / Power Mart?

The latest version is 7.2.

20. How do we call shell scripts from Informatica?

You can use a Command task to call shell scripts in the following ways: 1. Standalone Command task: a Command task can be placed anywhere in the workflow or worklet to run shell commands. 2. Pre- and post-session shell command: a Command task can be called as the pre- or post-session shell command for a Session task. In short, there is a task named Command task; with it you can write or call shell scripts, DOS commands, or BAT files.

21. What are active transformations / passive transformations?

Transformations can be active or passive. An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the filter condition.

A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on the data and passes all rows through.

Active transformations: Advanced External Procedure, Aggregator, Application Source Qualifier, Filter, Joiner, Normalizer, Rank, Router, Update Strategy.

Passive transformations: Expression, External Procedure, Mapplet Input, Lookup, Sequence Generator, XML Source Qualifier, Mapplet Output.

22. When do we analyze the tables? How do we do it?

When the data in the data warehouse changes frequently, we need to analyze the tables. Analyzing tables computes or updates the table statistics, which helps the optimizer and boosts the performance of your SQL.
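A minimal sketch of how this is done on Oracle, assuming a hypothetical SALES table:

-- Computes optimizer statistics for the table (classic Oracle syntax).
ANALYZE TABLE sales COMPUTE STATISTICS;

-- On very large tables, estimating from a sample is cheaper:
ANALYZE TABLE sales ESTIMATE STATISTICS SAMPLE 10 PERCENT;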

23. Compare ETL & manual development?

There are pros and cons to both tool-based ETL and hand-coded ETL. Tool-based ETL provides maintainability, ease of development, and a graphical view of the flow; it also reduces the learning curve for the team. Hand-coded ETL is good when there is minimal transformation logic involved, and when the sources and targets are in the same environment; however, depending on the skill level of the team, it can extend the overall development time.

Can anyone please explain why and where we exactly use the lookup transformation?

You can use the Lookup transformation to perform many tasks, including:

Getting a related value. For example, your source includes an employee ID, but you want to include the employee name in your target table to make your summary data easier to read.

Performing a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value itself (such as net sales).

Updating slowly changing dimension tables. You can use a Lookup transformation to determine whether rows already exist in the target.

So the Lookup transformation is used mainly for slowly changing dimensions and for getting related values. It is generally used when some fixed data is not present in the mapping's sources but is required in the warehouse, or to compare values. Example 1: the transactional data has only a short name and a customer ID, but the business user requires the complete name (first and last), and a separate table (in either the source or the target database) holds the first and last names. Example 2: you need to compare the prices of existing goods with their previous prices (a Type 3 scenario); a lookup table containing the OLAP data is handy there.

In a real-time scenario, where is the Update Strategy transformation used? If we set DML operations in the session properties, then what is the use of the Update Strategy transformation?

We can apply update logic in two ways: 1. at the mapping level; 2. at the session level. The importance in each case is as follows. If you want to update the existing records with the same source data, you can go for session-level update logic. If you want to apply different sets of rules for updating or inserting a record, even when that record already exists in the warehouse table, you can go for a mapping-level Update Strategy transformation, typically together with a Router transformation to perform the different actions.

Example: if employee 'X1234 ' is getting a bonus, update the allowance with 10% less; if not, insert the record with the new bonus into the warehouse table.

Suppose we have some 10,000-odd records in the source system; when we load them into the target, how do we ensure that none of the 10,000 records loaded contain garbage values?

24. How do we test it? We can't check every record, as the number of records is huge.

Run SELECT COUNT(*) against both the source table and the target table and compare the results.
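A minimal sketch of such a check, assuming hypothetical SRC_ORDERS and TGT_ORDERS tables; the MINUS query goes a step beyond row counts and flags rows whose values differ:

-- Compare row counts between source and target.
SELECT (SELECT COUNT(*) FROM src_orders) AS src_count,
       (SELECT COUNT(*) FROM tgt_orders) AS tgt_count
FROM   dual;

-- Rows present in the source but missing or altered in the target.
SELECT order_id, order_amount FROM src_orders
MINUS
SELECT order_id, order_amount FROM tgt_orders;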

25. What is entity relation? How does it work with data warehousing ETL modeling?

An entity is nothing but an object; it has characteristics. We call it an entity in the logical view; the same entity is called a table in the physical view. An entity relationship is nothing but maintaining primary key / foreign key relations between tables to keep the data consistent and satisfy the normal forms. There are four types of entity relationships: 1. one-to-one; 2. one-to-many; 3. many-to-one; 4. many-to-many. In data warehouse modeling, the entity relationship is the relationship between dimension and fact tables (i.e. primary/foreign key relations between these tables). The fact table gets data from the dimension tables: it contains the primary keys of the dimension tables as foreign keys, which is how summarized data is tied to each record.
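A minimal DDL sketch of such a dimension/fact relationship, using hypothetical DIM_CUSTOMER and FACT_SALES tables:

-- Dimension table with a primary key.
CREATE TABLE dim_customer (
    cust_key  NUMBER PRIMARY KEY,
    cust_name VARCHAR2(100)
);

-- The fact table carries the dimension's primary key as a foreign key.
CREATE TABLE fact_sales (
    cust_key    NUMBER REFERENCES dim_customer (cust_key),
    sale_date   DATE,
    sale_amount NUMBER
);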

26. Where do we use connected and unconnected lookups?

If only one return port is needed, we can go for an unconnected lookup; more than one return port is not possible with an unconnected lookup. If more than one return port is needed, go for a connected lookup.

27. Explain the process of extracting data from source systems, storing it in the ODS, and how data modeling is done.

There are various ways of extracting data from source systems; for example, you can use a DATA step or an import process. It depends on your input data styles and on what kind of file or database the data resides in. Storing your data in an ODS can be done through an ODS statement, an export statement, or a FILE statement, again depending on the file and data format you want your output to be in. IDP is the portal for the display of reports, stored processes, information maps, and the whole range of things ideally required for dashboard reporting. IMS is the GUI that helps you convert your technical data and map it to business data (change names, add filters, add new columns, etc.).

28. What is the difference between ETL tools and OLAP tools?

An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some data cleansing along the way. Examples: Informatica, DataStage, etc. An OLAP tool is meant for reporting: in OLAP the data is available in a multidimensional model, so you can write simple queries to extract data from the database. Examples: Business Objects, Cognos, etc. ETL tools are used to extract the data from different sources and OLAP tools are used to analyze that data: ETL tools extract, transform, and load data into the data warehouse / data mart, while OLAP tools create cubes and reports for business analysis from the data warehouse / data mart.

29. What are the various tools? Name a few.

1) ETL tools: IBM WebSphere Information Integration (Ascential DataStage), Ab Initio, Informatica.

2) OLAP tools: Business Objects, Cognos, Hyperion, Microsoft Analysis Services, MicroStrategy.

3) Reporting tools: Business Objects (Crystal Reports), Cognos, Actuate.

30. What is the difference between Power Center & Power Mart?

Power Mart is designed for low-range warehouses, supports only local repositories, and targets mainly a desktop environment; we can connect to only a single repository. Power Center is designed for high-end warehouses, supports global as well as local repositories, and offers ERP support; we can connect to single and multiple repositories, and it is generally used in big enterprises. Informatica Power Center is used to maintain the global repository, which is not the case with Informatica Power Mart. Comparing the architectures: with Power Mart we can register only local repositories, partitioning is not available, and ERP is not supported; with Power Center repositories can be promoted to global, partitioning is available, and ERP is supported.

31. What is entity relation? How does it work with data warehousing ETL modeling?

An entity is nothing but an object; it has characteristics. We call it an entity in the logical view; the same entity is called a table in the physical view. An entity relationship is nothing but maintaining primary key / foreign key relations between tables to keep the data consistent and satisfy the normal forms. There are four types of entity relationships: 1. one-to-one; 2. one-to-many; 3. many-to-one; 4. many-to-many. In data warehouse modeling, the entity relationship is the relationship between dimension and fact tables (i.e. primary/foreign key relations between these tables). The fact table gets data from the dimension tables: it contains the primary keys of the dimension tables as foreign keys, which is how summarized data is tied to each record.

32. What is a staging area? Do we need it? What is the purpose of a staging area?

A staging area is a place where you hold temporary tables on the data warehouse server. Staging tables are connected to the work area or fact tables. We need a staging area to hold the data and to perform data cleansing and merging before loading the data into the warehouse. In the absence of a staging area, the data load would have to go from the OLTP system to the OLAP system directly, which would severely hamper the performance of the OLTP system. This is the primary reason for the existence of a staging area; in addition, it offers a platform for carrying out data cleansing. A staging area is a temporary schema used to: 1. do flat mapping, i.e. dumping all the OLTP data into it without applying any business rules (pushing data into staging takes less time because no business rules or transformations are applied); 2. perform data cleansing and validation, for example with First Logic.
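A minimal sketch of the flat-mapping step, assuming a hypothetical OLTP ORDERS table and an already-created staging table in an STG schema; no business rules are applied on the way in:

-- Dump the OLTP rows into staging as-is.
INSERT INTO stg.orders_stg
SELECT * FROM oltp.orders;

-- Cleansing happens afterwards, inside the staging area,
-- e.g. removing rows with garbage keys before the warehouse load.
DELETE FROM stg.orders_stg
WHERE  order_id IS NULL;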

33. What are active transformations / passive transformations?

An active transformation can change the number of rows it outputs, while a passive transformation does not change the number of rows and passes through the same number of rows it was given as input. For example, a Filter transformation (active) removes rows that do not meet the filter condition, while an Expression transformation (passive) performs a calculation on the data and passes all rows through.

Active transformations: Advanced External Procedure, Aggregator, Application Source Qualifier, Filter, Joiner, Normalizer, Rank, Router, Update Strategy.

Passive transformations: Expression, External Procedure, Mapplet Input, Lookup, Sequence Generator, XML Source Qualifier, Mapplet Output.

34. How do we extract SAP data using Informatica? What is ABAP? What are IDocs?

To extract SAP data: go to the Source Analyzer, click on Sources, and you will get the option 'Import from SAP'. Click on it, then give your SAP access user, client, password, and a filter criterion such as the table name (so it will take less time). After connecting, import the SAP source. One important thing: after finishing the mapping, save it and generate the ABAP code for the mapping; only then will the workflow run fine. ABAP is SAP's proprietary programming language, and the generated ABAP program runs inside SAP to extract the data; IDocs (Intermediate Documents) are SAP's standard containers for exchanging data between systems.

35. Where do we use semi-additive and non-additive facts?

Additive: a measure that can participate in arithmetic calculations using all dimensions. Example: sales profit. Semi-additive: a measure that can participate in arithmetic calculations using only some dimensions. Example: an account balance, which can be summed across accounts but not across time. Non-additive: a measure that cannot participate in arithmetic calculations using dimensions. Example: temperature.
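A short sketch of the distinction, assuming a hypothetical ACCOUNT_BALANCES fact table with DAY_ID and ACCOUNT_ID dimensions:

-- Semi-additive: summing balances ACROSS ACCOUNTS for one day is valid.
SELECT day_id, SUM(balance) AS total_balance
FROM   account_balances
GROUP BY day_id;

-- Summing the same balance ACROSS DAYS would double-count;
-- across time an average (or the closing value) is used instead.
SELECT account_id, AVG(balance) AS avg_balance
FROM   account_balances
GROUP BY account_id;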

36. What is a mapping, session, worklet, workflow, and mapplet?

Session: a session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets.

Mapplet: a set of transformations built for reusability; it is a complete piece of logic. Workflow: the pipeline that passes or flows the data from source to target. More precisely: a mapping represents the flow and transformation of data from sources to targets. A mapplet is a group of transformations that can be called within a mapping. A session is a task associated with a mapping that defines the connections and other configurations for that mapping. A workflow is a set of instructions that tells the Informatica server how to execute tasks such as commands, emails, and sessions. A worklet is an object representing a set of tasks that can be called within a workflow.


37. What is a three-tier data warehouse?

A three-tier data warehouse contains three tiers: bottom tier, middle tier, and top tier. The bottom tier deals with retrieving related data from various information repositories by using SQL. The middle tier contains two types of servers: 1. ROLAP server; 2. MOLAP server. The top tier deals with presentation or visualization of the results. Put another way, the three tiers are: 1. data tier (bottom) - the database; 2. application tier (middle) - the analytical server; 3. presentation tier - the tier that interacts with the end user.

38. What are the various methods of getting incremental records or delta records from the source systems?

Getting incremental records from source systems to the target can be done by using the incremental aggregation feature.

One foolproof method is to maintain a field called 'Last Extraction Date' and then impose a condition in the code saying 'current_extraction_date > last_extraction_date'.
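A minimal sketch of that extraction filter, assuming a hypothetical ORDERS source and an ETL_CONTROL table that records the previous run's extraction date:

-- Pull only the rows changed since the last successful extraction.
SELECT o.*
FROM   orders o
WHERE  o.last_update_date >
       (SELECT last_extraction_date
        FROM   etl_control
        WHERE  table_name = 'ORDERS');

-- After a successful load, advance the watermark.
UPDATE etl_control
SET    last_extraction_date = SYSDATE
WHERE  table_name = 'ORDERS';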

39. Compare ETL & manual development?

There are pros and cons to both tool-based ETL and hand-coded ETL. Tool-based ETL provides maintainability, ease of development, and a graphical view of the flow; it also reduces the learning curve for the team. Hand-coded ETL is good when there is minimal transformation logic involved, and when the sources and targets are in the same environment; however, depending on the skill level of the team, it can extend the overall development time.

40. Can Informatica load heterogeneous targets from heterogeneous sources?

Yes, Informatica can load heterogeneous targets from heterogeneous sources.

41. What are snapshots? What are materialized views and where do we use them? What is a materialized view log?

A materialized view is a view whose data is also physically stored: with an ordinary view the database stores only the query and fetches data from the base tables each time the view is referenced, whereas with a materialized view the result set itself is stored. In other words, a materialized view stores precalculated data; it is a physical representation and it occupies space. A materialized view log is a table that records changes to the master table, so that the materialized view can be refreshed incrementally (fast refresh) instead of being fully rebuilt. A snapshot is a table that contains the results of a query of one or more tables or views, often located on a remote database; it is data captured at a specific interval.

42. What is full load & incremental or refresh load?

By full load or one-time load we mean that all the data in the source table(s) is processed; this usually contains historical data. Once the historical data is loaded, we keep doing incremental loads to process the data that arrives after the one-time load. In other words, the full load is the entire data dump that takes place the very first time. Thereafter, to keep the target data synchronized with the source data, there are two further techniques: refresh load, where the existing data is truncated and reloaded completely; and incremental load, where the delta (the difference between target and source data) is loaded at regular intervals, with the timestamp of the previous delta load maintained. In short: full load means completely erasing the contents of one or more tables and reloading them with fresh data; incremental load means applying ongoing changes to one or more tables based on a predefined schedule. The initial load into the data warehouse is a full load; the regular loads that follow are incremental.
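A minimal SQL sketch of the two techniques, assuming hypothetical SRC_ORDERS and TGT_ORDERS tables:

-- Full / refresh load: erase and reload everything.
TRUNCATE TABLE tgt_orders;
INSERT INTO tgt_orders SELECT * FROM src_orders;

-- Incremental load: apply only the delta since the last run,
-- updating existing rows and inserting new ones.
MERGE INTO tgt_orders t
USING (SELECT * FROM src_orders
       WHERE last_update_date > DATE '2009-07-01') s  -- previous load timestamp
ON (t.order_id = s.order_id)
WHEN MATCHED THEN UPDATE SET t.order_amount = s.order_amount
WHEN NOT MATCHED THEN INSERT (order_id, order_amount)
                      VALUES (s.order_id, s.order_amount);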

43. What is the metadata extension?

Informatica allows end users and partners to extend the metadata stored in the repository by associating information with individual objects in the repository. For example, when you create a mapping, you can store your contact information with the mapping. You associate information with repository metadata using metadata extensions. Informatica client applications can contain the following types of metadata extensions:

Vendor-defined: third-party application vendors create vendor-defined metadata extensions. You can view and change the values of vendor-defined metadata extensions, but you cannot create, delete, or redefine them.

User-defined: you create user-defined metadata extensions using PowerCenter/PowerMart. You can create, edit, delete, and view user-defined metadata extensions, and you can also change their values.

For this very purpose Informatica also came out with a new product called Informatica SuperGlue. Informatica Metadata Exchange (MX) provides a set of relational views that allow easy SQL access to the Informatica metadata repository. The Repository Manager generates these views when you create or upgrade a repository. MX views provide information to help you analyze the following types of metadata stored in the repository: database definition metadata, source metadata, target metadata, mapping and transformation metadata, and session and workflow metadata.
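For example, a short query sketch against the MX views; the view and column names here (REP_ALL_MAPPINGS with SUBJECT_AREA and MAPPING_NAME) are assumed from typical PowerCenter repositories and may differ by version:

-- List mappings per repository folder (assumed MX view and columns).
SELECT subject_area,       -- repository folder
       mapping_name
FROM   rep_all_mappings
ORDER BY subject_area, mapping_name;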

SuperGlue is also a good tool to try for metadata analysis.

44. How do we call shell scripts from Informatica?

You can use a Command task to call shell scripts in the following ways: 1. Standalone Command task: a Command task can be placed anywhere in the workflow or worklet to run shell commands. 2. Pre- and post-session shell command: a Command task can be called as the pre- or post-session shell command for a Session task. In short, there is a task named Command task; with it you can write or call shell scripts, DOS commands, or BAT files.

Do we need an ETL tool? When do we go for the tools in the market?

ETL tools are meant to extract, transform, and load data into the data warehouse for decision making. Before the evolution of ETL tools, this ETL process was done

manually with SQL code created by programmers. That task was tedious and cumbersome in many cases, since it involved many resources, complex coding, and more work hours; on top of that, maintaining the code was a great challenge for the programmers. These difficulties are eliminated by ETL tools, which are very powerful and offer many advantages over the old method in all stages of the ETL process: extraction, data cleansing, data profiling, transformation, debugging, and loading into the data warehouse. To summarize: 1. ETL stands for Extraction, Transformation, Loading; 2. an ETL tool helps you extract data from different ODSs/databases; 3. if you have a requirement like this you need an ETL tool, otherwise you do not need one.

What is Informatica metadata and where is it stored?

Informatica metadata contains all the information about the source tables, target tables, and the transformations, which makes it useful and easy to perform transformations during the ETL process. The Informatica metadata is stored in the Informatica repository.


				