Wednesday, January 3, 2007

Overview of ETL

Data needs to be loaded into the data warehouse regularly so that it can serve its purpose of facilitating business analysis. To do this, data from one or more operational systems needs to be extracted and copied into the warehouse. The process of extracting data from source systems and bringing it into the data warehouse is commonly called ETL, which stands for extraction, transformation, and loading. The acronym ETL is perhaps too simplistic, because it omits the transportation phase and implies that each of the other phases of the process is distinct. Therefore ETL must not be restricted to just the three. What happens during the ETL process?
Extraction is the operation of extracting data from a source system for further use in a data warehouse environment.
Transportation is the operation of moving data from one system to another system.
In a data warehouse environment, the most common requirements for transportation are in moving data from:
A source system to a staging database or a data warehouse database
A staging database to a data warehouse
A data warehouse to a data mart
Data Transformations are often the most complex and, in terms of processing time, the most costly part of the ETL process. They can range from simple data conversions to extremely complex data scrubbing techniques
Loading is the operation of loading the transformed data into the data warehouse environment (target tables).

Issues with ETL:
Consumes 70-80% of project time:
This is the part that consumes most of the project time. Extreme careful steps must be taken to prevent overlapping of project scope, time and resources.
Heterogeneous source systems
There are various sources that can be the source systems like PeopleSoft for HR , legacy systems for registry of OLTP transactions etc.
Little or no control over source systems
Scattered source systems working is different time zones having different currencies
Different measurement unit
Data Quality


More details of each step in the next session.

No comments: