Wednesday, December 20, 2006

Architecture of a Data warehouse

The architecture of a data warehouse is shown above.

Since a data warehouse is normally used for analyzing trends and preparing reports, the common architecture (as shown above) includes a layer for reporting.
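To illustrate what the reporting layer does, here is a minimal Python sketch that summarizes warehouse rows into monthly totals, the kind of trend report described above. The row layout, field names and figures are hypothetical. A real reporting layer would usually run queries or a reporting tool against the warehouse rather than loop over rows in memory.

```python
from collections import defaultdict

# Hypothetical warehouse rows: (month, region, sales amount).
WAREHOUSE_ROWS = [
    ("2006-10", "North", 120.0),
    ("2006-10", "South", 80.0),
    ("2006-11", "North", 150.0),
    ("2006-11", "South", 95.0),
    ("2006-12", "North", 170.0),
]


def monthly_sales_report(rows):
    """Reporting-layer sketch: total sales per month, returned in month order."""
    totals = defaultdict(float)
    for month, _region, amount in rows:
        totals[month] += amount
    # "YYYY-MM" strings sort chronologically, so sorting gives a time series.
    return sorted(totals.items())


for month, total in monthly_sales_report(WAREHOUSE_ROWS):
    print(f"{month}: {total:.2f}")
```

Reading the totals in month order is what lets a manager spot a trend, here a rise from October to November.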

1) Source Systems: Source systems are the systems that provide data to the data warehouse. Typically, source systems are OLTP systems or legacy systems used for operational purposes. There are 4 broad categories of source data:

a) Production data: Production data comes from the operational systems. An organization can have multiple operational systems, which may or may not be isolated from each other, and the data format may vary from one system to another.

b) Internal data: This data is internal to the organization, or to a department within the organization.

c) Archived data: Most organizations store older data in archives on large storage media. Because this data may be needed for reporting, it can be fed into the warehouse (temporarily) so that trends can be analyzed.

d) External data: Some data comes from external sources, e.g. weather reports, base interest rates announced by the central bank (such as the Reserve Bank of India), or news reports.
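Because production data arrives in a different format from each operational system, it has to be converted to one common format before it can be loaded. Below is a minimal Python sketch of that step, assuming two hypothetical systems: a billing system with ISO dates and a legacy system that writes pipe-delimited text with DD/MM/YYYY dates and amounts in paise. The record layout and function names are illustrative only.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class SalesRecord:
    """Common row format the warehouse expects (hypothetical)."""
    source: str          # which operational system the row came from
    customer_id: str
    sale_date: date
    amount: float        # rupees


def from_billing(row: dict) -> SalesRecord:
    """Hypothetical billing system: ISO dates, amounts as decimal strings."""
    return SalesRecord(
        source="billing",
        customer_id=row["cust"],
        sale_date=datetime.strptime(row["date"], "%Y-%m-%d").date(),
        amount=float(row["amt"]),
    )


def from_legacy(line: str) -> SalesRecord:
    """Hypothetical legacy system: pipe-delimited text, DD/MM/YYYY dates, amounts in paise."""
    cust, day, paise = line.split("|")
    return SalesRecord(
        source="legacy",
        customer_id=cust.strip(),
        sale_date=datetime.strptime(day.strip(), "%d/%m/%Y").date(),
        amount=int(paise) / 100,  # convert paise to rupees
    )


records = [
    from_billing({"cust": "C001", "date": "2006-12-01", "amt": "250.50"}),
    from_legacy("C002|15/11/2006|99900"),
]
for r in records:
    print(r)
```

Keeping the `source` field on every row preserves where each record came from, which helps when a figure in a report has to be traced back to its operational system.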

(The next post will focus on the next layers, i.e. the staging area and the data warehouse.)

Saturday, December 16, 2006

Data Warehouse

Definition: A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a format that they can understand and use in a business context.
Why?
The users want:
Data should be integrated across the enterprise
Data reporting should be uniform irrespective of how it is stored
Data should be available when they want it
Summary data has real value to the organization
Historical data holds the key to understanding data over time


Who are the users? Users are primarily managers and business people who want to analyse the data to improve their processes, and hence increase profits.

Goals of a Data warehouse:

1) It must make an organization’s information more accessible
2) It must make the organization’s information consistent
3) It must adapt itself to change
4) It must defend the organization’s data
5) It must provide data for improved decision making

1) It must make an organization’s information more accessible:

This is one of the main goals of a data warehouse. An organization can have multiple databases and systems for storing and operating on its data. A data warehouse takes input from all of these systems and creates a single source for the organization's data requirements.

2) It must make the organization’s information consistent

The data warehouse should provide a single source of data (in other words, a single version of the truth).

3) It must adapt itself to change
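One common way to arrive at a single version of the truth is to map each source system's own codes onto one agreed value during loading. Here is a minimal Python sketch, assuming two hypothetical systems ("crm" and "billing") that record the same country differently. The mapping table and function name are illustrative, not a standard API.

```python
# Hypothetical mappings: each source system spells the same country differently.
CONFORMED_COUNTRY = {
    "crm": {"IN": "India", "US": "United States"},
    "billing": {"INDIA": "India", "USA": "United States"},
}


def conform_country(source: str, raw_value: str) -> str:
    """Translate a source-specific country value into the one value the warehouse stores."""
    lookup = CONFORMED_COUNTRY.get(source, {})
    # Unmapped values are flagged rather than guessed, so they can be fixed at the source.
    return lookup.get(raw_value.strip().upper(), "Unknown")


print(conform_country("crm", "IN"))          # India
print(conform_country("billing", "india"))   # India
print(conform_country("billing", "Bharat"))  # Unknown
```

After this step, a report that counts customers by country gives the same answer no matter which system a customer was entered in.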

A data warehouse keeps growing as data accumulates, so it should be able to scale.

4) It must defend the organization’s data

The data warehouse must protect itself from unauthorized access. Updates to the data warehouse should also be subject to access control.
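The following minimal Python sketch shows the idea of separating read access from update access. It assumes two hypothetical roles: analysts, who may only read, and an ETL loader, which may also update. In practice this is usually enforced by the database's own privileges (for example SQL GRANT statements), not by application code like this.

```python
from enum import Enum, auto


class Action(Enum):
    READ = auto()
    UPDATE = auto()


# Hypothetical role-to-permission table.
PERMISSIONS = {
    "analyst": {Action.READ},
    "etl_loader": {Action.READ, Action.UPDATE},
}


class AccessDenied(Exception):
    """Raised when a role attempts an action it has not been granted."""


def check_access(role: str, action: Action) -> None:
    """Raise AccessDenied unless the role is allowed to perform the action."""
    # Unknown roles get an empty permission set, so they are denied by default.
    if action not in PERMISSIONS.get(role, set()):
        raise AccessDenied(f"role {role!r} may not {action.name.lower()}")


check_access("etl_loader", Action.UPDATE)  # allowed, returns None
try:
    check_access("analyst", Action.UPDATE)
except AccessDenied as err:
    print(err)  # role 'analyst' may not update
```

Denying by default means a role that was never configured cannot read or change warehouse data by accident.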

5) It must provide data for improved decision making

Ultimately, the data warehouse must be able to provide data for the reports that support decision making.

What I intend to do with this blog is spread the knowledge I have gained in data warehousing during my career. Comments and criticisms are welcome. Please also post your doubts, opinions and views so that, in the end, others may profit from this blog.