Ask Shaku  
Ask Shaku Overview
  1. What is a Data Warehouse? How does it differ from a Database Management System (DBMS)?

  2. How do I know if my organization needs a Data Warehouse?

  3. What's involved in designing, building, and implementing a Data Warehouse?

  4. Which of the steps take the longest?

  5. What is data mining ? Is it a part of a data warehouse effort?

  6. What is OLAP?

Q 1.What is a Data Warehouse? How does it differ from a Database Management System (DBMS)?
A 1.Data Warehouse is a database that provides users with data extracted from online transaction processing systems, batch systems, and externally syndicated data.

By contrast, a DBMS is software that controls the data in a database. It provides data security, data integrity, interactive queries, interactive data-entry and updating, and data independence.
Back to top

Q 2.How do I know if my organization needs a Data Warehouse?
A 2.Ideal candidates for a Data Warehouse display three common characteristics: They operate in a highly competitive industry. They have vast amounts of data. And they are struggling with the integration of widely dispersed data. If your organization fits this profile, it could well benefit from implementing a Data Warehouse.
Back to top

Q 3.What's involved in designing, building, and implementing a Data Warehouse?
A 3.Atre Group has identified 12 distinct steps in what we call our Data Warehouse Navigator. They are:

  1. Determine users' needs

  2. Determine DBMS server platform

  3. Determine hardware platform

  4. Information and data modeling

  5. Construct metadata repository

  6. Data acquisition and cleansing

  7. Data transform, transport and populate

  8. Determine middleware connectivity

  9. Prototyping, querying and reporting

  10. Data mining

  11. Online analytic processing (OLAP)

  12. Deployment and system management
Back to top

Q 4.Which of the steps take the longest?
A 4.It may vary from organization to organization, but in most of the situations, activities surrounding Extract, Transform and Load (ETL) are the most time consuming activities. There are multiple reasons for this:

  1. Data is scattered in various disparate sources, and it is stored repeatedly in different files. Most organizations don't necessarily have decent documentation that is the authoritative source that says what is what. Identifying the correct sources of data to be used for a data warehouse is a daunting task.

  2. In most organizations, there are various versions of a so called Meta Data repository. Meta Data is data about data. Sorting out the information in the Meta Data repositories is very time consuming.

  3. Quality of data leaves a lot to be desired. Dirty data is either inaccurate data or inconsistent data. Once again, it is challenging to determine what is clean and what is dirty. In order to make this distinction, one needs to work together with business representatives who are knowledgeable in the business rules. The best business representatives are always busy. As a result, it is very difficult to get their attention.

  4. It is time consuming to identify the data needed for analysis purposes to be used in a data warehouse.

  5. The sub steps are: Extraction, Scrubbing, Reconciling, Aggregating, and summarizing the data. Each one of these sub steps is also time consuming.

And as a result, the entire process of ETL is the longest lasting step.
Back to top

Q 5.What is data mining ? Is it a part of a data warehouse effort?
A 5.Data mining is finding patterns in the data that are not easily detectable by intuition or experience. Data mining could be a part of a data warehouse effort, or it could be a separate activity.

A major difference between a data warehouse and data mining is that most times, one uses summaries while using a data warehouse; whereas for data mining, detailed level data is needed. The patterns usually get lost when the data is summarized.

In a number of organizations, data mining is considered a part of the data warehouse effort.
Back to top

Q 6.What is OLAP?
A 6.OLAP stands for Online Analytical Processing and is a technique for processing large amounts of data for the purposes of business analysis. The fundamental goal of OLAP is to exponentially improve the time it takes to query or read business data. And, therefore, fundamentally differs from operational processing, commonly referred to as OLTP (On-Line Transaction Processing), which is built to achieve better write performance. OLAP Servers process data summaries to pre-determine results of "What If" analysis. Normally, OLAP servers extract data from the data warehouse and then summarize and organize the data into multidimensional structures, commonly known as Cubes. The multidimensional data structures (or cubes) make it simple and efficient for users to formulate complex queries, arrange data on a report, switch from summary to detail data and filter or slice data into meaningful subsets.
Back to top