1/20/2010

Database Technologies for Decision Support System

Database technologies can be applied into two types of scenarios:
- Transaction Processing(OLTP)
- Analytic Processing, using statistical method(OLAP) or machine/computational learning method(Data Mining)

OLTP, which is based on E.F. Codd's relation model, is the traditional (maybe most popular) application type of DBMS and most people are very familiar with it. This post tries to summarize related technologies in analytic processing, which is widely adopted in decision support systems.

Part I - What Data to Analyze?

In Decision Support(or Business Intelligence) system, data to be analyzed usually comes from operational system, i.e., OLTP relational database. These relational databases are often located at different departments/sites, may be using different DBMS vendor's products, using different data schema and merely contain data within a relatively short time span.

To make a good business decisions, it's strongly desired to hold historical data, view them in a uniformed way and not bothering the daily operational environment. Thus comes out the Data Warehouse, which is a repository of an organization's electronically stored data and is designed to facilitate reporting and analysis.

Operational data (in OLTP system) is extracted, transformed(also cleaned) and loaded(by ETL subsystem) into the data warehouse for further analyzing. OLAP and DM systems read these data, analyze them, produce useful reports and present them to end(business) users. (See diagram below)


Decision Support System Life Cycle, based on MS SQL Tech Doc

Part II - How to Analyze?

There are two ways to analyze data in data warehouse:

1. Data Mining

Data Mining is the extraction of hidden predictive information from large databases. Someone also defines it as knowledge discovery process(using machine learning algorithms) in database.

Technical highlights:
- Typical Data Mining enabled information systems process data in terms of Record(or Case).
- Such system also provides some Language Extension to facilitate composing data mining related queries. (for example, MS SQL Server provides DMX - Data Mining Extensions)

The challenging part of data mining is various mining algorithms. Here is a list of data mining algorithms available in MS SQL Server Analysis Service.

See - A Basic Data Mining Tutorial using MS SQL Server

2. OnLine Analytical Processing(OLAP)

OLAP is the processing of large scale multidimensional data using statistical based methods. A typical OLAP system provides:
- Multidimensional Model
- Analytical Query Language (for example, MS SQL Server provides MDX - Multidimensional Expression)
- Analyzing Server(or Engine) that executes analytical queries

See - A Basic Tutorial for OLAP in MS SQL Server

2.1 Multidimensional Model

Multidimensional model view data as as cubes that generalize spreadsheets to any number of dimensions. It categorizes data either as numerical values(a.k.a. measures) associated with some facts or textual values(a.k.a. dimensions) that characterize the facts.

Facts represent the subject - the interesting event in a enterprise that need to be analyzed.
Dimensions represent context information for facts, perspectives to view facts.
Measures represent those numeric properties of facts that decision makers want to analyze.

For example, in a shoe shop, shoe purchasing events are the facts, the selling price is a measure attribute and the color, the size, the manufacture and the brand are all dimension attributes.

More tutorial explanation of this model can be found in [7][8].

2.2 OLAP Server Architecture

There are three ways to implement multidimensional model:

- ROLAP (Relational OLAP)
Fact data is stored in relation model based storage system and some special induces technologies may be adopted. In this architecture, measures are derived from the records in the fact table and dimensions are derived from the dimension tables.

This architecture can be further divided into two types:
 Star Schema
 Snowflake Schema

Both of them contains fact table and dimension table, but in star schema, there is only one table for one dimension, while in snowflake schema, there are usually multiple tables for one particular dimension.

- MOLAP (Multidimensional OLAP)
Fact data is stored in an optimized multi-dimensional array storage, i.e., the server supports the multidimensional model directly.
It's usually regarded to be faster but less scalable than ROLAP.

- HOLAP (Hybrid OLAP)
It is a combination of ROLAP and MOLAP. HOLAP allows storing part of the data in a MOLAP store and another part of the data in a ROLAP store.

Part III - Other technologies

Other related technologies include data visualization, metadata management, analytical query parallelizting etc.

Analytical database technology is very promising and also complicated, more survey paper can be found at[5][7][8].

[Reference]

General

[01] Course on DM & OLAP
[02] OLAP & Data Mining Links
[03] OLAP & Data Warehouse Bibliography
[04] A Brief Tutorial on Data Mining and OLAP

[05] An Overview of Data Warehousing and OLAP technology
[06] Providing OLAP to User-Analysts: An IT Mandate

[07] An Overview of Data Warehouse, OLAP and Data Mining Technology
[08] Multidimensional Database Technology

Data Warehousing

[21] Data Warehouse Architecture
[22] Oracle Data Warehousing Guide

Data Mining

[31] Data Mining FAQ
[32] Data Mining Introduction
[33] Data Mining Tech Summary

[34] Oracle Data Mining Concept
[35] Microsoft SQL Server Analysis Service - Data Mining

OLAP

[41] OLAP Introduction
[42] OLAP Overview
[43] OLAP Council
[44] OLAP Wiki

[45] Oracle OLAP User Guide
[46] Microsoft SQL Server Analysis Service - OLAP

1 comment:

怎麼會這樣 said...

看看blog調整心情,又要來繼續工作,大家加油.........................