12/20/2010

Parallel Database for OLTP and OLAP

This is a survey of materials on parallel database products and technologies for OLTP/OLAP applications. It mainly covers the major commercial and academic efforts to build parallel DBMSs that can cope with ever-growing volumes of relational data.
 
Part I - Parallel DBMSs

1.1 Parallel Database for OLAP (Shared-Nothing/MPP)

Teradata
- Teradata Home
- Teradata DBC/1012 Paper
- NCR Teradata VS Oracle Exadata

Vertica
- Vertica Home
- The original research project: C-Store

ParAccel
- ParAccel Home
- MPP Based Architecture
- Columnar Based Storage
- Flash Based Storage

DATAllegro (now MS Madison)
- Design Choices in MPP Data Warehousing Lessons from DATAllegro V3
- Microsoft SQL Server Parallel Data Warehousing

Netezza
- Netezza Home
- Acquired by IBM
- Hadoop & Netezza: Synergy in Data Analytics (Part 1, Part 2)
- Netezza TwinFin VS Oracle Exadata (eBook, Blog)

Greenplum:
- Greenplum Home
- Combined: PostgreSQL/ZFS/MapReduce
- Acquired by EMC

Oracle Exadata:
- Exadata Home
- OLTP & OLAP Hybrid Orientation
- 1 * RAC + N * Exadata Cells (Storage Nodes) + InfiniBand Network
- Exadata Cell: Flash Cache + Disk Array + Data Filtering Logic (partial SQL execution)
- Exadata – the Sequel is a great Exadata study article
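
The "data filtering logic (partial SQL execution)" idea in the Exadata notes above can be pictured with a toy sketch: each storage cell applies the predicate and column projection locally and ships only the matching rows to the database node. This is an illustration of the concept only. The names `StorageCell` and `offloaded_scan` are made up for this example and are not Oracle APIs, and real cells also involve flash cache and disk arrays, which are ignored here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


@dataclass
class StorageCell:
    """Hypothetical storage node holding one slice of a table."""
    rows: List[Row]

    def scan(self, predicate: Callable[[Row], bool], columns: Sequence[str]) -> List[Row]:
        # Filter and project on the storage side, so only the
        # qualifying rows and requested columns leave the cell.
        return [{c: r[c] for c in columns} for r in self.rows if predicate(r)]


def offloaded_scan(cells: Sequence[StorageCell],
                   predicate: Callable[[Row], bool],
                   columns: Sequence[str]) -> List[Row]:
    """Database-node side: gather the already-filtered rows from every cell."""
    out: List[Row] = []
    for cell in cells:
        out.extend(cell.scan(predicate, columns))
    return out


# Two cells, each holding part of a (made-up) sales table.
cells = [
    StorageCell([{"id": 1, "region": "EU", "amount": 10},
                 {"id": 2, "region": "US", "amount": 25}]),
    StorageCell([{"id": 3, "region": "EU", "amount": 40}]),
]

# Roughly: SELECT id, amount FROM sales WHERE region = 'EU'
result = offloaded_scan(cells, lambda r: r["region"] == "EU", ["id", "amount"])
total = sum(r["amount"] for r in result)  # the aggregation stays on the database node
```

The design point is that the database node never sees the rows that fail the predicate, so less data has to cross the interconnect.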

IBM DB2 Data Partitioning Feature (can work with both OLAP/OLTP)
- Formerly known as DB2 Parallel Edition (A Shorter Overview)
- DB2 At a Glance - Data Partitioning Feature
- Simulating Massively Parallel Database Processing on Linux

Aster Data:
- Supercharging Analytics with SQL-MapReduce
- Aster Data brings Applications inside an MPP Database 

Misc Articles:
- What's MPP?
- Comparison of Oracle to IBM DB2 UDB and NCR Teradata for Data Warehousing
- SMP or MPP for Data Warehouse
- Dividing the data Warehousing work among MPP Nodes
- SANs vs. DAS in MPP data Warehousing
- Three ways Oracle or Microsoft could go MPP

1.2 Parallel Database for OLTP (Shared-Disk/SMP)

Oracle Real Application Cluster
- Oracle RAC Concepts
- Oracle Parallel Database Server Concepts
- Oracle RAC Case Study on 16-Node Linux Cluster

IBM DB2 for z/OS (with Sysplex Technology)
- Shared Disk and Shared Nothing for IBM DB2
- What's DB2 Data Sharing?

IBM DB2 for LUW (with pureScale Technology)
- IBM DB2 pureScale: The Next Big Thing or a Solution Looking for a Problem?
- What is DB2 pureScale?
- DB2 pureScale Scalability (section 1, section 2)

Part II - Academic Readings

2.1 Overview
1). Parallel Database Systems: The Future of High Performance Database Processing
2). Survey of Architecture of Parallel Database System
3). The Case for Shared Nothing
4). Much Ado About Shared-Nothing 

2.2 Research System
1). XPS: A High Performance Parallel Database Server
2). The Design of XPRS
3). Prototyping Bubba, A Highly Parallel Database System
4). The Gamma Database Machine Project
5). NonStop SQL, A Distributed, High-Performance, High-Availability Implementation of SQL
6). Parallel Query Processing in Shared Disk Database Systems
7). Architecture of SDC, the Super Database Computer

2.3 Commercial System
1). A Study of A Parallel Database Machine and Its Performance - The NCR/TERADATA DBC/1012
2). A Practical Implementation of the Database Machine - Teradata DBC/1012
3). DB2 Parallel Edition
4). Parallel SQL Execution in Oracle 10g
5). Shared Cache - The Future of Parallel Database
6). Cache Fusion: Extending Shared-Disk Clusters with Shared Caches

12/15/2010

Lecture Notes - AltaVista Indexing and Search Engine

On 01/18/2000, Michael Burrows gave a technical presentation at UW. In the talk, he covered the design of the AltaVista indexing system and the search engine site. The presentation is brief, but it covers many core designs and concepts that are still used in today's commercial search engine systems.

The presentation video can be found at uwtv: http://uwtv.org/programs/displayevent.aspx?rid=2123

I have recreated the slides used in his talk for further reference. I tried my best to transcribe the text and redraw the diagrams, but errors may have crept in during the process. The copyright belongs to Mike.
I think the most interesting designs are the Location Space and the ISR abstraction. The first makes it possible to store any kind of information using the inverted index mechanism, and the second solves the problem of interpreting complicated search query semantics.

That said, it is not easy to fully understand how the whole ISR system works to serve the various query semantics.
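
To make the two ideas a bit more concrete, here is a minimal sketch based on my own reading of the talk, not on AltaVista's actual code. It assumes that all documents are laid out in one integer location space, that a special end-of-document marker is indexed like an ordinary word, and that a stream reader over a word's locations offers a `seek` operation. The names `END_DOC`, `WordISR`, `build_index` and `and_query` are hypothetical, chosen only for this example.

```python
import bisect
from collections import defaultdict

END_DOC = "<enddoc>"  # hypothetical marker word, indexed like any other word


def build_index(docs):
    """Assign every word occurrence a location in one global integer space."""
    index = defaultdict(list)
    loc = 0
    for doc in docs:
        for word in doc.lower().split():
            index[word].append(loc)
            loc += 1
        index[END_DOC].append(loc)  # the document boundary is just another posting
        loc += 1
    return index


class WordISR:
    """Stream reader over the sorted locations of a single word."""

    def __init__(self, locations):
        self._locs = locations

    def seek(self, loc):
        """Return the first location >= loc, or None if the stream is exhausted."""
        i = bisect.bisect_left(self._locs, loc)
        return self._locs[i] if i < len(self._locs) else None


def and_query(index, word_a, word_b):
    """Return the numbers of the documents that contain both words."""
    ends = index[END_DOC]
    a = WordISR(index.get(word_a, []))
    b = WordISR(index.get(word_b, []))
    end = WordISR(ends)
    hits, start = [], 0  # start always points at the beginning of a document
    while True:
        la, lb = a.seek(start), b.seek(start)
        if la is None or lb is None:
            return hits
        doc_end = end.seek(min(la, lb))  # end of the document holding the earlier hit
        if max(la, lb) < doc_end:        # both hits fall inside the same document
            hits.append(bisect.bisect_left(ends, doc_end))
        start = doc_end + 1              # continue from the next document


docs = ["alta vista search", "index search engine", "alta vista index engine"]
index = build_index(docs)
both = and_query(index, "index", "engine")
single = and_query(index, "alta", "search")
```

Because document boundaries live in the same location space as the words, the AND query needs nothing beyond `seek` on three streams. As far as I understand it, that is the appeal of the design: other query operators could be expressed as further constraints over the same kind of streams.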

In the second part of his presentation, Mike discussed many aspects of running the AltaVista search engine web site. Many of these experiences and designs are still a good reference for today's Internet web applications.


[Reference]
1. http://www.searchenginehistory.com/
2. http://en.wikipedia.org/wiki/Search_engine
3. http://en.wikipedia.org/wiki/AltaVista