7/11/2007

Hash Index in Microsoft SQL Server

In fact, there is no real Hash Index (index that implemented using hash mechanism ONLY)in MS Sql Server, all indices are b-tree based.

But when you use some relatively long string field in search condition or join condition, there is a way to improve the query performance. This technique uses both Hash and Index concepts, so it is named as Hash Index in SQL Server Book Online.

This technique involved two steps:
1. Add a numeric column to the existing table, which is is hash value of the string field

CREATE TABLE website(
ip nvarchar(16),
site_uri nvarchar(1024) NOT NULL,
uri_hash AS (checksum([site_uri])),
);

2. Build a non-clustered, non-unique index on this numeric field

CREATE INDEX idx_uri_hash ON
website(uri_hash) INCLUDE (site_uri);

Later when you do some equal comparison on this field, you can compose your query like:
SELECT ip
FROM website
WHERE uri_hash = CHECKSUM(N'http://www.live.com')
AND site_uri = N'http://www.live.com'

Sql Server Storage Engine will use the hash index first, and only do string comparison on the records returned from previous step.

No comments: