5/29/2009

Implementing Consistency - Protocols for Data Replication and Cache Coherence

Various concrete consistency models have been described in [1], now it's time to discuss how to implement these models.

Consistency semantic is divided into two categories in [1] - "Coherence and Replication are very similar concepts and deal with the same problem, but the former is often used in hardware system (for example, in SMP Cache system) and the later is often used in software system (for example, in distributed files system and database system)".

Some other differences:
1. Coherence deals with data item at client side, while Replication deals with data item at server side
2. Coherence deals with relicated data item that has a corresponding one in a reliable storage backend, but Replication doesn't have that.

We will discuss protocols for Data Replication and Cache Coherence in different sections.

Part 1 - Data Replication

To keep replicated data item consistent in distributed storage system, there are some core problems to think about: where and how to store various replicas, what, when and how to propagate update to replicas? how to keep consistency?

I - Replica Server & Replica Content Placement
- As close to client as possible to improve performance
- Adjust repica placement dynamicly according to access history

II - General Update
Propagation Problems

1. What to Propagate?
- Notification (Invalid Message)
- Command (operation & parameters)
- Data (updated region)
- Change Log (to accumulate several updates into one)

2. When to Propagate?
- Pull (each replica asks for updates)
- Push (primary replica send updates to all secondaries)

3. How to Propagate?
- Unicasting (send updates to each replica one by one)
- Multicasting (send update message to all replicas at one time)

III - Replication Protocols

A replication protocol aims to implement some specific consistency model, the detail of the protocol is highly dependent on the target model. Here we only describe protocols which implement models that are popular in practical systems.

1. Sequential Consistency

1.1 Primary Based Protocol
- all write operations to a data item x is served by one special replica called primary replica, this replica is responsible for updating other replicas, client only interact with this special replica.

1. 2. Replicated Write Protocol - write operations are sent to each replica to execute:
- Active Replication, a total order of all write operations is required in order to make each replica execute the same order of write commands.
- Quorum Based, write operations only need to be executed on part of all replicas before return. It use votes to prevent write-write confilict and write-read conflict:
  • Suppose each data item has N replicas
  • To read a value, client must contact with at least Nr replicas
  • To write a data item, client must contact with at least Nw replicas
  • To ensure no WW and WR conflicts, Nr + Nw > N and Nw + Nw > N should be statisfied
2. Eventual Consistency Protocol

Two requirements should be meet for this kind of protocol:
- All update operations to a data item should reach and be executed all replicas at some time
- These operatioins should be executed in the same order

Some popular methods to ensure these requirements are:
- Use write set and read set (use Nw, Nr to control how many porcesses should be involved in write or read operation), thus update operations are ordered.
- Expose data item version number to client, so when client accesses data, it can pass known latest version number to server to implement some client centric consistency model
- Limit update operation execution process, so write-write conflicts can be solved easily

Note:
- All upper protocols ignore process/node & communication failure, which would occur often in practical distributed system.
- Replication protocols than deal with failure, such as Paxos, Two-Phase Commit, are much more complicated.

Part 2 - Cache Coherence

Cache Coherence Protocol is used to keep client side replicas consistent in the context that a reliable data item exists in storage backend (it's true for smp cache like hardware system and for distributed file system cache like software system).

I. Core problems of Cache Coherence Protocol:

1. Coherence Detection Strategy
That is to say, when inconsistencies are actually detected. For example, in distributed database system, this detection can be performed:
- At the beginning of a transaction
- Parallel with the on going transaction but before it commits
- When transaction commits

2. Coherence Enforcement Strategy
That is to say, how all caches are kept consistent with each other. Generally, there are two methods
- Invalidating: if a data item is modified in one cache, other caches that hold the same data item will receive a invalidation notification.
- Propagating: if a data item is modified in one cache, the update is propagated to other caches that hold the same data item. So all replicas in cache are update to the same version.

3. Cache-Server Consistency
That is to say, how to keep data items in cache and in storage server consistent with each other. There are serveral policies:
- Use Read Only Cache, modification can only be made on items in storage server, cache pull these updates
- Write Back, modifications to data items are accumulated at cache side, and are written to storage server at some other time.
- Write Through, modifications are made at cache, and are also propagated to storage server. In SMP system, bus or directory may be used to serialize all operations. While in distributed file systems, an exclusive lock may be needed for a data cache that can perform modification operations to avoid write-write confliction.

II Implementing Cache Coherence

1. Implementation Mechanism

1.1 Directory Based - In a directory-based system, some information about the data being shared is placed in a common directory. The directory acts as a filter through which the processor must ask permission to load an entry from the primary memory to its cache. When an entry is changed the directory either updates or invalidates the other caches with that entry.[2] It is not as fast as snooping but more scalable.

1.2 Snooping - In such system, the individual caches monitor address lines for accesses to memory locations that they have cached. When a write operation is observed to a location that a cache has a copy of, the cache controller invalidates or update its own copy of the snooped memory location.[2] Since it needs broadcasting, not very scalable.

2. Protocols

The various cache coherence protocols use Directory or Snooping mechanism to solve the three core problems described above:

2.1 Invalidation Protocols
- Write-Once
- Berkeley
- Illinois/MESI

2.2 Propagating Protocols
- Firefly
- Dragon

Most of the cache protocols in multiprocessors are supporting sequential consistency model, while in software distributed shared memory more popular are models supporting release consistency or weak consistency.

[Reference]
[1] http://xcybercloud.blogspot.com/2009/05/data-consistency-model-survey.html
[2] http://en.wikipedia.org/wiki/Cache_coherence
[3] http://www.nedprod.com/NedHAL/Cache%20Coherency%20solutions.html
[4] http://en.wikipedia.org/wiki/Memory_coherence
[5] Shared Memory Architecture, 2001, Chinese Higher Education Press, Weiwu Hu
([5] 共享存储系统体系结构, 2001, 中国高等教育出版社, 胡伟武)
[6] Distributed Systems - Principles and Paradigms (2e), 2006, by Tanenbaum & Steen
[7] Distributed Systems - Concepts and Design (4e), 2005, by Coulouris & Dollimore & Kindberg

No comments: