Hardware Multithreading Primer

A multithreaded processor can pursue two or more threads of control in parallel within the processor pipeline. The contexts of these threads are often stored in separate on-chip register sets.

Formally speaking, CMT (Chip Multi-Threading) is a processor technology that allows multiple hardware threads of execution (also known as strands) to run on the same chip, through multiple cores per chip, multiple threads per core, or a combination of both.
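
As a quick illustration of the three routes named above, the number of strands a chip exposes is simply cores per chip times threads per core. The sketch below uses hypothetical configurations; the function name and the numbers are assumptions chosen for illustration, not taken from the references, although the last case does match the "32-way" figure in the title of [11].

```python
def hardware_threads(cores_per_chip, threads_per_core):
    """Number of hardware threads (strands) a chip exposes."""
    return cores_per_chip * threads_per_core


# Three hypothetical configurations, one for each route to CMT.
print(hardware_threads(4, 1))  # multiple cores per chip only
print(hardware_threads(1, 2))  # multiple threads per core only
print(hardware_threads(8, 4))  # a combination of both
```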

The following techniques enable hardware multithreading:

1. Multiple Cores per Chip

Chip Multi-Processing (CMP, a.k.a. multicore) is a processor technology that combines multiple processors (a.k.a. cores) on the same chip (see Figure 2 (b)).

The idea is very similar to SMP (symmetric multiprocessing), but implemented within a single chip. [10] is a widely cited paper making the case for this approach.

2. Multiple Threads per Core

2.1 Vertical Multithreading - Instructions can be issued only from a single thread in any given CPU cycle.

- Interleaved Multithreading (a.k.a. Fine-Grained Multithreading): an instruction from a different thread is fetched and fed into the execution pipeline at each processor cycle, so a context switch happens on every CPU cycle (see Figure 1 (b)).

- Blocked Multithreading (a.k.a. Coarse-Grained Multithreading): the instructions of one thread are executed successively until an event occurs in that thread that may cause a long latency (a cache miss, for example). This event induces a context switch to another thread (see Figure 1 (c)).
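
The difference between the two vertical policies can be sketched with a toy single-issue simulator. Everything here is an assumption made for illustration: the program encoding ('a' for a one-cycle instruction, 'M' for a long-latency one), the fixed MISS_LATENCY, and a zero-cost context switch. Real blocked designs typically pay a few cycles per switch.

```python
from dataclasses import dataclass

MISS_LATENCY = 3  # assumed stall, in cycles, after a long-latency instruction


@dataclass
class Thread:
    name: str
    program: str       # 'a' = 1-cycle op, 'M' = long-latency op (e.g. a cache miss)
    pc: int = 0
    ready_at: int = 0  # first cycle in which this thread may issue again

    def done(self):
        return self.pc >= len(self.program)

    def ready(self, cycle):
        return not self.done() and cycle >= self.ready_at

    def issue(self, cycle):
        op = self.program[self.pc]
        self.pc += 1
        if op == "M":
            self.ready_at = cycle + 1 + MISS_LATENCY


def simulate(programs, policy):
    """Return one character per cycle: the issuing thread, or '-' for an empty slot."""
    threads = [Thread(chr(ord("A") + i), p) for i, p in enumerate(programs)]
    n = len(threads)
    # Interleaved starts "before" thread A so that A issues in the first cycle.
    current = n - 1 if policy == "interleaved" else 0
    trace, cycle = [], 0
    while not all(t.done() for t in threads):
        if policy == "interleaved":
            # Rotate: try the thread after the one that issued last.
            order = [(current + 1 + k) % n for k in range(n)]
        else:
            # Blocked: stay on the current thread unless it cannot issue.
            order = [(current + k) % n for k in range(n)]
        chosen = next((i for i in order if threads[i].ready(cycle)), None)
        if chosen is None:
            trace.append("-")
        else:
            threads[chosen].issue(cycle)
            trace.append(threads[chosen].name)
            current = chosen
        cycle += 1
    return "".join(trace)


print(simulate(["aaMaa"], "blocked"))              # single thread: AAA---AA
print(simulate(["aaMaa", "aaaa"], "blocked"))      # AAABBBBAA
print(simulate(["aaMaa", "aaaa"], "interleaved"))  # ABABABB-AA
```

In the single-threaded trace the three '-' slots are the latency of the 'M' instruction. With a second thread loaded, the blocked policy switches only at the miss, while the interleaved policy alternates every cycle and leaves a gap only once thread B has run out of work.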

2.2 Horizontal Multithreading - Instructions can be issued from multiple threads in any given cycle.

This is known as Simultaneous Multithreading (SMT): instructions are simultaneously issued from multiple threads to the execution units of a superscalar processor. Thus, the wide superscalar instruction issue is combined with the multiple-context approach (see Figure 2 (a)).
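
A minimal sketch of horizontal issue, under simplifying assumptions: each thread offers a given number of independent instructions per cycle, and issue slots are filled in fixed thread order. The function names, the fixed priority order, and the sample numbers are all hypothetical; real SMT designs use more elaborate selection policies.

```python
def issue_cycle(width, ready_per_thread):
    """Fill up to `width` issue slots for one cycle.

    ready_per_thread[i] is how many independent instructions thread i
    could issue this cycle. Returns the owner of each filled slot.
    """
    slots = []
    for tid, ready in enumerate(ready_per_thread):
        take = min(ready, width - len(slots))
        slots.extend([f"T{tid}"] * take)
    return slots


def utilization(width, ready_by_cycle):
    """Fraction of issue slots used over a run of cycles."""
    used = sum(len(issue_cycle(width, ready)) for ready in ready_by_cycle)
    return used / (width * len(ready_by_cycle))


# One cycle on a 4-wide machine: T0 has 3 instructions ready, T1 has 2.
print(issue_cycle(4, [3, 2]))  # ['T0', 'T0', 'T0', 'T1']

# Four cycles, thread T0 alone, then T0 and T1 sharing the issue slots.
print(utilization(4, [[2], [1], [0], [3]]))              # 0.375
print(utilization(4, [[2, 1], [1, 2], [0, 3], [3, 2]]))  # 0.8125
```

With a single thread, slots stay empty whenever that thread lacks instruction-level parallelism; with two threads, instructions from the second thread fill slots in the same cycle, which is what distinguishes SMT from the vertical schemes above.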

Figure 1 - Single Thread Multiple Issue (from [3])

Figure 2 - Multiple Thread Multiple Issue (from [3])

In summary [3]:

- Unused instruction slots, which arise from latencies during the pipelined execution of single-threaded programs by a contemporary microprocessor, are filled by instructions of other threads within a multithreaded processor. The execution units are multiplexed among those thread contexts that are loaded in the register sets.

- Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple threads in each cycle. Simultaneous multithreaded processors combine the multithreading technique with a wide-issue superscalar processor to utilize a larger part of the issue bandwidth by issuing instructions from different threads simultaneously.


Superpipeline - a deeply pipelined processor technology, where the instruction pipeline is divided into a large number of stages (usually 8 or more).

Superscalar (a.k.a. multiple issue) - a processor technology where multiple instructions can be issued to the execution units in the same cycle.


[1] CMT vs CMP vs SMT
[2] Chip Multithreading: Opportunities and Challenges
[3] A Survey of Processors with Explicit Multithreading

[5] Simultaneous Multithreading - Maximizing On-Chip Parallelism
[6] Converting TLP to ILP via Simultaneous Multithreading
[7] Simultaneous Multithreading - A Platform for Next-Generation Processors

[10] The Case for a Single-Chip Multiprocessor

Case Studies
[11] Niagara: A 32-Way Multithreaded SPARC Processor
[12] Niagara2: A Highly Threaded Server-on-a-Chip

SMT Research Group
[15] http://www.cs.washington.edu/research/smt/

Multicore Computing Course
[20] http://www.cs.rice.edu/~johnmc/comp522/
