Abstract
In this chapter, we introduce a configurable chip multiprocessor
architecture, TOTAL ECLIPSE, for realizing one of the most powerful parallel
random access machine (PRAM) variants, the arbitrary multioperation concurrent
read concurrent write (MCRCW) PRAM model. In addition to the standard arbitrary
concurrent read concurrent write (CRCW) PRAM, in which threads may read and
write a memory location concurrently and, in the case of a write, an arbitrary
one of the participating threads succeeds, MCRCW provides multioperations that
can, e.g., sum the values sent by all participating threads into a memory
location concurrently. The architecture is optimized for efficient execution
of programs containing enough thread-level parallelism (TLP) to hide the
latency of the intercommunication network and for co-exploitation of virtual
instruction-level parallelism (ILP) with TLP, but it can also execute programs
with low TLP efficiently by providing seamless configurability of PRAM threads
into non-uniform memory access (NUMA) bunches that combine the computational
power of two or more threads within a processor core. We will
describe the principles of PRAM realization, the integration of NUMA bunching
into TOTAL ECLIPSE operation, and the overall structure and operation of the
TOTAL ECLIPSE architecture. A performance evaluation based on executing simple
programs on a clock-accurate simulator is provided, along with silicon area
and power consumption estimates for selected TOTAL ECLIPSE chip multiprocessor
(CMP) configurations. This chapter also serves as a case-driven introduction
to novel techniques for parallel architectures that are unknown in the theory
of sequential architectures.
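
The difference between an arbitrary CRCW write and a multioperation can be illustrated with a minimal Python sketch. It models only the memory semantics of one synchronous step, not the TOTAL ECLIPSE hardware or its instruction set; the function names `arbitrary_crcw_write` and `multi_add`, and the `(thread_id, address, value)` request format, are assumptions made for this illustration.

```python
import random
from collections import defaultdict


def arbitrary_crcw_write(memory, requests, rng=random):
    """Model one arbitrary CRCW write step (hypothetical helper).

    requests: list of (thread_id, address, value) issued in the same step.
    For every address, exactly one arbitrary writer succeeds.
    Returns a dict mapping each written address to the winning thread id.
    """
    by_addr = defaultdict(list)
    for tid, addr, val in requests:
        by_addr[addr].append((tid, val))

    winners = {}
    for addr, writers in by_addr.items():
        tid, val = rng.choice(writers)  # which writer wins is unspecified
        memory[addr] = val
        winners[addr] = tid
    return winners


def multi_add(memory, requests):
    """Model one multioperation step that sums contributions (hypothetical).

    Every value sent to an address is added to that address within the
    same logical step, so no contribution is lost. The loop below only
    reproduces the result; it says nothing about how hardware combines them.
    """
    for _tid, addr, val in requests:
        memory[addr] += val
```

With eight threads each sending the values 1 through 8 to the same address, `arbitrary_crcw_write` leaves exactly one of those values in memory, whereas `multi_add` leaves their sum, 36.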
Original language | English |
---|---|
Title of host publication | Parallel and Distributed Computing |
Editors | Alberto Ros |
Place of Publication | Vienna |
Publisher | InTech |
Chapter | 3 |
Pages | 39-64 |
ISBN (Print) | 978-953-307-057-5 |
Publication status | Published - 2010 |
MoE publication type | A3 Part of a book or another research book |