TOTAL ECLIPSE

An Efficient Architectural Realization of the Parallel Random Access Machine

Research output: Chapter in Book/Report/Conference proceeding; Chapter or book article; Scientific; peer-review

Abstract

In this chapter, we introduce TOTAL ECLIPSE, a configurable chip multiprocessor (CMP) architecture for realizing one of the most powerful parallel random access machine (PRAM) variants, the arbitrary multioperation concurrent read concurrent write (MCRCW) PRAM model. Like the standard arbitrary concurrent read concurrent write (CRCW) PRAM, it supports concurrent reads and writes such that, when several threads write to the same location, an arbitrary one of them succeeds. In addition, MCRCW provides multioperations that can, for example, sum the values sent by all participating threads into a memory location concurrently. The architecture is optimized for efficient execution of programs containing enough thread-level parallelism (TLP) to hide the latency of the intercommunication network, and for co-exploitation of virtual instruction-level parallelism (ILP) with TLP. It can also execute programs with low TLP efficiently, because PRAM threads can be seamlessly configured into non-uniform memory access (NUMA) bunches that combine the computational power of two or more threads within a processor core. We describe the principles of PRAM realization, the integration of NUMA bunching into TOTAL ECLIPSE operation, and the overall structure and operation of the architecture. We evaluate performance by executing simple programs on a clock-accurate simulator and give silicon area and power consumption estimates for selected TOTAL ECLIPSE CMP configurations. The chapter also serves as a case-driven introduction to novel techniques for parallel architectures that are unknown in the theory of sequential architectures.
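The difference between an arbitrary concurrent write and a summing multioperation can be illustrated with a small sequential simulation. This is only a sketch of the memory semantics described in the abstract, not of the TOTAL ECLIPSE hardware or its instruction set. The function names, the request format, and the choice to add the summed values onto the existing contents of the location are all assumptions made for illustration.

```python
import random


def arbitrary_crcw_write(memory, requests, rng=random):
    """Simulate one synchronous step of arbitrary CRCW writes.

    requests is a list of (thread_id, address, value) tuples (a
    hypothetical format). For each address, one arbitrarily chosen
    request succeeds and the rest are discarded.
    """
    by_address = {}
    for thread_id, address, value in requests:
        by_address.setdefault(address, []).append((thread_id, value))
    for address, candidates in by_address.items():
        _winner, value = rng.choice(candidates)
        memory[address] = value
    return memory


def multioperation_add(memory, requests):
    """Simulate one synchronous step of a summing multioperation.

    Assumption: the values sent by all participating threads are
    added onto the current contents of the target location.
    """
    for _thread_id, address, value in requests:
        memory[address] = memory.get(address, 0) + value
    return memory


memory = {0: 0, 1: 10}

# Four threads write different values to address 0 in the same step.
writes = [(t, 0, 100 + t) for t in range(4)]
arbitrary_crcw_write(memory, writes)

# Four threads send 1, 2, 3 and 4 to address 1 in the same step.
adds = [(t, 1, t + 1) for t in range(4)]
multioperation_add(memory, adds)
```

After the write step, address 0 holds whichever of the four values happened to win, so a program must be correct for any winner. After the multioperation, address 1 holds the combined contribution of every thread, which is what lets all threads take part in one reduction within a single step of the model.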
Original language: English
Title of host publication: Parallel and Distributed Computing
Editors: Alberto Ros
Place of Publication: Vienna
Publisher: InTech
Chapter: 3
Pages: 39-64
ISBN (Print): 978-953-307-057-5
DOIs: https://doi.org/10.5772/9446
Publication status: Published - 2010
MoE publication type: A3 Part of a book or another research book

Fingerprint

Data storage equipment
Instruction-level parallelism (ILP)
Parallel architectures
Clocks
Electric power utilization
Simulators
Silicon

Cite this

Forsell, M. (2010). TOTAL ECLIPSE: An Efficient Architectural Realization of the Parallel Random Access Machine. In A. Ros (Ed.), Parallel and Distributed Computing (pp. 39-64). Vienna: InTech. https://doi.org/10.5772/9446
@inbook{24d87c6f96ff46eb9687be6c4a84db97,
title = "TOTAL ECLIPSE: An Efficient Architectural Realization of the Parallel Random Access Machine",
author = "Martti Forsell",
year = "2010",
doi = "10.5772/9446",
language = "English",
isbn = "978-953-307-057-5",
pages = "39--64",
editor = "Alberto Ros",
booktitle = "Parallel and Distributed Computing",
publisher = "InTech",
address = "Croatia",

}
