REPLICA T7-16-128 - A 2048-threaded 16-core 7-FU chained VLIW chip multiprocessor

Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceedings; Scientific; peer-review

5 Citations (Scopus)

Abstract

Processor-based solutions are becoming increasingly popular over dedicated logic/accelerators among embedded system designers because of their flexibility and programmability. Their drawbacks - weaker performance and higher power consumption - are usually compensated for with multicore and application-specific technologies. Unfortunately, these optimizations - exploiting parallelism and heterogeneity - lead in a direction that makes programming difficult and results in less flexible designs. REPLICA is VTT's effort to solve the performance and programmability problems of current multicore processors without compromising flexibility. For performance, it addresses the essence of parallel computing - cost-efficient synchronization, high intercommunication bandwidth and latency tolerance - with a new collection of architectural techniques: multithreading, a sparse/multimesh network-on-chip and wave-based synchronization. Programmability is simplified by supporting efficient execution of synchronous parallel algorithms, and flexibility is provided by the parametric nature of the architecture, which allows for widely differing configurations. In this paper we introduce a 2048-threaded 16-core prototype of the REPLICA chip multiprocessor. The main principles of the architecture and the structure of the prototype are explained, and a preliminary comparison to current alternatives is given.

Original language: English
Title of host publication: Conference Record of The Forty-Eighth Asilomar Conference on Signals, Systems & Computers
Editors: Michael B. Matthews
Publisher: IEEE Institute of Electrical and Electronic Engineers
Pages: 1709-1713
Number of pages: 5
ISBN (Electronic): 978-147998297-4
DOIs: https://doi.org/10.1109/ACSSC.2014.7094759
Publication status: Published - 2015
MoE publication type: Not Eligible
Event: 48th Asilomar Conference on Signals, Systems and Computers, ACSSC 2015 - Pacific Grove, United States
Duration: 2 Nov 2014 to 5 Nov 2014

Conference

Conference: 48th Asilomar Conference on Signals, Systems and Computers, ACSSC 2015
Country: United States
City: Pacific Grove
Period: 2/11/14 to 5/11/14

Fingerprint

Synchronization
Parallel processing systems
Parallel algorithms
Embedded systems
Particle accelerators
Electric power utilization
Bandwidth
Costs
Network-on-chip

Cite this

Forsell, M., & Roivainen, J. (2015). REPLICA T7-16-128 - A 2048-threaded 16-core 7-FU chained VLIW chip multiprocessor. In M. B. Matthews (Ed.), Conference Record of The Forty-Eighth Asilomar Conference on Signals, Systems & Computers (pp. 1709-1713). [7094759] IEEE Institute of Electrical and Electronic Engineers. https://doi.org/10.1109/ACSSC.2014.7094759
@inproceedings{ef50ab9360994f0ab9b9c5ec306dd0ec,
title = "REPLICA T7-16-128 - A 2048-threaded 16-core 7-FU chained VLIW chip multiprocessor",
author = "Martti Forsell and Jussi Roivainen",
year = "2015",
doi = "10.1109/ACSSC.2014.7094759",
language = "English",
pages = "1709--1713",
editor = "Matthews, {Michael B.}",
booktitle = "Conference Record of The Forty-Eighth Asilomar Conference on Signals, Systems \& Computers",
publisher = "IEEE Institute of Electrical and Electronic Engineers",
address = "United States",
}

TY - GEN

T1 - REPLICA T7-16-128 - A 2048-threaded 16-core 7-FU chained VLIW chip multiprocessor

AU - Forsell, Martti

AU - Roivainen, Jussi

PY - 2015

Y1 - 2015

UR - http://www.scopus.com/inward/record.url?scp=84940469616&partnerID=8YFLogxK

U2 - 10.1109/ACSSC.2014.7094759

DO - 10.1109/ACSSC.2014.7094759

M3 - Conference article in proceedings

AN - SCOPUS:84940469616

SP - 1709

EP - 1713

BT - Conference Record of The Forty-Eighth Asilomar Conference on Signals, Systems & Computers

A2 - Matthews, Michael B.

PB - IEEE Institute of Electrical and Electronic Engineers

ER -
