REPLICA MBTAC: multithreaded dual-mode processor

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

The prevailing trend in the design of chip multiprocessors (CMPs) has been to replicate single-core processors. As a result, CMPs typically define an asynchronous computational model, require heavily locality-aware memory allocation, and incur high intercommunication overheads. These properties make parallel programming very challenging and error-prone. We introduce our new dual-mode MultiBunched/Threaded Architecture with Chaining (MBTAC) processor core, the main building block of the REPLICA CMP. It provides a modern, sophisticated way of writing general-purpose parallel programs, backed by native execution capabilities that realize key concepts. These include support for cost-efficient synchronization at the machine-instruction level and a uniform shared global memory that enables easy-to-program allocation of data structures and data movement. MBTAC uses a low-overhead thread-context switching solution and has a functional unit organization tailored to parallel computing, which exploits inter-thread instruction-level parallelism and supports highly efficient multioperations. To evaluate our proposal, we implemented three MBTAC constellations featuring up to 2048 parallel threads on an FPGA and compared them with DLX and Intel’s Core i7 processors. The results point toward high performance in communication-intensive problems, simplified parallel programmability, and a regular, implementation-friendly structure.
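
To make the notion of a multioperation more concrete, the following is a minimal software sketch of a multiprefix-add step as it is commonly described in the PRAM literature. It is an illustration only and not the MBTAC instruction set or the authors' implementation: the function name `multiprefix_add`, the dictionary used as shared memory, and the rule that threads are ordered by their position in the request list are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def multiprefix_add(memory: Dict[int, int],
                    requests: List[Tuple[int, int]]) -> List[int]:
    """Simulate one synchronous step of a hypothetical multiprefix-add.

    requests[i] = (address, value) is the request issued by thread i.
    Each thread receives the previous content of the location plus the
    contributions of all lower-numbered threads that target the same
    address. Afterwards the location holds the sum of all contributions.
    """
    results = []
    for address, value in requests:  # thread order is list order (assumption)
        current = memory.get(address, 0)
        results.append(current)
        memory[address] = current + value
    return results


# Four threads; three of them target the same shared location 0.
memory = {0: 10}
requests = [(0, 1), (0, 2), (0, 3), (1, 5)]
results = multiprefix_add(memory, requests)
print(results)  # per-thread prefix values
print(memory)   # final shared memory contents
```

In this sketch the loop serializes the requests; the point of hardware multioperation support is that all threads of a step can complete such a combined access together, without explicit locks in the program.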

Original language: English
Pages (from-to): 1911-1933
Number of pages: 23
Journal: The Journal of Supercomputing
Volume: 74
Issue number: 5
DOI: 10.1007/s11227-017-2199-z
Publication status: Published - 1 May 2018
MoE publication type: A1 Journal article-refereed

Fingerprint

Storage allocation (computer)
Thread
Chip multiprocessors
Parallel programming
Parallel processing systems
Instruction Level Parallelism
Data structures
Field programmable gate arrays (FPGA)
Synchronization
Parallel Programs
Parallel Computing
Locality
Data storage equipment
Computational Model
Building Blocks
Communication
High Performance

Keywords

  • Chaining of functional units
  • FPGA prototype
  • Multithreaded processor
  • NUMA mode
  • Parallelism
  • PRAM mode

Cite this

@article{5fd015fe04584ed4a8523afe54e89471,
title = "REPLICA MBTAC: multithreaded dual-mode processor",
keywords = "Chaining of functional units, FPGA prototype, Multithreaded processor, NUMA mode, Parallelism, PRAM mode",
author = "Martti Forsell and Jussi Roivainen and Ville Lepp{\"a}nen",
year = "2018",
month = "5",
day = "1",
doi = "10.1007/s11227-017-2199-z",
language = "English",
volume = "74",
pages = "1911--1933",
journal = "The Journal of Supercomputing",
issn = "0920-8542",
publisher = "Springer",
number = "5",

}

REPLICA MBTAC: multithreaded dual-mode processor. / Forsell, Martti; Roivainen, Jussi; Leppänen, Ville.

In: The Journal of Supercomputing, Vol. 74, No. 5, 01.05.2018, p. 1911-1933.

TY - JOUR

T1 - REPLICA MBTAC

T2 - multithreaded dual-mode processor

AU - Forsell, Martti

AU - Roivainen, Jussi

AU - Leppänen, Ville

PY - 2018/5/1

Y1 - 2018/5/1

KW - Chaining of functional units

KW - FPGA prototype

KW - Multithreaded processor

KW - NUMA mode

KW - Parallelism

KW - PRAM mode

UR - http://www.scopus.com/inward/record.url?scp=85038104069&partnerID=8YFLogxK

U2 - 10.1007/s11227-017-2199-z

DO - 10.1007/s11227-017-2199-z

M3 - Article

AN - SCOPUS:85038104069

VL - 74

SP - 1911

EP - 1933

JO - The Journal of Supercomputing

JF - The Journal of Supercomputing

SN - 0920-8542

IS - 5

ER -