A PRAM-NUMA model of computation for addressing low-TLP workloads

Research output: Contribution to journal › Article › Scientific › peer-reviewed

Abstract

The parallel random access machine (PRAM) can be implemented efficiently on a chip multiprocessor (CMP) with an emulated shared memory (ESM) architecture, providing the easy parallel programmability that is crucial for wider penetration of CMPs into general-purpose computing. This implementation relies on exploiting the slack of parallel applications, rather than caches, to hide the latency of the memory system; on sufficient bisection bandwidth to guarantee high throughput; and on hashing to avoid hot spots in intercommunication. Unfortunately, this solution cannot handle workloads with low thread-level parallelism (TLP) efficiently, because there is then not enough parallel slack available to hide the latency. In this paper we show that integrating non-uniform memory access (NUMA) support into the PRAM implementation architecture can solve this problem and provide a natural migration path for legacy code written for sequential or multi-core NUMA machines. The resulting PRAM-NUMA hybrid model is defined, and its architectural implementation is outlined on our ECLIPSE ESM CMP framework. A high-level programming language example is given.
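The slack argument in the abstract can be made concrete with a toy model (a sketch, not code or figures from the paper): an ESM processor interleaves T threads round-robin, so a memory reference issued by one thread overlaps with the instructions of the other T − 1 threads; if the memory latency is L cycles, the latency is fully hidden whenever T ≥ L, and with low TLP (T < L) the pipeline stalls. The latency value and thread counts below are illustrative assumptions.

```python
def pipeline_utilization(threads: int, mem_latency: int) -> float:
    """Fraction of cycles doing useful work when every instruction is a
    memory reference taking mem_latency cycles to complete.

    Each thread can issue at most once per max(threads, mem_latency)
    cycles: either the round-robin period (threads) or the outstanding
    reference (mem_latency) limits it, whichever is larger.
    """
    return min(1.0, threads / mem_latency)

if __name__ == "__main__":
    L = 32  # assumed network + memory round-trip latency, in cycles
    for t in (1, 4, 32, 128):
        print(f"{t:3d} threads -> utilization {pipeline_utilization(t, L):.2f}")
```

With T = 32 or more threads the 32-cycle latency is fully hidden (utilization 1.00), while a single thread achieves only ~3% utilization; this is the low-TLP regime the paper's NUMA extension targets.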
Original language: English
Pages (from-to): 21-35
Number of pages: 15
Journal: International Journal of Networking and Computing
Volume: 1
Issue number: 1
Publication status: Published - 2011
MoE publication type: A1 Journal article-refereed
Event: 12th Workshop on Advances in Parallel and Distributed Computational Models (APDCM) - Atlanta, United States
Duration: 19 Apr 2010 - 23 Apr 2010

Keywords

  • Parallel computing
  • Computational models
  • Thread-level parallelism
  • PRAM
  • NUMA

Cite this

@article{202c19a6ac9a4bb88a8334c6ddab6680,
title = "A PRAM-NUMA model of computation for addressing low-TLP workloads",
abstract = "The parallel random access machine (PRAM) can be implemented efficiently on a chip multiprocessor (CMP) with an emulated shared memory (ESM) architecture, providing the easy parallel programmability that is crucial for wider penetration of CMPs into general-purpose computing. This implementation relies on exploiting the slack of parallel applications, rather than caches, to hide the latency of the memory system; on sufficient bisection bandwidth to guarantee high throughput; and on hashing to avoid hot spots in intercommunication. Unfortunately, this solution cannot handle workloads with low thread-level parallelism (TLP) efficiently, because there is then not enough parallel slack available to hide the latency. In this paper we show that integrating non-uniform memory access (NUMA) support into the PRAM implementation architecture can solve this problem and provide a natural migration path for legacy code written for sequential or multi-core NUMA machines. The resulting PRAM-NUMA hybrid model is defined, and its architectural implementation is outlined on our ECLIPSE ESM CMP framework. A high-level programming language example is given.",
keywords = "Parallel computing, Computational models, Thread-level parallelism, PRAM, NUMA",
author = "Martti Forsell",
year = "2011",
language = "English",
volume = "1",
pages = "21--35",
journal = "International Journal of Networking and Computing",
issn = "2185-2839",
publisher = "Hiroshima University",
number = "1",
}

A PRAM-NUMA model of computation for addressing low-TLP workloads. / Forsell, Martti.

In: International Journal of Networking and Computing, Vol. 1, No. 1, 2011, p. 21-35.
