A PRAM-NUMA model of computation for addressing low-TLP workloads

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

4 Citations (Scopus)

Abstract

It is possible to implement the parallel random access machine (PRAM) efficiently on a chip multiprocessor (CMP) with an emulated shared memory (ESM) architecture, gaining the easy parallel programmability that is crucial to wider penetration of CMPs into general-purpose computing. This implementation relies on exploiting the slack of parallel applications, instead of caches, to hide the latency of the memory system; on sufficient bisection bandwidth to guarantee high throughput; and on hashing to avoid hot spots in intercommunication. Unfortunately, this solution cannot handle workloads with low thread-level parallelism (TLP) efficiently, because then there is not enough parallel slackness available to hide the latency. In this paper we show that integrating nonuniform memory access (NUMA) support into the PRAM implementation architecture can solve this problem. The resulting PRAM-NUMA hybrid model is described, and an architectural implementation of it is outlined on our Eclipse ESM CMP framework.
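The abstract's argument, that parallel slackness rather than caches hides memory latency and that low-TLP workloads lack that slack, can be illustrated with a toy issue model. The Python sketch below is a hypothetical illustration, not the paper's model or the Eclipse implementation. It assumes a processor that issues at most one memory reference per cycle from interleaved threads, with every reference taking a fixed latency. The function names `utilization` and `simulate` and both latency values are assumptions chosen for illustration.

```python
"""Toy model of latency hiding through parallel slackness (illustrative only)."""


def utilization(threads: int, latency: int) -> float:
    """Closed-form fraction of issue slots filled in the toy model.

    Hypothetical assumptions: one memory reference can be issued per cycle,
    threads are interleaved, and a thread cannot issue again until its
    previous reference completes `latency` cycles later. In any window of
    `latency` cycles each thread issues at most once, so at most
    min(threads, latency) of the `latency` slots are filled.
    """
    if threads < 1 or latency < 1:
        raise ValueError("threads and latency must be positive")
    return min(threads, latency) / latency


def simulate(threads: int, latency: int, cycles: int) -> float:
    """Cycle-by-cycle check of the same toy model with round-robin issue."""
    ready_at = [0] * threads  # earliest cycle at which each thread may issue again
    issued = 0
    start = 0  # round-robin pointer: next thread to consider first
    for cycle in range(cycles):
        for k in range(threads):
            t = (start + k) % threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency  # thread stalls until its reference returns
                issued += 1
                start = (t + 1) % threads
                break  # at most one issue per cycle
    return issued / cycles


if __name__ == "__main__":
    REMOTE_LATENCY = 100  # hypothetical latency of a shared-memory access, in cycles
    LOCAL_LATENCY = 4     # hypothetical latency of a NUMA-style local access
    for t in (1, 4, 16, 100, 256):
        print(f"threads={t:3d}  shared={utilization(t, REMOTE_LATENCY):.2f}  "
              f"local={utilization(t, LOCAL_LATENCY):.2f}")
```

Under these assumed numbers, a single thread fills only 1% of the issue slots when every reference pays the 100-cycle latency, and about 100 threads are needed to saturate the processor; with the shorter assumed local latency, 4 threads suffice. The model ignores network contention and bandwidth, and it does not model the PRAM-NUMA hybrid described in the paper.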
Original language: English
Title of host publication: Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010
Subtitle of host publication: Atlanta, Georgia, USA, 19-23 April 2010
Place of publication: Piscataway, NJ, USA
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 8
ISBN (Electronic): 978-1-4244-6534-7
ISBN (Print): 978-1-4244-6533-0, 978-1-4244-6532-3
DOIs: https://doi.org/10.1109/IPDPSW.2010.5470846
Publication status: Published - 2010
MoE publication type: A4 Article in a conference publication

Fingerprint

Data storage equipment
Memory architecture
Throughput
Bandwidth

Cite this

Forsell, M. (2010). A PRAM-NUMA model of computation for addressing low-TLP workloads. In Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010: Atlanta, Georgia, USA, 19-23 April 2010. Piscataway, NJ, USA: Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/IPDPSW.2010.5470846
Forsell, Martti. / A PRAM-NUMA model of computation for addressing low-TLP workloads. Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010: Atlanta, Georgia, USA, 19-23 April 2010. Piscataway, NJ, USA: Institute of Electrical and Electronics Engineers (IEEE), 2010.
@inproceedings{97b4ed45e00441bcaa2d1905461f220a,
title = "A {PRAM-NUMA} model of computation for addressing low-{TLP} workloads",
abstract = "It is possible to implement the parallel random access machine (PRAM) efficiently on a chip multiprocessor (CMP) with an emulated shared memory (ESM) architecture, gaining the easy parallel programmability that is crucial to wider penetration of CMPs into general-purpose computing. This implementation relies on exploiting the slack of parallel applications, instead of caches, to hide the latency of the memory system; on sufficient bisection bandwidth to guarantee high throughput; and on hashing to avoid hot spots in intercommunication. Unfortunately, this solution cannot handle workloads with low thread-level parallelism (TLP) efficiently, because then there is not enough parallel slackness available to hide the latency. In this paper we show that integrating nonuniform memory access (NUMA) support into the PRAM implementation architecture can solve this problem. The resulting PRAM-NUMA hybrid model is described, and an architectural implementation of it is outlined on our Eclipse ESM CMP framework.",
author = "Martti Forsell",
year = "2010",
doi = "10.1109/IPDPSW.2010.5470846",
language = "English",
isbn = "978-1-4244-6533-0",
booktitle = "Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010",
publisher = "Institute of Electrical and Electronics Engineers (IEEE)",
address = "Piscataway, NJ, USA",

}

Forsell, M 2010, A PRAM-NUMA model of computation for addressing low-TLP workloads. In Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010: Atlanta, Georgia, USA, 19-23 April 2010. Institute of Electrical and Electronics Engineers (IEEE), Piscataway, NJ, USA. https://doi.org/10.1109/IPDPSW.2010.5470846

A PRAM-NUMA model of computation for addressing low-TLP workloads. / Forsell, Martti.

Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010: Atlanta, Georgia, USA, 19-23 April 2010. Piscataway, NJ, USA: Institute of Electrical and Electronics Engineers (IEEE), 2010.

TY  - GEN
T1  - A PRAM-NUMA model of computation for addressing low-TLP workloads
AU  - Forsell, Martti
PY  - 2010
Y1  - 2010
N2  - It is possible to implement the parallel random access machine (PRAM) efficiently on a chip multiprocessor (CMP) with an emulated shared memory (ESM) architecture, gaining the easy parallel programmability that is crucial to wider penetration of CMPs into general-purpose computing. This implementation relies on exploiting the slack of parallel applications, instead of caches, to hide the latency of the memory system; on sufficient bisection bandwidth to guarantee high throughput; and on hashing to avoid hot spots in intercommunication. Unfortunately, this solution cannot handle workloads with low thread-level parallelism (TLP) efficiently, because then there is not enough parallel slackness available to hide the latency. In this paper we show that integrating nonuniform memory access (NUMA) support into the PRAM implementation architecture can solve this problem. The resulting PRAM-NUMA hybrid model is described, and an architectural implementation of it is outlined on our Eclipse ESM CMP framework.
AB  - It is possible to implement the parallel random access machine (PRAM) efficiently on a chip multiprocessor (CMP) with an emulated shared memory (ESM) architecture, gaining the easy parallel programmability that is crucial to wider penetration of CMPs into general-purpose computing. This implementation relies on exploiting the slack of parallel applications, instead of caches, to hide the latency of the memory system; on sufficient bisection bandwidth to guarantee high throughput; and on hashing to avoid hot spots in intercommunication. Unfortunately, this solution cannot handle workloads with low thread-level parallelism (TLP) efficiently, because then there is not enough parallel slackness available to hide the latency. In this paper we show that integrating nonuniform memory access (NUMA) support into the PRAM implementation architecture can solve this problem. The resulting PRAM-NUMA hybrid model is described, and an architectural implementation of it is outlined on our Eclipse ESM CMP framework.
U2  - 10.1109/IPDPSW.2010.5470846
DO  - 10.1109/IPDPSW.2010.5470846
M3  - Conference article in proceedings
SN  - 978-1-4244-6533-0
SN  - 978-1-4244-6532-3
BT  - Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010
PB  - Institute of Electrical and Electronics Engineers (IEEE)
CY  - Piscataway, NJ, USA
ER  - 

Forsell M. A PRAM-NUMA model of computation for addressing low-TLP workloads. In: Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010: Atlanta, Georgia, USA, 19-23 April 2010. Piscataway, NJ, USA: Institute of Electrical and Electronics Engineers (IEEE); 2010. https://doi.org/10.1109/IPDPSW.2010.5470846