A PRAM-NUMA model of computation for addressing low-TLP workloads

Martti Forsell

    Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

    4 Citations (Scopus)

    Abstract

    It is possible to implement the parallel random access machine (PRAM) efficiently on a chip multiprocessor (CMP) with an emulated shared memory (ESM) architecture, gaining the easy parallel programmability that is crucial for wider penetration of CMPs into general-purpose computing. Instead of relying on caches, this implementation exploits the parallel slackness of applications to hide the latency of the memory system, uses sufficient bisection bandwidth to guarantee high throughput, and applies hashing to avoid hot spots in intercommunication. Unfortunately, this solution cannot handle workloads with low thread-level parallelism (TLP) efficiently, because such workloads do not provide enough parallel slackness to hide the latency. In this paper we show that integrating nonuniform memory access (NUMA) support into the PRAM implementation architecture can solve this problem. We describe the resulting PRAM-NUMA hybrid model and outline its architectural implementation on our Eclipse ESM CMP framework.
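    The latency-hiding argument above can be illustrated with a toy cycle model. This is a hedged sketch under stated assumptions, not the Eclipse design: the function names, the 8-cycle shared-memory latency, the 1-cycle "local" latency standing in for a NUMA-style access path, and the multiplicative hash constant are all hypothetical choices made for illustration. It shows that a processor interleaving T threads against a memory latency L stays busy only about min(1, T/L) of the time, so a single thread is badly stalled unless it can use a low-latency access path, and that hashing addresses spreads a strided pattern that plain interleaving sends to one module.

```python
"""Toy model (hypothetical, not the Eclipse design) of why ESM needs
parallel slackness and how a low-latency NUMA-style access path helps
low-TLP code. All latencies and sizes are illustrative assumptions."""


def utilization(threads: int, latency: int, cycles: int) -> float:
    """Fraction of cycles in which one processor issues an instruction.

    Assumptions: every instruction is a memory reference, the issuing
    thread must wait `latency` cycles for its reply, and ready threads
    are interleaved round-robin, one instruction per cycle.
    """
    ready_at = [0] * threads  # cycle at which each thread may issue again
    issued = 0
    start = 0  # round-robin starting point
    for cycle in range(cycles):
        for k in range(threads):
            t = (start + k) % threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency  # thread waits for its reply
                issued += 1
                start = (t + 1) % threads
                break
    return issued / cycles


def module_loads(addresses, modules: int, hashed: bool) -> list[int]:
    """Count references per memory module under plain interleaving
    (address mod modules) or a multiplicative hash. The hash is a
    stand-in; the abstract does not specify which hash ESM uses.
    `modules` must be a power of two."""
    loads = [0] * modules
    shift = 32 - (modules.bit_length() - 1)  # keep the top log2(modules) bits
    for a in addresses:
        if hashed:
            m = ((a * 0x9E3779B9) & 0xFFFFFFFF) >> shift
        else:
            m = a % modules
        loads[m] += 1
    return loads


if __name__ == "__main__":
    SHARED_LATENCY = 8  # assumed latency of the hashed shared memory
    LOCAL_LATENCY = 1   # assumed latency of a NUMA-local access
    for t in (1, 4, 8, 16):
        u = utilization(t, SHARED_LATENCY, 800)
        print(f"{t:2d} threads, shared memory: {u:.3f}")
    u = utilization(1, LOCAL_LATENCY, 800)
    print(f" 1 thread,  local memory:  {u:.3f}")

    # A stride-16 access pattern defeats mod-16 interleaving.
    stride16 = [16 * k for k in range(256)]
    print("max module load, mod:   ", max(module_loads(stride16, 16, hashed=False)))
    print("max module load, hashed:", max(module_loads(stride16, 16, hashed=True)))
```

    With these assumed numbers, one thread against the 8-cycle shared memory keeps the processor busy 0.125 of the time, four threads 0.5, and eight or more threads 1.0, while the same single thread on the 1-cycle local path reaches 1.0. The stride-16 pattern puts all 256 references on one module under mod-16 interleaving, whereas the hashed mapping spreads them across several modules.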
    Original language: English
    Title of host publication: Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum, IPDPSW 2010
    Subtitle of host publication: Atlanta, Georgia, USA, 19-23 April 2010
    Place of publication: Piscataway, NJ, USA
    Publisher: IEEE Institute of Electrical and Electronics Engineers
    Number of pages: 8
    ISBN (Electronic): 978-1-4244-6534-7
    ISBN (Print): 978-1-4244-6533-0, 978-1-4244-6532-3
    Publication status: Published, 2010
    MoE publication type: A4 Article in a conference publication
