A PRAM-NUMA model of computation for addressing low-TLP workloads

    Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

    4 Citations (Scopus)

    Abstract

    The parallel random access machine (PRAM) can be implemented efficiently on a chip multiprocessor (CMP) with an emulated shared memory (ESM) architecture, giving the easy parallel programmability that is crucial to wider penetration of CMPs into general-purpose computing. This implementation relies on exploiting the slack of parallel applications, instead of caches, to hide the latency of the memory system; on sufficient bisection bandwidth to guarantee high throughput; and on hashing to avoid hot spots in intercommunication. Unfortunately, this solution cannot handle workloads with low thread-level parallelism (TLP) efficiently, because there is then not enough parallel slackness available to hide the latency. In this paper we show that integrating nonuniform memory access (NUMA) support into the PRAM implementation architecture can solve this problem. The resulting PRAM-NUMA hybrid model is described, and its architectural implementation is outlined on our Eclipse ESM CMP framework.
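The latency-hiding argument in the abstract can be illustrated with a toy model. The sketch below is not taken from the paper: it assumes a processor that interleaves its threads round-robin, with each thread waiting a fixed number of cycles for every memory reference, so that utilization is roughly min(1, threads / latency). The function name `utilization` and both latency values are hypothetical, chosen only to show why few threads cannot hide a long shared-memory latency while a short local (NUMA-style) latency needs far less slack.

```python
def utilization(threads: int, latency: int) -> float:
    """Fraction of cycles spent on useful work in a toy interleaving model.

    Assumption (not from the paper): each thread issues one memory
    reference and then waits `latency` cycles, while the processor
    switches to another ready thread every cycle.
    """
    if threads < 1 or latency < 1:
        raise ValueError("threads and latency must be positive")
    return min(1.0, threads / latency)


if __name__ == "__main__":
    shared_latency = 64  # hypothetical emulated-shared-memory latency, in cycles
    local_latency = 2    # hypothetical local-memory latency, in cycles
    for t in (1, 8, 64, 512):
        print(
            f"{t:4d} threads: "
            f"shared {utilization(t, shared_latency):.3f}, "
            f"local {utilization(t, local_latency):.3f}"
        )
```

Under these assumed numbers, a single thread keeps the processor busy only a small fraction of the time when every reference pays the shared-memory latency, which is the low-TLP problem the abstract describes.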
    Original language: English
    Title of host publication: Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010
    Subtitle of host publication: Atlanta, Georgia, USA, 19-23 April 2010
    Place of Publication: Piscataway, NJ, USA
    Publisher: IEEE Institute of Electrical and Electronic Engineers
    Number of pages: 8
    ISBN (Electronic): 978-1-4244-6534-7
    ISBN (Print): 978-1-4244-6533-0, 978-1-4244-6532-3
    DOIs: https://doi.org/10.1109/IPDPSW.2010.5470846
    Publication status: Published - 2010
    MoE publication type: A4 Article in a conference publication

    Fingerprint

    Data storage equipment
    Memory architecture
    Throughput
    Bandwidth

    Cite this

    Forsell, M. (2010). A PRAM-NUMA model of computation for addressing low-TLP workloads. In Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010: Atlanta, Georgia, USA, 19-23 April 2010. Piscataway, NJ, USA: IEEE Institute of Electrical and Electronic Engineers. https://doi.org/10.1109/IPDPSW.2010.5470846
    @inproceedings{97b4ed45e00441bcaa2d1905461f220a,
    title = "A PRAM-NUMA model of computation for addressing low-TLP workloads",
    author = "Martti Forsell",
    year = "2010",
    doi = "10.1109/IPDPSW.2010.5470846",
    language = "English",
    isbn = "978-1-4244-6533-0",
    booktitle = "Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010",
    publisher = "IEEE Institute of Electrical and Electronic Engineers",
    address = "United States",
    }
