Prototyping the MBTAC Processor for the REPLICA CMP

    Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

    3 Citations (Scopus)

    Abstract

    Current chip multiprocessors (CMPs) have mostly been designed by replicating sequential, single-core processors and providing some support for operating them with a shared memory. As a result, they define an asynchronous computational model of threads, often require maximizing the locality of memory references to get decent performance, and feature high intercommunication overheads that make parallel programming tedious for general-purpose functionalities. Most of these problems can be eliminated by designing the processor architecture for scalable general-purpose computing from the very beginning, as is done in processors for configurable emulated shared memory (CESM) CMPs. They provide support for machine instruction-level synchronization, make use of multithreading to support latency-insensitive computation, and promote the concept of uniform synchronous shared memory for easy variable allocation and convenient data exchange. In our earlier work we proposed the first CESM architecture, TOTAL ECLIPSE, composed of early MBTAC processors making use of very low-overhead multithreading, a parallel-computing-savvy functional unit organization, support for fast synchronization between instructions and threads, and highly efficient multioperations. Unfortunately, certain key parts of these processors turned out to be hard to implement, and overall they lacked support for ordered multiprefix operations and full configurability of the CESM scheme. In this paper we introduce a new fully configurable version of the MBTAC processor for our new REPLICA CESM architecture and its first FPGA implementations. To evaluate it, we execute short test programs on it and compare it preliminarily against Intel Core i7 and DLX processors. Our FPGA design flow and testing approach are also described.
    Original language: English
    Title of host publication: Proceedings
    Subtitle of host publication: IEEE International Parallel & Distributed Processing Symposium Workshops, IPDPSW 2014
    Publisher: IEEE Institute of Electrical and Electronic Engineers
    Pages: 709-716
    ISBN (Electronic): 978-1-4799-4116-2
    ISBN (Print): 978-1-4799-4117-9
    DOI: https://doi.org/10.1109/IPDPSW.2014.82
    Publication status: Published - 2014
    MoE publication type: A4 Article in a conference publication
    Event: 28th IEEE International Parallel & Distributed Processing Symposium Workshops, IPDPSW 2014 - Phoenix, Arizona, United States
    Duration: 19 May 2014 - 23 May 2014
    Conference number: 28th

    Conference

    Conference: 28th IEEE International Parallel & Distributed Processing Symposium Workshops, IPDPSW 2014
    Abbreviated title: IPDPSW 2014
    Country: United States
    City: Phoenix, Arizona
    Period: 19/05/14 - 23/05/14

    Fingerprint

    Data storage equipment
    Memory architecture
    Field programmable gate arrays (FPGA)
    Synchronization
    Parallel programming
    Electronic data interchange
    Parallel processing systems
    Testing

    Cite this

    Forsell, M., Roivainen, J., & Leppänen, V. (2014). Prototyping the MBTAC Processor for the REPLICA CMP. In Proceedings: IEEE International Parallel & Distributed Processing Symposium Workshops, IPDPSW 2014 (pp. 709-716). IEEE Institute of Electrical and Electronic Engineers. https://doi.org/10.1109/IPDPSW.2014.82
    Forsell, Martti ; Roivainen, Jussi ; Leppänen, V. / Prototyping the MBTAC Processor for the REPLICA CMP. Proceedings: IEEE International Parallel & Distributed Processing Symposium Workshops, IPDPSW 2014. IEEE Institute of Electrical and Electronic Engineers, 2014. pp. 709-716