REPLICA MBTAC: multithreaded dual-mode processor

    Research output: Contribution to journal › Article › Scientific › peer-review

    1 Citation (Scopus)

    Abstract

    The prevailing trend in chip multiprocessor (CMP) design has been to replicate single-core processors. As a result, CMPs typically define an asynchronous computational model, require heavily locality-aware memory allocation, and incur high overheads in intercommunication. These properties make parallel programming very challenging and error-prone. We introduce our new dual-mode MultiBunched/Threaded Architecture with Chaining (MBTAC) processor core, the main building block of the REPLICA CMP. It provides a modern, sophisticated way of writing general-purpose parallel programs, backed by native execution support for the key concepts. These include cost-efficient synchronization at the machine-instruction level and a uniform shared global memory that makes memory allocation of data structures and data movement easy to program. MBTAC uses a low-overhead thread-context switching solution, and its functional unit organization is designed for parallel computing: it exploits inter-thread instruction-level parallelism and supports highly efficient multioperations. To evaluate our proposal, we implemented three MBTAC constellations featuring up to 2048 parallel threads on FPGA and compared them with DLX and Intel’s Core i7 processors. The results point toward high performance in communication-intensive problems, simplified parallel programmability, and a regular, implementation-friendly structure.
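
    The abstract mentions multioperations on a uniform shared memory without defining them. As a rough illustration only, the sketch below simulates one common PRAM-style multioperation, a multiprefix add, in plain sequential Python. It is not the MBTAC instruction set or the authors' implementation; the function name `multiprefix_add` and the request layout are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def multiprefix_add(memory: Dict[int, int],
                    requests: List[Tuple[int, int]]) -> List[int]:
    """Simulate one synchronous step of a multiprefix add (hypothetical helper).

    Each entry of `requests` is the (address, value) pair issued by one thread;
    list order stands in for thread order. Thread i receives the old content of
    its target word plus the contributions of all lower-numbered threads aimed
    at the same word. Afterwards every word holds its old value plus all
    contributions sent to it.
    """
    results = []
    for address, value in requests:
        current = memory.get(address, 0)
        results.append(current)            # prefix seen by this thread
        memory[address] = current + value  # accumulate the contribution
    return results


# Four hypothetical threads: three target word 0, one targets word 1.
memory = {0: 10, 1: 0}
requests = [(0, 1), (1, 5), (0, 2), (0, 3)]
prefixes = multiprefix_add(memory, requests)
print(prefixes)  # [10, 0, 11, 13]
print(memory)    # {0: 16, 1: 5}
```

    In a PRAM-style machine all threads would issue such requests within the same step and the hardware would combine them; the loop above only reproduces the result that combining would deliver, not its timing.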

    Original language: English
    Pages (from-to): 1911-1933
    Number of pages: 23
    Journal: The Journal of Supercomputing
    Volume: 74
    Issue number: 5
    DOI: 10.1007/s11227-017-2199-z
    Publication status: Published - 1 May 2018
    MoE publication type: A1 Journal article-refereed

    Fingerprint

    Storage allocation (computer)
    Thread
    Chip multiprocessors
    Parallel programming
    Parallel processing systems
    Instruction Level Parallelism
    Data structures
    Field programmable gate arrays (FPGA)
    Synchronization
    Parallel Programs
    Parallel Computing
    Locality
    Data storage equipment
    Computational Model
    Building Blocks
    Communication
    High Performance

    Keywords

    • Chaining of functional units
    • FPGA prototype
    • Multithreaded processor
    • NUMA mode
    • Parallelism
    • PRAM mode

    Cite this

    @article{5fd015fe04584ed4a8523afe54e89471,
    title = "REPLICA MBTAC: multithreaded dual-mode processor",
    keywords = "Chaining of functional units, FPGA prototype, Multithreaded processor, NUMA mode, Parallelism, PRAM mode",
    author = "Martti Forsell and Jussi Roivainen and Ville Lepp{\"a}nen",
    year = "2018",
    month = "5",
    day = "1",
    doi = "10.1007/s11227-017-2199-z",
    language = "English",
    volume = "74",
    pages = "1911--1933",
    journal = "The Journal of Supercomputing",
    issn = "0920-8542",
    publisher = "Springer",
    number = "5",

    }

    REPLICA MBTAC: multithreaded dual-mode processor. / Forsell, Martti; Roivainen, Jussi; Leppänen, Ville.

    In: The Journal of Supercomputing, Vol. 74, No. 5, 01.05.2018, p. 1911-1933.

    TY - JOUR

    T1 - REPLICA MBTAC

    T2 - multithreaded dual-mode processor

    AU - Forsell, Martti

    AU - Roivainen, Jussi

    AU - Leppänen, Ville

    PY - 2018/5/1

    Y1 - 2018/5/1

    KW - Chaining of functional units

    KW - FPGA prototype

    KW - Multithreaded processor

    KW - NUMA mode

    KW - Parallelism

    KW - PRAM mode

    UR - http://www.scopus.com/inward/record.url?scp=85038104069&partnerID=8YFLogxK

    U2 - 10.1007/s11227-017-2199-z

    DO - 10.1007/s11227-017-2199-z

    M3 - Article

    AN - SCOPUS:85038104069

    VL - 74

    SP - 1911

    EP - 1933

    JO - The Journal of Supercomputing

    JF - The Journal of Supercomputing

    SN - 0920-8542

    IS - 5

    ER -