REPLICA T7-16-128 - A 2048-threaded 16-core 7-FU chained VLIW chip multiprocessor

    Research output: Chapter in Book/Report/Conference proceeding > Conference article in proceedings > Scientific > peer-review

    5 Citations (Scopus)

    Abstract

    Processor-based solutions are becoming increasingly popular over dedicated logic and accelerators among embedded system designers because of their flexibility and programmability. Their drawbacks, weaker performance and higher power consumption, are usually compensated for with multicore and application-specific technologies. Unfortunately, these optimizations, which exploit parallelism and heterogeneity, lead in a direction that makes programming difficult and results in less flexible designs. REPLICA is VTT's effort to solve the performance and programmability problems of current multicore processors without compromising flexibility. For performance, it addresses the essentials of parallel computing (cost-efficient synchronization, high intercommunication bandwidth and latency tolerance) with a new collection of architectural techniques: multithreading, a sparse/multimesh network-on-chip and wave-based synchronization. Programmability is simplified by supporting efficient execution of synchronous parallel algorithms, and flexibility is provided by the parametric nature of the architecture, which allows for widely differing configurations. In this paper we introduce a 2048-threaded 16-core prototype of the REPLICA chip multiprocessor. The main principles of the architecture and the structure of the prototype are explained, and a preliminary comparison to current alternatives is given.
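The thread count in the title follows from the configuration name by simple arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes that `T7-16-128` encodes 7 functional units, 16 cores and 128 threads per core, a reading inferred from the title ("2048-threaded 16-core 7-FU"). The names `ReplicaConfig` and `parse_config_name` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaConfig:
    """Hypothetical container for one parametric configuration."""
    functional_units: int   # assumed meaning of the "T7" field
    cores: int              # assumed meaning of the middle field
    threads_per_core: int   # assumed meaning of the last field

    @property
    def total_threads(self) -> int:
        # Total hardware threads = cores x threads per core.
        return self.cores * self.threads_per_core


def parse_config_name(name: str) -> ReplicaConfig:
    """Parse a name of the assumed form T<FUs>-<cores>-<threads per core>."""
    match = re.fullmatch(r"T(\d+)-(\d+)-(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized configuration name: {name!r}")
    fus, cores, threads_per_core = (int(field) for field in match.groups())
    return ReplicaConfig(fus, cores, threads_per_core)


config = parse_config_name("T7-16-128")
print(config.total_threads)  # 16 * 128 = 2048
```

Under this reading, other configurations of the parametric architecture would be described by changing the three fields, for example a smaller core count with the same number of threads per core.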

    Original language: English
    Title of host publication: Conference Record of The Forty-Eighth Asilomar Conference on Signals, Systems & Computers
    Editors: Michael B. Matthews
    Publisher: IEEE Institute of Electrical and Electronic Engineers
    Pages: 1709-1713
    Number of pages: 5
    ISBN (Electronic): 978-147998297-4
    DOIs: https://doi.org/10.1109/ACSSC.2014.7094759
    Publication status: Published - 2015
    MoE publication type: Not Eligible
    Event: 48th Asilomar Conference on Signals, Systems and Computers, ACSSC 2015 - Pacific Grove, United States
    Duration: 2 Nov 2014 - 5 Nov 2014

    Conference

    Conference: 48th Asilomar Conference on Signals, Systems and Computers, ACSSC 2015
    Country: United States
    City: Pacific Grove
    Period: 2/11/14 - 5/11/14

    Fingerprint

    Synchronization
    Parallel processing systems
    Parallel algorithms
    Embedded systems
    Particle accelerators
    Electric power utilization
    Bandwidth
    Costs
    Network-on-chip

    Cite this

    Forsell, M., & Roivainen, J. (2015). REPLICA T7-16-128 - A 2048-threaded 16-core 7-FU chained VLIW chip multiprocessor. In M. B. Matthews (Ed.), Conference Record of The Forty-Eighth Asilomar Conference on Signals, Systems & Computers (pp. 1709-1713). [7094759] IEEE Institute of Electrical and Electronic Engineers. https://doi.org/10.1109/ACSSC.2014.7094759
    @inproceedings{ef50ab9360994f0ab9b9c5ec306dd0ec,
    title = "REPLICA T7-16-128 - A 2048-threaded 16-core 7-FU chained VLIW chip multiprocessor",
    author = "Martti Forsell and Jussi Roivainen",
    year = "2015",
    doi = "10.1109/ACSSC.2014.7094759",
    language = "English",
    pages = "1709--1713",
    editor = "Matthews, {Michael B.}",
    booktitle = "Conference Record of The Forty-Eighth Asilomar Conference on Signals, Systems & Computers",
    publisher = "IEEE Institute of Electrical and Electronic Engineers",
    address = "United States",

    }

    TY - GEN

    T1 - REPLICA T7-16-128 - A 2048-threaded 16-core 7-FU chained VLIW chip multiprocessor

    AU - Forsell, Martti

    AU - Roivainen, Jussi

    PY - 2015

    Y1 - 2015

    UR - http://www.scopus.com/inward/record.url?scp=84940469616&partnerID=8YFLogxK

    U2 - 10.1109/ACSSC.2014.7094759

    DO - 10.1109/ACSSC.2014.7094759

    M3 - Conference article in proceedings

    AN - SCOPUS:84940469616

    SP - 1709

    EP - 1713

    BT - Conference Record of The Forty-Eighth Asilomar Conference on Signals, Systems & Computers

    A2 - Matthews, Michael B.

    PB - IEEE Institute of Electrical and Electronic Engineers

    ER -
