TOTAL ECLIPSE: An Efficient Architectural Realization of the Parallel Random Access Machine

    Research output: Chapter in Book/Report/Conference proceeding, Chapter or book article, Scientific, peer-review

    Abstract

    In this chapter, we introduce TOTAL ECLIPSE, a configurable chip multiprocessor (CMP) architecture for realizing one of the most powerful parallel random access machine (PRAM) variants, the arbitrary multioperation concurrent read concurrent write (MCRCW) PRAM model. The standard arbitrary concurrent read concurrent write (CRCW) PRAM supports concurrent reads and writes to the same memory location; when several threads write concurrently, an arbitrary one of them succeeds. In addition, MCRCW provides multioperations that can, for example, concurrently sum the values sent by all participating threads into a single memory location. The architecture is optimized for efficient execution of programs that contain enough thread-level parallelism (TLP) to hide the latency of the intercommunication network, and for co-exploitation of virtual instruction-level parallelism (ILP) with TLP. It can also execute programs with low TLP efficiently by allowing PRAM threads to be configured seamlessly into non-uniform memory access (NUMA) bunches, each of which combines the computational power of two or more threads within a processor core. We describe the principles of PRAM realization, the integration of NUMA bunching into TOTAL ECLIPSE operation, and the overall structure and operation of the architecture. We evaluate performance by executing simple programs on a clock-accurate simulator and give silicon area and power consumption estimates for selected TOTAL ECLIPSE CMP configurations. The chapter also serves as a case-driven introduction to novel techniques for parallel architectures that are unknown in the theory of sequential architectures.
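
    The memory semantics named in the abstract can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the chapter's hardware mechanism: the function name pram_step, the (thread, address, value) tuple layout, and the choice to accumulate the multioperation sum onto the existing memory content are all hypothetical. It shows one synchronous step in which concurrent plain writes to an address are resolved by letting an arbitrary writer win, while a multi-add combines every participating thread's value.

```python
import random
from collections import defaultdict


def pram_step(memory, writes, multi_adds, rng=None):
    """Apply one synchronous step of a toy arbitrary MCRCW PRAM.

    memory:     dict mapping address -> value (left unmodified)
    writes:     list of (thread_id, address, value) plain concurrent writes
    multi_adds: list of (thread_id, address, value) multi-add contributions

    Assumption of this sketch: within one step, plain writes and
    multi-adds target disjoint addresses.
    """
    rng = rng or random.Random()
    write_addrs = {addr for _, addr, _ in writes}
    madd_addrs = {addr for _, addr, _ in multi_adds}
    if write_addrs & madd_addrs:
        raise ValueError("toy model: writes and multi-adds must not overlap")

    new_memory = dict(memory)

    # Arbitrary CRCW write: among all threads writing to the same
    # address, one arbitrarily chosen writer succeeds.
    candidates = defaultdict(list)
    for _, addr, value in writes:
        candidates[addr].append(value)
    for addr, values in candidates.items():
        new_memory[addr] = rng.choice(values)

    # Multioperation (add): every contribution is combined into the
    # target location (here: added onto its previous content).
    totals = defaultdict(int)
    for _, addr, value in multi_adds:
        totals[addr] += value
    for addr, total in totals.items():
        new_memory[addr] = new_memory.get(addr, 0) + total

    return new_memory


if __name__ == "__main__":
    mem = {0: 0, 1: 10}
    # Threads 0..2 all write to address 0; threads 0..3 multi-add to address 1.
    step_writes = [(0, 0, 7), (1, 0, 8), (2, 0, 9)]
    step_madds = [(t, 1, t + 1) for t in range(4)]
    print(pram_step(mem, step_writes, step_madds))
```

    In this model the value left at address 0 is whichever of the three written values the random choice picks, so a correct program must not depend on which writer wins; the multi-add result at address 1, by contrast, is deterministic.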
    Original language: English
    Title of host publication: Parallel and Distributed Computing
    Editors: Alberto Ros
    Place of Publication: Vienna
    Publisher: InTech
    Chapter: 3
    Pages: 39-64
    ISBN (Print): 978-953-307-057-5
    DOIs: https://doi.org/10.5772/9446
    Publication status: Published - 2010
    MoE publication type: A3 Part of a book or another research book

    Fingerprint

    Data storage equipment
    Instruction-level parallelism (ILP)
    Parallel architectures
    Clocks
    Electric power utilization
    Simulators
    Silicon

    Cite this

    Forsell, M. (2010). TOTAL ECLIPSE: An Efficient Architectural Realization of the Parallel Random Access Machine. In A. Ros (Ed.), Parallel and Distributed Computing (pp. 39-64). Vienna: InTech. https://doi.org/10.5772/9446
    @inbook{24d87c6f96ff46eb9687be6c4a84db97,
    title = "TOTAL ECLIPSE: An Efficient Architectural Realization of the Parallel Random Access Machine",
    author = "Martti Forsell",
    year = "2010",
    doi = "10.5772/9446",
    language = "English",
    isbn = "978-953-307-057-5",
    pages = "39--64",
    editor = "Alberto Ros",
    booktitle = "Parallel and Distributed Computing",
    publisher = "InTech",
    address = "Croatia",

    }
