Abstract
Commercial multicore central processing units (CPUs) integrate a number of processor cores on a single chip to support parallel execution of computational tasks. For independent parallel tasks, multicore CPUs can improve performance over a single core nearly linearly, as long as sufficient bandwidth is available. Ideal speedup is, however, difficult to achieve when dense intercommunication between the cores or complex memory access patterns are required. The causes are expensive synchronization and thread switching, and insufficient latency tolerance. These limitations push programmers away from straightforward parallel processing patterns toward complex and error-prone programming techniques. To address these problems, we have introduced the Thick Control Flow (TCF) processor architecture. TCF is an abstraction of parallel computation that combines self-similar threads into computational entities. In this paper, we compare the performance and programmability of an entry-level TCF processor and two Intel Skylake multicore CPUs on commonly used parallel kernels to find out how well our architecture solves these issues, which greatly reduce the productivity of parallel software development. Code examples are given and programming experiences are recorded.
Original language | English |
---|---|
Pages (from-to) | 3152-3183 |
Number of pages | 32 |
Journal | The Journal of Supercomputing |
Volume | 78 |
Issue number | 3 |
Early online date | 20 Jul 2021 |
Publication status | Published - Feb 2022 |
MoE publication type | A1 Journal article-refereed |
Keywords
- parallel computing
- multiprocessors
- thick control flow
- performance comparison
- programmability